Present: Ralph Niederberger, Stephane Coutin, Fotis Gagadis, Magda Haver
Apologised: Alessandra Scicchitano, Thilina Pathirana
Alessandra shared the document - https://docs.google.com/document/d/1mGgeLVDvor89Q2rzojS9BIfgsYByj41QNjP86shuleE/edit#
(ACTION: Alessandra to make sure all group members have editing rights)
Comments on the document:
Stephane:
“We also assume that transferring it across secure connections to their storage location is handled with care*” - Even the initial data transfer might be cross-infrastructure, without user interaction (for example, data from large instruments going into PRACE or EUDAT). Security is required (integrity, access rules).
“Who is allowed to access the data?” - refer to some metadata (recommendation in EUDAT training slides). External view only - we need to be careful as, from the security point of view, we do not focus on all aspects of big data.
5 Vs - From Wikipedia: Big data can be described by the following characteristics:
· Volume - The quantity of generated and stored data. The size of the data determines the value and potential insight, and whether it can actually be considered big data or not.
· Variety - The type and nature of the data. This helps people who analyze it to effectively use the resulting insight.
· Velocity - In this context, the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development.
· Variability - Inconsistency of the data set can hamper processes to handle and manage it.
· Veracity - The quality of captured data can vary greatly, affecting accurate analysis.
We are interested in volume and velocity (a black box view, an external view), not really in the others (internal view).
Ralph: new services might be directly transferred; at the end, when the data is available, you need to handle it with care and do it in a secure way. Similarities between open and closed data from the user perspective. Accessing the data: reading only or changing it?
Stephane: Data transfer limitation means remote handling - big volumes mean as little transfer as possible. So a user (or program) from one infrastructure might need to execute a program on another infrastructure where the data is stored, and get the result sent back.
It is important to have a common language for describing data localization and access rights (it should be based on some metadata), so that e-infrastructures can communicate.
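A minimal sketch of what such a metadata-based description could look like, assuming a hypothetical record format. The field names, the user names and the rule that writers may also read are illustrative assumptions, not something agreed by the group or defined by EUDAT or PRACE. It shows the two points raised above: access rights are read from the metadata, and the job runs where the data is stored, with only the result sent back.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; all field names are assumptions."""
    identifier: str
    infrastructure: str                        # where the data is stored
    readers: set = field(default_factory=set)  # users allowed to read
    writers: set = field(default_factory=set)  # users allowed to change the data
    open_data: bool = False                    # open data is readable by anyone


def is_allowed(record, user, action):
    """Decide from the metadata whether `user` may perform `action`."""
    if action == "read":
        # Assumption: anyone who may change the data may also read it.
        return record.open_data or user in record.readers or user in record.writers
    if action == "write":
        return user in record.writers
    return False


def plan_job(record, user, user_infrastructure):
    """Run the program where the data lives; send only the result back."""
    if not is_allowed(record, user, "read"):
        raise PermissionError(f"{user} may not read {record.identifier}")
    return {
        "execute_at": record.infrastructure,
        "return_result_to": user_infrastructure,
    }


# Usage with made-up names: a user on one infrastructure queries data held on another.
record = DatasetRecord("ds-1", "EUDAT", readers={"alice"}, writers={"bob"})
plan = plan_job(record, "alice", "PRACE")
```

Keeping location and access rights in one record means both infrastructures can evaluate the same description without moving the data first.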
Stephane - when we talk about big data, data transfer is not as easy (use cases in the EUDAT pilot). Ralph - this is also a problem in the brain project (?). Where should the process of extracting data be located? “We provide the storage but are not responsible for extracting data”. You are interested in a small part of the data and need procedures to extract it. How can this be done in a secure way?
Fotis: can we include, as a scenario for after 2020, the Internet of Everything (Cisco)? Major issue - medical information - transfer, storage and access of this data can be very problematic. Predefined views might be good. Do you consider medical information as a use case? Ralph - not easy to answer; medical data can also be big data. The question is who is responsible? How can this be accessed through cloud services and how secure are clouds? Stephane - there is one EUDAT project with definitions of policies.
Ralph - we should describe those use cases - based on what we have, we can define what is in scope and what is out of scope. We should have a broad range of use cases - see similarities and decide which use case we should concentrate on. It is important to include a short description of what is meant.
ACTIONS based on discussion:
ACTION - Define what big and open data is and concentrate on use cases; also who is allowed to access it and how. (Stephane to follow up with EUDAT on pilots, then identify cross-infrastructure cases to increase the number of use cases).
ACTION - Ralph - to collect use cases (plus short description). Give guidance on how to describe the cases in an email that will be prepared together with Stephane. (examples - Internet of Things, smart cities - traffic - Stephane knows people who work in this area; medical data?)
ACTION - Magda/Alessandra to create spaces on the wiki to collect use cases and useful links.
ACTION - Magda to create a Doodle poll for the next call (check with Alessandra first)
Pilots and use cases can be PRACE / EUDAT data pilots.