Facility-Local Deployment

A centralized deployment of the DataFlow software will always be available at https://dataflow.ornl.gov. A "facility-local deployment" provides the same or similar capabilities as the centralized deployment through a physical server placed close to the scientific instruments at a facility.

Note

A single dedicated server could serve multiple instruments and may be sufficient for an entire group, building, or facility, depending on the data needs.

Do you need one?

Briefly, a facility-local deployment may be necessary if at least one of these needs is important to you:

- bring data services (such as data upload, metadata capture, and data management) and user services (secure per-user authentication) to off-network computers / instruments
- upload large volumes of data from one or more instruments as quickly as possible
- pre-process data or metadata before uploading
- manage a mismatch between the rate at which instruments generate data and the available network bandwidth

Please refer to the "Facility-Local Deployment" section on the "About" page for further information.

Customizable components

There are several components that can be customized to suit the needs of individual groups and facilities:

Data storage location - By default, data are stored in the user's home directory on CADES' NFS storage. However, data can instead be stored in any storage location within ORNL that uses UCAMS / XCAMS user accounts for authentication and authorization. This requires a Globus Endpoint attached to the storage system, which our team can help set up.
Metadata schema - We can create a custom schema to capture and validate metadata entered by users. Facility managers can specify the names of the fields, whether each field is required or optional, the expected data type (string, number, choice among specified values, etc.), which fields should persist (be pre-populated) from one dataset to the next, etc.
In future iterations of DataFlow, users will be able to customize and/or link to the following capabilities:
- Data management
- Data pre-processing / standardization
- Data post-processing
- Data publication, etc.
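To make the metadata schema options concrete (field names, required vs. optional, expected type, and persistence between datasets), the sketch below models a schema as a list of field specifications, validates one dataset's metadata against it, and pre-populates persistent fields for the next dataset. This is a hypothetical illustration only: `FieldSpec`, `validate`, `prefill`, and the example fields (`sample_id`, `temperature_K`, `mode`) are assumptions and do not reflect DataFlow's actual schema format or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FieldSpec:
    """One metadata field in a hypothetical facility schema (illustrative, not DataFlow's format)."""
    name: str
    kind: str                         # "string", "number", or "choice"
    required: bool = False
    choices: Optional[list] = None    # allowed values when kind == "choice"
    persist: bool = False             # pre-populate this field for the next dataset


def validate(schema: list[FieldSpec], record: dict[str, Any]) -> list[str]:
    """Return a list of problems with `record`; an empty list means it is valid."""
    errors = []
    for spec in schema:
        value = record.get(spec.name)
        if value is None or value == "":
            # Missing values are only an error for required fields.
            if spec.required:
                errors.append(f"missing required field '{spec.name}'")
            continue
        if spec.kind == "choice":
            if value not in (spec.choices or []):
                errors.append(f"'{spec.name}' must be one of {spec.choices}")
        elif spec.kind == "number":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"'{spec.name}' should be a number")
        elif spec.kind == "string":
            if not isinstance(value, str):
                errors.append(f"'{spec.name}' should be a string")
        else:
            errors.append(f"'{spec.name}' has unknown kind '{spec.kind}'")
    return errors


def prefill(schema: list[FieldSpec], previous: dict[str, Any]) -> dict[str, Any]:
    """Copy only the persistent fields from the previous dataset's metadata."""
    return {s.name: previous[s.name] for s in schema if s.persist and s.name in previous}


# Hypothetical schema for a microscope; field names are illustrative only.
schema = [
    FieldSpec("sample_id", "string", required=True),
    FieldSpec("temperature_K", "number"),
    FieldSpec("mode", "choice", required=True, choices=["contact", "tapping"], persist=True),
]

print(validate(schema, {"sample_id": "S-01", "mode": "tapping"}))  # []
print(validate(schema, {"temperature_K": "cold"}))                 # three problems
print(prefill(schema, {"sample_id": "S-01", "mode": "tapping"}))   # {'mode': 'tapping'}
```

Keeping the schema as plain data (rather than code per field) is one way a facility manager could describe fields without programming; the real deployment may use a different representation.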

Getting a deployment

Please reach out to Suhas Somnath at somnaths@ornl.gov to request a facility-local deployment of DataFlow on a dedicated server.

We encourage interested facility managers / researchers to prepare the following details before a conversation with Suhas:

Logistical:
- Building number(s) where you would like dedicated server(s)
- If known, the name and contact information of any IT staff who work for / with the facility
- If known, the room number(s) of the building's IT closet(s) where the server could be installed
Data Generation:
- A list of every instrument that will (eventually) be connected to the server, with the following details:
  - Make, model, and type of instrument
  - Location of instrument - building number and room number
    - Are there Ethernet ports available in this room?
  - Instrument control computer:
    - Make and model of the computer
    - Operating system running on the computer
    - Is the computer connected to the ORNL enterprise network?
  - Number of users using the instrument
  - Data generation rates:
    - Frequency of experiments, e.g., monthly, weekly, daily, hourly, round the clock, etc.
    - Types of experiment(s)
    - Duration of experiment(s)
    - Nature of data generation, e.g., one or more files written after the experiment completes, or files generated periodically during the experiment
    - Volume of data generated per experiment
    - File type(s) of generated data
    - Lower bound, upper bound, and average volume of data generated per day / week / month
Data Transfer:
- Current mechanism, e.g., USB thumb drive, connection to a Network Attached Storage (NAS) system, Dropbox, email, etc.
- Time of transfer, e.g., once a user finishes working on the instrument, overnight, etc.
- Desired / required transfer speeds
Data Storage:
- Current state:
  - Location(s) of storage systems
  - Capacity and utilization rates
  - Access mechanisms for users
  - Deletion / archival policy
- Possible candidates for future data storage, with the same details as for the current state
Data Analysis:
- Kind of analysis
- Computational needs for analysis (CPU cores, GPUs, memory, operating system, network access)
- Time taken to analyze a typical file
- Software used to analyze data
  - Open or closed source
- Computer(s) where data are analyzed and their configurations (CPU cores, GPUs, memory)
Downstream usage and lifecycle:
- What happens or needs to happen after data is analyzed?
- Who consumes this data and how is data shared with them?
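When filling in the data generation and data transfer details above, a rough calculation can show whether the network can keep up with an instrument, which is one of the reasons a facility-local deployment may be needed. The sketch below is a back-of-the-envelope estimate under stated assumptions: decimal units (1 GB = 10^9 bytes), a placeholder efficiency of 0.7 for the fraction of nominal link speed actually achieved, and a hypothetical instrument producing 500 GB per day. The function names are illustrative and not part of DataFlow.

```python
def required_throughput_mbps(gb_per_day: float, window_hours: float = 24.0) -> float:
    """Average link speed (megabits/s) needed to move `gb_per_day` within `window_hours`.

    Uses decimal units: 1 GB = 1e9 bytes, 1 Mb/s = 1e6 bits/s.
    """
    bits = gb_per_day * 1e9 * 8
    return bits / (window_hours * 3600) / 1e6


def hours_to_transfer(gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Hours needed to move `gb` over a `link_mbps` link.

    `efficiency` is an assumed fraction of the nominal link speed actually
    achieved (protocol overhead, shared traffic); 0.7 is a placeholder, not a measurement.
    """
    bits = gb * 1e9 * 8
    return bits / (link_mbps * 1e6 * efficiency) / 3600


# Hypothetical instrument producing 500 GB per day.
print(f"{required_throughput_mbps(500):.1f} Mb/s if spread over 24 h")              # 46.3
print(f"{required_throughput_mbps(500, window_hours=8):.1f} Mb/s overnight (8 h)")  # 138.9
print(f"{hours_to_transfer(500, link_mbps=1000):.1f} h over a 1 Gb/s link")         # 1.6
```

Comparing the required rate against the measured bandwidth of the instrument's actual network connection, rather than its nominal speed, gives a more realistic picture of whether buffering on a local server would help.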

Existing deployments

Scanning Probe Microscopy Laboratory at the Center for Nanophase Materials Sciences