Note: DataFlow is currently supported only by the Globus data adapter, so the following pointers focus on that adapter.
The graphic above provides a high-level view of the key steps in using DataFlow via the web interface.
Regardless of the kind of deployment (central or facility-local), users begin by logging into DataFlow with their XCAMS / UCAMS credentials.
Once authenticated, users will most likely be prompted for UCAMS / XCAMS credentials again, this time to activate the Globus Endpoints between which data are transferred.
Please see instructions below for changing the destination for the data transfers. The default is the user's Home directory in CADES' Open Research file system.
At this point, users can click the New Dataset button to start capturing metadata and uploading the raw data associated with a single experiment. Clicking New Dataset opens a panel where users can enter metadata about the current experiment, starting with a Title.
Upon clicking the Next button on this panel, users are asked to specify the data files they wish to upload from the computer they are working on.
Users can select files or folders to upload by clicking the Upload File or Upload Folder buttons.
Alternatively, users are welcome to drag and drop files or folders into the upload window.
Note: As files or folders are selected, users will see a blue progress bar indicating the progress of uploading the dataset to the destination.
Once the files are uploaded, users are free to close the file-upload window.
Users can repeat steps 4-5 for subsequent experiments.
Before beginning, make sure to follow the steps on the Getting Started page to generate the API key, generate encrypted versions of your password(s), and set up the programming environment of your choice.
The Python interface to the REST API mirrors the REST API almost exactly, so users can follow the quick walkthrough as a guide to either the REST or the Python interface to DataFlow.
The basic recommended steps to upload a dataset with metadata are:
1. Create a new dataset, supplying a title and any metadata.
2. Upload the data file(s) into that dataset.
Combinations of these functions / API calls can be used to accomplish bigger tasks; a hedged sketch of the basic upload sequence follows below.
Note that DataFlow currently does not have a mechanism to track the progress / completion of a file upload. If you receive a successful response from DataFlow, it is safe to assume that the transfer was at least successfully initiated.
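As a concrete illustration, here is a minimal Python sketch of that two-step sequence using the `requests` library. The base URL, endpoint paths (`/datasets`, `/datasets/{id}/files`), payload field names, and the `DATAFLOW_API_KEY` environment variable are illustrative assumptions, not the documented API; consult the API documentation for your DataFlow deployment for the exact routes and schema.

```python
# Minimal sketch of uploading a dataset with metadata via the REST API.
# Endpoint paths and payload field names are illustrative assumptions;
# check your DataFlow deployment's API documentation for the exact schema.
import os
import requests

BASE_URL = "https://dataflow.ornl.gov/api/v1"  # hypothetical base URL
HEADERS = {"Authorization": f"Bearer {os.environ['DATAFLOW_API_KEY']}"}

# Step 1: create a new dataset with metadata and capture its ID
resp = requests.post(
    f"{BASE_URL}/datasets",
    headers=HEADERS,
    json={"name": "Titanium_from_GaTech",
          "metadata": {"instrument": "CNMS-SEM-Hitachi"}},
)
resp.raise_for_status()
dataset_id = resp.json()["id"]

# Step 2: upload a raw data file into that dataset
with open("scan_001.tif", "rb") as file_handle:
    resp = requests.post(
        f"{BASE_URL}/datasets/{dataset_id}/files",
        headers=HEADERS,
        files={"file": file_handle},
    )
resp.raise_for_status()

# A 2xx response only means the transfer was successfully initiated,
# not that it has completed (see the note above).
print("Upload initiated for dataset", dataset_id)
```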
By default, data are transferred to the users' HOME directory in CADES' Open Research (OR) Network File System. However, DataFlow allows users to change the destination Globus endpoint.
Most high-performance computing and data facilities have one or more Globus Endpoints that provide access to their file system(s). You can also set up Globus endpoint(s) on your personal computer(s) using Globus Connect Personal.
Once you have identified the endpoint you'd like to use:
Web Interface
1. Once logged into DataFlow, click on your name at the top right of the screen.
2. Select the "Settings" option.
3. Paste the Globus Endpoint's UUID into the box underneath the Destination Endpoint ID option.
Programming interfaces
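The documented call for changing the destination endpoint programmatically is not reproduced here; the sketch below shows what it might look like with `requests`, assuming a hypothetical `/settings` route and a hypothetical `destination_globus_endpoint_id` field. Verify the actual route and field name against your deployment's API documentation.

```python
# Hypothetical sketch of updating the destination Globus endpoint via the
# REST API; the route and field name below are assumptions for illustration.
import os
import requests

BASE_URL = "https://dataflow.ornl.gov/api/v1"  # hypothetical base URL
HEADERS = {"Authorization": f"Bearer {os.environ['DATAFLOW_API_KEY']}"}

# UUID of the Globus endpoint you identified above (placeholder value)
new_endpoint = "aaaabbbb-cccc-dddd-eeee-ffff00001111"

resp = requests.patch(
    f"{BASE_URL}/settings",
    headers=HEADERS,
    json={"destination_globus_endpoint_id": new_endpoint},
)
resp.raise_for_status()
```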
As noted above, data are transferred by default to the user's HOME directory in CADES' Open Research (OR) Network File System. If you changed the default destination endpoint, the same steps still apply, but you will need to adjust how you access / address the file system.
Users can access this file system in multiple ways. Here we list two:
DataFlow uploads data according to the following nomenclature:
/~/dataflow/<Scientific_Instrument_Name>/<YEAR>-<MONTH>-<DATE>/<HOUR>-<MIN>-<SEC>-<Dataset_Title>/
As an example:
/~/dataflow/CNMS-SEM-Hitachi/2022-04-28/08-30-12-Titanium_from_GaTech/
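If you need to compute the expected destination path in a script (for example, to locate a dataset after upload), the nomenclature above can be reproduced directly. The sketch below is an approximation: the timestamp is assigned server-side at upload time, so a locally computed path may differ by a few seconds.

```python
# Reconstruct the destination path DataFlow uses for an uploaded dataset,
# following the nomenclature above. The timestamp is assigned server-side
# at upload time, so a locally computed path is only an approximation.
from datetime import datetime

def dataflow_path(instrument: str, title: str, when: datetime) -> str:
    return (
        f"/~/dataflow/{instrument}/"
        f"{when:%Y-%m-%d}/{when:%H-%M-%S}-{title}/"
    )

print(dataflow_path("CNMS-SEM-Hitachi", "Titanium_from_GaTech",
                    datetime(2022, 4, 28, 8, 30, 12)))
# -> /~/dataflow/CNMS-SEM-Hitachi/2022-04-28/08-30-12-Titanium_from_GaTech/
```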
Note: Given that data in the user's HOME on CADES are expected to be stored indefinitely, the capacity allocated per user is finite. Therefore, we encourage facilities or users interested in moving large (> 50 GB) volumes of data to get in touch with CADES for a more scalable storage solution.