HD4DP v2 CSV Upload

Last updated: 2024-09-30 08:55

General description of the CSV Upload

(under construction)

The CSV Upload functionality in HD4DP v2.0 is designed for bulk record uploads. This feature facilitates the submission of data by making use of CSV files, where each row in the file corresponds to a single record. Hence, multiple records can be included in a single CSV for efficient data submission.

Currently, there is no user interface in HD4DP v2.0 for uploading CSV files. To upload a CSV file, the data provider needs to drop it at a specific location, where it is picked up and processed periodically. The file must be “final”, meaning that no application is still writing to it. The pick-up location is identical for all registries.
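One way to make sure a file is "final" before it becomes visible in the pick-up location is to write it completely somewhere else and only then move it into place. The sketch below illustrates that idea for a locally mounted folder; the function name, the staging directory and the file name are hypothetical, and it assumes that the staging and pick-up directories are on the same filesystem so that the move is atomic. It does not cover an upload over SFTP.

```python
import os
import tempfile
from pathlib import Path


def drop_final_file(content: str, drop_dir: Path, staging_dir: Path, file_name: str) -> Path:
    """Write the CSV completely in a staging directory, then move it into the pick-up location."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    drop_dir.mkdir(parents=True, exist_ok=True)

    # Write and close the file outside the pick-up location first.
    fd, tmp_name = tempfile.mkstemp(dir=staging_dir, suffix=".part")
    with os.fdopen(fd, "w", encoding="utf-8", newline="") as handle:
        handle.write(content)

    # os.replace is atomic when both paths are on the same filesystem,
    # so a periodic pick-up never sees a half-written file.
    target = drop_dir / file_name
    os.replace(tmp_name, target)
    return target


if __name__ == "__main__":
    # Demonstration in a throwaway directory; real paths depend on the installation.
    base = Path(tempfile.mkdtemp())
    dropped = drop_final_file("field_a;field_b\n1;2\n", base / "pickup", base / "staging", "example.csv")
    print(dropped.read_text(encoding="utf-8"))
```

The file name used here is only a placeholder and does not follow the naming convention required for the CSV files.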

Additionally, pre-registry processing relies on adherence to a specific naming convention for the CSV files.

This page also explains how the CSVUploader feature of HD4DP v2 works. The CSVUploader is intended for bulk upload of records: each row of the CSV file holds one record and represents one submission, so a user can include as many records as needed.
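As an illustration of the "one row, one record" layout, the following minimal sketch builds CSV content from a list of records with Python's standard csv module. The column names and the semicolon delimiter are assumptions made for the example only; the actual columns and formats are defined per registry.

```python
import csv
import io

# Hypothetical column names; the real columns are defined per registry.
FIELD_NAMES = ["patient_id", "registration_date", "status"]


def records_to_csv(records: list[dict[str, str]], delimiter: str = ";") -> str:
    """Return CSV text with one header line followed by one line per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELD_NAMES, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        # Each record becomes exactly one row, that is, one submission.
        writer.writerow(record)
    return buffer.getvalue()


if __name__ == "__main__":
    example = [
        {"patient_id": "1", "registration_date": "2024-01-31", "status": "final"},
        {"patient_id": "2", "registration_date": "2024-02-15", "status": "final"},
    ]
    print(records_to_csv(example))
```

Adding more records to the list simply adds more rows to the same file.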

Training

To help users become familiar with preparing CSV files for upload, healthdata.be organized an HD4DP v2 CSV Upload training. The recordings can be reviewed below:

HD4DP2_CSV_Training_recording_part01 Download

HD4DP2_CSV_Training_recording_part02 Download

HD4DP2_CSV_Training_recording_part03 Download

Architecture

The CSVUploader is located under hd-connect/csvuploader. It uses both the hd-connect-csvuploader and the hd-connect-proxy modules.

The general architecture of the CSVUploader is explained in the sequence diagram below.

Third-party libraries and frameworks

Apache Camel: https://camel.apache.org/
Spring Boot

Testing and working

The CSVUploader creates a folder at root level (SFTP for end users, hd-all for developers) that contains one subfolder per existing organization.
The CSVUploader polls with a delay of 1 minute, processes the CSV file and then creates 3 folders:
- ARCHIVE folder: contains the source CSV file.
- RESULTS folder: contains the results of processing the CSV file. The result file contains the specified data and the final status of the processing: Success or Error. If an error occurred, the error message is included. For multiple uploads, each result is appended to the end of the result file.
- ERROR folder: created when the CSV file could not be processed due to an I/O error (file corrupted, not found, etc.). For now, only technical errors are caught, in which case the source CSV file is moved to this folder instead of the ARCHIVE folder. In principle, this folder should contain every result that is an error, and the RESULTS folder should only contain results that end with a SUCCESS status.
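The folder routing described above can be sketched as follows. This is a simplified Python illustration of the current behaviour, not the actual Apache Camel implementation: the function process_csv, the submit callback, the result file name and the semicolon-separated result layout are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


def process_csv(source: Path, org_dir: Path,
                submit: Callable[[str], Optional[str]]) -> str:
    """Process one CSV file; return the name of the folder the source file was moved to."""
    archive = org_dir / "ARCHIVE"
    results = org_dir / "RESULTS"
    error = org_dir / "ERROR"
    try:
        rows = source.read_text(encoding="utf-8").splitlines()[1:]  # skip the header line
    except (OSError, UnicodeDecodeError):
        # Technical failure: the source file goes to ERROR instead of ARCHIVE.
        error.mkdir(parents=True, exist_ok=True)
        if source.exists():
            shutil.move(str(source), str(error / source.name))
        return "ERROR"

    results.mkdir(parents=True, exist_ok=True)
    # Append mode: repeated uploads add to the end of the same result file.
    with open(results / source.name, "a", encoding="utf-8") as out:
        for row in rows:
            message = submit(row)  # hypothetical: None on success, else an error message
            status = "Success" if message is None else f"Error;{message}"
            out.write(f"{row};{status}\n")

    archive.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(archive / source.name))
    return "ARCHIVE"


if __name__ == "__main__":
    org = Path(tempfile.mkdtemp())
    upload = org / "example.csv"
    upload.write_text("field_a;field_b\n1;2\n", encoding="utf-8")
    print(process_csv(upload, org, lambda row: None))
    print((org / "RESULTS" / "example.csv").read_text(encoding="utf-8"))
```

In this sketch, record-level errors end up in the result file with an Error status, while only unreadable files are moved to the ERROR folder, which mirrors the "for now" situation described in the list.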

Formats

Some fields require a specific format. Detailed information about the codes and formats used for the Author group, NISS code, status, Postal Code, "Date" - "Date:Time", repeatable fields and multiple choice fields is available here.