HD4DP v2 CSV Upload

Last updated: 2024-09-30 09:14

General description of the CSV Upload

The CSV Upload functionality in HD4DP v2.0 is designed for bulk record uploads: data is submitted by means of CSV files. Each row in such a file corresponds to a single record, so multiple records can be submitted efficiently in a single CSV file.

To upload a CSV file, the data provider drops the file at a specific location on their own installation. This pick-up location is identical for all registries. Files at this location are picked up and processed periodically. A file must be “final” when it is dropped, meaning that no application is still writing to it.
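One common way to make sure a file is “final” before it can be picked up is to write it somewhere else first and move it into the pick-up location only once it is complete. The following is a minimal sketch of that idea, not part of HD4DP itself: the class name `CsvDrop`, the method `dropFinalFile`, and the staging and pick-up paths are all hypothetical, and it assumes both directories sit on the same local file system. For an SFTP drop, the comparable approach would be to upload under a temporary name and rename afterwards.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, for illustration only; not part of HD4DP.
final class CsvDrop {

    /**
     * Writes the CSV content to a staging directory, then moves the finished
     * file into the pick-up directory so a poller never sees a partial file.
     */
    static Path dropFinalFile(Path stagingDir, Path pickupDir,
                              String fileName, String csvContent) throws IOException {
        Files.createDirectories(stagingDir);
        Files.createDirectories(pickupDir);

        // Step 1: write the complete file outside the pick-up location.
        Path staged = stagingDir.resolve(fileName);
        Files.write(staged, csvContent.getBytes(StandardCharsets.UTF_8));

        // Step 2: move it into place in one step where the file system allows it.
        Path target = pickupDir.resolve(fileName);
        try {
            return Files.move(staged, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Fallback for file systems without atomic moves (assumption:
            // an existing file with the same name may be overwritten).
            return Files.move(staged, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

The file name passed in must still follow the naming convention required for pre-registry processing; the sketch does not check this.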

Additionally, pre-registry processing relies on adherence to a specific naming convention for the CSV files.

A technical explanation of the CSVUploader and its architecture can be found further down this page.

Training

To help users become familiar with preparing CSV files for upload, healthdata.be organized an HD4DP v2 CSV Upload training. You can review the recording of this training below.

Architecture

The CSVUploader is located under hd-connect/csvuploader. It uses both the hd-connect-csvuploader and the hd-connect-proxy modules.

The general architecture of the CSVUploader is explained in the sequence diagram below.

Third-party libraries and frameworks

  • Apache Camel: https://camel.apache.org/
  • Spring Boot

Testing and working

  • The CSVUploader creates a folder at root level (SFTP for end-users, or hd-all for developers) that contains one subfolder per existing organization.
  • The CSVUploader polls with a delay of 1 minute, processes the CSV file and then creates the following folders:
    • ARCHIVE folder: contains the source CSV file.
    • RESULTS folder: contains the results of processing the CSV file. The result file contains the specified data and the final status of the processing: Success or Error. If an error occurred, the error message is included. When the same file is uploaded multiple times, each new result is appended to the end of the result file.
    • ERROR folder: this folder is created if the CSV file could not be processed due to an I/O error (file corrupted, file not found, etc.). For now, only such technical errors are caught, in which case the source CSV file is moved to this folder instead of the ARCHIVE folder. In principle, this folder should contain every result that is an error, and the RESULTS folder should only contain results that end with a SUCCESS status.
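The folder behaviour described above can be sketched in plain Java as follows. This is an illustrative approximation under stated assumptions, not the actual CSVUploader code (which is built on Apache Camel): the class `UploadFolderSketch`, the `RecordProcessor` interface and the `result_` file-name prefix are hypothetical, and only an `IOException` is treated as a technical error.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of the ARCHIVE / RESULTS / ERROR handling; not HD4DP code.
final class UploadFolderSketch {

    /** Hypothetical stand-in for the real record processing; returns a result summary. */
    interface RecordProcessor {
        String process(Path csvFile) throws IOException;
    }

    /** Handles one CSV file and returns the folder the source file ended up in. */
    static String handle(Path orgDir, Path csvFile, RecordProcessor processor) throws IOException {
        String name = csvFile.getFileName().toString();
        String summary;
        try {
            summary = processor.process(csvFile);
        } catch (IOException technicalError) {
            // Technical failure: the source file goes to ERROR instead of ARCHIVE.
            Path errorDir = Files.createDirectories(orgDir.resolve("ERROR"));
            if (Files.exists(csvFile)) {
                Files.move(csvFile, errorDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
            }
            return "ERROR";
        }

        // Append the outcome so repeated uploads accumulate in one result file.
        Path resultsDir = Files.createDirectories(orgDir.resolve("RESULTS"));
        Files.write(resultsDir.resolve("result_" + name),
                (summary + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);

        // Keep the processed source file in ARCHIVE.
        Path archiveDir = Files.createDirectories(orgDir.resolve("ARCHIVE"));
        Files.move(csvFile, archiveDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
        return "ARCHIVE";
    }
}
```

In this sketch a caller would invoke `handle` once per file found in an organization's subfolder on each polling cycle; the polling itself (the 1-minute delay) is left out.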

Formats

Some fields require a specific format. Detailed information about the codes and formats used for the Author group, NISS code, status, Postal Code, "Date" and "Date:Time" fields, repeatable fields and multiple choice fields is available here.