(NEW) HD4DP v2 CSV Upload

Last updated: 2024-09-16 16:27

Introduction

This page details the CSV Upload functionality in HD4DP v2.0, designed for bulk record uploads. This feature facilitates the submission of data by making use of CSV files, where each row in the file corresponds to a single record. Hence, multiple records can be included in a single CSV for efficient data submission.

Currently, HD4DP v2.0 has no user interface for uploading CSV files. A data provider who wants to upload a CSV file must drop it at a specific location, where it is picked up and processed periodically. The file must be “final”, meaning that no application is still writing to it. The pick-up location is identical for all registries.

Additionally, pre-registry processing relies on adherence to a specific naming convention for the CSV files.

(CSV file details)

  • Author group, Author and Coauthor:
    • When the Author group, Author and Coauthor are left out of the CSV file, the default Author group, Author and Coauthor are used automatically.
    • To specify the desired Author group, Author and Coauthor in the CSV file, add the headers TX_AUTHOR_GR, TX_AUTHOR and TX_COAUTHOR to the file with their respective values.

      Example:
TX_AUTHOR_GR;TX_AUTHOR;TX_COAUTHOR
Test group;test@sciensano.be;test@sciensano.be

Note:
The Author group, Author and Coauthor must exist and be correctly configured in the back-end of the system. TX_AUTHOR_GR can be any string that identifies the Author group to which the Author belongs. Commonly, the first name and last name are used as the TX_AUTHOR_GR value. Be sure to avoid leading and trailing spaces when entering the Author group value.
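
As a minimal sketch, a file with these author columns could be produced with Python's standard csv module. The semicolon delimiter follows the example above; the helper name write_author_csv and the sample values are illustrative assumptions, not part of HD4DP.

```python
import csv
import io

AUTHOR_HEADERS = ["TX_AUTHOR_GR", "TX_AUTHOR", "TX_COAUTHOR"]


def write_author_csv(rows, out):
    """Write rows (dicts) as semicolon-separated CSV with the author headers.

    Leading and trailing spaces are stripped from every author value,
    as recommended for the Author group.
    """
    writer = csv.DictWriter(
        out, fieldnames=AUTHOR_HEADERS, delimiter=";", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow({h: str(row[h]).strip() for h in AUTHOR_HEADERS})


# Hypothetical sample record, reusing the values from the example.
buffer = io.StringIO()
write_author_csv(
    [
        {
            "TX_AUTHOR_GR": " Test group ",
            "TX_AUTHOR": "test@sciensano.be",
            "TX_COAUTHOR": "test@sciensano.be",
        }
    ],
    buffer,
)
print(buffer.getvalue())
```

In a real file the author columns would sit alongside the registry-specific fields; they are shown alone here to keep the sketch short.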

  • Submitting a record that requires manual intervention in HD4DP before submission:
    Add a column with the field name 'STATUS' (in capitals) to the CSV file and set its value to 'draft' for each record that must be submitted manually.
    Records without this value are submitted without manual intervention.
  • Adding separators to a NISS number:
    Separators are not required in a NISS number when uploading a file using CSV Upload. You can enter the NISS number with or without separators, e.g. 85.04.02-169.32 or 85040216932.

  • Only for the registries that need completion of the field Country (CD_CNTRY_RES):
    If the country (CD_CNTRY_RES) is NOT Belgium, the postal code (CD_POSTCODE) must be left empty; the code “9999 – Woonplaats niet in België” (residence not in Belgium) must not be selected. Otherwise, the CSV upload will be blocked and the records will not be uploaded.
  • Make sure the name of the csv file has the correct format:
    HD_DCD_submcsv_HDBPnumber_HDBPabbreviation_versionnumber_versionreleasedate
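
The row-level rules above can be sketched as a small pre-upload helper. This is an illustration only: the function names and the BELGIUM_CODE value are assumptions (the actual code for Belgium in CD_CNTRY_RES depends on the registry's code list), and stripping NISS separators is optional since both forms are accepted.

```python
import re

# Assumption: placeholder for the registry's code for Belgium in CD_CNTRY_RES.
BELGIUM_CODE = "BE"


def normalise_niss(niss):
    """Return the NISS number without separators (both forms are accepted)."""
    return re.sub(r"\D", "", niss)


def prepare_row(row, needs_manual_review=False):
    """Return a copy of a record adjusted to the rules listed above."""
    prepared = dict(row)
    if needs_manual_review:
        # 'draft' keeps the record from being submitted automatically.
        prepared["STATUS"] = "draft"
    if "CD_CNTRY_RES" in prepared and prepared["CD_CNTRY_RES"] != BELGIUM_CODE:
        # Outside Belgium the postal code must stay empty.
        prepared["CD_POSTCODE"] = ""
    return prepared


print(normalise_niss("85.04.02-169.32"))
print(prepare_row({"CD_CNTRY_RES": "FR", "CD_POSTCODE": "9999"}))
```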

Training

Below you can review the HD4DP v2 CSV Upload training organized by healthdata.be:

Architecture

The CSVUploader is located under hd-connect/csvuploader. It uses both hd-connect-csvuploader and hd-connect-proxy modules.

The general architecture of the CSVUploader is explained in the sequence diagram below.

Third-party libraries and frameworks

  • Apache Camel: https://camel.apache.org/
  • Spring Boot

Testing and working

  • The CSVUploader creates a folder at root level (SFTP for end-user, or hd-all for developer) that contains a subfolder per existing organisation.
  • The CSVUploader polls with a delay of 1 minute, processes the CSV file and then creates 3 folders:
    • ARCHIVE folder: contains the source csv file.
    • RESULTS folder: contains the results of processing the csv file. This file contains the specified data, and the final status of the processing: Success or Error. If an error occurred, the error message is displayed. For multiple uploads, the result is added to the end of the result file each time.
    • ERROR folder: this folder is created when the CSV file could not be processed due to an I/O error (file corrupted, not found, etc.). For now, only technical errors are caught; in that case the source CSV file is moved to this folder instead of the ARCHIVE folder. In principle, this folder should contain every result that is an error, and the RESULTS folder should contain only results that end with a SUCCESS status.
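
To check where a dropped file ended up, a data provider could look for it in the folders described above. The sketch below assumes that the ARCHIVE and ERROR folders sit directly under the organisation folder and that the file keeps its original name when it is moved; neither detail is confirmed here, and find_outcome is a hypothetical helper.

```python
from pathlib import Path


def find_outcome(org_dir, csv_name):
    """Return 'ARCHIVE' or 'ERROR' depending on where the file was moved.

    Returns None when the file is in neither folder, for example because
    it has not been picked up yet.
    """
    for folder in ("ARCHIVE", "ERROR"):
        if (Path(org_dir) / folder / csv_name).exists():
            return folder
    return None
```

A result of 'ARCHIVE' only means the file was read; the per-record Success or Error status still has to be looked up in the RESULTS folder.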

Formats

Some formats are specific:

  • Dates: should be dd/mm/yyyy
  • Boolean: true / false
  • Codes: the value of the code (not the translation)
  • Multi codes: there is only one column per field. So when a select box is set as multiple, values have to be separated by a "|". e.g.: 68452|68453|68454
  • Repeatable blocks: in some DCDs, a complete block of fields is repeatable. In that case, values have to be separated by a ";"
    • e.g.: a block contains 3 fields: A (Lob), B (Type klep) and C (Aantal kleppen)
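
The date, boolean and multi-code formats listed above can be produced with a few small helpers. This is a minimal sketch; the function names are illustrative assumptions.

```python
from datetime import date


def format_date(value):
    """Format a date as dd/mm/yyyy."""
    return value.strftime("%d/%m/%Y")


def format_boolean(value):
    """Format a boolean as lowercase true / false."""
    return "true" if value else "false"


def format_multi_codes(codes):
    """Join the code values of a multiple select box with '|'."""
    return "|".join(str(code) for code in codes)


print(format_date(date(1985, 4, 2)))
print(format_boolean(True))
print(format_multi_codes([68452, 68453, 68454]))
```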

Examples:

In case of a multiple choice:

Example of a multiple choice as presented in the DCD:

As presented in the form:

For reporting fields with a multiple choice formatting, the selected answers can be reported separated by the pipe symbol (|).

Example of a multiple field reporting:

cftr_modulating_therapy_1;

1|2|3|4|5|6;

In case of a repeatable block

Example of a repeatable block as presented in the DCD:

As presented in the form:

Manually enter the repeatable fields as shown in the following template: field_category|<index>|field.

Example of a repeatable block reporting:

Repeatable 1:

Transplants

transplant_status

Repeatable 2:

Transplants

transplant_status

becomes:

transplants|0|transplant_status;transplants|1|transplant_status
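
The header names for a repeatable block can also be generated rather than typed by hand. The sketch below follows the field_category|<index>|field template with a zero-based index, as in the example above; the helper name repeatable_headers is an assumption.

```python
def repeatable_headers(category, field, count):
    """Build one header per repetition: category|index|field, index from 0."""
    return [f"{category}|{index}|{field}" for index in range(count)]


headers = repeatable_headers("transplants", "transplant_status", 2)
print(";".join(headers))
```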
