
Dynamic data inventory

Please follow the instructions below if the data set(s) you want to make available via the DMA have the following characteristics:

  • The data set consists of multiple files that are aggregated as a data set collection.
  • Metadata is available per file in a self-hosted data catalogue.
  • The total size of the data set makes it very hard to transfer the data to a different location.
  • The data set is dynamic, meaning it is either updated or extended regularly over time.

Step 1: Create Metadata Mapping

You are already running a metadata infrastructure optimised for your business, holding metadata in a specific metadata standard other than DCAT-DMA. The Metadata Mapping Builder service was developed to map your metadata standard to DCAT-DMA. The Mapping Builder guides you through a number of steps to create an RML mapping file. This file is used afterwards to map your metadata attributes to the corresponding attributes in the DCAT-DMA metadata standard. Each of your sources (not each data set or file, but each metadata structure you use) needs a separate mapping and consequently a separate RML file. Download these files and save them for the next steps.
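As a conceptual illustration only, the effect of such a mapping can be pictured as renaming source attributes to target terms. The sketch below is plain Python, not RML, and does not reflect the output of the Mapping Builder. The source attribute names and the helper `map_record` are hypothetical; the target terms are taken from the generic DCAT and Dublin Core vocabularies, and whether DCAT-DMA uses exactly these terms is an assumption.

```python
# Hypothetical source-to-target attribute mapping; all names are illustrative.
ATTRIBUTE_MAP = {
    "name": "dct:title",
    "abstract": "dct:description",
    "tags": "dcat:keyword",
}


def map_record(record, attribute_map=ATTRIBUTE_MAP):
    """Rename the keys of one source metadata record to the target terms.

    Keys without a mapping entry are dropped, mirroring the idea that a
    mapping file only translates the attributes it explicitly names.
    """
    return {attribute_map[k]: v for k, v in record.items() if k in attribute_map}


# One record from a hypothetical self-hosted catalogue.
source = {
    "name": "Sea surface temperature",
    "abstract": "Daily grids",
    "internal_id": 17,
}
mapped = map_record(source)
```

A real RML file expresses the same correspondence declaratively, which is why one file per metadata structure is needed.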

Step 2: Run a Mapper Service

In step 1 you defined the translation of your metadata into the DMA-specific metadata standard (DCAT-DMA). The actual mapping of your metadata is performed by a service called the DMA Simple RML Mapper. For dynamic data inventories, it is assumed that the amount of metadata and data is so large that keeping multiple copies is not an option. Hence, an instance of the mapper service needs to be set up to provide mappings on the fly, without replicating any data. Documentation on how to set up an instance of the DMA Simple RML Mapper can be found in the source code repository.

Step 3: Register Service Endpoint as Client

In order to interact with the central services of the DMA, you need to register your organisation in the DMA authentication service CAS. Organisation and client registration can be done directly via the DMA user management service or via the CAS. After successful registration of your service endpoint as a client, you should have the following credentials available:

  • client_id
  • client_secret
  • token_url
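These three values resemble the inputs of an OAuth 2.0 client-credentials grant. Assuming the DMA token endpoint accepts that grant with HTTP Basic client authentication (an assumption, not confirmed here), a token request could be prepared roughly as follows. The helper `build_token_request`, the credentials, and the URL are placeholders.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(client_id, client_secret, token_url):
    """Build (but do not send) an OAuth 2.0 client-credentials token request.

    Assumption: the endpoint expects HTTP Basic client authentication and a
    form-encoded body, as described in RFC 6749, section 4.4.
    """
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Placeholder values; substitute the credentials issued at registration.
request = build_token_request("my-client", "my-secret", "https://cas.example.org/token")
# Sending the request would be: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the headers before contacting the real service.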

Step 4: Get asset IDs

Access to data within the DMA is based on specific smart contracts assigned to each data set individually. Therefore, each data set has to be uniquely identified so that contracts can be created between the data provider and the user. This identification is done via assets, which are stored in the DMA Blockchain. Blockchain assets are retrieved using the DMA Blockchain API. See also the further documentation about the blockchain. A Python package was developed to support the retrieval of blockchain assets. Full documentation on package installation and usage is available in the package source code repository. This package is complemented by a second Python package, assetsDB, an Object Relational Mapper (ORM) that persistently stores the retrieved blockchain assets for use with the data sets. An installation and usage guide is available alongside the source code of the package.

For further instructions please visit the dedicated package documentation.

Step 5: Run an External Node Service Endpoint

The actions undertaken in steps 1 to 4 are preparations for the final and key component required to host a dynamic data inventory. The externalNodeEndpoint is a RESTful API endpoint implemented as a Python package that uses the lightweight Flask framework. This package is a reference implementation that demonstrates how to publish DMA-compliant metadata and data into the DMA ecosystem by implementing the ResourceSync protocol. In general, the two main objectives of an external node service endpoint are the publication of a changelist and of the so-called resource XML files. Those are picked up by a DMA internal service, the ResourceSync, which triggers the ingestion of your metadata into the DMA data catalogue. As a consequence, an External Node Service Endpoint is the minimum you require to act as a DMA data provider. Details on how to set up the endpoint can be found in the documentation of the reference implementation.
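To give a rough idea of what a changelist looks like, the following sketch builds a minimal ResourceSync change list document with the Python standard library. It is not taken from the reference implementation. The helper `build_changelist` and the example URL are hypothetical, and a production change list would typically carry additional elements, such as a link to the capability list.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"


def build_changelist(changes, from_time):
    """Return a minimal ResourceSync change list as an XML string.

    `changes` is an iterable of (loc, lastmod, change) tuples, where
    `change` is one of "created", "updated" or "deleted".
    """
    # Serialise sitemap elements without a prefix and ResourceSync ones as rs:.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("rs", RS_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    # Document-level metadata: declares the capability and the start of the period.
    ET.SubElement(urlset, f"{{{RS_NS}}}md",
                  {"capability": "changelist", "from": from_time})

    for loc, lastmod, change in changes:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
        # Per-resource metadata: what kind of change occurred.
        ET.SubElement(url, f"{{{RS_NS}}}md", {"change": change})

    return ET.tostring(urlset, encoding="unicode")


# Hypothetical resource URL on the provider's endpoint.
changelist_xml = build_changelist(
    [("https://provider.example.org/metadata/dataset-1.xml",
      "2024-01-02T00:00:00Z", "updated")],
    from_time="2024-01-01T00:00:00Z",
)
```

In a Flask-based endpoint, a string like `changelist_xml` would be returned from a route with an XML content type; the routing itself is left out here.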

Step 6: Register as a Data Provider

In order to make the DMA aware of you as a data provider, you need to register your external node service endpoint. The registration process still needs to be specified.