Becoming a data provider
A Data Provider (DP) is an entity (a data platform or a natural person) that holds data in any form and either delivers it on demand or deposits it at the Broker. Data made available within the DMA ecosystem can take any binary digital form. Because of this freedom, data management within the DMA cannot be generalised.
Making data available within and through the DMA relies on one key requirement: metadata must be added to the actual data. To fulfil this requirement, a dedicated DMA metadata standard (DCAT-DMA) was established, based on the well-established DCAT (Data Catalog Vocabulary). DCAT-DMA uses standard DCAT and extends its vocabulary. A data provider must supply a defined set of metadata elements (see DMA - Core Metadata). As a consequence, each data set made available within the DMA ecosystem consists of a metadata element and a data element that are tightly bound together.
To become a data provider within the DMA, you have to make sure that your data set is made up of these two elements (metadata and data). The ultimate goal is to publish the data set's metadata in the central DMA data catalogue. Once the metadata is published there, the data becomes available within the DMA, so that others can find the provided data sets or receive recommendations about them. Depending on the license model of the data, others may or may not have to sign a smart contract to get access to it. As soon as a valid contract is in place, the data set can be accessed via a link contained in the metadata published in the central DMA data catalogue.
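To illustrate what such a metadata element can look like, the following is a minimal Python sketch that builds a plain DCAT dataset description as JSON-LD, including the link through which the data is accessed. It assumes only standard W3C DCAT and Dublin Core terms: the DCAT-DMA extension terms and the exact mandatory set from DMA - Core Metadata are not shown, and the helper `build_dataset_record` as well as all example values are hypothetical.

```python
import json

# Namespaces of the W3C DCAT vocabulary and the DCMI (Dublin Core) terms.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def build_dataset_record(title, description, license_url, download_url, media_type):
    """Return a minimal DCAT dataset description as a JSON-LD dictionary.

    Hypothetical helper: only plain DCAT terms are used; the DCAT-DMA
    extensions and the full DMA core metadata set are not covered.
    """
    return {
        "@context": CONTEXT,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        # The distribution binds the metadata to the actual data file.
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dct:license": {"@id": license_url},
                "dcat:downloadURL": {"@id": download_url},
                # DCAT recommends IANA media types for dcat:mediaType.
                "dcat:mediaType": {
                    "@id": "https://www.iana.org/assignments/media-types/" + media_type
                },
            }
        ],
    }


if __name__ == "__main__":
    # All values below are illustrative placeholders.
    record = build_dataset_record(
        title="Weather station measurements",
        description="Hourly temperature readings from a single station.",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        download_url="https://example.org/data/stations.csv",
        media_type="text/csv",
    )
    print(json.dumps(record, indent=2))
```

Keeping the download link inside the distribution mirrors the idea that metadata and data travel together: whoever finds the record in a catalogue also finds where the data lives.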
Static data inventory
A common use case is a data set that consists of a single file in a specific format, e.g. CSV, XLS, SQLite or NetCDF. Most often, metadata for such files is missing and not provided alongside the data. Therefore, the DMA consortium developed various services to ease and support the ingestion of such data sets. You should use these existing services, as described in the next section, if your data set has the following characteristics:
- data set consists of a single file.
- the file size allows a complete upload/download of the file.
- no or only very limited metadata is available.
- the data set's license permits hosting it at a remote location.
- the data set is static, i.e. it does not change over time, or at least not regularly.
Please also consider using the static data inventory if you have multiple data sets with these characteristics, as long as this remains manageable for your purposes.
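The checklist above can be expressed as a simple test. The following Python sketch is purely illustrative: the `DataSetProfile` type, the `fits_static_inventory` function and the 2 GiB transfer limit are hypothetical, since the guide does not define a concrete size threshold for a "complete upload/download".

```python
from dataclasses import dataclass

# Hypothetical upper bound for a complete upload/download; the guide
# does not specify a concrete limit.
MAX_TRANSFER_BYTES = 2 * 1024 ** 3  # 2 GiB


@dataclass
class DataSetProfile:
    file_count: int
    size_bytes: int
    metadata_limited: bool        # no or only very limited metadata available
    remote_hosting_allowed: bool  # license permits hosting at a remote location
    changes_regularly: bool       # data set is updated or extended regularly


def fits_static_inventory(profile: DataSetProfile,
                          max_transfer_bytes: int = MAX_TRANSFER_BYTES) -> bool:
    """Return True if the profile matches every static-inventory criterion."""
    return (
        profile.file_count == 1
        and profile.size_bytes <= max_transfer_bytes
        and profile.metadata_limited
        and profile.remote_hosting_allowed
        and not profile.changes_regularly
    )


if __name__ == "__main__":
    csv_file = DataSetProfile(file_count=1, size_bytes=5_000_000,
                              metadata_limited=True,
                              remote_hosting_allowed=True,
                              changes_regularly=False)
    print(fits_static_inventory(csv_file))  # True
```

A data set that fails any single criterion (for example, several files or regular updates) is a candidate for the dynamic data inventory instead.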
Dynamic data inventory
In addition to the static data inventory, there is a dynamic data inventory. It complements the static one and can be characterised as follows:
- the data set consists of multiple files that are aggregated into a data set collection.
- metadata is available per file in a self-hosted data catalogue.
- the total size of the data set makes it very hard to transfer the data to a different location.
- the data set is dynamic, i.e. it is regularly updated or extended over time.
If your data set matches these features, please follow the instructions in section Dynamic data inventory. Please be aware that connecting a dynamic data inventory to the DMA ecosystem requires know-how, effort and resources on the data provider side: the data provider is asked to develop and implement their own metadata publishing workflow. However, the ultimate goal of publishing standardised metadata (DCAT-DMA) to the central DMA data catalogue stays the same. A reference implementation will be provided as a starting point for developing a customised publishing workflow.
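One step such a publishing workflow typically needs is turning per-file metadata from a self-hosted catalogue into a single collection-level description. The Python sketch below shows this step under stated assumptions: it is not the announced reference implementation, it uses only plain DCAT and Dublin Core terms (no DCAT-DMA extensions), the input keys `title`, `access_url` and `modified` are hypothetical, and the actual submission to the central DMA data catalogue is omitted because its interface is not described here.

```python
import json
from datetime import date

# Namespaces of the W3C DCAT vocabulary and the DCMI (Dublin Core) terms.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def aggregate_collection(collection_title, file_entries):
    """Combine per-file metadata into one DCAT dataset, one distribution per file.

    Each entry is assumed (hypothetically) to be a dict with the keys
    "title", "access_url" and "modified" (a datetime.date).
    """
    if not file_entries:
        raise ValueError("a data set collection needs at least one file entry")
    distributions = [
        {
            "@type": "dcat:Distribution",
            "dct:title": entry["title"],
            # The data stays with the provider; only its location is published.
            "dcat:accessURL": {"@id": entry["access_url"]},
            "dct:modified": entry["modified"].isoformat(),
        }
        for entry in file_entries
    ]
    # The collection is as recent as its most recently modified file.
    latest = max(entry["modified"] for entry in file_entries)
    return {
        "@context": CONTEXT,
        "@type": "dcat:Dataset",
        "dct:title": collection_title,
        "dct:modified": latest.isoformat(),
        "dcat:distribution": distributions,
    }


if __name__ == "__main__":
    # Illustrative placeholder entries, as they might come from a self-hosted catalogue.
    entries = [
        {"title": "January", "access_url": "https://data.example.org/2019-01.nc",
         "modified": date(2019, 2, 1)},
        {"title": "February", "access_url": "https://data.example.org/2019-02.nc",
         "modified": date(2019, 3, 1)},
    ]
    print(json.dumps(aggregate_collection("Monthly model output", entries), indent=2))
```

Publishing `dcat:accessURL` links rather than copying files reflects the dynamic case: the data is too large to move, so only its description is sent to the central catalogue, and re-running the aggregation after each update keeps `dct:modified` current.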