Frequently asked questions about the Dataset Register from dataset managers at heritage institutions
What is a dataset?
A dataset (or data set) is a collection of data or metadata. In the context of heritage institutions, this means data from or about heritage objects, such as a catalog, a set of museum objects, a collection of archives, or finding aids. This data or metadata is often managed in an archive management or collection registration system and made accessible to users in one form or another through the institution's own website. The data can also be shared for reuse through a service portal or aggregator. For this purpose, the heritage institution's system must make the data available as a data dump (export) or through an API.
What is a dataset description?
A dataset itself also needs to be provided with metadata. The description is therefore data about the dataset.
Why is a dataset description important?
Just like heritage collections, datasets need to be findable. A rich dataset description in a standard format helps make the dataset discoverable. Provided they follow a standard format, dataset descriptions are not just input for the Dataset Register: search engines such as Google also recognize them and make them searchable via Dataset Search. The better datasets are described, and the easier those descriptions are to find, the more easily re-users can find, use, and possibly link the datasets.
What information does a dataset description contain?
A dataset description contains information that describes the dataset, such as an identifier (URI), a name, a content description, a license, a language, and a data owner (the creator). In addition to these mandatory elements, it can include the creation date, publication date, version, contact information, coverage in terms of place/area and time/period, keywords, and genre.
A dataset can be offered in different ways; each of these is called a distribution. A distribution can be a data dump that can be downloaded in some format, or an API that can be queried, for example a SPARQL endpoint. The information about a distribution consists of a URL, a format and type, and optionally a license, description, language, publication date, modification date, and file size.
A set of dataset descriptions is called a dataset catalog. It provides a total overview of the available datasets of an organization.
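To make the elements above concrete, the following is a minimal sketch of a dataset description built as JSON-LD. It assumes the schema.org vocabulary (the vocabulary that Google Dataset Search and the Schema Markup Validator work with); the mapping of each element to a schema.org property, the helper names, and all example values are illustrative assumptions, not the official Requirements for Datasets.

```python
import json

# Assumed mapping of the mandatory elements to schema.org property names.
REQUIRED_FIELDS = ("@id", "name", "description", "license", "inLanguage", "creator")


def build_dataset_description(uri, name, description, license_url,
                              language, creator_name, distributions=()):
    """Return a dataset description as a JSON-LD dictionary (illustrative)."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "@id": uri,
        "name": name,
        "description": description,
        "license": license_url,
        "inLanguage": language,
        "creator": {"@type": "Organization", "name": creator_name},
    }
    if distributions:
        # Each distribution is given as a (URL, format) pair.
        doc["distribution"] = [
            {"@type": "DataDownload", "contentUrl": url, "encodingFormat": fmt}
            for url, fmt in distributions
        ]
    return doc


def missing_required(doc):
    """List the assumed mandatory fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not doc.get(field)]


# All values below are made up for illustration.
example = build_dataset_description(
    uri="https://example.org/datasets/museum-objects",
    name="Museum objects",
    description="Descriptions of the objects in the museum collection.",
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
    language="nl",
    creator_name="Example Heritage Institution",
    distributions=[("https://example.org/dumps/museum-objects.nt", "application/n-triples")],
)

print(json.dumps(example, indent=2))
```

The `missing_required` helper only checks that fields are present; it does not replace validation against the actual requirements.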
Where is a dataset description published?
As with publishing the dataset itself, it is up to the heritage institution to publish the dataset description. The information must be made available online in a form that both people and machines can read.
If publishing through your own infrastructure, such as a CMS, is not possible, you can use the Dataset Register entries "publication service" on GitHub.
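One common way to make a description readable by both people and machines is to embed the JSON-LD in a web page inside a script element of type application/ld+json, next to the visible text. The sketch below illustrates that approach; the function name and the example description are hypothetical, and a CMS would normally generate such a page from its own templates.

```python
import html
import json


def render_dataset_page(description: dict) -> str:
    """Render a small HTML page with the description embedded as JSON-LD."""
    # Escape "</" so that a string value can never close the script element early.
    json_ld = json.dumps(description, indent=2).replace("</", "<\\/")
    # Escape the human-readable parts for safe display in HTML.
    title = html.escape(description.get("name", "Dataset"))
    summary = html.escape(description.get("description", ""))
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{title}</title>\n"
        f'<script type="application/ld+json">\n{json_ld}\n</script>\n'
        f"</head><body><h1>{title}</h1><p>{summary}</p></body></html>\n"
    )


# Hypothetical description, for illustration only.
page = render_dataset_page({
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "@id": "https://example.org/datasets/museum-objects",
    "name": "Museum objects",
    "description": "Descriptions of the objects in the museum collection.",
})
print(page)
```

The URL of the resulting page is what a crawler or validator would then be pointed at.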
How can I check my dataset description?
You can use the REST API to check whether a dataset description complies with the Requirements for Datasets.
You can also check the dataset description with the more general Schema Markup Validator. Enter the URL of the online page that contains the dataset description, or paste a code snippet, to run the test.
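As a rough illustration of what a check through a REST API could look like, the sketch below prepares an HTTP request that passes the URL of a published dataset description to a validation endpoint. The base URL, the path, the HTTP method, and the payload shape are all assumptions made for this example; consult the REST API documentation of the Dataset Register for the actual interface. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical base URL; replace it with the real API address from the documentation.
API_BASE = "https://register.example.org/api"


def build_validation_request(description_url, api_base=API_BASE):
    """Prepare (but do not send) a validation request for a description URL."""
    # Assumed payload: a JSON-LD object that only names the description URL.
    body = json.dumps({"@id": description_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{api_base}/datasets/validate",  # assumed path
        data=body,
        method="PUT",  # assumed method
        headers={"Content-Type": "application/ld+json"},
    )


request = build_validation_request("https://example.org/datasets/museum-objects")
print(request.get_method(), request.full_url)

# Sending would look like this (left commented out because the endpoint is hypothetical):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```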
What is the Dataset Register?
The purpose of the Dataset Register is to provide insight into the heritage datasets that are available. Heritage institutions are encouraged to make datasets available from their systems, to describe these datasets, to publish the descriptions online, and to submit the URLs of the dataset descriptions to the Dataset Register. The Dataset Register retrieves the dataset descriptions, creating an overall picture of what is available.
How can I submit a dataset description to the Dataset Register?
Ideally, the registration of a dataset description takes place (automatically) via your own management system.
The URL of a dataset description (which the heritage institution has therefore already published) can be registered with the Dataset Register. After registration, the Dataset Register checks and retrieves the dataset description. It repeats this regularly, so that changes and deletions are noticed and the Dataset Register stays up to date.
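The idea of periodically re-checking a registered URL can be illustrated with a small sketch that compares a fingerprint of the previously retrieved description with the newly fetched one. This is only an illustration of the principle; the source does not describe how the Dataset Register implements it, and the function names and status labels are hypothetical.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Return a SHA-256 fingerprint of a retrieved dataset description."""
    return hashlib.sha256(body).hexdigest()


def classify_check(previous_fingerprint, fetched_body):
    """Compare the current fetch with the previous one.

    previous_fingerprint: fingerprint stored at the last check, or None.
    fetched_body: bytes of the description, or None if the URL no longer resolves.
    """
    if fetched_body is None:
        return "deleted"
    if previous_fingerprint is None:
        return "new"
    if fingerprint(fetched_body) != previous_fingerprint:
        return "changed"
    return "unchanged"


stored = fingerprint(b'{"name": "Museum objects"}')
print(classify_check(stored, b'{"name": "Museum objects"}'))
print(classify_check(stored, b'{"name": "Museum objects, 2nd edition"}'))
print(classify_check(stored, None))
```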
For heritage institutions that are not yet able to generate a dataset description from their own system, the Digital Heritage Network offers a form for creating a dataset description. The result is a piece of JSON-LD that can be published on your own website or other platform.
How can I try out the Dataset Register?
Via this website, a demonstrator for the Dataset Register API, you can submit and search dataset descriptions. The demonstrator uses the REST API, which you can also call directly.
Who creates and manages the Dataset Register?
The Dataset Register was created by the collaborating heritage institutions in the Digital Heritage Network and is managed and maintained by the National Archives. The National Archives is responsible for the operation and availability of the Dataset Register.
Can I get started with creating dataset descriptions?
Yes, of course! The definition of what information belongs in a dataset description is fixed. The mechanism for registering dataset descriptions and their storage in a triplestore are also operational. As a dataset manager you can therefore start by describing the available datasets, working out where the dataset descriptions can be published, and registering them. This often also means contacting your software supplier.
You can create a dataset description manually using a form. This gives you an idea of what information is required and where it can be found within your system and organization. From a maintenance point of view, generating the description from your own system is more sustainable.
If you already publish the dataset descriptions online, then search engines like Google's Dataset Search may already be picking up the datasets!
We'd also love to hear what you think. For example: is it clear which information belongs in the dataset description? Is it clear how to fill in the dataset descriptions, in terms of both process and the software used?
Are you getting started?
If you are going to start publishing dataset descriptions and using the Dataset Register API, let us know so that we can keep you informed of developments, updates, and availability.
Do you want to know more?
If you have any questions and/or comments about the functioning of the Dataset Register, please contact firstname.lastname@example.org.