How thematic datasets are selected
The COORDINATE Portal is dedicated to providing access to datasets specifically related to child and youth wellbeing. To ensure that the datasets included are relevant to this theme, we follow a straightforward selection process.
Selection Criteria
Thematic datasets are selected by searching for the terms “child/children” and “youth” in three key metadata fields:
- Topic Classifications: The categories that describe the dataset’s subject matter.
- Keywords: The descriptive words or phrases assigned to the dataset.
- Titles: The name or title of the dataset.
These terms are translated into all included languages, and the search queries are applied in each language separately.
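The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Portal's actual implementation: the record structure (`DatasetRecord`), the function name `is_thematic`, the German term list, and the use of case-insensitive substring matching are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical search terms per language. The source only names the English
# terms "child/children" and "youth"; the German entries are illustrative.
SEARCH_TERMS = {
    "en": ["child", "children", "youth"],
    "de": ["kind", "kinder", "jugend"],
}


@dataclass
class DatasetRecord:
    """Assumed minimal shape of a metadata record (not the real schema)."""
    title: str
    language: str
    keywords: list[str] = field(default_factory=list)
    topic_classifications: list[str] = field(default_factory=list)


def is_thematic(record, terms_by_language=SEARCH_TERMS):
    """Return True if any term for the record's language appears in the
    title, keywords, or topic classifications (the three searched fields)."""
    # Each language is queried separately, so only that language's terms apply.
    terms = terms_by_language.get(record.language, [])
    searchable = [record.title, *record.keywords, *record.topic_classifications]
    # Assumption: simple case-insensitive substring matching.
    return any(term in text.lower() for text in searchable for term in terms)


records = [
    DatasetRecord(title="Survey of Youth Employment", language="en"),
    DatasetRecord(title="Labour Force Survey", language="en", keywords=["employment"]),
    DatasetRecord(title="Studie", language="de", topic_classifications=["Kinder und Familie"]),
]
selected = [r.title for r in records if is_thematic(r)]
```

In this sketch, only the three named fields are inspected; a term that appears elsewhere in a record, such as an abstract, would not cause the dataset to be selected.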
We tested targeting additional fields and using more search terms, but both approaches returned too many irrelevant results (false positives). For this reason, we search only these three fields, which keeps the selection accurate and relevant.
Query Sources
The queries are run against metadata repositories provided by CESSDA’s Service Providers (SPs). These repositories contain a wide range of datasets, so restricting the queries to the terms “child/children” and “youth” is how we aim to include only the most relevant ones.
Continuous Refinement
We may review and refine the selection process to make sure the most relevant datasets are included. This can involve updating search terms or refining how we search the metadata fields.
This simple, targeted approach helps the COORDINATE Portal provide access to datasets that focus on the wellbeing of children and youth.