OICR created the ICGC ARGO data dictionary to harmonize genomic and clinical data collected from around the world.
A new tool to better integrate data into one of OICR’s signature genomics research programs could help generate cancer discoveries by ensuring researchers speak the same ‘language’.
Now published in Nature’s Scientific Data journal, the ICGC ARGO ‘data dictionary’ provides guidance and standardized terminology to help researchers record, label and input information into data repositories using shared language.
This helps ensure new data can be seamlessly combined and compared with other datasets, allowing researchers to generate new insights about cancer, how it develops and how to treat it.
“The data dictionary is like a recipe card for your cancer data,” says Dr. Mélanie Courtot, Senior Director of Genome Informatics and a Principal Investigator at OICR. “It tells you what each data element is, how it’s measured and what values are allowed, so anyone can understand and reproduce the study.”
ICGC ARGO is the latest evolution of the International Cancer Genome Consortium (ICGC), an OICR-led initiative to collect and catalogue cancer genomes from around the world. The ARGO project takes the next step to integrate clinical information — including which treatments a patient received for their cancer and how they responded — alongside genomic data, creating a comprehensive database researchers can access for cutting-edge cancer genomics projects.

ICGC ARGO integrates data from institutions in 13 countries, where data collection and classification practices can vary. This sometimes forces researchers to ‘clean up’ their data after the fact so it can fit into the database. Using the data dictionary from the start spares them the work of retrofitting their data.
“The data dictionary explains what essential details to enter and how to record them using standardized terminology,” says Hardeep Nahal-Bose, Senior Bioinformatics Data Manager at OICR. “This common language makes it easier to share and combine data from studies around the world, so scientists can work together to find answers about cancer faster.”
The ICGC ARGO data dictionary is interoperable with other common data standards and has been adopted by other global initiatives like the Marathon of Hope Cancer Centres Network. It is free and publicly available at docs.icgc-argo.org/dictionary.