OICR’s Genome Informatics team announces international release of the ICGC-ARGO Data Platform, the all-in-one data hub for the largest clinical-genomic data sharing initiative in the world
We’re in the midst of an era of big data that is changing the way we understand the world – including how we study, diagnose and treat cancers.
Improvements in sequencing technology and computational power have allowed us to collect massive amounts of information about cancer patients and their tumours. This information, however, is only powerful if it can be accessed by those who can transform big data into new discoveries.
Over the last decade, OICR’s Genome Informatics team has built a reputation for developing robust big data portals that provide cancer data access to thousands of researchers around the world. Now, the team has set out to do it again – this time with bigger data.
At the International Cancer Genome Consortium Accelerating Research in Genomic Oncology (ICGC-ARGO) Virtual Workshop on June 22, OICR’s Genome Informatics Director, Dr. Christina Yung, announced the launch of the ICGC-ARGO Data Platform, an all-in-one hub for submitting, visualizing and accessing data from donors around the world. The launch coincides with the initiative’s inaugural data release, which includes processed whole genome sequencing data from the first 177 donors.
“This launch represents years of learning from the research community while adapting to new technologies and standards,” says Yung. “We’re proud to enable the community to submit, curate and access these valuable data in one easy-to-use platform, but this is only the beginning.”
100,000 donors
The previous ICGC project, which is now referred to as ICGC 25k, was an international data sharing collaboration that analyzed genomic information from 25,000 donors around the world. At the launch in 2008, ICGC 25k was said to be the most ambitious biomedical research effort since the Human Genome Project.
Now, ICGC-ARGO aims to collect whole genome sequencing data and clinical data from 100,000 donors – four times as many donors as ICGC 25k – which poses significant technical and logistical challenges in developing a data platform.
“One of our key priorities was to design and develop the simplest possible tool for the job,” says Yung. “In software engineering, we call this a minimal viable product, which is the most distilled, elegant and minimalistic product as the foundation to achieve our goals.”
As one of the first steps to make this seemingly impossible task possible, OICR Genome Informatics streamlined and automated all data management and processing. Instead of processing data manually during specific data submission periods of the year, they’ve automated the process and allowed data curators around the world to submit data year-round.
Now, when a curator submits patient information, they automatically receive feedback on whether there are errors or typos in their data. This automatic “spell check” allows them to correct these mistakes in real time rather than waiting days for their data to be processed and submitted.
“We did a rough calculation of how long it would have taken us to manage molecular data processing if we did it manually,” says Rosita Bajari, Senior Technical Business Analyst in OICR Genome Informatics. “We approximate it would have taken us 300 years just to go through a quarter of the data we expect to collect. We’re using technology to make something possible that simply wasn’t possible before.”
The OICR Genome Informatics team’s expertise in data management and coordination builds on its earlier work developing the ICGC Data Portal, the Genomic Data Commons Data Portal and the Gabriella Miller Kids First Data Resource Portal. Through extensive consultations with the research community and ongoing collaborations, they’ve designed new features and built what they learned into the ICGC-ARGO Data Platform.
“What we’ve created is a platform that enables researchers and clinicians to apply their time, effort and creativity to making discoveries instead of wrestling with technical inconveniences,” says Bajari. “The platform’s ease-of-use and speed-of-processing will allow them to access data in a friendly, ergonomic and efficient manner so they can turn these data into benefits for patients.”
On what’s possible
The ICGC-ARGO Data Platform will serve as an access portal to petabytes of high-quality health data that will be used by thousands of researchers for years to come. As data are collected, OICR Genome Informatics will continue to build in new functions and features for the community while adopting the latest standards in data curation and sharing such as those established by the Global Alliance for Genomics and Health (GA4GH).
“We’re working on functions to better store, aggregate, integrate, analyze and interpret these vast datasets,” says Bajari. “There’s much more work to be done but – for now – it’s about prioritizing new features that will bring the most value to our users.”
“What ICGC-ARGO will discover has yet to be realized,” says Yung. “But that’s exactly what is so exciting. We’re making tools that will enable researchers to potentially change the face of cancer.”
Learn more about ICGC-ARGO at ICGC-ARGO.org.