OICR’s Genome Informatics team has created a unique software suite to manage Big Data in cancer research.
You’ve probably heard that Big Data is the future of healthcare.
An explosion of health information, fuelled in part by powerful new tools to sequence the human genome, is starting to revolutionize how diseases like cancer are studied, diagnosed and treated.
Across the OICR community, research projects are producing huge amounts of genomic data, analyzing it, and exploring how it interacts with other influences to understand how cancer develops and how best to stop it.
But data in isolation cannot make cancer care better. Genomics data must be securely stored, carefully managed and made easily accessible to researchers and health professionals who can harness it. And that’s a huge undertaking when you’re dealing with thousands of gigabytes of complex personal information.
At OICR, the Genome Informatics Program specializes in developing systems to store, organize and share genomics data. Their websites and software platforms support several Big Data projects led by OICR, and even some that go beyond the scope of the Institute and beyond the borders of this country.
Those platforms are driven by one foundational tool: an open-source, customizable suite of data management applications called Overture.
“Overture underpins just about everything we do,” says Dr. Mélanie Courtot, Director of Genome Informatics at OICR. “It allows researchers to collaborate, share data and make it discoverable for themselves and other researchers who are driving exciting innovations.”
Laying the foundation for Big Data platforms
The story of Overture began around 2008 with the launch of the International Cancer Genome Consortium (ICGC), a global initiative co-founded by OICR to collect and store data from 25,000 genomes covering 50 different types of cancer and make them accessible to researchers.
Tasked with building the ICGC database, the Genome Informatics team determined it had five core needs: a way to securely transfer files into the database, a way to manage the associated metadata, a way to index data, a way to search through data, and a way to authenticate users and give them access to the database.
The next few years would see major advancements in the field of genomics. As the Genome Informatics team was called upon to build other systems to manage genomics data, they recognized that many projects had the same core requirements as ICGC.
They came up with a way to avoid the wasted effort of developing the same basic features repeatedly. Instead of building out a full data management platform, they developed five distinct and adaptable pieces of software to meet their core needs. These were assembled and released as the first Overture open-source package in 2017.
“We decided to make these core elements more generic, so they could be adapted across different use cases,” says Courtot. “This is much more difficult to do, but it pays off in the end because of its versatility.”
As the years passed, the Genome Informatics team developed several more data platforms, gaining the attention of other data scientists who inquired about the tools used to build these platforms.
This interest led to numerous collaborations. The Genome Informatics team would use Overture components in projects like the Kids First data portal and the next phase of ICGC, called ICGC Accelerating Research in Genomic Oncology (ICGC-ARGO).
The suite has even been employed beyond cancer. In 2021, the Genome Informatics team used Overture to build the Canadian VirusSeq Data Portal, a project aimed at sequencing 150,000 samples of SARS-CoV-2, the virus that causes COVID-19, and making the data available to researchers and policymakers. Because the team built on Overture instead of starting from scratch, the portal was ready in just four weeks, which was especially important in the midst of a fast-moving pandemic.
The building blocks of genomics research
The Overture suite’s five core applications – Song, Score, Ego, Maestro and Arranger – can be used together as a full data management system, or researchers can pick and choose the individual components that meet their requirements. Genome Informatics can also help researchers adapt and employ these components through both academic collaboration and consulting services.
“Think of the applications like building blocks,” says Mitchell Shiell, Outreach Lead for the Genome Informatics group. “If there’s a different piece of software you want to use for authenticating users, you can swap that in place and still use the rest of the Overture components.”
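For readers who want to see the building-block idea in code, here is a minimal Python sketch of a swappable authentication component. It is an illustration only and does not use Overture's actual code or APIs: the names `Authenticator`, `BuiltInAuth`, `InstitutionalAuth` and `DataPortal` are hypothetical, as are the token rules.

```python
from dataclasses import dataclass
from typing import Protocol


class Authenticator(Protocol):
    """Hypothetical interface: anything that can approve or reject a token."""

    def is_authorized(self, token: str) -> bool: ...


class BuiltInAuth:
    """Stand-in for a suite's default authentication component."""

    def __init__(self, valid_tokens: set[str]) -> None:
        self._valid_tokens = valid_tokens

    def is_authorized(self, token: str) -> bool:
        return token in self._valid_tokens


class InstitutionalAuth:
    """Stand-in for a different authentication system a project swaps in."""

    def is_authorized(self, token: str) -> bool:
        # Illustrative rule only: accept tokens issued with a known prefix.
        return token.startswith("inst-")


@dataclass
class DataPortal:
    """Hypothetical portal that depends only on the Authenticator interface."""

    auth: Authenticator
    files: dict[str, str]

    def fetch(self, token: str, file_id: str) -> str:
        # The portal never needs to know which authenticator it was given.
        if not self.auth.is_authorized(token):
            raise PermissionError("access denied")
        return self.files[file_id]


files = {"F1": "sample metadata"}
default_portal = DataPortal(auth=BuiltInAuth({"abc123"}), files=files)
custom_portal = DataPortal(auth=InstitutionalAuth(), files=files)

print(default_portal.fetch("abc123", "F1"))
print(custom_portal.fetch("inst-42", "F1"))
```

Because the portal relies only on the shared interface, replacing one authentication block with another leaves the rest of the system untouched, which is the kind of swap Shiell describes.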
Overture’s modular nature makes it very scalable. If a Big Data project needs more capacity, you can duplicate services to balance the load. A suite like Overture could also help harmonize different data sets because data platforms can be built on the same underlying software and deployed as a federated network.
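The idea of duplicating services to balance the load can be sketched in a few lines of Python. This is a simplified illustration, assuming a basic round-robin strategy; it is not how Overture itself is deployed, and the names `ServiceReplica` and `RoundRobinBalancer` are hypothetical.

```python
from itertools import cycle


class ServiceReplica:
    """Hypothetical copy of a service, such as one instance of a search API."""

    def __init__(self, name):
        self.name = name
        self.handled = 0  # number of requests this replica has served

    def handle(self, request):
        self.handled += 1
        return f"{self.name} handled {request}"


class RoundRobinBalancer:
    """Sends each incoming request to the next replica in turn."""

    def __init__(self, replicas):
        self._replicas = list(replicas)
        self._next = cycle(self._replicas)

    def dispatch(self, request):
        return next(self._next).handle(request)


# Three identical replicas share six requests evenly.
replicas = [ServiceReplica(f"search-{i}") for i in range(3)]
balancer = RoundRobinBalancer(replicas)
responses = [balancer.dispatch(f"query-{n}") for n in range(6)]

for line in responses:
    print(line)
```

Adding capacity in this sketch means adding another replica to the list; nothing else has to change.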
Overture is also designed to be extended and integrated with third-party applications to perform other data analysis and management functions. For example, the Genome Informatics team recently integrated JBrowse, a genome browser developed by Dr. Lincoln Stein’s lab, into the Overture suite.
“We’re not just building genomic data management systems in isolation. We’re contributing to a larger research software ecosystem,” Shiell says.
Driving discovery in Ontario and around the world
The Genome Informatics team’s latest contributions span a wide variety of Big Data projects. These include working with scientists in South Africa to create a Pan-African platform to share data about pathogens and help address disease outbreaks across the continent.
Closer to home, Overture will power the Ontario Hereditary Cancer Research Network registry, which will collect data from participating Ontarians with genetic mutations that make them susceptible to cancer in the hopes of driving new research.
“With each project, we are learning and updating the suite to better support subsequent genomic research,” Courtot says.
With the field of genomics continuing to grow, Shiell says that data – and how it is organized with software like Overture – will play a major role in advancing the knowledge and treatment of cancer.
“When you make data discoverable, you’re driving discovery,” he says. “Ultimately, this will lead to better treatment options, helping people live longer and healthier lives.”
Overture is supported by a grant from the National Cancer Institute at the US National Institutes of Health, and additional funding from Genome Canada, the Canada Foundation for Innovation, the Canadian Institutes of Health Research, CANARIE, and the Ontario Institute for Cancer Research.