OICR platform gearing up to store vast amounts of data

Left: Dr. Francis Ouellette; right: David Sutton

While researchers in OICR’s new genomics labs are busy testing new DNA sequencers, OICR’s informatics and biocomputing team is working to find solutions to manage the vast amounts of data these researchers produce.

Led by Informatics and Biocomputing Platform Director Dr. Lincoln Stein, OICR’s informatics and biocomputing team has designed the infrastructure to process, store, and analyze the data generated in OICR’s labs and will create the software and data solutions to ensure that the data produced can be properly accessed and analyzed.

“Informatics and biocomputing are central to all biomedical research,” explains Francis Ouellette, OICR’s Associate Director of Informatics and Biocomputing. “It is basically a scientific approach to understanding and analyzing data. Bench scientists work directly with biological material and use sophisticated machines and technologies to investigate the abnormalities of cancer cells. We write and build software and databases to address the same biomedical problems.”

The field of informatics and biocomputing has grown significantly in recent years in response to the increasingly massive amounts of data that modern equipment generates. New technologies such as DNA microarrays and high-throughput genotyping create data that require more processing power and storage space than ever before, as well as cutting-edge software and databases to interpret the output.

“Any large-scale biomedical research activity requires computer analysis of the data,” Ouellette says. “These are very large datasets that can’t be analyzed without sophisticated software and databases. We are working closely with scientists at OICR to develop the most efficient rapid pipeline possible, allowing for analysis of large amounts of data quickly.”

David Sutton, OICR’s Director of Research IT Infrastructure, is implementing the informatics and biocomputing systems. “Biocomputing and informatics is hugely technology dependent,” Sutton says. “Applications need to run in a fast, efficient and timely manner where researchers can focus on results, not on waiting for results.”

A typical run on the current DNA sequencers in OICR’s genomics labs produces between 500 and 600 gigabytes of data that need to be stored immediately. Researchers then need to access and analyze those data, producing another output file that goes back into storage. This puts huge pressure on the computer infrastructure, both in processing such large files and in making them quickly and easily accessible.

The informatics and biocomputing team hopes to stay ahead of the curve by building the system in phases as OICR opens new labs and requires more power and storage.

“We are building this infrastructure proactively to meet OICR’s ever-changing technological needs,” Sutton says. “We went to the technology companies and asked them to work with us to build an adaptive solution that will work not only today, but will also scale to meet our future needs. This means we have a well-built system that can be tailored exactly to OICR’s current and future needs.”

Construction is underway on a high-performance computing cluster with 20 nodes and 50 terabytes of storage. Within a few months, this will grow to about 100 nodes and 100 terabytes of storage. This will allow the unit to grow as more OICR labs open and more data are generated by OICR researchers.

The backbone of the system is a state-of-the-art server room. The room is a fully managed raised-floor environment where humidity, temperature and power can be monitored from anywhere in the world through remote notification devices. It will also function as a “lights out” data centre, meaning that all system administration tasks can be completed remotely and no one will need to go into the room for day-to-day activities. The floor is raised 18 inches to allow for the air conditioning needed to keep the servers from overheating. A second room is being retrofitted, and Sutton expects the available space to grow from about 500 square feet today to about 10,000 square feet in 2010. That is about one quarter of one floor in the west tower of the MaRS complex, now under construction.

In the months ahead, Ouellette looks forward to expanding the infrastructure and creating ways for bioinformatics to assist other areas of research more efficiently. “The challenge for us is to be fully integrated so that wet lab people can be aware of what we are doing and vice versa. Working together to turn data generation into data analysis quickly and accurately – this is how we will translate what we are doing into patient care.”

Ouellette is the former Director of Bioinformatics at the University of British Columbia Bioinformatics Centre (UBiC) in Vancouver. Before that, he served as GenBank coordinator at the National Center for Biotechnology Information, part of the U.S. government’s National Institutes of Health.

Date: November 1, 2007
Issue: 4
Volume: 1