New software uses machine learning to identify mutations in tumours without reference tissue samples


One of the main steps in analyzing cancer genomic data is to find somatic mutations, which are non-hereditary changes in DNA that may give rise to cancer. To identify these mutations, researchers will often sequence the genome of a patient’s tumour as well as the genome of their normal tissue and compare the results. But what if normal tissue samples aren’t available?

Drs. Irina Kalatskaya and Quang Trinh, researchers in OICR’s Informatics Technology Program, were part of a group that encountered this problem. “We were working to analyze about 1,500 early breast cancer samples from a clinical trial; however, due to restrictions imposed by the IRB (institutional review board), we didn’t have access to the matching normal samples,” explains Kalatskaya. “So over the course of two years we designed and validated an algorithm that can predict somatic mutations in coding regions of the tumour genome without the use of matching normal samples.”

The algorithm has been released as a software package called ISOWN (Identification of SOmatic mutations Without matching Normal tissues), which was recently published in Genome Medicine and is freely available from GitHub. Rigorous testing and validation of ISOWN by Kalatskaya and Trinh revealed that it can correctly identify 95 to 98 per cent of somatic mutations. ISOWN uses a machine learning approach that takes information from various public databases as well as internal data characteristics to identify somatic mutations.
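To give a rough sense of what a machine learning approach of this kind can look like, the sketch below trains a tiny logistic regression classifier to separate somatic from germline variants. It is an illustration only and not ISOWN's actual method: the three features (presence in a database of known germline polymorphisms, presence in a database of known cancer mutations, and variant allele frequency in the tumour), the function names, and every training value are hypothetical and were made up for this example.

```python
import math

# Hypothetical features per variant (illustrative; NOT ISOWN's real feature set):
#   in_germline_db: 1 if listed in a public database of germline polymorphisms
#   in_cancer_db:   1 if listed in a public database of known cancer mutations
#   vaf:            variant allele frequency observed in the tumour sample (0..1)
# Labels: 1 = somatic, 0 = germline. All rows are made-up toy data.
TRAINING = [
    ((1, 0, 0.50), 0),
    ((1, 0, 0.98), 0),
    ((1, 0, 0.47), 0),
    ((0, 0, 0.52), 0),
    ((0, 1, 0.22), 1),
    ((0, 1, 0.35), 1),
    ((0, 0, 0.18), 1),
    ((0, 0, 0.30), 1),
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, bias, features):
    """Probability that a variant is somatic under the current model."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(examples, epochs=2000, learning_rate=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = score(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def predict_somatic(model, features):
    """Return the predicted probability that a variant is somatic."""
    weights, bias = model
    return score(weights, bias, features)


model = train(TRAINING)

# A variant found in the germline database at roughly 50% allele frequency
# versus one found in the cancer database at a lower allele frequency.
likely_germline = predict_somatic(model, (1, 0, 0.50))
likely_somatic = predict_somatic(model, (0, 1, 0.25))
print(f"germline-like variant: P(somatic) = {likely_germline:.3f}")
print(f"somatic-like variant:  P(somatic) = {likely_somatic:.3f}")
```

In this toy setup the classifier learns that database membership and allele frequency together shift the probability that a variant is somatic. A real tool would draw on far more annotations and would be validated against samples where matched normal tissue is available.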

Matched normal tissue samples can be unavailable for a number of reasons, including a lack of funding to obtain them or reliance on incomplete data from previous clinical trials in retrospective studies. Making ISOWN available to the research community provides an opportunity to carry out studies that otherwise would not have been feasible.

“Obviously it is always best to have the actual matching normal samples for your research project, but we think that ISOWN provides a robust alternative in cases where these samples are not available,” says Trinh. He and Kalatskaya can’t say for certain how many researchers are making use of the software, but within the last couple of months they have corresponded with users in Austria, China, the Netherlands, Russia and South Korea, suggesting a strong appetite for this type of tool.

