Researchers at the CNAG-CRG led the development of a method that smooths out differences arising from where and how samples were produced, helping scientists assess sample quality before analysing their genomic content.

BCN, 7 October 2020. Cancer is a disease caused by mutations that make cells lose their capacity to function normally and regulate their growth. Each tumour arises from specific mutations across distinct genomic regions. Scientists identify this genetic variation by comparing the genome of a patient's normal cells with the genome of their cancer cells, which helps scientists and doctors pinpoint the mutations driving the cancer. This matters for treatment, because different cancer variants call for different therapies, and some of those therapies are more effective than others.


Sequencing techniques are expensive and patient samples are not easy to obtain. For this reason, scientists have spent the past few years working in large international consortia such as the Pan-Cancer Analysis of Whole Genomes (PCAWG), which analyses cancers from patients around the world to build a reference for 38 different tumour types.


The data generated by PCAWG consisted of almost 3,000 normal-cancer genome pairs sequenced at 18 different research institutions. Researchers collected vast amounts of data in just five years, a period during which sequencing methods evolved rapidly. In the process, they realised that the institutions had been using different sequencing protocols and quality control criteria. This made it difficult to compare the genomic content of the samples and limited the conclusions researchers could draw from the data.


Researchers from the Centro Nacional de Análisis Genómico (CNAG-CRG) led the development of a method to smooth out the differences arising from the samples' origin and production, helping researchers assess sample quality before analysing their genomic content. Their findings are published in Nature Communications. The method assesses five parameters of each normal-cancer genome pair and rates each pair according to its quality.


“If we compare different M-size cotton t-shirts produced by different clothing brands, the differences in patterns or in the materials used to make them will result in a kaleidoscope of quality in the final product,” says Ivo Gut, principal investigator and last author of the study. “This makes them incomparable and, maybe, the poor-quality ones will not meet the standards required to enter the market. The same can be said about genome sequencing.”


The framework is valuable because it identifies low-quality results, which prevents researchers from drawing false conclusions from lower-scoring genomes and keeps those genomes from masking relevant results. By excluding low-quality samples, the method makes it possible to combine sequencing data from different institutions and so maximise the amount of data included in each genomic study.


“To avoid comparing apples to oranges, tools like ours are crucial to aid large, ambitious international efforts such as the 1+ Million Genomes project, which will lay the foundation for sharing cancer genome sequencing data across Europe,” concludes Ivo Gut. “This is just one of many quality control frameworks we must establish. The analysis of exomes or genomes to identify variants causing rare diseases is to be tackled next.”

Europe is pursuing the ambitious goal of sequencing at least 1 million human genomes by 2022. The aim is to improve the health of EU citizens by providing new insights into diseases of genetic origin, such as cancer and various types of rare disease.


Reference

Framework for quality assessment of whole genome cancer sequences


We acknowledge support of the European Regional Development Funds by the Ministerio de Ciencia e Innovación corresponding to the Programa Operativo FEDER Plurirregional de España (POPE) 2014-2020 and by the Secretaria d’Universitats i Recerca, Departament d’Empresa i Coneixement of the Generalitat de Catalunya corresponding to the Programa Operatiu FEDER de Catalunya 2014-2020.