Development of new statistical tools for improving the detection of population substructure at the individual level, with applications in humans

Detecting population substructure in Homo sapiens from the genetic diversity of the species is difficult, given the species' evolutionary history and recent origin. Nevertheless, population substructure represents a serious threat when conducting genome-wide association studies (GWAS), and properly identifying its signature is essential for interpreting the evolutionary processes (both demographic and selective) that shaped the genome of the species. Given its importance, estimating the ancestry proportions of an individual is an extremely active field in population genetics, and in human population genomics in particular. Several algorithms have been proposed for unraveling hidden population substructure. However, it has been suggested that these methods have a number of limitations, in both biological and technical terms.

First of all, the algorithms do not model the relationships among the ancestral populations. As a consequence, several demographic scenarios can produce the same output, and interpreting the outcome is complex. The situation is even more complicated when ancient and modern samples are considered at the same time: the algorithms do not correct for the temporal difference among the samples, which produces a systematic bias in the estimated ancestry proportions of the ancient sample. Second, the sensitivity for estimating ancestry proportions differs considerably among the algorithms; even in the simplest demographic scenarios, the mean deviation of the estimated ancestry can differ by up to 5% from the true value. Third, the algorithms proposed so far do not consider low-frequency polymorphisms, even though these variants represent a considerable proportion of the genetic variation of the species and can be more informative for detecting population substructure. Furthermore, including these markers can produce artifacts in the results of the different algorithms.
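To make the notion of a mean deviation between estimated and true ancestry concrete, the following minimal sketch computes it for a toy example. It assumes that the deviation is measured as the mean absolute difference between estimated and true ancestry proportions, averaged over all individuals and ancestral components; that definition, the function name and the numbers are illustrative assumptions and are not taken from the project itself.

```python
def mean_absolute_deviation(estimated, true):
    """Mean absolute difference between estimated and true ancestry proportions.

    Both arguments are lists of per-individual rows, one proportion per
    ancestral component. This definition is an assumption for illustration.
    """
    if len(estimated) != len(true):
        raise ValueError("estimated and true must cover the same individuals")
    total = 0.0
    count = 0
    for est_row, true_row in zip(estimated, true):
        if len(est_row) != len(true_row):
            raise ValueError("each individual needs the same number of components")
        for est, tru in zip(est_row, true_row):
            total += abs(est - tru)
            count += 1
    return total / count


# Hypothetical simulated individuals with two ancestral components.
true_q = [[0.50, 0.50], [0.80, 0.20], [0.10, 0.90]]
# Hypothetical output of an ancestry-estimation algorithm for the same individuals.
est_q = [[0.55, 0.45], [0.76, 0.24], [0.13, 0.87]]

mad = mean_absolute_deviation(est_q, true_q)
print(f"mean absolute deviation: {mad:.2%}")
```

In a simulation study, the true proportions are known by construction, so a summary of this kind is one possible way to compare the accuracy of different algorithms under the same demographic scenario.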

The PopSub project proposes developing new tools to alleviate these reported problems, with applications in the detection of genetic variants associated with particular phenotypes and/or polygenic traits.

PopSub is a Plan Nacional project funded by the 2015 call “Proyectos EXCELENCIA y Proyectos RETOS” of the Spanish Ministry of Economy and Competitiveness.


Oscar Lao