Research Centre for Medical Genetics
1 Moskvorechye St,
Moscow 115522, Russian Federation
Mo-Fr: 9:00 - 17:00

Gene geography marked its centenary in 2019. During the First World War, researchers were surprised to discover that, although all peoples have the same set of blood groups, the frequencies of these groups differ greatly from one nation to another. The discoverers described it as follows: “Many peoples and races accumulated on the Macedonian frontline... And the differences in the frequencies of blood groups were found in the most ultimate way”. Gene geography can also help to predict which groups of people, sharing a similar genetic history, will have a similar spectrum of hereditary pathology, and which will differ. The development of genome-wide research, personalized medicine, pharmacogenetics and forensic science is dramatically raising the practical significance of basic research in gene geography.

Basic research usually has numerous practical applications, some of them unexpected. For example, by examining the Y chromosome of the perpetrator of the 2011 terrorist attack at Domodedovo Airport, our team determined that he probably originated from the indigenous population of Ingushetia, which made it possible to identify him within a few days.

BIOBANK OF NORTHERN EURASIA

Through the long-term efforts of our team, of staff of the Genomic Geography Laboratory of the Vavilov Institute of General Genetics (VIGG) RAS, and of many research teams in Russia and neighboring countries, the Biobank of the Indigenous Population of Northern Eurasia was created. It is the world's largest collection of biological samples from representatives of the indigenous populations of the northern half of Eurasia.

The Biobank of Northern Eurasia contains more than 30,000 biological samples from 300 populations of 100 ethnic groups (Fig. 3) of Eastern Europe, the Caucasus, the Volga-Ural region, Central Asia, Siberia and the Far East (including Adygea, Altai, Bashkortostan, Buryatia, Dagestan, Ingushetia, Kabardino-Balkaria, Kalmykia, Karachay-Cherkessia, Karelia, Komi, Crimea, Mari El, Mordovia, Sakha, North Ossetia, Tatarstan, Tyva, Udmurtia, Khakassia, the Khanty-Mansi Autonomous Okrug (KhMAO), Chechnya and Chuvashia) and of 17 foreign countries: Abkhazia, Azerbaijan, Alania, Armenia, Afghanistan, Belarus, Georgia, Kazakhstan, Kyrgyzstan, Lithuania, Macedonia, Mongolia, Uzbekistan, Ukraine, Tajikistan, Turkey and Sri Lanka.

Map of the populations whose biological samples are included in the Biobank of Northern Eurasia.
Translation: More than 30,000 samples from 300 indigenous populations of Northern Eurasia were collected by E.B. Balanovska’s research team.

Several expeditions are held every year to replenish the Biobank. Their distinctive feature is a unified set of strict field-study requirements that meet the best international standards: venous blood is collected (> 90% of samples); donors are mostly men (> 90%) over 18 years of age who are unrelated to each other to a depth of at least three generations; all ancestors of each donor, for at least three generations, identified themselves with the given ethnic group and were born in the given population; the average population sample is 75-100 people; an ethnic group is usually represented by several populations from different parts of its range; written informed consent is mandatory; and the work is overseen by an ethics committee.

DATABASES AND GENE GEOGRAPHIC ATLASES

Databases are not just repositories of information but data processing systems. The databases created by our team on the variability of the Y chromosome (Y-base) and mitochondrial DNA (“MURKA”) in the world's population have become the most complete of their kind and have been recognized by the international scientific community. The Autosomal STR Markers of Russia, Russian Gene Pool and Russian Surnames databases have also been created. Databases on the distribution of genome-wide autosomal panels and exome sequencing data in the populations of Russia and the world are under development.

The most important area is the cartographic analysis of gene pools and the creation of atlases, which makes the spatial variability of a gene pool directly visible. We developed the GeneGeo software for gene-geographic analysis of gene pools; it meets the best cartographic standards and has been used in the international Genographic Project, the largest project in the field of human population genetics.

GeneGeo supports not only the analysis of a single trait but also more complex, multivariate types of population analysis (genetic distances and boundaries, interpopulation variability, correlations, etc.). The map is therefore not an illustration but a research tool.

We have created a series of world and regional atlases: on the variability of the Y chromosome and mitochondrial DNA; on various parameters of the autosomal gene pool, analyzed using genome-wide data and exomes; the Russian Surnames Atlas; the Russian People Anthropology Atlas; and the Autosomal STR Markers of Russia Atlas, which is used in forensic science.

Here are some examples of the integrated use of the biobank, databases, bioinformatics and cartographic technologies.

ANALYSIS OF THE ORIGIN OF THE GENOMES OF THE WORLD’S POPULATIONS

Analysis of ancestral populations based on genome-wide data (ADMIXTURE) models, for each individual in the input data set, the contributions of a given number of hypothetical ancestral populations. The ADMIXTURE program takes as input only a table of genetic data for many samples and the number k of hypothetical ancestral populations (from k = 2 upwards); it does not take into account which population each sample belongs to. As output, the researcher obtains an estimate of the contribution of each of the k ancestral populations to each individual and then, by grouping individuals according to their populations, the ancestry structure of each studied population.
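The grouping step described above can be sketched in a few lines. ADMIXTURE writes its per-individual estimates to a whitespace-separated .Q file with one row per individual and k columns. The function names, population labels and numbers below are invented for illustration only; this is a minimal sketch, not the pipeline used for the figures.

```python
from collections import defaultdict

def parse_q_matrix(text):
    """Parse .Q-style content: one row per individual, k proportions per row."""
    return [[float(x) for x in line.split()] for line in text.strip().splitlines()]

def population_means(q_rows, labels):
    """Average the k ancestry proportions over the individuals of each population."""
    if len(q_rows) != len(labels):
        raise ValueError("need exactly one population label per individual")
    grouped = defaultdict(list)
    for row, label in zip(q_rows, labels):
        grouped[label].append(row)
    means = {}
    for label, rows in grouped.items():
        k = len(rows[0])
        means[label] = [sum(r[j] for r in rows) / len(rows) for j in range(k)]
    return means

# Hypothetical input: k = 3, four individuals from two populations.
Q_TEXT = """
0.90 0.05 0.05
0.80 0.10 0.10
0.10 0.20 0.70
0.20 0.20 0.60
"""
# Labels are applied only after the run; ADMIXTURE itself never sees them.
LABELS = ["PopA", "PopA", "PopB", "PopB"]

means = population_means(parse_q_matrix(Q_TEXT), LABELS)
for pop, shares in means.items():
    print(pop, [round(s, 3) for s in shares])
```

The population labels enter only at the averaging step, which mirrors the point made above: the program estimates individual ancestry without knowing which population a sample belongs to.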

Fig. 4a shows the analysis of the ancestral components of the world’s populations using the ADMIXTURE method with six hypothetical ancestral populations (k = 6). Mapping each ancestral component (Fig. 4b - 4e) allows us to reconstruct where the corresponding ancestral population originated and the present range of its distribution.

The ancestral component k1 (Fig. 4b) reaches its maximum in the indigenous population of sub-Saharan Africa; its share in North Africa and the Middle East is much smaller, it is minimal in South-West Asia and Spain, and it is practically absent from the rest of Eurasia. The ancestral component k3 (Fig. 4c), by contrast, predominates in European populations, and the contribution of this hypothetical ancestral population decreases from the northwest to the southeast of Eurasia. The distribution map of the ancestral component k5 (Fig. 4d) shows the opposite trend: its maximum is in Central and Eastern Siberia, followed by a gradual decrease from northeast to southwest. It is clearly expressed in the populations of Western Siberia, the Trans-Urals and Central Asia, but virtually absent in Europe, the Middle East and South Asia. The ancestral component k6 (Fig. 4e) peaks in Southeast Asia, is noticeable in Southern Siberia and Central Asia, and is practically absent in the subarctic region of Eurasia, Europe and the Middle East.

Whole genome data analysis: ancestral components

Problem: if all the peoples of Eurasia and Africa descended from only six ancestral groups (which ones, and from where, is unknown), how are their contributions distributed in today’s populations?

Gene components: from 1 to 6

ADMIXTURE analysis of the ancestral components of the world’s populations with six hypothetical ancestral populations (k = 6): mapping each ancestral component allows us to reconstruct the center of its origin and its range.

Translation:

k1 African: studied populations

k3 Pan-European: studied populations

k5 Siberian: studied populations

k6 East Asian: studied populations

IDENTIFYING THE RANGES OF NEW HAPLOGROUP BRANCHES FOR DETAILED ANALYSIS OF POPULATION GENETIC HISTORY

Whole-genome analysis of the Y chromosome helps to identify many young, thin branches within the “old” haplogroups (the thick “trunks” of the Y-chromosome tree). Such branches are usually confined to a limited area and are therefore highly informative for determining origin.

We have developed an approach based on the integrated use of the biobank, databases, bioinformatics and cartographic technologies. It includes four stages:

(1) sequencing the entire main part of the Y chromosome for several samples of the haplogroup under study;

(2) building a phylogenetic tree from these data, estimating the age of each branch and choosing the defining SNP for each branch;

(3) mass screening: genotyping these SNPs in many representatives of many indigenous populations; this stage requires a biobank containing samples from all key populations across the country and neighboring countries;

(4) mapping the distribution of the branches. The set of distribution maps for all branches at different hierarchical levels, analyzed together with the branch dates (obtained in stage 2), allows us to develop a holistic model of the haplogroup's history, which in turn is one of the projections of the demographic history of the population.
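Stages 3 and 4 amount to turning genotyping results into a per-population frequency table, which is the input for each distribution map. The sketch below assumes that every screened sample is recorded as a (population, branch) pair; the population names, branch labels and counts are hypothetical, and the mapping itself (interpolation over geography) is not shown.

```python
from collections import Counter, defaultdict

def branch_frequencies(samples):
    """Return {population: {branch: frequency}} from (population, branch) records."""
    counts = defaultdict(Counter)
    for population, branch in samples:
        counts[population][branch] += 1
    table = {}
    for population, branch_counts in counts.items():
        n = sum(branch_counts.values())  # sample size of this population
        table[population] = {b: c / n for b, c in branch_counts.items()}
    return table

def total_frequency(table, population, branches):
    """Summed frequency of several branches in one population (0.0 if absent)."""
    return sum(table[population].get(b, 0.0) for b in branches)

# Hypothetical screening results: 8 samples from PopA, 4 from PopB.
SAMPLES = (
    [("PopA", "branch_1")] * 3
    + [("PopA", "branch_2")]
    + [("PopA", "other")] * 4
    + [("PopB", "branch_2")]
    + [("PopB", "other")] * 3
)

table = branch_frequencies(SAMPLES)
for population, freqs in table.items():
    print(population, freqs)
print(total_frequency(table, "PopA", ["branch_1", "branch_2"]))
```

Each value in the table corresponds to one point on a branch's distribution map; summing several branches gives the kind of total-frequency value used for maps of grouped haplogroups.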

Figure 5 shows an example of this approach. Haplogroup N is so widespread across almost the whole of Northern Eurasia (Fig. 5a) that finding it in a population says nothing about that population's genetic history: one cannot tell whether it came from Yakuts, Tuvans or Estonians. However, the first two stages of the approach made it possible to subdivide the haplogroup into sub-branches (Fig. 5b), and stages 3 and 4, mass screening and mapping of the branch distributions, revealed their distinct ranges (Fig. 5c). It is now possible to analyze in detail the genetic history of those populations in which one or another sub-branch of haplogroup N has been found.

Identification of sub-branches of haplogroup N that are confined to a limited territory and are therefore highly informative for determining the origin of a population or an individual.

Translation:

Haplogroup N is subdivided into sub-branches by sequencing the Y chromosome.

Each identified sub-branch has its own geographical range in Northern Eurasia.

This allows for a detailed analysis of the genetic history of populations.

Such analysis can be carried out both in space and in time.

Fig. 6 shows a reconstruction of the phylogeography of haplogroup Q3 based on full sequencing of the Y chromosome. In the Neolithic, haplogroup Q3 gave rise to five branches (Q3a - Q3e), which later spread to West, Central and South Asia. In the Bronze Age, Q3a split into seven branches. One of them, Q3a1, was acquired by the population ancestral to Ashkenazi Jews and, during the 1st millennium AD, increased in frequency within that population, reaching 5% in today’s Ashkenazi Jewish gene pool.

Reconstruction of the phylogeography of haplogroup Q3 in space and time

Translation:

Distribution maps of the haplogroup branches: reconstruction of its carriers

PALEOLITHIC: emergence in Central Asia

MESOLITHIC and NEOLITHIC: division into five rare branches (Q3a to Q3e), spread to South-West Asia

BRONZE AGE (3-4 thousand years ago): rapid expansion of the Q3a branch and its division into 7 branches, some of which migrate to Europe

EARLY MIDDLE AGES: the Q3a1 sub-branch enters the population ancestral to Ashkenazi Jews, and its frequency rises to 5% in modern Ashkenazi Jews

Our team is carrying out similar analyses for a number of other haplogroups (C, Q, G1, G2, R1b, R1a, etc.).

RECONSTRUCTING EVENTS IN THE GENETIC HISTORY OF INDIVIDUAL POPULATIONS

Cartographic, phylogeographic and whole-genome analysis of the Y chromosome makes it possible to analyze not only the history of the world gene pool, but also individual events in the genetic history of peoples (Fig. 7-9).

The map of the total frequency of Eastern Eurasian Y-chromosome haplogroups reveals, in the modern population, the trace of a powerful migratory wave that swept from Mongolia to Kazakhstan and Central Asia and stopped at the Volga (Fig. 7). Further analysis of the sub-branches of the Eastern Eurasian Y-chromosome haplogroups clarified this picture and made it possible to date the events that left that trace.

Map of the total frequency of Eastern Eurasian Y-chromosome haplogroups

Translations:

Traces of the empire of Genghis Khan? (analysis of "paternal" lines: Y-chromosome haplogroups)
Red tones: high total frequency of Central Asian variants

The map of the total frequency of Eastern Eurasian haplogroups reveals the trace of a powerful migration wave that rolled from Mongolia, through the Dzungarian Gate, into Central Asia and Kazakhstan.
The limit of this migration wave is clearly visible on the map. This boundary coincides not with the Ural Mountains but with rivers (the Kama, the Volga and the Don).

Phylogenetic analysis of the clan and territorial structure of populations (Fig. 8) allows us to separate genetic kinship from legendary kinship and to date events of genetic history. Whole-genome analysis of Y-chromosome haplogroups showed (Fig. 9) that the legendary genealogy of Kazakh clans has clear genetic confirmation. Because historical sources record the lifetime of the founding ancestor of the Argyn clan, it was possible to calculate the mutation rate of the Y chromosome. Using this rate, events in the genetic history of other peoples can now be dated.
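The logic of such a calibration can be illustrated with a short calculation. This is a simplified sketch rather than the procedure actually used: it assumes that the rate is estimated as the average number of SNPs accumulated in descendant lineages since the founder, divided by the elapsed years and by the number of sequenced base pairs, and that another branch is then dated with the same rate. The function names are hypothetical and all numbers are placeholders, not measured values.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("at least one lineage is required")
    return sum(values) / len(values)

def calibrate_rate(snps_per_lineage, years_since_founder, sequenced_bp):
    """Mutations per base pair per year, from a founder with a known lifetime."""
    return mean(snps_per_lineage) / (years_since_founder * sequenced_bp)

def branch_age(snps_per_lineage, rate, sequenced_bp):
    """Age in years of a branch, from the SNPs accumulated since its root."""
    return mean(snps_per_lineage) / (rate * sequenced_bp)

# Placeholder calibration: three descendant lineages of a founder who is
# assumed to have lived 600 years ago, with 10 million bp sequenced in each.
SEQUENCED_BP = 10_000_000
rate = calibrate_rate([4, 5, 6], years_since_founder=600, sequenced_bp=SEQUENCED_BP)

# Placeholder dating of another branch using the calibrated rate.
age = branch_age([20, 25, 30], rate, SEQUENCED_BP)

print(f"rate = {rate:.3e} per bp per year")
print(f"branch age = {age:.0f} years")
```

The sketch treats the rate as constant and ignores statistical uncertainty; a real estimate would also report confidence intervals.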

Phylogenetic analysis of clan structure

Translation:

Phylogenetic analysis of haplogroup R1a based on a panel of 17 rapidly mutating STR markers in various clans and territorial groups of Bashkirs

Tabyn, Katay, Kudey, Koshso, Uley, Northeastern Bashkirs (other clans), Northwestern Bashkirs, Southeastern Bashkirs, Southwestern Bashkirs

A cluster of northeastern Bashkir populations has been identified, indicating:
1) genetic, and not only legendary, kinship of the Bashkir clans;
2) late migration to the west of Bashkiria;
3) assimilation of the northeastern Bashkirs.

Translation:

Traditional genealogy was brilliantly confirmed by whole-genome analysis of the Y chromosome

Karakhodzha: the founding ancestor of the Kazakh Argyn clan, whose lifetime is known

Genetic reconstruction of kinship

Genealogical legends about kinship