Human Population Articles

Wednesday, 12 June 2013

Genetics and racism (3)

In the previous posts of this series on genetics and racism, I talked about two recent academic disputes over human races. With this post I hope to give a wider overview of what biology has to say about species, breeds and races.

Darwin’s pigeons

Modern genetics was born in 1900 with the rediscovery of Mendel's laws. Before that, ever since the Neolithic Revolution, genetics had been an empirical art. Our ancestors isolated most of the breeds of animals and plants that we know today, i.e. groups that pass a trait of interest on to the next generation when crossed together (for instance, Chihuahuas are small dogs and Great Danes are large dogs).

But over the generations, pedigrees were lost in the mists of time, and the overwhelming differences between some breeds of the same species raised the question of whether they share the same natural origin. Before Darwin, it was difficult to imagine that Chihuahuas and Great Danes could have a common ancestor, and the prevailing theory was that breeds actually came from different species. This is one of the first questions Darwin tackled in The Origin of Species. In the following passage, he sets out his conclusions after crossing different breeds of pigeons.

Great as the differences are between the breeds of pigeons, I am fully convinced that the common opinion of naturalists is correct, namely, that all have descended from the rock-pigeon (Columba livia), (...) I crossed some uniformly white fantails with some uniformly black barbs, and they produced mottled brown and black birds; these I again crossed together, and one grandchild of the pure white fantail and the pure black barb was of as beautiful a blue colour, with the white rump, double black wing-bar, and barred and white-edged tail-feathers, as any wild rock pigeon!

Even if Darwin could not figure out the genetics of pigeon colour, he understood that the fantail and the barb were different products of the variability of the rock pigeon. In fact, domestic breeds are not part of a natural, pre-established order; they are human artifacts.

Bits and species

In the quote above, Darwin refers to the rock pigeon as Columba livia. These Latin and Greek names were introduced as a part of the great endeavor of the Swedish naturalist Carl Linnaeus to classify all living organisms. Taxonomy, the classification of living organisms, rests on the tacit assumption that species form essentially distinct groups. However, speciation (the appearance of new species) is an ongoing process. Between the time when a species does not exist and the time it comes into existence, there is a grey zone where the population is somewhere in between one and two species.

What is the point of classifying organisms if the classes keep changing? As traumatic as the theory of evolution was for taxonomy, it gave it a more meaningful and noble aim, namely to recapitulate the tree of life. In other words, it is understood that whenever possible, taxonomy must coincide with phylogenetics. With this principle in mind, Ernst Mayr proposed the definition of species that is most commonly accepted today:

Groups of actually or potentially interbreeding natural populations, which are reproductively isolated from other such groups.

In other words, two individuals are from the same species if they can have fertile descendants together. Why is that a better criterion than having the same number of legs, or the same hair color? Reproductively isolated populations have no way of exchanging their genes, so their characters will diverge as time goes by. Conversely, if a character appears in a group of inter-fertile individuals, it can spread by descent in the population, which remains more homogeneous over time.
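Mayr's reasoning can be illustrated with a minimal Wright-Fisher simulation. This is a sketch under simplifying assumptions, not a model of any real species: a single neutral locus with two alleles, two demes of a hypothetical 25 diploid individuals each, and, for the inter-fertile case, a hypothetical migration rate of 10% per generation. Populations that exchange no genes drift apart, while migration keeps their allele frequencies close.

```python
import random

def divergence_after_drift(n, generations, migration, rng):
    """Final |p1 - p2| at one neutral biallelic locus in two demes of n diploid individuals."""
    p1 = p2 = 0.5  # both demes start with the same allele frequency
    for _ in range(generations):
        # Migration: each deme draws a fraction `migration` of its genes from the other deme.
        q1 = (1 - migration) * p1 + migration * p2
        q2 = (1 - migration) * p2 + migration * p1
        # Genetic drift: binomial sampling of the 2n gene copies of the next generation.
        p1 = sum(rng.random() < q1 for _ in range(2 * n)) / (2 * n)
        p2 = sum(rng.random() < q2 for _ in range(2 * n)) / (2 * n)
    return abs(p1 - p2)

def mean_divergence(migration, replicates=100, n=25, generations=150, seed=1):
    # Average over independent replicate histories (hypothetical parameters).
    rng = random.Random(seed)
    runs = [divergence_after_drift(n, generations, migration, rng) for _ in range(replicates)]
    return sum(runs) / replicates

isolated = mean_divergence(migration=0.0)   # reproductively isolated populations
connected = mean_divergence(migration=0.1)  # inter-fertile populations exchanging migrants
print(f"mean |p1 - p2|: isolated {isolated:.2f}, connected {connected:.2f}")
```

Without gene flow, each deme tends to fix one allele or the other independently; with migration, the two frequencies wander together.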

This definition is not without difficulties, such as the existence of ring species (chains of populations that can interbreed with their geographical neighbors but not with the neighbors of their neighbors) or of populations without sexual reproduction. Yet it captures an essential aspect of the genetic dynamics of natural populations.
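Ring species show that "can interbreed" is not a transitive relation, so on its own it cannot partition populations into species. The sketch below uses a hypothetical chain of six populations and an interbreeding rule assumed purely for illustration: each population is inter-fertile only with itself and its immediate neighbours along the chain.

```python
from itertools import product

N_POPS = 6  # hypothetical chain of populations wrapped around a geographic barrier

def can_interbreed(i: int, j: int) -> bool:
    # Assumed rule: only neighbours along the chain (and a population with itself) are inter-fertile.
    return abs(i - j) <= 1

def is_transitive(n: int) -> bool:
    # Transitivity would require: a ~ b and b ~ c implies a ~ c, for every triple.
    return all(
        can_interbreed(a, c)
        for a, b, c in product(range(n), repeat=3)
        if can_interbreed(a, b) and can_interbreed(b, c)
    )

def gene_flow_component(start: int, n: int) -> set[int]:
    # Genes can still travel indirectly, step by step, through intermediate populations.
    seen, frontier = {start}, [start]
    while frontier:
        i = frontier.pop()
        for j in range(n):
            if j not in seen and can_interbreed(i, j):
                seen.add(j)
                frontier.append(j)
    return seen

print(is_transitive(N_POPS))
print(can_interbreed(0, N_POPS - 1))  # the two ends of the ring meet but do not interbreed
print(sorted(gene_flow_component(0, N_POPS)))
```

The two ends cannot interbreed, yet genes can flow from one end to the other through the intermediate populations, which is exactly the grey zone between one and two species.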

For races, though, things get more difficult. Again, according to Ernst Mayr:

a subspecies is a geographic race that is sufficiently different taxonomically to be worthy of a separate name.

So here the purpose is to know what we are talking about and keep the collections of stuffed animals well ordered in the museums. This definition admirably illustrates the difficulty of the concept of race. Species can show geographical variations, but is it a property of the geography or a property of the population?

Ceci n’est pas une race

Finding objective criteria to delineate races is still an ongoing debate among biologists. One of them in particular, genetic diversity (which was already touched upon in the first post of this series), is often put forward as a natural choice. Is it reasonable to qualify as different races two populations that are sufficiently different from the point of view of genetic variation? Since genetic differences keep accumulating between species once they separate, it is tempting to use them as a measure of ongoing speciation.

An important issue, mentioned by Jonathan Kaplan in the comments of the first post of this series, is that the way genetic diversity is measured matters. There is a cornucopia of indicators out there, and they are often treated as equivalent. In reality, they measure one of three different quantities, namely diversity (allele richness), differentiation (allelic distance) or heterozygosity.

Most studies of the variability of human races use indicators of the third type, which do not meet the expectations of conservation biologists. Kaplan and Winther [1] give the example (due to Lou Jost) of two sub-populations, each with ten equally frequent alleles, such that each allele is found in only one sub-population. Using the same measure as Lewontin, we would conclude that 77% of the variability is within sub-populations (the calculation is detailed in the technical section below). Still, we would lose 50% of the alleles should one of these sub-populations disappear. The choice of heterozygosity is somewhat arbitrary and uninformative regarding diversity and differentiation. Yet the choice of one indicator or another can lead to the conclusion that the differences between populations are either large or negligible, and so lend support for or against the existence of races.
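Jost's example can be worked through numerically. This sketch rests on one assumption: Lewontin (1972) apportioned diversity with the Shannon information measure, so "the same measure as Lewontin" is taken here to mean the share of pooled Shannon diversity found within sub-populations. For comparison, the same example is also scored with expected heterozygosity (H_S / H_T), with Jost's D as a differentiation measure, and with allele richness.

```python
from math import log

def shannon(freqs):
    # Shannon information measure of an allele-frequency vector (zeros skipped).
    return -sum(p * log(p) for p in freqs if p > 0)

def heterozygosity(freqs):
    # Expected heterozygosity: probability that two random gene copies differ.
    return 1 - sum(p * p for p in freqs)

# Jost's example: two equally sized sub-populations, ten equally frequent alleles each,
# and no allele shared between them.
pop_a = [0.1] * 10 + [0.0] * 10
pop_b = [0.0] * 10 + [0.1] * 10
pooled = [(a + b) / 2 for a, b in zip(pop_a, pop_b)]

# Shannon apportionment: share of total diversity found within sub-populations (ln 10 / ln 20).
within_shannon = (shannon(pop_a) + shannon(pop_b)) / 2 / shannon(pooled)

# Heterozygosity apportionment: H_S / H_T, i.e. 1 - G_ST.
h_s = (heterozygosity(pop_a) + heterozygosity(pop_b)) / 2
h_t = heterozygosity(pooled)
within_het = h_s / h_t

# Jost's D for n = 2 sub-populations; 1 means completely distinct allele sets.
n = 2
jost_d = (h_t - h_s) / (1 - h_s) * n / (n - 1)

# Allele richness: fraction of all alleles lost if sub-population B disappeared.
alleles_total = sum(p > 0 for p in pooled)
alleles_lost = sum(a == 0 and b > 0 for a, b in zip(pop_a, pop_b)) / alleles_total

print(f"{within_shannon:.2f} {within_het:.2f} {jost_d:.2f} {alleles_lost:.2f}")
```

The Shannon apportionment gives about 77% within sub-populations and heterozygosity about 95%, yet Jost's D reaches its maximum of 1 and half of the alleles would vanish with either sub-population.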

But genetic diversity is not universally recognized as a good criterion. Genetic differences do not need to be large between different species, because speciation does not need to be a gradual phenomenon. The mosquito Anopheles gambiae actually consists of two reproductively isolated populations that constitute two species, separated less than 10,000 years ago [2]. Only three regions, representing ~1% of the genome, are fixed between the two species. The alleles of the remaining 99% are the same in both species, but they occur at different frequencies. In this example, a definition of race based on genetic diversity would lead to the paradoxical conclusion that Anopheles gambiae consists of two species but of a single race.
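The Anopheles paradox can be sketched with hypothetical numbers chosen only to mirror the description: 300 loci, of which 3 (about 1%) carry fixed differences while the rest share both alleles at slightly different frequencies. Nei's G_ST is used here as one common differentiation index; the original studies may rely on other statistics. Averaged over the genome it stays small, even though the two groups are reproductively isolated.

```python
import random

def gst_biallelic(p1, p2):
    """Nei's G_ST at one biallelic locus for two equally sized populations."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # expected heterozygosity, pooled
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

rng = random.Random(42)
loci = [(1.0, 0.0)] * 3  # hypothetical: 3 of 300 loci (1%) fixed for different alleles
for _ in range(297):
    p = rng.uniform(0.2, 0.8)                      # both alleles present in both species...
    loci.append((p, p + rng.uniform(-0.1, 0.1)))   # ...at slightly different frequencies

per_locus = [gst_biallelic(p1, p2) for p1, p2 in loci]
mean_gst = sum(per_locus) / len(per_locus)
fixed = sum(g == 1.0 for g in per_locus)
print(f"fixed loci: {fixed}, genome-wide mean G_ST: {mean_gst:.3f}")
```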

Genetic clustering of the Europeans by Principal Component Analysis shows that genetic fingerprints correlate with geographic origin of the individuals. Reproduced from reference [3] with permission from John Novembre.

Even when genetic differences are small, human races can be discriminated by clustering analysis, provided enough loci are considered. This is the argument put forward by Anthony W.F. Edwards in Lewontin's fallacy (see the first post of this series for the details). Population history and local inbreeding leave marks on the genome. In Europe, for instance, genetic markers can reveal your origin within a 500 km radius [3] (see figure above).
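Edwards' point that many weakly informative loci add up can be sketched with a naive likelihood classifier. Everything here is a hypothetical assumption: two populations whose allele frequencies differ only slightly (0.45 versus 0.55) at every locus, independent loci in Hardy-Weinberg proportions, and equal numbers of individuals from each population. It is not the method used by Edwards or in reference [3].

```python
import random
from math import log

def simulate_genotype(freqs, rng):
    # Diploid genotype: number of copies (0, 1 or 2) of the reference allele at each locus.
    return [(rng.random() < p) + (rng.random() < p) for p in freqs]

def log_likelihood(genotype, freqs):
    # Binomial log-likelihood, assuming independent loci (constant terms omitted).
    return sum(g * log(p) + (2 - g) * log(1 - p) for g, p in zip(genotype, freqs))

def error_rate(n_loci, n_individuals=400, seed=7):
    rng = random.Random(seed)
    freqs_a = [0.45] * n_loci  # hypothetical: a small difference at every locus
    freqs_b = [0.55] * n_loci
    errors = 0
    for i in range(n_individuals):
        true_pop = "A" if i % 2 == 0 else "B"
        g = simulate_genotype(freqs_a if true_pop == "A" else freqs_b, rng)
        # Assign to the population under which the genotype is more likely (ties go to A).
        guess = "B" if log_likelihood(g, freqs_b) > log_likelihood(g, freqs_a) else "A"
        errors += guess != true_pop
    return errors / n_individuals

for n in (1, 10, 100, 500):
    print(n, error_rate(n))
```

With a single locus the classifier does little better than a coin toss; with hundreds of loci, misassignment becomes rare even though no single locus separates the populations.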

If genetics can discriminate human populations, why don't we use them as races, you may ask? And I would return the question: why would you use populations as races? The term race is loaded with so many connotations that it cannot be used in a purely scientific context. It never has been, and it probably never will be. It is not the role of genetics to define human races.

Human Population Genetics and History

Population genetics is one of the tools we can use to reconstruct the prehistory of humans. By looking at the patterns of genetic diversity present in modern humans (and in sufficiently well-preserved ancient human remains), we can uncover evidence that favors or disfavors competing hypotheses about our past.

For example, in the 1990s it was discovered that human genetic diversity, particularly in the paternally inherited Y chromosome and the maternally inherited mitochondrial DNA, was surprisingly small. This led to the ascendancy of the “Out of Africa” model of human origins, in which all modern humans are descended from a population of Anatomically Modern Humans who most likely evolved in East Africa and then expanded around the globe, emerging from Africa as recently as 100,000 years ago and replacing the groups that had populated Eurasia for hundreds of thousands of years.
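The link between low diversity and a small, recent founding population can be made concrete with the classic expectation for neutral genetic drift, H_t = H_0 (1 - 1/(2N_e))^t, which ignores mutation and migration. The effective sizes, starting heterozygosity and duration below are hypothetical.

```python
def heterozygosity_after(h0: float, ne: int, generations: int) -> float:
    """Expected heterozygosity after `generations` of drift at constant effective size `ne`
    (no mutation, no migration): H_t = H_0 * (1 - 1/(2*ne))**t."""
    return h0 * (1 - 1 / (2 * ne)) ** generations

# Hypothetical scenario: a population with H0 = 0.8 spends 100 generations at each effective size.
for ne in (10_000, 1_000, 100):
    print(ne, round(heterozygosity_after(0.8, ne, 100), 3))
```

For mitochondrial DNA and the Y chromosome, which are haploid and transmitted by only one sex, the effective size is about a quarter of the autosomal one under an equal sex ratio, so the same bottleneck erodes their diversity faster.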

More recently, large-scale genome sequencing, along with sequencing of DNA recovered from the ancient remains of human relatives like the Neanderthals, has complicated the picture yet again. It now appears that the replacement of non-African populations by Anatomically Modern Humans involved a certain degree of gene flow. For instance, modern humans outside of Africa seem to have inherited something like 4% of their DNA from the Neanderthals.

In my own work, I have looked at how the patterns of genetic relatedness can shed light on the social structures of ancient humans. For example, how large were ancient human groups? How connected were they by migration? Were there differences in the migration patterns of males and females? Meaningful answers to these questions require the principled integration of genetic data with information from other sources, including ethnographic data on modern hunter-gatherers, archaeological records, and linguistic patterns.
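One way genetic data can speak to sex differences in migration is by comparing differentiation between groups at markers inherited through mothers (mtDNA), through fathers (Y chromosome) and through both parents (autosomes). This sketch uses the standard equilibrium approximations from Wright's island model, F_ST ≈ 1/(1 + 4Nm) for autosomes and F_ST ≈ 1/(1 + 2N_x m_x) for a haploid marker carried by one sex, with hypothetical group sizes and a hypothetical patrilocal pattern in which women move between groups far more often than men.

```python
def fst_autosomal(n: int, m: float) -> float:
    # Diploid autosomal markers; n individuals per group, migration rate m per generation.
    return 1 / (1 + 4 * n * m)

def fst_uniparental(n_sex: int, m_sex: float) -> float:
    # Haploid markers inherited through one sex (mtDNA: females, Y chromosome: males).
    return 1 / (1 + 2 * n_sex * m_sex)

n_females, n_males = 50, 50        # hypothetical group composition
m_females, m_males = 0.05, 0.005   # hypothetical patrilocality: women move, men stay

fst_mt = fst_uniparental(n_females, m_females)
fst_y = fst_uniparental(n_males, m_males)
# Autosomes come from both parents, so the sex-specific migration rates are averaged.
fst_auto = fst_autosomal(n_females + n_males, (m_females + m_males) / 2)
print(f"mtDNA {fst_mt:.3f}, Y {fst_y:.3f}, autosomes {fst_auto:.3f}")
```

Higher differentiation on the Y chromosome than on mtDNA is the signature such a pattern would leave; real inference uses much richer models and must also account for sampling noise.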

Department of Human Genetics at the University of Utah

The Department of Human Genetics is dedicated to studying the genetic control of development and disease. Research interests of our faculty are wide-ranging and include the identification of genes implicated in human disease using the major model systems for genetic research: C. elegans, Drosophila, mice, and zebrafish. Our research interests include bioinformatics, genomics, statistical genetics, population genetics, clinical genetics, and evolution. Evolutionarily-conserved genetic pathways important for development, growth, and physiology are a major focus of study as well as the genetics underlying disease risk and complex disease traits. Researchers in the Department collaborate widely with both basic science and clinical labs on campus. Our faculty also participate actively in graduate education. The Eccles Institute of Human Genetics houses graduate programs in Genetic Counseling and Molecular Biology as well as the Genetic Science Learning Center, which develops science and health education materials for the public and public educators.

Human Population Genetics and Society

The ability to sequence an entire human genome has ushered in a whole new era of scientific study. Along with this ability has come the realization of the incredible genetic similarity of humankind as a whole, juxtaposed with the minute differences that make each individual unique. Intrigued by the genetic similarities and differences among peoples, some researchers have looked to see if these similarities and differences could be grouped and in what ways. One such study was conducted by a team of scientists led by N.A. Rosenberg, who concluded that genetically, people seem to fall consistently into six groups that correspond to certain geographic areas. While this study is fairly thorough and lacks obvious bias, the inaccessibility of this specialized scientific knowledge allows the data to be easily misconstrued within society. The scientific data and the social misconstructions that stem from these data are inevitably intertwined, highlighting the need for interdisciplinary mediation and understanding.

In 2002 a team of population geneticists headed by Noah Rosenberg performed a study, “Genetic Structure of Human Populations”, looking at genetic similarities between the genomes of 1056 individuals from 52 populations. By examining the same specific locations in each individual’s genome, the team could compare similarities and differences between them. They then used a computer program called STRUCTURE, which analyzed these data and allotted individuals into a pre-specified number of groups based on their similarities. For example, STRUCTURE could be set to specify two groups among these individuals and would then place individuals into these groups based on their degree of similarity to each other. Rosenberg et al. found that 6 was the optimal number of clusters, or groups, that individuals naturally fell into. That is, STRUCTURE found that individuals could be placed into groups with a similarity much nearer to 100 percent when six groups were specified than when any other number of groups was specified. The team found that the individuals comprising these six groups mostly came from similar geographic regions. Thus, they concluded that people from the same geographic area are more genetically similar to one another.

Science is not infallible; it is undeniably a field of study that makes mistakes and is constantly evolving. There are also scientific studies that are done well and ones that are not. The study we are examining, “Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure” by Rosenberg et al., is science done well, taking as many variables into account as possible and being self-critical of its own methodology. After their first study, many critiques were made of the study design of Rosenberg et al.’s research. Serre and Pääbo posed one specific critique: they argued that the geographic dispersion of the sample inherently led to increased clusteredness. In response to these critiques, Rosenberg et al. re-conducted their study in 2005 and looked at the effect different aspects of their study design had on their results. The study design variables, such as sample size or the geographic dispersion of samples, were analyzed one by one, holding all other variables constant in order to fairly assess the effect of each. The team found their results to be “robust” to these study design choices. Even taking the suggested methodological biases into account, the data still seemed to point to six genetic clusters as the best fit for grouping genetic similarities.

Yet there are a couple of outstanding issues with these data and their findings. One is the number of subjects and the geographic dispersion of the samples. Scientifically speaking, 1056 subjects is quite extensive, but ultimately, to fully support the broad claims this study makes about people and their genetic similarities, many more samples would be needed. We understand that sequencing the genomes of people from almost the entire world is highly unlikely and that processing that amount of data could take decades, but the optimal support for the claims being made would be a larger, more uniformly distributed sample, which Rosenberg and his colleagues do not have.

Also, there is a lack of statistical support. While Rosenberg et al. used clusteredness to analyze the probability of individuals falling into pre-assigned groups, limited analysis was done on the probability of observing these data given different values of K (the number of clusters specified). At the time this study was conducted, STRUCTURE did not have a statistical technique to identify the most likely number of clusters. Such a technique would be able to determine statistically whether six was the optimal number of clusters for these data, based on the likelihood of the observations given K. Scientists have yet to run this statistical analysis on Rosenberg et al.’s STRUCTURE results. In their initial study, Rosenberg et al. performed STRUCTURE runs using values of K ranging up to 20. Since the newly created analysis of the likelihood of the clusters has never been performed, the question remains: what is the justification for positing that 6 is the “best representation of human genetic variation”?
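The two steps discussed above, assigning individuals to a pre-specified number of clusters K and then asking which K the data support, can be sketched with a much simpler model than STRUCTURE. This is an illustration under stated assumptions, not Rosenberg et al.'s method: an expectation-maximization (EM) mixture of independent-locus genotype models without admixture, fitted to simulated genotypes from two hypothetical populations, with the Bayesian information criterion (BIC) standing in for a likelihood-based choice of K.

```python
import random
from math import exp, log

def simulate(n_per_pop, n_loci, rng):
    """Genotypes (0, 1 or 2 copies of an allele) for two hypothetical populations."""
    freqs = [[rng.uniform(0.05, 0.95) for _ in range(n_loci)] for _ in range(2)]
    data, labels = [], []
    for pop in (0, 1):
        for _ in range(n_per_pop):
            data.append([(rng.random() < p) + (rng.random() < p) for p in freqs[pop]])
            labels.append(pop)
    return data, labels

def em_cluster(data, k, rng, iterations=20):
    """EM for a K-cluster mixture with independent loci (a simplified 'no admixture'
    model, not STRUCTURE's MCMC). Returns soft memberships and the log-likelihood,
    omitting binomial coefficients, which are identical for every K."""
    n, n_loci = len(data), len(data[0])
    resp = []
    for _ in range(n):  # random soft memberships break the symmetry between clusters
        w = [rng.random() for _ in range(k)]
        resp.append([x / sum(w) for x in w])
    for _ in range(iterations):
        # M-step: cluster weights and allele frequencies, kept away from 0 and 1.
        totals = [sum(r[c] for r in resp) + 1e-9 for c in range(k)]
        weights = [t / n for t in totals]
        freqs = [[min(max(sum(r[c] * g[j] for r, g in zip(resp, data)) / (2 * totals[c]), 1e-3), 1 - 1e-3)
                  for j in range(n_loci)] for c in range(k)]
        log_p = [[log(p) for p in f] for f in freqs]
        log_q = [[log(1 - p) for p in f] for f in freqs]
        # E-step: posterior memberships, computed in log space for numerical stability.
        resp, loglik = [], 0.0
        for g in data:
            logs = [log(weights[c]) + sum(x * a + (2 - x) * b for x, a, b in zip(g, log_p[c], log_q[c]))
                    for c in range(k)]
            top = max(logs)
            norm = sum(exp(v - top) for v in logs)
            loglik += top + log(norm)
            resp.append([exp(v - top) / norm for v in logs])
    return resp, loglik

def best_fit(data, k, rng, restarts=3):
    # Keep the restart with the highest log-likelihood to avoid poor local optima.
    return max((em_cluster(data, k, rng) for _ in range(restarts)), key=lambda fit: fit[1])

def n_params(k, n_loci):
    return (k - 1) + k * n_loci  # mixture weights + per-cluster allele frequencies

rng = random.Random(3)
data, labels = simulate(n_per_pop=20, n_loci=120, rng=rng)
fits = {k: best_fit(data, k, rng) for k in (1, 2, 3)}
bic = {k: -2 * loglik + n_params(k, len(data[0])) * log(len(data)) for k, (_, loglik) in fits.items()}
best_k = min(bic, key=bic.get)

# Hard assignments for K = 2 compared with the simulated labels (cluster labels are arbitrary).
assign = [0 if r[0] >= r[1] else 1 for r in fits[2][0]]
agree = sum(a == lab for a, lab in zip(assign, labels)) / len(labels)
accuracy = max(agree, 1 - agree)
print(best_k, accuracy)
```

BIC rewards likelihood but penalizes each extra cluster's allele frequencies, so adding clusters beyond the simulated structure is not automatically favoured; with real data the choice of K remains far more delicate than in this toy example.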

While the results of this study do not provide a genetic basis for race, the data have in many instances been taken as proof of genetic, or inherent, differences between races. The apparent connection between geographic origin and allele frequency has led many to jump to conclusions. It is true that people who have lived in the same geographic region for many generations may share some genetic similarities (due to adaptation to the environment, mating patterns, and other factors), but there is no genetic material that is found in all members of one population and in no members of others. The main trouble with saying that these differences in allele frequencies are race-based is that this gives people a way to try to attach complex, environmentally and socially determined behaviors, like “propensity to violence” and intelligence, to genetic variation.

In the medical field, the use of genetics can promote the resurgence of a very essentialist form of thinking. Troy Duster, Professor of Sociology and Director of the Institute for the History of the Production of Knowledge at New York University, draws connections between this surge of essentialist thinking and the new biotechnologies that tend to show a greater frequency of genetic deficiencies among “at risk populations”, which typically correspond to specific ethnic groups. He warns against the risks of stigmatization, discrimination, and marginalization of groups in a society where power is not evenly distributed. Some time in the near future, he believes, various institutions could determine access to education, employment and insurance using genetic screening tests. Duster believes that these individuals could become “biological pariahs” deemed “unfit” due to their biological status. Advances in the human genome and the understanding of human differences, through such studies as the one presented, could “prove to be empowering, or it could simply add to the legacy of hatred, alienation, and distrust”.

We have seen in the past how science has claimed to have found racial differences, and attached value to those differences, in examples like Morton’s skull measurements or IQ testing, so a claimed genetic difference among races would certainly prove disastrous for generations of advocates of human equality. Yet this study never claimed to have found racial differences among populations. By distinguishing between what society thinks about this study and what it actually claims to have found, we see that Duster’s rejection of the validity of ancestry and its relationship to race is completely applicable. What Duster does not address is that the study by Rosenberg et al. does not claim to have determined genetic racial differences. It claims to have found 6 specific populations that generally coincide with major geographic locations. The genetic similarities within these populations and the differences between them demonstrate that there is genetic variation around the world.

When we first looked over this study, it was clear that the article was not written for the general public. Reading it requires advanced knowledge of the subject matter to comprehend what is being said. As we mentioned above, it would be very easy for an average reader with no specialized knowledge of human genetics to misunderstand or misinterpret its content. Although the article was made available to the public, its findings are still unclear to most who read it. This lack of explanation is one of the main problems with human genetic studies and their findings. At the end of the study, Rosenberg et al. state that “the arguments about the existence or non-existence of ‘biological races’ in the absence of a specific context are largely orthogonal to the question of scientific utility, and they should not obscure the fact that, ultimately, the primary goals for studies of genetic variation in humans are to make inferences about human evolutionary history, human biology, and the genetic causes of disease.” They place this at the very end of the article as a disclaimer about their social agenda, stating that they are not trying to make a scientific claim about race. This single sentence of social explanation at the end of the study is not enough to clarify the true results of the study in the public’s mind. The fact that this sentence was included at all means that the scientists realized the results of this study could be misconstrued; yet no further explanation is given of what the results really mean and what their possible implications are.

Troy Duster also has much to say about this study and its faults. In a presentation given at UCSB, Duster concludes his lecture with the statement “the claims that are being made in sober circumstances, on PBS, about ancestry, have no valid basis”. The presentation that Duster gave and the conclusions he came to were in direct reference to this study by Rosenberg et al. From a sociological perspective, the concerns that Duster has with this study are valid; however, his interpretation of the study’s results seems to let his sociological perspective overshadow the scientific data and its intended meaning. We have indicated that those not privy to the specialized knowledge and research involved have misinterpreted this study as a validation of “race”. Troy Duster certainly realizes, and is speaking to, the impact of this interpretation among the general public. Making the argument for, and claiming to have scientifically proved, genetic racial difference would have numerous effects on society.

The science world cannot be divorced from the social world. It is absolutely impossible for scientists to separate their studies from individual and societal worldviews. When considering why this study was done at all by the U.S.-born Rosenberg, it is difficult to believe that an interest in race and American society was nonexistent. Even something as simple as allowing study participants to self-identify their native ancestry undermines the notion of a completely objective study from the very beginning. To present scientific information as important and controversial as the Rosenberg study without significant mention of its social implications is irresponsible. The voice of the scientific community is essential for proper interpretation of the presented material. Their expertise is invaluable. Data mean little without interpretation. On the flip side, sociologists need to respect the work of scientists, and society at large needs to take the time to understand the meaning and validity of these scientific findings. Although historically there have been reasons to distrust science, sociologists need not assume that science actively seeks to create or maintain racial hierarchy. In science there is truth and a means of understanding our world. The scientific community and the social science community need to work together to make their work accessible and understandable to each other as well as to the general public. Proper analysis of the complex ways in which society and genetics interact requires interdisciplinary discussion, open-mindedness, and most importantly the building of bridges that will allow all to cross into a place of understanding.

Just a Neil Risch paper on population genetics

Categorization of humans in biomedical research: genes, race and disease

Neil Risch, Esteban Burchard, Elad Ziv, and Hua Tang

A debate has arisen regarding the validity of racial/ethnic categories for biomedical and genetic research. Some claim ‘no biological basis for race’ while others advocate a ‘race-neutral’ approach, using genetic clustering rather than self-identified ethnicity for human genetic categorization. We provide an epidemiologic perspective on the issue of human categorization in biomedical and genetic research that strongly supports the continued use of self-identified race and ethnicity.

A major discussion has arisen recently regarding optimal strategies for categorizing humans, especially in the United States, for the purpose of biomedical research, both etiologic and pharmaceutical. Clearly it is important to know whether particular individuals within the population are more susceptible to particular diseases or most likely to benefit from certain therapeutic interventions. The focus of the dialogue has been the relative merit of the concept of ‘race’ or ‘ethnicity’, especially from the genetic perspective. For example, a recent editorial in the New England Journal of Medicine [1] claimed that “race is biologically meaningless” and warned that “instruction in medical genetics should emphasize the fallacy of race as a scientific concept and the dangers inherent in practicing race-based medicine.” In support of this perspective, a recent article in Nature Genetics [2] purported to find that “commonly used ethnic labels are both insufficient and inaccurate representations of inferred genetic clusters.” Furthermore, a supporting editorial in the same issue [3] concluded that “population clusters identified by genotype analysis seem to be more informative than those identified by skin color or self-declaration of ‘race’.” These conclusions seem consistent with the claim that “there is no biological basis for ‘race’” [3] and that “the myth of major genetic differences across ‘races’ is nonetheless worth dismissing with genetic evidence” [4]. Of course, the use of the term “major” leaves the door open for possible differences but a priori limits any potential significance of such differences.

In our view, much of this discussion does not derive from an objective scientific perspective. This is understandable, given both historic and current inequities based on perceived racial or ethnic identities, both in the US and around the world, and the resulting sensitivities in such debates. Nonetheless, we demonstrate here that from both an objective and scientific (genetic and epidemiologic) perspective there is great validity in racial/ethnic self-categorizations, both from the research and public policy points of view.

An interesting read, one that rather disproves the idea that genetics proves race is a social construct. But the really interesting bit to me was…

For example, east African groups, such as Ethiopians and Somalis, have great genetic resemblance to Caucasians and are clearly intermediate between sub-Saharan Africans and Caucasians [5]. The existence of such intermediate groups should not, however, overshadow the fact that the greatest genetic structure that exists in the human population occurs at the racial level.

Most recently, Wilson et al. [2] studied 354 individuals from 8 populations deriving from Africa (Bantus, Afro-Caribbeans and Ethiopians), Europe/Mideast (Norwegians, Ashkenazi Jews and Armenians), Asia (Chinese) and Pacific Islands (Papua New Guineans). Their study was based on cluster analysis using 39 microsatellite loci. Consistent with previous studies, they obtained evidence of four clusters representing the major continental (racial) divisions described above as African, Caucasian, Asian, and Pacific Islander. The one population in their analysis that was seemingly not clearly classified on continental grounds was the Ethiopians, who clustered more into the Caucasian group. But it is known that African populations with close contact with Middle East populations, including Ethiopians and North Africans, have had significant admixture from Middle Eastern (Caucasian) groups, and are thus more closely related to Caucasians [14].

…because I’m interested in Ethiopian DNA. It backs up the use of mtDNA and Y-DNA as genetic markers to measure racial admixture in populations, showing Ethiopians to be, essentially, almost half Arab.

If anyone is suspicious of Dr Risch’s motives, he makes it quite clear that his main concern about a race- or colour-blind approach to medicine is that minority health care will suffer:

Thus, results from such studies would be largely derived from the Caucasian majority, with obtained parameter estimates that might not apply to the groups with minority representation.

And quite right too. I had an accusation of a neo-Nazi eugenics motive thrown at a study of racial differences in gestation length whose sole purpose was to lower the mortality rate of black and Asian babies in the UK.

Recognising racial differences saves lives.
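Risch's concern about parameter estimates, quoted above, comes down to simple arithmetic: a pooled estimate is a sample-size-weighted average, so it sits close to the majority group's value. The group sizes and "risk" values below are hypothetical.

```python
# Hypothetical cohort: (sample size, true value of some risk parameter) for each group.
groups = {"majority": (900, 0.10), "minority": (100, 0.30)}

def pooled_estimate(groups):
    # Sample-size-weighted average across groups.
    total = sum(n for n, _ in groups.values())
    return sum(n * value for n, value in groups.values()) / total

pooled = pooled_estimate(groups)
for name, (n, value) in groups.items():
    print(f"{name}: true {value:.2f}, pooled estimate {pooled:.2f}, error {pooled - value:+.2f}")
```

The pooled figure is off by a little for the majority and by a lot for the minority, which is the situation the quoted passage describes.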

Genetics Research Confirms Biblical Timeline

Exciting research from the summer of 2012 described DNA variation in the protein coding regions of the human genome linked to population growth. One of the investigation's conclusions was that the human genome began to rapidly diversify not more than about 5,000 years ago (1,2). This observation closely agrees with a biblical timeline of post-flood human diversification. Yet another study, this one published in the journal Nature, analyzed even more extensive data and unintentionally confirmed the recent human history described in Genesis (3).

Differences in human DNA can be characterized across populations and ethnic groups using a variety of techniques. One of the most informative genetic technologies in this regard is the analysis of rare DNA variation in the protein coding regions of the genome. Variability in these regions is less frequent than the more numerous genetic differences that occur in the non-coding regulatory regions. Researchers can statistically combine this information with demographic data derived from population growth across the world to generate time scales related to human genetic diversification (4).
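How demographic growth enters such estimates can be sketched with a toy calculation in the spirit of reference (4): the number of new mutations arising each generation is proportional to 2N_tμ, so when a population grows quickly, most mutations arise in the most recent generations. The population sizes, durations and growth rate below are hypothetical, and the loss of variants through drift or selection is ignored.

```python
from math import exp

def recent_share(sizes, recent):
    """Fraction of all new mutations (input proportional to 2*N_t*mu) that arose in the
    last `recent` generations; mu cancels out, and loss by drift or selection is ignored."""
    return sum(sizes[-recent:]) / sum(sizes)

N0 = 10_000
ancestral = [N0] * 10_000                                # hypothetical long constant-size phase
growth = [N0 * exp(0.02 * t) for t in range(1, 401)]     # hypothetical recent exponential growth

constant_share = recent_share(ancestral + [N0] * 400, 200)
growth_share = recent_share(ancestral + growth, 200)
print(f"constant size: {constant_share:.3f}, with growth: {growth_share:.3f}")
```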

What makes this type of research unique is that evolutionary scientists typically incorporate hypothetical deep time scales taken from the authority of paleontologists or other similar deep-time scenarios to calibrate models of genetic change over time. Demographics-based studies using observed world population dynamics do not rely on this bias and are therefore more accurate and realistic.

In a 2012 Science report, geneticists analyzed DNA sequences of 15,585 protein-coding gene regions in the human genome in 1,351 European Americans and 1,088 African Americans for rare DNA variation (1,2). This new study assessed rare coding variation in 15,336 genes from over 6,500 humans, almost three times the amount of data in the first study (3). A separate group of researchers performed the new study.

The Nature results convey a second spectacular confirmation of the amazingly biblical conclusions from the first study. These scientists confirmed that the human genome began to rapidly diversify not more than 5,000 years ago. In addition, they found significant levels of variation to be associated with degradation of the human genome, not forward evolutionary progress. This fits closely with research performed by Cornell University geneticist John Sanford who demonstrated through biologically realistic population genetic modeling that genomes actually devolve over time in a process called genetic entropy (5).

According to the Bible, the pre-flood world population was reduced to Noah's three sons and their wives, creating a genetic bottleneck from which all humans descended. Immediately following the global flood event, we would expect to see a rapid diversification continuing up to the present. According to Scripture, this began not more than about 5,000 years ago. We would also expect the human genome to devolve or degrade as it accumulates irreversible genetic errors over time. Now, two secular research papers confirm these biblical predictions.

References

1. Tomkins, J. 2012. Human DNA Variation Linked to Biblical Event Timeline. Creation Science Update. Posted on icr.org July 23, 2012, accessed December 31, 2012.
2. Tennessen, J. et al. 2012. Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes. Science. 337 (6090): 64-69.
3. Fu, W. et al. 2012. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature. Published online before print, July 13, 2012.
4. Keinan, A. and A. Clark. 2012. Recent Explosive Human Population Growth Has Resulted in an Excess of Rare Genetic Variants. Science. 336 (6082): 740-743.
5. Sanford, J. C. 2008. Genetic Entropy and the Mystery of the Genome, 3rd ed. Waterloo, NY: FMS Publications.


The main orientations of human genetic differentiation

For the past couple of weeks I've been running experiments with SPatial Ancestry analysis (SPA) software, trying to design model files that place personal genomics customers as close as possible to their geographic points of origin on a Google map. It's proving a bit of a headache, largely because I'm finding it difficult to get the longitude right for everyone, especially for samples from across Northern Europe. In comparison, the latitude results almost take care of themselves. The abstract below, from the recent ASHG 2012 conference, explains why:

Anisotropic isolation by distance: the main orientations of human genetic differentiation. F. Jay (1, 2), P. Sjödin (3), M. Jakobsson (3, 4), M. G. B. Blum (2). (1) Laboratoire TIMC-IMAG UMR 5525, Université Joseph Fourier, Centre National de la Recherche Scientifique, Grenoble, France; (2) Department of Integrative Biology, UC Berkeley, Berkeley, CA; (3) Department of Evolutionary Biology, Uppsala University, Uppsala, Sweden; (4) Science for Life Laboratory, Uppsala University, Uppsala, Sweden.

Genetic differentiation among human populations is greatly influenced by geography due to the accumulation of local allele frequency differences. However, little is known about the possibly different increment of genetic differentiation along the different directions (north-south, east-west, ...). We analyzed genome-wide polymorphism data from African (n=29), Asiatic (n=26), Native American (n=9) and European (n=38) populations, and we found that the major orientations of genetic differentiation are north-south in Europe and Africa, east-west in Asia, but no preferential orientation was found in the Americas. A practical consequence of the anisotropic pattern of genetic differentiation is that the localization of an individual's geographic origin based on SNP data should be more precise along the orientation of maximum differentiation. We compared the localization of geographic origin obtained with principal component regression with a baseline method and confirmed that the largest improvement was obtained along the orientation of maximum differentiation. Our findings have implications for interpreting the making of human genetic variation in terms of isolation by distance and spatial range expansion processes.
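To make the idea of an "orientation of maximum differentiation" concrete, here is a toy sketch. It is not the method used by Jay et al.; it is a simplified stand-in assumed here for illustration: project each pair of populations' separation onto a candidate compass direction, correlate that projected distance with pairwise differentiation, and keep the direction with the strongest correlation. The function names, the 4 x 4 grid of populations and the 5-degree step are all hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def best_orientation(coords, diff, step_deg=5):
    """Scan directions (0 = east-west, 90 = north-south) and return
    (angle_in_degrees, r) for the direction along which projected distance
    correlates best with pairwise differentiation.

    coords: list of (x, y) positions, x pointing east and y pointing north.
    diff: dict mapping index pairs (i, j) to a pairwise differentiation value.
    """
    best = None
    for deg in range(0, 180, step_deg):
        theta = math.radians(deg)
        ux, uy = math.cos(theta), math.sin(theta)
        dists, values = [], []
        for (i, j), d in diff.items():
            dx = coords[j][0] - coords[i][0]
            dy = coords[j][1] - coords[i][1]
            dists.append(abs(dx * ux + dy * uy))  # separation along this direction
            values.append(d)
        r = pearson(dists, values)
        if best is None or r > best[1]:
            best = (deg, r)
    return best

# Toy example: differentiation grows only with north-south separation.
coords = [(x, y) for x in range(4) for y in range(4)]
diff = {(i, j): 0.01 * abs(coords[j][1] - coords[i][1])
        for i in range(len(coords)) for j in range(i + 1, len(coords))}
angle, r = best_orientation(coords, diff)
print(angle, round(r, 3))
```

On real data the correlation at the best angle would be far below 1, and a proper analysis would fit a model across all directions at once; the sketch only shows what "anisotropic" means in practice.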

Update 8/12/2012: the full study is now out, and well worth a read (see here). But someone should remove that Slovakian individual from the POPRES dataset. This person clearly has exotic ancestry.

Jay et al., Anisotropic isolation by distance: the main orientations of human genetic differentiation, Mol Biol Evol (2012) doi: 10.1093/molbev/mss259 First published online: November 20, 2012

News on Population Genetics—Human Genome Research Shows Race Is Real

The human-genetics websites are abuzz over two papers in the July 6 issue of Science. The papers discuss rare variants in the human genome, which highlight differences between big old localized populations.

Steve Hsu at infoproc:

Deep sequencing of the human genome, which reveals rare variants (here, defined as those found in fewer than 0.5 percent of the population), shows that there is actually more variation between groups than within groups. (So what you may have been taught in school is not true; sorry, that's how science works sometimes.) The figure below, from this July 6 Science article, shows that over 50 percent of rare genetic variants are found in African populations (which have greater genetic diversity) but not in European populations. About 41 percent of all rare variants are found only in Europeans and not in Africans, and only 9 percent of the variants are common to both groups.
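The private-versus-shared breakdown in that quote is simple set arithmetic over variant catalogues. A minimal sketch, assuming each population's rare variants (already filtered to a frequency below 0.5%) are available as a set of hypothetical variant IDs; the toy sets are built by hand to reproduce roughly the 50/41/9 percent split and are not real data.

```python
def sharing_fractions(african, european):
    """Split the union of two variant sets into population-private and shared parts.

    Returns the fraction of all observed variants that is African-only,
    European-only, and shared by both samples.
    """
    total = len(african | european)
    return {
        "african_only": len(african - european) / total,
        "european_only": len(european - african) / total,
        "shared": len(african & european) / total,
    }

# Hypothetical variant IDs 0-99, arranged to give a 50 / 41 / 9 split.
african = set(range(0, 50)) | set(range(91, 100))
european = set(range(50, 100))
print(sharing_fractions(african, european))
# {'african_only': 0.5, 'european_only': 0.41, 'shared': 0.09}
```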

Razib Khan at Gene Expression:

I suspect this is going to be a big deal for some time. For humans we are coming toward the end of the SNP-age and entering into the whole-genome-age. That means that the emphasis on common variation at the genomic level is going to give way somewhat to rarer, more particular, variation. One of the major takeaways is that a lot of this variation is going to be population-specific . . . If I read this right we may be entering into a golden age of demographic history reconstruction, as rare variants and whole-genome catalogs of a huge number of humans are going to allow us to generate a very fine-grained map of human population diversity.

One thing's for sure: the continued survival of Lewontin's Fallacy (I still hear it all the time) is ever more inexplicable.

VanBUG: Andrew G Clark, Professor of Population Genetics, Cornell University

[My preamble to this talk is that I was fortunate enough to have had the opportunity to speak with Dr. Clark before the talk, along with a group of students from the Bioinformatics Training Program. Although he was asked to speak today on the 1000 Genomes work that he's done, I was able to pose several questions to him, including "If you weren't talking about 1000 Genomes, what would you have been speaking about instead?" I have to admit, I had a very interesting tour of the chemistry of Drosophila mating, parent-specific gene expression in progeny and even some chicken expression. Rarely has 45 minutes of science gone by so quickly. Without further ado (and with great respect to Rodrigo Goya, who is speaking far too briefly, and at a ridiculous speed, on RNA-seq and alternative splicing in cancer before Dr. Clark takes the stage), here are my notes.]

Human population genomics with large sample size and full genome sequences

Talking about two projects – one sequencing a large number of genomes (1000 Genomes project), the other sequencing a very large number of samples in only 2 genes (Rare Variant studies).

The ability to predict phenotype from genotype is still small: where is the heritability? Simple SNP associations are insufficient to explain disease and heritability. Perhaps rare variation is responsible. That launched the 1000 Genomes Project.

The 1000 Genomes Project aimed to find variants down to 1% frequency in the population (in accessible regions).

See Nature for the pilot-project publication of the 1000 Genomes Project. This included several trios (parents and child). It found more than 15M SNPs across the human genome. The biggest impact, however, has been on informatics: how do you deal with that large volume of SNPs? SNP calling, alignment, codification, etc.

Many of the standard file formats, etc., came from the 1000 Genomes groups working on that data. The biggest issue is (of course) avoiding mapping to the wrong reference! "High quality mismatches" -> many false positives that failed to validate: misalignments of reads. Read-length improvements helped keep this down, as did using the insertions found in other 1000 Genomes Project subjects.

Tuning of SNP calling made a big difference, and the validation process had a significant impact. However, rare SNPs are still hard to call.

Novel SNPs tend to be population specific, e.g., Yoruba vs. European samples have different patterns of SNPs. There is a core of common SNPs, but each population has its own distribution of rare or population-specific SNPs.

“Imputation” using haplotype information (phasing) was a key item for making sense of the different sources of the data.

Great graph of the frequency spectrum (number of variants, on a log scale, vs. allele frequency from 0.01 to 1). It looks like a hockey stick lying flat: lots of very rare SNPs, decreasing towards frequency 1, but with a spike at 1.
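The frequency spectrum in that graph is a histogram of how many sampled chromosomes carry each variant. A minimal sketch of an unfolded site frequency spectrum, assuming diploid genotypes coded as 0, 1 or 2 copies of the non-reference allele; the input layout and the function name are illustrative, not taken from the 1000 Genomes tools.

```python
from collections import Counter

def site_frequency_spectrum(genotypes):
    """Unfolded site frequency spectrum.

    genotypes: one list per site, holding each diploid individual's count of
    the non-reference allele (0, 1 or 2).
    Returns a Counter mapping allele count -> number of sites.
    Monomorphic sites (count 0) are skipped.
    """
    sfs = Counter()
    for site in genotypes:
        k = sum(site)
        if k > 0:
            sfs[k] += 1
    return sfs

# Three diploid individuals (6 chromosomes), five sites.
sites = [[1, 0, 0], [0, 1, 0], [2, 1, 0], [2, 2, 2], [0, 0, 0]]
print(sorted(site_frequency_spectrum(sites).items()))
# [(1, 2), (3, 1), (6, 1)]
```

Even in this five-site toy the singleton class (count 1) is the largest; the last bin holds sites where every sampled chromosome carries the non-reference allele.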

There is reduced variation extending >100 kb from each gene (e.g., from the transcription start site).

Some discussion of recombination hotspots, which were much better mapped using the 1000 Genomes Project data.

Another application: de novo mutation. Identify variants in the offspring that are not found in either parent. Roughly 1000 mutations per gamete. ~3×10^-8 substitutions per site per generation.

1000 Genomes project is now expanding to 2500 samples. Trying to distribute across 25 population groups, with 100 individuals per group.

Well, what do we expect to discover from ultra-deep sampling?

There are >3000 mutations in dystrophin. (Ascertained cases of muscular dystrophy. – Flanagan et al, 2009, Human Mutation)

Think of any gene: you can expect to find every site mutated, across every population… eventually. [Actually, I do see this in most genes, but not all... some are hyper-conserved, if I've interpreted it correctly.]

A major problem, though: sequencing error. If you're sampling billions of base pairs with a 1/100,000 error rate, you'll still find bad base calls!
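The back-of-envelope arithmetic behind that warning, as a sketch assuming independent per-base errors at the 1/100,000 rate quoted; the 3 billion bases (roughly one haploid human genome) and the 10x depth are illustrative choices, not figures from the talk.

```python
import math

def expected_errors(n_bases, error_rate):
    """Expected number of miscalled bases among n_bases independent calls."""
    return n_bases * error_rate

def prob_at_least_k_errors(depth, error_rate, k):
    """Binomial probability that at least k of `depth` reads at one site are miscalled."""
    return sum(math.comb(depth, i) * error_rate**i * (1 - error_rate)**(depth - i)
               for i in range(k, depth + 1))

print(expected_errors(3e9, 1e-5))           # about 30,000 bad calls in 3 billion bases
print(prob_at_least_k_errors(10, 1e-5, 1))  # roughly 1e-4 per site at 10x depth
```

With singleton variants in the data, a single erroneous read can look like a genuine rare allele, which is why calling them needs an explicit error model.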

Alex Coventry: there are only 6 types of heterozygotes (CG, CT, GT, AC, AG, AT)… ancient technology, not getting into it; it was developed for Sanger sequencing.

Studied the HHEX and KCNJ11 genes, sequenced in 13,715 people. Validated by barcoding and 454 sequencing.

Using the model from Alex's work, you can compute a posterior probability for each SNP, which helped in validating calls. When dealing with rare variants, there isn't a lot of information.
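Alex Coventry's model isn't described in these notes, so the following is only a generic textbook-style genotype posterior, not his method. It assumes independent reads, a uniform per-base error rate with errors spread evenly over the three other bases, and hypothetical priors in which heterozygotes have probability theta and alternate homozygotes theta/2.

```python
import math

def read_prob(base, allele, error):
    """P(observed base | true allele): 1 - error if they match, else error / 3."""
    return 1.0 - error if base == allele else error / 3.0

def genotype_posteriors(reads, ref, alt, error=0.01, theta=0.001):
    """Posterior probabilities of the genotypes ref/ref, ref/alt and alt/alt.

    reads: string of observed bases at one site, e.g. "AAGA".
    Assumes independent reads, a uniform error rate, and hypothetical priors
    1 - 1.5*theta, theta and theta/2 for the three genotypes.
    """
    priors = {(ref, ref): 1.0 - 1.5 * theta,
              (ref, alt): theta,
              (alt, alt): 0.5 * theta}
    log_post = {}
    for (a1, a2), prior in priors.items():
        ll = math.log(prior)
        for base in reads:
            # Each read comes from either chromosome with probability 1/2.
            ll += math.log(0.5 * read_prob(base, a1, error) + 0.5 * read_prob(base, a2, error))
        log_post[a1 + a2] = ll
    # Normalise in log space to avoid underflow at high depth.
    top = max(log_post.values())
    z = sum(math.exp(v - top) for v in log_post.values())
    return {g: math.exp(v - top) / z for g, v in log_post.items()}

print(genotype_posteriors("A" * 10 + "G" * 10, "A", "G"))  # heterozygote "AG" dominates
print(genotype_posteriors("A" * 20, "A", "G"))             # homozygote "AA" dominates
```

The low heterozygote prior is what makes rare variants hard: with only one or two supporting reads, the prior and the error term can outweigh the evidence.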

The punchline: “There are a lot of rare SNPs out there!”

Some site frequency data were shown as sample size increases. The vast majority of what you get in the long run is rare SNPs.

Human rare variation is “in excess” of what you’d expect from classical theory. So why are there so many variants?

The historical population was small, but underwent a population explosion in the last 2000 years. This allows diversity to be generated rapidly, as each new generation carries new variants, with no dramatic culls to force this rare variation to consolidate.

How many excess rare variants would you expect from the population explosion? (Gutenkunst et al., 2009, PLoS Genetics) The population has expanded 100x in about 100 generations. Thus, we see the core set of variants, which were present in the population before the explosion, followed by the rapid diversification of rare SNPs.
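The "100x in about 100 generations" figure implies a per-generation growth rate. A minimal sketch, assuming steady exponential growth (a simplification; real human growth was not constant):

```python
def per_generation_growth(fold_change, generations):
    """Constant rate r such that (1 + r) ** generations == fold_change."""
    return fold_change ** (1.0 / generations) - 1.0

r = per_generation_growth(100, 100)
print(f"{r:.4f}")  # 0.0471, i.e. about 4.7% growth per generation
```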

You can then infer age from SNP frequency: older SNPs must be present across more of the population. Very few SNVs are older than 100 generations. If you fit the population model back to the expected SNV frequencies 100 generations ago, the current data fit very well.
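For contrast, the classic Kimura and Ohta (1973) result gives the expected age of a neutral allele at frequency p, in a population of constant effective size N_e, as -4 N_e p ln(p) / (1 - p) generations. The sketch below assumes N_e = 10,000 purely for illustration. It is a constant-size baseline, which is exactly the assumption the talk argues does not hold for humans, but it shows the monotonic link between frequency and age.

```python
import math

def expected_allele_age(p, ne):
    """Expected age in generations of a neutral allele now at frequency p,
    assuming a constant effective population size ne (Kimura and Ohta, 1973).
    """
    return -4.0 * ne * p * math.log(p) / (1.0 - p)

for p in (0.001, 0.01, 0.1, 0.5):
    print(p, round(expected_allele_age(p, ne=10_000)))
```

Under constant size even a 1% variant is expected to be almost 2,000 generations old; recent explosive growth is what lets most rare variants be so much younger.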

When fitting to effective sample size of humans, you can see that we’re WAY out of equilibrium from what the common snps would suggest. [I'm somewhat lost on this, actually. Ne (parent) vs n (offspring). I think the point is that we've not yet seen consolidation (coalescence?) of SNPs.]

“Theory of Multiple Mergers” Essentially, we have a lot of branches that haven’t had the chance to blend – each node on the variation tree has a lot of unique traits (SNPs) independent of the ancestors. (The bulk of the weight of the branch lengths is in the many many leaves at the tips of the trees.)

[If that didn't make sense, it's my fault - the talk is very clear, but I don't have the population genetics vocabulary to explain this on the fly.]

What proportion of SNPs found in each new full genome sequence do we expect to be novel (for each human)? "It's a fairly large number": about 5-7%, with outliers from 3 to 17%. [I see about the same for my database, which is neat to confirm.] This can be fit to models: a constant population size would give a low fraction (0.1%), the explosive-growth model a higher one (1.4%), over very large sample sizes.

Rare variants are enriched for non-synonymous changes and premature terminations (Marth et al., submitted). [Cool: not surprising, and very confounding if you don't take population frequency into account in your variant discovery.]

What does this mean in complex diseases? Many of our diseases are going to be caused by rare variants, rather than common variants. Analogy of jets that have 4x redundancy, versus humans with 2x redundancy at the genome level.

Conclusions:

The human population has exploded, and this has had a huge effect on rare variation.
Huge samples must be sequenced to detect and test effects
This will impact our studies of diseases, as we have to come to terms with the effects of rare variation.

Out of Africa: Startling New Genetics of Human Origins

Western Pygmies

I love population genetics for its ability to peer back into human history through the medium of DNA’s ATCGs.

One of the stars of this discipline is Sarah Tishkoff, a standout in African genetics, someone who will readily haul a centrifuge into the bush in Cameroon.

Tishkoff of the University of Pennsylvania is lead author on a paper published online July 26 in Cell that details whole-genome sequencing of five individuals each from three extant hunter-gatherer groups—the Pygmies of Cameroon as well as the Hadza and the Sandawe of Tanzania. The results reveal millions of newly discovered genetic variants—differences in single genetic letters, the ATCGs—and indicate that early modern humans may have interbred long ago in Africa with another species of hominid (although the fossil record does not provide much support for the latter finding).

Tishkoff answered a few questions for us about this paper, co-authored with Joseph Lachance and 11 other researchers. An edited version of the interview appears below:

Please describe the research that led to the paper that was published today:

We're the first ones to look at these diverse groups of hunter-gatherers in Africa, who descend from some of the most ancestral lineages in the world. They're interesting because they have very unique and distinct lifestyles. There are few populations that maintain this active hunter-gatherer lifestyle.

This is the most extensive study in Africa using high-coverage, deeply detailed sequence data. We focused on three groups because they're anthropologically interesting. They're thought to be descended from groups that are ancestral to all modern humans. We wanted to understand the genetic basis of adaptation to their local environment including, for instance, the short stature trait in Pygmies.

So what did you find?

We discovered 13 million variants and, of those, more than 3 million are completely novel, meaning that they have not been reported in any database. The current public database has 40 million variants. So we found 3 million novel variants by simply sequencing 15 individuals. That increases all known human genetic variation by about 8 percent. It also demonstrates that we're missing a lot of really important variation that's out there, particularly in Africa, which is the homeland of modern humans and a place where there's been a lot of time for differentiation to have occurred in very diverse environments. What this means is that there's probably a lot of regional or population-specific variation out there that has not been well characterized, some of which is functionally very important.

What about natural selection?

Natural selection seems to be operating more on the non-coding genome [the regulatory portion that does not contain genes] than the coding region. A lot of people are doing exome sequencing [looking only at genes]. I think they’re missing a lot of important variation.

In our study, we looked at what regions of these groups’ genomes were uniquely differentiated to their local environments. There wasn’t a huge amount of overlap between the groups—or between them and other non-hunter-gatherer groups from Africa. Due to natural selection, we found there were distinctive adaptations for immunity, taste and smell.

In the Pygmies, we discovered genes involved with thermal regulation, immunity and stature, all likely to be adaptive to a tropical environment. We pinpointed genes related to pituitary and thyroid function, the latter perhaps an adaptation to a low-iodine environment.

In the Sandawe, we found a variant in a melanin-related gene involved in skin color. The Sandawe are among the most fair-skinned groups in Africa. When I went to work with them, they said, 'We're like brothers and sisters because you look like us.' This is not because of any European admixture; they look like the San [a hunter-gatherer group from southern Africa]. When I said, 'Where do you come from?' they pointed to a mountain in the distance. When I said, 'Can you take me there?' we went, but there was no road. We went through the bush and they showed me cave paintings. Having lived in South Africa, I've seen the cave paintings of the San.

What about interbreeding with other human species?

A number of studies have shown a low amount of interbreeding between early modern humans outside of Africa and archaic species outside of Africa, including Neandertals and, in Asia, the species they call Denisova. They've never found any evidence of Neandertal DNA in Africa. The problem is that you just don't get good preservation of fossils in Africa. So what we did was collaborate with Josh Akey and Ben Vernot at the University of Washington and use a statistic they developed to recognize regions of the genome that appear to be of archaic origin.

The first thing we did was test this statistic by applying it to non-Africans, and we found a very strong enrichment for Neandertal DNA in those genomes. But we didn't see that in the Africans. They had no Neandertal DNA. When we applied the statistic to Africans, though, we still saw a lot of evidence for interbreeding with a hominid that diverged from a common ancestor we shared about 1.2 million years ago, about the time that Neandertals split off as well. This suggests that there could have been a sister species in Africa. What it was, nobody knows. But it seems to show that modern humans have been interbreeding, and that it's not unique to non-African populations.

Why are African genetics so exciting?

Africa was the site of origin of all modern humans and if you want to learn about when, where and how we evolved, you want to look at this continent. It has a long history of population subdivision and adaptation of those populations to very distinct environments and a broad range of phenotypes, ranging from the short stature of the Pygmies to the very tall stature of the pastoralists in the east. It also has very different disease exposure and very different disease prevalence throughout.

What’s next?

We want to expand our genome-wide analysis to other populations, and we want to do so with larger sample sizes. We're going to continue to try to correlate genetic variants with different phenotypic traits. We'd love to do functional studies of these genes to see, for instance, how they are regulating pituitary development. Is there some totally novel mechanism involved? We're going to look at the Pygmies and other groups with a systems approach. You can't look at height, as an example, by itself. You have to look at it in relation to metabolism and immunity and see how everything interacts.

population genetics

This is a little bit different than most posts here. I have a paper out today in Molecular Ecology Resources: "mmod: an R library for the calculation of population differentiation statistics" (doi: 10.1111/j.1755-0998.2012.03174.x). Looking around the web, there aren't many simple expositions of just what a "differentiation statistic" might be, and why the "modern measures of differentiation" my little R package can calculate might improve on the more traditional ones. So, I thought I'd have a go here.

Biologists often want to be able to measure the degree to which a population is divided into smaller sub-populations. This can be an important thing to quantify, because sub-populations within highly structured populations are, to some extent, genetically distinct from other sub-populations and therefore have their own evolutionary histories (and perhaps futures).

To illustrate this point I've run some simulations. Imagine we had 5 sub-populations, each with a thousand individuals. In each population we will follow the fate of a locus with two alleles, R and r, that have no effect on survival or reproduction and start at frequencies 0.8 and 0.2 respectively (these numbers motivated by this post). In the absence of gene flow between these populations (Panel 1) the frequency of the r allele bounces around due to genetic drift (evolutionary change, after all, is inevitable). Crucially, though, changes in one population can't affect other populations, so we end up with substantial among-population differences in allele frequency. In the next two panels, in each generation a proportion of each population's individuals (0.001 and 0.01 respectively) is drawn from the other populations in the simulation. Now that the populations are sharing genes, the lines that represent their allele frequencies pull together (that is, the among-population variation is reduced).
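A minimal sketch of the kind of simulation described, not the code behind the original panels: a Wright-Fisher model of a neutral two-allele locus in 5 demes, assuming diploid individuals (so 2,000 gene copies per deme) and that each generation a fraction m of every deme's parents is drawn evenly from the other demes. NumPy is assumed to be available; the function name and the 1,000-generation run length are hypothetical choices.

```python
import numpy as np

def simulate_island_model(n_pops=5, n_individuals=1000, p0=0.2, m=0.0,
                          generations=1000, reps=1, seed=None):
    """Neutral Wright-Fisher drift at one biallelic locus in n_pops demes.

    Assumptions (illustrative, not taken from the original simulations):
    diploid individuals, so each deme holds 2 * n_individuals gene copies;
    each generation a fraction m of every deme's parents comes evenly from
    the other demes. Returns final r-allele frequencies, shape (reps, n_pops).
    """
    rng = np.random.default_rng(seed)
    two_n = 2 * n_individuals
    p = np.full((reps, n_pops), p0, dtype=float)
    for _ in range(generations):
        # Average r frequency of the *other* demes, computed for each deme.
        others = (p.sum(axis=1, keepdims=True) - p) / (n_pops - 1)
        # Gene flow: mix local and immigrant parents, guarding against float overshoot.
        p_parents = np.clip((1.0 - m) * p + m * others, 0.0, 1.0)
        # Genetic drift: sample 2N gene copies binomially from the parental pool.
        p = rng.binomial(two_n, p_parents) / two_n
    return p

no_flow = simulate_island_model(m=0.0, reps=20, seed=1)
with_flow = simulate_island_model(m=0.01, reps=20, seed=1)
# Mean among-deme variance in r frequency, averaged over the 20 replicates.
print(no_flow.var(axis=1).mean(), with_flow.var(axis=1).mean())
```

Averaged over replicates, the among-deme variance should come out clearly larger with m = 0 than with m = 0.01, which is the pulling-together of the frequency lines described for the panels.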

One way to quantify the among-population variation displayed in these simulations is to look at the number of heterozygotes you expect to observe across the entire population. The final values for P(r) in the first simulation were {0.33, 0.47, 0.88, 0.10, 0.33}, with a mean frequency of 0.42 (so the frequency of the R allele would be 0.58). Knowing our Hardy-Weinberg, if we had one big population with two alleles, one at a frequency of 0.42, we'd expect a proportion of 2pq = 2 * 0.42 * 0.58 = 0.49 heterozygotes. We can call that number H_T, for expected total heterozygosity. But that's not what we'd actually see in this case. The sub-populations that make up this larger population have their own allele frequencies; when we calculate the expected proportion of heterozygotes for each of these populations by themselves we end up with {0.44, 0.49, 0.21, 0.18, 0.44}, for a within-population expected heterozygosity (H_S) of 0.35*. This deficit of heterozygotes within sub-populations, compared with the total-population expectation, will always arise when genetic drift makes sub-populations distinct from each other.
Masatoshi Nei used this pattern to propose a statistic to quantify population divergence called G_ST, which he defined like this:

G_{ST} = (H_T - H_S) / H_T
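Plugging the five final frequencies quoted above into plain heterozygosity formulas (1 minus the sum of squared allele frequencies, which reduces to 2pq for two alleles) gives H_T, H_S and Nei's G_ST directly. This sketch uses simple plug-in estimates with hypothetical function names; as the footnote notes, mmod itself uses nearly unbiased estimators, so its numbers will differ slightly.

```python
def heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2); equals 2pq for two alleles."""
    return 1.0 - sum(p * p for p in freqs)

def nei_gst(pop_freqs):
    """Return (H_T, H_S, G_ST) from per-population allele-frequency lists (plug-in)."""
    k = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    mean_freqs = [sum(pop[a] for pop in pop_freqs) / k for a in range(n_alleles)]
    h_t = heterozygosity(mean_freqs)                          # total population
    h_s = sum(heterozygosity(pop) for pop in pop_freqs) / k   # mean within populations
    return h_t, h_s, (h_t - h_s) / h_t

# Final r-allele frequencies from the no-gene-flow simulation described above.
p_r = [0.33, 0.47, 0.88, 0.10, 0.33]
h_t, h_s, gst = nei_gst([[p, 1.0 - p] for p in p_r])
print(round(h_t, 2), round(h_s, 2), round(gst, 2))
# 0.49 0.35 0.27
```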

Nei's motivation with G_ST was to generalise Sewall Wright's F_ST**, which was defined for diploid organisms and two-allele systems, so that it could be used for any genetic data. But there's a problem with this formulation. Because H_T is always at least as large as H_S and can't be greater than one, the maximum possible value of G_ST is 1 - H_S. This dependency on within-population genetic diversity makes comparisons between studies, and even between loci in one study, difficult (since H_S will likely be different in each case). This is particularly worrying for highly polymorphic markers like microsatellites, which can give values of H_S as high as 0.9, severely constraining the possible values of G_ST.

Although the problem of G_ST's dependence on H_S has been known for a while, it's taken some time for new statistics that get around it to be developed. Philip Hedrick (doi: 10.1554/05-076.1), along with Patrick Meirmans (doi: 10.1111/j.1755-0998.2010.02927.x), introduced G''_ST, a version of G_ST that is corrected for the observed value of H_S as well as the number of sub-populations being considered. Meirmans used a similar trick to define φ'_ST (doi: 10.1111/j.0014-3820.2006.tb01874.x), another F_ST analogue that partitions genetic distances into within- and between-population components. Most recently, Lou Jost introduced an entirely separate statistic, D, that directly measures allelic divergence (doi: 10.1111/j.1365-294X.2008.03887.x).
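To see why the newer statistics matter, here is a sketch computing G_ST, G''_ST and Jost's D for k populations from allele frequencies, using the plug-in forms G''_ST = k(H_T - H_S) / ((k H_T - H_S)(1 - H_S)) and D = (H_T - H_S) / (1 - H_S) * k / (k - 1), following the definitions in the papers cited above. Treat it as an unverified sketch rather than a reference implementation: it is not mmod's code (mmod uses nearly unbiased estimators), and the two-population example is a hypothetical extreme.

```python
def heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)

def differentiation(pop_freqs):
    """Plug-in G_ST, G''_ST (Meirmans and Hedrick) and Jost's D for k populations."""
    k = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    mean_freqs = [sum(pop[a] for pop in pop_freqs) / k for a in range(n_alleles)]
    h_t = heterozygosity(mean_freqs)
    h_s = sum(heterozygosity(pop) for pop in pop_freqs) / k
    gst = (h_t - h_s) / h_t
    gst_pp = k * (h_t - h_s) / ((k * h_t - h_s) * (1.0 - h_s))
    jost_d = (h_t - h_s) / (1.0 - h_s) * k / (k - 1)
    return {"G_ST": gst, "G''_ST": gst_pp, "D": jost_d}

# Two populations, each with ten equally common alleles and none in common:
# completely differentiated, yet H_S = 0.9.
pop1 = [0.1] * 10 + [0.0] * 10
pop2 = [0.0] * 10 + [0.1] * 10
print({name: round(v, 3) for name, v in differentiation([pop1, pop2]).items()})
# {'G_ST': 0.053, "G''_ST": 1.0, 'D': 1.0}
```

Both populations are completely differentiated, yet with H_S = 0.9 the value of G_ST stays below the 1 - H_S = 0.1 ceiling, while G''_ST and D both reach 1.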

The statistical programming language R is becoming increasingly popular among biologists. Although there is a strong suite of tools for performing population genetic analyses in R, code to calculate these "new" measures of population divergence has not been available. My package, mmod, fills this gap. I won't give too many details of the package here, as they're covered in the paper and the package is well documented. Briefly, mmod has functions to calculate the three statistics described above (and Nei's G_ST), as well as pairwise versions of each statistic for every pair of populations in a dataset. It also allows users to perform bootstrap and jackknife re-sampling of datasets, the results of which are returned as user-accessible objects that can be examined with any R function (there is also a helper function to easily apply differentiation statistics to bootstrap samples and summarise the results). The library is on CRAN, so installation is as easy as typing install.packages("mmod"); the source code is up on github. If you want to use the package I'd suggest reading the vignette ("mmod-demo") before you dive in.

I’m keen to hear about bugs or feature requests from users, just email them to david.winter@gmail.com

Reference:

Winter, D.J. (in press). MMOD: an R library for the calculation of population differentiation statistics. Molecular Ecology Resources: dx.doi.org/10.1111/j.1755-0998.2012.03174.x

* mmod actually uses nearly unbiased estimators for these parameters, to deal with the way small population samples can mis-represent the actual allele frequencies in populations.

** I don't want to write an entire history of F-statistics here, because it's a big and murky topic, but I did want to make the point that the formulation I gave for G_ST is often presented as "Wright's F_ST" in genetics courses. Wright was certainly aware that his statistic was related to the proportion of heterozygotes you expect to get in a population, but, when he introduced F-statistics in general, and F_ST in particular, he was really dealing with correlation among gametes at various levels of population structure. Unfortunately, there are now many, many definitions of F_ST floating around, and it's probably pointless to argue about a "right one". If you use my package I encourage you to be explicit about, and cite, the particular statistic that you are using. For each of the F_ST analogues that the package calculates, the in-line help contains the correct reference.
