Greater Human Genetic Diversity Than Previously Believed
Evan E. Eichler and coworkers of the University of Washington, Seattle, used data from the 1000 Genomes Project to analyze copy-number variations, which are differences in the number of times a particular gene sequence appears in the genome (Science 2010, 330, 641). About 1,000 genes "have been largely inaccessible to traditional genetic study as a result of their repetitive nature," Eichler said at the press briefing. Using newly developed sequence analysis algorithms and sequence tags, his team investigated copy-number variations in these genes, he said.
Whether it covers 1,000 genomes or 2,500, the study is still quite preliminary when it comes to understanding the astounding magnitude of variation within the broad human genome and epigenome. As we begin to comprehend the vast number of differences in gene expression between even the closest of relatives, we may get a glimmer of understanding of how our molecular makeup generates the diverse worlds we inhabit.
Eichler's team found that copy-number variations occur in fewer than 10% of human genes. Many of these genes map to regions that had been previously identified as highly repetitive and have been implicated in diseases such as schizophrenia and autism, the authors note.
Even at the pilot stage, the 1000 Genomes Project has already provided "a more complete catalog" of human genetic variation than was available previously, Durbin said. The project is already moving forward with its main phase, with the goal of sequencing 2,500 genomes. _ACSPubs
Abstract of paper:
Copy number variants affect both disease and normal phenotypic variation, but those lying within heavily duplicated, highly identical sequence have been difficult to assay. By analyzing short-read mapping depth for 159 human genomes, we demonstrated accurate estimation of absolute copy number for duplications as small as 1.9 kilobase pairs, ranging from 0 to 48 copies. We identified 4.1 million "singly unique nucleotide" positions informative in distinguishing specific copies and used them to genotype the copy and content of specific paralogs within highly duplicated gene families. These data identify human-specific expansions in genes associated with brain development, reveal extensive population genetic diversity, and detect signatures consistent with gene conversion in the human species. Our approach makes ~1000 genes accessible to genetic studies of disease association.
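The core idea in the abstract, estimating absolute copy number from short-read mapping depth, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name, the toy depth values, and the use of a simple median against a diploid control region are all assumptions made for illustration. It only shows the underlying arithmetic, in which the depth over a region is divided by the depth expected per single copy.

```python
from statistics import median


def estimate_copy_number(region_depths, control_depths, control_copies=2):
    """Estimate the absolute copy number of a region from read depth.

    Hypothetical helper for illustration only.
    region_depths  -- per-base read depths over the region of interest
    control_depths -- per-base read depths over a region assumed to be
                      present in `control_copies` copies (diploid = 2)
    """
    if not region_depths or not control_depths:
        raise ValueError("depth lists must be non-empty")

    # Depth contributed by a single copy of sequence.
    per_copy_depth = median(control_depths) / control_copies
    if per_copy_depth <= 0:
        raise ValueError("control depth must be positive")

    # Copies of the region = observed depth / depth per copy.
    return median(region_depths) / per_copy_depth


# Toy depth values (made up, not from the paper).
control = [30, 31, 29, 30, 32, 28]   # diploid control region
duplicated = [90, 88, 92, 91, 89]    # region with elevated depth
deleted = [0, 0, 0, 0]               # region with no mapped reads

cn_dup = estimate_copy_number(duplicated, control)
cn_del = estimate_copy_number(deleted, control)
print(round(cn_dup), round(cn_del))  # prints: 6 0
```

A real analysis would also have to correct for effects such as GC bias and reads that map to multiple paralogs, which this sketch ignores.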