Volume 3 No.5 September-November 2000

Science

Human Genome: The Book of Life

The Human Genome Project formally began in 1990. The project was originally planned to last 15 years, but rapid technological advances have accelerated the expected completion date to 2003. The target date of 2003 will also mark the 50th anniversary of Watson and Crick’s description of DNA’s fundamental structure. Project goals are to identify all of the approximately 100,000 genes in human DNA and to determine the sequence of the 3 billion DNA building blocks that underlie all of human biology and its diversity.

A genome is the entire DNA in an organism, including its genes. Each DNA molecule contains many genes, the basic physical and functional units of heredity. A gene is a specific sequence of nucleotide bases that carries the information required for constructing proteins, which provide the structural components of cells and tissues as well as enzymes for essential biochemical reactions. The human genome is estimated to comprise approximately 100,000 genes. Human genes vary widely in length, often extending over thousands of bases, but only about 10% of the genome is known to include the protein-coding sequences (exons) of genes. Interspersed within many genes are intron sequences, which have no coding function. The balance of the genome is thought to consist of other non-coding regions (such as control sequences and intergenic regions), whose functions are obscure.

All living organisms are composed largely of proteins; humans can synthesize about 80,000 different kinds. Genes carry information for making all the proteins required by all organisms. These proteins determine, among other things, how the organism looks, how well its body metabolizes food or fights infection, and sometimes even how it behaves. DNA is made up of four similar chemicals (called bases and abbreviated A, T, C, and G) that are repeated millions or billions of times throughout a genome. The human genome, for example, has 3 billion pairs of bases. The particular order of As, Ts, Cs, and Gs is extremely important. The order underlies all of life’s diversity, even dictating whether an organism is human or another species such as yeast, rice, or fruit fly, all of which have their own genomes and are themselves the focus of genome projects.

24 Distinct Separate Microscopic Genome Units

The 3 billion base pairs (bp) in the human genome are organized into 24 distinct, physically separate microscopic units called chromosomes. These 3 billion bases are arranged along the chromosomes in a particular order for each unique individual. All genes are arranged linearly along the chromosomes. The nucleus of most human cells contains two sets of chromosomes, one set given by each parent. Each set has 23 single chromosomes: 22 autosomes and an 'X' or 'Y' sex chromosome. (A normal female will have a pair of X chromosomes; a male will have an X and Y pair.) Chromosomes contain roughly equal parts of protein and DNA; chromosomal DNA contains an average of 150 million bases.

Figure: Human Chromosomes. Painting (staining) allows visual distinction between chromosomes and can be used to identify translocations.

Chromosome lengths in megabases (Mb):

Chromosome   Length (Mb)
1            263
2            255
3            214
4            203
5            194
6            183
7            171
8            155
9            145
10           144
11           144
12           143
13           114
14           109
15           106
16            98
17            92
18            85
19            67
20            72
21            50
22            56
X            164
Y             59
Total       3286
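The total quoted for the chromosome lengths can be checked with a few lines of Python. This is only a sketch: the figures are copied from the list above, and the names CHROMOSOME_LENGTHS_MB, total_length_mb and mean_length_mb are made up for this illustration.

```python
# Chromosome lengths in megabases (Mb), copied from the article's list.
# The dictionary and function names are illustrative, not from any library.
CHROMOSOME_LENGTHS_MB = {
    "1": 263, "2": 255, "3": 214, "4": 203, "5": 194, "6": 183,
    "7": 171, "8": 155, "9": 145, "10": 144, "11": 144, "12": 143,
    "13": 114, "14": 109, "15": 106, "16": 98, "17": 92, "18": 85,
    "19": 67, "20": 72, "21": 50, "22": 56, "X": 164, "Y": 59,
}


def total_length_mb(lengths):
    """Sum of all chromosome lengths, in Mb."""
    return sum(lengths.values())


def mean_length_mb(lengths):
    """Average chromosome length, in Mb."""
    return total_length_mb(lengths) / len(lengths)


if __name__ == "__main__":
    print("chromosomes:", len(CHROMOSOME_LENGTHS_MB))
    print("total (Mb):", total_length_mb(CHROMOSOME_LENGTHS_MB))
    print("mean (Mb): %.1f" % mean_length_mb(CHROMOSOME_LENGTHS_MB))
```

The sum counts both X and Y once, which is how the list reaches a total slightly above the round figure of 3 billion bases.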

DNA molecules are among the largest molecules now known. Chromosomes can be seen under a light microscope and, when stained with certain dyes, reveal a pattern of light and dark bands reflecting regional variations in the amounts of A and T versus G and C. Differences in size and banding pattern allow the 24 chromosomes to be distinguished from each other, an analysis called a karyotype (Figure). A few types of major chromosomal abnormalities, including missing or extra copies or gross breaks and rejoining (translocations), can be detected by microscopic examination. Most changes in DNA, however, are too subtle to be detected by this technique and require molecular analysis. These subtle DNA abnormalities (mutations) are responsible for many inherited diseases such as cystic fibrosis and sickle cell anemia or may predispose an individual to cancer, major psychiatric illnesses, and other complex diseases.

Each time a cell divides into two daughter cells, its full genome is duplicated; for humans and other complex organisms, this duplication occurs in the nucleus. During cell division the DNA molecule unwinds and the weak bonds between the base pairs break, allowing the strands to separate. Each strand directs the synthesis of a complementary new strand, with free nucleotides matching up with their complementary bases on each of the separated strands. Strict base-pairing rules are adhered to: adenine will pair only with thymine (an A-T pair) and cytosine only with guanine (a C-G pair). Each daughter cell receives one old and one new DNA strand. The cells’ adherence to these base-pairing rules ensures that the new strand is an exact copy of the old one. This minimizes the incidence of errors (mutations) that may greatly affect the resulting organism or its offspring.
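The base-pairing rules described above can be sketched in a few lines of Python. The function names complement and replicate are invented for this illustration, and the sketch pairs bases position by position, ignoring the opposite chemical direction of the two strands.

```python
# Base-pairing rules: A pairs only with T, C pairs only with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the strand that pairs, base by base, with the given strand."""
    return "".join(PAIRING[base] for base in strand.upper())


def replicate(double_helix):
    """Model one round of replication.

    double_helix is a pair (strand_a, strand_b) of complementary strands.
    Each old strand directs the synthesis of a new partner, so each
    daughter molecule holds one old strand and one new strand.
    """
    old_a, old_b = double_helix
    daughter_1 = (old_a, complement(old_a))   # old_a plus a newly made partner
    daughter_2 = (complement(old_b), old_b)   # old_b plus a newly made partner
    return daughter_1, daughter_2


if __name__ == "__main__":
    parent = ("ATGC", complement("ATGC"))
    print(replicate(parent))
```

Because the pairing table is its own inverse, taking the complement twice returns the original strand, which is why both daughter molecules carry the same sequence as the parent.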

The protein-coding instructions from the genes are transmitted indirectly through messenger ribonucleic acid (mRNA). For the information within a gene to be expressed, a complementary RNA strand is produced (a process called transcription) from the DNA template in the nucleus. This mRNA is moved from the nucleus to the cellular cytoplasm, where it serves as the template for protein synthesis. The cells’ protein-synthesizing machinery then translates the codons (successive three-base units of the mRNA) into a string of amino acids that will constitute the protein molecule for which the gene codes.
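The two steps just described, transcription and translation, can be illustrated with a toy example. This is a minimal sketch under stated assumptions: the template sequence is invented, the function names are hypothetical, and the codon table holds only the four codons the example needs (the real genetic code has 64).

```python
# DNA template base -> complementary RNA base (RNA uses U in place of T).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

# Deliberately partial codon table: just enough for the example below.
CODON_TABLE = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "UAA": "Stop"}


def transcribe(template_dna):
    """Build the mRNA complementary to a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template_dna.upper())


def translate(mrna):
    """Read the mRNA three bases (one codon) at a time until a stop codon."""
    amino_acids = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]  # KeyError if codon not in the toy table
        if residue == "Stop":
            break
        amino_acids.append(residue)
    return amino_acids


if __name__ == "__main__":
    template = "TACAAACCGATT"      # made-up template strand
    mrna = transcribe(template)
    print(mrna)
    print(translate(mrna))
```

A fuller version would carry the complete codon table; the partial table keeps the example short and makes any unsupported codon fail loudly rather than silently.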

The Biology (21st) Century

Observers have predicted that the 21st century will be the "biology century."

In the laboratory, the mRNA molecule can be isolated and used as a template to synthesize a complementary DNA (cDNA) strand, which can then be used to locate the corresponding genes on a chromosome map. The analytical power arising from the reference DNA sequences of several entire genomes and other genomic resources is anticipated to help jump-start the new millennium.

The HGP’s continued emphasis is on obtaining a complete and highly accurate reference sequence (1 error in 10,000 bases) that is largely continuous across each human chromosome. Scientists believe that knowing this sequence is critically important for understanding human biology and for applications to other fields. On June 26, 2000, President Clinton, leaders of the Human Genome Project (HGP) and representatives of the biotechnology company Celera announced completion of a "working draft" reference DNA sequence of the human genome. The achievement has provided scientists worldwide with a road map to an estimated 90% of genes on every chromosome. Although the draft contains gaps and errors, it provides a valuable scaffold for generating a high-quality reference genome sequence, the ultimate HGP goal: the finished ‘Book of Life’, expected to be achieved by 2003.

A new goal focuses on identifying individual variations in the human genome. Although more than 99% of human DNA sequences are the same across the population, variations in DNA sequence can have a major impact on how humans respond to disease; environmental insults such as bacteria, viruses, toxins, and chemicals; and drugs and other therapies. Methods are being developed to detect different types of variation, particularly the most common type, called single-nucleotide polymorphisms (SNPs). Scientists believe SNP maps will help them identify the multiple genes associated with such complex diseases as cancer, diabetes, vascular disease, and some forms of mental illness. These associations are difficult to establish with conventional gene-hunting methods because a single altered gene may make only a small contribution to disease risk.
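At its simplest, detecting a single-nucleotide difference means comparing two aligned sequences position by position. The sketch below uses the pair of sequences from this article's own SNP example (AAGGCTAA and ATGGCTAA); the function name is hypothetical, and real SNP discovery compares many individuals and must first align the sequences, which is assumed away here.

```python
def find_single_base_differences(reference, sample):
    """List (position, reference_base, sample_base) wherever two
    equal-length, already aligned sequences differ.

    Positions are counted from 0.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


if __name__ == "__main__":
    print(find_single_base_differences("AAGGCTAA", "ATGGCTAA"))
```

For this pair the only difference is at the second base, where A is replaced by T.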

Efficient interpretation of the functions of human genes and other DNA sequences requires that resources and strategies be developed to enable large-scale investigations across whole genomes. A technically challenging first priority is to generate complete sets of full-length cDNA clones and sequences for human and model-organism genes. Other functional-genomics goals include studies into gene expression and control, creation of mutations that cause loss or alteration of function in non-human organisms, and development of experimental and computational methods for protein analyses. Ethical, Legal, and Social Implications (ELSI) in the science of genetics and its applications present new and complex issues for individuals and society. A continuing challenge is to safeguard the privacy of individuals and groups who contribute DNA samples for large-scale sequence-variation studies. Other concerns are to anticipate how the resulting data may affect concepts of race and ethnicity; identify potential uses (or misuses) of genetic data in workplaces, schools, and courts; identify commercial uses; and foresee impacts of genetic advances on the concepts of humanity and personal responsibility. At least 18 countries have established human genome research programs. Some of the larger programs are in Australia, Brazil, Canada, China, Denmark, European Union, France, Germany, Israel, Italy, Japan, Korea, Mexico, Netherlands, Russia, Sweden, United Kingdom, and the United States. Some developing countries are participating through studies of molecular biology techniques for genome research and studies of organisms that are particularly interesting to their geographical regions. The Human Genome Organization (HUGO) helps to coordinate international collaboration in the genome project. This research will reap fantastic benefits for humankind, some that we can anticipate and others that will surprise us. 
Generations of biologists and researchers will be provided with detailed DNA information that will be key to understanding the structure, organization, and function of DNA in chromosomes. Genome maps of other organisms will provide the basis for comparative studies that are often critical to understanding more complex biological systems.

To get an idea of the size of the human genome present in each of our cells, consider the following analogy: if the DNA sequence of the human genome were compiled in books, the equivalent of 200 large volumes (at 1000 pages each) would be needed to hold it all. It would take about 9.5 years to read out loud (without stopping) the 3 billion bases in a person’s genome sequence. This is calculated on a reading rate of 10 bases per second, equaling 315,360,000 bases per year.
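The reading-time figure follows from simple arithmetic, reproduced here as a short Python sketch. The constant names are chosen for this illustration, and a year is taken as 365 days, which is the assumption implied by the quoted 315,360,000 bases per year.

```python
GENOME_BASES = 3_000_000_000          # bases in one person's genome sequence
READING_RATE = 10                     # bases read aloud per second
SECONDS_PER_YEAR = 365 * 24 * 60 * 60 # 365-day year, no stopping

bases_per_year = READING_RATE * SECONDS_PER_YEAR
years_to_read = GENOME_BASES / bases_per_year

if __name__ == "__main__":
    print("bases per year:", bases_per_year)
    print("years to read: %.1f" % years_to_read)
```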

Bioinformatics

Storing all this information is a great challenge to computer experts known as bioinformatics specialists. One million bases (called a megabase and abbreviated Mb) of DNA sequence data is roughly equivalent to 1 megabyte of computer data storage space. Since the human genome is 3 billion base pairs long, 3 gigabytes of computer data storage space are needed to store the entire genome. This includes nucleotide sequence data only and does not include data annotations and other information that can be associated with sequence data. Planners suggest developing a human genome database, analogous to model organism databases, that will link to phenotypic information. Also needed are databases and analytical tools for studying the expanding body of gene-expression and functional data, for modeling complex biological networks and interactions, and for collecting and analyzing sequence-variation data. One of the best places to get maps is the Genome Database (GDB), which is the worldwide repository of human genome mapping data. A feature allows users to list genes by chromosome and to print maps.
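The storage estimate above amounts to one byte per base. A small Python sketch makes the arithmetic explicit; the function name storage_bytes is invented here, and the 2-bits-per-base case is an added illustration (four possible bases need only two bits each), not a figure from the article.

```python
def storage_bytes(n_bases, bits_per_base=8):
    """Bytes needed to hold n_bases of raw sequence, rounded up.

    bits_per_base=8 stores one text character (A, T, C or G) per base,
    which matches the article's 1 Mb = 1 megabyte rule of thumb.
    bits_per_base=2 is a denser packing added here for comparison.
    """
    total_bits = n_bases * bits_per_base
    return (total_bits + 7) // 8


if __name__ == "__main__":
    genome = 3_000_000_000
    print("one megabase:", storage_bytes(1_000_000), "bytes")
    print("genome, 1 byte per base:", storage_bytes(genome), "bytes")
    print("genome, 2 bits per base:", storage_bytes(genome, 2), "bytes")
```

Either way the figure covers the raw sequence only; annotations and other associated data would add to it.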

Jumping Genes

Nearly half of the human genome is composed of transposable elements or jumping DNA. First recognized in the 1940s by Dr. Barbara McClintock in studies of peculiar inheritance patterns found in the colors of Indian corn, jumping DNA refers to the idea that some stretches of DNA are unstable and "transposable," i.e., they can move around on and between chromosomes. This theory was confirmed in the 1980s when scientists observed jumping DNA in other genomes. Now scientists believe transposons may be linked to some genetic disorders such as hemophilia, leukemia, and breast cancer. They also believe that transposons may have played critical roles in human evolution.

A gene can produce more or less protein in different cells at various times in response to developmental or environmental cues, and many proteins can express disparate functions in various biological contexts. Thus, subtle distinctions are multiplied by more than 100,000 estimated genes. The often-quoted statement that we share over 98% of our genes with apes (chimpanzees, gorillas, and orangutans) actually should be put another way: related genes in humans and apes are, in general, 95% to 98% similar. Just as in the mouse, quite a few genes probably are not common to humans and apes, and these may influence uniquely human or ape traits. Similarities between mouse and human genes range from about 70% to 90%, with an average of 85% similarity but a lot of variation from gene to gene (e.g., some mouse and human gene products are almost identical, while others are nearly unrecognizable as close relatives). Some nucleotide changes are "neutral" and do not yield a significantly altered protein. Others, but probably only a relatively small percentage, would introduce changes that could substantially alter what the protein does. 
Put these alterations in the context of known inherited human diseases: a single nucleotide change can lead to inheritance of sickle cell disease, cystic fibrosis, or breast cancer. A single nucleotide difference can alter protein function in such a way that it causes a terrible tissue malfunction. Single nucleotide changes have been linked to hereditary differences in height, brain development, facial structure, pigmentation, and many other striking morphological differences; due to single nucleotide changes, hands can develop structures that look like toes instead of fingers, and a mouse’s tail can disappear completely. Single-nucleotide changes in the same genes but in different positions in the coding sequence might do nothing harmful at all. Evolutionary changes are of the same kind as the sequence differences linked to person-to-person variation: many of the average 15% nucleotide changes that distinguish human and mouse genes are neutral; some lead to subtle changes, whereas others are associated with dramatic differences. Add them all together, and they can make quite an impact, as evidenced by the huge range of metabolic, morphological, and behavioral differences we see among organisms.

Knockout Model

Knockout mice are transgenic mice whose genetic code has been altered by the insertion of foreign genetic material into their DNA. Using this technology, researchers target specific genes, causing them to be expressed or inactivated. These mice are then bred, creating a population of offspring with the trait. When researchers isolate human genes with unknown functions, they can create knockout mice with these genes and observe the results. Instead of creating merely the mouse equivalent of the human gene, researchers are able to reproduce and express actual human genes and their corresponding proteins in mice. Subsequent offspring will inherit not only the instructions coded by their original mouse genome, but also the traits coded for by the inserted human DNA. This helps researchers understand health and disease by observing how genes work in cells. Knockout mice have many benefits. They not only allow researchers to determine gene function and understand diseases at the molecular level, but they also aid scientists in testing new drugs and devising novel therapies.

Why are mice used in this research? Mice are genetically very similar to humans. They also reproduce rapidly, have short life spans, are inexpensive and easy to handle, and can be genetically manipulated at the molecular level.

What are the comparative genome sizes of humans and other organisms being studied? Estimated sizes are the following: Human, 3000 million bases (~100,000 genes); Mouse, 3000 million bases (50,000 to 100,000 genes); Drosophila (fruit fly), 165 million bases (15,000 to 25,000 genes); Nematode (roundworm), 100 million bases (11,800 to 13,800 genes); Yeast (fungus), 14 million bases (8355 to 8947 genes); E. coli (bacteria), 4.67 million bases (3237 genes); H. influenzae (bacteria), 1.8 million bases; M. genitalium (bacteria), 0.58 million bases.
In addition, a gene can produce more than one protein product through alternative splicing or post-translational modification; these events do not always occur in an identical way in the two species. Gene duplication occurs frequently in complex genomes; sometimes the duplicated copies degenerate to the point where they no longer are capable of encoding a protein. However, many duplicated genes remain active and over time may change enough to perform a new function. Since gene duplication is an ongoing process, mice may have active duplicates that humans do not possess, and vice versa. These appear to make up a small percentage of the total genes. We won’t know for certain until both genomes are completely sequenced, but we believe the number of human genes without a clear mouse counterpart, and vice versa, won’t be significantly larger than 1% of the total. Nevertheless, these novel genes may play an important role in determining species-specific traits and functions. However, the most significant differences between mice and humans are not in the number of genes each carries but in the structure of genes and the activities of their protein products. Gene for gene, we are very similar to mice. What really matters is that subtle changes accumulated in each of the approximately 100,000 genes add together to make quite different organisms. Further, genes and proteins interact in complex ways that multiply the functions of each.

Functional-Genomics

What is functional genomics? Understanding the function of genes and other parts of the genome is known as functional genomics. The Human Genome Project is just the first step in understanding humans at the molecular level. Mice and humans (indeed, most or all mammals including dogs, cats, rabbits, monkeys, and apes) have roughly the same number of nucleotides in their genomes, about 3 billion base pairs. This comparable DNA content implies that all mammals contain more or less the same number of genes, and indeed the work of many researchers has provided evidence to confirm that notion. We know of only a few cases in which no mouse counterpart can be found for a particular human gene, and for the most part we see essentially a one-to-one correspondence between genes in the two species. The exceptions generally appear to be of a particular type: genes that arise when an existing sequence is duplicated. The resulting DNA sequence maps will be used by 21st century scientists to explore human biology and other complex phenomena.

Scientists believe variations in DNA sequence may underlie disease susceptibility and drug responsiveness, particularly the most common variations, which are called SNPs (single nucleotide polymorphisms). The DNA resources used for these studies came from anonymous donors of European, African, American (north, central, south), and Asian ancestry. Single nucleotide polymorphisms (SNPs) are DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is altered. For example, a SNP might change the DNA sequence AAGGCTAA to ATGGCTAA. SNPs can occur in both coding (gene) and noncoding regions of the genome. Many SNPs have no effect on cell function, but scientists believe others could predispose people to disease or influence their response to a drug. Although more than 99% of human DNA sequences are the same across the population, variations in DNA sequence can have a major impact on how humans respond to disease; environmental insults such as bacteria, viruses, toxins, and chemicals; and drugs and other therapies. This makes SNPs of great value for biomedical research and for developing pharmaceutical products or medical diagnostics. SNPs are also evolutionarily stable, not changing much from generation to generation, which makes them easier to follow in population studies. Scientists believe SNP maps will help them identify the multiple genes associated with such complex diseases as cancer, diabetes, vascular disease, and some forms of mental illness. These associations are difficult to establish with conventional gene-hunting methods because a single altered gene may make only a small contribution to the disease.

The quest for an understanding of how genetic factors contribute to human disease is gathering speed with the human genome programs. We now know that there are 46 human chromosomes, which between them house 3000 million base pairs of DNA and encode about 80,000 proteins. 
These coding regions make up only about 2% of the genome (the function of the remaining 98% is unknown) and some chromosomes have a higher density of genes than others. A great deal of effort over the past ten years has been put into creating a physical map of the human genome and into the complete sequencing of the human genome. The physical map has assisted directly in identifying about 100 disease-causing genes. One of the most difficult challenges ahead is to find genes involved in diseases that have a complex pattern of inheritance, such as those that contribute to diabetes, asthma, cancer and mental illness. In all these cases, no one gene has the yes/no power to say whether a person has a disease or not. It is likely that more than one mutation is required before the disease is manifest. A number of genes may each make a subtle contribution to a person’s susceptibility to a disease; genes may also affect how a person reacts to environmental factors. Unraveling these networks of events will undoubtedly be a challenge for some time to come.

Individuals with Turner’s syndrome [karyotype (45, X)] have a single X chromosome (and no Y) as their sex chromosome. They develop as girls, but are sterile, small in stature, and have a pattern of major and minor malformations. Individuals with Klinefelter’s syndrome [karyotype (47, XXY)] have XXY sex chromosomes. They develop as males, with subtle phenotypic anomalies. Down syndrome, caused by the presence of three copies of chromosome 21, is the most commonly observed imbalance in the number of non-sex chromosomes. Medical problems associated with Down syndrome include mental retardation, congenital organ defects, respiratory infections, and leukemia.

The immune system’s mission is simple: to seek and kill invaders. If a person is born with a severely defective immune system, death from infection by a virus, bacterium, fungus or parasite will occur. 
In severe combined immunodeficiency, lack of an enzyme means that toxic waste builds up inside immune system cells, killing them and thus devastating the immune system. A lack of immune system cells is also the basis for DiGeorge syndrome: improper development of the thymus gland means that T cell production is diminished. Most other immune disorders result from either an excessive immune response or an ‘autoimmune attack’. A key part of the immune system’s role is to differentiate between invaders and the body’s own cells; when it fails to make this distinction, a reaction against ‘self’ cells and molecules causes autoimmune disease.

Disease will occur if a critical enzyme is disabled, or if a control mechanism for a metabolic pathway is affected. Many of these are inborn errors of metabolism: inherited traits that are due to a mutation in a metabolic enzyme or regulatory protein for which a gene has been identified, cloned and mapped. There are a number of diseases (Duchenne muscular dystrophy, Huntington disease) that are caused by defects in genes important for the formation and function of muscles and connective tissues. While the gene for Ellis-van Creveld syndrome has been mapped, we must await knowledge of the function of its protein to understand the molecular basis of this disease.

Several diseases that directly affect the nervous system have a genetic component: some are due to a mutation in a single gene, others are proving to have a more complex mode of inheritance. Alzheimer and Parkinson diseases contain at least one common component; Huntington disease, fragile X syndrome and spinocerebellar atrophy are all ‘dynamic mutation’ diseases in which there is an expansion of a DNA repeat sequence. Intracellular signaling defects account for several diseases, including cancers, ataxia telangiectasia and Cockayne syndrome. The end result of many cell signals is to alter the expression of genes (transcription) by acting on DNA-binding proteins. 
Some diseases are the result of a lack of, or a mutation in, these proteins, which stops them from binding DNA in the normal way. A number of other diseases, including type IX Ehlers-Danlos syndrome, may be the result of allelic mutations (i.e. mutations in the same gene, but having slightly different symptoms), and it is hoped that research with model organisms (mice) will help to furnish insight into human copper transport mechanisms, so helping to develop effective treatments for Menkes’ sufferers. Wilson’s disease is a rare autosomal recessive disorder of copper transport, resulting in copper accumulation and toxicity to the liver and brain. The gene for Wilson’s disease (ATP7B) was mapped to chromosome 13.

Cystic fibrosis is caused by mutations in its conductance regulator gene, located on chromosome 7. Several hundred mutations have been found in this gene, all of which result in defective transport of sodium and chloride by epithelial cells. Mutations in the Cx26 gene, located on chromosome 13, cause congenital syndromic and nonsyndromic deafness; in the nonsyndromic form, the deafness is not accompanied by other symptoms, such as blindness. Cx26 codes for a gap junction protein called connexin 26. It has been proposed that mutations in Cx26 may disrupt potassium circulation of the inner ear and result in deafness. In the heart, defects in potassium channels do not allow proper transmission of electrical impulses, resulting in the arrhythmia seen in long QT syndrome. In the lungs, failure of a sodium and chloride transporter found in epithelial cells leads to the congestion of cystic fibrosis, while one of the most common inherited forms of deafness, Pendred syndrome, looks to be associated with a defect in a sulphate transporter.

A genome view of the X chromosome shows the gene responsible for Menkes’ syndrome, a disorder in the ability of the cell to absorb copper. Sufferers cannot transport copper, which is needed by enzymes involved in making bone, nerve and other structures.

A genome view of chromosome 11 shows the hemoglobin beta gene, in which a point mutation causes the autosomal recessive disease sickle cell anemia. A mutant beta globin that makes red blood cells sickle causes sickle cell anemia. Absence of the beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin cause beta-plus-thalassemia. Patients with sickle cell anemia or thalassemia live with severe anemia, as they are unable to maintain a normal hemoglobin level. Though, as yet, there is no cure for these, a combination of fluids, painkillers, antibiotics and transfusions is used to treat symptoms and complications.

The potential for using genes themselves to treat disease, known as gene therapy, is the most exciting application of DNA science. It has captured the imaginations of the biomedical community for good reason. This rapidly developing field holds great potential for treating or even curing genetic and acquired diseases, using normal genes to replace or supplement a defective gene. As bone marrow synthesizes hemoglobin, transplantation of marrow from a healthy donor has been performed for thalassemia, offering a high probability of complication-free survival.

Pharmacogenomics holds the promise that drugs might one day be tailor-made for individuals and adapted to each person’s own genetic makeup. Environment, diet, age, lifestyle, and state of health all can influence a person’s response to medicines, but understanding an individual’s genetic makeup is thought to be the key to creating personalized drugs with greater efficacy and safety.

Cloning

For Human Genome Project researchers, cloning refers to copying genes and other pieces of chromosomes to generate enough identical material for further study. Two other types of cloning produce complete, genetically identical animals. Blastomere separation, which creates identical twins (clones), involves splitting a developing embryo soon after fertilization of the egg by a sperm to give rise to two or more embryos. These clones contain DNA from both the mother and the father. Dolly, on the other hand, is the result of another type of cloning that produces an animal carrying the DNA of only one parent. Using somatic cell nuclear transfer, scientists transferred DNA from an adult sheep’s udder cell to a DNA-free (enucleated) egg. The embryo thus produced and developed is a clone of the adult sheep. Dolly’s creators demonstrated that nuclei of an adult animal’s specialized (udder) cells can be made to revert to a nonspecialized, embryonic state, thus restoring the ability to give rise to any kind of cell.

The term "cloning" is traditionally used by scientists to describe different processes for duplicating biological materials. One goal of this and similar research is to develop efficient ways to alter animals genetically and reproduce them reliably. Alterations have included adding genes (such as those for human proteins) to create drug-producing animals as well as inactivating genes to study the effects and possibly create animal models of human diseases. Cloning technology also may someday be used in humans to produce whole organs from single cells or to raise animals having genetically altered organs suitable for transplanting to humans. The technology is there, and one is tempted to clone a genius like Einstein, whose brain cells are preserved and available. But will cloning of humans be morally acceptable?

Acknowledgements: The data for this article were obtained from the public domain reports of the Human Genome Project of the U.S. Department of Energy and the National Institutes of Health, USA.

Sabyasachi Sarkar
Department of Chemistry

Sounil Biswas 98371

