Advances in technology made it possible to sequence the entire genomes of organisms. For scientists, the next step was to map the entire human genome (every gene contained in human DNA). This undertaking began in 1990 and took approximately 13 years to complete (it was declared finished in 2003), with the collaboration of many labs in multiple countries. One finding was that the human genome contains relatively few genes, considering the large number of proteins that are made in the human body. Scientists also learned that only a small portion of DNA, about 1.5% of the genome, codes for proteins. The majority of the human genome consists of noncoding regions of DNA, including repeating sequences called repetitive DNA, which are present in multiple copies throughout the genome. Repetitive DNA was once thought to be "junk DNA," but there is evidence that some of it is highly conserved. In evolutionary terms, conserved sequences are identical or similar sequences in DNA, RNA, or proteins within a genome or across species. This conservation indicates that natural selection has maintained the sequence, likely because it has an important function. A highly conserved sequence is one that has not changed significantly from one branch of the phylogenetic tree to the next and, hence, has not changed much over geological time.
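One simple way to put a number on "identical or similar" is percent identity: the share of aligned positions at which two sequences carry the same base. The sketch below is only an illustration of that idea. It assumes the sequences have already been aligned to the same length, and the function name `percent_identity` and the three short fragments are invented for this example rather than taken from any real genome.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Return the percentage of aligned positions at which two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare the sequences position by position, ignoring letter case.
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


# Hypothetical 20-base fragments, invented for illustration (not real genomic data).
species_1 = "ATGGCCATTGTAATGGGCCG"
species_2 = "ATGGCCATTGTTATGGGCCG"  # differs from species_1 at one position
species_3 = "ATGACCGTTCTAAAGGGACG"  # differs from species_1 at five positions

print(percent_identity(species_1, species_2))  # 95.0
print(percent_identity(species_1, species_3))  # 75.0
```

In this toy comparison, the first pair would be described as more highly conserved than the second. Real comparisons rely on alignment tools that also handle insertions and deletions, which this sketch does not attempt.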
To track the newly available genome-sequence information, several databases have been established around the world. The United States established the National Center for Biotechnology Information (NCBI) as part of the National Library of Medicine at the National Institutes of Health. The NCBI GenBank database collects information on genome sequences, individual gene sequences, and protein sequences. Having all this information at hand has assisted the study of an organism's entire set (sequence) of DNA and of how its genes interact, an area of research called genomics.
The Human Genome Project (an international research effort to map all human genes) paved the way toward matching certain genes with certain human traits. There is much public interest in personal genome sequencing for the purpose of identifying an individual's risk of developing gene-related illnesses. Services are already available to scan for the presence of specific genes known to be associated with an increased risk of certain diseases. However, these tests and their predictions are still under development, and they are not entirely reliable. One reason is that a genome sequence alone is not particularly useful. It is like a dictionary that lists proteins as its entries but gives no definitions explaining how those proteins work in living things. The next step after genome sequencing is gene annotation, which is the process of determining the function of a gene and its associated protein. The GenBank database also keeps track of new information regarding gene and protein functions. As the knowledge of gene functions improves, so will the ability to predict a person's likelihood of developing genetic illnesses. This information can be used to diagnose illnesses as well as to tailor medical treatments to a person's genetic makeup.
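The dictionary analogy can be made concrete with a small sketch. Here a list of gene symbols stands in for the raw sequence data, and a separate lookup table stands in for the annotations that give each entry its "definition." The data structures, the function name `describe`, and the placeholder `GENE_X` are assumptions made for this illustration and do not reflect how GenBank stores its records. The three annotated genes are well-known examples, described only in brief.

```python
# Sequence-only "dictionary": entries with no definitions attached yet.
# GENE_X is a hypothetical placeholder for a gene whose function is unknown.
sequenced_genes = ["HBB", "CFTR", "BRCA1", "GENE_X"]

# Annotations link a gene to the function of its protein (simplified summaries).
annotations = {
    "HBB": "beta-globin subunit of hemoglobin; variants can cause sickle cell disease",
    "CFTR": "chloride channel protein; variants can cause cystic fibrosis",
    "BRCA1": "DNA repair protein; some variants raise breast and ovarian cancer risk",
}


def describe(gene: str) -> str:
    """Return the annotation for a gene, or a note that none exists yet."""
    return annotations.get(gene, "function not yet annotated")


for gene in sequenced_genes:
    print(f"{gene}: {describe(gene)}")
```

An unannotated entry such as `GENE_X` shows why a sequence by itself predicts little: until its function is known, its presence says nothing about disease risk.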