Every living organism is built from instructions held in its DNA (deoxyribonucleic acid), which in humans and other eukaryotes is contained mainly within the nucleus of each cell. DNA consists of two strands of nucleotides carrying the bases adenine, thymine, cytosine and guanine. The bases on opposite strands pair up (adenine with thymine, cytosine with guanine) through hydrogen bonds, and the two strands wind into a double helix. The human genome is the ‘order’ in which these base pairs are arranged in humans. That order determines which amino acids, polypeptides and proteins are formed when the DNA is transcribed into mRNA and the mRNA is then translated.
What is the Human Genome Project?
The Human Genome Project was established in 1990, when public funding was agreed for the purpose of determining the order of the base pairs in the human genome. Its original target completion date was 2005, but advancing technologies allowed this to be brought forward to 2003. In June 2000, the first rough draft of the human genome was announced, jointly by the publicly funded Human Genome Project and by an independently run private company named ‘Celera Genomics’, led by Craig Venter. Both groups published their draft sequences in 2001. The sequencing itself relied on the chain-termination method devised by the genetic pioneer Frederick Sanger.
Beginning the Human Genome Project
Imagine that the human genome, which consists of over 3 billion nucleotide pairs, is the earth. In order to produce a map of its surface, it is essential to break it down into smaller, more manageable areas. To attempt to find a specific location on the earth, without any information on area, landmarks, etc. would be virtually impossible. Thus, the earth is split into continents, countries, and then progressively smaller sub-divisions, ending with a house number on a specific road in a specific part of the country.
For the human genome, the problem is very similar to this analogy, with the sequencing of the nucleotide pairs in human DNA being near impossible unless the DNA strand is broken down into smaller divisions. In terms of DNA, there are two types of ‘mapping’ which can be done: GENETIC and PHYSICAL. Genetic mapping of the human genome involves establishing approximately 3000 initial genetic markers, spaced evenly throughout the code. On the other hand, physical mapping actually involves ‘cutting’ the DNA strand into identifiable fragments which can then be individually worked on.
This is enabled with the use of restriction enzymes, which cut a strand of nucleotide pairs at specific recognition sequences, leaving an ‘overlap’ of genetic material between neighbouring segments so that they can be matched up and put back in order. All that is then left is for the sequence of each of these smaller segments to be established, at which point the DNA sequence for humans is complete. One important point, however, is the genetic material from which to work. There is only a limited amount, so a question arises: how do we ensure there is enough to work with?
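The cutting step performed by a restriction enzyme can be sketched in a few lines of Python. This is only an illustration: the enzyme chosen here (EcoRI, which recognises GAATTC and cuts between the G and the first A) is an example picked for the sketch rather than one named in the text, the function `digest` and the short test sequence are hypothetical, and only one strand is modelled.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut one strand of `sequence` at every occurrence of `site`.

    `cut_offset` is how many bases into the site the cut falls.
    The defaults model EcoRI (G^AATTC); both are illustrative assumptions.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])  # piece up to the cut point
        start = cut
        pos = sequence.find(site, pos + 1)     # look for the next site
    fragments.append(sequence[start:])         # whatever remains after the last cut
    return fragments


# Hypothetical 23-base sequence containing two EcoRI sites.
dna = "TTAGGAATTCCGATGAATTCAAT"
fragments = digest(dna)
print(fragments)  # ['TTAGG', 'AATTCCGATG', 'AATTCAAT']
```

Joining the fragments back together in order reproduces the original sequence, which is the property that allows the separately sequenced pieces to be reassembled.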
The answer to the shortage of material is a process called PCR, or the polymerase chain reaction, a type of ‘in vitro replication’. Essentially, the target length of DNA to be copied is selected using artificially synthesised ‘primers’, short stretches of DNA that bind at either end of the target. The reaction mixture contains the primers, a DNA polymerase enzyme, the four deoxynucleotide triphosphates and a buffer.
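To see why repeated copying solves the supply problem, the growth in the number of copies can be estimated with a short calculation. The sketch below assumes an idealised reaction in which every round of replication doubles the number of target molecules. The function name `pcr_copies` and the `efficiency` parameter are hypothetical, and real reactions fall short of perfect doubling and eventually level off.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Estimate the number of target copies after `cycles` rounds of PCR.

    efficiency=1.0 means perfect doubling every cycle (an idealisation);
    lower values model a fraction of templates being copied each cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


for n in (1, 10, 30):
    print(n, pcr_copies(1, n))
```

Under the idealised assumption, a single starting molecule gives 1024 copies after 10 cycles and just over a billion after 30.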
The polymerase is taken from the bacterium Thermus aquaticus, which survives at high temperatures, so the enzyme it produces, Taq polymerase, is not destroyed by the temperatures of around 95 °C used in the procedure. Heating to this temperature separates the DNA into its two single strands. The mixture is then cooled so that the primers can anneal (join) to the separated strands, and the polymerase replicates each one. This cycle can be repeated as many times as is necessary.
How is the Genome Coded?
Firstly, the DNA fragments within a sample must be separated and their sizes determined. This is made possible by a process called ‘gel electrophoresis’.
This involves placing a sample of the DNA with a buffer into a ‘well’ in a substance known as ‘agarose’, a type of gel with a matrix-like structure created from cross-links formed when it cools and solidifies. A current is then passed through the agarose, from one end to the other. Because DNA is negatively charged, the DNA fragments in the sample (i.e. macromolecules) migrate through the agarose towards the anode (the positive electrode), separating as they go. The distance that the individual fragments move depends mainly on their mass and charge.
The matrix means that macromolecules with a larger mass cannot move as far, while those with a lower mass can move further. Thus, the macromolecules are separated in the agarose, and produce a series of bands spread from one end of the gel to the other. Alongside the sample, a mixture of DNA fragments with known molecular weights is placed in another well. This allows comparison between the two mixtures, so that the sizes of the different macromolecules within the DNA sample can be established.
Sequencing the Nucleotides
Now that the fragments of the DNA strand have been separated and sized, one essential point remains to be determined: in which order do the nucleotides within them go? This can be achieved using a technique named the dideoxy method (alternatively the chain-termination method, or the Sanger method after the scientist who devised it).
What is the Dideoxy Method?
The dideoxy method gets its name from the critical role played by synthetic nucleotides that lack the OH group at the 3′ carbon atom. It builds on the gel electrophoresis already used to separate the DNA fragments.
It involves the use of an enzyme to synthesise DNA chains of varying lengths (similar to PCR), stopping DNA replication (hence the name ‘chain-termination method’) at one of the four bases and then determining the resulting fragment lengths. Each sequencing reaction tube (T, C, G, and A) in the diagram contains:
1. A DNA template, a primer sequence and a DNA polymerase to initiate synthesis of a new strand of DNA at the point where the primer is hybridised to the template.
2. The four deoxynucleotide triphosphates (dATP, dTTP, dCTP and dGTP). These will extend the DNA strand.
3. A radioactively labelled deoxynucleotide triphosphate.
4. One dideoxynucleotide triphosphate that terminates the growing chain wherever it is incorporated. Tube A has ddATP, tube C has ddCTP and so on.
In the diagram, in the A reaction tube the ratio of dATP to ddATP (dideoxyATP) is adjusted so that the tube ends up with a collection of DNA fragments of different lengths, each terminated by a ddATP at one of the positions where adenine is incorporated into the new strand (that is, opposite a thymine on the template).
The fragments of varying length are then separated by electrophoresis (as described earlier) and the positions of the bands are analysed to determine the sequence of nucleotides within the sample of DNA. The fragments are separated on the basis of size, with the shorter fragments moving faster and appearing at the bottom of the gel. Note that the sequence of nucleotides is therefore read from the bottom to the top of the polyacrylamide gel.
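Reading the gel from bottom to top amounts to sorting the bands by fragment length and noting which tube each band came from. The following is a minimal sketch of that read-out, assuming the fragment lengths in each lane have already been measured. The function `read_sanger_gel` and the example band positions are hypothetical and not taken from a real gel.

```python
def read_sanger_gel(lanes):
    """Reconstruct the newly synthesised strand from four lanes of bands.

    `lanes` maps each terminating base (the tube's dideoxynucleotide) to
    the lengths of the fragments seen in that lane. Shorter fragments run
    further, so sorting by length mimics reading the gel bottom to top.
    """
    bands = []
    for base, lengths in lanes.items():
        for length in lengths:
            bands.append((length, base))
    bands.sort()  # shortest fragment (bottom of the gel) first
    return "".join(base for _, base in bands)


# Hypothetical band lengths, counted in bases added after the primer.
lanes = {"A": [2, 5], "C": [3], "G": [1, 6], "T": [4]}
print(read_sanger_gel(lanes))  # GACTAG
```

The sequence obtained this way is that of the newly synthesised strand; the template strand is its complement.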