What are the advantages and disadvantages of using DNA sequence data for assessing relationships between the major groups of land plants? The relationships between land plants, particularly among the early-diverging Bryophytes and concerning the origin of the Angiosperms, have been the subject of intense debate throughout the history of plant science. Over the last twenty years, molecular analysis of the relationships between these major groups, in the form of DNA sequence data, has revolutionised the subject and affected all the key aspects of plant phylogeny. Robust and unequivocal relationships have been identified that confirm the broad shape of the phylogenetic tree, although its intricacies and temporal detail are still far from complete. Although DNA sequence data may seem to be the answer to these problems, some questions still require morphological evidence, a practice as old as plant science itself. This essay will outline the contributions made by DNA sequence data to plant phylogeny and its limitations.
The taxonomy of the land plants and the theories of their evolution that followed from it have been investigated for the past two centuries, and the contributions of Linnaeus in 1753 laid a sound foundation for later study of the subject. Morphological data has been accumulating steadily for the past 250 years, but for the first 100 years it mainly concerned questions of taxonomy. In 1859 Dawson raised a more fundamental question about the “roots” of the relationships between plants. He described several Early Devonian plants from Canada and Scotland and interpreted these fossils as primitive vascular plants showing a degree of morphological simplicity unknown in extant groups.
This began to hint at how land plants evolved, but the first detailed evidence was provided by Bowers’ work on the anatomy of primitive Pteridophytes in the Rhynie Chert in 1920. This bridged the morphological gap between the bryophytes and the Tracheophytes and led to a monophyletic hypothesis for the evolution of land plants. Until the 1980s, studies of Devonian macrofloras multiplied rapidly, but their evolutionary interpretations rested on stratigraphic patterns that did not hold up under closer scrutiny. With the advent of molecular biology and the availability of DNA sequencing techniques, inferring evolutionary and taxonomic patterns from DNA sequences began to take hold, because the new techniques offered many advantages over the previous methods.
Much of our current knowledge about plant phylogeny stems from classification, which in turn is based on morphology. Morphology groups and classifies organisms by phenetic (observable) characteristics, and this raises inherent problems. First, the characters used to build the classification must be chosen, and their definition is prone to subjectivity. As Smith (1994) puts it, “different workers will perceive and define different characters in different ways”. This has had serious implications for the construction of phylogenetic trees. Doyle and Donoghue [1], studying the role the Gnetales play in macroevolution, stated that with slightly different characters the relative parsimony of the two arrangements (the Gnetales nested near the Angiosperms or near the Coniferopsids) could easily have been reversed. In that context they admitted that their approach had been overly optimistic in modelling how well a hypothetical student of extant groups would analyse characters. Such plasticity in interpretation does not give rise to sound phylogenies.
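How a small change in character interpretation can flip a parsimony result can be illustrated with a toy calculation. The sketch below uses Fitch's small-parsimony algorithm to count the minimum number of state changes each tree requires. The taxa, the two four-taxon arrangements and the binary characters are all hypothetical, chosen only to show the effect; this is not a reconstruction of Doyle and Donoghue's data matrix or method.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# A tree is either a taxon name (a leaf) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]
Character = Dict[str, str]  # taxon name -> character state


def fitch(tree: Tree, character: Character) -> Tuple[FrozenSet[str], int]:
    """Fitch pass: return the candidate state set at this node and the
    number of state changes required in the subtree below it."""
    if isinstance(tree, str):
        return frozenset({character[tree]}), 0
    left, right = tree
    left_states, left_cost = fitch(left, character)
    right_states, right_cost = fitch(right, character)
    shared = left_states & right_states
    if shared:
        # Children can agree on a state, so no extra change is needed here.
        return shared, left_cost + right_cost
    # No shared state: keep both possibilities and count one change.
    return left_states | right_states, left_cost + right_cost + 1


def tree_length(tree: Tree, characters: List[Character]) -> int:
    """Total parsimony length (number of steps) of a tree over all characters."""
    return sum(fitch(tree, ch)[1] for ch in characters)


# Two hypothetical arrangements of the same four taxa.
GNETALES_WITH_ANGIOSPERMS = (("Gnetales", "Angiosperms"), ("Conifers", "Outgroup"))
GNETALES_WITH_CONIFERS = (("Gnetales", "Conifers"), ("Angiosperms", "Outgroup"))

# Hypothetical binary characters ("1" = derived state present).
GROUPS_G_A = {"Gnetales": "1", "Angiosperms": "1", "Conifers": "0", "Outgroup": "0"}
GROUPS_G_C = {"Gnetales": "1", "Angiosperms": "0", "Conifers": "1", "Outgroup": "0"}

matrix_1 = [GROUPS_G_A, GROUPS_G_A, GROUPS_G_C]
matrix_2 = [GROUPS_G_A, GROUPS_G_C, GROUPS_G_C]  # one character reinterpreted

for name, matrix in [("matrix_1", matrix_1), ("matrix_2", matrix_2)]:
    with_angio = tree_length(GNETALES_WITH_ANGIOSPERMS, matrix)
    with_conif = tree_length(GNETALES_WITH_CONIFERS, matrix)
    print(f"{name}: with Angiosperms = {with_angio} steps, with Conifers = {with_conif} steps")
```

With matrix_1 the Gnetales-with-Angiosperms tree is shorter (4 steps against 5); reinterpreting a single one of the three characters in matrix_2 reverses the preference (5 against 4), which is the kind of sensitivity the quotation describes.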
Morphology is also limited in the number of characters that can be studied before the characters become too specific to be useful. Studies have shown that increasing the number of characters improves the accuracy of tree construction [2]. Beyond a certain point, however, additional morphological characters add little extra value because of their specificity, and the important factor becomes the quality of the data rather than the quantity.
This limitation is compounded by the number of characters that are subject to homoplasy (similarity due to parallel or convergent evolution), which is especially prevalent when investigating the relationships between land plants. The Gnetales were originally thought to be related to the Angiosperms because of their net-like venation, vessels in the wood and possession of a precursor to a flower. This view was rejected when it was discovered that vessels arose independently several times in plants and that the precursors to flowers were the Amentiferae. Morphology is therefore a low-resolution and highly contestable basis for the construction of phylogenetic trees.
DNA sequencing provided a high-resolution, reliable method for constructing phylogenies. It has many advantages over morphology and avoids the problems inherent in it. Character conceptualisation is more straightforward for molecular data than for morphological data [2]. The characters available for classification are numerous, because they can be drawn from any nucleotide sequence that shows a steady rate of evolution. Because the analysis is chemical, the characters studied are well defined, making the genetic (rather than phenetic) analysis objective.
There is no ambiguity that the unit of comparison is the nucleotide and that adenine, thymine, guanine and cytosine represent alternative states of the same character. Subjectivity in the data is minimal, as the aligned sequences are of a set length with a specific sequence on which to base comparisons. Originally 5S rRNA was used to create phylogenetic trees, but the data proved inconclusive: the molecule is only about 120 bases long and most of its sites were uninformative. This led scientists to look at the larger ribosomal subunits, which provided the higher resolution needed, but, as with morphology, the best results came from analysing multiple characters. The introduction of multigene analysis has greatly increased the detail of the trees produced. Most DNA sequence analyses now include four or more genes taken from different plant genomes (e.g. plastid and nuclear). An example is the use of the plastid genes rbcL, atpB and rps4 together with nuclear small-subunit ribosomal DNA to elucidate the basal elements of plant phylogeny [3].
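What makes a site "uninformative" can be made concrete. Under parsimony, a position in an alignment only helps to discriminate between trees if at least two different nucleotides each occur in at least two taxa; constant sites, and sites where a single taxon differs, fit every tree equally well. The sketch below counts such parsimony-informative sites. The sequences are invented toy data, not real 5S rRNA or plastid sequences, and the function names are illustrative rather than taken from any phylogenetics package.

```python
from collections import Counter
from typing import List

NUCLEOTIDES = set("ACGT")


def is_parsimony_informative(column: List[str]) -> bool:
    """True if at least two different nucleotides each occur in at least
    two sequences at this position; gaps and ambiguity codes are ignored."""
    counts = Counter(base for base in column if base in NUCLEOTIDES)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def informative_sites(alignment: List[str]) -> List[int]:
    """Return the 0-based positions of parsimony-informative sites in an
    alignment of equal-length sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i for i in range(length)
        if is_parsimony_informative([seq[i] for seq in alignment])
    ]


# Hypothetical toy alignments for the same four taxa (not real sequences).
gene_a = ["ACGTTA", "ACGTCA", "ACCTCA", "ACCTTG"]
gene_b = ["TTAG", "TCAG", "TCAA", "TTAA"]

# Concatenating loci, as in multigene analysis, pools their informative sites.
combined = [a + b for a, b in zip(gene_a, gene_b)]

print(informative_sites(gene_a))    # [2, 4]
print(informative_sites(gene_b))    # [1, 3]
print(informative_sites(combined))  # [2, 4, 7, 9]
```

In gene_a the last position differs in only one taxon, so it is not counted even though it varies. Concatenating the two toy loci simply adds their informative sites together, which is the basic reason pooling several genes gives more resolution than one short gene.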
Homoplasy is largely avoided in sequence analysis because, in most eukaryotes, DNA is inherited linearly from parent to offspring. Shared sequences must therefore reflect descent within a lineage, and there is very little statistical probability that an exactly comparable sequence could have arisen independently in another lineage: even for the short 5S rRNA, the chance of an exact match is 1 in 4^120, and even if an exact match is not required the odds remain astronomical. Another advantage of the heritability of DNA is that specific markers exist which are common to all plants. Because these characters do not depend on being expressed in the phenotype, they allow analysis across families and of loss of function (the gene is still present but inactivated).
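The arithmetic behind the 5S rRNA figure can be checked under a deliberately simple null model, assumed here for illustration only: each of the 120 positions of an unrelated sequence takes one of the four bases independently and with equal probability, so a chance match at any one site has probability 1/4. Real sequences violate both assumptions (base composition is uneven and sites evolve at different rates), so the numbers are indicative rather than exact. The function `prob_at_least` is a hypothetical helper written for this example.

```python
from fractions import Fraction
from math import comb, log10

SITES = 120               # approximate length of 5S rRNA
P_MATCH = Fraction(1, 4)  # assumed chance of a match at a single site


def prob_at_least(matches: int, sites: int = SITES, p: Fraction = P_MATCH) -> Fraction:
    """Exact binomial probability that at least `matches` of `sites`
    independent positions agree purely by chance."""
    return sum(
        (comb(sites, k) * p**k * (1 - p) ** (sites - k) for k in range(matches, sites + 1)),
        Fraction(0),
    )


exact = prob_at_least(120)   # identical at every site
ninety = prob_at_least(108)  # identical at 90% of sites or more

print(f"exact match: 1 in 4^120, about 1 in 10^{log10(4**120):.2f}")
print(f"at least 108 of 120 sites: about 10^{log10(float(ninety)):.1f}")
```

The exact-match probability is precisely 1/4^120 (about 1 in 10^72), and even the relaxed 90% criterion stays below 10^-50 under this model. The same model also shows that the argument concerns whole sequences: at any single position a chance match is as likely as 1 in 4.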
There are, however, problems with the collection of DNA sequence data. The main one is that DNA can at present be obtained only from extant species; the implications of this will be discussed later. Obtaining ancient DNA from fossilised tissue is very difficult and prone to experimental contamination. The first ancient DNA sequence from plant fossils was reported in 1990 (Golenberg et al.), from material 15-20 MYA.
Although sequences had been identified, the minute amounts of DNA were vulnerable to contamination by PCR products, and the results were disputed. Smith [4] set out the following guidelines for accepting ancient DNA data: 1. Amplification products should make sense. 2. Other associated biomolecules should be well preserved. 3. Results should be replicated by another independent laboratory.