Large trees, supertrees and the grass phylogeny

Salamin, Nicolas

dc.contributor.advisor	Hodkinson, Trevor R.
dc.contributor.author	Salamin, Nicolas
dc.date.accessioned	2016-08-31T14:51:31Z
dc.date.available	2016-08-31T14:51:31Z
dc.date.issued	2003
dc.identifier.citation	Nicolas Salamin, 'Large trees, supertrees and the grass phylogeny', [thesis], Trinity College (Dublin, Ireland). Department of Botany, 2003, pp 223
dc.identifier.other	THESIS 7197
dc.description.abstract	During the last decade, advances in molecular techniques have profoundly changed the way scientists build and use phylogenetic trees. Fields of research as different as ecology, evolution of development, genomics, and systematics have been influenced by the growth of phylogenetics, and the possibilities offered by new techniques of tree reconstruction are likely to further anchor the discipline as a core component of evolutionary biology. Despite this, phylogenetic inference remains a particularly difficult task because no polynomial-time algorithm is available to reconstruct optimal trees from a given data set, and the problem becomes harder as the number of taxa handled in reconstructions increases. The last decade has witnessed the development of powerful computer architectures and software that have alleviated this burden. However, the reconstruction of comprehensive phylogenetic trees still has to rely on heuristic searches, and statistically sound methods are often impractical for large data sets because of their computational cost. In this thesis, I explored the problem of reconstructing large phylogenetic trees. One aspect was to investigate how well current methods of tree reconstruction performed when faced with matrices containing hundreds or thousands of taxa. In chapter 2, computer simulations based on four large angiosperm trees were performed to assess how well maximum parsimony and neighbour-joining infer trees. The results indicated that the size of the matrix was not a problem in itself, and that the distribution of changes along the tree could be a more important factor. For instance, when conditions were favourable, more than 80% of the nodes from a tree containing 13,000 taxa could be correctly inferred with simulated data sets of 10,000 bp. 
With real data sets, however, it is impossible to know how far the trees obtained are from the ‘true’ underlying evolutionary hypothesis. Resampling techniques, such as bootstrap or jackknife, have been developed to estimate how much confidence one can place in a particular node of a phylogenetic tree. With large numbers of taxa, these procedures become computationally intensive, especially if thorough heuristic searches are used. It is therefore important to understand the effects of different heuristic strategies on the support obtained for large phylogenetic trees, and whether faster tree search options could be used to reduce the time of the analyses without biasing the support obtained. In chapter 3, the levels of support obtained by bootstrapping and jackknifing a 357-taxon molecular matrix for the angiosperms using four different heuristic search options were compared. Heuristic searches that performed rearrangements on the original tree obtained by stepwise addition of the taxa yielded comparable values of support for bootstrap and jackknife. However, the fastest technique could reduce the time of the analyses 30-fold. These classical phylogenetic analyses are based on biological characters, such as morphological traits or DNA sequences, but supertree reconstruction methods have also been developed to build large phylogenetic trees by gathering the information directly from existing ‘source’ trees. An overlap of taxa between the source trees is sufficient for the methods to be applied, and the process allows very large trees to be created quickly. Several methods have been proposed to build supertrees, and chapter 4 examined the ‘matrix representation using parsimony’ method. An empirical assessment was made by comparing several modifications of this method on several different data sets from the grass family. The data sets were analysed separately and the resulting topologies were used as source trees in the supertree reconstructions. 
Modifications that took into account the level of support present in the source trees produced supertrees that were closer to a classical analysis combining the different DNA sequences. Supertrees were also built from 55 published topologies for the grass family to create the largest grass phylogenetic trees, containing 401 genera. The supertrees obtained highlighted interesting questions concerning the evolutionary history of the grass family, and the relationships between the clades comprising maize, wheat, and rice were further investigated in chapter 5. In this chapter, extensive simulations were performed to investigate whether the discrepancies between topologies obtained from different molecular data sets could be affected by random or systematic errors. The results indicated that several DNA sequences have a strong bias towards a particular placement of wheat. However, the general result suggested that the level of taxon and character sampling in studies of grass phylogenetics has not been sufficient to avoid high error rates, and that these errors have impaired the ability of methods to correctly reconstruct grass evolutionary history. Finally, in response to the previous results, a large phylogenetic analysis of the trnLF and rbcL plastid regions is presented in chapter 6. The rbcL data set placed wheat as sister to maize, while this topology was only obtained with trnLF when Bayesian analysis was performed. With this DNA region, maximum parsimony analysis placed wheat within the BEP clade. The main subfamilies were supported, but the relationships between these groups could not be clearly defined. Divergence times were estimated by calibrating these phylogenetic trees with four grass fossils, suggesting a rapid diversification of the grasses between 40 and 30 Mya. The calibrated dates also allowed an estimate of the appearance of the C4 photosynthetic pathway in the grasses at 20 to 10 Mya, an origin that coincided with low past CO2 concentrations. 
Therefore, CO2 levels could have been a factor in the origin of C4 photosynthesis in grasses, an adaptation that could have helped the huge diversification of this important angiosperm family.	en
dc.format	1 volume
dc.language.iso	en
dc.publisher	Trinity College (Dublin, Ireland). Department of Botany
dc.relation.isversionof	http://stella.catalogue.tcd.ie/iii/encore/record/C__Rb12455134
dc.subject	Botany, Ph.D.
dc.subject	Ph.D. Trinity College Dublin
dc.title	Large trees, supertrees and the grass phylogeny
dc.type	thesis
dc.type.supercollection	thesis_dissertations
dc.type.supercollection	refereed_publications
dc.type.qualificationlevel	Doctoral
dc.type.qualificationname	Doctor of Philosophy (Ph.D.)
dc.rights.ecaccessrights	openAccess
dc.format.extentpagination	pp 223
dc.description.note	TARA (Trinity’s Access to Research Archive) has a robust takedown policy. Please contact us if you have any concerns: rssadmin@tcd.ie
dc.identifier.uri	http://hdl.handle.net/2262/76996

Files in this item

Name: Salamin TCD THESIS 7197 Large ...
Size: 6.055Mb
Format: PDF

Name: license.txt
Size: 3.419Kb
Format: Text file

This item appears in the following Collection(s)

Botany (Theses and Dissertations)
Trinity College Dublin Theses & Dissertations
