The FASTML Server
Server for computing Maximum Likelihood ancestral sequence reconstruction

FastML Overview

Introduction

The FastML server is a bioinformatics tool for reconstructing ancestral sequences based on the phylogenetic relationships among homologous sequences. The server runs several algorithms that reconstruct ancestral sequences, with an emphasis on accurately reconstructing both indels and characters. For character reconstruction, the previously described FastML algorithms [1, 2] are used to efficiently infer the most likely ancestral sequence at each internal node of the tree. Both the joint and the marginal reconstructions are provided. For indel reconstruction, the sequences are first coded according to the indel events detected within the multiple sequence alignment (MSA) [3], and a state-of-the-art likelihood model is then used to reconstruct the ancestral indel states [4, 5]. The server outputs the most probable ancestral sequences, together with the posterior probability of each character and indel at each sequence position for each internal node of the tree. FastML is generic and applies to any type of molecular sequence (nucleotide, protein, or codon).



Methodology

Given a multiple sequence alignment (MSA) and optionally a phylogenetic tree, the ancestral reconstruction process can be divided into two parts:

    1) Character reconstruction - two methods are implemented: joint and marginal. In the joint reconstruction, one finds the combination of sequences at all internal nodes that together maximizes the likelihood. In the marginal reconstruction, one infers the most likely sequence at a specific internal node, summing over all possible states at the other nodes. The results of these two estimation methods are not necessarily the same [1, 2]. Both methods are based on maximum likelihood (ML) algorithms and on an empirical Bayesian approach that takes into account rate variation among the sites of the MSA.

    2) Indel reconstruction - a two-step approach is used to take into account the dependency among sites (a single indel event typically spans several adjacent alignment columns):

      a. Indel coding.
      In this step, the input MSA is coded into a binary indel matrix. The server uses an efficient implementation of simple indel coding [3], in which each indel with a distinct start and/or end position is considered a separate character. All indels in the data are coded as binary (presence/absence) characters, each of which may represent a gap spanning multiple sites.

      b. Indel reconstruction.
      In this step, the evolutionary analysis of indels is performed. Given the binary presence/absence matrix of indels in the extant sequences, the algorithm reconstructs the ancestral state of each indel at each internal node of the tree. The observed pattern of indels is assumed to result from insertion and deletion dynamics along the phylogenetic tree. Our state-of-the-art inference methodology is based on a likelihood-based mixture model that allows insertion and deletion rates to vary among indel sites, so as to reliably capture the underlying evolutionary processes [4, 5]. In this approach, the posterior probability of indel presence (gap) is computed for each indel site and each node. Alternatively, users can choose to reconstruct the ancestral states of the indels using maximum parsimony. The parsimonious ancestral reconstruction is based on the Sankoff algorithm [6]; here, the most parsimonious assignment of indel presence (gap) is computed for each indel and each internal node.