The FASTML Server: Server for computing Maximum Likelihood ancestral sequence reconstruction
Click here to view an example output page for ancestral sequence reconstruction of HIV-1.
Reconstructing ancestral sequences of HIV-1 is challenging because of the virus's fast rate of evolution. Nevertheless, ancestral sequence reconstruction (ASR) has been suggested to be of great value for HIV-1 vaccine design that aims to elicit an immune response against a broad spectrum of contemporary viral strains [e.g., 1]. Specifically, the envelope protein (Env) exhibits extraordinary diversity (up to 35% among different HIV-1 subtypes), which is attributed to mutational escape of the virus from the host immune system. The high viral mutation rate is also responsible for the ability of the virus to acquire resistance to drug treatments, and is a major obstacle to developing an efficient vaccine.
Here we illustrate the ability of FastML to reconstruct ancestral Env sequences. We ran FastML on a sample of HIV-1 group M sequences from subtypes B and C taken from a previous study [2]. Our analysis focuses on the marginal reconstruction of the ancestral sequences of subtype C, which is the most prevalent subtype and accounts for nearly half of all infections globally, and of subtype B, which is predominant in the western world and accounts for about 12% of global infections. Sequences were aligned using MAFFT [3]. Several differences were found between the clade B and clade C ancestral sequences, including both different character assignments and different indels.
The differences between the two reconstructions are presented visually by logos of the posterior probabilities at the ancestral nodes of subtype B (N33) and subtype C (N32). Interestingly, some sites were reconstructed with high confidence in subtype C and low confidence in subtype B, and vice versa. Among these sites is position 592 in the MSA, which corresponds to position 414 of gp120, a protein derived from Env. This position is involved in the binding of the co-receptor CCR5. FastML inferred that this site in the ancestor of subtype C was threonine with a high posterior probability (0.997), whereas the reconstruction for the ancestor of subtype B is arginine with a much lower posterior probability (only 0.628). The difference in posterior probability between the ancestors of these two clades at this position may be explained by a previous analysis, which suggested that the intensity of selection forces on this position is not constant among the various HIV-1 lineages [2]. Specifically, this position is highly conserved in subtype C but variable in subtype B, which is directly reflected in the posterior probabilities.
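The comparison of confidence between two ancestral nodes can be sketched in a few lines of Python. This is only an illustration and is not part of FastML: the function names, the 0.95 and 0.7 thresholds, and the pooled placeholder state "X" are assumptions made here. Only the top-state probabilities for position 592 (0.997 and 0.628) come from the analysis above.

```python
from typing import Dict, Tuple

# Marginal posterior at one site of one node: state -> probability.
Posterior = Dict[str, float]


def most_probable_state(posterior: Posterior) -> Tuple[str, float]:
    """Return the state with the highest posterior probability and that probability."""
    state = max(posterior, key=posterior.get)
    return state, posterior[state]


def confidence_differs(post_a: Posterior, post_b: Posterior,
                       high: float = 0.95, low: float = 0.7) -> bool:
    """True when one node is reconstructed confidently and the other is not.

    The thresholds are hypothetical defaults chosen for illustration.
    """
    p_a = most_probable_state(post_a)[1]
    p_b = most_probable_state(post_b)[1]
    return (p_a >= high and p_b < low) or (p_b >= high and p_a < low)


# MSA position 592. "X" pools all remaining states (an assumption: only the
# probability of the top state is reported for each node).
site_592_n32 = {"T": 0.997, "X": 0.003}  # ancestor of subtype C
site_592_n33 = {"R": 0.628, "X": 0.372}  # ancestor of subtype B

state_c, prob_c = most_probable_state(site_592_n32)
state_b, prob_b = most_probable_state(site_592_n33)
flagged = confidence_differs(site_592_n32, site_592_n33)
```

Applied to every column of the alignment, a check of this kind would list the sites where the two ancestors differ in reconstruction confidence.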
We further used FastML to provide the 100 most likely sequences for the ancestor of subtype C. At the above-mentioned site, threonine is always inferred, in agreement with its high posterior probability. Notably, the difference in log-likelihood between the most likely ancestral sequence at this node and the 100th most likely sequence is only 0.141, indicating that the two sequences are almost equally likely to reflect the "true" ancestral sequence.
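To put the log-likelihood difference on a more intuitive scale, it can be converted into a likelihood ratio. The short sketch below assumes that the reported log-likelihoods are natural logarithms, which is not stated explicitly here. The function name is chosen for illustration only.

```python
import math


def likelihood_ratio(delta_log_likelihood: float) -> float:
    """Likelihood of the less likely sequence relative to the more likely one.

    Assumes natural-log likelihoods, so the ratio is exp(-delta).
    """
    return math.exp(-delta_log_likelihood)


# Difference between the most likely and the 100th most likely sequence.
ratio = likelihood_ratio(0.141)
print(f"relative likelihood: {ratio:.3f}")
```

Under this assumption, the 100th sequence is roughly 0.87 times as likely as the best one.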
References