Many aspects of intrapatient HIV-1 evolution are predictable

Within HIV-infected individuals, the human immune system tries to prevent virus replication while HIV continuously changes to avoid recognition by the immune system. The resulting evolution of the virus population has become a paradigmatic example of a rapidly adapting population. We just published a paper that provides unprecedented insight into intrapatient HIV evolution. This work is the product of an extremely enjoyable collaboration involving Fabio Zanini from our group here in Tuebingen and the group of Jan Albert at the Karolinska Institute in Stockholm.

Whole genome deep sequencing

Our aim in this study was to provide a comprehensive assessment of the evolutionary dynamics unfolding within the body of HIV-infected people. We developed a strategy to sequence the entire virus genome such that even rare mutations are accurately represented in our data set. The impressive sample collections in Sweden and the generous participation of patients in this study allowed us to follow HIV evolution densely in multiple patients. We developed an interactive web application that allows users to explore HIV evolutionary dynamics and access the data in a convenient way.

What did we find?

Mutations occur at random, while selection for replication weeds out harmful mutations and amplifies useful ones. The common conception is that mutation rates are low and/or useful mutations are rare. Furthermore, biological reality is complicated and predicting which mutations might be useful seems hopeless. In HIV, however, we find a high degree of reproducibility and predictability, indicating that “finding the right mutation” is not a rare, fortunate event for HIV but rather a fast and reliable mechanism of survival.

The predictability extends to single positions in the genome. More than 20% of sites that are globally unconserved (such as sites at which mutations are synonymous) are measurably diverse after a few years within each patient. This diversity grows continuously, with little sign of diversity being lost through genetic drift or hitch-hiking with beneficial mutations. This implies a large population that systematically explores sequence space. In contrast, at conserved sites we observe next to no diversity, indicating efficient selection against deleterious variants.
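To make "measurably diverse" concrete, here is a minimal sketch of how per-site diversity could be computed from deep sequencing counts. It uses the common measure 1 - sum of squared allele frequencies. The function names, the example counts, and the 0.01 threshold are illustrative assumptions, not the definitions used in the paper.

```python
def site_diversity(counts):
    """Diversity at one site: 1 - sum_i f_i^2, where f_i are allele frequencies.

    `counts` maps a nucleotide to the number of reads supporting it.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def fraction_diverse(sites, threshold=0.01):
    """Fraction of sites whose diversity exceeds `threshold` (assumed cutoff)."""
    diverse = sum(1 for counts in sites if site_diversity(counts) > threshold)
    return diverse / len(sites)


# Hypothetical read counts at four sites of one sample.
example_sites = [
    {"A": 1000},            # fully conserved
    {"A": 900, "G": 100},   # minor variant at 10%
    {"C": 995, "T": 5},     # minor variant at 0.5%
    {"G": 500, "A": 500},   # two variants at equal frequency
]

if __name__ == "__main__":
    for counts in example_sites:
        print(counts, round(site_diversity(counts), 4))
    print("fraction diverse:", fraction_diverse(example_sites))
```

In real data the threshold would have to sit above the sequencing error rate, which is why accurate representation of rare mutations matters.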

We can predict not only where mutations accumulate because they are tolerated, but also where they spread because they help the virus. By looking specifically at sites where the virus population was initially different from the majority of known HIV sequences, we found that the virus has a strong tendency to come back to the global consensus state. 30% of all substitutions occur at the 5% of sites where the initial virus differed from this consensus, and they represent reversions. The tendency to revert to this global attractor is stronger at sites that are globally more conserved. Within the diversity of HIV-1, this attractor seems universal. The picture on the right shows the rate of evolution (divergence after 6 years) separately for sites that can revert and sites already in the consensus state. At the globally most conserved sites, about 50% of all non-consensus positions revert to consensus after 5 years, a roughly 1000-fold excess over evolution away from consensus. We also found that reversions happen not only soon after infection but all along, for many years.
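As a back-of-the-envelope reading of the 30% and 5% figures, the sketch below compares the per-site substitution rate at sites that start out non-consensus with the rate at all other sites. The function name is made up for illustration. The calculation averages over all sites, so it is a different comparison from the roughly 1000-fold excess quoted for the most conserved sites.

```python
def per_site_enrichment(frac_substitutions, frac_sites):
    """Ratio of per-site substitution rates inside vs. outside a class of sites.

    frac_substitutions: share of all substitutions that fall in the class
    frac_sites:         share of all sites that belong to the class
    """
    rate_inside = frac_substitutions / frac_sites
    rate_outside = (1.0 - frac_substitutions) / (1.0 - frac_sites)
    return rate_inside / rate_outside


if __name__ == "__main__":
    # 30% of substitutions at the 5% of sites that initially differ from consensus.
    print(round(per_site_enrichment(0.30, 0.05), 2))
```

Under these numbers a site that starts out non-consensus is about eight times more likely to experience a substitution than an average consensus site.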

What does this mean?

Our data are consistent with HIV as a large population that systematically explores a mostly universal fitness landscape and returns to the favoured state when possible. The reproducible patterns of evolution are only possible because HIV recombines extensively within patients. Without recombination it would be much more difficult for the virus population to simultaneously revert and escape in different regions of the genome, as these mutations would interfere with each other as they spread. Sweeps of adaptive mutations would wipe out diversity and the reproducible patterns of mutation accumulation. The reproducibility of minor variation further suggests that the fitness costs of individual mutations are similar among unrelated viruses and explains why inference of fitness landscapes of HIV from cross-sectional data is possible.

nextflu visualizes antigenic evolution of influenza viruses

In a new preprint by Trevor Bedford, Rodney Daniels, Colin Russell, Boris Shraiman, and myself, we show how antigenic evolution can be integrated with the phylogenetic tree of HA sequences.
Human immunity is the main driving force of the evolution of seasonal influenza viruses. Only by changing their antigenic properties are influenza viruses able to re-infect previously infected humans. Mutations that prevent immune recognition often spread rapidly across the globe. Human antibodies typically recognize the tip of the HA trimer on the surface of the influenza virus, and the amino acids at those positions change very often through time.

Mutations with antigenic effects highlighted on a HA structure

The antigenic properties of circulating influenza viruses are constantly monitored by the national influenza centres and the WHO collaborating centres for influenza using hemagglutination inhibition assays. These assays basically record how much an antiserum (obtained from a ferret) can be diluted and still recognise a virus — if the virus has changed antigenically relative to the serum, even high serum concentrations don’t inhibit the virus.

Results from such assays come in large tables of numbers quite removed from the molecular evolution of the HA protein. In the preprint, we show that antigenic distances behave similarly to sequence distances on the tree and can be explained as a sum of contributions associated with amino acid substitutions or contributions on branches of the tree.
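The idea that an antigenic distance is a sum of contributions on branches of the tree can be sketched as follows. The toy tree, the node names, and the branch values are invented for illustration. The sketch only adds up contributions along the path between two viruses and leaves out any per-virus or per-serum terms that a fuller model might include.

```python
def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root (inclusive)."""
    path = []
    while node is not None:
        path.append(node)
        node = parent.get(node)
    return path


def predicted_distance(a, b, parent, branch_effect):
    """Sum branch contributions along the tree path between nodes a and b.

    Each node stands for the branch leading to it from its parent. Branches
    on the path are those above nodes that are ancestors of exactly one of
    a and b, i.e. everything below their most recent common ancestor.
    """
    above_a = set(path_to_root(a, parent))
    above_b = set(path_to_root(b, parent))
    on_path = above_a ^ above_b  # symmetric difference
    return sum(branch_effect.get(node, 0.0) for node in on_path)


# Toy tree: root -> (n1 -> (A, B), C). All values are hypothetical.
parent = {"A": "n1", "B": "n1", "n1": "root", "C": "root"}
branch_effect = {"A": 0.0, "B": 1.5, "n1": 2.0, "C": 0.5}

if __name__ == "__main__":
    for pair in [("A", "B"), ("A", "C"), ("B", "C")]:
        print(pair, predicted_distance(*pair, parent, branch_effect))
```

Because most branches contribute nothing, closely related viruses end up with similar predicted distances to any reference, which is what makes coloring the tree by antigenic similarity meaningful.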

To visualize antigenic evolution, we integrated HI titer data (mostly from the Worldwide Influenza Centre in London, generated by John McCauley, Rodney Daniels, and colleagues) into nextflu. The site allows users to select a particular reference virus or vaccine strain and colors all viruses on the tree according to their measured or predicted antigenic similarity to the reference virus, see the screenshot below. This integration makes it easy to associate antigenic changes with genotypic changes and the dynamics of the corresponding clades.

In the manuscript, we further investigate to what extent antigenic changes are predictive of the composition of future influenza populations. Successful clades tend to be antigenically advanced, but a substantial fraction of antigenic advances fail to spread. The appeal of HI data in the context of prediction is the early detection of a novel variant that is suboptimally recognized by the vaccine. However, many such “hopefuls” go extinct, and reliable prediction using HI data has to find ways to differentiate antigenic changes that are likely to be successful from those that compromise virus functionality.

Seasonal influenza in 2015 and future projections

Together with Trevor Bedford, we have put together an informal summary of recent patterns of seasonal influenza evolution. We use nextflu to explore which clades are on the rise or about to go extinct and discuss strains that are likely to dominate in 2016. Read the full report on

Recombination in HIV-1 and the “book” of genealogical trees

In our new preprint, we report whole genome deep sequencing of longitudinally sampled HIV-1 populations from multiple patients, effectively a movie of evolution at a resolution of about 6 months. This work was led by Fabio and is the product of a fantastic collaboration with the group of Jan Albert at the Karolinska Institute in Stockholm.

Among the many things we can study in detail using this data set, we looked at linkage and recombination. We find that linkage disequilibrium in chronic infection is typically limited to about 100bp. Consistent with this lack of long range linkage, the shapes of trees reconstructed from 400bp reads vary greatly in different regions of the genome. 400bp are often too short to construct well supported phylogenetic trees. Nevertheless, the trees are instructive for illustrating diversity in the population. The figure below animates the trees while moving through the genome from the 5′ to the 3′ end.
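For readers unfamiliar with linkage disequilibrium, here is a minimal sketch of how it can be measured for a pair of biallelic sites covered by the same reads. It computes the standard quantities D and r². The input format and the example counts are assumptions for illustration, and the statistic used in the preprint may differ.

```python
def linkage_disequilibrium(haplotype_counts):
    """Return (D, r2) for two biallelic sites.

    `haplotype_counts` maps (allele_at_site1, allele_at_site2), with alleles
    coded 0 or 1, to the number of reads carrying that combination.
    """
    total = sum(haplotype_counts.values())
    p11 = haplotype_counts.get((1, 1), 0) / total
    p1 = sum(n for (a, _), n in haplotype_counts.items() if a == 1) / total
    p2 = sum(n for (_, b), n in haplotype_counts.items() if b == 1) / total
    d = p11 - p1 * p2  # deviation from independent assortment
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


if __name__ == "__main__":
    # Hypothetical read counts for nearby sites (tightly linked) ...
    linked = {(0, 0): 50, (1, 1): 50}
    # ... and for sites whose alleles occur in all combinations.
    shuffled = {(0, 0): 25, (0, 1): 25, (1, 0): 25, (1, 1): 25}
    print(linkage_disequilibrium(linked))
    print(linkage_disequilibrium(shuffled))
```

Plotting r² against the distance between site pairs would show the decay of linkage, with recombination pushing values toward zero as the distance grows.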

Every position in the genome has a unique genealogical tree, but through recombination the genealogical trees of two sites diverge as the distance between the sites increases. One way to picture this process is to think of a book in which each page shows the genealogy corresponding to a particular nucleotide. Skimming through the book results in a movie of gradually changing trees. We need diversity to resolve trees and can’t reconstruct a tree for an individual site, but the trees obtained from sliding 400bp windows approximate this process.
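The "pages" of such a book can be enumerated with a simple sliding window, sketched below. The 400bp window matches the text, while the step size and the genome length in the example are arbitrary illustrative choices. Building the tree for each window is left out.

```python
def sliding_windows(genome_length, window=400, step=100):
    """Yield (start, end) coordinates of windows that fit within the genome.

    Coordinates are 0-based and half-open, so a window covers positions
    start, start + 1, ..., end - 1.
    """
    start = 0
    while start + window <= genome_length:
        yield (start, start + window)
        start += step


if __name__ == "__main__":
    # Toy genome of 1000 positions, windows shifted by 200 positions.
    for start, end in sliding_windows(1000, window=400, step=200):
        print(start, end)
```

A smaller step gives more overlap between neighbouring windows and hence a smoother "movie", at the cost of more trees to build.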

Trees of longitudinally sampled sequences in various parts of the HIV-1 genome. Big circles correspond to common variants, small circles to rare variants. Early samples are shown in blue, followed by green, yellow and red.

Trees in different parts of the genome vary widely in shape and depth. This is consistent with extensive recombination. The scale of linkage — about 100bp — is compatible with earlier estimates of the intrapatient recombination rate by us and Thomas Leitner or Batorsky and colleagues.