Establishment and dynamics of the latent HIV-1 reservoir

Our new study (work with the group of Jan Albert) on HIV-1 evolution and turnover during suppressive anti-retroviral therapy has just come out in eLife. In this paper, we combined our previous data on HIV-1 evolution in plasma prior to therapy (Zanini et al, 2015) with HIV-1 DNA sequences from peripheral blood mononuclear cells (PBMCs) after many years of therapy. This combination of pre-therapy and on-therapy data from the same individuals allowed us to investigate the origin of integrated HIV-1 DNA and determine whether viral DNA in cells changes during therapy:

  • We find no evidence of replication/evolution during suppressive therapy
  • Even after 18 years of therapy, HIV-1 DNA looks very similar to the HIV-1 RNA from samples taken right before treatment
  • The HIV reservoir turns over fast in the absence of therapy. This turnover is dramatically slowed by therapy, suggesting that HIV-1 infection is a major contributor to T-cell death.

Our results are at odds with a recent study by Lorenzo-Redondo et al 2016. Using sequence data from HIV RNA at treatment initiation and HIV DNA 3 and 6 months into therapy, Lorenzo-Redondo et al estimated a very high rate of sequence evolution. The evolution of the root-to-tip distance predicted on the basis of their rate estimate is included in the graph below as a shaded area and is clearly incompatible with our results. In fact, the rate estimated by Lorenzo-Redondo et al is faster than the pre-therapy rate in the individuals we investigated.

Lorenzo-Redondo et al studied sequences from blood and lymph tissue, while we only had access to blood samples. This, however, is unlikely to explain the discrepancy: Lorenzo-Redondo et al estimate similar rates in PBMCs and lymph tissue. Furthermore, several studies, including Lorenzo-Redondo et al, estimate that HIV sequences from lymph and PBMCs mix on a time scale of a few months, such that PBMCs should be an accurate reporter. The rapid evolution inferred by Lorenzo-Redondo et al might be explained in part by the following factors:

  • The samples come from a six-month interval, which is much shorter than the coalescence time scale of HIV. With sequences from such short time intervals, rooting the phylogenetic tree to maximize the correlation between root-to-tip distance and sampling date can generate an exaggerated temporal signal.
  • With increasing time since start of therapy, the HIV-1 DNA positive pool of cells will become dominated by long-lived cells which sample deeper into the history of the HIV infection prior to therapy. This could generate a spurious signal of “backward” evolution.

The graph below illustrates the latter. If HIV positive cells are a mix of short-lived (blue) and long-lived (red) cells, a sample taken at treatment start will be dominated by short-lived cells and virus that was replicating very recently. A few months into treatment, short-lived cells will be mostly HIV negative while HIV positive cells tend to be long-lived cells that sample deeper into the history of the infection. This shift can generate a spurious signal of evolution.
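The shift toward older sequences can be illustrated with a toy calculation. This is only a sketch under assumed numbers: the two-population model, the half-lives, the initial mix and the typical sequence ages are all hypothetical values chosen for illustration, not estimates from our data.

```python
def mean_sequence_age(t_months, frac_short=0.9, half_life_short=1.0,
                      half_life_long=48.0, age_short=0.5, age_long=24.0):
    """Mean age (months before treatment start) of viral DNA sampled
    t_months into therapy, for a mix of short- and long-lived cells.

    All default parameter values are hypothetical.
    """
    # HIV positive cells of each type that survive t_months of therapy.
    short = frac_short * 0.5 ** (t_months / half_life_short)
    long_lived = (1.0 - frac_short) * 0.5 ** (t_months / half_life_long)
    # Weighted average of how far back each cell type samples the infection.
    return (short * age_short + long_lived * age_long) / (short + long_lived)


# Samples at treatment start and 3 and 6 months into therapy.
for t in (0, 3, 6):
    print(t, round(mean_sequence_age(t), 2))
```

With these assumed parameters, the mean age of the sampled sequences grows with time on therapy even though no sequence changes, which is the kind of shift that could be mistaken for evolution.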

While we cannot rule out that HIV does replicate in compartments that are missed by our sequencing of HIV from PBMCs, ongoing replication is not the dominant mechanism by which HIV DNA is maintained in circulating cells.



Colistin resistance evolution

The widespread use of antibiotics has led to the evolution and spread of bacteria resistant to drugs. Multi-resistant strains tend to circulate in hospitals with a high density of people on antibiotic therapy. Treatment options are running out for some of the most resistant strains. To limit the spread and emergence of such variants, we need to understand the conditions under which resistance evolves, and how we can avoid or delay it.

Matthias Willmann and Silke Peter from the Tübingen medical school and my group teamed up to study evolution of colistin resistance in clinical isolates of Pseudomonas aeruginosa using a morbidostat. This computer-controlled continuous culture device was designed by Erdal Toprak and colleagues in Roy Kishony’s lab. The morbidostat adjusts drug concentrations such that the bugs struggle but grow. Over days and weeks, the bacteria become resistant while we take samples and sequence the population to track the changes in their genomes.

The preprint we just posted describes results by Bianca Regenbogen, who took on the ambitious task of (i) finalizing our morbidostat set-up, (ii) establishing the experimental procedures, (iii) running the experiments for two strains (PA77 and PA83), (iv) sequencing the samples, and (v) analysing the data. Every part of this would have been impressive for a six-month thesis, and she did all five.

The morbidostat


The left part of the picture shows the laptop controlling the experiment and a large array of small pumps that deliver drug and growth medium into the culture vials. The culture vials sit inside an incubator (right picture). All the tubing connecting pumps to the vials and for waste removal runs through a hole that our workshop drilled into the back of the incubator (yes, they did ask whether we were serious about this).

The experiments


The experiments typically ran for 3 weeks, while we recorded optical density every 30 s, adjusted medium and drug every 10 min, and took samples three times per week. The plot on the left shows the colistin concentration over these three weeks. We have a total of 18 such traces, each corresponding to one culture. The colistin concentrations tolerated by the bacteria increased 10-fold after 7-12 days, and then steadily increased further to a total of a 100-fold increase. The colored bars above the graph indicate the trajectories of commonly observed mutations: more below.
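The feedback idea behind a morbidostat can be sketched in a few lines. This is not our control software: the function names, the optical density threshold and the dilution fraction are hypothetical, and the rule is a simplified version of the kind of logic described by Toprak and colleagues (add drug only when the culture is dense and still growing).

```python
def choose_injection(od_now, od_previous, od_threshold=0.15):
    """Decide what to pump into a vial at the end of one dilution cycle.

    od_threshold is a hypothetical value, not the one used in our set-up.
    """
    growing = od_now > od_previous
    if od_now > od_threshold and growing:
        return "drug"    # dense and still growing: increase the pressure
    return "medium"      # sparse or shrinking: dilute without adding drug


def update_concentration(conc, injection, stock_conc, dilution=0.1):
    """Drug concentration after replacing a fraction of the vial volume."""
    added = stock_conc if injection == "drug" else 0.0
    return (1.0 - dilution) * conc + dilution * added


# One cycle: the culture grew from OD 0.18 to 0.20, so drug is injected.
step = choose_injection(0.20, 0.18)
print(step, update_concentration(1.0, step, stock_conc=10.0))
```

Repeating such a cycle every few minutes keeps the drug concentration close to the level the population can just tolerate, so the concentration trace itself reports how resistance increases over time.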

The mutations

We wanted to know what happened as these populations became resistant. Which genes mutated? Always the same genes? Same mutations? … We sequenced the seven samples for each culture (~100 samples) with high coverage. In one vial, the dynamics of mutations looked like this:

The left panel shows the trajectories of mutations that we didn’t see in the initial sample; the right panel shows the preexisting mutations. The plot only shows mutations observed at >20% at least once.

All PA77 cultures mutated pmrE and pmrB. All mutations in pmrE were at position 28, either Y->C or Y->N. In contrast, mutations in pmrB were all over the place. In the case shown above, two pmrB mutations came up, but only L17Q together with Y28N in pmrE was successful. Curiously, PA77 is an outlier at position 28 of pmrE: the majority of P. aeruginosa strains in GenBank have a C at this position.

PA83 evolved colistin resistance via much more varied patterns of mutations. While we didn’t find any mutations in pmrE, all cultures mutated lpxC and pmrB, 7/9 mutated migA (+2 other poorly annotated genes), 6/9 lpxO2, and 5/9 pmrA. In total, more than 20 different genes mutated in more than 2 cultures.

The above mutations seem to be directly related to colistin resistance, and most mutated genes are involved in some part of the synthesis of lipopolysaccharides in the outer membrane. In addition to those directly relevant mutations, 6/9 and 1/9 cultures of PA83 and PA77, respectively, carried mutations in mutS. These populations had a 100-fold elevated mutation rate and developed many other unique mutations (~50 in 3 weeks).

What did we learn?

  • a rich culture of 20ml (100M cells) develops resistance in 10 days
  • the pathways differ between strains:
    LpxC, migA,.. mutations were common in PA83, not observed in PA77
    PmrE reverted to wildtype in PA77, PA83 has pmrE 28C
  • despite different pathways, phenotypes evolve in a similar fashion
  • mutator phenotypes arise frequently, but are not essential for resistance evolution
  • the mutation target for colistin is large and clones with similar mutations compete
  • MIC measurements in liquid culture differ from those of ETests




Does HIV evolve during therapy?

Even though modern HIV therapy reduces HIV replication to undetectable levels, the virus almost always rebounds when therapy is stopped. The rebound comes from a reservoir of virus integrated into the genome of T-cells, which can start to produce virus even decades after the virus integrated. How this reservoir is maintained has been the subject of intensive research. Most studies point towards long-lived T-cell lineages that carry integrated virus, while other studies claim that low-level replication of HIV maintains and reseeds the reservoir. Together with the group of Jan Albert at the Karolinska Institute, we now show that the long-term dynamics of the HIV DNA reservoir is perfectly compatible with long-lived T-cell lineages, while we find no evidence of ongoing replication.

What is unique about our study?

We obtained sequences of HIV integrated into host cell genomes from 10 patients 3-18 years after they started successful therapy. These samples came from the same patients that participated in our earlier longitudinal study of intra-patient evolution. In that study, we performed whole genome deep sequencing of several samples from early infection up until start of therapy. We hence know exactly what variants were circulating in these patients before they started fully suppressive ART. This enables us to determine the most likely ancestors of the sequences found in the reservoir, date their establishment, and determine whether sequences changed since start of therapy.

Circles: RNA from replicating virus. Squares/triangles: DNA reservoir

Overwhelmingly, we find that sequences from the DNA reservoir are very similar to sequences of replicating viruses observed in RNA samples right before the onset of therapy, sometimes up to 18 years earlier. The DNA sequences from the reservoir contain much of the diversity of the virus population before treatment, as evidenced by the joint phylogenetic trees of RNA and DNA samples.

No evidence of evolution in the reservoir

A quantitative analysis of virus sequences before and after start of therapy shows very clearly that sequences obtained from the HIV DNA reservoir have not undergone any significant replication or evolution. The figure below shows genetic distances of samples relative to the sequences observed right before start of treatment:


While evolution is fast and steady before treatment, no consistent change is observed after start of treatment.

Why do others see evolution in the HIV-1 DNA reservoir?

They probably don’t. Data by Lorenzo-Redondo et al come from a 6-month period following start of treatment (the checkered area in the figure above). During this brief time window, they claim to observe evolution at a rate of 1% per year, about 5-fold faster than without treatment. Lorenzo-Redondo et al have samples from lymphoid tissue in addition to PBMCs (white blood cells; our samples are also PBMCs). However, they find rapid migration between these tissues, such that the restriction of our study to PBMCs is unlikely to explain the discordant findings.

One possible explanation (besides noise and artefacts due to hypermutation screening and 454 sequencing) could be the changing age distribution of cells latently carrying HIV genomes. Right after start of therapy, these cells will be dominated by short-lived cells that got recently infected. After 3 and 6 months (the time points of their samples), the reservoir might get dominated by older cells that contain viral genomes from further back in the past. Without a dense history of the replicating HIV population prior to therapy, the effect of sampling deeper into the past cannot be distinguished from forward evolution. That said, we don’t see a strong effect in this direction and the DNA reservoir is dominated by RNA sequences that circulated right before the start of treatment, even 15 years later.

Do our findings rule out replication?

No. We show that HIV sequences obtained from PBMCs derive from virus populations before start of treatment. We can’t rule out that (i) a tiny minority does change but is so rare that we don’t observe it, or (ii) that replication happens in compartments that don’t contribute to the DNA reservoir accessible through PBMCs. To our knowledge, no credible evidence exists in favor of replication, but we can’t rule it out.

Mutation rates and fitness costs of HIV-1

The rates at which mutations arise and the effects these mutations have on phenotypes and replication are key determinants of how populations change and adapt, but measuring them is often hard. While mutation rates in animals or plants can be obtained quite easily by sequencing parents and children, fitness effects are much more difficult to ascertain: only the most dramatic mutations have effects big enough to be measured over a few generations or to leave strong signals in genetic diversity.

In viruses like HIV-1, mutation rates and effects of mutations are more readily accessible since their generation times are short and their genomes are compact. However, these measurements are typically done in cell culture systems rather than in the natural environment, the infected host. In our new preprint, Fabio, Vadim, myself and our colleagues Johanna and Jan from Sweden present estimates of mutation rates and fitness costs in vivo.

How did we do it?

We have previously presented longitudinal whole genome deep sequencing data from multiple patients (Zanini et al, 2016). At each position of the genome, we can observe the frequency of different mutations at different times during the course of infection. A subset of positions doesn’t seem to matter much for virus replication. We found that at those sites, mutations accumulate almost linearly: the rate of accumulation is the in vivo mutation rate. The estimates so obtained agree very well with cell culture estimates. The figure on the right summarizes these findings: the thickness of the arrows indicates the relative rates, and the overall rate is 1.2 × 10^-5 mutations per site and day.

At these approximately neutral sites, mutation accumulation is linear (at least over the few years we looked at it). At other sites, mutations arise very much the same way, but they reduce the rate of virus replication and are hence weeded out. As a result, mutation frequencies don’t accumulate linearly but saturate. The time it takes to saturate and the level at which the frequencies saturate depend on the selection coefficient. We use this dependence to estimate the landscape of fitness costs at almost every site of the HIV-1 genome.
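The difference between linear accumulation and saturation can be made concrete with a standard mutation-selection model. This is a minimal sketch, assuming that a mutation arises at rate mu per site and day and is removed at rate s per day; the function name is ours for illustration and this simple form is not necessarily the exact model fitted in the preprint.

```python
import math


def expected_frequency(t_days, mu, s):
    """Expected frequency of a mutation t_days after infection, starting
    from zero, with mutation rate mu and fitness cost s (both per day)."""
    if s == 0:
        # Neutral site: frequency grows linearly with time.
        return mu * t_days
    # Costly site: frequency saturates at mu / s on a time scale of 1 / s.
    return mu / s * (1.0 - math.exp(-s * t_days))


# Compare a neutral site with a site that carries a 1% per day cost.
for t in (100, 1000, 3000):
    print(t, expected_frequency(t, 1.2e-5, 0.0), expected_frequency(t, 1.2e-5, 0.01))
```

In this model, both the saturation level (mu / s) and the saturation time (about 1 / s) carry information on the fitness cost, which is why frequency trajectories can be used to estimate s once mu is known.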

This graph shows a slightly smoothed landscape of fitness costs in units of 1/day separately for non-synonymous mutations (solid) and synonymous mutations (dashed) for the major genes of HIV-1 (colors). As expected, fitness costs of non-synonymous mutations are a lot larger than those of synonymous mutations (about 50% of nonsyn mutations have costs of 10% or more). But subsets of synonymous mutations are also very costly, in particular in RNA secondary structure rich regions at the 5′ end or in envelope.


Estimating fitness costs requires accurate estimates of mutation frequencies. The accuracy of the latter is limited by small numbers of HIV genomes that enter the sequencing library, amplification biases during PCR, and possibly hitch-hiking effects that bring deleterious alleles to high frequencies. To nevertheless get reasonable estimates of fitness costs at individual sites, we used weighted averages of all sequenced samples that we had available. This is sensible, since the frequencies of deleterious mutations decorrelate rapidly such that different samples from the same patient are approximately independent. By combining multiple samples with weights proportional to the number of genomes contributing to the sample, we generate a meta sample that represents a much larger population. The individual samples are sequenced with an error rate below 0.002 per site and the pooled sample then allows us to estimate frequencies far below this threshold.
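The pooling step amounts to a weighted average. A minimal sketch, assuming each sample is summarized by a mutation frequency and an estimated number of contributing genomes (the function name and the example numbers are hypothetical):

```python
def pooled_frequency(samples):
    """Weighted average of mutation frequencies across samples.

    samples: list of (frequency, n_genomes) pairs, where n_genomes is the
    estimated number of viral genomes that entered the sequencing library.
    """
    total_genomes = sum(n for _, n in samples)
    if total_genomes == 0:
        raise ValueError("no genomes in any sample")
    # Each sample contributes in proportion to its number of genomes.
    return sum(freq * n for freq, n in samples) / total_genomes


# Two made-up samples: the larger one dominates the pooled estimate.
print(pooled_frequency([(0.0, 100), (0.01, 300)]))
```

Weighting by genome number means that a deeply sampled time point contributes more than one where only a few templates made it into the library.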

Why do we care?

We have previously shown that reversion to the consensus is a dominant force in HIV-1 evolution. These reversions are driven by the fitness costs of the non-consensus mutations they remove. The landscape we determined will allow us to look more closely at the driving forces of reversion. Furthermore, the landscape can pinpoint regions of vulnerability and target particular regions with unexpected conservation patterns for follow-up analysis.

On a more general note, fitness landscapes and the distribution of effect sizes of mutations are the most important parameters we need to know in order to decide what kind of population genetics model is appropriate. We have very little knowledge of what these distributions look like for any organism. Our work is one of the first examples where such a landscape has been determined in vivo on a genome-wide scale.

Prediction of antigenic phenotypes of influenza viruses

Influenza viruses evolve rapidly, in part to evade recognition by human antibodies generated during previous infections. Mutations that change antigenic properties are common and rapidly spread through the virus population, making frequent updates of the seasonal influenza vaccine necessary. A close match between the vaccine and the circulating viruses is necessary to ensure vaccine efficiency.

Antigenic change can be detected in hemagglutination inhibition (HI) assays with anti-sera raised against reference and vaccine viruses. Low titers indicate that a virus is antigenically different from the virus used to produce the serum (a reference virus). Members of the WHO Global Influenza Surveillance and Response System perform many HI assays every year to monitor antigenic dynamics of influenza. The results are reported in tables like the one below from John McCauley and colleagues at the Crick Institute.

Each column corresponds to one anti-serum, each row to one virus. Large numbers indicate strong binding. The red values on the diagonal in the upper half highlight homologous titers, that is, titers of a serum against the virus it was raised against.

To explore and visualize such data, Derek Smith and colleagues have developed antigenic cartography, a variant of multi-dimensional scaling that maps titers to distances in two or more Euclidean dimensions. Unfortunately, these 2D projections are difficult to combine with the genome sequences of the corresponding influenza viruses, a type of data that is becoming ever more abundant.

Integration of HI data with sequence data

Together with Boris Shraiman, Trevor Bedford, Colin Russell and Rod Daniels, we have developed models and visualizations to directly integrate HI titer data with influenza virus sequences and phylogenies. This work was published in PNAS this week. Our models infer antigenic distance as additive contributions of branches in the tree or similarly as illustrated below:


The titer distance is modelled either as a sum of terms on the path between virus a and b (used to raise the serum), or as a sum of contributions associated with amino acid differences between the sequences. Both models are similar and describe the data well; for details, have a look at the paper.
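The tree version of the model can be sketched as follows. This is a simplified illustration, assuming the tree is given as a child-to-parent dictionary and each branch (identified by its child node) has an already inferred antigenic contribution; the names and numbers are hypothetical and the inference of the contributions from HI data is not shown.

```python
def path_to_root(node, parent):
    """Nodes from `node` up to (but excluding) the root; each node stands
    for the branch that connects it to its parent."""
    path = []
    while node in parent:
        path.append(node)
        node = parent[node]
    return path


def predicted_titer_distance(virus_a, virus_b, parent, branch_effect):
    """Sum of branch contributions on the tree path between two viruses."""
    up_a = set(path_to_root(virus_a, parent))
    up_b = set(path_to_root(virus_b, parent))
    # Branches above the common ancestor are shared by both paths and drop out.
    on_path = up_a ^ up_b
    return sum(branch_effect.get(node, 0.0) for node in on_path)


# Toy tree: root -> n1 -> (A, B) and root -> C, with made-up contributions.
parent = {"n1": "root", "A": "n1", "B": "n1", "C": "root"}
branch_effect = {"n1": 2.0, "A": 0.5, "B": 1.0, "C": 0.0}
print(predicted_titer_distance("A", "B", parent, branch_effect))
print(predicted_titer_distance("A", "C", parent, branch_effect))
```

The substitution version of the model works the same way, except that the sum runs over amino acid differences between the two sequences instead of over branches.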

Visualization of antigenic data on the tree

We used the models learned from the HI titer data to allow interactive exploration and visualization of measured and predicted titers within nextflu. will continue to be updated, while will display the full data set available last summer.


Color indicates antigenic distance from the focal reference virus A/Victoria/361/2011 marked by the red cross-hair. The model on the right interpolates and smooths the data. The focal reference virus can be changed by clicking with the mouse on any other sera marked by grey boxes, upon which the tree coloring will be updated.

Predicting successful viruses

The ultimate goal of this project is to improve predictions of the composition of future influenza virus populations to optimize the vaccine match. We and others have developed methods to make such predictions solely based on sequence information. Here, we showed that successful strains tend to be antigenically advanced, but that blindly picking the most antigenically advanced strain often fails. To improve predictions, we need to find ways to integrate antigenic information with other predictors of successful clades and carefully account for competition between clades.

Many aspects of intrapatient HIV-1 evolution are predictable

Within HIV infected individuals, the human immune system tries to prevent virus replication while HIV continuously changes to avoid recognition by the immune system. The resulting evolution of the virus population has become a paradigmatic example of rapidly adapting populations. We just published a paper that provides unprecedented insight into intrapatient HIV evolution. This work is the product of an extremely enjoyable collaboration involving Fabio Zanini from our group here in Tuebingen and the group of Jan Albert at the Karolinska Institute in Stockholm.

Whole genome deep sequencing

Our aim in this study was to provide a comprehensive assessment of the evolutionary dynamics unfolding within the body of HIV infected people. We developed a strategy to sequence the entire virus genome such that even rare mutations are accurately represented in our data set. The impressive sample collections in Sweden and the generous participation of patients in this study allowed us to follow HIV evolution densely in multiple patients. We developed an interactive web application that allows users to explore HIV evolutionary dynamics and access the data in a convenient way.

What did we find?

Mutations occur at random, while selection for replication weeds out harmful mutations and amplifies useful ones. The common conception is that mutation rates are low and/or useful mutations are rare. Furthermore, biological reality is complicated and predicting what might be a useful mutation seems hopeless. In HIV, however, we find a high degree of reproducibility and predictability, indicating that “finding the right mutation” is not a rare, fortunate event for HIV but rather a fast and reliable mechanism of survival.

The predictability extends to single positions in the genome. More than 20% of sites that are globally unconserved (such as sites at which mutations are synonymous) are measurably diverse after a few years within each patient. This diversity grows continuously, with few signs of loss of diversity through genetic drift or hitch-hiking with beneficial mutations. This implies a large population that systematically explores sequence space. In contrast, at conserved sites, we observe next to no diversity, indicating efficient selection against deleterious variants.

We can not only predict where mutations accumulate because they are tolerated, but also where they spread because they help the virus. By looking specifically at sites where the virus population was initially different from the majority of HIV sequences known, we found that the virus has a strong tendency to come back to the global consensus state. 30% of all substitutions occur at the 5% of sites where the initial virus differed from this consensus and represent reversions. The tendency to revert to this global attractor is stronger at sites that are globally more conserved. Within the diversity of HIV-1, this attractor seems universal. The picture on the right shows the rate of evolution (divergence after 6 years) separately for sites that can revert and sites already in the consensus state. At the globally most conserved sites, about 50% of all non-consensus positions revert to consensus after 5 years, a roughly 1000 fold excess over evolution away from consensus. We also found that reversions happen not only soon after infection but all along, for many years.
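To get a feeling for the first pair of numbers (30% of substitutions at 5% of sites), the per-site excess can be worked out with a line of arithmetic. This is just a back-of-the-envelope sketch averaged over all sites; the function name is ours and the result is not a number reported in the paper.

```python
def per_site_enrichment(frac_substitutions, frac_sites):
    """Ratio of the substitution density inside a class of sites to the
    density at all remaining sites."""
    inside = frac_substitutions / frac_sites
    outside = (1.0 - frac_substitutions) / (1.0 - frac_sites)
    return inside / outside


# 30% of substitutions at 5% of sites: about 8 times more substitutions
# per site than at sites that started in the consensus state.
print(round(per_site_enrichment(0.30, 0.05), 1))
```

This average is much smaller than the excess at the most conserved sites because it lumps together sites of all conservation levels.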

What does this mean?

Our data are consistent with HIV as a large population that systematically explores a mostly universal fitness landscape and returns to a favoured state when possible. The reproducible patterns of evolution are only possible since HIV recombines extensively within patients. Without recombination, it would be much more difficult for the virus population to simultaneously revert and escape in different regions of the genome, as these mutations would interfere with each other as they spread. Sweeping of adaptive mutations would wipe out diversity and the reproducible patterns of mutation accumulation. The reproducibility of minor variation further suggests that the fitness costs of individual mutations are similar among unrelated viruses and explains why inference of fitness landscapes of HIV from cross-sectional data is possible.

visualizes antigenic evolution of influenza viruses

In a new preprint by Trevor Bedford, Rodney Daniels, Colin Russell, Boris Shraiman, and myself, we show how antigenic evolution can be integrated with the phylogenetic tree of HA sequences.

Human immunity is the main driving force of the evolution of seasonal influenza viruses. Only by changing their antigenic properties can influenza viruses re-infect previously infected humans. Mutations that prevent immune recognition often rapidly spread across the globe. Human antibodies typically recognize the tip of the HA trimer on the surface of the influenza virus, and the amino acids at those positions change very often through time.

Mutations with antigenic effects highlighted on a HA structure

The antigenic properties of circulating influenza viruses are constantly monitored by the national influenza centres and the WHO collaborating centres for influenza using hemagglutination inhibition assays. These assays basically record how much an antiserum (obtained from a ferret) can be diluted and still recognise a virus: if the virus has changed antigenically relative to the serum, even high serum concentrations don’t inhibit the virus.

Results from such assays come in large tables of numbers quite removed from the molecular evolution of the HA protein. In the preprint, we show that antigenic distances behave similarly to sequence distances on the tree and can be explained as a sum of contributions associated with amino acid substitutions or contributions on branches of the tree.

To visualize antigenic evolution, we integrated HI titer data (mostly from the Worldwide Influenza Centre in London, generated by John McCauley, Rodney Daniels, and colleagues) into nextflu. The site allows you to select a particular reference virus or vaccine strain and will color all viruses on the tree according to their measured or predicted antigenic similarity to the reference virus, see screenshot below. This integration makes it easy to associate antigenic changes with genotypic changes and the dynamics of the corresponding clades.

In the manuscript, we further investigate to what extent antigenic changes are predictive of the composition of future influenza populations. Successful clades tend to be antigenically advanced, but a substantial fraction of antigenic advances fail to spread. The appeal of HI data in the context of prediction is the early detection of a novel variant that is suboptimally recognized by the vaccine. However, many such “hopefuls” go extinct, and reliable prediction using HI data has to find ways to differentiate antigenic changes that are likely to be successful from those that compromise virus functionality.