Inferring HIV escape rates

We have a new preprint on the arXiv (here on Haldane’s sieve). This work is the result of a collaboration between us and Alan Perelson (LANL) and explores methods to estimate parameters of the interaction between HIV and the immune system from time-resolved sequence data. The paper focuses on early infection, which is dominated by a few rapid substitutions that fix because they prevent or reduce recognition of infected cells by the immune system via cytotoxic T-lymphocytes (CTL). CTL escape is one of the fastest instances of evolution I have come across: four to six mutations spread within a few weeks. It happens in most HIV infections and is partly predictable from the HLA genotype of the infected person. These substitutions are so rapid that they overlap in time, so clonal interference has to be modeled. Our method fits a reduced model of clonal interference to the typically very sparse data and thereby estimates the selection coefficients, also known as escape rates.
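For intuition about what an escape rate is, consider a single mutation sweeping deterministically: its frequency follows x(t) = 1/(1 + exp(-eps (t - t0))), so logit(x) = log(x/(1 - x)) rises linearly in time with slope eps. The sketch below fits that slope by least squares to sampled mutant counts. It is the naive single-locus estimate, not the multi-locus clonal interference method of the preprint; the function name and the pseudocount (used to keep the logit finite when a sample has zero or only mutants) are assumptions for illustration.

```python
import math


def logit(x):
    return math.log(x / (1.0 - x))


def estimate_escape_rate(times, mutant_counts, sample_sizes, pseudocount=0.5):
    """Hypothetical single-locus escape-rate estimate.

    Assumes a logistic sweep, for which logit(frequency) is linear in time
    with slope eps. Returns the least-squares slope (per unit of `times`).
    """
    freqs = [(k + pseudocount) / (n + 2 * pseudocount)
             for k, n in zip(mutant_counts, sample_sizes)]
    ys = [logit(f) for f in freqs]
    t_mean = sum(times) / len(times)
    y_mean = sum(ys) / len(ys)
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    var = sum((t - t_mean) ** 2 for t in times)
    return cov / var
```

With hypothetical samples of 1, 5 and 9 mutants out of 10 sequences at days 0, 10 and 20 (and no pseudocount), the estimate is ln(9)/10, about 0.22 per day. With only a handful of sequences per time point, such single-locus estimates are noisy, which is what motivates a joint fit across loci.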

Why do we want to know these numbers?
The number of viruses in the blood of an infected person peaks 2-3 weeks after infection and thereafter drops by 2-3 orders of magnitude. This drop is partly due to a response by the adaptive immune system. However, it has proved difficult to attribute the drop to specific parts of the immune response. The rate at which a mutation sweeps through the population tells us about the pressure exerted by the T-cell clones that target the epitope containing that mutation.

How do we do it?
Early in infection, the viral population is large and selection is strong. Under these conditions, recombination is of minor importance, since most double, triple, and higher-order mutants are produced more efficiently by recurrent mutation than by recombination. Mutations therefore accumulate sequentially, each arising on a background that already carries all previous mutations. The time at which a novel mutation appears is thus tightly constrained by the trajectory of the preceding genotype. These constraints regularize the fitting problem to some degree, which makes multi-locus fitting more robust than single-locus fitting.
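A deterministic caricature of this sequential picture fits in a few lines of Python. It is a sketch under stated assumptions, not the model fitted in the preprint: genotype k carries the first k escape mutations, its growth advantage is the sum of the first k escape rates, genotype k arises only by mutation of genotype k - 1 at a hypothetical rate mu, and recombination and stochastic establishment are ignored. Function names and parameter values are illustrative.

```python
def simulate_sequential_escape(escape_rates, mu=1e-5, t_max=200.0, dt=0.01):
    """Deterministic sketch of sequential CTL escape (hypothetical parameters).

    Genotype k carries the first k escape mutations; its growth advantage over
    the founder is the sum of the first k escape rates. Genotype k arises only
    by mutation (rate mu) of genotype k - 1. Returns time points and genotype
    frequencies at each time point.
    """
    n_geno = len(escape_rates) + 1
    fitness = [0.0]
    for eps in escape_rates:
        fitness.append(fitness[-1] + eps)
    x = [1.0] + [0.0] * (n_geno - 1)
    times, traj = [0.0], [x[:]]
    for step in range(1, int(round(t_max / dt)) + 1):
        mean_fitness = sum(f * xk for f, xk in zip(fitness, x))
        dx = []
        for k in range(n_geno):
            selection = (fitness[k] - mean_fitness) * x[k]
            inflow = mu * x[k - 1] if k > 0 else 0.0
            outflow = mu * x[k] if k < n_geno - 1 else 0.0
            dx.append(selection + inflow - outflow)
        # Forward Euler step; renormalize to absorb rounding errors.
        x = [max(xk + dt * d, 0.0) for xk, d in zip(x, dx)]
        total = sum(x)
        x = [xk / total for xk in x]
        times.append(step * dt)
        traj.append(x[:])
    return times, traj


def sweep_times(times, traj, threshold=0.5):
    """First time at which carriers of mutation i (genotypes k >= i) reach threshold."""
    result = []
    for i in range(1, len(traj[0])):
        hit = next((t for t, x in zip(times, traj) if sum(x[i:]) >= threshold), None)
        result.append(hit)
    return result
```

For example, `sweep_times(*simulate_sequential_escape([0.3, 0.2, 0.1]))` returns ordered sweep times. Carriers of mutation i are always a subset of carriers of mutation i - 1, so each sweep can only take off once the genotype it arises on has become common.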

What do we learn about evolution in general?
In addition to the intrinsic interest in the HIV/CTL interaction, CTL escape is an ideal setting to study rapidly evolving populations. This evolution happens in its “natural” habitat, and both the selective pressure and the functional consequences of the observed molecular changes can be quantified via immunological data, protein structure, and replication assays. In addition, we have ample cross-sectional data (HIV sequences from many different patients) that allow us to look at the prevalence of escape mutations and potential compensatory mutations. None of this is done in this paper, but HIV/immune-system coevolution is a fascinating showcase of rapid evolution.

Arxiv: Coalescence in sexual populations under selection

Update: the paper is now published.

A few days ago, I uploaded a revision of our recent manuscript (with Taylor Kessinger and Boris Shraiman) on genetic diversity in sexual populations under selection. I would like to elaborate a little bit on what I think is remarkable about our results.

Why is it important?
It is common these days to sequence multiple individuals from a population and analyze the genetic diversity in the sample to learn something about the demographic and evolutionary past. To infer the past from diversity data, we need to know how diversity depends on the parameters and processes we are interested in. This link typically comes from the analysis of simple models. The predominant framework used for this purpose is the neutral coalescent, which is often used as a null model to detect selection. This strategy, looking for outliers in a mostly neutral genome, seemed sensible at a time when the great majority of polymorphisms were thought to be neutral. If, however, the majority of polymorphisms are under some form of selection, we need a new null model that lets us detect adaptations of particular interest against a background of polymorphisms that, while not neutral, have weak or fluctuating effects. Our manuscript aims to deliver such a null model. In contrast to previous analyses that focused on mutations with strong effects (background selection or hitch-hiking), we analyze a model in which a large number of weakly selected polymorphisms generate fitness diversity in a sexual population. We find that the properties of neutral diversity interpolate smoothly between the neutral limit (drift dominated) and the limit of strong selection (draft dominated). The crossover between the two regimes happens when fitness differences between haplotypes are comparable to the inverse population size. The length of haplotype blocks (the extent of linkage disequilibrium, LD) and the diversity are determined self-consistently and depend on the fitness variance per map length, but only weakly on the population size. To determine where a population sits on the continuum between the drift- and draft-dominated regimes, it is informative to analyze the site frequency spectrum, which changes qualitatively between the regimes.

How did we address it?
In sexual populations, crossing over reshuffles alleles, which results in linkage equilibrium and independent histories of loci at large distances. The histories of tightly linked loci, however, remain correlated, and very close loci behave as if they were asexual. These different degrees of linkage interact with selection in complicated ways. Our approach was to identify the length of blocks that behave more or less asexually over the time to the most recent common ancestor at the locus, calculate the fitness variation within those blocks, and map the problem onto results for coalescence with selection in asexual populations.

The latter problem has been addressed by Oskar Hallatschek and me. We showed that in asexual populations with substantial selected diversity, coalescence and genetic diversity are not described by the Kingman (standard neutral) coalescent, but resemble the Bolthausen-Sznitman coalescent (BSC), at least in the limit of large populations. Michael Desai, Aleksandra Walczak and Daniel Fisher reached similar conclusions.
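To see how the two genealogies leave different footprints, one can simulate them and compute the expected polarized site frequency spectrum from branch lengths. The sketch below is an illustration, not code from either paper. It uses the standard Kingman rate b(b-1)/2 for pairwise mergers among b lineages and the standard Bolthausen-Sznitman rates, under which the total rate of events merging k of b lineages is b/(k(k-1)), summing to b - 1. The function names are hypothetical, and the spectrum is a Monte Carlo estimate, so it carries sampling noise.

```python
import random


def bsc_merger_probs(b):
    """Total BSC event rate for b lineages and P(event merges k lineages), k = 2..b."""
    rates = [b / (k * (k - 1)) for k in range(2, b + 1)]
    total = sum(rates)  # equals b - 1
    return total, [r / total for r in rates]


def expected_sfs(n, model="kingman", n_reps=2000, seed=0):
    """Monte Carlo estimate of the normalized expected SFS for i = 1..n-1.

    Entry i-1 is the mean total branch length subtending i of the n sampled
    leaves, which is proportional to the expected number of derived alleles
    at frequency i/n under the infinite-sites model.
    """
    rng = random.Random(seed)
    sfs = [0.0] * (n - 1)
    for _ in range(n_reps):
        lineages = [1] * n  # number of sampled leaves below each lineage
        while len(lineages) > 1:
            b = len(lineages)
            if model == "kingman":
                total_rate, k = b * (b - 1) / 2, 2
            elif model == "bsc":
                total_rate, probs = bsc_merger_probs(b)
                k = rng.choices(range(2, b + 1), weights=probs)[0]
            else:
                raise ValueError(f"unknown model: {model}")
            dt = rng.expovariate(total_rate)
            for leaves in lineages:
                sfs[leaves - 1] += dt
            # Merge k randomly chosen lineages into one.
            rng.shuffle(lineages)
            lineages = lineages[k:] + [sum(lineages[:k])]
    total = sum(sfs)
    return [x / total for x in sfs]
```

Under Kingman the expected spectrum is proportional to 1/i, so comparing `expected_sfs(20, "kingman")` with `expected_sfs(20, "bsc")` shows how multiple mergers redistribute branch length across frequency classes. Only the shape is meaningful here: the two models use different time units, and normalization removes them.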

What’s next?
It is common to define an “effective population size”, Ne, via the distance between pairs of haplotypes and hope that a neutral model with this Ne explains other features of genetic diversity. This rarely works. Furthermore, Ne depends strongly on crossover rates, functional density (purifying selection), etc. The one quantity Ne is only weakly correlated with is the census population size. Our results link genetic diversity (I refuse to call it Ne) to parameters such as mutation rates, crossover rates, and effect distributions of mutations. The predictions should be applicable whenever there are many polymorphisms within a linkage block, which is likely the case in facultative outcrossers or low recombination regions of obligate outcrossers.
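For concreteness, the conventional pairwise definition works like this: compute the mean pairwise difference pi between sampled haplotypes and invert the neutral expectation pi = 4 Ne mu for diploids (2 Ne mu for haploids). The sketch assumes aligned sequences of equal length without missing data; the helper names are hypothetical.

```python
from itertools import combinations


def mean_pairwise_diversity(sequences):
    """Average fraction of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    diffs = [sum(a != b for a, b in zip(s1, s2)) / length for s1, s2 in pairs]
    return sum(diffs) / len(diffs)


def pairwise_ne(pi, mu, ploidy=2):
    """'Effective population size' implied by the neutral relation pi = 2 * ploidy * Ne * mu."""
    return pi / (2 * ploidy * mu)
```

For example, pi = 0.01 and mu = 1e-8 per site per generation give Ne = 250,000 for a diploid. The number summarizes pairwise distances only; nothing in the calculation ties it to the census population size.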

When analyzing resequencing data, it should be possible to use the polarized site frequency spectrum to determine whether diversity is dominated by drift or draft. In the draft regime, heterozygosity should be proportional to the square root of rho/(mu s^2), where rho is the crossover rate, mu is the mutation rate, and s^2 is the average squared effect of mutations.
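Because only a proportionality is stated, the prefactor is unknown, but ratios between parameter sets are still informative. The hypothetical helper below compares two parameter sets, assuming that heterozygosity in the draft regime scales as sqrt(rho/(mu s^2)).

```python
import math


def draft_heterozygosity_ratio(rho, mu, s2, rho_ref, mu_ref, s2_ref):
    """Ratio H / H_ref, assuming H is proportional to sqrt(rho / (mu * s2)).

    The unknown prefactor cancels in the ratio.
    """
    return math.sqrt((rho / (mu * s2)) / (rho_ref / (mu_ref * s2_ref)))
```

For example, doubling the crossover rate raises heterozygosity by a factor of sqrt(2), about 1.41, while doubling the typical effect size (so that s^2 grows fourfold) halves it. Population size does not enter at all, in line with the weak dependence on population size in the draft regime.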