In a new preprint by Trevor Bedford, Rodney Daniels, Colin Russell, Boris Shraiman, and myself, we show how antigenic evolution can be integrated with the phylogenetic tree of HA sequences.
Human immunity is the main driving force of seasonal influenza virus evolution. Only by changing their antigenic properties can influenza viruses re-infect previously infected humans. Mutations that prevent immune recognition often spread rapidly across the globe. Human antibodies typically recognize the tip of the HA trimer on the surface of the influenza virus, and the amino acids at those positions change frequently over time.
The antigenic properties of circulating influenza viruses are constantly monitored by the national influenza centres and the WHO collaborating centres for influenza using hemagglutination inhibition (HI) assays. These assays essentially record how far an antiserum (obtained from a ferret) can be diluted and still recognise a virus: if the virus has changed antigenically relative to the serum, even high serum concentrations don’t inhibit the virus.
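To make the readout concrete, here is a minimal Python sketch. It assumes the common convention of reporting an HI titer as the reciprocal of the highest serum dilution that still inhibits hemagglutination, and of summarizing antigenic change as the log2 ratio between the titer against the serum's own (homologous) virus and the titer against a test virus. The function names and the dilution series are hypothetical, chosen for illustration.

```python
import math

def hi_titer(dilutions, inhibited):
    """Return the reciprocal of the highest dilution that still inhibits, or None."""
    # Hypothetical helper: `dilutions` are reciprocal dilutions (10 means 1:10),
    # `inhibited` flags whether the serum blocked hemagglutination at that dilution.
    inhibiting = [d for d, ok in zip(dilutions, inhibited) if ok]
    return max(inhibiting) if inhibiting else None

def log2_titer_drop(homologous_titer, test_titer):
    """Antigenic change in units of twofold dilutions (log2 ratio of titers)."""
    return math.log2(homologous_titer / test_titer)

# Made-up twofold dilution series starting at 1:10.
dilutions = [10 * 2**k for k in range(8)]                 # 10, 20, ..., 1280
homologous = hi_titer(dilutions, [True] * 7 + [False])    # inhibits up to 1:640
drifted = hi_titer(dilutions, [True] * 4 + [False] * 4)   # inhibits only up to 1:80
print(homologous, drifted, log2_titer_drop(homologous, drifted))
```

With a homologous titer of 640 and a titer of 80 against the drifted virus, the drop amounts to three twofold dilutions.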
Results from such assays come as large tables of numbers, far removed from the molecular evolution of the HA protein. In the preprint, we show that antigenic distances behave similarly to sequence distances on the tree and can be explained as a sum of contributions associated either with amino acid substitutions or with branches of the tree.
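The additive picture can be sketched in a few lines of Python. This is a simplified illustration, not the model fitted in the preprint: it assumes that the titer drop between a test virus and a reference serum is simply the sum of effects on the branches separating them in the tree, or alternatively the sum of effects of the amino acid differences between their HA sequences, and it ignores any virus- or serum-specific offsets a real fit would likely need. All names and effect sizes are hypothetical.

```python
def branches_to_root(node, parent):
    """Nodes whose branch (to their parent) lies on the path from `node` to the root."""
    path = []
    while node in parent:
        path.append(node)
        node = parent[node]
    return path

def tree_model_drop(virus, reference, parent, branch_effect):
    """Sum of hypothetical branch effects on the tree path between virus and reference."""
    up_virus = branches_to_root(virus, parent)
    up_ref = branches_to_root(reference, parent)
    # Branches above the common ancestor appear on both paths and are not crossed.
    shared = set(up_virus) & set(up_ref)
    return sum(branch_effect[n] for n in up_virus + up_ref if n not in shared)

def substitution_model_drop(seq_virus, seq_ref, substitution_effect):
    """Sum of effects of amino acid differences; keys are (position, {aa1, aa2})."""
    total = 0.0
    for pos, (a, b) in enumerate(zip(seq_ref, seq_virus), start=1):
        if a != b:
            # Substitutions without an assigned effect are assumed to be neutral.
            total += substitution_effect.get((pos, frozenset((a, b))), 0.0)
    return total

# Toy tree: root -> A -> {A1, A2}, root -> B; effects in log2 titer units.
parent = {"A": "root", "B": "root", "A1": "A", "A2": "A"}
branch_effect = {"A": 1.0, "B": 0.5, "A1": 2.0, "A2": 0.0}
print(tree_model_drop("A1", "A2", parent, branch_effect))  # crosses A1 and A2 only
print(tree_model_drop("A1", "B", parent, branch_effect))   # crosses A1, A, and B
```

Keying substitution effects by an unordered pair of amino acids makes the toy model symmetric, so swapping virus and reference gives the same distance.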
To visualize antigenic evolution, we integrated HI titer data (mostly from the Worldwide Influenza Centre in London, generated by John McCauley, Rodney Daniels, and colleagues) into nextflu. The site allows users to select a particular reference virus or vaccine strain and colors all viruses on the tree according to their measured or predicted antigenic similarity to that reference; see the screenshot below. This integration makes it easy to associate antigenic changes with genotypic changes and with the dynamics of the corresponding clades.
In the manuscript, we further investigate to what extent antigenic changes predict the composition of future influenza populations. Successful clades tend to be antigenically advanced, but a substantial fraction of antigenic advances fail to spread. The appeal of HI data for prediction is the early detection of novel variants that are suboptimally recognized by the vaccine. However, many such “hopefuls” go extinct, and reliable prediction using HI data has to find ways to distinguish antigenic changes that are likely to succeed from those that compromise virus functionality.
In November, Boris Shraiman, Colin Russell, and I published a paper on predicting evolution. The method uses the shape of genealogical trees to spot expanding clades that are likely to dominate the future population. We demonstrated the method on historical data of seasonal influenza A/H3N2 evolution and predicted the majority of years well. However, those “predictions” were for past events. It is time to predict the future!
The dynamics of A/H3N2 viruses over the last 3 years are illustrated in the figure above. Two big clades (3C2 and 3C3 in WHO nomenclature) emerged almost 3 years ago and have dominated A/H3N2 since. Of these, 3C3 is the larger one and dominated the 2013/2014 season. In the past 8 months, however, 3C2 has come back to life, evolving a new subclade, 3C2a, while 3C3 has evolved a new subclade of its own, 3C3a. Importantly, these subclades differ from their precursors at critical amino acids of HA1 (F159Y in 3C2a, shown in green; F159S in 3C3a, shown in red) and are far from the current vaccine strain for the northern hemisphere (lower green dot). The updated H3N2 vaccine choice for the southern hemisphere (upper green dot) belongs to 3C3a. The question now is whether 3C3a or 3C2a will take over.
The interactive tool allows users to color the tree by our prediction, based on the local branching index we developed. Given the data up to 5 January 2015, our algorithm predicts that 3C2a will expand and dominate the future population (see the screenshot below). To be concrete, the method predicts that viruses from 3C2a will be the dominant progenitors of the 2015/2016 northern hemisphere winter epidemic.
Our paper (with Colin Russell and Boris Shraiman) on predicting evolution (see also this earlier post) has now appeared in eLife. The method we developed predicts the fitness of individuals, which in turn allows us to predict likely progenitor sequences of future populations. Our method calculates a posterior distribution of fitness for each individual, and the resulting ranking reliably predicts the ancestors of future populations of seasonal influenza. Since my previous post on this topic, we have developed a simple and robust approximation to the probabilistic model. This approximation is motivated by the intuitive insight that a very fit individual can develop into a growing clone, and such an expanding subpopulation results in rapid branching in the tree. From this approximation, we developed a simple measure, the local branching index (LBI), that predicts future success almost as well as the full fitness inference algorithm. The LBI measures tree length in the neighborhood of each node in the tree, where the neighborhood is defined via a weighting function that decreases exponentially with distance from the focal node. This is illustrated in the figure below. The only parameter of the LBI is the size of the relevant neighborhood, i.e., the radius of the shaded region in the illustration.
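Because the weighting is exponential, the LBI of every node can be computed with one pass up and one pass down the tree. The Python below is a sketch of one such scheme, written for illustration rather than taken from our code: it assumes a rooted tree given as a dict of children and a dict of branch lengths to the parent, and weights every point of the tree at distance d from the focal node by exp(-d/tau), with `tau` standing for the neighborhood size. A branch of length l entered at its near end contributes tau*(1 - exp(-l/tau)), and everything beyond it is discounted by exp(-l/tau).

```python
import math

def local_branching_index(children, branch_length, root, tau):
    """Exponentially weighted tree length around every node of a rooted tree."""
    # Pre-order traversal (every node before its descendants) with an explicit stack.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    def branch_weight(node):
        # Integral of exp(-x / tau) along the branch above `node`, entered at one end.
        return tau * (1.0 - math.exp(-branch_length[node] / tau))

    # up[c]: weighted length of the branch above c plus c's subtree, seen from c's parent.
    up = {}
    for node in reversed(order):
        if node == root:
            continue
        decay = math.exp(-branch_length[node] / tau)
        up[node] = branch_weight(node) + decay * sum(up[k] for k in children.get(node, []))

    # down[c]: weighted length of everything outside c's subtree (including the
    # branch above c), seen from c.
    down = {root: 0.0}
    for node in order:
        kids = children.get(node, [])
        total_up = sum(up[k] for k in kids)
        for child in kids:
            decay = math.exp(-branch_length[child] / tau)
            siblings_and_above = down[node] + total_up - up[child]
            down[child] = branch_weight(child) + decay * siblings_and_above

    # LBI of a node: everything reachable upward plus everything below it.
    return {n: down[n] + sum(up[k] for k in children.get(n, [])) for n in order}

# Hypothetical toy tree with branch lengths to the parent.
children = {"root": ["a", "b"], "a": ["a1", "a2"]}
branch_length = {"a": 1.0, "b": 2.0, "a1": 0.5, "a2": 0.5}
print(local_branching_index(children, branch_length, "root", tau=0.5))
```

As `tau` grows, every node's LBI approaches the total tree length; as it shrinks, only the branches immediately adjacent to a node matter.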
Why does the LBI predict fitness?
The full fitness inference algorithm multiplies the propagators for each branch to obtain the posterior distribution of fitness. The propagator of each additional downstream branch pushes the posterior towards higher fitness, and the amount of this additional polarization increases with the length of the subtree of the branch. This polarizing effect of tree length is, however, forgotten over long times. The fitness propagators account for this loss of memory through equilibration to the stationary distribution of ancestral fitness. In a similar fashion, the LBI focuses explicitly on local tree length by weighting it exponentially with distance.
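One way to write this weighting down, in notation introduced here (τ is the neighborhood size, 𝒯 the tree, and d(i, x) the distance along the tree from node i to a point x), is given below, together with the contribution of a single branch of length ℓ whose nearer end lies at distance a from node i:

```latex
\lambda_i = \int_{\mathcal{T}} e^{-d(i,x)/\tau}\,dx ,
\qquad
\int_{a}^{a+\ell} e^{-x/\tau}\,dx = \tau\, e^{-a/\tau}\left(1 - e^{-\ell/\tau}\right) \le \tau\, e^{-a/\tau}
```

Tree length at distance a is thus discounted by e^{-a/τ}, and even an arbitrarily long branch contributes at most τe^{-a/τ}; this bounded, decaying influence plays the role that equilibration plays for the fitness propagators.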
Connection to multiple-merger coalescents
Together with Oskar Hallatschek, we have shown earlier that the genealogical trees of rapidly adapting populations contain approximate multiple mergers and that these genealogies are asymptotically described by the Bolthausen-Sznitman coalescent. Brunet et al. have linked these approximate multiple mergers to rare events in which one individual happens to be substantially fitter than all others. This individual then generates a burst of offspring, which looks like a multiple merger in a reconstructed tree. The LBI measures the “burstiness” of the branching at a node: for a given number of offspring, the LBI is maximal for a star-like tree (i.e., a multiple merger) and minimal for a binary merger with no other branching happening for a long time.
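A small numerical comparison illustrates this. The sketch below, an illustration rather than our code, computes the exponentially weighted tree length below a node, which equals the LBI of a root node since nothing lies above it. It compares two hypothetical trees with four tips at the same depth: a star, and a binary merger whose two lineages stay unbranched for most of that depth. The tree shapes and the value of `tau` are assumptions chosen for the example.

```python
import math

def weighted_length_below(children, branch_length, node, tau):
    """Exponentially weighted tree length of the subtree below `node`."""
    total = 0.0
    for child in children.get(node, []):
        ell = branch_length[child]
        # The branch itself, integrated from the node outward...
        total += tau * (1.0 - math.exp(-ell / tau))
        # ...plus everything below the child, discounted by the branch length.
        total += math.exp(-ell / tau) * weighted_length_below(children, branch_length, child, tau)
    return total

# Star (multiple merger): four tips at depth 1 attached directly to the node.
star_children = {"r": ["t1", "t2", "t3", "t4"]}
star_lengths = {t: 1.0 for t in star_children["r"]}

# Binary merger: two lineages unbranched for 0.9, splitting only near the tips.
binary_children = {"r": ["x", "y"], "x": ["t1", "t2"], "y": ["t3", "t4"]}
binary_lengths = {"x": 0.9, "y": 0.9, "t1": 0.1, "t2": 0.1, "t3": 0.1, "t4": 0.1}

tau = 0.5
star = weighted_length_below(star_children, star_lengths, "r", tau)
binary = weighted_length_below(binary_children, binary_lengths, "r", tau)
print(f"star: {star:.3f}, binary merger: {binary:.3f}")
```

For this choice of `tau`, the star's value comes out at about twice that of the binary merger, because all of its tree length sits right next to the focal node.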