Prediction of antigenic phenotypes of influenza viruses

Influenza viruses evolve rapidly, in part to evade recognition by human antibodies generated during previous infections. Mutations that change antigenic properties are common and spread rapidly through the virus population, making frequent updates of the seasonal influenza vaccine necessary. A close match between the vaccine and the circulating viruses is necessary to ensure vaccine efficacy.

Antigenic change can be detected in hemagglutination inhibition (HI) assays with anti-sera raised against reference and vaccine viruses. Low titers indicate that a virus is antigenically different from the virus used to produce the serum (a reference virus). Members of the WHO Global Influenza Surveillance and Response System perform many HI assays every year to monitor antigenic dynamics of influenza. The results are reported in tables like the one below from John McCauley and colleagues at the Crick Institute.

Each column corresponds to one anti-serum, each row to one virus. Large numbers indicate strong binding. The red values on the diagonal in the upper half highlight homologous titers, that is, titers of a serum against the virus it was raised against.
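To make the table easier to read, titers are often expressed relative to the homologous titer on a log2 scale, since HI assays use serial two-fold dilutions. The snippet below is a minimal sketch of that convention; the function name and the titer values are made up for illustration and are not taken from the table above.

```python
import math

def log2_titer_drop(titer, homologous_titer):
    """Number of two-fold dilutions by which a titer falls below the homologous titer."""
    return math.log2(homologous_titer / titer)

# Hypothetical titers of one serum against three viruses (illustrative numbers only).
homologous = 1280
titers = {"reference virus": 1280, "similar virus": 640, "drifted virus": 80}

# A drop of 0 means the virus looks like the reference; larger drops mean
# the serum recognizes the virus less well.
distances = {name: log2_titer_drop(t, homologous) for name, t in titers.items()}
for name, drop in distances.items():
    print(f"{name}: {drop:.0f} log2 units below the homologous titer")
```

On this scale, each unit corresponds to one two-fold dilution step.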

To explore and visualize such data, Derek Smith and colleagues have developed antigenic cartography, a variant of multi-dimensional scaling that maps titers to distances in two or more Euclidean dimensions. Unfortunately, these 2D projections are difficult to combine with the genome sequences of the corresponding influenza viruses, a type of data that is becoming ever more abundant.

Integration of HI data with sequence data

Together with Boris Shraiman, Trevor Bedford, Colin Russell and Rod Daniels, we have developed models and visualizations to directly integrate HI titer data with influenza virus sequences and phylogenies. This work was published in PNAS this week. Our models infer antigenic distance as a sum of additive contributions, associated either with branches in the tree or with amino acid substitutions, as illustrated below:


The titer distance is modelled either as a sum of terms on the path in the tree between virus a and virus b (the virus used to raise the serum), or as a sum of contributions associated with amino acid differences between the two sequences. Both models are similar and describe the data well; for details, have a look at the paper.
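The two additive models can be sketched in a few lines. This is a toy illustration rather than the code used in the paper: the tree, the per-branch contributions, the sequences, and the substitution effects are all hypothetical, and the published model may contain further terms that are not shown here.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root (inclusive)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def tree_distance(a, b, parent, contribution):
    """Sum per-branch contributions on the tree path between a and b.
    Each branch is identified by the node at its lower (child) end."""
    ancestors_b = set(path_to_root(b, parent))
    total = 0.0
    mrca = None
    # Climb from a until we reach the most recent common ancestor of a and b.
    for node in path_to_root(a, parent):
        if node in ancestors_b:
            mrca = node
            break
        total += contribution.get(node, 0.0)
    # Climb from b up to the same ancestor.
    for node in path_to_root(b, parent):
        if node == mrca:
            break
        total += contribution.get(node, 0.0)
    return total

def substitution_distance(seq_a, seq_b, effect):
    """Sum the effects of all amino acid differences between two aligned sequences."""
    total = 0.0
    for pos, (x, y) in enumerate(zip(seq_a, seq_b), start=1):
        if x != y:
            total += effect.get((x, pos, y), 0.0)
    return total

# Toy tree: root -> n1 -> {A, B}, root -> C. Only two branches carry antigenic change.
parent = {"A": "n1", "B": "n1", "n1": "root", "C": "root"}
contribution = {"n1": 1.5, "C": 0.5}
print(tree_distance("A", "C", parent, contribution))  # crosses both branches
print(tree_distance("A", "B", parent, contribution))  # crosses neither

# Toy sequences with one difference at position 2 that has an assumed effect.
effect = {("F", 2, "Y"): 1.25}
print(substitution_distance("KFN", "KYN", effect))
```

In both variants, most branches or substitutions contribute nothing, and the predicted distance between two viruses is the sum of the few terms that separate them.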

Visualization of antigenic data on the tree

We used the models learned from the HI titer data to allow interactive exploration and visualization of measured and predicted titers within nextflu. will continue to be updated, while will display the full data set available last summer.


Color indicates antigenic distance from the focal reference virus A/Victoria/361/2011 marked by the red cross-hair. The model on the right interpolates and smooths the data. The focal reference virus can be changed by clicking on any other serum marked by a grey box, upon which the tree coloring is updated.

Predicting successful viruses

The ultimate goal of this project is to improve predictions of the composition of future influenza virus populations to optimize the vaccine match. We and others have developed methods to make such predictions solely based on sequence information. Here, we showed that successful strains tend to be antigenically advanced, but that blindly picking the most antigenically advanced strain often fails. To improve predictions, we need to find ways to integrate antigenic information with other predictors of successful clades and carefully account for competition between clades.

visualizes antigenic evolution of influenza viruses

In a new preprint by Trevor Bedford, Rodney Daniels, Colin Russell, Boris Shraiman, and myself, we show how antigenic evolution can be integrated with the phylogenetic tree of HA sequences.
Human immunity is the main driving force of the evolution of seasonal influenza viruses. Only by changing their antigenic properties are influenza viruses able to re-infect previously infected humans. Mutations that prevent immune recognition often spread rapidly across the globe. Human antibodies typically recognize the tip of the HA trimer on the surface of the influenza virus, and the amino acids at those positions change very often through time.

Mutations with antigenic effects highlighted on a HA structure

The antigenic properties of circulating influenza viruses are constantly monitored by the national influenza centres and the WHO collaborating centres for influenza using hemagglutination inhibition assays. These assays basically record how much an antiserum (obtained from a ferret) can be diluted and still recognise a virus — if the virus has changed antigenically relative to the serum, even high serum concentrations don’t inhibit the virus.

Results from such assays come in large tables of numbers quite removed from the molecular evolution of the HA protein. In the preprint, we show that antigenic distances behave similarly to sequence distances on the tree and can be explained as a sum of contributions associated with amino acid substitutions or contributions on branches of the tree.

To visualize antigenic evolution, we integrated HI titer data (mostly from the Worldwide Influenza Centre in London, generated by John McCauley, Rodney Daniels, and colleagues) into nextflu. The site allows users to select a particular reference virus or vaccine strain and colors all viruses on the tree according to their measured or predicted antigenic similarity to the reference virus; see the screenshot below. This integration makes it easy to associate antigenic changes with genotypic changes and the dynamics of the corresponding clades.

In the manuscript, we further investigate to what extent antigenic changes are predictive of the composition of future influenza populations. Successful clades tend to be antigenically advanced, but a substantial fraction of antigenic advances fail to spread. The appeal of HI data in the context of prediction is the early detection of a novel variant that is suboptimally recognized by the vaccine. However, many such “hopefuls” go extinct, and reliable prediction using HI data has to find ways to differentiate antigenic changes that are likely to be successful from those that compromise virus functionality.

Seasonal influenza in 2015 and future projections

Together with Trevor Bedford, we have put together an informal summary of recent patterns of seasonal influenza evolution. We use nextflu to explore which clades are on the rise or about to go extinct and discuss strains that will likely dominate in 2016. Read the full report on

Which flu strain will dominate 2015/2016?

In November, Boris Shraiman, Colin Russell and I published a paper on predicting evolution. The method uses the shape of genealogical trees to spot expanding clades which will likely dominate the future population. We demonstrated the method by applying it to historical data of seasonal influenza A/H3N2 virus evolution and predicted the majority of years well. However, those “predictions” were for past events. It is time to predict the future!

Building on two projects by Trevor Bedford to automatically construct influenza trees (augur) and visualize them using JavaScript (auspice), I developed a tool for interactive exploration of seasonal influenza evolution and prediction.

The past 15 months of A/H3N2 visualized using our interactive tool. Isolates after the dates indicated in the panel are shown in grey; colors encode the genotype at 7 amino acid positions with strong effects on antigenic properties. The clade designation is indicated in the right panel; big green dots correspond to recent vaccine strains.

The dynamics of A/H3N2 viruses over the last 3 years are illustrated in the above figure. There are two big clades that emerged almost 3 years ago and have been dominating A/H3N2 since (3C2 and 3C3 in WHO nomenclature). Of these, 3C3 is bigger and dominated the 13/14 season. In the past 8 months, however, 3C2 has come back to life, evolving a new subclade, 3C2a, while 3C3 has also evolved a new subclade, 3C3a. Importantly, these subclades differ from the precursor at critical amino acids of HA1 (F159Y, 3C2a in green; F159S, 3C3a in red) and are far from the current vaccine for the northern hemisphere (lower green dot). The updated H3N2 vaccine choice for the southern hemisphere (the upper green dot) is part of 3C3a. The question now is whether 3C3a or 3C2a will take over.

The interactive tool allows users to color the tree by our prediction using the local branching index we developed. Given the data up until 5 January 2015, our algorithm predicts 3C2a as the strain that expands and dominates the future (see screen shot below). To be concrete, the method predicts that viruses from 3C2a will be the dominant progenitors of the 2015/2016 northern hemisphere winter epidemic.

Our prediction for the 2015/2016 season as of Jan 05, 2015: 3C2a is most likely to take over, redder colors correspond to more rapid expansion. (Parameters tau = 0.0007 using data from 200 days prior to Jan 2015).


The tool also allows users to replay past evolution by ignoring all isolates past a certain date. Go ahead and explore the evolution of seasonal influenza A/H3N2 viruses.

Now out in eLife: Predicting evolution

Our paper (with Colin Russell and Boris Shraiman) on predicting evolution (see also this earlier post) has now appeared in eLife. The method we developed predicts the fitness of individuals, which in turn allows us to predict likely progenitor sequences of future populations. Our method calculates a posterior distribution of fitness for each individual. The resulting ranking reliably predicts the ancestors of future populations of seasonal influenza. Since my previous post on this topic, we have developed a simple and robust approximation to the probabilistic model. This approximation is motivated by the intuitive insight that a very fit individual can develop into a growing clone. Such an expanding subpopulation results in rapid branching in the tree. From this approximation, we developed a simple measure, the local branching index (LBI), that predicts future success almost as well as the full fitness inference algorithm. The LBI measures tree length in the neighborhood of each node in the tree, where the neighborhood is defined via a weighting function that decreases exponentially with distance from the focal node. This is illustrated in the figure below. The only parameter of the LBI is the size of the relevant neighborhood, i.e., the radius of the shaded region in the illustration.

How the Local Branching Index (LBI) works: Consider the rooted tree on the left and the corresponding unrooted tree (the root is indicated by the black dot). The LBI of the node connecting the orange and red clades is the length of the tree in the shaded area. The more rapidly the tree is branching close to this node, the larger the LBI.
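The description of the LBI translates fairly directly into code. The sketch below treats the LBI of a node as the tree length around it, with every piece of branch discounted by exp(-distance/tau). It is a simple recursive illustration on a made-up tree and not the implementation used in the paper or in the interactive tool; the function names, node names, and branch lengths are hypothetical.

```python
import math
from collections import defaultdict

def build_adjacency(edges):
    """Unrooted tree as a dict: node -> {neighbor: branch length}."""
    adj = defaultdict(dict)
    for u, v, length in edges:
        adj[u][v] = length
        adj[v][u] = length
    return adj

def weighted_length(adj, node, came_from, tau):
    """Exponentially discounted tree length seen from `node`,
    looking in every direction except back towards `came_from`."""
    total = 0.0
    for nbr, length in adj[node].items():
        if nbr == came_from:
            continue
        decay = math.exp(-length / tau)
        # Integral of exp(-x/tau) along this branch, plus the discounted
        # contribution of everything beyond its far end.
        total += tau * (1.0 - decay) + decay * weighted_length(adj, nbr, node, tau)
    return total

def local_branching_index(adj, node, tau):
    """LBI of `node`: discounted tree length in all directions."""
    return weighted_length(adj, node, None, tau)

# Toy tree: "burst" splits into three lineages at once, "slow" has a single descendant.
edges = [
    ("root", "burst", 0.1), ("burst", "a", 0.1), ("burst", "b", 0.1), ("burst", "c", 0.1),
    ("root", "slow", 0.1), ("slow", "d", 0.1),
]
adj = build_adjacency(edges)
tau = 0.2  # size of the neighborhood, in the same units as the branch lengths
lbi = {node: local_branching_index(adj, node, tau) for node in adj}
for node in ("burst", "slow"):
    print(node, round(lbi[node], 3))
```

With these toy branch lengths, the rapidly branching node gets the larger LBI. The recursion revisits subtrees for every focal node, which is fine for a small example; a large tree would call for a two-pass scheme that reuses the partial sums.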

Why does the LBI predict fitness?

The full fitness inference algorithm multiplies the propagators for each branch to obtain the posterior distribution of fitness. The propagator of each additional downstream branch pushes the posterior towards higher fitness. The amount of this additional polarization increases with the length of the subtree of the branch. This polarizing effect of tree length is, however, forgotten over long times. The fitness propagators account for this loss of memory through equilibration to the stationary distribution of ancestral fitness. In a similar fashion, the LBI focusses on local tree length explicitly through exponential weighting with distance.

Connection to multiple-merger coalescents

Together with Oskar Hallatschek, we have shown earlier that the genealogical trees of rapidly adapting populations contain approximate multiple mergers and that the genealogies are asymptotically described by the Bolthausen-Sznitman coalescent process. Brunet et al. have linked these approximate multiple mergers to rare events when one individual happens to be substantially fitter than all others. It then generates a burst of offspring, which looks like a multiple merger in a reconstructed tree. The LBI measures the “burstiness” of the branching at a node. For a given number of offspring, the LBI is maximal for a star-like tree (i.e., a multiple merger), and minimal for a binary merger with no other branching happening for a long time.
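A short calculation illustrates the two extremes. It assumes that the LBI of node i is the tree length weighted by an exponential in the distance d(i, x) from the node, with neighborhood size tau, and it counts only the branches descending from the node. The symbols n, \ell, L, R_1 and R_2 are introduced here for illustration.

```latex
\mathrm{LBI}(i) = \int_{\mathrm{tree}} e^{-d(i,x)/\tau}\, dx

% Star: n branches of length \ell leave the node at once
\mathrm{LBI}_{\mathrm{star}} = n \int_0^{\ell} e^{-x/\tau}\, dx = n\,\tau \left(1 - e^{-\ell/\tau}\right)

% Binary merger: two branches of length L before any further branching;
% R_1 and R_2 are the weighted lengths of the subtrees at their far ends
\mathrm{LBI}_{\mathrm{binary}} = 2\,\tau \left(1 - e^{-L/\tau}\right) + e^{-L/\tau} \left(R_1 + R_2\right)
\;\xrightarrow{\; L \gg \tau \;}\; 2\,\tau
```

For branches much longer than tau, the star approaches n tau, while the binary merger stays close to 2 tau however many offspring eventually descend from it.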