Now out in eLife: Predicting evolution

Our paper (with Colin Russell and Boris Shraiman) on predicting evolution (see also this earlier post) has now appeared in eLife. The method we developed infers the fitness of individuals, which in turn allows us to predict the likely progenitor sequences of future populations. It calculates a posterior distribution of fitness for each individual, and the resulting ranking reliably predicts the ancestors of future populations of seasonal influenza.

Since my previous post on this topic, we have developed a simple and robust approximation to the probabilistic model. This approximation is motivated by the intuition that a very fit individual can develop into a growing clone. Such an expanding subpopulation results in rapid branching in the tree. From this approximation, we derived a simple measure, the local branching index (LBI), that predicts future success almost as well as the full fitness inference algorithm.

The LBI measures tree length in the neighborhood of each node in the tree, where the neighborhood is defined via a weighting function that decreases exponentially with distance from the focal node. This is illustrated in the figure below. The only parameter of the LBI is the size of the relevant neighborhood, i.e., the radius of the shaded region in the illustration.

How the Local Branching Index (LBI) works: Consider the rooted tree on the left and the corresponding unrooted tree (the root is indicated by the black dot). The LBI of the node connecting the orange and red clades is the length of the tree in the shaded area. The more rapidly the tree branches close to this node, the larger the LBI.
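To make the definition concrete, here is a minimal sketch of how an exponentially weighted tree length could be computed for every node with two passes over the tree. It is an illustration, not the code used in the paper: the `Node` class, the function names, and the parameter name `tau` (the neighborhood size) are assumptions introduced here.

```python
import math

class Node:
    """Minimal tree node (hypothetical helper for this sketch)."""
    def __init__(self, name, branch_length=0.0, children=None):
        self.name = name
        self.branch_length = branch_length  # length of the branch to the parent
        self.children = children or []
        self.up = 0.0    # weighted length of this branch plus subtree, seen from the parent
        self.down = 0.0  # weighted length of everything outside the subtree, seen from this node
        self.lbi = 0.0

def postorder(node):
    for child in node.children:
        yield from postorder(child)
    yield node

def preorder(node):
    yield node
    for child in node.children:
        yield from preorder(child)

def compute_lbi(root, tau):
    """Return {node name: weighted tree length around that node} for scale tau."""
    # Pass 1 (leaves to root): each node reports its branch and subtree to its parent.
    for node in postorder(root):
        decay = math.exp(-node.branch_length / tau)
        below = sum(child.up for child in node.children)
        node.up = tau * (1.0 - decay) + decay * below
    # Pass 2 (root to leaves): each child learns about the rest of the tree.
    root.down = 0.0
    for node in preorder(root):
        total_up = sum(child.up for child in node.children)
        for child in node.children:
            decay = math.exp(-child.branch_length / tau)
            rest = node.down + total_up - child.up
            child.down = tau * (1.0 - decay) + decay * rest
        node.lbi = node.down + total_up
    return {node.name: node.lbi for node in preorder(root)}

# Small example: a clade that branches rapidly next to a long unbranched lineage.
example = Node("root", 0.0, [
    Node("quiet", 2.0),
    Node("busy", 0.5, [Node("a", 0.3), Node("b", 0.3), Node("c", 0.3)]),
])
for name, value in compute_lbi(example, tau=0.5).items():
    print(f"{name}: {value:.3f}")
```

For very large `tau` the weights approach one everywhere and the value at every node approaches the total tree length; for small `tau` only the immediate surroundings of a node contribute.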

Why does the LBI predict fitness?

The full fitness inference algorithm multiplies the propagators for each branch to obtain the posterior distribution of fitness. The propagator of each additional downstream branch pushes the posterior towards higher fitness. The amount of this additional polarization increases with the length of the subtree attached to the branch. This polarizing effect of tree length is, however, forgotten over long times. The fitness propagators account for this loss of memory through equilibration to the stationary distribution of ancestral fitness. In a similar fashion, the LBI focuses explicitly on local tree length by down-weighting branches exponentially with distance.

Connection to multiple-merger coalescents

Together with Oskar Hallatschek, we have shown earlier that the genealogical trees of rapidly adapting populations contain approximate multiple mergers and that the genealogies are asymptotically described by the Bolthausen-Sznitman coalescent process. Brunet et al. have linked these approximate multiple mergers to rare events in which one individual happens to be substantially fitter than all others. It then generates a burst of offspring, which looks like a multiple merger in a reconstructed tree. The LBI measures the “burstiness” of the branching at a node. For a given number of offspring, the LBI is maximal for a star-like tree (i.e., a multiple merger), and minimal for a binary merger with no other branching happening for a long time.
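A rough worked comparison of the two extreme cases, under the assumption that the LBI of node i is the tree length weighted by exp(-d/tau), where d is the distance from node i and tau is the neighborhood size. The symbols n (number of branches), l (branch length), and tau are notation chosen here for illustration, and only the descendant side of the node is counted.

```latex
\begin{align}
\mathrm{LBI}_i(\tau) &= \int_{\text{tree}} e^{-d(i,x)/\tau}\,\mathrm{d}x \\
\text{star with } n \text{ branches of length } \ell:\quad
\mathrm{LBI}_{\text{star}} &= n\,\tau\left(1 - e^{-\ell/\tau}\right) \\
\text{binary merger, no further branching within } \ell:\quad
\mathrm{LBI}_{\text{binary}} &= 2\,\tau\left(1 - e^{-\ell/\tau}\right) + O\!\left(e^{-\ell/\tau}\right)
\end{align}
```

For l much larger than tau, the star contributes about n tau while the isolated binary merger contributes only about 2 tau, regardless of how many offspring eventually descend from it.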


Sampling beta coalescent trees

Beta coalescent processes are a one-parameter family (parameter alpha) of Lambda-coalescents that includes the standard Kingman coalescent (alpha=2) and the Bolthausen-Sznitman coalescent (alpha=1), which has recently been shown to describe genealogies in rapidly adapting populations. The role of the beta coalescent processes at intermediate alpha is not as well understood.

One would often like to compare data or simulations to trees generated by a particular beta coalescent process. To this end, we have written a small Python program that generates beta coalescent trees for any 1<=alpha<=2. These trees are returned as BioPython trees.
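As an illustration of what such a sampler involves, here is a minimal standalone sketch. It is not the program described above and does not return BioPython trees. It assumes the Beta(2-alpha, alpha) parameterization of the Lambda-coalescent, in which any particular set of k out of b lineages merges at rate B(k-alpha, b-k+alpha)/B(2-alpha, alpha), with alpha=2 treated as the Kingman limit. The function names and the nested-dict tree format are choices made for this example.

```python
import math
import random

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def merger_rates(b, alpha):
    """Total rates of k-mergers among b lineages, for k = 2..b."""
    if alpha >= 2.0:  # Kingman limit: only pairwise mergers occur
        return [b * (b - 1) / 2.0] + [0.0] * (b - 2)
    rates = []
    for k in range(2, b + 1):
        log_binom = math.lgamma(b + 1) - math.lgamma(k + 1) - math.lgamma(b - k + 1)
        log_rate = log_beta(k - alpha, b - k + alpha) - log_beta(2 - alpha, alpha)
        rates.append(math.exp(log_binom + log_rate))
    return rates

def sample_tree(n, alpha, seed=None):
    """Sample one genealogy of n leaves; returns the root as a nested dict."""
    rng = random.Random(seed)
    lineages = [{"time": 0.0, "children": [], "leaves": 1, "branch_length": 0.0}
                for _ in range(n)]
    t = 0.0  # time measured backwards from the sample
    while len(lineages) > 1:
        b = len(lineages)
        rates = merger_rates(b, alpha)
        t += rng.expovariate(sum(rates))                      # waiting time to next merger
        k = rng.choices(range(2, b + 1), weights=rates)[0]    # size of the merger
        rng.shuffle(lineages)                                 # pick k lineages uniformly
        merged, lineages = lineages[:k], lineages[k:]
        for child in merged:
            child["branch_length"] = t - child["time"]
        lineages.append({"time": t, "children": merged, "branch_length": 0.0,
                         "leaves": sum(c["leaves"] for c in merged)})
    return lineages[0]

def iter_nodes(root):
    """Iterate over all nodes of a sampled tree."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node["children"])

tree = sample_tree(30, alpha=1.0, seed=42)
largest = max(len(node["children"]) for node in iter_nodes(tree))
print("root height:", round(tree["time"], 3), "largest merger:", largest)
```

With this normalization a pair of lineages coalesces at rate one for every alpha, so time is measured in units of the pairwise coalescence time.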

In addition, we have written a wrapper that repeatedly generates trees and uses them to calculate the site frequency spectrum (SFS). The SFS (the histogram of allele frequencies) is an informative summary statistic that changes characteristically as alpha varies from 1 to 2. The scripts are at, including a few examples.
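The link between trees and the SFS can be sketched as follows: a mutation on a branch is carried by all leaves below that branch, so the expected number of mutations found in i samples is proportional to the total length of branches that subtend i leaves. The snippet below is a standalone illustration rather than the wrapper mentioned above. It assumes trees stored as nested dicts with `children`, `leaves`, and `branch_length` keys, a format invented for this example.

```python
def branch_length_spectrum(root, n):
    """Total branch length subtending i leaves, for i = 1..n-1 (index i-1)."""
    spectrum = [0.0] * (n - 1)
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node["children"]:
            spectrum[child["leaves"] - 1] += child["branch_length"]
            stack.append(child)
    return spectrum

def average_sfs(trees, n):
    """Average the spectrum over trees and normalize it to sum to one."""
    total = [0.0] * (n - 1)
    for tree in trees:
        for i, length in enumerate(branch_length_spectrum(tree, n)):
            total[i] += length
    norm = sum(total)
    return [x / norm for x in total]

def leaf(length):
    return {"children": [], "leaves": 1, "branch_length": length}

# Hand-built tree ((A:1, B:1):1, C:2) with three leaves.
toy_tree = {
    "children": [
        {"children": [leaf(1.0), leaf(1.0)], "leaves": 2, "branch_length": 1.0},
        leaf(2.0),
    ],
    "leaves": 3,
    "branch_length": 0.0,
}
print(average_sfs([toy_tree], n=3))
```

In the toy tree, branches above single leaves have total length 4 and the one branch above two leaves has length 1, so singletons make up four fifths of the normalized spectrum.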