Frank Noe
frank.noe@fu-berlin.de
FU Berlin,
Arnimallee 6, 14195 Berlin, Germany
Summary: Transition path theory allows us to compute the flux network from a set of educt states $A$ to a set of product states $B$ from an underlying Markov process (continuous, Markov jump process, or Markov chain / Markov model). The main purpose of the flux is that it can elucidate mechanisms of transitions, e.g. what are the dominant pathways of folding or binding of a macromolecule. From the flux, a number of quantities can be computed: pathway-based expectation values, the total flux, the mean first passage time or rate from $A$ to $B$. The flux can also be decomposed into the full set of $A\rightarrow B$ transition pathways. |
Heterogeneity in folding pathways has been found in a number of experimental studies. For example, using time-resolved FRET with four different intramolecular distances, it was found in Barstar [27] that there are multiple folding routes, and that different routes dominate under different folding conditions. Moreover, changing the denaturant can change the dominant pathway [18]. Extensive mutational analysis of the seven ankyrin sequence repeats of the Notch ankyrin repeat domain has revealed its funnel landscape [6][20][28]. Some folding is sequential, as in FynSH3[17], cytochrome [10], T4 lysozyme [7], and Im7 [11], and some folding is parallel, as in cytochrome C [13] and HEW lysozyme [19].
Formally, the question about folding pathways boils down to the following: Let $A$ and $B$ be two subsets of state space, defined so as to specify the transition process one wants to investigate. For example, $A$ may correspond to the strongly denatured set of sets while $B$ is the metstable set around the crystal structure when known [24]. All remaining states are unassigned “intermediate” states $I$. What is the probability distribution of the trajectories leaving $A$ and continuing on to $B$? I.e., what is the typical sequence of $I$ states used along the transition pathways?
When an MSM is already available, the information of transition pathways is easily accessible via Transition Path Theory, which is explained below. Transition path theory has been introduced in general, and worked out for some continuous dynamics in [33]; see also [CiteEricsReview] for a review. In [21], TPT has been worked out for Markov jump processes (i.e. a Markov process using rate matrices). [4] contains expressions for the reactive flux of $A\rightarrow B$ pathways that are identical to the TPT expressions for rate matrices with detailed balance. [24] has worked out the TPT expressions for discrete-time Markov chains and adds some analysis algorithms.
Transition path theory is related to Transition path sampling (TPS) in the sense that both are trying to generate statistical information about the ensemble of $A\rightarrow B$ pathways. TPS is a direct approach to sampling pathways directly [5] and could in principle be used to sample folding pathways. However, in TPS the sampled trajectories are in practice of limited length and it is thus unpractical to use TPS when the intermediate states $I$ contain metastabilities. One can run multiple TPS-samplings between pairs of metastable states after having identified them [32].
We partition our state space into three sets: $S=\{A,B,I\}$ with: $$\begin{aligned} A & & \mathrm{reactant}\:\mathrm{states}\\ B & & \mathrm{product}\:\mathrm{states}\\ I & & \mathrm{intermediate}\:\mathrm{states} \:\:\:\:(1)\end{aligned}$$ The essential ingredient required to compute the statistics of transition pathways is the committor probability $q_{i}^{+}$. $q_{i}^{+}$ is the probability when being at state $i$, the system will reach the set $B$ next rather than $A$ [31][9][5]. In protein folding contexts, it is the probability of folding [9]. By definition, all states in $A$ have $q_{i}^{+}=0$ while all states in $B$ have $q_{i}^{+}=1$. Using these definitions in the committor equations (see Sec. basics_hitting_probabilities), we can derive the linear system of equations to compute the committor probabilities of the intermediate states $I$ [24]: $$-q_{i}^{+}+\sum_{k\in I}T_{ik}q_{k}^{+}=-\sum_{k\in B}T_{ik}. \:\:\:\:(2)$$ For intermediate states, the committor gradually increases from $A$ to $B$. The committor is a coordinate that quantifies the progress of the $A\rightarrow B$ reaction, i.e. how close a given intermediate state is to $B$ dynamically. The backward committor can be computed from a similar linear system, following [24]: $$-\pi_{i}q_{i}^{-}+\sum_{k\in I}\pi_{k}T_{ki}q_{k}^{-}=-\sum_{k\in A}\pi_{k}T_{ki}. \:\:\:\:(3)$$ Assuming that the transition matrix $\mathbf{T}$ fulfills detailed balance, we can simply compute it as: $$q_{i}^{-}=1-q_{i}^{+} \:\:\:\:(4)$$ for all states $i$.
A main result of transition path theory is the reactive flux. Consider the probability flux between two states $i$ and $j$, given by $\pi_{i}T_{ij}$ (absolute probability of finding the system at this transition). We are only interested in trajectories that successfully move from $A$ to $B$ without recurring to $A$ beforehand. The flux associated to these reactive trajectories only is given by multiplying the flux by the probability to come from $A$ and to move on to $B$: $$f_{ij}^{AB}=\left\{ \begin{array}{cc} \pi_{i}q_{i}^{-}T_{ij}q_{j}^{+} & i\neq j\\ 0 & i=j \end{array}\right.. \:\:\:\:(5)$$ This flux is the quantity that could be obtained directly from a converged TPS sampling by counting transitions of the reactive path ensemble. However, we further want to remove contributions that come from recrossings or detours. For example, a trajectory that would jump on its way from $A$ to $B$ multiple times between two substates $i$ and $j$ would produce an increase in the flux $i\rightarrow j$ and the backward flux $j\rightarrow i$. However, we only want to consider a single transition per pathway and thus define the net flux, given by: $$f_{ij}^{+}=\max\{0,\: f_{ij}-f_{ji}\}. \:\:\:\:(6)$$ If $\mathbf{T}$ fulfills detailed balance, using Eq. (4) in Eq. (5) and when ordering states along the reaction coordinate $q_{i}^{+}$ such that $q_{i}^{+}\leq q_{j}^{+}$, the reactive flux is given by: $$f_{ij}^{+}=\pi_{i}T_{ij}(q_{j}^{+}-q_{i}^{+}). \:\:\:\:(7)$$ $f_{ij}^{+}$ defines the net flux and is a network of fluxes leaving states $A$ and entering states $B$ (see Fig. 1). The form (7) is rather insightful because we can associate its quantities with Ohm’s law in an electric network where $f_{ij}^{+}$ corresponds to the current, the commitor corresponds to a potential, and the commitor difference $q_{j}^{+}-q_{i}^{+}$ thus to a voltage, and the quantity $\pi_{i}T_{ij}$ corresponds to a conductivity.
Flux conservation (Kirchhoff’s 1st law): The reactive flux network fulfills Kirchhoff’s current rule, i.e. for every intermediate state $i$, the input flux equals the output flux: $$\sum_{j\in S}(f_{ij}^{AB}-f_{ji}^{AB})=0\:\:\:\:\forall i\in I \:\:\:\:(8)$$ This is easy to show: $$\begin{aligned} \sum_{j\in S}(f_{ij}^{AB}-f_{ji}^{AB}) & =\pi_{i}q_{i}^{-}\sum_{j\neq i}T_{ij}q_{j}^{+}-q_{i}^{+}\sum_{j\neq i}\pi_{j}q_{j}^{-}T_{ji}\\ & =\pi_{i}q_{i}^{-}\sum_{j\neq i}T_{ij}q_{j}^{+}-\pi_{i}q_{i}^{+}\sum_{j\neq i}q_{j}^{-}\tilde{T}_{ij} \:\:\:\:(9)\end{aligned}$$ where we have used the backward propagator $\tilde{T}_{ij}=\frac{\pi_{j}}{\pi_{i}}T_{ji}$. Using the commitor equations (Sec. basics_hitting_probabilities), we obtain: $$\begin{aligned} \sum_{j\in S}(f_{ij}^{AB}-f_{ji}^{AB}) & = & \pi_{i}q_{i}^{-}(q_{i}^{+}-T_{ii}q_{i}^{+})-\pi_{i}q_{i}^{+}(q_{i}^{-}-\tilde{T}_{ii}q_{i}^{-})\\ & = & \pi_{i}q_{i}^{-}q_{i}^{+}(1-T_{ii})-\pi_{i}q_{i}^{-}q_{i}^{+}(1-T_{ii})\\ & = & 0 \:\:\:\:(10)\end{aligned}$$ Kirchhoff’s 2nd law (voltages sum to 0 over a circle) is automatically fulfilled, because the voltage present voltage is derived from a potential difference.
Total flux: Flux is not conserved for states in sets $A$ and $B$. Set $A$ has outflux, but no influx because $q_{j}^{+}=0$ for all states $j\in A$, and therefore set $A$ is the flux producer of the network. Likewise, set $B$ has influx but no outflux, because $q_{i}^{-}=0$ for all states $i\in B$, and therefore set $B$ is the flux consumer of the network. As a consequence, the outflux from $A$ and the influx to $B$ must be equal, and are called total flux $F$ of the transition $A\rightarrow B$: $$F=\sum_{i\in A}\sum_{j\notin A}\pi_{i}T_{ij}q_{j}^{+}=\sum_{i\notin B}\sum_{j\in B}\pi_{i}q_{i}^{-}T_{ij} \:\:\:\:(11)$$ The value of $F$ gives the expected number of observed $A\rightarrow B$ transitions per time unit $\tau$ that an infinitely long trajectory would produce. However, such a trajectory must cycle between $A$ and $B$, and the the mean cycling time $t_{A\leftrightarrow B}=F^{-1}$ is dividing in the mean time $t_{A\rightarrow B}$ required for the forward reaction $A\rightarrow B$, and the mean time $t_{B\rightarrow A}$ required for the backward reaction $B\rightarrow A$. Therefore, if we are interested in the transition rate, i.e. the number of $A\rightarrow B$ transitions given that we start in set $A$, we have to count only the forward trajectory segments: $$\begin{aligned} t_{A\rightarrow B} & = & t_{A\leftrightarrow B}\mathbb{P}(A\rightarrow B)\\ & = & F^{-1}\sum_{i=1}^{n}\pi_{i}q_{i}^{-1} \:\:\:\:(12)\end{aligned}$$ where $\sum_{i=1}^{n}\pi_{i}q_{i}^{-1}$ is the probability that the trajectory is in its “forward” part. Consequently, the transition rate $k_{AB}=t_{A\rightarrow B}^{-1}$ is given by (see [24] for derivation): $$k_{AB}=F\left/\sum_{i=1}^{m}\pi_{i}q_{i}^{-}\right.. \:\:\:\:(13)$$ Both the flux $F$ and the rate $k_{AB}$ correspond to a rate per time units of $\tau$, where $\tau$ is the time step of the transition matrix.
Note that all states that trap the trajectory for some time will reduce $k_{AB}$. The effect of these traps is properly accounted for in the folding flux, even if they do not contribute to productive pathways.
Since the number of $n$ states used to construct a Markov model is often very large, it is convenient for illustration purposes to compute the net flux of $A\rightarrow B$ trajectories amongst only a few coarse sets that are of specific interest. We consider a coarse partition of state space $S=\{C_{1},C_{2},...,C_{n}\}$, which may be based on a decomposition into metastable states as described in Sec. lecture_pcca, or another partition that the user defines e.g. based on order parameters of interest. We make the restriction, however, that this decomposition preserves the boundaries of sets $A$, $B$ and $I$, i.e. $A$ and $B$ are either identical to individual $C_{i}$, or to a collection of multiple $C_{i}$. The coarse-grained flux between two sets is then given by: $$F_{ij}=\sum_{k\in C_{i},l\in C_{j}}f_{kl}. \:\:\:\:(14)$$ and the net flux by $$F_{ij}^{+}=\max\{0,\: F_{ij}-F_{ji}\}. \:\:\:\:(15)$$ We note a technicality here: the second step of again removing backfluxes to obtain a coarse-grained net flux is necessary only if the clusters used do not partition state space along the isocommitor surfaces. Thus it may be desirably to use a partition that groups states with similar committor values.
The flux network can be decomposed into pathways from $A\rightarrow B$. When the dynamics are reversible, then the flux can be completely decomposed into such $A\rightarrow B$ pathways and no cycles will remain. Consider a simple and direct pathway from $A$ to $B$ consisting of $n_{w}$ nodes $$w=(i_{1}\in A,\: i_{2},\:...,\: i_{n_{w}-1},\: i_{n_{w}}\in B). \:\:\:\:(16)$$ Simple means that there is no cycle in the path, i.e. no state occurs twice in $w$. Direct means that intermediate states $i_{2},...,i_{n_{w}-1}$ do not lie in either $A$ or $B$. Along each of its edges, say $i_{l}\rightarrow i_{l+1}$, the flux network can carry a flux of up to $f_{i_{l}i_{l+1}}^{+}$. Thus, the capacity or flux of the pathway is given by the minimum of these fluxes: $$c(w)=\min\{f_{i_{l}i_{l+1}}^{+}\mid l=1...n_{w}-1\}. \:\:\:\:(17)$$ The transition/edge at which this minimum flux is assumed is called the bottleneck: $$(b_{1},b_{2})=\arg\min_{(i,j)\in w}\{f_{ij}^{+}\} \:\:\:\:(18)$$ A pathway decomposition consists of choosing a pathway $P_{1}$, and then removing its flux $f(P_{1})$ from the flux along all the edges of $P_{1}$. This may be repeated until the total flux $F$ has been subtracted and the network is thus free of of $A\rightarrow B$ pathways. Note that while the flux network is unique, such a decomposition is not unique, because one may choose different strategies to select pathways. Nevertheless pathway decompositions are useful in at least the following aspects:
The strongest pathway, i.e. the pathway whose minimum flux $f(P)$ is largest of all pathways, i.e. the path with the largest bottleneck, is of special interest. Especially so, if $f(P)$ is not much smaller than the total flux $F$. One reasonable way to perform a pathways decomposition is to first remove the strongest pathway, then remove the strongest pathway of the remaining network, and so on [22]. This decomposition is useful to estimate how many $A\rightarrow B$ are necessary to obtain a certain percentage of the flux [24].
Algorithm 1 sketches how to compute a pathway of maximum capacity from a graph $G$, whose edges are defined using the net flux $f^{+}$.
Algorithm 1:
BestPath($G,A,B$) finds a maximum capacity pathway in the graph
1. Determine maximal bottleneck $(b_{1},b_{2})$ in G (bottleneck of the path with maximum capacity) 2. Decompose G into L and R, which are the parts of $G$ “left” of $b_{1}$ (nodes $\{i:q_{i}^{+}\le q_{b_{1}}^{+}\}$) and “right” of $b_{2}$ (nodes $\{i:q_{i}^{+}\ge q_{b_{2}}^{+}\}$) 3. $w_{L}=\left{ \begin{array}{cc} b{1} & if\: b{1}\in A\ BestPath(L,A,{b_{1}}) & else \end{array}\right.$ 4. $w_{R}=\left{ \begin{array}{cc} b{2} & if\: b{2}\in B\ BestPath(R,{b_{2}},B) & else \end{array}\right.$ 5. return ($w_{L}$,$w_{R}$) |
Pathway decomposition is useful to quantitatively evaluate mechanisms of the $A\rightarrow B$ transition. For example, if the intermediate states $I$ consist of subsets $I_{1}$ and $I_{2}$ , and we are interested in computing which of these subsets is visited first during the $A\rightarrow B$ reaction, i.e. whether the pathway $A\rightarrow I_{1}\rightarrow I_{2}\rightarrow B$ or $A\rightarrow I_{2}\rightarrow I_{1}\rightarrow B$ is prefered. The probability to go to $I_{1}$ first is equal to the fraction of the capacities of pathways that go to $I_{1}$ first. In general, if we consider some function of an $A\rightarrow B$ pathway $w$, $g(w)$ (e.g. the function that assigns 1 to a pathway when it contains a specific sequence of interest), then the expectation of $g(w)$ is given by the weighted path sum: $$\mathbb{E}_{w}[g]=\sum_{i}c(w_{i})g(w_{i}) \:\:\:\:(21)$$
Example 1: Protein folding model
Fig. 1 shows the committor (color-coding) for the protein folding model: At low temperatures, the committor changes rapidly after leaving the unfolded state and forming the first structure elements. At high temperatures, it changes rapidly when entering the full-structured native state. At both temperatures, the folding process has thus essentially two-state character, although with different definitions of the two states. At intermediate temperatures, the commitor increases gradually from the unfolded to the native state, indicating that it is important to consider the intermediate states in the folding process.
Coarse-graining generates a simplified view but correct on the folding flux. The actual dynamics, represented by the Markov model $\mathbf{T}(\tau)$ cannot easily be coarse-grained without loosing information, and no statement is made here about the transition probability between two coarse sets $C_{i}$ and $C_{j}$. Fig. 2 shows the coarse-grained fluxes for the metastable states of the protein folding model at different temperatures.
The pathway decomposition is usually done on the original flux network. It can also be done on a coarse-grained flux network, provided that the coarse-graining does not lump states which need to be distinguished in order to calculate the probabilities of the events investigated.
Example 2: PinWW folding pathways
In order to illustrate the utility of our approach for studying folding mechanisms, the folding dynamics of the PinWW domain [15] is studied here. The text and figures from this section have been published in [24].
180 molecular dynamics simulations were started, 100 from near-native, 80 from different denatured conformations, and run for 115 ns each at a temperature of 360 K. The simulations were conducted with the GROMACS program [29] using explicit SPC solvent, the GROMOS96 force field [30] and the reaction field method for computing nonbonded forces. The simulation setup is described in detail in the Supplementary Information. The simulated structures were aligned onto the native structure and then clustered finely into 1734 kinetically connected and well-populated clusters. A transition matrix ${\bf \mathbf{T}(\tau)}$ was constructed by counting transitions between these clusters at a lagtime of $\tau$=2 ns. It was verified that ${\bf \mathbf{T}(\tau)}$ is a good model for the long-time kinetics by conducting a Chapman-Kolmogorow test (see Supplementary Information of [24]). All properties computed from the Markov model are associated with statistical uncertainty resulting from that fact than only a finite amount of simulation data has been used to construct the model. These uncertainties are computed using a Bayesian inference method described in [23], the details are given in the Supplementary Information of [24]. The slowest timescale, corresponding to the second-largest eigenvalue of the Markov model, was 26 $\mu s$ (confidence intervals 8 - 78 $\mu s$), compared to 13.2 $\mu s$ measured in a temperature-jump experiment [15].
In order to study the folding mechanism, a folded set, $B$, was defined to be the set of clusters with average backbone root mean square difference to the X-ray structure of less than 0.3 nm. The denatured set, $A$, was defined to be the set of all clusters with little $\beta$-structure (having a mean of \<3 h-bonds in hairpin 1 which has 6 h-bonds in the native state and \<1 h-bonds in hairpin 2 which has 3 h-bonds in the native state). Based on these definitions and the transition matrix $\mathbf{T}(\tau)$ between the 1734 clusters, the committor probabilities and the folding flux were computed as described in the Theory section.
In order to obtain a view of the sequence of events that is unbiased by defining reaction-coordinates, the folding pathways must be considered individually. Therefore, the folding flux was decomposed into individual pathways (see Theory section) and for each of them the times when hairpin 1 or 2 forms and remains stable were computed. “Formation” was defined as having 80% of the average number of hydrogen bonds that are present in the native state, but variations of this threshold did not change the results qualitatively. The probability that hairpin 1 forms before hairpin 2 was computed by calculating the fraction of individual path fluxes in which this occurred (see previous section): In 30% of the folding trajectories, hairpin 1 forms before hairpin 2 (confidence interval 18% - 34%), and in 70% the other way around. Thus, there is no unique mechanism in terms of the order of secondary structure formation, which is in qualitative agreement with a structural interpretation of mutational $\Phi$-values for the pin WW domain [34].
In order to visualize the “essential folding pathways”, coarse conformational sets were defined onto which the folding flux was projected (see Theory section). We employed a definition of 50 sets that separate the most slowly converting (“metastable”) parts of state space. The number of sets can be chosen by fixing a timescale of interest (here 100 ns), then the number of metastable sets are given by the number of implied timescales of the transition matrix slower than that timescale of interest. The definition of metastable states was obtained by PCCA+. Fig. 3 shows the network of the 70% most relevant pathways, which involves only 21 of these 50 conformational sets. The remaining 30% of the flux is mainly in small pathways between the structures shown in Fig. 3 and is omitted here for clarity of the visualization. The 29 of the 50 conformational sets not shown in the figure are only weakly involved in the $A\rightarrow B$ flux.
The denatured set (A) consists of mostly globular structures. No completely stretched structures are observed in the simulation. The figure suggests a relatively large number of kinetically separated unfolded states. Note that this does not necessarily mean that there are large energy barriers between them, but only that the energy barriers between them are not smaller than the ones that are overcome when proceeding towards the intermediate states.
The coarse-grained folding flux suggests that there is a large number of unfolded states and early intermediates that narrow down when coming closer to the native state. The picture reemphasizes the existence of many structurally different parallel pathways. Pathways where hairpin 1 forms first are shown on the right, pathways where hairpin 2 forms first on the left. It is apparent that the pathways in which hairpin 1 forms first also include some partially helical structures formed by the sequence that will later become the third $\beta$ strand.
Fig. 3 also indicates whether a set of structures with hairpins formed has the same register pattern as in the native state (0) or is register-shifted by one or two residues (1,2). Most of the productive folding pathways proceed from no hairpins over on-register intermediates to the native state. Some of the folding-efficient structures have the smaller hairpin 2 register-shifted, but none of them have hairpin 1 register-shifted. A special case is a structure which has both chain ends curled in such a way that they are on-register near the termini, but register-shifted by 2 residues in between (indicated by “0-2”).
For the 50 coarse states defined here, the coarse flux network was decomposed into individual pathways according to decreasing flux as described in the Theory section. Fig. 4 top shows the cumulative flux depending on the number of pathways, showing that about 3-5 pathways are needed to carry 50% of the total flux and about 11-20 pathways are needed to carry 90% of the total flux. Although the absolute number of parallel pathways depends on the number of states one defines, i.e. on the amount of coarse-graining, the structural differences between the 50 sets defined here implies a remarkable degree of parallelness of the folding mechanism in the present system.
The six pathways which carry most of the total flux are depicted in Fig. 4 bottom, highlighting that there are routes where hairpin 1 forms first (paths 3,4,6), where hairpin 2 forms first (paths 1,2), and where there is a more or less concurrent formation of both (path 5). Note that the percentages of individual pathways given in Fig. 4 should not be misinterpreted as the absolute probability of finding this exact sequence of conformations, as these pathways do, e.g., not consider the possibility of unproductive recrossing events or changing between different paths. However, they do provide the relative probabilities of choosing each one folding pathway from the ensemble of productive folding pathways. For example, pathway 1 is nearly twice as probable as pathway 6.
Interestingly, there are three metastable sets that contribute almost no folding flux (\<5%), but the system still spends a significant fraction of the time in them (stationary probability 18% with confidence intervals 3%-45%). These “trap” states, depicted in Fig. 3, have almost full $\beta$ content, but the hairpins are register shifted with respect to the native structure, in particular at hairpin 1 which is not fully shifted in any of the intermediates that significantly contribute to the folding flux. The effective flux reveals that these traps are accessible from different metastable states, all of which already have a register shift in hairpin 2 or a partial register shift in hairpin 1 (see Fig. 3).
Removing the trap states from the Markov model increases the absolute folding rate $k_{AB}$ (Eq. 13) by almost a factor of 2, showing that there is a significant probability that the system gets stuck in one of the trap states for some time.