Metastable States

Frank Noe frank.noe@fu-berlin.de
FU Berlin, Arnimallee 6, 14195 Berlin, Germany

Jan-Hendrik Prinz jan-hendrik.prinz@fu-berlin.de
FU Berlin, Arnimallee 6, 14195 Berlin, Germany

Summary: Perron-cluster cluster analysis (PCCA) exploits the structure of the eigenvectors to define the metastable (long-lived) sets of states of a Markov chain. PCCA+ and PCCA++ are more advanced methods that find an optimal linear transform of the eigenvector coordinates into a probability simplex. In particular, PCCA++ ensures that the memberships of each state to the metastable sets are nonnegative and sum to 1. Using PCCA+ or PCCA++ with a Markov model is a common approach to finding long-lived states in molecular dynamics trajectories.

The protein folding model used here for illustration consists of only 8 states and is thus easy to comprehend. When building Markov models from clustered molecular dynamics data, one often requires several thousand states in order to approximate the system kinetics well. Network approaches have been developed to visualize the network of transitions arising from such a model [6], but this is not straightforward, especially when the network is dense. It is thus desirable to find an effective representation that communicates the essential properties of the kinetics. In this section we describe a way to cluster the large discrete state space into a few metastable sets, each of which traps the dynamics for a long time before it jumps to another set. Let us stress that the purpose of finding these sets is purely illustrative (e.g. for lumping fluxes, see Sec. lecture_tpt). For quantitatively calculating kinetic properties, the full Markov model should be used, as the approximation of the system’s kinetics will generally deteriorate when using a lumped Markov model [5][2][8].

Let us consider the coarse partition of state space $\Omega=\{C_{1},C_{2},...,C_{n}\}$ where each cluster $C_{i}$ consists of a set of states $S_{j}$. We are interested in finding a clustering that is maximally metastable. In other words, each cluster $C_{i}$ should represent a set of structures that the dynamics remains in for a long time before jumping to another cluster $C_{j}$. Thus, each cluster $C_{i}$ can be associated with a free energy basin.
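To make "maximally metastable" concrete, the following minimal sketch scores a candidate partition by the trace of the lumped transition matrix, i.e. the sum of the probabilities of staying within each cluster over one step. This score is one common choice and is an assumption here, not something the text above prescribes. The 4-state matrix `T` and the function names are hypothetical and serve only as illustration.

```python
import numpy as np

# Hypothetical 4-state chain: wells {0, 1} and {2, 3}, joined by a rare 1 <-> 2 hop.
T = np.array([
    [0.90, 0.10, 0.00, 0.00],
    [0.10, 0.89, 0.01, 0.00],
    [0.00, 0.01, 0.89, 0.10],
    [0.00, 0.00, 0.10, 0.90],
])

def stationary_distribution(T):
    """Left eigenvector of T for the largest eigenvalue, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(T.T)
    pi = evecs[:, np.argmax(evals.real)].real
    return pi / pi.sum()

def coarse_grain(T, labels):
    """Lump T over the clusters given by labels, weighting states by pi."""
    pi = stationary_distribution(T)
    n_states, n_sets = len(labels), max(labels) + 1
    M = np.zeros((n_states, n_sets))
    M[np.arange(n_states), labels] = 1.0      # indicator matrix: state -> cluster
    W = pi[:, None] * M                       # pi-weighted indicators
    return (W.T @ T @ M) / W.sum(axis=0)[:, None]

def metastability(T, labels):
    """Sum of the self-transition probabilities of the lumped chain."""
    return float(np.trace(coarse_grain(T, labels)))

good = [0, 0, 1, 1]   # clusters follow the two wells
bad = [0, 1, 0, 1]    # clusters cut through both wells
print(metastability(T, good), metastability(T, bad))
```

For this toy chain the well-aligned partition scores about 1.99 out of a maximum of 2, while the interleaved partition scores about 1.79. The lumped matrix is used here only to rank partitions, not as a kinetic model.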

We can understand the slow kinetics in terms of probability transport by the dominant eigenvectors of the transition matrix. Consequently, these dominant eigenvectors can also be used to decompose the system into metastable sets [9][10]. Consider the eigenvector corresponding to the slowest process (refer to Figure in theory chapter, yellow line): this eigenvector is almost a step function that changes from negative to positive values at the saddle point. When we take the value of this eigenvector in each state and plot it along one axis, we obtain Fig. 1a. Partitioning this line in the middle dissects state space into the two most metastable states of the system (Fig. 1b). The two most metastable states exchange at a timescale given by the slowest timescale $t_{2}$. If we are interested in differentiating between smaller substates, we may ask for the partition into the three most metastable states. In this case we consider two eigenvectors simultaneously, $\mathbf{r}_{2}$ and $\mathbf{r}_{3}$. Plotting the coordinates in these eigenvectors for each state yields the triangle shown in Fig. 1c, whose corners represent the kinetic centers of metastable states. Assigning each state to the nearest corner partitions state space into the three most metastable states (Fig. 1d) that exchange at timescales of $t_{3}$ or slower. The same partition can be done using three eigenvectors, $\mathbf{r}_{2}$, $\mathbf{r}_{3}$ and $\mathbf{r}_{4}$, yielding four metastable states exchanging at timescales $t_{4}$ and slower, and so on (Fig. 1e,f). Generally, it can be shown that when $n$ eigenvectors are considered, their coordinates lie in an $n$-dimensional simplex with $n+1$ corners, called vertices, which allow the dynamics to be partitioned into $n+1$ metastable sets [10][3].
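The two-state case can be sketched with NumPy as follows. The 6-state transition matrix is a hypothetical double-well chain chosen for illustration, and `dominant_right_eigenvectors` is a helper name introduced here. The sketch splits states by the sign of the second right eigenvector, which corresponds to cutting the line of Fig. 1a in the middle; it is not the PCCA+ procedure itself.

```python
import numpy as np

# Hypothetical 6-state chain: wells {0, 1, 2} and {3, 4, 5}, rare hop 2 <-> 3.
T = np.array([
    [0.70, 0.30, 0.00, 0.00, 0.00, 0.00],
    [0.30, 0.40, 0.30, 0.00, 0.00, 0.00],
    [0.00, 0.30, 0.69, 0.01, 0.00, 0.00],
    [0.00, 0.00, 0.01, 0.69, 0.30, 0.00],
    [0.00, 0.00, 0.00, 0.30, 0.40, 0.30],
    [0.00, 0.00, 0.00, 0.00, 0.30, 0.70],
])

def dominant_right_eigenvectors(T, m):
    """Return the m largest eigenvalues of T and their right eigenvectors."""
    evals, evecs = np.linalg.eig(T)
    order = np.argsort(-evals.real)[:m]       # sort by decreasing eigenvalue
    return evals.real[order], evecs[:, order].real

evals, R = dominant_right_eigenvectors(T, 2)
r2 = R[:, 1]                                   # eigenvector of the slowest process

# The overall sign of r2 is arbitrary, so only the grouping is meaningful.
labels = (r2 > 0).astype(int)
print(evals, labels)
```

The resulting labels group states 0 to 2 into one set and states 3 to 5 into the other; which set receives label 0 depends on the sign convention of the eigensolver.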

Each of these partitionings is a valid selection in a hierarchy of possible decompositions of the system dynamics. Moving down this hierarchy means that more states are being distinguished, revealing more structural details and smaller timescales. For the system shown in (refer to Figure in theory chapter), two to four states are especially interesting to distinguish. After four states there is a gap in the timescales ($t_{5}\ll t_{4}$) induced by a gap after the fourth eigenvalue (refer to Figure in theory chapter, panel c). Thus, for a qualitative understanding of the system kinetics, it is not very interesting to distinguish more than four states. However, note that for quantitatively modeling the system kinetics, it is essential to maintain a fine discretization, as the MSM discretization error will increase when states are lumped.
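The gap argument can be illustrated with a short sketch. It assumes the standard relation $t_{i}=-\tau/\ln\lambda_{i}$ between eigenvalues and timescales at lag time $\tau$, and it picks the number of metastable sets at the largest ratio between consecutive timescales. The spectrum below is made up for illustration and is not taken from the model in the figure; the function names are likewise hypothetical.

```python
import math

def implied_timescales(eigenvalues, lag=1.0):
    """Timescales t_2, t_3, ... from eigenvalues in (0, 1]; lambda_1 = 1 is skipped."""
    lam = sorted(eigenvalues, reverse=True)
    return [-lag / math.log(l) for l in lam[1:]]

def number_of_metastable_sets(eigenvalues, lag=1.0):
    """Number of sets suggested by the largest gap between consecutive timescales."""
    ts = implied_timescales(eigenvalues, lag)
    ratios = [ts[k] / ts[k + 1] for k in range(len(ts) - 1)]
    # ratios[k] compares t_{k+2} with t_{k+3}; a gap there keeps k + 2 sets.
    return ratios.index(max(ratios)) + 2

# Hypothetical spectrum with a gap after the fourth eigenvalue.
spectrum = [1.0, 0.99, 0.98, 0.95, 0.5, 0.4, 0.3]
print(implied_timescales(spectrum))
print(number_of_metastable_sets(spectrum))
```

With this spectrum the ratio $t_{4}/t_{5}$ is by far the largest, so the sketch suggests four sets. In practice the choice remains a judgment call, since every level of the hierarchy is a valid decomposition.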

Figure 1: Metastable states of the one-dimensional dynamics (**refer to Figure in theory chapter**) identified by PCCA+. (a), (c), (e): Plot of the eigenvector elements of one, two, and three eigenvectors. The colors indicate groups of elements (and thus conformational states) that are clustered together. (b), (d), (f): Clustering of conformation space into two, three, and four clusters, respectively.

Fig. 2 shows the metastable states of the protein folding model. Interestingly, there is no simple partition that splits unfolded and folded states. The intermediate-temperature case comes closest, as the unfolded state is a metastable state that is separated from all other states with partial structure. The remaining space, and the conformation space at other temperatures, is clustered in a non-obvious manner. Sometimes these clusters are defined by the presence of particular structural elements (e.g. the red cluster in the high-temperature case is characterized by having $c$ formed).

Figure 2: Metastable sets of the Folding Model

Acknowledgements:

The major part of this article was published in [4].

Citing PCCA:

The original PCCA method, which clustered states based on the signs of the eigenvectors, was introduced in [9]. Today, PCCA+ [1] or PCCA++ [7] are commonly used instead; they transform the eigenvector coordinates to a probability simplex from which the state memberships are computed.
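The idea of mapping eigenvector coordinates to a probability simplex can be sketched as follows. This is a simplified illustration, not the optimized transform of PCCA+ [1] or PCCA++ [7]: it picks simplex vertices by a greedy farthest-point search and uses the inverse of the vertex coordinates as the linear transform. The resulting memberships sum to 1 per state but are not guaranteed to be nonnegative. The 9-state chain and all function names are hypothetical.

```python
import numpy as np

def chain_matrix(hops):
    """Symmetric nearest-neighbour chain; hops[i] is the i <-> i+1 probability."""
    n = len(hops) + 1
    T = np.zeros((n, n))
    for i, p in enumerate(hops):
        T[i, i + 1] = T[i + 1, i] = p
    T[np.diag_indices(n)] = 1.0 - T.sum(axis=1)   # make each row sum to 1
    return T

def simplex_vertices(R):
    """Greedy search for one representative row of R per metastable set."""
    idx = [int(np.argmax(np.linalg.norm(R, axis=1)))]
    Y = R - R[idx[0]]                             # move the first vertex to the origin
    for _ in range(R.shape[1] - 1):
        norms = np.linalg.norm(Y, axis=1)
        j = int(np.argmax(norms))                 # farthest remaining point
        idx.append(j)
        u = Y[j] / norms[j]
        Y = Y - np.outer(Y @ u, u)                # project out the new direction
    return idx

def memberships(R):
    """Linear transform that maps the vertex rows of R to unit vectors."""
    A = np.linalg.inv(R[simplex_vertices(R), :])
    return R @ A

# Hypothetical 9-state chain with three wells separated by rare hops.
T = chain_matrix([0.3, 0.3, 0.01, 0.3, 0.3, 0.01, 0.3, 0.3])
evals, evecs = np.linalg.eig(T)
order = np.argsort(-evals.real)[:3]
R = evecs[:, order].real                          # first column is constant

chi = memberships(R)
labels = chi.argmax(axis=1)                       # crisp assignment per state
print(np.round(chi, 3))
print(labels)
```

Taking the largest membership per state gives a crisp assignment that corresponds to the nearest-corner rule; for this toy chain it recovers the three wells, with the numbering of the sets depending on the order in which the vertices are found.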

Bibliography

[1]: P. Deuflhard and M. Weber: Robust Perron cluster analysis in conformation dynamics. In: M. Dellnitz, S. Kirkland, M. Neumann and C. Schütte (editors), Linear Algebra Appl. 398C. Elsevier, New York (2005).

[2]: S. Kube and M. Weber: A coarse graining method for the identification of transition rates between molecular conformations. J. Chem. Phys. 126, 024103 (2007).

[3]: F. Noé, I. Horenko, C. Schütte and J. C. Smith: Hierarchical Analysis of Conformational Dynamics in Biomolecules: Transition Networks of Metastable States. J. Chem. Phys. 126, 155102 (2007).

[4]: J.-H. Prinz, B. Keller and F. Noé: Probing molecular kinetics with Markov models: Metastable states, transition pathways and spectroscopic observables. Phys. Chem. Chem. Phys. 13, 16912-16927 (2011).

[5]: J.-H. Prinz, H. Wu, M. Sarich, B. Keller, M. Senne, M. Held, J. D. Chodera, C. Schütte and F. Noé: Markov models of molecular kinetics: Generation and Validation. J. Chem. Phys. 134, 174105 (2011).

[6]: F. Rao and A. Caflisch: The Protein Folding Network. J. Mol. Biol. 342, 299-306 (2004).

[7]: S. Röblitz and M. Weber: Fuzzy spectral clustering by PCCA+: application to Markov state models and data classification. Adv. Data Anal. Classif. 7, 147-179 (2013).

[8]: M. Sarich, F. Noé and C. Schütte: On the approximation quality of Markov state models. SIAM Multiscale Model. Simul. 8, 1154-1177 (2010).

[9]: C. Schütte, A. Fischer, W. Huisinga and P. Deuflhard: A Direct Approach to Conformational Dynamics based on Hybrid Monte Carlo. J. Comput. Phys. 151, 146-168 (1999).

[10]: M. Weber: Improved Perron cluster analysis. ZIB Report 03-04 (2003).