Frank Noe
frank.noe@fu-berlin.de
FU Berlin,
Arnimallee 6, 14195 Berlin, Germany
Jan-Hendrik Prinz
jan-hendrik.prinz@fu-berlin.de
FU Berlin,
Arnimallee 6, 14195 Berlin, Germany
Summary: Perron-cluster cluster analysis (PCCA) exploits the structure of the eigenvectors of the transition matrix in order to define the metastable (long-lived) sets of states of a Markov chain. PCCA+ and PCCA++ are advanced methods that find an optimal linear transform of the eigenvector coordinates into a probability simplex. In particular, PCCA++ ensures that all memberships of states to metastable sets are nonnegative and sum to 1. Using PCCA+ or PCCA++ with a Markov model is a common approach to find long-lived states in molecular dynamics trajectories.
Let us consider a coarse partition of state space $\Omega=\{C_{1},C_{2},\ldots,C_{n}\}$ where each cluster $C_{i}$ consists of a set of states $S_{j}$. We are interested in finding a clustering that is maximally metastable. In other words, each cluster $C_{i}$ should represent a set of structures in which the dynamics remains for a long time before jumping to another cluster $C_{j}$. Thus, each cluster $C_{i}$ can be associated with a free energy basin.
We can understand the slow kinetics in terms of probability transport by the dominant eigenvectors of the transition matrix. Consequently, these dominant eigenvectors can also be used to decompose the system into metastable sets [9][10]. Consider the eigenvector corresponding to the slowest process (yellow line; refer to Figure in theory chapter): this eigenvector is almost a step function which changes from negative to positive values at the saddle point. When we take the value of this eigenvector in each state and plot it along one axis, we obtain Fig. 1a. Partitioning this line in the middle dissects state space into the two most metastable states of the system (Fig. 1b). These two states exchange at the slowest timescale, $t_{2}$.
If we are interested in differentiating between smaller substates, we may ask for the partition into the three most metastable states. In this case we consider two eigenvectors simultaneously, $\mathbf{r}_{2}$ and $\mathbf{r}_{3}$. Plotting the coordinates of each state in these eigenvectors yields the triangle shown in Fig. 1c, whose corners represent the kinetic centers of the metastable states. Assigning each state to the nearest corner partitions state space into the three most metastable states (Fig. 1d), which exchange at timescales of $t_{3}$ or slower. The same partition can be done using three eigenvectors, $\mathbf{r}_{2}$, $\mathbf{r}_{3}$ and $\mathbf{r}_{4}$, yielding four metastable states exchanging at timescales of $t_{4}$ or slower, and so on (Fig. 1e,f). Generally, it can be shown that when $n$ eigenvectors $\mathbf{r}_{2},\ldots,\mathbf{r}_{n+1}$ are considered, the state coordinates lie in an $n$-dimensional simplex with $n+1$ corners, called vertices, which allows the dynamics to be partitioned into $n+1$ metastable sets [10][3].
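The two-state case described above can be illustrated with a minimal sketch. The 6-state transition matrix `T` below is a hypothetical toy model (two wells joined by a rare hop), not the system shown in the figures, and the helper name `right_eigenvectors` is introduced here only for illustration. The sketch splits the states by the sign of the second right eigenvector; the overall sign of an eigenvector is arbitrary, so only the grouping is meaningful, not which set is labeled 0 or 1.

```python
import numpy as np

def right_eigenvectors(T):
    """Return eigenvalues in descending order and the matching right eigenvectors."""
    vals, vecs = np.linalg.eig(T)
    order = np.argsort(-vals.real)
    return vals.real[order], vecs.real[:, order]

# Hypothetical toy model: wells {0,1,2} and {3,4,5}, connected by a rare 2<->3 hop.
T = np.array([
    [0.70, 0.30, 0.00, 0.00, 0.00, 0.00],
    [0.30, 0.40, 0.30, 0.00, 0.00, 0.00],
    [0.00, 0.30, 0.69, 0.01, 0.00, 0.00],
    [0.00, 0.00, 0.01, 0.69, 0.30, 0.00],
    [0.00, 0.00, 0.00, 0.30, 0.40, 0.30],
    [0.00, 0.00, 0.00, 0.00, 0.30, 0.70],
])

eigenvalues, R = right_eigenvectors(T)
r2 = R[:, 1]                      # eigenvector of the slowest process
labels = (r2 > 0).astype(int)     # cut the one-dimensional coordinate at zero
print(labels)
```

Cutting at zero is the simplest choice for a near-step-function eigenvector; for more than two metastable sets, the coordinates in several eigenvectors have to be considered jointly, as the text explains.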
Each of these partitionings is a valid selection in a hierarchy of possible decompositions of the system dynamics. Moving down this hierarchy means that more states are being distinguished, revealing more structural details and smaller timescales. For the system shown in (refer to Figure in theory chapter), two to four states are especially interesting to distinguish. After four states there is a gap in the timescales ($t_{5}\ll t_{4}$) induced by a gap after the fourth eigenvalue (panel c; refer to Figure in theory chapter). Thus, for a qualitative understanding of the system kinetics, it is not very interesting to distinguish more than four states. However, note that for quantitatively modeling the system kinetics, it is essential to maintain a fine discretization, as the MSM discretization error will increase when states are lumped.
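A gap in the timescales can be located numerically, as in the rough sketch below. It assumes the standard relation $t_{i}=-\tau/\ln|\lambda_{i}|$ between the eigenvalues $\lambda_{i}$ of a transition matrix estimated at lag time $\tau$ and the implied timescales. The spectrum and the function names are hypothetical and chosen only for illustration; the eigenvalues are assumed to lie strictly between 0 and 1 apart from the stationary one.

```python
import numpy as np

def implied_timescales(eigenvalues, lag=1.0):
    """t_i = -lag / ln|lambda_i| for i >= 2 (the stationary eigenvalue is skipped)."""
    lam = np.sort(np.abs(np.asarray(eigenvalues, dtype=float)))[::-1]
    return -lag / np.log(lam[1:])

def states_before_largest_gap(timescales):
    """Number of metastable states suggested by the largest ratio t_i / t_{i+1}."""
    ratios = timescales[:-1] / timescales[1:]
    # timescales[0] is t_2, so the largest gap at index k sits after t_{k+2}
    return int(np.argmax(ratios)) + 2

# Hypothetical spectrum: four eigenvalues close to 1, then a clear drop.
eigenvalues = [1.0, 0.995, 0.99, 0.98, 0.60, 0.55, 0.50]
ts = implied_timescales(eigenvalues)
n_states = states_before_largest_gap(ts)
print(ts.round(2), n_states)
```

Picking the single largest ratio is only a heuristic; in practice the choice of the number of metastable states also depends on which timescales are of interest.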
The major part of this article was published in [4].
The original PCCA method, which clusters states based on the signs of the eigenvectors, was introduced in [9]. Today, PCCA+ [1] or PCCA++ [7] are commonly used instead; they transform the eigenvector coordinates to a probability simplex from which the state memberships are computed.
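The idea of mapping eigenvector coordinates linearly onto a probability simplex can be sketched as follows. This is a simplified illustration in the spirit of an inner-simplex vertex search followed by a linear transform, not the optimized PCCA+ or PCCA++ procedures of [1] and [7]: no optimization is performed, so memberships are guaranteed to sum to 1 but may be slightly negative. The 9-state matrix is a hypothetical symmetric toy model, and all function names are assumptions made for this example.

```python
import numpy as np

def three_well_chain(hop=0.3, rare=0.01):
    """Hypothetical symmetric 9-state chain with wells 0-2, 3-5 and 6-8."""
    n = 9
    T = np.zeros((n, n))
    for i in range(n - 1):
        # hops across a well boundary (2<->3, 5<->6) are rare
        p = rare if i % 3 == 2 else hop
        T[i, i + 1] = T[i + 1, i] = p
    T[np.diag_indices(n)] = 1.0 - T.sum(axis=1)
    return T

def inner_simplex_vertices(X):
    """Pick m rows of the N x m eigenvector matrix X that approximate the corners."""
    Y = np.array(X, dtype=float)
    m = Y.shape[1]
    idx = np.zeros(m, dtype=int)
    idx[0] = np.argmax(np.linalg.norm(Y, axis=1))   # first corner: largest norm
    Y -= Y[idx[0]]                                  # move that corner to the origin
    for k in range(1, m):
        norms = np.linalg.norm(Y, axis=1)
        idx[k] = np.argmax(norms)                   # farthest remaining point
        v = Y[idx[k]] / norms[idx[k]]
        Y -= np.outer(Y @ v, v)                     # project out that direction
    return idx

def memberships(X, idx):
    """Linear map that sends the chosen vertex rows to unit vectors."""
    return X @ np.linalg.inv(X[idx])

T = three_well_chain()
vals, vecs = np.linalg.eigh(T)       # T is symmetric here; eigenvalues ascending
X = vecs[:, ::-1][:, :3]             # three dominant eigenvectors; the first is constant
idx = inner_simplex_vertices(X)
chi = memberships(X, idx)            # fuzzy memberships, one column per metastable set
labels = chi.argmax(axis=1)          # crisp assignment to the most likely set
print(idx, labels)
```

Because the first eigenvector is constant, each row of `chi` sums to 1 by construction. For a general reversible (non-symmetric) transition matrix, the eigenvectors would additionally need to be normalized with respect to the stationary distribution, which this sketch does not do.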
[1]: P. Deuflhard and M. Weber: Robust Perron cluster analysis in conformation dynamics. In: M. Dellnitz, S. Kirkland, M. Neumann and C. Schütte (editors), Linear Algebra Appl. 398C. Elsevier, New York (2005).
[2]: S. Kube and M. Weber: A coarse graining method for the identification of transition rates between molecular conformations. J. Chem. Phys. 126, 024103 (2007).
[3]: Noé, F., Horenko, I., Schütte, C. and Smith, J. C.: Hierarchical Analysis of Conformational Dynamics in Biomolecules: Transition Networks of Metastable States. J. Chem. Phys. 126, 155102 (2007).
[4]: J.-H. Prinz, B. Keller and F. Noé: Probing molecular kinetics with Markov models: Metastable states, transition pathways and spectroscopic observables. Phys. Chem. Chem. Phys. 13, 16912-16927 (2011).
[5]: J.-H. Prinz, H. Wu, M. Sarich, B. Keller, M. Senne, M. Held, J. D. Chodera, C. Schütte and F. Noé: Markov models of molecular kinetics: Generation and Validation. J. Chem. Phys. 134, 174105 (2011).
[6]: Rao, F. and Caflisch, A.: The Protein Folding Network. J. Mol. Biol. 342, 299-306 (2004).
[7]: S. Röblitz and M. Weber: Fuzzy spectral clustering by PCCA+: application to Markov state models and data classification. Adv. Data Anal. Classif. 7, 147-179 (2013).
[8]: M. Sarich, F. Noé and C. Schütte: On the approximation quality of Markov state models. SIAM Multiscale Model. Simul. 8, 1154-1177 (2010).
[9]: Schütte, C., Fischer, A., Huisinga, W. and Deuflhard, P.: A Direct Approach to Conformational Dynamics based on Hybrid Monte Carlo. J. Comput. Phys. 151, 146-168 (1999).
[10]: Weber, M.: Improved Perron cluster analysis. ZIB Report 03-04, (2003).