Frank Noe
frank.noe@fu-berlin.de
FU Berlin,
Arnimallee 6, 14195 Berlin, Germany
Summary: Implied timescales are the relaxation timescales of a molecule implied by a Markov model transition matrix estimated at a lag time $\tau$. Since $\tau$ is a model parameter whereas the relaxation timescales are a physical property of the simulated system, the implied timescales are expected to be independent of $\tau$. This is not the case for very small values of $\tau$, where the eigenvector approximation error and the spectral error of the Markov model are large; nor is it the case for very large values of $\tau$, where the estimate is dominated by numerical errors. For a good state space discretization, however, an intermediate range of $\tau$ values can be found in which the largest implied timescales are approximately constant. Computing and plotting the implied timescales is thus a useful tool for choosing the lag time $\tau$ at which the Markov model will be estimated. Testing whether the implied timescales are constant over a range of $\tau$ within statistical error is also a weak version of a Chapman-Kolmogorov test.
If the discretized dynamics are not exactly Markovian, how do the implied timescales behave as a function of the lag time $k\tau_{0}$? In [11], we have derived a tight bound for the slowest relaxation rate $\kappa_{2}=t_{2}^{-1}$. From this bound it follows that the relative error of the largest implied timescale is bounded by $$\begin{aligned} \frac{\hat{t}_{2}-t_{2}}{t_{2}} & \le & \frac{\ln\frac{1}{\alpha}}{\frac{\tau}{t_{2}}+\ln\frac{1}{\alpha}} \:\:\:\:(3)\end{aligned}$$ where $\alpha=\langle\psi_{2},\hat{\psi}_{2}\rangle_{\mu}$ is the discretization quality with respect to the second propagator eigenfunction. In simple words, if $\alpha=1$, the state space discretization resolves the slowest process perfectly, while if $\alpha=0$, the slowest process is completely concealed by the discretization. In the limit $\frac{\tau}{t_{2}}\gg\ln\frac{1}{\alpha}$, i.e. when either the state space discretization is very good or the lag time is very large, the error bound becomes $$\begin{aligned} \frac{\hat{t}_{2}-t_{2}}{t_{2}} & \lessapprox & \frac{t_{2}}{\tau}\ln\frac{1}{\alpha}. \:\:\:\:(4)\end{aligned}$$ We observe that (1) the implied timescales are well approximated if the state space discretization is very good ($\alpha=1\rightarrow\ln\alpha^{-1}=0$), and (2) the implied timescale estimate converges towards the true relaxation timescale when the lag time $\tau$ is increased. Unfortunately, this convergence is slow: the error decays only as $\tau^{-1}$. Following [6], we can derive that, for all other relaxation processes as well, the corresponding implied timescale converges to its true value when the discretization quality or the lag time $\tau$ increases:
$$\lim_{\tau\rightarrow\infty}\left|t_{j}(\tau)-\hat{t}_{j}(\tau)\right|=0, \:\:\:\:(5)$$ and also
$$\lim_{\delta_{j}\rightarrow0}\left|t_{j}(\tau)-\hat{t}_{j}(\tau)\right|=0, \:\:\:\:(6)$$ where $\delta_{j}=1-\alpha_{j}$ is the projection error of the state space discretization with respect to the $j$th dynamical process. This fact has been empirically observed in many previous studies [14][13][4][5][2][9][10].
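To get a feeling for how slowly the bound in Eq. (3) tightens with the lag time, it can be evaluated numerically. The following is a minimal sketch; the function names and the chosen values of $\alpha$ and $\tau/t_{2}$ are illustrative assumptions and do not come from [11].

```python
import math


def relative_error_bound(alpha, tau_over_t2):
    """Right-hand side of Eq. (3): ln(1/alpha) / (tau/t2 + ln(1/alpha))."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if tau_over_t2 <= 0.0:
        raise ValueError("tau/t2 must be positive")
    log_term = math.log(1.0 / alpha)
    return log_term / (tau_over_t2 + log_term)


def relative_error_bound_large_lag(alpha, tau_over_t2):
    """Right-hand side of Eq. (4): (t2/tau) * ln(1/alpha)."""
    return math.log(1.0 / alpha) / tau_over_t2


# Illustrative values only: three discretization qualities, three lag times.
for alpha in (0.99, 0.9, 0.5):
    for ratio in (0.01, 0.1, 1.0):
        bound = relative_error_bound(alpha, ratio)
        print(f"alpha={alpha:4.2f}  tau/t2={ratio:5.2f}  bound={bound:.3f}")
```

For example, with $\alpha=0.9$ and a lag time of one tenth of $t_{2}$, the bound still allows a relative error of about 0.51, which illustrates why a good discretization matters more in practice than a long lag time.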
The mathematical facts above suggest the following procedure for assessing the quality of the state space discretization:
1. For a given state space discretization, estimate a series of transition matrices $\mathbf{T}(\tau_{k})$ with $\tau_{k}=k\Delta t$, where $\Delta t$ is the time step between saved trajectory frames and $k$ is a variable integer, using the methods described in lecture_ml_nonrev and lecture_ml_rev.
2. Compute the $m$ largest eigenvalues of $\mathbf{T}(\tau_{k})$, and from these the $m$ slowest implied timescales $t_{i}(\tau_{k})$ as a function of the lag time $\tau_{k}$.
3. When the implied timescales $t_{i}$ reach an approximately constant value over a range of lag times $\tau_{k}$, the state space discretization is sufficiently good to resolve the dynamics of these slowest processes. Usually, the lag times at which this approximately constant value is reached are also expected to be significantly smaller than the timescales $t_{i}$ of interest.
4. Select the minimal lag time $\tau$ at which the $t_{i}$ are approximately constant, and use $\mathbf{T}(\tau)$ as the Markov model.
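The procedure above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the estimators of lecture_ml_nonrev or lecture_ml_rev: it uses the standard definition $t_{i}(\tau)=-\tau/\ln|\lambda_{i}(\tau)|$ (not stated in this section), sliding-window transition counts, and plain row normalization of the count matrix. All function names are hypothetical, and the trajectory is synthetic.

```python
import numpy as np


def count_matrix(dtraj, n_states, lag):
    """Count transitions i -> j separated by `lag` frames (sliding window)."""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return counts


def transition_matrix(counts):
    """Row-normalize the counts (a simple nonreversible estimate)."""
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every state needs at least one outgoing transition")
    return counts / row_sums


def implied_timescales(dtraj, n_states, lags, m, dt=1.0):
    """Return t_i(tau_k) for the m slowest processes, shape (len(lags), m).

    `lags` are positive integers k with tau_k = k * dt; m <= n_states - 1.
    """
    dtraj = np.asarray(dtraj, dtype=int)
    its = np.empty((len(lags), m))
    for row, k in enumerate(lags):
        T = transition_matrix(count_matrix(dtraj, n_states, k))
        # Eigenvalue magnitudes, descending; drop the stationary eigenvalue 1.
        mags = np.sort(np.abs(np.linalg.eigvals(T)))[::-1][1:m + 1]
        its[row] = -k * dt / np.log(mags)
    return its


def simulate_chain(P, n_steps, rng):
    """Hypothetical helper: sample a discrete trajectory from a known matrix P."""
    cum = np.cumsum(P, axis=1)
    u = rng.random(n_steps)
    traj = np.empty(n_steps, dtype=int)
    state = 0
    for t in range(n_steps):
        traj[t] = state
        nxt = int(np.searchsorted(cum[state], u[t], side="right"))
        state = min(nxt, P.shape[0] - 1)  # guard against round-off in cum
    return traj


if __name__ == "__main__":
    # Synthetic three-state chain, chosen only for illustration.
    P = np.array([[0.98, 0.02, 0.00],
                  [0.01, 0.98, 0.01],
                  [0.00, 0.02, 0.98]])
    traj = simulate_chain(P, 50000, np.random.default_rng(1))
    lags = [1, 2, 5, 10, 20]
    for k, ts in zip(lags, implied_timescales(traj, 3, lags, m=2)):
        print(f"lag {k:3d}: implied timescales {ts}")
```

Because the synthetic trajectory is exactly Markovian on its states, the printed timescales should be flat in the lag time up to statistical noise. For a real discretization one would plot them against $\tau_{k}$ and look for the plateau described in the text.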
Two notes of caution must be made at this point: (1) The argument above does not include the effect of statistics and is thus strictly valid only in the limit of good sampling. In many practical cases, statistics are insufficient and the implied timescales do not show the expected monotonic behavior that permits the quality of the discretization to be assessed. In this case, additional sampling is needed. (2) Observing convergence of the slowest implied timescales in $\tau$ is not a strict test of Markovianity. While Markovian dynamics implies constancy of the implied timescales in $\tau$ [9][13], the reverse is not true; it would additionally require the eigenvectors to be constant. Nevertheless, observing the lag time dependence of the implied timescales is a useful way to assess the quality of the discretization and to choose a lag time $\tau$ at which $\mathbf{T}(\tau)$ is estimated. The resulting model still needs to be validated subsequently (see lecture_chapman_kolmogorow).
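The forward implication in note (2) can be written out in one line. The following sketch assumes the standard definition $t_{i}(\tau)=-\tau/\ln|\lambda_{i}(\tau)|$, which is not stated in this section, and an exactly Markovian process on the discrete states, so that the Chapman-Kolmogorov property $\mathbf{T}(k\tau_{0})=\mathbf{T}(\tau_{0})^{k}$ holds:

$$\begin{aligned} \lambda_{i}(k\tau_{0}) & = \lambda_{i}(\tau_{0})^{k}, \\ t_{i}(k\tau_{0}) & = -\frac{k\tau_{0}}{\ln|\lambda_{i}(\tau_{0})|^{k}} = -\frac{k\tau_{0}}{k\ln|\lambda_{i}(\tau_{0})|} = t_{i}(\tau_{0}). \end{aligned}$$

Under the same assumption, $\mathbf{T}(k\tau_{0})$ and $\mathbf{T}(\tau_{0})$ also share their eigenvectors, which is why constant eigenvalue-derived timescales alone are necessary but not sufficient for Markovianity.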
The major part of this article has been published in [12].
The idea of using implied timescales to check for Markovianity was originally introduced in [13].
[1]: O. Bieri, J. Wirz, B. Hellrung, M. Schutkowski, M. Drewello and T. Kiefhaber: The speed limit for protein folding measured by triplet-triplet energy transfer. Proc. Natl. Acad. Sci. USA 96, 9597-9601 (1999).
[2]: N. V. Buchete and G. Hummer: Coarse Master Equations for Peptide Folding Dynamics. J. Phys. Chem. B 112, 6057-6069 (2008).
[3]: C.-K. Chan, Y. Hu, S. Takahashi, D. L. Rousseau, W. A. Eaton and J. Hofrichter: Submillisecond protein folding kinetics studied by ultrarapid mixing. Proc. Natl. Acad. Sci. USA 94, 1779-1784 (1997).
[4]: J. D. Chodera, K. A. Dill, N. Singhal, V. S. Pande, W. C. Swope and J. W. Pitera: Automatic discovery of metastable states for the construction of Markov models of macromolecular conformational dynamics. J. Chem. Phys. 126, 155101 (2007).
[5]: J. D. Chodera and F. Noé: Probability distributions of molecular observables computed from Markov models. II: Uncertainties in observables and their time-evolution. J. Chem. Phys. 133, 105102 (2010).
[6]: N. Djurdjevac, M. Sarich and C. Schütte: Estimating the eigenvalue error of Markov State Models. Multiscale Model. Simul. 10, 61-81 (2012).
[7]: M. Jäger, H. Nguyen, J. C. Crane, J. W. Kelly and M. Gruebele: The folding mechanism of a beta-sheet: the WW domain. J. Mol. Biol. 311, 373-393 (2001).
[8]: H. Neuweiler, S. Doose and M. Sauer: A microscopic view of miniprotein folding: Enhanced folding efficiency through formation of an intermediate. Proc. Natl. Acad. Sci. USA 102, 16650-16655 (2005).
[9]: F. Noé, I. Horenko, C. Schütte and J. C. Smith: Hierarchical Analysis of Conformational Dynamics in Biomolecules: Transition Networks of Metastable States. J. Chem. Phys. 126, 155102 (2007).
[10]: F. Noé, C. Schütte, E. Vanden-Eijnden, L. Reich and T. R. Weikl: Constructing the Full Ensemble of Folding Pathways from Short Off-Equilibrium Simulations. Proc. Natl. Acad. Sci. USA 106, 19011-19016 (2009).
[11]: J.-H. Prinz, J. D. Chodera and F. Noé: Spectral rate theory for two-state kinetics. Phys. Rev. X 4, 011020 (2014).
[12]: J.-H. Prinz, H. Wu, M. Sarich, B. Keller, M. Senne, M. Held, J. D. Chodera, C. Schütte and F. Noé: Markov models of molecular kinetics: Generation and Validation. J. Chem. Phys. 134, 174105 (2011).
[13]: W. C. Swope, J. W. Pitera and F. Suits: Describing protein folding kinetics by molecular dynamics simulations: 1. Theory. J. Phys. Chem. B 108, 6571-6581 (2004).
[14]: W. C. Swope, J. W. Pitera, F. Suits, M. Pitman and M. Eleftheriou: Describing protein folding kinetics by molecular dynamics simulations: 2. Example applications to alanine dipeptide and beta-hairpin peptide. J. Phys. Chem. B 108, 6582-6594 (2004).