Frank Noé
frank.noe@fu-berlin.de
FU Berlin,
Arnimallee 6, 14195 Berlin, Germany
Summary: We have observed one or several trajectories amongst $n$ discrete states. At lag time $\tau$, there have been $c_{ij}=c_{ij}(\tau)$ transitions between these states, as encoded in a count matrix $\mathbf{C}\in\mathbb{R}^{n\times n}$. We aim to estimate a Markov transition matrix $\hat{\mathbf{P}}\in\mathbb{R}^{n\times n}$ with maximum likelihood under the constraint that $\hat{\mathbf{P}}$ fulfills detailed balance. Mathematically, $$\begin{aligned} \hat{\mathbf{P}} & = & \arg\max_{\mathbf{P}}p(\mathbf{C}\mid\mathbf{P})\\ \mathrm{s.t.} & & \pi_{i}p_{ij}=\pi_{j}p_{ji}\:\:\:\:\:\forall i,j\end{aligned}$$ where $\boldsymbol{\pi}$ is the stationary distribution of $\hat{\mathbf{P}}$. In contrast to the maximum likelihood problem without the detailed balance constraint, this problem has no closed-form solution. Here, two iterative optimization algorithms that solve this problem are reviewed.
Note that the nonreversible maximum likelihood estimator $\hat{p}_{ij}=c_{ij}/\sum_{k=1}^{n}c_{ik}$ usually does not fulfill detailed balance $(\pi_{i}\hat{p}_{ij}\neq\pi_{j}\hat{p}_{ji})$, even if the underlying dynamics is in equilibrium and thus $\pi_{i}p_{ij}=\pi_{j}p_{ji}$ holds for the exact transition matrix $\mathbf{P}(\tau)$. In many cases it is desirable and advantageous to estimate a transition matrix that does fulfill detailed balance. If the underlying microscopic dynamics are reversible, detailed balance of the discrete transition matrix $\mathbf{P}$ follows as a consequence, and it is therefore meaningful to impose this structure onto $\hat{\mathbf{P}}$ (see lecture_microdynamics). Two practical implications are: (1) The additional constraints reduce the number of free parameters in $\hat{\mathbf{P}}$, which can result in a significant reduction of statistical uncertainty [2]. (2) The eigenvalues and eigenvectors of a reversible transition matrix are real (see basics_markov_chains).
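As a small numerical illustration (a minimal numpy sketch; the 3-state count matrix is made up for this purpose), the nonreversible estimator is obtained by row normalization, and the detailed-balance residuals $\pi_{i}\hat{p}_{ij}-\pi_{j}\hat{p}_{ji}$ are then generally nonzero:

```python
import numpy as np

# Hypothetical 3-state count matrix, for illustration only.
C = np.array([[90., 10.,  2.],
              [ 8., 80., 12.],
              [ 4.,  6., 70.]])

# Nonreversible MLE: row normalization, p_ij = c_ij / sum_k c_ik.
P = C / C.sum(axis=1, keepdims=True)

# Stationary distribution: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()

# Detailed-balance residuals pi_i p_ij - pi_j p_ji do not vanish.
flux = pi[:, None] * P
print(np.max(np.abs(flux - flux.T)))  # > 0 for generic counts
```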
There is no known closed-form solution for the maximum probability estimator under the detailed balance constraint. Two iterative methods are presented in the following.
Let $x_{ij}=\pi_{i}p_{ij}$ be the unconditional transition probability
to observe a transition $i\rightarrow j$. These variables fulfill the
constraint $\sum_{i,j}x_{ij}=1$, and the detailed balance condition is
given by $x_{ij}=x_{ji}$. It is hence sufficient to store the $x_{ij}$
with $i\le j$ in order to construct a reversible transition matrix. A
reversible transition matrix $\mathbf{P}=\left[p_{ij}\right]$ can be
expressed as
$$p_{ij}=\frac{x_{ij}}{x_{i}}=\frac{x_{ij}}{\pi_{i}}
\:\:\:\:(6)$$ where $\mathbf{X}=\left[x_{ij}\right]$ is a nonnegative
symmetric matrix and $x_{i}=\sum_{j}x_{ij}=\pi_{i}$. Then the
log-likelihood of $\mathbf{X}$ with count matrix
$\mathbf{C}=\left[c_{ij}\right]$, using $p(\mathbf{C}\mid\mathbf{P})=\prod_{i,j}p_{ij}^{c_{ij}}$ and the symmetry $x_{ij}=x_{ji}$, is
$$Q=\log p\left(\mathbf{C}\mid\mathbf{X}\right)=\sum_{i}c_{ii}\log\frac{x_{ii}}{x_{i}}+\sum_{i<j}\left(c_{ij}\log\frac{x_{ij}}{x_{i}}+c_{ji}\log\frac{x_{ij}}{x_{j}}\right) \:\:\:\:(8)$$
where we have used $c_{i}=\sum_{k=1}^{n}c_{ik}$. $Q$ is maximized where $\frac{\partial Q}{\partial x_{ij}}=0$ for all $i,j$, yielding the optimality conditions: $$\frac{c_{ji}+c_{ij}}{x_{ij}}-\frac{c_{i}}{x_{i}}-\frac{c_{j}}{x_{j}}=0\:\:\:\:\:\:\:\forall i,j. \:\:\:\:(10)$$
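For reference in the algorithms below, the log-likelihood (8) can be evaluated directly from a symmetric $\mathbf{X}$. The following is a minimal numpy sketch (the function name is a choice made here; the sum runs only over observed transitions $c_{ij}>0$, where $x_{ij}>0$ is assumed):

```python
import numpy as np

def reversible_loglik(C, X):
    """Q = sum_{i,j} c_ij * log(x_ij / x_i) for a symmetric nonnegative X."""
    x = X.sum(axis=1)         # row sums x_i (proportional to pi_i)
    i, j = np.nonzero(C)      # only pairs with c_ij > 0 contribute
    return np.sum(C[i, j] * np.log(X[i, j] / x[i]))
```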
In [1], the following approach for finding $\hat{\mathbf{X}}$ was suggested. Solving Eq. (10) for $x_{ij}$, we can write the optimality conditions as the fixed-point iteration: $$x_{ij}^{(k+1)}=\frac{c_{ij}+c_{ji}}{\frac{c_{i}}{x_{i}^{(k)}}+\frac{c_{j}}{x_{j}^{(k)}}}, \:\:\:\:(11)$$ where $k$ is the iteration number. This approach iteratively updates all elements of $\mathbf{X}$ until the log-likelihood converges. The transition matrix $\hat{\mathbf{P}}$ is then obtained from Eq. (6).
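A direct implementation of iteration (11) could look as follows (a sketch, assuming a dense count matrix in which every state has at least one observed outgoing transition so all row sums stay positive; the fixed iteration budget stands in for a proper convergence test on the log-likelihood):

```python
import numpy as np

def reversible_mle_fixed_point(C, n_iter=1000):
    """Fixed-point iteration of Eq. (11); returns P via Eq. (6)."""
    C = np.asarray(C, dtype=float)
    c = C.sum(axis=1)                    # row counts c_i
    Csym = C + C.T                       # c_ij + c_ji
    X = Csym / Csym.sum()                # symmetric start, sum_ij x_ij = 1
    for _ in range(n_iter):
        x = X.sum(axis=1)                # current row sums x_i
        X = Csym / (c[:, None] / x[:, None] + c[None, :] / x[None, :])
        X /= X.sum()                     # restore the normalization
    return X / X.sum(axis=1, keepdims=True)  # p_ij = x_ij / x_i
```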
A more memory-efficient algorithm is obtained by summing both sides of Eq. (11) over $j$ and using $\sum_{j}x_{ij}=x_{i}$ and $x_{i}=\pi_{i}$ [5]: $$\pi_{i}^{(k+1)}=\sum_{j}\frac{c_{ij}+c_{ji}}{\frac{c_{i}}{\pi_{i}^{(k)}}+\frac{c_{j}}{\pi_{j}^{(k)}}}. \:\:\:\:(12)$$ In this approach, only the $n$ variables of the $\boldsymbol{\pi}$-vector are varied and need to be stored. After convergence, the reversible transition matrix estimate is obtained in a single step from the final estimate $\pi_{i}=\pi_{i}^{(k)}$: $$\hat{p}_{ij}=\frac{c_{ij}+c_{ji}}{c_{i}+\frac{\pi_{i}}{\pi_{j}}c_{j}}. \:\:\:\:(13)$$
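A sketch of this variant under the same assumptions as above, storing only the $n$-vector $\boldsymbol{\pi}$ (the tolerance-based stopping rule is one possible choice):

```python
import numpy as np

def reversible_mle_pi_iteration(C, tol=1e-12, max_iter=100000):
    """Iterate Eq. (12) on pi only, then assemble P via Eq. (13)."""
    C = np.asarray(C, dtype=float)
    c = C.sum(axis=1)                                  # c_i
    Csym = C + C.T                                     # c_ij + c_ji
    pi = c / c.sum()                                   # initial guess
    for _ in range(max_iter):
        denom = c[:, None] / pi[:, None] + c[None, :] / pi[None, :]
        pi_new = (Csym / denom).sum(axis=1)            # Eq. (12)
        pi_new /= pi_new.sum()
        done = np.max(np.abs(pi_new - pi)) < tol
        pi = pi_new
        if done:
            break
    # Eq. (13): p_ij = (c_ij + c_ji) / (c_i + (pi_i / pi_j) c_j)
    P = Csym / (c[:, None] + (pi[:, None] / pi[None, :]) * c[None, :])
    return P, pi
```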
The optimization approach derived in [4] performs element-wise steps, each of which maximizes the likelihood with respect to a single element of the $\mathbf{X}$-matrix while keeping all others fixed. Each such conditional optimization amounts to solving a one-dimensional quadratic problem. We distinguish between diagonal and off-diagonal entries:
$i=j$ $$\begin{aligned} \arg\max_{x_{ii}}\log p\left(\mathbf{C}|\mathbf{X}\right) & = & \arg\max_{x_{ii}}c_{ii}\log\frac{x_{ii}}{x_{ii}+x_{i,-i}}+\sum_{k\neq i}c_{ik}\log\frac{x_{ik}}{x_{ii}+x_{i,-i}}\nonumber \\ & = & \arg\max_{x_{ii}}c_{ii}\log x_{ii}-c_{i}\log\left(x_{ii}+x_{i,-i}\right) \:\:\:\:(14)\end{aligned}$$ where $x_{i,-j}=\sum_{k\neq j}x_{ik}$. Then the solution of the above equation is $$x_{ii}=\frac{c_{ii}x_{i,-i}}{c_{i}-c_{ii}} \:\:\:\:(15)$$
$i<j$ $$\begin{aligned} \arg\max_{x_{ij}}\log p\left(\mathbf{C}|\mathbf{X}\right) & = & \arg\max_{x_{ij}}\left(c_{ij}+c_{ji}\right)\log x_{ij}-c_{i}\log\left(x_{ij}+x_{i,-j}\right)\nonumber \\ &  & -c_{j}\log\left(x_{ij}+x_{j,-i}\right) \:\:\:\:(16)\end{aligned}$$ Setting the derivative with respect to $x_{ij}$ to zero leads to the quadratic equation $ax_{ij}^{2}+bx_{ij}+c=0$ with $$\begin{aligned} a & = & c_{i}-c_{ij}+c_{j}-c_{ji}\\ b & = & c_{i}x_{j,-i}+c_{j}x_{i,-j}-\left(c_{ij}+c_{ji}\right)\left(x_{i,-j}+x_{j,-i}\right)\\ c & = & -\left(c_{ij}+c_{ji}\right)x_{i,-j}x_{j,-i} \:\:\:\:(18)\end{aligned}$$ whose relevant nonnegative root is $$x_{ij}=\frac{-b+\sqrt{b^{2}-4ac}}{2a}. \:\:\:\:(20)$$
The maximum probability estimator is then obtained by the following iterative algorithm (see the Supplementary Information of [4] for the proof of correctness), which is iterated until a stopping criterion is met (e.g., the maximal change $\max_{i,j}|x_{ij}^{(k+1)}-x_{ij}^{(k)}|$ in one iteration falls below a given constant, or the number of iterations exceeds a pre-defined threshold):
Algorithm 1:
Maximum probability estimator of reversible transition matrices
(1) For all $i,j=1,...,n$: initialize $$\begin{aligned} x_{ij}=x_{ji} & := & c_{ij}+c_{ji}\\ x_{i} & := & \sum_{j}x_{ij} \:\:\:\:(24)\end{aligned}$$ (2) Repeat until the stopping criterion is met: (2.1) For all $i=1,...,n$: $$\begin{aligned} \mathrm{update}\: x_{ii} & := & \frac{c_{ii}(x_{i}-x_{ii})}{c_{i}-c_{ii}}\\ \mathrm{update}\: x_{i} & := & \sum_{k}x_{ik} \:\:\:\:(26)\end{aligned}$$ (2.2) For all $i=1,...,n-1,\: j=i+1,...,n$: $$\begin{aligned} a & = & c_{i}-c_{ij}+c_{j}-c_{ji}\\ b & = & c_{i}(x_{j}-x_{ij})+c_{j}(x_{i}-x_{ij})-(c_{ij}+c_{ji})(x_{i}+x_{j}-2x_{ij})\\ c & = & -(c_{ij}+c_{ji})(x_{i}-x_{ij})(x_{j}-x_{ij})\\ \mathrm{update}\: x_{ij}=x_{ji} & := & \frac{-b+\sqrt{b^{2}-4ac}}{2a}\\ \mathrm{update}\: x_{i} & := & \sum_{k}x_{ik},\:\:\: x_{j}:=\sum_{k}x_{jk} \:\:\:\:(28)\end{aligned}$$ (3) Update $p_{ij}$, $i,j=1,...,n$: $$p_{ij}:=\frac{x_{ij}}{x_{i}} \:\:\:\:(30)$$
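A compact (unoptimized) numpy transcription of Algorithm 1 might read as follows. This is a sketch: it loops in pure Python for clarity rather than speed, assumes $c_{i}>c_{ii}$ for every state so the diagonal update (26) is well defined and generic counts for which the quadratic coefficient $a$ is positive, and replaces the stopping criterion by a fixed number of sweeps:

```python
import numpy as np

def reversible_mle_quadratic(C, n_sweeps=100):
    """Element-wise quadratic optimization (a transcription of Algorithm 1)."""
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    c_row = C.sum(axis=1)              # c_i
    X = C + C.T                        # step (1): x_ij := c_ij + c_ji
    x_row = X.sum(axis=1)              # x_i
    for _ in range(n_sweeps):          # fixed sweep budget replaces stopping criterion
        # step (2.1): diagonal elements, Eq. (26)
        for i in range(n):
            x_row[i] -= X[i, i]        # x_row[i] now holds x_i - x_ii
            X[i, i] = C[i, i] * x_row[i] / (c_row[i] - C[i, i])
            x_row[i] += X[i, i]
        # step (2.2): off-diagonal elements, Eq. (28)
        for i in range(n - 1):
            for j in range(i + 1, n):
                csym = C[i, j] + C[j, i]
                xi_mj = x_row[i] - X[i, j]   # x_i - x_ij
                xj_mi = x_row[j] - X[i, j]   # x_j - x_ij
                a = c_row[i] - C[i, j] + c_row[j] - C[j, i]
                b = (c_row[i] * xj_mi + c_row[j] * xi_mj
                     - csym * (xi_mj + xj_mi))
                c = -csym * xi_mj * xj_mi
                x_new = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
                x_row[i] += x_new - X[i, j]
                x_row[j] += x_new - X[j, i]
                X[i, j] = X[j, i] = x_new
    return X / x_row[:, None]          # step (3): p_ij = x_ij / x_i
```

Since this algorithm and the fixed-point iterations above maximize the same likelihood, their estimates $\hat{\mathbf{P}}$ should coincide up to the chosen tolerances, which provides a simple consistency check.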
Note that the optimum sought by Eq. (8) exhibits $\hat{p}_{ij}=0$ whenever $c_{ij}+c_{ji}=0$. Thus, in both optimization algorithms, the sparsity structure of the matrix $\mathbf{C}+\mathbf{C}^{T}$ can be exploited to restrict all iterations to those elements that will result in a nonzero estimate $\hat{p}_{ij}>0$.
Part of this article has been published in [4].
The fixed-point reversible estimator was first introduced in [1]. The quadratic optimization method was suggested in [4].
[1]: G. R. Bowman, K. A. Beauchamp, G. Boxer and V. S. Pande: Progress and challenges in the automated construction of Markov state models for full protein systems. J. Chem. Phys. 131, 124101 (2009).
[2]: F. Noé: Probability Distributions of Molecular Observables computed from Markov Models. J. Chem. Phys. 128, 244103 (2008).
[3]: F. Noé, C. Schütte, E. Vanden-Eijnden, L. Reich and T.R. Weikl: Constructing the Full Ensemble of Folding Pathways from Short Off-Equilibrium Simulations. Proc. Natl. Acad. Sci. USA 106, 19011-19016 (2009).
[4]: J.-H. Prinz, H. Wu, M. Sarich, B. Keller, M. Senne, M. Held, J. D. Chodera, C. Schütte and F. Noé: Markov models of molecular kinetics: Generation and Validation. J. Chem. Phys. 134, 174105 (2011).
[5]: B. Trendelkamp-Schroer, H. Wu, F. Paul and F. Noé: Estimation and uncertainty of reversible Markov models, in preparation.