Maximum likelihood reversible transition matrix

Frank Noé, frank.noe@fu-berlin.de
FU Berlin, Arnimallee 6, 14195 Berlin, Germany

Summary: We have observed one or several trajectories amongst $n$ discrete states. At lag time $\tau$, there have been $c_{ij}=c_{ij}(\tau)$ transitions between these states, as encoded in a count matrix $\mathbf{C}\in\mathbb{R}^{n\times n}$. We aim to estimate the maximum likelihood Markov transition matrix $\hat{\mathbf{P}}\in\mathbb{R}^{n\times n}$ under the constraint that $\hat{\mathbf{P}}$ fulfills detailed balance. Mathematically, $$\begin{aligned} \hat{\mathbf{P}} & = & \arg\max_{\mathbf{P}}p(\mathbf{C}\mid\mathbf{P})\\ \mathrm{s.t.} & & \pi_{i}p_{ij}=\pi_{j}p_{ji}\:\:\:\:\:\forall i,j\end{aligned}$$ where $\pi_{i}$ is the stationary distribution of $\hat{\mathbf{P}}$. In contrast to the maximum likelihood problem without the detailed balance constraint, this problem has no closed-form solution. Here, two iterative optimization algorithms that solve it are reviewed.


Consider that we have observed a trajectory $[s_{1},...,s_{T}]$ amongst $n$ discrete states labeled $1$ through $n$. The numbers of transitions between pairs of states $i$ and $j$ are collected in the count matrix $\mathbf{C}^{\mathrm{obs}}=[c_{ij}^{\mathrm{obs}}]$, i.e.: $$c_{ij}^{\mathrm{obs}}=\left|\{t\mid s_{t}=i\wedge s_{t+1}=j,\: t=1,...,T-1\}\right| \:\:\:\:(1)$$ and the total number of transitions out of state $i$ is $c_{i}^{\mathrm{obs}}$: $$c_{i}^{\mathrm{obs}}=\sum_{j=1}^{n}c_{ij}^{\mathrm{obs}}. \:\:\:\:(2)$$ The likelihood of a Markov chain with transition matrix $\mathbf{P}=[p_{ij}]\in\mathbb{R}^{n\times n}$ is given by: $$p(\mathbf{C}^{\mathrm{obs}}|\mathbf{P})=\prod_{i,j=1}^{n}p_{ij}^{c_{ij}^{\mathrm{obs}}}. \:\:\:\:(3)$$ Our aim is to estimate the transition matrix that maximizes (3). In order to generalize our approach, we also consider the maximization of a posterior probability which uses a conjugate prior, i.e. a prior with the same functional form as the likelihood: $$p(\mathbf{P}|\mathbf{C}^{\mathrm{obs}})\propto\prod_{i,j=1}^{n}p_{ij}^{c_{ij}^{\mathrm{prior}}+c_{ij}^{\mathrm{obs}}}=\prod_{i,j=1}^{n}p_{ij}^{c_{ij}}, \:\:\:\:(4)$$ where $c_{ij}^{\mathrm{prior}}$ are prior counts and $c_{ij}=c_{ij}^{\mathrm{prior}}+c_{ij}^{\mathrm{obs}}$. We now seek the matrix that maximizes (4) under the additional detailed balance constraints $$\pi_{i}p_{ij}=\pi_{j}p_{ji}. \:\:\:\:(5)$$ For the choice $c_{ij}^{\mathrm{prior}}\equiv0$ (uniform prior), the maximum probability estimator is identical to the maximum likelihood estimator.
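As a concrete illustration, here is a minimal sketch (assuming NumPy; the integer trajectory array `traj` and the state count `n` are hypothetical inputs) that counts transitions with a sliding window at lag time $\tau$ (Eq. (1) corresponds to $\tau=1$) and evaluates the logarithm of the likelihood (3):

```python
import numpy as np

def count_matrix(traj, n, tau=1):
    """Sliding-window transition counts c_ij at lag time tau, cf. Eq. (1)."""
    C = np.zeros((n, n))
    for i, j in zip(traj[:-tau], traj[tau:]):
        C[i, j] += 1.0
    return C

def log_likelihood(C, P):
    """Logarithm of Eq. (3); entries with c_ij = 0 contribute nothing."""
    mask = C > 0
    return np.sum(C[mask] * np.log(P[mask]))
```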

Note that the nonreversible maximum likelihood estimator $\hat{p}_{ij}=c_{ij}/\sum_{k=1}^{n}c_{ik}$ does not usually fulfill detailed balance $(\pi_{i}\hat{p}_{ij}\neq\pi_{j}\hat{p}_{ji})$, even if the underlying dynamics is in equilibrium and thus $\pi_{i}p_{ij}=\pi_{j}p_{ji}$ holds for the exact transition matrix $\mathbf{P}(\tau)$. In many cases it is desirable and advantageous to estimate a transition matrix that does fulfill detailed balance. If the underlying microscopic dynamics are reversible, detailed balance of the discrete transition matrix $\mathbf{P}$ follows as a consequence, and it is therefore meaningful to impose this structure onto $\hat{\mathbf{P}}$ (see lecture_microdynamics). Two practical implications are: (1) the additional constraints reduce the number of free parameters in $\hat{\mathbf{P}}$, which can result in a significant reduction of statistical uncertainty [2]; (2) the eigenvalues and eigenvectors of a reversible transition matrix are real (see basics_markov_chains).
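The violation is easy to check numerically. The following sketch (reusing the NumPy setup above; the $3\times3$ count matrix is the one from Fig. 1a) computes the nonreversible maximum likelihood estimator and tests whether the flux matrix $[\pi_{i}\hat{p}_{ij}]$ is symmetric:

```python
def stationary_distribution(P):
    """Left eigenvector of P for the eigenvalue closest to 1,
    normalized to a probability distribution."""
    w, V = np.linalg.eig(P.T)
    pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

C = np.array([[5., 2., 0.],
              [1., 1., 1.],
              [2., 5., 20.]])
P_nonrev = C / C.sum(axis=1, keepdims=True)   # nonreversible MLE p_ij = c_ij / c_i
pi = stationary_distribution(P_nonrev)
flux = pi[:, None] * P_nonrev                 # flux matrix pi_i * p_ij
print(np.allclose(flux, flux.T))              # False: detailed balance is violated
```

Here the violation is obvious already from the sparsity pattern: $c_{13}=0$ forces $\hat{p}_{13}=0$, while $\hat{p}_{31}>0$.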

There is no known closed-form solution for the maximum probability estimator under the detailed balance constraint. Two iterative methods are presented below.

Let $x_{ij}=\pi_{i}p_{ij}$ be the unconditional transition probability to observe a transition $i\rightarrow j$. These variables fulfill the constraint $\sum_{i,j}x_{ij}=1$, and the detailed balance condition is given by $x_{ij}=x_{ji}$. It is hence sufficient to store the $x_{ij}$ with $i\le j$ in order to construct a reversible transition matrix. A reversible transition matrix $\mathbf{P}=\left[p_{ij}\right]$ can be expressed as $$p_{ij}=\frac{x_{ij}}{x_{i}}=\frac{x_{ij}}{\pi_{i}} \:\:\:\:(6)$$ where $\mathbf{X}=\left[x_{ij}\right]$ is a nonnegative symmetric matrix and $x_{i}=\sum_{j}x_{ij}=\pi_{i}$. Then the log-likelihood of $\mathbf{X}$ with count matrix $\mathbf{C}=\left[c_{ij}\right]$ is $$Q=\log p\left(\mathbf{C}|\mathbf{X}\right)=\sum_{i}c_{ii}\log\frac{x_{ii}}{x_{i}}+\sum_{i<j}\left(c_{ij}\log\frac{x_{ij}}{x_{i}}+c_{ji}\log\frac{x_{ij}}{x_{j}}\right), \:\:\:\:(7)$$ and we seek $$\hat{\mathbf{X}}=\arg\max_{\mathbf{X}}Q. \:\:\:\:(8)$$ We first derive the optimality conditions as in [1]. The partial derivatives of $Q$ are given by:

$$\begin{aligned} \frac{\partial Q}{\partial x_{ij}} & = & \frac{c_{ij}}{x_{ij}}+\frac{c_{ji}}{x_{ji}}-\sum_{j'=1}^{n}\frac{c_{ij'}}{\sum_{k=1}^{n}x_{ik}}-\sum_{j'=1}^{n}\frac{c_{jj'}}{\sum_{k=1}^{n}x_{jk}}\\ & = & \frac{c_{ji}+c_{ij}}{x_{ij}}-\frac{c_{i}}{x_{i}}-\frac{c_{j}}{x_{j}} \:\:\:\:(9)\end{aligned}$$

where we have used $c_{i}=\sum_{k=1}^{n}c_{ik}$. $Q$ is maximized at $\frac{\partial Q}{\partial x_{ij}}=0$, yielding the optimality conditions: $$\frac{c_{ji}+c_{ij}}{x_{ij}}-\frac{c_{i}}{x_{i}}-\frac{c_{j}}{x_{j}}=0\:\:\:\:\:\:\:\forall i,j. \:\:\:\:(10)$$
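In practice, Eq. (10) also provides a convergence check for the iterative methods presented below. A possible sketch (a hypothetical helper, using NumPy as above) evaluates the largest residual of the optimality conditions over all relevant element pairs:

```python
def optimality_residual(C, X):
    """Largest absolute violation of Eq. (10), evaluated over all pairs
    (i, j) with c_ij + c_ji > 0 (elsewhere x_ij = 0 is optimal)."""
    x = X.sum(axis=1)          # x_i
    c = C.sum(axis=1)          # c_i
    S = C + C.T                # c_ij + c_ji
    mask = S > 0
    R = S[mask] / X[mask] - ((c / x)[:, None] + (c / x)[None, :])[mask]
    return np.max(np.abs(R))
```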

Direct fixed-point iteration:

In [1] the following approach for finding $\hat{\mathbf{X}}$ has been suggested. Solving Eq. (10) for $x_{ij}$, we can write the optimality conditions as the following fixed-point iteration: $$x_{ij}^{(k+1)}=\frac{c_{ij}+c_{ji}}{\frac{c_{i}}{x_{i}^{(k)}}+\frac{c_{j}}{x_{j}^{(k)}}}, \:\:\:\:(11)$$ where $k$ is the iteration number. This approach iteratively updates all elements of $\mathbf{X}$ until the log-likelihood converges. The transition matrix $\hat{\mathbf{P}}$ can then be calculated from Eq. (6).
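A compact sketch of this iteration (assuming NumPy, a count matrix in which every state has outgoing counts, $c_{i}>0$, and a fixed number of sweeps in place of a likelihood-based stopping criterion) could look as follows:

```python
def reversible_mle_direct(C, n_iter=10000):
    """Direct fixed-point iteration, Eq. (11), on the full X matrix."""
    C = np.asarray(C, dtype=float)
    c = C.sum(axis=1)                    # c_i
    S = C + C.T                          # c_ij + c_ji
    X = S / S.sum()                      # symmetric, normalized initial guess
    for _ in range(n_iter):
        x = X.sum(axis=1)                # x_i
        denom = (c / x)[:, None] + (c / x)[None, :]
        X = S / denom                    # Eq. (11); X stays symmetric
        X /= X.sum()                     # restore sum_ij x_ij = 1
    return X / X.sum(axis=1)[:, None]    # transition matrix via Eq. (6)
```

Since the likelihood (7) depends only on the ratios $x_{ij}/x_{i}$, the normalization of $\mathbf{X}$ in each iteration is cosmetic and could equally be applied once at the end.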

We can get a more memory-efficient algorithm by summing both sides over $j$ and using $\sum_{j}x_{ij}=x_{i}$ and $x_{i}=\pi_{i}$ [5]: $$\pi_{i}^{(k+1)}=\sum_{j}\frac{c_{ij}+c_{ji}}{\frac{c_{i}}{\pi_{i}^{(k)}}+\frac{c_{j}}{\pi_{j}^{(k)}}}. \:\:\:\:(12)$$ In this approach, only the $n$ variables of the $\boldsymbol{\pi}$-vector are varied and need to be stored. After convergence, the reversible transition matrix estimate is obtained in one step from the final estimates $\pi_{i}=\pi_{i}^{(k)}$: $$\hat{p}_{ij}=\frac{c_{ij}+c_{ji}}{c_{i}+\frac{\pi_{i}}{\pi_{j}}c_{j}}. \:\:\:\:(13)$$
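A sketch of this memory-efficient variant (same assumptions as above; only the $\boldsymbol{\pi}$-vector is iterated):

```python
def reversible_mle_pi(C, n_iter=10000):
    """pi-vector fixed-point iteration of Eq. (12), followed by the
    one-step transition matrix assembly of Eq. (13)."""
    C = np.asarray(C, dtype=float)
    c = C.sum(axis=1)                                 # c_i
    S = C + C.T                                       # c_ij + c_ji
    pi = c / c.sum()                                  # initial guess
    for _ in range(n_iter):
        denom = (c / pi)[:, None] + (c / pi)[None, :]
        pi = (S / denom).sum(axis=1)                  # Eq. (12)
        pi /= pi.sum()
    P = S / (c[:, None] + (pi[:, None] / pi[None, :]) * c[None, :])  # Eq. (13)
    return P, pi
```

At the exact fixed point of Eq. (12) the rows of Eq. (13) sum to one; after a finite number of iterations, a final row normalization of `P` may be applied.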

Quadratic optimizer:

The optimization approach derived in [4] performs element-wise optimization steps, each of which maximizes the likelihood with respect to the element being updated while keeping all others fixed. Such a conditional optimization can be done by solving a one-dimensional quadratic optimization problem for each element of the $\mathbf{X}$-matrix. We distinguish between diagonal and off-diagonal entries:

  1. $i=j$ $$\begin{aligned} \arg\max_{x_{ii}}\log p\left(\mathbf{C}|\mathbf{X}\right) & = & \arg\max_{x_{ii}}c_{ii}\log\frac{x_{ii}}{x_{ii}+x_{i,-i}}+\sum_{k\neq i}c_{ik}\log\frac{x_{ik}}{x_{ii}+x_{i,-i}}\nonumber \\ & = & \arg\max_{x_{ii}}c_{ii}\log x_{ii}-c_{i}\log\left(x_{ii}+x_{i,-i}\right) \:\:\:\:(14)\end{aligned}$$ where $x_{i,-j}=\sum_{k\neq j}x_{ik}$. Then the solution of the above equation is $$x_{ii}=\frac{c_{ii}x_{i,-i}}{c_{i}-c_{ii}} \:\:\:\:(15)$$

  2. $i<j$ $$\begin{aligned} \arg\max_{x_{ij}}\log p\left(\mathbf{C}|\mathbf{X}\right) & = & \arg\max_{x_{ij}}c_{ij}\log\frac{x_{ij}}{x_{ij}+x_{i,-j}}+c_{ji}\log\frac{x_{ij}}{x_{ij}+x_{j,-i}}+\sum_{k\neq j}c_{ik}\log\frac{x_{ik}}{x_{ij}+x_{i,-j}}+\sum_{k\neq i}c_{jk}\log\frac{x_{jk}}{x_{ij}+x_{j,-i}}\nonumber \\ & = & \arg\max_{x_{ij}}\left(c_{ij}+c_{ji}\right)\log x_{ij}-c_{i}\log\left(x_{ij}+x_{i,-j}\right)-c_{j}\log\left(x_{ij}+x_{j,-i}\right) \:\:\:\:(16)\end{aligned}$$ Setting the derivative with respect to $x_{ij}$ to zero yields $$\frac{c_{ij}+c_{ji}}{x_{ij}}-\frac{c_{i}}{x_{ij}+x_{i,-j}}-\frac{c_{j}}{x_{ij}+x_{j,-i}}=0, \:\:\:\:(17)$$ and multiplying by $-x_{ij}\left(x_{ij}+x_{i,-j}\right)\left(x_{ij}+x_{j,-i}\right)$ gives the quadratic equation $$ax_{ij}^{2}+bx_{ij}+c=0 \:\:\:\:(18)$$ with coefficients $$\begin{aligned} a & = & c_{i}-c_{ij}+c_{j}-c_{ji}\nonumber \\ b & = & c_{i}x_{j,-i}+c_{j}x_{i,-j}-\left(c_{ij}+c_{ji}\right)\left(x_{i,-j}+x_{j,-i}\right)\nonumber \\ c & = & -\left(c_{ij}+c_{ji}\right)x_{i,-j}x_{j,-i} \:\:\:\:(19)\end{aligned}$$ and solutions $$x_{ij}=\frac{-b\pm\sqrt{b^{2}-4ac}}{2a}. \:\:\:\:(20)$$ One can verify that $$\frac{-b+\sqrt{b^{2}-4ac}}{2a}>0\:\:\:\:\mathrm{and}\:\:\:\:\frac{-b-\sqrt{b^{2}-4ac}}{2a}<0 \:\:\:\:(21)$$ for $a>0$ and $ac<0$, such that $$x_{ij}=\frac{-b+\sqrt{b^{2}-4ac}}{2a} \:\:\:\:(22)$$ is the desired maximum.

The maximum probability estimator is then obtained by the following iterative algorithm (see the Supplementary Information of [4] for the proof of correctness), which is iterated until some stopping criterion is met (e.g. the change of the $x_{ij}$ in one iteration, $\max_{i,j}|x_{ij}^{(k+1)}-x_{ij}^{(k)}|$, is smaller than a given constant, or the number of iterations exceeds a pre-defined threshold):

Algorithm 1: Maximum probability estimator of reversible transition matrices
(1) For all $i,j=1,...,n$: initialize $$\begin{aligned} x_{ij}=x_{ji} & := & c_{ij}+c_{ji}\\ x_{i} & := & \sum_{j}x_{ij} \:\:\:\:(24)\end{aligned}$$

(2) Repeat until the stopping criterion is met:

(2.1) For all $i=1,...,n$:

$$\begin{aligned} \mathrm{update}\: x_{ii} & := & \frac{c_{ii}(x_{i}-x_{ii})}{c_{i}-c_{ii}}\\ \mathrm{update}\: x_{i} & := & \sum_{j}x_{ij} \:\:\:\:(26)\end{aligned}$$

(2.2) For all $i=1,...,n-1,\: j=i+1,...,n$: $$\begin{aligned} a & = & c_{i}-c_{ij}+c_{j}-c_{ji}\\ b & = & c_{i}(x_{j}-x_{ij})+c_{j}(x_{i}-x_{ij})-(c_{ij}+c_{ji})(x_{i}+x_{j}-2x_{ij})\\ c & = & -(c_{ij}+c_{ji})(x_{i}-x_{ij})(x_{j}-x_{ij})\\ \mathrm{update}\: x_{ij}=x_{ji} & := & \frac{-b+\sqrt{b^{2}-4ac}}{2a}\\ \mathrm{update}\: x_{i} & := & \sum_{k}x_{ik},\:\:\: x_{j}:=\sum_{k}x_{jk} \:\:\:\:(28)\end{aligned}$$

(3) Update $p_{ij}$, $i,j=1,...,n$: $$p_{ij}:=\frac{x_{ij}}{x_{i}} \:\:\:\:(30)$$
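The following sketch implements Algorithm 1 directly (NumPy as above; explicit loops for clarity rather than speed, a fixed number of sweeps as the stopping criterion, and a connected count matrix so that the divisors and $a$ remain positive):

```python
def reversible_mle_quadratic(C, n_sweeps=100):
    """Element-wise quadratic optimizer, Algorithm 1."""
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    c = C.sum(axis=1)                        # c_i
    X = C + C.T                              # step (1): x_ij := c_ij + c_ji
    x = X.sum(axis=1)                        # x_i
    for _ in range(n_sweeps):                # step (2)
        for i in range(n):                   # step (2.1): diagonal elements
            if C[i, i] > 0.0 and c[i] > C[i, i]:
                xnew = C[i, i] * (x[i] - X[i, i]) / (c[i] - C[i, i])
                x[i] += xnew - X[i, i]       # maintain x_i = sum_k x_ik
                X[i, i] = xnew
        for i in range(n - 1):               # step (2.2): off-diagonal elements
            for j in range(i + 1, n):
                s = C[i, j] + C[j, i]
                if s == 0.0:
                    continue                 # the optimum has x_ij = 0 here
                a = c[i] - C[i, j] + c[j] - C[j, i]
                b = (c[i] * (x[j] - X[i, j]) + c[j] * (x[i] - X[i, j])
                     - s * (x[i] + x[j] - 2.0 * X[i, j]))
                cc = -s * (x[i] - X[i, j]) * (x[j] - X[i, j])
                xnew = (-b + np.sqrt(b * b - 4.0 * a * cc)) / (2.0 * a)
                x[i] += xnew - X[i, j]       # maintain row sums of i and j
                x[j] += xnew - X[i, j]
                X[i, j] = X[j, i] = xnew
    return X / x[:, None]                    # step (3): p_ij = x_ij / x_i
```

The stationary distribution estimate is recovered from the final row sums as $\pi_{i}=x_{i}/\sum_{k}x_{k}$.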

Fig. 1 compares the two estimators using a small and a large count matrix as examples. Asymptotically, the quadratic optimizer converges in approximately a factor of 5 fewer iterations than the direct method, while using slightly more CPU time per iteration.

Note that the optimum sought by Eq. (8) exhibits $\hat{p}_{ij}=0$ whenever $c_{ij}+c_{ji}=0$. Thus, in both optimization algorithms, the sparsity structure of the matrix $\mathbf{C}+\mathbf{C}^{T}$ can be exploited in order to restrict all iterations to those elements that can result in a nonzero element $\hat{p}_{ij}>0$.
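A minimal way to precompute this index set (NumPy as above):

```python
# Off-diagonal pairs (i, j) with i < j and c_ij + c_ji > 0: only these
# can carry x_ij > 0 at the optimum, so both algorithms may restrict
# their element sweeps to this list.
rows, cols = np.nonzero(np.triu(C + C.T, k=1))
pairs = list(zip(rows, cols))
```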

Figure 1: Performance comparison of the direct and the quadratic optimizer for reversible transition matrix estimation. Shown is the difference between the current likelihood and the optimal likelihood. (a) Results for the count matrix $C=\left[\begin{array}{ccc} 5 & 2 & 0\\ 1 & 1 & 1\\ 2 & 5 & 20 \end{array}\right]$. (b) Results for the $1734\times1734$ count matrix from PinWW folding simulations used in Ref. [3].

Acknowledgements:

Part of this article has been published in [4].

Citing maximum likelihood reversible transition matrix estimation:

The fixed-point reversible estimator was first introduced in [1]. The quadratic optimization method was suggested in [4].

Bibliography

[1]: G. R. Bowman, K. A. Beauchamp, G. Boxer and V. S. Pande: Progress and challenges in the automated construction of Markov state models for full protein systems. J. Chem. Phys. 131, 124101 (2009).

[2]: F. Noé: Probability Distributions of Molecular Observables computed from Markov Models. J. Chem. Phys. 128, 244103 (2008).

[3]: F. Noé, C. Schütte, E. Vanden-Eijnden, L. Reich and T.R. Weikl: Constructing the Full Ensemble of Folding Pathways from Short Off-Equilibrium Simulations. Proc. Natl. Acad. Sci. USA 106, 19011-19016 (2009).

[4]: J.-H. Prinz, H. Wu, M. Sarich, B. Keller, M. Senne, M. Held, J. D. Chodera, C. Schütte and F. Noé: Markov models of molecular kinetics: Generation and Validation. J. Chem. Phys. 134, 174105 (2011).

[5]: B. Trendelkamp-Schroer, H. Wu, F. Paul and F. Noé: Estimation and uncertainty of reversible Markov models. In preparation.