### Project Summary

Molecular dynamics (MD) simulation is a technique that aids in the understanding of fundamental processes in biology and chemistry, and has important technological applications in pharmacy, biotechnology, and nanotechnology. Many complex molecular processes have cascades of timescales spanning the range from 10⁻¹⁵ s to 1 s, often with no pronounced gap that would permit efficient coarse-grained time integration.

In many applications, the slowest timescales and the associated structural rearrangements are the ones of interest. In this project, we study the challenging process of induced folding of peptides. The metastable states associated with the slowest timescale of an induced folding problem are shown in Fig. A04-1. Such states may be found when peptide ligands bind to proteins and when membrane-associated proteins anchor into the membrane. Here, association and conformational changes occur on physical timescales of nanoseconds to milliseconds, while dissociation events may require seconds or longer.

As a root model, we choose classical molecular dynamics with atomistic resolution and explicit solvent. Thus, the simulation system consists of a box, typically containing 10,000 to 100,000 classical particles, representing solvent, ions, and the solvated protein. The system evolves by a time-stepping scheme that approximates the solution of the classical equations of motion. Additionally, the time-stepping scheme usually contains a stochastic term that models the coupling of the molecular system to a heat bath, and thus ensures the desired thermodynamic ensemble (e.g., canonical). In this setting, molecular dynamics is a Markov process in a high-dimensional state space. The dominant timescales and their associated structure changes between metastable (long-lived) states are given by the eigenvalues and eigenfunctions of the transfer operator of the Markov process. These dominant eigenvalues and eigenfunctions therefore need to be approximated.
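To make the time-stepping picture concrete, the following is a minimal sketch of a stochastic integrator of the kind described above. It is not the scheme used in the project: it assumes a single particle in a one-dimensional double-well potential V(x) = (x² − 1)², a simple Euler-type discretization of Langevin dynamics, and reduced units. The function names and all parameter values (`dt`, `gamma`, `kT`, `mass`) are illustrative assumptions.

```python
import numpy as np


def double_well_force(x):
    """Force -dV/dx for the toy potential V(x) = (x**2 - 1)**2 (an assumption)."""
    return -4.0 * x * (x**2 - 1.0)


def simulate_langevin(n_steps, dt=0.01, gamma=1.0, kT=1.0, mass=1.0,
                      x0=-1.0, v0=0.0, seed=0):
    """Integrate Langevin dynamics with a simple Euler-type scheme.

    Each step applies the deterministic force, a friction term, and a
    random kick whose variance is chosen so that friction and noise
    together model a heat bath at temperature kT.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    v = v0
    noise_scale = np.sqrt(2.0 * gamma * kT * dt / mass)
    for t in range(n_steps):
        # The new (x, v) depends only on the current (x, v) and fresh noise,
        # which is what makes the discrete-time process Markovian.
        v += dt * (double_well_force(x[t]) / mass - gamma * v)
        v += noise_scale * rng.standard_normal()
        x[t + 1] = x[t] + dt * v
    return x


traj = simulate_langevin(n_steps=20_000)
```

In this toy system the two wells near x = −1 and x = +1 play the role of metastable states, and the rare hops between them correspond to the slowest timescale.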

The introduction of Markov state models (MSMs) to molecular simulation in the past few years has been a breakthrough in providing the ability to perform such an approximation. An MSM consists of a discretization of the molecular state space into sets, often found by geometric clustering of available simulation data, and a matrix of transition probabilities between them, estimated from the same simulation data. This amounts to estimating a set-based discretization of the transfer operator. Despite their success, the current algorithmic realization of MSMs for high-dimensional systems suffers from two fundamental problems:

1. Discretization Problem: When the initial discretization for the MSM, based on Euclidean distances in the data, is poor, the dominant transfer operator eigenvalues (and timescales) will be systematically underestimated, making the approach numerically unreliable. When the user is interested in approximating a sizable number (e.g., 10–100) of slow processes with high accuracy, the common practice of using data-driven geometric clustering methods may not be a viable approach.

2. Sampling Problem: MSMs contain only information about states that have been visited and transitions that have occurred in the simulation data. While the slowest events may occur on timescales of seconds, affordable simulation lengths are on the order of microseconds. Thus, MSM construction suffers from a severe sampling problem.
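The MSM construction described above can be sketched in a few lines. This is a minimal illustration, not the estimator used in the project: it assumes the trajectory has already been assigned to discrete states, uses plain row normalization of the transition counts (the non-reversible maximum-likelihood estimate), and converts eigenvalues λᵢ at lag time τ into implied timescales via tᵢ = −τ / ln|λᵢ|. The function names `estimate_msm` and `implied_timescales` are hypothetical.

```python
import numpy as np


def estimate_msm(dtraj, n_states, lag=1):
    """Estimate a transition matrix from a discrete trajectory.

    Counts transitions i -> j separated by `lag` steps and normalizes
    each row. States that never occur as a starting point have no
    outgoing counts, so no transition probabilities can be estimated.
    """
    dtraj = np.asarray(dtraj, dtype=int)
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("some states have no observed outgoing transitions")
    return counts / row_sums


def implied_timescales(T, lag=1):
    """Implied timescales -lag / ln|lambda_i| for all non-stationary eigenvalues."""
    eigvals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(eigvals[1:])


# Tiny hand-made trajectory over two states (illustrative data only).
dtraj = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0]
T = estimate_msm(dtraj, n_states=2)
its = implied_timescales(T)
# T is [[2/3, 1/3], [1/3, 2/3]], so the single implied timescale
# is 1 / ln(3), roughly 0.91 steps.
```

The `ValueError` branch is a small-scale view of the sampling problem: if the same data were analyzed with a third state that was never visited, the estimator would have nothing to say about it.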

Both problems are coupled. Building on recent theoretical results, we now set out to develop a concise numerical and algorithmic framework to address them.
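The systematic underestimation named in the discretization problem can be reproduced on a toy example. The sketch below assumes a hand-picked reversible four-state chain as the "true" dynamics and lumps it into two sets using stationary weights, which is the matrix a set-based estimator would approach with unlimited equilibrium data at this lag time. The matrix entries and the helper names are illustrative assumptions.

```python
import numpy as np


def stationary_distribution(T):
    """Left eigenvector of T for the eigenvalue closest to 1, normalized to sum 1."""
    w, v = np.linalg.eig(T.T)
    pi = np.real(v[:, np.argmax(np.real(w))])
    return pi / pi.sum()


def lump(T, pi, membership, n_coarse):
    """Coarse transition matrix P(J | I) under the stationary distribution."""
    n = T.shape[0]
    M = np.zeros((n, n_coarse))
    M[np.arange(n), membership] = 1.0
    flux = M.T @ (pi[:, None] * T) @ M      # sum of pi_i * T_ij over i in I, j in J
    return flux / flux.sum(axis=1, keepdims=True)


def slowest_timescale(T, lag=1):
    """Implied timescale of the second-largest eigenvalue."""
    eigvals = np.sort(np.real(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(eigvals[1])


# Toy reversible chain: two wells (states 0-1 and 2-3) joined by a rare transition.
T_fine = np.array([
    [0.90, 0.10, 0.00, 0.00],
    [0.10, 0.89, 0.01, 0.00],
    [0.00, 0.01, 0.89, 0.10],
    [0.00, 0.00, 0.10, 0.90],
])
pi = stationary_distribution(T_fine)
T_coarse = lump(T_fine, pi, membership=np.array([0, 0, 1, 1]), n_coarse=2)

t_fine = slowest_timescale(T_fine)      # about 104.7 steps
t_coarse = slowest_timescale(T_coarse)  # about 99.5 steps, i.e. underestimated
```

Even with perfect sampling, the two-set model reports a shorter slowest timescale than the underlying chain, because indicator functions on the two sets only approximate the true slow eigenvector.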

The long-term aims of this project are to develop efficient modeling and simulation methods for the dominant (slow) timescales of complex biomolecular simulation systems, and apply them to folding-binding problems in biomolecules.

In contrast to previous conformation-dynamics approaches such as Markov state modeling that are driven by a set-based approach, we attempt a paradigm shift and will focus on developing methods to approximate and sample individual timescales and eigenfunctions one by one.

### Project publications

Klus, S. and Nüske, F. and Koltai, P. and Wu, H. and Kevrekidis, I. and Schütte, Ch. and Noé, F. (2018) *Data-driven model reduction and transfer operator approximation.* Journal of Nonlinear Science, 28 (1). pp. 1-26.

Koltai, P. and Wu, H. and Noé, F. and Schütte, Ch. (2018) *Optimal data-driven estimation of generalized Markov state models for non-equilibrium dynamics.* Computing. (Submitted)

Paul, F. and Noé, F. and Weikl, T. (2018) *Identifying Conformational-Selection and Induced-Fit Aspects in the Binding-Induced Folding of PMI from Markov State Modeling of Atomistic Simulations.* J. Phys. Chem. B.

Paul, F. and Wehmeyer, C. and Abualrous, E. T. and Wu, H. and Crabtree, M. D. and Schöneberg, J. and Clarke, J. and Freund, C. and Weikl, T. and Noé, F. (2017) *Protein-peptide association kinetics beyond the seconds timescale from atomistic simulations.* Nat. Comm., 8 (1095).

Gerber, S. and Horenko, I. (2017) *Toward a direct and scalable identification of reduced models for categorical processes.* Proceedings of the National Academy of Sciences, 114 (19). pp. 4863-4868.

Nüske, F. and Wu, H. and Wehmeyer, C. and Clementi, C. and Noé, F. (2017) *Markov State Models from short non-Equilibrium Simulations - Analysis and Correction of Estimation Bias.* J. Chem. Phys., 146. 094104.

Olsson, S. and Wu, H. and Paul, F. and Clementi, C. and Noé, F. (2017) *Combining experimental and simulation data of molecular processes via augmented Markov models.* Proc. Natl. Acad. Sci. USA, 114. pp. 8265-8270.

Wu, H. and Noé, F. (2017) *Variational approach for learning Markov processes from time series data.* https://arxiv.org/abs/1707.04659

Wu, H. and Paul, F. and Wehmeyer, C. and Noé, F. (2016) *Multiensemble Markov models of molecular thermodynamics and kinetics.* Proceedings of the National Academy of Sciences, 113 (23). E3221-E3230. ISSN 0027-8424

Nüske, F. and Schneider, R. and Vitalini, F. and Noé, F. (2016) *Variational Tensor Approach for Approximating the Rare-Event Kinetics of Macromolecular Systems.* J. Chem. Phys., 144 (5). 054105.

Paul, F. and Weikl, T. (2016) *How to Distinguish Conformational Selection and Induced Fit Based on Chemical Relaxation Rates.* PLOS Computational Biology.

Vitalini, F. and Noé, F. and Keller, B. (2016) *Molecular dynamics simulations data of the twenty encoded amino acids in different force fields.* Data in Brief, 7. pp. 582-590.

Trendelkamp-Schroer, B. and Wu, H. and Paul, F. and Noé, F. (2015) *Estimation and uncertainty of reversible Markov models.* J. Chem. Phys., 143 (17). p. 174101.

Scherer, M. K. and Trendelkamp-Schroer, B. and Paul, F. and Pérez-Hernández, G. and Hoffmann, M. and Plattner, N. and Wehmeyer, C. and Prinz, J.-H. and Noé, F. (2015) *PyEMMA 2: A Software Package for Estimation, Validation, and Analysis of Markov Models.* J. Chem. Theory Comput., 11 (11). pp. 5525-5542.

Wu, H. and Prinz, J.-H. and Noé, F. (2015) *Projected Metastable Markov Processes and Their Estimation with Observable Operator Models.* J. Chem. Phys., 143 (14). p. 144101.

Wu, H. and Noé, F. (2015) *Gaussian Markov transition models of molecular kinetics.* J. Chem. Phys., 142 (8). 084104.

Wu, H. and Mey, A.S.J.S. and Rosta, E. and Noé, F. (2014) *Statistically optimal analysis of state-discretized trajectory data from multiple thermodynamic states.* J. Chem. Phys., 141 (21). p. 214106.

Mey, A.S.J.S. and Wu, H. and Noé, F. (2014) *xTRAM: Estimating equilibrium expectations from time-correlated simulation data at multiple thermodynamic states.* Phys. Rev. X, 4 (4). 041018.