Molecular Dynamics at Millisecond Scale
Breaking the time-scale barrier in computational biology with ArcOne-accelerated simulation.
Molecular dynamics (MD) simulation is the workhorse of computational biophysics. By numerically integrating Newton's equations of motion for every atom in a molecular system, MD reveals the physical basis of biological processes: how proteins fold, how drugs bind, how membranes reshape. But the method carries a brutal timescale limitation. Accurate integration of atomic forces requires femtosecond (10^-15 second) timesteps, because the fastest motions in the system, bond vibrations, occur on roughly that timescale. At a 1 fs timestep, a microsecond of simulated time therefore requires one billion integration steps. The biological processes we most need to understand (protein conformational changes, drug binding and unbinding, allosteric signaling) occur on timescales of microseconds to milliseconds and beyond. Simulating one millisecond of a modest protein system (50,000 atoms) on a modern GPU cluster requires roughly three months of continuous wall-clock computation. D.E. Shaw Research's purpose-built Anton supercomputer pushed this boundary heroically, but even Anton 3 cannot routinely access millisecond timescales for the large, complex systems that dominate modern drug discovery pipelines. The result is a field where the simulations we can afford to run are often too short to observe the events we need to study.
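The step counts behind this barrier are easy to check. The short sketch below works in integer femtoseconds so the counts are exact; the function name and constants are ours, introduced only for illustration. The 2 fs case is included because that timestep is a common choice when bonds to hydrogen are constrained.

```python
# Units are femtoseconds, kept as integers so the counts are exact.
FS_PER_US = 10**9    # femtoseconds in one microsecond
FS_PER_MS = 10**12   # femtoseconds in one millisecond


def integration_steps(simulated_fs: int, timestep_fs: int = 1) -> int:
    """Number of integration steps needed to cover simulated_fs of simulated time."""
    if timestep_fs <= 0:
        raise ValueError("timestep_fs must be positive")
    # Ceiling division: a partial final step still costs one full step.
    return -(-simulated_fs // timestep_fs)


print(integration_steps(FS_PER_US))     # 1000000000 (one billion steps per microsecond)
print(integration_steps(FS_PER_MS))     # 1000000000000 (one trillion steps per millisecond)
print(integration_steps(FS_PER_MS, 2))  # 500000000000 (a 2 fs timestep only halves the count)
```

Doubling the timestep buys a factor of two; reaching milliseconds requires a factor of a thousand beyond the microsecond regime, which is why a different strategy is needed.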
ArcOne addresses this timescale barrier not by making individual timesteps cheaper, but by learning which timesteps can be skipped without sacrificing the accuracy of the dynamical trajectory. Our approach, which we call Learned Adaptive Temporal Coarse-Graining (LATCG), trains a neural model to predict the system's state at time t+delta from its state at time t, where delta is variable and can span thousands of conventional timesteps. The key insight is that molecular systems spend most of their time in local energy minima, executing small thermal fluctuations that are statistically redundant — they contribute to equilibrium averages but do not drive the rare, large-scale conformational transitions that determine biological function. LATCG identifies these quiescent periods in real time and fast-forwards through them, reverting to fine-grained integration only when the system approaches a transition state or enters a dynamically interesting region. The model is not a black-box surrogate that replaces the physics; it is a learned controller that decides when the physics needs to be computed at full resolution and when a coarse summary suffices.
The technical implementation couples a graph neural network (GNN) to the MD engine at each macro-step. The GNN takes as input the current atomic positions, velocities, and a local structural context (residue-level secondary structure, solvent accessibility, contact maps) and outputs two predictions: the system state after the proposed skip interval, and a confidence score reflecting the expected accuracy of that skip. If the confidence falls below a calibrated threshold — indicating proximity to a conformational transition or an under-sampled region of phase space — the controller reduces the skip interval, potentially down to single-timestep resolution. This adaptive scheme preserves detailed balance and produces trajectories whose equilibrium distributions match those of brute-force MD to within statistical error, as validated by comparing radial distribution functions, order parameters, and free energy surfaces on benchmark systems including alanine dipeptide, the villin headpiece, and BPTI.
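The control loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the LATCG implementation: the halve-on-low-confidence and double-on-success policy, the threshold of 0.9, the maximum skip of 1024 steps, and every function name are hypothetical. The "state" is reduced to a step counter, and a stand-in predictor reports low confidence for any skip that overlaps an arbitrary transition window.

```python
from typing import Callable, List, Tuple

# All names and constants here are hypothetical, chosen for illustration.
Predictor = Callable[[int, int], Tuple[int, float]]  # (state, skip) -> (new_state, confidence)
Integrator = Callable[[int], int]                    # one fine-grained timestep


def run_adaptive(state: int, total_steps: int, predict: Predictor,
                 integrate: Integrator, threshold: float = 0.9,
                 max_skip: int = 1024) -> Tuple[int, List[Tuple[int, int]]]:
    """Advance state by total_steps fine timesteps.

    Returns the final state and a log of (start_step, skip_length) pairs.
    """
    t = 0
    skip = max_skip
    log: List[Tuple[int, int]] = []
    while t < total_steps:
        skip = min(skip, total_steps - t)     # never overshoot the end
        if skip > 1:
            proposal, confidence = predict(state, skip)
            if confidence < threshold:
                skip = max(1, skip // 2)      # assumed policy: halve and retry
                continue
            state = proposal                  # accept the learned macro-step
        else:
            state = integrate(state)          # full-resolution physics
        log.append((t, skip))
        t += skip
        skip = min(max_skip, skip * 2)        # assumed policy: grow back after success
    return state, log


# Toy system: the state is just a step counter, and the stand-in model
# is unsure of any skip that overlaps a "transition window" of steps.
WINDOW = (500, 520)


def toy_predict(state: int, skip: int) -> Tuple[int, float]:
    overlaps = state < WINDOW[1] and state + skip > WINDOW[0]
    return state + skip, (0.5 if overlaps else 0.99)


def toy_integrate(state: int) -> int:
    return state + 1


final_state, log = run_adaptive(0, 2000, toy_predict, toy_integrate)
print(final_state, len(log))
```

In this toy run every step inside the window is taken at single-timestep resolution, while the stretches on either side are covered by a handful of long skips. The sketch says nothing about detailed balance or accuracy; it only shows the shape of the accept, shrink, and grow logic.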
The speedup is system-dependent but consistently substantial. For the villin headpiece (580 atoms), LATCG achieves an effective 320x speedup over conventional MD, enabling millisecond-equivalent sampling in under seven hours on a single A100 GPU. For the beta-2 adrenergic receptor in an explicit lipid bilayer (140,000 atoms), a system directly relevant to drug discovery, the speedup is 85x, reflecting the higher proportion of dynamically complex behavior in a large membrane protein system. Crucially, the speedup increases as the biological timescale of interest lengthens, because longer simulations contain proportionally more quiescent time that can be compressed. We validated LATCG's predictions against experimental observables: NMR relaxation rates for ubiquitin (RMSD to experiment: 8.2%, versus 7.5% for brute-force MD of equivalent effective duration), and stopped-flow kinetics for barnase-barstar binding (predicted association rate constant k_on within 2.5-fold of experiment, compared to 4-fold for unaccelerated MD at attainable timescales).
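A back-of-the-envelope model helps relate these speedups to the amount of quiescent time. The sketch below is our own Amdahl-style simplification, not a description of how the reported figures were measured: it assumes a fraction of simulated time is covered by skips of fixed length, that each skip costs about as much as one conventional step (the skip_cost parameter, an assumption), and that the remainder runs at full resolution. All function names are hypothetical.

```python
def effective_speedup(quiescent_fraction: float, skip_length: int,
                      skip_cost: float = 1.0) -> float:
    """Amdahl-style speedup estimate.

    quiescent_fraction: share of simulated time covered by skips (0 to 1).
    skip_length: conventional timesteps replaced by one skip.
    skip_cost: cost of one skip in units of one conventional step (assumed).
    """
    if not 0.0 <= quiescent_fraction <= 1.0:
        raise ValueError("quiescent_fraction must lie in [0, 1]")
    if skip_length < 1 or skip_cost <= 0:
        raise ValueError("skip_length must be >= 1 and skip_cost positive")
    # Cost per conventional step's worth of simulated time.
    cost = (1.0 - quiescent_fraction) + quiescent_fraction * skip_cost / skip_length
    return 1.0 / cost


def min_quiescent_fraction(target_speedup: float) -> float:
    """Smallest skippable fraction that could reach target_speedup,
    even if skips were free and arbitrarily long."""
    return 1.0 - 1.0 / target_speedup


def implied_baseline_us_per_day(simulated_ms: float, wall_hours: float,
                                speedup: float) -> float:
    """Conventional-MD throughput implied by an accelerated run."""
    conventional_us = simulated_ms * 1000.0 / speedup
    return conventional_us * 24.0 / wall_hours


print(f"{min_quiescent_fraction(320):.2%}")
print(f"{min_quiescent_fraction(85):.2%}")
print(f"{implied_baseline_us_per_day(1.0, 7.0, 320):.1f} us/day")
print(f"{effective_speedup(0.99, 1000):.1f}x")
```

Under this simplified model, a 320x speedup is only possible if roughly 99.7% of simulated time is skippable, and an 85x speedup needs about 98.8%. The villin figures (one millisecond in under seven hours at 320x) imply a conventional throughput of at least about 10.7 microseconds per day. The model also makes the timescale argument concrete: the speedup is capped at 1 / (1 - quiescent_fraction), so it grows only as the quiescent share grows.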
The practical implications for drug discovery are immediate. Drug binding kinetics, the rates at which a compound associates with and dissociates from its target, are increasingly recognized as better predictors of in vivo efficacy than equilibrium binding affinity alone. But computing binding kinetics requires simulating the full binding and unbinding process, which occurs on timescales of microseconds to seconds. LATCG brings this within computational reach. In a blinded retrospective study with a pharmaceutical partner, we computed residence times for 23 kinase inhibitors and achieved a Spearman correlation of 0.74 with experimental measurements, a level of rank agreement strong enough to prioritize compounds for lead optimization. We are extending LATCG to work with enhanced sampling methods (metadynamics, replica exchange) and to integrate with our quantum simulation pipeline for systems where electronic structure effects on dynamics are critical, such as metalloenzyme catalysis and photoreceptor activation.
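For readers unfamiliar with the metric, Spearman correlation compares the rank order of two lists rather than their raw values, which suits compound prioritization. The sketch below computes it in plain Python. The residence times are made-up placeholder numbers, not data from the study, and the function names are ours.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical residence times in seconds (placeholder values only).
predicted = [12.0, 3.5, 40.0, 0.8, 7.2]
measured = [10.0, 5.0, 55.0, 1.0, 4.0]
print(round(spearman(predicted, measured), 3))
```

In this placeholder set, two neighboring compounds swap places in the ranking and the coefficient is 0.9; a perfect ranking would give 1.0. In practice, scipy.stats.spearmanr computes the same statistic and also reports a p-value.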