Ensemble Averages from MD Simulations¶

This example shows how ensemble averages can be computed from MD simulations, such that their gradient with respect to force field parameters can be computed through backpropagation.

We start by parameterizing the set of molecules that will appear in our simulation boxes:

In [ ]:

Copied!





import openff.interchange
import openff.toolkit

import smee.converters

interchanges = [
    openff.interchange.Interchange.from_smirnoff(
        openff.toolkit.ForceField("openff-2.0.0.offxml"),
        openff.toolkit.Molecule.from_smiles(smiles).to_topology(),
    )
    for smiles in ("CCO", "CO")
]

tensor_ff, topologies = smee.converters.convert_interchange(interchanges)

# move the force field to the GPU for faster processing of the simulation
# trajectories - the system and force field must be on the same device.
tensor_ff = tensor_ff.to("cuda")
import openff.interchange
import openff.toolkit

import smee.converters

interchanges = [
    openff.interchange.Interchange.from_smirnoff(
        openff.toolkit.ForceField("openff-2.0.0.offxml"),
        openff.toolkit.Molecule.from_smiles(smiles).to_topology(),
    )
    for smiles in ("CCO", "CO")
]

tensor_ff, topologies = smee.converters.convert_interchange(interchanges)

# move the force field to the GPU for faster processing of the simulation
# trajectories - the system and force field must be on the same device.
tensor_ff = tensor_ff.to("cuda")

We will also flag that the vdW parameter gradients are required:

In [ ]:

Copied!

vdw_potential = tensor_ff.potentials_by_type["vdW"]
vdw_potential.parameters.requires_grad = True
vdw_potential = tensor_ff.potentials_by_type["vdW"]
vdw_potential.parameters.requires_grad = True

We then define the full simulation boxes that we wish to simulate:

In [ ]:

Copied!





import smee

# define a periodic box containing 216 ethanol molecules
system_ethanol = smee.TensorSystem([topologies[0]], [216], is_periodic=True)
system_ethanol = system_ethanol.to("cuda")
# define a periodic box containing 216 methanol molecules
system_methanol = smee.TensorSystem([topologies[1]], [216], True)
system_methanol = system_methanol.to("cuda")
# define a periodic box containing 128 ethanol molecules and 128 methanol molecules
system_mixture = smee.TensorSystem(topologies, [128, 128], True)
system_mixture = system_mixture.to("cuda")
import smee

# define a periodic box containing 216 ethanol molecules
system_ethanol = smee.TensorSystem([topologies[0]], [216], is_periodic=True)
system_ethanol = system_ethanol.to("cuda")
# define a periodic box containing 216 methanol molecules
system_methanol = smee.TensorSystem([topologies[1]], [216], True)
system_methanol = system_methanol.to("cuda")
# define a periodic box containing 128 ethanol molecules and 128 methanol molecules
system_mixture = smee.TensorSystem(topologies, [128, 128], True)
system_mixture = system_mixture.to("cuda")

A tensor system is simply a wrapper around a set of topology objects that define parameters applied to individual molecules, and the number of copies of that topology that should be present similar to GROMACS topologies. The is_periodic flag indicates whether the system should be simulated in a periodic box.

Here we have also moved the systems onto the GPU. This will allow us to much more rapidly compute ensemble averages from the trajectories, but is not required.

We then also must define the simulation protocol that will be used to run the simulations. This consists of a config object that defines how to generate the system coordinates using PACKMOL, the set of energy minimisations /simulations to run as equilibration, and finally the configuration of the production simulation:

In [ ]:

Copied!





import tempfile

import openmm.unit

import smee.mm

temperature = 298.15 * openmm.unit.kelvin
pressure = 1.0 * openmm.unit.atmosphere

beta = 1.0 / (openmm.unit.MOLAR_GAS_CONSTANT_R * temperature)

# we can run an arbitrary number of equilibration simulations / minimizations.
# all generated data will be discarded, but the final coordinates will be used
# to initialize the production simulation
equilibrate_config = [
    smee.mm.MinimizationConfig(),
    # short NVT equilibration simulation
    smee.mm.SimulationConfig(
        temperature=temperature,
        pressure=None,
        n_steps=50000,
        timestep=1.0 * openmm.unit.femtosecond,
    ),
    # short NPT equilibration simulation
    smee.mm.SimulationConfig(
        temperature=temperature,
        pressure=pressure,
        n_steps=50000,
        timestep=1.0 * openmm.unit.femtosecond,
    ),
]
# long NPT production simulation
production_config = smee.mm.SimulationConfig(
    temperature=temperature,
    pressure=pressure,
    n_steps=500000,
    timestep=2.0 * openmm.unit.femtosecond,
)
import tempfile

import openmm.unit

import smee.mm

temperature = 298.15 * openmm.unit.kelvin
pressure = 1.0 * openmm.unit.atmosphere

beta = 1.0 / (openmm.unit.MOLAR_GAS_CONSTANT_R * temperature)

# we can run an arbitrary number of equilibration simulations / minimizations.
# all generated data will be discarded, but the final coordinates will be used
# to initialize the production simulation
equilibrate_config = [
    smee.mm.MinimizationConfig(),
    # short NVT equilibration simulation
    smee.mm.SimulationConfig(
        temperature=temperature,
        pressure=None,
        n_steps=50000,
        timestep=1.0 * openmm.unit.femtosecond,
    ),
    # short NPT equilibration simulation
    smee.mm.SimulationConfig(
        temperature=temperature,
        pressure=pressure,
        n_steps=50000,
        timestep=1.0 * openmm.unit.femtosecond,
    ),
]
# long NPT production simulation
production_config = smee.mm.SimulationConfig(
    temperature=temperature,
    pressure=pressure,
    n_steps=500000,
    timestep=2.0 * openmm.unit.femtosecond,
)

We will further define a convenience function that will first simulate the system of interest (storing the trajectory in a temporary directory), and then compute ensemble averages over that trajectory:

In [ ]:

Copied!





import pathlib

import torch


def compute_ensemble_averages(
    system: smee.TensorSystem, force_field: smee.TensorForceField
) -> dict[str, torch.Tensor]:
    # computing the ensemble averages is a two step process - we first need to run
    # an MD simulation using the force field making sure to store the coordinates,
    # box vectors and kinetic energies
    coords, box_vectors = smee.mm.generate_system_coords(system, force_field)

    interval = 1000

    # save the simulation output every 1000th frame (2 ps) to a temporary file.
    # we could also save the trajectory more permanently, but as we do nothing
    # with it after computing the averages in this example, we simply want to
    # discard it.
    with (
        tempfile.NamedTemporaryFile() as tmp_file,
        smee.mm.tensor_reporter(tmp_file.name, interval, beta, pressure) as reporter,
    ):
        smee.mm.simulate(
            system,
            force_field,
            coords,
            box_vectors,
            equilibrate_config,
            production_config,
            [reporter],
        )

        # we can then compute the ensemble averages from the trajectory. generating
        # the trajectory separately from computing the ensemble averages allows us
        # to run the simulation in parallel with other simulations more easily, without
        # having to worry about copying gradients between workers / processes.
        avgs, stds = smee.mm.compute_ensemble_averages(
            system, force_field, pathlib.Path(tmp_file.name), temperature, pressure
        )
        return avgs
import pathlib

import torch


def compute_ensemble_averages(
    system: smee.TensorSystem, force_field: smee.TensorForceField
) -> dict[str, torch.Tensor]:
    # computing the ensemble averages is a two step process - we first need to run
    # an MD simulation using the force field making sure to store the coordinates,
    # box vectors and kinetic energies
    coords, box_vectors = smee.mm.generate_system_coords(system, force_field)

    interval = 1000

    # save the simulation output every 1000th frame (2 ps) to a temporary file.
    # we could also save the trajectory more permanently, but as we do nothing
    # with it after computing the averages in this example, we simply want to
    # discard it.
    with (
        tempfile.NamedTemporaryFile() as tmp_file,
        smee.mm.tensor_reporter(tmp_file.name, interval, beta, pressure) as reporter,
    ):
        smee.mm.simulate(
            system,
            force_field,
            coords,
            box_vectors,
            equilibrate_config,
            production_config,
            [reporter],
        )

        # we can then compute the ensemble averages from the trajectory. generating
        # the trajectory separately from computing the ensemble averages allows us
        # to run the simulation in parallel with other simulations more easily, without
        # having to worry about copying gradients between workers / processes.
        avgs, stds = smee.mm.compute_ensemble_averages(
            system, force_field, pathlib.Path(tmp_file.name), temperature, pressure
        )
        return avgs

Computing the ensemble averages is then as simple as:

In [ ]:

Copied!





# run simulations of each system and compute ensemble averages over the trajectories
# of the potential energy, volume, and density
ethanol_avgs = compute_ensemble_averages(system_ethanol, tensor_ff)
methanol_avgs = compute_ensemble_averages(system_methanol, tensor_ff)
mixture_avgs = compute_ensemble_averages(system_mixture, tensor_ff)
# run simulations of each system and compute ensemble averages over the trajectories
# of the potential energy, volume, and density
ethanol_avgs = compute_ensemble_averages(system_ethanol, tensor_ff)
methanol_avgs = compute_ensemble_averages(system_methanol, tensor_ff)
mixture_avgs = compute_ensemble_averages(system_mixture, tensor_ff)

Each of the returned values is a dictionary of ensemble averages computed over the simulated production trajectory. This currently includes the potential energy, volume, and density of the system.

These averages can be used in a loss function

In [ ]:

Copied!





# define some MOCK data and loss function
mock_ethanol_density = 0.789  # g/mL
mock_methanol_density = 0.791  # g/mL

mock_enthalpy_of_mixing = 0.891  # kcal/mol

loss = (ethanol_avgs["density"] - mock_ethanol_density) ** 2
loss += (methanol_avgs["density"] - mock_methanol_density) ** 2

mixture_enthalpy = mixture_avgs["enthalpy"] / 256

ethanol_enthalpy = ethanol_avgs["enthalpy"] / 128
methanol_enthalpy = methanol_avgs["enthalpy"] / 128

enthalpy_of_mixing = mixture_enthalpy - (
    0.5 * ethanol_enthalpy + 0.5 * methanol_enthalpy
)
loss += (enthalpy_of_mixing - mock_enthalpy_of_mixing) ** 2
# define some MOCK data and loss function
mock_ethanol_density = 0.789  # g/mL
mock_methanol_density = 0.791  # g/mL

mock_enthalpy_of_mixing = 0.891  # kcal/mol

loss = (ethanol_avgs["density"] - mock_ethanol_density) ** 2
loss += (methanol_avgs["density"] - mock_methanol_density) ** 2

mixture_enthalpy = mixture_avgs["enthalpy"] / 256

ethanol_enthalpy = ethanol_avgs["enthalpy"] / 128
methanol_enthalpy = methanol_avgs["enthalpy"] / 128

enthalpy_of_mixing = mixture_enthalpy - (
    0.5 * ethanol_enthalpy + 0.5 * methanol_enthalpy
)
loss += (enthalpy_of_mixing - mock_enthalpy_of_mixing) ** 2

and the gradient of this loss function with respect to the force field parameters can be computed through backpropagation:

In [ ]:

Copied!

loss.backward()

epsilon_col = vdw_potential.parameter_cols.index("epsilon")
sigma_col = vdw_potential.parameter_cols.index("sigma")

print("VdW Ɛ Gradients", vdw_potential.parameters.grad[:, epsilon_col])
print("VdW σ Gradients", vdw_potential.parameters.grad[:, sigma_col])
loss.backward()

epsilon_col = vdw_potential.parameter_cols.index("epsilon")
sigma_col = vdw_potential.parameter_cols.index("sigma")

print("VdW Ɛ Gradients", vdw_potential.parameters.grad[:, epsilon_col])
print("VdW σ Gradients", vdw_potential.parameters.grad[:, sigma_col])