Add simulation extraction and viewing notebook #226
base: main
Changes from all commits
0562cfb
2169fe2
c8d5ea6
3c77455
1251354
9342a9c
9980155
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,316 @@ | ||
| { | ||
|
Member
Try adding the
Also this is a big nc file but the hybrid system doesn't include water. Would it be worth pulling down one of the smaller NC files? (I think we have newer generated ones?)
Member
I'm not sure that the image will show up in the final html render. Can we embed the image somehow? I know that was done in the MDAnalysis UserGuide.
Member
Line #3. `state_atoms = np.array([atom.ix for atom in u_0.atoms if atom.bfactor in (bfactor, 0.5, 0.0)])`
Couple of things:
Member
I'm thinking, in MDAnalysis, it might be easy enough to align the trajectory to the coordinates of the PDB file and write that out. That's technically "centering" for free.
Also I'd be happy with merging this as-is, but opening a separate issue to try to make an MDAnalysis-only solution work. I'm sure it's doable, just needs the right set of transformations. |
||
| "cells": [ | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "20a8d98a", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "# Extracting and visualising a free energy simulation\n", | ||
| "\n", | ||
| "This notebook provides a step-by-step guide to extract and visualise a free energy simulation trajectory from a ``simulation.nc`` file using [openfe-analysis](https://github.com/OpenFreeEnergy/openfe_analysis), [MDAnalysis](https://github.com/MDAnalysis/mdanalysis) and [mdtraj](https://github.com/mdtraj/mdtraj). By the end, you should understand how to:\n", | ||
| "\n", | ||
| "1. Extract the trajectory of a ``replica`` or ``single lambda state`` from a ``simulation.nc`` file\n", | ||
| "2. For a given hybrid topology trajectory, extract the relevant atom positions for the end states using `MDAnalysis`\n", | ||
| "3. Write out the trajectory(ies) using `MDAnalysis`\n", | ||
| "4. Centre the ligand in the simulation box using `mdtraj`\n", | ||
| "\n", | ||
| "## Downloading the example data\n", | ||
| "\n", | ||
| "First, download some example trajectory data. This may take a few minutes due to the size of the simulation file. Please skip this section if you have already done this!" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": 1, | ||
| "id": "be6317291bbf3804", | ||
| "metadata": {}, | ||
| "outputs": [ | ||
| { | ||
| "name": "stdout", | ||
| "output_type": "stream", | ||
| "text": [ | ||
| "--2025-09-22 13:47:47-- https://zenodo.org/records/15375081/files/simulation.nc\n", | ||
| "Resolving zenodo.org (zenodo.org)... 2001:1458:d00:25::100:372, 2001:1458:d00:61::100:2f3, 2001:1458:d00:24::100:f6, ...\n", | ||
| "Connecting to zenodo.org (zenodo.org)|2001:1458:d00:25::100:372|:443... connected.\n", | ||
| "HTTP request sent, awaiting response... 200 OK\n", | ||
| "Length: 516886878 (493M) [application/octet-stream]\n", | ||
| "Saving to: ‘simulation.nc.2’\n", | ||
| "\n", | ||
| "simulation.nc.2 7%[> ] 37.40M 6.76MB/s eta 88s ^C\n", | ||
| "--2025-09-22 13:47:55-- https://zenodo.org/records/15375081/files/hybrid_system.pdb\n", | ||
| "Resolving zenodo.org (zenodo.org)... 2001:1458:d00:61::100:2f3, 2001:1458:d00:24::100:f6, 2001:1458:d00:25::100:372, ...\n", | ||
| "Connecting to zenodo.org (zenodo.org)|2001:1458:d00:61::100:2f3|:443... connected.\n", | ||
| "HTTP request sent, awaiting response... 200 OK\n", | ||
| "Length: 388547 (379K) [application/octet-stream]\n", | ||
| "Saving to: ‘hybrid_system.pdb.2’\n", | ||
| "\n", | ||
| "hybrid_system.pdb.2 100%[===================>] 379.44K 1.29MB/s in 0.3s \n", | ||
| "\n", | ||
| "2025-09-22 13:47:56 (1.29 MB/s) - ‘hybrid_system.pdb.2’ saved [388547/388547]\n", | ||
| "\n" | ||
| ] | ||
| } | ||
| ], | ||
| "source": [ | ||
| "! wget https://zenodo.org/records/15375081/files/simulation.nc\n", | ||
| "! wget https://zenodo.org/records/15375081/files/hybrid_system.pdb" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "5d066221-0b8e-4b1b-a047-c989a5957cf7", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Extracting the trajectory with `MDAnalysis`\n", | ||
| "\n", | ||
| "The `openfe-analysis` package provides an `MDAnalysis` reader to help extract the trajectory data from the `simulation.nc` file. As the file contains multiple replicas simulated at different lambda states, we must choose which of these to load as a single trajectory. We have two options available to construct the trajectory:\n", | ||
| "- `state_id`: constructs a trajectory that follows a single Hamiltonian lambda state with the specified index.\n", | ||
| "- `replica_id`: constructs a trajectory that follows a single replica with the specified index.\n", | ||
| "\n", | ||
| "In this example, which uses a trajectory from a relative binding free energy calculation, we will load the trajectory at `lambda=0`, the end state corresponding to Ligand A, and visualise it with `nglview`." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": 28, | ||
| "id": "05ba7dc7", | ||
| "metadata": {}, | ||
| "outputs": [ | ||
| { | ||
| "name": "stderr", | ||
| "output_type": "stream", | ||
| "text": [ | ||
| "/Users/joshua/mambaforge/envs/openfe_dev/lib/python3.12/site-packages/openfe_analysis/utils/multistate.py:41: UserWarning: This is an older NetCDF file that does not yet contain information about the write frequency of positions and velocities. We will assume that positions and velocities were written out at every iteration. \n", | ||
| " warnings.warn(wmsg)\n" | ||
| ] | ||
| }, | ||
| { | ||
| "data": { | ||
| "application/vnd.jupyter.widget-view+json": { | ||
| "model_id": "e5a30cb71c014c3c9831cf2f3723ea72", | ||
| "version_major": 2, | ||
| "version_minor": 0 | ||
| }, | ||
| "text/plain": [ | ||
| "NGLWidget(max_frame=500)" | ||
| ] | ||
| }, | ||
| "metadata": {}, | ||
| "output_type": "display_data" | ||
| } | ||
| ], | ||
| "source": [ | ||
| "import MDAnalysis as mda\n", | ||
| "import mdtraj as md\n", | ||
| "from openfe_analysis import FEReader\n", | ||
| "import nglview as nv\n", | ||
| "import numpy as np\n", | ||
| "\n", | ||
| "u_0 = mda.Universe(\"hybrid_system.pdb\", \"simulation.nc\", format=FEReader, state_id=0)\n", | ||
| "\n", | ||
| "w = nv.show_mdanalysis(u_0)\n", | ||
| "w" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "f0e7b0e7-04b5-42c0-ab65-2202275288d4", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "<div class=\"alert alert-block alert-info\"> <b>Note:</b> The OpenFE relative binding free energy protocol does not save water positions by default. This can be changed via the <a href=\"https://docs.openfree.energy/en/latest/reference/api/openmm_protocol_settings.html#openfe.protocols.openmm_utils.omm_settings.MultiStateOutputSettings.output_indices\">output_indices</a> protocol setting. </div>\n", | ||
| "\n", | ||
| "\n", | ||
| "To view the final state at `lambda=1`, we can use negative indexing if we don't know the total number of lambda states." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": 29, | ||
| "id": "009a8ac7-0ca9-494c-bbd6-177e752b0d4f", | ||
| "metadata": {}, | ||
| "outputs": [ | ||
| { | ||
| "name": "stderr", | ||
| "output_type": "stream", | ||
| "text": [ | ||
| "/Users/joshua/mambaforge/envs/openfe_dev/lib/python3.12/site-packages/openfe_analysis/utils/multistate.py:41: UserWarning: This is an older NetCDF file that does not yet contain information about the write frequency of positions and velocities. We will assume that positions and velocities were written out at every iteration. \n", | ||
| " warnings.warn(wmsg)\n" | ||
| ] | ||
| }, | ||
| { | ||
| "data": { | ||
| "application/vnd.jupyter.widget-view+json": { | ||
| "model_id": "e9e8ec05be4c4075ad1b13bc9be88c6c", | ||
| "version_major": 2, | ||
| "version_minor": 0 | ||
| }, | ||
| "text/plain": [ | ||
| "NGLWidget(max_frame=500)" | ||
| ] | ||
| }, | ||
| "metadata": {}, | ||
| "output_type": "display_data" | ||
| } | ||
| ], | ||
| "source": [ | ||
| "u_1 = mda.Universe(\"hybrid_system.pdb\", \"simulation.nc\", format=FEReader, state_id=-1)\n", | ||
| "\n", | ||
| "w = nv.show_mdanalysis(u_1)\n", | ||
| "w.center()\n", | ||
| "w" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "cbc5534f-846c-486c-9b8b-01e1bab8d880", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Extracting the end state positions with `MDAnalysis`\n", | ||
| "\n", | ||
| "The trajectory data stored in the `simulation.nc` file contains the positions of the end-state ligands in their hybrid topology format. This means only atoms that are unique to the end-states have individual positions, with conserved core atoms sharing a single set of positions. As you might have noticed in the visualisation above, this can complicate the analysis and visualisation of the protein-ligand interactions. However, we can identify the atoms relevant to the end states or core atoms using the beta factors in the topology file:\n", | ||
| "\n", | ||
| "- `0.0`: The non-alchemical atoms (protein, solvent, etc)\n", | ||
| "- `0.25`: The unique atoms of state A\n", | ||
| "- `0.5`: The conserved core atoms present in both end states\n", | ||
| "- `0.75`: The unique atoms of state B\n", | ||
| "\n", | ||
| "With this information, we can easily extract the atom positions relevant to `state A` for `lambda=0`:" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": 30, | ||
| "id": "0ac90329-4a4a-4c76-be44-447c7ef9bb8b", | ||
| "metadata": {}, | ||
| "outputs": [ | ||
| { | ||
| "name": "stderr", | ||
| "output_type": "stream", | ||
| "text": [ | ||
| "/Users/joshua/mambaforge/envs/openfe_dev/lib/python3.12/site-packages/MDAnalysis/core/topologyattrs.py:329: DeprecationWarning: The bfactor topology attribute is only provided as an alias to the tempfactor attribute. It will be removed in 3.0. Please use the tempfactor attribute instead.\n", | ||
| " warnings.warn(BFACTOR_WARNING, DeprecationWarning)\n" | ||
| ] | ||
| }, | ||
| { | ||
| "data": { | ||
| "application/vnd.jupyter.widget-view+json": { | ||
| "model_id": "8d08590641d041e9bda6b83b21c8d95e", | ||
| "version_major": 2, | ||
| "version_minor": 0 | ||
| }, | ||
| "text/plain": [ | ||
| "NGLWidget(max_frame=500)" | ||
| ] | ||
| }, | ||
| "metadata": {}, | ||
| "output_type": "display_data" | ||
| } | ||
| ], | ||
| "source": [ | ||
| "# get atoms for state A: its unique atoms (0.25), the shared core (0.5) and the environment (0.0)\n", | ||
| "bfactor = 0.25\n", | ||
| "state_atoms = np.array([atom.ix for atom in u_0.atoms if atom.tempfactor in (bfactor, 0.5, 0.0)])\n", | ||
| "state = u_0.atoms[state_atoms]\n", | ||
| "\n", | ||
| "w = nv.show_mdanalysis(state)\n", | ||
| "\n", | ||
| "w" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "7026738c-e4fe-461e-85ca-f726e9768a5e", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Saving the trajectory to file with `MDAnalysis`\n", | ||
| "\n", | ||
| "We can now use `MDAnalysis` to save the trajectory of the `state A` atoms to a common file format. Note that we will also need to write out a new topology file that can be used to load this trajectory:" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": 31, | ||
| "id": "6e410679", | ||
| "metadata": {}, | ||
| "outputs": [ | ||
| { | ||
| "name": "stderr", | ||
| "output_type": "stream", | ||
| "text": [ | ||
| "/Users/joshua/mambaforge/envs/openfe_dev/lib/python3.12/site-packages/MDAnalysis/coordinates/PDB.py:1154: UserWarning: Found no information for attr: 'formalcharges' Using default value of '0'\n", | ||
| " warnings.warn(\"Found no information for attr: '{}'\"\n" | ||
| ] | ||
| } | ||
| ], | ||
| "source": [ | ||
| "# write a new PDB topology file for the state A atoms only\n", | ||
| "state.write(\"state_a_topology.pdb\")\n", | ||
| "# write the state A atoms of every frame to an xtc file\n", | ||
| "with mda.Writer(\"out.xtc\", n_atoms=state.n_atoms) as writer:\n", | ||
| "    for ts in u_0.trajectory:\n", | ||
| "        writer.write(state)" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "75602d58-6709-4d9b-8a52-9896fdc6a0af", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Centering the Ligand with `mdtraj`\n", | ||
| "\n", | ||
| "You may have noticed in the view above that the ligand seems to have drifted away from the protein. This is a visualisation artifact caused by periodic boundary conditions and the way in which `OpenMM` tries to ensure that all particle positions are written into a single periodic box. We can fix this using `mdtraj` and the [image_molecules](https://mdtraj.org/1.9.3/api/generated/mdtraj.Trajectory.html?highlight=image_molecules#mdtraj.Trajectory.image_molecules) method:" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": 33, | ||
| "id": "a5f16795-1b9c-4198-8694-6568eaba06c7", | ||
| "metadata": {}, | ||
| "outputs": [ | ||
| { | ||
| "data": { | ||
| "application/vnd.jupyter.widget-view+json": { | ||
| "model_id": "c967de9e5a9a4ac688f2e74e25ede21f", | ||
| "version_major": 2, | ||
| "version_minor": 0 | ||
| }, | ||
| "text/plain": [ | ||
| "NGLWidget(max_frame=500)" | ||
| ] | ||
| }, | ||
| "metadata": {}, | ||
| "output_type": "display_data" | ||
| } | ||
| ], | ||
| "source": [ | ||
| "traj = md.load_xtc(\"out.xtc\", top=\"state_a_topology.pdb\")\n", | ||
| "traj = traj.image_molecules()\n", | ||
| "\n", | ||
| "w = nv.show_mdtraj(traj)\n", | ||
| "\n", | ||
| "w" | ||
| ] | ||
| } | ||
| ], | ||
| "metadata": { | ||
| "kernelspec": { | ||
| "display_name": "Python 3 (ipykernel)", | ||
| "language": "python", | ||
| "name": "python3" | ||
| }, | ||
| "language_info": { | ||
| "codemirror_mode": { | ||
| "name": "ipython", | ||
| "version": 3 | ||
| }, | ||
| "file_extension": ".py", | ||
| "mimetype": "text/x-python", | ||
| "name": "python", | ||
| "nbconvert_exporter": "python", | ||
| "pygments_lexer": "ipython3", | ||
| "version": "3.12.11" | ||
| } | ||
| }, | ||
| "nbformat": 4, | ||
| "nbformat_minor": 5 | ||
| } | ||
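
For context on the last notebook section, here is a minimal sketch of the periodic imaging idea behind the "drifting" ligand. It assumes an orthorhombic box and works on molecule centres only; `minimum_image_shift` is a hypothetical helper written for illustration, not an `mdtraj` or `MDAnalysis` function, and `image_molecules` does more than this (for example, it works per molecule on real unit cells).

```python
def minimum_image_shift(ligand_centre, protein_centre, box_lengths):
    """Translation that moves the ligand to its periodic image nearest the protein.

    Assumes an orthorhombic box; all arguments are (x, y, z) sequences in the
    same length unit. Hypothetical helper, not part of mdtraj or MDAnalysis.
    """
    shift = []
    for lig, prot, length in zip(ligand_centre, protein_centre, box_lengths):
        delta = lig - prot
        # number of whole box lengths separating the two centres along this axis
        n_boxes = round(delta / length)
        shift.append(-n_boxes * length)
    return shift


# Toy frame (arbitrary length units): a cubic box of side 50 in which the
# ligand has been wrapped across the x face, so it looks far from the protein.
protein = [2.0, 25.0, 25.0]
ligand = [48.0, 26.0, 24.0]
box = [50.0, 50.0, 50.0]

shift = minimum_image_shift(ligand, protein, box)
imaged_ligand = [c + s for c, s in zip(ligand, shift)]
print(shift)          # [-50.0, 0.0, 0.0]
print(imaged_ligand)  # [-2.0, 26.0, 24.0]
```

After the shift, the ligand sits 4 units from the protein along x instead of 46. The physics is unchanged because both positions are images of the same particle.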
We should make it clear here what Protocols this applies to, i.e. this is for HybridTop mostly (i.e. the tempfactor stuff is hybridtop only) but also applies for the most part to ABFEs, AHFEs, and SepTop.
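
To make the hybrid-topology-specific part concrete, here is a minimal sketch of the tempfactor convention listed in the notebook (0.0, 0.25, 0.5, 0.75). The helper `end_state_indices` and the toy values are hypothetical and only illustrate the selection logic; they are not part of `openfe-analysis` or `MDAnalysis`.

```python
# Tempfactor labels for a hybrid topology, as listed in the notebook.
ENVIRONMENT = 0.0   # non-alchemical atoms (protein, solvent, etc.)
UNIQUE_A = 0.25     # atoms unique to state A
CORE = 0.5          # conserved core atoms shared by both end states
UNIQUE_B = 0.75     # atoms unique to state B


def end_state_indices(tempfactors, state):
    """Return the indices of the atoms that belong to end state "A" or "B".

    `tempfactors` is any sequence of per-atom tempfactor values.
    Hypothetical helper written for illustration only.
    """
    unique = {"A": UNIQUE_A, "B": UNIQUE_B}[state]
    keep = (ENVIRONMENT, CORE, unique)
    return [i for i, value in enumerate(tempfactors) if value in keep]


# Toy system: two environment atoms, one state A atom, two core atoms,
# and one state B atom.
toy_tempfactors = [0.0, 0.0, 0.25, 0.5, 0.5, 0.75]
state_a = end_state_indices(toy_tempfactors, "A")
state_b = end_state_indices(toy_tempfactors, "B")
print(state_a)  # [0, 1, 2, 3, 4]
print(state_b)  # [0, 1, 3, 4, 5]
```

In the notebook, the same helper could presumably be fed `u_0.atoms.tempfactors`, with the returned indices used to slice `u_0.atoms`. That only makes sense for topologies that carry these hybrid topology labels.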