I/O of Molecular Dynamics Trajectories#

This notebook demonstrates how to read, inspect, manipulate, and write molecular dynamics trajectory files in XYZ format using FishMol.

Topics covered:

  1. Reading a trajectory from an XYZ file

  2. Visualising frames

  3. Calibrating the trajectory by fixing the centre of mass

  4. Wrapping atoms back into the simulation box (periodic boundary conditions)

  5. Writing a modified trajectory back to disk

  6. Automatic molecule recognition by bond topology

  7. Filtering and selecting atoms by species or index

Check if FishMol is successfully installed#

!fishmol
                                                                                       
                  Welcome!                    ▄▄█▀                  FishMol
                                          ▄▄███▀                 version 0.0.1
      ○                                ▄▄█████▀                          ○
           ○                        ▄▄████████▄                         /
                                ▄▄▄████████████▄                    ○--○
         ○                ▄▄▄████████████████████▄▄                     \ 
                    ▄▄▄██████▀███████████████████████▄▄                  ○--○           ▄
                ▄▄████████████ ██████████████████████████▄▄             /           ▄▄█▀
         ○   ▄█████████████████ ████████████████████████████▄▄         ○         ▄███▀
          ▄██████████▀▀▀████████ ██████████████████████████████▄▄             ▄█████▀
         ▄████████▀   ○  ▀███████ █████████████████████████████████▄▄      ▄▄█████▀
         ■▄███████▄      ▄███████ █████████████████████████████████████▄▄▄███████▀
          ▀█████████▄▄▄█████████ ███████████████████████████████████████████████
            ▀██████████████████ ████████████████████████████████████████████████▄
              ▀██████████████▀▄███████████████████████████████████▀▀▀▀▀████████████▄
                ▀▀█████████▀▄███████████████████████████████▀▀           ▀▀████████▀
                   ▀▀█████▄████████████████████████▀▀▀▀██▀                 ▀▀████▀
                        ▀▀▀█████████████████▀▀▀▀        ▀■                    ▀█▀
                             ▀▀▀███████▀▀
                                 ▀████
                                    ▀▀▄                Contact: Lei.Lei@durham.ac.uk
import numpy as np
import matplotlib.pyplot as plt
from fishmol import trj
from fishmol import vis

Read a Trajectory File#

FishMol reads trajectories in the extended XYZ format, where each frame begins with the atom count, followed by a comment line, followed by one symbol x y z line per atom.

Two mandatory arguments:

  • timestep — time interval between frames in femtoseconds (fs). Defaults to 5 fs; set it explicitly to match your MD output frequency.

  • data — path to the XYZ file.

Optional arguments:

  • cell — fallback simulation cell supplied as a 3×3 matrix of lattice vectors (rows = a, b, c) in Å. Needed only for NVT files that do not embed a Lattice= field in their comment lines (e.g. CP2K FORMAT XYZ output). For NPT files that carry a per-frame Lattice= field this argument can be omitted entirely — the cell is read automatically from each frame.

  • index — frame selection. ":" loads all frames; a slice object (e.g. slice(0, 1000, 2)) loads every other frame from 0 to 1000.

The cell below loads a short test trajectory (100 frames, 516 atoms per frame, 5 fs timestep). Because this file is a raw CP2K output that does not contain Lattice= in its comment lines, the triclinic simulation box must be supplied explicitly via the cell argument.

%%time
cell= [
    [21.2944000000,        0.0000000000,        0.0000000000],
    [-4.6030371123,       20.7909480472,        0.0000000000],
    [-0.9719093466,       -1.2106211379,       15.1054299403]
]

# cell is required here: this raw CP2K file has no Lattice= in each frame comment
traj = trj.Trajectory(timestep = 5, data = "test/cage1_test_traj.xyz", index = ":", cell = cell)
CPU times: user 217 ms, sys: 10.6 ms, total: 228 ms
Wall time: 228 ms

Visualise the System#

vis.render_atoms() embeds an interactive 3-D viewer in the notebook.
Mouse controls: left-drag → rotate · right-drag / scroll → zoom · middle-drag → pan · A → re-centre view.

vis.render_atoms(traj.frames[0]) # Press 'A' to centre the view, left: rotate, right: zoom, middle: move
ASE atomic visualization

Accessing Frames#

traj.frames is a Python list of Atoms objects, one per frame.
Each Atoms object exposes:

  • .symbs — NumPy array of element symbols (shape (N,))

  • .pos — NumPy array of Cartesian coordinates in Ångström (shape (N, 3))

  • .cell — the associated simulation cell (lattice vectors)

  • .basis — coordinate basis ('Cartesian' or 'Crystal')

traj.frames[1] # The first frame in the trajectory, which is an fishmol Atoms object
Atoms(<generator object Atoms.__new__.<locals>.<genexpr> at 0x7fef095d1900>,
      dtype=object)

Get the atomic symbol and position#

traj.frames[1].symbs
array(['F', 'F', 'F', 'O', 'O', 'C', 'C', 'F', 'F', 'F', 'O', 'O', 'C',
       'C', 'O', 'H', 'H', 'O', 'H', 'H', 'N', 'N', 'N', 'N', 'O', 'H',
       'H', 'H', 'H', 'H', 'C', 'H', 'H', 'C', 'H', 'H', 'C', 'C', 'H',
       'C', 'H', 'C', 'H', 'C', 'H', 'C', 'C', 'C', 'H', 'C', 'H', 'C',
       'C', 'H', 'C', 'H', 'C', 'C', 'C', 'H', 'C', 'H', 'C', 'C', 'H',
       'C', 'H', 'C', 'C', 'H', 'C', 'H', 'C', 'C', 'H', 'C', 'H', 'C',
       'C', 'H', 'C', 'H', 'C', 'H', 'C', 'H', 'C', 'C', 'H', 'H', 'C',
       'H', 'H', 'C', 'H', 'H', 'C', 'H', 'H', 'C', 'H', 'H', 'C', 'H',
       'H', 'C', 'H', 'H', 'C', 'C', 'H', 'C', 'H', 'C', 'H', 'C', 'H',
       'C', 'C', 'C', 'H', 'C', 'H', 'C', 'C', 'H', 'C', 'H', 'H', 'F',
       'F', 'F', 'O', 'O', 'C', 'C', 'F', 'F', 'F', 'O', 'O', 'C', 'C',
       'O', 'H', 'H', 'O', 'H', 'H', 'N', 'N', 'N', 'N', 'O', 'H', 'H',
       'H', 'H', 'H', 'C', 'H', 'H', 'C', 'H', 'H', 'C', 'C', 'H', 'C',
       'H', 'C', 'H', 'C', 'H', 'C', 'C', 'C', 'H', 'C', 'H', 'C', 'C',
       'H', 'C', 'H', 'C', 'C', 'C', 'H', 'C', 'H', 'C', 'C', 'H', 'C',
       'H', 'C', 'C', 'H', 'C', 'H', 'C', 'C', 'H', 'C', 'H', 'C', 'C',
       'H', 'C', 'H', 'C', 'H', 'C', 'H', 'C', 'C', 'H', 'H', 'C', 'H',
       'H', 'C', 'H', 'H', 'C', 'H', 'H', 'C', 'H', 'H', 'C', 'H', 'H',
       'C', 'H', 'H', 'C', 'C', 'H', 'C', 'H', 'C', 'H', 'C', 'H', 'C',
       'C', 'C', 'H', 'C', 'H', 'C', 'C', 'H', 'C', 'H', 'H', 'F', 'F',
       'F', 'O', 'O', 'C', 'C', 'F', 'F', 'F', 'O', 'O', 'C', 'C', 'O',
       'H', 'H', 'O', 'H', 'H', 'N', 'N', 'N', 'N', 'O', 'H', 'H', 'H',
       'H', 'H', 'C', 'H', 'H', 'C', 'H', 'H', 'C', 'C', 'H', 'C', 'H',
       'C', 'H', 'C', 'H', 'C', 'C', 'C', 'H', 'C', 'H', 'C', 'C', 'H',
       'C', 'H', 'C', 'C', 'C', 'H', 'C', 'H', 'C', 'C', 'H', 'C', 'H',
       'C', 'C', 'H', 'C', 'H', 'C', 'C', 'H', 'C', 'H', 'C', 'C', 'H',
       'C', 'H', 'C', 'H', 'C', 'H', 'C', 'C', 'H', 'H', 'C', 'H', 'H',
       'C', 'H', 'H', 'C', 'H', 'H', 'C', 'H', 'H', 'C', 'H', 'H', 'C',
       'H', 'H', 'C', 'C', 'H', 'C', 'H', 'C', 'H', 'C', 'H', 'C', 'C',
       'C', 'H', 'C', 'H', 'C', 'C', 'H', 'C', 'H', 'H', 'F', 'F', 'F',
       'O', 'O', 'C', 'C', 'F', 'F', 'F', 'O', 'O', 'C', 'C', 'O', 'H',
       'H', 'O', 'H', 'H', 'N', 'N', 'N', 'N', 'O', 'H', 'H', 'H', 'H',
       'H', 'C', 'H', 'H', 'C', 'H', 'H', 'C', 'C', 'H', 'C', 'H', 'C',
       'H', 'C', 'H', 'C', 'C', 'C', 'H', 'C', 'H', 'C', 'C', 'H', 'C',
       'H', 'C', 'C', 'C', 'H', 'C', 'H', 'C', 'C', 'H', 'C', 'H', 'C',
       'C', 'H', 'C', 'H', 'C', 'C', 'H', 'C', 'H', 'C', 'C', 'H', 'C',
       'H', 'C', 'H', 'C', 'H', 'C', 'C', 'H', 'H', 'C', 'H', 'H', 'C',
       'H', 'H', 'C', 'H', 'H', 'C', 'H', 'H', 'C', 'H', 'H', 'C', 'H',
       'H', 'C', 'C', 'H', 'C', 'H', 'C', 'H', 'C', 'H', 'C', 'C', 'C',
       'H', 'C', 'H', 'C', 'C', 'H', 'C', 'H', 'H'], dtype='<U2')
traj.frames[1].pos
array([[ 7.75433428, 19.22969115,  7.25789922],
       [ 7.38459724, 17.89872552,  5.49949104],
       [ 6.01174112, 17.83305553,  7.17866554],
       ...,
       [-6.55112186, 13.02095143,  1.36287726],
       [-6.47623272, 11.92016128,  1.43990435],
       [-2.75848988, 12.44274372,  3.11918468]])

Calibrate by Centre of Mass#

In long MD runs the entire system can drift bodily across the simulation box. traj.calib() removes this centre-of-mass (CoM) drift by translating every frame so that its CoM matches the CoM of frame 0.

This is important before computing absolute displacements (MSD) or comparing atom positions across frames. It does not affect any internal geometry or periodic images.

Simply use the calib function

calibrated_traj = traj.calib()
calibrated_traj.frames[1].pos
array([[ 7.75432882, 19.22971761,  7.25787368],
       [ 7.38459178, 17.89875198,  5.4994655 ],
       [ 6.01173566, 17.83308199,  7.17864   ],
       ...,
       [-6.55112732, 13.02097789,  1.36285172],
       [-6.47623818, 11.92018774,  1.43987881],
       [-2.75849534, 12.44277018,  3.11915914]])

Wrap Atoms into the Simulation Box#

Under periodic boundary conditions atoms can wander outside the nominal box. traj.wrap2box() folds every atom back inside using fractional coordinates.

Parameters:

  • center — fractional coordinate of the box centre (default (0.5, 0.5, 0.5)). Atoms are placed in the range [center − 0.5, center + 0.5] along each axis.

  • pretty_translation — if True, minimises the spread of fractional coordinates (useful for molecules that straddle a box boundary).

Wrapping is required before any neighbour-search, RDF, or bonding analysis that relies on minimum-image convention distances.

traj.wrap2box()
<fishmol.trj.Trajectory at 0x7fef09572280>
vis.render_atoms(traj.frames[0])
ASE atomic visualization

Save a Trajectory to an XYZ File#

traj.write() writes the trajectory (or any filtered subset of frames/atoms) to an extended XYZ file. If the file already exists, -1 is appended to the filename automatically to avoid overwriting.

traj.write(filename = "test/cage1_test_wrapped.xyz")

Read the file just saved#

# FishMol-written files embed Lattice= in every frame comment — no cell argument needed
traj_1 = traj = trj.Trajectory(timestep = 5, data = "test/cage1_test_wrapped.xyz", index = ":")
vis.render_atoms(traj_1.frames[0])
ASE atomic visualization

NPT Trajectories#

When the trajectory file contains a Lattice= field in every frame comment (NPT output from CP2K, LAMMPS, VASP, …) the cell is read automatically from each frame — no cell argument is needed:

traj_npt = trj.Trajectory(timestep=5, data="npt.xyz")

Each Atoms frame carries its own per-frame cell accessible via frame.cell.lattice. Use traj.is_npt to check programmatically whether consecutive frames have detectably different cells:

Property

Description

traj.cell

Cell of the first frame (3×3 array, backward-compatible convenience accessor)

traj.frames[i].cell.lattice

Per-frame cell at frame i (correct for NPT)

traj.is_npt

True when the first and last frame cells differ detectably

# Query the cell info of the NVT trajectory loaded above
print("First frame cell:")
print(traj.cell)          # same as traj.frames[0].cell.lattice
print("Is NPT (varying cell):", traj.is_npt)

Automatic Molecule Recognition#

cluster() identifies individual molecules from a single frame by building a bonding graph. Two atoms are considered bonded when their distance is less than 1.05 × (r₁ + r₂) where r₁, r₂ are their covalent radii. Connected components of this graph are the molecules.

Set mic=True to apply the minimum image convention — essential when molecules straddle periodic box boundaries.

Each returned Molecule object has:

  • .formula — empirical formula string (e.g. 'H2O')

  • .at_idx — list of atom indices belonging to this molecule

cell= [
    [21.2944000000,        0.0000000000,        0.0000000000],
    [-4.6030371123,       20.7909480472,        0.0000000000],
    [-0.9719093466,       -1.2106211379,       15.1054299403]
]

# cell is required here: this raw CP2K file has no Lattice= in each frame comment
traj = trj.Trajectory(timestep = 5, data = "test/cage1_test_traj.xyz", index = ":", cell = cell)
from fishmol.sel_tools import cluster
molecules = cluster(traj.frames[0], mic = True)
molecules
[Molecule(formula='O2F3C2', at_idx=[0, 1, 2, 3, 4, 5, 6]),
 Molecule(formula='O2F3C2', at_idx=[7, 8, 9, 10, 11, 12, 13]),
 Molecule(formula='H2O', at_idx=[16, 14, 15]),
 Molecule(formula='H2O', at_idx=[17, 18, 19]),
 Molecule(formula='H104O2N8C104', at_idx=[512, 513, 514, 515, 20, 21, 22, 23, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 278, 279, 280, 281, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511]),
 Molecule(formula='H104O2N8C104', at_idx=[24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 149, 150, 151, 152, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 407, 408, 409, 410]),
 Molecule(formula='O2F3C2', at_idx=[129, 130, 131, 132, 133, 134, 135]),
 Molecule(formula='O2F3C2', at_idx=[136, 137, 138, 139, 140, 141, 142]),
 Molecule(formula='H2O', at_idx=[144, 145, 143]),
 Molecule(formula='H2O', at_idx=[146, 147, 148]),
 Molecule(formula='O2F3C2', at_idx=[258, 259, 260, 261, 262, 263, 264]),
 Molecule(formula='O2F3C2', at_idx=[265, 266, 267, 268, 269, 270, 271]),
 Molecule(formula='H2O', at_idx=[272, 273, 274]),
 Molecule(formula='H2O', at_idx=[275, 276, 277]),
 Molecule(formula='O2F3C2', at_idx=[387, 388, 389, 390, 391, 392, 393]),
 Molecule(formula='O2F3C2', at_idx=[394, 395, 396, 397, 398, 399, 400]),
 Molecule(formula='H2O', at_idx=[401, 402, 403]),
 Molecule(formula='H2O', at_idx=[404, 405, 406])]

You can pass the atom indices to the frame of a trajectory to access the information of the molecule

# e.g. view the second molecule
vis.render_atoms(traj.frames[0][molecules[4].at_idx])
ASE atomic visualization
vis.render_atoms(traj.frames[0][molecules[5].at_idx])
ASE atomic visualization

Filtering the Trajectory#

Selecting Atoms#

Atoms.at_sel() returns a tuple (indices, atoms_subset).
The selector argument can be:

  • A string — selects all atoms with that element symbol

  • A list of strings — union of multiple element symbols

  • A list of integers — explicit atom indices

  • A slice — index range

  • Pass inverse_select=True to invert the selection

Select by atomic symbol#

idx, atoms = traj.frames[0].at_sel(["O","H"])
print(idx)

Keep Only Water Molecules#

Use the molecule indices from cluster() to build a flat list of all water-atom indices, then apply it to every frame to create a water-only trajectory.

waters = [mol.at_idx for mol in molecules if mol.formula == "H2O"]
waters = [idx for at_idx in waters for idx in at_idx]
waters
[16,
 14,
 15,
 17,
 18,
 19,
 144,
 145,
 143,
 146,
 147,
 148,
 272,
 273,
 274,
 275,
 276,
 277,
 401,
 402,
 403,
 404,
 405,
 406]
traj.frames[0][waters]
Atoms(<generator object Atoms.__new__.<locals>.<genexpr> at 0x7fd66e4fe430>,
      dtype=object)
vis.render_atoms(traj.frames[0][waters])
ASE atomic visualization
water_frames = [frame[waters] for frame in traj.frames]
traj.frames = water_frames
vis.render_atoms(traj.frames[0])
ASE atomic visualization