
Overview

Sandbox prediction jobs can be submitted to Boltz Lab via the CLI or the Python SDK. The input is a .yaml file that describes the complex (its chains and molecules) and, optionally, constraints and properties. The format largely follows that of our open-source Boltz-2 model.
Custom MSAs and templates are not currently supported in Boltz Lab Sandbox jobs.

Top-level schema

version: 1            # optional (recommended if supported by your runner)
sequences:            # required
  - <entity>          # one entry per *unique* entity (polymer or ligand)
constraints:          # optional
  - <constraint>
properties:           # optional (e.g., affinity)
  - <property>
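As an illustration, a minimal pre-submission check of the top-level structure could look like the sketch below. It is a hypothetical helper, not part of the Boltz Lab CLI or SDK. It assumes `doc` is the dict obtained by parsing the .yaml file (for example with PyYAML's `yaml.safe_load`), and treating unknown top-level keys as errors is our own assumption.

```python
# Hypothetical helper, not part of the Boltz Lab CLI or SDK.
# `doc` is the parsed YAML input (e.g. the dict returned by yaml.safe_load).
ALLOWED_TOP_LEVEL = {"version", "sequences", "constraints", "properties"}


def check_top_level(doc: dict) -> list:
    """Return a list of problems with the top-level structure (empty if none)."""
    problems = []
    unknown = set(doc) - ALLOWED_TOP_LEVEL
    if unknown:
        # Assumption: keys outside the documented schema are treated as errors.
        problems.append(f"unknown top-level keys: {sorted(unknown)}")
    if not isinstance(doc.get("sequences"), list) or not doc["sequences"]:
        problems.append("'sequences' is required and must be a non-empty list")
    for key in ("constraints", "properties"):
        if key in doc and not isinstance(doc[key], list):
            problems.append(f"'{key}' must be a list when present")
    return problems


print(check_top_level({"version": 1, "sequences": [{"ligand": {"id": "A", "ccd": "SAH"}}]}))  # []
print(check_top_level({"constraints": {}}))  # missing sequences, constraints not a list
```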

sequences

What it represents

One entry per unique entity (polymer chain or molecule) in the complex.
  • Polymers: protein, dna, rna → require sequence
  • Ligands (non-polymers): ligand → require either smiles or ccd (mutually exclusive)
  • id is the chain/molecule identifier. If the complex contains several identical copies of an entity, give one ID per copy as a list (e.g., [A, B]).
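The rules above can be expressed as a small check on a single `sequences` item. This is a hypothetical sketch, not part of the Boltz Lab CLI or SDK; it assumes the entry has already been parsed from YAML into a dict such as `{"protein": {"id": ["A", "B"], "sequence": "..."}}`.

```python
# Hypothetical helper, not part of the Boltz Lab CLI or SDK.
POLYMER_TYPES = {"protein", "dna", "rna"}


def check_entity(entry: dict) -> list:
    """Check one item of `sequences`; return a list of problems (empty if none)."""
    if len(entry) != 1:
        return ["each entry must have exactly one entity-type key"]
    kind, body = next(iter(entry.items()))
    problems = []
    ids = body.get("id")
    ids_ok = isinstance(ids, str) or (
        isinstance(ids, list) and len(ids) > 0 and all(isinstance(i, str) for i in ids)
    )
    if not ids_ok:
        problems.append("'id' must be a chain ID or a non-empty list of chain IDs")
    if kind in POLYMER_TYPES:
        if not body.get("sequence"):
            problems.append(f"{kind} requires 'sequence'")
    elif kind == "ligand":
        # smiles and ccd are mutually exclusive, and one of them is required
        if ("smiles" in body) == ("ccd" in body):
            problems.append("ligand requires exactly one of 'smiles' or 'ccd'")
    else:
        problems.append(f"unknown entity type: {kind!r}")
    return problems


print(check_entity({"protein": {"id": ["A", "B"], "sequence": "MVTPEG"}}))  # []
print(check_entity({"ligand": {"id": "C", "smiles": "CCO", "ccd": "SAH"}}))
```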

Entity block schemas

Protein / DNA / RNA

- protein:                 # or dna, rna
    id: CHAIN_ID           # or [CHAIN_ID, CHAIN_ID] for identical copies
    sequence: SEQUENCE
    modifications:         # optional (polymer only)
      - position: RES_IDX  # 1-indexed
        ccd: CCD           # CCD code for modified residue
    cyclic: false          # optional (polymer only)

Ligand

- ligand:
    id: CHAIN_ID           # or [CHAIN_ID, CHAIN_ID] for identical copies
    smiles: "SMILES"       # ligand only; mutually exclusive with ccd
    # ccd: CCD             # ligand only; mutually exclusive with smiles
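If you generate inputs programmatically, the two block shapes above map directly onto Python dicts. The builders below are hypothetical helpers (not part of the Boltz Lab SDK) that produce dicts mirroring the YAML blocks; the short protein sequence is a placeholder.

```python
# Hypothetical helpers, not part of the Boltz Lab CLI or SDK.
def polymer(kind, ids, sequence, modifications=None, cyclic=False):
    """Build a protein/dna/rna entry; `modifications` is a list of (position, ccd) pairs."""
    if kind not in ("protein", "dna", "rna"):
        raise ValueError(f"not a polymer type: {kind!r}")
    body = {"id": ids, "sequence": sequence}
    if modifications:
        body["modifications"] = [{"position": pos, "ccd": code} for pos, code in modifications]
    if cyclic:
        body["cyclic"] = True  # omitted otherwise, since the field is optional
    return {kind: body}


def ligand(ids, smiles=None, ccd=None):
    """Build a ligand entry; exactly one of `smiles` or `ccd` must be given."""
    if (smiles is None) == (ccd is None):
        raise ValueError("give exactly one of smiles or ccd")
    body = {"id": ids}
    if smiles is not None:
        body["smiles"] = smiles
    else:
        body["ccd"] = ccd
    return {"ligand": body}


sequences = [
    polymer("protein", ["A", "B"], "MVTPEGNVSL"),  # placeholder sequence
    ligand(["C", "D"], ccd="SAH"),
    ligand("E", smiles="N[C@@H](Cc1ccc(O)cc1)C(=O)O"),
]
print(sequences[1])
```

The resulting list can then be serialized with a YAML library, for example PyYAML's `yaml.safe_dump({"version": 1, "sequences": sequences}, sort_keys=False)`.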

Modifications

modifications is optional and supported for polymers (protein, dna, rna):
  • position: residue index starting at 1
  • ccd: CCD code of the modified residue (modified residues can currently be specified only by CCD code)
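Because positions are 1-indexed, a valid position runs from 1 to the sequence length, and the residue being modified is `sequence[position - 1]` in Python. A hypothetical sanity check (not part of the Boltz Lab SDK) over one parsed polymer body:

```python
# Hypothetical helper, not part of the Boltz Lab CLI or SDK.
def check_modifications(body: dict) -> list:
    """Check the `modifications` of one polymer body against its sequence."""
    seq = body.get("sequence", "")
    problems = []
    for mod in body.get("modifications", []):
        pos = mod.get("position")
        # positions are 1-indexed, so valid values run from 1 to len(seq)
        if isinstance(pos, bool) or not isinstance(pos, int) or not 1 <= pos <= len(seq):
            problems.append(f"position {pos!r} is outside 1..{len(seq)}")
            continue
        if not mod.get("ccd"):
            problems.append(f"position {pos}: missing 'ccd' code")
    return problems


body = {"sequence": "MSTK", "modifications": [{"position": 2, "ccd": "SEP"}]}
print(check_modifications(body))  # []
print(body["sequence"][2 - 1])    # S, the residue replaced by SEP
```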

Cyclic polymers

cyclic: true|false indicates whether a polymer chain is cyclic (not applicable to ligands).
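A hypothetical check for the optional flag (not part of the Boltz Lab SDK), assuming the entry has been parsed into a dict:

```python
# Hypothetical helper, not part of the Boltz Lab CLI or SDK.
POLYMER_TYPES = {"protein", "dna", "rna"}


def check_cyclic(entry: dict) -> list:
    """Check the optional `cyclic` flag on one `sequences` entry."""
    kind, body = next(iter(entry.items()))
    if "cyclic" not in body:
        return []  # the field is optional
    if kind not in POLYMER_TYPES:
        return [f"'cyclic' is not applicable to {kind}"]
    if not isinstance(body["cyclic"], bool):
        return ["'cyclic' must be true or false"]
    return []


print(check_cyclic({"protein": {"id": "A", "sequence": "GGSGG", "cyclic": True}}))  # []
print(check_cyclic({"ligand": {"id": "C", "ccd": "SAH", "cyclic": True}}))
```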

constraints (optional)

Constraints add structural hints to the input.

Common indexing conventions

  • CHAIN_ID: an id defined in sequences (for an id list, one of its entries)
  • RES_IDX: residue index starting from 1
  • For ligands, RES_IDX is always 1
  • ATOM_NAME: standardized atom name (check it in the component's CIF file on RCSB)
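These conventions make it possible to check a [CHAIN_ID, RES_IDX] reference before submission. The sketch below is a hypothetical helper (not part of the Boltz Lab SDK): it expands id lists so that every copy gets its own chain ID, and treats each ligand as a single residue. Atom names are not checked here, since that would require the component's CIF file.

```python
# Hypothetical helpers, not part of the Boltz Lab CLI or SDK.
def chain_table(sequences):
    """Map every chain ID (each copy in an id list) to (entity type, residue count)."""
    table = {}
    for entry in sequences:
        kind, body = next(iter(entry.items()))
        ids = body["id"] if isinstance(body["id"], list) else [body["id"]]
        n_res = 1 if kind == "ligand" else len(body["sequence"])  # ligands use RES_IDX 1
        for cid in ids:
            table[cid] = (kind, n_res)
    return table


def check_residue_ref(table, chain_id, res_idx):
    """Return an error message for a bad [CHAIN_ID, RES_IDX] reference, else None."""
    if chain_id not in table:
        return f"unknown chain {chain_id!r}"
    kind, n_res = table[chain_id]
    if not 1 <= res_idx <= n_res:
        return f"{chain_id}:{res_idx} is outside 1..{n_res} for this {kind}"
    return None


table = chain_table([
    {"protein": {"id": ["A", "B"], "sequence": "MKTAYIAK"}},
    {"ligand": {"id": "C", "ccd": "SAH"}},
])
print(check_residue_ref(table, "B", 8))  # None
print(check_residue_ref(table, "C", 2))  # C:2 is outside 1..1 for this ligand
```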

1) bond

Covalent bond between two atoms.
- bond:
    atom1: [CHAIN_ID, RES_IDX, ATOM_NAME]
    atom2: [CHAIN_ID, RES_IDX, ATOM_NAME]
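A hypothetical builder for this constraint (not part of the Boltz Lab SDK) that checks each atom reference has the [CHAIN_ID, RES_IDX, ATOM_NAME] shape. The residue number and the ligand atom name C1 in the example are made up for illustration; real atom names should be looked up in the component's CIF.

```python
# Hypothetical helper, not part of the Boltz Lab CLI or SDK.
def bond(atom1, atom2):
    """Build a bond constraint; each atom is (CHAIN_ID, RES_IDX, ATOM_NAME)."""
    for atom in (atom1, atom2):
        if len(atom) != 3:
            raise ValueError(f"expected (CHAIN_ID, RES_IDX, ATOM_NAME), got {atom!r}")
        chain, res_idx, name = atom
        if not isinstance(chain, str) or not isinstance(name, str):
            raise ValueError(f"chain ID and atom name must be strings: {atom!r}")
        if isinstance(res_idx, bool) or not isinstance(res_idx, int) or res_idx < 1:
            raise ValueError(f"RES_IDX must be a 1-indexed integer: {atom!r}")
    if tuple(atom1) == tuple(atom2):
        raise ValueError("a bond needs two distinct atoms")
    return {"bond": {"atom1": list(atom1), "atom2": list(atom2)}}


# Hypothetical example: cysteine SG of residue 12 on chain A bonded to ligand atom C1 on chain C.
print(bond(("A", 12, "SG"), ("C", 1, "C1")))
```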

2) pocket

Defines a binding pocket (binding-site residues/atoms) for a binder chain.
- pocket:
    binder: CHAIN_ID
    contacts: [[CHAIN_ID, RES_IDX/ATOM_NAME], [CHAIN_ID, RES_IDX/ATOM_NAME]]
    max_distance: DIST_ANGSTROM
    force: false
Notes:
  • contacts entries are:
    • residues (with RES_IDX, 1-indexed), or
    • ligand atoms (with ATOM_NAME)
  • max_distance is the maximum distance (Å) allowed between any atom of the binder and any atom of each contact element.
    • Supported range: 4–20 Å, default 6 Å
  • If force: true, a potential enforces the pocket constraint.
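To make the max_distance definition concrete, the sketch below checks whether a set of coordinates would satisfy a pocket constraint under our reading of the rule: every contact element must have at least one atom within max_distance of some binder atom. It only illustrates the definition and is not how Boltz Lab evaluates or enforces the constraint; coordinates are assumed to be plain (x, y, z) tuples in Å.

```python
import math


def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance (Å) between any atom of A and any atom of B."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def pocket_satisfied(binder_atoms, contact_elements, max_distance=6.0):
    """True if every contact element has an atom within max_distance of some binder atom."""
    return all(min_distance(binder_atoms, element) <= max_distance for element in contact_elements)


binder = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
contacts = [
    [(4.0, 3.0, 0.0)],                    # atoms of one contact residue
    [(12.0, 0.0, 0.0), (1.5, 5.5, 0.0)],  # atoms of another contact residue
]
print(pocket_satisfied(binder, contacts))                    # True (closest pair: 5.5 Å)
print(pocket_satisfied(binder, contacts, max_distance=5.0))  # False
```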

3) contact

Specifies a contact between two residues or atoms.
- contact:
    token1: [CHAIN_ID, RES_IDX/ATOM_NAME]
    token2: [CHAIN_ID, RES_IDX/ATOM_NAME]
    max_distance: DIST_ANGSTROM
    force: false
Notes:
  • max_distance supported range: 4–20 Å, default 6 Å
  • If force: true, a potential enforces the contact constraint.
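A hypothetical builder for this constraint (not part of the Boltz Lab SDK) that applies the documented default of 6 Å and rejects values outside the supported 4 to 20 Å range. Each token is given as (CHAIN_ID, RES_IDX) for a residue or (CHAIN_ID, ATOM_NAME) for a ligand atom; the residue number and atom name N1 in the example are made up.

```python
# Hypothetical helper, not part of the Boltz Lab CLI or SDK.
def contact(token1, token2, max_distance=6.0, force=False):
    """Build a contact constraint; tokens are (CHAIN_ID, RES_IDX) or (CHAIN_ID, ATOM_NAME)."""
    if not 4.0 <= max_distance <= 20.0:
        raise ValueError("max_distance must be between 4 and 20 Å")
    for chain, ref in (token1, token2):
        if not isinstance(chain, str):
            raise ValueError(f"chain ID must be a string: {chain!r}")
        # a token refers either to a 1-indexed residue or to a named atom
        if isinstance(ref, bool) or not (isinstance(ref, str) or (isinstance(ref, int) and ref >= 1)):
            raise ValueError(f"expected a 1-indexed residue or an atom name: {ref!r}")
    return {"contact": {"token1": list(token1), "token2": list(token2),
                        "max_distance": max_distance, "force": force}}


print(contact(("A", 45), ("C", "N1")))
```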

properties (optional)

affinity

Enables affinity prediction for a specified ligand chain.
- affinity:
    binder: CHAIN_ID
Rules/limits:
  • Only one small molecule can be specified as the binder.
  • The binder must be a ligand chain (not protein/DNA/RNA).
  • Size limit: at most 128 atoms (heavy atoms plus any hydrogens retained by RDKit's RemoveHs).
  • Recommended: avoid ligands significantly larger than 56 atoms (the training-time limit).
  • Predictions are reliable only for small-molecule → protein affinity; other targets may run, but the results can be unreliable.
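The structural rules above can be checked on the parsed input before submission. The sketch below is a hypothetical helper, not part of the Boltz Lab SDK. It covers only the "one binder" and "must be a ligand chain" rules; the atom-count limits are left out because counting atoms the way the rule describes needs a cheminformatics toolkit such as RDKit.

```python
# Hypothetical helper, not part of the Boltz Lab CLI or SDK.
def check_affinity(doc: dict) -> list:
    """Check the affinity property against the binder rules (not the atom-count limits)."""
    problems = []
    affinities = [p["affinity"] for p in doc.get("properties", []) if "affinity" in p]
    if len(affinities) > 1:
        problems.append("only one affinity binder can be specified")
    # Map every chain ID, including each copy in an id list, to its entity type.
    chain_types = {}
    for entry in doc["sequences"]:
        kind, body = next(iter(entry.items()))
        ids = body["id"] if isinstance(body["id"], list) else [body["id"]]
        for cid in ids:
            chain_types[cid] = kind
    for aff in affinities:
        binder = aff.get("binder")
        if chain_types.get(binder) != "ligand":
            problems.append(f"affinity binder {binder!r} must be a ligand chain")
    return problems


doc = {
    "sequences": [
        {"protein": {"id": "A", "sequence": "MKTAYIAK"}},
        {"ligand": {"id": ["C", "D"], "ccd": "SAH"}},
    ],
    "properties": [{"affinity": {"binder": "C"}}],
}
print(check_affinity(doc))  # []
```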

Full schema (compact)

sequences:
  - ENTITY_TYPE:
      id: CHAIN_ID
      sequence: SEQUENCE            # protein/dna/rna only
      smiles: "SMILES"              # ligand only (exclusive with ccd)
      ccd: CCD                      # ligand only (exclusive with smiles)
      modifications:                # optional (polymer only)
        - position: RES_IDX
          ccd: CCD
      cyclic: false                 # polymer only
  - ENTITY_TYPE:
      id: [CHAIN_ID, CHAIN_ID]      # identical copies
      ...

constraints:
  - bond:
      atom1: [CHAIN_ID, RES_IDX, ATOM_NAME]
      atom2: [CHAIN_ID, RES_IDX, ATOM_NAME]
  - pocket:
      binder: CHAIN_ID
      contacts: [[CHAIN_ID, RES_IDX/ATOM_NAME], [CHAIN_ID, RES_IDX/ATOM_NAME]]
      max_distance: DIST_ANGSTROM
      force: false
  - contact:
      token1: [CHAIN_ID, RES_IDX/ATOM_NAME]
      token2: [CHAIN_ID, RES_IDX/ATOM_NAME]
      max_distance: DIST_ANGSTROM
      force: false

properties:
  - affinity:
      binder: CHAIN_ID

Example

version: 1
sequences:
  - protein:
      id: [A, B]
      sequence: MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEAPADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSLVGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTTLSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRLGVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVDQIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRILLARRATEPSAVPEGQASENLYFQ
  - ligand:
      id: [C, D]
      ccd: SAH
  - ligand:
      id: [E, F]
      smiles: "N[C@@H](Cc1ccc(O)cc1)C(=O)O"
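Because each id list declares one chain per entry, this example describes six chains: two copies of the protein, two copies of SAH, and two copies of the SMILES ligand. The hypothetical helper below (not part of the Boltz Lab SDK) expands the parsed example to make that explicit; the protein sequence is shortened here for brevity.

```python
# Hypothetical helper, not part of the Boltz Lab CLI or SDK.
example = {
    "version": 1,
    "sequences": [
        {"protein": {"id": ["A", "B"], "sequence": "MVTPEGNVSL"}},  # shortened for brevity
        {"ligand": {"id": ["C", "D"], "ccd": "SAH"}},
        {"ligand": {"id": ["E", "F"], "smiles": "N[C@@H](Cc1ccc(O)cc1)C(=O)O"}},
    ],
}


def expand_chains(doc):
    """List (chain_id, entity_type) for every copy, expanding id lists."""
    chains = []
    for entry in doc["sequences"]:
        kind, body = next(iter(entry.items()))
        ids = body["id"] if isinstance(body["id"], list) else [body["id"]]
        chains.extend((cid, kind) for cid in ids)
    return chains


print(expand_chains(example))
# [('A', 'protein'), ('B', 'protein'), ('C', 'ligand'), ('D', 'ligand'), ('E', 'ligand'), ('F', 'ligand')]
```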