Overview

Sandbox prediction jobs are submitted to Boltz Lab via the CLI or the Python SDK. The input is a .yaml file that describes the complex (chains/molecules) plus optional constraints and properties. The format is largely aligned with that of our open-source Boltz-2 model.
Custom MSAs and templates are not currently supported in Boltz Lab Sandbox jobs.

Top-level schema

version: 1            # optional (recommended if supported by your runner)
sequences:            # required
  - <entity>          # one entry per *unique* entity (polymer or ligand)
constraints:          # optional
  - <constraint>
properties:           # optional (e.g., affinity)
  - <property>

sequences

What it represents

One entry per unique chain/molecule in the complex.
  • Polymers: protein, dna, rna → require sequence
  • Ligands (non-polymers): ligand → require either smiles or ccd (mutually exclusive)
  • id is the chain/molecule identifier. If there are multiple identical entities, use a list (e.g., [A, B]).

Entity block schemas

Protein / DNA / RNA

- protein:                 # or dna, rna
    id: CHAIN_ID           # or [CHAIN_ID, CHAIN_ID] for identical copies
    sequence: SEQUENCE
    modifications:         # optional (polymer only)
      - position: RES_IDX  # 1-indexed
        ccd: CCD           # CCD code for modified residue
    cyclic: false          # optional (polymer only)

Ligand

- ligand:
    id: CHAIN_ID           # or [CHAIN_ID, CHAIN_ID] for identical copies
    smiles: "SMILES"       # ligand only; mutually exclusive with ccd
    # ccd: CCD             # ligand only; mutually exclusive with smiles

Modifications

modifications is optional and supported for polymers (protein, dna, rna):
  • position: residue index starting at 1
  • ccd: CCD code of the modified residue (currently supported only for CCD ligands)
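
As an illustration, the sketch below marks one residue of a protein chain as modified. The sequence is a made-up 20-residue placeholder chosen so that position 16 is a serine, and SEP (the CCD code for phosphoserine) is only one possible modified residue. Neither value comes from the schema above.

```yaml
version: 1
sequences:
  - protein:
      id: A
      sequence: ACDEFGHIKLMNPQRSTVWY   # hypothetical placeholder sequence
      modifications:
        - position: 16                 # 1-indexed; residue 16 is S in this sequence
          ccd: SEP                     # phosphoserine, chosen here as an example
```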

Cyclic polymers

cyclic: true|false indicates whether a polymer chain is cyclic (not applicable to ligands).
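
A minimal sketch of a cyclic peptide entry follows. The 10-residue sequence is a hypothetical placeholder used only to show where the flag goes.

```yaml
version: 1
sequences:
  - protein:
      id: A
      sequence: ACDEFGHIKL   # hypothetical placeholder sequence
      cyclic: true           # mark chain A as a cyclic polymer
```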

constraints (optional)

Constraints add structural hints to the input.

Common indexing conventions

  • CHAIN_ID: the id you defined in sequences
  • RES_IDX: residue index starting from 1
  • For ligands, RES_IDX is 1
  • ATOM_NAME: standardized atom name (verify in the component’s CIF from RCSB)

1) bond

Covalent bond between two atoms.
- bond:
    atom1: [CHAIN_ID, RES_IDX, ATOM_NAME]
    atom2: [CHAIN_ID, RES_IDX, ATOM_NAME]
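
The following sketch shows the syntax with concrete values, assuming two identical copies of a hypothetical sequence whose second residue is a cysteine. SG is the standard name of the cysteine sulfur atom. Which atom pairs the service accepts is not stated here, so treat this as a syntax illustration rather than a confirmed use case.

```yaml
version: 1
sequences:
  - protein:
      id: [A, B]                       # two identical copies
      sequence: ACDEFGHIKLMNPQRSTVWY   # hypothetical; residue 2 is C
constraints:
  - bond:
      atom1: [A, 2, SG]                # [CHAIN_ID, RES_IDX, ATOM_NAME]
      atom2: [B, 2, SG]
```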

2) pocket

Defines a binding pocket (binding-site residues/atoms) for a binder chain.
- pocket:
    binder: CHAIN_ID
    contacts: [[CHAIN_ID, RES_IDX/ATOM_NAME], [CHAIN_ID, RES_IDX/ATOM_NAME]]
    max_distance: DIST_ANGSTROM
    force: false
Notes:
  • contacts entries are:
    • residues (with RES_IDX, 1-indexed), or
    • ligand atoms (with ATOM_NAME)
  • max_distance is the maximum distance (Å) between any atom in binder and any atom in each contact element.
    • Supported range: 4–20 Å, default 6 Å
  • If force: true, a potential enforces the pocket constraint.
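
The notes above could translate into a file like the one below. It is a sketch under assumptions: the protein sequence and the residue indices 5, 8, and 19 are placeholders, not a real binding site. The ligand uses the CCD code SAH, and max_distance is set to 8 Å, inside the supported 4–20 Å range.

```yaml
version: 1
sequences:
  - protein:
      id: A
      sequence: ACDEFGHIKLMNPQRSTVWY    # hypothetical placeholder sequence
  - ligand:
      id: B
      ccd: SAH
constraints:
  - pocket:
      binder: B                          # the ligand chain that binds the pocket
      contacts: [[A, 5], [A, 8], [A, 19]]  # residues on chain A, 1-indexed
      max_distance: 8                    # Å; supported range is 4–20
      force: true                        # enforce the constraint with a potential
```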

3) contact

Specifies that two residues/atoms are in contact.
- contact:
    token1: [CHAIN_ID, RES_IDX/ATOM_NAME]
    token2: [CHAIN_ID, RES_IDX/ATOM_NAME]
    max_distance: DIST_ANGSTROM
    force: false
Notes:
  • max_distance supported range: 4–20 Å, default 6 Å
  • If force: true, a potential enforces the contact constraint.
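
As a sketch, the fragment below pairs a protein residue with a ligand atom, showing both token forms. The sequence and residue index are placeholders, and the atom name SD for the SAH sulfur is an assumption that should be verified in the component's CIF. max_distance is omitted to show reliance on the 6 Å default.

```yaml
version: 1
sequences:
  - protein:
      id: A
      sequence: ACDEFGHIKLMNPQRSTVWY   # hypothetical placeholder sequence
  - ligand:
      id: B
      ccd: SAH
constraints:
  - contact:
      token1: [A, 16]                  # residue token: [CHAIN_ID, RES_IDX]
      token2: [B, SD]                  # ligand atom token: [CHAIN_ID, ATOM_NAME]; name assumed
      force: false                     # no enforcing potential
```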

properties (optional)

affinity

Enables affinity computation against a specified ligand chain.
- affinity:
    binder: CHAIN_ID
Rules/limits:
  • Only one small molecule can be specified as the binder.
  • The binder must be a ligand chain (not protein/DNA/RNA).
  • Size limit: at most 128 atoms (heavy atoms + hydrogens retained by RDKit RemoveHs).
  • Recommended: avoid ligands significantly larger than 56 atoms (training-time limit).
  • Reliable only for small-molecule → protein affinity; other targets may run but give unreliable results.
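
A minimal affinity request might look like the sketch below: one protein chain, one small-molecule ligand, and an affinity property naming the ligand as binder. The protein sequence is a hypothetical placeholder. The SMILES string is the one used in the example at the end of this page.

```yaml
version: 1
sequences:
  - protein:
      id: A
      sequence: ACDEFGHIKLMNPQRSTVWY         # hypothetical placeholder sequence
  - ligand:
      id: B
      smiles: "N[C@@H](Cc1ccc(O)cc1)C(=O)O"  # well under the 128-atom limit
properties:
  - affinity:
      binder: B                              # must be a ligand chain
```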

Full schema (compact)

sequences:
  - ENTITY_TYPE:
      id: CHAIN_ID
      sequence: SEQUENCE            # protein/dna/rna only
      smiles: "SMILES"              # ligand only (exclusive with ccd)
      ccd: CCD                      # ligand only (exclusive with smiles)
      modifications:                # optional (polymer only)
        - position: RES_IDX
          ccd: CCD
      cyclic: false                 # polymer only
  - ENTITY_TYPE:
      id: [CHAIN_ID, CHAIN_ID]      # identical copies
      ...

constraints:
  - bond:
      atom1: [CHAIN_ID, RES_IDX, ATOM_NAME]
      atom2: [CHAIN_ID, RES_IDX, ATOM_NAME]
  - pocket:
      binder: CHAIN_ID
      contacts: [[CHAIN_ID, RES_IDX/ATOM_NAME], [CHAIN_ID, RES_IDX/ATOM_NAME]]
      max_distance: DIST_ANGSTROM
      force: false
  - contact:
      token1: [CHAIN_ID, RES_IDX/ATOM_NAME]
      token2: [CHAIN_ID, RES_IDX/ATOM_NAME]
      max_distance: DIST_ANGSTROM
      force: false

properties:
  - affinity:
      binder: CHAIN_ID

Example

version: 1
sequences:
  - protein:
      id: [A, B]
      sequence: MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEAPADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSLVGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTTLSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRLGVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVDQIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRILLARRATEPSAVPEGQASENLYFQ
  - ligand:
      id: [C, D]
      ccd: SAH
  - ligand:
      id: [E, F]
      smiles: "N[C@@H](Cc1ccc(O)cc1)C(=O)O"