Sandbox YAML Format

Overview

Sandbox prediction jobs can be submitted to Boltz Lab via the CLI or the Python SDK. The input is a .yaml file that describes the complex (chains/molecules) and, optionally, constraints and properties. The format is largely aligned with that of our open-source Boltz-2 model.

Custom MSAs and templates are not currently supported in Boltz Lab Sandbox jobs.

Top-level schema

version: 1            # optional (recommended if supported by your runner)
sequences:            # required
  - <entity>          # one entry per *unique* entity (polymer or ligand)
constraints:          # optional
  - <constraint>
properties:           # optional (e.g., affinity)
  - <property>

`sequences`

What it represents

One entry per unique chain/molecule in the complex.

Polymers: protein, dna, rna → require sequence
Ligands (non-polymers): ligand → require either smiles or ccd (mutually exclusive)
id is the chain/molecule identifier. If there are multiple identical entities, use a list (e.g., [A, B]).

Entity block schemas

Protein / DNA / RNA

- protein:                 # or dna, rna
    id: CHAIN_ID           # or [CHAIN_ID, CHAIN_ID] for identical copies
    sequence: SEQUENCE
    modifications:         # optional (polymer only)
      - position: RES_IDX  # 1-indexed
        ccd: CCD           # CCD code for modified residue
    cyclic: false          # optional (polymer only)

Ligand

- ligand:
    id: CHAIN_ID           # or [CHAIN_ID, CHAIN_ID] for identical copies
    smiles: "SMILES"       # ligand only; mutually exclusive with ccd
    # ccd: CCD             # ligand only; mutually exclusive with smiles
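
As a concrete illustration of the two ligand forms, the sketch below defines a protein together with one ligand given by CCD code and one given by SMILES. The protein sequence is a made-up placeholder; the CCD code and the SMILES string are the ones used in the Example section.

```yaml
sequences:
  - protein:
      id: A
      sequence: MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ   # placeholder sequence, illustration only
  - ligand:
      id: B
      ccd: SAH                                   # CCD form
  - ligand:
      id: C
      smiles: "N[C@@H](Cc1ccc(O)cc1)C(=O)O"      # SMILES form; do not also set ccd on this entry
```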

Modifications

modifications is optional and supported for polymers (protein, dna, rna):

position: residue index starting at 1
ccd: CCD code of the modified residue (currently supported only for CCD ligands)
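
A minimal sketch of a modified residue, assuming a phosphoserine (CCD code SEP) at position 6 of a placeholder peptide. The sequence is invented for illustration; only the field layout comes from the schema above.

```yaml
sequences:
  - protein:
      id: A
      sequence: GSARKSTGGKAPRK    # placeholder sequence; residue 6 is a serine
      modifications:
        - position: 6           # 1-indexed, so this targets that serine
          ccd: SEP              # phosphoserine
```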

Cyclic polymers

cyclic: true|false indicates whether a polymer chain is cyclic (not applicable to ligands).
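
For instance, a short peptide could be marked as cyclic as follows. The sequence is an arbitrary placeholder, not a known cyclic peptide.

```yaml
sequences:
  - protein:
      id: A
      sequence: GFLPRKDWAV   # placeholder peptide
      cyclic: true           # omit or set to false for a linear chain
```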

`constraints` (optional)

Constraints add structural hints to the input.

Common indexing conventions

CHAIN_ID: the id you defined in sequences
RES_IDX: residue index starting from 1
For ligands, RES_IDX is 1
ATOM_NAME: standardized atom name (verify in the component’s CIF from RCSB)

1) `bond`

Covalent bond between two atoms.

- bond:
    atom1: [CHAIN_ID, RES_IDX, ATOM_NAME]
    atom2: [CHAIN_ID, RES_IDX, ATOM_NAME]
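
The sketch below shows the syntax with concrete values: a bond between the side-chain sulfur atoms (SG) of two cysteines in a placeholder peptide. It illustrates the field layout only; it is an assumption that the Sandbox accepts a bond between two protein residues, so check the atom pair against your own use case.

```yaml
sequences:
  - protein:
      id: A
      sequence: ACDEFGHKCL    # placeholder; cysteines at positions 2 and 9
constraints:
  - bond:
      atom1: [A, 2, SG]      # chain A, residue 2, cysteine sulfur
      atom2: [A, 9, SG]      # chain A, residue 9, cysteine sulfur
```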

2) `pocket`

Defines a binding pocket (binding-site residues/atoms) for a binder chain.

- pocket:
    binder: CHAIN_ID
    contacts: [[CHAIN_ID, RES_IDX/ATOM_NAME], [CHAIN_ID, RES_IDX/ATOM_NAME]]
    max_distance: DIST_ANGSTROM
    force: false

Notes:

- contacts entries are either residues (RES_IDX, 1-indexed) or ligand atoms (ATOM_NAME).
- max_distance is the maximum distance (Å) between any atom in binder and any atom in each contact element. Supported range: 4–20 Å; default: 6 Å.
- If force: true, a potential enforces the pocket constraint.
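
A hedged example of a pocket constraint with concrete values: ligand B is the binder, and two residues of protein A are listed as pocket contacts. The sequence and the residue indices are arbitrary placeholders chosen only to show the shape of the entry.

```yaml
sequences:
  - protein:
      id: A
      sequence: MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ   # placeholder sequence
  - ligand:
      id: B
      ccd: SAH
constraints:
  - pocket:
      binder: B
      contacts: [[A, 12], [A, 27]]   # residue indices (1-indexed), chosen arbitrarily
      max_distance: 6                # Å; must lie within the supported 4–20 Å range
      force: false                   # set to true to enforce the pocket with a potential
```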

3) `contact`

Specifies that two residues/atoms should be in contact.

- contact:
    token1: [CHAIN_ID, RES_IDX/ATOM_NAME]
    token2: [CHAIN_ID, RES_IDX/ATOM_NAME]
    max_distance: DIST_ANGSTROM
    force: false

Notes:

max_distance supported range: 4–20 Å, default 6 Å
If force: true, a potential enforces the contact constraint.
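
As an illustration, the following sketch requests a contact between residue 10 of chain A and residue 5 of chain B, enforced by a potential. Both sequences and both residue indices are placeholders.

```yaml
sequences:
  - protein:
      id: A
      sequence: MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ   # placeholder sequence
  - protein:
      id: B
      sequence: GSARKSTGGKAPRKQLATKAARK             # placeholder sequence
constraints:
  - contact:
      token1: [A, 10]     # chain A, residue 10
      token2: [B, 5]      # chain B, residue 5
      max_distance: 8     # Å; must lie within the supported 4–20 Å range
      force: true         # enforce the contact with a potential
```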

`properties` (optional)

`affinity`

Enables affinity computation for a specified ligand chain.

- affinity:
    binder: CHAIN_ID

Rules/limits:

Only one small molecule can be specified as the binder.
The binder must be a ligand chain (not protein/DNA/RNA).
Size limit: at most 128 atoms (heavy atoms + hydrogens retained by RDKit RemoveHs).
Recommended: avoid ligands significantly larger than 56 atoms (training-time limit).
Predictions are reliable only for small molecules binding to protein targets; other target types may run but give unreliable results.
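
Putting these rules together, a minimal affinity request might look like the sketch below: one protein target and one small ligand, with the ligand chain named as the binder. The protein sequence is a placeholder, and the SMILES string is the one used in the Example section.

```yaml
sequences:
  - protein:
      id: A
      sequence: MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ   # placeholder sequence
  - ligand:
      id: B
      smiles: "N[C@@H](Cc1ccc(O)cc1)C(=O)O"
properties:
  - affinity:
      binder: B    # must be the id of a ligand chain, not a polymer
```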

Full schema (compact)

sequences:
  - ENTITY_TYPE:
      id: CHAIN_ID
      sequence: SEQUENCE            # protein/dna/rna only
      smiles: "SMILES"              # ligand only (exclusive with ccd)
      ccd: CCD                      # ligand only (exclusive with smiles)
      modifications:                # optional (polymer only)
        - position: RES_IDX
          ccd: CCD
      cyclic: false                 # polymer only
  - ENTITY_TYPE:
      id: [CHAIN_ID, CHAIN_ID]      # identical copies
      ...

constraints:
  - bond:
      atom1: [CHAIN_ID, RES_IDX, ATOM_NAME]
      atom2: [CHAIN_ID, RES_IDX, ATOM_NAME]
  - pocket:
      binder: CHAIN_ID
      contacts: [[CHAIN_ID, RES_IDX/ATOM_NAME], [CHAIN_ID, RES_IDX/ATOM_NAME]]
      max_distance: DIST_ANGSTROM
      force: false
  - contact:
      token1: [CHAIN_ID, RES_IDX/ATOM_NAME]
      token2: [CHAIN_ID, RES_IDX/ATOM_NAME]
      max_distance: DIST_ANGSTROM
      force: false

properties:
  - affinity:
      binder: CHAIN_ID

Example

version: 1
sequences:
  - protein:
      id: [A, B]
      sequence: MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEAPADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSLVGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTTLSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRLGVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVDQIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRILLARRATEPSAVPEGQASENLYFQ
  - ligand:
      id: [C, D]
      ccd: SAH
  - ligand:
      id: [E, F]
      smiles: "N[C@@H](Cc1ccc(O)cc1)C(=O)O"
