usage: boltzgen run [-h]
                    [--protocol {protein-anything,peptide-anything,protein-small_molecule,nanobody-anything}]
                    [--output OUTPUT] [--config CONFIG [CONFIG ...]]
                    [--devices DEVICES] [--num_workers NUM_WORKERS]
                    [--config_dir CONFIG_DIR]
                    [--use_kernels {auto,true,false}] [--moldir MOLDIR]
                    [--num_designs NUM_DESIGNS]
                    [--diffusion_batch_size DIFFUSION_BATCH_SIZE]
                    [--design_checkpoints DESIGN_CHECKPOINTS [DESIGN_CHECKPOINTS ...]]
                    [--step_scale STEP_SCALE] [--noise_scale NOISE_SCALE]
                    [--skip_inverse_folding]
                    [--inverse_fold_num_sequences INVERSE_FOLD_NUM_SEQUENCES]
                    [--inverse_fold_checkpoint INVERSE_FOLD_CHECKPOINT]
                    [--inverse_fold_avoid INVERSE_FOLD_AVOID]
                    [--only_inverse_fold]
                    [--folding_checkpoint FOLDING_CHECKPOINT]
                    [--affinity_checkpoint AFFINITY_CHECKPOINT]
                    [--budget BUDGET] [--alpha ALPHA]
                    [--filter_biased {true,false}]
                    [--metrics_override METRICS_OVERRIDE [METRICS_OVERRIDE ...]]
                    [--additional_filters ADDITIONAL_FILTERS [ADDITIONAL_FILTERS ...]]
                    [--size_buckets SIZE_BUCKETS [SIZE_BUCKETS ...]]
                    [--refolding_rmsd_threshold REFOLDING_RMSD_THRESHOLD]
                    [--reuse] [--no_subprocess]
                    [--steps {design,inverse_folding,design_folding,folding,affinity,analysis,filtering} [{design,inverse_folding,design_folding,folding,affinity,analysis,filtering} ...]]
                    [--force_download] [--models_token MODELS_TOKEN]
                    [--cache CACHE]
                    design_spec [design_spec ...]

Boltzgen binder design pipeline

options:
  -h, --help            show this help message and exit

design specification:
  design_spec           Path(s) to design specification YAML file(s), or a
                        directory containing prepared configs

general configuration:
  --protocol {protein-anything,peptide-anything,protein-small_molecule,nanobody-anything}
                        Protocol to use for the design. This determines default
                        settings and in some cases what steps are run.
                        Default: protein-anything
  --output OUTPUT       Output directory for pipeline results
  --config CONFIG [CONFIG ...]
                        Override pipeline step configuration, given as a step
                        name followed by key=value pairs (example: '--config
                        folding num_workers=4 trainer.devices=4'). Can be used
                        multiple times.
  --devices DEVICES     Number of devices to use. Default is all devices
                        available.
  --num_workers NUM_WORKERS
                        Number of DataLoader worker processes.
  --config_dir CONFIG_DIR
                        Path to the directory of default config files. Default:
                        /net/galaxy/home/koes/fea54/.conda/envs/boltzgen/lib/python3.12/site-packages/boltzgen/resources/config
  --use_kernels {auto,true,false}
                        Whether to use kernels. One of 'auto', 'true', or
                        'false'. Default: auto. If 'auto', will use kernels if
                        the device capability is >= 8.
  --moldir MOLDIR       Path to the moldir. Default:
                        huggingface:boltzgen/inference-data:mols.zip

design:
  --num_designs NUM_DESIGNS
                        Total number of designs to generate. This would
                        commonly be something like 10,000. After generating
                        10,000 designs, the filtering step narrows them down to
                        --budget designs.
  --diffusion_batch_size DIFFUSION_BATCH_SIZE
                        Number of diffusion samples to generate per trunk run.
                        If not specified, this defaults to 1 if --num_designs
                        is less than 100, and 10 otherwise. Note that for
                        design tasks that randomly sample the binder length (or
                        use randomness in other ways), all designs generated in
                        the same batch will share the same length. A diffusion
                        batch size that is large compared to the total number
                        of designs will therefore not sample the possible
                        lengths evenly.
  --design_checkpoints DESIGN_CHECKPOINTS [DESIGN_CHECKPOINTS ...]
                        Path to the boltzgen checkpoint(s). One or more
                        checkpoints are supported; specifying a single path
                        also works. Each will be used for an equal fraction of
                        the designs. By default, two checkpoints are used.
                        Default:
                        ['huggingface:boltzgen/boltzgen-1:boltzgen1_diverse.ckpt',
                        'huggingface:boltzgen/boltzgen-1:boltzgen1_adherence.ckpt']
  --step_scale STEP_SCALE
                        Fixed step scale to use (e.g. 1.8).
                        Default is to use a schedule.
  --noise_scale NOISE_SCALE
                        Fixed noise scale to use (e.g. 0.98). Default is to use
                        a schedule.

inverse folding:
  --skip_inverse_folding
                        Skip inverse folding step
  --inverse_fold_num_sequences INVERSE_FOLD_NUM_SEQUENCES
                        Number of sequences per backbone to generate in the
                        inverse fold step. Default: 1
  --inverse_fold_checkpoint INVERSE_FOLD_CHECKPOINT
                        Path or huggingface repo and filename for the inverse
                        fold checkpoint. Default:
                        huggingface:boltzgen/boltzgen-1:boltzgen1_ifold.ckpt
  --inverse_fold_avoid INVERSE_FOLD_AVOID
                        Disallowed residues as a string of one-letter amino
                        acid codes, e.g. 'KEC'. This is implemented at the
                        inverse fold step, so it only affects results if
                        inverse folding is enabled. Default: none for protein
                        design, 'C' for peptide and nanobody design. Pass an
                        empty list if you want cysteines to be generated when
                        using a nanobody or peptide protocol.
  --only_inverse_fold   Skip design step and only run inverse folding. Requires
                        a fully specified structure.

folding and affinity prediction:
  --folding_checkpoint FOLDING_CHECKPOINT
                        Path to the folding checkpoint. Default:
                        huggingface:boltzgen/boltzgen-1:boltz2_conf_final.ckpt
  --affinity_checkpoint AFFINITY_CHECKPOINT
                        Path to the affinity predictor checkpoint. Default:
                        huggingface:boltzgen/boltzgen-1:boltz2_aff.ckpt

filtering:
  --budget BUDGET       How many designs should be in the final
                        diversity-optimized set. This is used in the filtering
                        step.
  --alpha ALPHA         Trade-off for sequence diversity selection:
                        0.0=quality-only, 1.0=diversity-only. Default is 0.01
                        (peptide-anything protocol) or 0.001 (other protocols).
  --filter_biased {true,false}
                        Remove amino-acid composition outliers (default caps on
                        ALA/GLY/GLU/LEU/VAL). Default: true.
  --metrics_override METRICS_OVERRIDE [METRICS_OVERRIDE ...]
                        Per-metric inverse-importance weights for ranking.
                        Format: metric_name=weight (e.g.,
                        plip_hbonds_refolded=4 delta_sasa_refolded=2). A larger
                        value down-weights that metric's rank.
                        Use 'metric_name=none' to remove a metric.
  --additional_filters ADDITIONAL_FILTERS [ADDITIONAL_FILTERS ...]
                        Extra hard filters. Format: feature>threshold or
                        feature<threshold (e.g., 'design_GLY<0.2'). Use '>' if
                        higher is better, '<' if lower is better. Make sure to
                        single-quote the strings so your shell doesn't get
                        confused by < and > characters.
  --size_buckets SIZE_BUCKETS [SIZE_BUCKETS ...]
                        Optional constraint for maximum number of designs in
                        size ranges. Format: min-max:count (e.g., 10-20:5
                        20-30:10 30-40:5).
  --refolding_rmsd_threshold REFOLDING_RMSD_THRESHOLD
                        Threshold used for RMSD-based filters (lower is
                        better).

execution options:
  --reuse               Reuse existing results across all steps. Generate only
                        as many new designs as are needed to reach the
                        specified total number of designs.
  --no_subprocess       Run each step in the main process. Will cause issues
                        when devices >1.
  --steps {design,inverse_folding,design_folding,folding,affinity,analysis,filtering} [{design,inverse_folding,design_folding,folding,affinity,analysis,filtering} ...]
                        Run only the specified pipeline steps (default: run all
                        steps)

model and data download options:
  --force_download      Force a (re)-download of models and data.
  --models_token MODELS_TOKEN
                        Secret token to use for our models hosting service
                        (Hugging Face). Default: None
  --cache CACHE         Directory where downloaded models will be stored.
                        Default: ~/.cache

This script orchestrates work. It sets up an output directory with YAML files for the pipeline steps that need to be run, and launches processes that run the pipeline steps.

Mainly it:

1) **Writes to YAML files** when `configure_command(...)` is executed
   - For each `PipelineStep`, the resolved Hydra config is written to a per-step `.yaml` file under `OUTPUT/config/`.
   - A manifest `OUTPUT/steps.yaml` is also written, listing the enabled steps and their config files in execution order.
2) **Executes from YAML** when `execute_command(...)` is executed
   - Each step is launched **as a subprocess** (via `python main.py`) unless `--no_subprocess` is set (not the default).
   - If `--no_subprocess` is specified, the config is instantiated in-process and the `Task.run(...)` method is called directly.

The actual code that is executed in each pipeline step is found in `main.py`, which is a wrapper for running the `.run()` function of our `Task` class. If you run the pipeline (for example via `boltzgen run design_spec.yaml ...`), then this function reads the YAML files of the individual pipeline steps and executes the pipeline steps.

The possible tasks (and the code files you want to inspect to understand what they are running):

- Predict src/boltzgen/task/predict/predict.py (GPU: running BoltzGen diffusion, inverse folding, refolding, design folding, or affinity prediction)
- Analyze src/boltzgen/task/analyze/analyze.py (CPU: computes CPU metrics and aggregates metrics from GPU steps)
- Filter src/boltzgen/task/filter/filter.py (CPU: very fast (20s); computes the ranking and writes the final output files)
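
As an illustration of how the options above combine, the following is a minimal sketch of assembling a `boltzgen run` command line from Python. The helper `build_boltzgen_command` is hypothetical and not part of boltzgen, and the file name `design_spec.yaml`, output directory `results`, and budget of 50 are illustrative assumptions. Only flags listed in the help text are used. The positional `design_spec` is placed first so that a multi-value option such as `--additional_filters` cannot absorb it.

```python
import shlex
from typing import List, Optional, Sequence


def build_boltzgen_command(
    design_spec: str,
    output: str,
    protocol: str = "protein-anything",
    num_designs: Optional[int] = None,
    budget: Optional[int] = None,
    additional_filters: Sequence[str] = (),
    steps: Sequence[str] = (),
) -> List[str]:
    """Hypothetical helper: build an argv list for `boltzgen run`.

    Only flags documented in the help text are emitted. Nothing is
    executed here; the caller decides how to run the command.
    """
    # Positional argument first, then the always-present options.
    cmd = ["boltzgen", "run", design_spec, "--protocol", protocol, "--output", output]
    if num_designs is not None:
        cmd += ["--num_designs", str(num_designs)]
    if budget is not None:
        cmd += ["--budget", str(budget)]
    if additional_filters:
        # Each filter is its own argv element, e.g. "design_GLY<0.2".
        cmd += ["--additional_filters", *additional_filters]
    if steps:
        cmd += ["--steps", *steps]
    return cmd


# Illustrative values only (assumptions, not recommended settings).
cmd = build_boltzgen_command(
    "design_spec.yaml",
    "results",
    num_designs=10000,
    budget=50,
    additional_filters=["design_GLY<0.2"],
)
# shlex.join quotes arguments containing shell metacharacters such as '<',
# which matches the single-quoting advice for --additional_filters.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` avoids shell parsing entirely, so the `<` and `>` characters in filter strings need no quoting in that case; quoting matters only when the command is pasted into a shell.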