usage: boltzgen run [-h]
                    [--protocol {protein-anything,peptide-anything,protein-small_molecule,nanobody-anything}]
                    [--output OUTPUT] [--config CONFIG [CONFIG ...]]
                    [--devices DEVICES] [--num_workers NUM_WORKERS]
                    [--config_dir CONFIG_DIR]
                    [--use_kernels {auto,true,false}] [--moldir MOLDIR]
                    [--num_designs NUM_DESIGNS]
                    [--diffusion_batch_size DIFFUSION_BATCH_SIZE]
                    [--design_checkpoints DESIGN_CHECKPOINTS [DESIGN_CHECKPOINTS ...]]
                    [--step_scale STEP_SCALE] [--noise_scale NOISE_SCALE]
                    [--skip_inverse_folding]
                    [--inverse_fold_num_sequences INVERSE_FOLD_NUM_SEQUENCES]
                    [--inverse_fold_checkpoint INVERSE_FOLD_CHECKPOINT]
                    [--inverse_fold_avoid INVERSE_FOLD_AVOID]
                    [--only_inverse_fold]
                    [--folding_checkpoint FOLDING_CHECKPOINT]
                    [--affinity_checkpoint AFFINITY_CHECKPOINT]
                    [--budget BUDGET] [--alpha ALPHA]
                    [--filter_biased {true,false}]
                    [--metrics_override METRICS_OVERRIDE [METRICS_OVERRIDE ...]]
                    [--additional_filters ADDITIONAL_FILTERS [ADDITIONAL_FILTERS ...]]
                    [--size_buckets SIZE_BUCKETS [SIZE_BUCKETS ...]]
                    [--refolding_rmsd_threshold REFOLDING_RMSD_THRESHOLD]
                    [--reuse] [--no_subprocess]
                    [--steps {design,inverse_folding,design_folding,folding,affinity,analysis,filtering} [{design,inverse_folding,design_folding,folding,affinity,analysis,filtering} ...]]
                    [--force_download] [--models_token MODELS_TOKEN]
                    [--cache CACHE]
                    design_spec [design_spec ...]

Boltzgen binder design pipeline

options:
  -h, --help            show this help message and exit

design specification:
  design_spec           Path(s) to design specification YAML file(s), or a
                        directory containing prepared configs

general configuration:
  --protocol {protein-anything,peptide-anything,protein-small_molecule,nanobody-anything}
                        Protocol to use for the design. This determines
                        default settings and, in some cases, which steps are
                        run. Default: protein-anything
  --output OUTPUT       Output directory for pipeline results
  --config CONFIG [CONFIG ...]
                        Override pipeline step configuration, in the format
                        <step_name> <arg1>=<value1> <arg2>=<value2> ...
                        (example: '--config folding num_workers=4
                        trainer.devices=4'). Can be used multiple times.
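The following is a rough sketch of how one `--config` occurrence breaks down. The hypothetical helper `parse_config_override` splits the documented `<step_name> <arg1>=<value1> ...` format into a step name and a dictionary of dotted keys. It is not BoltzGen's parser. Values are kept as strings, because the help text does not describe how the real pipeline casts them.

```python
def parse_config_override(tokens: list[str]) -> tuple[str, dict[str, str]]:
    """Hypothetical: split one --config occurrence into (step_name, {key: value}).

    Follows the documented format '<step_name> <arg1>=<value1> <arg2>=<value2> ...'.
    Values stay strings; type casting is out of scope for this sketch.
    """
    if not tokens:
        raise ValueError("expected a step name")
    step, *assignments = tokens
    overrides: dict[str, str] = {}
    for item in assignments:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected <arg>=<value>, got {item!r}")
        overrides[key] = value  # dotted keys such as 'trainer.devices' are kept verbatim
    return step, overrides


step, overrides = parse_config_override(["folding", "num_workers=4", "trainer.devices=4"])
print(step, overrides)  # folding {'num_workers': '4', 'trainer.devices': '4'}
```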
  --devices DEVICES     Number of devices to use. Default: all available
                        devices.
  --num_workers NUM_WORKERS
                        Number of DataLoader worker processes.
  --config_dir CONFIG_DIR
                        Path to the directory of default config files.
                        Default: /net/galaxy/home/koes/fea54/.conda/envs/
                        boltzgen/lib/python3.12/site-packages/boltzgen/
                        resources/config
  --use_kernels {auto,true,false}
                        Whether to use kernels. One of 'auto', 'true', or
                        'false'. Default: auto. If 'auto', will use kernels if
                        the device capability is >= 8.
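The `auto` rule can be written as a small pure function. This sketch assumes that "device capability" means the CUDA compute capability `(major, minor)` tuple, as returned by `torch.cuda.get_device_capability()`. It also assumes the `>= 8` check applies to the major version. `resolve_use_kernels` is a hypothetical name.

```python
from typing import Optional, Tuple


def resolve_use_kernels(setting: str, capability: Optional[Tuple[int, int]]) -> bool:
    """Hypothetical resolution of --use_kernels {auto,true,false}.

    capability: (major, minor) compute capability, or None if no GPU is present.
    Assumption: 'auto' compares only the major version against 8.
    """
    if setting == "true":
        return True
    if setting == "false":
        return False
    if setting != "auto":
        raise ValueError(f"expected auto, true or false, got {setting!r}")
    return capability is not None and capability[0] >= 8


print(resolve_use_kernels("auto", (8, 0)))  # True
print(resolve_use_kernels("auto", (7, 5)))  # False
```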
  --moldir MOLDIR       Path to the moldir. Default:
                        huggingface:boltzgen/inference-data:mols.zip

design:
  --num_designs NUM_DESIGNS
                        Total number of designs to generate. A typical value
                        is around 10,000; the filtering step then selects
                        --budget designs from them.
  --diffusion_batch_size DIFFUSION_BATCH_SIZE
                        Number of diffusion samples to generate per trunk run.
                        If not specified, this defaults to 1 if --num_designs
                        is less than 100, and 10 otherwise. Note that for
                        design tasks that randomly sample the binder length
                        (or use randomness in other ways), all designs
                        generated in the same batch will share the same
                        length. Having a large diffusion batch size compared
                        to the total number of designs to generate will
                        therefore not evenly sample the possible lengths.
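A toy simulation illustrates both the default batch size and the length caveat. `default_diffusion_batch_size` encodes the documented default. `sample_lengths` is a hypothetical model that draws one binder length per batch and reuses it for every design in that batch. It mimics the behavior the help text describes and is not BoltzGen's actual sampler.

```python
import math
import random


def default_diffusion_batch_size(num_designs: int) -> int:
    # Documented default: 1 if --num_designs < 100, otherwise 10.
    return 1 if num_designs < 100 else 10


def sample_lengths(num_designs: int, batch_size: int, min_len: int, max_len: int, seed: int = 0) -> list[int]:
    # Toy model: one length is drawn per batch and shared by all designs in it.
    rng = random.Random(seed)
    lengths: list[int] = []
    for _ in range(math.ceil(num_designs / batch_size)):
        lengths.extend([rng.randint(min_len, max_len)] * batch_size)
    return lengths[:num_designs]


few = sample_lengths(20, batch_size=10, min_len=40, max_len=120)
many = sample_lengths(20, batch_size=1, min_len=40, max_len=120)
print(len(set(few)), len(set(many)))  # at most 2 distinct lengths vs. up to 20
```

With 20 designs and a batch size of 10, only two lengths are ever drawn in this model. This is why a batch size that is large relative to `--num_designs` covers the length range unevenly.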
  --design_checkpoints DESIGN_CHECKPOINTS [DESIGN_CHECKPOINTS ...]
                        Path(s) to the boltzgen design checkpoint(s). One or
                        more checkpoints are supported; a single path also
                        works. Each checkpoint is used for an equal fraction
                        of the designs. By default, two checkpoints are used.
                        Default: ['huggingface:boltzgen/boltzgen-1:boltzgen1_diverse.ckpt',
                        'huggingface:boltzgen/boltzgen-1:boltzgen1_adherence.ckpt']
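An "equal fraction" split across checkpoints might look like the following sketch. The helper `split_designs` is hypothetical. Giving any remainder to the first checkpoints is an assumption, because the help text does not say how uneven counts are distributed.

```python
def split_designs(num_designs: int, checkpoints: list[str]) -> list[tuple[str, int]]:
    """Hypothetical: divide num_designs as evenly as possible across checkpoints.

    Assumption: when the division is uneven, the first checkpoints get one extra design each.
    Returns a list (not a dict) so repeated checkpoint paths stay separate.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    base, remainder = divmod(num_designs, len(checkpoints))
    return [(ckpt, base + (1 if i < remainder else 0)) for i, ckpt in enumerate(checkpoints)]


default_checkpoints = [
    "huggingface:boltzgen/boltzgen-1:boltzgen1_diverse.ckpt",
    "huggingface:boltzgen/boltzgen-1:boltzgen1_adherence.ckpt",
]
print(split_designs(10_000, default_checkpoints))  # 5000 designs per checkpoint
```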
  --step_scale STEP_SCALE
                        Fixed step scale to use (e.g. 1.8). Default: use a
                        schedule.
  --noise_scale NOISE_SCALE
                        Fixed noise scale to use (e.g. 0.98). Default: use a
                        schedule.

inverse folding:
  --skip_inverse_folding
                        Skip the inverse folding step.
  --inverse_fold_num_sequences INVERSE_FOLD_NUM_SEQUENCES
                        Number of sequences per backbone to generate in the
                        inverse fold step. Default: 1
  --inverse_fold_checkpoint INVERSE_FOLD_CHECKPOINT
                        Path, or Hugging Face repo and filename, for the inverse
                        fold checkpoint. Default:
                        huggingface:boltzgen/boltzgen-1:boltzgen1_ifold.ckpt
  --inverse_fold_avoid INVERSE_FOLD_AVOID
                        Disallowed residues as a string of one-letter amino
                        acid codes, e.g. 'KEC'. This is applied at the
                        inverse fold step, so it only affects results if
                        inverse folding is enabled. Default: none for protein
                        design, 'C' for peptide and nanobody design. To allow
                        cysteines with a peptide or nanobody protocol, pass an
                        empty value (e.g. --inverse_fold_avoid '').
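One common way to forbid residues during sequence design is to mask their logits before choosing an amino acid. The sketch below shows that idea with a hypothetical `pick_residue` helper over the 20 standard one-letter codes. The source does not confirm that `--inverse_fold_avoid` works this way; this is an assumption. The sketch also shows why an empty string leaves every residue, including cysteine, allowed.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard one-letter codes


def mask_logits(logits: list[float], avoid: str) -> list[float]:
    # Set the logit of every disallowed residue to -inf so it can never be chosen.
    if len(logits) != len(AMINO_ACIDS):
        raise ValueError("expected one logit per standard amino acid")
    banned = set(avoid.upper())
    unknown = banned - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unknown residue codes: {sorted(unknown)}")
    return [-math.inf if aa in banned else x for aa, x in zip(AMINO_ACIDS, logits)]


def pick_residue(logits: list[float], avoid: str) -> str:
    # Greedy (argmax) choice after masking; a real inverse folding model may sample instead.
    masked = mask_logits(logits, avoid)
    best = max(range(len(masked)), key=masked.__getitem__)
    return AMINO_ACIDS[best]


logits = [0.0] * len(AMINO_ACIDS)
logits[AMINO_ACIDS.index("C")] = 5.0  # cysteine is the toy model's favorite here
logits[AMINO_ACIDS.index("W")] = 3.0
print(pick_residue(logits, ""))   # C
print(pick_residue(logits, "C"))  # W
```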
  --only_inverse_fold   Skip design step and only run inverse folding.
                        Requires a fully specified structure.

folding and affinity prediction:
  --folding_checkpoint FOLDING_CHECKPOINT
                        Path to the folding checkpoint. Default:
                        huggingface:boltzgen/boltzgen-1:boltz2_conf_final.ckpt
  --affinity_checkpoint AFFINITY_CHECKPOINT
                        Path to the affinity predictor checkpoint. Default:
                        huggingface:boltzgen/boltzgen-1:boltz2_aff.ckpt

filtering:
  --budget BUDGET       Number of designs in the final diversity-optimized
                        set. Used by the filtering step.
  --alpha ALPHA         Trade-off for sequence diversity selection:
                        0.0=quality-only, 1.0=diversity-only. Default is 0.01
                        (peptide-anything protocol) or 0.001 (other
                        protocols).
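The following hypothetical greedy selector makes the `--alpha` trade-off concrete. At each step it picks the candidate that maximizes `(1 - alpha) * quality + alpha * diversity`. Here, diversity is the normalized Hamming distance to the closest sequence already chosen. This is an illustrative stand-in built on those assumptions, not BoltzGen's selection algorithm or its quality score.

```python
def hamming_fraction(a: str, b: str) -> float:
    # Fraction of mismatched positions; a length difference counts as mismatches.
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return mismatches / n


def combined_score(candidate: tuple[str, float], chosen: list[tuple[str, float]], alpha: float) -> float:
    seq, quality = candidate
    # Nothing selected yet: treat every candidate as maximally novel.
    diversity = min((hamming_fraction(seq, s) for s, _ in chosen), default=1.0)
    return (1 - alpha) * quality + alpha * diversity


def select_diverse(candidates: list[tuple[str, float]], budget: int, alpha: float) -> list[str]:
    """Greedily pick up to `budget` sequences; quality is assumed to lie in [0, 1]."""
    remaining = list(candidates)
    chosen: list[tuple[str, float]] = []
    while remaining and len(chosen) < budget:
        best = max(remaining, key=lambda c: combined_score(c, chosen, alpha))
        chosen.append(best)
        remaining.remove(best)
    return [seq for seq, _ in chosen]


pool = [("AAAA", 1.0), ("AAAB", 0.95), ("WWWW", 0.5)]
print(select_diverse(pool, budget=2, alpha=0.0))  # ['AAAA', 'AAAB']
print(select_diverse(pool, budget=2, alpha=1.0))  # ['AAAA', 'WWWW']
```

With `alpha=0.0`, the two near-identical top scorers are kept. With `alpha=1.0`, the dissimilar `WWWW` replaces `AAAB`.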
  --filter_biased {true,false}
                        Remove amino-acid composition outliers (default caps
                        on ALA/GLY/GLU/LEU/VAL). Default: true.
  --metrics_override METRICS_OVERRIDE [METRICS_OVERRIDE ...]
                        Per-metric inverse-importance weights for ranking.
                        Format: metric_name=weight (e.g.,
                        plip_hbonds_refolded=4 delta_sasa_refolded=2). A
                        larger value down-weights that metric's rank. Use
                        'metric_name=none' to remove a metric.
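The `metric_name=weight` format, including the `none` sentinel, can be parsed and merged into a table of default weights as sketched below. Both helpers and the example default table are hypothetical. This help text does not describe how BoltzGen turns these weights into a combined ranking, so the sketch stops at producing the final weight table.

```python
from typing import Optional


def parse_metrics_override(items: list[str]) -> dict[str, Optional[float]]:
    # 'name=4' -> 4.0; 'name=none' -> None, meaning "drop this metric".
    weights: dict[str, Optional[float]] = {}
    for item in items:
        name, sep, raw = item.partition("=")
        if not sep or not name:
            raise ValueError(f"expected metric_name=weight, got {item!r}")
        weights[name] = None if raw.strip().lower() == "none" else float(raw)
    return weights


def apply_overrides(defaults: dict[str, float], overrides: dict[str, Optional[float]]) -> dict[str, float]:
    merged = dict(defaults)
    for name, weight in overrides.items():
        if weight is None:
            merged.pop(name, None)  # remove the metric from ranking
        else:
            merged[name] = weight  # larger weight = less influence on rank
    return merged


default_weights = {"plip_hbonds_refolded": 1.0, "delta_sasa_refolded": 1.0}  # made-up defaults
overrides = parse_metrics_override(["plip_hbonds_refolded=4", "delta_sasa_refolded=none"])
print(apply_overrides(default_weights, overrides))  # {'plip_hbonds_refolded': 4.0}
```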
  --additional_filters ADDITIONAL_FILTERS [ADDITIONAL_FILTERS ...]
                        Extra hard filters. Format: feature>threshold or
                        feature<threshold (e.g., 'design_ALA>0.3'
                        'design_GLY<0.2'). Use '>' if higher is better, '<' if
                        lower is better. Single-quote each filter so that your
                        shell does not interpret < and > as redirections.
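A hard filter such as `'design_ALA>0.3'` can be read as "keep the design only if this comparison holds". The sketch below parses that format and applies all filters to one design's metrics. `parse_filter` and `passes_filters` are hypothetical names, and the use of strict comparisons is an assumption.

```python
import operator
from typing import Callable

Filter = tuple[str, Callable[[float, float], bool], float]


def parse_filter(spec: str) -> Filter:
    # Accepts 'feature>threshold' or 'feature<threshold'.
    for symbol, op in ((">", operator.gt), ("<", operator.lt)):
        feature, sep, threshold = spec.partition(symbol)
        if sep:
            return feature.strip(), op, float(threshold)
    raise ValueError(f"expected feature>threshold or feature<threshold, got {spec!r}")


def passes_filters(metrics: dict[str, float], filters: list[Filter]) -> bool:
    # A design survives only if every comparison holds.
    return all(op(metrics[feature], threshold) for feature, op, threshold in filters)


filters = [parse_filter("design_ALA>0.3"), parse_filter("design_GLY<0.2")]
print(passes_filters({"design_ALA": 0.35, "design_GLY": 0.10}, filters))  # True
print(passes_filters({"design_ALA": 0.25, "design_GLY": 0.10}, filters))  # False
```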
  --size_buckets SIZE_BUCKETS [SIZE_BUCKETS ...]
                        Optional constraint for maximum number of designs in
                        size ranges. Format: min-max:count (e.g., 10-20:5
                        20-30:10 30-40:5).
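Size buckets behave like per-range caps that are applied while walking down the ranked list. In the sketch below, `parse_size_buckets` and `cap_by_size` are hypothetical. The sketch makes three assumptions: "size" is the designed binder's length; ranges are half-open (`min <= size < max`), so `10-20` and `20-30` do not overlap; and designs outside every bucket are left unaffected.

```python
def parse_size_buckets(items: list[str]) -> list[tuple[int, int, int]]:
    # 'min-max:count' -> (min, max, count)
    buckets = []
    for item in items:
        bounds, sep, count = item.partition(":")
        low, dash, high = bounds.partition("-")
        if not sep or not dash:
            raise ValueError(f"expected min-max:count, got {item!r}")
        buckets.append((int(low), int(high), int(count)))
    return buckets


def cap_by_size(ranked: list[tuple[str, int]], buckets: list[tuple[int, int, int]]) -> list[str]:
    """ranked: (design_id, size) pairs, best first. Returns the ids that are kept."""
    used = [0] * len(buckets)
    kept = []
    for design_id, size in ranked:
        for i, (low, high, cap) in enumerate(buckets):
            if low <= size < high:  # assumption: half-open range
                if used[i] < cap:
                    used[i] += 1
                    kept.append(design_id)
                break  # matched a bucket, kept or not
        else:
            kept.append(design_id)  # assumption: outside all buckets -> not capped
    return kept


buckets = parse_size_buckets(["10-20:1", "20-30:2"])
ranked = [("a", 12), ("b", 15), ("c", 22), ("d", 25), ("e", 28), ("f", 50)]
print(cap_by_size(ranked, buckets))  # ['a', 'c', 'd', 'f']
```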
  --refolding_rmsd_threshold REFOLDING_RMSD_THRESHOLD
                        Threshold used for RMSD-based filters (lower is
                        better).

execution options:
  --reuse               Reuse existing results across all steps. Generate only
                        as many new designs as are needed to reach the
                        specified total number of designs.
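The `--reuse` arithmetic is simple: only the shortfall is generated. The following hypothetical sketch assumes that the count of existing designs is already known and that new designs are produced in batches of `--diffusion_batch_size`.

```python
import math


def reuse_plan(requested: int, existing: int, batch_size: int) -> tuple[int, int]:
    # Returns (new designs to generate, diffusion batches needed for them).
    new = max(0, requested - existing)
    return new, math.ceil(new / batch_size)


print(reuse_plan(10_000, 7_350, 10))  # (2650, 265)
```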
  --no_subprocess       Run each step in the main process. Causes issues
                        when --devices is greater than 1.
  --steps {design,inverse_folding,design_folding,folding,affinity,analysis,filtering} [{design,inverse_folding,design_folding,folding,affinity,analysis,filtering} ...]
                        Run only the specified pipeline steps (default: run
                        all steps).

model and data download options:
  --force_download      Force a (re)-download of models and data.
  --models_token MODELS_TOKEN
                        Secret token to use for our models hosting service
                        (Hugging Face). Default: None
  --cache CACHE         Directory where downloaded models will be stored.
                        Default: ~/.cache

This script orchestrates the pipeline. It sets up an output directory containing YAML config files for the pipeline steps that need to be run, and launches the processes that run those steps.

Mainly it:
1) **Writes YAML files** when `configure_command(...)` is executed
   - For each `PipelineStep`, the resolved Hydra config is written to
     `OUTPUT/config/<step>.yaml`.
   - A manifest `OUTPUT/steps.yaml` is also written, listing the enabled steps
     and their config files in execution order.

2) **Executes from YAML** when `execute_command(...)` is executed
   - By default, each step is launched **as a subprocess**
     (`python main.py <config.yaml>`).
   - If `--no_subprocess` is specified, the config is instantiated in-process
     and the `Task.run(...)` method is called directly.
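The two execution modes can be sketched as a plan built from the per-step config paths described above. This is a hypothetical outline, not the real `execute_command`. The `<OUTPUT>/config/<step>.yaml` layout comes from the text. `plan_execution`, `run_plan`, and the `run_in_process` callback are assumptions; the callback stands in for instantiating the config and calling `Task.run(...)`.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, Union

Action = tuple[str, Union[list[str], str]]


def plan_execution(output_dir: str, steps: list[str], no_subprocess: bool = False) -> list[Action]:
    plan: list[Action] = []
    for step in steps:
        config = str(Path(output_dir) / "config" / f"{step}.yaml")
        if no_subprocess:
            plan.append(("in_process", config))
        else:
            # Mirrors `python main.py <config.yaml>`; the real entry point path may differ.
            plan.append(("subprocess", [sys.executable, "main.py", config]))
    return plan


def run_plan(plan: list[Action], run_in_process: Callable[[str], None]) -> None:
    for mode, payload in plan:
        if mode == "subprocess":
            subprocess.run(payload, check=True)  # in this sketch, a failing step stops the run
        else:
            run_in_process(payload)  # placeholder for instantiating the config and calling Task.run(...)


for action in plan_execution("out", ["design", "filtering"]):
    print(action)  # dry run: show what would be launched
```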

The code executed in each pipeline step lives in `main.py`, which is a wrapper that runs the `.run()` method of our `Task` class.
When you run the pipeline (for example via `boltzgen run design_spec.yaml ...`), this wrapper reads the YAML config file of each pipeline step and executes that step.

The possible tasks (and the source files to inspect to understand what they run):
- **Predict**: `src/boltzgen/task/predict/predict.py` (GPU: runs BoltzGen diffusion, inverse folding, refolding, design folding, or affinity prediction)
- **Analyze**: `src/boltzgen/task/analyze/analyze.py` (CPU: computes CPU metrics and aggregates metrics from the GPU steps)
- **Filter**: `src/boltzgen/task/filter/filter.py` (CPU: very fast, about 20 s; computes the ranking and writes the final output files)
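To help navigate the source, the step names accepted by `--steps` can be matched to these tasks. The mapping below is inferred from the task descriptions above; for example, it assumes the `folding` step is the "refolding" run by Predict. It is a lookup table for orientation, not code from BoltzGen.

```python
TASK_SOURCES = {
    "Predict": ("src/boltzgen/task/predict/predict.py", "GPU"),
    "Analyze": ("src/boltzgen/task/analyze/analyze.py", "CPU"),
    "Filter": ("src/boltzgen/task/filter/filter.py", "CPU"),
}

# Inferred step -> task mapping (assumption, see the lead-in).
STEP_TO_TASK = {
    "design": "Predict",
    "inverse_folding": "Predict",
    "design_folding": "Predict",
    "folding": "Predict",
    "affinity": "Predict",
    "analysis": "Analyze",
    "filtering": "Filter",
}


def source_for_step(step: str) -> tuple[str, str]:
    # Returns (source file, device type) for a pipeline step name.
    return TASK_SOURCES[STEP_TO_TASK[step]]


print(source_for_step("affinity"))  # ('src/boltzgen/task/predict/predict.py', 'GPU')
```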
