
HTCondor

Warning

Running MadLAD on HTCondor requires Singularity.

To start, you need to build a Singularity image containing the model and PDF set required for the run, following this tutorial. You also need a submit description file (submit.sub) and an executable (job_condor.sh). Once both are in place, the job is submitted to HTCondor with condor_submit submit.sub, as detailed in the sections below.


1. Overview

The workflow consists of three parts:

| Step | What you need | Why |
|------|---------------|-----|
| 1️⃣ Build a Singularity image | singularity build (see the container‑build tutorial) | The image contains all of MadLAD's dependencies, your UFO models and PDF sets. |
| 2️⃣ Create a Condor submit description (submit.sub) | A small text file that tells HTCondor how to run the job | Specifies the executable, resources and file transfer rules. |
| 3️⃣ Write an executable (job_condor.sh) | A Bash wrapper that activates your Conda environment and runs MadLAD | Keeps the Condor job simple – it just calls a script. |

Once those pieces are in place you can submit the job with:

condor_submit submit.sub

2. Building the Singularity Image

Follow this guide to create a Singularity image that contains:

  • MadLAD itself
  • The UFO model(s) you want to use
  • Any PDF sets that your processes require

Tip – Keep the image lightweight: only add what you actually need for your workflow.
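
For orientation, the build step itself is a single command. The file names below (madlad-custom.def, madlad-custom.sif) are placeholders, not fixed names – just make sure the resulting image is the one you later reference via run.image:

# Sketch only: build the image from your definition file (names are examples).
# --fakeroot allows building without root privileges where the feature is enabled.
singularity build --fakeroot madlad-custom.sif madlad-custom.def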


3. HTCondor Submit Description (submit.sub)

Below is a minimal but complete submit file that you can copy into your project directory. Feel free to tweak the resource request and the number of queued jobs (request_CPUs, queue) to match your workload.

# submit.sub – HTCondor job description for MadLAD

executable              = job_condor.sh
Requirements            = HasSingularity          # Only use nodes that have Singularity
arguments               = $(ClusterId)$(ProcId)
output                  = $(ClusterId).$(ProcId).out
error                   = $(ClusterId).$(ProcId).err
log                     = $(ClusterId).log
environment             = "ClusterId=$(ClusterId) ProcId=$(ProcId)"
should_transfer_files   = YES
transfer_input_files    = MadLAD, job_condor.sh  # Transfer the code and wrapper
when_to_transfer_output = ON_EXIT
transfer_output_files   = MadLAD/Output          # Path must match `save_dir` in config
request_CPUs = 8                                 # Adjust to your job’s CPU needs
queue 100                                        # Submit 100 parallel jobs
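
Optionally, you can let HTCondor parse the submit file without actually queueing anything. This uses a standard condor_submit option, not anything MadLAD-specific; the output file name (jobs.ad) is arbitrary:

# Sketch: parse submit.sub and write the generated job ClassAds to jobs.ad
# without submitting any jobs.
condor_submit -dry-run jobs.ad submit.sub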

What each directive does

| Directive | Meaning |
|-----------|---------|
| executable | The script that Condor will run on the worker node |
| Requirements | Ensures the node has Singularity (HasSingularity) |
| arguments | Passes a unique ID to the script (you can use it in logs or filenames) |
| output, error | Where stdout and stderr will be written on the submit machine |
| log | Condor's internal job log (not your output) |
| environment | Passes variables into the wrapper script |
| transfer_input_files | Files that need to be copied to the worker node before execution |
| when_to_transfer_output | When to copy results back (ON_EXIT is usually what you want) |
| transfer_output_files | Where the output of your job lives – make sure it matches the path you set in MadLAD's save_dir |
| request_CPUs | Number of CPU cores requested for the job |
| queue | How many jobs to submit |
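
As a concrete illustration (the ClusterId is whatever condor_submit assigns; 639945 is just the example value used in the wrapper script below): with queue 100, the filename macros expand per job roughly as

639945.0.out   639945.0.err    # stdout / stderr of the first job (ProcId 0)
639945.1.out   639945.1.err    # second job (ProcId 1), and so on up to ProcId 99
639945.log                     # one shared Condor event log for the whole cluster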

4. The Condor Executable (job_condor.sh)

#!/bin/bash

# ------------------------------------------------------------------
# 1. Activate the environment that contains MadLAD and dependencies
# ------------------------------------------------------------------
source /opt/miniconda3/etc/profile.d/conda.sh   # Adjust if Miniconda lives elsewhere
conda activate madlad

# ------------------------------------------------------------------
# 2. Compute a unique random seed for this job
#    -----------------------------------------
#    ClusterId and ProcId are supplied by HTCondor.
#    The constant 639945 is the first job number you submit in this
#    workflow – change it if you start a fresh batch at a different
#    ClusterId.  The arithmetic is just a quick, reproducible way to
#    generate a large range of seeds without collisions.
# ------------------------------------------------------------------
seed=$(((ClusterId + ProcId - 639945) * 30000 + 390 * 10))
echo "Random seed is set to $seed."

# ------------------------------------------------------------------
# 3. Change into the folder that Condor transferred
# ------------------------------------------------------------------
cd MadLAD   # The directory you specified in transfer_input_files

# ------------------------------------------------------------------
# 4. Run MadLAD with the user‑supplied configuration
#    -----------------------------------------------
#    * --config-name  → your YAML config (e.g. ttbar-allhad.yaml)
#    * run.image     → the name of your Singularity image
#    * gen.block_settings.nb_core   → CPU cores you want to use
#    * gen.block_run.nevents        → number of events per job
#    * gen.block_run.iseed          → the seed we just computed
# ------------------------------------------------------------------
python -m madlad.generate \
  --config-name=ttbar-allhad.yaml \
  run.image=madlad-custom \
  gen.block_settings.nb_core=4 \
  gen.block_model.save_dir=test_ttbar \
  gen.block_run.nevents=100000 \
  gen.block_run.iseed=$seed

# ------------------------------------------------------------------
# 5. Return to the Condor working directory
#    (optional – just a tidy‑up)
# ------------------------------------------------------------------
cd -
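
It can be worth exercising the wrapper once by hand before queueing the full batch. A minimal smoke test, assuming Singularity and the madlad Conda environment are available where you run it and that you start from the directory containing the MadLAD checkout (ClusterId/ProcId are set manually here, since HTCondor normally provides them via the environment directive):

# Hypothetical interactive test: export the variables HTCondor would normally
# set, then run the wrapper script directly.
export ClusterId=639945 ProcId=0
./job_condor.sh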

Warning

Change the random seed for different Condor jobs, otherwise you will end up with the same set of events in every job, e.g. gen.block_run.iseed=$(( (ClusterId + ProcId - first_cluster) * 30000 + 3900 ))


5. Submitting and Monitoring

# Submit the job
condor_submit submit.sub

# Check status
condor_q   # Show your jobs in the queue

# View logs
cat <ClusterId>.<ProcId>.out   # Stdout of the job (e.g. 639945.0.out)
cat <ClusterId>.<ProcId>.err   # Stderr (error messages)
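
A few more standard HTCondor commands are often useful while the batch runs (generic HTCondor tooling, not specific to MadLAD; replace 639945 with your actual ClusterId):

condor_q -better-analyze 639945   # Explain why jobs are idle / not matching
condor_history 639945             # Show jobs from this cluster that already finished
condor_rm 639945                  # Remove the whole cluster if something went wrong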