HTCondor
Warning
Running MadLAD on HTCondor requires singularity.
To start, you need to build a singularity image containing model and PDF set required for the run following this tutorial.
You also need a submit description file (submit.sub) and an executable (job_condor.sh). Submitting job to HTCondor, run:
1. Overview
The workflow consists of three parts:
| Step | What you need | Why |
|---|---|---|
| 1️⃣ Build a Singularity image | singularity build (see the container‑build tutorial) |
The image contains all of MadLAD’s dependencies, your UFO models and PDF sets. |
2️⃣ Create a Condor submit description (submit.sub) |
A small text file that tells HTCondor how to run the job | Specifies the executable, resources and file transfer rules. |
3️⃣ Write an executable (job_condor.sh) |
A Bash wrapper that activates your Conda environment and runs MadLAD | Keeps the Condor job simple – it just calls a script. |
Once those pieces are in place you can submit the job with:
condor_submit submit.sub
2. Building the Singularity Image
Follow the this guide to create a Singularity image that contains:
- MadLAD itself
- The UFO model(s) you want to use
- Any PDF sets that your processes require
Tip – Keep the image lightweight: only add what you actually need for your workflow.
3. HTCondor Submit Description (submit.sub)
Below is a minimal but complete submit file that you can copy into your project directory. Feel free to tweak the resource requests (request_CPUs, queue) to match your workload.
# submit.sub – HTCondor job description for MadLAD
executable = job_condor.sh
Requirements = HasSingularity # Only use nodes that have Singularity
arguments = $(ClusterId)$(ProcId)
output = $(ClusterId).$(ProcId).out
error = $(ClusterId).$(ProcId).err
log = $(ClusterId).log
environment = "ClusterId=$(ClusterId) ProcId=$(ProcId)"
should_transfer_files = YES
transfer_input_files = MadLAD, job_condor.sh # Transfer the code and wrapper
when_to_transfer_output = ON_EXIT
transfer_output_files = MadLAD/Output # Path must match `save_dir` in config
request_CPUs = 8 # Adjust to your job’s CPU needs
queue 100 # Submit 100 parallel jobs
What each directive does
| Directive | Meaning |
|---|---|
executable |
The script that Condor will run on the worker node |
Requirements |
Ensures the node has Singularity (HasSingularity) |
arguments |
Passes a unique ID to the script (you can use it in logs or filenames) |
output, error |
Where stdout and stderr will be written on the submit machine |
log |
Condor’s internal job log (not your output) |
environment |
Passes variables into the wrapper script |
transfer_input_files |
Files that need to be copied to the worker node before execution |
when_to_transfer_output |
When to copy results back (ON_EXIT is usually what you want) |
transfer_output_files |
Where the output of your job lives – make sure it matches the path you set in MadLAD’s save_dir |
request_CPUs |
Number of CPU cores requested for the job |
queue |
How many jobs to submit |
4. The Condor Executable (job_condor.sh)
#!/bin/bash
# ------------------------------------------------------------------
# 1. Activate the environment that contains MadLAD and dependencies
# ------------------------------------------------------------------
source /opt/miniconda3/etc/profile.d/conda.sh # Adjust if Miniconda lives elsewhere
conda activate madlad
# ------------------------------------------------------------------
# 2. Compute a unique random seed for this job
# -----------------------------------------
# ClusterId and ProcId are supplied by HTCondor.
# The constant 639945 is the first job number you submit in this
# workflow – change it if you start a fresh batch at a different
# ClusterId. The arithmetic is just a quick, reproducible way to
# generate a large range of seeds without collisions.
# ------------------------------------------------------------------
seed=$(((ClusterId + ProcId - 639945) * 30000 + 390 * 10))
echo "Random seed is set to $seed."
# ------------------------------------------------------------------
# 3. Change into the folder that Condor transferred
# ------------------------------------------------------------------
cd MadLAD # The directory you specified in transfer_input_files
# ------------------------------------------------------------------
# 4. Run MadLAD with the user‑supplied configuration
# -----------------------------------------------
# * --config-name → your YAML config (e.g. ttbar-allhad.yaml)
# * run.image → the name of your Singularity image
# * gen.block_settings.nb_core → CPU cores you want to use
# * gen.block_run.nevents → number of events per job
# * gen.block_run.iseed → the seed we just computed
# ------------------------------------------------------------------
python -m madlad.generate \
--config-name=ttbar-allhad.yaml \
run.image=madlad-custom \
gen.block_settings.nb_core=4 \
gen.block_model.save_dir=test_ttbar \
gen.block_run.nevents=100000 \
gen.block_run.iseed=$seed
# ------------------------------------------------------------------
# 5. Return to the Condor working directory
# (optional – just a tidy‑up)
# ------------------------------------------------------------------
cd -
Warning
Change the random seeds for different Condor jobs, otherwise you will ended up with same set of events.
e.g. gen.block_run.iseed=$((ClusterId + ProcId - first_cluster) * 30000 + 3900)
5. Submitting and Monitoring
# Submit the job
condor_submit submit.sub
# Check status
condor_q # Show your jobs in the queue
# View logs
cat $(ClusterId).$(ProcId).out # Stdout of the job
cat $(ClusterId).$(ProcId).err # Stderr (error messages)