# Condor

## Really useful links

* https://batchdocs.web.cern.ch/index.html
* https://batchdocs.web.cern.ch/local/quick.html
* https://batchdocs.web.cern.ch/local/submit.html
* https://opensciencegrid.org/docs/compute-element/submit-htcondor-ce/
* https://htcondor.readthedocs.io/en/latest/users-manual/index.html

## How to submit a simple job

Imagine you have an executable `foo.exe`.

Create a submit file `condor_submit_file.sh` (no need for it to be executable) containing

```
executable = ./foo.exe
should_transfer_files = yes
universe = vanilla
output = simple.out
error = simple.err
log = simple.log
# shortest job duration
+JobFlavour = "espresso"
queue
```

Then run

```
condor_submit condor_submit_file.sh
```

The above will produce log/out/err files called simple.log, simple.out and simple.err. The log file is accessible from the moment the job is launched and is updated while the job runs. The out/err files only become available at the end of the job.

You can check the status of your job (idle/hold/run/done) using

```
condor_q
```

(Sometimes the scheduler cannot be reached by condor_q, but that does not mean the job is not running or waiting to be run.)

Condor jobs launched at CERN with `universe = vanilla` can access files stored on both AFS and EOS, so there is no need to transfer them to the server.

## Passing arguments

Add the line

```
arguments = 1 2 3 myfile.root
```

## Submitting several jobs

The following will execute foo.exe 150 times, passing mydata.0.root to mydata.149.root as argument:

```
executable = ./foo.exe
arguments = mydata.$(ProcId).root
output = simple.$(ClusterId).$(ProcId).out
error = simple.$(ClusterId).$(ProcId).err
# group all log files in one
log = simple.$(ClusterId).log
queue 150
```

Sometimes it is easier to create the submit file with a program, because the arguments can vary a lot (see the generator sketch after the following example). Example: process a file with different values of a pT cut.

```
## common part to all jobs
executable = ./foo.exe
output = simple.$(ProcId).out
error = simple.$(ProcId).err
log = simple.$(ClusterId).log

# part specific to each job
# will create 3 different jobs
arguments = file.root 40
queue
arguments = file.root 60
queue
arguments = file.root 100
queue
```
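When the argument values come from a loop or a computation, a small script that writes the submit file does the bookkeeping for you. A minimal sketch in bash, reproducing the three pT-cut jobs above (the file name and cut values are illustrative):

```
#!/bin/bash
# generate_submit.sh: write a submit file with one job per pT cut, then submit it
SUBMIT_FILE=condor_submit_file.sh

# common part; the quoted 'EOF' keeps $(ProcId)/$(ClusterId) literal for condor
cat > ${SUBMIT_FILE} <<'EOF'
executable = ./foo.exe
output = simple.$(ProcId).out
error = simple.$(ProcId).err
log = simple.$(ClusterId).log
+JobFlavour = "espresso"
EOF

# one arguments/queue pair per cut value
for cut in 40 60 100 ; do
    echo "arguments = file.root ${cut}" >> ${SUBMIT_FILE}
    echo "queue" >> ${SUBMIT_FILE}
done

condor_submit ${SUBMIT_FILE}
```

## Job duration

* espresso = 20 minutes
* microcentury = 1 hour
* longlunch = 2 hours
* workday = 8 hours
* tomorrow = 1 day
* testmatch = 3 days
* nextweek = 1 week

Line to add:

```
+JobFlavour = "longlunch"
```

## Request memory/CPUs

* https://batchdocs.web.cern.ch/local/submit.html

At CERN a default job gets 2 GB of memory and 20 GB of disk space (2 GB of memory per core).

Request more memory (which implicitly reserves more cores):

```
request_memory = 4GB
```

This will reserve 2 cores, since memory is granted in units of 2 GB per core.

Request more disk space:

```
request_disk = 40GB
```

## Requirements on the job host

```
requirements = (OpSysAndVer =?= "CentOS7" && Arch =?= "X86_64")
```

## Concrete example

Imagine you want to execute a bash file `run_foo_prog.sh` which will run your program, and let's say that program requires ROOT 6.18 to be set up. If the program `foo.exe` finishes properly it prints RUN_SUCCESSFULL, otherwise that string is not printed.

First make sure `run_foo_prog.sh` is executable: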
```
chmod +x run_foo_prog.sh
```

Let's assume you want to pass some arguments to the executable `foo.exe`, for example a filename and a number. The command to launch the job is then

```
condor_submit MyArgs="file.root 4" condor_submit_file.sh
```

This time the program `foo.exe` needs to be transferred to the remote host. (Note that paths in `condor_submit_file.sh` are relative to the submission directory, unless an absolute path is used and unless initialdir is specified, see later.)

condor_submit_file.sh:

```
executable = ./run_foo_prog.sh
transfer_input_files = ./foo.exe
should_transfer_files = yes
Arguments = $(MyArgs)
universe = vanilla
output = simple.out
error = simple.err
log = simple.log
+JobFlavour = "espresso"
queue
```

run_foo_prog.sh:

```
#!/bin/bash
# First steps to do before anything else
# ===================================
# Capture all arguments; do not use $@ here, see
# https://stackoverflow.com/questions/3811345/how-to-pass-all-arguments-passed-to-my-bash-script-to-a-function-of-mine/3816747
# https://stackoverflow.com/questions/12314451/accessing-bash-command-line-args-vs
Args="$*"

# set up ATLAS and ROOT
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh
source /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.18.04/x86_64-centos7-gcc48-opt/bin/thisroot.sh

# to avoid complaints on compute nodes
export HOME=${PWD}

echo "PWD = $PWD"
echo "PATH = $PATH"
echo "LD_LIBRARY_PATH = $LD_LIBRARY_PATH"
echo "ROOTSYS = $ROOTSYS"

COMMAND_LINE="./foo.exe $Args"
echo "Executing command line"
echo "$COMMAND_LINE"

# execute the command and also direct its output to the _condor_stdout file,
# which will become the output file at the end of the job
output_str_commandline=$(eval $COMMAND_LINE | tee -a _condor_stdout)

# test whether the program ended successfully
grep -q "RUN_SUCCESSFULL" <<< "${output_str_commandline}"
bool_succeeded=$?

echo
if [ "${bool_succeeded}" -eq 0 ] ; then
    echo "========================="
    echo "SUCCESS_EXECUTION"
    echo "========================="
    echo ""
    # return 0 = success
    exit 0
else
    echo "========================="
    echo "FAILURE_EXECUTION"
    echo "========================="
    echo ""
    # return 1 = failure
    exit 1
fi
```
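Before submitting, it can be worth running the wrapper once locally; a quick check under the same assumptions as above (`foo.exe` in the current directory, `file.root 4` as illustrative arguments):

```
./run_foo_prog.sh file.root 4
echo "exit code = $?"   # 0 if RUN_SUCCESSFULL was printed, 1 otherwise
```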
## Retry failed jobs (super useful!)

* https://batchdocs.web.cern.ch/workarounds/job-retry.html

```
# Send the job to held state on failure
on_exit_hold = (ExitBySignal == True) || (ExitCode != 0)

# Periodically retry the jobs every 10 minutes, up to a maximum of 10 retries
periodic_release = (NumJobStarts < 10 && ((CurrentTime - EnteredCurrentStatus) > 600))
```

## Transfer input/output files/directory

Transfer the setup.sh script and the build directory (NB: no trailing slash for build, otherwise it would transfer what is inside the build directory and not the directory itself):

```
# note: no trailing slash for build, otherwise it would transfer what is inside
# the build directory and not the directory itself
transfer_input_files = setup.sh,build

# will transfer the directory results_dir and myfile.txt from the server to the submission directory
# note: no trailing slash for results_dir, otherwise it would transfer what is inside
# results_dir and not the directory itself
transfer_output_files = results_dir,myfile.txt
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
```

The transfer_output_remaps mechanism (beware: only valid for files, not directories):

* https://manpages.debian.org/stretch/htcondor/condor_submit.1.en.html

How to "remap" output directories to be different from the submission directory? Use initialdir:

* https://www-auth.cs.wisc.edu/lists/htcondor-users/2020-September/msg00039.shtml

Note that condor submission from EOS is not allowed:

* https://batchdocs.web.cern.ch/troubleshooting/eos.html#no-eos-submission-allowed

How to remap the output directory to EOS? Use a trick I found: set `initialdir = /./eos/path_you_want`.

Just make sure the log files are not on EOS, otherwise it will not work (i.e. when submitting with `Log = path_you_want`, path_you_want must not be on EOS). E.g. go to AFS and you can launch

```
executable = $ENV(PWD)/foo.exe
should_transfer_files = YES
transfer_output_files = results
# trick for condor
initialdir = /./eos/user/b/bouquet/
+JobFlavour = "espresso"

#job1
Log = $ENV(PWD)/foo_a.log
Output = $ENV(PWD)/foo_a.out
Error = $ENV(PWD)/foo_a.error
Arguments = a
queue

#job2
Log = $ENV(PWD)/foo_b.log
Output = $ENV(PWD)/foo_b.out
Error = $ENV(PWD)/foo_b.error
Arguments = b
queue
```

## Working with big files

* https://batchdocs.web.cern.ch/tutorial/exercise11.html → "transfer_input_files and transfer_output_files (In fact, the output is limited to 1GB)"

Use xrdcp instead of the built-in transfer mechanism; see the sketch below.
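Big outputs can be copied to EOS directly from the job script. A minimal sketch, assuming the CERN EOS user instance `eosuser.cern.ch`; the file name and destination path are illustrative:

```
# at the end of the job script, after foo.exe has produced big_output.root
# (file name and destination path are illustrative; adapt to your EOS area)
xrdcp -f big_output.root root://eosuser.cern.ch//eos/user/b/bouquet/big_output.root
```

## Define environment variables for the condor job

Imagine you want to define environment variables on the host, for example your EOS path or the absolute path the job was submitted from:

```
environment = "ABSPATH_SUBMITTER=$ENV(PWD) EOSPATH=/eos/user/b/bouquet/"
```

## List all properties of jobs

```
condor_q -l
```

## Remove jobs

Remove all the jobs you launched:

```
condor_rm -all
```

Remove specific jobs based on their cluster ids, e.g. 5001 and 5002:

```
condor_rm 5001 5002
```

## Connect to a job to see if it is running successfully

Only works if the job is in the run state:

```
condor_ssh_to_job 5001.0
```

## Use Proxy

* https://batchdocs.web.cern.ch/tutorial/exercise2e_proxy.html

Before submitting the job, set up the following variables:

```
# set up a proxy valid for 96h
echo "Setting voms-proxy"
voms-proxy-init -voms atlas -valid 96:00
voms-proxy-info -all
export PROXYFILENAME=x509up_u$(id -u)
export PROXYFILEPATH=$HOME/private/$PROXYFILENAME
echo "Copying $PROXYFILENAME to $HOME/private/"
cp /tmp/$PROXYFILENAME $PROXYFILEPATH
```

In the submit file add

```
transfer_input_files = $ENV(PROXYFILEPATH)
environment = "X509_USER_PROXY=$ENV(PROXYFILENAME)"
```

And in your bash script that will be executed on the remote host add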
echo "X509_USER_PROXY = $X509_USER_PROXY" voms-proxy-info -all voms-proxy-info -all -file $X509_USER_PROXY ````