This section explains how to best make use of Longbow parameters.
To communicate, copy files and submit jobs on a remote resource, Longbow needs to be supplied with the details of the HPC machine. Also, to automatically create a job submission script to your specification, Longbow also needs to be supplied with details of the job such as the executable to run or the number of cores to use. In this documentation, these pieces of information are referred to as parameters.
There are a number of ways that you can feed Longbow with these parameters, this section of documentation will explain in detail the ways and also the kinds of parameters that can be supplied to Longbow.
Basic Principles of Configuration¶
Longbow has been designed to be extremely flexible in terms of configuration, to keep things simple we have maintained a well-defined system of configuration that will assimilate information from a number of different sources. You are not limited to providing all parameters from one source either.
This means that you can use a combination of input sources for such purposes as representing complex configuration patterns (facility specific configuration differences, or job type differences) or simply for the convenience of having a base configuration for quick job submission but that can be overridden easily if necessary.
There are several sources from which Longbow will bring in configuration parameters, these are the command-line, job configuration files, host configuration files and Longbow internal defaults. Longbow will load up all parameters from these sources and then for each parameter make a decision which one to use based upon a simple but structured hierarchy of priority.
This order of priority is outlined in the following diagram, it can be seen that the command-line has the highest priority. This means that a given parameter on the command-line would override any value provided for this parameter from any of the other sources.
Although Longbow has this well defined structure, not every parameter can be provided in each configuration source. For example, only a subset of the total set of parameters may be given on the command-line. For obvious reasons, there are not Longbow defaults for every single parameter either. This means that the user does have to provide certain information as a requirement in certain sources (for example the host file is always required). The following sections will discuss this in detail.
Provided that at least the host configuration file has been configured (instructions for basic setup) with at least the minimum required parameters, simple jobs can be launched from the command-line. A simple Longbow command-line looks like the following:
longbow [longbow arguments] executable [executableargs]
The above command-line shows how a simple command-line job would be launched, the Longbow parameters should always come before the executable and it’s arguments. So looking at the first half of the Longbow command-line above, it is well known that you will use the program executable name first in the command-line (in this case “longbow”).
The “longbow arguments” are how we configure Longbow from the command-line, we do this by providing flags for Longbow to look for. There are two categories of flags, those that aren’t related to running jobs (such as –help) and those that are explicitly related to running jobs (such as –replicates).
Starting with command-line flags that are not related to running jobs we have
|--about||This flag will output information about Longbow to the console terminal.|
|--examples||This flag will download the Longbow example set, you can control where these are placed by navigating in your terminal to the desired location before launching.|
|--help||This flag will output the Longbow help to the console, this is useful for quick command-line flag look-up.|
|--version||This flag will output the current version of Longbow that is installed on your system.|
The above command-line flags are designed to be run alone, not in conjunction with any other flags. They will typically perform the function described above and then Longbow will exit. An example of running with one of the above flags, would be to query which version of Longbow is installed on your machine, to do this one simply opens a terminal and types:
The following list constains the command-line flags that are explicitly related to running jobs
|--debug||This flag will trigger the output of debugging information to both your log file and the console terminal. Should only be used when requesting support.|
|--disconnect||This flag will activate dis-connect mode link.|
This flag will make Longbow use the host file and path specified and not the default ~/.longbow/host.conf. If only a file name is given and not a full path then longbow will search the current working directory and then the ~/.longbow directory in that order for the named file, it will use the first one it discovers.
This flag will make Longbow look for and load parameters from a job configuration file. Note that a job configuration file is not strictly necessary to use Longbow. If the filename is provided but not the path, Longbow will search in the directory that you launch your experiment from (the current working directory) and then the directory in which the longbow executable resides if necessary. There is no default filename that is assumed for the the job configuration file.
This flag will set the job name. This will NOT override the same parameters in configuration files making it an exception to the usual parameter hierarchy.
This flag is set then the log file will be written to the specified path. If the filename is provided, but not the path, the file will be output to the current working directory. If this flag is not included, a default file called log will be output to the current working directory.
This flag will override the walltime for each job.
|--nochecks||This flag will disable checks that are performed on the application availability on the remote HPC machine. This is for cases where the path the the executable is too complex, such that Longbow has a hard time trying to find it but you are certain that it should work.|
This flag will start the recovery of a failed or disconnected Longbow session. Longbow will save recovery files into the ~/.longbow directory with a date and time stamp in the file name, you should supply the path to this file to initiate the recovery and continuation of the session link
This flag specifies which HPC machine in the host configuration file to send the job to. This will overrule the same parameters in any configuration files.
This flag specifies the number of replicate jobs to run. This will overrule the same parameters in any configuration files.
|--verbose||This flag, will turn on logging to the console terminal in addition to the log file, this is useful in cases where you are running Longbow on a desktop computer and wish to monitor the progress live rather than from file. Longbow is set to only log to file by default, so that it can be used in conjunction with local batch queue systems without duplicate output.|
Now we have seen the Longbow configuration side of the command-line all that remains is the executable side of the command-line:
longbow [longbow arguments] executable [executableargs]
This is simply the command-line of the program that you would normally use if you were not using it in conjunction with Longbow. In the above example the parameter “executable” can be one of pmemd, pmemd.MPI, charmm, namd2, mdrun, mdrun_d, mdrun_mpi, mdrun_mpi_d, lmp_xc30 (Longbow can be extended to work with other software link). The parameter “executableargs” should be replaced with the arguments that you wish to supply to your MD program, for example
"<" example.inp ">" output
or for amber:
-i example.in -c example.min -p example.top -o output for Amber
The executableargs must immediately follow the executable on the Longbow command line. So putting all of this together, we saw in the Running Longbow Examples section an example of the executable and executableargs parameters being provided on the longbow command line. The example below is similar but demonstrates how the user can specify which remote resource to use using the –resource flag:
longbow --resource archer-short --replicates 5 --jobname TestJob pmemd.MPI -O -i test.in -c test.min -p test.top -o test.out
Longbow will submit the replicate job named “TestJob” to ARCHER. If a job is submitted without a job configuration file and also without the –jobname flag being set then the job name will default to “Longbowjob” and will show up as such if you query the batch queue system of the remote resource.
The Job Configuration File¶
Compared to the host configuration file (discussed in the next part), the job configuration file is designed to be changed frequently, perhaps for each job submitted. The idea is that the job configuration file will overrule parameters in the host configuration file on a parameter by parameter basis. For example, A user might have a large set of jobs that fit into different categories of resource requirement (small, medium and large), this user could then have generic job configuration files to setup the number cores and wall time requirements of those three categories of job. The job configuration file allows the user to overrule parameters listed in the host configuration file.
A job configuration file is not necessarily needed as demonstrated in the Running Longbow Examples section of this guide. However, not all parameters that may change on a job by job basis can be provided on the command line, and the user may not wish to change the defaults they have specified in the host configuration file. The job configuration file was introduced to deal with such issues or prevent the need for writing lengthy commands such as:
longbow --cores 48 --maxtime 96:00 --memory 20 --polling-frequency 60 pmemd.MPI -i example.in -c example.min -p example.top -o output
Instead, these parameters can be provided in the job configuration file like this:
[example] resource = Archer maxtime = 01:00 cores = 48 polling-frequency = 120 remoteworkdir = /work/myproject/myproject/myusername/longbow
Where the jobname appears within the square brackets “[ ]” and each parameter and value is listed under this jobname. Multiple jobs can be specified by including multiple sections, each starting with a jobname and then listing its parameters (more on this here link). Once the job configuration file has been prepared the above command-line shrinks down to:
longbow --job /path/filename
If the filename is provided but not the path, Longbow will search in the (local) directory that you launch your experiment from (i.e. the working directory on your desktop) and then the directory in which the longbow executable resides if necessary. There is no default filename that is assumed for the the job configuration file unlike with the host configuration file.
Upon launch, the above job would run on Archer in /work/myproject/myproject/myusername/longbow/exampleXXXXX where XXXXX represents a generated 5-digit number. You see, the remoteworkdir parameter in the the job configuration file would overrule that in the host configuration file. Note that the name of the job “example” is used as the subdirectory of remoteworkdir in which the job runs only with a random 5-digit number appended on the end. These random numbers are appended for all jobs regardless of the configuration methods used, this is to prevent jobs of the same name clashing on the remote resource.
Any parameter listed in the parameters section can be included in the job configuration file with the exception of host and user because these are strongly tied to the HPC resource rather than the job.
The Host Configuration File¶
The host configuration file is designed to be a reasonably static entity which contains the details to access remote HPC machines. In addition, any parameters the user deems would not change across jobs but across hosts or those that hardly change would be a good candidate to appear with the hosts.
Upon installing Longbow, a default host configuration file is created in the ~/.longbow/hosts.conf file and as such the –hosts longbow argument flag needn’t be provided if you are using this file. Many users will wish to modify this file to reflect their HPC resource(s). However, some users may wish to specify their own file using the –hosts longbow argument flag on the command line when submitting jobs:
longbow --hosts /path/filename .....
If the filename is provided but not the path, Longbow will search in the directory that you launch your experiment from (i.e. the working directory), then the directory the longbow executable is stored and then the ~/.longbow directory.
The host configuration file is structured in the below format which consists of sections in square brackets (Longbow uses this as the name of the HPC machine) followed by a list of named parameters and their values:
[Archer] host = login.archer.ac.uk user = myusername remoteworkdir = /work/myproject/myproject/myusername/ account = myproject-Surname
Any parameter listed in the parameters section below can be included in the host configuration file with the exception of the parameter “resource”. There are three parameters that must be included in the host configuration file otherwise an error will occur, these are
user host remoteworkdir
Of these three, only remoteworkdir can be also supplied in the job configuration file. Due to the hierarchy of parameters, if remoteworkdir, or any other common parameter is specified in both, the value in the job configuration file will be preferentially selected.
On a final note, if Longbow is not told which remote resource to submit jobs to using the –resource flag on the command line or parameter of the same name in the job configuration file, Longbow will default to submitting jobs to the HPC machine that appears first in the host configuration file.
List of Configuration File Parameters¶
This section contains a list of parameters that may be used in either of the host or job configuration files, a small selection however should only be used in the host configuration file.
|account||If the HPC machine requires an account code (ARCHER does) supply it using this parameter or else jobs may be rejected.|
|accountflag||Sometimes system administrators for whatever reason decide that they need to change the default for the account directive option. If this is the case then the user can specify what Longbow should supply with this parameter. Longbow defaults to -A for PBS, SGE and SLURM but for LSF will default to -P.|
|cores||The total number of cores to request.|
|corespernode||This parameter is important for Longbow to be be able to properly resource jobs and should be provided for all machines. Longbow has an internal default of 24 cores per node as this is currently a common configuration, however if the machine you are using differs then you should set it using this parameter in your host configuration file. You can normally find this information from the hardware section of your HPC machine webpages or ask their support staff.|
|download-include||Exposes the rsync –include flag for downloads, these flags are used to get fine grained control over what is transferred using rysnc. Users should specify a comma separated list of files to include whilst simultaneously setting the download exclude parameter to all (download-exclude = *) when making use of this parameter.|
|download-exclude||Exposes the rsync –exclude flag for downloads, these flags are used to get fine grained control over what is transferred using rysnc. Users should either specify a comma separated list of files (black-listing) they wish to exclude from the download staging or set to all “*” in conjunction with providing a list of files to the download-include parameter listed above (white-listing).|
|email-address||This parameter allows the user to set an email address that will be written into the job submission script so that the scheduler can send an email on job completion.|
|email-flags||This parameter allows the user to set the email flags that tell the scheduler how you want to receive emails about running jobs. You should use the format that you normally use in your job submission scripts ie PBS “email-flags = b” or for SGE “email-flags = beas”.|
The name of the executable to use on the HPC machine. Using this argument in conjunction with executableargs parameter is a good way of avoiding having to write long command-lines. A list of executables supported out of the box is:
New programs and/or executables can be added by following this guide link
As an alternative to providing the arguments for the MD package on the Longbow command-line, these arguments can be specified in a configuration file.
For example, a NAMD job could be submitted on your local machine in the following fashion:
longbow --verbose namd2 "<" example.in
An equivalent way to submit this job would be to specify the arguments in the job configuration file, job.conf:
[myjob] .. executable = namd2 executableargs = example.in ..
and then use the following command:
longbow --verbose --job job.conf
This parameter enables users to specify the name of the job handler that should be used to run the parallel MD job (e.g. mpirun, aprun etc). If this parameter is not included, the code will automatically try to determine which handler is present on the HPC machine. A minority of users may wish to provide additional arguments to the executable (e.g. -np for mpirun) and can do so using this parameter. Simply use for example:
[Archer] .. handler = mpirun -np 16 ..
Note, that for the aprun handler, the -n and -N flags are provided by default by Longbow.
|host||The address of the HPC machine. For example login.archer.ac.uk|
|localworkdir||Path to the directory on the desktop from which the job should be run if this should not be the current working directory. This is optional and will override where the input files required for the MD job are to be found and where the results files should be directed to (most users should ignore this unless there is a good reason).|
|lsf-cluster||For users of HPC machines that run an LSF scheduler, the cluster the job should be submitted to can be specified with this parameter.|
The modules to be loaded on the remote resource using the “module load” command. A comma separated list can be entered here if multiple modules are required. For example, if you would normally have the following lines in your submission script:
module load intel-mpi module load charmm
simply include the following in the job configuration file:
modules = intel-mpi, charmm
If the modules parameter isn’t specified, Longbow will try assume which modules are required according to the executable name. However, this only works if the executable supplied is supported by a plugin. Out of the box the following executables map onto the following module names by default
|maxtime||Maximum wall clock time, this will be used to tell the scheduler how long the job should last and will likely be kicked out of the queue if it overruns. This should be given in the format “HH:MM”. Longbow will automatically add zero seconds onto your entry if your scheduler requires the format “HH:MM:SS”.|
|memory||Integer representing the number of GB to be assigned to the scheduler memory directive in your submit script.|
|mpiprocs||Allows undersubscription or to change mpiprocs freely without hacking the corespernode parameter. This is often needed to properly run LAMMPS SMP builds.|
|polling-frequency||The interval for Longbow to query the status of a job/s, this is given in seconds and should not be set too small (not less than 60) otherwise the system admins may not like you.|
|port||The port number if the remote resource is using an unusual port for ssh, Longbow defaults to 22 if nothing is given.|
|queue||The queue the job should be submitted to on the remote resource.|
|remoteworkdir||Name the working directory on the HPC machine. If it doesn’t already exist Longbow will create it. Longbow will then use remoteworkdir as its staging area, so for each job a subdirectory will be created by Longbow in which the job will run.|
|replicates||Number of desired replicates for job arrays (see the Running Jobs section).|
Users that wish to have different naming prefix for replicate directories than the default of rep1, rep2, rep3, … can supply something like this in their configuration scripts:
replicate-naming = run
and this will result in the directories run1, run2, run3, … getting used.
|resource||This specifies the name of the HPC machine to use, which refers to the name given within the square brackets  in the host configuration file.|
|scheduler||This is the name of the job scheduling environment (PBS/LSF/SGE/SLURM) this can be used to force Longbow to use the logic for a given scheduler if the internal tests run by Longbow are struggling to identify the setup for your HPC machine.|
This parameter is for including scripts in the job submission script written by Longbow. The script/s must already be present on the HPC machine, Longbow will not transfer these for you, so any paths must be valid for the script path on the HPC machine.
An example of using this in a configuration file is to include the Linux module initialisation script:
scripts = source /etc/profile.d/modules.sh
and this will append this line into your job submission file like this:
#!/bin/bash --login #$ -cwd -V #$ -N single #$ -q gpu #$ -l h_rt=24:00:00 source /etc/profile.d/modules.sh module load apps/intel/gromacs/4.6.1 mpirun mdrun -s example.tpr -deffnm output
Multiple scripts can be included by referencing a comma separated list of commands:
scripts = source /etc/profile.d/modules.sh, source /this/one.too
This parameter is specific to users that are using machines with SGE. This parameter controls the name of the parallel environment that the job is asking for, by default this is set to “mpi” which yields the following line in your job submission script:
#$ -pe mpi 8
however if you have to provide something different here such as “gpu-env” then set this flag to that name in your host or job configuration file.
This parameter is specific to users that are using machines with SGE. Users of some machines, possibly using accelerators, will want to use a single core. In this instance Longbow would not write out a line for the parallel environment directive “#$ -pe mpi 1”, however some machines have been configured in a way that jobs fail if this is not given. So to provide this for cases with 1 core jobs then provide the following in your host or job configuration file:
sge-peoverride = true
This parameter is specific to users that are using machines with slurm. Users of machines that have generic resources configured can use them by supplying the slurm-gres parameter in a job or host configuration file:
slurm-gres = gpu:1
results in the following being added to your job submit script generated by Longbow:
|staging-frequency||The frequency in seconds in which files should be synced between the remote and local machine. If the frequency should be the same as the polling frequency then leave this unset and it will default to the same. This parameter should not be set too small, especially you are syncing large files otherwise you will be syncing constantly.|
|stderr||This parameter will rename the stdout file that is created by the scheduling system.|
|stdout||This parameter will rename the stdout file that is created by the scheduling system.|
|subfile||Advanced users that use other tools to generate submission scripts but would like to take advantage of the staging and submission parts of Longbow can do so using the subfile parameter to give the exising submit file. This is for advanced users and workflow developers that understand the implications of doing this. You will still have to provide normal command-lines etc and go through all the checks and tests.|
|user||Used to supply your user name on the HPC machine. This is the user name that you would normally use with SSH.|
To give users a solid starting point and to also minimise the number of false starts for new users, some parameters will have default values. These will be based on machine specific metrics for the UK national HPC - currently ARCHER. Below are listed some of the parameters that will have a default value and what that default is.
A Worked Example¶
To Demonstrate the interplay of configuration files, this section provides a demonstration of a PBS job submission script that is generated using a job configuration file and also the default host configuration file (~/.longbow/hosts.conf) and how that final submit file is assembled.
Host configuration file:
[myhpc] host = login.archer.ac.uk user = myusername scheduler = PBS handler = mpirun corespernode = 16
Job configuration file:
[single] resource = myhpc executable = mdrun remoteworkdir = /work/longbow maxtime = 01:00 cores = 32 memory = 20 queue = dev executableargs = -s example.tpr -deffnm output
longbow --verbose --job job.conf
Below is the resultant PBS submission script that is generated by Longbow, this and all associated job files will then be copied to /work/longbow/singleXXXXX on the remote resource (where XXXXX represents a random 5-digit number) and submitted to the scheduler:
#!/bin/bash --login #PBS -N single #PBS -q dev #PBS -l select=2:ncpus=16:mpiprocs=16:mem=20gb #PBS -l walltime=01:00:00 export PBS_O_WORKDIR=$(readlink -f $PBS_O_WORKDIR) cd $PBS_O_WORKDIR export OMP_NUM_THREADS=1 module load gromacs mpirun mdrun -s example.tpr -deffnm output