Optional HPC support
The optional ZOO-Kernel HPC support lets you use OGC WPS to invoke the remote execution of OrfeoToolBox (OTB) applications. The current implementation relies on OpenSSH and the Slurm scheduler.
Note
Slurm is an acronym for Simple Linux Utility for Resource Management. Learn more on the official website.
To execute an OGC WPS service using this HPC support, you must use OGC WPS version 2.0.0 and an asynchronous request. Any attempt to execute an HPC service synchronously fails with the message “The synchronous mode is not supported by this type of service”. The ZOO-Kernel is not solely responsible for the execution: it has to wait for the execution on the HPC server to end before it can continue. Transferring the data from the WPS server to the cluster and downloading the data produced by the execution also take time. Hence, when an OGC WPS client sends a GetCapabilities or DescribeProcess request, only the “async-execute” mode is listed in the jobControlOptions attribute of HPC services.
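As an illustration of the asynchronous requirement, the following minimal sketch builds an OGC WPS 2.0.0 Execute request with mode="async", using only the Python standard library. The element and attribute names follow the WPS 2.0 Execute schema; the process identifier, input name and input URL a caller passes in are hypothetical, and sending the document to the server is left to your client.

```python
import xml.etree.ElementTree as ET

WPS = "http://www.opengis.net/wps/2.0"
OWS = "http://www.opengis.net/ows/2.0"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in (("wps", WPS), ("ows", OWS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def build_async_execute(process_id, inputs, output_id):
    """Return a WPS 2.0.0 Execute document asking for asynchronous execution.

    `inputs` maps each input identifier to a URL passed by reference.
    """
    root = ET.Element(f"{{{WPS}}}Execute", {
        "service": "WPS",
        "version": "2.0.0",
        "mode": "async",  # synchronous mode is rejected for HPC services
        "response": "document",
    })
    ET.SubElement(root, f"{{{OWS}}}Identifier").text = process_id
    for input_id, href in inputs.items():
        node = ET.SubElement(root, f"{{{WPS}}}Input", {"id": input_id})
        ET.SubElement(node, f"{{{WPS}}}Reference", {f"{{{XLINK}}}href": href})
    # Ask for the output by reference, since results stay on the server side.
    ET.SubElement(root, f"{{{WPS}}}Output",
                  {"id": output_id, "transmission": "reference"})
    return ET.tostring(root, encoding="unicode")
```

For example, build_async_execute("OTB.BandMath", {"il": "http://myhost.net/data/in.tif"}, "out") would produce a request for a hypothetical OTB.BandMath service; the client then polls GetStatus and fetches the result with GetResult once the job has ended.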
The sequence diagram below shows the interactions between the OGC WPS server (ZOO-Kernel), the mail daemon (running on the OGC WPS server), the callback service and the HPC server during the execution of an HPC service. The dashed lines represent the behavior when the optional callback service invocation has been activated. These invocations are made asynchronously to limit their impact on the whole process, for instance if the callback service fails.
For now, the callback service is not a WPS service but an independent server.
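The asynchronous callback invocations can be pictured as fire-and-forget calls. The sketch below is an illustration only, assuming the callback service accepts an HTTP POST; the ZOO-Kernel's actual implementation and payload format are not described in this section. The post_body and invoke_callback_async names are hypothetical.

```python
import threading
import urllib.request


def post_body(url, body):
    """Default sender: POST raw bytes to the callback URL (hypothetical format)."""
    req = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(req, timeout=10):
        pass


def invoke_callback_async(url, body, sender=post_body):
    """Run the callback in a background thread and never let it raise.

    Errors are only recorded in the returned list, so a failing callback
    service cannot interrupt the main execution.
    """
    errors = []

    def worker():
        try:
            sender(url, body)
        except Exception as exc:  # record the failure, do not propagate it
            errors.append(exc)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, errors
```

The caller continues immediately; joining the returned thread is only needed when you want to inspect the recorded errors, for example in tests.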
Installation and configuration
Follow the steps described below to activate the optional ZOO-Project HPC support.
Prerequisites
the latest ZOO-Kernel trunk version
access to a server with Slurm and OrfeoToolBox installed.
Installation steps
ZOO-Kernel
Compile the ZOO-Kernel using the configuration options shown below:
cd zoo-kernel
autoconf
./configure --with-hpc=yes --with-ssh2=/usr --with-mapserver=/usr --with-ms-version=7
make
sudo make install
Optionally, you can ask your ZOO-Kernel to invoke a callback service
responsible for recording the execution history and the data
produced. In that case, add the --with-callback=yes
option to the configure command.
Note
If you need other languages to be activated, such as Python for example, please use the corresponding option(s).
FinalizeHPC WPS Service
To be informed that the remote OTB application has ended on the
cluster, one should invoke the FinalizeHPC service. It connects to
the HPC server over SSH and runs an sacct command to extract detailed
information about the sbatch job that has been run. If the sacct
command succeeds and the job is no longer running on the cluster,
the information is stored in a local conf file containing a [henv]
section definition. The service then connects to the Unix domain
socket opened by the ZOO-Kernel that initially scheduled the service
through Slurm, to inform it that the run on the cluster has ended.
This lets the initial ZOO-Kernel continue its execution by
downloading the output data produced by the OTB application on the
cluster. This service should therefore be built and deployed on your
WPS server. You can use the following commands to do so.
cd zoo-service/utils/hpc
make
cp cgi-env/* /usr/lib/cgi-bin
mkdir -p /var/data/xslt/
cp xslt/updateExecute.xsl /var/data/xslt/
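The FinalizeHPC behaviour described above can be sketched in Python as follows. This is an illustration under assumptions, not the service's actual code: sacct output is assumed to be produced with --parsable2 --noheader (pipe-separated fields), the set of still-active job states is a hypothetical subset, and the conf file path and the message written to the Unix domain socket are placeholders, since the protocol expected by the ZOO-Kernel is not documented here.

```python
import configparser
import socket

# Slurm states meaning the job is still queued or running. This subset is an
# assumption for the sketch; the sacct documentation lists all job state codes.
ACTIVE_STATES = {"PENDING", "RUNNING", "SUSPENDED", "CONFIGURING", "COMPLETING"}


def parse_sacct(output, fields):
    """Split `sacct --parsable2 --noheader` output into one dict per line."""
    return [dict(zip(fields, line.split("|")))
            for line in output.strip().splitlines()]


def job_state(row):
    """First word of the State field ("CANCELLED by 1000" -> "CANCELLED")."""
    words = row.get("State", "").split()
    return words[0] if words else ""


def job_finished(rows):
    """True when sacct returned rows and none of them is still active."""
    return bool(rows) and all(job_state(r) not in ACTIVE_STATES for r in rows)


def write_henv(row, path):
    """Store one sacct row as key = value pairs under a [henv] section."""
    conf = configparser.ConfigParser(interpolation=None)
    conf.optionxform = str  # keep sacct field names as written
    conf["henv"] = row
    with open(path, "w") as f:
        conf.write(f)


def notify_kernel(socket_path, message=b"done"):
    """Connect to the ZOO-Kernel's Unix domain socket and signal the end of the run."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(socket_path)
        s.sendall(message)
```

In this sketch the three steps run in order: parse the sacct output, write the [henv] file only if job_finished returns True, then call notify_kernel so the waiting ZOO-Kernel can download the outputs.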
You should also copy the
Note
FinalizeHPC should be called from a daemon responsible for reading the
mails sent by the cluster to the WPS server.
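A first step for such a daemon is to recover the Slurm job ID from the mail sent when the job ends (see the --mail-type=END option in the jobscript_header example). The minimal sketch below assumes the job ID appears in the subject as Job_id=NNN, which is how Slurm notification subjects are usually worded; the exact wording varies between Slurm versions, and how the daemon then invokes FinalizeHPC is not covered here.

```python
import re
from email import message_from_string

# Assumed subject format, e.g. "Slurm Job_id=12345 Name=otb Ended, ...".
JOB_ID_RE = re.compile(r"Job_id=(\d+)", re.IGNORECASE)


def job_id_from_mail(raw_mail):
    """Return the Slurm job ID found in a raw mail's subject, or None."""
    msg = message_from_string(raw_mail)
    match = JOB_ID_RE.search(msg.get("Subject", ""))
    return match.group(1) if match else None
```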
Configuration steps
Main configuration file
When HPC support is activated, you can use different HPC configurations
by adding a confId line next to the usual serviceType=HPC in your zcfg
file. The ZOO-Kernel uses it to find which configuration a service
should use, that is, which options to apply when creating the
relevant sbatch script.
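The lookup from a zcfg file to its configuration section can be sketched as follows. This is a hypothetical helper, not ZOO-Kernel code: it assumes the zcfg header is made of key = value lines ending at the first <...> block, and that confId names a section of the main configuration file. What the ZOO-Kernel does when confId is absent is not described here, so the sketch simply returns None.

```python
import configparser
import re


def zcfg_header(zcfg_text):
    """Read the key = value lines of a zcfg file up to its first <...> block."""
    values = {}
    for line in zcfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("<"):
            break
        m = re.match(r"(\w+)\s*=\s*(.*)", stripped)
        if m:
            values[m.group(1)] = m.group(2).strip()
    return values


def hpc_section(zcfg_text, main_cfg_text):
    """Return the main configuration section an HPC service refers to via confId."""
    header = zcfg_header(zcfg_text)
    if header.get("serviceType") != "HPC":
        raise ValueError("not an HPC service")
    conf_id = header.get("confId")
    if conf_id is None:
        return None  # the ZOO-Kernel's fallback behaviour is not covered here
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str
    cfg.read_string(main_cfg_text)
    return dict(cfg[conf_id])
```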
You can also define multiple configurations to run the OTB application
on the cluster(s) depending on the size of the inputs. In the section
corresponding to your service configuration (HPC_Sample in the example
below), define the threshold for both raster (preview_max_pixels) and
vector (preview_max_features) inputs. If the raster or vector dataset
exceeds the defined limit, the fullres_conf configuration is used;
otherwise the preview_conf configuration is used.
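The preview/fullres choice can be sketched as below. Two details are assumptions, not taken from this section: an input exactly at the limit is treated as a preview (strict greater-than), and the pixel count is taken to be width times height.

```python
import configparser


def select_conf(main_cfg_text, conf_id, pixels=0, features=0):
    """Pick fullres_conf when an input exceeds a preview limit, else preview_conf."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(main_cfg_text)
    section = cfg[conf_id]
    too_many_pixels = pixels > section.getint("preview_max_pixels")
    too_many_features = features > section.getint("preview_max_features")
    if too_many_pixels or too_many_features:
        return section["fullres_conf"]
    return section["preview_conf"]
```

With the [HPC_Sample] values from the example configuration (820800 pixels, 100000 features), a 1000 x 1000 raster would select hpc-config-1 and a 500 x 500 raster hpc-config-2.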
For each of these configurations, you have to define the parameters
used to connect to the HPC server: ssh_host, ssh_port, ssh_user and
ssh_key. You should also set where the input data will be stored on
the HPC server by defining remote_data_path (the default directory
for storing data), remote_persitent_data_path (the directory for
storing data considered as shared, see below) and remote_work_path
(the directory where the ZOO-Kernel uploads the SBATCH script it
created locally).
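Before deploying, a quick pre-flight check can confirm that each configuration section defines the connection and path parameters listed above. This is a hypothetical helper, not part of the ZOO-Project; the key names are spelled exactly as in the example configuration (including remote_persitent_data_path).

```python
import configparser

# Keys expected in each HPC configuration section, as listed above.
REQUIRED_KEYS = ("ssh_host", "ssh_port", "ssh_user", "ssh_key",
                 "remote_data_path", "remote_persitent_data_path",
                 "remote_work_path")


def missing_keys(main_cfg_text, section):
    """Return the required keys that the given configuration section lacks."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(main_cfg_text)
    return [key for key in REQUIRED_KEYS if key not in cfg[section]]
```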
There are also multiple options you can use to run your applications
with SBATCH. You can define them using jobscript_header,
jobscript_body and jobscript_footer, or by using
sbatch_options_<SBATCH_OPTION>, where <SBATCH_OPTION> should be
replaced by a real option name, like workdir in the following
example. To create the SBATCH file, the ZOO-Kernel writes, in this
order: the content of the file pointed to by jobscript_header (or a
default header if none is set); any option defined in
sbatch_options_*, plus a specific one, job-name; the content of
jobscript_body, if any (usually used to load the required modules);
the invocation of the OTB application; and finally the content of
jobscript_footer, if any.
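The assembly order described above can be sketched as follows. The default header used here is a placeholder (the ZOO-Kernel's real default header is not shown in this section), and conf stands for one configuration section read as a dictionary.

```python
PREFIX = "sbatch_options_"
DEFAULT_HEADER = "#!/bin/sh\n"  # placeholder: the real default header may differ


def with_newline(text):
    """Make sure a script fragment ends with a newline."""
    return text if text.endswith("\n") else text + "\n"


def read_optional(path):
    """Return a file's content, or None when no path is configured."""
    if not path:
        return None
    with open(path) as f:
        return f.read()


def build_sbatch(conf, job_name, otb_command):
    """Assemble an SBATCH script: header, options, job-name, body, command, footer."""
    header = read_optional(conf.get("jobscript_header")) or DEFAULT_HEADER
    parts = [with_newline(header)]
    for key, value in conf.items():
        if key.startswith(PREFIX):
            option = key[len(PREFIX):]  # e.g. sbatch_options_workdir -> workdir
            parts.append(f"#SBATCH --{option}={value}\n")
    parts.append(f"#SBATCH --job-name={job_name}\n")
    body = read_optional(conf.get("jobscript_body"))
    if body:
        parts.append(with_newline(body))  # usually module loading
    parts.append(with_newline(otb_command))
    footer = read_optional(conf.get("jobscript_footer"))
    if footer:
        parts.append(with_newline(footer))
    return "".join(parts)
```

Keeping every #SBATCH line before the body matters because sbatch stops reading #SBATCH directives at the first regular command in the script.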
Finally, remote_command_opt should contain all the fields you want
the sacct command, run by the FinalizeHPC service, to extract.
billing_nb_cpu is used for billing purposes, to define the cost of
using a specific configuration (preview or fullres).
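A sketch of how remote_command_opt could map onto an sacct invocation, assuming it is passed as the --format field list. The -j, --format, --parsable2 and --noheader options are standard sacct options, but whether FinalizeHPC uses exactly these flags is an assumption.

```python
import shlex


def sacct_command(job_id, remote_command_opt):
    """Build an sacct command line for one job from remote_command_opt."""
    return ["sacct", "-j", str(job_id),
            "--format=" + remote_command_opt,
            "--parsable2", "--noheader"]  # pipe-separated rows, no header line
```

shlex.join(sacct_command(job_id, opts)) gives a single quoted string suitable for running over an SSH session.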
In addition to the specific HPC_<ID> section and the corresponding
fullres and preview sections, you should use the shared parameter of
the [security] section to list the URLs whose downloaded data should
be considered shared. This means that, even if these resources
require authentication, any authenticated user is allowed to access
the cache file, even if it was created by somebody else. Also, unlike
the usual case, the name of a shared cache file does not contain any
authentication information.
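The effect of the shared parameter on cache file names can be illustrated with the hypothetical sketch below. It assumes shared holds comma-separated URL fragments matched as substrings, and that a cache name is an MD5 hash of the URL plus, for non-shared URLs, the authentication values; the ZOO-Kernel's actual naming scheme may differ.

```python
import hashlib


def is_shared(url, shared):
    """True when the URL contains one of the comma-separated shared entries."""
    entries = [entry.strip() for entry in shared.split(",")]
    return any(entry and entry in url for entry in entries)


def cache_file_name(url, auth_values, shared):
    """Hash the URL, adding authentication values unless the URL is shared."""
    key = url if is_shared(url, shared) else url + "".join(auth_values)
    return hashlib.md5(key.encode("utf-8")).hexdigest()
```

Two users with different cookies thus obtain the same cache name for a shared URL, and different names otherwise.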
[HPC_Sample]
preview_max_pixels=820800
preview_max_features=100000
preview_conf=hpc-config-2
fullres_conf=hpc-config-1
[hpc-config-1]
ssh_host=mycluster.org
ssh_port=22
ssh_user=cUser
ssh_key=/var/www/.ssh/id_rsa.pub
remote_data_path=/home/cUser/wps_executions/data
remote_persitent_data_path=/home/cUser/wps_executions/datap
remote_work_path=/home/cUser/wps_executions/script
jobscript_header=/usr/lib/cgi-bin/config-hpc1_header.txt
jobscript_body=/usr/lib/cgi-bin/config-hpc1_body.txt
sbatch_options_workdir=/home/cUser/wps_executions/script
sbatch_substr=Submitted batch job
billing_nb_cpu=1
remote_command_opt=AllocCPUS,AllocGRES,AllocNodes,AllocTRES,Account,AssocID,AveCPU,AveCPUFreq,AveDiskRead,AveDiskWrite,AvePages,AveRSS,AveVMSize,BlockID,Cluster,Comment,ConsumedEnergy,ConsumedEnergyRaw,CPUTime,CPUTimeRAW,DerivedExitCode,Elapsed,Eligible,End,ExitCode,GID,Group,JobID,JobIDRaw,JobName,Layout,MaxDiskRead,MaxDiskReadNode,MaxDiskReadTask,MaxDiskWrite,MaxDiskWriteNode,MaxDiskWriteTask,MaxPages,MaxPagesNode,MaxPagesTask,MaxRSS,MaxRSSNode,MaxRSSTask,MaxVMSize,MaxVMSizeNode,MaxVMSizeTask,MinCPU,MinCPUNode,MinCPUTask,NCPUS,NNodes,NodeList,NTasks,Priority,Partition,QOS,QOSRAW,ReqCPUFreq,ReqCPUFreqMin,ReqCPUFreqMax,ReqCPUFreqGov,ReqCPUS,ReqGRES,ReqMem,ReqNodes,ReqTRES,Reservation,ReservationId,Reserved,ResvCPU,ResvCPURAW,Start,State,Submit,Suspended,SystemCPU,Timelimit,TotalCPU,UID,User,UserCPU,WCKey,WCKeyID
[hpc-config-2]
ssh_host=mycluster.org
ssh_port=22
ssh_user=cUser
ssh_key=/var/www/.ssh/id_rsa.pub
remote_data_path=/home/cUser/wps_executions/data
remote_persitent_data_path=/home/cUser/wps_executions/datap
remote_work_path=/home/cUser/wps_executions/script
jobscript_header=/usr/lib/cgi-bin/config-hpc2_header.txt
jobscript_body=/usr/lib/cgi-bin/config-hpc2_body.txt
sbatch_options_workdir=/home/cUser/wps_executions/script
sbatch_substr=Submitted batch job
billing_nb_cpu=4
remote_command_opt=AllocCPUS,AllocGRES,AllocNodes,AllocTRES,Account,AssocID,AveCPU,AveCPUFreq,AveDiskRead,AveDiskWrite,AvePages,AveRSS,AveVMSize,BlockID,Cluster,Comment,ConsumedEnergy,ConsumedEnergyRaw,CPUTime,CPUTimeRAW,DerivedExitCode,Elapsed,Eligible,End,ExitCode,GID,Group,JobID,JobIDRaw,JobName,Layout,MaxDiskRead,MaxDiskReadNode,MaxDiskReadTask,MaxDiskWrite,MaxDiskWriteNode,MaxDiskWriteTask,MaxPages,MaxPagesNode,MaxPagesTask,MaxRSS,MaxRSSNode,MaxRSSTask,MaxVMSize,MaxVMSizeNode,MaxVMSizeTask,MinCPU,MinCPUNode,MinCPUTask,NCPUS,NNodes,NodeList,NTasks,Priority,Partition,QOS,QOSRAW,ReqCPUFreq,ReqCPUFreqMin,ReqCPUFreqMax,ReqCPUFreqGov,ReqCPUS,ReqGRES,ReqMem,ReqNodes,ReqTRES,Reservation,ReservationId,Reserved,ResvCPU,ResvCPURAW,Start,State,Submit,Suspended,SystemCPU,Timelimit,TotalCPU,UID,User,UserCPU,WCKey,WCKeyID
[security]
attributes=Cookie,Cookies
hosts=*
shared=myhost.net/WCS
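The sample configuration also sets sbatch_substr=Submitted batch job, which matches the line sbatch prints on a successful submission ("Submitted batch job <id>"). Presumably the ZOO-Kernel uses it to locate the job ID in the sbatch output; the sketch below assumes that behaviour.

```python
def job_id_from_sbatch(output, substr="Submitted batch job"):
    """Return the token following substr in sbatch's output, or None."""
    for line in output.splitlines():
        if substr in line:
            tail = line.split(substr, 1)[1].split()
            return tail[0] if tail else None
    return None
```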
You can see below an example of a jobscript_header file.
#!/bin/sh
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --exclusive
#SBATCH --distribution=block:block
#SBATCH --partition=partName
#SBATCH --mail-type=END # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=user@wps_server.net # Where to send mail
You can see below an example of a jobscript_body file.
# Load all the modules
module load cv-standard
module load cmake/3.6.0
module load gcc/4.9.3
module load use.own
module load OTB/6.1-serial-24threads
If you have activated the callback service, you should also have a
[callback] section, in which you define: url, the URL used to invoke
the callback service; prohibited, the list of services (if any) that
should not trigger an invocation of the callback service; and
template, pointing to the local updateExecute.xsl file used to
replace any input provided by value with a reference to the
corresponding locally published OGC WFS/WCS web service. The
resulting Execute request is provided to the callback service.
[callback]
url=http://myhost.net:port/callbackUpdate/
prohibited=FinalizeHPC,Xml2Pdf,DeleteData
template=/home/cUser/wps_dir/updateExecute.xsl
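The effect of the prohibited list can be sketched as follows. This hypothetical helper assumes the list is comma-separated, as in the example above, and treats a missing [callback] section as "callback disabled".

```python
import configparser


def should_invoke_callback(main_cfg_text, service):
    """True unless there is no [callback] section or the service is prohibited."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(main_cfg_text)
    if not cfg.has_section("callback"):
        return False
    prohibited = cfg.get("callback", "prohibited", fallback="")
    names = {name.strip() for name in prohibited.split(",") if name.strip()}
    return service not in names
```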
OGC WPS Services metadata
To produce the zcfg files containing the metadata definitions of the
WPS services, you can use the otb2zcfg tool. You will then need to
replace serviceType=OTB with serviceType=HPC and, optionally, add a
line such as confId=HPC_Sample.
Please refer to the otb2zcfg documentation to learn how to use this tool.
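The post-processing step on an otb2zcfg output can be sketched as below. This hypothetical helper assumes the zcfg file contains a single serviceType = OTB line (with or without spaces around =) and keeps that line's indentation for the added confId line.

```python
import re

SERVICE_TYPE_RE = re.compile(r"^([ \t]*)serviceType[ \t]*=[ \t]*OTB[ \t]*$", re.MULTILINE)


def otb_zcfg_to_hpc(zcfg_text, conf_id=None):
    """Switch an otb2zcfg output to serviceType=HPC, optionally adding confId."""
    if not SERVICE_TYPE_RE.search(zcfg_text):
        raise ValueError("no serviceType=OTB line found")

    def repl(match):
        indent = match.group(1)
        line = f"{indent}serviceType = HPC"
        if conf_id:
            line += f"\n{indent}confId = {conf_id}"
        return line

    return SERVICE_TYPE_RE.sub(repl, zcfg_text, count=1)
```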
With the HPC support, each output you define automatically gets one to three inner outputs:
- download_link
URL to download the generated output
- wms_link
URL to access the OGC WMS for this output (only when useMapserver=true)
- wcs_link/wfs_link
URL to access the OGC WCS or WFS for this output (only when useMapserver=true)
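The rule behind these inner outputs can be summarized by the sketch below. Choosing wfs_link for vector outputs and wcs_link for raster outputs is an assumption consistent with the list above, not something this section states explicitly.

```python
def inner_outputs(use_mapserver, is_vector):
    """List the inner outputs generated for one HPC output."""
    names = ["download_link"]  # always present
    if use_mapserver:
        names.append("wms_link")
        names.append("wfs_link" if is_vector else "wcs_link")
    return names
```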
You can see below an example of the Output node resulting from the definition of one output named out, typed as geographic imagery.
<wps:Output>
<ows:Title>Outputed Image</ows:Title>
<ows:Abstract>Image produced by the application</ows:Abstract>
<ows:Identifier>out</ows:Identifier>
<wps:Output>
<ows:Title>Download link</ows:Title>
<ows:Abstract>The download link</ows:Abstract>
<ows:Identifier>download_link</ows:Identifier>
<wps:ComplexData>
<wps:Format default="true" mimeType="image/tiff"/>
<wps:Format mimeType="image/tiff"/>
</wps:ComplexData>
</wps:Output>
<wps:Output>
<ows:Title>WMS link</ows:Title>
<ows:Abstract>The WMS link</ows:Abstract>
<ows:Identifier>wms_link</ows:Identifier>
<wps:ComplexData>
<wps:Format default="true" mimeType="image/tiff"/>
<wps:Format mimeType="image/tiff"/>
</wps:ComplexData>
</wps:Output>
<wps:Output>
<ows:Title>WCS link</ows:Title>
<ows:Abstract>The WCS link</ows:Abstract>
<ows:Identifier>wcs_link</ows:Identifier>
<wps:ComplexData>
<wps:Format default="true" mimeType="image/tiff"/>
<wps:Format mimeType="image/tiff"/>
</wps:ComplexData>
</wps:Output>
</wps:Output>