SPIM Workflow Manager For HPC

= Description =
== HEAppE middleware ==
Accessing a remote HPC cluster is often burdened by administrative overhead due to the security policies enforced by HPC centers. This barrier can be substantially lowered by employing a middleware tool based on the HPC-as-a-Service concept. To provide simple and intuitive access to the supercomputing infrastructure, an in-house application framework called High-End Application Execution (HEAppE) Middleware [http://heappe.eu] has been developed. This middleware provides HPC capabilities to users and third-party applications without the need to manage the running jobs from the command-line interface of the HPC scheduler on the cluster.
 
To facilitate access to HPC from the Fiji environment, we utilize the in-house HEAppE Middleware framework, allowing end users to access an HPC system through web services and remotely execute pre-defined tasks. HEAppE is designed to be universal and applicable to various HPC architectures. It also provides the mapping between external users and the internal cluster service accounts used for the actual job submission, which simplifies access to the computational resources from the security and administrative point of view. For security purposes, users are permitted to run only a pre-prepared set of so-called command templates. Each command template defines an arbitrary script or executable file to be run on the cluster, a set of input parameters modifiable at runtime, any dependencies or third-party software it might require, and the type of queue that should be used for the processing.
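
The sketch below is purely illustrative and does not reflect HEAppE's actual schema or API; it merely restates, as a hypothetical Python structure, the kind of information a command template carries according to the description above.

<source lang="python">
# Hypothetical illustration only -- NOT HEAppE's actual template format.
# It lists the fields described in the text: the script/executable to run,
# the parameters modifiable at runtime, the required third-party software,
# and the cluster queue to be used.
command_template = {
    "name": "spim_processing",                          # assumed template name
    "executable": "/opt/pipelines/run_spim.sh",         # script or executable run on the cluster
    "runtime_parameters": ["configFilePath", "jobName"],  # user-modifiable at submission time
    "dependencies": ["Fiji", "Snakemake", "HDF5"],      # third-party software the task requires
    "queue": "qprod",                                   # type of queue used for processing
}
</source>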
 
We developed a Fiji plugin, underlain by HEAppE, which enables users to steer workflows running on a remote HPC resource. As a representative workflow we use a Snakemake-based SPIM data processing pipeline operating on large image datasets. The Snakemake workflow engine resolves dependencies between subsequent steps and executes independent tasks in parallel, such as the processing of individual time points of a time-lapse acquisition.
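
As a minimal sketch of this mechanism (with hypothetical rule and file names, not the actual pipeline definition), the Snakemake fragment below declares one registration job per time point; because the jobs do not depend on each other, Snakemake schedules them in parallel.

<source lang="python">
# Minimal illustrative Snakefile -- hypothetical rule and file names.
TIMEPOINTS = range(1, 4)  # assumed example: three time points

rule all:
    input:
        expand("registered/tp{tp}.xml", tp=TIMEPOINTS)

# One independent job per time point; run e.g. with `snakemake --cores 3`
# and the three jobs are executed in parallel.
rule register_timepoint:
    input:
        "hdf5/tp{tp}.h5"
    output:
        "registered/tp{tp}.xml"
    shell:
        "echo 'registering time point {wildcards.tp}' && touch {output}"
</source>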
== SPIM data processing pipeline ==
SPIM ("Selective/Single Plane Illumination Microscopy") typically images living biological samples from multiple angles (views) collecting several 3D image stacks to cover the entire biological specimen. The 3D image stacks, representing one time point in a long-term time-lapse acquisition, need to be registered to each other which is typically achieved using fluorescent beads as fiduciary markers .
After the registration, the individual views within one time point need to be combined into a single output image either by content-based fusion or multi-view deconvolution [https://imagej.net/Multiview-Reconstruction]. The living specimen can move during acquisition, necessitating an intermediate step of time-lapse registration. Whereas parallel processing of individual time points has proven to be beneficial, the time-lapse registration takes only a few seconds and can therefore be performed on a single computing node without the need for parallelization.
The sheer volume of SPIM data requires conversion from the raw microscopy format to the Hierarchical Data Format (HDF5) for efficient input/output access and visualization in Fiji's BigDataViewer (BDV) [https://imagej.net/BigDataViewer#Publication]. BDV uses an XML file to store experiment metadata (e.g. the number of angles, time points, and channels). Although the conversion to HDF5 is a parallelizable procedure, updating the XML file further downstream in the pipeline is not: per-time-point XML files have to be created and then merged after completion of the registration and fusion steps. Consequently, the parallel processing of individual time points on an HPC resource (conversion to HDF5, registration, fusion and deconvolution) is interrupted by non-parallelizable steps (time-lapse registration and XML merging).
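
The hypothetical Snakemake fragment below sketches this interruption: the per-time-point conversion can run in parallel, whereas the XML merge aggregates all per-time-point metadata files and therefore runs as a single, non-parallel job (rule and file names are illustrative only).

<source lang="python">
# Illustrative sketch only -- hypothetical rule and file names.
TIMEPOINTS = range(1, 4)

# Parallelizable: each raw time point is converted to HDF5 independently,
# producing its own XML metadata file.
rule resave_to_hdf5:
    input:
        "raw/tp{tp}.czi"
    output:
        h5="hdf5/tp{tp}.h5",
        xml="hdf5/tp{tp}.xml"
    shell:
        "echo 'converting {input}' && touch {output.h5} {output.xml}"

# Non-parallelizable: a single job waits for all per-time-point XML files
# and merges them into one experiment-level XML.
rule merge_xml:
    input:
        expand("hdf5/tp{tp}.xml", tp=TIMEPOINTS)
    output:
        "hdf5/dataset.xml"
    shell:
        "cat {input} > {output}"
</source>
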
Pipeline input parameters are entered by the user into a config.yaml configuration file. In the first step, the .czi raw data are resaved into the HDF5 container in parallel on the cluster. Similarly, the individual time points are registered in parallel using fluorescent beads as fiduciary markers. Subsequently, a non-parallel job executed by Snakemake consolidates the registration XML files into a single one, followed by time-lapse registration using the beads segmented during the spatial registration step. After this, the pipeline diverges into either parallel content-based fusion or parallel multi-view deconvolution. To achieve this divergence in practice, the Snakemake pipeline is launched from the Fiji plugin as two separate jobs using two different config.yaml files, set to content-based fusion and deconvolution respectively. In the final stage of the pipeline, the fusion/deconvolution output is saved into a new HDF5 container. The figure shows the results of registration, fusion and deconvolution at different time points of a time-lapse acquisition.
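
A hypothetical sketch of how such a divergence can be driven by the configuration file is shown below; the key fusion_method and all file names are assumptions made for illustration and do not describe the pipeline's actual config.yaml schema.

<source lang="python">
# Illustrative sketch only -- hypothetical configuration key and file names.
# In this scheme the pipeline would be launched twice with different
# configuration files, e.g.:
#   snakemake --configfile config_fusion.yaml --cores 16
#   snakemake --configfile config_deconvolution.yaml --cores 16
configfile: "config.yaml"  # assumed to contain e.g.  fusion_method: "content-based"

rule fuse_timepoint:
    input:
        "registered/tp{tp}.xml"
    output:
        "output/" + config["fusion_method"] + "/tp{tp}.h5"
    params:
        method=config["fusion_method"]
    shell:
        "echo 'fusing time point {wildcards.tp} using {params.method}' && touch {output}"
</source>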
= Installation =