ZooConf User Guide¶
The zoo_conf module provides essential utilities for working with ZOO-Project configuration and CWL workflows. This guide covers practical usage patterns.
Overview¶
The module contains four main classes:
- ZooConf: Configuration handler
- ZooInputs: Input parameter parser with type conversion
- ZooOutputs: Output handler
- CWLWorkflow: CWL workflow parser and analyzer
Working with ZooConf¶
ZooConf is a simple wrapper around the ZOO-Project configuration dictionary.
Basic Usage¶
from zoo_conf import ZooConf
def my_service(conf, inputs, outputs):
zoo_conf = ZooConf(conf)
# Get the workflow identifier
workflow_id = zoo_conf.workflow_id # From conf["lenv"]["Identifier"]
# Access underlying conf dict
tmp_path = zoo_conf.conf["main"]["tmpPath"]
return 3
Handling Inputs with ZooInputs¶
ZooInputs handles the complexity of ZOO input parsing, including type conversion and array handling.
Type Conversion¶
ZOO-Project passes all values as strings. ZooInputs automatically converts them to appropriate Python types:
from zoo_conf import ZooInputs
inputs = {
"threshold": {
"dataType": "float",
"value": "0.85"
},
"max_iterations": {
"dataType": "integer",
"value": "100"
},
"enable_cache": {
"dataType": "boolean",
"value": "1"
}
}
zoo_inputs = ZooInputs(inputs)
params = zoo_inputs.get_processing_parameters()
# Result:
# {
# 'threshold': 0.85, # float
# 'max_iterations': 100, # int
# 'enable_cache': 1 # int (boolean)
# }
File Inputs¶
File inputs are converted to CWL File objects:
inputs = {
"input_raster": {
"cache_file": "/tmp/zTMPykgmh8/data.tif",
"mimeType": "image/tiff"
},
"config_file": {
"cache_file": "/tmp/zTMPykgmh8/config.json",
"mimeType": "application/json"
}
}
zoo_inputs = ZooInputs(inputs)
params = zoo_inputs.get_processing_parameters()
# Result:
# {
# 'input_raster': {
# 'class': 'File',
# 'path': '/tmp/zTMPykgmh8/data.tif',
# 'format': 'image/tiff'
# },
# 'config_file': {
# 'class': 'File',
# 'path': '/tmp/zTMPykgmh8/config.json',
# 'format': 'application/json'
# }
# }
Array Inputs¶
ZOO-Project has a quirk: arrays with one element are converted to strings. ZooInputs automatically handles this:
inputs = {
"bands": {
"maxOccurs": "3",
"value": "B02" # Single value, but should be array
}
}
zoo_inputs = ZooInputs(inputs)
# Automatically converted to array:
# inputs["bands"]["value"] = ["B02"]
Getting Individual Input Values¶
zoo_inputs = ZooInputs(inputs)
# Get specific input
try:
threshold = zoo_inputs.get_input_value("threshold")
except KeyError:
print("threshold parameter not provided")
Handling Outputs with ZooOutputs¶
ZooOutputs simplifies setting service output values.
Basic Output¶
from zoo_conf import ZooOutputs
def my_service(conf, inputs, outputs):
# ... processing ...
zoo_outputs = ZooOutputs(outputs)
# Set the result
zoo_outputs.set_output("/tmp/results/catalog.json")
return 3
Multiple Outputs¶
outputs = {
"result": {},
"metadata": {}
}
zoo_outputs = ZooOutputs(outputs)
# Sets the primary output (first key)
zoo_outputs.set_output("/tmp/result.tif")
# For additional outputs, set directly
outputs["metadata"]["value"] = "/tmp/metadata.json"
Getting Output Parameters¶
zoo_outputs = ZooOutputs(outputs)
output_params = zoo_outputs.get_output_parameters()
# {'result': '/tmp/result.tif', 'metadata': '/tmp/metadata.json'}
Working with CWL Workflows¶
CWLWorkflow provides utilities for parsing and analyzing CWL workflows.
Loading a Workflow¶
from zoo_conf import CWLWorkflow
# Load from dict
cwl_dict = {
"cwlVersion": "v1.0",
"$graph": [
{
"class": "Workflow",
"id": "main-workflow",
"label": "My Workflow",
"doc": "Process satellite imagery",
"inputs": [...],
"outputs": [...],
"steps": [...]
}
],
"s:softwareVersion": "1.2.0"
}
workflow = CWLWorkflow(cwl_dict, "main-workflow")
Getting Workflow Metadata¶
# Version
version = workflow.get_version() # "1.2.0"
# Label
label = workflow.get_label() # "My Workflow"
# Documentation
doc = workflow.get_doc() # "Process satellite imagery"
Analyzing Workflow Inputs¶
# Get all inputs
all_inputs = workflow.get_workflow_inputs()
# ['input_data', 'threshold', 'bands', 'output_format']
# Get only mandatory inputs (no default values)
required_inputs = workflow.get_workflow_inputs(mandatory=True)
# ['input_data']
# Validate that required inputs are provided
zoo_inputs = ZooInputs(inputs)
for required in required_inputs:
try:
zoo_inputs.get_input_value(required)
except KeyError:
raise ValueError(f"Missing required input: {required}")
Resource Requirements¶
Analyze and aggregate resource requirements across the workflow:
resources = workflow.eval_resource()
# Result:
# {
# 'coresMin': [2, 4, 1], # Requirements from each step
# 'coresMax': [4, 8, 2],
# 'ramMin': [2048, 4096, 512],
# 'ramMax': [4096, 8192, 1024],
# 'tmpdirMin': [1024],
# 'tmpdirMax': [2048],
# 'outdirMin': [512],
# 'outdirMax': [1024]
# }
# Calculate totals
total_cores_min = sum(resources['coresMin']) # 7
total_ram_min = sum(resources['ramMin']) # 6656 MB
# Use maximum values for resource allocation
max_cores = max(resources['coresMax']) # 8
max_ram = max(resources['ramMax']) # 8192 MB
print(f"Request {max_cores} cores and {max_ram} MB RAM")
Scatter Operations¶
The resource evaluator automatically accounts for scatter operations:
import os
# Set scatter multiplier (default: 2)
os.environ["SCATTER_MULTIPLIER"] = "3"
workflow = CWLWorkflow(cwl_dict, "main-workflow")
resources = workflow.eval_resource()
# Steps with scatter will have their resources multiplied by 3
Complete Service Example¶
Here's a complete ZOO service using all the utilities:
from zoo_conf import ZooConf, ZooInputs, ZooOutputs, CWLWorkflow
from base_runner import BaseRunner
def process_imagery(conf, inputs, outputs):
"""
ZOO Service for processing satellite imagery
"""
try:
# Parse configuration
zoo_conf = ZooConf(conf)
workflow_id = zoo_conf.workflow_id
# Load CWL workflow
cwl_path = f"/opt/zooservices/{workflow_id}/app-package.cwl"
with open(cwl_path) as f:
import yaml
cwl_dict = yaml.safe_load(f)
workflow = CWLWorkflow(cwl_dict, workflow_id)
# Validate inputs
required_inputs = workflow.get_workflow_inputs(mandatory=True)
zoo_inputs = ZooInputs(inputs)
for required in required_inputs:
try:
zoo_inputs.get_input_value(required)
except KeyError:
conf["lenv"]["message"] = f"Missing required input: {required}"
return 4 # SERVICE_FAILED
# Get processing parameters
params = zoo_inputs.get_processing_parameters()
# Evaluate resources
resources = workflow.eval_resource()
total_ram = sum(resources['ramMin'])
conf["lenv"]["message"] = f"Workflow requires {total_ram} MB RAM"
# Create runner
runner = BaseRunner(
cwl_path=cwl_path,
input_params=params,
conf=conf
)
# Execute
result = runner.execute()
# Set outputs
zoo_outputs = ZooOutputs(outputs)
zoo_outputs.set_output(result["stac_catalog"])
return 3 # SERVICE_SUCCEEDED
except Exception as e:
conf["lenv"]["message"] = str(e)
return 4 # SERVICE_FAILED
Best Practices¶
1. Always Use ZooInputs for Type Conversion¶
# ✅ DO: Use ZooInputs for automatic type conversion
zoo_inputs = ZooInputs(inputs)
params = zoo_inputs.get_processing_parameters()
# ❌ DON'T: Manual parsing (error-prone)
threshold = float(inputs["threshold"]["value"])
2. Validate Mandatory Inputs¶
# ✅ DO: Check for required inputs
workflow = CWLWorkflow(cwl_dict, workflow_id)
required = workflow.get_workflow_inputs(mandatory=True)
zoo_inputs = ZooInputs(inputs)
for inp in required:
try:
zoo_inputs.get_input_value(inp)
except KeyError:
return 4 # SERVICE_FAILED
3. Use Resource Evaluation¶
# ✅ DO: Evaluate and communicate resource requirements
resources = workflow.eval_resource()
total_ram = sum(resources['ramMin'])
conf["lenv"]["message"] = f"Requires {total_ram} MB RAM"
4. Handle Arrays Correctly¶
# ✅ DO: Let ZooInputs handle array conversion
zoo_inputs = ZooInputs(inputs) # Handles single-value arrays automatically
# ❌ DON'T: Assume inputs["array"]["value"] is always a list
Troubleshooting¶
Array Input Not Recognized¶
Problem: Single-value array becomes a string
Solution: Use ZooInputs - it automatically handles this:
Type Conversion Errors¶
Problem: String values not converting to numbers
Solution: Ensure dataType is set in input definition:
Resource Evaluation Returns Empty Lists¶
Problem: eval_resource() returns empty lists
Solution: Ensure CWL workflow has ResourceRequirement in requirements or hints: