This documentation runs through an example using a script to effectively submit and manage large batches of molecular dynamics simulation workloads to Cedana from pdb files stored in AWS S3 storage.
Prerequisites
Before running the script, ensure you have the following:
AWS CLI installed and configured with aws configure to access the specified S3 bucket
Cedana CLI must be installed
A valid workload template file (workload.yml) containing placeholders (WORKING_DIR and JOB_NAME) in the same directory as the script
Proper directory structure in your S3 bucket with molecular dynamics simulation files
Script Configuration
#!/bin/bash
BUCKET_NAME="your-bucket-name"
PREFIX="gromacs_test/"
WORKLOAD_CONFIG="./workload.yml"
TEMP_WORKLOAD_CONFIG="./tmp-workload.yml"
# Sanitize job names for Kubernetes
sanitize_job_name() {
local name="md-simul-${1//_/-}" # Replace _ with -
name=$(echo "$name" | tr '[:upper:]' '[:lower:]') # Convert to lowercase
name=$(echo "$name" | sed 's/--*/-/g' | sed 's/-$//') # Remove double hyphens & trailing hyphen
echo "$name"
}
# Get folders from S3
FOLDERS=$(aws s3 ls "s3://$BUCKET_NAME/$PREFIX" --recursive | awk '{print $NF}' | awk -F'/' 'NF>2 {print $2}' | sort -u)
for folder in $FOLDERS; do
echo "Processing folder: $folder"
JOB_NAME=$(sanitize_job_name "$folder")
# Create a temporary workload with right complex.pdb file by replacing placeholders
sed -e "s|WORKING_DIR|$folder|g" -e "s|JOB_NAME|$JOB_NAME|g" "$WORKLOAD_CONFIG" > "$TEMP_WORKLOAD_CONFIG"
# Submit job
cedana-cli create workload --payload "$TEMP_WORKLOAD_CONFIG" --contentType yaml
rm -f "$TEMP_WORKLOAD_CONFIG"
done
The script uses the following default values that you may need to adjust:
BUCKET_NAME="customer-simulation": The S3 bucket containing simulation data
PREFIX="gromacs_test/": The prefix path within the bucket to search for simulation folders
WORKLOAD_CONFIG="./workload.yml": Path to the template workload configuration