Multi-Runtime AI-HPC Middleware

Hybrid AI-HPC Workflows

Fast and easy parallel execution that scales. Run simulations, inference, and training concurrently on leadership-class HPC systems.

10,000+
Tasks per Second
350K
Tokens per Second
1,024
Nodes Tested at Scale

What you can do with RHAPSODY

Concurrent execution of heterogeneous AI-HPC workloads through uniform abstractions

Heterogeneous Workloads

Execute diverse task types concurrently: MPI simulations, GPU kernels, CPU analytics, and Python functions. RHAPSODY sustains up to 6 distinct task types without artificial phase separation, maintaining 95% GPU utilization.

Supports: Dragon, Flux, RADICAL-Pilot
from rhapsody import Session, AITask, ComputeTask

# execution_backend and inference_backend are assumed to be
# initialized backend instances (e.g., Dragon, Flux, RADICAL-Pilot)

# Define heterogeneous AI-HPC tasks
tasks = [
    AITask(
        prompt='What is the capital of France?',
        backend=inference_backend.name
    ),
    ComputeTask(
        executable='/usr/bin/echo',
        arguments=['Hello from Dragon!'],
        backend=execution_backend.name
    )
]

# Initialize the session with both backends and execute
session = Session([execution_backend, inference_backend])
results = session.submit(tasks)

High-Throughput Inference

Deploy persistent vLLM services with intelligent request routing. Process 350K+ tokens per second with balanced load distribution across multi-node inference clusters. Near-linear scaling demonstrated up to 8 GPU nodes.

Integrates with: vLLM, PyTorch, DeepSpeed (upcoming)
from rhapsody import Session
from rhapsody.backends import DragonVllmInferenceBackend

# Launch a multi-node vLLM service (32 nodes, tensor parallelism of 4)
vllm = DragonVllmInferenceBackend(
    model="meta-llama/Llama-2-70b",
    nodes=32,
    tp=4
)
session = Session([vllm])

# Process 8,000 prompts with load balancing across the service
prompts = ["Analyze..." for _ in range(8000)]
results = session.submit_tasks(prompts)

Integrations: Coupled AI-HPC

Low-latency data exchange between simulations and AI components. Memory-based coupling achieves 50% faster execution than filesystem methods, with only 0.08 ms latency for PUT/GET operations.

Data Exchange: SmartRedis, Redis
from radical.asyncflow import WorkflowManager

# compute_state, model, and redis_client (a SmartRedis client) are
# assumed to be defined elsewhere in the application

# Create a coupled simulation-inference workflow
flow = WorkflowManager()

# Simulation task writes its state to Redis
@flow.function_task
async def simulation(step):
    state = compute_state(step)
    redis_client.put_tensor(f"state_{step}", state)

# AI task reads the state from Redis for inference
@flow.function_task
async def ai_inference(step):
    state = redis_client.get_tensor(f"state_{step}")
    return model.predict(state)
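
The two tasks can then be chained step by step. A minimal usage sketch, assuming decorated asyncflow tasks can be awaited like ordinary coroutines; the exact execution API may differ, so treat this as illustrative:

import asyncio

async def run_coupled_loop(n_steps):
    # Alternate simulation and inference, coupling each step through Redis
    for step in range(n_steps):
        await simulation(step)                  # writes state_{step} to Redis
        prediction = await ai_inference(step)   # reads it back and predicts
        print(f"step {step}: {prediction}")

asyncio.run(run_coupled_loop(10))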

Integrations: Agentic Workflows

Support LLM-driven decision making through the Flowgentic integration. Maintain bounded lag between agent decisions and HPC task execution at scale. Tested with 49,000+ agents managing dynamic workflow orchestration.

Frameworks: Flowgentic, LangGraph
from rhapsody import Session
from flowgentic import Agent

# simulation_tool, analysis_tool, execution_backend, create_task,
# update_state, and the initial current_state are assumed to be
# defined elsewhere in the application

# AI agent dynamically plans the workflow
agent = Agent(
    model="gpt-4",
    tools=[simulation_tool, analysis_tool]
)
session = Session([execution_backend])

# The agent decides which HPC tasks to run at each step
for step in range(100):
    decision = agent.decide(current_state)
    task = create_task(decision)
    result = session.submit(task)
    current_state = update_state(result)

Performance at Scale

Tested on leadership-class HPC systems including OLCF Frontier, NERSC Perlmutter, and Purdue Anvil

Fast Execution

RHAPSODY introduces minimal overhead, achieving a throughput of 11,000+ tasks per second with less than 300 microseconds of per-task overhead on distributed workers.

11K tasks/sec · <300µs overhead
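
A minimal sketch of how such a throughput measurement could look, reusing the Session and ComputeTask API from the example above; the no-op payload, task count, and timing harness are illustrative assumptions, not the published benchmark:

import time
from rhapsody import Session, ComputeTask

# execution_backend is assumed to be an initialized backend instance
session = Session([execution_backend])

# Submit a large batch of no-op tasks and time end-to-end completion
n = 100_000
tasks = [
    ComputeTask(executable='/bin/true', backend=execution_backend.name)
    for _ in range(n)
]

start = time.perf_counter()
results = session.submit(tasks)
elapsed = time.perf_counter() - start

print(f"throughput: {n / elapsed:,.0f} tasks/s")
print(f"per-task overhead: {elapsed / n * 1e6:.0f} µs")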

Sustained Heterogeneity

Concurrently execute up to 6 distinct task types spanning MPI simulations, GPU kernels, and Python functions without runtime-imposed phase separation.

6 task types · 95% GPU utilization

Inference Scaling

Near-linear scaling demonstrated for high-throughput inference workloads, processing 350,000 tokens per second across 8 GPU nodes with intelligent load balancing.

350K tokens/sec · 8x scaling

Efficient Coupling

Memory-based coupling between AI and HPC tasks achieves 50% faster execution than filesystem methods with minimal data transfer overhead.

0.08ms latency · 50% faster
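
A hedged sketch of how the two coupling paths can be compared, pairing the SmartRedis put_tensor/get_tensor calls from the coupling example with a NumPy file round-trip; the Redis address and tensor size are illustrative assumptions:

import time
import numpy as np
from smartredis import Client

state = np.random.rand(1024, 1024)

# Memory-based coupling: PUT/GET through a running Redis instance
client = Client(address="127.0.0.1:6379", cluster=False)
t0 = time.perf_counter()
client.put_tensor("state_0", state)
_ = client.get_tensor("state_0")
mem_s = time.perf_counter() - t0

# Filesystem coupling: round-trip the same state through a shared file
t0 = time.perf_counter()
np.save("state_0.npy", state)
_ = np.load("state_0.npy")
fs_s = time.perf_counter() - t0

print(f"memory: {mem_s * 1e3:.2f} ms  filesystem: {fs_s * 1e3:.2f} ms")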

Where you can run RHAPSODY

Deploy on local and HPC systems with no platform-specific customization required

Leadership-Class HPC

Run RHAPSODY on supercomputers like OLCF Frontier, NERSC Perlmutter, and other leadership-class systems. Integrates with Slurm, PBS Pro, and other HPC schedulers.

OLCF Frontier, NERSC Perlmutter, Purdue Anvil
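
A hedged sketch of sizing a backend from inside a Slurm allocation, reading the standard SLURM_JOB_NUM_NODES environment variable; the backend and its parameters follow the vLLM example above and may differ on your system:

import os
from rhapsody import Session
from rhapsody.backends import DragonVllmInferenceBackend

# Size the inference service to the nodes Slurm allocated to this job
n_nodes = int(os.environ.get("SLURM_JOB_NUM_NODES", "1"))

vllm = DragonVllmInferenceBackend(
    model="meta-llama/Llama-2-70b",
    nodes=n_nodes,
    tp=4
)
session = Session([vllm])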

American Science Cloud

Deploy on the Genesis Mission Platform, a federated DOE infrastructure for building and using scientific foundation models across distributed sites.

Genesis Platform, DOE Facilities

Super easy to get started

Install RHAPSODY and start scaling your AI-HPC workflows today

Install Now Read Documentation