Multi-Runtime AI-HPC Middleware

Hybrid AI-HPC Workflows

Fast and easy parallel execution that scales. Run simulations, inference, and training concurrently on leadership-class HPC systems.

10,000+
Tasks per Second
350K
Tokens per Second
1,024
Nodes Tested at Scale

What you can do with RHAPSODY

Concurrent execution of heterogeneous AI-HPC workloads through uniform abstractions

Heterogeneous Workloads

Execute diverse task types concurrently: MPI simulations, GPU kernels, CPU analytics, and Python functions. RHAPSODY sustains up to 6 distinct task types without artificial phase separation, maintaining 95% GPU utilization.

Supports: Dragon, Flux, RADICAL-Pilot
from rhapsody import Session, AITask, ComputeTask

# execution_backend and inference_backend are assumed to be
# initialized backend instances (e.g., Dragon, Flux, RADICAL-Pilot)

# Define heterogeneous AI-HPC tasks
tasks = [
    AITask(
        prompt='What is the capital of France?',
        backend=inference_backend.name
    ),
    ComputeTask(
        executable='/usr/bin/echo',
        arguments=['Hello from Dragon!'],
        backend=execution_backend.name
    )
]

# Initialize the session with both backends and execute
session = Session([execution_backend, inference_backend])
results = session.submit(tasks)

High-Throughput Inference

Deploy persistent vLLM services with intelligent request routing. Process 350K+ tokens per second with balanced load distribution across multi-node inference clusters. Near-linear scaling demonstrated up to 8 GPU nodes.

Integrates with: vLLM, PyTorch, DeepSpeed (upcoming)
from rhapsody import Session
from rhapsody.backends import DragonVllmInferenceBackend

# Launch a multi-node vLLM service (32 nodes, tensor parallelism of 4)
vllm = DragonVllmInferenceBackend(
    model="meta-llama/Llama-2-70b",
    nodes=32,
    tp=4
)
session = Session([vllm])

# Process 8,000 prompts with load balancing across the service
prompts = ["Analyze..." for _ in range(8000)]
results = session.submit_tasks(prompts)

Integrations: Coupled AI-HPC

Low-latency data exchange between simulations and AI components. Memory-based coupling achieves 50% faster execution than filesystem methods, with only 0.08 ms latency for PUT/GET operations.

Data Exchange: SmartRedis, Redis
from radical.asyncflow import WorkflowManager

# compute_state, model, and redis_client (a SmartRedis client) are
# assumed to be defined elsewhere in the application

# Create a coupled simulation-inference workflow
flow = WorkflowManager()

# Simulation task writes its state to Redis
@flow.function_task
async def simulation(step):
    state = compute_state(step)
    redis_client.put_tensor(f"state_{step}", state)

# AI task reads the state from Redis for inference
@flow.function_task
async def ai_inference(step):
    state = redis_client.get_tensor(f"state_{step}")
    return model.predict(state)
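
The two tasks can then be chained step by step. A minimal usage sketch, assuming decorated asyncflow tasks can be awaited like ordinary coroutines; the exact execution API may differ, so treat this as illustrative:

import asyncio

async def run_coupled_loop(n_steps):
    # Alternate simulation and inference, coupling each step through Redis
    for step in range(n_steps):
        await simulation(step)                  # writes state_{step} to Redis
        prediction = await ai_inference(step)   # reads it back and predicts
        print(f"step {step}: {prediction}")

asyncio.run(run_coupled_loop(10))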

Integrations: Agentic Workflows

Support LLM-driven decision making through the Flowgentic integration. Maintain bounded lag between agent decisions and HPC task execution at scale. Tested with 49,000+ agents managing dynamic workflow orchestration.

Frameworks: Flowgentic, LangGraph
from rhapsody import Session
from flowgentic import Agent

# simulation_tool, analysis_tool, execution_backend, create_task,
# update_state, and the initial current_state are assumed to be
# defined elsewhere in the application

# AI agent dynamically plans the workflow
agent = Agent(
    model="gpt-4",
    tools=[simulation_tool, analysis_tool]
)
session = Session([execution_backend])

# The agent decides which HPC tasks to run at each step
for step in range(100):
    decision = agent.decide(current_state)
    task = create_task(decision)
    result = session.submit(task)
    current_state = update_state(result)

Performance at Scale

Tested on leadership-class HPC systems including OLCF Frontier, NERSC Perlmutter, and Purdue Anvil

Fast Execution

RHAPSODY introduces minimal overhead, achieving a throughput of 11,000+ tasks per second with less than 300 microseconds of per-task overhead on distributed workers.

11K tasks/sec · <300µs overhead
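
A minimal sketch of how such a throughput measurement could look, reusing the Session and ComputeTask API from the example above; the no-op payload, task count, and timing harness are illustrative assumptions, not the published benchmark:

import time
from rhapsody import Session, ComputeTask

# execution_backend is assumed to be an initialized backend instance
session = Session([execution_backend])

# Submit a large batch of no-op tasks and time end-to-end completion
n = 100_000
tasks = [
    ComputeTask(executable='/bin/true', backend=execution_backend.name)
    for _ in range(n)
]

start = time.perf_counter()
results = session.submit(tasks)
elapsed = time.perf_counter() - start

print(f"throughput: {n / elapsed:,.0f} tasks/s")
print(f"per-task overhead: {elapsed / n * 1e6:.0f} µs")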

Sustained Heterogeneity

Concurrently execute up to 6 distinct task types spanning MPI simulations, GPU kernels, and Python functions without runtime-imposed phase separation.

6 task types · 95% GPU utilization

Inference Scaling

Near-linear scaling demonstrated for high-throughput inference workloads, processing 350,000 tokens per second across 8 GPU nodes with intelligent load balancing.

350K tokens/sec · 8x scaling

Efficient Coupling

Memory-based coupling between AI and HPC tasks achieves 50% faster execution than filesystem methods with minimal data transfer overhead.

0.08ms latency · 50% faster
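
A hedged sketch of how the two coupling paths can be compared, pairing the SmartRedis put_tensor/get_tensor calls from the coupling example with a NumPy file round-trip; the Redis address and tensor size are illustrative assumptions:

import time
import numpy as np
from smartredis import Client

state = np.random.rand(1024, 1024)

# Memory-based coupling: PUT/GET through a running Redis instance
client = Client(address="127.0.0.1:6379", cluster=False)
t0 = time.perf_counter()
client.put_tensor("state_0", state)
_ = client.get_tensor("state_0")
mem_s = time.perf_counter() - t0

# Filesystem coupling: round-trip the same state through a shared file
t0 = time.perf_counter()
np.save("state_0.npy", state)
_ = np.load("state_0.npy")
fs_s = time.perf_counter() - t0

print(f"memory: {mem_s * 1e3:.2f} ms  filesystem: {fs_s * 1e3:.2f} ms")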

Where you can run RHAPSODY

Deploy on local and HPC systems with no platform-specific customization required

Leadership-Class HPC

Run RHAPSODY on supercomputers like OLCF Frontier, NERSC Perlmutter, and other leadership-class systems. Integrates with Slurm, PBS Pro, and other HPC schedulers.

OLCF Frontier, NERSC Perlmutter, Purdue Anvil
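
A hedged sketch of sizing a backend from inside a Slurm allocation, reading the standard SLURM_JOB_NUM_NODES environment variable; the backend and its parameters follow the vLLM example above and may differ on your system:

import os
from rhapsody import Session
from rhapsody.backends import DragonVllmInferenceBackend

# Size the inference service to the nodes Slurm allocated to this job
n_nodes = int(os.environ.get("SLURM_JOB_NUM_NODES", "1"))

vllm = DragonVllmInferenceBackend(
    model="meta-llama/Llama-2-70b",
    nodes=n_nodes,
    tp=4
)
session = Session([vllm])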

American Science Cloud

Deploy on the Genesis Mission Platform, a federated DOE infrastructure for building and using scientific foundation models across distributed sites.

Genesis Platform, DOE Facilities

Super easy to get started

Install RHAPSODY and start scaling your AI-HPC workflows today

Install Now Read Documentation