Architecture¶
How OpenModal works under the hood — from `openmodal run` to a running container on your cloud.
The big picture¶
OpenModal is an orchestration layer on top of Kubernetes. Your code gets packaged into a Docker image, pushed to a container registry, and runs as a Kubernetes pod on your cloud.
```mermaid
graph LR
    Code[your code] --> Image[Docker image] --> Pod[K8s pod] --> Result[result]
```
Under the hood, three systems are involved:
```mermaid
graph TB
    subgraph Your Machine
        CLI[openmodal CLI]
    end
    subgraph Container Registry
        Image[Docker Image]
    end
    subgraph Kubernetes Cluster
        API[K8s API Server]
        Scheduler[Scheduler]
        Node1[Node 1]
        Node2[Node 2]
        Node3[GPU Node]
    end
    CLI -->|1. build & push| Image
    CLI -->|2. create pod| API
    API --> Scheduler
    Scheduler --> Node1
    Scheduler --> Node2
    Scheduler --> Node3
    Image -.->|3. pull| Node1
    Image -.->|3. pull| Node3
```
- Container Registry (Artifact Registry, ECR, or ACR) — stores your Docker images
- K8s API Server — accepts pod creation requests
- Scheduler — places pods on nodes with enough CPU, memory, and GPUs
What happens when you run `openmodal run app.py`¶
```mermaid
sequenceDiagram
    participant You
    participant CLI as openmodal CLI
    participant Registry as Container Registry
    participant K8s as K8s API
    participant Pod
    You->>CLI: openmodal run app.py
    CLI->>CLI: Generate Dockerfile from Image chain
    CLI->>Registry: Build & push image
    CLI->>K8s: Create pod with image
    K8s->>Pod: Schedule on node, pull image, start
    Pod->>Pod: Unpickle args → call function → pickle result
    Pod-->>CLI: Return result
    CLI-->>You: Print return value
```
Step by step¶
1. Image build. OpenModal reads the `Image` chain (`.apt_install()`, `.pip_install()`, etc.) and generates a Dockerfile. It builds the image and pushes it to a registry.
| Provider | Registry | Build method |
|---|---|---|
| GCP | Artifact Registry | Cloud Build (remote) |
| AWS | ECR | Local docker build + push |
| Azure | ACR | ACR Tasks (remote) |
| Local | None | Local docker build |
GCP and Azure build images remotely — no local Docker needed. AWS uses local Docker because CodeBuild requires admin IAM permissions.
2. Pod creation. OpenModal creates a Kubernetes pod spec with your image, resource requests, GPU requirements, env vars, and volumes, then submits it to the K8s API.
3. Scheduling. The scheduler finds a node with enough free resources. If nothing fits, the pod stays Pending and the cluster autoscaler adds a new node (see Cluster autoscaling).
4. Image pull. The node pulls the image from the registry. First pull is slow (2-30s depending on image size). Subsequent pulls on the same node use cached layers.
5. Execution. The container runs the OpenModal agent, which unpickles your function arguments, calls your function, pickles the result, and sends it back (see Remote function execution).
Image building¶
The `Image` class is a chainable Dockerfile generator. Each method call appends a line to the Dockerfile:
```python
image = (
    openmodal.Image.debian_slim()          # FROM ubuntu:24.04 + python 3.12
    .apt_install("git", "curl")            # RUN apt-get install -y git curl
    .pip_install("torch", "transformers")  # RUN pip install torch transformers
    .run_commands("echo setup done")       # RUN echo setup done
)
```
This generates:
```dockerfile
FROM ubuntu:24.04
ENV DEBIAN_FRONTEND=noninteractive
RUN curl -sSL <python-build-standalone-url> | tar xz -C /usr/local ...
RUN apt-get update && apt-get install -y git curl ...
RUN pip install torch transformers
RUN echo setup done
RUN pip install openmodal
COPY your_app.py /opt/your_app.py
CMD ["python", "-m", "openmodal.runtime.agent"]
```
Python is installed via python-build-standalone (pre-compiled binaries from Astral). This means any Python version (3.10–3.13) works on any base image — you're not tied to the distro's Python.
Image caching¶
Images are content-hashed. If the Dockerfile and source files haven't changed, OpenModal skips the build entirely and reuses the existing image from the registry.
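The caching check can be sketched roughly like this (a minimal sketch — `image_content_hash` and the 16-character tag are illustrative, not OpenModal's actual internals):

```python
import hashlib
from pathlib import Path

def image_content_hash(dockerfile: str, source_files: list[Path]) -> str:
    """Hash the Dockerfile plus all source files.

    Identical inputs always produce the identical tag, so an unchanged
    build can be skipped by checking the registry for that tag."""
    h = hashlib.sha256()
    h.update(dockerfile.encode())
    for path in sorted(source_files):
        h.update(path.name.encode())   # include filename so renames invalidate
        h.update(path.read_bytes())    # include contents so edits invalidate
    return h.hexdigest()[:16]
```

If an image tagged with this hash already exists in the registry, the build and push are skipped and that image is reused.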
Sandboxes¶
Sandboxes are long-running containers you can exec commands into — like SSH-ing into a machine. They're used by coding agents (CooperBench, Harbor/SWE-bench) that need to run bash commands, edit files, and run tests inside a codebase.
```mermaid
sequenceDiagram
    participant Agent as Your Code
    participant K8s as K8s API
    participant Pod as Sandbox Pod
    Agent->>K8s: Sandbox.create(image=..., timeout=300)
    K8s->>Pod: Start pod running "sleep 300"
    Agent->>Pod: sandbox.exec("git diff")
    Pod-->>Agent: stdout, stderr, returncode
    Agent->>Pod: sandbox.exec("python test.py")
    Pod-->>Agent: stdout, stderr, returncode
    Agent->>K8s: sandbox.terminate()
    K8s->>Pod: Delete pod
```
The pod runs `sleep <timeout>` as its main process — this keeps the container alive while you exec commands into it. Each exec call runs a separate process inside the same container, sharing the same filesystem. Under the hood, `exec_in_pod` uses the Kubernetes exec API (a websocket to the kubelet). On local Docker, it's just `docker exec`.
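The exec model — independent processes sharing one filesystem — can be illustrated locally with `subprocess` (a stand-in sketch; the real path goes through the Kubernetes exec API or `docker exec`):

```python
import subprocess
import tempfile

def exec_local(cmd: str, cwd: str) -> tuple[str, str, int]:
    """Local stand-in for sandbox.exec(): run one shell command and return
    the same (stdout, stderr, returncode) triple the exec API yields."""
    proc = subprocess.run(cmd, shell=True, cwd=cwd, capture_output=True, text=True)
    return proc.stdout, proc.stderr, proc.returncode

workdir = tempfile.mkdtemp()                           # plays the container filesystem
exec_local("echo data > state.txt", workdir)           # first exec writes state
out, err, code = exec_local("cat state.txt", workdir)  # a later exec sees it
```

Each call is a fresh process, but state written to the shared filesystem persists across calls — exactly the property coding agents rely on.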
Default resource requests¶
Every sandbox pod requests 0.25 CPU and 256 MB RAM. This matters for autoscaling — it tells the scheduler how many pods fit on a node. OpenModal sets these defaults automatically so the autoscaler works out of the box.
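In Kubernetes terms, the defaults translate to a requests block roughly like this (a sketch — `DEFAULT_RESOURCES` and `pods_per_node` are illustrative names, not OpenModal internals):

```python
# Defaults attached to every sandbox pod, in Kubernetes notation:
# 250m = 0.25 CPU, 256Mi = 256 MB. The scheduler bin-packs on "requests".
DEFAULT_RESOURCES = {"requests": {"cpu": "250m", "memory": "256Mi"}}

def pods_per_node(allocatable_cpu_m: int, request_cpu_m: int = 250) -> int:
    """Upper bound on sandbox pods per node, by CPU requests alone."""
    return allocatable_cpu_m // request_cpu_m
```

For example, a node with 8000m of allocatable CPU can hold at most 32 such pods by CPU; in practice memory and system pods reduce that further.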
Remote function execution¶
When you call `f.remote(x)`, your arguments are serialized (pickled), sent to a pod, and the result is pickled and sent back:
```mermaid
sequenceDiagram
    participant Client as Your machine
    participant Agent as Pod: openmodal agent
    participant Func as Your function
    Client->>Agent: Pickled (func_name, args, kwargs)
    Agent->>Agent: Import your module as "_user_app"
    Agent->>Agent: Unpickle args
    Agent->>Func: Call function(args, kwargs)
    Func-->>Agent: Return value
    Agent-->>Client: Pickled result
```
The agent registers your module as `_user_app` in `sys.modules` before unpickling. This is critical — when you pass a dataclass or Pydantic model as an argument, Python pickles it with its module path (e.g., `_user_app.TrainingConfig`). The agent needs that module to exist to reconstruct the object.
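The mechanism can be sketched like this (illustrative code, not the agent's actual source — but the ordering constraint is real):

```python
import importlib.util
import sys

def load_user_module(path: str):
    """Import the user's file under the fixed name '_user_app', so pickled
    objects that reference _user_app.<ClassName> can be reconstructed."""
    spec = importlib.util.spec_from_file_location("_user_app", path)
    module = importlib.util.module_from_spec(spec)
    sys.modules["_user_app"] = module  # register BEFORE unpickling anything
    spec.loader.exec_module(module)
    return module
```

Classes defined in the loaded file get `__module__ == "_user_app"`, so instances pickled on one side unpickle cleanly on the other as long as both sides registered the module under that name first.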
`f.map()` — parallel execution¶
`f.map(inputs)` creates one pod per input and runs them in parallel across the cluster:
```mermaid
graph TB
    Client[Your machine]
    Client -->|"f.map([a, b, c, d])"| Pool[ThreadPoolExecutor]
    Pool --> Pod1[Pod 1: f-a]
    Pool --> Pod2[Pod 2: f-b]
    Pool --> Pod3[Pod 3: f-c]
    Pool --> Pod4[Pod 4: f-d]
    Pod1 -.->|result| Client
    Pod2 -.->|result| Client
    Pod3 -.->|result| Client
    Pod4 -.->|result| Client
```
Each pod runs on potentially different nodes. Results are yielded as they complete — you don't wait for all pods to finish before getting the first result.
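The client-side fan-out can be sketched with `concurrent.futures` (here `run_one` is a stand-in for the real per-pod round trip of pickling, scheduling, and collecting a result):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_one(x):
    """Stand-in for one pod: ship x, run f remotely, return the result."""
    return x * x

def fan_out(inputs):
    """One task per input; yield results in completion order, like f.map()."""
    with ThreadPoolExecutor(max_workers=len(inputs)) as pool:
        futures = [pool.submit(run_one, x) for x in inputs]
        for fut in as_completed(futures):
            yield fut.result()
```

`as_completed` is what gives the streaming behavior: the first pod to finish produces the first result, regardless of input order.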
GPU serving and scale-to-zero¶
When you deploy a web server (e.g., vLLM), OpenModal creates a GPU pod and monitors it for idle connections. If nobody connects for `scaledown_window` seconds, the pod is deleted and the GPU is released.
```mermaid
stateDiagram-v2
    [*] --> Deployed: openmodal deploy
    Deployed --> Serving: requests arrive
    Serving --> Idle: no connections
    Idle --> Serving: new request
    Idle --> ScaledToZero: idle > scaledown_window
    ScaledToZero --> Deployed: openmodal deploy
```
How it works per provider¶
- GCP: A CronJob runs every 60 seconds, checks active connections via a shell script, and deletes the pod if idle
- AWS / Azure: KEDA (Kubernetes Event-Driven Autoscaler) watches metrics and scales the deployment to zero replicas when idle
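Whatever the mechanism, the decision made each cycle is the same; a minimal sketch (function and parameter names are illustrative):

```python
def should_scale_to_zero(active_connections: int,
                         seconds_since_last_activity: float,
                         scaledown_window: float) -> bool:
    """Scale to zero only when there are no connections AND the pod has
    been idle longer than the configured window."""
    return active_connections == 0 and seconds_since_last_activity > scaledown_window
```

Both conditions matter: a brief gap between requests shorter than `scaledown_window` never tears down the pod, so warm GPUs survive bursty traffic.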
Cost¶
| State | What's running | Approximate cost |
|---|---|---|
| Serving requests | GPU node + pod | ~$1.20/hr (H100 spot) |
| Idle, within scaledown window | Same | Same |
| Scaled to zero | Control plane + default node | ~$0.10/hr |
| Cluster deleted | Nothing | $0 |
Cluster autoscaling¶
When many pods are created at once (e.g., CooperBench running 60 agents), the cluster scales up automatically.
```mermaid
sequenceDiagram
    participant App as Your app
    participant Sched as K8s Scheduler
    participant CA as Cluster Autoscaler
    participant Cloud as Cloud API
    App->>Sched: Create 60 pods (0.25 CPU each)
    Sched->>Sched: Existing node fits ~12
    Note over Sched: 12 Running, 48 Pending
    CA->>Cloud: 48 Pending → add 2 nodes
    Cloud-->>CA: Nodes ready (~60s)
    Sched->>Sched: Schedule remaining pods
    Note over Sched: All 60 Running
    Note over App,Cloud: Pods complete, nodes idle 5 min...
    CA->>Cloud: Remove idle nodes
```
OpenModal sets default resource requests (0.25 CPU, 256 MB) on every sandbox pod, so the scheduler correctly distributes pods across nodes and the autoscaler fires when needed.
Provider comparison¶
| | GCP (GKE) | AWS (EKS) | Azure (AKS) |
|---|---|---|---|
| Autoscaler | GKE cluster autoscaler | Karpenter | AKS cluster autoscaler |
| Sandbox nodes | `e2-standard-8` pool | Karpenter picks best fit | `Standard_D8s_v5` |
| Max nodes | 100 per zone | 100 CPU limit | 100 |
| Scale-up time | ~60s | ~30-60s | ~60-90s |
| GPU nodes | Separate pool per GPU type | Karpenter auto-provisions | Separate pool per GPU |
Volumes¶
Volumes sync data between cloud storage and pod filesystems. No CSI drivers or IAM admin permissions needed — it uses init containers and sidecars.
```mermaid
sequenceDiagram
    participant Cloud as Cloud Storage
    participant Init as Init Container
    participant Main as Main Container
    participant Sidecar as Sidecar
    Note over Init,Main: Pod starts
    Init->>Cloud: Sync data down to /vol
    Init-->>Main: Done, volume ready
    Note over Main: Your code runs, reads/writes /vol
    Note over Main,Sidecar: Pod shutting down
    Sidecar->>Cloud: Sync /vol back up to cloud
```
All three containers (init, main, sidecar) share an emptyDir volume — an ephemeral disk on the node. The init container downloads data before your code starts. The sidecar uploads changes when the pod shuts down.
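A sketch of the generated pod spec, expressed as a Python dict (illustrative names and commands, GCP flavor — not OpenModal's literal output):

```python
# All three containers mount the same emptyDir at /vol.
VOL_MOUNT = {"name": "vol", "mountPath": "/vol"}

pod_spec = {
    "volumes": [{"name": "vol", "emptyDir": {}}],  # ephemeral disk on the node
    "initContainers": [{
        "name": "vol-download",  # runs to completion before the main container
        "command": ["gcloud", "storage", "rsync", "-r", "gs://bucket/vol", "/vol"],
        "volumeMounts": [VOL_MOUNT],
    }],
    "containers": [
        # Your code reads and writes /vol like a normal directory.
        {"name": "main", "volumeMounts": [VOL_MOUNT]},
        # Sidecar: on shutdown (SIGTERM), sync /vol back up to cloud storage.
        {"name": "vol-upload",
         "command": ["sh", "-c",
                     "trap 'gcloud storage rsync -r /vol gs://bucket/vol' TERM; "
                     "sleep infinity & wait"],
         "volumeMounts": [VOL_MOUNT]},
    ],
}
```

The `emptyDir` is what ties the three containers together: it lives as long as the pod does, so data staged by the init container is visible to both the main container and the sidecar.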
| Provider | Cloud storage | Sync tool |
|---|---|---|
| GCP | GCS bucket | gcloud storage rsync |
| AWS | S3 bucket | aws s3 sync |
| Azure | Azure Blob | az storage blob sync |
| Local | `~/.openmodal/volumes/` | Direct bind mount |
Networking¶
How your machine talks to pods differs by provider:
```mermaid
graph LR
    subgraph GCP
        You1[Your machine] -->|direct HTTP| PodGCP[Pod 10.x.x.x]
    end
    subgraph AWS / Azure
        You2[Your machine] -->|localhost:PORT| KPF[kubectl port-forward]
        KPF -->|tunnel| PodAWS[Pod 10.x.x.x]
    end
```
| Provider | How | Why | Latency overhead |
|---|---|---|---|
| GCP | Direct pod IP | GKE pods get routable IPs | ~0ms |
| AWS | `kubectl port-forward` | EKS pod IPs are VPC-internal | ~100ms |
| Azure | `kubectl port-forward` | AKS pod IPs are VPC-internal | ~100ms |
| Local | Container IP / `docker exec` | Docker bridge network | ~0ms |
This matters for web servers (vLLM, FastAPI). For sandboxes, all providers use the K8s exec API which has similar latency everywhere.
Provider abstraction¶
All providers implement the same `CloudProvider` interface. Your code never touches the provider directly.
```mermaid
classDiagram
    class CloudProvider {
        <<abstract>>
        +create_instance(spec, image_uri, name)
        +delete_instance(name)
        +create_sandbox_pod(name, image, timeout, gpu, cpu, memory)
        +exec_in_pod(pod_name, *args)
        +build_image(dockerfile_dir, name, tag)
        +copy_to_pod(pod_name, local, remote)
        +copy_from_pod(pod_name, remote, local)
        +ensure_volume(name)
        +stream_logs(instance_name)
    }
    CloudProvider <|-- GKEProvider
    CloudProvider <|-- EKSProvider
    CloudProvider <|-- AKSProvider
    CloudProvider <|-- LocalProvider
```
The provider is selected by:
- CLI flag: `--local`, `--aws`, `--azure` (GCP is the default)
- Environment variable: `OPENMODAL_PROVIDER=local|gcp|aws|azure`
Switching providers changes where your code runs, not how you write it.
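The selection logic can be sketched as follows (a sketch under one assumption not stated above: that the CLI flag takes precedence over the environment variable):

```python
FLAG_TO_PROVIDER = {"--local": "local", "--aws": "aws", "--azure": "azure"}

def select_provider(argv: list[str], env: dict[str, str]) -> str:
    """Resolve the provider: explicit CLI flag first, then the
    OPENMODAL_PROVIDER environment variable, then GCP as the default."""
    for flag, name in FLAG_TO_PROVIDER.items():
        if flag in argv:
            return name
    return env.get("OPENMODAL_PROVIDER", "gcp")
```

Because everything downstream goes through the `CloudProvider` interface, this one function is the only place the choice is made.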