By Grey Newell, CTO at Supermodel

For engineers evaluating async job architectures or considering similar trade-offs. Our API processes customer codebases with tree-sitter and LLMs to produce structural graphs. The original prototype took 10-15 minutes per synchronous request. We redesigned the entire API to scale asynchronously in one calendar week (implementation and initial production deployment). Here is how the architecture works, why we made each decision, and what we traded off.

Business challenge

Our prototype ran 2 runtimes (JVM + Node.js) in 1 Docker container, sharing a single synchronous HTTP request path.

A concrete failure scenario: a client sends POST /v1/graphs/call with a 50MB zip archive. 8 minutes into processing, the client's load balancer times out. The work is lost. The client retries. Now 2 copies of the job run simultaneously, consuming 2x compute, and the client has no way to deduplicate or retrieve the first result.

Design tenets

Four principles. The team shipped the redesign in one calendar week, so each principle had to be simple to implement.

Solution overview

Two independently deployed runtimes connected by a shared Postgres database (Citus) and Azure Blob Storage container.

flowchart LR
    subgraph client [Client]
        SDK["SupermodelClient\n(TypeScript SDK)"]
    end

    subgraph controlPlane ["Control Plane (Java / Spring Boot)"]
        Auth["API Key Validation\n+ Subscription Check"]
        JobCreate["Job Creation\n+ Idempotency"]
        Poll["Poll Handler\n(200 / 202)"]
    end

    subgraph sharedInfra ["Shared Infrastructure"]
        DB[("Postgres (Citus)\njobs table")]
        Blob[("Azure Blob\nzip payloads")]
    end

    subgraph dataPlane ["Data Plane (TypeScript / Node.js)"]
        Worker["Job Worker\n(poll loop)"]
        TreeSitter["Tree-sitter Parser"]
        LLM["LLM Service\n(OpenRouter)"]
    end

    SDK -->|"POST /v1/graphs/call\n+ Idempotency-Key"| Auth
    Auth --> JobCreate
    JobCreate -->|"INSERT status=pending"| DB
    JobCreate -->|"upload zip"| Blob
    SDK -->|"re-POST same key\n(poll)"| Poll
    Poll -->|"SELECT job"| DB

    Worker -->|"UPDATE SET status=processing\nFOR UPDATE SKIP LOCKED"| DB
    Worker -->|"download zip"| Blob
    Worker --> TreeSitter
    Worker --> LLM
    Worker -->|"UPDATE SET status=completed\nresult = jsonb"| DB

Why Postgres instead of Kafka, SQS, or Redis?

We use Azure Cosmos DB for PostgreSQL (Citus). Coordinator-only today (node_count=0). Distribution key user_id is set on 4 tables; workers can be added without schema migration. No cross-shard queries. The jobs table with FOR UPDATE SKIP LOCKED gives you exactly-once claiming, transactional state updates, and zero new infrastructure. Trade-offs: 1-30 second polling latency vs. sub-second push from a dedicated broker; table growth and VACUUM; no built-in backpressure. For our workload (minutes per job, moderate volume), acceptable.

Why no separate /jobs/{id}/status endpoint?

The client re-POSTs the same request with the same idempotency key. The server returns the existing job. Polling is submission. Trade-off: less discoverable than a dedicated status endpoint.
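The pattern can be sketched end to end in TypeScript (illustrative names, not the shipped SDK; the transport function is injected so the submit-is-poll loop stays visible):

```typescript
// Sketch of "polling is submission". The same POST, with the same Idempotency-Key,
// both submits the job and polls it: `post` is assumed to re-send an identical
// request on every call. Names and shapes here are illustrative.
type JobResponse = { status: string; result?: unknown; retryAfter?: number };

async function submitAndWait(
  post: () => Promise<JobResponse>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 90,
): Promise<unknown> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const r = await post(); // first call creates the job; later calls return the existing one
    if (r.status === "completed") return r.result;
    if (r.status === "failed") throw new Error("job failed");
    await sleep((r.retryAfter ?? 10) * 1000);
  }
  throw new Error("polling timed out");
}
```

Because the first call and every poll are literally the same request, the client needs no job ID bookkeeping; the idempotency key is the handle.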

Why blob storage for payloads?

Zip archives are too large for a database column. The control plane uploads to Azure Blob, stores the URL in the job row, the data plane downloads it, and the blob is deleted after processing.

The job lifecycle

A single request, start to finish.

Phase 1: Submission (control plane, ~50ms)

  1. API key validation (Caffeine cache, 10k entries, 5-min TTL) and HMAC verification with constant-time comparison.

  2. Subscription status check (Caffeine cache, 10k entries, 30-second TTL).

  3. JobService.getOrCreateJob() queries by (idempotency_key, user_id, api_key_id). If found: returns the existing job with 0 new work. If not found: computes SHA-256 of the zip, uploads to Azure Blob, inserts a row with status='pending' and blob_expires_at = now + 1 hour.

  4. Returns HTTP 202 Accepted with Retry-After: 10 (configurable per operation, default 10 seconds).

Input validation: Zip size limit 500MB (multipart). Path traversal in zips is blocked. No zip-bomb or nested-archive validation.
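The constant-time comparison in step 1 matters: naive string equality returns early at the first differing byte, which leaks timing information an attacker can use to forge signatures byte by byte. A TypeScript sketch of the technique (the real filter is Java/Spring; the function name and parameters here are illustrative, not the shipped code):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch of constant-time HMAC-SHA256 signature verification.
function verifySignature(secret: string, payload: string, providedHex: string): boolean {
  const expected = createHmac("sha256", secret).update(payload).digest();
  const provided = Buffer.from(providedHex, "hex"); // invalid hex yields a short buffer
  // timingSafeEqual throws on length mismatch, so reject mismatched lengths up front.
  if (provided.length !== expected.length) return false;
  // Constant-time comparison: runtime does not depend on where the buffers differ.
  return timingSafeEqual(expected, provided);
}
```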

The idempotent job-creation logic, simplified below. JobRepository.save() catches the unique-constraint violation (SQLSTATE 23505) and returns null instead of throwing.

// JobService.getOrCreateJob (simplified)
Optional<Job> existing = jobRepository.findByIdempotencyKey(idempotencyKey, userId, apiKeyId);
if (existing.isPresent()) return existing.get();

blobConnector.uploadZip(jobId, fileBytes);
UUID savedId = jobRepository.save(...);

if (savedId == null) {
    // Unique constraint: concurrent request won. Delete our blob, return theirs.
    blobConnector.deleteZip(jobId);
    return jobRepository.findByIdempotencyKey(idempotencyKey, userId, apiKeyId).orElseThrow();
}
return jobRepository.findById(savedId).orElseThrow();

Two concurrent requests with the same key both upload and attempt INSERT. One wins. The loser catches the unique constraint violation, deletes its blob, returns the winner's job. If the loser's deleteZip() fails (e.g. blob timeout), the blob is orphaned (no job row). Azure lifecycle deletes it within 24 hours. No dedicated orphan-cleanup job.

Phase 2: Processing (data plane, seconds to minutes)

  1. JobWorkerService.pollLoop() polls every 1-30 seconds (adaptive: x2 backoff on failure, halves after 3 consecutive successes).

  2. findPendingJobs() atomically claims work:

UPDATE jobs SET status = 'processing', started_at = NOW()
WHERE id IN (
    SELECT id FROM jobs
    WHERE status = 'pending' AND blob_expires_at > NOW()
    ORDER BY created_at ASC LIMIT $1
    FOR UPDATE SKIP LOCKED
)
RETURNING *;

Up to 4 jobs claimed per cycle (SUPERMODEL_JOB_CONCURRENCY). FOR UPDATE SKIP LOCKED means multiple replicas poll concurrently with zero contention; each claims different rows. Jobs cannot run longer than 30 minutes (zombie reaper). Blob TTL (60 min) only affects pending jobs; once claimed, the data plane deletes the blob on completion.

  3. Downloads zip from blob (3 retries, 1s initial delay, 10s max). Extracts via ZipHydratorService to a temp directory.

  4. Parses with tree-sitter. Calls LLMs via OpenRouter if needed.

  5. Writes the result: UPDATE jobs SET status = 'completed', result = $1::jsonb, blob_url = NULL, ... WHERE id = $3. DB write retries: 3 attempts, 500ms initial delay, 5s max, exponential backoff with 0-50% jitter.

  6. Deletes blob and temp directory in the finally block.

Phase 3: Retrieval (client polls, ~50ms per poll)

The client re-POSTs with the same Idempotency-Key. The control plane loads the job and branches:

if (job.isCompleted()) return ResponseEntity.ok(response.withResult(...));
if (job.isFailed()) return ResponseEntity.ok(response.withError(...));
return ResponseEntity.status(202).header("Retry-After", String.valueOf(retryAfter)).body(response);

completed = 200 OK with result. failed = 200 OK with error. Anything else = 202 Accepted with Retry-After.

The client SDK reads retryAfter, sleeps, re-posts:

let attempt = 0;
const deadline = Date.now() + timeoutMs; // 15-minute default
while (attempt++ < maxPollingAttempts && Date.now() < deadline) {
    const response = await apiCall();
    if (response.status === 'completed') return response.result;
    if (response.status === 'failed') throw new JobFailedError(...);
    await sleep((response.retryAfter ?? 10) * 1000);
}

SDK uses 15-minute default timeout, 90 max attempts, and falls back to 10s if retryAfter is missing. The caller sees none of this. They call client.generateCallGraph(file, { idempotencyKey }) and get a result.

Idempotency in detail

Idempotency-Key is required on every data-plane operation (graph requests), enforced at the control plane by ApiKeyAuthFilter. The SDK generates one via crypto.randomUUID().

The jobs table enforces uniqueness: UNIQUE(idempotency_key, user_id, api_key_id) (with partial indexes for NULL api_key_id in bearer-token auth). Same key + same user + same API key returns the existing job regardless of request content. We do not validate that the zip hash matches; first submission wins.

There is no /jobs/{id}/status endpoint. The client re-POSTs with the same idempotency key to poll (1 endpoint, 1 code path, 1 auth check per poll). Every poll re-validates the API key and subscription status. Revocation invalidates cache on the instance that processes the revoke; other replicas may serve cached entries for up to 5 minutes (cache TTL).
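The cache semantics are the crux of that revocation window. A minimal in-process TTL cache sketch in TypeScript (the real control plane uses Caffeine in Java; this illustrates only the TTL and size-bound behavior, with a naive eviction instead of Caffeine's policy):

```typescript
// Minimal TTL cache: entries expire lazily on read; a crude size bound evicts
// the oldest insertion. Names are illustrative, not the shipped code.
class TtlCache<K, V> {
  private entries = new Map<K, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number, private maxEntries = 10_000) {}

  get(key: K): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key); // lazy expiry on read
      return undefined;
    }
    return entry.value;
  }

  set(key: K, value: V): void {
    if (!this.entries.has(key) && this.entries.size >= this.maxEntries) {
      // Naive eviction: drop the oldest insertion (Map preserves insertion order).
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}
```

The TTL is exactly the staleness bound: a revoked key stays valid on replicas that cached it until its entry expires, which is why the 5-minute API-key TTL defines the revocation window.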

How we process code without retention

Where does your code go, and when is it deleted?

Customer source code is deleted from every storage layer after processing. 4 independent cleanup mechanisms cover crash scenarios. Worst-case retention: 60 minutes (blob hard TTL). Typical: seconds.

flowchart TD
    subgraph upload ["1. Upload (Control Plane)"]
        CP["Client uploads zip"]
        Blob["Blob Storage\n(TTL: 1 hour)"]
        DB_pending["jobs.blob_url = url\njobs.blob_expires_at = now + 1h"]
    end

    subgraph process ["2. Process (Data Plane)"]
        Download["Download zip from blob"]
        Extract["Extract to temp dir\n/__processing/repoId-uuid/"]
        Parse["Tree-sitter parse\n(in-memory graph)"]
    end

    subgraph cleanup ["3. Cleanup (immediate)"]
        MarkDone["markCompleted/markFailed\nblob_url = NULL"]
        DeleteBlob["deleteBlob: jobId.zip"]
        DeleteDisk["hydration.cleanup:\nfs.remove targetDir"]
    end

    subgraph expire ["4. Expiry (scheduled)"]
        JobCleanup["JobCleanupService\ndeletes expired rows"]
        ZombieReap["Zombie reaper\nmarks stuck jobs failed"]
        BlobExpiry["Expired pending jobs\nmarked failed"]
    end

    CP --> Blob --> DB_pending
    DB_pending --> Download --> Extract --> Parse
    Parse --> MarkDone
    MarkDone --> DeleteBlob
    MarkDone --> DeleteDisk
    DeleteBlob --> JobCleanup
    DeleteDisk --> ZombieReap
    ZombieReap --> BlobExpiry

Layer 1: Blob storage

The 60-minute TTL is enforced by application logic (blob_expires_at, expired-pending cleanup). Azure Blob lifecycle policy is a 24-hour safety net (staging/production); Azure does not support hour-level TTL natively. On upload, blob_expires_at = NOW() + 1 hour. On completion or failure, the data plane deletes the blob (3 retries, 1s initial delay, 10s max) and sets blob_url = NULL.

If the DB write fails, the blob is intentionally preserved so the zombie reaper can identify the orphaned job: if (statusRecorded) await deleteBlob(...); in the finally block.

Layer 2: Disk

Extracted files live in /__processing/{repoId}-{uuid}/. cleanup() runs fs.remove(targetDir) in the finally block on all paths. RETAIN_HYDRATED_REPOS exists only for local development; not set in any deployed environment.

Layer 3: Database

The result column contains JSONB structural metadata (file paths, function signatures, dependency edges, line numbers), not source code. File paths and function signatures may expose repo structure; we do not store source code. No application-level limit on JSONB result size; Postgres TOAST applies. Large graphs may impact WAL and backups.

Completed jobs: deleted after 24 hours. Failed jobs: deleted after 7 days. JobCleanupService runs hourly: @Scheduled(cron = "0 0 * * * ?") calls jobRepository.deleteExpired(). Scheduled jobs run on every control-plane replica. No distributed lock (e.g. ShedLock). Cleanup is idempotent; usage rollups use ON CONFLICT for deduplication.

Layer 4: Defense-in-depth

3 independent mechanisms catch anything the primary cleanup misses:

Zombie reaper (data plane, every poll cycle). Marks jobs stuck in processing for > 30 minutes as failed and sets blob_url = NULL:

UPDATE jobs SET status = 'failed', blob_url = NULL
WHERE status = 'processing' AND started_at <= $1;

Expired pending cleanup (control plane, every 60 seconds via @Scheduled(cron = "0 * * * * ?")). Marks pending jobs with blob_expires_at <= NOW() as failed.

Expired pending cleanup (data plane, every poll cycle). Runs the same query, independently, in case the control plane is down.

The data plane has 0 ingress (ingress_enabled = false in Azure Container Apps). It receives 0 external HTTP requests. It pulls work from the database, processes in memory, writes structural metadata back, and deletes all source code artifacts. Stateless by construction, not by policy.

Retention summary

| Artifact | Typical retention | Worst-case retention | Cleanup mechanism |
| --- | --- | --- | --- |
| Blob (customer zip) | Seconds (deleted on job completion) | 60 minutes (hard TTL) | deleteBlob() + expired pending cleanup |
| Extracted files on disk | Seconds (deleted in finally block) | Container lifetime (crash = container replaced) | hydration.cleanup() via fs.remove() |
| blob_url pointer in DB | Seconds (NULLed on completion/failure) | 30 minutes (zombie reaper threshold) | markCompleted/markFailed SQL |
| Job result (structural metadata, not source code) | 24 hours (completed) / 7 days (failed) | Same | JobCleanupService.cleanupExpiredJobs() hourly |
| Orphan/zombie jobs | 30 minutes | 60 minutes | Zombie reaper + expired pending cleanup (both planes) |

Failure modes and automated recovery

| Failure | Automated response | Recovery time |
| --- | --- | --- |
| Client disconnects mid-poll | Job continues processing. Client re-POSTs same key to retrieve result. | 0; job is unaffected |
| Data plane container crashes | Zombie reaper marks jobs in processing > 30 min as failed, clears blob_url. Container orchestrator (ACA) restarts the replica. Client may wait up to 30 minutes to learn of a failed job. | 30 minutes (zombie threshold) |
| markFailed DB write fails after crash | Blob is intentionally preserved. Zombie reaper catches the orphan on next poll cycle. | 30 minutes |
| Blob expires before processing starts | findPendingJobs skips jobs with blob_expires_at <= NOW(). Both planes independently mark them failed. | 60 minutes (blob TTL) |
| Postgres transient failure (connection reset, 53xxx) | Both planes retry: 3 attempts, 500ms initial delay, 5s max, exponential backoff with 0-50% jitter. | Seconds |
| Blob storage transient failure (timeout, 5xx) | 3 retries, 1s initial delay, 10s max. 404 (BlobNotFoundError) is not retried. | Seconds |
| Concurrent duplicate submissions | Unique constraint on (idempotency_key, user_id, api_key_id). Loser deletes its blob, returns winner's job. | 0; no duplicate work |
| API key revoked during processing | Job completes (work already started), but next poll re-validates the key and returns 401 Unauthorized. Revoked key cannot retrieve results. | Immediate |
| Control plane down | No new jobs, no polling. Data plane continues processing. | Until control plane recovers |

0 of these failure modes require manual intervention, and 0 result in customer code persisting beyond the cleanup window. That guarantee covers only the failures listed above: a Postgres outage that outlasts the retry window, an extended control-plane outage, or a schema migration may still require manual intervention. See Limitations.
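The retry policy used for transient Postgres and blob failures above can be sketched as one generic helper. Defaults here mirror the DB parameters (3 attempts, 500ms initial delay, 5s cap, 0-50% jitter); the function name and shape are illustrative, not the shipped code:

```typescript
// Retry with capped exponential backoff and 0-50% additive jitter.
async function withRetries<T>(
  fn: () => Promise<T>,
  attempts = 3,
  initialDelayMs = 500,
  maxDelayMs = 5_000,
): Promise<T> {
  let delayMs = initialDelayMs;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= attempts) throw err; // out of attempts: surface the last error
      const jittered = delayMs * (1 + Math.random() * 0.5); // add 0-50% jitter
      await new Promise((resolve) => setTimeout(resolve, Math.min(jittered, maxDelayMs)));
      delayMs = Math.min(delayMs * 2, maxDelayMs); // exponential backoff, capped
    }
  }
}
```

The jitter is what keeps replicas from retrying in lockstep against a recovering database; without it, all workers that failed together retry together.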

Separation of concerns

| Responsibility | Control Plane (Java/Spring Boot) | Data Plane (TypeScript/Node.js) |
| --- | --- | --- |
| API key validation (HMAC + Caffeine cache) | Yes | No |
| Subscription enforcement (Stripe) | Yes | No |
| Usage metering and billing | Yes | No |
| Job creation and idempotency | Yes | No |
| HTTP ingress (public endpoint) | Yes (port 8080) | No (ingress_enabled = false) |
| Tree-sitter parsing | No | Yes |
| LLM calls (OpenRouter, Google AI) | No | Yes |
| Graph construction (in-memory) | No | Yes |
| Job claiming (FOR UPDATE SKIP LOCKED) | No | Yes |
| Blob download/deletion | No | Yes |

Shared interface: 1 Postgres database (jobs table) + 1 Azure Blob container (job-payloads, naming {jobId}.zip). That is the entire contract. 0 RPC calls. 0 shared code. 0 protobuf schemas. Each runtime has its own Dockerfile, CI pipeline, and Azure Container App.

Observability

Application Insights (control and data plane). Data plane emits job events (completed/failed), duration, success/failure counts, retries, poll interval. Structured JSON logging with correlation IDs. No OpenTelemetry; no Prometheus/CloudWatch. We do not publish formal SLOs. Control-plane target: sub-100ms P99 for auth and job creation.

Deployment

Single revision, 100% traffic. No blue/green or canary. In-flight jobs in a crashed container are failed by the zombie reaper after 30 minutes.

Why Java for the control plane?

The control plane does not process code. It validates keys, checks subscriptions, creates database rows, and returns HTTP responses. Java/Spring Boot is built for this.

We use OpenAPI-first code generation (same spec generates the TypeScript SDK), Spring Security filter chain (OAuth2, API key auth, CSRF, Stripe webhooks), @Scheduled cron jobs (4 tasks: expired cleanup, job deletion, usage reports), AOP for usage metering and scope checks, and Caffeine caches for API keys and subscription status. The data plane is TypeScript because tree-sitter ships Node.js bindings and the work is I/O-bound. We had a Spring Boot veteran on the team from day one. Each runtime handles what it is best at. 64 source files, ~8,200 lines (significant portion generated). Every request <100ms.

Scaling constraints and trade-offs

| Parameter | Default | Configurable via | Notes |
| --- | --- | --- | --- |
| Poll interval (base) | 1,000ms | SUPERMODEL_JOB_POLL_INTERVAL_MS | Minimum latency between job creation and pickup |
| Poll interval (max) | 30,000ms | SUPERMODEL_JOB_MAX_POLL_INTERVAL_MS | Reached after consecutive failures (x2 backoff) |
| Poll recovery | Halves after 3 successes | Hardcoded | Returns to base interval |
| Concurrency per replica | 4 jobs | SUPERMODEL_JOB_CONCURRENCY | More replicas = linear scale |
| Blob TTL | 1 hour | SUPERMODEL_JOB_TTL_BLOB_HOURS | Unprocessed jobs fail after this |
| Completed job TTL | 24 hours | SUPERMODEL_JOB_TTL_COMPLETED_HOURS | Client must retrieve results within this window |
| Failed job TTL | 7 days | SUPERMODEL_JOB_TTL_FAILED_DAYS | For debugging and support |
| Zombie threshold | 30 minutes | Hardcoded | Jobs in processing longer than this are failed |
| API key cache | 10,000 keys, 5-min TTL | Hardcoded | Caffeine in-process cache |
| Subscription cache | 10,000 entries, 30-sec TTL | Hardcoded | Caffeine in-process cache |
| DB retries | 3 attempts, 500ms-5s backoff | Hardcoded | Exponential + 0-50% jitter |
| Blob retries | 3 attempts, 1s-10s backoff | Hardcoded | Exponential + 0-50% jitter |
| Citus distribution | user_id on 4 tables | Schema-level | Coordinator-only (node_count=0); workers addable without migration |
| Client Retry-After | 10 seconds | Per-operation in application.properties | Configurable per graph type |

Trade-off. Polling adds 1-30 seconds of latency between job completion and client retrieval. For jobs that take minutes, negligible. Sub-second notification would require WebSocket or SSE.

Limitations and future work

Conclusion

| Metric | Before | After |
| --- | --- | --- |
| Peak concurrent jobs | 1 (synchronous) | N replicas x 4 jobs each (production: 2-10 data-plane replicas). Throughput depends on job duration. |
| Client connection requirement | Hold open 10-15 minutes | Single POST + periodic polls (~50ms each) |
| Duplicate work on retry | 100% (new job every time) | 0% (idempotency key deduplication) |
| Infrastructure components added | N/A | 0 (reused existing Postgres + Blob) |
| Message brokers | N/A | 0 |
| Customer code worst-case retention | Indefinite (container lifetime) | 60 minutes (blob TTL) |
| Customer code typical retention | Container lifetime | Seconds |
| Manual intervention for failures | Required | 0 for the failure modes listed above; see Limitations |
| Time to build | N/A | 1 calendar week (implementation and initial production deployment) |

What we would change


For questions about our architecture or API, contact engineers@supermodeltools.com or visit supermodeltools.com.