Operations
After the deploy works, here’s what you need to know to run it.
Custom domains
Routes are declared in each wrangler.jsonc:

    "routes": [
      { "pattern": "app.yourdomain.com", "custom_domain": true }
    ]

When you deploy with custom_domain: true, Cloudflare:
- Auto-creates a DNS record on the zone.
- Provisions a TLS cert (~1 min on first deploy).
- Routes traffic to the Worker.
If the cert doesn’t provision, the zone may not be fully on Cloudflare. Fix the zone, redeploy.
For staging, the env.staging blocks point at *.staging.yourdomain.com and use the same mechanism.
Observability
Every Worker has observability: { enabled: true } set, so logs ship to Cloudflare Logs by default. View live with:

    npx wrangler tail managed-agents
    npx wrangler tail openma-agent
    npx wrangler tail managed-agents-integrations

Logs are structured JSON for things you’d want to query (silent catches, integration dispatches). Plain console.log for ad-hoc.
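As an illustration of the structured-JSON convention, here is a minimal sketch of a logging helper for silent catches. The helper names (structuredLog, describeError, parseOrNull) and the field names (event, ts, scope, message) are assumptions made for this example, not the platform's actual log schema.

```typescript
// Hypothetical helper: field names are illustrative, not the platform's schema.
type LogFields = Record<string, string | number | boolean | null>;

// Serialize one log entry as a single JSON line so it stays queryable.
function structuredLog(event: string, fields: LogFields = {}): string {
  return JSON.stringify({ event, ts: new Date().toISOString(), ...fields });
}

// Reduce an unknown thrown value to a printable message.
function describeError(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

// Example of a silent catch: the error is swallowed but still recorded.
function parseOrNull(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch (err) {
    console.log(
      structuredLog("silent_catch", { scope: "parseOrNull", message: describeError(err) }),
    );
    return null;
  }
}
```

Keeping each entry on one line with a stable event field makes it possible to filter wrangler tail output by event type.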
Analytics Engine
The platform writes to an Analytics Engine dataset (oma_events) for high-cardinality events:
- Silent catches (caught errors that didn’t surface to the user)
- Session lifecycle transitions
- Tool call success/failure
- Integration dispatches
Query from the dashboard or with the Analytics Engine SQL API. Useful for “are silent failures spiking?” and “which tool errors most often?”.
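A minimal sketch of how a tool-call event could be written to such a dataset. The writeDataPoint method and its blobs / doubles / indexes fields are the Analytics Engine binding's shape. The column layout, the recordToolCall helper, and the in-memory FakeDataset are assumptions for illustration only.

```typescript
// Minimal shape of the Analytics Engine binding method used here.
interface DataPoint {
  blobs?: string[];
  doubles?: number[];
  indexes?: string[];
}
interface AnalyticsDataset {
  writeDataPoint(point: DataPoint): void;
}

// Hypothetical layout: index = tenant, blobs = [kind, tool, outcome],
// doubles = [durationMs]. The real oma_events layout may differ.
function recordToolCall(
  dataset: AnalyticsDataset,
  tenantId: string,
  tool: string,
  ok: boolean,
  durationMs: number,
): void {
  dataset.writeDataPoint({
    indexes: [tenantId],
    blobs: ["tool_call", tool, ok ? "success" : "failure"],
    doubles: [durationMs],
  });
}

// In-memory stand-in so the sketch runs outside a Worker.
class FakeDataset implements AnalyticsDataset {
  points: DataPoint[] = [];
  writeDataPoint(point: DataPoint): void {
    this.points.push(point);
  }
}
```

With a layout like this, “which tool errors most often?” reduces to grouping on the second blob and filtering the third to failure.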
Metrics
Workers metrics (requests, CPU time, errors) are in the Cloudflare dashboard per Worker. DO metrics (active instances, storage) are under Durable Objects.
Multi-tenancy
By default, openma is multi-tenant on a single D1. Every user gets a tenant_id row; every query is scoped by it. Adequate for most teams.
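To make “every query is scoped by it” concrete, here is a sketch of a helper that always puts the tenant filter first. The scopeToTenant function and the sessions / status names in the usage note are hypothetical. Only the tenant_id column comes from the text above.

```typescript
interface ScopedQuery {
  sql: string;
  params: unknown[];
}

const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Hypothetical helper: builds a SELECT whose first predicate is always the
// tenant filter, so callers cannot forget to scope the query.
function scopeToTenant(
  table: string,
  tenantId: string,
  where: Record<string, unknown> = {},
): ScopedQuery {
  if (!IDENTIFIER.test(table)) throw new Error(`invalid table name: ${table}`);
  const clauses = ["tenant_id = ?"];
  const params: unknown[] = [tenantId];
  for (const [column, value] of Object.entries(where)) {
    // Identifiers cannot be bound as parameters, so validate them instead.
    if (!IDENTIFIER.test(column)) throw new Error(`invalid column name: ${column}`);
    clauses.push(`${column} = ?`);
    params.push(value);
  }
  return { sql: `SELECT * FROM ${table} WHERE ${clauses.join(" AND ")}`, params };
}
```

Against a D1 binding, the result would be used as db.prepare(q.sql).bind(...q.params).all(), for example with q = scopeToTenant("sessions", tenantId, { status: "active" }).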
The codebase has two distinct mechanisms for stricter isolation; they solve different problems:
Per-tenant DB isolation (rare)
For B2B SaaS use cases where each customer needs its own D1 (compliance, “noisy-neighbor” insulation, per-customer backups):

    npx wrangler secret put PER_TENANT_DB_ENABLED   # set to "true"
    npx wrangler secret put STORE_BACKENDS          # JSON config of backends

See packages/storage/README.md for the full config schema. This is independent of the multi-shard scaling below.
Multi-shard scaling
If your deployment outgrows a single D1 (D1 caps each database at 10 GB), you can fan out across N MAIN_DB shards. The shard router lives in ROUTER_DB; per-tenant data lands on whichever shard tenant_shard says owns the tenant.
Activation is config-only — the wrangler files ship with env.production overlays that bind 4 sharded auth DBs (used by openma.dev’s hosted prod). To enable for your own deployment:
- Provision the additional D1s: wrangler d1 create openma-router, wrangler d1 create openma-auth-shard-01..03
- Patch the IDs into the env.production blocks of apps/{main,agent,integrations}/wrangler.jsonc (replacing openma.dev’s IDs)
- Apply migrations to each shard:

      wrangler d1 migrations apply openma-auth-shard-01 --remote --config apps/main/wrangler.jsonc
      wrangler d1 migrations apply openma-auth-shard-02 --remote --config apps/main/wrangler.jsonc
      wrangler d1 migrations apply openma-auth-shard-03 --remote --config apps/main/wrangler.jsonc
      wrangler d1 migrations apply openma-router --remote --config apps/main/wrangler.jsonc

- Seed the shard pool:

      wrangler d1 execute openma-router --remote --command \
        "INSERT INTO shard_pool (binding_name, status, tenant_count) VALUES \
          ('AUTH_DB_00', 'open', 0), \
          ('AUTH_DB_01', 'open', 0), \
          ('AUTH_DB_02', 'open', 0), \
          ('AUTH_DB_03', 'open', 0);"

- Deploy with --env production:

      npx wrangler deploy --config apps/main/wrangler.jsonc --env production
      # repeat for agent, integrations

Auto-detection in packages/services/src/index.ts:buildCfTenantDbProvider activates the multi-shard router as soon as AUTH_DB_01 is bound. New signups are assigned to the least-loaded open shard.
Existing tenants stay on their assigned shard; there’s no automatic rebalancing tool today.
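The assignment rule for new signups can be sketched as follows. The row shape mirrors the shard_pool columns used in the seed statement. The pickShard function and its tie-breaking (first row wins) are assumptions for this example, not the router's actual code.

```typescript
// Row shape taken from the shard_pool columns in the seed statement.
interface ShardRow {
  binding_name: string;
  status: string;
  tenant_count: number;
}

// Hypothetical selection: the open shard with the fewest tenants.
// Ties go to the earliest row; the real router may break ties differently.
function pickShard(pool: ShardRow[]): ShardRow | undefined {
  let best: ShardRow | undefined;
  for (const row of pool) {
    if (row.status !== "open") continue; // skip shards closed to new tenants
    if (best === undefined || row.tenant_count < best.tenant_count) {
      best = row;
    }
  }
  return best;
}
```

Because existing tenants never move, marking a shard as something other than open only stops new assignments. It does not drain the shard.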
Database backups
D1 supports point-in-time recovery on Workers Paid. Enable in the Cloudflare dashboard → D1 → your database → Backups.
R2 has versioning; enable per bucket in the dashboard if your skill files are mutable and you want history.
Upgrades
Pull main, install, redeploy:

    git pull
    pnpm install
    npx wrangler d1 migrations apply openma-auth --remote --config apps/main/wrangler.jsonc
    npx wrangler d1 migrations apply openma-integrations --remote --config apps/main/wrangler.jsonc
    npx wrangler deploy --config apps/main/wrangler.jsonc
    npx wrangler deploy --config apps/agent/wrangler.jsonc
    npx wrangler deploy --config apps/integrations/wrangler.jsonc

Migrations are forward-only and additive by convention; rollbacks are not supported. Test against staging first if you have one.
Migration baseline (one-time)
The migration history was squashed to a single 0001_consolidated.sql per D1. Older deploys that already ran the historical 0001-0017 files have those filenames stamped in d1_migrations. To stop wrangler from re-applying the new consolidated file on existing deployments, run once:

    ./scripts/stamp-baseline-existing-deploy.sh production
    # or: ./scripts/stamp-baseline-existing-deploy.sh staging

The script INSERTs the consolidated filename into each D1’s d1_migrations table so wrangler treats it as already applied. Safe to re-run; uses INSERT OR IGNORE.
A fresh self-hoster does NOT run this — setup-cf.sh applies the consolidated file as the very first migration, and wrangler stamps it normally.
The historical migration files are kept under apps/main/migrations/_archive/ (and same for migrations-integrations/, migrations-router/) for git-blame reference. They’re not applied.
Sandbox isolation
Every session gets its own Cloudflare Container instance via the SANDBOX Durable Object class. Sessions never share container state. The container’s outbound network goes through openma’s egress proxy; secrets (Vault entries) are injected per-host so the model never sees raw credentials.
If you want to restrict outbound destinations, edit apps/agent/src/sandbox/proxy.ts to add allowlist rules.
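What an allowlist rule might look like, as a minimal sketch. The rule format (exact hostnames plus "*.suffix" wildcards) and the isHostAllowed function are assumptions for illustration. The actual structure of proxy.ts is not shown here and may differ.

```typescript
// Hypothetical rule format: exact hostnames or "*.suffix" wildcards.
function isHostAllowed(hostname: string, allowlist: string[]): boolean {
  const host = hostname.toLowerCase();
  return allowlist.some((rule) => {
    const normalized = rule.toLowerCase();
    if (normalized.startsWith("*.")) {
      // "*.github.com" matches "api.github.com" but not "github.com" itself,
      // and not "evilgithub.com", because the leading dot is kept.
      return host.endsWith(normalized.slice(1));
    }
    return host === normalized;
  });
}
```

Matching on the parsed hostname rather than the raw URL avoids being fooled by paths or userinfo that merely contain an allowed name.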
Common troubleshooting
“1010 Cloudflare Browser Integrity” on POST
CF Bot Fight Mode is rejecting the request. Either:
- Add a browser-shaped User-Agent header on your client, or
- Lower Bot Fight Mode for the affected hostname in the dashboard.
This bites self-hosters using curl from automation against *.openma.dev defaults; the User-Agent path is recorded in the project memory for hosted use.
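For the first option, a sketch of an automation client that sends a browser-shaped User-Agent. The helper names and the exact User-Agent string are example values chosen for this illustration. Nothing here is specific to openma's API.

```typescript
// Example value only: any current browser User-Agent string would do.
const BROWSER_UA =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36";

// Hypothetical helper: default headers for JSON POSTs, overridable per call.
function browserLikeHeaders(extra: Record<string, string> = {}): Record<string, string> {
  return {
    "User-Agent": BROWSER_UA,
    "Content-Type": "application/json",
    ...extra,
  };
}

// POST a JSON body with the browser-shaped headers applied.
async function postJson(url: string, body: unknown): Promise<Response> {
  return fetch(url, {
    method: "POST",
    headers: browserLikeHeaders(),
    body: JSON.stringify(body),
  });
}
```

The curl equivalent is passing the same string with -A (or -H "User-Agent: ...").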
“Migration failed” on first deploy
Make sure you ran wrangler d1 migrations apply openma-auth --remote before pnpm deploy. The deploy script doesn’t re-apply migrations.
Container fails to start
Check wrangler tail openma-agent for the actual error. Common causes:
- The base image listed in your Environment doesn’t exist or isn’t pullable.
- A package in the Environment’s install list doesn’t exist (typo’d pip package).
- You’re past the Containers free quota; upgrade in the dashboard.
OAuth callback 404
You probably set the redirect URL on the third-party to a host that doesn’t match your apps/integrations/wrangler.jsonc → routes. They must match exactly.
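A quick way to check the match before touching the provider's settings, sketched under the assumption that the routes are custom-domain entries whose pattern is a bare hostname. The redirectMatchesRoute function and the callback path in the usage note are hypothetical.

```typescript
// Same shape as the route entries in wrangler.jsonc.
interface Route {
  pattern: string;
  custom_domain?: boolean;
}

// Hypothetical check: true only when the redirect URL's hostname is
// character-for-character equal to one of the route patterns.
function redirectMatchesRoute(redirectUrl: string, routes: Route[]): boolean {
  const host = new URL(redirectUrl).hostname;
  return routes.some((route) => route.pattern === host);
}
```

For example, a redirect of https://integration.yourdomain.com/callback fails against a route pattern of integrations.yourdomain.com because of the missing "s", which is the kind of mismatch that produces the 404.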
Webhook signature verification fails
Your LINEAR_WEBHOOK_SECRET / GITHUB_WEBHOOK_SECRET / SLACK_SIGNING_SECRET is wrong, or wasn’t set on the integrations Worker. Verify with:

    npx wrangler secret list -c apps/integrations/wrangler.jsonc

Known long-tail items
These aren’t bugs in the “user-visible breakage” sense, but they’re worth having on the radar so a long-running deploy doesn’t drift into one.
SQL growth inside SessionDO
The DO writes one row to the events table per trajectory event. Events
larger than 500 KB spill to the R2 FILES_BUCKET with an _spilled
reference, but small events accumulate in SQL forever — there’s no
pruning. CF’s Durable Object SQLite quota is 10 GB per DO, which leaves
room for roughly 10M events at ~1 KB each. A single session would have
to run ~100 000 turns to come close. Same shape applies to the streams
table: finalize flips the row’s status, doesn’t delete it. Quantity
is small (one row per LLM call) but it’s also unbounded.
If you run multi-day persistent sessions in production, plan a periodic
prune of events rows older than the retention window you actually
need, and configure an R2 lifecycle rule on managed-agents-files so
spilled blobs from extinct sessions don’t sit in storage forever.
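A periodic prune could look roughly like the sketch below. It assumes the events table has a created_at column holding epoch milliseconds, which is a hypothetical name. Check the real schema before adapting it. The SqlExec interface is a minimal stand-in for the exec(query, ...bindings) call on a Durable Object's SQL storage.

```typescript
// Minimal stand-in for the Durable Object SQL storage exec call.
interface SqlExec {
  exec(query: string, ...bindings: unknown[]): unknown;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Oldest timestamp (epoch ms) that should survive the prune.
function retentionCutoff(nowMs: number, retentionDays: number): number {
  return nowMs - retentionDays * MS_PER_DAY;
}

// Hypothetical prune: created_at is an assumed column name.
function pruneEvents(sql: SqlExec, nowMs: number, retentionDays: number): void {
  sql.exec("DELETE FROM events WHERE created_at < ?", retentionCutoff(nowMs, retentionDays));
}
```

Deleting a row that carries an _spilled reference does not remove the blob it points to, which is why the R2 lifecycle rule is still needed alongside the prune.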
Container OOM
Sandbox containers default to instance_type: standard-1 (1 vCPU, 4 GB
RAM, 4 GB disk). Heavy pip install of ML libraries, large data files
loaded into pandas, or anything that mmap’s a multi-GB blob will OOM.
The container exits, which triggers onActivityExpired →
snapshotWorkspaceNow → super.onActivityExpired → graceful stop;
the next session warmup picks up the most recent backup.
If your workload routinely OOMs, bump instance_type to standard-2
(2 vCPU, 8 GB RAM) or standard-4 (4 vCPU, 16 GB RAM) in
apps/agent/wrangler.template.jsonc. Watch the containers dataset
in observability for Container exited (code: 137) (SIGKILL, usually
OOM) vs Container exited (code: 0) (graceful sleepAfter).
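To tell the two cases apart in bulk, the log lines can be classified with a small helper. This sketch assumes the message format quoted above. The function names and the three-way classification are illustrative.

```typescript
type ExitKind = "graceful" | "killed" | "error";

// Extract the numeric code from a "Container exited (code: N)" line.
function parseExitCode(line: string): number | null {
  const match = /Container exited \(code: (\d+)\)/.exec(line);
  return match ? Number(match[1]) : null;
}

// 137 = 128 + 9 (SIGKILL), which for a container usually means the OOM killer.
function classifyExit(code: number): ExitKind {
  if (code === 0) return "graceful";
  if (code === 137) return "killed";
  return "error";
}
```

A rising share of "killed" exits relative to "graceful" ones is the signal that a larger instance_type is warranted.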
R2 bucket lifecycles
The managed-agents-backups bucket’s per-object TTL is set by the SDK
at upload time (7 days by default — BACKUP_TTL_SEC in
apps/agent/src/oma-sandbox.ts), so old backups age out cleanly.
managed-agents-files (event spillover, vault audit blobs) and
managed-agents-memory (memory-store content) have no automatic TTL.
For long-running deploys, attach an R2 lifecycle rule with whatever
retention you actually need — Cloudflare’s R2 dashboard lets you
configure these per-bucket without code changes.
LLM stream chunks > 500 KB
A single chunk that crosses the spillover threshold ends up in R2
rather than SQL, which is fine, but the chunks_json array still has
to fit in one row. If a chunk ever exceeds the SQLite row size limit
(~1 GB on Workers) the stream finalize will fail. We’ve never seen
this in practice — typical model output is line-by-line — but if you
build a tool that streams huge blobs through the LLM, slice them.
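Slicing can be as simple as the sketch below, which splits a byte buffer into fixed-size pieces. The sliceBytes function and the 256 KB size in the usage note are assumptions. Pick a size that fits your tool, for example one below the 500 KB spillover threshold.

```typescript
// Split a buffer into consecutive slices of at most maxBytes each.
// subarray returns views, so no bytes are copied.
function sliceBytes(data: Uint8Array, maxBytes: number): Uint8Array[] {
  if (!Number.isInteger(maxBytes) || maxBytes <= 0) {
    throw new RangeError("maxBytes must be a positive integer");
  }
  const slices: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += maxBytes) {
    // subarray clamps the end index, so the last slice may be shorter.
    slices.push(data.subarray(offset, offset + maxBytes));
  }
  return slices;
}
```

A tool would then emit sliceBytes(blob, 256 * 1024) piece by piece instead of the whole blob at once. Note that byte-level slicing can split a multi-byte UTF-8 character, so reassemble the bytes before decoding text.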