feat: stop/start and autoscale previews to 0 #6488
Preview deployment for your docs. Learn more about Mintlify Previews.
Important
The destructive desired_state enum change ships without a data-backfill migration, and existing rows almost certainly contain standby. Please confirm the production migration plan before merging — details below. Everything else is minor.
Reviewed changes — full review of the preview stop/start + idle-to-zero feature: enum consolidation, the relocated every-minute idle cron, branch-displacement spin-down, and the manual stop/wake stack across Restate, connect RPC, tRPC, and dashboard.
- Consolidate `desired_state` to `running`/`stopped`: drops `standby` and `archived` from the MySQL enum, the Drizzle schema, and the proto state machine (`ChangeDesiredState`), and points all callers at the single `STOPPED` value.
- Relocate idle scaledown to a cron handler: moves the scan from the deploy worker into `idlepreview/handler.go`, lowers the idle window to 1h, and schedules it every minute as a singleton-keyed `CronService` object.
- Spin down previous branch deployments: when a preview becomes ready, `spinDownPreviousDeployments` schedules sibling running deployments on the same branch to stop after a 1-minute grace via the new `ListRunningDeploymentsByBranch` query.
- Manual stop/wake: adds `StopDeployment`/`WakeDeployment` Restate handlers, public connect RPCs, tRPC routers with audit logging, UI dialogs, and `canStop`/`canWake` eligibility; wake polls instance health inline until ready or `regionReadyTimeout`.
- Add `Overwrite` to `ScheduleDesiredStateChange`: lets explicit user intent (`swapLiveDeployment`, manual stop) replace a pending transition while the idle cron and branch spin-down yield with `Overwrite:false`.
⚠️ Enum drop has no data-backfill migration and standby was in active use
The desired_state enum is narrowed from three values to two, but there is no migration in the PR to rewrite existing rows. The description states neither old value "was ever used", which does not hold for standby: the previous swapLiveDeployment, promote_handler, and the old idle cron all scheduled demoted deployments to STANDBY, so any production database that has ever promoted or rolled back a deployment has live standby rows.
Dropping an enum member while such rows exist is unsafe: in MySQL strict mode the ALTER fails outright, and in non-strict mode out-of-range values are silently coerced to the empty string '', which then fails every Go enum scan on read.
Technical details
# Enum drop has no data-backfill migration
## Affected sites
- `pkg/mysql/schema/deployments.sql:21` — enum narrowed to `('running','stopped')`. This is the declarative dev schema (applied fresh via `dev/Dockerfile.mysql` initdb), so dev is fine; production is not covered by any in-repo migration mechanism.
- `web/internal/db/src/schema/deployments.ts:60` — Drizzle schema updated to match, no accompanying migration file.
- `svc/ctrl/worker/deploy/deploy_handler.go` (was `STANDBY` in `swapLiveDeployment`), `svc/ctrl/worker/deploy/promote_handler.go:160` (was `STANDBY`) — prior code that wrote `standby` to demoted deployments, confirming the value is populated in prod.
## Required outcome
- A forward migration runs `UPDATE deployments SET desired_state = 'stopped' WHERE desired_state IN ('standby','archived')` BEFORE the `ALTER TABLE ... MODIFY COLUMN desired_state ENUM('running','stopped') ...`, and the code that reads/writes `stopped` is sequenced after the `ALTER` lands.
## Open questions for the human
- How are production MySQL schema changes applied for this repo (the schema dir is declarative for dev only)? Whoever owns that path needs the backfill + ALTER ordering above.
- Confirm whether `archived` was genuinely never written; even if so, `standby` clearly was.

Claude Opus
Happy path works but I have some questions.
When we make a new production deployment, the old deployment can become stopped in `swapLiveDeployment`. But neither `wake.go` nor `wake.ts` guards against production, and `getDeploymentActionEligibility` doesn't either. So technically we can wake a stopped production deployment. Could that disrupt the system?
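If a guard turns out to be wanted, it might look roughly like the sketch below. Everything here is hypothetical: the `Deployment` struct, the `"production"`/`"preview"` environment strings, and the `wakeAllowed` helper are illustrative stand-ins, not the shape of `getDeploymentActionEligibility` or the wake handler.

```go
package main

import "fmt"

// Deployment is a hypothetical, minimal view used only for this sketch.
type Deployment struct {
	Environment  string // assumed values: "production" or "preview"
	DesiredState string // "running" or "stopped"
}

// wakeAllowed permits waking only stopped preview deployments, so a
// production deployment demoted by a swap cannot be woken manually.
func wakeAllowed(d Deployment) bool {
	return d.Environment == "preview" && d.DesiredState == "stopped"
}

func main() {
	fmt.Println(wakeAllowed(Deployment{Environment: "preview", DesiredState: "stopped"}))    // prints true
	fmt.Println(wakeAllowed(Deployment{Environment: "production", DesiredState: "stopped"})) // prints false
}
```

The same predicate would need to be enforced server-side as well as in the dashboard eligibility check, since the RPC is public.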
        }
        return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to load deployment: %w", err))
    }
nit:
assert.Equal(deployment.Status, db.DeploymentsStatusReady, "deployment is not running"),
assert.Equal(deployment.DesiredState, db.DeploymentsDesiredStateRunning, "deployment is not running"),
What if we run those checks first, so we don't issue an additional query for no reason? If it's not running, there is no point in looking up the environment, right?
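A minimal sketch of the suggested ordering, assuming the state checks only need fields already loaded on the deployment. The `Deployment` struct, `stopPrecheck`, and the `loadEnv` callback are hypothetical names for illustration; the real handler uses `assert.Equal` and generated `db` constants.

```go
package main

import (
	"errors"
	"fmt"
)

// Deployment is a hypothetical, minimal view used only for this sketch.
type Deployment struct {
	Status        string
	DesiredState  string
	EnvironmentID string
}

var errNotRunning = errors.New("deployment is not running")

// stopPrecheck validates the in-memory fields first and only then calls
// loadEnv, so a non-running deployment never triggers the extra query.
func stopPrecheck(d Deployment, loadEnv func(id string) (string, error)) (string, error) {
	if d.Status != "ready" || d.DesiredState != "running" {
		return "", errNotRunning
	}
	return loadEnv(d.EnvironmentID)
}

func main() {
	queries := 0
	loadEnv := func(id string) (string, error) {
		queries++ // stands in for the environment lookup
		return "env-" + id, nil
	}
	d := Deployment{Status: "ready", DesiredState: "stopped", EnvironmentID: "e1"}
	_, err := stopPrecheck(d, loadEnv)
	fmt.Println(err, queries) // prints: deployment is not running 0
}
```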
    Request(
        &hydrav1.ScheduleDesiredStateChangeRequest{
            DelayMillis: 0,
            State: hydrav1.DeploymentDesiredState_DEPLOYMENT_DESIRED_STATE_RUNNING,
Shouldn't we set overwrite: true here so wake overrides the previous pending transition? Right now it defaults to false, so wake no-ops if something is already pending (e.g. a cron stop), which means the wake might not actually win. Stop already uses overwrite: true
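The scenario in this comment can be reproduced with a toy model of a one-slot pending transition. This is a sketch under assumptions, not the Restate handler: `Scheduler`, `Schedule`, and the `State` constants are hypothetical names, and the real `ScheduleDesiredStateChange` may track more than a single pending value.

```go
package main

import "fmt"

// State is a hypothetical stand-in for the desired-state enum.
type State string

const (
	Running State = "running"
	Stopped State = "stopped"
)

// Scheduler holds at most one pending transition per deployment ID.
type Scheduler struct {
	pending map[string]State
}

// NewScheduler returns a Scheduler with no pending transitions.
func NewScheduler() *Scheduler {
	return &Scheduler{pending: map[string]State{}}
}

// Schedule records a pending transition and reports whether it was accepted.
// With overwrite=false it yields to whatever is already pending.
func (s *Scheduler) Schedule(id string, to State, overwrite bool) bool {
	if _, exists := s.pending[id]; exists && !overwrite {
		return false
	}
	s.pending[id] = to
	return true
}

func main() {
	s := NewScheduler()
	s.Schedule("d1", Stopped, false)              // idle cron queues a stop
	fmt.Println(s.Schedule("d1", Running, false)) // prints false: wake is dropped
	fmt.Println(s.Schedule("d1", Running, true))  // prints true: wake replaces the stop
	fmt.Println(s.pending["d1"])                  // prints running
}
```

In this model a wake issued with `overwrite=false` silently loses to the queued cron stop, which is the behavior the comment is asking about.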

Previously, preview deployments would stick around for a very long time: our cron job only stopped them after 6 hours without requests. That meant we were paying for a lot of idle compute, especially for workspaces that push code frequently.
This changes a few things:
- Consolidates `standby` and `archived` into `stopped`. Neither one was ever used.

CleanShot 2026-06-19 at 07.14.42.mp4
We'll also do wake-on-request soon, but it would've made this PR significantly larger and I wanted to prioritize shipping this.