Versioning, Deployment Isolation & Upgrades #370
Replies: 4 comments 9 replies
-
How would this work for users who want to swap out only parts of the World implementation? I want to use workflows in a Next.js app deployed to Vercel, but due to various constraints I need to ensure data is stored in AWS (RDS/SQS). I have implemented a World that handles the existing storage and queue interfaces, but I'm unsure how I would implement these changes so that it can pull the correct information (e.g. deployment ID and build manifest) and route runs to the correct code. Separately, in response to the open questions:
-
What are the exact semantics when upgrading a workflow that is executing "mid-step" or waiting for a webhook? That also raises the question of whether there is a defined set of "safe pause points" at which an upgrade is allowed. I'm also a bit confused about how step 4 of "cancel-and-restart" works.
-
During an upgrade, how does rehydrating the arguments of a workflow function work? The RFC says "cancel-and-restart" uses the "same inputs", but doesn't define how input shape compatibility is checked. For example, what happens if the workflow function signature changes between versions, such as adding a new parameter or reordering existing ones?
-
Not sure if I missed something, but how does a World know that sunsetting a version is safe and won't strand runs that still depend on it (hooks and long sleeps)? Your single-VM example suggests that a user can sunset a workflow version manually (maybe it is inherently a --force operation, but that's not clear either).
Will there be an API through the WDK, CLI, or the observability web app to confirm that no active runs are still tied to that versionId? Also, to follow up on this:
-
To make good on its promise of durability and reliability, any production workflow should be able to:
Today, versioning already works reliably for Vercel deployments, but not for other Worlds. This RFC and its implementation will allow any World to support it. Upgrading workflows is not currently supported by any World; that will change once this RFC is implemented.
Interface changes
A Version loosely represents a (current or future) deployment of a workflow runtime bundle. Worlds may implement versions very differently internally, but from a consumer's perspective the interface changes are minimal.
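The TypeScript sketch below is a rough illustration of what a Version entity and a CRUD-style world.versions surface could look like. Only versionId, the address field and the create/get/list/update/delete/getCurrent operations are taken from this RFC. Everything else is an assumption made for the example and is not part of the proposal, including:

- the extra fields
- the synchronous signatures
- the rule that the most recently created version is "current"

A real World would back this with its own storage and would most likely return Promises.

```typescript
// Hypothetical shapes: the RFC does not pin down the Version or Manifest fields.
interface Manifest {
  versionId: string; // written into the manifest by `workflow build`
  [key: string]: unknown; // the real Manifest shape is out of scope for the RFC
}

interface Version {
  id: string; // e.g. "vers_01ARZ3NDEKTSV4RRFFQ69G5FAV"
  manifest: Manifest;
  address?: string; // where this version's bundle can be reached, once live
}

// CRUD surface exposed as world.versions (synchronous here for brevity;
// a real World would most likely return Promises).
interface VersionStorage {
  create(input: { manifest: Manifest }): Version;
  get(id: string): Version | undefined;
  list(): Version[];
  update(id: string, patch: { address?: string }): Version;
  delete(id: string): void;
  getCurrent(): Version;
}

// In-memory stand-in for a World's storage layer.
class InMemoryVersions implements VersionStorage {
  private readonly versions = new Map<string, Version>();
  private currentId: string | undefined;

  create({ manifest }: { manifest: Manifest }): Version {
    const version: Version = { id: manifest.versionId, manifest };
    this.versions.set(version.id, version);
    // Assumption: the most recently created version becomes "current".
    this.currentId = version.id;
    return version;
  }

  get(id: string): Version | undefined {
    return this.versions.get(id);
  }

  list(): Version[] {
    return Array.from(this.versions.values());
  }

  update(id: string, patch: { address?: string }): Version {
    const existing = this.versions.get(id);
    if (!existing) throw new Error(`Unknown version: ${id}`);
    const updated: Version = { ...existing, ...patch };
    this.versions.set(id, updated);
    return updated;
  }

  delete(id: string): void {
    this.versions.delete(id);
    if (this.currentId === id) this.currentId = undefined;
  }

  getCurrent(): Version {
    const current = this.currentId === undefined ? undefined : this.versions.get(this.currentId);
    if (!current) throw new Error("No current version");
    return current;
  }
}
```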
World Interface
Versions are mainly managed through Storage interactions. The storage interface will be extended to support CRUD operations for a new Version entity.
Queue Changes
Detecting the current version has moved to the storage layer:
Current: world.queue.getDeploymentId() → string
New: world.versions.getCurrent() → Version
Enqueuing messages already takes a deploymentId, which will be renamed to versionId but will otherwise stay the same.
Overall, the queue interface remains unchanged, but with the addition of versioning, a World can now choose to perform additional checks when queueing. A World may choose to:
a World can now choose to perform additional checks for queueing. A World may choose to:
Runtime Changes
The main new addition to the runtime is the ability to upgrade a running workflow to a new version of the code.
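The TypeScript sketch below models "cancel-and-restart" as this RFC describes its first iteration:

1. The old run is paused.
2. A new run is started on the new version with the same inputs and no compatibility check.
3. If that fails, the old run is left paused so user code can resume it.

Two parts of the sketch are assumptions rather than anything the RFC specifies. The first is marking the old run cancelled once the new run has started, which is our reading of the strategy name. The second is modelling failure as a synchronous throw from the hypothetical StartRun callback; in practice the new run may only fail later, while executing. The Run shape and status names are likewise illustrative.

```typescript
// Hypothetical run model; states and field names are illustrative only.
type RunStatus = "running" | "paused" | "cancelled";

interface Run {
  id: string;
  versionId: string;
  input: unknown[];
  status: RunStatus;
}

// Stand-in for starting a run of the same workflow on another version.
// Here an incompatible input makes it throw synchronously (an assumption).
type StartRun = (versionId: string, input: unknown[]) => Run;

interface UpgradeResult {
  oldRun: Run;
  newRun?: Run;
  error?: string;
}

function cancelAndRestart(run: Run, newVersionId: string, start: StartRun): UpgradeResult {
  // Pause the old run first so it stops making progress on the old version.
  const paused: Run = { ...run, status: "paused" };
  try {
    // Restart with the same inputs on the new version, without a compatibility check.
    const newRun = start(newVersionId, paused.input);
    return { oldRun: { ...paused, status: "cancelled" }, newRun };
  } catch (err) {
    // The old run stays paused so user code can decide to resume it.
    return { oldRun: paused, error: err instanceof Error ? err.message : String(err) };
  }
}

// Resuming a paused run after a failed upgrade.
function resume(run: Run): Run {
  if (run.status !== "paused") throw new Error(`Run ${run.id} is not paused`);
  return { ...run, status: "running" };
}
```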
Upgrade API
cancel-and-restart:
cancel-and-resume:
cancel-and-partial-resume:
Helpers
Functions used to provide the upgrade functionality will be exposed as helpers on the runtime:
- getManifest() to return the JSON Manifest that was included in the bundle code that is currently running.
- assertInputCompatibility() to validate that the input of a run is compatible with a new version of the code; asserts viability for "cancel-and-restart".
- assertEventLogCompatibility() to validate that the event log of a run is compatible with a new version of the code; asserts viability for "cancel-and-resume".
Manifest & Compatibility
Compatibility helpers will be based on comparing the Manifests of the old and new versions; see the Manifest type in the World Interface. The exact shape of the Manifest is not part of this RFC and will be expanded upon separately.
A first-iteration implementation of this RFC will not support compatibility checking and will assume that all versions are compatible.
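Because the Manifest shape is left open, the TypeScript sketch below uses a made-up ToyManifest. It shows how assertInputCompatibility() and assertEventLogCompatibility() could drive the choice of upgrade strategy once compatibility checks exist. The signatures, the StepEvent log entry and both rules are illustrative assumptions, not the RFC's rules:

- parameters may only be appended
- every logged step must still exist in the new version

```typescript
// Toy Manifest shape, purely an assumption for illustration.
interface ToyManifest {
  versionId: string;
  workflows: Record<string, { params: string[]; steps: string[] }>;
}

// Hypothetical event-log entry recording a completed step.
interface StepEvent {
  workflow: string;
  step: string;
}

// Toy rule for "cancel-and-restart": every old parameter must keep its name and
// position; new parameters may only be appended (and are assumed optional).
function assertInputCompatibility(oldM: ToyManifest, newM: ToyManifest, workflow: string): void {
  const before = oldM.workflows[workflow];
  const after = newM.workflows[workflow];
  if (!before || !after) throw new Error(`Workflow "${workflow}" is missing in one of the versions`);
  before.params.forEach((name, i) => {
    if (after.params[i] !== name) {
      throw new Error(`Parameter ${i} changed from "${name}" to "${String(after.params[i])}"`);
    }
  });
}

// Toy rule for "cancel-and-resume": every step already recorded in the event log
// must still exist in the new version so it can be replayed.
function assertEventLogCompatibility(newM: ToyManifest, log: StepEvent[]): void {
  for (const event of log) {
    const steps = newM.workflows[event.workflow]?.steps ?? [];
    if (!steps.includes(event.step)) {
      throw new Error(`Step "${event.step}" no longer exists in ${newM.versionId}`);
    }
  }
}

const passes = (check: () => void): boolean => {
  try {
    check();
    return true;
  } catch {
    return false;
  }
};

// Prefer resuming (keeps completed work), fall back to restarting, else refuse.
function pickStrategy(
  oldM: ToyManifest,
  newM: ToyManifest,
  workflow: string,
  log: StepEvent[],
): "cancel-and-resume" | "cancel-and-restart" | undefined {
  if (passes(() => assertEventLogCompatibility(newM, log))) return "cancel-and-resume";
  if (passes(() => assertInputCompatibility(oldM, newM, workflow))) return "cancel-and-restart";
  return undefined;
}
```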
upgrade will initially support only the "cancel-and-restart" strategy and will upgrade without checking compatibility. If the input is incompatible, the new run will simply fail, leaving the old run paused; user code can then resume the old run if needed after the failed upgrade.
CLI Changes
New CLI commands will be added to expose the new World and Runtime API changes:
- workflow version create to create a new version
- workflow version update to update a version
- workflow version delete to delete a version
- workflow version list to list versions
- workflow version get to get a version
- workflow upgrade to upgrade the version of an existing run
And some existing commands will change to emit additional version information:
- workflow build will call world.versions.getCurrent() internally, or generate a new version ID if none is provided, and save it into the created manifest and bundle.
Deployment Changes
The bigger change that comes with versioning concerns the responsibilities of the World implementation. To understand these changes, let's look at the user experience of using a World.
CI Experience
This is an example CI script that a user might use to deploy their application:
Opinionated vs. Flexible Worlds
In the above example, step (3) could be part of the World implementation or left to the user, depending on the World. Imagine, for example, an "AWS World".
This world could either:
- versions.update(id, { address }) and simply store it in DynamoDB
- address provided, using it as a URL.
Or, it could take over more of the user's responsibilities:
- versions.create({ manifest }), it will:
  - versions.update(id, { address }) to store the address in DynamoDB, mainly to mark it as "live".
This example is independent of whether the World is based on a major cloud provider's serverless implementation, Supabase, Jazz, Netlify, self-hosted Kubernetes, etc.
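The two ends of the AWS World example can be sketched in TypeScript as follows:

- **Flexible:** the World only records the address the user reports.
- **Opinionated:** create() deploys the bundle itself and then records its address, marking the version live.

Only the create/update calls and the address field come from the text above. VersionTable (standing in for DynamoDB), DeployBundle (standing in for whatever actually ships the bundle) and the class names are hypothetical.

```typescript
// Hypothetical shapes; the address field is the only part taken from the RFC text.
interface Manifest {
  versionId: string;
}

interface Version {
  id: string;
  manifest: Manifest;
  address?: string; // present once the bundle is reachable, i.e. "live"
}

// Stand-in for a DynamoDB table (or any key-value store).
class VersionTable {
  private readonly rows = new Map<string, Version>();
  put(version: Version): void {
    this.rows.set(version.id, version);
  }
  get(id: string): Version | undefined {
    return this.rows.get(id);
  }
}

// "Flexible": the World only records what the user tells it.
class FlexibleVersions {
  private readonly table: VersionTable;
  constructor(table: VersionTable) {
    this.table = table;
  }
  create({ manifest }: { manifest: Manifest }): Version {
    const version: Version = { id: manifest.versionId, manifest };
    this.table.put(version);
    return version;
  }
  update(id: string, patch: { address?: string }): Version {
    const existing = this.table.get(id);
    if (!existing) throw new Error(`Unknown version: ${id}`);
    const updated: Version = { ...existing, ...patch };
    this.table.put(updated);
    return updated;
  }
}

// Hypothetical deploy hook, e.g. wrapping a serverless deploy; returns the URL.
type DeployBundle = (manifest: Manifest) => string;

// "Opinionated": create() also deploys the bundle, then records its address.
class OpinionatedVersions {
  private readonly inner: FlexibleVersions;
  private readonly deploy: DeployBundle;
  constructor(inner: FlexibleVersions, deploy: DeployBundle) {
    this.inner = inner;
    this.deploy = deploy;
  }
  create({ manifest }: { manifest: Manifest }): Version {
    const version = this.inner.create({ manifest }); // not live yet: no address
    const address = this.deploy(manifest);
    return this.inner.update(version.id, { address }); // marks the version "live"
  }
}
```

In the flexible case, the CI script itself would call update() once it has deployed the bundle. In the opinionated case, that step disappears from the user's script.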
These are two ends of a spectrum, here called "flexible" (user-defined) and "opinionated" (World-defined). At the very least, a World needs to provide a Queue and a Storage implementation, though these need not be the same implementation: a community npm package might implement a queue and another a storage layer, and the two can still be combined to form a minimal World.
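The TypeScript sketch below combines a separately published queue and storage layer into one minimal World, with versionId as the only thing the two have to agree on. The MinimalWorld, StoragePart, QueuePart and enqueue names, the message shape and the ArrayQueue example are assumptions for illustration. The real World and Queue interfaces differ in their details.

```typescript
// Hypothetical minimal contracts; the real World interface is larger.
interface StoragePart {
  versions: { getCurrent(): { id: string } };
}

interface QueueMessage {
  payload: unknown;
  versionId: string; // formerly deploymentId
}

interface QueuePart {
  enqueue(message: QueueMessage): void;
}

interface MinimalWorld extends StoragePart, QueuePart {}

// Combine two independently published pieces into one World.
function composeWorld(storage: StoragePart, queue: QueuePart): MinimalWorld {
  return {
    versions: storage.versions,
    // Wrap instead of copying the method so class-based queues keep their `this`.
    enqueue: (message) => queue.enqueue(message),
  };
}

// The queue never needs to know where versions live: the caller stamps the
// current versionId read from storage onto each message.
function enqueueForCurrentVersion(world: MinimalWorld, payload: unknown): void {
  world.enqueue({ payload, versionId: world.versions.getCurrent().id });
}

// Example "community package" queue, backed by an array.
class ArrayQueue implements QueuePart {
  readonly messages: QueueMessage[] = [];
  enqueue(message: QueueMessage): void {
    this.messages.push(message);
  }
}
```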
It's especially important to note that versioning is optional. If you simply always deploy your latest code in place, old workflow runs will stall, though they can still be upgraded manually by the user. A "flexible" World with a minimal user-provided compute layer is viable, though not recommended.
In the best case, a user may be able to download, e.g., an awesome-supabase-world package, provide a Supabase project ID and credentials, and do nothing but call workflow version create in GitHub CI to create a full (and possibly client/server) deployment with minimal configuration.
Minimal single-VM World Implementation Example
With a minimal production-ready World like @workflow/world-postgres, which implements Storage and Queueing on top of a single PostgreSQL instance, a user could create a production deploy like so:
Open Questions
- Deployment/deploymentId/world.deployments/run.deploymentId instead? It would make more sense semantically, but would be less clearly tied to the idea of versioning.
- versionId be baked into the bundle as it currently is? The upside is that most Worlds can implement versions.getCurrent() simply by calling getManifest().versionId from the runtime helpers, instead of requiring World implementations to set or expect environment variables. The downside is that we can't swap this ID out as easily as an environment variable, e.g. if we wanted to reuse a bundle across deploys to different environments. However, this is still possible if the World chooses to use environment variables for this purpose.
- version.update() exist at all? It is mainly used to update an address, implicitly marking a Version as live if it was initially created as "not live". Given that the maintenance overhead is small, and that we might want to allow arbitrary properties to be set on the Version object (for World-internal management), it would make sense to keep the update call for forward compatibility. On the flip side, the conceptual simplicity of immutable Version objects could be nice too.
- versions.create({ setLive: true }) would only return if the bundle was deployed successfully and is accessible. A World may still choose to do this, but the question is whether it should be a required contract across Worlds. Similarly, there's no way to list "alive" versions, though again, the World may choose to model this based on the address field.
- .well-known route names, e.g. .well-known/workflow/v1/flow/vers_01ARZ3NDEKTSV4RRFFQ69G5FAV, also aliased as .well-known/workflow/v1/flow? This would allow combining bundles so that a single Node.js instance can serve multiple versions. Some Worlds might make use of this, though it is unlikely to be a good option.
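For the baked-in versionId question, the TypeScript sketch below shows a World implementing versions.getCurrent() from the bundle's manifest, with an optional environment override for reusing one bundle across environments. getManifest is passed in as a stand-in for the runtime helper of that name, and env stands in for process.env. The WORKFLOW_VERSION_ID variable name is hypothetical.

```typescript
// Hypothetical shapes; only versionId is named in the RFC.
interface Manifest {
  versionId: string;
}

interface Version {
  id: string;
  manifest: Manifest;
}

// `getManifest` stands in for the runtime helper of that name; `env` stands in for
// process.env. The variable name WORKFLOW_VERSION_ID is made up for this sketch.
function makeGetCurrent(
  getManifest: () => Manifest,
  env: Record<string, string | undefined> = {},
  envKey = "WORKFLOW_VERSION_ID",
): () => Version {
  return () => {
    const manifest = getManifest();
    // Default: the ID baked into the bundle. Optional: a non-empty environment
    // override, e.g. to reuse one bundle across deploys to different environments.
    const id = env[envKey] || manifest.versionId;
    return { id, manifest };
  };
}
```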