Commit e69ffd3
v3: Refactor attempt creation to be worker requested (#1077)
* WIP worker TaskRunAttempt creation
* Handling failing task runs that cannot create an attempt for whatever reason
* Move the visibility queue stuff into a graphile job
* Fixed task runs with unsanitized queue names
* “Borrow” the code from alerts PR to get self hosted deployments working
* Add an admin API endpoint to get info about the shared marqs queue
* Allow admins to view any project metrics
* start adding lazy attempts to prod
* lazy attempt creation for prod workers
* resurrect prod stack traces
* add exception event to failed run spans
* simplify dependency resumes
* fix typecheck
* fix merge
* fresh process for all attempts
* always try sigterm first
* stop heartbeat timeout on non-inplace replace message
* add missing ack on checkpoint creation service failure
* bypass dequeue for retries with running worker
* respect retry delays
* crash runs with invalid run status for execution
* remove debug logs
* fix nack message
* fix version locking
* fresh attempt processes in dev and prod
* improve handling of ipc timeouts
* consider checkpoint failures on cancellation
* add basic chaos monkey to checkpointer
* changeset
* control forced checkpoint simulation via env var
* fix merge
* kill old attempt processes before checkpointing
* detailed perf logging for checkpointing
* add coordinator otlp endpoint example
* improve prod run cancellation
* rename supports lazy attempts migration
* fix graceful exit
* fix retry mechanics
* clear paused state before retry
* remove checkpoint image after push
* crash worker on unrecoverable errors
* refactor unrecoverable error emit
* switch to do hosted busybox image
* increase wait for duration ipc timeout
* add changeset for misc fixes
* fix merge
* fix retry delay span runId
* fix dev retries
* improve prod worker logging
* log checkpoint sizes
* add lazy attempts catalog entries
* Fixed merge issue: use zodFetch, not wrapZodFetch
* Revert "Fixed merge issue: use zodFetch, not wrapZodFetch"
This reverts commit d137e4e.
* importEnvVars uses wrapZodFetch now
* add backwards compat for retries without checkpoints
* handle more cases of unrecoverable runs
* don't kill the child process if it shouldn't be killed
---------
Co-authored-by: nicktrn <55853254+nicktrn@users.noreply.github.com>
Co-authored-by: Matt Aitken <matt@mattaitken.com>
1 parent 782d4f7 commit e69ffd3
51 files changed
Lines changed: 3345 additions & 885 deletions
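The "always try sigterm first" and "fix graceful exit" items in the message above describe a common shutdown pattern. The diff for that change is hidden on this page, so the following is only a minimal sketch of the general technique, not the code from this commit. The `Killable` interface, the `terminateGracefully` name, and the `graceMs` parameter are all hypothetical.

```typescript
// Hypothetical shape: anything that can receive a signal and report its exit.
// This is NOT an interface from the repository.
interface Killable {
  kill(signal: "SIGTERM" | "SIGKILL"): boolean;
  exited: Promise<void>;
}

// Send SIGTERM first and escalate to SIGKILL only if the process has not
// exited within `graceMs` milliseconds. Resolves with the last signal sent.
async function terminateGracefully(
  proc: Killable,
  graceMs: number
): Promise<"SIGTERM" | "SIGKILL"> {
  // Polite request: gives the process a chance to clean up.
  proc.kill("SIGTERM");

  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), graceMs);
  });

  // Wait for whichever happens first: a clean exit or the grace period ending.
  const outcome = await Promise.race([
    proc.exited.then(() => "exited" as const),
    timedOut,
  ]);
  if (timer !== undefined) clearTimeout(timer);

  if (outcome === "exited") return "SIGTERM";

  // Grace period elapsed: force termination and wait for the exit.
  proc.kill("SIGKILL");
  await proc.exited;
  return "SIGKILL";
}
```

A real caller would likely wrap a Node.js `ChildProcess`, forwarding to `child.kill(signal)` and resolving `exited` from the `exit` event. That wiring is left out here because it depends on how the worker actually spawns attempt processes.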
File tree
- .changeset
- apps
- coordinator/src
- docker-provider
- src
- kubernetes-provider/src
- webapp/app
- routes
- services
- v3
- marqs
- services
- packages
- cli-v3/src
- commands
- workers
- common
- dev
- prod
- core-apps/src
- core
- src/v3
- runtime
- schemas
- database/prisma
- migrations/20240430101936_add_lazy_attempt_support_flag_to_workers
- references/v3-catalog
- src/trigger