deerflow2/backend/packages/harness/deerflow
stphtt f212da9f89
fix(sandbox): create shell session before retrying on a fresh id (#3577)
* fix(sandbox): create shell session before retrying on a fresh id

The AIO sandbox recovery path generated a UUID and passed it straight to
exec_command(id=...). The sandbox image only auto-creates a session when
exec_command is called with *no* id; an exec carrying an unknown id returns
HTTP 404 "Session not found". So every ErrorObservation recovery itself
404'd, turning a transient session lapse into an unrecoverable tool error
that looped the run up to the LangGraph recursion limit.

Explicitly create_session(id=fresh_id) before targeting that id on retry.
create_session is idempotent (returns the existing session if the id already
exists), so this is safe under the serializing lock.
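
A minimal sketch of the create-then-retry ordering described above. It is not the project's code: FakeSandbox, SessionNotFound and retry_on_fresh_session are hypothetical stand-ins, and the only behavior modeled is what this message states (exec with an unknown id fails, exec with no id auto-creates a session, create_session is idempotent).

```python
import uuid


class SessionNotFound(Exception):
    """Hypothetical stand-in for the HTTP 404 "Session not found" response."""


class FakeSandbox:
    """Hypothetical in-memory model of the sandbox shell API (an assumption)."""

    def __init__(self):
        self.sessions = set()

    def create_session(self, id):
        # Idempotent: adding an id that already exists changes nothing.
        self.sessions.add(id)
        return id

    def exec_command(self, command, id=None):
        if id is None:
            # A session is auto-created only when no id is supplied.
            id = "default"
            self.sessions.add(id)
        elif id not in self.sessions:
            # An unknown id is rejected rather than created on the fly.
            raise SessionNotFound(id)
        return f"[{id}] ran: {command}"


def retry_on_fresh_session(sandbox, command):
    """Create the session first, then target that exact id on retry."""
    fresh_id = str(uuid.uuid4())
    sandbox.create_session(id=fresh_id)
    return fresh_id, sandbox.exec_command(command, id=fresh_id)
```

Skipping the create_session call in this model reproduces the failure mode: the exec on the fresh id raises SessionNotFound.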

Updated the regression test to assert the retry targets exactly the
created session id rather than a fabricated, uncreated one.

* fix(sandbox): release the one-shot recovery session after retry

The fresh session created on the ErrorObservation recovery path is used for
exactly one command -- the next execute_command runs with no id and returns
to the default session. Under persistent session corruption every command
would create another session that is never reused or released, accumulating
sessions on the container.

Release it best-effort with cleanup_session() in a finally, swallowing any
cleanup error so it never masks a successful retry.
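
A minimal sketch of the best-effort release described above, under the same caveat: FakeSandbox and retry_once are hypothetical, and the cleanup_session(id) signature is an assumption made for illustration. The point is only the shape of the finally block, where a cleanup failure is swallowed so the retry's result still reaches the caller.

```python
import uuid


class FakeSandbox:
    """Hypothetical in-memory model of the sandbox shell API (an assumption)."""

    def __init__(self, cleanup_fails=False):
        self.sessions = set()
        self.cleanup_fails = cleanup_fails

    def create_session(self, id):
        self.sessions.add(id)
        return id

    def exec_command(self, command, id):
        if id not in self.sessions:
            raise KeyError(f"Session not found: {id}")
        return f"[{id}] ran: {command}"

    def cleanup_session(self, id):
        # Assumed signature; optionally simulates a failing cleanup call.
        if self.cleanup_fails:
            raise RuntimeError("cleanup failed")
        self.sessions.discard(id)


def retry_once(sandbox, command):
    """Run one command on a fresh session, then release it best-effort."""
    fresh_id = str(uuid.uuid4())
    sandbox.create_session(id=fresh_id)
    try:
        return sandbox.exec_command(command, id=fresh_id)
    finally:
        try:
            sandbox.cleanup_session(fresh_id)
        except Exception:
            # Swallow cleanup errors so they never mask the retry's outcome.
            pass
```

In this model a failing cleanup leaves one session behind but the command result is still returned; a successful cleanup leaves no sessions behind.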

Addresses review feedback on #3577.

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-06-17 10:21:27 +08:00
agents fix(sandbox): merge idempotent sandbox state updates (#3518) 2026-06-13 22:40:48 +08:00
community fix(sandbox): create shell session before retrying on a fresh id (#3577) 2026-06-17 10:21:27 +08:00
config fix(channels): require bound identity for user-owned IM messages (#3578) 2026-06-16 23:04:39 +08:00
guardrails feat(guardrails): add pre-tool-call authorization middleware with pluggable providers (#1240) 2026-03-23 18:07:33 +08:00
mcp fix(mcp): close stdio sessions on their owning loop to avoid cross-task cancel-scope error (#3379) (#3392) 2026-06-07 21:37:30 +08:00
models feat(models): add StepFun reasoning model adapter (#3461) 2026-06-09 18:01:43 +08:00
persistence fix(channels): make runtime provider state authoritative (#3580) 2026-06-17 07:45:46 +08:00
reflection refactor: split backend into harness (deerflow.*) and app (app.*) (#1131) 2026-03-14 22:55:52 +08:00
runtime fix(history): strip base64 image data from REST endpoint responses (#3535) 2026-06-13 08:58:19 +08:00
sandbox fix(sandbox): persist lazily-acquired sandbox state via Command (#3464) 2026-06-11 17:50:36 +08:00
skills fix(skills): keep skill archive installation off the event loop (#3505) 2026-06-12 15:17:40 +08:00
subagents fix(subagents): raise general-purpose max_turns to 150 and default timeout to 30min (#3610) 2026-06-16 19:55:04 +08:00
tools fix(agents): sync agent_name across context/configurable and reject empty soul (#3549) (#3553) 2026-06-14 10:40:16 +08:00
tracing fix(tracing): propagate session_id and user_id into Langfuse traces (#2944) 2026-05-21 16:49:31 +08:00
uploads fix upload file size contract (#3408) 2026-06-06 15:12:17 +08:00
utils fix(skills): harden slash skill activation across chat channels (#3466) 2026-06-09 23:07:17 +08:00
__init__.py refactor: split backend into harness (deerflow.*) and app (app.*) (#1131) 2026-03-14 22:55:52 +08:00
client.py feat(memory): add memory.token_counting config to avoid tiktoken network dependency (#3429) (#3465) 2026-06-10 23:26:15 +08:00