* fix(channels): add operational guardrails
* make format
* fix(channels): converge with #3582 to avoid merge-order conflicts
Drop this PR's DingTalk INFO-log redaction and hand it to #3582, which
already restructures that handler and will redact the same log there. This
PR no longer touches dingtalk.py, so the two PRs can merge to main in any
order without a conflict.
For WeChat, drop the contested thread_ts priority reorder (review #3) and
keep only what inbound dedupe needs: a server-stable message_id in the
inbound metadata (message_id/msg_id, no client_id per review #6). This is a
single added line inside the metadata dict, a region #3582 never touches, so
it auto-merges regardless of order.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(channels): address three correctness review findings
1. Connect-code cap was racy (willem #1): _create_state ran delete-expired,
count, and insert as three separate transactions, so concurrent connect
POSTs from one owner could each see count < cap and all insert past it. Add
ChannelConnectionRepository.create_oauth_state_within_cap which does
delete+count+insert in a single transaction serialized per (owner,
provider) — Postgres via pg_advisory_xact_lock, SQLite via the write lock
the leading DELETE takes — and have the router use it.
2. Inbound dedupe key fell back to an empty ("") workspace (willem #3): two workspaces
delivering without team/guild/aibotid would collapse to the same key and
dedupe each other's messages. _inbound_dedupe_key now fails closed
(returns None) when no workspace identifier is present.
3. Dedupe key was recorded on receipt and never released on failure
(ShenAC #1): a transient error (DB blip, Gateway 503) left the key in place
for the full TTL, so a provider redelivery of the same message_id — exactly
the retry dedupe should absorb — was silently dropped. _handle_message now
releases the key in the unexpected-exception branch so redelivery can
recover, while keeping record-on-receipt so retries during handling are
still deduped.
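
Illustration for finding 1: a minimal sketch of the delete+count+insert
pattern in a single SQLite transaction. The table name, columns, default cap,
and the function name create_state_within_cap are hypothetical and are not
taken from ChannelConnectionRepository. The sketch takes the write lock
explicitly with BEGIN IMMEDIATE rather than relying on the leading DELETE,
and it does not cover the Postgres pg_advisory_xact_lock path.

```python
import sqlite3
import time


def make_conn() -> sqlite3.Connection:
    # isolation_level=None puts sqlite3 in autocommit mode, so the
    # BEGIN/COMMIT statements below are the only transaction control.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE oauth_state ("
        "code TEXT PRIMARY KEY, owner TEXT, provider TEXT, expires_at REAL)"
    )
    return conn


def create_state_within_cap(conn, owner, provider, code, cap=3, ttl=600.0, now=None):
    """Insert a connect code unless (owner, provider) already holds `cap` live codes."""
    now = time.time() if now is None else now
    # Take the write lock up front so concurrent callers serialize here.
    conn.execute("BEGIN IMMEDIATE")
    try:
        # Prune only this owner/provider's expired codes.
        conn.execute(
            "DELETE FROM oauth_state WHERE owner=? AND provider=? AND expires_at<=?",
            (owner, provider, now),
        )
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM oauth_state WHERE owner=? AND provider=?",
            (owner, provider),
        ).fetchone()
        allowed = count < cap
        if allowed:
            conn.execute(
                "INSERT INTO oauth_state(code, owner, provider, expires_at) "
                "VALUES (?, ?, ?, ?)",
                (code, owner, provider, now + ttl),
            )
        # Commit either way so the pruning is kept even when the insert is refused.
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return allowed
```

Because count and insert share one transaction, a second caller cannot
observe count < cap until the first caller's insert is committed.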
Tests: repo cap enforcement including concurrent-issuance non-leak; dedupe
fail-closed; dedupe key release-on-failure redelivery recovery.
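
Illustration for findings 2 and 3: a rough sketch of a fail-closed dedupe key
plus release-on-failure. Deduper, inbound_dedupe_key, the metadata field
names, and the string return values are hypothetical stand-ins, not the real
_inbound_dedupe_key/_handle_message. Processing a message without dedupe when
the key is None is an assumption of this sketch.

```python
import time
from typing import Callable, Dict, Optional

# Assumed metadata field names, for illustration only.
WORKSPACE_FIELDS = ("team_id", "guild_id", "aibotid")
MESSAGE_FIELDS = ("event_id", "message_id", "msg_id")


def inbound_dedupe_key(channel: str, meta: dict) -> Optional[str]:
    """Build a dedupe key, or return None when it would not be unique."""
    workspace = next((str(meta[f]) for f in WORKSPACE_FIELDS if meta.get(f)), None)
    message = next((str(meta[f]) for f in MESSAGE_FIELDS if meta.get(f)), None)
    if workspace is None or message is None:
        # Fail closed: never fall back to "" and merge unrelated workspaces.
        return None
    return f"{channel}:{workspace}:{message}"


class Deduper:
    def __init__(self, ttl: float = 300.0) -> None:
        self._seen: Dict[str, float] = {}
        self._ttl = ttl

    def handle(self, channel: str, meta: dict,
               process: Callable[[dict], None], now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        key = inbound_dedupe_key(channel, meta)
        if key is not None:
            seen_at = self._seen.get(key)
            if seen_at is not None and now - seen_at < self._ttl:
                return "duplicate"
            # Record on receipt so retries arriving during handling are deduped.
            self._seen[key] = now
        try:
            process(meta)
        except Exception:
            # Release the key so a provider redelivery can recover.
            if key is not None:
                self._seen.pop(key, None)
            return "failed"
        return "processed"
```

With this shape, a failed first delivery followed by a redelivery of the same
message_id is processed once, and a third delivery is dropped as a duplicate.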
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(channels): address cleanup/efficiency and test review findings
Efficiency / cleanup:
- Dedupe key set drops client-generated ids (client_msg_id, client_id);
keep only server-stable event_id/message_id/msg_id, which a provider's own
redelivery preserves (ShenAC #6). Every provider already emits message_id.
- TTL/overflow pruning of _recent_inbound_events is now O(k): switch to an
OrderedDict and popitem(last=False) from the front instead of scanning all
4096 entries on every inbound (willem #4).
- Log "received inbound" only after the dedupe check so a provider retrying N
times no longer logs N accepts; document that manager dedupe covers the
agent run/final answer, not provider ack side-effects (willem #5, ShenAC #2).
- Slack drops the redundant `team_id or event.get("team")` fallback the caller
already resolved (willem #6).
- create_oauth_state_within_cap prunes only this owner/provider's expired codes
instead of a global DELETE on every connect POST; global cleanup still runs
on consume_oauth_state (willem #7).
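
Illustration of the O(k) pruning: a minimal sketch of front-popping from an
OrderedDict. The RecentEvents class, its method names, and the default TTL are
hypothetical; only the 4096 bound and the popitem(last=False) approach come
from the change described above. It assumes timestamps are non-decreasing, so
insertion order matches age order.

```python
from collections import OrderedDict


class RecentEvents:
    """Bounded, TTL-limited set of recently seen keys."""

    def __init__(self, ttl: float = 300.0, max_entries: int = 4096) -> None:
        self._events: "OrderedDict[str, float]" = OrderedDict()
        self._ttl = ttl
        self._max = max_entries

    def __len__(self) -> int:
        return len(self._events)

    def seen_before(self, key: str, now: float) -> bool:
        # Expired entries sit at the front, so stop at the first fresh one.
        while self._events:
            oldest = next(iter(self._events))
            if now - self._events[oldest] < self._ttl:
                break
            self._events.popitem(last=False)
        if key in self._events:
            return True
        self._events[key] = now
        # Overflow: evict the oldest entries until back under the bound.
        while len(self._events) > self._max:
            self._events.popitem(last=False)
        return False
```

Each call touches only the k entries it actually removes, plus one lookup,
instead of scanning the whole map.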
Tests:
- Dedupe test uses tmp_path instead of a leaked mkdtemp, uses distinct objects
per publish, and adds a negative control: a different message_id is still
processed, catching over-dedupe regressions (willem #8, ShenAC #4).
- Slack HTTP-mode rejection test supplies app_token so the missing-token early
return can't mask the guard, giving the state assertions teeth (ShenAC #3).
- count_oauth_states test pins that the active row survives, not just the count
(ShenAC #5).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* make format
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>