Separate timeout types, identify ambiguous writes, and recover without guessing what the carrier actually did.
Not All Timeouts Mean the Same Thing
A connect timeout, TLS timeout, upstream gateway timeout, worker deadline, and carrier-side read timeout each mean something different. You need to know whether the request never left your system, reached the carrier without a response getting back to your app code, or ran long enough that the outcome became ambiguous.
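One way to make that distinction explicit is to map each timeout type to what it implies about delivery. The sketch below is illustrative only: the names TimeoutStage, Delivery, and classify are hypothetical, and the mapping assumes that connect and TLS timeouts happen before any request body is sent, while every later stage may have reached the carrier.

```python
from enum import Enum


class TimeoutStage(Enum):
    CONNECT = "connect"                  # TCP connection was never established
    TLS = "tls"                          # TLS handshake did not finish
    GATEWAY = "gateway"                  # an upstream proxy gave up waiting
    WORKER_DEADLINE = "worker_deadline"  # our own worker hit its deadline
    CARRIER_READ = "carrier_read"        # request was sent, no response in time


class Delivery(Enum):
    NEVER_SENT = "never_sent"
    POSSIBLY_PROCESSED = "possibly_processed"


# Assumption: these stages fail before the HTTP request itself is written.
_NEVER_SENT_STAGES = {TimeoutStage.CONNECT, TimeoutStage.TLS}


def classify(stage: TimeoutStage) -> Delivery:
    """Map a timeout stage to what it implies about delivery."""
    if stage in _NEVER_SENT_STAGES:
        return Delivery.NEVER_SENT
    # Conservative default: a worker deadline can fire before or after the
    # request went out, so it is treated as possibly processed.
    return Delivery.POSSIBLY_PROCESSED
```

Defaulting every unknown or late-stage timeout to "possibly processed" is deliberate: misclassifying a delivered write as never sent is the more expensive mistake.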
Ambiguous Outcomes Are Write-Specific
A timeout on GET tracking is annoying. A timeout on POST /shipments is operationally dangerous because the label may already exist. Treat every write timeout as an evidence-collection problem first, not a generic retry candidate.
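A timeout handler can encode this split by retrying reads and routing writes into an evidence record instead. The following is a minimal sketch, not a reference implementation: on_timeout and TimeoutEvidence are hypothetical names, and it assumes every write request carries a client reference that can later be used to look the shipment up.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional, Tuple

# HTTP methods defined as safe (read-only); retrying them cannot create a label.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


@dataclass(frozen=True)
class TimeoutEvidence:
    """What we know at the moment a write timed out (hypothetical shape)."""
    method: str
    path: str
    client_reference: str
    stage: str
    observed_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def on_timeout(
    method: str, path: str, client_reference: str, stage: str
) -> Tuple[str, Optional[TimeoutEvidence]]:
    """Return ("retry", None) for reads, ("investigate", evidence) for writes."""
    if method.upper() in SAFE_METHODS:
        return "retry", None
    # Writes are never retried automatically; capture evidence first.
    evidence = TimeoutEvidence(method.upper(), path, client_reference, stage)
    return "investigate", evidence
```

For example, on_timeout("POST", "/shipments", "order-1001", "carrier_read") yields an "investigate" action together with a timestamped record, while a timed-out GET is simply marked for retry.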
Carrier Reality
Carrier maintenance windows and edge proxies often surface as generic timeouts even when the backend actually completed the write. If you retry blindly, you risk creating a duplicate shipment while the original timeout is still being diagnosed.
Document the Recovery Ladder
The recovery path should be explicit: check telemetry, search by client reference, inspect any existing label, then decide whether retry, compensation, or escalation is the correct next move. If that ladder only lives in one engineer's head, you will relearn it during every outage.
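Writing the ladder down as code is one way to keep it out of a single engineer's head. The sketch below is an assumption-laden illustration: recover, request_was_sent, find_shipment, and the label_url field are all hypothetical, and a real carrier lookup may lag behind the write, so "not found" is only as trustworthy as that lookup.

```python
from typing import Callable, Optional


def recover(
    client_reference: str,
    request_was_sent: bool,
    find_shipment: Callable[[str], Optional[dict]],
) -> str:
    """Walk the recovery ladder and return "retry", "compensate", or "escalate"."""
    # Rung 1: telemetry. If our own records show the request never left,
    # there is nothing on the carrier side to reconcile.
    if not request_was_sent:
        return "retry"

    # Rung 2: search the carrier by client reference.
    try:
        shipment = find_shipment(client_reference)
    except Exception:
        # Evidence could not be gathered, so do not guess.
        return "escalate"

    if shipment is None:
        # Assumption: the lookup is reliable enough that "not found"
        # means the write did not happen.
        return "retry"

    # Rung 3: inspect any existing label.
    if shipment.get("label_url"):
        # A label already exists: reconcile or void it rather than create another.
        return "compensate"

    # A shipment without a label is a partial state that needs a human.
    return "escalate"
```

Passing the lookup in as a function keeps the ladder testable without a live carrier connection, which matters because the ladder is exercised mostly during outages.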