do not call tcp/udp ::stopAll() when it is not required to do so#8598
Merged
Conversation
Collaborator
|
ArduinoOTA as well? Even a failed update kills every client Continuing the idea, Arduino/libraries/ArduinoOTA/ArduinoOTA.h Lines 50 to 61 in 8bfc2e9 Where the user might as well do the WiFiClient::stopAll();
WiFiUDP::stopAll();// or, where we want some fine grained control
WiFiClient::stopAllExcept(&server.client());
WiFiUDP::stopAllExcept(&myPreciousCnC); |
mcspr
approved these changes
Jun 23, 2022
mcspr
added a commit
that referenced
this pull request
Nov 3, 2022
follow-up of #8598 similar to ArduinoOTA API, execute certain callback in the Updater context.
6 tasks
hasenradball
pushed a commit
to hasenradball/Arduino
that referenced
this pull request
Nov 18, 2024
…8266#8598) * do not call tcp/udp ::stopAll() when it is not required to to so * remove stopAll also from ArduinoOTA
hasenradball
pushed a commit
to hasenradball/Arduino
that referenced
this pull request
Nov 18, 2024
follow-up of esp8266#8598 similar to ArduinoOTA API, execute certain callback in the Updater context.
rvdbreemen
added a commit
to rvdbreemen/OTGW-firmware
that referenced
this pull request
Apr 23, 2026
….1.x Root cause: Arduino Core 3.1.0 (esp8266/Arduino PR #8598) removed the implicit WiFiClient/WiFiUDP::stopAll() that used to run during the Update path. Without explicit cleanup from the sketch, TCP sockets linger in lwIP and the WiFi SDK state persists through the soft-reset triggered by ESP.restart(). The next boot comes up in a half-state: WiFi appears associated (stale from before reboot) but telnet, HTTP, MQTT and WebSocket all fail to rebind their ports. The only way to recover without power-cycling is to force a WiFi disconnect at the AP, which signals STAMODE_DISCONNECTED and triggers our normal reconnect path, finally initialising services properly. Reported 2026-04-22 by andrebrait (Discord #beta-testing) and independently reproduced by number3nl. Symptom matches the Core 3.1.0 breaking change exactly: device is "connected to WiFi per Unifi controller" but all services non-responsive until forced WiFi disconnect. Fix: add a prepareForReboot() helper in helperStuff.ino that closes all long-lived TCP services (MQTT, WebSocket, telnet on port 23, OTGW stream on 25238) and calls WiFi.disconnect() before the ESP.restart() soft-reset fires. mDNS and LLMNR are UDP-only and their stale state is harmless across a reset, so they are left alone (also avoids API compatibility issues across Core versions). Changes: - MQTTstuff.ino: new doMqttDisconnect() wrapper. MQTTclient is file-static so helperStuff.ino cannot reach it directly; the wrapper exposes a graceful disconnect. - OTGW-firmware.h: prototypes for doMqttDisconnect and doRestart so they are reachable from the OTA update template header. - helperStuff.ino: prepareForReboot() helper + doRestart() now calls it between flushSettings() and ESP.restart(). All reboot paths (nightly, OTA success, WiFi config-portal fallback) benefit. - OTGW-ModUpdateServer-impl.h: the HTTP_POST success handler now routes through doRestart() instead of its own inline delay + ESP.restart + delay sequence. Consolidates the reboot flow to a single canonical path. Compared to the alternative of using ESP.reset() (bootrom jump): the cleanup approach preserves MQTT last-will semantics for HA, keeps getResetReason() meaningful for post-reboot diagnostics, and stays within the Arduino-idiomatic API. Estimated effectiveness ~90% on the primary reported symptom; remaining risk covers latent issues (e.g. first-time 1.3.x -> 1.4.x LittleFS reformat) that are already addressed elsewhere (release notes flash-order guidance). Sources: - esp8266/Arduino release 3.1.0 notes: "ArduinoOTA and ESP8266HTTPUpdate no longer stop all WiFiClient/WiFiUDP (#8598)" - PR esp8266/Arduino#8598 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
rvdbreemen
added a commit
to rvdbreemen/OTGW-firmware
that referenced
this pull request
Apr 23, 2026
Follow-up to 201921f (service cleanup before reboot). Switches the final reset call in doRestart() from ESP.restart() to ESP.reset(). Rationale: ESP.restart() triggers a soft-reset via the SDK, which per Core 3.1.0 (PR esp8266/Arduino#8598) no longer performs implicit WiFi/UDP cleanup. The soft-reset can leave SDK WiFi state half- associated across the reset boundary, producing the "WiFi connected but services dead" symptom that andrebrait and marceld91d reported. ESP.reset() jumps into the ROM bootrom at 0x40000080, which is functionally equivalent to asserting the RESET pin. It wipes all SDK state, all lwIP pools, all DRAM — a guaranteed fresh boot. This is exactly what the manual recovery (physical reset button) does, and what field testers discovered works reliably. The graceful-shutdown concerns that normally argue against ESP.reset() (abrupt teardown, no LWT, no close frames) are already handled by prepareForReboot() which runs BEFORE the reset: - MQTT disconnect publishes a clean offline event (no LWT trigger) - webSocket.close() sends proper close frames to all WS clients - debugTelnet.stop() / OTGWstream.stop() FIN the TCP sockets - WiFi.disconnect() triggers STAMODE_DISCONNECTED to the SDK So: cleanup preserves peer hygiene, ESP.reset() preserves local wipe robustness. Defense in depth. Side benefits: - The delay(5000) safety-tail after ESP.restart is no longer needed because ESP.reset never returns. Two fewer lines of defensive code. - Future-proof against services we might add later and forget to add to prepareForReboot(): the bootrom reset wipes their state anyway. Trade-offs accepted: - getResetReason() will report "External System" instead of "Software /System restart". The application-level reason is still logged via DebugTln(str) before the reset fires. - ESP.reset() is marked deprecated in Core 3.x comments. Still fully functional; migration path in Core 4.x unknown but likely years away. Acceptable risk. Expected effectiveness: ~95% on the reported reboot symptom. The remaining ~5% covers edge cases that would also require a physical reset (brownout, flash corruption, hardware issues). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
rvdbreemen
added a commit
to rvdbreemen/OTGW-firmware
that referenced
this pull request
Apr 24, 2026
Two reboot call sites still bypassed the central doRestart() cleanup wrapper, skipping the MQTT LWT / WebSocket close / TCP-FIN sequence that is critical on Arduino Core 3.1.0+ (PR esp8266/Arduino#8598 removed the implicit WiFiClient::stopAll() from the Update path). - runNightlyRestartCheck() in OTGW-firmware.ino called ESP.restart() directly after a 200ms delay. Scheduled reboots now go through doRestart("[nightly] scheduled restart") so nightly restart and OTA-success follow the same cleanup path. - The ESP32 OTA success handler in OTGW-ModUpdateServer-esp32.h called platformRestart() directly after a manual 1s delay, asymmetric with the ESP8266 path which already uses doRestart(). Both paths now use doRestart("[OTA] Rebooting..."). Part of TASK-394 Phase 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rvdbreemen
added a commit
to rvdbreemen/OTGW-firmware
that referenced
this pull request
Apr 24, 2026
Port of 2.0.0 TASK-394 Phase 1 commit 2f2adf0 onto dev (1.4.x). runNightlyRestartCheck() called ESP.restart() directly after a 200ms delay, bypassing the central doRestart() cleanup wrapper. That skipped the MQTT LWT / WebSocket close / TCP-FIN sequence that is critical on Arduino Core 3.1.0+ (PR esp8266/Arduino#8598 removed the implicit WiFiClient::stopAll() from the Update path). Scheduled reboots now go through doRestart("[nightly] scheduled restart") so nightly restart and OTA-success follow the same cleanup path. Dev is ESP8266-only so there is no ESP32 OTA path to fix here; the existing OTGW-ModUpdateServer-impl.h:108 already uses doRestart. Part of TASK-395. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rvdbreemen
added a commit
to rvdbreemen/OTGW-firmware
that referenced
this pull request
Apr 24, 2026
Part 3 of 3 for TASK-396. The helpers (696bdb9) and setup/loop wiring (ab443d0) are now visibly exercised from the OTA path. Four new logBootSignature probes trace the OTA heap lifecycle. Grep the telnet log for "\[OTA\] " and you get a full narrative of each update: [OTA] pre-begin ← _beginFirmware/FilesystemUpload first line (upload progress) [OTA] post-end ← immediately after Update.end(true) commits the image [OTA] post-remount ← FS-OTA only, after LittleFS.begin() returns true [OTA] pre-reboot ← HTTP success handler, just before queuing the reboot The HTTP POST success handler was changed from calling doRestart() directly to calling requestDeferredReboot("[OTA] Rebooting..."). Rationale (also captured in the updated code comment): doRestart() calls prepareForReboot() which stops telnet and closes MQTT/WS/OTGWstream — all while still inside the HTTP handler callback. On fast networks the success HTML was usually flushed by client().stop() before the cleanup started, but on slow/congested links we used to see browser hangs where the success page never fully arrived. Deferring the reboot to the next loop() tick gives lwIP 10-100ms to finish draining the response socket. The existing service-cleanup sequence (critical on Arduino Core 3.1.0+ per PR esp8266/Arduino#8598) still fires — just from performDeferredReboot() in loop() rather than from the HTTP handler. Sequence on the wire: _handleUploadEnd() sets bESPactive=false ← end of upload HTTP handler sends 200 OK + success HTML client().stop() flushes ← HTTP 200 leaves the socket logBootSignature("[OTA] pre-reboot") ← captures final heap state requestDeferredReboot(...) ← sets flag, returns HTTP handler returns to WebServer loop <next loop() tick> doBackgroundTasks() runs rebootHeapWatermarkTick() updates minHeap isRebootPending() → true, isFlashing() → false performDeferredReboot() → doRestart() → prepareForReboot() → ESP.reset() Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rvdbreemen
added a commit
to rvdbreemen/OTGW-firmware
that referenced
this pull request
Apr 24, 2026
…3) for LTS stability Rolls back the ESP8266 target from Core 3.1.2 (espressif8266@4.2.1) to Core 2.7.4 (espressif8266@2.6.3), matching the 1.5.x LTS line. Motivation: field reports of OTA-reboot stalls on Core 3.1.2 traced to PR esp8266/Arduino#8598 (removal of implicit WiFiClient/WiFiUDP::stopAll in the Update path). Core 2.7.4 is the baseline the 1.3.x line used and has a multi-year stability track record. ESP32 env is untouched; it continues to use the pioarduino fork with Arduino-ESP32 v3.3.5. Comment block documents the GCC 4.8.2 vs 10.3 ctags trade-off for future maintainers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rvdbreemen
added a commit
to rvdbreemen/OTGW-firmware
that referenced
this pull request
Apr 24, 2026
…upersedes ADR-014) ADR-082 (Accepted) documents the ESP8266 Arduino Core rollback from 3.1.2 (espressif8266@4.2.1) to 2.7.4 LTS (espressif8266@2.6.3) for the 2.0.0 line. Drivers: PR esp8266/Arduino#8598 removed the implicit WiFiClient::stopAll() that the Update path relied on, producing OTA-reboot stalls; field soak on 1.5.x validated the rollback. Library-version pin consequences captured (AceTime v2.0.1 and WebSockets 2.3.6 for GCC 4.8.2 compatibility). The ESP32 target is explicitly unaffected. ADR-083 (Accepted) makes PlatformIO the primary build system on the 2.0.0 line (dual-target ESP8266 + ESP32). Retains build.py / arduino-cli only on the LTS-1.5.x branch for historical continuity. Supersedes ADR-014 which rejected PlatformIO in 2020 on grounds that no longer hold (dual-target, pre-hook library patching, GitHub-tag AceTime pinning, Arduino IDE dropped as contributor workflow). ADR-014 Status field updated to 'Superseded by ADR-083' per the project rule that Accepted/Deprecated ADRs may only have their status touched — content remains immutable. Research report moved from repo root to docs/research/ (git mv preserves history) and referenced from ADR-082. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
fixes #6123