ADR-0001: Connectivity Monitoring Approach

Status: Accepted
Date: 2025-11-03
Authors: Dave Emmanuel Magno
Supersedes: None
Superseded by: None

Context

Kiosk devices must reflect connectivity changes to the UI and services in near real time with minimal overhead. We need:

  • Instant detection of local link/IP/route changes
  • Reliable internet reachability classification
  • Real-time notifications to the UI
  • Minimal privileges and CPU/network usage
  • Lean hexagonal architecture (clear ports/adapters, config externalized)

The existing backend already provides:

  • Event-driven link monitoring via kernel netlink (ip monitor) to trigger checks
  • A WebSocket topic for pushing status to the UI
  • HTTP GET /internet/status for polling

Decision

Adopt a hybrid reachability monitor that is both event-driven and periodic:

  • Event source: kernel netlink via ip monitor all (unprivileged), not DBus by default
  • Probes: DNS resolve, TCP handshake on port 53, HTTP 204 endpoint; optional backend/MQTT endpoint probe
  • Triggering: run probes immediately on link/IP/route events and on a periodic watchdog (default 60 s)
  • Hysteresis: require 2 consecutive agreeing samples before changing state, to avoid flapping
  • Delivery: push updates over /ws/internet-status and expose GET /internet/status
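The three probes and one way of combining them can be sketched as follows. This is a minimal illustration, not the backend's actual implementation: the `Level` enum, the function names, and the mapping in `classify` are assumptions made for the example, and the project's own `connectivity_level` values may differ.

```python
import asyncio
import http.client
import urllib.request
from enum import Enum


class Level(Enum):
    """Hypothetical levels; the project's own enum may differ."""
    NONE = "none"
    LIMITED = "limited"
    FULL = "full"


async def dns_probe(host: str, timeout: float = 2.0) -> bool:
    """True if the hostname resolves within the timeout."""
    loop = asyncio.get_running_loop()
    try:
        await asyncio.wait_for(loop.getaddrinfo(host, 80), timeout)
        return True
    except (OSError, asyncio.TimeoutError):
        return False


async def tcp_probe(ip: str, port: int = 53, timeout: float = 2.0) -> bool:
    """True if a TCP handshake completes within the timeout."""
    try:
        _, writer = await asyncio.wait_for(asyncio.open_connection(ip, port), timeout)
    except (OSError, asyncio.TimeoutError):
        return False
    writer.close()
    return True


def _http_204(url: str, timeout: float) -> bool:
    # A captive portal typically answers with a redirect or 200, not 204.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 204
    except (OSError, http.client.HTTPException):
        return False


async def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Run the blocking HTTP check in a worker thread."""
    return await asyncio.to_thread(_http_204, url, timeout)


def classify(dns_ok: bool, tcp_ok: bool, http_ok: bool) -> Level:
    """Assumed mapping: HTTP 204 means full access, DNS or TCP alone means limited."""
    if http_ok:
        return Level.FULL
    if dns_ok or tcp_ok:
        return Level.LIMITED
    return Level.NONE


async def check(dns_host: str, tcp_ip: str, url: str) -> Level:
    """Run all probes concurrently and classify the combined result."""
    dns_ok, tcp_ok, http_ok = await asyncio.gather(
        dns_probe(dns_host), tcp_probe(tcp_ip), http_probe(url)
    )
    return classify(dns_ok, tcp_ok, http_ok)
```

A call might look like `asyncio.run(check("example.com", "1.1.1.1", "http://connectivitycheck.gstatic.com/generate_204"))`; those targets are placeholders and would come from configuration in practice. Running the probes concurrently keeps the total check time close to the slowest single probe.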

A separate Wi‑Fi control API (scan/connect/forget) lives in a minimal bounded context (device_network) with its own port and adapter. A DBus-based wifi-helper sidecar is optional and only added if SSID/signal/captive-portal details are required.
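To illustrate the port/adapter split for Wi‑Fi control, here is a minimal sketch. All names (`WifiControlPort`, `WifiNetwork`, `FakeWifiAdapter`) are hypothetical and not taken from the device_network context; a real adapter might wrap the optional wifi-helper instead of the in-memory fake shown here.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class WifiNetwork:
    ssid: str
    secured: bool


class WifiControlPort(Protocol):
    """Port: the operations the application needs (scan/connect/forget)."""

    def scan(self) -> list[WifiNetwork]: ...
    def connect(self, ssid: str, passphrase: Optional[str] = None) -> bool: ...
    def forget(self, ssid: str) -> None: ...


class FakeWifiAdapter:
    """In-memory adapter for tests; satisfies WifiControlPort structurally."""

    def __init__(self, visible: list[WifiNetwork]) -> None:
        self._visible = {n.ssid: n for n in visible}
        self.saved: set[str] = set()

    def scan(self) -> list[WifiNetwork]:
        return list(self._visible.values())

    def connect(self, ssid: str, passphrase: Optional[str] = None) -> bool:
        net = self._visible.get(ssid)
        # Unknown network, or secured network without a passphrase: refuse.
        if net is None or (net.secured and not passphrase):
            return False
        self.saved.add(ssid)
        return True

    def forget(self, ssid: str) -> None:
        self.saved.discard(ssid)
```

Because the application depends only on the port, the privileged adapter can be swapped in or left out without touching the connectivity monitor.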

Trade-offs

Pros:

  • Instant local change detection without privileged access (netlink)
  • Low CPU/network overhead; no tight polling loops
  • Resilient to silent upstream outages via watchdog
  • Clear UI contract via WebSocket and simple REST
  • Aligns with lean hexagonal boundaries (ports/adapters, config externalized)

Cons:

  • Netlink doesn’t expose SSID/signal/captive-portal details; that metadata requires the optional wifi-helper
  • HTTP 204 and DNS/TCP probes can be blocked by strict firewalls, so probe targets must be configurable, with fallbacks

Neutral Factors:

  • DBus integration remains an optional enhancement
  • Probe targets and intervals are environment-specific and configurable
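One way to externalize probe targets and intervals is a small settings object read from environment variables. The sketch below is an assumption for illustration only: the class name, field names, `CONN_*` variable names, and default targets are hypothetical, while the 60 s watchdog and 2-sample hysteresis defaults follow the Decision section.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ProbeConfig:
    # Placeholder targets; real deployments would set their own.
    dns_host: str = "example.com"
    tcp_target: str = "1.1.1.1:53"
    http_204_url: str = "http://connectivitycheck.gstatic.com/generate_204"
    watchdog_seconds: float = 60.0
    hysteresis_samples: int = 2

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "ProbeConfig":
        """Build a config, falling back to defaults for unset variables."""
        d = cls()
        return cls(
            dns_host=env.get("CONN_DNS_HOST", d.dns_host),
            tcp_target=env.get("CONN_TCP_TARGET", d.tcp_target),
            http_204_url=env.get("CONN_HTTP_204_URL", d.http_204_url),
            watchdog_seconds=float(env.get("CONN_WATCHDOG_SECONDS", d.watchdog_seconds)),
            hysteresis_samples=int(env.get("CONN_HYSTERESIS_SAMPLES", d.hysteresis_samples)),
        )
```

For example, `ProbeConfig.from_env({"CONN_WATCHDOG_SECONDS": "30"})` would shorten the watchdog to 30 s and keep every other default, which makes it easy to point a firewalled site at internal probe targets.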

Consequences

  • Detects link changes in <100 ms; classifies internet reachability in ~300–800 ms
  • Catches upstream failures within 60 s (the default watchdog interval)
  • Reduces flapping with 2-sample hysteresis
  • Keeps privileged Wi‑Fi control separate (optional wifi-helper behind a port), preserving security
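The 2-sample hysteresis can be sketched as a small filter that only commits a new state after it has been observed on consecutive samples. The class and method names are hypothetical; this shows the technique under that assumption, not the service's actual code.

```python
class Hysteresis:
    """Commit a state change only after `required` consecutive agreeing samples."""

    def __init__(self, initial, required: int = 2) -> None:
        self.state = initial
        self._required = required
        self._candidate = None
        self._count = 0

    def update(self, sample) -> bool:
        """Feed one sample; return True if the committed state changed."""
        if sample == self.state:
            # Sample agrees with the current state: drop any pending candidate.
            self._candidate, self._count = None, 0
            return False
        if sample == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = sample, 1
        if self._count >= self._required:
            self.state = sample
            self._candidate, self._count = None, 0
            return True
        return False
```

With `required=2`, a single failed probe between two successful ones never changes the state. One consequence worth keeping in mind: if the confirming sample only arrives on the next watchdog tick, the committed state can lag the first differing sample by up to one interval, so an implementation may choose to schedule a quick re-probe after the first differing sample.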

Alternatives Considered

  • Pure polling: Simple but wasteful; misses transient states; higher latency
  • Pure event-driven (DBus/NetworkManager): Efficient for local changes but blind to upstream outages
  • OS-native connectivity only: Inconsistent across environments; not flexible enough
  • Chosen: Hybrid (event + probe) for balanced reliability and performance
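The hybrid trigger (run a check on every local event, and also when nothing has happened for one watchdog interval) can be sketched with a queue and a timeout. The function name, the `reason` argument, and `max_checks` are hypothetical and exist only to keep the example self-contained and testable; the sketch also awaits the check inline, whereas a real service might spawn it as a task.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def hybrid_monitor(
    events: asyncio.Queue,
    check: Callable[[str], Awaitable[None]],
    watchdog_seconds: float = 60.0,
    max_checks: Optional[int] = None,
) -> None:
    """Call `check` on each event, or when the watchdog interval elapses."""
    runs = 0
    while max_checks is None or runs < max_checks:
        try:
            # Wake up early if a link/IP/route event arrives.
            await asyncio.wait_for(events.get(), timeout=watchdog_seconds)
            reason = "event"
        except asyncio.TimeoutError:
            # No local event: probe anyway to catch silent upstream outages.
            reason = "watchdog"
        await check(reason)
        runs += 1


async def demo() -> list:
    events: asyncio.Queue = asyncio.Queue()
    seen = []

    async def check(reason: str) -> None:
        seen.append(reason)

    events.put_nowait("link up")  # one simulated netlink event
    await hybrid_monitor(events, check, watchdog_seconds=0.05, max_checks=2)
    return seen
```

Running `asyncio.run(demo())` performs one check for the queued event and a second one when the shortened watchdog expires, which is the behavior that pure polling and pure event-driven designs each provide only half of.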

References

  • Current event-driven loop (link/IP/route):

```177:214:app/shared/services/connection_service.py
async def _monitor_loop(self):
    process = await asyncio.create_subprocess_exec(
        "ip", "monitor", "all",
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    ...
    if "default" in event or "ADDR" in event or "ROUTE" in event:
        asyncio.create_task(self._check_and_broadcast())
```

  • WebSocket:

```31:43:app/api/websocket_route.py
@router.websocket("/ws/internet-status")
async def ws_internet_status(websocket: WebSocket):
    await manager.connect(websocket, INTERNET_STATUS_TOPIC)
    await websocket.send_json({
        "type": "internet_status",
        "is_online": connection_service.is_online,
        "connectivity_level": connection_service.connectivity_level.value,
    })
```

  • Status endpoint:

```16:28:app/api/internet_route.py
@router.get("/status")
async def get_internet_status(...):
    return {
        "is_online": connection_service.is_online,
        "connectivity_level": connection_service.connectivity_level.value,
        ...
    }
```