Technical Design Document: Fleet Identity & Trust Architecture
1. Summary
This architecture implements a secure, offline-first trust model using Ed25519-based JWTs. Following ADR-0001, the Auth-Service manages a central EdDSA key pair, issuing a Shared Fleet JWT to devices. This ensures multi-day offline operation, local verification via PostgreSQL-backed storage, and a standardized JWKS renewal path.
2. Auth-Service: Key Generation & Publishing
The Auth-Service acts as the Root of Trust. It is responsible for creating the keys that the entire fleet will use.
A. Key generation logic (Ed25519)
Using joserfc or authlib, the Auth-Service performs the following:
- Generate key pair: Create an Ed25519 private key.
- Persistence: Store the private key (in PEM or JWK format) in the Auth-Service’s central PostgreSQL database.
- Metadata: Assign a unique kid (Key ID) and alg (EdDSA).
B. The JWKS endpoint (/.well-known/jwks.json)
The Auth-Service exposes a public endpoint that filters out the private components:
- Logic: Queries the database for all active keys, extracts the public components only, and formats them into a standard JWK Set.
- Output example:
{
"keys": [
{ "kty": "OKP", "crv": "Ed25519", "x": "...", "kid": "v1", "alg": "EdDSA", "use": "sig" }
]
}
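The filtering step can be sketched as follows, assuming stored keys are available as JWK dictionaries. The names `PRIVATE_MEMBERS`, `to_public_jwk`, and `build_jwks` are illustrative and not part of joserfc or authlib; a real deployment would more likely rely on the library's own public-key export.

```python
# Private JWK members for OKP, EC, and RSA keys (RFC 8037, RFC 7518).
PRIVATE_MEMBERS = {"d", "p", "q", "dp", "dq", "qi", "oth"}


def to_public_jwk(jwk: dict) -> dict:
    """Return a copy of the JWK with every private member removed."""
    return {name: value for name, value in jwk.items() if name not in PRIVATE_MEMBERS}


def build_jwks(active_keys: list) -> dict:
    """Format the public halves of all active keys as a JWK Set."""
    return {"keys": [to_public_jwk(key) for key in active_keys]}


# Hypothetical stored key; "x" and "d" are placeholders, not real key material.
stored = {
    "kty": "OKP", "crv": "Ed25519",
    "x": "PUBLIC_PLACEHOLDER", "d": "PRIVATE_PLACEHOLDER",
    "kid": "v1", "alg": "EdDSA", "use": "sig",
}
jwks = build_jwks([stored])
```

Building a new dictionary rather than deleting `d` in place keeps the stored private key untouched, so the same record can still be used for signing.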
C. Provisioning certificates
- On registration, the Auth-Service adds the JWKS to the provisioning payload alongside the other certificates (AmazonRootCa, certificate.pem.key, private.pem.key).
3. Component Contracts
A. Auth-Service to Device (the provisioning contract)
- Trigger: Initial registration or token refresh.
- Response payload:
  - access_token: Shared Fleet JWT (Ed25519, 14–30 day TTL).
  - refresh_token: Opaque string.
  - fleet_key.json: Initial Ed25519 public key set (JWKS).
  - jwks_uri: https://auth-service.com/.well-known/jwks.json
B. Device Manager to Device (the inbound contract)
- Security: JWT in Authorization: Bearer <token>.
- Validation logic:
  - Extract kid from JWT header.
  - Locate matching key in local fleet_key.json.
  - If kid unknown: fetch from jwks_uri (if online).
  - Verify EdDSA signature using authlib or joserfc.
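The first two validation steps can be sketched with the standard library alone. This is an illustration under assumptions: `read_header` and `select_key` are hypothetical helper names, the demo token carries a dummy signature, and the actual EdDSA signature check is left to authlib or joserfc and is not shown here.

```python
import base64
import json
from typing import Optional


def b64url_decode(segment: str) -> bytes:
    """Decode base64url, restoring the padding that compact JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def read_header(token: str) -> dict:
    """Parse the (still unverified) header of a compact JWT."""
    return json.loads(b64url_decode(token.split(".")[0]))


def select_key(token: str, jwks: dict) -> Optional[dict]:
    """Return the JWK matching the token's kid, or None if the kid is unknown."""
    header = read_header(token)
    if header.get("alg") != "EdDSA":
        raise ValueError("token must be signed with alg EdDSA")
    for key in jwks.get("keys", []):
        if key.get("kid") == header.get("kid"):
            return key
    return None  # caller may refetch from jwks_uri if online


# Demo token: real header, empty payload ("e30" is "{}"), dummy signature.
header_segment = base64.urlsafe_b64encode(
    json.dumps({"alg": "EdDSA", "kid": "v1"}).encode()
).rstrip(b"=").decode("ascii")
demo_token = f"{header_segment}.e30.c2ln"

local_jwks = {"keys": [{"kty": "OKP", "crv": "Ed25519", "x": "...", "kid": "v1"}]}
matched = select_key(demo_token, local_jwks)
```

Returning `None` instead of raising keeps the "unknown kid" case distinct from a malformed token, which is what lets the caller decide whether a JWKS refetch is worth attempting.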
C. Device to Device Manager (the outbound contract)
- Security: Device attaches the Shared Fleet JWT to the request header.
- Validation: Manager verifies the token against its copy of the JWKS.
D. Device Backend to Frontend (the local contract)
- Mechanism: HttpOnly cookies mapped to a local PostgreSQL sessions table.
- Verification: Backend validates session_id against the local PostgreSQL sessions table.
- Invariant: The frontend never sees the Fleet JWT or refresh token.
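The session lookup might look like the sketch below. To stay self-contained it uses an in-memory SQLite database as a stand-in for PostgreSQL; the `sessions` schema (`session_id`, `expires_at`) and the function name `is_session_valid` are assumptions, since the document does not define the table's columns.

```python
import sqlite3


def is_session_valid(conn, session_id: str, now: int) -> bool:
    """Return True if the session exists and has not yet expired."""
    row = conn.execute(
        "SELECT 1 FROM sessions WHERE session_id = ? AND expires_at > ?",
        (session_id, now),
    ).fetchone()
    return row is not None


# Stand-in database with one hypothetical session expiring at t=2000.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (session_id TEXT PRIMARY KEY, expires_at INTEGER NOT NULL)"
)
conn.execute("INSERT INTO sessions VALUES (?, ?)", ("abc123", 2000))
```

Because the cookie holds only the session_id, nothing the frontend receives can be replayed against the Device Manager or the Auth-Service. A PostgreSQL driver such as psycopg would use `%s` placeholders instead of `?`.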
4. Token Storage & Validation Logic (Device Side)
| Asset | Table / Location | Field Type | Purpose |
|---|---|---|---|
| Fleet Access Token | fleet_credentials | TEXT | Current active badge. |
| Refresh Token | fleet_credentials | TEXT | Persistent renewal key. |
| fleet_key.json | /etc/fleet/keys.json | JSON | Key set for inbound verification. |
Validation logic: The FastAPI backend checks the exp claim. If current_time + 5m > exp (the token expires within the next five minutes), it uses the DB-stored refresh_token to request a new token from the Auth-Service.
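The expiry check reduces to a single comparison. A minimal sketch, assuming `exp` has already been read from the token as a Unix timestamp; `needs_refresh` and `REFRESH_MARGIN_SECONDS` are illustrative names.

```python
import time
from typing import Optional

REFRESH_MARGIN_SECONDS = 5 * 60  # the 5m margin from the rule above


def needs_refresh(exp: int, now: Optional[float] = None) -> bool:
    """Return True when current_time + 5m is later than exp."""
    if now is None:
        now = time.time()
    return now + REFRESH_MARGIN_SECONDS > exp
```

Passing `now` explicitly makes the rule easy to test without patching the clock; production callers can omit it.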
5. System States & Invariants
- Signature invariant: Must use alg: EdDSA.
- Offline invariant: Verification must work via local fleet_key.json when disconnected.
- Generation invariant: Auth-Service must never expose the d (private) parameter in the jwks.json endpoint.
6. Testable Events
- EVT_KEY_ROTATION: Generate a new key in Auth-Service. Verify the old key still works (for transition) and the new kid appears in the JWKS.
- EVT_COLD_BOOT: Device backend retrieves tokens from Postgres and calls Device Manager.
- EVT_OFFLINE_VERIFY: Simulate internet loss; verify inbound calls via local cache.
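The JWKS half of EVT_KEY_ROTATION can be expressed as a small check over two snapshots of the key set. This is a sketch with hypothetical names (`kids`, `rotation_published`); it confirms only that old kids remain published and a new one appears, and does not prove that tokens signed with the old key still verify.

```python
def kids(jwks: dict) -> set:
    """Collect the kid values published in a JWK Set."""
    return {key["kid"] for key in jwks["keys"]}


def rotation_published(before: dict, after: dict) -> bool:
    """True if every old kid is still present and at least one new kid was added."""
    old, new = kids(before), kids(after)
    return old <= new and len(new - old) >= 1


# Hypothetical snapshots taken before and after generating a new key.
before = {"keys": [{"kid": "v1"}]}
after = {"keys": [{"kid": "v1"}, {"kid": "v2"}]}
```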
7. Recommended Python/FastAPI Stack
joserfc: For high-performance Ed25519 key generation and JWT verification.
python-jose[cryptography] is unmaintained and should be avoided. joserfc is derived from authlib, and a migration guide is available: https://jose.authlib.org/en/migrations/authlib/
8. Risk & Mitigation (ADR-0001)
| Risk | Impact | Mitigation |
|---|---|---|
| Private Key Leak | Fleet-wide breach. | Use HSM or KMS for the Auth-Service private key; rotate keys immediately. |
| Shared Token Leak | Unauthorized access. | Short exp (14d); strict scope limiting. |
| DB Compromise | Local token theft. | Encrypt sensitive columns in PostgreSQL at rest. |
9. Future Expansion: Rotation & UX
- Frontend alerts: WebSocket notifications if the device is offline and the token is near expiry.
- Key rotation sync: Auth-Service notifies the Device Manager of new keys; devices discover the change via the kid in the next request.