Commit a3b16c1

feat(auth): per-sandbox authentication to gateway (#1404)
1 parent 5238937 commit a3b16c1

98 files changed

Lines changed: 6610 additions & 614 deletions

.agents/skills/debug-openshell-cluster/SKILL.md

Lines changed: 20 additions & 1 deletion
@@ -144,9 +144,16 @@ Check required Helm deployment secrets:
 kubectl -n openshell get secret \
 openshell-server-tls \
 openshell-server-client-ca \
-openshell-client-tls
+openshell-client-tls \
+openshell-jwt-keys
 ```
 
+If the gateway exits with `failed to read sandbox JWT signing key from
+/etc/openshell-jwt/signing.pem`, verify that `openshell-jwt-keys` contains
+`signing.pem`, `public.pem`, and `kid`, and that the StatefulSet mounts the
+`sandbox-jwt` secret at `/etc/openshell-jwt`. The sandbox JWT mount is required
+even when local Helm values disable TLS.
+
 Check the image references currently used by the gateway deployment:
 
 ```bash
@@ -205,6 +212,18 @@ helm -n openshell get values openshell | grep sandboxNamespace
 
 Then inspect sandbox resources in that namespace.
 
+Check the configured sandbox service account when TokenReview bootstrap or
+sandbox registration fails. Helm creates a dedicated sandbox service account by
+default and writes it to `[openshell.drivers.kubernetes].service_account_name`;
+the gateway rejects projected tokens from other service accounts.
+
+```bash
+helm -n openshell get values openshell | grep -A3 sandboxServiceAccount
+kubectl -n <sandbox-namespace> get serviceaccount openshell-sandbox
+kubectl -n openshell get configmap openshell-config -o jsonpath='{.data.gateway\.toml}'
+kubectl -n <sandbox-namespace> get sandbox <sandbox-name> -o jsonpath='{.spec.template.spec.serviceAccountName}{"\n"}'
+```
+
 ### Step 6: Check VM-Backed Gateways
 
 Use the VM driver logs and host diagnostics available in the user's environment. Verify:

.agents/skills/helm-dev-environment/SKILL.md

Lines changed: 5 additions & 2 deletions
@@ -26,8 +26,9 @@ mise run helm:k3s:create
 ```
 
 Creates a k3d cluster and merges its kubeconfig into the worktree-local `kubeconfig` file.
-Also applies base manifests (`deploy/kube/manifests/agent-sandbox.yaml`). Traefik is
-disabled at cluster creation time.
+Also applies base manifests (`deploy/kube/manifests/agent-sandbox.yaml`) and preloads the
+default community sandbox image into k3d so the first sandbox create does not wait on a
+large registry pull. Traefik is disabled at cluster creation time.
 
 **Multi-worktree support:** the cluster name is derived from the last component of the
 current git branch (e.g. branch `kube-support/local-dev/tmutch` → cluster
@@ -43,6 +44,8 @@ Port mappings created at cluster time (cannot be changed without recreating):
 
 Override with env vars before running `helm:k3s:create`:
 - `HELM_K3S_LB_HOST_PORT` (default: `8080`)
+- `HELM_K3S_PRELOAD_SANDBOX_IMAGE` (default:
+`ghcr.io/nvidia/openshell-community/sandboxes/base:latest`; set to an empty value to skip)
 
 ### 2. Deploy OpenShell
 

.markdownlint-cli2.jsonc

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@
 ".claude/**",
 ".opencode/**",
 ".github/**",
+"architecture/plans/**",
 "**/node_modules/**",
 "target/**",
 ".pytest_cache/**",

Cargo.lock

Lines changed: 3 additions & 0 deletions
Some generated files are not rendered by default.

architecture/gateway.md

Lines changed: 40 additions & 6 deletions
@@ -23,8 +23,12 @@ identity.
 ## Protocol and Auth
 
 The gateway listens on one service port and multiplexes gRPC and HTTP traffic.
-The default deployment mode is mTLS: clients and sandbox workloads present a
-certificate signed by the deployment CA before reaching application handlers.
+The default local single-user deployment mode is mTLS user authentication:
+clients present a certificate signed by the local deployment CA, and the
+gateway maps the verified certificate subject to a user principal. Kubernetes
+deployments use mTLS for transport only and require OIDC or a trusted access
+proxy for user authentication unless the explicit unsafe local-development
+`allow_unauthenticated_users` switch is enabled.
 When that service port is bound to loopback, the listener can also accept
 plaintext HTTP on the same port for sandbox service subdomains only. That local
 browser path is enabled by default and disabled with
@@ -37,14 +41,44 @@ Supported auth modes:
 
 | Mode | Use |
 |---|---|
-| mTLS | Default direct gateway access for CLI, SDK, TUI, and sandbox callbacks. |
+| mTLS user auth | Local single-user Docker, Podman, and VM gateway access. |
 | Plaintext | Local development or a trusted reverse proxy boundary. |
+| Unauthenticated local users | Trusted Kubernetes dev or fully trusted proxy deployments only. |
 | Cloudflare JWT | Edge-authenticated deployments where Cloudflare Access supplies identity. |
 | OIDC | Bearer-token auth for users, with browser PKCE or client credentials login. |
 
-Sandbox supervisor RPCs authenticate with either mTLS material or a sandbox
-secret depending on the runtime and deployment mode. User-facing mutations are
-authorized by role policy when OIDC or edge identity is enabled.
+Sandbox supervisor RPCs authenticate with gateway-minted sandbox JWTs when that
+authenticator is configured; mTLS does not grant sandbox identity. User-facing
+mutations are authorized by role policy when OIDC or edge identity is enabled.
+
+Sandbox secrets are gateway-signed JWTs bound to a single sandbox ID. Docker,
+Podman, and VM drivers deliver the initial token through supervisor-only
+runtime material; Kubernetes supervisors exchange a projected ServiceAccount
+token through `IssueSandboxToken`. The gateway validates that projected token
+with Kubernetes `TokenReview`, requires the configured sandbox service account,
+checks the returned pod binding against the live pod UID, and verifies the pod's
+controlling `Sandbox` ownerReference against the live Sandbox CR UID and
+sandbox-id label before minting the gateway JWT. Supervisors renew gateway JWTs
+in memory before expiry only while the sandbox record still exists. Older tokens
+are not server-revoked; deployments bound replay exposure with short
+`gateway_jwt.ttl_secs` lifetimes.
+
+Gateway JWT signing-key rotation is currently an offline operator action. The
+runtime loads one active signing key and one matching public verification key
+from the configured secret at startup. To rotate that key material today,
+operators must delete or replace the JWT key secret, let certgen recreate it,
+and restart the gateway pods. This invalidates outstanding supervisor tokens;
+running supervisors recover by re-running their bootstrap path where available
+or by reconnecting after sandbox restart. Online rotation with multiple
+verification keys keyed by `kid` is tracked separately.
+
+Sandbox JWTs are not user credentials. The gRPC router accepts
+`Principal::Sandbox` only on the supervisor-to-gateway RPC allowlist
+(`ConnectSupervisor`, `RelayStream`, token renewal, config sync, policy status,
+log push, and policy-analysis callbacks). Handlers then compare the
+authenticated sandbox ID with any sandbox ID or name resolved from the request.
+Supervisor control and relay streams require a matching sandbox principal before
+the gateway registers the session or bridges relay bytes.
 
 ## API Surface
 
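
Review note: the Kubernetes bootstrap checks that the new `architecture/gateway.md` text lists can be read as an ordered chain of rejections. The sketch below is a std-only illustration of that ordering, not the gateway's implementation. `ReviewedToken`, `LivePod`, `LiveSandbox`, `BootstrapError`, and `check_bootstrap` are hypothetical names, the real `TokenReview` call and Kubernetes lookups are replaced by plain structs, and treating the sandbox-id label as a pod label that must equal the Sandbox's ID is an assumed reading of the text.

```rust
/// Hypothetical, simplified view of a Kubernetes TokenReview result.
struct ReviewedToken {
    authenticated: bool,
    service_account: String,
    pod_uid: String,
}

/// Hypothetical snapshot of the live pod the token claims to be bound to.
struct LivePod {
    uid: String,
    /// UID from the pod's controlling `Sandbox` ownerReference, if any.
    owner_sandbox_uid: Option<String>,
    /// Value of the pod's sandbox-id label, if any (assumed to live on the pod).
    sandbox_id_label: Option<String>,
}

/// Hypothetical snapshot of the live Sandbox custom resource.
struct LiveSandbox {
    uid: String,
    sandbox_id: String,
}

#[derive(Debug, PartialEq)]
enum BootstrapError {
    NotAuthenticated,
    WrongServiceAccount,
    PodUidMismatch,
    OwnerMismatch,
    LabelMismatch,
}

/// Runs the checks in the order the document lists them and returns the
/// sandbox ID that a gateway JWT could then be minted for.
fn check_bootstrap(
    review: &ReviewedToken,
    expected_service_account: &str,
    pod: &LivePod,
    sandbox: &LiveSandbox,
) -> Result<String, BootstrapError> {
    if !review.authenticated {
        return Err(BootstrapError::NotAuthenticated);
    }
    if review.service_account != expected_service_account {
        return Err(BootstrapError::WrongServiceAccount);
    }
    if review.pod_uid != pod.uid {
        return Err(BootstrapError::PodUidMismatch);
    }
    if pod.owner_sandbox_uid.as_deref() != Some(sandbox.uid.as_str()) {
        return Err(BootstrapError::OwnerMismatch);
    }
    if pod.sandbox_id_label.as_deref() != Some(sandbox.sandbox_id.as_str()) {
        return Err(BootstrapError::LabelMismatch);
    }
    Ok(sandbox.sandbox_id.clone())
}
```

Returning the sandbox ID only after every check passes keeps the minting step unreachable for a token that is valid in the cluster but bound to a different pod or service account.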

crates/openshell-bootstrap/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ bytes = { workspace = true }
 futures = { workspace = true }
 miette = { workspace = true }
 rcgen = { workspace = true }
+sha2 = { workspace = true }
 serde = { workspace = true }
 serde_json = { workspace = true }
 tar = "0.4"
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
+// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// SPDX-License-Identifier: Apache-2.0
+
+//! Gateway-minted JWT signing-key generation.
+//!
+//! The gateway mints per-sandbox identity tokens (see PR 2 of the
+//! per-sandbox identity series, issue #1354) signed with an Ed25519
+//! keypair generated once at gateway init and persisted alongside the
+//! existing PKI bundle. The signing key never leaves the gateway; the
+//! public key plus a stable `kid` are consumed by the gateway's own
+//! validator and any future external verifiers.
+
+use miette::{IntoDiagnostic, Result, WrapErr};
+use rcgen::{KeyPair, PKCS_ED25519};
+use sha2::{Digest, Sha256};
+
+/// All PEM-encoded material needed to mint and validate sandbox JWTs.
+///
+/// The signing key stays in the gateway process. The public key is shared
+/// across gateway replicas (so any replica can validate a JWT minted by
+/// any other replica). The `kid` is published in every minted JWT's
+/// header so the validator can pick the right key after a future rotation.
+pub struct JwtKeyMaterial {
+    /// PKCS#8 PEM-encoded Ed25519 private key.
+    pub signing_key_pem: String,
+    /// `SubjectPublicKeyInfo` PEM-encoded Ed25519 public key.
+    pub public_key_pem: String,
+    /// Stable identifier derived from the public key (SHA-256 hex prefix).
+    /// Embedded in every minted JWT's `kid` header so future rotation can
+    /// be performed in-place by adding a second key without breaking
+    /// in-flight tokens.
+    pub kid: String,
+}
+
+/// Generate a fresh Ed25519 JWT signing key.
+///
+/// Output PEM is in the formats `jsonwebtoken` consumes via
+/// `EncodingKey::from_ed_pem` (signing) and `DecodingKey::from_ed_pem`
+/// (validation), so the gateway can round-trip its own tokens with no
+/// further conversion.
+pub fn generate_jwt_key() -> Result<JwtKeyMaterial> {
+    let keypair = KeyPair::generate_for(&PKCS_ED25519)
+        .into_diagnostic()
+        .wrap_err("failed to generate Ed25519 JWT signing key")?;
+    let signing_key_pem = keypair.serialize_pem();
+    let public_key_pem = keypair.public_key_pem();
+    let kid = kid_from_public_key_der(&keypair.public_key_der());
+    Ok(JwtKeyMaterial {
+        signing_key_pem,
+        public_key_pem,
+        kid,
+    })
+}
+
+/// Stable `kid` derived from the SHA-256 of the public-key DER.
+///
+/// First 16 bytes hex-encoded — collision-resistant for the small N of
+/// signing keys a single deployment ever has, while staying short enough
+/// to keep JWT headers compact.
+fn kid_from_public_key_der(public_key_der: &[u8]) -> String {
+    let digest = Sha256::digest(public_key_der);
+    hex_encode_prefix(&digest, 16)
+}
+
+fn hex_encode_prefix(bytes: &[u8], n: usize) -> String {
+    use std::fmt::Write as _;
+    let mut out = String::with_capacity(n * 2);
+    for byte in bytes.iter().take(n) {
+        let _ = write!(out, "{byte:02x}");
+    }
+    out
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn generate_jwt_key_produces_parseable_pem() {
+        let material = generate_jwt_key().expect("generate_jwt_key");
+        assert!(material.signing_key_pem.contains("BEGIN PRIVATE KEY"));
+        assert!(material.public_key_pem.contains("BEGIN PUBLIC KEY"));
+        assert_eq!(material.kid.len(), 32, "kid is 16 bytes hex-encoded");
+        assert!(material.kid.chars().all(|c| c.is_ascii_hexdigit()));
+    }
+
+    #[test]
+    fn kid_is_stable_for_identical_public_keys() {
+        // Same input -> same kid. Hash of a fixed byte string.
+        let kid_a = kid_from_public_key_der(b"abc");
+        let kid_b = kid_from_public_key_der(b"abc");
+        assert_eq!(kid_a, kid_b);
+    }
+
+    #[test]
+    fn kid_differs_for_different_public_keys() {
+        let kid_a = kid_from_public_key_der(b"first");
+        let kid_b = kid_from_public_key_der(b"second");
+        assert_ne!(kid_a, kid_b);
+    }
+
+    #[test]
+    fn generated_keys_are_unique() {
+        let a = generate_jwt_key().expect("generate_jwt_key");
+        let b = generate_jwt_key().expect("generate_jwt_key");
+        assert_ne!(
+            a.kid, b.kid,
+            "fresh keypairs must produce distinct public keys"
+        );
+        assert_ne!(a.signing_key_pem, b.signing_key_pem);
+    }
+}
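
Review note: the doc comments say the `kid` header exists so a validator can pick the right key after a future rotation, while `architecture/gateway.md` states that the runtime currently loads a single verification key. The sketch below shows, under that assumption of a later multi-key setup, what a `kid`-keyed lookup might look like. `VerificationKeys`, `insert`, and `select` are hypothetical names that do not appear in this commit, and the PEM values are placeholders.

```rust
use std::collections::HashMap;

/// Hypothetical set of public verification keys, indexed by `kid`.
#[derive(Default)]
struct VerificationKeys {
    by_kid: HashMap<String, String>,
}

impl VerificationKeys {
    /// Registers a public key PEM under its `kid`.
    fn insert(&mut self, kid: &str, public_key_pem: &str) {
        self.by_kid
            .insert(kid.to_string(), public_key_pem.to_string());
    }

    /// Returns the PEM for the `kid` found in a token header.
    /// A missing or unknown `kid` yields `None`, so the caller rejects the token.
    fn select(&self, header_kid: Option<&str>) -> Option<&str> {
        let kid = header_kid?;
        self.by_kid.get(kid).map(String::as_str)
    }
}
```

Rejecting tokens without a `kid` rather than falling back to a default key is a design choice of this sketch; it avoids trying every known key against an unlabelled token.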

crates/openshell-bootstrap/src/lib.rs

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 
 pub mod build;
 pub mod edge_token;
+pub mod jwt;
 pub mod oidc_token;
 
 mod metadata;

crates/openshell-bootstrap/src/metadata.rs

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ pub struct GatewayMetadata {
     #[serde(skip_serializing_if = "Option::is_none", default)]
     pub resolved_host: Option<String>,
 
-    /// Auth mode: `None` or `"mtls"` = mTLS (default), `"plaintext"` = direct HTTP,
+    /// Auth mode: `None` or `"mtls"` = mTLS, `"plaintext"` = direct HTTP,
     /// `"cloudflare_jwt"` = CF JWT.
     #[serde(default, skip_serializing_if = "Option::is_none")]
     pub auth_mode: Option<String>,

crates/openshell-bootstrap/src/pki.rs

Lines changed: 20 additions & 0 deletions
@@ -1,6 +1,7 @@
 // SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 // SPDX-License-Identifier: Apache-2.0
 
+use crate::jwt::{JwtKeyMaterial, generate_jwt_key};
 use miette::{IntoDiagnostic, Result, WrapErr};
 use rcgen::{BasicConstraints, CertificateParams, DnType, Ia5String, IsCa, KeyPair, SanType};
 use std::net::IpAddr;
@@ -15,6 +16,12 @@ pub struct PkiBundle {
     pub server_key_pem: String,
     pub client_cert_pem: String,
     pub client_key_pem: String,
+    /// PKCS#8 PEM Ed25519 private key for minting per-sandbox JWTs.
+    pub jwt_signing_key_pem: String,
+    /// SPKI PEM Ed25519 public key, paired with `jwt_signing_key_pem`.
+    pub jwt_public_key_pem: String,
+    /// Stable identifier embedded in the `kid` header of every minted JWT.
+    pub jwt_key_id: String,
 }
 
 /// Default SANs always included on the server certificate. Covers the host
@@ -99,13 +106,23 @@ pub fn generate_pki(extra_sans: &[String]) -> Result<PkiBundle> {
         .into_diagnostic()
         .wrap_err("failed to sign client certificate")?;
 
+    // --- JWT signing key (Ed25519, used to mint per-sandbox identity tokens) ---
+    let JwtKeyMaterial {
+        signing_key_pem: jwt_signing_key_pem,
+        public_key_pem: jwt_public_key_pem,
+        kid: jwt_key_id,
+    } = generate_jwt_key().wrap_err("failed to generate JWT signing key")?;
+
     Ok(PkiBundle {
         ca_cert_pem: ca_cert.pem(),
         ca_key_pem: ca_key.serialize_pem(),
         server_cert_pem: server_cert.pem(),
         server_key_pem: server_key.serialize_pem(),
         client_cert_pem: client_cert.pem(),
         client_key_pem: client_key.serialize_pem(),
+        jwt_signing_key_pem,
+        jwt_public_key_pem,
+        jwt_key_id,
     })
 }
 
@@ -148,6 +165,9 @@ mod tests {
         assert!(bundle.server_key_pem.contains("BEGIN PRIVATE KEY"));
         assert!(bundle.client_cert_pem.contains("BEGIN CERTIFICATE"));
         assert!(bundle.client_key_pem.contains("BEGIN PRIVATE KEY"));
+        assert!(bundle.jwt_signing_key_pem.contains("BEGIN PRIVATE KEY"));
+        assert!(bundle.jwt_public_key_pem.contains("BEGIN PUBLIC KEY"));
+        assert_eq!(bundle.jwt_key_id.len(), 32, "kid is 16 bytes hex-encoded");
     }
 
     #[test]
