.agents/skills/debug-navigator-cluster/SKILL.md
33 additions & 33 deletions
@@ -1,24 +1,24 @@
 ---
 name: debug-navigator-cluster
-description: Debug why a navigator cluster failed to start or is unhealthy. Use when the user has a failed `nav cluster admin deploy`, cluster health check failure, or wants to diagnose cluster infrastructure issues. Trigger keywords - debug cluster, cluster failing, cluster not starting, deploy failed, cluster troubleshoot, cluster health, cluster diagnose, why won't my cluster start, health check failed.
+description: Debug why a nemoclaw cluster failed to start or is unhealthy. Use when the user has a failed `ncl cluster admin deploy`, cluster health check failure, or wants to diagnose cluster infrastructure issues. Trigger keywords - debug cluster, cluster failing, cluster not starting, deploy failed, cluster troubleshoot, cluster health, cluster diagnose, why won't my cluster start, health check failed.
 ---
 
-# Debug Navigator Cluster
+# Debug NemoClaw Cluster
 
-Diagnose why a navigator cluster failed to start after `nav cluster admin deploy`.
+Diagnose why a nemoclaw cluster failed to start after `ncl cluster admin deploy`.
 
 ## Overview
 
-`nav cluster admin deploy` creates a Docker container running k3s with the Navigator server and Envoy Gateway deployed via Helm. The deployment stages, in order, are:
+`ncl cluster admin deploy` creates a Docker container running k3s with the NemoClaw server and Envoy Gateway deployed via Helm. The deployment stages, in order, are:
 
-1. **Pre-deploy check**: `nav cluster admin deploy` in interactive mode prompts to **reuse** (keep volume, clean stale nodes) or **recreate** (destroy everything, fresh start). `mise run cluster` always recreates before deploy.
+1. **Pre-deploy check**: `ncl cluster admin deploy` in interactive mode prompts to **reuse** (keep volume, clean stale nodes) or **recreate** (destroy everything, fresh start). `mise run cluster` always recreates before deploy.
 2. Ensure cluster image is available (local build or remote pull)
 3. Create Docker network (`navigator-cluster`) and volume (`navigator-cluster-{name}`)
 4. Create and start a privileged Docker container (`navigator-cluster-{name}`)
 5. Wait for k3s to generate kubeconfig (up to 60s)
 6. **Clean stale nodes**: Remove any `NotReady` k3s nodes left over from previous container instances that reused the same persistent volume
-7. **Prepare local images** (if `NAVIGATOR_PUSH_IMAGES` is set): In `internal` registry mode, bootstrap waits for the in-cluster registry and pushes tagged images there. In `external` mode, bootstrap uses legacy `ctr -n k8s.io images import` push-mode behavior.
-7. **Reconcile TLS PKI**: Load existing TLS secrets from the cluster; if missing, incomplete, or malformed, generate fresh PKI (CA + server + client certs). Apply secrets to cluster. If rotation happened and the navigator workload is already running, rollout restart and wait for completion (failed rollout aborts deploy).
+7. **Prepare local images** (if `NEMOCLAW_PUSH_IMAGES` is set): In `internal` registry mode, bootstrap waits for the in-cluster registry and pushes tagged images there. In `external` mode, bootstrap uses legacy `ctr -n k8s.io images import` push-mode behavior.
+7. **Reconcile TLS PKI**: Load existing TLS secrets from the cluster; if missing, incomplete, or malformed, generate fresh PKI (CA + server + client certs). Apply secrets to cluster. If rotation happened and the NemoClaw workload is already running, rollout restart and wait for completion (failed rollout aborts deploy).
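Stage 5 in the hunk above waits up to 60s for k3s to generate its kubeconfig. When reproducing that step by hand, a bounded polling loop can help. The bash sketch below is an illustration only: `wait_until` is a hypothetical helper, not part of `ncl` or bootstrap, and the one-second poll interval is an assumption; the real bootstrap may poll differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ncl): run a probe command once per second
# until it exits 0 or `timeout` seconds have elapsed.
# Returns 0 when the probe succeeds, 1 on timeout.
wait_until() {
  local timeout="$1"
  shift
  local waited=0
  # Discard the probe's output; only its exit status matters.
  until "$@" >/dev/null 2>&1; do
    if (( waited >= timeout )); then
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  return 0
}

# Example ("mycluster" is a placeholder cluster name): poll for the k3s
# kubeconfig inside the cluster container, giving up after 60 seconds.
# wait_until 60 docker exec navigator-cluster-mycluster \
#   sh -c 'test -f /etc/rancher/k3s/k3s.yaml'
```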
@@ -188,7 +188,7 @@ Component images (server, sandbox, pki-job) can reach kubelet via two paths:
 **Local/external pull mode** (default local via `mise run cluster` / `mise run cluster:build`): Local images are tagged to the configured local registry base (default `127.0.0.1:5000/navigator/*`), pushed to that registry, and pulled by k3s via `registries.yaml` mirror endpoint (typically `host.docker.internal:5000`). `cluster:build` builds then pushes images; `cluster` pushes prebuilt local tags (`navigator/*:dev`, falling back to `localhost:5000/navigator/*:dev` or `127.0.0.1:5000/navigator/*:dev`).
 
 ```bash
-# Verify image refs currently used by navigator deployment
+# Verify image refs currently used by nemoclaw deployment
 docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator get deploy navigator -o jsonpath="{.spec.template.spec.containers[*].image}"'
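The local pull mode description says `cluster` pushes prebuilt `navigator/*:dev` tags, falling back to `localhost:5000/navigator/*:dev` or `127.0.0.1:5000/navigator/*:dev`. A minimal bash sketch of that lookup follows, assuming the three candidates are tried in the listed order (the text does not say which of the last two wins) and that the local image list comes from `docker images`. `resolve_local_tag` is a hypothetical name, not the real `mise` task logic.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given a component name (e.g. server, sandbox, pki-job)
# and a newline-separated list of local image refs on stdin, print the first
# candidate tag that exists. The candidate order is an assumption.
# Returns 1 if no candidate is found.
resolve_local_tag() {
  local component="$1" available candidate
  available="$(cat)"
  for candidate in \
    "navigator/${component}:dev" \
    "localhost:5000/navigator/${component}:dev" \
    "127.0.0.1:5000/navigator/${component}:dev"; do
    # -x: match the whole line, -F: fixed string (no regex), -q: quiet
    if grep -qxF "$candidate" <<<"$available"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Example: feed it the local image list from Docker.
# docker images --format '{{.Repository}}:{{.Tag}}' | resolve_local_tag server
```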
@@ -208,7 +208,7 @@ If images are missing, re-import with:
 docker save <image-ref> | docker exec -i navigator-cluster-<name> ctr -a /run/k3s/containerd/containerd.sock images import -
 ```
 
-**External pull mode** (remote deploy, or local with `NAVIGATOR_REGISTRY_HOST`/`IMAGE_REPO_BASE` pointing at a non-local registry): Images are pulled from an external registry at runtime. The entrypoint generates `/etc/rancher/k3s/registries.yaml`.
+**External pull mode** (remote deploy, or local with `NEMOCLAW_REGISTRY_HOST`/`IMAGE_REPO_BASE` pointing at a non-local registry): Images are pulled from an external registry at runtime. The entrypoint generates `/etc/rancher/k3s/registries.yaml`.
 
 ```bash
 # Verify registries.yaml exists and has credentials
 docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml crictl pull d1i0nduu2f6qxk.cloudfront.net/navigator/pki-job:latest'
 ```
 
-If `registries.yaml` is missing or has wrong values, verify env wiring (`NAVIGATOR_REGISTRY_HOST`, `NAVIGATOR_REGISTRY_INSECURE`, username/password for authenticated registries).
+If `registries.yaml` is missing or has wrong values, verify env wiring (`NEMOCLAW_REGISTRY_HOST`, `NEMOCLAW_REGISTRY_INSECURE`, username/password for authenticated registries).
 
 ### Step 7: Check mTLS / PKI
 
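This commit renames the registry and push environment variables from `NAVIGATOR_*` to `NEMOCLAW_*` (for example, `NAVIGATOR_REGISTRY_HOST` becomes `NEMOCLAW_REGISTRY_HOST`). The diff does not say whether the old names are still read, so a shell that still exports them is worth checking when `registries.yaml` looks wrong. The bash sketch below lists exported `NAVIGATOR_*` variables and prints the `NEMOCLAW_*` name each would map to, assuming the rename is a plain prefix swap as it is for every variable visible in this diff. `check_legacy_env` is a hypothetical helper and relies on the bash builtin `compgen -e`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report exported NAVIGATOR_* variables and the
# NEMOCLAW_* name each would map to under a plain prefix swap (an assumption
# based on the variables renamed in this diff).
# Returns 1 if any legacy variable is exported, 0 otherwise.
check_legacy_env() {
  local found=0 name
  # `compgen -e` (a bash builtin) prints the name of every exported variable.
  while IFS= read -r name; do
    case "$name" in
      NAVIGATOR_*)
        printf 'warning: %s is set; renamed variable would be %s\n' \
          "$name" "NEMOCLAW_${name#NAVIGATOR_}" >&2
        found=1
        ;;
    esac
  done < <(compgen -e)
  return "$found"
}

# Example: run in the same shell before `mise run cluster`.
# check_legacy_env || echo "rename the variables above before deploying"
```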
@@ -232,15 +232,15 @@ docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yam
 docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator get secret navigator-server-tls -o jsonpath="{.data.tls\.crt}" | base64 -d | openssl x509 -noout -dates 2>/dev/null || echo "openssl not available"'
 
 # Check if CLI-side mTLS files exist locally
-ls -la ~/.config/navigator/clusters/<name>/mtls/
+ls -la ~/.config/nemoclaw/clusters/<name>/mtls/
 ```
 
-On redeploy, bootstrap reuses existing secrets if they are valid PEM. If secrets are missing or malformed, fresh PKI is generated and the navigator workload is automatically restarted. If the rollout restart fails after rotation, the deploy aborts and CLI-side certs are not updated. Certificates use rcgen defaults (effectively never expire).
+On redeploy, bootstrap reuses existing secrets if they are valid PEM. If secrets are missing or malformed, fresh PKI is generated and the NemoClaw workload is automatically restarted. If the rollout restart fails after rotation, the deploy aborts and CLI-side certs are not updated. Certificates use rcgen defaults (effectively never expire).
 
 Common mTLS issues:
 - **Secrets missing**: The `navigator` namespace may not have been created yet (Helm controller race). Bootstrap waits up to 2 minutes for the namespace.
 - **mTLS mismatch after manual secret deletion**: Delete all three secrets and redeploy — bootstrap will regenerate and restart the workload.
-- **CLI can't connect after redeploy**: Check that `~/.config/navigator/clusters/<name>/mtls/` contains `ca.crt`, `tls.crt`, `tls.key` and that they were updated at deploy time.
+- **CLI can't connect after redeploy**: Check that `~/.config/nemoclaw/clusters/<name>/mtls/` contains `ca.crt`, `tls.crt`, `tls.key` and that they were updated at deploy time.
 
 ### Step 8: Check Kubernetes Events
 
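The "CLI can't connect after redeploy" item asks you to confirm that the cluster's mTLS directory contains `ca.crt`, `tls.crt`, and `tls.key` and that they were updated at deploy time. A minimal bash sketch of that check follows, assuming a file counts as present when it is non-empty and contains a PEM `-----BEGIN ` line. That is a rough heuristic, not certificate validation, and `check_mtls_dir` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify that a CLI-side mTLS directory holds ca.crt,
# tls.crt and tls.key, each non-empty and containing a PEM "-----BEGIN " line.
# Prints `ls -l` for each file that passes (shows its modification time);
# returns 1 if any file is missing, empty, or not PEM-looking.
check_mtls_dir() {
  local dir="$1" f status=0
  for f in ca.crt tls.crt tls.key; do
    if [[ -s "$dir/$f" ]] && grep -q -- '-----BEGIN ' "$dir/$f"; then
      ls -l "$dir/$f"
    else
      printf 'missing, empty, or not PEM: %s\n' "$dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}

# Example ("mycluster" is a placeholder cluster name):
# check_mtls_dir "$HOME/.config/nemoclaw/clusters/mycluster/mtls"
```

Compare the modification times that `ls -l` prints with the time of the last `ncl cluster admin deploy`; files older than the deploy suggest the CLI-side certs were not refreshed.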
@@ -290,10 +290,10 @@ If DNS is broken, all image pulls from the distribution registry will fail, as w
 | Container exited, non-zero exit | k3s crash, port conflict, privilege issue | Check `docker logs` and `docker inspect` for details |
 | `/readyz` fails | k3s still starting or crashed | Wait longer or check container logs for k3s errors |
-| Navigator pods `Pending` | Insufficient CPU/memory for scheduling, or PVC not bound | Check `kubectl describe pod` for scheduling failures and `kubectl get pvc -n navigator` for volume status |
-| Navigator pods `CrashLoopBackOff` | Server application error | Check `kubectl logs` on the crashing pod |
-| Navigator pods `ImagePullBackOff` (push mode) | Images not imported or wrong containerd namespace | Check `k3s ctr -n k8s.io images ls` for component images (Step 6) |
-| Navigator pods `ImagePullBackOff` (pull mode) | Registry auth or DNS issue | Check `/etc/rancher/k3s/registries.yaml` credentials and DNS (Step 8) |
+| NemoClaw pods `Pending` | Insufficient CPU/memory for scheduling, or PVC not bound | Check `kubectl describe pod` for scheduling failures and `kubectl get pvc -n navigator` for volume status |
+| NemoClaw pods `CrashLoopBackOff` | Server application error | Check `kubectl logs` on the crashing pod |
+| NemoClaw pods `ImagePullBackOff` (push mode) | Images not imported or wrong containerd namespace | Check `k3s ctr -n k8s.io images ls` for component images (Step 6) |
+| NemoClaw pods `ImagePullBackOff` (pull mode) | Registry auth or DNS issue | Check `/etc/rancher/k3s/registries.yaml` credentials and DNS (Step 8) |
 | Image import fails (`k3s ctr` exit code != 0) | Corrupt tar stream or containerd not ready | Retry after k3s is fully started; check container logs |
 | Push mode images not found by kubelet | Imported into wrong containerd namespace | Must use `k3s ctr -n k8s.io images import`, not `k3s ctr images import` |
 | Gateway not `Programmed` | Envoy Gateway not ready | Check `envoy-gateway-system` pods and Helm install logs |
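The troubleshooting rows for pods in `Pending`, `CrashLoopBackOff`, and `ImagePullBackOff` each map a pod status to a next check. The bash sketch below applies those rows to `kubectl get pods --no-headers` output read from stdin, assuming kubectl's default column order (NAME, READY, STATUS, RESTARTS, AGE). `triage_pods` is a hypothetical helper and only covers the statuses listed in the table.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# (default columns NAME READY STATUS RESTARTS AGE) and print the next check
# suggested by the troubleshooting table for each pod in a listed state.
# Pods in other states (e.g. Running) produce no output.
triage_pods() {
  local name _ready status _rest
  while read -r name _ready status _rest; do
    case "$status" in
      Pending)
        echo "$name: check 'kubectl describe pod' for scheduling failures and 'kubectl get pvc -n navigator' for volume status" ;;
      CrashLoopBackOff)
        echo "$name: check 'kubectl logs' on the crashing pod" ;;
      ImagePullBackOff)
        echo "$name: push mode: check 'k3s ctr -n k8s.io images ls'; pull mode: check registries.yaml credentials and DNS" ;;
    esac
  done
}

# Example ("mycluster" is a placeholder cluster name):
# docker exec navigator-cluster-mycluster sh -lc \
#   'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator get pods --no-headers' | triage_pods
```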

.agents/skills/tui-development/SKILL.md
12 additions & 12 deletions
@@ -1,6 +1,6 @@
 ---
 name: tui-development
-description: Guide for developing the "Gator" TUI — a ratatui-based terminal UI for the Navigator platform. Covers architecture, navigation, data fetching, theming, UX conventions, and development workflow. Trigger keywords - gator, TUI, terminal UI, ratatui, navigator-tui, tui development, gator feature, gator bug.
+description: Guide for developing the "Gator" TUI — a ratatui-based terminal UI for the NemoClaw platform. Covers architecture, navigation, data fetching, theming, UX conventions, and development workflow. Trigger keywords - gator, TUI, terminal UI, ratatui, navigator-tui, tui development, gator feature, gator bug.
 ---
 
 # Gator TUI Development Guide
@@ -9,14 +9,14 @@ Comprehensive reference for any agent working on the Gator TUI.
 
 ## 1. Overview
 
-Gator is a ratatui-based terminal UI for the Navigator platform. It provides a keyboard-driven interface for managing clusters, sandboxes, and logs — the same operations available via the `nav` CLI, but with a live, interactive dashboard.
+Gator is a ratatui-based terminal UI for the NemoClaw platform. It provides a keyboard-driven interface for managing clusters, sandboxes, and logs — the same operations available via the `ncl` CLI, but with a live, interactive dashboard.
 - **`navigator-tui` cannot depend on `navigator-cli`** — this would create a circular dependency. TLS channel building for cluster switching is done directly in `lib.rs` using `tonic::transport` primitives (`Certificate`, `Identity`, `ClientTlsConfig`, `Endpoint`).
-- mTLS certs are read from `~/.config/navigator/clusters/<name>/mtls/` (ca.crt, tls.crt, tls.key).
+- mTLS certs are read from `~/.config/nemoclaw/clusters/<name>/mtls/` (ca.crt, tls.crt, tls.key).