Skip to content

Timber on Vercel: serverless /v1 API over Neon Postgres#3

Open
ItsmeBlackOps wants to merge 20 commits into
mainfrom
feat/timber-on-vercel
Open

Timber on Vercel: serverless /v1 API over Neon Postgres#3
ItsmeBlackOps wants to merge 20 commits into
mainfrom
feat/timber-on-vercel

Conversation

@ItsmeBlackOps

@ItsmeBlackOps ItsmeBlackOps commented Jun 23, 2026

Copy link
Copy Markdown
Owner

Rebuilds Timber as a fully Vercel-hosted log service: the existing React Console plus the /v1/* API as Vercel serverless functions backed by Neon serverless Postgres. Replaces the Docker + self-hosted MongoDB deployment for the hosted use case (that artifact stays in the repo).

What is included

  • POST /v1/logs ingest (single or batch) into Neon, language-agnostic REST (no SDK).
  • GET /v1/logs (filter + keyset pagination), /v1/stats (date_trunc buckets, percentile_cont latency, error rate, AI cost/token sums), /v1/groupby, /v1/facets, /v1/events, /v1/jobs.
  • /v1/projects CRUD + per-project scoping; all six lenses (Errors, AI Usage, By User, By Service, Slow Operations, Cron and Jobs).
  • /healthz adapted to Postgres; daily retention via Vercel Cron (level-based TTL).
  • Console reused unchanged (same same-origin /v1/* contract).
  • Python + JS client helpers with heartbeat filter + batching (in clients/).

How it works

  • Every parse*Query ported verbatim from src/query/*; only the Mongo pipeline became SQL, preserving the response shapes in web/src/lib/types.ts.
  • Neon HTTP driver (@neondatabase/serverless), stateless per query; functions in web/api/.

Verification

  • 30 API unit tests + 488 Console tests pass; clean build.
  • Every SQL pattern validated against real Neon; full ingest/query/stats/groupby/jobs/auth flow smoke-tested on the live production deployment.

Notes

  • Deployed to production via Vercel CLI (project is not Git-connected). Env: DATABASE_URL, TIMBER_KEYS, CRON_SECRET.
  • Production build uses web/tsconfig.app.build.json (app-only) so tests are excluded from the deployed typecheck.

Spec: docs/superpowers/specs/2026-06-22-timber-on-vercel-design.md
Plan: docs/superpowers/plans/2026-06-22-timber-on-vercel.md


Update: Logflare dual-write + fallback (Neon at capacity)

Neon hit its hard 512 MB project size limit (error 53100) and was rejecting all inserts, so ingest was returning 500s and losing writes. This adds Logflare as a parallel store and a fallback path.

What changed

  • POST /v1/logs now dual-writes each batch to Neon and Logflare in parallel (Promise.allSettled). Neon stays primary.
  • Fallback: when the Neon insert fails but Logflare accepted the batch, the request still returns 201 {store:"logflare"}. Only when both fail does it error. This stops the 500s and prevents log loss while Neon is full.
  • forwardToLogflare reports success (checks res.ok, returns a boolean); buildLogflarePayload is a pure mapper (event name to message, everything else to metadata).
  • New GET /v1/logflare proxies filter queries (same param surface as /v1/logs) to a Logflare Endpoint and normalizes rows to the /v1/logs shape. Returns a clean 503 until LOGFLARE_ENDPOINT_ID is set (Logflare Endpoints are still in beta for this account).
  • New env: LOGFLARE_SOURCE_ID, LOGFLARE_API_KEY, LOGFLARE_ENDPOINT_ID (via logflareConfig()).

Verification

  • Full suite 503 pass / 0 fail / 10 skipped.
  • Live on production: dual-write returns 201 {store:"logflare"} with Neon full; GET /v1/logflare returns clean 503; GET /v1/logs still reads existing Neon data.

Follow-up (not in this PR)

  • Neon is still full; the fallback masks it. Root fix is trimming what dailyDashboard logs (skip info-level db.query polling; the notification_outbox poll alone is ~9M events/month) and/or freeing Neon space.

Plan: docs/superpowers/plans/2026-06-30-logflare-dual-write.md

Summary by CodeRabbit

  • New Features

    • Added REST endpoints for logs, stats, facets, group-by, jobs, events, projects, health, retention, and Logflare access.
    • Introduced lightweight Python and Node clients with automatic batching and shutdown flushing.
    • Added project management actions for listing, creating, updating, and deleting projects.
  • Bug Fixes

    • Improved log ingestion reliability with safer batching, validation, and graceful failure handling.
    • Added keyset pagination and scoped filtering for more consistent query results.
    • Enabled automated retention cleanup to remove expired records regularly.

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR re-hosts Timber on Vercel with Neon Postgres, adding shared API libraries (auth, keyring, env, db, cursor, respond, validate, where, scope), SQL query modules for logs/stats/groupby/facets/events/jobs, endpoint handlers, project CRUD, healthz, retention cron, Logflare dual-write/proxy, REST log clients, tests, and migration documentation.

Changes

Vercel/Neon API implementation

Layer / File(s) Summary
Shared library: auth, keyring, env, db, response, cursor
web/api/_lib/auth.js, web/api/_lib/keyring.js, web/api/_lib/env.js, web/api/_lib/db.js, web/api/_lib/respond.js, web/api/_lib/cursor.js, test/api/lib.test.js
Bearer-token keyring, request-level read/write guards, environment readers, lazy Neon client, JSON response helpers, and keyset cursor encode/decode with unit tests.
Validation, WHERE builder, app scoping
web/api/_lib/validate.js, web/api/_lib/where.js, web/api/_lib/scope.js, test/api/where.test.js
Envelope/batch validation, SQL WHERE-clause builder replacing Mongo filters, and app-scope SQL composition with tests.
Ingestion insert builder and POST/GET /v1/logs
web/api/_lib/ingest.js, web/api/v1/logs.js, test/api/ingest.test.js
Multi-row insert SQL builder with TTL-derived expiry, and the logs handler dispatching ingest/query.
Read endpoints: logs, stats, groupby, facets, events, jobs
web/api/_lib/sql/*.js, web/api/v1/{logs,stats,groupby,facets,events,jobs}.js, test/api/{groupby,stats,facets-events-jobs}.test.js
SQL builders/mappers and route handlers for GET /v1/logs, stats, groupby, facets, events, and jobs, with tests.
Project registry CRUD and scoping
web/api/_lib/projects.js, web/api/v1/projects.js, test/api/projects.test.js
Slug/validation helpers, CRUD SQL operations, project scoping, and /v1/projects router with tests.
Health check and retention cron
web/api/healthz.js, web/api/cron/retention.js, web/vercel.json
Liveness endpoint and secured daily retention cron deleting expired events, wired via Vercel routing/cron config.
Logflare dual-write and proxy endpoint
web/api/_lib/logflare.js, web/api/v1/logflare.js, test/api/logflare.test.js
Payload/forwarding helpers for parallel Logflare ingestion and a GET proxy endpoint normalizing upstream results.
Build tooling and dependency updates
web/package.json, web/tsconfig.app.build.json
Updated build script and TypeScript build config, plus Neon serverless dependency bump.

Estimated code review effort: 4 (Complex) | ~75 minutes

REST log clients and migration documentation

Layer / File(s) Summary
Timber log client libraries and README
clients/timber-client.js, clients/timber_client.py, clients/README.md
JS and Python batching/flushing log clients with heartbeat-noise filtering and backpressure, plus usage documentation.
Migration and design planning documents
docs/superpowers/plans/*.md, docs/superpowers/specs/*.md
Vercel migration plan, Logflare dual-write plan, and design specification documents.

Estimated code review effort: 2 (Simple) | ~10 minutes

Related PRs: None identified.

Suggested labels: api, database, documentation

Suggested reviewers: None identified.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 13.95% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: moving Timber’s /v1 API to serverless Vercel functions backed by Neon Postgres.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/timber-on-vercel

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the re-hosting of Timber on Vercel, transitioning the backend from MongoDB to Neon Postgres with serverless functions and introducing lightweight Python and Node.js logging clients. The review feedback highlights several critical issues: a wildcard rewrite rule in vercel.json that breaks API routing; potential query crashes in where.js due to unsafe numeric casting on JSONB fields; performance degradation from bypassing GIN indexes with the #>> operator; memory exhaustion risks in the retention cron due to bulk DELETE ... RETURNING 1; a type resolution error in stats.js with date_trunc; and socket leaks in the Python client from unclosed HTTP connections.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread web/vercel.json
"rewrites": [
{ "source": "/v1/:path*", "destination": "/api/v1/:path*" },
{ "source": "/healthz", "destination": "/api/healthz" },
{ "source": "/(.*)", "destination": "/index.html" }

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The wildcard rewrite rule /(.*) is too broad and will match all requests that do not match the previous rules, including direct requests to /api/... (such as the retention cron at /api/cron/retention). This will rewrite those API requests to /index.html, completely breaking the retention cron and any direct API access.

As specified in the design document, the wildcard rewrite should explicitly exclude the /api path using a negative lookahead.

Suggested change
{ "source": "/(.*)", "destination": "/index.html" }
{ "source": "/((?!api/).*)", "destination": "/index.html" }

Comment thread web/api/_lib/where.js
if (pathStr.length === 0) return fail(`unknown parameter "${name}"`);
if (!NUMERIC_RE.test(v)) return fail(`${name} requires a numeric value`);
const path = pathStr.split('.');
clauses.push(`(data #>> ${push(path)})::numeric ${rangeOp} ${push(Number(v))}`);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Casting the JSONB value directly to ::numeric without validation is extremely risky. In a semi-structured log database, JSONB fields like data.status might occasionally contain non-numeric string values (e.g., "unknown", "failed", or "OK"). If even a single row contains a non-numeric value at the queried path, this cast will throw a Postgres error and crash the entire query.

To make this robust, use a safe regex-guarded comparison to ensure the value is numeric before casting.

Suggested change
clauses.push(`(data #>> ${push(path)})::numeric ${rangeOp} ${push(Number(v))}`);
clauses.push(`CASE WHEN (data #>> ${push(path)}) ~ '^-?[0-9]+(\\.[0-9]+)?$' THEN (data #>> ${push(path)})::numeric END ${rangeOp} ${push(Number(v))}`);

Comment thread web/api/_lib/where.js
Comment on lines +188 to +195
} else if (name.startsWith('ids.') && name.length > 'ids.'.length) {
const path = name.slice('ids.'.length).split('.');
clauses.push(`ids #>> ${push(path)} = ${push(v)}`);
} else if (name.startsWith('data.') && name.length > 'data.'.length) {
const rangeOp = name.endsWith('__gte') ? '>=' : name.endsWith('__lte') ? '<=' : null;
if (rangeOp === null) {
const path = name.slice('data.'.length).split('.');
clauses.push(`data #>> ${push(path)} = ${push(v)}`);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using the #>> operator for ids.* and data.* equality filters completely bypasses the GIN indexes on the ids and data columns, forcing Postgres to perform full table scans on large datasets.

To utilize the GIN indexes efficiently, use the @> JSONB containment operator by constructing a nested JSONB object dynamically in JavaScript.

Suggested change
} else if (name.startsWith('ids.') && name.length > 'ids.'.length) {
const path = name.slice('ids.'.length).split('.');
clauses.push(`ids #>> ${push(path)} = ${push(v)}`);
} else if (name.startsWith('data.') && name.length > 'data.'.length) {
const rangeOp = name.endsWith('__gte') ? '>=' : name.endsWith('__lte') ? '<=' : null;
if (rangeOp === null) {
const path = name.slice('data.'.length).split('.');
clauses.push(`data #>> ${push(path)} = ${push(v)}`);
} else if (name.startsWith('ids.') && name.length > 'ids.'.length) {
const path = name.slice('ids.'.length).split('.');
let obj = v;
for (let i = path.length - 1; i >= 0; i--) obj = { [path[i]]: obj };
clauses.push(`ids @> ${push(JSON.stringify(obj))}::jsonb`);
} else if (name.startsWith('data.') && name.length > 'data.'.length) {
const rangeOp = name.endsWith('__gte') ? '>=' : name.endsWith('__lte') ? '<=' : null;
if (rangeOp === null) {
const path = name.slice('data.'.length).split('.');
let obj = NUMERIC_RE.test(v) ? Number(v) : v === 'true' ? true : v === 'false' ? false : v === 'null' ? null : v;
for (let i = path.length - 1; i >= 0; i--) obj = { [path[i]]: obj };
clauses.push(`data @> ${push(JSON.stringify(obj))}::jsonb`);

Comment thread web/api/cron/retention.js Outdated
Comment on lines +14 to +15
const rows = await db()`DELETE FROM events WHERE expires_at < now() RETURNING 1`;
return json(res, 200, { ok: true, deleted: rows.length });

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using RETURNING 1 on a bulk DELETE query is highly dangerous for a daily retention cron. If millions of expired logs are deleted at once, Postgres will construct and transmit a massive result set of millions of rows, and Node.js will attempt to parse a huge JSON array. This can easily cause the serverless function to run out of memory (OOM) or exceed Vercel's execution/payload limits.

Since the exact count of deleted rows is not critical for the cron's success, it is much safer to omit the RETURNING clause entirely.

Suggested change
const rows = await db()`DELETE FROM events WHERE expires_at < now() RETURNING 1`;
return json(res, 200, { ok: true, deleted: rows.length });
await db()`DELETE FROM events WHERE expires_at < now()`;
return json(res, 200, { ok: true });

Comment thread web/api/_lib/sql/stats.js
const lat = "coalesce(data->>'latencyMs', data->>'durationMs')";
const text = `
SELECT
date_trunc($1, received_at) AS bucket,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In Postgres, the first argument of date_trunc must be a non-parameterized string literal or explicitly cast. Passing $1 directly as a parameter will cause Postgres to fail with a type resolution error (function date_trunc(unknown, timestamp with time zone) is not unique) because the type of the parameter is ambiguous.

Explicitly cast the parameter to text to resolve this.

Suggested change
date_trunc($1, received_at) AS bucket,
date_trunc($1::text, received_at) AS bucket,

Comment thread clients/timber_client.py
},
method="POST",
)
urllib.request.urlopen(req, timeout=5).read()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Calling urllib.request.urlopen without closing the returned response object can leak sockets and HTTP connections. In a long-running background thread, this can eventually exhaust system resources.

Use a with statement to ensure the connection is properly closed after reading.

Suggested change
urllib.request.urlopen(req, timeout=5).read()
with urllib.request.urlopen(req, timeout=5) as response:
response.read()

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 20

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (2)
web/tsconfig.app.build.json-4-7 (1)

4-7: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Enable incremental builds here
tsBuildInfoFile alone won’t persist a useful cache. Add incremental: true here (or switch this build path to a composite tsc -b flow) so tsc -p tsconfig.app.build.json avoids rechecking everything on every run.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@web/tsconfig.app.build.json` around lines 4 - 7, Enable incremental
compilation in the build config so the existing tsBuildInfoFile cache is
actually used. Update tsconfig.app.build.json to set incremental to true (or
otherwise move this path to a tsc -b/composite setup) so the tsc -p
tsconfig.app.build.json build can reuse state instead of rechecking everything.
Focus on the build config keys around tsBuildInfoFile, include, and exclude.
clients/README.md-21-23 (1)

21-23: 🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

Document the full ingest response.

/v1/logs returns more than accepted; the current API also includes rejected (and fallback paths can add store). Please update the example so client code doesn't hard-code a narrower contract.

💡 Suggested edit
- The response is `201 {"accepted":N}`.
+ The response is `201 {"accepted":N,"rejected":0}` (and fallback variants may add `store`).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@clients/README.md` around lines 21 - 23, The `/v1/logs` ingest response
description in README is too narrow and only mentions `accepted`, which can
mislead client code. Update the documented response in the README section for
the ingest API to reflect the full contract, including `rejected` and any
fallback-added `store` fields, so the example matches what the client actually
receives.
🧹 Nitpick comments (1)
web/api/cron/retention.js (1)

22-27: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Count deleted rows in SQL instead of materializing every RETURNING 1 row.

The batch is bounded, but this still transfers up to 10,000 rows per iteration just to compute rows.length. A small CTE can keep the same behavior while returning a single count row.

Suggested refactor
   for (let i = 0; i < MAX_BATCHES; i++) {
     const rows = await db()`
-      DELETE FROM events
-      WHERE id IN (SELECT id FROM events WHERE expires_at < now() LIMIT ${BATCH})
-      RETURNING 1`;
-    deleted += rows.length;
-    if (rows.length < BATCH) break;
+      WITH doomed AS (
+        SELECT id
+        FROM events
+        WHERE expires_at < now()
+        LIMIT ${BATCH}
+      ), deleted_rows AS (
+        DELETE FROM events e
+        USING doomed d
+        WHERE e.id = d.id
+        RETURNING 1
+      )
+      SELECT count(*)::int AS n FROM deleted_rows`;
+    deleted += rows[0].n;
+    if (rows[0].n < BATCH) break;
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@web/api/cron/retention.js` around lines 22 - 27, The retention batch delete
in the cron job is materializing every returned row just to compute a count.
Update the delete query in the retention handler to use a CTE or similar SQL
construct that returns a single deleted-row count instead of `RETURNING 1`, and
adjust the `deleted` accumulation and batch-termination check in the same flow
to use that count while preserving the existing `BATCH` behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@clients/timber_client.py`:
- Around line 74-97: The TimberClient.flush method is dropping buffered events
before the POST succeeds, and close only flushes once, so logs can be lost on
shutdown or transient HTTP failures. Update flush to only remove items from _buf
after urllib.request.urlopen succeeds, and on any exception leave the batch
queued for retry instead of discarding it; then make close in TimberClient
trigger repeated flushing until the buffer is empty or the queue can no longer
be drained.

In `@clients/timber-client.js`:
- Around line 39-64: The shutdown path in createLogger/flush only sends one
MAX_BATCH chunk and marks it gone even if delivery fails, so buffered events can
be lost and HTTP error responses are ignored. Update flush() so it checks the
fetch response and only removes items from buf after a successful 2xx send, and
make close() drain the queue by repeatedly calling flush() until buf is empty.
Keep the behavior in clients/timber-client.js and preserve the existing
non-crashing error handling.

In `@docs/superpowers/plans/2026-06-30-logflare-dual-write.md`:
- Around line 397-440: normalizeRow() is not preserving the Timber log item
shape because it drops _id and expiresAt, even though handler returns these
items as interchangeable with /v1/logs. Update normalizeRow() to carry through
_id and expiresAt from the upstream row when present, alongside the existing
metadata mappings, so the Console can consume both sources without special
cases.
- Around line 492-499: The curl example in the log sending section uses an
invalid top-level batch wrapper, so update the example under the test event
instructions to send either a single envelope or a raw JSON array as accepted by
/v1/logs. Locate the example around the send-test-event snippet and replace the
request body shape so it matches the endpoint contract used elsewhere in this
plan.
- Around line 237-273: The ingest path still waits for Logflare because
`Promise.allSettled` blocks until both promises settle, so update `ingest` in
`web/api/v1/logs.js` to start `forwardToLogflare` in the background and await
only the Neon insert from `db()(text, params)` before returning. Keep
`buildInsert`, `validateBatch`, and `requireWrite` flow intact, but make
Logflare best-effort by not awaiting its promise and handling any rejection
separately so it never delays the response.

In `@docs/superpowers/specs/2026-06-22-timber-on-vercel-design.md`:
- Around line 189-193: Update the AI Usage and user/grouping query descriptions
in the specs doc to match the shipped client contract: use
inputTokens/outputTokens instead of promptTokens/completionTokens, and reference
ids.userId or data.userId instead of user_id. Make the same field-name alignment
in the related “By User” and “By Service” metrics text so the documented SQL
matches what the client helpers emit, using the AI Usage section and its
adjacent metrics as the unique anchors.

In `@web/api/_lib/cursor.js`:
- Around line 10-16: decodeCursor currently accepts digit-only timestamps that
can still produce an invalid Date, which later breaks buildWhere’s toISOString()
path instead of being rejected as an invalid cursor. Update decodeCursor in
cursor.js to validate the parsed ms value by checking the Date is valid before
returning { receivedAt, id }, and return null for out-of-range timestamps so
buildWhere can keep handling it as invalid cursor.

In `@web/api/_lib/logflare.js`:
- Around line 31-35: The Logflare write in the fetch call should be bounded so a
slow or hung request does not keep the ingest path open. Update the Logflare
request logic in the helper that sends the POST to LOGFLARE_URL to use an abort
timeout, and ensure the caller degrades cleanly to false when the timeout is
hit. Keep the change localized to the Logflare send function and preserve the
existing success/failure behavior for the Promise.allSettled flow.

In `@web/api/_lib/respond.js`:
- Around line 21-28: `readJson()` is still vulnerable to hanging on aborted
uploads because it only resolves on `end` or `error`. Update the request-body
promise in `respond.js` to also listen for `close` on the `IncomingMessage` and
resolve immediately when the connection is disconnected, alongside the existing
`req.on('data')`, `req.on('end')`, and `req.on('error')` handling.

In `@web/api/_lib/sql/events.js`:
- Around line 6-13: The events query validation in parseEventsQuery is rejecting
scoped requests because KNOWN_PARAMS only allows app, but web/api/v1/events.js
forwards a resolved project param for scoped /v1/events calls. Update
parseEventsQuery to accept project as a valid parameter and preserve the
existing app handling so scoped requests pass validation while still returning
the resolved app value.

In `@web/api/_lib/sql/facets.js`:
- Around line 7-17: Allow the internal project query param to pass validation in
parseFacetsQuery, since facets scope resolution already consumes it before
parsing. Update the KNOWN_PARAMS whitelist in facets.js to include project so
project-scoped requests do not fail with unknown parameter, while keeping the
existing handling in parseFacetsQuery and related facets handler flow unchanged.

In `@web/api/_lib/sql/groupby.js`:
- Line 16: Treat project as an internal query param in the groupby parser by
adding it to OWN_PARAMS in parseGroupByQuery so it is not forwarded into
buildWhere. This should be done in the groupby query handling logic used by
parseGroupByQuery and any callers like web/api/v1/groupby.js, ensuring scoped
requests with project resolve correctly instead of failing validation.

In `@web/api/_lib/sql/jobs.js`:
- Around line 9-19: The jobs query validation in parseJobsQuery still rejects
the project parameter because the handler is passing the original search params
through unchanged. Update the jobs API flow so the project key is removed or
ignored before parseJobsQuery validates OWN_PARAMS, and make sure the existing
resolveScope/app filtering path remains the source of truth for project-scoped
requests.

In `@web/api/_lib/sql/stats.js`:
- Line 12: The numeric guard in NUM_RE is too restrictive and rejects valid
exponent-form numbers like 1e-7, causing the casting logic in the stats path to
drop those values. Update the regex used by the numeric check in stats.js so it
accepts exponent notation, and make sure the casts in the stats aggregation path
continue to flow through the existing guard. Also add a regression test in
stats.test.js covering an exponent-form metric value such as costUsd.

In `@web/api/_lib/validate.js`:
- Around line 106-113: The truncation check in validate() is using JSON string
length instead of byte size, so multibyte payloads can slip past the
maxDataBytes cap. Update the logic around serialized/raw.data to measure the
UTF-8 byte length of the serialized JSON before comparing against maxDataBytes,
and use that same byte count for _originalBytes in the truncated object. Keep
the existing truncation flow and symbols like serialized, TRUNCATED_HEAD_CHARS,
and value.data, but ensure the byte-based measurement is used consistently.

In `@web/api/_lib/where.js`:
- Around line 19-23: parseDateValue currently accepts digit-only values via new
Date(Number(v)) even when the result is Invalid Date, which later breaks
whereQuery/toISOString handling. Update parseDateValue to validate the
constructed Date before returning it and treat out-of-range epoch inputs as
null, so invalid from/to values are rejected as validation errors instead of
reaching the later ISO conversion path.
- Around line 191-201: The numeric range handling in where.js for the
data.*__gte/__lte branch casts `(data #>> ...)::numeric` directly, which can
fail on rows where the JSON value is non-numeric. Update the clause-building
logic in the `name.startsWith('data.')` path so `buildWhere` only applies the
numeric comparison when the extracted JSON text is safely numeric, and otherwise
routes the condition through a guarded check or equivalent validation before
casting. Keep the fix localized to the `rangeOp` / `pathStr` branch so
`validateEnvelope()` can still accept arbitrary `data` without a bad row
aborting the query.

In `@web/api/healthz.js`:
- Around line 10-12: The /healthz handler is doing an expensive full-table count
on events, so replace the db() query in the health check with a cheap
reachability probe and avoid deriving flushedTotal from count(*). Update the
healthz logic to use the existing healthz path with a lightweight query, and
source flushedTotal from an approximate or cached metric in the same health
response instead of scanning the full events table.
- Around line 17-22: The /healthz handler is always returning HTTP 200 from the
json(res, 200, ...) response even when db() fails, so update the health check
logic to derive the status code from the DB probe result. In the healthz
endpoint, ensure the db() call is wrapped/checked so that a failed probe returns
a non-200 failure status while keeping the existing response shape (ok, wal,
flusher, mongo) in the healthz response.

In `@web/api/v1/logflare.js`:
- Around line 57-68: The Logflare forwarding path in the GET handler is using
raw query params without enforcing project scope, so it can query another
project by changing app/env. Update the request flow to derive the forwarded
scope through resolveScope(...) before calling buildEndpointUrl, and use that
resolved scope instead of trusting parseParams(req.url) values. Keep the
existing read check and config validation, but ensure the forwarded app/env
always come from the scoped resolution used by GET /v1/logs.

---

Minor comments:
In `@clients/README.md`:
- Around line 21-23: The `/v1/logs` ingest response description in README is too
narrow and only mentions `accepted`, which can mislead client code. Update the
documented response in the README section for the ingest API to reflect the full
contract, including `rejected` and any fallback-added `store` fields, so the
example matches what the client actually receives.

In `@web/tsconfig.app.build.json`:
- Around line 4-7: Enable incremental compilation in the build config so the
existing tsBuildInfoFile cache is actually used. Update tsconfig.app.build.json
to set incremental to true (or otherwise move this path to a tsc -b/composite
setup) so the tsc -p tsconfig.app.build.json build can reuse state instead of
rechecking everything. Focus on the build config keys around tsBuildInfoFile,
include, and exclude.

---

Nitpick comments:
In `@web/api/cron/retention.js`:
- Around line 22-27: The retention batch delete in the cron job is materializing
every returned row just to compute a count. Update the delete query in the
retention handler to use a CTE or similar SQL construct that returns a single
deleted-row count instead of `RETURNING 1`, and adjust the `deleted`
accumulation and batch-termination check in the same flow to use that count
while preserving the existing `BATCH` behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 631d29fd-5251-4de3-91ef-cac23d70a7b0

📥 Commits

Reviewing files that changed from the base of the PR and between ea6778f and 54005b0.

⛔ Files ignored due to path filters (1)
  • web/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (45)
  • clients/README.md
  • clients/timber-client.js
  • clients/timber_client.py
  • docs/superpowers/plans/2026-06-22-timber-on-vercel.md
  • docs/superpowers/plans/2026-06-30-logflare-dual-write.md
  • docs/superpowers/specs/2026-06-22-timber-on-vercel-design.md
  • test/api/facets-events-jobs.test.js
  • test/api/groupby.test.js
  • test/api/ingest.test.js
  • test/api/lib.test.js
  • test/api/logflare.test.js
  • test/api/projects.test.js
  • test/api/stats.test.js
  • test/api/where.test.js
  • web/api/_lib/auth.js
  • web/api/_lib/cursor.js
  • web/api/_lib/db.js
  • web/api/_lib/env.js
  • web/api/_lib/ingest.js
  • web/api/_lib/keyring.js
  • web/api/_lib/logflare.js
  • web/api/_lib/projects.js
  • web/api/_lib/respond.js
  • web/api/_lib/scope.js
  • web/api/_lib/sql/events.js
  • web/api/_lib/sql/facets.js
  • web/api/_lib/sql/groupby.js
  • web/api/_lib/sql/jobs.js
  • web/api/_lib/sql/logs.js
  • web/api/_lib/sql/stats.js
  • web/api/_lib/validate.js
  • web/api/_lib/where.js
  • web/api/cron/retention.js
  • web/api/healthz.js
  • web/api/v1/events.js
  • web/api/v1/facets.js
  • web/api/v1/groupby.js
  • web/api/v1/jobs.js
  • web/api/v1/logflare.js
  • web/api/v1/logs.js
  • web/api/v1/projects.js
  • web/api/v1/stats.js
  • web/package.json
  • web/tsconfig.app.build.json
  • web/vercel.json

Comment thread clients/timber_client.py
Comment on lines +74 to +97
def flush(self):
with self._lock:
if not self._buf:
return
batch = self._buf[:_MAX_BATCH]
self._buf = self._buf[_MAX_BATCH:]
try:
req = urllib.request.Request(
self.url + "/v1/logs",
data=json.dumps(batch).encode("utf-8"),
headers={
"content-type": "application/json",
"authorization": "Bearer " + self.key,
},
method="POST",
)
urllib.request.urlopen(req, timeout=5).read()
except Exception:
pass # never raise into the caller

def close(self):
self._stop.set()
self.flush()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

Don't drop the queued batch on shutdown or HTTP errors.

flush() removes up to 50 events before the request completes, and close() only calls it once. That means buffered logs are lost on shutdown, and any failed POST is discarded instead of retried.

🔧 Suggested fix
     def flush(self):
         with self._lock:
             if not self._buf:
                 return
             batch = self._buf[:_MAX_BATCH]
             self._buf = self._buf[_MAX_BATCH:]
         try:
             req = urllib.request.Request(
                 self.url + "/v1/logs",
                 data=json.dumps(batch).encode("utf-8"),
@@
             )
             urllib.request.urlopen(req, timeout=5).read()
         except Exception:
-            pass  # never raise into the caller
+            with self._lock:
+                self._buf = batch + self._buf
🧰 Tools
🪛 ast-grep (0.44.0)

[warning] 89-89: Request-controlled URL passed to urlopen; validate against an allowlist to prevent SSRF.
Context: urllib.request.urlopen(req, timeout=5)
Note: [CWE-918] Server-Side Request Forgery (SSRF).

(urlopen-unsanitized-data)


[info] 82-82: use jsonify instead of json.dumps for JSON output
Context: json.dumps(batch)
Note: [CWE-116] Improper Encoding or Escaping of Output.

(use-jsonify)

🪛 Ruff (0.15.20)

[error] 81-89: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)


[error] 90-90: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)


[error] 91-92: try-except-pass detected, consider logging the exception

(S110)


[warning] 91-91: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@clients/timber_client.py` around lines 74 - 97, The TimberClient.flush method
is dropping buffered events before the POST succeeds, and close only flushes
once, so logs can be lost on shutdown or transient HTTP failures. Update flush
to only remove items from _buf after urllib.request.urlopen succeeds, and on any
exception leave the batch queued for retry instead of discarding it; then make
close in TimberClient trigger repeated flushing until the buffer is empty or the
queue can no longer be drained.

Comment thread clients/timber-client.js
Comment on lines +39 to +64
async function flush() {
if (!url || !key || buf.length === 0) return;
const batch = buf.slice(0, MAX_BATCH);
buf = buf.slice(MAX_BATCH);
try {
await fetch(url + '/v1/logs', {
method: 'POST',
headers: { 'content-type': 'application/json', authorization: 'Bearer ' + key },
body: JSON.stringify(batch),
});
} catch {
// swallow: a logging failure must never crash the app
}
}

const timer = url && key ? setInterval(flush, intervalMs) : null;
if (timer && typeof timer.unref === 'function') timer.unref();

return {
log,
flush,
close: () => {
if (timer) clearInterval(timer);
return flush();
},
};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

Drain the queue before shutdown and only dequeue on success.

close() only triggers one flush(), so anything beyond the first 50 buffered events is lost on shutdown. Also, fetch() won't throw on 4xx/5xx, so bad responses are silently dropped today.

🔧 Suggested fix
   async function flush() {
     if (!url || !key || buf.length === 0) return;
     const batch = buf.slice(0, MAX_BATCH);
     buf = buf.slice(MAX_BATCH);
     try {
-      await fetch(url + '/v1/logs', {
+      const res = await fetch(url + '/v1/logs', {
         method: 'POST',
         headers: { 'content-type': 'application/json', authorization: 'Bearer ' + key },
         body: JSON.stringify(batch),
       });
+      if (!res.ok) throw new Error(`log upload failed: ${res.status}`);
     } catch {
-      // swallow: a logging failure must never crash the app
+      buf = batch.concat(buf);
     }
   }
🧰 Tools
🪛 ast-grep (0.44.0)

[warning] 53-53: Avoid using the initial state variable in setState
Context: setInterval(flush, intervalMs)
Note: [CWE-710] Improper Adherence to Coding Standards. Security best practice.

(setstate-same-var)


[error] 53-53: React's useState should not be directly called
Context: setInterval(flush, intervalMs)
Note: [CWE-710] Improper Adherence to Coding Standards. Security best practice.

(usestate-direct-usage)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@clients/timber-client.js` around lines 39 - 64, The shutdown path in
createLogger/flush only sends one MAX_BATCH chunk and marks it gone even if
delivery fails, so buffered events can be lost and HTTP error responses are
ignored. Update flush() so it checks the fetch response and only removes items
from buf after a successful 2xx send, and make close() drain the queue by
repeatedly calling flush() until buf is empty. Keep the behavior in
clients/timber-client.js and preserve the existing non-crashing error handling.

Comment on lines +237 to +273
The key constraint: Logflare forward must not block the response. We use `Promise.allSettled` so Neon insert and Logflare forward race in parallel - the response is sent as soon as Neon confirms, regardless of Logflare.

Actually, we want Neon to be the source of truth - respond after Neon succeeds. Logflare is best-effort. So: start both in parallel, await Neon, ignore Logflare result.

- [ ] **Step 1: Modify `web/api/v1/logs.js` ingest function**

Replace the current `ingest` function (lines 13-25):

```js
import { json, badRequest, methodNotAllowed, readJson } from '../_lib/respond.js';
import { requireWrite, requireRead } from '../_lib/auth.js';
import { validateBatch } from '../_lib/validate.js';
import { ttlDays, limits } from '../_lib/env.js';
import { buildInsert } from '../_lib/ingest.js';
import { db } from '../_lib/db.js';
import { parseLogsQuery, runLogs } from '../_lib/sql/logs.js';
import { resolveScope } from '../_lib/projects.js';
import { forwardToLogflare } from '../_lib/logflare.js';

async function ingest(req, res) {
const principal = requireWrite(req, res);
if (!principal) return;
const body = await readJson(req);
if (!body.ok) return badRequest(res, 'invalid or empty JSON body');
const v = validateBatch(body.value, limits());
if (!v.ok) {
return json(res, v.status ?? 400, v.index != null ? { error: v.error, index: v.index } : { error: v.error });
}
const { text, params } = buildInsert(v.events, principal, ttlDays(), new Date());
// Run Neon insert and Logflare forward in parallel. Only Neon result matters
// for the response; Logflare is best-effort and must never block the caller.
const [neonResult] = await Promise.allSettled([
db()(text, params),
forwardToLogflare(v.events, principal),
]);
if (neonResult.status === 'rejected') throw neonResult.reason;
return json(res, 201, { accepted: v.events.length, rejected: 0 });

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 Performance & Scalability | 🟠 Major | ⚡ Quick win

Don't wait on Logflare in the ingest path.

Promise.allSettled() still blocks until the Logflare request settles, so the client pays that latency despite the "fire-and-forget" intent. Kick off Logflare without awaiting it, and await only the Neon write before responding.

⚙️ Suggested fix
-  const [neonResult] = await Promise.allSettled([
-    db()(text, params),
-    forwardToLogflare(v.events, principal),
-  ]);
-  if (neonResult.status === 'rejected') throw neonResult.reason;
+  void forwardToLogflare(v.events, principal);
+  await db()(text, params);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
The key constraint: Logflare forward must not block the response. We use `Promise.allSettled` so Neon insert and Logflare forward race in parallel - the response is sent as soon as Neon confirms, regardless of Logflare.
Actually, we want Neon to be the source of truth - respond after Neon succeeds. Logflare is best-effort. So: start both in parallel, await Neon, ignore Logflare result.
- [ ] **Step 1: Modify `web/api/v1/logs.js` ingest function**
Replace the current `ingest` function (lines 13-25):
```js
import { json, badRequest, methodNotAllowed, readJson } from '../_lib/respond.js';
import { requireWrite, requireRead } from '../_lib/auth.js';
import { validateBatch } from '../_lib/validate.js';
import { ttlDays, limits } from '../_lib/env.js';
import { buildInsert } from '../_lib/ingest.js';
import { db } from '../_lib/db.js';
import { parseLogsQuery, runLogs } from '../_lib/sql/logs.js';
import { resolveScope } from '../_lib/projects.js';
import { forwardToLogflare } from '../_lib/logflare.js';
async function ingest(req, res) {
const principal = requireWrite(req, res);
if (!principal) return;
const body = await readJson(req);
if (!body.ok) return badRequest(res, 'invalid or empty JSON body');
const v = validateBatch(body.value, limits());
if (!v.ok) {
return json(res, v.status ?? 400, v.index != null ? { error: v.error, index: v.index } : { error: v.error });
}
const { text, params } = buildInsert(v.events, principal, ttlDays(), new Date());
// Run Neon insert and Logflare forward in parallel. Only Neon result matters
// for the response; Logflare is best-effort and must never block the caller.
const [neonResult] = await Promise.allSettled([
db()(text, params),
forwardToLogflare(v.events, principal),
]);
if (neonResult.status === 'rejected') throw neonResult.reason;
return json(res, 201, { accepted: v.events.length, rejected: 0 });
import { json, badRequest, methodNotAllowed, readJson } from '../_lib/respond.js';
import { requireWrite, requireRead } from '../_lib/auth.js';
import { validateBatch } from '../_lib/validate.js';
import { ttlDays, limits } from '../_lib/env.js';
import { buildInsert } from '../_lib/ingest.js';
import { db } from '../_lib/db.js';
import { parseLogsQuery, runLogs } from '../_lib/sql/logs.js';
import { resolveScope } from '../_lib/projects.js';
import { forwardToLogflare } from '../_lib/logflare.js';
async function ingest(req, res) {
const principal = requireWrite(req, res);
if (!principal) return;
const body = await readJson(req);
if (!body.ok) return badRequest(res, 'invalid or empty JSON body');
const v = validateBatch(body.value, limits());
if (!v.ok) {
return json(res, v.status ?? 400, v.index != null ? { error: v.error, index: v.index } : { error: v.error });
}
const { text, params } = buildInsert(v.events, principal, ttlDays(), new Date());
// Run Neon insert and Logflare forward in parallel. Only Neon result matters
// for the response; Logflare is best-effort and must never block the caller.
void forwardToLogflare(v.events, principal);
await db()(text, params);
return json(res, 201, { accepted: v.events.length, rejected: 0 });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/superpowers/plans/2026-06-30-logflare-dual-write.md` around lines 237 -
273, The ingest path still waits for Logflare because `Promise.allSettled`
blocks until both promises settle, so update `ingest` in `web/api/v1/logs.js` to
start `forwardToLogflare` in the background and await only the Neon insert from
`db()(text, params)` before returning. Keep `buildInsert`, `validateBatch`, and
`requireWrite` flow intact, but make Logflare best-effort by not awaiting its
promise and handling any rejection separately so it never delays the response.

Comment on lines +397 to +440
function normalizeRow(row) {
const doc = {
app: row['metadata.app'] ?? row.app ?? '',
env: row['metadata.env'] ?? row.env ?? '',
event: row.event_message ?? row.event ?? '',
level: row['metadata.level'] ?? row.level ?? '',
receivedAt: row.timestamp ?? row['metadata.receivedAt'] ?? null,
};
if (row['metadata.ts']) doc.ts = row['metadata.ts'];
if (row['metadata.message']) doc.message = row['metadata.message'];
if (row['metadata.ids']) doc.ids = row['metadata.ids'];
if (row['metadata.data']) doc.data = row['metadata.data'];
return doc;
}

export default async function handler(req, res) {
if (req.method !== 'GET') return methodNotAllowed(res, 'GET');
if (!requireRead(req, res)) return;

const { endpointId, apiKey } = logflareConfig();
if (!endpointId || !apiKey) {
return json(res, 503, { error: 'logflare not configured' });
}

const sp = new url("https://github.com/ItsmeBlackOps/timber/pull/req.url,%20'http://localhost'").searchParams;
const parsed = parseParams(sp);
if (!parsed.ok) return badRequest(res, parsed.error);

const url = buildEndpointurl("https://github.com/ItsmeBlackOps/timber/pull/endpointId,%20parsed.value");
let upstream;
try {
upstream = await fetch(url, { headers: { 'X-API-KEY': apiKey } });
} catch (err) {
return json(res, 502, { error: 'logflare unreachable' });
}

if (!upstream.ok) {
const text = await upstream.text().catch(() => '');
return json(res, 502, { error: 'logflare error', detail: text });
}

const body = await upstream.json();
const items = (body.result ?? []).map(normalizeRow);
return json(res, 200, { items, nextCursor: null, source: 'logflare' });

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

Preserve the Timber log shape here.

normalizeRow() drops _id and expiresAt, but this endpoint claims to return the same item shape as /v1/logs. If the Console consumes both sources interchangeably, this will force a special-case path or break pagination/rendering.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/superpowers/plans/2026-06-30-logflare-dual-write.md` around lines 397 -
440, normalizeRow() is not preserving the Timber log item shape because it drops
_id and expiresAt, even though handler returns these items as interchangeable
with /v1/logs. Update normalizeRow() to carry through _id and expiresAt from the
upstream row when present, alongside the existing metadata mappings, so the
Console can consume both sources without special cases.

Comment on lines +492 to +499
Send a test event (replace URL and key with your values):

```bash
curl -s -X POST https://timber-console.vercel.app/v1/logs \
-H "Authorization: Bearer YOUR_WRITE_KEY" \
-H "Content-Type: application/json" \
-d '{"batch":[{"event":"logflare.test","level":"info","data":{"hello":"world"}}]}'
```

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Send a raw event array here, not a batch wrapper.

/v1/logs accepts either a single envelope or a JSON array. This curl example will be rejected because batch is an unknown top-level key.

🧪 Suggested fix
-  -d '{"batch":[{"event":"logflare.test","level":"info","data":{"hello":"world"}}]}'
+  -d '[{"event":"logflare.test","level":"info","data":{"hello":"world"}}]'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Send a test event (replace URL and key with your values):
```bash
curl -s -X POST https://timber-console.vercel.app/v1/logs \
-H "Authorization: Bearer YOUR_WRITE_KEY" \
-H "Content-Type: application/json" \
-d '{"batch":[{"event":"logflare.test","level":"info","data":{"hello":"world"}}]}'
```
Send a test event (replace URL and key with your values):
🧰 Tools
🪛 Betterleaks (1.6.0)

[high] 495-496: Discovered a potential authorization token provided in a curl command header, which could compromise the curl accessed resource.

(curl-auth-header)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/superpowers/plans/2026-06-30-logflare-dual-write.md` around lines 492 -
499, The curl example in the log sending section uses an invalid top-level batch
wrapper, so update the example under the test event instructions to send either
a single envelope or a raw JSON array as accepted by /v1/logs. Locate the
example around the send-test-event snippet and replace the request body shape so
it matches the endpoint contract used elsewhere in this plan.

Comment thread web/api/_lib/where.js
Comment on lines +19 to +23
function parseDateValue(v) {
if (EPOCH_MS_RE.test(v)) return new Date(Number(v));
const t = Date.parse(v);
return Number.isNaN(t) ? null : new Date(t);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Treat out-of-range epoch values as invalid input.

new Date(Number(v)) can return Invalid Date for huge digit-only from/to values, but parseDateValue() currently treats that as success. The later toISOString() calls on Lines 211-212 then throw, so from=99999999999999999999 becomes a 500 instead of a validation error.

Suggested fix
 function parseDateValue(v) {
-  if (EPOCH_MS_RE.test(v)) return new Date(Number(v));
+  if (EPOCH_MS_RE.test(v)) {
+    const d = new Date(Number(v));
+    return Number.isNaN(d.getTime()) ? null : d;
+  }
   const t = Date.parse(v);
   return Number.isNaN(t) ? null : new Date(t);
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function parseDateValue(v) {
if (EPOCH_MS_RE.test(v)) return new Date(Number(v));
const t = Date.parse(v);
return Number.isNaN(t) ? null : new Date(t);
}
function parseDateValue(v) {
if (EPOCH_MS_RE.test(v)) {
const d = new Date(Number(v));
return Number.isNaN(d.getTime()) ? null : d;
}
const t = Date.parse(v);
return Number.isNaN(t) ? null : new Date(t);
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@web/api/_lib/where.js` around lines 19 - 23, parseDateValue currently accepts
digit-only values via new Date(Number(v)) even when the result is Invalid Date,
which later breaks whereQuery/toISOString handling. Update parseDateValue to
validate the constructed Date before returning it and treat out-of-range epoch
inputs as null, so invalid from/to values are rejected as validation errors
instead of reaching the later ISO conversion path.

Comment thread web/api/_lib/where.js
Comment on lines +191 to +201
} else if (name.startsWith('data.') && name.length > 'data.'.length) {
const rangeOp = name.endsWith('__gte') ? '>=' : name.endsWith('__lte') ? '<=' : null;
if (rangeOp === null) {
const path = name.slice('data.'.length).split('.');
clauses.push(`data #>> ${push(path)} = ${push(v)}`);
} else {
const pathStr = name.slice('data.'.length, -'__gte'.length);
if (pathStr.length === 0) return fail(`unknown parameter "${name}"`);
if (!NUMERIC_RE.test(v)) return fail(`${name} requires a numeric value`);
const path = pathStr.split('.');
clauses.push(`(data #>> ${push(path)})::numeric ${rangeOp} ${push(Number(v))}`);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Guard the JSON numeric cast before applying __gte/__lte.

(data #>> ... )::numeric will abort the whole query if any row stores that path as non-numeric text. Since validateEnvelope() accepts arbitrary data objects, one bad event can turn a user filter into a 500.

Suggested fix
       } else {
         const pathStr = name.slice('data.'.length, -'__gte'.length);
         if (pathStr.length === 0) return fail(`unknown parameter "${name}"`);
         if (!NUMERIC_RE.test(v)) return fail(`${name} requires a numeric value`);
         const path = pathStr.split('.');
-        clauses.push(`(data #>> ${push(path)})::numeric ${rangeOp} ${push(Number(v))}`);
+        const pathRef = push(path);
+        const valueRef = push(Number(v));
+        clauses.push(
+          `CASE
+             WHEN (data #>> ${pathRef}) ~ '^-?\\d+(\\.\\d+)?$'
+             THEN (data #>> ${pathRef})::numeric ${rangeOp} ${valueRef}
+             ELSE false
+           END`
+        );
       }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
} else if (name.startsWith('data.') && name.length > 'data.'.length) {
const rangeOp = name.endsWith('__gte') ? '>=' : name.endsWith('__lte') ? '<=' : null;
if (rangeOp === null) {
const path = name.slice('data.'.length).split('.');
clauses.push(`data #>> ${push(path)} = ${push(v)}`);
} else {
const pathStr = name.slice('data.'.length, -'__gte'.length);
if (pathStr.length === 0) return fail(`unknown parameter "${name}"`);
if (!NUMERIC_RE.test(v)) return fail(`${name} requires a numeric value`);
const path = pathStr.split('.');
clauses.push(`(data #>> ${push(path)})::numeric ${rangeOp} ${push(Number(v))}`);
} else if (name.startsWith('data.') && name.length > 'data.'.length) {
const rangeOp = name.endsWith('__gte') ? '>=' : name.endsWith('__lte') ? '<=' : null;
if (rangeOp === null) {
const path = name.slice('data.'.length).split('.');
clauses.push(`data #>> ${push(path)} = ${push(v)}`);
} else {
const pathStr = name.slice('data.'.length, -'__gte'.length);
if (pathStr.length === 0) return fail(`unknown parameter "${name}"`);
if (!NUMERIC_RE.test(v)) return fail(`${name} requires a numeric value`);
const path = pathStr.split('.');
const pathRef = push(path);
const valueRef = push(Number(v));
clauses.push(
`CASE
WHEN (data #>> ${pathRef}) ~ '^-?\\d+(\\.\\d+)?$'
THEN (data #>> ${pathRef})::numeric ${rangeOp} ${valueRef}
ELSE false
END`
);
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@web/api/_lib/where.js` around lines 191 - 201, The numeric range handling in
where.js for the data.*__gte/__lte branch casts `(data #>> ...)::numeric`
directly, which can fail on rows where the JSON value is non-numeric. Update the
clause-building logic in the `name.startsWith('data.')` path so `buildWhere`
only applies the numeric comparison when the extracted JSON text is safely
numeric, and otherwise routes the condition through a guarded check or
equivalent validation before casting. Keep the fix localized to the `rangeOp` /
`pathStr` branch so `validateEnvelope()` can still accept arbitrary `data`
without a bad row aborting the query.

Comment thread web/api/healthz.js
Comment on lines +10 to +12
try {
const rows = await db()`SELECT count(*)::int AS n FROM events`;
count = rows[0].n;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 Performance & Scalability | 🟠 Major | ⚡ Quick win

Avoid count(*) on the full events table in /healthz.

This makes every health probe grow with table size, so the endpoint gets slower and more expensive as retention grows. Use a cheap reachability probe here, and source flushedTotal from an approximate or cached metric instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@web/api/healthz.js` around lines 10 - 12, The /healthz handler is doing an
expensive full-table count on events, so replace the db() query in the health
check with a cheap reachability probe and avoid deriving flushedTotal from
count(*). Update the healthz logic to use the existing healthz path with a
lightweight query, and source flushedTotal from an approximate or cached metric
in the same health response instead of scanning the full events table.

Comment thread web/api/healthz.js
Comment on lines +17 to +22
return json(res, 200, {
ok: connected,
wal: { totalBytes: 0, backlogBytes: 0, overBudget: false },
flusher: { running: true, caughtUp: true, flushedTotal: count, lastError: null },
mongo: { connected },
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Return a failure status when the DB probe fails.

/healthz still emits HTTP 200 when db() throws, so status-code-based monitors will treat an unusable API instance as healthy.

Suggested fix
-  return json(res, 200, {
+  return json(res, connected ? 200 : 503, {
     ok: connected,
     wal: { totalBytes: 0, backlogBytes: 0, overBudget: false },
     flusher: { running: true, caughtUp: true, flushedTotal: count, lastError: null },
     mongo: { connected },
   });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return json(res, 200, {
ok: connected,
wal: { totalBytes: 0, backlogBytes: 0, overBudget: false },
flusher: { running: true, caughtUp: true, flushedTotal: count, lastError: null },
mongo: { connected },
});
return json(res, connected ? 200 : 503, {
ok: connected,
wal: { totalBytes: 0, backlogBytes: 0, overBudget: false },
flusher: { running: true, caughtUp: true, flushedTotal: count, lastError: null },
mongo: { connected },
});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@web/api/healthz.js` around lines 17 - 22, The /healthz handler is always
returning HTTP 200 from the json(res, 200, ...) response even when db() fails,
so update the health check logic to derive the status code from the DB probe
result. In the healthz endpoint, ensure the db() call is wrapped/checked so that
a failed probe returns a non-200 failure status while keeping the existing
response shape (ok, wal, flusher, mongo) in the healthz response.

Comment thread web/api/v1/logflare.js
Comment on lines +57 to +68
if (!requireRead(req, res)) return;

const { endpointId, apiKey } = logflareConfig();
if (!endpointId || !apiKey) {
return json(res, 503, { error: 'logflare not configured' });
}

const sp = new url("https://github.com/ItsmeBlackOps/timber/pull/req.url,%20'http://localhost'").searchParams;
const parsed = parseParams(sp);
if (!parsed.ok) return badRequest(res, parsed.error);

const url = buildEndpointurl("https://github.com/ItsmeBlackOps/timber/pull/endpointId,%20parsed.value");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Apply project scope here before forwarding app/env.

This endpoint skips the resolveScope gate that GET /v1/logs uses, so any valid read key can change ?app= / ?env= and query the shared Logflare endpoint for another project. Please derive the forwarded scope from resolveScope(...) instead of trusting raw query params.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@web/api/v1/logflare.js` around lines 57 - 68, The Logflare forwarding path in
the GET handler is using raw query params without enforcing project scope, so it
can query another project by changing app/env. Update the request flow to derive
the forwarded scope through resolveScope(...) before calling buildEndpointUrl,
and use that resolved scope instead of trusting parseParams(req.url) values.
Keep the existing read check and config validation, but ensure the forwarded
app/env always come from the scoped resolution used by GET /v1/logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant