fix(pipeline): align stale day detection with local midnight#48
Conversation
📝 Walkthrough
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 5
🧹 Nitpick comments (2)
apps/web/src/lib/server/datasets.ts (1)
23-31: Consider using `interface` for `PreparedStatement` for consistency.

Per coding guidelines, prefer interfaces over types. `ReadonlyDatasetDb` correctly uses an interface, but `PreparedStatement` is defined as a type alias.

♻️ Suggested change

```diff
-type PreparedStatement = {
-  get(...params: unknown[]): unknown;
-  all(...params: unknown[]): unknown[];
-};
+interface PreparedStatement {
+  get(...params: unknown[]): unknown;
+  all(...params: unknown[]): unknown[];
+}
```

As per coding guidelines: "Use TypeScript for all code; prefer interfaces over types"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/web/src/lib/server/datasets.ts` around lines 23 - 31, The PreparedStatement type should be converted to an interface for consistency with ReadonlyDatasetDb; replace the type alias PreparedStatement with an interface named PreparedStatement that declares the same methods (get(...params: unknown[]): unknown and all(...params: unknown[]): unknown[]), then keep ReadonlyDatasetDb.prepare(sql: string): PreparedStatement and close(): void unchanged so the API surface remains identical.

apps/web/src/routes/api/netflow/stats/+server.ts (1)
18-21: Consider adding inline documentation for this complex SQL expression.

The bucket calculation logic is dense and non-obvious. The expression converts timestamps to local time, finds the start of the local day, calculates the offset within that day, then snaps to the nearest bucket boundary. A brief inline comment explaining the transformation steps would help future maintainers understand the DST-aware bucketing logic.
📝 Suggested documentation
```diff
+/**
+ * Generate SQL expression for epoch-based bucket calculation in local time.
+ * Server timezone is set to America/Los_Angeles to handle DST correctly.
+ *
+ * Calculation steps:
+ * 1. Convert timestamp to localtime and find start of local day
+ * 2. Calculate seconds elapsed since start of local day
+ * 3. Floor to nearest bucket boundary (offset / bucketSize * bucketSize)
+ * 4. Add offset back to start of day and convert to UTC epoch
+ */
 function getBucketStartQuery(groupBy: string): string {
   const bucketSize = BUCKET_SIZES[groupBy] ?? BUCKET_SIZES.date;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/web/src/routes/api/netflow/stats/+server.ts` around lines 18 - 21, Add a brief inline comment above the getBucketStartQuery function (which uses BUCKET_SIZES) that explains, step-by-step, what the SQL expression does: convert stored unix timestamp to local datetime, compute the start of the local day, calculate seconds elapsed since that start, divide by the selected bucket size and multiply back to snap to the nearest bucket boundary, and convert back to unix seconds (this explains the DST-aware bucketing logic and purpose of the printf/CAST/strftime pieces so future maintainers can quickly understand the transformation).
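The step-by-step transformation the comment describes can also be sketched in plain TypeScript. This is a hypothetical analogue of the SQL, not the project's implementation; `BUCKET_SIZES` values and the `bucketStart` name are illustrative:

```typescript
// Illustrative analogue of the DST-aware SQL bucketing described above.
// Names and values here are hypothetical, not the project's actual code.
const BUCKET_SIZES: Record<string, number> = {
  date: 86400, // one bucket per local day
  hour: 3600 // one bucket per hour within the local day
};

function bucketStart(epochSeconds: number, groupBy: string): number {
  const bucketSize = BUCKET_SIZES[groupBy] ?? BUCKET_SIZES.date;
  const local = new Date(epochSeconds * 1000);
  // 1. Find the start of the local day containing the timestamp.
  const dayStart = new Date(local.getFullYear(), local.getMonth(), local.getDate());
  // 2. Seconds elapsed since the start of the local day.
  const offset = Math.floor((local.getTime() - dayStart.getTime()) / 1000);
  // 3. Snap the offset down to the nearest bucket boundary.
  const snapped = Math.floor(offset / bucketSize) * bucketSize;
  // 4. Add the snapped offset back and convert to a UTC epoch.
  return Math.floor(dayStart.getTime() / 1000) + snapped;
}
```

Because the day start is taken in local time, bucket boundaries follow local midnight across DST transitions, which is the behavior the review asks to document.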
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/web/src/lib/server/datasets.ts`:
- Around line 213-229: The openNodeSqliteDatabase function currently requires
'node:sqlite' unguarded and will crash on pre-22.5 Node versions; wrap the
require('node:sqlite') call in a try/catch inside openNodeSqliteDatabase, detect
module-load errors (e.g. ERR_UNKNOWN_BUILTIN_MODULE) and either throw a clear,
actionable Error that states the minimum Node.js version required (22.5.0+) and
how to upgrade, or return a graceful fallback (e.g. a ReadonlyDatasetDb
implementation that throws meaningful errors on use), and update any
docs/comments to record the Node.js requirement; refer to
openNodeSqliteDatabase, DatabaseSync, and PreparedStatement when making the
change.
- Around line 251-261: The current error handling in the getDatasetDb function
catches failures from the better-sqlite3 loading fallback to node:sqlite, but if
openNodeSqliteDatabase also throws an error, it is not caught or logged. To fix
this, add a nested try-catch around the call to openNodeSqliteDatabase to catch
any errors it throws, log a detailed error message indicating both backends
failed to load the database for the given datasetId, and then rethrow the error
to ensure upstream awareness.
In `@apps/web/tests/lib/server/datasets.test.ts`:
- Around line 7-26: The test currently only mocks better-sqlite3
(betterSqlite3Factory / vi.mock('better-sqlite3')) so the fallback path to
node:sqlite is untested and can fail in CI; either mock 'node:sqlite' to provide
the same minimal API your production code expects (e.g., open/Database or
whatever exported functions your code uses) when betterSqlite3Factory is forced
to throw via mockImplementationOnce, or guard/skip the test when node:sqlite is
not present by detecting availability (try { require('node:sqlite') } catch { /*
skip */ }) and using Vitest's it.skipIf to skip the fallback test—update the
test that asserts "falls back to node:sqlite when better-sqlite3 fails to load"
to use one of these two approaches and reference betterSqlite3Factory and the
mocked module names ('better-sqlite3' and 'node:sqlite') so the CI reliably
exercises or skips the fallback.
- Around line 119-156: The test seeds the SQLite DB by calling
spawnSync('sqlite3', ...) which fails in CI because the sqlite3 CLI is not
present; replace that external call with an in-process DB creation using Node's
builtin sqlite API (or a lightweight Node sqlite library) to create the file at
dbPath and run the SQL statements used in the test (CREATE TABLE ...; INSERT
...), then remove the expect(seedResult.status).toBe(0) assertion and proceed
with writing registryPath and the rest of the test; keep references to dbPath,
registryPath, loadDatasetsModule, and the
betterSqlite3Factory.mockImplementationOnce stub so the test still simulates the
fallback path.
In `@tools/netflow-db/repair_mirrored_root_duplicates.py`:
- Around line 132-143: The unique index creation for
idx_processed_files_router_timestamp_unique and
idx_netflow_router_timestamp_unique is not protected and can raise
sqlite3.IntegrityError (rolling back the surrounding with conn: deletions);
modify the code to either wrap each conn.execute(...) that creates these indexes
in a try/except sqlite3.IntegrityError block (catch the exception and log/ignore
it) or move those conn.execute calls outside the transactional with conn: block
so index creation failures don’t undo the deletions—update the section that
executes the CREATE UNIQUE INDEX statements accordingly.
---
Nitpick comments:
In `@apps/web/src/lib/server/datasets.ts`:
- Around line 23-31: The PreparedStatement type should be converted to an
interface for consistency with ReadonlyDatasetDb; replace the type alias
PreparedStatement with an interface named PreparedStatement that declares the
same methods (get(...params: unknown[]): unknown and all(...params: unknown[]):
unknown[]), then keep ReadonlyDatasetDb.prepare(sql: string): PreparedStatement
and close(): void unchanged so the API surface remains identical.
In `@apps/web/src/routes/api/netflow/stats/+server.ts`:
- Around line 18-21: Add a brief inline comment above the getBucketStartQuery
function (which uses BUCKET_SIZES) that explains, step-by-step, what the SQL
expression does: convert stored unix timestamp to local datetime, compute the
start of the local day, calculate seconds elapsed since that start, divide by
the selected bucket size and multiply back to snap to the nearest bucket
boundary, and convert back to unix seconds (this explains the DST-aware
bucketing logic and purpose of the printf/CAST/strftime pieces so future
maintainers can quickly understand the transformation).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 627747fc-a455-4cd0-8765-6f62c0f4040e
📒 Files selected for processing (11)
- apps/web/src/lib/server/datasets.ts
- apps/web/src/routes/api/netflow/stats/+server.ts
- apps/web/tests/lib/server/datasets.test.ts
- apps/web/tests/routes/api-netflow-stats.test.ts
- term-update.md
- tests/python/test_discovery.py
- tests/python/test_flow_db.py
- tools/netflow-db/common.py
- tools/netflow-db/discovery.py
- tools/netflow-db/flow_db.py
- tools/netflow-db/repair_mirrored_root_duplicates.py
```ts
function openNodeSqliteDatabase(dbPath: string): ReadonlyDatasetDb {
  const { DatabaseSync } = require('node:sqlite') as typeof import('node:sqlite');
  type SQLInputValue = import('node:sqlite').SQLInputValue;
  const db = new DatabaseSync(dbPath, { open: true, readOnly: true });
  return {
    prepare(sql: string): PreparedStatement {
      const stmt = db.prepare(sql);
      return {
        get: (...params: unknown[]) => stmt.get(...(params as SQLInputValue[])),
        all: (...params: unknown[]) => stmt.all(...(params as SQLInputValue[]))
      };
    },
    close() {
      db.close();
    }
  };
}
```
🧩 Analysis chain
🌐 Web query:
What Node.js version introduced node:sqlite module?
💡 Result:
Node.js version 22.5.0 introduced the node:sqlite module.
Citations:
- 1: lib,src,test,doc: add node:sqlite module nodejs/node#53752
- 2: https://mapsam.com/posts/node-sqlite-native
- 3: https://nodejs.org/docs/latest/api/sqlite.html
- 4: https://nodejs.org/api/sqlite.html
- 5: https://nodejs.org/download/release/v22.12.0/docs/api/sqlite.html
- 6: https://nodejs.org/download/nightly/v24.0.0-nightly202412035ef4985175/docs/api/sqlite.html
Add graceful error handling for node:sqlite module availability.
The node:sqlite module was introduced in Node.js 22.5.0 and is not available in earlier versions. The pipeline failure shows ERR_UNKNOWN_BUILTIN_MODULE, indicating the environment uses an older Node.js version. The current code path will fail when this module cannot be loaded, without clear error recovery.
Add explicit error handling for module import failures and document the minimum Node.js version requirement. Consider whether the environment guarantees Node.js 22.5.0+, or implement a fallback mechanism that handles missing module gracefully.
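One possible shape for that guard, as a sketch only: `requireNodeSqlite` is a hypothetical helper name and the error text is illustrative, but `ERR_UNKNOWN_BUILTIN_MODULE` is the real code Node.js raises when a `node:` builtin does not exist.

```typescript
// Hypothetical guard around loading the builtin module. On pre-22.5 Node the
// require call throws with code ERR_UNKNOWN_BUILTIN_MODULE; translate that
// into an actionable error instead of letting the raw failure propagate.
function requireNodeSqlite(): unknown {
  try {
    return require('node:sqlite');
  } catch (error) {
    const code = (error as { code?: string }).code;
    if (code === 'ERR_UNKNOWN_BUILTIN_MODULE') {
      throw new Error(
        `node:sqlite requires Node.js 22.5.0+ (running ${process.version}); ` +
          'upgrade Node.js or keep better-sqlite3 installed.'
      );
    }
    throw error; // unrelated failure: surface it unchanged
  }
}
```

`openNodeSqliteDatabase` could call this helper first, so callers get a clear version message rather than `ERR_UNKNOWN_BUILTIN_MODULE` from deep inside the fallback path.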
🧰 Tools
🪛 GitHub Actions: CI
[error] 214-214: Failed to derive default start date for dataset 'alpha'. better-sqlite3 failed to load; fallback to node:sqlite failed with 'ERR_UNKNOWN_BUILTIN_MODULE' / Error: No such built-in module: node:sqlite.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/web/src/lib/server/datasets.ts` around lines 213 - 229, The
openNodeSqliteDatabase function currently requires 'node:sqlite' unguarded and
will crash on pre-22.5 Node versions; wrap the require('node:sqlite') call in a
try/catch inside openNodeSqliteDatabase, detect module-load errors (e.g.
ERR_UNKNOWN_BUILTIN_MODULE) and either throw a clear, actionable Error that
states the minimum Node.js version required (22.5.0+) and how to upgrade, or
return a graceful fallback (e.g. a ReadonlyDatasetDb implementation that throws
meaningful errors on use), and update any docs/comments to record the Node.js
requirement; refer to openNodeSqliteDatabase, DatabaseSync, and
PreparedStatement when making the change.
```ts
let db: ReadonlyDatasetDb;
try {
  db = new Database(dbPath, { readonly: true });
} catch (error) {
  if (error instanceof Error && 'code' in error && error.code === 'ERR_DLOPEN_FAILED') {
    console.warn(`better-sqlite3 failed to load for '${datasetId}', falling back to node:sqlite`);
    db = openNodeSqliteDatabase(dbPath);
  } else {
    throw error;
  }
}
```
Missing error handling when node:sqlite fallback also fails.
If openNodeSqliteDatabase throws (e.g., node:sqlite unavailable), the error propagates up. While callers like getDatasetDefaultStartDate have their own catch blocks, other callers of getDatasetDb may not handle this gracefully. Consider logging a more informative error when both backends fail.
💡 Suggested improvement
```diff
 try {
   db = new Database(dbPath, { readonly: true });
 } catch (error) {
   if (error instanceof Error && 'code' in error && error.code === 'ERR_DLOPEN_FAILED') {
     console.warn(`better-sqlite3 failed to load for '${datasetId}', falling back to node:sqlite`);
-    db = openNodeSqliteDatabase(dbPath);
+    try {
+      db = openNodeSqliteDatabase(dbPath);
+    } catch (fallbackError) {
+      console.error(`node:sqlite fallback also failed for '${datasetId}':`, fallbackError);
+      throw new Error(
+        `Failed to open database for '${datasetId}': better-sqlite3 and node:sqlite both unavailable`
+      );
+    }
   } else {
     throw error;
   }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/web/src/lib/server/datasets.ts` around lines 251 - 261, The current
error handling in the getDatasetDb function catches failures from the
better-sqlite3 loading fallback to node:sqlite, but if openNodeSqliteDatabase
also throws an error, it is not caught or logged. To fix this, add a nested
try-catch around the call to openNodeSqliteDatabase to catch any errors it
throws, log a detailed error message indicating both backends failed to load the
database for the given datasetId, and then rethrow the error to ensure upstream
awareness.
```ts
const betterSqlite3Factory = vi.fn((dbPath: string) => ({
  prepare(query: string) {
    return {
      get: () => {
        const result = spawnSync('sqlite3', [dbPath, query], { encoding: 'utf-8' });
        if (result.status !== 0) {
          throw new Error(result.stderr || 'sqlite3 query failed');
        }
        const minTimestamp = Number(result.stdout.trim());
        return { minTimestamp: Number.isFinite(minTimestamp) ? minTimestamp : null };
      }
    };
  },
  close() {}
}));

vi.mock('better-sqlite3', () => ({
  default: vi.fn().mockImplementation((dbPath: string) => betterSqlite3Factory(dbPath))
}));
```
Mock structure doesn't support testing the node:sqlite fallback path.
The betterSqlite3Factory mock uses the sqlite3 CLI to execute real queries against the seeded database. However, when mockImplementationOnce throws to simulate better-sqlite3 failure, the production code falls back to node:sqlite, which:
- Isn't mocked in this test
- May not be available in the Node.js version used by CI
- Won't automatically use the same query execution pattern as the mock
To properly test the fallback, you need to also mock node:sqlite or use conditional test skipping based on Node.js version availability.
💡 Suggested approach: Skip test if node:sqlite unavailable
```diff
-  it('falls back to node:sqlite when better-sqlite3 fails to load', async () => {
+  it('falls back to node:sqlite when better-sqlite3 fails to load', async () => {
+    // Skip if node:sqlite is not available (requires Node 22.5+)
+    try {
+      require('node:sqlite');
+    } catch {
+      console.log('Skipping test: node:sqlite not available');
+      return;
+    }
+
     const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'datasets-test-'));
```

Or use Vitest's `it.skipIf`:
```ts
const nodeSqliteAvailable = (() => {
  try {
    require('node:sqlite');
    return true;
  } catch {
    return false;
  }
})();

it.skipIf(!nodeSqliteAvailable)('falls back to node:sqlite when better-sqlite3 fails to load', async () => {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/web/tests/lib/server/datasets.test.ts` around lines 7 - 26, The test
currently only mocks better-sqlite3 (betterSqlite3Factory /
vi.mock('better-sqlite3')) so the fallback path to node:sqlite is untested and
can fail in CI; either mock 'node:sqlite' to provide the same minimal API your
production code expects (e.g., open/Database or whatever exported functions your
code uses) when betterSqlite3Factory is forced to throw via
mockImplementationOnce, or guard/skip the test when node:sqlite is not present
by detecting availability (try { require('node:sqlite') } catch { /* skip */ })
and using Vitest's it.skipIf to skip the fallback test—update the test that
asserts "falls back to node:sqlite when better-sqlite3 fails to load" to use one
of these two approaches and reference betterSqlite3Factory and the mocked module
names ('better-sqlite3' and 'node:sqlite') so the CI reliably exercises or skips
the fallback.
```ts
it('falls back to node:sqlite when better-sqlite3 fails to load', async () => {
  const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'datasets-test-'));
  const dbPath = path.join(tempDir, 'netflow.sqlite');
  const registryPath = path.join(tempDir, 'datasets.json');

  const seedResult = spawnSync(
    'sqlite3',
    [
      dbPath,
      'CREATE TABLE netflow_stats (timestamp INTEGER NOT NULL); INSERT INTO netflow_stats (timestamp) VALUES (1740823200);'
    ],
    { encoding: 'utf-8' }
  );
  expect(seedResult.status).toBe(0);

  fs.writeFileSync(
    registryPath,
    JSON.stringify([
      {
        dataset_id: 'alpha',
        label: 'Alpha Label',
        root_path: tempDir,
        db_path: dbPath
      }
    ])
  );

  betterSqlite3Factory.mockImplementationOnce(() => {
    const error = new Error('Module did not self-register') as Error & { code: string };
    error.code = 'ERR_DLOPEN_FAILED';
    throw error;
  });

  vi.stubEnv('DATASETS_CONFIG_PATH', registryPath);

  const datasets = await loadDatasetsModule();
  expect(datasets.getDatasetDefaultStartDate('alpha')).toBe('2025-03-01');
});
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Check Node.js version to verify node:sqlite availability (requires Node 22.5+)
node --version
# Check if node:sqlite is available
node -e "try { require('node:sqlite'); console.log('node:sqlite available'); } catch(e) { console.log('node:sqlite NOT available:', e.message); }"
```

Repository: flamboh/netflow-analysis
Length of output: 279
🏁 Script executed:
```bash
# Read the test file to understand the implementation
head -160 apps/web/tests/lib/server/datasets.test.ts | tail -45
```
🏁 Script executed:
```bash
# Find and read the datasets.ts file to understand the fallback logic
fd 'datasets\.ts$' --type f | grep -v test | head -1
```
🏁 Script executed:
```bash
# Read the datasets.ts file to understand the fallback logic
cat -n apps/web/src/lib/datasets.ts | head -200
```
🏁 Script executed:
```bash
# Find server-side datasets file
fd 'datasets\.ts$' --type f apps/web/src/lib/server/ 2>/dev/null
```
🏁 Script executed:
```bash
# Search for getFallbackStartDate function
rg 'getFallbackStartDate' --type ts
```
🏁 Script executed:
```bash
# Read the server-side datasets.ts file
cat -n apps/web/src/lib/server/datasets.ts | head -250
```
🏁 Script executed:
```bash
# Continue reading to see the complete getDatasetDb implementation
cat -n apps/web/src/lib/server/datasets.ts | sed -n '250,280p'
```
🏁 Script executed:
```bash
# Check the test file setup to see how mocks are configured
head -40 apps/web/tests/lib/server/datasets.test.ts
```
🏁 Script executed:
```bash
# Check if there are any environment setup or mocking issues in the test
grep -n "node:sqlite" apps/web/tests/lib/server/datasets.test.ts
```
🏁 Script executed:
```bash
# Check if node:sqlite requires special handling in the test environment
grep -n "vi.mock" apps/web/tests/lib/server/datasets.test.ts
```
🏁 Script executed:
```bash
# Check if the database seeding command works (recreate the test scenario)
tempDir=$(mktemp -d)
dbPath="$tempDir/test.sqlite"
sqlite3 "$dbPath" "CREATE TABLE netflow_stats (timestamp INTEGER NOT NULL); INSERT INTO netflow_stats (timestamp) VALUES (1740823200);"
echo "Seeding status: $?"
# Test if node:sqlite can read this database
node -e "
try {
  const { DatabaseSync } = require('node:sqlite');
  const db = new DatabaseSync('$dbPath', { open: true, readOnly: true });
  const row = db.prepare('SELECT MIN(timestamp) AS minTimestamp FROM netflow_stats').get();
  console.log('Query result:', row);
  db.close();
} catch (e) {
  console.log('Error:', e.message);
}
"
# Clean up
rm -rf "$tempDir"
```
🏁 Script executed:
```bash
# Check if the test has any skip conditions or checks for sqlite3
grep -B5 -A5 "spawnSync" apps/web/tests/lib/server/datasets.test.ts | head -20
```
🏁 Script executed:
```bash
# Check what the actual error is when running the test
grep -n "spawnSync" apps/web/tests/lib/server/datasets.test.ts
```
🏁 Script executed:
```bash
# Check the exact test assertion for seedResult
sed -n '119,160p' apps/web/tests/lib/server/datasets.test.ts | grep -n "seedResult"
```
Test fails because database seeding requires the sqlite3 CLI tool, which is unavailable in the test environment.
The test attempts to seed a SQLite database using spawnSync('sqlite3', ...) at line 124, but the sqlite3 command-line tool is not available in the test environment. This causes the seeding to fail with status code 127, preventing the database from being created. The test assertion at line 138 (expect(seedResult.status).toBe(0)) should fail at this point.
To fix this, either:
- Install the `sqlite3` CLI tool in the test environment, or
- Use an alternative seeding approach that doesn't depend on external CLI tools (e.g., directly use node:sqlite or a Node.js library to create and populate the test database)

Note: node:sqlite is available (Node.js v24.3.0) and working correctly.
🧰 Tools
🪛 GitHub Actions: CI
[error] 155-155: Test failed: expected '2026-02-22' to be '2025-03-01' (Object.is equality).
🪛 GitHub Check: Web + Python Checks
[failure] 155-155: tests/lib/server/datasets.test.ts > dataset server helpers > falls back to node:sqlite when better-sqlite3 fails to load
AssertionError: expected '2026-02-22' to be '2025-03-01' // Object.is equality
Expected: "2025-03-01"
Received: "2026-02-22"
❯ tests/lib/server/datasets.test.ts:155:56
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/web/tests/lib/server/datasets.test.ts` around lines 119 - 156, The test
seeds the SQLite DB by calling spawnSync('sqlite3', ...) which fails in CI
because the sqlite3 CLI is not present; replace that external call with an
in-process DB creation using Node's builtin sqlite API (or a lightweight Node
sqlite library) to create the file at dbPath and run the SQL statements used in
the test (CREATE TABLE ...; INSERT ...), then remove the
expect(seedResult.status).toBe(0) assertion and proceed with writing
registryPath and the rest of the test; keep references to dbPath, registryPath,
loadDatasetsModule, and the betterSqlite3Factory.mockImplementationOnce stub so
the test still simulates the fallback path.
b7be1eb to 3e811af (Compare)
1a783c2 to 1eed41c (Compare)
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Documentation