
fix(pipeline): align stale day detection with local midnight#48

Merged
flamboh merged 6 commits into main from codex/fix-stale-day-boundary on Mar 27, 2026

Conversation

flamboh (Owner) commented on Mar 24, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added automatic fallback to alternative SQLite implementation for improved compatibility
    • Introduced new repair tool for handling mirrored-root duplicate records in the NetFlow database
  • Bug Fixes

    • Enhanced duplicate file path deduplication in database operations
    • Improved time bucket calculations for NetFlow statistics
  • Documentation

    • Updated visualization platform terminology

flamboh (Owner, Author) commented on Mar 24, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

coderabbitai bot (Contributor) commented on Mar 24, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 29656a74-ea86-4fff-a860-8f4bc3e04e04

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

This pull request introduces a node:sqlite fallback for dataset database access when better-sqlite3 fails to load, adds local day-based bucket boundary calculations for netflow stats, implements deduplication logic for mirrored-path file records across discovery and flow processing, and adds tooling to repair existing duplicate records.

Changes

  • Dataset DB abstraction and node:sqlite fallback (apps/web/src/lib/server/datasets.ts, apps/web/tests/lib/server/datasets.test.ts)
    Introduced a ReadonlyDatasetDb interface to abstract DB access; getDatasetDb() now attempts better-sqlite3 and falls back to node:sqlite on ERR_DLOPEN_FAILED. A new openNodeSqliteDatabase() helper wraps prepare/get/all with typed argument casting. Tests were updated to mock the factory, add cleanup, and exercise the fallback path via a seeded real SQLite DB.
  • Netflow stats bucket boundary calculation (apps/web/src/routes/api/netflow/stats/+server.ts, apps/web/tests/routes/api-netflow-stats.test.ts)
    Updated getBucketStartQuery() SQL to calculate bucket boundaries from the local day start (using datetime(..., 'start of day', 'localtime')) rather than epoch division. The test now asserts the presence of timezone-aware 'start of day', 'utc' SQL fragments in prepared statement arguments.
  • Deduplication infrastructure for mirrored-path records (tools/netflow-db/common.py, tools/netflow-db/flow_db.py, tools/netflow-db/discovery.py, tests/python/test_discovery.py, tests/python/test_flow_db.py)
    Added unique index creation on (router, timestamp) across processed_files and netflow_stats with an integrity-error fallback. sync_processed_files_table() now performs a conditional UPDATE (with metadata reset) instead of an unconditional INSERT when rows exist; get_stale_days() uses localtime-based day bucketing. batch_insert_results() cleans up mirrored-path duplicates prior to INSERT. New tests verify local day boundaries, path updates without duplication, and batch insert replacement behavior.
  • Repair tooling for mirrored-path duplicates (tools/netflow-db/repair_mirrored_root_duplicates.py)
    New executable script to identify and remove mirrored-root duplicate records from processed_files and netflow_stats. Accepts a DB path, preferred/remove prefix patterns, a date range, and a --dry-run mode; reports counts and deletes duplicates where a preferred row exists for the same (router, timestamp). Applies CREATE UNIQUE INDEX IF NOT EXISTS after cleanup.
  • Documentation update (term-update.md)
    Changed "web-based" to "web dashboard" in the Visualization platform section; added a blank line after the top-level heading for Markdown consistency.
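The localtime-based day bucketing that the stats endpoint and get_stale_days() now share can be sketched with SQLite's date-time modifiers. This is an illustrative Python sketch of the idiom, not the PR's exact SQL; the column name timestamp and the hourly bucket size are assumptions:

```python
import sqlite3


def bucket_start_sql(bucket_size: int) -> str:
    """SQL expression snapping a unix timestamp to a bucket boundary
    measured from local midnight: convert to localtime, truncate to the
    start of the local day, convert back to a UTC epoch, then floor the
    offset within the day to a multiple of bucket_size."""
    day_start = (
        "CAST(strftime('%s', timestamp, 'unixepoch', "
        "'localtime', 'start of day', 'utc') AS INTEGER)"
    )
    return f"{day_start} + ((timestamp - {day_start}) / {bucket_size}) * {bucket_size}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE netflow_stats (timestamp INTEGER NOT NULL)")
conn.execute("INSERT INTO netflow_stats (timestamp) VALUES (1740823200)")

expr = bucket_start_sql(3600)  # hourly buckets for illustration
(bucket,) = conn.execute(f"SELECT {expr} FROM netflow_stats").fetchone()
(ts,) = conn.execute("SELECT timestamp FROM netflow_stats").fetchone()
(midnight,) = conn.execute(
    "SELECT CAST(strftime('%s', timestamp, 'unixepoch', "
    "'localtime', 'start of day', 'utc') AS INTEGER) FROM netflow_stats"
).fetchone()

# The boundary never exceeds the timestamp and sits a whole number of
# buckets past local midnight, regardless of the host timezone or DST.
assert midnight <= bucket <= ts
assert (bucket - midnight) % 3600 == 0
```

Because the day start is recomputed per row in local time, buckets stay aligned to local midnight across DST transitions, which plain epoch division cannot guarantee.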
🚥 Pre-merge checks: 2 passed, 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 55.56%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2)

  • Description Check ✅ Passed — Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title describes aligning stale day detection with local midnight, the primary focus of the changes to discovery.py and flow_db.py, though it understates the broader scope (SQLite fallback loading, bucket boundary changes, and path deduplication).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@flamboh flamboh marked this pull request as ready for review March 24, 2026 17:32
coderabbitai bot (Contributor) left a comment


Actionable comments posted: 5

🧹 Nitpick comments (2)
apps/web/src/lib/server/datasets.ts (1)

23-31: Consider using interface for PreparedStatement for consistency.

Per coding guidelines, prefer interfaces over types. ReadonlyDatasetDb correctly uses an interface, but PreparedStatement is defined as a type alias.

♻️ Suggested change
-type PreparedStatement = {
-	get(...params: unknown[]): unknown;
-	all(...params: unknown[]): unknown[];
-};
+interface PreparedStatement {
+	get(...params: unknown[]): unknown;
+	all(...params: unknown[]): unknown[];
+}

As per coding guidelines: "Use TypeScript for all code; prefer interfaces over types"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/src/lib/server/datasets.ts` around lines 23 - 31, The
PreparedStatement type should be converted to an interface for consistency with
ReadonlyDatasetDb; replace the type alias PreparedStatement with an interface
named PreparedStatement that declares the same methods (get(...params:
unknown[]): unknown and all(...params: unknown[]): unknown[]), then keep
ReadonlyDatasetDb.prepare(sql: string): PreparedStatement and close(): void
unchanged so the API surface remains identical.
apps/web/src/routes/api/netflow/stats/+server.ts (1)

18-21: Consider adding inline documentation for this complex SQL expression.

The bucket calculation logic is dense and non-obvious. The expression converts timestamps to local time, finds the start of the local day, calculates the offset within that day, then snaps to the nearest bucket boundary. A brief inline comment explaining the transformation steps would help future maintainers understand the DST-aware bucketing logic.

📝 Suggested documentation
+/**
+ * Generate SQL expression for epoch-based bucket calculation in local time.
+ * Server timezone is set to America/Los_Angeles to handle DST correctly.
+ *
+ * Calculation steps:
+ * 1. Convert timestamp to localtime and find start of local day
+ * 2. Calculate seconds elapsed since start of local day
+ * 3. Floor to nearest bucket boundary (offset / bucketSize * bucketSize)
+ * 4. Add offset back to start of day and convert to UTC epoch
+ */
 function getBucketStartQuery(groupBy: string): string {
 	const bucketSize = BUCKET_SIZES[groupBy] ?? BUCKET_SIZES.date;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/src/routes/api/netflow/stats/+server.ts` around lines 18 - 21, Add a
brief inline comment above the getBucketStartQuery function (which uses
BUCKET_SIZES) that explains, step-by-step, what the SQL expression does: convert
stored unix timestamp to local datetime, compute the start of the local day,
calculate seconds elapsed since that start, divide by the selected bucket size
and multiply back to snap to the nearest bucket boundary, and convert back to
unix seconds (this explains the DST-aware bucketing logic and purpose of the
printf/CAST/strftime pieces so future maintainers can quickly understand the
transformation).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:

In `@tools/netflow-db/repair_mirrored_root_duplicates.py`:
- Around line 132-143: The unique index creation for
idx_processed_files_router_timestamp_unique and
idx_netflow_router_timestamp_unique is not protected and can raise
sqlite3.IntegrityError (rolling back the surrounding with conn: deletions);
modify the code to either wrap each conn.execute(...) that creates these indexes
in a try/except sqlite3.IntegrityError block (catch the exception and log/ignore
it) or move those conn.execute calls outside the transactional with conn: block
so index creation failures don’t undo the deletions—update the section that
executes the CREATE UNIQUE INDEX statements accordingly.
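The repair-and-index flow described above can be sketched as follows, assuming a simplified processed_files schema (the file_path column and the prefix values here are hypothetical; the real script also handles netflow_stats, date ranges, and --dry-run):

```python
import sqlite3


def remove_mirrored_duplicates(conn: sqlite3.Connection,
                               preferred_prefix: str,
                               remove_prefix: str) -> int:
    """Delete rows under remove_prefix whenever a preferred row exists
    for the same (router, timestamp)."""
    with conn:  # deletions commit or roll back together
        cur = conn.execute(
            """
            DELETE FROM processed_files
            WHERE file_path LIKE ? || '%'
              AND EXISTS (
                SELECT 1 FROM processed_files AS p
                WHERE p.router = processed_files.router
                  AND p.timestamp = processed_files.timestamp
                  AND p.file_path LIKE ? || '%'
              )
            """,
            (remove_prefix, preferred_prefix),
        )
        deleted = cur.rowcount
    # Create the index outside the transaction so an IntegrityError from
    # any surviving duplicates cannot roll back the deletions (the
    # concern raised in the review comment above).
    try:
        conn.execute(
            "CREATE UNIQUE INDEX IF NOT EXISTS "
            "idx_processed_files_router_timestamp_unique "
            "ON processed_files (router, timestamp)"
        )
    except sqlite3.IntegrityError:
        pass  # duplicates remain; retry after another cleanup pass
    return deleted


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_files (router TEXT, timestamp INTEGER, file_path TEXT)"
)
conn.executemany(
    "INSERT INTO processed_files VALUES (?, ?, ?)",
    [
        ("r1", 100, "/data/netflow/r1/f1"),
        ("r1", 100, "/mirror/netflow/r1/f1"),  # mirrored duplicate: removed
        ("r1", 200, "/mirror/netflow/r1/f2"),  # no preferred twin: kept
    ],
)
deleted = remove_mirrored_duplicates(conn, "/data/", "/mirror/")
```

A mirrored row without a preferred counterpart is deliberately kept, so the unique index creation only succeeds once every (router, timestamp) pair resolves to a single row.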


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 627747fc-a455-4cd0-8765-6f62c0f4040e

📥 Commits

Reviewing files that changed from the base of the PR and between 667e683 and b7be1eb.

📒 Files selected for processing (11)
  • apps/web/src/lib/server/datasets.ts
  • apps/web/src/routes/api/netflow/stats/+server.ts
  • apps/web/tests/lib/server/datasets.test.ts
  • apps/web/tests/routes/api-netflow-stats.test.ts
  • term-update.md
  • tests/python/test_discovery.py
  • tests/python/test_flow_db.py
  • tools/netflow-db/common.py
  • tools/netflow-db/discovery.py
  • tools/netflow-db/flow_db.py
  • tools/netflow-db/repair_mirrored_root_duplicates.py

Comment on lines +213 to +229 of apps/web/src/lib/server/datasets.ts:

function openNodeSqliteDatabase(dbPath: string): ReadonlyDatasetDb {
	const { DatabaseSync } = require('node:sqlite') as typeof import('node:sqlite');
	type SQLInputValue = import('node:sqlite').SQLInputValue;
	const db = new DatabaseSync(dbPath, { open: true, readOnly: true });
	return {
		prepare(sql: string): PreparedStatement {
			const stmt = db.prepare(sql);
			return {
				get: (...params: unknown[]) => stmt.get(...(params as SQLInputValue[])),
				all: (...params: unknown[]) => stmt.all(...(params as SQLInputValue[]))
			};
		},
		close() {
			db.close();
		}
	};
}
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

What Node.js version introduced node:sqlite module?

💡 Result:

Node.js version 22.5.0 introduced the node:sqlite module.


Add graceful error handling for node:sqlite module availability.

The node:sqlite module was introduced in Node.js 22.5.0 and is not available in earlier versions. The pipeline failure shows ERR_UNKNOWN_BUILTIN_MODULE, indicating the environment uses an older Node.js version. The current code path will fail when this module cannot be loaded, without clear error recovery.

Add explicit error handling for module import failures and document the minimum Node.js version requirement. Consider whether the environment guarantees Node.js 22.5.0+, or implement a fallback mechanism that handles missing module gracefully.

🧰 Tools
🪛 GitHub Actions: CI

[error] 214-214: Failed to derive default start date for dataset 'alpha'. better-sqlite3 failed to load; fallback to node:sqlite failed with 'ERR_UNKNOWN_BUILTIN_MODULE' / Error: No such built-in module: node:sqlite.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/src/lib/server/datasets.ts` around lines 213 - 229, The
openNodeSqliteDatabase function currently requires 'node:sqlite' unguarded and
will crash on pre-22.5 Node versions; wrap the require('node:sqlite') call in a
try/catch inside openNodeSqliteDatabase, detect module-load errors (e.g.
ERR_UNKNOWN_BUILTIN_MODULE) and either throw a clear, actionable Error that
states the minimum Node.js version required (22.5.0+) and how to upgrade, or
return a graceful fallback (e.g. a ReadonlyDatasetDb implementation that throws
meaningful errors on use), and update any docs/comments to record the Node.js
requirement; refer to openNodeSqliteDatabase, DatabaseSync, and
PreparedStatement when making the change.

Comment on lines +251 to +261 of apps/web/src/lib/server/datasets.ts:

let db: ReadonlyDatasetDb;
try {
	db = new Database(dbPath, { readonly: true });
} catch (error) {
	if (error instanceof Error && 'code' in error && error.code === 'ERR_DLOPEN_FAILED') {
		console.warn(`better-sqlite3 failed to load for '${datasetId}', falling back to node:sqlite`);
		db = openNodeSqliteDatabase(dbPath);
	} else {
		throw error;
	}
}

⚠️ Potential issue | 🟡 Minor

Missing error handling when node:sqlite fallback also fails.

If openNodeSqliteDatabase throws (e.g., node:sqlite unavailable), the error propagates up. While callers like getDatasetDefaultStartDate have their own catch blocks, other callers of getDatasetDb may not handle this gracefully. Consider logging a more informative error when both backends fail.

💡 Suggested improvement
 	try {
 		db = new Database(dbPath, { readonly: true });
 	} catch (error) {
 		if (error instanceof Error && 'code' in error && error.code === 'ERR_DLOPEN_FAILED') {
 			console.warn(`better-sqlite3 failed to load for '${datasetId}', falling back to node:sqlite`);
-			db = openNodeSqliteDatabase(dbPath);
+			try {
+				db = openNodeSqliteDatabase(dbPath);
+			} catch (fallbackError) {
+				console.error(`node:sqlite fallback also failed for '${datasetId}':`, fallbackError);
+				throw new Error(
+					`Failed to open database for '${datasetId}': better-sqlite3 and node:sqlite both unavailable`
+				);
+			}
 		} else {
 			throw error;
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/src/lib/server/datasets.ts` around lines 251 - 261, The current
error handling in the getDatasetDb function catches failures from the
better-sqlite3 loading fallback to node:sqlite, but if openNodeSqliteDatabase
also throws an error, it is not caught or logged. To fix this, add a nested
try-catch around the call to openNodeSqliteDatabase to catch any errors it
throws, log a detailed error message indicating both backends failed to load the
database for the given datasetId, and then rethrow the error to ensure upstream
awareness.

Comment on lines +7 to +26 of apps/web/tests/lib/server/datasets.test.ts:

const betterSqlite3Factory = vi.fn((dbPath: string) => ({
	prepare(query: string) {
		return {
			get: () => {
				const result = spawnSync('sqlite3', [dbPath, query], { encoding: 'utf-8' });
				if (result.status !== 0) {
					throw new Error(result.stderr || 'sqlite3 query failed');
				}
				const minTimestamp = Number(result.stdout.trim());
				return { minTimestamp: Number.isFinite(minTimestamp) ? minTimestamp : null };
			}
		};
	},
	close() {}
}));

vi.mock('better-sqlite3', () => ({
	default: vi.fn().mockImplementation((dbPath: string) => betterSqlite3Factory(dbPath))
}));

⚠️ Potential issue | 🟠 Major

Mock structure doesn't support testing the node:sqlite fallback path.

The betterSqlite3Factory mock uses the sqlite3 CLI to execute real queries against the seeded database. However, when mockImplementationOnce throws to simulate better-sqlite3 failure, the production code falls back to node:sqlite, which:

  1. Isn't mocked in this test
  2. May not be available in the Node.js version used by CI
  3. Won't automatically use the same query execution pattern as the mock

To properly test the fallback, you need to also mock node:sqlite or use conditional test skipping based on Node.js version availability.

💡 Suggested approach: Skip test if node:sqlite unavailable
-	it('falls back to node:sqlite when better-sqlite3 fails to load', async () => {
+	it('falls back to node:sqlite when better-sqlite3 fails to load', async () => {
+		// Skip if node:sqlite is not available (requires Node 22.5+)
+		try {
+			require('node:sqlite');
+		} catch {
+			console.log('Skipping test: node:sqlite not available');
+			return;
+		}
+
 		const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'datasets-test-'));

Or use Vitest's it.skipIf:

const nodeSqliteAvailable = (() => {
	try {
		require('node:sqlite');
		return true;
	} catch {
		return false;
	}
})();

it.skipIf(!nodeSqliteAvailable)('falls back to node:sqlite when better-sqlite3 fails to load', async () => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/tests/lib/server/datasets.test.ts` around lines 7 - 26, The test
currently only mocks better-sqlite3 (betterSqlite3Factory /
vi.mock('better-sqlite3')) so the fallback path to node:sqlite is untested and
can fail in CI; either mock 'node:sqlite' to provide the same minimal API your
production code expects (e.g., open/Database or whatever exported functions your
code uses) when betterSqlite3Factory is forced to throw via
mockImplementationOnce, or guard/skip the test when node:sqlite is not present
by detecting availability (try { require('node:sqlite') } catch { /* skip */ })
and using Vitest's it.skipIf to skip the fallback test—update the test that
asserts "falls back to node:sqlite when better-sqlite3 fails to load" to use one
of these two approaches and reference betterSqlite3Factory and the mocked module
names ('better-sqlite3' and 'node:sqlite') so the CI reliably exercises or skips
the fallback.

Comment on lines +119 to +156 of apps/web/tests/lib/server/datasets.test.ts:

it('falls back to node:sqlite when better-sqlite3 fails to load', async () => {
	const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'datasets-test-'));
	const dbPath = path.join(tempDir, 'netflow.sqlite');
	const registryPath = path.join(tempDir, 'datasets.json');

	const seedResult = spawnSync(
		'sqlite3',
		[
			dbPath,
			'CREATE TABLE netflow_stats (timestamp INTEGER NOT NULL); INSERT INTO netflow_stats (timestamp) VALUES (1740823200);'
		],
		{ encoding: 'utf-8' }
	);
	expect(seedResult.status).toBe(0);

	fs.writeFileSync(
		registryPath,
		JSON.stringify([
			{
				dataset_id: 'alpha',
				label: 'Alpha Label',
				root_path: tempDir,
				db_path: dbPath
			}
		])
	);

	betterSqlite3Factory.mockImplementationOnce(() => {
		const error = new Error('Module did not self-register') as Error & { code: string };
		error.code = 'ERR_DLOPEN_FAILED';
		throw error;
	});

	vi.stubEnv('DATASETS_CONFIG_PATH', registryPath);

	const datasets = await loadDatasetsModule();
	expect(datasets.getDatasetDefaultStartDate('alpha')).toBe('2025-03-01');
});

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check Node.js version to verify node:sqlite availability (requires Node 22.5+)
node --version

# Check if node:sqlite is available
node -e "try { require('node:sqlite'); console.log('node:sqlite available'); } catch(e) { console.log('node:sqlite NOT available:', e.message); }"

Repository: flamboh/netflow-analysis

Length of output: 279


🏁 Script executed:

# Read the test file to understand the implementation
head -160 apps/web/tests/lib/server/datasets.test.ts | tail -45

Repository: flamboh/netflow-analysis

Length of output: 1385


🏁 Script executed:

# Find and read the datasets.ts file to understand the fallback logic
fd 'datasets\.ts$' --type f | grep -v test | head -1

Repository: flamboh/netflow-analysis

Length of output: 96


🏁 Script executed:

# Read the datasets.ts file to understand the fallback logic
cat -n apps/web/src/lib/datasets.ts | head -200

Repository: flamboh/netflow-analysis

Length of output: 2404


🏁 Script executed:

# Find server-side datasets file
fd 'datasets\.ts$' --type f apps/web/src/lib/server/ 2>/dev/null

Repository: flamboh/netflow-analysis

Length of output: 103


🏁 Script executed:

# Search for getFallbackStartDate function
rg 'getFallbackStartDate' --type ts

Repository: flamboh/netflow-analysis

Length of output: 282


🏁 Script executed:

# Read the server-side datasets.ts file
cat -n apps/web/src/lib/server/datasets.ts | head -250

Repository: flamboh/netflow-analysis

Length of output: 8955


🏁 Script executed:

# Continue reading to see the complete getDatasetDb implementation
cat -n apps/web/src/lib/server/datasets.ts | sed -n '250,280p'

Repository: flamboh/netflow-analysis

Length of output: 859


🏁 Script executed:

# Check the test file setup to see how mocks are configured
head -40 apps/web/tests/lib/server/datasets.test.ts

Repository: flamboh/netflow-analysis

Length of output: 1227


🏁 Script executed:

# Check if there are any environment setup or mocking issues in the test
grep -n "node:sqlite" apps/web/tests/lib/server/datasets.test.ts

Repository: flamboh/netflow-analysis

Length of output: 152


🏁 Script executed:

# Check if node:sqlite requires special handling in the test environment
grep -n "vi.mock" apps/web/tests/lib/server/datasets.test.ts

Repository: flamboh/netflow-analysis

Length of output: 105


🏁 Script executed:

# Check if the database seeding command works (recreate the test scenario)
tempDir=$(mktemp -d)
dbPath="$tempDir/test.sqlite"
sqlite3 "$dbPath" "CREATE TABLE netflow_stats (timestamp INTEGER NOT NULL); INSERT INTO netflow_stats (timestamp) VALUES (1740823200);"
echo "Seeding status: $?"

# Test if node:sqlite can read this database
node -e "
try {
  const { DatabaseSync } = require('node:sqlite');
  const db = new DatabaseSync('$dbPath', { open: true, readOnly: true });
  const row = db.prepare('SELECT MIN(timestamp) AS minTimestamp FROM netflow_stats').get();
  console.log('Query result:', row);
  db.close();
} catch (e) {
  console.log('Error:', e.message);
}
"

# Clean up
rm -rf "$tempDir"

Repository: flamboh/netflow-analysis

Length of output: 351


🏁 Script executed:

# Check if the test has any skip conditions or checks for sqlite3
grep -B5 -A5 "spawnSync" apps/web/tests/lib/server/datasets.test.ts | head -20

Repository: flamboh/netflow-analysis

Length of output: 804


🏁 Script executed:

# Check what the actual error is when running the test
grep -n "spawnSync" apps/web/tests/lib/server/datasets.test.ts

Repository: flamboh/netflow-analysis

Length of output: 267


🏁 Script executed:

# Check the exact test assertion for seedResult
sed -n '119,160p' apps/web/tests/lib/server/datasets.test.ts | grep -n "seedResult"

Repository: flamboh/netflow-analysis

Length of output: 141


Test fails because database seeding requires the sqlite3 CLI tool, which is unavailable in the test environment.

The test attempts to seed a SQLite database using spawnSync('sqlite3', ...) at line 124, but the sqlite3 command-line tool is not available in the test environment. This causes the seeding to fail with status code 127, preventing the database from being created. The test assertion at line 138 (expect(seedResult.status).toBe(0)) should fail at this point.

To fix this, either:

  • Install the sqlite3 CLI tool in the test environment, or
  • Use an alternative seeding approach that doesn't depend on external CLI tools (e.g., directly use node:sqlite or a Node.js library to create and populate the test database)

Note: node:sqlite is available (Node.js v24.3.0) and working correctly.
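The suggested in-process seeding can be illustrated as follows, shown in Python's stdlib sqlite3 for brevity (the actual test would do the equivalent with node:sqlite or another Node library):

```python
import os
import sqlite3
import tempfile

# Create and seed the database file directly, with no dependency on the
# sqlite3 CLI being installed in the test environment.
tmp_dir = tempfile.mkdtemp(prefix="datasets-test-")
db_path = os.path.join(tmp_dir, "netflow.sqlite")

conn = sqlite3.connect(db_path)
with conn:
    conn.execute("CREATE TABLE netflow_stats (timestamp INTEGER NOT NULL)")
    conn.execute("INSERT INTO netflow_stats (timestamp) VALUES (1740823200)")
conn.close()

# A reader opening the same file sees the seeded row, so the status-code
# assertion on the spawned CLI becomes unnecessary.
reader = sqlite3.connect(db_path)
(min_ts,) = reader.execute("SELECT MIN(timestamp) FROM netflow_stats").fetchone()
reader.close()
```

Seeding in-process also keeps the test hermetic: the database contents are defined entirely by the test file rather than by whatever sqlite3 binary happens to be on PATH.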

🧰 Tools
🪛 GitHub Actions: CI

[error] 155-155: Test failed: expected '2026-02-22' to be '2025-03-01' (Object.is equality).

🪛 GitHub Check: Web + Python Checks

[failure] 155-155: tests/lib/server/datasets.test.ts > dataset server helpers > falls back to node:sqlite when better-sqlite3 fails to load
AssertionError: expected '2026-02-22' to be '2025-03-01' // Object.is equality

Expected: "2025-03-01"
Received: "2026-02-22"

❯ tests/lib/server/datasets.test.ts:155:56

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/tests/lib/server/datasets.test.ts` around lines 119 - 156, The test
seeds the SQLite DB by calling spawnSync('sqlite3', ...) which fails in CI
because the sqlite3 CLI is not present; replace that external call with an
in-process DB creation using Node's builtin sqlite API (or a lightweight Node
sqlite library) to create the file at dbPath and run the SQL statements used in
the test (CREATE TABLE ...; INSERT ...), then remove the
expect(seedResult.status).toBe(0) assertion and proceed with writing
registryPath and the rest of the test; keep references to dbPath, registryPath,
loadDatasetsModule, and the betterSqlite3Factory.mockImplementationOnce stub so
the test still simulates the fallback path.

Outdated comment thread on tools/netflow-db/repair_mirrored_root_duplicates.py
flamboh force-pushed the codex/fix-stale-day-boundary branch from b7be1eb to 3e811af on March 24, 2026 at 17:44
flamboh force-pushed the codex/fix-stale-day-boundary branch from 1a783c2 to 1eed41c on March 24, 2026 at 18:24
flamboh merged commit 5ac136a into main on Mar 27, 2026
3 checks passed