Summary
This issue proposes a comprehensive ETag strategy for JavaScriptSolidServer aligned with HTTP caching best practices. ETags are critical for efficient caching, bandwidth reduction, and preventing mid-air collisions during concurrent edits.
Difficulty: 35/100
Estimated Effort: 2-3 days
Dependencies: None
Current State Analysis
ETag Generation
File: src/storage/filesystem.js line 32
etag: `"${crypto.createHash('md5').update(stats.mtime.toISOString() + stats.size).digest('hex')}"`
Current approach: MD5 hash of mtime + size (metadata-based)
| Aspect | Current | Issue |
| --- | --- | --- |
| Algorithm | MD5 | Cryptographically weak (acceptable for ETags, but not ideal) |
| Input | mtime + size | Not content-based, can miss changes if mtime preserved |
| Type | Strong (no W/ prefix) | Claims byte-identical but isn't truly content-based |
| Caching | Synchronous crypto | Blocks event loop on every stat() call |
Conditional Request Handling
File: src/utils/conditional.js (154 lines)
✅ Well implemented:
- If-Match header for safe updates (412 on mismatch)
- If-None-Match for GET/HEAD (304 Not Modified)
- If-None-Match for PUT/POST (create-only with *)
- Wildcard (*) support
- Proper normalization (strips W/ prefix, quotes)
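To make the behaviour listed above concrete, here is a minimal sketch of how such checks could look. The function names (`normalizeEtag`, `parseEtagList`, `checkIfNoneMatch`, `checkIfMatch`) are hypothetical and are not taken from src/utils/conditional.js, which may be structured differently. Splitting the header on commas is a simplification.

```javascript
// Hypothetical helpers illustrating the listed behaviour; not the actual
// contents of src/utils/conditional.js.

// Strip an optional W/ prefix and the surrounding quotes.
function normalizeEtag(value) {
  return value.trim().replace(/^W\//, '').replace(/^"|"$/g, '');
}

// Split a header such as: W/"abc", "def" (simplified: assumes no commas inside tags).
function parseEtagList(header) {
  return header.split(',').map((tag) => normalizeEtag(tag)).filter(Boolean);
}

// GET/HEAD: returns 304 when the client's copy is current, otherwise null.
function checkIfNoneMatch(header, currentEtag) {
  if (!header || !currentEtag) return null;
  if (header.trim() === '*') return 304;
  return parseEtagList(header).includes(normalizeEtag(currentEtag)) ? 304 : null;
}

// PUT/PATCH/DELETE: returns 412 when the precondition fails, otherwise null.
function checkIfMatch(header, currentEtag) {
  if (!header) return null;
  if (!currentEtag) return 412; // nothing exists to match against
  if (header.trim() === '*') return null;
  return parseEtagList(header).includes(normalizeEtag(currentEtag)) ? null : 412;
}
```

Because normalization drops the W/ prefix, this sketch compares tags weakly in both directions. RFC 7232 asks for strong comparison on If-Match, which becomes relevant once weak ETags are actually emitted.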
Cache-Control Headers
Current usage:
| Location | Value | Purpose |
| --- | --- | --- |
| resource.js:225 | no-store | Mashlib HTML responses |
| resource.js:303 | no-store | Mashlib HTML responses |
| idp/index.js:202 | public, max-age=3600 | JWKS endpoint |
| idp/index.js:209 | public, max-age=3600 | OpenID configuration |
| idp/credentials.js:147 | no-store | Credentials endpoint |
Missing: No Cache-Control on regular resource responses.
Last-Modified Header
Status: ❌ Not implemented
mtime is available from stat() but not exposed as Last-Modified header.
Issues Identified
1. Strong ETag Mismatch
Severity: MEDIUM
Current ETags are formatted as strong ("abc123") but are generated from metadata, not content. Per RFC 7232:
A strong validator is representation metadata that changes value whenever a change occurs to the representation data that would be observable in the payload body of a 200 (OK) response to GET.
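Problem: If a file's content changes while its mtime and size stay the same (e.g., a same-length edit followed by touch -r to restore the old timestamp, or a replacement copied in with cp -p), the ETag won't change even though content changed.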
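The blind spot can be reproduced without touching the filesystem. The sketch below wraps the expression quoted from src/storage/filesystem.js in a function named `metadataEtag` (a name chosen here for illustration) and feeds it two hand-built stat-like objects that stand in for a file before and after a same-length edit with the timestamp preserved.

```javascript
const crypto = require('node:crypto');

// Same expression as the one quoted from src/storage/filesystem.js,
// wrapped in a function (the name is illustrative).
function metadataEtag(stats) {
  const hash = crypto
    .createHash('md5')
    .update(stats.mtime.toISOString() + stats.size)
    .digest('hex');
  return `"${hash}"`;
}

// Hypothetical stat results: identical mtime and size, different content.
const mtime = new Date('2024-01-01T00:00:00.000Z');
const before = { mtime, size: 5 }; // file contained "hello"
const after = { mtime, size: 5 }; // file now contains "world"

// Both calls hash the same input string, so the ETags are identical.
console.log(metadataEtag(before) === metadataEtag(after));
```

A client holding the old representation would receive 304 Not Modified here, and an If-Match guarded write would succeed against content the client has never seen.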
2. Missing Content-Based ETags for Dynamic Content
Severity: MEDIUM
Content negotiation transforms stored content:
- Turtle → JSON-LD conversion
- JSON-LD → Turtle conversion
- HTML data island extraction
These transformations produce different byte streams, but may use the same source file ETag.
3. No Last-Modified Header
Severity: LOW
Some clients prefer Last-Modified over ETags. Both should be provided per best practices.
4. Container Listing ETags
Severity: MEDIUM
Container listings are dynamically generated. Current implementation may use directory mtime, but this doesn't reflect:
- File additions/deletions
- Nested container changes
- ACL changes affecting visibility
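One possible way to cover the first and third gaps is to derive the container ETag from the set of children the requester is allowed to see. This is a sketch under assumptions: `containerEtag` and the `{ name, mtimeMs }` child shape are hypothetical, and ACL filtering is assumed to have happened before the call. Changes deeper than one level are only picked up if they alter a direct child's mtime.

```javascript
const crypto = require('node:crypto');

// children: array of { name, mtimeMs } for entries visible to the requester
// (hypothetical shape; ACL filtering is assumed to happen before this call).
function containerEtag(containerMtimeMs, children) {
  const hash = crypto.createHash('sha256');
  hash.update(String(containerMtimeMs));

  // Sort by name so directory read order does not influence the result.
  const sorted = [...children].sort((a, b) =>
    a.name < b.name ? -1 : a.name > b.name ? 1 : 0
  );
  for (const child of sorted) {
    hash.update(`\n${child.name}:${child.mtimeMs}`);
  }

  // Weak validator: the listing is generated, not stored byte-for-byte.
  return `W/"${hash.digest('base64url').slice(0, 27)}"`;
}
```

Since the visible child set feeds the hash, two users with different read permissions get different ETags for the same container, which fits a private Cache-Control policy.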
5. Synchronous Hash Calculation
Severity: LOW (Performance)
The MD5 hash is computed synchronously on every stat() call. For high-traffic servers, this could become a bottleneck.
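A small in-memory cache is one way to avoid recomputing hashes. The sketch below is an assumption about how that could look: `EtagCache` is a hypothetical class, and keying on path, mtime, and size means a cached content hash inherits the same blind spot as metadata-based ETags when a file changes without either value moving.

```javascript
// Minimal LRU cache for computed ETags (hypothetical helper).
class EtagCache {
  constructor(maxSize = 10000) {
    this.maxSize = maxSize;
    this.entries = new Map(); // Map preserves insertion order, used for LRU
  }

  // Including mtime and size in the key makes stale entries unreachable
  // once the file's metadata changes.
  static key(path, stats) {
    return `${path}|${stats.mtimeMs}|${stats.size}`;
  }

  getOrCompute(path, stats, compute) {
    const key = EtagCache.key(path, stats);
    if (this.entries.has(key)) {
      const value = this.entries.get(key);
      this.entries.delete(key); // re-insert to mark as most recently used
      this.entries.set(key, value);
      return value;
    }
    const value = compute();
    this.entries.set(key, value);
    if (this.entries.size > this.maxSize) {
      // Evict the least recently used entry (first key in the Map).
      this.entries.delete(this.entries.keys().next().value);
    }
    return value;
  }
}
```

The `compute` callback is where the expensive hash would run, so a cache hit costs one Map lookup instead of a digest.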
6. Cache-Control Strategy Missing
Severity: MEDIUM
No systematic Cache-Control headers on resource responses. This means:
- Browsers may fall back to heuristic caching and serve stale content for an unpredictable period
- Or revalidate on every request (no caching benefit)
- CDNs can't optimize caching
Web Best Practices
RFC 7232 - Conditional Requests
- Strong ETags: Byte-for-byte identical representations
- Weak ETags: Semantically equivalent (use W/ prefix)
- If-Match: For safe mutations (optimistic concurrency)
- If-None-Match: For caching (GET) or create-only (PUT)
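RFC 7232 defines two comparison functions for entity tags, and which one applies depends on the header: If-Match uses strong comparison, If-None-Match uses weak comparison. The sketch below shows both. The function names are illustrative and the inputs are assumed to be well-formed tags.

```javascript
// Split a tag into its weakness flag and its quoted opaque part.
function parseEtag(raw) {
  const weak = raw.startsWith('W/');
  return { weak, opaque: weak ? raw.slice(2) : raw };
}

// Strong comparison: both tags must be strong and identical.
function strongMatch(a, b) {
  const x = parseEtag(a);
  const y = parseEtag(b);
  return !x.weak && !y.weak && x.opaque === y.opaque;
}

// Weak comparison: opaque parts must be identical; weakness is ignored.
function weakMatch(a, b) {
  return parseEtag(a).opaque === parseEtag(b).opaque;
}

console.log(strongMatch('W/"1"', '"1"'), weakMatch('W/"1"', '"1"')); // false true
```

A consequence for this proposal: a resource that only has a weak ETag can never satisfy If-Match under strong comparison, so write paths need a clear policy for weakly validated resources.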
RFC 7234 - HTTP Caching
- Cache-Control: Primary caching directive
- ETag + Cache-Control: Work together for efficient revalidation
- Last-Modified: Fallback for clients not supporting ETags
Industry Recommendations
| Source | Recommendation |
| --- | --- |
| MDN | Use both ETag and Last-Modified; combine with Cache-Control |
| Cloudflare | Strong ETags for byte-identical; weak for semantic equivalence |
| Fastly | Content hash for strong ETags; metadata for weak |
| Google | Set explicit Cache-Control; don't rely on heuristics |
Proposed Strategy
1. ETag Generation Tiers
Tier 1: Strong ETag (content-based)
- Use for: Static files where content hash is feasible
- Algorithm: SHA-256, base64url encoded, truncated to 27 chars (162 bits)
Tier 2: Weak ETag (metadata-based)
- Use for: Large files, dynamic content, containers
- Format: W/"hash" with mtime + size + extras
Tier 3: Version ETag (for transformed content)
- Use for: Content negotiation results
- Format: W/"hash" derived from source ETag + transformation type
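The three tiers could be sketched as follows. All function names (`shortHash`, `strongEtag`, `weakEtag`, `variantEtag`) are hypothetical, as are the input separators and the transform labels. The 27-character truncation follows the Tier 1 description.

```javascript
const crypto = require('node:crypto');

// SHA-256, base64url encoded, truncated to 27 characters (162 bits).
function shortHash(input) {
  return crypto.createHash('sha256').update(input).digest('base64url').slice(0, 27);
}

// Tier 1: strong ETag from the exact bytes that will be served.
function strongEtag(content) {
  return `"${shortHash(content)}"`;
}

// Tier 2: weak ETag from metadata; `extras` is any additional
// distinguishing string, e.g. a child count (assumption).
function weakEtag(stats, extras = '') {
  return `W/"${shortHash(`${stats.mtimeMs}:${stats.size}:${extras}`)}"`;
}

// Tier 3: weak ETag for a negotiated representation, derived from the
// source ETag plus a label for the transformation applied.
function variantEtag(sourceEtag, transform) {
  return `W/"${shortHash(`${sourceEtag}|${transform}`)}"`;
}

const source = strongEtag('<#me> a <#Person> .');
console.log(variantEtag(source, 'turtle-to-jsonld') !== variantEtag(source, 'jsonld-to-turtle')); // true
```

Deriving Tier 3 from the source ETag avoids hashing the transformed output, at the cost of being weak: a change in the serializer itself would not alter the tag unless a serializer version is folded into the transform label.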
2. ETag Strategy by Resource Type
| Resource Type | ETag Strategy | Rationale |
| --- | --- | --- |
| Small files (<1MB) | Strong (content hash) | Accurate, worth the compute |
| Large files (>1MB) | Weak (metadata) | Too expensive to hash |
| Containers | Weak (mtime + child count) | Dynamic, changes frequently |
| Conneg results | Weak (source + transform) | Derived content |
| Mashlib/UI | Weak (version) | Static but frequently updated |
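The table translates into a small dispatch function. In this sketch the resource descriptor fields (`isContainer`, `isUi`, `transformed`, `size`) and the returned labels are hypothetical. The table does not say how a file of exactly 1 MB is treated, so the sketch assumes it falls on the weak side.

```javascript
const STRONG_THRESHOLD = 1048576; // bytes (1 MiB)

// resource: { isContainer, isUi, transformed, size } (hypothetical descriptor).
function chooseEtagStrategy(resource, threshold = STRONG_THRESHOLD) {
  if (resource.isUi) return 'weak-version';
  if (resource.isContainer) return 'weak-container';
  if (resource.transformed) return 'weak-variant';
  // Assumption: files at or above the threshold use metadata.
  return resource.size < threshold ? 'strong-content' : 'weak-metadata';
}

console.log(chooseEtagStrategy({ size: 2048 })); // strong-content
```

Checking `transformed` before size keeps negotiated representations weak even when the source file is small, which matches the "Conneg results" row.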
3. Cache-Control Strategy
| Profile | Cache-Control Value | Use Case |
| --- | --- | --- |
| resource | private, no-cache, must-revalidate | User-generated content |
| container | private, no-cache, must-revalidate | Container listings |
| static | public, max-age=3600, stale-while-revalidate=86400 | Mashlib, schemas |
| immutable | public, max-age=31536000, immutable | Versioned assets |
| sensitive | private, no-store | Credentials, tokens |
| discovery | public, max-age=3600 | Well-known endpoints |
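As a sketch, the profiles could live in a single lookup table so that handlers refer to a profile name rather than repeating header strings. `CACHE_PROFILES`, `cacheControlFor`, and `applyCacheProfile` are hypothetical names, and the plain `headers` object stands in for whatever response API the server uses.

```javascript
// Profile name -> Cache-Control value, copied from the table above.
const CACHE_PROFILES = new Map([
  ['resource', 'private, no-cache, must-revalidate'],
  ['container', 'private, no-cache, must-revalidate'],
  ['static', 'public, max-age=3600, stale-while-revalidate=86400'],
  ['immutable', 'public, max-age=31536000, immutable'],
  ['sensitive', 'private, no-store'],
  ['discovery', 'public, max-age=3600'],
]);

function cacheControlFor(profile) {
  const value = CACHE_PROFILES.get(profile);
  if (value === undefined) {
    // Fail loudly so a typo never results in a missing header.
    throw new Error(`Unknown cache profile: ${profile}`);
  }
  return value;
}

// `headers` is a plain object standing in for the real response headers.
function applyCacheProfile(headers, profile) {
  headers['Cache-Control'] = cacheControlFor(profile);
  return headers;
}

console.log(applyCacheProfile({}, 'resource'));
```

Note that no-cache still allows storing the response. It only forces revalidation before reuse, which is where the ETag and Last-Modified validators come in.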
4. Last-Modified Header
Add Last-Modified to all resource responses using stats.mtime.toUTCString().
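A minimal sketch of the header value and the matching If-Modified-Since check follows. The helper names are illustrative. HTTP dates have one-second resolution while mtime usually carries milliseconds, so the comparison truncates to whole seconds.

```javascript
// Format mtime as an HTTP date, e.g. "Mon, 01 Jan 2024 00:00:00 GMT".
function lastModifiedHeader(stats) {
  return stats.mtime.toUTCString();
}

// True when the resource has not changed since the client's date.
function isNotModifiedSince(ifModifiedSince, stats) {
  const since = Date.parse(ifModifiedSince);
  if (Number.isNaN(since)) return false; // unparseable header: ignore it
  const mtimeSeconds = Math.floor(stats.mtime.getTime() / 1000);
  return mtimeSeconds <= Math.floor(since / 1000);
}

const stats = { mtime: new Date('2024-01-01T00:00:00.500Z') };
console.log(lastModifiedHeader(stats)); // Mon, 01 Jan 2024 00:00:00 GMT
```

Without the truncation, a file with a fractional-second mtime would always look newer than the date the server itself sent, and the client would never receive a 304.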
5. Vary Header for Content Negotiation
When content negotiation is enabled, add:
Vary: Accept, Accept-Language
This tells caches that different Accept headers produce different responses.
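Other middleware (CORS, for example) may already have set Vary, so the value should be merged rather than overwritten. The helper below is a sketch with a hypothetical name, `appendVary`. It deduplicates case-insensitively because header field names are case-insensitive.

```javascript
// Merge new field names into an existing Vary value without duplicates.
function appendVary(existing, ...fields) {
  const seen = new Map(); // lowercased name -> first spelling encountered
  const all = (existing ? existing.split(',') : []).concat(fields);
  for (const field of all) {
    const name = field.trim();
    if (name && !seen.has(name.toLowerCase())) {
      seen.set(name.toLowerCase(), name);
    }
  }
  return [...seen.values()].join(', ');
}

console.log(appendVary('Origin', 'Accept', 'Accept-Language')); // Origin, Accept, Accept-Language
```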
Implementation Plan
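Phase 1: Foundation
- src/utils/etag.js with tiered generation functions
- src/utils/caching.js with cache profiles
Phase 2: Headers
- Last-Modified header on all resource responses
- Cache-Control profiles by resource type
- Vary header for conneg responses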
Phase 3: Content-Based ETags
Phase 4: Container ETags
Phase 5: Conneg ETags
Configuration Options
{
  "etag": {
    "algorithm": "sha256",
    "strongThreshold": 1048576,
    "cacheEtags": true,
    "cacheMaxSize": 10000
  },
  "caching": {
    "defaultProfile": "resource",
    "staticMaxAge": 3600,
    "immutableAssets": false
  }
}
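One way to consume these options is to merge user-supplied values over the defaults shown above and reject obviously invalid numbers. `resolveConfig` is a hypothetical function. The sketch reads strongThreshold as the file size in bytes below which a content hash is computed (1048576 = 1 MiB), which is an interpretation of the option name rather than something the proposal states.

```javascript
// Defaults copied from the configuration block above.
const DEFAULTS = {
  etag: { algorithm: 'sha256', strongThreshold: 1048576, cacheEtags: true, cacheMaxSize: 10000 },
  caching: { defaultProfile: 'resource', staticMaxAge: 3600, immutableAssets: false },
};

// Merge user settings over the defaults, one section at a time.
function resolveConfig(user = {}) {
  const config = {
    etag: { ...DEFAULTS.etag, ...user.etag },
    caching: { ...DEFAULTS.caching, ...user.caching },
  };
  for (const [name, value] of [
    ['etag.strongThreshold', config.etag.strongThreshold],
    ['etag.cacheMaxSize', config.etag.cacheMaxSize],
    ['caching.staticMaxAge', config.caching.staticMaxAge],
  ]) {
    if (!Number.isInteger(value) || value < 0) {
      throw new RangeError(`${name} must be a non-negative integer`);
    }
  }
  return config;
}

console.log(resolveConfig({ etag: { strongThreshold: 0 } }).etag);
```

Setting strongThreshold to 0 under this reading disables content hashing entirely, which gives operators a simple switch back to metadata-only ETags.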
Comparison Matrix
Current vs Proposed
| Aspect | Current | Proposed |
| --- | --- | --- |
| ETag algorithm | MD5 | SHA-256 |
| ETag basis | Metadata only | Content (small) / Metadata (large) |
| ETag type | Always strong | Strong or weak based on accuracy |
| Last-Modified | ❌ Missing | ✅ Always included |
| Cache-Control | ❌ Inconsistent | ✅ Profile-based |
| Vary header | ❌ Missing | ✅ For conneg |
| Container ETags | Basic mtime | Enhanced (children, membership) |
| Conneg ETags | Source ETag | Distinct per transformation |
Solid Ecosystem Comparison
| Server | ETag Strategy |
| --- | --- |
| Node Solid Server | Content hash (MD5) |
| Community Solid Server | Content hash + representation metadata |
| ESS (Inrupt) | Proprietary, content-based |
| JSS (current) | Metadata-based MD5 |
| JSS (proposed) | Tiered: content/metadata with proper typing |
Testing Plan
Unit Tests
- Strong ETag format validation
- Weak ETag format validation
- Different ETags for conneg transforms
- 304 responses for matching ETags
- 412 responses for If-Match mismatch
Integration Tests
Security Considerations
- ETag as fingerprint: ETags can be used to track users across requests. Mitigated by using private in Cache-Control.
- Timing attacks: Content-based ETags reveal if content changed. This is inherent to caching and generally acceptable.
- ETag collision: SHA-256 truncated to 162 bits (27 base64url characters) remains collision-resistant. MD5 collisions are feasible but unlikely to be exploited via ETags.
References
Related Issues