| 1 | +# AGENTS.md — HtmlUnit-CSSParser |
| 2 | + |
| 3 | +## Project Overview |
| 4 | + |
| 5 | +HtmlUnit-CSSParser is a **CSS parser for Java** that reads CSS source text and produces a DOM-style object tree. It has been the CSS parser behind [HtmlUnit](https://www.htmlunit.org/) since version 1.30. The project originated as a fork of [CSSParser 0.9.25](http://cssparser.sourceforge.net/), with the SAC (`org.w3c.css.sac`) dependency removed and a more flexible object model introduced. |
| 6 | + |
| 7 | +- **Group/Artifact:** `org.htmlunit:htmlunit-cssparser` |
| 8 | +- **License:** Apache License 2.0 |
| 9 | +- **Default branch:** `master` |
| 10 | +- **Java version:** JDK 17+ (version 5.x, current development); JDK 8+ for 4.x releases |
| 11 | +- **Build system:** Maven |
| 12 | + |
| 13 | +## Repository Structure |
| 14 | + |
| 15 | +``` |
| 16 | +htmlunit-cssparser/ |
| 17 | +├── pom.xml # Maven build configuration |
| 18 | +├── checkstyle.xml # Checkstyle rules (enforced on build) |
| 19 | +├── checkstyle_suppressions.xml # Checkstyle suppression rules |
| 20 | +├── README.md |
| 21 | +├── LICENSE # Apache 2.0 |
| 22 | +├── .github/ |
| 23 | +│ ├── workflows/ |
| 24 | +│ │ └── codeql.yml # CodeQL security scanning (Java) |
| 25 | +│ ├── dependabot.yml # Dependabot dependency updates |
| 26 | +│ └── FUNDING.yml # Sponsorship info |
| 27 | +├── src/ |
| 28 | +│ ├── main/ |
| 29 | +│ │ ├── java/org/htmlunit/cssparser/ |
| 30 | +│ │ │ ├── dom/ # CSS DOM implementation classes |
| 31 | +│ │ │ ├── parser/ # Core parser classes |
| 32 | +│ │ │ │ ├── condition/ # CSS selector conditions |
| 33 | +│ │ │ │ ├── selector/ # CSS selector model |
| 34 | +│ │ │ │ └── media/ # Media query support |
| 35 | +│ │ │ └── util/ # Utility classes |
| 36 | +│ │ └── javacc/ |
| 37 | +│ │ └── CSS3Parser.jj # JavaCC grammar file (generates the parser) |
| 38 | +│ └── test/ |
| 39 | +│ ├── java/ # JUnit 5 test classes |
| 40 | +│ └── resources/ # CSS test fixture files |
| 41 | +└── target/ # Build output (not committed) |
| 42 | +``` |
| 43 | + |
| 44 | +## Build and Test |
| 45 | + |
| 46 | +### Prerequisites |
| 47 | + |
| 48 | +- **Maven 3.6.3+** |
| 49 | +- **JDK 17+** (for current master / version 5.x) |
| 50 | + |
| 51 | +### Commands |
| 52 | + |
| 53 | +```bash |
| 54 | +# Compile (this also runs JavaCC to generate the parser from CSS3Parser.jj) |
| 55 | +mvn compile |
| 56 | + |
| 57 | +# Run all tests |
| 58 | +mvn test |
| 59 | + |
| 60 | +# Full build with checkstyle verification |
| 61 | +mvn -U clean test |
| 62 | + |
| 63 | +# Check for dependency/plugin updates |
| 64 | +mvn versions:display-plugin-updates |
| 65 | +mvn versions:display-dependency-updates |
| 66 | +``` |
| 67 | + |
| 68 | +### Generated Code |
| 69 | + |
| 70 | +The CSS parser is generated from a **JavaCC grammar file** at `src/main/javacc/CSS3Parser.jj`. During the `generate-sources` phase, the `ph-javacc-maven-plugin` generates Java source files into `target/generated-sources/javacc/org/htmlunit/cssparser/parser/javacc/`. A post-processing step using the `maven-replacer-plugin` cleans up the generated code (removes dead code patterns produced by JavaCC). |
| 71 | + |
| 72 | +**Do not manually edit files in `target/generated-sources/`** — they are regenerated on every build. If parser behavior needs to change, edit `src/main/javacc/CSS3Parser.jj`. |
| 73 | + |
| 74 | +## Architecture and Key Packages |
| 75 | + |
| 76 | +### `org.htmlunit.cssparser.parser` — Core Parser |
| 77 | + |
| 78 | +The main entry point for users. Key classes: |
| 79 | + |
| 80 | +| Class | Purpose | |
| 81 | +|---|---| |
| 82 | +| `CSSOMParser` | High-level parser that produces a DOM-style tree from CSS input. Main public API. | |
| 83 | +| `AbstractCSSParser` | Base class with shared parsing logic; `CSS3Parser` (generated) extends this. | |
| 84 | +| `InputSource` | Wraps a `Reader` to feed CSS text to the parser. Replaces the old SAC `InputSource`. | |
| 85 | +| `LexicalUnit` / `LexicalUnitImpl` | Represents CSS values (lengths, colors, functions, etc.) as a linked list of lexical tokens. | |
| 86 | +| `CSSErrorHandler` | Interface for custom error handling during parsing. Replaces the old SAC `ErrorHandler`. | |
| 87 | +| `CSSException` / `CSSParseException` | Exception types for parse errors. | |
| 88 | +| `DocumentHandler` / `HandlerBase` | Event-based (SAX-like) callback interface for streaming CSS parsing. | |
| 89 | +| `Locator` / `Locatable` | Source location tracking (line/column numbers). | |
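
To illustrate the linked-list idea behind `LexicalUnit`, here is a minimal, self-contained sketch. The `ValueToken` class and all of its methods are hypothetical names invented for this example; they are not the library's `LexicalUnit` API, and the real type carries far more information (unit types, numeric values, function parameters).

```java
/**
 * Hypothetical stand-in for a lexical unit: one token of a CSS value
 * plus a link to the token that follows it. Not the library's API.
 */
final class ValueToken {
    private final String text_;
    private ValueToken next_;

    ValueToken(final String text) {
        text_ = text;
    }

    /**
     * Links a new token after this one (replacing any existing successor)
     * and returns the new token so calls can be chained.
     */
    ValueToken append(final String text) {
        final ValueToken token = new ValueToken(text);
        next_ = token;
        return token;
    }

    /** Walks the chain from this token and joins the token texts with spaces. */
    String toCssText() {
        final StringBuilder sb = new StringBuilder();
        for (ValueToken t = this; t != null; t = t.next_) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t.text_);
        }
        return sb.toString();
    }

    /** Counts the tokens reachable from this one, including this one. */
    int length() {
        int count = 0;
        for (ValueToken t = this; t != null; t = t.next_) {
            count++;
        }
        return count;
    }
}
```

For the value in `border: 1px solid red`, creating a head token `1px` and chaining `append("solid").append("red")` gives a three-token list; calling `toCssText()` on the head reassembles `1px solid red`.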
| 90 | + |
| 91 | +### `org.htmlunit.cssparser.parser.selector` — Selector Model |
| 92 | + |
| 93 | +Represents CSS selectors as an object model: |
| 94 | + |
| 95 | +- `Selector`, `SimpleSelector` — base types |
| 96 | +- `ElementSelector` — type selectors (`h1`, `div`, `*`) |
| 97 | +- `DescendantSelector`, `ChildSelector` — combinators (` `, `>`) |
| 98 | +- `DirectAdjacentSelector`, `GeneralAdjacentSelector` — combinators (`+`, `~`) |
| 99 | +- `PseudoElementSelector` — pseudo-elements (`::before`, `::after`) |
| 100 | +- `RelativeSelector` — for `:has()` relative selectors |
| 101 | +- `SelectorList` / `SelectorListImpl` — ordered list of selectors |
| 102 | +- `SelectorSpecificity` — calculates selector specificity |
| 103 | +- `Combinator` — enum of CSS combinator types |
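
As a rough illustration of what a specificity calculation involves, the sketch below follows the counting rule from the CSS Selectors specification: ID selectors first, then class, attribute, and pseudo-class selectors, then type selectors and pseudo-elements, compared in that order. The `Specificity` class is a hypothetical name for this example and makes no claim about how `SelectorSpecificity` is implemented.

```java
/**
 * Hypothetical sketch of CSS specificity as an (ids, classes, types) triple.
 * Illustrative only; not the library's SelectorSpecificity class.
 */
final class Specificity implements Comparable<Specificity> {
    private final int ids_;      // number of ID selectors (#bar)
    private final int classes_;  // classes, attribute selectors, pseudo-classes
    private final int types_;    // type selectors and pseudo-elements

    Specificity(final int ids, final int classes, final int types) {
        ids_ = ids;
        classes_ = classes;
        types_ = types;
    }

    /** Compares component by component, most significant (ids) first. */
    @Override
    public int compareTo(final Specificity other) {
        if (ids_ != other.ids_) {
            return Integer.compare(ids_, other.ids_);
        }
        if (classes_ != other.classes_) {
            return Integer.compare(classes_, other.classes_);
        }
        return Integer.compare(types_, other.types_);
    }

    @Override
    public String toString() {
        return ids_ + "," + classes_ + "," + types_;
    }
}
```

Under this rule `#nav` counts as (1,0,0) and `ul li.active a:hover` as (0,2,3), so the ID selector wins even though the second selector has more parts. Components are compared one at a time rather than folded into a single number, so a large count in a lower component never spills into a higher one.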
| 104 | + |
| 105 | +### `org.htmlunit.cssparser.parser.condition` — Selector Conditions |
| 106 | + |
| 107 | +Conditions attached to selectors (class, id, attribute, pseudo-class matching): |
| 108 | + |
| 109 | +- `ClassCondition` (`.foo`), `IdCondition` (`#bar`) |
| 110 | +- `AttributeCondition` (`[attr=val]`), `PrefixAttributeCondition` (`[attr^=val]`), `SuffixAttributeCondition` (`[attr$=val]`), `SubstringAttributeCondition` (`[attr*=val]`), `OneOfAttributeCondition` (`[attr~=val]`), `BeginHyphenAttributeCondition` (`[attr|=val]`) |
| 111 | +- `PseudoClassCondition` (`:hover`, `:nth-child()`, etc.) |
| 112 | +- `NotPseudoClassCondition` (`:not()`), `IsPseudoClassCondition` (`:is()`), `HasPseudoClassCondition` (`:has()`), `WherePseudoClassCondition` (`:where()`) |
| 113 | +- `LangCondition` (`:lang()`) |
| 114 | + |
| 115 | +### `org.htmlunit.cssparser.parser.media` — Media Queries |
| 116 | + |
| 117 | +- `MediaQuery` — a single media query (`screen and (min-width: 768px)`) |
| 118 | +- `MediaQueryList` — a list of media queries |
| 119 | + |
| 120 | +### `org.htmlunit.cssparser.dom` — CSS DOM Implementation |
| 121 | + |
| 122 | +Implements a CSS object model (style sheets, rules, values): |
| 123 | + |
| 124 | +- `CSSStyleSheetImpl` — represents a complete stylesheet |
| 125 | +- `CSSStyleRuleImpl` — a style rule (`selector { declarations }`) |
| 126 | +- `CSSStyleDeclarationImpl` — a set of property declarations |
| 127 | +- `CSSMediaRuleImpl`, `CSSImportRuleImpl`, `CSSPageRuleImpl`, `CSSFontFaceRuleImpl`, `CSSCharsetRuleImpl`, `CSSUnknownRuleImpl` — at-rule implementations |
| 128 | +- `CSSRuleListImpl` — ordered list of rules |
| 129 | +- `CSSValueImpl` — wraps parsed CSS values |
| 130 | +- `Property` — a single CSS property with name, value, and priority |
| 131 | +- Color classes: `RGBColorImpl`, `HSLColorImpl`, `HWBColorImpl`, `LABColorImpl`, `LCHColorImpl` (plus `AbstractColor` base) |
| 132 | +- `RectImpl`, `CounterImpl` — CSS `rect()` and `counter()` value types |
| 133 | +- `MediaListImpl`, `CSSStyleSheetListImpl` — list types |
| 134 | +- `DOMExceptionImpl` — DOM exception handling |
| 135 | + |
| 136 | +### `org.htmlunit.cssparser.util` — Utilities |
| 137 | + |
| 138 | +- `ParserUtils` — string processing helpers used by the generated parser (trimming, unescaping) |
| 139 | + |
| 140 | +## Code Style and Quality |
| 141 | + |
| 142 | +### Checkstyle |
| 143 | + |
| 144 | +Checkstyle is **strictly enforced** via `checkstyle.xml` and runs during the build. Key rules: |
| 145 | + |
| 146 | +- **Line length:** 120 characters max |
| 147 | +- **Indentation:** 4 spaces (no tab characters) |
| 148 | +- **Braces:** opening brace on same line (`eol`), closing brace on its own line (`alone`) |
| 149 | +- **Naming conventions:** |
| 150 | + - Member fields: `camelCase_` (trailing underscore) |
| 151 | + - Static fields: `CamelCase_` (capital start, trailing underscore) |
| 152 | + - Constants: `UPPER_SNAKE_CASE` (exception: `log`) |
| 153 | + - Methods: `camelCase` (test methods may use underscores: `test[A-Z][a-zA-Z0-9_]+`) |
| 154 | + - Catch parameters: `e`, `ex`, `ignored`, or `expected` |
| 155 | +- **Javadoc:** Required on all public/protected methods, types, and packages. Author tag format: `@author Firstname Lastname` |
| 156 | +- **Imports:** No star imports, no unused imports, no redundant imports |
| 157 | +- **License header:** Required on every source file: |
| 158 | + ``` |
| 159 | + /* |
| 160 | + * Copyright (c) 2019-2026 Ronald Brill. |
| 161 | + * |
| 162 | + * Licensed under the Apache License, Version 2.0 ... |
| 163 | + */ |
| 164 | + ``` |
| 165 | +- **No `serialVersionUID`** fields |
| 166 | +- **No `@version`** tags |
| 167 | +- **No `System.out`/`System.err`** in production code |
| 168 | +- **Final local variables** and parameters are enforced |
| 169 | +- **No trailing whitespace**, no tab characters, no double blank lines |
| 170 | +- Single empty line after package declaration, none before it |
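
A small class written to the conventions listed above might look like the following. The class `ExampleCounter` and its members are made up for illustration, the license header is omitted for brevity, and the sketch has not been run against the project's actual `checkstyle.xml`, so treat it as an approximation of the rules rather than a verified-clean file.

```java
/**
 * Counts how often a value has been recorded.
 *
 * @author Firstname Lastname
 */
final class ExampleCounter {

    /** Constant: UPPER_SNAKE_CASE. */
    private static final int MAX_COUNT = 1000;

    /** Static (non-constant) field: capital start, trailing underscore. */
    private static int InstanceCount_;

    /** Member field: camelCase with trailing underscore. */
    private int count_;

    /**
     * Creates a new counter starting at zero.
     */
    ExampleCounter() {
        InstanceCount_++;
    }

    /**
     * Adds the given amount, capped at {@link #MAX_COUNT}.
     *
     * @param amount the amount to add
     * @return the new count
     */
    public int add(final int amount) {
        // Local variables and parameters are declared final.
        final int sum = count_ + amount;
        count_ = Math.min(sum, MAX_COUNT);
        return count_;
    }

    /**
     * @return the number of counters created so far
     */
    public static int getInstanceCount() {
        return InstanceCount_;
    }
}
```

The points worth noticing are the three distinct field spellings (`MAX_COUNT`, `InstanceCount_`, `count_`), the `final` modifier on every parameter and local variable, and Javadoc on each public member.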
| 171 | + |
| 172 | +Checkstyle suppressions (`checkstyle_suppressions.xml`): |
| 173 | +- Test files are exempt from `JavadocPackage`, `JavadocMethod`, and `LineLength` |
| 174 | +- Generated files in `target/generated-sources/javacc` are fully exempt |
| 175 | +- `CssCharStream.java` is fully exempt (special character stream handling) |
| 176 | + |
| 177 | +### Testing |
| 178 | + |
| 179 | +- **Framework:** JUnit Jupiter (the programming model introduced with JUnit 5), version 6.x |
| 180 | +- **Test dependency:** `commons-io` (test scope only) |
| 181 | +- **Test resources:** CSS fixture files in `src/test/resources/` |
| 182 | +- **Run tests:** `mvn test` (uses `maven-surefire-plugin`) |
| 183 | + |
| 184 | +## CI/CD |
| 185 | + |
| 186 | +- **CodeQL:** GitHub Actions workflow (`.github/workflows/codeql.yml`) runs security analysis on pushes/PRs to `master` and weekly (Mondays 23:34 UTC). Analyzes Java code only. |
| 187 | +- **Dependabot:** Configured via `.github/dependabot.yml` for automated dependency update PRs. |
| 188 | +- **Jenkins:** Primary CI runs on an external Jenkins server at `https://jenkins.wetator.org/job/HtmlUnit%20-%20CSS%20Parser/`. |
| 189 | + |
| 190 | +## Making Changes |
| 191 | + |
| 192 | +### Modifying Parser Behavior |
| 193 | + |
| 194 | +1. Edit the JavaCC grammar: `src/main/javacc/CSS3Parser.jj` |
| 195 | +2. Run `mvn compile` to regenerate and compile |
| 196 | +3. Add/update tests to cover the change |
| 197 | +4. Run `mvn test` to verify |
| 198 | + |
| 199 | +### Adding Support for New CSS Features |
| 200 | + |
| 201 | +New CSS features typically require changes in multiple layers: |
| 202 | + |
| 203 | +1. **Grammar** (`CSS3Parser.jj`) — add token definitions and production rules |
| 204 | +2. **Lexical units** (`LexicalUnit.java`, `LexicalUnitImpl.java`) — add new `LexicalUnitType` enum values if needed |
| 205 | +3. **Conditions** (`parser/condition/`) — for new pseudo-classes or attribute selectors |
| 206 | +4. **Selectors** (`parser/selector/`) — for new selector types or combinators |
| 207 | +5. **DOM** (`dom/`) — for new at-rule types or value types |
| 208 | +6. **Tests** — comprehensive tests for parsing, serialization, and error handling |
| 209 | + |
| 210 | +### Code Conventions for PRs |
| 211 | + |
| 212 | +- Run `mvn -U clean test` and ensure all tests pass |
| 213 | +- Checkstyle runs as part of the build; fix every violation it reports |
| 214 | +- Follow the naming conventions (especially trailing underscores on fields) |
| 215 | +- Add Javadoc to all new public/protected API |
| 216 | +- Keep the license header on all new files |
| 217 | +- Do not modify generated files in `target/` |
| 218 | + |
| 219 | +## Versioning and Releases |
| 220 | + |
| 221 | +- **Current development:** 5.0.0-SNAPSHOT (requires JDK 17+) |
| 222 | +- **Latest stable:** 4.21.0 (December 2025, JDK 8+) |
| 223 | +- **Artifacts:** Published to Maven Central via Sonatype Central Publishing |
| 224 | +- **Release process** (from README): |
| 225 | + 1. Ensure all tests pass |
| 226 | + 2. Update version in `pom.xml` and `README.md` |
| 227 | + 3. Commit, build, and deploy: `mvn -up clean deploy` |
| 228 | + 4. Publish on Maven Central Portal |
| 229 | + 5. Create GitHub release with signed JARs |
| 230 | + 6. Bump to next SNAPSHOT version |
| 231 | + |
| 232 | +## Dependencies |
| 233 | + |
| 234 | +### Runtime |
| 235 | + |
| 236 | +**None.** The library has zero runtime dependencies — it is completely self-contained. |
| 237 | + |
| 238 | +### Test Only |
| 239 | + |
| 240 | +- `org.junit.jupiter:junit-jupiter-engine` |
| 241 | +- `org.junit.platform:junit-platform-launcher` |
| 242 | +- `commons-io:commons-io` |
| 243 | + |
| 244 | +## Key Design Decisions |
| 245 | + |
| 246 | +1. **No SAC dependency:** The `org.w3c.css.sac` API (stalled since 2008) was removed. All interfaces are built-in, giving the project full control over the object model. |
| 247 | +2. **JavaCC-based parser:** The CSS grammar is defined in `CSS3Parser.jj` and compiled by JavaCC, so the tokenization and parsing rules live in a single grammar file that can be checked against the CSS specifications. |
| 248 | +3. **Event-based + DOM-based API:** The parser supports both SAX-like streaming (`DocumentHandler`) and tree-building (`CSSOMParser`) usage patterns. |
| 249 | +4. **Zero runtime dependencies:** Makes the library safe to embed anywhere without dependency conflicts. |
| 250 | + |
| 251 | +## Links |
| 252 | + |
| 253 | +- **Repository:** https://github.com/HtmlUnit/htmlunit-cssparser |
| 254 | +- **Maven Central:** https://central.sonatype.com/artifact/org.htmlunit/htmlunit-cssparser |
| 255 | +- **HtmlUnit:** https://www.htmlunit.org/ |
| 256 | +- **Developer Blog:** https://htmlunit.github.io/htmlunit-blog/ |
| 257 | +- **CI:** https://jenkins.wetator.org/job/HtmlUnit%20-%20CSS%20Parser/ |
| 258 | +- **Sponsor:** https://github.com/sponsors/rbri |
| 259 | +- **Predecessor:** http://cssparser.sourceforge.net/ |