Fix selector splitting inside functional pseudo-classes #184
base: master
@@ -34,6 +34,27 @@ class RuleSet
     LPAREN = '('.freeze
     RPAREN = ')'.freeze
     IMPORTANT = '!important'.freeze
+
+    # Regex that matches a comma NOT inside parentheses.
+    COMMA_OUTSIDE_PARENS = /,(?![^()]*\))/
+
+    # Action codes for the byte-scan loop:
+    # 0 = boring (no-op) — fast path for ~80% of bytes
+    # 1 = comma — potential split point
+    # 2 = open paren — increase depth
+    # 3 = close paren — decrease depth
+    # 4 = dot / hash / colon — CSS grammar guarantees next byte is an
+    # identifier start, never a delimiter; skip it
+    SELECTOR_ACTION = Array.new(256, 0)
+    SELECTOR_ACTION[0x2C] = 1 # ,
+    SELECTOR_ACTION[0x28] = 2 # (
+    SELECTOR_ACTION[0x29] = 3 # )
+    SELECTOR_ACTION[0x2E] = 4 # .
+    SELECTOR_ACTION[0x23] = 4 # #
+    SELECTOR_ACTION[0x3A] = 4 # :
+    SELECTOR_ACTION.freeze
+
+    private_constant :COMMA_OUTSIDE_PARENS, :SELECTOR_ACTION
     class Declarations
       class Value
         attr_reader :value
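As a quick illustration of what `COMMA_OUTSIDE_PARENS` matches, the sketch below copies the pattern from the hunk above into a standalone script. The selector strings are made-up inputs, not taken from the PR or its tests. The lookahead only inspects text up to the next parenthesis, so a comma that is followed by a nested `(` before any `)` is still treated as a split point.

```ruby
# Standalone sketch: the regex is copied from the diff; the inputs are hypothetical.
COMMA_OUTSIDE_PARENS = /,(?![^()]*\))/

flat   = 'a:is(b, c), d'
nested = ':is(a, :not(b), c)'

flat_parts   = flat.split(COMMA_OUTSIDE_PARENS)
nested_parts = nested.split(COMMA_OUTSIDE_PARENS)

p flat_parts    # the comma inside :is(...) is left alone
p nested_parts  # the first comma sees "(" before any ")", so the regex splits there
```

With one level of parentheses the pattern behaves as its comment describes; with nesting it splits inside the outer `:is(...)`.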
@@ -684,17 +705,61 @@ def unmatched_open_parenthesis?(declarations)
       (lparen_index = declarations.index(LPAREN)) && !declarations.index(RPAREN, lparen_index)
     end

-    #--
-    # TODO: way too simplistic
-    #++
+    # Split selector string on commas, but not commas inside parentheses
+    # (e.g. :is(rect, circle) or :where(.a, .b) should not be split).
     def parse_selectors!(selectors) # :nodoc:
-      @selectors = selectors.split(',').map do |s|
+      @selectors = split_selectors(selectors).map do |s|
         s.gsub!(/\s+/, ' ')
         s.strip!
         s
       end
     end

+    def split_selectors(selectors)
Contributor
should add this to stay fast:
Suggested change
Contributor
... actually how about that performed the same (within 1%) and looks simpler to me
Author
Good call - I've spent some more time looking at performance and persuaded Claude to try a CSS-selector-aware scan, which seems to be only marginally less performant than .split(',') on newer Rubies, especially with YJIT enabled.
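One way to check a performance claim like this locally is a small timing loop. The sketch below is an assumption-laden micro-benchmark, not the measurement used in this thread: the helper `time_it`, the sample selector, and the iteration count are all hypothetical, and no particular result is implied.

```ruby
# Hypothetical micro-benchmark; helper name, input, and iteration count are assumptions.
COMMA_OUTSIDE_PARENS = /,(?![^()]*\))/ # copied from the diff

# Runs the given block `iterations` times and returns elapsed seconds.
def time_it(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

selector = 'h1, h2.title, #main > p, a:hover' # no parentheses
n = 50_000

plain = time_it(n) { selector.split(',') }
regex = time_it(n) { selector.split(COMMA_OUTSIDE_PARENS) }

puts format('String#split(","): %.4fs', plain)
puts format('regex split:       %.4fs', regex)
```

Numbers from a loop like this vary with Ruby version, YJIT, and input shape, so it is best treated as a rough comparison only.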
+      len = selectors.bytesize
+      return [selectors] if len == 0
Comment on lines +719 to +720
Contributor
would remove that, not really a common case
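For context on what the early return changes: `String#split` returns an empty array for an empty string, whereas the guard returns a one-element array. A minimal sketch of the two behaviours (the variable names are illustrative, not from the PR):

```ruby
# Sketch of the two behaviours for an empty selector string.
empty = ''

# Mirrors the guarded path in the diff: an empty input is returned as-is in an array.
with_guard = empty.bytesize == 0 ? [empty] : empty.split(',')

# Without the guard, the fast path hands the string straight to String#split.
without_guard = empty.split(',')

p with_guard     # => [""]
p without_guard  # => []
```

Without the guard, an empty input yields the same result as the original `selectors.split(',')` call did.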
+
+      if !selectors.include?("(")
+        # Fast path: no parens — every comma is a split point.
+        # Delegates to C-level String#split.
+        return selectors.split(",")
+      end
+
+      if len < 80
+        # Medium path: short string with parens — action-table byte scan.
+        # Exploits CSS grammar: after . # : the next byte is always an
+        # identifier start, so we skip it unconditionally.
+        scan_selectors(selectors, len)
+      else
+        # Long path: C regex engine outpaces Ruby-level byte iteration.
+        selectors.split(COMMA_OUTSIDE_PARENS)
+      end
+    end
+
+    def scan_selectors(str, len)
+      results = []
+      segment_start = 0
+      depth = 0
+      pos = 0
+
+      while pos < len
+        case SELECTOR_ACTION[str.getbyte(pos)]
+        when 0 # ~80% of bytes hit this: one lookup, one comparison, done
+        when 1 # comma
Comment on lines +747 to +748
Contributor
faster and easier to read to just do
+          if depth == 0
+            results << str.byteslice(segment_start, pos - segment_start)
+            segment_start = pos + 1
+          end
+        when 2 then depth += 1 # (
+        when 3 then depth -= 1 # )
+        when 4 then pos += 1 # . # : — skip next byte
+        end
+        pos += 1
+      end
+
+      results << str.byteslice(segment_start, len - segment_start)
+    end
+
     def split_value_preserving_function_whitespace(value)
       split_value = value.gsub(RE_FUNCTIONS) do |c|
         c.gsub!(/\s+/, WHITESPACE_REPLACEMENT)
comment belongs to split_selectors
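The depth counter at the core of `scan_selectors` can be shown in isolation. The sketch below is a simplified stand-in, not the PR's implementation: the method name `split_top_level_commas` is hypothetical, it iterates characters rather than bytes, and it omits both the action table and the "skip next byte" shortcut.

```ruby
# Simplified sketch of depth-aware comma splitting; not the PR's code.
# Commas are split points only while the parenthesis depth is zero.
def split_top_level_commas(str)
  parts = []
  depth = 0
  start = 0
  str.each_char.with_index do |ch, i|
    case ch
    when '(' then depth += 1
    when ')' then depth -= 1
    when ','
      if depth.zero?
        parts << str[start...i]
        start = i + 1
      end
    end
  end
  parts << str[start..] # trailing segment after the last top-level comma
end

p split_top_level_commas('a:is(b, c), d')
p split_top_level_commas(':is(a, :not(b), c), e')
```

Because the counter tracks every open and close parenthesis, nested functional pseudo-classes stay in one piece. Like the code in the diff, this sketch does not account for commas inside quoted attribute values or escaped characters.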