strip_identifier can cause a panic with cjk identifiers by mznet · Pull Request #138 · getsentry/rust-sourcemap

mznet · 2026-01-17T07:29:32Z

Calling strip_identifier with an identifier that contains CJK characters causes a Rust panic, as shown below:

assert!(is_valid_javascript_identifier("한글"));

thread 'js_identifiers::tests::test_is_valid_javascript_identifier' (1076350) panicked at src/js_identifiers.rs:49:12:
byte index 4 is not a char boundary; it is inside '글' (bytes 3..6) of `한글`
stack backtrace:
   0: __rustc::rust_begin_unwind
             at /rustc/f8297e351a40c1439a467bbbb6879088047f50b3/library/std/src/panicking.rs:698:5
   1: core::panicking::panic_fmt
             at /rustc/f8297e351a40c1439a467bbbb6879088047f50b3/library/core/src/panicking.rs:75:14
   2: core::str::slice_error_fail_rt
   3: core::str::slice_error_fail
             at /rustc/f8297e351a40c1439a467bbbb6879088047f50b3/library/core/src/str/mod.rs:69:5
   4: core::str::traits::<impl core::slice::index::SliceIndex<str> for core::ops::range::Range<usize>>::index
             at /Users/mjet.plane/.rustup/toolchains/1.91.0-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/str/traits.rs:248:21
   5: core::str::traits::<impl core::slice::index::SliceIndex<str> for core::ops::range::RangeInclusive<usize>>::index
             at /Users/mjet.plane/.rustup/toolchains/1.91.0-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/str/traits.rs:664:33
   6: core::str::traits::<impl core::slice::index::SliceIndex<str> for core::ops::range::RangeToInclusive<usize>>::index
             at /Users/mjet.plane/.rustup/toolchains/1.91.0-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/str/traits.rs:751:24
   7: core::str::traits::<impl core::ops::index::Index<I> for str>::index
             at /Users/mjet.plane/.rustup/toolchains/1.91.0-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/str/traits.rs:63:15
   8: sourcemap::js_identifiers::strip_identifier
             at ./src/js_identifiers.rs:49:12
   9: sourcemap::js_identifiers::is_valid_javascript_identifier
             at ./src/js_identifiers.rs:54:5

Using CJK characters in identifiers is not common, but I found real-world examples where they are used as identifiers, and the panic occurred.
JavaScript identifiers are not limited to ASCII characters and can include Unicode characters.

The previous implementation stored only the start byte index of each character in end_idx while iterating, and then sliced the string using an inclusive range, &s[..=end_idx].

For example, in the string "한글", '한' occupies bytes 0–2 and '글' occupies bytes 3–5. When processing '글', i = 3 is stored as end_idx, and slicing with &s[..=3] ends the slice at byte index 4, which falls inside '글' and breaks the UTF-8 character boundary.
This results in a "byte index is not a char boundary" panic.
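To make the byte arithmetic above concrete, here is a small standalone sketch. The helper names char_byte_ranges and inclusive_slice_is_safe are hypothetical and are not part of the crate; they use only the standard library, and str::get stands in for the panicking index operator so the failure can be observed without a panic.

```rust
/// Hypothetical helper: returns each character with its byte range.
/// `end` is exclusive, so `&s[start..end]` is always a valid slice.
fn char_byte_ranges(s: &str) -> Vec<(char, usize, usize)> {
    s.char_indices()
        .map(|(start, c)| (c, start, start + c.len_utf8()))
        .collect()
}

/// Hypothetical helper: reports whether `&s[..=idx]` would succeed.
/// `str::get` returns `None` where the index operator would panic,
/// i.e. when `idx + 1` is out of bounds or not a char boundary.
fn inclusive_slice_is_safe(s: &str, idx: usize) -> bool {
    s.get(..=idx).is_some()
}
```

For "한글", char_byte_ranges reports '한' at 0..3 and '글' at 3..6. inclusive_slice_is_safe("한글", 3) is false because the inclusive range ends at byte index 4, inside '글', while index 5 (the last byte of '글') is safe.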

To fix this, the code now tracks the end position of each character instead of its start position by calculating end_idx = i + c.len_utf8(), and slices with an exclusive range, &s[..end_idx].
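The approach can be sketched roughly as follows. This is not the code from the diff: is_valid_start and is_valid_continue here are simplified stand-ins built on char::is_alphabetic and char::is_alphanumeric (the real crate presumably checks the Unicode ID_Start / ID_Continue properties), and initializing end_idx from the first character's length is an assumption of this sketch.

```rust
/// Simplified stand-in for an ID_Start check (assumption, not the crate's logic).
fn is_valid_start(c: char) -> bool {
    c == '$' || c == '_' || c.is_alphabetic()
}

/// Simplified stand-in for an ID_Continue check (assumption, not the crate's logic).
fn is_valid_continue(c: char) -> bool {
    c == '$' || c == '_' || c.is_alphanumeric()
}

/// Returns the longest identifier-like prefix of `s`, or `None` if `s`
/// is empty or does not start with a valid start character.
fn strip_identifier(s: &str) -> Option<&str> {
    let mut iter = s.char_indices();
    let (_, first) = iter.next()?;
    if !is_valid_start(first) {
        return None;
    }
    // Exclusive end of the prefix accepted so far; always a char boundary.
    let mut end_idx = first.len_utf8();
    for (i, c) in iter {
        if !is_valid_continue(c) {
            break;
        }
        // Track the END of the character, not its start.
        end_idx = i + c.len_utf8();
    }
    Some(&s[..end_idx])
}
```

Because end_idx is always the start of a character plus its encoded length, the exclusive slice can never land in the middle of a multi-byte character, whatever the script.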

This change covers not only CJK characters but also other non-ASCII Unicode identifiers.

loewenheim

Hi, thank you very much for the report and fix. Nice catch!

ETA: Can you please run cargo fmt and update the PR?

mznet · 2026-01-20T00:18:21Z

@loewenheim I applied cargo fmt to the code to improve readability.

mznet added 2 commits January 17, 2026 15:40

fix: Correct UTF-8 boundary handling in strip_identifier

b47507b

test: Replace Japanese with Chinese in CJK test

22b6ea7

Change "変数名" (Japanese) to "变量名" (Chinese) for better CJK coverage.

loewenheim approved these changes Jan 19, 2026

View reviewed changes

loewenheim enabled auto-merge (squash) January 19, 2026 14:35

Reformat the Korean identifier test assertion to improve readability

ca83ab4

by breaking it across multiple lines.

auto-merge was automatically disabled January 20, 2026 00:15
Head branch was pushed to by a user without write access

loewenheim merged commit c3c213d into getsentry:master Jan 20, 2026
8 checks passed

loewenheim merged 3 commits into getsentry:master from mznet:utf8-boundary-strip-identifier
