From 77f4eea0bba65ab59bb9781453f9e8375a88d715 Mon Sep 17 00:00:00 2001
From: Eric Huss
Date: Wed, 25 Feb 2026 08:49:39 -0800
Subject: [PATCH 1/2] Update shebang

This updates the shebang description, primarily to add a grammar for it.
It also reworks the rules so the introduction explains what this section
is talking about, and placing specific behaviors in individual rules.
---
 src/input-format.md | 32 ++++++++++++++++++++++----------
 src/whitespace.md   |  2 +-
 2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/src/input-format.md b/src/input-format.md
index 3e35cba1ee..c1daabfd0a 100644
--- a/src/input-format.md
+++ b/src/input-format.md
@@ -42,21 +42,33 @@ r[input.shebang]
 ## Shebang removal
 
 r[input.shebang.intro]
-If the remaining sequence begins with the characters `#!`, the characters up to and including the first `U+000A` (LF) are removed from the sequence.
+A *shebang* is an optional line that is typically used in Unix-like systems to specify an interpreter for executing the file.
 
-For example, the first line of the following file would be ignored:
+> [!EXAMPLE]
+>
+> ```rust,ignore
+> #!/usr/bin/env rustx
+>
+> fn main() {
+>     println!("Hello!");
+> }
+> ```
 
-
-```rust,ignore
-#!/usr/bin/env rustx
+r[input.shebang.syntax]
 
-fn main() {
-    println!("Hello!");
-}
+```grammar,lexer
+@root SHEBANG ->
+    `#!` !((WHITESPACE | LINE_COMMENT | BLOCK_COMMENT)* `[`)
+    ~LF* (LF | EOF)
 ```
 
-r[input.shebang.inner-attribute]
-As an exception, if the `#!` characters are followed (ignoring intervening [comments] or [whitespace]) by a `[` token, nothing is removed. This prevents an [inner attribute] at the start of a source file being removed.
+The shebang starts with the characters `#!`. However, if these characters are followed by `[` (ignoring any intervening [comments] or [whitespace]), the line is not considered a shebang to avoid ambiguity with an [inner attribute]. The shebang continues to and including the first `U+000A` (LF), or to EOF if there is no line ending.
+
+r[input.shebang.position]
+The shebang may appear immediately at the start of the file or after the optional [byte order mark].
+
+r[input.shebang.removal]
+The shebang is removed from the input sequence and is ignored.
 
 r[input.tokenization]
 ## Tokenization
diff --git a/src/whitespace.md b/src/whitespace.md
index 236680f74d..7e16c51d41 100644
--- a/src/whitespace.md
+++ b/src/whitespace.md
@@ -3,7 +3,7 @@ r[lex.whitespace]
 
 r[whitespace.syntax]
 ```grammar,lexer
-@root WHITESPACE ->
+WHITESPACE ->
       U+0009 // Horizontal tab, `'\t'`
     | U+000A // Line feed, `'\n'`
     | U+000B // Vertical tab

From c6f8a7f747da3665d4eaa74fd768dabdbf1bb996 Mon Sep 17 00:00:00 2001
From: Travis Cross
Date: Sun, 1 Mar 2026 07:01:56 +0000
Subject: [PATCH 2/2] Revise shebang section prose and formatting

The shebang section was rewritten to add a grammar and to restructure
the rules. Let's improve some things editorially.

In this commit, we reorder the syntax explanation so that the basic
description (start and end) comes before the exception (the `#![`
disambiguation). We replace the awkward "continues to and including"
with "extends through", convert the purpose clause into a
parenthetical, remove a stray blank line between the rule identifier
and the grammar block, and use the existing `[shebang]` link definition
so that the term links to Wikipedia.
---
 src/input-format.md | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/src/input-format.md b/src/input-format.md
index c1daabfd0a..2d7a2124c1 100644
--- a/src/input-format.md
+++ b/src/input-format.md
@@ -42,7 +42,7 @@ r[input.shebang]
 ## Shebang removal
 
 r[input.shebang.intro]
-A *shebang* is an optional line that is typically used in Unix-like systems to specify an interpreter for executing the file.
+A *[shebang]* is an optional line that is typically used in Unix-like systems to specify an interpreter for executing the file.
 
 > [!EXAMPLE]
 >
@@ -55,20 +55,19 @@ A *shebang* is an optional line that is typically used in Unix-like systems to s
 > ```
 
 r[input.shebang.syntax]
-
 ```grammar,lexer
 @root SHEBANG ->
     `#!` !((WHITESPACE | LINE_COMMENT | BLOCK_COMMENT)* `[`)
-    ~LF* (LF | EOF)
+    ~LF* (LF | EOF)
 ```
 
-The shebang starts with the characters `#!`. However, if these characters are followed by `[` (ignoring any intervening [comments] or [whitespace]), the line is not considered a shebang to avoid ambiguity with an [inner attribute]. The shebang continues to and including the first `U+000A` (LF), or to EOF if there is no line ending.
+The shebang starts with the characters `#!` and extends through the first `U+000A` (LF) or through EOF if no LF is present. If the `#!` characters are followed by `[` (ignoring any intervening [comments] or [whitespace]), the line is not considered a shebang (to avoid ambiguity with an [inner attribute]).
 
 r[input.shebang.position]
 The shebang may appear immediately at the start of the file or after the optional [byte order mark].
 
 r[input.shebang.removal]
-The shebang is removed from the input sequence (and is therefore ignored).
+The shebang is removed from the input sequence (and is therefore ignored).
 
 r[input.tokenization]
 ## Tokenization
@@ -88,5 +87,5 @@ The resulting sequence of characters is then converted into tokens as described
 [BYTE ORDER MARK]: https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
 [comments]: comments.md
 [Crates and source files]: crates-and-source-files.md
-[_shebang_]: https://en.wikipedia.org/wiki/Shebang_(Unix)
+[shebang]: https://en.wikipedia.org/wiki/Shebang_(Unix)
 [whitespace]: whitespace.md
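
Reviewer note: the `SHEBANG` grammar in this series can be read as a small procedure, and the following is a minimal sketch of that reading, not the compiler's actual implementation. The names `shebang_len`, `skip_trivia`, `skip_block_comment`, and `is_whitespace` are hypothetical. The sketch assumes the optional byte order mark has already been removed, and it assumes `WHITESPACE` is the full `Pattern_White_Space` set (the diff context shows only its first three entries).

```rust
/// Assumed WHITESPACE set: Unicode `Pattern_White_Space`.
fn is_whitespace(c: char) -> bool {
    matches!(
        c,
        '\u{0009}' | '\u{000A}' | '\u{000B}' | '\u{000C}' | '\u{000D}' | '\u{0020}'
            | '\u{0085}' | '\u{200E}' | '\u{200F}' | '\u{2028}' | '\u{2029}'
    )
}

/// Given the text just after an opening `/*`, returns the text after the
/// matching `*/` (block comments nest), or `None` if it is unterminated.
fn skip_block_comment(s: &str) -> Option<&str> {
    let bytes = s.as_bytes();
    let mut depth = 1usize;
    let mut i = 0;
    while i + 1 < bytes.len() {
        match (bytes[i], bytes[i + 1]) {
            (b'/', b'*') => {
                depth += 1;
                i += 2;
            }
            (b'*', b'/') => {
                depth -= 1;
                i += 2;
                if depth == 0 {
                    // `i` follows an ASCII byte, so it is a char boundary.
                    return Some(&s[i..]);
                }
            }
            _ => i += 1,
        }
    }
    None
}

/// Skips any run of whitespace, line comments, and block comments.
fn skip_trivia(mut s: &str) -> &str {
    loop {
        let trimmed = s.trim_start_matches(is_whitespace);
        if let Some(rest) = trimmed.strip_prefix("//") {
            // A line comment runs to the next LF, or to EOF.
            s = match rest.find('\n') {
                Some(i) => &rest[i + 1..],
                None => "",
            };
        } else if let Some(rest) = trimmed.strip_prefix("/*") {
            match skip_block_comment(rest) {
                Some(after) => s = after,
                // Unterminated comment: no `[` can follow, so stop here.
                None => return trimmed,
            }
        } else {
            return trimmed;
        }
    }
}

/// Returns the byte length of the shebang at the start of `src`
/// (including the terminating LF, if any), or `None` if there is none.
pub fn shebang_len(src: &str) -> Option<usize> {
    let tail = src.strip_prefix("#!")?;
    // `#!` followed by `[` (ignoring comments and whitespace) is treated
    // as the start of an inner attribute, not a shebang.
    if skip_trivia(tail).starts_with('[') {
        return None;
    }
    Some(match src.find('\n') {
        Some(i) => i + 1,
        None => src.len(),
    })
}
```

Under this reading, removal per `input.shebang.removal` amounts to slicing the input at the returned length, for example `&src[shebang_len(src).unwrap_or(0)..]`. Note that LF counts as whitespace in the lookahead, so a `#!` line followed by `[` on the next line is not treated as a shebang.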