Disambiguate between context declarations and context references #58
skaupper wants to merge 2 commits into Paebbels:dev from
Conversation
…tions

Add functions to the `TokenToBlockParser` which can be generally useful:

- `HandleNonCodeTokens` creates NonCode (i.e. whitespace and comment) blocks. The LRM allows these blocks basically everywhere. Note: delimited comments are, strictly speaking, not separators according to the LRM (§15.3)!
- `ReparseFromTokenMarker` allows iterating over a set of tokens a second time. This is useful if the block type cannot be decided based on a single token.
- `Context.StartBlock` searches ahead (without modifying tokens or adding blocks) until it can decide whether the `context` keyword is used in a context declaration or a context reference.

Additionally, some changes are made to satisfy the static type checker.
Force-pushed from 2053561 to faac227
```python
# parserState.PushState = ExpressionBlockEndedByLoopORToORDownto.stateExpression
# return
# elif token == ';':
parserState.NewToken = BoundaryToken(fromExistingToken=token)
```
Do you know why you commented out this block in the first place?
| """ | ||
| @classmethod | ||
| def stateContextKeyword(cls, parserState: TokenToBlockParser): | ||
| cls.stateWhitespace1(parserState) |
It seems like you named all your states after the token which was expected in the previous state, which I find a little counter-intuitive.
For the sake of a consistent interface I kept that naming scheme for the first state, and named all subsequent states after the token they are expecting right now.
```python
token = parserState.Token

if isinstance(token, ContextKeyword):
	parserState.NextState = cls.stateWhitespace
```
Do not create the `ReferenceStartBlock` here, so the behaviour of `stateContextKeyword` can be the same whether it is called from here or from within a `DeclarationBody`.
```python
	return

# This condition is also guaranteed. Otherwise `StartBlock.stateContextKeyword` would have thrown an error.
assert False, "Expected whitespace after keyword CONTEXT."
```
I used failing assertions for unreachable code locations. If the interpreter gets there, there is either a bug somewhere, or the assessment that this location is unreachable is wrong.
If you do not want that kind of behaviour, we can replace them with `BlockParserException`s, for example.
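To illustrate the exception-based alternative, the unreachable-state guard could be factored into a small helper. This is only a sketch: `BlockParserException` is defined locally here so the snippet runs standalone, and the `unreachable` helper name is illustrative, not part of the code base.

```python
class BlockParserException(Exception):
    """Local stand-in for the project's parser exception type."""

def unreachable(message: str) -> None:
    # An `assert False` here flags the bug during development, but
    # `python -O` strips assert statements entirely; raising an exception
    # keeps the guard active in optimized runs as well.
    raise BlockParserException(message)

try:
    unreachable("Expected whitespace after keyword CONTEXT.")
except BlockParserException as ex:
    print(f"{type(ex).__name__}: {ex}")
```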
```python
def stateLibOrContextName(cls, parserState: TokenToBlockParser):
	token = parserState.Token

	if parserState.HandleNonCodeTokens(None):
```
That's my take on handling non-code tokens without repeating the same 20 lines in every other state. If a block type is passed as the first parameter, a (multi-part) instance of that type is created before any whitespace/comment block.
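A minimal sketch of how such a helper might behave. The token and block classes below are simplified stand-ins, not the project's real hierarchy, and `MiniParser` only mimics the shape of `TokenToBlockParser`:

```python
from typing import Optional, Type

class Token:
    """Minimal token stand-in; the real project uses a richer hierarchy."""
    def __init__(self, value: str) -> None:
        self.Value = value

class SpaceToken(Token): pass
class CommentToken(Token): pass
class WordToken(Token): pass

class Block:
    def __init__(self, token: Token) -> None:
        self.Token = token

class WhitespaceBlock(Block): pass
class CommentBlock(Block): pass

class MiniParser:
    """Simplified stand-in for TokenToBlockParser."""
    def __init__(self, tokens) -> None:
        self._tokens = iter(tokens)
        self.Blocks: list = []
        self.Token: Optional[Token] = None
        self.Advance()

    def Advance(self) -> None:
        self.Token = next(self._tokens, None)

    def HandleNonCodeTokens(self, blockType: Optional[Type[Block]]) -> bool:
        """Consume a whitespace/comment token into a non-code block.

        If `blockType` is given, an instance of that block type is emitted
        before the non-code block.  Returns True when the current token was
        handled, so a calling state can simply return early."""
        token = self.Token
        if isinstance(token, (SpaceToken, CommentToken)):
            if blockType is not None:
                self.Blocks.append(blockType(token))
            kind = WhitespaceBlock if isinstance(token, SpaceToken) else CommentBlock
            self.Blocks.append(kind(token))
            self.Advance()
            return True
        return False

parser = MiniParser([WordToken("context"), SpaceToken(" "),
                     CommentToken("-- a comment"), WordToken("ctx")])
while parser.Token is not None:
    if parser.HandleNonCodeTokens(None):
        continue
    parser.Advance()
print([type(b).__name__ for b in parser.Blocks])
```

The point of the early-`return True` contract is that every state can open with `if parserState.HandleNonCodeTokens(...): return` instead of repeating the whitespace/comment handling inline.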
```python
	return

if isinstance(token, WordToken) and (token == "is"):
	parserState.ReparseFromTokenMarker(DeclarationStartBlock.stateFromStartBlock)
```
The state machine decided that the block can only represent a context declaration. `ReparseFromTokenMarker` only needs to know in which state the second parsing pass should be started. The first token passed is the `TokenMarker` itself.
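A toy model of that re-parse mechanism, using a plain Python list and an index instead of the project's linked token stream (all names here are illustrative):

```python
class MiniParser:
    """Toy model of TokenToBlockParser's re-parse support."""
    def __init__(self, tokens) -> None:
        self._tokens = tokens
        self._index = 0
        self.TokenMarker = 0          # index of the token that opened the block
        self.NextState = None

    @property
    def Token(self):
        return self._tokens[self._index]

    def ReparseFromTokenMarker(self, state) -> None:
        """Rewind to the TokenMarker and hand control to `state`.

        The first token the target state sees is the TokenMarker itself;
        from there the tokens are iterated a second time."""
        self._index = self.TokenMarker
        self.NextState = state

def stateFromStartBlock(parserState):
    # Second pass: we already know this is a declaration, so the state can
    # consume tokens immediately instead of looking ahead.
    return f"declaration starting at {parserState.Token!r}"

p = MiniParser(["context", " ", "ctx", " ", "is"])
p._index = 4                          # the lookahead has reached the "is" token
p.ReparseFromTokenMarker(stateFromStartBlock)
print(p.NextState(p))                 # declaration starting at 'context'
```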
This pull request fixes #16, among other things.
The `context` keyword is used for context declarations (which were already implemented) and context references (which were still missing). To disambiguate between these two (and to avoid having to deal with this issue at a higher level again), I implemented a lookahead mechanism. `Context.StartBlock` contains a set of states which only check whether the token stream describes a context declaration, a context reference, or neither. These states neither alter the tokens nor generate new blocks by themselves.

As soon as it is decidable, `Context.StartBlock` hands control over to either `Context.ReferenceStartBlock` or `Context.DeclarationStartBlock`, respectively.

Since the `TokenToBlockParser` only supported looking at a token a single time, I created a second parser instance, which iterates over everything from `TokenMarker` (which holds the `ContextKeyword` token) up to the current token. The method `TokenToBlockParser.ReparseFromTokenMarker` can be used to do exactly that.

I also implemented a parser method `TokenToBlockParser.HandleNonCodeTokens` which can create non-code blocks (i.e. all kinds of whitespace and comments), since the LRM allows them basically everywhere. While the LRM does not technically list delimited comments (`/* ... */`) as separators (§15.3), they effectively act as separators insofar as they separate adjacent lexical elements.

I would love to hear your thoughts about these changes and if/how you want to integrate them!
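The lookahead idea can be condensed into a tiny classifier: scan ahead from the `context` keyword without consuming or modifying anything, and decide "declaration" when an `is` keyword turns up, "reference" when a `;` or `,` turns up first. This is a deliberately stripped-down model of `Context.StartBlock`'s behaviour; the grammar handling and function name are simplified assumptions, not the project's API:

```python
def classify_context(tokens):
    """Classify a token stream that starts at the `context` keyword.

    Purely inspecting: no tokens are modified and no blocks are created,
    mirroring how the lookahead states only decide which start block
    (declaration or reference) should take over afterwards."""
    it = iter(tokens)
    if next(it, None) != "context":
        return "neither"
    for token in it:
        if token == "is":
            return "declaration"      # `context ctx is ... end context;`
        if token in (";", ","):
            return "reference"        # `context lib.ctx;`
        # identifiers, dots, whitespace, comments: keep looking ahead
    return "neither"

print(classify_context(["context", " ", "ctx", " ", "is"]))        # declaration
print(classify_context(["context", " ", "lib", ".", "ctx", ";"]))  # reference
```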