The external Scanner API now requires a .tokens list to add tokens
into.
Where possible, it will still attempt to store only one token per call;
however, it may store more if the situation requires, typically when
calculating whether a given scalar is a key, or when handling indentation
tokens.
This is part of a series of API changes I'll be making, which will allow
allocations in the scanner code. This change is being made for a few
reasons.
1. Allows me to make the Scanner API nicer, as callers will only need
to pass in the underlying data being scanned, and will not be tied to
a mutable lifetime which limits them to scanning tokens one at a
time.
2. Makes the code simpler, as I no longer need to ensure the mutable
'owned' lifetime is honored throughout the call stack.
3. I'll need to allocate anyway for the indentation stack, and thus
avoiding allocation in other places where it is sensible matters less.
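A minimal sketch of a scanner that appends into a caller-provided tokens list, showing how one call usually stores a single token but stores three when a scalar turns out to be a key. The names Token, Scanner and scan_tokens and the tiny "key: value" grammar are assumptions for illustration, not the crate's actual API.

```rust
/// Hypothetical token type; a real YAML scanner has many more variants.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Key,
    Value,
    Scalar(String),
}

/// Hypothetical scanner over a borrowed buffer of "key: value" text.
pub struct Scanner<'a> {
    data: &'a str,
    offset: usize,
}

impl<'a> Scanner<'a> {
    pub fn new(data: &'a str) -> Self {
        Self { data, offset: 0 }
    }

    /// Appends zero or more tokens to the caller's `tokens` list and returns
    /// how many were added. Usually this is one, but a scalar followed by ':'
    /// is a key, so Key, Scalar and Value are all stored in the same call.
    pub fn scan_tokens(&mut self, tokens: &mut Vec<Token>) -> usize {
        let before = tokens.len();
        let data = self.data;

        // Skip leading spaces.
        let rest = &data[self.offset..];
        let trimmed = rest.trim_start_matches(' ');
        self.offset += rest.len() - trimmed.len();
        if trimmed.is_empty() {
            return 0;
        }

        // A plain scalar runs until ':' or the end of the input.
        let end = trimmed.find(':').unwrap_or(trimmed.len());
        let scalar = Token::Scalar(trimmed[..end].trim_end().to_string());

        if end < trimmed.len() {
            tokens.push(Token::Key);
            tokens.push(scalar);
            tokens.push(Token::Value);
            self.offset += end + 1; // consume the scalar and the ':'
        } else {
            tokens.push(scalar);
            self.offset += end;
        }
        tokens.len() - before
    }
}
```

Because the caller owns the list, the scanner never has to hand out a borrow tied to its own mutable lifetime.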
This module contains the beginnings of the state required to track and
store tokens which may be "found" out of sequence, notably when we first
need to parse a scalar (Token #1) then check if a value sequence follows
it (Token #2), and if so, return a key token first (Token #3), where the
correct order of tokens is:
Token #3 -> Token #1 -> Token #2
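One way to get the Token #3 -> Token #1 -> Token #2 order is to remember the queue index of a scalar that might be a key and insert the Key token there once a value indicator is seen. This is a sketch under that assumption; TokenQueue, possible_key and the method names are hypothetical.

```rust
use std::collections::VecDeque;

/// Hypothetical token type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Key,
    Value,
    Scalar(String),
}

/// Hypothetical queue for tokens that are found out of sequence.
#[derive(Debug, Default)]
pub struct TokenQueue {
    queue: VecDeque<Token>,
    /// Queue index of a scalar that might still turn out to be a key.
    possible_key: Option<usize>,
}

impl TokenQueue {
    /// Token #1: store a scalar and remember where it sits in the queue.
    pub fn push_scalar(&mut self, scalar: &str) {
        self.possible_key = Some(self.queue.len());
        self.queue.push_back(Token::Scalar(scalar.to_string()));
    }

    /// Token #2: a value indicator. If a scalar was saved as a possible key,
    /// Token #3 (Key) is inserted in front of it.
    pub fn push_value(&mut self) {
        if let Some(index) = self.possible_key.take() {
            self.queue.insert(index, Token::Key);
        }
        self.queue.push_back(Token::Value);
    }

    /// The saved scalar is definitely not a key.
    pub fn clear_possible_key(&mut self) {
        self.possible_key = None;
    }

    /// Pops the next token in the correct order. A pending possible key is
    /// held back, since a Key token may still have to go in front of it.
    pub fn pop(&mut self) -> Option<Token> {
        if self.possible_key == Some(0) {
            return None;
        }
        let token = self.queue.pop_front()?;
        // Keep the saved index pointing at the same scalar.
        if let Some(index) = self.possible_key.as_mut() {
            *index -= 1;
        }
        Some(token)
    }
}
```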
We also need to track in the scanner whether a key is even possible, e.g.
if we just parsed a value token, the next scalar _is not_ a key.
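A minimal sketch of the "is a key even possible" flag described above. KeyTracker and its methods are hypothetical, and resetting the flag on a line break is an assumption made for the example rather than something stated here.

```rust
/// Hypothetical token kinds for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Token {
    Key,
    Value,
    Scalar,
}

/// Hypothetical flag tracking whether the next scalar could be a key.
#[derive(Debug)]
pub struct KeyTracker {
    key_possible: bool,
}

impl KeyTracker {
    /// At the start of the stream a key is possible.
    pub fn new() -> Self {
        Self { key_possible: true }
    }

    /// Update the flag after a token has been produced.
    pub fn after_token(&mut self, token: Token) {
        // After a value token, the next scalar is the value, not a key.
        // Other tokens leave the flag unchanged in this sketch.
        if token == Token::Value {
            self.key_possible = false;
        }
    }

    /// Assumed here: a line break makes a new key possible again.
    pub fn after_line_break(&mut self) {
        self.key_possible = true;
    }

    pub fn key_possible(&self) -> bool {
        self.key_possible
    }
}
```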
This allows callers to decide when to convert the range into a Token
ref, which will be important when we need to save a scalar because we
need to return a Key Token first.
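To illustrate why a range is useful here: a scalar saved as a byte range holds no borrow of the buffer, so it can sit in a queue while the Key token is returned, and only becomes a borrowed &str when it is handed out. RawToken, TokenRef and to_ref are hypothetical names for this sketch.

```rust
use std::ops::Range;

/// Hypothetical saved form: a scalar recorded as a byte range into the
/// source buffer, so storing it does not borrow the buffer.
#[derive(Debug, Clone, PartialEq)]
pub enum RawToken {
    Key,
    Scalar(Range<usize>),
}

/// Hypothetical borrowed form handed to callers.
#[derive(Debug, PartialEq)]
pub enum TokenRef<'de> {
    Key,
    Scalar(&'de str),
}

impl RawToken {
    /// Convert to a borrowed token only at the point it is returned.
    pub fn to_ref<'de>(&self, data: &'de str) -> TokenRef<'de> {
        match self {
            RawToken::Key => TokenRef::Key,
            RawToken::Scalar(range) => TokenRef::Scalar(&data[range.clone()]),
        }
    }
}
```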
A struct for doing bookkeeping about where we are in the buffer:
1. How much we've read
2. How many lines we've seen
3. The current column
I'll likely add variants to advance!, as it's the primary method used to
traverse the buffer.
This will likely be passed as an extra argument down the various scan
call stacks, and care will need to be taken to ensure we're handling
line breaks correctly (because I bet we're not currently).
Tests will need to be updated to test that we're getting the stats we
expect.
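A sketch of the three counters with a line-break-aware advance, treating "\r\n", lone "\r" and lone "\n" each as one break. The Stats name and fields follow the list above, but the advance method is a hypothetical stand-in for the advance! macro, and it assumes a "\r\n" pair is never split across two calls.

```rust
/// Hypothetical bookkeeping struct for our position in the buffer.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct Stats {
    /// Bytes read so far.
    pub read: usize,
    /// Line breaks seen so far.
    pub lines: usize,
    /// Column on the current line, counted in chars.
    pub column: usize,
}

impl Stats {
    /// Advance over `consumed`, updating all three counters.
    /// Assumes a "\r\n" pair is not split across two calls.
    pub fn advance(&mut self, consumed: &str) {
        let mut chars = consumed.chars().peekable();
        while let Some(c) = chars.next() {
            match c {
                '\r' => {
                    // "\r\n" counts as a single line break.
                    if chars.peek() == Some(&'\n') {
                        chars.next();
                    }
                    self.lines += 1;
                    self.column = 0;
                }
                '\n' => {
                    self.lines += 1;
                    self.column = 0;
                }
                _ => self.column += 1,
            }
        }
        self.read += consumed.len();
    }
}
```

Counting read in bytes but column in chars keeps offsets usable for slicing while columns stay meaningful for multi-byte UTF-8 input.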
An unfortunate glitch in the compiler requires that I use a match
statement over fall-through if guards, as the borrow checker is too
restrictive and will not allow the code to compile despite it being
clearly correct.
This required me to lift the stream checks into next_token, which means
we now have a redundant check. Hopefully in the future the borrow
checker will become smarter.
This commit also refactors the stream checks to use a const identifier
shared between the call site in next_token and the function proper, and
makes miscellaneous changes to Scanner.eat_whitespace.
As we'll be using it throughout the various scanner/* modules, not just
on the scanner struct itself.
This commit also improves eat_whitespace to chomp any valid YAML
whitespace, not just newlines and spaces.
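A sketch of eat_whitespace as a free function over bytes, so it can be shared across the scanner/* modules. It assumes "valid YAML whitespace" means space and tab plus the '\n' and '\r' line breaks; the signature is hypothetical.

```rust
/// Hypothetical free-function form of eat_whitespace: returns how many
/// leading bytes of `data` are spaces, tabs or line breaks ('\n', '\r').
pub fn eat_whitespace(data: &[u8]) -> usize {
    data.iter()
        .take_while(|&&b| matches!(b, b' ' | b'\t' | b'\n' | b'\r'))
        .count()
}
```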
Split out tag scanning functions into their own module.
This commit includes two functions, scan_tag_uri and scan_tag_handle, for
processing prefixes/suffixes and handles respectively.
Note that these functions by themselves cannot properly parse either %TAG
directives or YAML node tags, but higher-level functions can use them to
correctly scan both.
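A sketch of what handle scanning involves, based on YAML's three handle forms: primary "!", secondary "!!" and named "!word!". The signature returning a byte length is an assumption; the real scan_tag_handle may differ. It also shows why a higher-level function is needed: for "!local" only the "!" handle is consumed and the suffix is left for scan_tag_uri.

```rust
/// Hypothetical sketch: returns the length in bytes of the tag handle at the
/// start of `data`: "!" (primary), "!!" (secondary) or "!word!" (named,
/// where word chars are ASCII alphanumerics and '-'). None if no leading '!'.
pub fn scan_tag_handle(data: &[u8]) -> Option<usize> {
    if data.first() != Some(&b'!') {
        return None;
    }
    // Count word characters after the leading '!'.
    let word = data[1..]
        .iter()
        .take_while(|&&b| b.is_ascii_alphanumeric() || b == b'-')
        .count();
    if data.get(1 + word) == Some(&b'!') {
        // A closing '!' makes this a secondary ("!!") or named handle.
        Some(word + 2)
    } else {
        // Only the primary handle "!"; any word chars belong to the suffix.
        Some(1)
    }
}
```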
This makes the naming more consistent: we now have isBreak! variants for
line breaks, isBlank! variants for spaces, and isWhiteSpace! variants for
both.
This commit also adds $error variants to isBlank! and isBreak! to remain
consistent with isWhiteSpace!.
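A sketch of how the three classification macros could relate, with isWhiteSpace! built from the other two. The macro names come from this commit, but the bodies are assumptions: isBlank! here also accepts tab, following YAML's definition of blank characters, and the $error variants are not reproduced.

```rust
/// Hypothetical: true for a YAML line break byte.
macro_rules! isBreak {
    ($b:expr) => {
        matches!($b, b'\n' | b'\r')
    };
}

/// Hypothetical: true for a blank byte (space, and tab as assumed here).
macro_rules! isBlank {
    ($b:expr) => {
        matches!($b, b' ' | b'\t')
    };
}

/// Hypothetical: true for either a blank or a line break.
macro_rules! isWhiteSpace {
    ($b:expr) => {
        (isBlank!($b) || isBreak!($b))
    };
}
```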
As we can no longer uphold the Iterator contract on Scanner directly,
because next_token now requires a scratch space handle, we move the
iterator impl onto a separate struct, ScanIter.
For the moment, this struct is private but this may change in the
future.
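A sketch of the split: next_token borrows a caller's scratch buffer, which Iterator cannot express, so a wrapper owns the scratch and yields owned items. Names besides Scanner, next_token and ScanIter are hypothetical, and lowercasing words into scratch stands in for a transformation that cannot borrow from the input.

```rust
/// Hypothetical scanner whose next_token needs a caller-provided scratch
/// buffer, so it cannot implement Iterator itself.
pub struct Scanner<'de> {
    data: &'de str,
    offset: usize,
}

impl<'de> Scanner<'de> {
    pub fn new(data: &'de str) -> Self {
        Self { data, offset: 0 }
    }

    /// Writes the next whitespace-separated word, lowercased, into `scratch`
    /// and returns a view of it, or None at the end of input.
    pub fn next_token<'s>(&mut self, scratch: &'s mut String) -> Option<&'s str> {
        let data = self.data;
        let rest = &data[self.offset..];
        let trimmed = rest.trim_start();
        if trimmed.is_empty() {
            self.offset = data.len();
            return None;
        }
        let start = self.offset + (rest.len() - trimmed.len());
        let len = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        self.offset = start + len;
        scratch.clear();
        scratch.extend(data[start..start + len].chars().flat_map(char::to_lowercase));
        Some(scratch.as_str())
    }
}

/// Iterator wrapper that owns the scratch space. Iterator::Item cannot
/// borrow from the iterator itself, so each token is copied out as a String.
pub struct ScanIter<'de> {
    scanner: Scanner<'de>,
    scratch: String,
}

impl<'de> ScanIter<'de> {
    pub fn new(data: &'de str) -> Self {
        Self { scanner: Scanner::new(data), scratch: String::new() }
    }
}

impl<'de> Iterator for ScanIter<'de> {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        self.scanner.next_token(&mut self.scratch).map(str::to_owned)
    }
}
```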
This commit takes the first steps towards the final API of Scanner,
wherein it returns Result<Ref>s over Result<Token>s. This change allows
the struct to access a scratch space which it can and will use when
borrowing from the underlying data is impossible, such as when
encountering escape sequences (which must be unescaped), line joining,
or data (type) transformation.
Regardless, these changes were required to correctly handle escape
sequences in tag directives.
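A sketch of the borrow-or-scratch idea, using the %XX escapes allowed in tag URIs: input without escapes is borrowed directly, while escaped input is decoded into the scratch space. The Ref variants and scan_uri are hypothetical names, not the crate's actual types.

```rust
/// Hypothetical result: borrowed from the input ('de) when possible, or
/// from the scanner's scratch space ('a) when the data had to be changed.
#[derive(Debug, PartialEq)]
pub enum Ref<'de, 'a> {
    Borrowed(&'de [u8]),
    Scratch(&'a [u8]),
}

/// Hypothetical: decodes %XX escapes in a tag URI. Returns None on a
/// malformed escape.
pub fn scan_uri<'de, 'a>(data: &'de [u8], scratch: &'a mut Vec<u8>) -> Option<Ref<'de, 'a>> {
    // Fast path: nothing to unescape, so borrow straight from the input.
    if !data.contains(&b'%') {
        return Some(Ref::Borrowed(data));
    }
    scratch.clear();
    let mut i = 0;
    while i < data.len() {
        if data[i] == b'%' {
            // "%XX": exactly two hex digits encode one byte.
            let hex = data.get(i + 1..i + 3)?;
            if !hex.iter().all(u8::is_ascii_hexdigit) {
                return None;
            }
            let digits = std::str::from_utf8(hex).ok()?;
            scratch.push(u8::from_str_radix(digits, 16).ok()?);
            i += 3;
        } else {
            scratch.push(data[i]);
            i += 1;
        }
    }
    Some(Ref::Scratch(scratch.as_slice()))
}
```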
Note that the test suite for lib/scanner is broken as of this commit; it
will be fixed in the next.