cache! allows the Scanner to state that it requires 'N' more codepoints
before it can correctly process the byte stream.
Its primary purpose is its interaction with O_EXTENDABLE, which allows
the caller to hint to the Scanner that the buffer could grow. In turn,
cache! returns an error that hints to the caller that they should extend
the byte stream before calling the Scanner again, or pass opts without
O_EXTENDABLE.
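
A minimal sketch of the interaction described above, written as a plain
function rather than the actual macro. Only the names cache! and
O_EXTENDABLE come from this message; the flag value, the CacheError type
and the behaviour when O_EXTENDABLE is absent are assumptions made for
illustration.

```rust
/// Hypothetical flag value; only the name O_EXTENDABLE is from the commit text.
pub const O_EXTENDABLE: u8 = 0b0000_0001;

/// Hypothetical error type standing in for whatever cache! really returns.
#[derive(Debug, PartialEq)]
pub enum CacheError {
    /// Hint to the caller: extend the byte stream, then call again.
    Extend,
}

/// Check that `buffer` holds at least `n` codepoints.
///
/// Returns the number of codepoints that can be relied upon, or
/// `CacheError::Extend` when more are needed and the caller said the
/// buffer could grow.
pub fn cache(buffer: &str, n: usize, opts: u8) -> Result<usize, CacheError> {
    // Count codepoints (not bytes), stopping as soon as `n` have been seen.
    let available = buffer.chars().take(n).count();

    if available == n {
        Ok(n)
    } else if (opts & O_EXTENDABLE) != 0 {
        // The buffer may still grow, so ask the caller for more input.
        Err(CacheError::Extend)
    } else {
        // Assumption: without O_EXTENDABLE the buffer is final, so report
        // what is actually there and let the Scanner decide what to do.
        Ok(available)
    }
}
```

With this shape, a caller that gets CacheError::Extend either appends to
the byte stream and retries, or retries with O_EXTENDABLE cleared to
signal that no more input is coming.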
This struct is a C-style bitflag container, which controls various
aspects of Scanner functionality.
The initial flags available are O_ZEROED, O_EXTENDABLE and O_LAZY. See
each flag's documentation for an explanation.
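
For reference, a C-style bitflag container can be sketched in plain Rust
roughly as below. The three flag names are from this message; the struct
name Flags, the bit values, the contains helper and the reading of
O_ZEROED as "no flags set" are all assumptions, not the real definitions.

```rust
use std::ops::BitOr;

/// Hypothetical container: a newtype over an integer, one bit per flag.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Flags(u32);

// Assumed values; only the names appear in the commit text.
pub const O_ZEROED: Flags = Flags(0b00);
pub const O_EXTENDABLE: Flags = Flags(0b01);
pub const O_LAZY: Flags = Flags(0b10);

impl Flags {
    /// True if every bit set in `other` is also set in `self`.
    pub const fn contains(self, other: Flags) -> bool {
        self.0 & other.0 == other.0
    }
}

impl BitOr for Flags {
    type Output = Flags;

    /// Combine two flag sets, C style: `O_EXTENDABLE | O_LAZY`.
    fn bitor(self, rhs: Flags) -> Flags {
        Flags(self.0 | rhs.0)
    }
}
```

Note that with O_ZEROED defined as zero, contains(O_ZEROED) is true for
every value, so it is only useful as a starting point or in equality
checks.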
This split allows future maintainers (i.e. me) to quickly know whether a
function handles the conversion of bytes into tokens (the scan_*
function family) or handles updating the Scanner's state (the fetch_*
function family).
Typically one might think of the call stack as:
1. a Scanner
2. fetches a token
3. by scanning the byte stream
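
The call stack above could look roughly like the sketch below. The
scan_*/fetch_* naming is from this message; the Token enum, the Scanner
fields and the document marker example are hypothetical, and the marker
check is simplified (it does not require the following whitespace).

```rust
/// Hypothetical token type, reduced to two variants for the example.
#[derive(Debug, PartialEq)]
pub enum Token {
    DocumentStart,
    DocumentEnd,
}

/// scan_* family: converts bytes into a token and touches no Scanner state.
/// Returns the token and how many bytes it covered.
pub fn scan_document_marker(input: &str) -> Option<(Token, usize)> {
    if input.starts_with("---") {
        Some((Token::DocumentStart, 3))
    } else if input.starts_with("...") {
        Some((Token::DocumentEnd, 3))
    } else {
        None
    }
}

/// Hypothetical Scanner holding only what the example needs.
pub struct Scanner {
    pub offset: usize,
    pub tokens: Vec<Token>,
}

impl Scanner {
    pub fn new() -> Self {
        Scanner { offset: 0, tokens: Vec::new() }
    }

    /// fetch_* family: calls the matching scan_* function, then updates
    /// the Scanner's state with the result.
    pub fn fetch_document_marker(&mut self, input: &str) -> bool {
        // An out-of-range offset is treated as an empty remainder.
        let rest = input.get(self.offset..).unwrap_or("");

        match scan_document_marker(rest) {
            Some((token, read)) => {
                self.offset += read;
                self.tokens.push(token);
                true
            }
            None => false,
        }
    }
}
```

Keeping scan_document_marker free of Scanner state means it can be tested
against a bare &str, while fetch_document_marker is the only place the
offset and token queue change.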
and refactor out the scanning code into scan_directive which is called
from the relevant Scanner method. This makes directive scanning more
consistent with the other scanning functions.
Split out the _massive_ tests module into smaller focused modules, one
per area, explained below:
- anchor | For anchor '&' and alias '*' node tags
- collection | For flow and block collections
- complex | For interactions between token types
- directive | For directives '%'
- document | For doc starts '---' and endings '...'
- key | For mapping keys explicit and implicit
- tag | For node type tags '!!', '!'
- whitespace | For whitespace chomping between tokens
This vastly reduces the size of lib/scanner's file, leading to notably
better performance from rustfmt and rust-analyzer.
This commit adds the 3rd of the 5 possible scalar types in YAML to the
scanner. It is compliant with the YAML spec, _except_ for its handling
of "JSON like" keys, which allow the following value token (e.g. ':')
to _not_ be followed by whitespace.
I frankly find this exception absurd, as the spec _clearly_ half-assed
this in so that they can declare that they are a "strict superset of
JSON", never mind that _a lot_ of the semantics of _every_ other context
for keys rely on a key being followed by whitespace.
I may eventually return to this and add it; I've a pretty good idea how --
we just need to keep track of the "last" token produced, as only
?'"]} characters would modify the behavior, but I'd need to
make sure I haven't missed any subtle side effects, as almost all other
key handling implicitly relies on: Key token === ": ".
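
A rough sketch of the "last token" idea, purely as an illustration of the
paragraph above and not something this commit implements. The Last enum,
its variants (one per character listed above) and is_value_indicator are
hypothetical names; treating end of input like trailing whitespace is
also an assumption.

```rust
/// Hypothetical record of what the previous token was.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Last {
    ExplicitKey,     // ?
    SingleQuoted,    // '
    DoubleQuoted,    // "
    FlowSequenceEnd, // ]
    FlowMappingEnd,  // }
    Other,
}

/// Decide whether `rest` starts with a value indicator.
///
/// The normal rule is ':' followed by whitespace. After one of the
/// tokens listed above the whitespace requirement is dropped.
pub fn is_value_indicator(rest: &str, last: Last) -> bool {
    let mut chars = rest.chars();

    if chars.next() != Some(':') {
        return false;
    }

    // Assumption: end of input counts the same as trailing whitespace.
    let followed_by_space =
        matches!(chars.next(), None | Some(' ' | '\t' | '\n' | '\r'));

    let relaxed = matches!(
        last,
        Last::ExplicitKey
            | Last::SingleQuoted
            | Last::DoubleQuoted
            | Last::FlowSequenceEnd
            | Last::FlowMappingEnd
    );

    followed_by_space || relaxed
}
```

The open question from above still applies: everything else that assumes
a Key is followed by ": " would need auditing before relying on this.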
before, the loop would incorrectly update scalar_stats _after_ reaching a
': ' terminus. This is now fixed, as I check for the cases before
reentering the word loop.
the primary driver for scanning plain YAML scalars. This implementation
tries to fit as closely as possible to the YAML spec, particularly in
its handling of (the lack of) spacing requirements inside flow contexts,
comment detection and special casing of '-', '?' and ':' as the first
character in flow contexts.
Two things that are notably missing:
1. Proper tab '\t' handling in block context indentation
2. A sane maximum whitespace limit and better handling of whitespace
storage. Rather than storing every whitespace character given, I could
instead count the whitespace separated by line breaks, and then add it
back later, such that the maximum described above would apply to total
line breaks, with the intervening whitespace stored as a u64/usize.
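
The counting idea in item 2 might look something like the sketch below.
Nothing here exists in the scanner yet; Counted, count_whitespace and the
max_breaks parameter are hypothetical, and only spaces, tabs and '\n' are
considered.

```rust
/// Hypothetical result: whitespace stored as counts instead of characters.
#[derive(Debug, PartialEq)]
pub struct Counted {
    /// Whitespace count preceding each line break, in order.
    pub runs: Vec<usize>,
    /// Whitespace count after the final line break.
    pub trailing: usize,
    /// Total bytes consumed from the input.
    pub read: usize,
}

/// Count leading whitespace in `input`, split by line breaks.
///
/// Returns None once more than `max_breaks` line breaks are seen, so the
/// limit bounds the size of `runs` no matter how much whitespace sits
/// between the breaks.
pub fn count_whitespace(input: &str, max_breaks: usize) -> Option<Counted> {
    let mut runs = Vec::new();
    let mut current = 0usize;
    let mut read = 0usize;

    for c in input.chars() {
        match c {
            ' ' | '\t' => current += 1,
            '\n' => {
                if runs.len() == max_breaks {
                    return None;
                }
                runs.push(current);
                current = 0;
            }
            // First non-whitespace character ends the run.
            _ => break,
        }
        read += c.len_utf8();
    }

    Some(Counted { runs, trailing: current, read })
}
```

Storage then grows with the number of line breaks rather than the number
of whitespace characters, and the whitespace can be rebuilt later from
the counts if it turns out to be needed.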