Commit Graph

305 Commits

Author SHA1 Message Date
Paul Stemmet 0663bebd0c scanner/error: add variant Extend
This variant suggests to the caller that they should extend the byte
stream before calling the Scanner again.
2021-09-09 20:29:29 +01:00
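A rough illustration of how a caller might react to the Extend variant from 0663bebd0c. Only the variant name Extend comes from the commit; ScanError, Malformed, scan_line and read_first_line are hypothetical names, and the toy "scanner" below just looks for a line break.

```rust
/// Hypothetical, trimmed-down error type. Only `Extend` is named in the commit.
#[derive(Debug, PartialEq)]
enum ScanError {
    /// Not enough bytes to finish the token: append more input and call again.
    Extend,
    /// Hypothetical stand-in for any non-recoverable error.
    Malformed,
}

/// Toy scan function (an assumption, not the real Scanner): returns the first
/// line of `buffer`, or asks for more input if no line break has arrived yet.
fn scan_line(buffer: &str) -> Result<&str, ScanError> {
    match buffer.find('\n') {
        Some(end) if buffer[..end].contains('\0') => Err(ScanError::Malformed),
        Some(end) => Ok(&buffer[..end]),
        None => Err(ScanError::Extend),
    }
}

/// Caller side: keep extending the buffer while the scanner asks for more.
fn read_first_line(chunks: &[&str]) -> Result<String, ScanError> {
    let mut buffer = String::new();
    let mut chunks = chunks.iter();
    loop {
        match scan_line(&buffer) {
            Ok(line) => return Ok(line.to_string()),
            // Recoverable: feed the next chunk, or give up if none is left.
            Err(ScanError::Extend) => match chunks.next() {
                Some(chunk) => buffer.push_str(chunk),
                None => return Err(ScanError::Extend),
            },
            // Anything else is propagated unchanged.
            Err(other) => return Err(other),
        }
    }
}
```

The point of a dedicated variant is that the caller can tell "try again with more bytes" apart from a genuine syntax error without inspecting error messages.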
Paul Stemmet 0b023bd062 scanner/flag: add Flags for Scanner control
This struct is a C style bitflag container, which controls various
aspects of Scanner functionality.

The initial flags available are O_ZEROED, O_EXTENDABLE and O_LAZY. Read
each's documentation for an explanation.
2021-09-09 20:29:29 +01:00
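A minimal sketch of what a C style bitflag container such as the one in 0b023bd062 could look like. The commit uses the bitflags crate; this version is hand rolled so it has no dependencies. The flag names come from the commit message, but the bit positions and the helper methods are assumptions, and the meaning of each flag is deliberately left out because the log does not state it.

```rust
/// Hand-rolled stand-in for a `bitflags`-generated type.
/// Bit positions below are assumed; the commit does not list them.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Flags(u32);

const O_ZEROED: Flags = Flags(0b001);
const O_EXTENDABLE: Flags = Flags(0b010);
const O_LAZY: Flags = Flags(0b100);

impl Flags {
    /// No flags set.
    const fn empty() -> Self {
        Flags(0)
    }

    /// True if every bit set in `other` is also set in `self`.
    const fn contains(self, other: Flags) -> bool {
        (self.0 & other.0) == other.0
    }
}

impl std::ops::BitOr for Flags {
    type Output = Flags;

    /// Combine two flag sets, as in `O_ZEROED | O_LAZY`.
    fn bitor(self, rhs: Flags) -> Flags {
        Flags(self.0 | rhs.0)
    }
}
```

A caller would combine flags with `|` and the Scanner would test them with `contains`.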
Paul Stemmet 6147424c30 Cargo: add dependencies.bitflags = 1 2021-09-09 20:29:29 +01:00
Paul Stemmet c1aeb0d3f0 Cargo: dependencies.anyhow -> dev-dependencies.anyhow
anyhow is used for testing, not in library code.
2021-09-07 18:32:26 +01:00
Paul Stemmet da15105e1d lib: prune dead module reader
The code here was from earlier experiments and is not relevant or useful
anymore.
2021-09-07 18:32:26 +01:00
Paul Stemmet 82a6e70d8b lib/scanner: prune dead documentation 2021-09-07 18:18:53 +01:00
Paul Stemmet c24f9ef286 lib/scanner: move Scanner.eat_whitespace out of fetch_* methods
put with the other various "helper" functions
2021-09-07 18:18:53 +01:00
Paul Stemmet 2afc2b2606 lib/scanner: rename Scanner token retrieval methods to fetch_*
This split allows future maintainers (i.e: me) to quickly know whether a
function handles the conversion of bytes into tokens -- scan_* function
family -- or handles updating the Scanner's state -- the fetch_*
function family.

Typically one might think of the call stack as:

1. a Scanner
2. fetches a token
3. by scanning the byte stream
2021-09-07 18:18:53 +01:00
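The fetch_* / scan_* split described in 2afc2b2606 might look roughly like this. The Token and Scanner shapes, the fields, and the simplified anchor grammar (everything after '&' up to whitespace) are assumptions made for illustration; only the naming convention comes from the commit.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Anchor(String),
}

/// scan_*: converts bytes into a token and touches no Scanner state.
/// Returns the token and the number of bytes it consumed.
fn scan_anchor(input: &str) -> Option<(Token, usize)> {
    let rest = input.strip_prefix('&')?;
    let len = rest
        .find(|c: char| c.is_whitespace())
        .unwrap_or(rest.len());
    if len == 0 {
        return None; // a bare '&' is not an anchor
    }
    // +1 accounts for the '&' indicator itself.
    Some((Token::Anchor(rest[..len].to_string()), len + 1))
}

/// Hypothetical, minimal Scanner state.
struct Scanner {
    offset: usize,
    tokens: Vec<Token>,
}

impl Scanner {
    /// fetch_*: updates the Scanner's state, delegating the byte work
    /// to the matching scan_* function.
    fn fetch_anchor(&mut self, input: &str) -> bool {
        match scan_anchor(&input[self.offset..]) {
            Some((token, consumed)) => {
                self.offset += consumed;
                self.tokens.push(token);
                true
            }
            None => false,
        }
    }
}
```

Keeping scan_anchor free of state means it can be unit tested on a plain string, while fetch_anchor owns the bookkeeping.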
Paul Stemmet 0f76f9bb08 lib/scanner: move test code into scanner/tests
ScanIter was never supposed to be used outside of tests
2021-09-07 18:18:53 +01:00
Paul Stemmet 633e461f4a scanner/anchor: refactor anchor scanning into its own module
and add scan_anchor which the relevant Scanner method calls to scan the
anchor, bringing it more in line with other scan_* functions
2021-09-07 18:18:53 +01:00
Paul Stemmet 829f5c0e81 lib/scanner: merge crate:: and self:: use statements
Not sure why rustfmt decided to do imports that way, but I prefer a
single block.
2021-09-07 18:18:53 +01:00
Paul Stemmet 4bc2eb5c9f scanner/directive: move directive scanning to a separate module
and refactor out the scanning code into scan_directive which is called
from the relevant Scanner method. This makes directive scanning more
consistent with the other scanning functions
2021-09-07 18:18:53 +01:00
Paul Stemmet 842ed7cacb scanner/stats: move MStats into its own module
and update the various use statements, plus better documentation on the
fields / methods
2021-09-07 18:18:53 +01:00
Paul Stemmet 1ed9d45344 lib/scanner: use const indicators over byte literals 2021-09-07 18:18:53 +01:00
Paul Stemmet 0b343ffa72 lib/scanner: refactor tests
Split out the _massive_ tests module into smaller focused modules, one
per area, explained below:

- anchor      | For anchor '&' and alias '*' node tags
- collection  | For flow and block collections
- complex     | For interactions between token types
- directive   | For directives '%'
- document    | For doc starts '---' and endings '...'
- key         | For mapping keys explicit and implicit
- tag         | For node type tags '!!', '!'
- whitespace  | For whitespace chomping between tokens

This vastly reduces the size of lib/scanner's file, leading to notably
better performance by rustfmt and rust-analyzer
2021-09-07 18:18:53 +01:00
Paul Stemmet 369f49a248 lib/scanner: unit tests for block scalar token streams 2021-09-04 17:34:57 +01:00
Paul Stemmet 1f144caf76 lib/scanner: add catch all error, documentation 2021-09-04 17:34:57 +01:00
Paul Stemmet b26301cbd0 lib/scanner: add support for block scalars 2021-09-04 17:34:57 +01:00
Paul Stemmet 0f42818f4b scanner/error: add variant UnknownDelimiter
this is a catch all error for if we've exhausted the possible token
delimiters.
2021-09-04 17:34:57 +01:00
Paul Stemmet 8f4cab4f5b scalar/block: add unit test for header comments 2021-09-04 17:34:57 +01:00
Paul Stemmet 87af3bc96b scalar/block: fix skip_blanks comment handling 2021-09-04 17:34:57 +01:00
Paul Stemmet 8505569d3e scalar/block: clippy lints 2021-09-04 17:34:57 +01:00
Paul Stemmet 645007938f scalar/block: code reorganization 2021-09-04 17:34:57 +01:00
Paul Stemmet 4ec311d684 scalar/block: documentation 2021-09-04 17:34:57 +01:00
Paul Stemmet 0ffccab6c3 scalar/block: add unit tests for scan_block_scalar 2021-09-04 17:34:57 +01:00
Paul Stemmet bcf1e405ec scalar/block: add scan_block_scalar 2021-09-04 17:34:57 +01:00
Paul Stemmet dba9212224 scanner/macros: add widthOf!
for determining the length of a UTF-8 code point. Uses the bit
distribution of UTF-8 to determine the code point length
2021-09-04 17:34:57 +01:00
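The bit distribution that dba9212224 refers to is the leading byte pattern of UTF-8. A sketch of the idea, written as a plain function rather than the widthOf! macro; the name width_of and the choice to return 0 for bytes that cannot start a code point are assumptions.

```rust
/// Length in bytes of the UTF-8 sequence that starts with `lead`.
/// Returns 0 for continuation bytes and other invalid leading bytes
/// (that convention is an assumption of this sketch).
fn width_of(lead: u8) -> usize {
    if lead & 0x80 == 0x00 {
        1 // 0xxxxxxx: ASCII
    } else if lead & 0xE0 == 0xC0 {
        2 // 110xxxxx
    } else if lead & 0xF0 == 0xE0 {
        3 // 1110xxxx
    } else if lead & 0xF8 == 0xF0 {
        4 // 11110xxx
    } else {
        0 // 10xxxxxx (continuation) or 11111xxx (never valid)
    }
}
```

Only the first byte needs to be inspected, which is what makes this cheap enough to use in a scanner's inner loop.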
Paul Stemmet 3318f8762a scanner/macros: add isBreakZ!
wrapper around 'isBreak! || check!(buffer => [])'
2021-09-04 17:34:57 +01:00
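As a function, the isBreakZ! wrapper from 3318f8762a amounts to "line break, or nothing left in the buffer". In this sketch is_break only recognises '\n' and '\r'; which characters the real isBreak! accepts is not stated in the log, so treat that set as an assumption.

```rust
/// Assumed line break check: only LF and CR are recognised here.
fn is_break(buffer: &[u8]) -> bool {
    matches!(buffer.first().copied(), Some(b'\n') | Some(b'\r'))
}

/// Function form of `isBreak! || check!(buffer => [])`:
/// true at a line break or when the buffer is exhausted.
fn is_break_z(buffer: &[u8]) -> bool {
    is_break(buffer) || buffer.is_empty()
}
```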
Paul Stemmet 1b517b518f scanner/error: add InvalidBlockScalar, InvalidTab variants 2021-09-04 17:34:57 +01:00
Paul Stemmet a0d71ef644 docs/plain-scalar-indent: commit notes 2021-08-15 09:03:22 +00:00
Paul Stemmet 4eced7d9f4 lib/scanner: add complex test for plain scalars
for peace of mind
2021-08-14 19:59:07 +01:00
Paul Stemmet 4189d3db96 lib/scanner: add test for YAML indicators in plain scalar 2021-08-14 19:59:07 +01:00
Paul Stemmet 76d9ec5561 scalar/plain: fix indentation level to account for the 0'th level
and add the trait implementations to Indent for <usize> <op> <Indent>
comparisons
2021-08-14 19:59:07 +01:00
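One way the `<usize> <op> <Indent>` comparisons from 76d9ec5561 could be wired up. The representation of Indent as an Option<usize>, with None standing for "no indentation level yet", is purely an assumption of this sketch; the commit only says that the trait implementations exist.

```rust
use std::cmp::Ordering;

/// Hypothetical indent wrapper: `None` means no level has been entered yet,
/// so that column 0 is still a real level that compares greater than it.
#[derive(Clone, Copy, Debug, PartialEq, PartialOrd)]
struct Indent(Option<usize>);

/// Allows `column == indent` with a bare usize on the left.
impl PartialEq<Indent> for usize {
    fn eq(&self, other: &Indent) -> bool {
        Some(*self) == other.0
    }
}

/// Allows `column < indent`, `column > indent`, and so on.
/// `Option` orders `None` below every `Some(_)`.
impl PartialOrd<Indent> for usize {
    fn partial_cmp(&self, other: &Indent) -> Option<Ordering> {
        Some(*self).partial_cmp(&other.0)
    }
}
```

With these impls, a check such as `column > indent` holds for column 0 when no level has been recorded, which is one plausible reading of "account for the 0'th level".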
Paul Stemmet 338db9ce42 lib/scanner: fix is_plain_scalar to block unsafe plain chars
Before, the guard would fail and implicitly fall through to the catchall
statement, allowing illegal characters
2021-08-14 19:59:07 +01:00
Paul Stemmet 28a7ce9191 lib/scanner: clippy lints 2021-08-14 19:59:07 +01:00
Paul Stemmet 048550a7b1 lib/scanner: add unit tests for plain scalar token sequences 2021-08-14 19:59:07 +01:00
Paul Stemmet 5d8f78be25 lib/scanner: add support for plain scalars
This commit adds the 3rd of the 5 possible scalar types in YAML to the
scanner. It is compliant with the YAML spec, _except_ for its handling
of "JSON like" keys, which allow for the following value token (e.g ':')
to _not_ have a whitespace following it.

I frankly find this exception absurd, as the spec _clearly_ half assed
this in so that they can declare that they are a "strict super set of
JSON", nevermind that _a lot_ of the semantics of _every_ other context
for keys rely on a key being followed by whitespace.

I may eventually return to this and add it; I've a pretty good idea how --
we just need to keep track of the "last" token produced, as only
?'"]} characters would modify the behavior, but I'd need to
make sure I haven't missed any subtle side effects, as almost all other
key handling implicitly relies on: Key token === ": ".
2021-08-14 19:59:07 +01:00
Paul Stemmet 88ac017647 scalar/plain: fix handling of non EOF trailing whitespace
Before, the loop would incorrectly update scalar_stats _after_ reaching a
': ' terminus. This is now fixed, as I check for the cases before
reentering the word loop.
2021-08-14 19:59:07 +01:00
Paul Stemmet b3e86dea9c scalar/plain: add unit tests for scan_plain_scalar 2021-08-14 19:59:07 +01:00
Paul Stemmet cddc1dae09 scalar/plain: add scan_plain_scalar
the primary driver for scanning plain YAML scalars. This implementation
tries to fit as closely as possible to the YAML spec, particularly in
its handling of (the lack of) spacing requirements inside flow contexts,
comment detection and special casing of - ? : as first character in flow
contexts.

Two things that are notably missing:

1. Proper tab '\t' handling in block context indentation
2. A sane maximum whitespace limit && better handling of whitespace
   storage. Rather than storing every whitespace given, I could instead
   count the whitespace separated by line breaks, and then add it back
   later, such that the maximum described above would apply to total
   line breaks, with the intervening whitespace stored as a u64/usize
2021-08-14 19:59:07 +01:00
Paul Stemmet 2aae6760f5 scalar/flow: use isDocumentIndicator! over longhand 2021-08-14 19:59:07 +01:00
Paul Stemmet 0fcc614771 scanner/macros: add isDocumentIndicator!
short hand for checking '--- ' or '... ' sequences
2021-08-14 19:59:07 +01:00
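A function sketch of the check that isDocumentIndicator! (0fcc614771) abbreviates. The commit only mentions the '--- ' and '... ' sequences; requiring column 0 and accepting a tab, a line break or end of input after the marker are assumptions of this sketch.

```rust
/// True if `buffer` starts with a document marker ("---" or "...")
/// that is followed by whitespace, a line break, or end of input.
/// The `column == 0` requirement is an assumption of this sketch.
fn is_document_indicator(buffer: &[u8], column: usize) -> bool {
    let marker = buffer.starts_with(b"---") || buffer.starts_with(b"...");
    // Look at the byte right after the three marker characters.
    let terminated = matches!(
        buffer.get(3).copied(),
        None | Some(b' ' | b'\t' | b'\r' | b'\n')
    );
    column == 0 && marker && terminated
}
```

The terminator check matters because a plain scalar such as `---a` begins with the same three bytes.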
Paul Stemmet 7d600cd29e scanner/error: add variant InvalidPlainScalar 2021-08-14 19:59:07 +01:00
Paul Stemmet ce7acbb754 lib/scanner: clippy lints 2021-08-08 10:59:11 +01:00
Paul Stemmet 71266f1530 lib/scanner: add tests for explicit key cases 2021-08-08 10:59:11 +01:00
Paul Stemmet 8558dada84 lib/scanner: add explicit key support to Scanner 2021-08-08 10:59:11 +01:00
Paul Stemmet 4c61af7eb9 scanner/error: add variant InvalidKey
for catching cases where a key was given but not valid, typically
involving explicit keys ('?')
2021-08-08 10:59:11 +01:00
Paul Stemmet 76be7001bb lib/scanner: add test for zero indented sequence decrement 2021-08-08 10:59:11 +01:00
Paul Stemmet 5d0572d02d lib/scanner: further fixes to zero indented sequence handling
While the previous commit did add support for _adding_ zero indented
sequences to the token stream, it unfortunately relied on the indent
stack flush that happens once reaching end of stream to push the stored
BlockEnd tokens.

This commit adds better support for removing zero indented sequences
from the stack once finished.

The heuristic used here is:

A zero_indented BlockSequence starts when:

- The top stored indent is for a BlockMapping
- A BlockEntry occupies the same indentation level

And terminates when:

- The top indent stored is a BlockSequence & is tagged as zero indented
- A BlockEntry _does not_ occupy the same indentation level
2021-08-08 10:59:11 +01:00
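The start and termination rules listed in 5d0572d02d can be written down as two predicates. Kind, IndentEntry and both function names are hypothetical; the conditions themselves follow the bullet points in the commit message.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    BlockMapping,
    BlockSequence,
}

/// Hypothetical entry on the Scanner's indent stack.
#[derive(Clone, Copy, Debug, PartialEq)]
struct IndentEntry {
    column: usize,
    kind: Kind,
    zero_indented: bool,
}

/// Starts when the top indent is a BlockMapping and a BlockEntry
/// occupies the same indentation level.
fn starts_zero_indented(top: &IndentEntry, is_block_entry: bool, column: usize) -> bool {
    top.kind == Kind::BlockMapping && is_block_entry && column == top.column
}

/// Terminates when the top indent is a BlockSequence tagged as zero
/// indented and a BlockEntry does not occupy the same indentation level.
fn ends_zero_indented(top: &IndentEntry, is_block_entry: bool, column: usize) -> bool {
    top.kind == Kind::BlockSequence
        && top.zero_indented
        && !(is_block_entry && column == top.column)
}
```

In the `key:` / `- "one"` / `- "two"` case, the first predicate fires on the first `-` at column 0 and the second fires on whatever token follows the last entry.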
Paul Stemmet 18d6430cc2 lib/scanner: produce token for zero indentation block sequence
This fixes the edge case YAML allows where a sequence may be zero
indented, but still start a sequence.

E.g using the following YAML:

key:
- "one"
- "two"

The following tokens would have been produced (before this commit):

StreamStart
BlockMappingStart
Key
Scalar('key')
Value
BlockEntry
Scalar('one')
BlockEntry
Scalar('two')
BlockEnd
StreamEnd

Note the lack of any indication that the values are in a sequence.
Post commit, the following is produced:

StreamStart
BlockMappingStart
Key
Scalar('key')
Value
BlockSequenceStart  <--
BlockEntry
Scalar('one')
BlockEntry
Scalar('two')
BlockEnd <--
BlockEnd
StreamEnd
2021-08-08 10:59:11 +01:00