232 commits

Author SHA1 Message Date
Paul Stemmet 4ec311d684 scalar/block: documentation 2021-09-04 17:34:57 +01:00
Paul Stemmet 0ffccab6c3 scalar/block: add unit tests for scan_block_scalar 2021-09-04 17:34:57 +01:00
Paul Stemmet bcf1e405ec scalar/block: add scan_block_scalar 2021-09-04 17:34:57 +01:00
Paul Stemmet dba9212224 scanner/macros: add widthOf!
for determining the byte length of a UTF-8 code point. Uses the bit
distribution of UTF-8 to determine the code point length
2021-09-04 17:34:57 +01:00
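As a rough illustration of the idea behind widthOf!, the byte length of a UTF-8 code point can be read from the high bits of its first byte. The function name `width_of` and the choice to return 0 for a byte that cannot start a sequence are assumptions made for this sketch, not the macro's actual definition.

```rust
/// Hypothetical stand-in for the widthOf! macro: returns the number of bytes
/// in the UTF-8 sequence that begins with `first`, or 0 if `first` cannot
/// begin a sequence (for example, a continuation byte).
pub fn width_of(first: u8) -> usize {
    if first & 0x80 == 0x00 {
        1 // 0xxxxxxx: single byte (ASCII)
    } else if first & 0xE0 == 0xC0 {
        2 // 110xxxxx: leads a two byte sequence
    } else if first & 0xF0 == 0xE0 {
        3 // 1110xxxx: leads a three byte sequence
    } else if first & 0xF8 == 0xF0 {
        4 // 11110xxx: leads a four byte sequence
    } else {
        0 // 10xxxxxx continuation byte, or an invalid leading byte
    }
}
```

Only the first byte is inspected, so a scanner could use this to decide how far to advance without decoding the whole code point.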
Paul Stemmet 3318f8762a scanner/macros: add isBreakZ!
wrapper around 'isBreak! || check!(buffer => [])'
2021-09-04 17:34:57 +01:00
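A minimal sketch of what the isBreakZ! wrapper expresses, written as plain functions. The names `is_break` and `is_break_z` are hypothetical, and this version assumes only the ASCII line breaks '\r' and '\n' count as breaks; the real isBreak! macro may recognise more.

```rust
/// Hypothetical equivalent of isBreak!: true if the buffer starts with a
/// line break. Assumption: only '\r' and '\n' are treated as breaks here.
pub fn is_break(buffer: &str) -> bool {
    matches!(buffer.as_bytes().first(), Some(b'\r') | Some(b'\n'))
}

/// Hypothetical equivalent of isBreakZ!: a line break OR an empty buffer,
/// mirroring 'isBreak! || check!(buffer => [])'.
pub fn is_break_z(buffer: &str) -> bool {
    is_break(buffer) || buffer.is_empty()
}
```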
Paul Stemmet 1b517b518f scanner/error: add InvalidBlockScalar, InvalidTab variants 2021-09-04 17:34:57 +01:00
Paul Stemmet a0d71ef644 docs/plain-scalar-indent: commit notes 2021-08-15 09:03:22 +00:00
Paul Stemmet 4eced7d9f4 lib/scanner: add complex test for plain scalars
for peace of mind
2021-08-14 19:59:07 +01:00
Paul Stemmet 4189d3db96 lib/scanner: add test for YAML indicators in plain scalar 2021-08-14 19:59:07 +01:00
Paul Stemmet 76d9ec5561 scalar/plain: fix indentation level to account for the 0'th level
and add the trait implementations to Indent for <usize> <op> <Indent>
comparisons
2021-08-14 19:59:07 +01:00
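The <usize> <op> <Indent> comparisons mentioned above could look roughly like the following. The shape of `Indent` (a wrapper around `Option<usize>`, with `None` meaning "no indentation level yet") is an assumption for this sketch; the actual type in the scanner may differ.

```rust
use std::cmp::Ordering;

/// Hypothetical indent type: `None` means no level has been set yet, which
/// is distinct from (and below) the 0'th indentation level.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Indent(pub Option<usize>);

// Allows `column == indent` with a bare usize on the left hand side.
impl PartialEq<Indent> for usize {
    fn eq(&self, other: &Indent) -> bool {
        other.0 == Some(*self)
    }
}

// Allows `column < indent`, `column > indent` and friends.
impl PartialOrd<Indent> for usize {
    fn partial_cmp(&self, other: &Indent) -> Option<Ordering> {
        match other.0 {
            Some(level) => self.partial_cmp(&level),
            // Any real column, including 0, is deeper than "no indent".
            None => Some(Ordering::Greater),
        }
    }
}
```

With these impls a column of 0 compares as greater than an unset indent, which is one way to account for the 0'th level.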
Paul Stemmet 338db9ce42 lib/scanner: fix is_plain_scalar to block unsafe plain chars
Before, the guard would fail and implicitly fall through to the catchall
statement, allowing illegal characters
2021-08-14 19:59:07 +01:00
Paul Stemmet 28a7ce9191 lib/scanner: clippy lints 2021-08-14 19:59:07 +01:00
Paul Stemmet 048550a7b1 lib/scanner: add unit tests for plain scalar token sequences 2021-08-14 19:59:07 +01:00
Paul Stemmet 5d8f78be25 lib/scanner: add support for plain scalars
This commit adds the 3rd of the 5 possible scalar types in YAML to the
scanner. It is compliant with the YAML spec, _except_ for its handling
of "JSON like" keys, which allow for the following value token (e.g ':')
to _not_ have a whitespace following it.

I frankly find this exception absurd, as the spec _clearly_ half assed
this in so that they can declare that they are a "strict super set of
JSON", nevermind that _a lot_ of the semantics of _every_ other context
for keys rely on a key being followed by whitespace.

I may eventually return to this and add it; I've a pretty good idea how --
we just need to keep track of the "last" token produced, as only
?'"]} characters would modify the behavior, but I'd need to
make sure I haven't missed any subtle side effects, as almost all other
key handling implicitly relies on: Key token === ": ".
2021-08-14 19:59:07 +01:00
Paul Stemmet 88ac017647 scalar/plain: fix handling of non EOF trailing whitespace
Before, the loop would incorrectly update scalar_stats _after_ reaching a
': ' terminus. This is now fixed, as I check for those cases before
reentering the word loop.
2021-08-14 19:59:07 +01:00
Paul Stemmet b3e86dea9c scalar/plain: add unit tests for scan_plain_scalar 2021-08-14 19:59:07 +01:00
Paul Stemmet cddc1dae09 scalar/plain: add scan_plain_scalar
the primary driver for scanning plain YAML scalars. This implementation
tries to fit as closely as possible to the YAML spec, particularly in
its handling of (the lack of) spacing requirements inside flow contexts,
comment detection and special casing of - ? : as first character in flow
contexts.

Two things that are notably missing:

1. Proper tab '\t' handling in block context indentation
2. A sane maximum whitespace limit && better handling of whitespace
   storage. Rather than storing every whitespace given, I could instead
   count the whitespace separated by line breaks, and then add it back
   later, such that the maximum described above would apply to total
   line breaks, with the intervening whitespace stored as a u64/usize
2021-08-14 19:59:07 +01:00
Paul Stemmet 2aae6760f5 scalar/flow: use isDocumentIndicator! over longhand 2021-08-14 19:59:07 +01:00
Paul Stemmet 0fcc614771 scanner/macros: add isDocumentIndicator!
short hand for checking '--- ' or '... ' sequences
2021-08-14 19:59:07 +01:00
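A sketch of the check that isDocumentIndicator! abbreviates, written as a function. The name `is_document_indicator` is hypothetical. It assumes the marker may be followed by a space, tab, line break or end of input, and it leaves the "must be at column 0" requirement to the caller.

```rust
/// Hypothetical equivalent of isDocumentIndicator!: true if the buffer
/// starts with '---' or '...' followed by whitespace, a line break, or the
/// end of input. Assumption: the caller has already checked the column.
pub fn is_document_indicator(buffer: &str) -> bool {
    let bytes = buffer.as_bytes();
    let marker = bytes.starts_with(b"---") || bytes.starts_with(b"...");

    // The byte after the three marker characters decides whether this is an
    // indicator or just the start of a plain scalar such as '---a'.
    marker && matches!(bytes.get(3), None | Some(b' ' | b'\t' | b'\r' | b'\n'))
}
```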
Paul Stemmet 7d600cd29e scanner/error: add variant InvalidPlainScalar 2021-08-14 19:59:07 +01:00
Paul Stemmet ce7acbb754 lib/scanner: clippy lints 2021-08-08 10:59:11 +01:00
Paul Stemmet 71266f1530 lib/scanner: add tests for explicit key cases 2021-08-08 10:59:11 +01:00
Paul Stemmet 8558dada84 lib/scanner: add explicit key support to Scanner 2021-08-08 10:59:11 +01:00
Paul Stemmet 4c61af7eb9 scanner/error: add variant InvalidKey
for catching cases where a key was given but not valid, typically
involving explicit keys ('?')
2021-08-08 10:59:11 +01:00
Paul Stemmet 76be7001bb lib/scanner: add test for zero indented sequence decrement 2021-08-08 10:59:11 +01:00
Paul Stemmet 5d0572d02d lib/scanner: further fixes to zero indented sequence handling
While the previous commit did add support for _adding_ zero indented
sequences to the token stream, it unfortunately relied on the indent
stack flush that happens once reaching end of stream to push the stored
BlockEnd tokens.

This commit adds better support for removing zero indented sequences
from the stack once finished.

The heuristic used here is:

A zero_indented BlockSequence starts when:

- The top stored indent is for a BlockMapping
- A BlockEntry occupies the same indentation level

And terminates when:

- The top indent stored is a BlockSequence & is tagged as zero indented
- A BlockEntry _does not_ occupy the same indentation level
2021-08-08 10:59:11 +01:00
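The start/terminate heuristic above can be sketched as two predicates over the top of the indent stack. All names here (`Kind`, `IndentEntry`, `Upcoming`, `starts_zero_indented`, `ends_zero_indented`) are hypothetical and exist only for this illustration; the scanner's real types are not shown in the log.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    BlockMapping,
    BlockSequence,
}

/// Hypothetical entry on the indent stack.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IndentEntry {
    pub kind: Kind,
    pub column: usize,
    pub zero_indented: bool,
}

/// What the scanner is looking at next: its column, and whether it is a
/// BlockEntry ('-').
#[derive(Debug, Clone, Copy)]
pub struct Upcoming {
    pub column: usize,
    pub is_block_entry: bool,
}

/// Starts when the top indent is a BlockMapping and a BlockEntry occupies
/// the same indentation level.
pub fn starts_zero_indented(top: Option<&IndentEntry>, next: Upcoming) -> bool {
    match top {
        Some(e) => {
            e.kind == Kind::BlockMapping && next.is_block_entry && next.column == e.column
        }
        None => false,
    }
}

/// Terminates when the top indent is a zero indented BlockSequence and a
/// BlockEntry does not occupy the same indentation level.
pub fn ends_zero_indented(top: Option<&IndentEntry>, next: Upcoming) -> bool {
    match top {
        Some(e) => {
            let entry_at_level = next.is_block_entry && next.column == e.column;
            e.kind == Kind::BlockSequence && e.zero_indented && !entry_at_level
        }
        None => false,
    }
}
```

When `ends_zero_indented` holds, the scanner would pop the sequence and emit its BlockEnd immediately instead of waiting for the end of stream flush.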
Paul Stemmet 18d6430cc2 lib/scanner: produce token for zero indentation block sequence
This fixes the edge case YAML allows where a sequence may be zero
indented, but still start a sequence.

E.g using the following YAML:

key:
- "one"
- "two"

The following tokens would have been produced (before this commit):

StreamStart
BlockMappingStart
Key
Scalar('key')
Value
BlockEntry
Scalar('one')
BlockEntry
Scalar('two')
BlockEnd
StreamEnd

Note the lack of any indication that the values are in a sequence.
Post commit, the following is produced:

StreamStart
BlockMappingStart
Key
Scalar('key')
Value
BlockSequenceStart  <--
BlockEntry
Scalar('one')
BlockEntry
Scalar('two')
BlockEnd <--
BlockEnd
StreamEnd
2021-08-08 10:59:11 +01:00
Paul Stemmet 76acbeba93 scanner/key: clippy lints 2021-08-01 17:21:17 +01:00
Paul Stemmet 29afcda3ee lib/scanner: add tests for catching expected errors in stale keys 2021-08-01 17:21:17 +01:00
Paul Stemmet cbde7ccb91 lib/scanner: fix Scanner.value ignoring key state
Before we only checked for the existence of a saved key, but *didn't*
also check that it was still valid / possible.

This led to a subtle error wherein scalars that weren't valid keys
(anymore) would still be picked up and used.
2021-08-01 17:21:17 +01:00
Paul Stemmet 72e38d2100 lib/scanner: add tests for block collections 2021-08-01 17:21:17 +01:00
Paul Stemmet 5d6a023077 lib/scanner: add support for BlockEntry tokens to the Scanner 2021-08-01 17:21:17 +01:00
Paul Stemmet 5ba216147d lib/scanner: update tests to expect BlockEnd tokens 2021-08-01 17:21:17 +01:00
Paul Stemmet 8773625259 lib/scanner: update scanner code to decrement indent
Also reorganized the main loop of scan_next_token, adding comment
placeholders for the remaining missing token fetchers
2021-08-01 17:21:17 +01:00
Paul Stemmet e5b1bb0dac scanner/context: allow passing indents directly to indent_decrement
As we need to be able to reset the indents to nil/starting in the
scanner
2021-08-01 17:21:17 +01:00
Paul Stemmet 90d2ac8a06 lib/scanner: document Scanner.value, allow bare ':' in flow context
as the flow context does not require that a Value (':') token be followed
by whitespace of some kind
2021-08-01 09:37:23 +01:00
Paul Stemmet 349e62be0a lib/scanner: move saved key check into value function
as before we were double checking for the existence of a Value, once
after parsing a scalar, and again when actually adding the Value token
to the queue. This way we simplify the flow for scalar tokens, and stop
doing unnecessary work
2021-08-01 09:37:23 +01:00
Paul Stemmet 458f806055 lib/scanner: add expire_stale_saved_key check
so we don't allow keys that have expired to interfere, and we don't lose
a key that is required
2021-08-01 09:37:23 +01:00
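One way a staleness check like expire_stale_saved_key could work, sketched under assumptions: the YAML spec restricts implicit keys to a single line and at most 1024 characters, so a saved key is treated as stale once the scanner has moved past either limit. The `SavedKey` fields and the `StaleRequiredKey` error variant are hypothetical names invented for this example.

```rust
/// Hypothetical record of a potential implicit key position.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SavedKey {
    pub possible: bool,
    pub required: bool,
    pub line: usize,
    pub index: usize,
}

/// Hypothetical error type; the variant name is an assumption.
#[derive(Debug, PartialEq, Eq)]
pub enum ScanError {
    StaleRequiredKey,
}

/// Expire the saved key if the scanner (at `line`, `index`) has moved beyond
/// where an implicit key could still be completed.
pub fn expire_stale_saved_key(
    key: &mut SavedKey,
    line: usize,
    index: usize,
) -> Result<(), ScanError> {
    // Implicit keys must sit on one line and span at most 1024 characters.
    let stale = key.possible && (key.line < line || key.index + 1024 < index);

    if stale {
        // A required key cannot be silently dropped, so report it instead.
        if key.required {
            return Err(ScanError::StaleRequiredKey);
        }
        key.possible = false;
    }

    Ok(())
}
```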
Paul Stemmet 46b616ea6f scanner/error: add InvalidValue 2021-08-01 09:37:23 +01:00
Paul Stemmet ce4deb76fa lib/scanner: remove duplicate check from Scanner.value
we already check this before calling the function
2021-08-01 09:37:23 +01:00
Paul Stemmet b5fa29ca09 lib/scanner: fix tests
ensure Key is produced before any node decorators, and ensure that a
BlockMappingStart is produced if we've found a key/value
2021-08-01 08:12:47 +01:00
Paul Stemmet 6825e3ebd4 lib/scanner: ensure Scanner does not exit before Key resolution
just loop until we either:

1. Have produced >1 tokens
AND
2. A key isn't possible
OR
3. We've produced stream end
2021-08-01 08:12:47 +01:00
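Read with the usual precedence, the exit condition above is (1 AND 2) OR 3. A small sketch as a predicate, where the function name `scan_done` and its parameters are hypothetical:

```rust
/// Hypothetical exit test for the scan loop: stop once more than one token
/// has been produced and no key is still possible, or once the stream end
/// has been produced.
pub fn scan_done(produced: usize, key_possible: bool, stream_end: bool) -> bool {
    (produced > 1 && !key_possible) || stream_end
}
```

A driver would keep calling the token fetcher while `scan_done` is false, so a pending implicit key is always resolved before tokens are handed out.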
Paul Stemmet ebda074d66 lib/scanner: remove dead code 2021-08-01 08:12:47 +01:00
Paul Stemmet 0a36fc9e6e lib/scanner: update block_collection_entry roll_indent call 2021-08-01 08:12:47 +01:00
Paul Stemmet f8f8536631 lib/scanner: update ScanIter for Queue based token stream 2021-08-01 08:12:47 +01:00
Paul Stemmet 5935ba0056 lib/scanner: fix flow_scalar Key check 2021-08-01 08:12:47 +01:00
Paul Stemmet 1ac7eb556b lib/scanner: fix un/roll_indent function defs
and enqueue! tokens they produce
2021-08-01 08:12:47 +01:00
Paul Stemmet 0ee871dad5 lib/scanner: save keys across Scanner
and fix a few function / type defs
2021-08-01 08:12:47 +01:00
Paul Stemmet 01039b62e2 lib/scanner: add save_key, remove_saved_key
These are functions for saving/resetting potential implicit key positions
2021-08-01 08:12:47 +01:00
Paul Stemmet aa7214ee35 lib/scanner: enqueue! tokens
rather than tokens.push them. This gives me the flexibility to later
make tokens a Trait, and only need to fix the macro rather than every
call site
2021-08-01 08:12:47 +01:00
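A minimal sketch of the indirection an enqueue! macro provides. The macro's argument order, the `Token` enum and the `stream_markers` helper are all assumptions for illustration; the point is only that call sites name the macro rather than the container method.

```rust
/// Hypothetical token type, reduced to two variants for the example.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Token {
    StreamStart,
    StreamEnd,
}

/// Hypothetical enqueue! macro: the only place that knows how a token is
/// stored. Swapping Vec::push for another container or a trait method later
/// means editing this one line.
macro_rules! enqueue {
    ($token:expr, $tokens:expr) => {
        $tokens.push($token)
    };
}

/// Example call site: it never mentions `push` directly.
pub fn stream_markers() -> Vec<Token> {
    let mut tokens = Vec::new();
    enqueue!(Token::StreamStart, tokens);
    enqueue!(Token::StreamEnd, tokens);
    tokens
}
```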