Commit graph

164 commits

Author SHA1 Message Date
Paul Stemmet 0fcc614771 scanner/macros: add isDocumentIndicator!
shorthand for checking '--- ' or '... ' sequences
2021-08-14 19:59:07 +01:00
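
A minimal sketch of the check that commit 0fcc614771 describes, written as a plain function rather than the project's macro. The name `is_document_indicator` and its signature are assumptions for illustration only, and the sketch does not verify that the marker sits at column zero.

```rust
/// Hypothetical helper (not the project's isDocumentIndicator! macro):
/// returns true when `buf` starts with "---" or "..." and that marker is
/// followed by a blank, a line break, or the end of the input.
fn is_document_indicator(buf: &[u8]) -> bool {
    let marker = buf.starts_with(b"---") || buf.starts_with(b"...");

    // The marker only counts when it stands alone, so inspect the byte
    // that follows it (if there is one).
    let terminated = matches!(
        buf.get(3).copied(),
        None | Some(b' ' | b'\t' | b'\r' | b'\n')
    );

    marker && terminated
}
```

With this sketch, `b"--- "` and `b"...\n"` are accepted, while `b"---a"` is rejected because the marker runs straight into other content.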
Paul Stemmet 7d600cd29e scanner/error: add variant InvalidPlainScalar 2021-08-14 19:59:07 +01:00
Paul Stemmet ce7acbb754 lib/scanner: clippy lints 2021-08-08 10:59:11 +01:00
Paul Stemmet 71266f1530 lib/scanner: add tests for explicit key cases 2021-08-08 10:59:11 +01:00
Paul Stemmet 8558dada84 lib/scanner: add explicit key support to Scanner 2021-08-08 10:59:11 +01:00
Paul Stemmet 4c61af7eb9 scanner/error: add variant InvalidKey
for catching cases where a key was given but not valid, typically
involving explicit keys ('?')
2021-08-08 10:59:11 +01:00
Paul Stemmet 76be7001bb lib/scanner: add test for zero indented sequence decrement 2021-08-08 10:59:11 +01:00
Paul Stemmet 5d0572d02d lib/scanner: further fixes to zero indented sequence handling
While the previous commit did add support for _adding_ zero indented
sequences to the token stream, it unfortunately relied on the indent
stack flush that happens once reaching end of stream to push the stored
BlockEnd tokens.

This commit adds better support for removing zero indented sequences
from the stack once finished.

The heuristic used here is:

A zero_indented BlockSequence starts when:

- The top stored indent is for a BlockMapping
- A BlockEntry occupies the same indentation level

And terminates when:

- The top indent stored is a BlockSequence & is tagged as zero indented
- A BlockEntry _does not_ occupy the same indentation level
2021-08-08 10:59:11 +01:00
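
The start/terminate heuristic from commit 5d0572d02d could be sketched roughly as below. The `Indent` and `Kind` types and both function names are hypothetical stand-ins, not the Scanner's actual definitions.

```rust
/// Kind of block collection an indent level was opened for (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Mapping,
    Sequence { zero_indented: bool },
}

/// One stored indent level (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Indent {
    column: usize,
    kind: Kind,
}

/// A zero indented sequence starts when the top stored indent belongs to a
/// block mapping and a BlockEntry ('-') occupies that same column.
fn starts_zero_indented(top: Option<&Indent>, column: usize, at_block_entry: bool) -> bool {
    match top {
        Some(indent) => {
            indent.kind == Kind::Mapping && at_block_entry && indent.column == column
        }
        None => false,
    }
}

/// It terminates when the top stored indent is a sequence tagged as zero
/// indented and no BlockEntry occupies that same column.
fn ends_zero_indented(top: Option<&Indent>, column: usize, at_block_entry: bool) -> bool {
    match top {
        Some(indent) => {
            let zero_seq = matches!(indent.kind, Kind::Sequence { zero_indented: true });
            zero_seq && !(at_block_entry && indent.column == column)
        }
        None => false,
    }
}
```

Tagging the sequence when it is pushed is what lets the terminate check tell a zero indented sequence apart from an ordinary one at the same column.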
Paul Stemmet 18d6430cc2 lib/scanner: produce token for zero indentation block sequence
This fixes an edge case YAML allows, where block entries at the same
indentation as their parent key still start a sequence.

E.g. using the following YAML:

key:
- "one"
- "two"

The following tokens would have been produced (before this commit):

StreamStart
BlockMappingStart
Key
Scalar('key')
Value
BlockEntry
Scalar('one')
BlockEntry
Scalar('two')
BlockEnd
StreamEnd

Note the lack of any indication that the values are in a sequence.
Post commit, the following is produced:

StreamStart
BlockMappingStart
Key
Scalar('key')
Value
BlockSequenceStart  <--
BlockEntry
Scalar('one')
BlockEntry
Scalar('two')
BlockEnd <--
BlockEnd
StreamEnd
2021-08-08 10:59:11 +01:00
Paul Stemmet 76acbeba93 scanner/key: clippy lints 2021-08-01 17:21:17 +01:00
Paul Stemmet 29afcda3ee lib/scanner: add tests for catching expected errors in stale keys 2021-08-01 17:21:17 +01:00
Paul Stemmet cbde7ccb91 lib/scanner: fix Scanner.value ignoring key state
Before we only checked for the existence of a saved key, but *didn't*
also check that it was still valid / possible.

This led to a subtle error wherein scalars that weren't valid keys
(anymore) would still be picked up and used.
2021-08-01 17:21:17 +01:00
Paul Stemmet 72e38d2100 lib/scanner: add tests for block collections 2021-08-01 17:21:17 +01:00
Paul Stemmet 5d6a023077 lib/scanner: add support for BlockEntry tokens to the Scanner 2021-08-01 17:21:17 +01:00
Paul Stemmet 5ba216147d lib/scanner: update tests to expect BlockEnd tokens 2021-08-01 17:21:17 +01:00
Paul Stemmet 8773625259 lib/scanner: update scanner code to decrement indent
Also reorganized the main loop of scan_next_token, adding comment
placeholders for the remaining missing token fetchers
2021-08-01 17:21:17 +01:00
Paul Stemmet e5b1bb0dac scanner/context: allow passing indents directly to indent_decrement
As we need to be able to reset the indents to nil/starting in the
scanner
2021-08-01 17:21:17 +01:00
Paul Stemmet 90d2ac8a06 lib/scanner: document Scanner.value, allow bare ':' in flow context
as the flow context does not require that a Value (':') token be followed
by whitespace of some kind
2021-08-01 09:37:23 +01:00
Paul Stemmet 349e62be0a lib/scanner: move saved key check into value function
as before we were double checking for the existence of a Value, once
after parsing a scalar, and again when actually adding the Value token
to the queue. This way we simplify the flow for scalar tokens, and stop
doing unnecessary work
2021-08-01 09:37:23 +01:00
Paul Stemmet 458f806055 lib/scanner: add expire_stale_saved_key check
so we don't allow keys that have expired to interfere, and we don't lose
a key that is required
2021-08-01 09:37:23 +01:00
Paul Stemmet 46b616ea6f scanner/error: add InvalidValue 2021-08-01 09:37:23 +01:00
Paul Stemmet ce4deb76fa lib/scanner: remove duplicate check from Scanner.value
we already check this before calling the function
2021-08-01 09:37:23 +01:00
Paul Stemmet b5fa29ca09 lib/scanner: fix tests
ensure Key is produced before any node decorators, and ensure that a
BlockMappingStart is produced if we've found a key/value
2021-08-01 08:12:47 +01:00
Paul Stemmet 6825e3ebd4 lib/scanner: ensure Scanner does not exit before Key resolution
just loop until we either:

1. Have produced >1 tokens
AND
2. A key isn't possible
OR
3. We've produced stream end
2021-08-01 08:12:47 +01:00
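
One reading of the exit condition in commit 6825e3ebd4, as a sketch. The function name and parameters are hypothetical, and grouping conditions 1 and 2 together ahead of the OR is an assumption about the intended precedence.

```rust
/// Hypothetical predicate: should the scan loop stop producing tokens?
///
/// Assumed grouping: (more than one token produced AND no key possible)
/// OR the stream end has been produced.
fn should_stop(produced: usize, key_possible: bool, stream_end: bool) -> bool {
    (produced > 1 && !key_possible) || stream_end
}
```

Under this reading the loop keeps scanning while a key could still be resolved, so tokens are not handed out before their Key token has had a chance to be queued ahead of them.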
Paul Stemmet ebda074d66 lib/scanner: remove dead code 2021-08-01 08:12:47 +01:00
Paul Stemmet 0a36fc9e6e lib/scanner: update block_collection_entry roll_indent call 2021-08-01 08:12:47 +01:00
Paul Stemmet f8f8536631 lib/scanner: update ScanIter for Queue based token stream 2021-08-01 08:12:47 +01:00
Paul Stemmet 5935ba0056 lib/scanner: fix flow_scalar Key check 2021-08-01 08:12:47 +01:00
Paul Stemmet 1ac7eb556b lib/scanner: fix un/roll_indent function defs
and enqueue! tokens they produce
2021-08-01 08:12:47 +01:00
Paul Stemmet 0ee871dad5 lib/scanner: save keys across Scanner
and fix a few function / type defs
2021-08-01 08:12:47 +01:00
Paul Stemmet 01039b62e2 lib/scanner: add save_key, remove_saved_key
These are functions for saving/resetting potential implicit key positions
2021-08-01 08:12:47 +01:00
Paul Stemmet aa7214ee35 lib/scanner: enqueue! tokens
rather than tokens.push them. This gives me the flexibility to later
make tokens a Trait, and only need to fix the macro rather than every
call site
2021-08-01 08:12:47 +01:00
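
A sketch of the indirection commit aa7214ee35 describes: call sites go through a macro, so a later change to the token container only touches the macro body. The argument order and the expansion shown here are assumptions, not the project's actual `enqueue!` definition.

```rust
/// Hypothetical form of the macro: forwards to the container's `push`.
/// If the container type changes later, only this body needs editing.
macro_rules! enqueue {
    ($queue:expr, $token:expr) => {
        $queue.push($token)
    };
}
```

Any container with a `push` method works with this form, for example `enqueue!(tokens, token)` where `tokens` is a `Vec`.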
Paul Stemmet 971e2c76d4 lib/scanner: use simple_key_allowed over self.key_* 2021-08-01 08:12:47 +01:00
Paul Stemmet cda72c58fa lib/scanner: switch Tokens->Queue<TokenEntry>, add Scanner.simple_key_allowed 2021-08-01 08:12:47 +01:00
Paul Stemmet 87640bd1f4 scanner/key: refactor
This commit completely rewrites the key subsystem of the Scanner. Rather
than merely tracking whether a key could be added, Key now manages the
state tracking for potential implicit keys.
2021-08-01 08:12:47 +01:00
Paul Stemmet a82aa3c35d scanner/macros: add enqueue! 2021-08-01 08:12:47 +01:00
Paul Stemmet 840868f82e scanner/entry: a custom Ord Token wrapper
A TokenEntry is designed as a wrapper for Tokens returned from the
Scanner, ensuring that they are returned from the Queue in an order that
mirrors where in the buffer the token was read.

This will allow me to push Tokens out of order, particularly when
handling Keys, and still have them returned in the expected order
2021-08-01 08:12:47 +01:00
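
A rough sketch of the wrapper idea in commit 840868f82e: ordering is defined by the read position alone, never by the token itself. The field names and the simplified `Token` enum are assumptions.

```rust
use std::cmp::Ordering;

/// Simplified stand-in for the Scanner's token type.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Token {
    Key,
    Scalar(String),
    Value,
}

/// Hypothetical wrapper: pairs a token with the buffer position it was
/// read at, and compares entries by that position only.
#[derive(Debug)]
struct TokenEntry {
    token: Token,
    read_at: usize,
}

impl PartialEq for TokenEntry {
    fn eq(&self, other: &Self) -> bool {
        self.read_at == other.read_at
    }
}

impl Eq for TokenEntry {}

impl PartialOrd for TokenEntry {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for TokenEntry {
    fn cmp(&self, other: &Self) -> Ordering {
        self.read_at.cmp(&other.read_at)
    }
}
```

Because the comparison ignores the token, an entry pushed late but marked with an earlier read position still sorts ahead of entries read after it.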
Paul Stemmet 7e567aa8a9 lib/queue: add Queue, a stable min binary heap
This structure will replace the current Vec as the way tokens are
returned from the Scanner.

The genesis of this structure is a need in the Scanner for fast pops
and fast inserts. A binary heap gives me both, namely O(1) average
inserts and O(log(n)) pops -- with allocations amortized.

This is because of how YAML handles implicit keys... in that you don't
know whether you have one until you hit a value (': '). The easiest
solution is just to save these potential implicit keys and then insert
them into the token list at the correct position, but this would require
memcopy'ing everything >key.pos and potentially cause many more
reallocations than required.

Enter the Queue. I couldn't just use std::collections::BinaryHeap for
two reasons:

1. It's a max heap
2. It's not stable, the order of equal elements is unspecified

The Queue fixes both of these problems, first by innately using
std::cmp::Reverse, and second by guaranteeing that equal elements are
returned in the order added.

These two attributes allow me to use Scanner.stats.read (number of
bytes consumed so far) and a bit of elbow grease to get my tokens out in
the right order.
2021-08-01 08:12:47 +01:00
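
A minimal sketch of a stable min-heap in the spirit of commit 7e567aa8a9, built on `std::collections::BinaryHeap` and `std::cmp::Reverse`. The insertion counter used as a tie-breaker, the explicit `key` argument, and all names here are assumptions, not the project's Queue.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// Heap entry ordered by (key, seq); the payload never affects ordering.
struct Entry<T> {
    key: usize,
    seq: u64,
    item: T,
}

impl<T> PartialEq for Entry<T> {
    fn eq(&self, other: &Self) -> bool {
        self.key == other.key && self.seq == other.seq
    }
}

impl<T> Eq for Entry<T> {}

impl<T> PartialOrd for Entry<T> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl<T> Ord for Entry<T> {
    fn cmp(&self, other: &Self) -> Ordering {
        (self.key, self.seq).cmp(&(other.key, other.seq))
    }
}

/// Hypothetical stable min-heap: the smallest key pops first, and equal
/// keys pop in the order they were pushed.
struct Queue<T> {
    heap: BinaryHeap<Reverse<Entry<T>>>,
    next_seq: u64,
}

impl<T> Queue<T> {
    fn new() -> Self {
        Self { heap: BinaryHeap::new(), next_seq: 0 }
    }

    fn push(&mut self, key: usize, item: T) {
        // Reverse turns the std max-heap into a min-heap; the increasing
        // sequence number makes ties resolve in insertion order.
        self.heap.push(Reverse(Entry { key, seq: self.next_seq, item }));
        self.next_seq += 1;
    }

    fn pop(&mut self) -> Option<T> {
        self.heap.pop().map(|Reverse(entry)| entry.item)
    }
}
```

Pushing `(4, "value")`, then `(0, "key")`, then `(0, "scalar")` pops `"key"`, `"scalar"`, `"value"`: the late entries with the smaller key come out first, and the two entries sharing key 0 keep their insertion order.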
Paul Stemmet 5212077ae8 lib/scanner: add tests for flow contexts 2021-08-01 08:12:47 +01:00
Paul Stemmet 24a8f2b211 lib/scanner: add test for simple flow sequence 2021-08-01 08:12:47 +01:00
Paul Stemmet 9e295b8f72 scanner/context: fix flow de/increment
I forgot when removing the macro that I need to actually assign the
computation to self.flow
2021-08-01 08:12:47 +01:00
Paul Stemmet 37221ad020 lib/scanner: add flow/block entry scan functions
still need to add tests for them
2021-08-01 08:12:47 +01:00
Paul Stemmet 6b8965268c lib/scanner: add un/roll_indent functions
for handling the indent increment / decrement of the Scanner
2021-08-01 08:12:47 +01:00
Paul Stemmet 5ec3d0ae2b scanner/error: add InvalidBlockEntry 2021-08-01 08:12:47 +01:00
Paul Stemmet e24fe38a7e lib/scanner: add unit tests for flow_collection_* methods
though only for the naive cases, e.g '{}' and '[]'.
More to come...
2021-08-01 08:12:47 +01:00
Paul Stemmet a84f64e2b7 lib/scanner: track YAML context, add flow_collection_* methods
Also improve the various self.key.possible calls to calculate more
correctly whether a key is required.

This commit adds support for the Scanner to pick up
Flow{Mapping,Sequence}{Start,End} tokens, but _does not_ yet allow for
FlowEntry tokens.
2021-08-01 08:12:47 +01:00
Paul Stemmet c2734e33e1 scanner/context: add Context
Context handles tracking the current YAML context, and provides the
mechanisms to update it efficiently
2021-08-01 08:12:47 +01:00
Paul Stemmet 93470734ba scanner/error: add IntOverflow variant 2021-08-01 08:12:47 +01:00
Paul Stemmet d32fc77b17 lib/scanner: rename key.impossible -> key.forbidden 2021-08-01 08:12:47 +01:00
Paul Stemmet 0fe2e99426 lib/token: cmp with marker by ref (clippy) 2021-07-25 12:41:57 +01:00