While the previous commit did add support for _adding_ zero indented
sequences to the token stream, it unfortunately relied on the indent
stack flush that happens upon reaching the end of the stream to push
the stored BlockEnd tokens.
This commit adds proper support for removing zero indented sequences
from the stack once they are finished.
The heuristic used here (sketched in code after this list) is:
A zero_indented BlockSequence starts when:
- The top stored indent is for a BlockMapping
- A BlockEntry occupies the same indentation level
And terminates when:
- The top indent stored is a BlockSequence & is tagged as zero indented
- A BlockEntry _does not_ occupy the same indentation level
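A minimal sketch of that heuristic, assuming an indent stack of
(column, kind) entries; the type and function names here are
illustrative, not the crate's actual API:

    // Illustrative sketch; the Scanner's real indent bookkeeping may differ.
    #[derive(PartialEq)]
    enum IndentKind {
        BlockMapping,
        BlockSequence { zero_indented: bool },
    }

    struct Indent {
        column: usize,
        kind: IndentKind,
    }

    // A zero indented BlockSequence starts when a BlockEntry ('- ') sits
    // at the same column as the BlockMapping indent on top of the stack.
    fn starts_zero_indented(top: &Indent, entry_column: usize) -> bool {
        top.kind == IndentKind::BlockMapping && top.column == entry_column
    }

    // ...and terminates when the top indent is a zero indented
    // BlockSequence but the current token is not a BlockEntry at that column.
    fn ends_zero_indented(top: &Indent, is_block_entry: bool, column: usize) -> bool {
        matches!(top.kind, IndentKind::BlockSequence { zero_indented: true })
            && !(is_block_entry && column == top.column)
    }

    fn main() {
        let top = Indent { column: 0, kind: IndentKind::BlockMapping };
        // 'key:' at column 0 followed by '- ' at column 0 starts the sequence.
        assert!(starts_zero_indented(&top, 0));

        let seq = Indent { column: 0, kind: IndentKind::BlockSequence { zero_indented: true } };
        // A token at column 0 that is not a BlockEntry ends the sequence.
        assert!(ends_zero_indented(&seq, false, 0));
    }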
This fixes an edge case YAML allows, where a mapping value may be zero
indented (sitting at the same column as its parent key) yet still
start a sequence.
E.g., using the following YAML:
key:
- "one"
- "two"
The following tokens would have been produced (before this commit):
StreamStart
BlockMappingStart
Key
Scalar('key')
Value
BlockEntry
Scalar('one')
BlockEntry
Scalar('two')
BlockEnd
StreamEnd
Note the lack of any indication that the values are in a sequence.
Post commit, the following is produced:
StreamStart
BlockMappingStart
Key
Scalar('key')
Value
BlockSequenceStart <--
BlockEntry
Scalar('one')
BlockEntry
Scalar('two')
BlockEnd <--
BlockEnd
StreamEnd
Before, we only checked for the existence of a saved key, but *didn't*
also check that it was still valid / possible.
This led to a subtle error wherein scalars that were no longer valid
keys would still be picked up and used.
Previously, we were also double checking for the existence of a Value:
once after parsing a scalar, and again when actually adding the Value
token to the queue. Removing the redundant check simplifies the flow
for scalar tokens and stops doing unnecessary work.
This commit completely rewrites the key subsystem of the Scanner. Rather
than merely tracking whether a key could be added, Key now manages the
state tracking for potential implicit keys.
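The shape of that state might look roughly like the following; the
field and method names are assumptions for illustration, not the
crate's actual definitions:

    // Hypothetical sketch of the Key state machine for implicit keys.
    #[derive(Default)]
    struct Key {
        // Set when a token that could begin an implicit key is read.
        possible: bool,
        // True in contexts where a key *must* follow, so failing to
        // find one is an error rather than a plain scalar.
        required: bool,
        // Byte offset into the buffer where the potential key started,
        // used later to position the Key token correctly in the Queue.
        position: usize,
    }

    impl Key {
        // Record that the token starting at `position` may be an implicit key.
        fn save(&mut self, position: usize, required: bool) {
            self.possible = true;
            self.required = required;
            self.position = position;
        }

        // Invalidate the saved key, e.g. once it can no longer legally
        // become a key (too far back, or a line break intervened).
        fn forget(&mut self) {
            self.possible = false;
            self.required = false;
        }
    }

    fn main() {
        let mut key = Key::default();
        key.save(0, false);
        assert!(key.possible);
        key.forget();
        assert!(!key.possible);
    }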
A TokenEntry is designed as a wrapper for Tokens returned from the
Scanner, ensuring that they are returned from the Queue in an order that
mirrors where in the buffer the token was read.
This will allow me to push Tokens out of order, particularly when
handling Keys, and still have them returned in the expected order.
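A minimal sketch of such a wrapper, assuming a simplified stand-in for
the Scanner's Token type; ordering is derived purely from the read
position:

    use std::cmp::Ordering;

    // Simplified stand-in for the Scanner's Token type.
    struct Token(&'static str);

    // Wraps a Token with the buffer offset it was read at, so the Queue
    // can order entries by where they occur in the input.
    struct TokenEntry {
        token: Token,
        read_at: usize,
    }

    impl PartialEq for TokenEntry {
        fn eq(&self, other: &Self) -> bool {
            self.read_at == other.read_at
        }
    }
    impl Eq for TokenEntry {}

    impl PartialOrd for TokenEntry {
        fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
            Some(self.cmp(other))
        }
    }
    impl Ord for TokenEntry {
        fn cmp(&self, other: &Self) -> Ordering {
            self.read_at.cmp(&other.read_at)
        }
    }

    fn main() {
        let mut entries = vec![
            TokenEntry { token: Token("Value"), read_at: 3 },
            TokenEntry { token: Token("Key"), read_at: 0 },
        ];
        entries.sort();
        // Sorted by buffer position: the Key comes out first.
        assert_eq!(entries[0].token.0, "Key");
    }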
This structure will replace the current Vec as the way tokens are
returned from the Scanner. The change is occurring because the Scanner
needs both fast pops and fast inserts. A binary heap gives me both,
namely O(1) amortized inserts and O(log(n)) pops -- with allocations
amortized.
This is because of how YAML handles implicit keys... in that you don't
know whether you have one until you hit a value (': '). The easiest
solution is just to save these potential implicit keys and then insert
them into the token list at the correct position, but this would require
memcopy'ing everything >key.pos and potentially cause many more
reallocations than required.
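As an illustration of that rejected approach (the tokens here are made
up): inserting a late-discovered Key into the middle of a Vec shifts
every element after it, O(n) per insert:

    fn main() {
        // Tokens already scanned past the would-be key's position.
        let mut tokens = vec!["Value", "Scalar('bar')"];
        // Vec::insert memcopies everything after index 0 one slot right,
        // and may reallocate the whole buffer if capacity is exceeded.
        tokens.insert(0, "Key");
        assert_eq!(tokens, ["Key", "Value", "Scalar('bar')"]);
    }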
Enter the Queue. I couldn't just use std::BinaryHeap for two reasons:
1. It's a max heap
2. It's not stable; the order of equal elements is unspecified
The Queue fixes both of these problems, first by internally wrapping
entries in std::cmp::Reverse, and second by guaranteeing that equal
elements are returned in the order added.
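One way to get a stable min-queue out of std::BinaryHeap is to wrap
each entry in Reverse and break ties with a monotonically increasing
sequence number. A sketch under those assumptions, not the crate's
actual implementation:

    use std::cmp::Reverse;
    use std::collections::BinaryHeap;

    // Entries come out ordered by `pos`; equal positions come out in
    // the order they were pushed (FIFO), giving a *stable* min-heap.
    struct Queue<T> {
        heap: BinaryHeap<Reverse<(usize, u64, T)>>, // (pos, insertion seq, value)
        seq: u64,
    }

    impl<T: Ord> Queue<T> {
        fn new() -> Self {
            Queue { heap: BinaryHeap::new(), seq: 0 }
        }

        // Insert `value` keyed by its buffer position. O(1) amortized.
        fn push(&mut self, pos: usize, value: T) {
            // The sequence number only ever breaks ties between equal
            // positions, which is what guarantees stability.
            self.heap.push(Reverse((pos, self.seq, value)));
            self.seq += 1;
        }

        // Remove the entry with the smallest position. O(log(n)).
        fn pop(&mut self) -> Option<T> {
            self.heap.pop().map(|Reverse((_, _, value))| value)
        }
    }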
These two attributes allow me to use Scanner.stats.read (number of
bytes consumed so far) and a bit of elbow grease to get my tokens out in
the right order.
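Concretely, pairing Scanner.stats.read with the Queue sketched above
(the byte offsets below are made up for the input "top: bar"):

    fn main() {
        let mut queue = Queue::new();
        // Scanning "top: bar": the ': ' at offset 3 is what reveals that
        // the scalar back at offset 0 was an implicit key, so the Key
        // and its Scalar are pushed *after* later-positioned tokens.
        queue.push(3, "Value");
        queue.push(5, "Scalar('bar')");
        queue.push(0, "Key");
        queue.push(0, "Scalar('top')");

        // Pops in buffer order, ties in insertion order:
        // Key, Scalar('top'), Value, Scalar('bar')
        while let Some(token) = queue.pop() {
            println!("{token}");
        }
    }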
Also improve the various self.key.possible checks to more accurately
calculate whether a key is required.
This commit adds support for the Scanner to pick up
Flow{Mapping,Sequence}{Start,End} tokens, but _does not_ yet allow for
FlowEntry tokens.
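For instance, a single element flow sequence such as ["one"] contains
no ',' and therefore needs no FlowEntry, so the Scanner should now be
able to produce something like:
StreamStart
FlowSequenceStart
Scalar('one')
FlowSequenceEnd
StreamEnd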