This commit completely rewrites the key subsystem of the Scanner. Rather
than merely tracking whether a key could be added, Key now manages the
state tracking for potential implicit keys.
A TokenEntry is designed as a wrapper for Tokens returned from the
Scanner, ensuring that they are returned from the Queue in an order that
mirrors where in the buffer the token was read.
This will allow me to push Tokens out of order, particularly when
handling Keys, and still have them returned in the expected order.
This structure will replace the current Vec as the way tokens are
returned from the Scanner.
The genesis of this structure is a need in the Scanner for fast pops
and fast inserts. A binary heap gives me both, namely amortized O(1)
inserts and O(log n) pops -- with allocations amortized.
This is because of how YAML handles implicit keys: you don't know
whether you have one until you hit a value (': '). The easiest solution
is just to save these potential implicit keys and then insert them into
the token list at the correct position, but this would require memmoving
everything after key.pos and potentially cause many more reallocations
than required.
Enter the Queue. I couldn't just use std::BinaryHeap, for two reasons:
1. It's a max-heap
2. It's not stable: the order of equal elements is unspecified
The Queue fixes both of these problems, first by wrapping entries in
std::cmp::Reverse internally, and second by guaranteeing that equal
elements are returned in the order they were added.
These two attributes allow me to use Scanner.stats.read (number of
bytes consumed so far) and a bit of elbow grease to get my tokens out in
the right order.
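To illustrate the idea, here's a minimal sketch of such a stable min-heap
(names like Entry and Queue are illustrative, not the actual API): each
entry pairs the byte offset it was read at with an insertion counter, so
equal offsets tie-break on insertion order.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

struct Entry<T> {
    read: usize, // byte offset in the buffer (e.g. Scanner.stats.read)
    seq: u64,    // insertion counter; makes equal offsets stable
    token: T,
}

// Compare on (read, seq) only; the token itself never affects ordering.
impl<T> PartialEq for Entry<T> {
    fn eq(&self, other: &Self) -> bool {
        (self.read, self.seq) == (other.read, other.seq)
    }
}
impl<T> Eq for Entry<T> {}
impl<T> Ord for Entry<T> {
    fn cmp(&self, other: &Self) -> Ordering {
        (self.read, self.seq).cmp(&(other.read, other.seq))
    }
}
impl<T> PartialOrd for Entry<T> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

struct Queue<T> {
    // Reverse flips std's max-heap into the min-heap we need
    heap: BinaryHeap<Reverse<Entry<T>>>,
    seq: u64,
}

impl<T> Queue<T> {
    fn new() -> Self {
        Queue { heap: BinaryHeap::new(), seq: 0 }
    }

    // Amortized O(1): push only reallocates like the backing Vec
    fn push(&mut self, read: usize, token: T) {
        self.seq += 1;
        self.heap.push(Reverse(Entry { read, seq: self.seq, token }));
    }

    // O(log n): returns the token with the smallest offset; equal
    // offsets come out in the order they were pushed
    fn pop(&mut self) -> Option<T> {
        self.heap.pop().map(|Reverse(entry)| entry.token)
    }
}
```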
Also improve the various self.key.possible calls to more accurately
determine whether a key is required.
This commit adds support for the Scanner to pick up
Flow{Mapping,Sequence}{Start,End} tokens, but _does not_ yet allow for
FlowEntry tokens.
I misunderstood YAML key semantics around implicit keys, thinking that
only plain (i.e. neither flow nor block style) scalars required length
and line-limit checking, but it turns out that _all_ possible implicit
key variants have these limits, so I've added checks here and corrected
a bad test.
The external Scanner API now requires a .tokens list to add tokens
into.
Where possible, it will still attempt to only store one token per call,
however it may store more if the situation requires, typically when
calculating if a given scalar is a key, or when handling indentation
tokens.
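As a rough sketch of the caller-facing shape (all names here are
invented for illustration, with string slices standing in for real
Tokens):

```rust
// Toy scanner: the caller supplies the token list, and the scanner
// appends into it rather than returning one token per call.
struct Scanner<'b> {
    buffer: &'b str,
    pos: usize,
}

impl<'b> Scanner<'b> {
    fn new(buffer: &'b str) -> Self {
        Scanner { buffer, pos: 0 }
    }

    // Appends at least one "token" per call while input remains; a real
    // scanner may append several, e.g. while deciding if a scalar is a
    // key, or when unwinding indentation
    fn scan_next(&mut self, tokens: &mut Vec<&'b str>) -> bool {
        let rest = &self.buffer[self.pos..];
        if rest.is_empty() {
            return false;
        }
        // Split on the next space (stand-in for real tokenization)
        let end = rest.find(' ').unwrap_or(rest.len());
        tokens.push(&rest[..end]);
        self.pos += (end + 1).min(rest.len());
        true
    }
}
```

Because the caller owns the Vec, it can inspect or drain tokens between
calls without being tied to a mutable borrow of the scanner's internals.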
This is part of a series of API changes I'll be making, which will allow
allocations in the scanner code. This change is being made for a few
reasons:
1. Allows me to make the Scanner API nicer, as callers will only need
to pass in the underlying data being scanned, and will not be tied to
a mutable lifetime which limits them to scanning tokens one at a
time.
2. Makes the code simpler, as I no longer need to ensure the mutable
'owned' lifetime is honored throughout the call stack.
3. I'll need to allocate anyway for the indentation stack, so avoiding
   allocations elsewhere is less important.
This module contains the beginnings of the state required to track and
store tokens which may be "found" out of sequence, notably when we first
need to parse a scalar (Token #1) then check if a value sequence follows
it (Token #2), and if so, return a key token first (Token #3), where the
correct order of tokens is:
Token #3 -> Token #1 -> Token #2
We also need to track in the scanner whether a key is even possible,
e.g. if we just parsed a value token, the next scalar _is not_ a key.
This allows callers to decide when to convert the range into a Token
ref, which will be important when we need to save a scalar because we
need to return a Key Token first.
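A toy sketch of that flow (the Token variants and scan_maybe_key are
invented for illustration): the scalar is held as a range into the
buffer, and only once we know whether ':' follows do we materialize it,
emitting the Key token ahead of it when needed.

```rust
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Key,
    Scalar(&'a str),
    Value,
}

// Hold the scalar as a byte range rather than a Token, so we can emit
// a Key token ahead of it if a value indicator (':') follows.
fn scan_maybe_key<'a>(buffer: &'a str, tokens: &mut Vec<Token<'a>>) {
    let scalar_end = buffer
        .find(|c: char| c == ':' || c == '\n')
        .unwrap_or(buffer.len());
    let range = 0..scalar_end; // saved range, not yet a Token

    if buffer[scalar_end..].starts_with(':') {
        // A value follows: the saved scalar was an implicit key, so
        // the Key token goes out first (Token #3 -> #1 -> #2)
        tokens.push(Token::Key);
        tokens.push(Token::Scalar(buffer[range].trim_end()));
        tokens.push(Token::Value);
    } else {
        // No value: the saved range becomes a plain scalar token
        tokens.push(Token::Scalar(buffer[range].trim_end()));
    }
}
```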