Commit Graph

132 Commits

Author SHA1 Message Date
Paul Stemmet 971e2c76d4 lib/scanner: use simple_key_allowed over self.key_* 2021-08-01 08:12:47 +01:00
Paul Stemmet cda72c58fa lib/scanner: switch Tokens->Queue<TokenEntry>, add Scanner.simple_key_allowed 2021-08-01 08:12:47 +01:00
Paul Stemmet 87640bd1f4 scanner/key: refactor
This commit completely rewrites the key subsystem of the Scanner. Rather
than merely tracking whether a key could be added, Key now manages the
state tracking for potential implicit keys.
2021-08-01 08:12:47 +01:00
Paul Stemmet a82aa3c35d scanner/macros: add enqueue! 2021-08-01 08:12:47 +01:00
Paul Stemmet 840868f82e scanner/entry: a custom Ord Token wrapper
A TokenEntry is designed as a wrapper for Tokens returned from the
Scanner, ensuring that they are returned from the Queue in an order that
mirrors where in the buffer the token was read.

This will allow me to push Tokens out of order, particularly when
handling Keys, and still have them returned in the expected order.
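
A minimal sketch of the idea, not the repository's code: the `Token`
variants, the `read_at` field and the `in_buffer_order` helper are
hypothetical names chosen for illustration. It assumes each entry records
the number of bytes consumed when its token was produced, and that a
deferred Key is recorded at the offset where its scalar began.

```rust
use std::cmp::Ordering;

/// Hypothetical token type; the real Token has more variants.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Token {
    Key,
    Value,
    Scalar(String),
}

/// Pairs a token with the buffer offset it belongs to (assumed field name).
#[derive(Debug, Clone)]
pub struct TokenEntry {
    pub read_at: usize,
    pub token: Token,
}

// Equality and ordering look only at the offset, never at the token itself,
// so that Eq stays consistent with Ord.
impl PartialEq for TokenEntry {
    fn eq(&self, other: &Self) -> bool {
        self.read_at == other.read_at
    }
}

impl Eq for TokenEntry {}

impl PartialOrd for TokenEntry {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for TokenEntry {
    fn cmp(&self, other: &Self) -> Ordering {
        self.read_at.cmp(&other.read_at)
    }
}

/// Returns the tokens in buffer order, whatever order they were pushed in.
/// Vec::sort is stable, so entries with equal offsets keep their push order.
pub fn in_buffer_order(mut entries: Vec<TokenEntry>) -> Vec<Token> {
    entries.sort();
    entries.into_iter().map(|entry| entry.token).collect()
}
```

With this ordering, a Key entry pushed last but recorded at offset 0 still
comes out ahead of the scalar and value that were pushed before it.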
2021-08-01 08:12:47 +01:00
Paul Stemmet 7e567aa8a9 lib/queue: add Queue, a stable min binary heap
This structure will replace the current Vec as the way the Scanner
returns tokens.

The genesis of this structure is a need in the Scanner for fast pops
and fast inserts. A binary heap gives me both, namely O(1) average-case
inserts and O(log(n)) pops, with allocations amortized.

This is because of how YAML handles implicit keys... in that you don't
know whether you have one until you hit a value (': '). The easiest
solution is just to save these potential implicit keys and then insert
them into the token list at the correct position, but this would require
memcopy'ing everything >key.pos and potentially cause many more
reallocations than required.

Enter the Queue. I couldn't just use std::collections::BinaryHeap for
two reasons:

1. It's a max heap
2. It's not stable, the order of equal elements is unspecified

The Queue fixes both of these problems, first by innately using
std::cmp::Reverse, and second by guaranteeing that equal elements are
returned in the order added.

These two attributes allow me to use Scanner.stats.read (number of
bytes consumed so far) and a bit of elbow grease to get my tokens out in
the right order.
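
One way to get both properties on top of the standard library, sketched
under assumptions: `StableMinQueue`, `Entry`, `key` and `seq` are
hypothetical names, and the actual Queue may be implemented quite
differently (for example as a hand-written heap). The sketch wraps
entries in `std::cmp::Reverse` to turn the max heap into a min heap, and
adds an insertion counter as a tie-breaker to make it stable.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// Heap entry ordered by (key, insertion sequence); the value is ignored.
struct Entry<T> {
    key: usize,
    seq: u64,
    value: T,
}

impl<T> PartialEq for Entry<T> {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl<T> Eq for Entry<T> {}

impl<T> PartialOrd for Entry<T> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl<T> Ord for Entry<T> {
    fn cmp(&self, other: &Self) -> Ordering {
        // The sequence number breaks ties, so equal keys pop oldest first.
        (self.key, self.seq).cmp(&(other.key, other.seq))
    }
}

/// A stable min-priority queue (illustrative stand-in for the Queue).
pub struct StableMinQueue<T> {
    heap: BinaryHeap<Reverse<Entry<T>>>,
    next_seq: u64,
}

impl<T> StableMinQueue<T> {
    pub fn new() -> Self {
        Self { heap: BinaryHeap::new(), next_seq: 0 }
    }

    /// Inserts `value` with priority `key` (e.g. bytes read so far).
    pub fn push(&mut self, key: usize, value: T) {
        let seq = self.next_seq;
        self.next_seq += 1;
        // Reverse flips BinaryHeap's max-heap order into a min-heap.
        self.heap.push(Reverse(Entry { key, seq, value }));
    }

    /// Removes the value with the smallest key, oldest first among equals.
    pub fn pop(&mut self) -> Option<T> {
        self.heap.pop().map(|Reverse(entry)| entry.value)
    }
}
```

The counter costs eight bytes per entry, which seems a fair trade for not
having to shift every later token when a key is discovered late.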
2021-08-01 08:12:47 +01:00
Paul Stemmet 5212077ae8 lib/scanner: add tests for flow contexts 2021-08-01 08:12:47 +01:00
Paul Stemmet 24a8f2b211 lib/scanner: add test for simple flow sequence 2021-08-01 08:12:47 +01:00
Paul Stemmet 9e295b8f72 scanner/context: fix flow de/increment
I forgot, when removing the macro, that I actually need to assign the
result of the computation to self.flow.
2021-08-01 08:12:47 +01:00
Paul Stemmet 37221ad020 lib/scanner: add flow/block entry scan functions
still need to add tests for them
2021-08-01 08:12:47 +01:00
Paul Stemmet 6b8965268c lib/scanner: add un/roll_indent functions
for handling the indent increment / decrement of the Scanner
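
A rough sketch of what such a pair of functions could look like, loosely
modelled on the roll/unroll approach used by libyaml. Everything here is
an assumption for illustration: the `IndentStack` type, its fields, the
reduced `Token` enum and the method signatures are not taken from the
repository.

```rust
/// Reduced, hypothetical token set for the example.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Token {
    BlockMappingStart,
    BlockEnd,
}

/// Tracks the current block indentation and the enclosing levels.
/// `None` means no block collection is open yet.
#[derive(Debug, Default)]
pub struct IndentStack {
    current: Option<usize>,
    saved: Vec<Option<usize>>,
}

impl IndentStack {
    /// Opens a new block at `column` if it is deeper than the current one,
    /// emitting the given start token.
    pub fn roll(&mut self, column: usize, start: Token, tokens: &mut Vec<Token>) {
        if self.current < Some(column) {
            self.saved.push(self.current);
            self.current = Some(column);
            tokens.push(start);
        }
    }

    /// Closes every block deeper than `column`, emitting one BlockEnd per
    /// closed level. Passing `None` closes all open blocks.
    pub fn unroll(&mut self, column: Option<usize>, tokens: &mut Vec<Token>) {
        while self.current > column {
            tokens.push(Token::BlockEnd);
            self.current = self.saved.pop().flatten();
        }
    }
}
```

The sketch relies on `Option`'s ordering, where `None` sorts below every
`Some(_)`, to stand in for the "no indent yet" state.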
2021-08-01 08:12:47 +01:00
Paul Stemmet 5ec3d0ae2b scanner/error: add InvalidBlockEntry 2021-08-01 08:12:47 +01:00
Paul Stemmet e24fe38a7e lib/scanner: add unit tests for flow_collection_* methods
though only for the naive cases, e.g. '{}' and '[]'.
More to come...
2021-08-01 08:12:47 +01:00
Paul Stemmet a84f64e2b7 lib/scanner: track YAML context, add flow_collection_* methods
Also improve the various self.key.possible calls to calculate more
correctly whether a key is required.

This commit adds support for the Scanner to pick up
Flow{Mapping,Sequence}{Start,End} tokens, but _does not_ yet allow for
FlowEntry tokens.
2021-08-01 08:12:47 +01:00
Paul Stemmet c2734e33e1 scanner/context: add Context
Context handles tracking the current YAML context, and provides the
mechanisms to update it efficiently
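
As an illustration only, context tracking of this kind could be a nesting
counter with checked arithmetic. The method names and the single `flow`
counter are assumptions, and `ScanError` here is a one-variant stand-in
for the scanner's error type.

```rust
/// Stand-in for the scanner error type; only the variant used here.
#[derive(Debug, PartialEq, Eq)]
pub enum ScanError {
    IntOverflow,
}

/// Tracks how deeply nested we are inside flow collections.
/// A level of zero means we are in the block context.
#[derive(Debug, Default)]
pub struct Context {
    flow: usize,
}

impl Context {
    pub fn is_flow(&self) -> bool {
        self.flow > 0
    }

    pub fn is_block(&self) -> bool {
        !self.is_flow()
    }

    /// Enters a flow collection ('[' or '{'), returning the new level.
    pub fn flow_increment(&mut self) -> Result<usize, ScanError> {
        // The result is assigned back to self.flow; computing it without
        // storing it would leave the context unchanged.
        self.flow = self.flow.checked_add(1).ok_or(ScanError::IntOverflow)?;
        Ok(self.flow)
    }

    /// Leaves a flow collection (']' or '}'), returning the new level.
    pub fn flow_decrement(&mut self) -> Result<usize, ScanError> {
        self.flow = self.flow.checked_sub(1).ok_or(ScanError::IntOverflow)?;
        Ok(self.flow)
    }
}
```

Returning an error on underflow, rather than saturating, would surface an
unbalanced closing bracket to the caller instead of hiding it.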
2021-08-01 08:12:47 +01:00
Paul Stemmet 93470734ba scanner/error: add IntOverflow variant 2021-08-01 08:12:47 +01:00
Paul Stemmet d32fc77b17 lib/scanner: rename key.impossible -> key.forbidden 2021-08-01 08:12:47 +01:00
Paul Stemmet 0fe2e99426 lib/token: cmp with marker by ref (clippy) 2021-07-25 12:41:57 +01:00
Paul Stemmet e35a1e87db lib/scanner: fix check_is_key, correct test
I misunderstood YAML key semantics around implicit keys, thinking that
only plain (e.g. neither flow nor block) scalars required length and
line limit checking, but it turns out that _all_ possible implicit key
variants have these limits, so I've added checks here and corrected a
bad test.
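
For reference, the YAML specification restricts an implicit key to a
single line and to at most 1024 characters. A sketch of such a check
follows; the `Mark` type and the `implicit_key_allowed` function are
hypothetical and do not reflect the actual check_is_key signature.

```rust
/// Upper bound on the span of an implicit key, per the YAML spec.
const MAX_IMPLICIT_KEY_CHARS: usize = 1024;

/// A position in the input (hypothetical shape): line and character index.
#[derive(Debug, Clone, Copy)]
pub struct Mark {
    pub line: usize,
    pub chars: usize,
}

/// Returns true if a candidate key spanning `start..end` satisfies both
/// limits: it stays on one line and is at most 1024 characters long.
pub fn implicit_key_allowed(start: Mark, end: Mark) -> bool {
    let same_line = start.line == end.line;
    let span = end.chars.saturating_sub(start.chars);
    same_line && span <= MAX_IMPLICIT_KEY_CHARS
}
```

The same check applies regardless of whether the candidate is a plain,
quoted, or other node, which is the point of the fix described above.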
2021-07-25 12:41:57 +01:00
Paul Stemmet dd04944fe9 lib/scanner: clippy, fmt 2021-07-25 12:41:57 +01:00
Paul Stemmet eee164caa1 scanner/key: remove old code
Left over from before I decided to use a token array for token storage.
2021-07-25 12:41:57 +01:00
Paul Stemmet b1713c73f6 lib/scanner: allow multi token calls
The external Scanner API now requires a .tokens list to add tokens
into.

Where possible, it will still attempt to store only one token per call;
however, it may store more if the situation requires, typically when
calculating if a given scalar is a key, or when handling indentation
tokens.
2021-07-25 12:41:57 +01:00
Paul Stemmet 53a8c8eccb scanner/key: adjustments for the API changes 2021-07-25 12:41:57 +01:00
Paul Stemmet 8986f36f00 scanner/tag: remove ref, return owned Slice variants
This is part of an API change I'll be making, which will allow
allocations in the scanner code. This change is being made for a few
reasons.

1. Allows me to make the Scanner API nicer, as callers will only need
   to pass in the underlying data being scanned, and will not be tied to
   a mutable lifetime which limits them to scanning tokens one at a
   time.
2. Makes the code simpler, as I no longer need to ensure the mutable
   'owned' lifetime is honored throughout the call stack.
3. I'll need to allocate anyway for the indentation stack, and thus not
   allocating in other places that are sensible is less important.
2021-07-25 12:41:57 +01:00
Paul Stemmet 8f84972cd5 scalar: tidy syntax / includes 2021-07-25 12:41:57 +01:00
Paul Stemmet ea64559444 lib/token: add marker
This commit adds a Marker enum which mirrors the variants of Token, but
is data-less.
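
A small sketch of the pattern, with a deliberately reduced and
hypothetical set of variants; the real Token and Marker enums have more
of them, and the conversion and comparison impls shown are assumptions.

```rust
/// Reduced token type carrying data in some variants.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Token {
    StreamStart,
    Anchor(String),
    Scalar(String),
}

/// Data-less mirror of Token: one variant per Token variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Marker {
    StreamStart,
    Anchor,
    Scalar,
}

impl From<&Token> for Marker {
    fn from(token: &Token) -> Self {
        // An exhaustive match means adding a Token variant without a
        // matching Marker variant fails to compile.
        match token {
            Token::StreamStart => Marker::StreamStart,
            Token::Anchor(_) => Marker::Anchor,
            Token::Scalar(_) => Marker::Scalar,
        }
    }
}

/// Lets a token be compared against a marker, ignoring its payload.
impl PartialEq<Marker> for Token {
    fn eq(&self, marker: &Marker) -> bool {
        Marker::from(self) == *marker
    }
}
```

This makes it possible to ask "is this a scalar?" without constructing a
dummy payload to compare against.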
2021-07-25 12:41:57 +01:00
Paul Stemmet 37278ab219 WIP 2021-07-25 12:41:57 +01:00
Paul Stemmet cd7859f3d4 lib/scanner: save the scanned scalar's stats 2021-07-25 12:41:57 +01:00
Paul Stemmet 696b71b083 lib/scanner: remove reset_stale_keys, dbgs 2021-07-25 12:41:57 +01:00
Paul Stemmet edc70ed81a WIP 2021-07-25 12:41:57 +01:00
Paul Stemmet 7857839d6c scanner/scalar: add tests to catch trailing ws bugs 2021-07-25 12:41:57 +01:00
Paul Stemmet e416e3d0ab scanner/scalar: bugfix always count whitespace 2021-07-25 12:41:57 +01:00
Paul Stemmet a59527944b lib/scanner: add value token scanner, track keys 2021-07-25 12:41:57 +01:00
Paul Stemmet a0e184431f scanner/key: add, structs for managing key tokens
This module contains the beginnings of the state required to track and
store tokens which may be "found" out of sequence, notably when we first
need to parse a scalar (Token #1) then check if a value sequence follows
it (Token #2), and if so, return a key token first (Token #3), where the
correct order of tokens is:

    Token #3 -> Token #1 -> Token #2

We also need to track in the scanner whether a key is even possible, e.g.
if we just parsed a value token, the next scalar _is not_ a key.
2021-07-25 12:41:57 +01:00
Paul Stemmet 81975e197f scanner/scalar: return ScalarRange over Ref
this allows callers to decide when to convert the range into a Token
ref, which will be important when we need to save a scalar because we
need to return a Key Token first
2021-07-25 12:41:57 +01:00
Paul Stemmet ffcdce5961 lib/token: derive clone on style 2021-07-25 12:41:57 +01:00
Paul Stemmet cebd1d6e7d lib/scanner: add test for implicit key
So I can have a case to test my implementation against
2021-07-25 12:41:57 +01:00
Paul Stemmet 17f09e4d30 scanner/token: fix primary branch in scan_node_tag
we were double consuming a character, add a unit test to catch this
issue in the future
2021-06-29 22:50:19 +00:00
Paul Stemmet 298b15cad7 lib/scanner: document MStats 2021-06-29 23:14:30 +01:00
Paul Stemmet 6490de8974 lib/scanner: add stats test to unit tests 2021-06-29 23:14:30 +01:00
Paul Stemmet 393ce6372b scalar/flow: fix unit tests 2021-06-29 23:14:30 +01:00
Paul Stemmet 5e025374cc lib/scanner, scalar/flow: track stats in flow_scalar
Also fix the incorrect document stream indicator check now that we have
a column to check against
2021-06-29 23:14:30 +01:00
Paul Stemmet 32ada94850 lib/scanner: track stats in anchor 2021-06-29 23:14:30 +01:00
Paul Stemmet de2960a325 lib/scanner, scanner/tag: track stats in node,directive tags 2021-06-29 23:14:30 +01:00
Paul Stemmet 46fcd61aec lib/scanner: track stats in version directive 2021-06-29 23:14:30 +01:00
Paul Stemmet 911f861320 lib/scanner: track stats in document_marker 2021-06-29 23:14:30 +01:00
Paul Stemmet f2399e0eb4 lib/scanner: track stats in eat_whitespace 2021-06-29 23:14:30 +01:00
Paul Stemmet 69574b3628 scanner/macros: allow advance! to optionally update :stats 2021-06-29 23:14:30 +01:00
Paul Stemmet cb6d64dfc7 lib/scanner: add MStats
A struct for doing bookkeeping about where we are in the buffer:

1. How much we've read
2. How many lines we've seen
3. The current column

I'll likely add variants to advance!, as it's the primary method used to
traverse the buffer.

This will likely be passed as an extra argument down the various scan
call stacks, and care will need to be taken to ensure we're handling
line breaks correctly (because I bet we're not currently)

Tests will need to be updated to test that we're getting the stats we
expect.
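
A sketch of the three counters listed above, assuming hypothetical field
names and an `update` method that is not part of the commit. It treats
"\r\n" as a single line break, which is the sort of detail the line break
caveat is about.

```rust
/// Bookkeeping for the scanner's position (field names are assumptions).
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct MStats {
    /// Bytes consumed so far.
    pub read: usize,
    /// Line breaks seen so far.
    pub lines: usize,
    /// Characters since the last line break.
    pub column: usize,
}

impl MStats {
    /// Accounts for `consumed`, a chunk of the buffer that was just scanned.
    /// Limitation: a "\r\n" pair split across two calls counts as two breaks.
    pub fn update(&mut self, consumed: &str) {
        let mut chars = consumed.chars().peekable();
        while let Some(c) = chars.next() {
            self.read += c.len_utf8();
            match c {
                '\r' => {
                    // Fold a following '\n' into the same line break.
                    if chars.peek() == Some(&'\n') {
                        chars.next();
                        self.read += 1;
                    }
                    self.lines += 1;
                    self.column = 0;
                }
                '\n' => {
                    self.lines += 1;
                    self.column = 0;
                }
                _ => self.column += 1,
            }
        }
    }
}
```

Counting `read` in bytes and `column` in characters keeps slicing into
the buffer cheap while still reporting positions a person can follow.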
2021-06-29 23:14:30 +01:00
Paul Stemmet 8be1ca8329 lib/scanner: fix tokens! ScanIter lifetimes 2021-06-29 23:14:30 +01:00