Feature/scanner/contexts #17

Merged
bazaah merged 27 commits from feature/scanner/contexts into master 2021-08-01 07:12:47 +00:00
bazaah commented 2021-07-31 20:24:58 +00:00 (Migrated from github.com)

This PR adds support for YAML flow and block contexts, adds a stable min heap Queue, and updates the Scanner to use these.

Components

  • Add Scanner.Context
  • Support all flow collection token types
    • MappingStart
    • SequenceStart
    • MappingEnd
    • SequenceEnd
    • Entry
  • Switch Tokens typedef to Queue<TokenEntry>
  • Update Scanner.Key to save positions in the byte stream as possible Key locations
  • Update Scanner broadly to use both Context and the refactored Key
  • Add partial support for block context tokens
  • Update tests

Context

This is another Scanner subsystem, responsible for tracking the current YAML context (block or flow). The context determines which constructs are legal, and many indicators are valid in only one of the two. The classic example is the block scalar:

|
Hi from a block scalar!

Block scalars are only allowed in the block context.
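
As a rough illustration of the idea (this is not the actual Scanner.Context code), a context tracker can be as small as a counter of unclosed flow collections. The type layout and the method names `flow_increment`, `flow_decrement` and `allows_block_scalar` are assumptions made up for this sketch:

```rust
/// Hypothetical sketch of a context tracker; the real Scanner.Context
/// may look quite different.
#[derive(Debug, Default)]
pub struct Context {
    /// Number of currently unclosed flow collections ('[' or '{').
    flow: usize,
}

impl Context {
    pub fn new() -> Self {
        Self::default()
    }

    /// We are in the flow context while any flow collection is open.
    pub fn is_flow(&self) -> bool {
        self.flow > 0
    }

    /// Everything outside a flow collection is the block context.
    pub fn is_block(&self) -> bool {
        !self.is_flow()
    }

    /// Call on '[' or '{'.
    pub fn flow_increment(&mut self) {
        self.flow += 1;
    }

    /// Call on ']' or '}'. Saturates so a stray closer cannot underflow.
    pub fn flow_decrement(&mut self) {
        self.flow = self.flow.saturating_sub(1);
    }

    /// Block scalars ('|' and '>') are only legal in the block context.
    pub fn allows_block_scalar(&self) -> bool {
        self.is_block()
    }
}
```

The Scanner would then consult something like `allows_block_scalar()` before treating a `|` as the start of a block scalar.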

Queue

Reposted from my commit message: 4845a83

This structure will replace the current Vec as the way the Scanner
returns tokens. This change is occurring because:

The genesis of this structure is a need in the Scanner for fast pops,
and fast inserts. A binary heap gives me both, namely O(1) inserts on
average and O(log(n)) pops -- with allocations amortized.

The need comes from how YAML handles implicit keys: you don't know
whether you have one until you hit a value indicator (': '). The easiest
solution is to save these potential implicit keys and later insert them
into the token list at the correct position, but with a Vec this would
require memcopy'ing everything after key.pos and could cause many more
reallocations than required.

Enter the Queue. I couldn't just use std::collections::BinaryHeap for two
reasons:

  1. It's a max heap
  2. It's not stable; the order of equal elements is unspecified

The Queue fixes both of these problems, first by innately using
std::cmp::Reverse, and second by guaranteeing that equal elements are
returned in the order added.

These two attributes allow me to use Scanner.stats.read (number of
bytes consumed so far) and a bit of elbow grease to get my tokens out in
the right order.
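
For readers unfamiliar with the trick, here is a minimal sketch of one way to build a stable min heap on top of the standard library. It is not the Queue from this PR: the names `StableMinQueue` and `Slot`, and the use of an insertion counter as the tie-breaker, are assumptions chosen for the example.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// Heap slot ordered by `key` first, then by insertion order.
/// The payload `item` takes no part in the ordering.
struct Slot<T> {
    key: usize, // e.g. the byte offset the token belongs at
    seq: u64,   // insertion counter; breaks ties first-in, first-out
    item: T,
}

impl<T> PartialEq for Slot<T> {
    fn eq(&self, other: &Self) -> bool {
        (self.key, self.seq) == (other.key, other.seq)
    }
}

impl<T> Eq for Slot<T> {}

impl<T> PartialOrd for Slot<T> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl<T> Ord for Slot<T> {
    fn cmp(&self, other: &Self) -> Ordering {
        (self.key, self.seq).cmp(&(other.key, other.seq))
    }
}

/// Hypothetical stable min queue: the smallest key pops first, and
/// equal keys pop in the order they were pushed.
pub struct StableMinQueue<T> {
    // `Reverse` flips BinaryHeap's max-heap ordering into a min heap.
    heap: BinaryHeap<Reverse<Slot<T>>>,
    next_seq: u64,
}

impl<T> StableMinQueue<T> {
    pub fn new() -> Self {
        Self { heap: BinaryHeap::new(), next_seq: 0 }
    }

    pub fn push(&mut self, key: usize, item: T) {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.heap.push(Reverse(Slot { key, seq, item }));
    }

    pub fn pop(&mut self) -> Option<(usize, T)> {
        self.heap.pop().map(|Reverse(slot)| (slot.key, slot.item))
    }
}
```

Pushing `(5, "value")`, `(0, "key")`, `(5, "scalar")` in that order and then popping yields `"key"`, `"value"`, `"scalar"`: the lowest key comes out first, and the two entries with key 5 keep their insertion order.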

Refactor of Key

In order to take advantage of the Queue, I modified Scanner.Key to save the position of a potential key, rather than just whether a key is possible. This allows me to insert the key later at the correct buffer position.
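
To make the flow concrete, here is a toy sketch of scanning a single `key: value` pair. Everything in it is an assumption for illustration: the `Key` tracker with `save`/`take`, the three-variant `Token` enum, the `scan_pair` function, and the convention that a token is queued under the number of bytes read once it has been consumed. It uses a plain `BinaryHeap` with `Reverse` rather than a stable queue.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Token {
    Key,
    Value,
    Scalar,
}

/// Hypothetical key tracker: remembers *where* a key could start,
/// not merely whether one is possible.
#[derive(Default)]
pub struct Key {
    possible: Option<usize>,
}

impl Key {
    pub fn save(&mut self, read: usize) {
        self.possible = Some(read);
    }

    pub fn take(&mut self) -> Option<usize> {
        self.possible.take()
    }
}

/// Toy scanner for input shaped like "foo: bar" (or just "foo").
pub fn scan_pair(input: &str) -> Vec<Token> {
    let mut queue: BinaryHeap<Reverse<(usize, Token)>> = BinaryHeap::new();
    let mut key = Key::default();
    let mut read = 0; // bytes consumed so far

    // The upcoming scalar might be an implicit key; remember where it began.
    key.save(read);

    // Consume the first scalar and queue it under the updated read count.
    read = input.find(": ").unwrap_or(input.len());
    queue.push(Reverse((read, Token::Scalar)));

    // Only at ": " do we learn that the scalar really was a key.
    if input[read..].starts_with(": ") {
        if let Some(pos) = key.take() {
            // Queued late, but under the saved (earlier) position.
            queue.push(Reverse((pos, Token::Key)));
        }
        read += 1; // consume ':'
        queue.push(Reverse((read, Token::Value)));
        read = input.len(); // skip the space and consume the value scalar
        queue.push(Reverse((read, Token::Scalar)));
    }

    // Drain in ascending position order.
    let mut tokens = Vec::new();
    while let Some(Reverse((_, token))) = queue.pop() {
        tokens.push(token);
    }
    tokens
}
```

For `"foo: bar"` the `Key` token is pushed after the first `Scalar`, yet it drains first because it carries the saved position 0. No existing token has to be shifted to make room for it.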

General chaos in Scanner

This PR moves a lot of Scanner code around. Almost all of it is simple rewrites to use the new functions rather than "new code" per se; the exceptions are the calls to self.context and the un/roll_indent functions.
