Commit graph

214 commits

Author SHA1 Message Date
Paul Stemmet 836716f5a3 scalar/plain: cache! before fetch
also fix the call stack in lib/scanner
2021-09-09 20:29:29 +01:00
Paul Stemmet 1f22f9d609 scanner/anchor: cache! before fetch 2021-09-09 20:29:29 +01:00
Paul Stemmet 564ee1476e scalar/escape: fix tests 2021-09-09 20:29:29 +01:00
Paul Stemmet f1fa8a6620 scalar/flow: cache! before fetch
also fixes the call stack in lib/scanner
2021-09-09 20:29:29 +01:00
Paul Stemmet 13ff795bf3 scanner/tag: cache! before fetch
note that this commit also fixes code that fetch_directive uses
2021-09-09 20:29:29 +01:00
Paul Stemmet 6d82e8b045 scanner/directive: cache! before fetch 2021-09-09 20:29:29 +01:00
Paul Stemmet 86cc5e72d9 lib/scanner: cache! before fetch in scan_next_token 2021-09-09 20:29:29 +01:00
Paul Stemmet 69e202a4cb lib/scanner: add opts to scan_tokens, eat_whitespace cache!
This commit adds initial support for cache!-ing characters in the
Scanner, starting with eat_whitespace.
2021-09-09 20:29:29 +01:00
Paul Stemmet 9b13d54e44 scanner/macros: add cache!
cache! allows the Scanner to state that it requires 'N' more codepoints
before it can correctly process the byte stream.

Its primary purpose is its interaction with O_EXTENDABLE, which allows
the caller to hint to the Scanner that the buffer could grow. Likewise,
cache! returns an error that hints to the caller that they should extend
the byte stream before calling the Scanner again -- or pass opts without
O_EXTENDABLE.
2021-09-09 20:29:29 +01:00
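
As a rough illustration of the cache! contract described in the commit above, the following is a minimal sketch written as a plain function rather than a macro. Every name here (`cache`, `ScanError`, the bit value of `O_EXTENDABLE`) is hypothetical, and the behaviour without O_EXTENDABLE (report whatever is available) is an assumption, not something the commit message states.

```rust
// Hypothetical sketch; not the crate's actual cache! macro.
const O_EXTENDABLE: u32 = 0b01; // assumed bit value

#[derive(Debug, PartialEq)]
enum ScanError {
    /// Hint to the caller: extend the byte stream, then call again.
    Extend,
}

/// Check that `buffer` holds `n` codepoints and return how many bytes
/// they occupy. If the buffer runs out and the caller said it could
/// grow, ask for more; otherwise (assumed) report what is available.
fn cache(buffer: &str, n: usize, opts: u32) -> Result<usize, ScanError> {
    let mut bytes = 0;
    let mut chars = buffer.chars();

    for _ in 0..n {
        match chars.next() {
            Some(c) => bytes += c.len_utf8(),
            None if (opts & O_EXTENDABLE) != 0 => return Err(ScanError::Extend),
            None => break, // buffer is final, so this is the real end of stream
        }
    }

    Ok(bytes)
}
```

Under these assumptions a caller that receives `ScanError::Extend` would read more input and retry, while a caller holding the complete stream would pass opts without O_EXTENDABLE and never see that error.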
Paul Stemmet 0663bebd0c scanner/error: add variant Extend
This variant suggests to the caller that they should extend the byte
stream before calling the Scanner again.
2021-09-09 20:29:29 +01:00
Paul Stemmet 0b023bd062 scanner/flag: add Flags for Scanner control
This struct is a C style bitflag container, which controls various
aspects of Scanner functionality.

The initial flags available are O_ZEROED, O_EXTENDABLE and O_LAZY. See
each flag's documentation for an explanation.
2021-09-09 20:29:29 +01:00
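
For readers unfamiliar with C style bitflag containers, here is a minimal hand-rolled sketch of the idea. The repository itself pulls in the bitflags crate for this; the version below avoids the dependency so it stands alone. The bit values, the `contains` helper, and the reading of O_ZEROED as "no flags set" are all assumptions, and the meaning of O_LAZY is not described in this log.

```rust
use std::ops::BitOr;

// Hypothetical sketch of a bitflag container; bit values are assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Flags(u32);

const O_ZEROED: Flags = Flags(0b00); // assumed: the empty flag set
const O_EXTENDABLE: Flags = Flags(0b01); // caller may extend the buffer
const O_LAZY: Flags = Flags(0b10); // semantics not covered here

impl Flags {
    /// True if every bit set in `other` is also set in `self`.
    fn contains(self, other: Flags) -> bool {
        self.0 & other.0 == other.0
    }
}

impl BitOr for Flags {
    type Output = Flags;

    /// Combine two flag sets, as in `O_EXTENDABLE | O_LAZY`.
    fn bitor(self, rhs: Flags) -> Flags {
        Flags(self.0 | rhs.0)
    }
}
```

A caller would build an option set with `O_EXTENDABLE | O_LAZY` and the Scanner would branch on `opts.contains(O_EXTENDABLE)`.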
Paul Stemmet 6147424c30 Cargo: add dependencies.bitflags = 1 2021-09-09 20:29:29 +01:00
Paul Stemmet c1aeb0d3f0 Cargo: dependencies.anyhow -> dev-dependencies.anyhow
anyhow is used for testing, not in library code.
2021-09-07 18:32:26 +01:00
Paul Stemmet da15105e1d lib: prune dead module reader
The code here was from earlier experiments and is not relevant or useful
anymore.
2021-09-07 18:32:26 +01:00
Paul Stemmet 82a6e70d8b lib/scanner: prune dead documentation 2021-09-07 18:18:53 +01:00
Paul Stemmet c24f9ef286 lib/scanner: move Scanner.eat_whitespace out of fetch_* methods
put with the other various "helper" functions
2021-09-07 18:18:53 +01:00
Paul Stemmet 2afc2b2606 lib/scanner: rename Scanner token retrieval methods to fetch_*
This split allows future maintainers (i.e. me) to quickly know whether a
function handles the conversion of bytes into tokens -- scan_* function
family -- or handles updating the Scanner's state -- the fetch_*
function family.

Typically one might think of the call stack as:

1. a Scanner
2. fetches a token
3. by scanning the byte stream
2021-09-07 18:18:53 +01:00
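
The fetch_* / scan_* split described in the commit above can be pictured with a small sketch. The types and signatures below are hypothetical and heavily simplified (for instance, the anchor name simply ends at the first whitespace); they only illustrate which side owns state and which side turns bytes into tokens.

```rust
// Hypothetical sketch of the naming convention; not the crate's real API.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Anchor(&'a str),
}

/// scan_*: converts bytes into a token and reports the bytes consumed.
/// It touches no Scanner state.
fn scan_anchor(buffer: &str) -> Option<(Token<'_>, usize)> {
    let rest = buffer.strip_prefix('&')?;
    let len = rest
        .find(|c: char| c.is_whitespace())
        .unwrap_or(rest.len());

    if len == 0 {
        return None; // a bare '&' has no name
    }

    Some((Token::Anchor(&rest[..len]), 1 + len))
}

struct Scanner {
    offset: usize,
}

impl Scanner {
    /// fetch_*: updates the Scanner's state around a scan_* call.
    fn fetch_anchor<'a>(&mut self, buffer: &'a str, tokens: &mut Vec<Token<'a>>) -> bool {
        match scan_anchor(&buffer[self.offset..]) {
            Some((token, consumed)) => {
                self.offset += consumed;
                tokens.push(token);
                true
            }
            None => false,
        }
    }
}
```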
Paul Stemmet 0f76f9bb08 lib/scanner: move test code into scanner/tests
ScanIter was never supposed to be used outside of tests
2021-09-07 18:18:53 +01:00
Paul Stemmet 633e461f4a scanner/anchor: refactor anchor scanning into its own module
and add scan_anchor, which the relevant Scanner method calls to scan the
anchor, bringing it more in line with other scan_* functions
2021-09-07 18:18:53 +01:00
Paul Stemmet 829f5c0e81 lib/scanner: merge crate:: and self:: use statements
Not sure why rustfmt decided to do imports that way, but I prefer a
single block.
2021-09-07 18:18:53 +01:00
Paul Stemmet 4bc2eb5c9f scanner/directive: move directive scanning to a separate module
and refactor out the scanning code into scan_directive which is called
from the relevant Scanner method. This makes directive scanning more
consistent with the other scanning functions
2021-09-07 18:18:53 +01:00
Paul Stemmet 842ed7cacb scanner/stats: move MStats into its own module
and update the various use statements, plus better documentation on the
fields / methods
2021-09-07 18:18:53 +01:00
Paul Stemmet 1ed9d45344 lib/scanner: use const indicators over byte literals 2021-09-07 18:18:53 +01:00
Paul Stemmet 0b343ffa72 lib/scanner: refactor tests
Split out the _massive_ tests module into smaller focused modules, one
per area, explained below:

- anchor      | For anchor '&' and alias '*' node tags
- collection  | For flow and block collections
- complex     | For interactions between token types
- directive   | For directives '%'
- document    | For doc starts '---' and endings '...'
- key         | For mapping keys explicit and implicit
- tag         | For node type tags '!!', '!'
- whitespace  | For whitespace chomping between tokens

This vastly reduces the size of lib/scanner's file, leading to notably
better performance by rustfmt and rust-analyzer
2021-09-07 18:18:53 +01:00
Paul Stemmet 369f49a248 lib/scanner: unit tests for block scalar token streams 2021-09-04 17:34:57 +01:00
Paul Stemmet 1f144caf76 lib/scanner: add catch all error, documentation 2021-09-04 17:34:57 +01:00
Paul Stemmet b26301cbd0 lib/scanner: add support for block scalars 2021-09-04 17:34:57 +01:00
Paul Stemmet 0f42818f4b scanner/error: add variant UnknownDelimiter
this is a catch-all error for when we've exhausted the possible token
delimiters.
2021-09-04 17:34:57 +01:00
Paul Stemmet 8f4cab4f5b scalar/block: add unit test for header comments 2021-09-04 17:34:57 +01:00
Paul Stemmet 87af3bc96b scalar/block: fix skip_blanks comment handling 2021-09-04 17:34:57 +01:00
Paul Stemmet 8505569d3e scalar/block: clippy lints 2021-09-04 17:34:57 +01:00
Paul Stemmet 645007938f scalar/block: code reorganization 2021-09-04 17:34:57 +01:00
Paul Stemmet 4ec311d684 scalar/block: documentation 2021-09-04 17:34:57 +01:00
Paul Stemmet 0ffccab6c3 scalar/block: add unit tests for scan_block_scalar 2021-09-04 17:34:57 +01:00
Paul Stemmet bcf1e405ec scalar/block: add scan_block_scalar 2021-09-04 17:34:57 +01:00
Paul Stemmet dba9212224 scanner/macros: add widthOf!
for determining the length of a UTF-8 encoded code point. Uses the bit
distribution of UTF-8 to determine the code point length
2021-09-04 17:34:57 +01:00
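
The "bit distribution" trick the widthOf! commit refers to is the standard UTF-8 property that the high bits of the leading byte encode the sequence length. A minimal sketch as a plain function follows; the name `width_of` and the choice to return 0 for a byte that cannot start a sequence are assumptions, not the macro's actual definition.

```rust
/// Byte length of the UTF-8 sequence introduced by `lead`, or 0 if
/// `lead` cannot start a sequence (a continuation byte or an invalid
/// prefix). Hypothetical stand-in for the widthOf! macro.
fn width_of(lead: u8) -> usize {
    match lead {
        b if b & 0x80 == 0x00 => 1, // 0xxxxxxx: ASCII
        b if b & 0xE0 == 0xC0 => 2, // 110xxxxx
        b if b & 0xF0 == 0xE0 => 3, // 1110xxxx
        b if b & 0xF8 == 0xF0 => 4, // 11110xxx
        _ => 0,                     // 10xxxxxx or 11111xxx
    }
}
```

For any valid `char`, the result for its first encoded byte should agree with the standard library's `char::len_utf8`.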
Paul Stemmet 3318f8762a scanner/macros: add isBreakZ!
wrapper around 'isBreak! || check!(buffer => [])'
2021-09-04 17:34:57 +01:00
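
The isBreakZ! wrapper above amounts to "the next character is a line break, or the buffer is empty". A sketch with plain functions is given below. The function names are hypothetical, and the set of bytes treated as a break (only LF and CR here) is an assumption; the real isBreak! may recognise more.

```rust
/// Assumed stand-in for isBreak!: the buffer starts with LF or CR.
fn is_break(buffer: &[u8]) -> bool {
    matches!(buffer.first().copied(), Some(b'\n') | Some(b'\r'))
}

/// Assumed stand-in for isBreakZ!: a line break, or nothing left to
/// read (the `check!(buffer => [])` half of the wrapper).
fn is_break_z(buffer: &[u8]) -> bool {
    is_break(buffer) || buffer.is_empty()
}
```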
Paul Stemmet 1b517b518f scanner/error: add InvalidBlockScalar, InvalidTab variants 2021-09-04 17:34:57 +01:00
Paul Stemmet a0d71ef644 docs/plain-scalar-indent: commit notes 2021-08-15 09:03:22 +00:00
Paul Stemmet 4eced7d9f4 lib/scanner: add complex test for plain scalars
for peace of mind
2021-08-14 19:59:07 +01:00
Paul Stemmet 4189d3db96 lib/scanner: add test for YAML indicators in plain scalar 2021-08-14 19:59:07 +01:00
Paul Stemmet 76d9ec5561 scalar/plain: fix indentation level to account for the 0'th level
and add the trait implementations to Indent for <usize> <op> <Indent>
comparisons
2021-08-14 19:59:07 +01:00
Paul Stemmet 338db9ce42 lib/scanner: fix is_plain_scalar to block unsafe plain chars
Previously, the guard would fail and implicitly fall through to the
catch-all statement, allowing illegal characters
2021-08-14 19:59:07 +01:00
Paul Stemmet 28a7ce9191 lib/scanner: clippy lints 2021-08-14 19:59:07 +01:00
Paul Stemmet 048550a7b1 lib/scanner: add unit tests for plain scalar token sequences 2021-08-14 19:59:07 +01:00
Paul Stemmet 5d8f78be25 lib/scanner: add support for plain scalars
This commit adds the 3rd of the 5 possible scalar types in YAML to the
scanner. It is compliant with the YAML spec, _except_ for its handling
of "JSON like" keys, which allow for the following value token (e.g ':')
to _not_ have a whitespace following it.

I frankly find this exception absurd, as the spec _clearly_ half assed
this in so that they can declare that they are a "strict super set of
JSON", nevermind that _a lot_ of the semantics of _every_ other context
for keys rely on a key being followed by whitespace.

I may eventually return to this and add it; I've a pretty good idea how --
we just need to keep track of the "last" token produced, as only
?'"]} characters would modify the behavior, but I'd need to
make sure I haven't missed any subtle side effects, as almost all other
key handling implicitly relies on: Key token === ": ".
2021-08-14 19:59:07 +01:00
Paul Stemmet 88ac017647 scalar/plain: fix handling of non EOF trailing whitespace
Previously, the loop would incorrectly update scalar_stats _after_
reaching a ': ' terminus. This is now fixed, as I check for these cases
before reentering the word loop.
2021-08-14 19:59:07 +01:00
Paul Stemmet b3e86dea9c scalar/plain: add unit tests for scan_plain_scalar 2021-08-14 19:59:07 +01:00
Paul Stemmet cddc1dae09 scalar/plain: add scan_plain_scalar
the primary driver for scanning plain YAML scalars. This implementation
tries to fit as closely as possible to the YAML spec, particularly in
its handling of (the lack of) spacing requirements inside flow contexts,
comment detection and special casing of - ? : as first character in flow
contexts.

Two things that are notably missing:

1. Proper tab '\t' handling in block context indentation
2. A sane maximum whitespace limit && better handling of whitespace
   storage. Rather than storing every whitespace given, I could instead
   count the whitespace separated by line breaks, and then add it back
   later, such that the maximum described above would apply to total
   line breaks, with the intervening whitespace stored as a u64/usize
2021-08-14 19:59:07 +01:00
Paul Stemmet 2aae6760f5 scalar/flow: use isDocumentIndicator! over longhand 2021-08-14 19:59:07 +01:00