
274 Commits

Author SHA1 Message Date
Paul Stemmet f7d75b836f lib: pin rust version to 1.53
This isn't a hard guarantee; if a new version of Rust offers something
useful, this will be moved with no warning.
2022-03-18 15:26:45 +00:00
Paul Stemmet d3fd96ea31 ci/github: add MSRV == 1.52
This can be bumped as needed but I'd like to stop checking it myself
2022-01-09 21:45:59 +00:00
Paul Stemmet f288b71f83 ci/github: improve test naming
- Remove unnecessary words from task names
- Add rust version to action name :: [$os/$rustv] $taskname
2022-01-09 21:45:59 +00:00
Paul Stemmet c14ba7829c ci/github: improve toolchain install task
- set default rust version

instead of a folder override, as we always expect to use the provided
version globally per run.

- explicitly declare extra rustup components

rather than implicitly rely on the current defaults
2022-01-09 21:45:59 +00:00
Paul Stemmet 056b9e27be lib/event: add module documentation 2021-12-29 15:07:26 +00:00
Paul Stemmet fcfb870583 lib/event/tests: add Parser tests 2021-12-29 15:07:26 +00:00
Paul Stemmet 2239a884fd event/tests/macros: add tokens!, events!, event!, node!, scalar!
These macros make up the test harness used by module tests. They allow
us to declare a set of tokens! which will be matched against the expected
events! that the tokens should produce.

The others simplify the process of declaring some of the more nested
event structures quickly
2021-12-29 15:07:26 +00:00
Paul Stemmet 2024724d04 lib/event: add handler for YAML nodes
- node

Note that this function must never call any other handlers, so the
Parser remains non-recursive.
2021-12-29 15:07:26 +00:00
Paul Stemmet 5dc521c278 lib/event: add handlers for flow_sequence->mappings
- flow_sequence_entry_mapping_key
- flow_sequence_entry_mapping_value
- flow_sequence_entry_mapping_end

These are special cased due to how some of the implied values can pop
up, and because we need far fewer rules than in the transition from
block_{sequence,mapping}->flow_mapping.
2021-12-29 15:07:26 +00:00
Paul Stemmet 34d893f4b3 lib/event: add handlers for sequences/mappings
- block_sequence_entry
- block_mapping_key
- block_mapping_value
- flow_sequence_entry
- flow_mapping_key
- flow_mapping_value

These were mostly straightforward; the only tricky bit is handling all
the cases in which YAML allows a (scalar) node to be "implied".
2021-12-29 15:07:26 +00:00
Paul Stemmet 89c5b2df5e lib/event: add handlers for YAML document state
- document_start
- document_end
- explicit_document_content

Note that we guarantee at least one (DocumentStart, DocumentEnd) event
pair in the event stream, regardless of whether these tokens exist or
not.

We also guarantee that each DocumentStart _will_ have a DocumentEnd
eventually, again regardless of whether such exists in the token stream.

This isn't explicitly required by the YAML spec, but makes usage of the
Parser more pleasant to callers, as all "indentation" events --
documents, sequences, mappings -- have a guaranteed start and end event,
without the caller needing to infer this behavior from the stream
itself.

If the caller is interested, each DocumentStart and DocumentEnd event
records whether it was implicit (missing from the byte stream), or not.
2021-12-29 15:07:26 +00:00
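The start/end guarantee described in the commit above can be illustrated from the caller's side. The following is a minimal sketch, not the library's API: the `Event` enum and `max_depth` function are hypothetical stand-ins, and the `implicit` field is an assumed shape for the "was it implicit" record. Because every document, sequence and mapping is promised both a start and an end event, a consumer can track nesting with a plain counter instead of inferring structure from the stream.

```rust
/// Hypothetical, payload-free stand-in for the library's events.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    /// `implicit` is an assumed field name: true when the marker was
    /// missing from the byte stream.
    DocumentStart { implicit: bool },
    DocumentEnd { implicit: bool },
    SequenceStart,
    SequenceEnd,
    MappingStart,
    MappingEnd,
    Scalar,
}

/// Returns the deepest nesting level reached, or None if the start and end
/// events do not balance (which the guarantee above rules out).
pub fn max_depth(events: &[Event]) -> Option<usize> {
    let (mut depth, mut deepest) = (0usize, 0usize);
    for ev in events {
        match ev {
            Event::DocumentStart { .. } | Event::SequenceStart | Event::MappingStart => {
                depth += 1;
                deepest = deepest.max(depth);
            }
            Event::DocumentEnd { .. } | Event::SequenceEnd | Event::MappingEnd => {
                // An end without a matching start is an unbalanced stream.
                depth = depth.checked_sub(1)?;
            }
            Event::Scalar => {}
        }
    }
    if depth == 0 { Some(deepest) } else { None }
}
```

A one-element sequence with no explicit document markers would, under these assumptions, arrive as an implicit DocumentStart, SequenceStart, Scalar, SequenceEnd and an implicit DocumentEnd, giving a depth of 2.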
Paul Stemmet d33c7daf4e lib/event: add stream_start, stream_end, empty_scalar handlers 2021-12-29 15:07:26 +00:00
Paul Stemmet 36bf2fef52 lib/event: add state handler skeletons 2021-12-29 15:07:26 +00:00
Paul Stemmet f086975a63 lib/event: add Parser, EventIter skeletons
This commit defines the public API of this module: the Parser. Next
steps are to finish out all of the todo! methods on the StateMachine
branches.
2021-12-29 15:07:26 +00:00
Paul Stemmet 763400291e event/macros: add peek!, pop!, state!, consume!, initEvent!
These macros will be used in the module proper, when operating the event
state machine.
2021-12-29 15:07:26 +00:00
Paul Stemmet 5ddbd93ce7 event/types: add Event, EventData and child structures
The most notable of the types included in this commit is EventData. Its
parent, Event, is a small wrapper with some additional stream information
encoded -- the approximate start and end bytes covered.

EventData has 10 variants:

 1. StreamStart
 2. StreamEnd
 3. DocumentStart
 4. DocumentEnd
 5. Alias
 6. Scalar
 7. MappingStart
 8. MappingEnd
 9. SequenceStart
10. SequenceEnd

Combined, they allow us to express a stream of YAML in an iterative
event model, that should hopefully be easy (at least compared to YAML
proper) to consume.

Expressed in pseudo Backus-Naur form, this is the expected form of any
given event stream:

=== Event Stream ===
stream          := StreamStart document+ StreamEnd
document        := DocumentStart content? DocumentEnd
content         := Scalar | collection
collection      := sequence | mapping
sequence        := SequenceStart node* SequenceEnd
mapping         := MappingStart (node node)* MappingEnd
node            := Alias | content
=== Syntax ===
?               => 0 or 1 of prefix
*               => 0 or more of prefix
+               => 1 or more of prefix
()              => production grouping
|               => production logical OR
=== End ===
2021-12-29 15:07:26 +00:00
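The grammar in the commit above translates fairly directly into a checker. The sketch below is an illustration only: `Ev` is a hypothetical payload-free copy of the ten variant names, and `is_valid_stream` is not part of the library. It uses recursive descent for brevity, whereas the Parser itself is described elsewhere in this log as non-recursive.

```rust
/// Hypothetical mirror of the ten EventData variants, without payloads.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ev {
    StreamStart,
    StreamEnd,
    DocumentStart,
    DocumentEnd,
    Alias,
    Scalar,
    MappingStart,
    MappingEnd,
    SequenceStart,
    SequenceEnd,
}

/// stream := StreamStart document+ StreamEnd
pub fn is_valid_stream(events: &[Ev]) -> bool {
    let mut pos = 0;
    if !eat(events, &mut pos, Ev::StreamStart) {
        return false;
    }
    let mut documents = 0;
    while peek(events, pos) == Some(Ev::DocumentStart) {
        if !document(events, &mut pos) {
            return false;
        }
        documents += 1;
    }
    documents >= 1 && eat(events, &mut pos, Ev::StreamEnd) && pos == events.len()
}

fn peek(events: &[Ev], pos: usize) -> Option<Ev> {
    events.get(pos).copied()
}

/// Consumes one event if it equals `want`.
fn eat(events: &[Ev], pos: &mut usize, want: Ev) -> bool {
    if peek(events, *pos) == Some(want) {
        *pos += 1;
        true
    } else {
        false
    }
}

/// document := DocumentStart content? DocumentEnd
fn document(events: &[Ev], pos: &mut usize) -> bool {
    if !eat(events, pos, Ev::DocumentStart) {
        return false;
    }
    if peek(events, *pos) != Some(Ev::DocumentEnd) && !content(events, pos) {
        return false;
    }
    eat(events, pos, Ev::DocumentEnd)
}

/// content := Scalar | sequence | mapping
fn content(events: &[Ev], pos: &mut usize) -> bool {
    match peek(events, *pos) {
        Some(Ev::Scalar) => {
            *pos += 1;
            true
        }
        Some(Ev::SequenceStart) => {
            *pos += 1;
            // sequence := SequenceStart node* SequenceEnd
            while peek(events, *pos) != Some(Ev::SequenceEnd) {
                if !node(events, pos) {
                    return false;
                }
            }
            eat(events, pos, Ev::SequenceEnd)
        }
        Some(Ev::MappingStart) => {
            *pos += 1;
            // mapping := MappingStart (node node)* MappingEnd
            while peek(events, *pos) != Some(Ev::MappingEnd) {
                if !(node(events, pos) && node(events, pos)) {
                    return false;
                }
            }
            eat(events, pos, Ev::MappingEnd)
        }
        _ => false,
    }
}

/// node := Alias | content
fn node(events: &[Ev], pos: &mut usize) -> bool {
    eat(events, pos, Ev::Alias) || content(events, pos)
}
```

Under this reading of the grammar, a stream with no documents is rejected (`document+`), and a mapping holding an odd number of nodes is rejected because keys and values must come in pairs.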
Paul Stemmet a21385e92c event/error: add module Error, Result typedef
Plus some From impls for the Reader and Scanner error types.
2021-12-29 15:07:26 +00:00
Paul Stemmet 0f6fb62cb7 event/state: add StateMachine, Flags
This module describes the various states that we can reach in a YAML
Token stream, and provides the machinery for manipulating it.
2021-12-29 15:07:26 +00:00
Paul Stemmet 19f294cb1c lib/event: add module stub
This module will house the first, lowest level public API of this
library, eventually exposing a structure that allows callers to consume
high level YAML 'Events', likely with an Iterator interface.
2021-12-29 15:07:26 +00:00
Paul Stemmet 49116317c1 reader/owned: add test_reader! tests 2021-12-29 15:07:26 +00:00
Paul Stemmet e1fe33e202 reader/owned: add OwnedReader
This is an implementation of "Stacked Borrows" wherein memory is
allocated in chunks, and once the end of a chunk is reached, a new chunk
is allocated and the old one's stack state (cap,len,ptr) is moved into
the tail.
2021-12-29 15:07:26 +00:00
Paul Stemmet fb90078a3e reader/borrow: add test_reader! tests 2021-12-29 15:07:26 +00:00
Paul Stemmet e49c473604 reader/borrow: add BorrowReader
Naive implementation of Read using an existing, borrowed UTF8
slice (&str).
2021-12-29 15:07:26 +00:00
Paul Stemmet 8fa290d374 reader/test_util: add test_reader! macro
This macro generates a test suite with the provided Read implementation,
allowing me to quickly and uniformly test reader implementations
2021-12-29 15:07:26 +00:00
Paul Stemmet 43353dae19 lib/reader: add Reader, PeekReader structs
These will be used by higher level APIs to drive the underlying Read
implementation
2021-12-29 15:07:26 +00:00
Paul Stemmet a08aab7992 lib/reader: add trait Read
Any Read implementation must uphold the contract:

    (&'de self) -> Tokens<'de>

That is, any borrows into the backing bytes given out must not be mutated
in any way.

For an existing borrow (e.g. &str) this is trivially possible; however,
things get much more complicated when dealing with an owned source that
might not be complete -- a `std::io::Read` object, for example.

While we could simply read the entire thing first, and then borrow from
the complete byte stream, this is less than ideal, particularly for Serde
implementations, as an owned source will only provide a DeserializeOwned
implementation, consequently copying data. It also makes stream
processing YAML arbitrarily limited to the total size of the stream,
rather than the actual data stored -- e.g. sum(SCALAR.len()) + count(SCALAR)
-- which is a strong limitation, given YAML's natural stream processing
capabilities.

To overcome this limitation, I've decided to introduce a "Stacked Borrow"
pattern with the use of a little unsafe.

```
; A rust vector is just a capacity, length and ptr to somewhere in the
; heap
VEC := (cap,len,ptr)

; Each OwnedReader keeps two VECs, one for bytes (u8) and another for
; VECs of bytes
OwnedReader := {
  head: (cap, len, ptr)
  tail: (cap, len, ptr)
}

; Demonstration of the various memory segments stored on the program's heap
; and how the OwnedReader's ptrs connect
HEAP := {
  head.ptr->[u8..]
  tail.ptr->[VEC..]
  tail[0].ptr->[u8..]
  tail[n].ptr->[u8..]
}
```

The OwnedReader makes a promise to NEVER call realloc on an existing
heap segment; therefore any references given out to heap segments are
immutable, fulfilling the contract required by the Parser (and Scanner).

Instead, if/when more of the byte stream is requested, it will allocate
a new .head and push the old .head onto the .tail stack, thus keeping
the memory live.

Notably, this process hasn't described how to determine if any .tail
segments are no longer needed and unload them. Mostly because I haven't
figured that part out completely yet. Probably keeping track of the
lowest borrowed segment somehow and running reconciliation periodically.
But it _is_ possible using this strategy.
2021-12-29 15:07:26 +00:00
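The head/tail layout in the commit above can be sketched in safe Rust. This is an assumption-laden illustration, not the OwnedReader itself: `ChunkStack`, `chunk_size`, `extend`, `rotate` and the accessor names are all hypothetical, and the sketch takes `&mut self` rather than handing out `&'de` borrows, which is the part the commit says needs a little unsafe. What it does show is the core property: a full chunk is never grown in place, and moving it onto the tail moves only its (cap,len,ptr) triple, so the bytes stay at the same heap address.

```rust
/// Hypothetical chunked buffer: `head` is the chunk being filled, `tail`
/// holds retired chunks so their heap segments stay alive.
pub struct ChunkStack {
    head: Vec<u8>,
    tail: Vec<Vec<u8>>,
    chunk_size: usize,
}

impl ChunkStack {
    pub fn new(chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk_size must be non-zero");
        Self {
            head: Vec::with_capacity(chunk_size),
            tail: Vec::new(),
            chunk_size,
        }
    }

    /// Appends bytes without ever growing an existing chunk in place.
    pub fn extend(&mut self, mut bytes: &[u8]) {
        while !bytes.is_empty() {
            // `with_capacity` guarantees at least `chunk_size` bytes, so
            // staying within `chunk_size` never triggers a reallocation.
            let room = self.chunk_size - self.head.len();
            if room == 0 {
                self.rotate();
                continue;
            }
            let n = room.min(bytes.len());
            self.head.extend_from_slice(&bytes[..n]);
            bytes = &bytes[n..];
        }
    }

    /// Retires the full head onto the tail and starts a fresh head.
    fn rotate(&mut self) {
        let fresh = Vec::with_capacity(self.chunk_size);
        let old = std::mem::replace(&mut self.head, fresh);
        // Pushing may reallocate `tail` itself, but that only moves the
        // (cap,len,ptr) triples, never the byte segments they point to.
        self.tail.push(old);
    }

    /// Address of the head chunk's heap segment.
    pub fn head_ptr(&self) -> *const u8 {
        self.head.as_ptr()
    }

    /// Retired chunks, oldest first.
    pub fn tail_chunks(&self) -> &[Vec<u8>] {
        &self.tail
    }
}
```

Recording `head_ptr()` after filling a chunk and comparing it with `tail_chunks()[0].as_ptr()` after the next `extend` gives the same address, which is the stability the Read contract relies on. Unloading unused tail segments is left out here, as it is in the commit message.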
Paul Stemmet a8a2aee615 scanner/entry: add .marker() method
This allows users of TokenEntry(s) to have a quick, cheap method of
ascertaining what the underlying Token is, even if the entry itself
is deferred.
2021-12-29 15:07:26 +00:00
Paul Stemmet bdbf510a24 lib/scanner: clippy lints from 1.56 2021-12-29 15:07:26 +00:00
Paul Stemmet ce8b59b646 lib/scanner: add offset controls 2021-12-29 15:07:26 +00:00
Paul Stemmet 0343c29021 lib/{token,scanner/entry}: derive Clone on more structs 2021-12-29 15:07:26 +00:00
Paul Stemmet 91545f4c70 lib: fix visibility on Queue, Scanner, TokenEntry
Each of these will likely appear in parts of the public API, even if
they aren't directly used.

It's likely these will be "public but unreachable" -- e.g. a public type
in a private module.

This will likely be revisited on the way to a stable 1.0 library
version, but works for now.
2021-12-29 15:07:26 +00:00
Paul Stemmet ccc1bc16ab license/mpl2
* LICENSE: MPL 2.0

* lib/**: add MPL 2.0 header to source code

* Cargo: license = "MPL-2.0"
2021-09-17 17:32:30 +01:00
Paul Stemmet 7d90804cc5 ci/github: add matrix targets for test_lazy 2021-09-17 17:03:13 +01:00
Paul Stemmet c977815ddc scalar/block: module documentation updates 2021-09-17 17:03:13 +01:00
Paul Stemmet 9ed1bcc00e scalar/plain: fix subtle slice error in scan_plain_scalar_lazy
When checking for a terminating sequence in plain scalars, we either
need a flow indicator (in flow contexts only), or a ': ' byte sequence,
where the space can be any valid YAML whitespace.

The issue here is that the lazy variant was correctly identifying the
terminating sequence, _but not recording it_ for the Deferred's slice.
This commit fixes that, ensuring we always record the final 1 or 2 bytes
before exiting the main loop.
2021-09-17 17:03:13 +01:00
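The terminator rule in the commit above can be sketched as a small byte matcher. This is not the code of scan_plain_scalar_lazy: `terminator_len` and `split_plain` are hypothetical names, "valid YAML whitespace" is assumed to mean space, tab or a line break, and the mapping of "1 or 2 bytes" onto a flow indicator versus a `:` plus whitespace pair is an interpretation of the message. The point it illustrates is the fix itself: detecting the terminator is not enough, its bytes also have to be recorded.

```rust
/// Returns the length in bytes of a plain-scalar terminator at the start of
/// `bytes`, or None if the scalar continues.
pub fn terminator_len(bytes: &[u8], in_flow: bool) -> Option<usize> {
    match bytes {
        // ':' followed by whitespace (assumed: space, tab or line break).
        [b':', b' ' | b'\t' | b'\n' | b'\r', ..] => Some(2),
        // Flow indicators only terminate inside flow contexts.
        [b',' | b'[' | b']' | b'{' | b'}', ..] if in_flow => Some(1),
        _ => None,
    }
}

/// Splits `bytes` into (scalar, terminator). The terminator slice is the
/// part the lazy variant originally detected but failed to record.
pub fn split_plain(bytes: &[u8], in_flow: bool) -> (&[u8], &[u8]) {
    for i in 0..bytes.len() {
        if let Some(n) = terminator_len(&bytes[i..], in_flow) {
            return (&bytes[..i], &bytes[i..i + n]);
        }
    }
    (bytes, &[])
}
```

With these assumptions, `key: value` splits into `key` and a two-byte terminator, while `a,b` only splits at the comma when `in_flow` is true.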
Paul Stemmet 1cdad01126 scalar/flow: add unit test for escaped double quote 2021-09-17 17:03:13 +01:00
Paul Stemmet 0c38dda908 scalar/flow: fixes to scan_flow_scalar_lazy's chomping
1. Handle linebreaks separately from other characters (for stats)
2. Don't quit early on an escaped double quote (\")
2021-09-17 17:03:13 +01:00
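Both fixes in the commit above can be shown in one small scan loop. The sketch below is an assumption, not the body of scan_flow_scalar_lazy: `find_closing_quote` is a hypothetical helper, the input is assumed to start just after the opening quote, and only `\n` is counted as a line break. A backslash skips the byte after it, so an escaped `\"` no longer ends the scalar, and line breaks get their own match arm so they can be counted separately.

```rust
/// Scans a double-quoted scalar body and returns the index of the closing
/// quote together with the number of line breaks seen before it.
/// Returns None if no unescaped closing quote exists.
pub fn find_closing_quote(bytes: &[u8]) -> Option<(usize, usize)> {
    let mut lines = 0;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'\\' => {
                // An escaped line break still counts toward the line stats.
                if bytes.get(i + 1) == Some(&b'\n') {
                    lines += 1;
                }
                // Skip the escaped byte too, so \" cannot close the scalar.
                i += 2;
            }
            b'"' => return Some((i, lines)),
            b'\n' => {
                lines += 1;
                i += 1;
            }
            _ => i += 1,
        }
    }
    None
}
```

Because the escape arm consumes two bytes, `\\"` is read as an escaped backslash followed by a real closing quote, which matches how double-quoted escapes pair up.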
Paul Stemmet 0a4a7930a5 scalar/*/tests: rename TEST_OPTS -> TEST_FLAGS
to remain consistent with scanner/tests, also derive the base TEST_FLAGS
from scanner/tests.TEST_FLAGS, minus options that do not make sense for
the test battery (O_EXTENDABLE)
2021-09-17 17:03:13 +01:00
Paul Stemmet 8d01532b1f Cargo: add feature.test_lazy
For testing the Scanner with O_LAZY active
2021-09-17 17:03:13 +01:00
Paul Stemmet 9c75400697 scanner/tests: update ScanIter to use TEST_FLAGS always 2021-09-17 17:03:13 +01:00
Paul Stemmet f8dc375d14 scanner/tests: add test_flags and const TEST_FLAGS
These additively apply flags according to the controlling feature flag
2021-09-17 17:03:13 +01:00
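One way to apply flags additively per feature is a `const fn` that consults `cfg!`. This is a sketch under assumptions: the `Flags` alias, the bit values, and the `O_ZEROED` name are hypothetical, and only the `test_lazy` feature, `O_LAZY`, `O_EXTENDABLE`, `test_flags` and `TEST_FLAGS` names come from the commits in this log.

```rust
/// Hypothetical flag representation; the real type and values may differ.
pub type Flags = u32;

pub const O_ZEROED: Flags = 0; // assumed name for "no flags set"
pub const O_EXTENDABLE: Flags = 1 << 0; // assumed bit value
pub const O_LAZY: Flags = 1 << 1; // assumed bit value

/// Starts from an empty set and ORs in one flag per enabled feature.
pub const fn test_flags() -> Flags {
    let mut flags = O_ZEROED;
    // `cfg!` evaluates to a compile-time boolean, so it works in a const fn.
    if cfg!(feature = "test_lazy") {
        flags |= O_LAZY;
    }
    flags
}

/// Flags every test in the battery uses.
pub const TEST_FLAGS: Flags = test_flags();
```

Running the suite with `cargo test --features test_lazy` would then turn on `O_LAZY` everywhere `TEST_FLAGS` is used, without touching individual tests.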
Paul Stemmet 522d38b665 scanner/context: add Indent.as_usize
This makes explicit what was happening under the hood with the
'cxt.indent() + 0' expression. It also clearly describes the
circumstances in which it is possible to use the function safely
2021-09-17 17:03:13 +01:00
Paul Stemmet 4c63c0c047 scalar/*/tests: refactor shared functions/consts
this will allow us to easily manipulate the TEST_FLAGS for each scalar
type's tests, using a feature.
2021-09-17 17:03:13 +01:00
Paul Stemmet bfdb0d78a7 scalar/block: fix tests 2021-09-17 17:03:13 +01:00
Paul Stemmet 64adcb10b7 scanner/entry: add ScalarB variant to MaybeToken for block scalars 2021-09-17 17:03:13 +01:00
Paul Stemmet 34144344fc scalar/block: add scan_block_scalar_lazy, return MaybeToken 2021-09-17 17:03:13 +01:00
Paul Stemmet fc669a6e92 scalar/plain: fix tests 2021-09-17 17:03:13 +01:00
Paul Stemmet 3e6e04c3b2 scanner/entry: add ScalarP variant to MaybeToken for plain scalars 2021-09-17 17:03:13 +01:00
Paul Stemmet 5786cc159f scalar/plain: add scan_plain_scalar_lazy, return MaybeToken 2021-09-17 17:03:13 +01:00
Paul Stemmet 0f7d65ee7d scalar/flow: fix tests 2021-09-17 17:03:13 +01:00