Commit Graph

285 Commits

Author SHA1 Message Date
Paul Stemmet 790b9b55d5 docs: add README, explaining library purpose and status 2022-03-18 15:40:12 +00:00
Paul Stemmet a8230f86f2 lib: stub library documentation 2022-03-18 15:26:45 +00:00
Paul Stemmet ba41069dc6 event/error: document ParseError variants 2022-03-18 15:26:45 +00:00
Paul Stemmet 95eeec30f5 lib: expose reader and event modules 2022-03-18 15:26:45 +00:00
Paul Stemmet 183dbc3b1b lib/event: expose public API for YAML event streams
This commit surfaces a public API for streaming YAML events from a read
source. It provides callers an Events{} type that can be generated from
any reader::Read implementation -- so for the moment, OwnedReader(s) and
BorrowReader(s) -- via the module functions from_reader() and
from_reader_with(). This type implements IntoIterator, and thus can be
integrated with any iterator based flows, and benefits from the entire,
extensive ecosystem around them.

That said, I expect this to be a relatively unused part of this library
in the long term, being the lowest level public API exposed by this
library.
2022-03-18 15:26:45 +00:00
Paul Stemmet 65f990872f event/flag: add public Flags exposed to callers
These define the configuration that library users are allowed to set
when iterating over Events.

It currently has only one meaningful option, O_LAZY, which reflects the
behavior exposed by lib/scanner. This will likely change in the future,
if more customization is desired when working with Event streams.
2022-03-18 15:26:45 +00:00
Paul Stemmet 44759d458d event/parser: use relative paths in test macros 2022-03-18 15:26:45 +00:00
Paul Stemmet bdfafc057f event/parser: module doc 2022-03-18 15:26:45 +00:00
Paul Stemmet d2c25e2bd0 lib/event: add module doc
A large portion of this was split out from event/parser's module doc to
coerce git to rename files.
2022-03-18 15:26:45 +00:00
Paul Stemmet 76824e9db7 lib/event: move Parser to lib/event/parser 2022-03-18 15:26:45 +00:00
Paul Stemmet 2e77556dd1 reader: fix visibility of public readers 2022-03-18 15:26:45 +00:00
Paul Stemmet f7d75b836f lib: pin rust version to 1.53
This isn't a hard guarantee: if a new version of Rust offers something
useful, this will be moved without warning.
2022-03-18 15:26:45 +00:00
Paul Stemmet d3fd96ea31 ci/github: add MSRV == 1.52
This can be bumped as needed but I'd like to stop checking it myself
2022-01-09 21:45:59 +00:00
Paul Stemmet f288b71f83 ci/github: improve test naming
- Remove unnecessary words from task names
- Add rust version to action name :: [$os/$rustv] $taskname
2022-01-09 21:45:59 +00:00
Paul Stemmet c14ba7829c ci/github: improve toolchain install task
- set default rust version

instead of a folder override, as we always expect to use the provided
version globally per run.

- explicitly declare extra rustup components

rather than implicitly rely on the current defaults
2022-01-09 21:45:59 +00:00
Paul Stemmet 056b9e27be lib/event: add module documentation 2021-12-29 15:07:26 +00:00
Paul Stemmet fcfb870583 lib/event/tests: add Parser tests 2021-12-29 15:07:26 +00:00
Paul Stemmet 2239a884fd event/tests/macros: add tokens!, events!, event!, node!, scalar!
These macros make up the test harness used by module tests. They allow
us to declare a set of tokens! which will be matched against the expected
events! that the tokens should produce.

The others simplify the process of declaring some of the more nested
event structures quickly
2021-12-29 15:07:26 +00:00
Paul Stemmet 2024724d04 lib/event: add handler for YAML nodes
- node

Note that this function must never call any other handlers, so the
Parser remains non-recursive.
2021-12-29 15:07:26 +00:00
Paul Stemmet 5dc521c278 lib/event: add handlers for flow_sequence->mappings
- flow_sequence_entry_mapping_key
- flow_sequence_entry_mapping_value
- flow_sequence_entry_mapping_end

These are special cased due to how some of the implied values can pop
up, and because we need far fewer rules than in the transition from
block_{sequence,mapping}->flow_mapping.
2021-12-29 15:07:26 +00:00
Paul Stemmet 34d893f4b3 lib/event: add handlers for sequences/mappings
- block_sequence_entry
- block_mapping_key
- block_mapping_value
- flow_sequence_entry
- flow_mapping_key
- flow_mapping_value

These were mostly straightforward; the only tricky bit is handling all the
cases in which YAML allows a (scalar) node to be "implied".
2021-12-29 15:07:26 +00:00
Paul Stemmet 89c5b2df5e lib/event: add handlers for YAML document state
- document_start
- document_end
- explicit_document_content

Note that we guarantee at least one (DocumentStart, DocumentEnd) event
pair in the event stream, regardless of whether these tokens exist or
not.

We also guarantee that each DocumentStart _will_ have a DocumentEnd
eventually, again regardless of whether such exists in the token stream.

This isn't explicitly required by the YAML spec, but makes usage of the
Parser more pleasant to callers, as all "indentation" events --
documents, sequences, mappings -- have a guaranteed start and end event,
without the caller needing to infer this behavior from the stream
itself.

If the caller is interested, each DocumentStart and DocumentEnd event
records whether it was implicit (missing from the byte stream), or not.
2021-12-29 15:07:26 +00:00
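The start/end guarantee described in the commit above can be sketched as follows. This is a minimal illustration, not the library's Parser: `Tok`, `Ev` and `with_documents` are hypothetical names, the token set is reduced to three kinds, and the handling of a stray end marker is an assumption made for the sketch.

```rust
/// Hypothetical, heavily reduced token set; real YAML tokens carry far more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tok {
    DocStart, // an explicit document start marker
    DocEnd,   // an explicit document end marker
    Content,  // stand-in for any node token
}

/// Hypothetical event set, recording whether a marker was implied.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ev {
    DocumentStart { implicit: bool },
    DocumentEnd { implicit: bool },
    Content,
}

/// Wrap a token stream so that every document has a start and an end
/// event, and so that at least one document pair is always produced.
pub fn with_documents(tokens: &[Tok]) -> Vec<Ev> {
    let mut out = Vec::new();
    let mut open = false; // is a document currently open?
    let mut any = false; // has any document been started?

    for &tok in tokens {
        match tok {
            Tok::DocStart => {
                // A new explicit start implicitly closes an open document.
                if open {
                    out.push(Ev::DocumentEnd { implicit: true });
                }
                out.push(Ev::DocumentStart { implicit: false });
                open = true;
                any = true;
            }
            Tok::Content => {
                // Content outside a document implies a document start.
                if !open {
                    out.push(Ev::DocumentStart { implicit: true });
                    open = true;
                    any = true;
                }
                out.push(Ev::Content);
            }
            Tok::DocEnd => {
                // Assumption: an end marker with no open document is ignored.
                if open {
                    out.push(Ev::DocumentEnd { implicit: false });
                    open = false;
                }
            }
        }
    }

    // Close whatever is still open at the end of the stream.
    if open {
        out.push(Ev::DocumentEnd { implicit: true });
    }
    // Guarantee at least one (DocumentStart, DocumentEnd) pair.
    if !any {
        out.push(Ev::DocumentStart { implicit: true });
        out.push(Ev::DocumentEnd { implicit: true });
    }
    out
}
```

Calling `with_documents(&[])` on an empty token slice still yields one implicit start/end pair, which is the property the commit message promises to callers.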
Paul Stemmet d33c7daf4e lib/event: add stream_start, stream_end, empty_scalar handlers 2021-12-29 15:07:26 +00:00
Paul Stemmet 36bf2fef52 lib/event: add state handler skeletons 2021-12-29 15:07:26 +00:00
Paul Stemmet f086975a63 lib/event: add Parser, EventIter skeletons
This commit defines the public API of this module: the Parser. Next
steps are to finish out all of the todo! methods on the StateMachine
branches.
2021-12-29 15:07:26 +00:00
Paul Stemmet 763400291e event/macros: add peek!, pop!, state!, consume!, initEvent!
These macros will be used in the module proper, when operating the event
state machine.
2021-12-29 15:07:26 +00:00
Paul Stemmet 5ddbd93ce7 event/types: add Event, EventData and child structures
The most notable of the types included in this commit is EventData. Its
parent, Event, is a small wrapper with some additional stream information
encoded -- the approximate start and end bytes covered.

EventData has 10 variants:

 1. StreamStart
 2. StreamEnd
 3. DocumentStart
 4. DocumentEnd
 5. Alias
 6. Scalar
 7. MappingStart
 8. MappingEnd
 9. SequenceStart
10. SequenceEnd

Combined, they allow us to express a stream of YAML in an iterative
event model, that should hopefully be easy (at least compared to YAML
proper) to consume.

Expressed in pseudo Backus-Naur, this is the expected form of any
given event stream:

=== Event Stream ===
stream          := StreamStart document+ StreamEnd
document        := DocumentStart content? DocumentEnd
content         := Scalar | collection
collection      := sequence | mapping
sequence        := SequenceStart node* SequenceEnd
mapping         := MappingStart (node node)* MappingEnd
node            := Alias | content
=== Syntax ===
?               => 0 or 1 of prefix
*               => 0 or more of prefix
+               => 1 or more of prefix
()              => production grouping
|               => production logical OR
=== End ===
2021-12-29 15:07:26 +00:00
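As a rough illustration of the grammar in the commit above, the sketch below checks a sequence of event kinds against it using an explicit stack rather than recursion. `Kind`, `Open` and `is_well_formed` are hypothetical names; `Kind` mirrors the ten EventData variants with their payloads omitted, and nothing here is taken from the library's actual code.

```rust
/// Simplified stand-in for EventData: variant names only, no payloads.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    StreamStart,
    StreamEnd,
    DocumentStart,
    DocumentEnd,
    Alias,
    Scalar,
    MappingStart,
    MappingEnd,
    SequenceStart,
    SequenceEnd,
}

/// Containers that are currently open while walking the stream.
enum Open {
    Document { has_content: bool },
    Sequence,
    Mapping { nodes: usize },
}

/// Returns true if `events` matches
///   stream := StreamStart document+ StreamEnd
pub fn is_well_formed(events: &[Kind]) -> bool {
    use Kind::*;
    // The shortest valid stream is StreamStart, one empty document, StreamEnd.
    if events.len() < 4 || events[0] != StreamStart || events[events.len() - 1] != StreamEnd {
        return false;
    }
    let mut stack: Vec<Open> = Vec::new();
    let mut documents = 0usize;

    for &ev in &events[1..events.len() - 1] {
        match ev {
            // Stream markers may only appear at the very ends.
            StreamStart | StreamEnd => return false,
            DocumentStart => {
                if !stack.is_empty() {
                    return false;
                }
                stack.push(Open::Document { has_content: false });
            }
            DocumentEnd => match stack.pop() {
                Some(Open::Document { .. }) => documents += 1,
                _ => return false,
            },
            Alias | Scalar | SequenceStart | MappingStart => {
                // A node begins: register it with its parent first.
                match stack.last_mut() {
                    Some(Open::Document { has_content }) => {
                        // content? allows at most one item, and never an Alias.
                        if *has_content || ev == Alias {
                            return false;
                        }
                        *has_content = true;
                    }
                    Some(Open::Sequence) => {}
                    Some(Open::Mapping { nodes }) => *nodes += 1,
                    None => return false,
                }
                // Collections stay open until their matching end event.
                match ev {
                    SequenceStart => stack.push(Open::Sequence),
                    MappingStart => stack.push(Open::Mapping { nodes: 0 }),
                    _ => {}
                }
            }
            SequenceEnd => match stack.pop() {
                Some(Open::Sequence) => {}
                _ => return false,
            },
            // (node node)* means a mapping holds an even number of nodes.
            MappingEnd => match stack.pop() {
                Some(Open::Mapping { nodes }) if nodes % 2 == 0 => {}
                _ => return false,
            },
        }
    }
    stack.is_empty() && documents >= 1
}
```

A checker like this is mostly useful in tests, where it can assert that whatever a parser emits stays within the grammar regardless of the input.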
Paul Stemmet a21385e92c event/error: add module Error, Result typedef
Plus some From impls for Reader, and Scanner error types.
2021-12-29 15:07:26 +00:00
Paul Stemmet 0f6fb62cb7 event/state: add StateMachine, Flags
This module describes the various states that we can reach in a YAML
Token stream, and provides the machinery for manipulating it.
2021-12-29 15:07:26 +00:00
Paul Stemmet 19f294cb1c lib/event: add module stub
This module will house the first, lowest level public API of this
library, eventually exposing a structure that allows callers to consume
high level YAML 'Events', likely with an Iterator interface.
2021-12-29 15:07:26 +00:00
Paul Stemmet 49116317c1 reader/owned: add test_reader! tests 2021-12-29 15:07:26 +00:00
Paul Stemmet e1fe33e202 reader/owned: add OwnedReader
This is an implementation of "Stacked Borrows" wherein memory is
allocated in chunks, and once the end of a chunk is reached, a new chunk
is allocated and the old one's stack state (cap,len,ptr) is moved into
the tail.
2021-12-29 15:07:26 +00:00
Paul Stemmet fb90078a3e reader/borrow: add test_reader! tests 2021-12-29 15:07:26 +00:00
Paul Stemmet e49c473604 reader/borrow: add BorrowReader
Naive implementation of Read using an existing, borrowed UTF8
slice (&str).
2021-12-29 15:07:26 +00:00
Paul Stemmet 8fa290d374 reader/test_util: add test_reader! macro
This macro generates a test suite with the provided Read implementation,
allowing me to quickly and uniformly test reader implementations.
2021-12-29 15:07:26 +00:00
Paul Stemmet 43353dae19 lib/reader: add Reader, PeekReader structs
These will be used by higher level APIs to drive the underlying Read
implementation
2021-12-29 15:07:26 +00:00
Paul Stemmet a08aab7992 lib/reader: add trait Read
Any Read implementation must uphold the contract:

    (&'de self) -> Tokens<'de>

That is, any borrows into the backing bytes given out must not be mutated
in any way.

For an existing borrow (e.g. &str) this is trivially possible, however
things get much more complicated when dealing with an owned source that
might not be complete -- a `std::io::Read` object, for example.

While we could simply read the entire thing first, and then borrow from
the complete byte stream, this is less than ideal, particularly for Serde
implementations, as an owned source will only provide a DeserializeOwned
implementation, consequently copying data. It also arbitrarily limits
stream processing of YAML to the total size of the stream, rather than
the actual data stored -- e.g. sum(SCALAR.len()) + count(SCALAR) --
which is a strong limitation, given YAML's natural stream processing
capabilities.

To overcome this limitation, I've decided to introduce a "Stacked Borrow"
pattern with the use of a little unsafe.

```
; A rust vector is just a capacity, length and ptr to somewhere in the
; heap
VEC := (cap,len,ptr)

; Each OwnedReader keeps two VECs, one for bytes (u8) and another for
; VECs of bytes
OwnedReader := {
  head: (cap, len, ptr)
  tail: (cap, len, ptr)
}

; Demonstration of the various memory segments stored on the program's heap
; and how the OwnedReader's ptrs connect
HEAP := {
  head.ptr->[u8..]
  tail.ptr->[VEC..]
  tail[0].ptr->[u8..]
  tail[n].ptr->[u8..]
}
```

The OwnedReader makes a promise to NEVER call realloc on an existing
heap segment; therefore any references given out to heap segments are
immutable, fulfilling the contract required by the Parser (and Scanner).

Instead, if/when more of the byte stream is requested, it will allocate
a new .head, pushing the old .head onto the .tail stack and thus
keeping the memory live.

Notably, this process hasn't described how to determine if any .tail
segments are no longer needed and unload them. Mostly because I haven't
figured that part out completely yet. Probably keeping track of the
lowest borrowed segment somehow and running reconciliation periodically.
But it _is_ possible using this strategy.
2021-12-29 15:07:26 +00:00
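The head/tail layout from the commit above can be modelled in safe Rust to show the one property the scheme relies on: moving a full chunk's (cap,len,ptr) triple into the tail does not move the bytes it points at. `ChunkStore` and its methods are hypothetical names for this sketch. Unlike the OwnedReader described above it hands out no borrows and uses no unsafe, so it only demonstrates the allocation behavior, not the lifetime contract.

```rust
/// Hypothetical, safe-Rust model of the head/tail layout.
pub struct ChunkStore {
    head: Vec<u8>,      // chunk currently being filled
    tail: Vec<Vec<u8>>, // retired chunks, kept alive but never grown
    chunk_size: usize,
}

impl ChunkStore {
    pub fn new(chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunks must hold at least one byte");
        Self {
            head: Vec::with_capacity(chunk_size),
            tail: Vec::new(),
            chunk_size,
        }
    }

    /// Append bytes without ever growing an existing chunk.
    pub fn extend(&mut self, mut bytes: &[u8]) {
        while !bytes.is_empty() {
            let free = self.head.capacity() - self.head.len();
            if free == 0 {
                // The head is full: retire it and start a fresh chunk.
                // Only the (cap,len,ptr) triple moves, the heap bytes stay put.
                let fresh = Vec::with_capacity(self.chunk_size);
                let full = std::mem::replace(&mut self.head, fresh);
                // This push may reallocate the tail's own buffer, which only
                // relocates the triples, not the byte chunks they point to.
                self.tail.push(full);
                continue;
            }
            // Writing within spare capacity never reallocates the chunk.
            let take = free.min(bytes.len());
            self.head.extend_from_slice(&bytes[..take]);
            bytes = &bytes[take..];
        }
    }

    /// Address of the chunk currently being filled.
    pub fn head_ptr(&self) -> *const u8 {
        self.head.as_ptr()
    }

    /// Chunks that have been retired to the tail, oldest first.
    pub fn retired(&self) -> &[Vec<u8>] {
        &self.tail
    }

    /// Total number of bytes stored across all chunks.
    pub fn len(&self) -> usize {
        self.tail.iter().map(Vec::len).sum::<usize>() + self.head.len()
    }
}
```

Recording `head_ptr()` before a large `extend` and comparing it with `retired()[0].as_ptr()` afterwards shows the address is unchanged, which is why references into a retired segment could remain valid.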
Paul Stemmet a8a2aee615 scanner/entry: add .marker() method
This allows users of TokenEntry(s) to have a quick, cheap method of
ascertaining what the underlying Token is, even if the entry itself
is deferred.
2021-12-29 15:07:26 +00:00
Paul Stemmet bdbf510a24 lib/scanner: clippy lints from 1.56 2021-12-29 15:07:26 +00:00
Paul Stemmet ce8b59b646 lib/scanner: add offset controls 2021-12-29 15:07:26 +00:00
Paul Stemmet 0343c29021 lib/{token,scanner/entry}: derive Clone on more structs 2021-12-29 15:07:26 +00:00
Paul Stemmet 91545f4c70 lib: fix visibility on Queue, Scanner, TokenEntry
Each of these will likely appear in parts of the public API, even if
they aren't directly used.

It's likely these will be "public but unreachable" -- e.g. a public type
in a private module.

This will likely be revisited on the way to a stable 1.0 library
version, but works for now.
2021-12-29 15:07:26 +00:00
Paul Stemmet ccc1bc16ab license/mpl2
* LICENSE: MPL 2.0

* lib/**: add MPL 2.0 header to source code

* Cargo: license = "MPL-2.0"
2021-09-17 17:32:30 +01:00
Paul Stemmet 7d90804cc5 ci/github: add matrix targets for test_lazy 2021-09-17 17:03:13 +01:00
Paul Stemmet c977815ddc scalar/block: module documentation updates 2021-09-17 17:03:13 +01:00
Paul Stemmet 9ed1bcc00e scalar/plain: fix subtle slice error in scan_plain_scalar_lazy
When checking for a terminating sequence in plain scalars, we either
need a flow indicator (in flow contexts only), or a ': ' byte sequence,
where the space can be any valid YAML whitespace.

The issue here is that the lazy variant was correctly identifying the
terminating sequence, _but not recording it_ for the Deferred's slice.
This commit fixes that, ensuring we always record the final 1 or 2 bytes
before exiting the main loop.
2021-09-17 17:03:13 +01:00
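The terminating-sequence rule from the commit above can be sketched as a small lookahead check that reports how many bytes the terminator spans, which is the 1 or 2 bytes the fix makes sure to record. `terminator_len` is a hypothetical helper, not the scanner's code; treating line breaks as whitespace after ':' and ignoring end-of-input are simplifying assumptions.

```rust
/// Returns the length in bytes of a plain scalar terminator at the start
/// of `buf`, or None if the scalar continues.
///
/// Assumptions for this sketch: whitespace after ':' is a space, tab or
/// line break, and a ':' at the very end of input is not handled.
pub fn terminator_len(buf: &[u8], in_flow: bool) -> Option<usize> {
    match buf {
        // ':' followed by whitespace ends the scalar in any context.
        [b':', b' ' | b'\t' | b'\r' | b'\n', ..] => Some(2),
        // Flow indicators end the scalar only inside a flow context.
        [b',' | b'[' | b']' | b'{' | b'}', ..] if in_flow => Some(1),
        _ => None,
    }
}
```

A lazy scanner that defers processing would extend the slice it records by the returned length before leaving its main loop, so that the deferred work sees the same bytes the eager path did.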
Paul Stemmet 1cdad01126 scalar/flow: add unit test for escaped double quote 2021-09-17 17:03:13 +01:00
Paul Stemmet 0c38dda908 scalar/flow: fixes to scan_flow_scalar_lazy's chomping
1. Handle linebreaks separately from other characters (for stats)
2. Don't quit early on an escaped double quote (\")
2021-09-17 17:03:13 +01:00
Paul Stemmet 0a4a7930a5 scalar/*/tests: rename TEST_OPTS -> TEST_FLAGS
to remain consistent with scanner/tests, also derive the base TEST_FLAGS
from scanner/tests.TEST_FLAGS, minus options that do not make sense for
the test battery (O_EXTENDABLE)
2021-09-17 17:03:13 +01:00
Paul Stemmet 8d01532b1f Cargo: add feature.test_lazy
For testing the Scanner with O_LAZY active
2021-09-17 17:03:13 +01:00