mirror of https://github.com/bazel-contrib/bazel-lib synced 2024-12-01 07:15:24 +00:00
Commit graph

15 commits

Author SHA1 Message Date
Sahin Yort f7fb4a92ff
doc: warn about gzip timestamps (#979) 2024-11-04 06:20:11 -08:00
Peter Lobsinger bca34bd17c
perf: report unused inputs for the tar rule (#951)
* perf: report unused inputs for the tar rule

The `mtree` spec passed to the `tar` rule very often selects a subset of the
inputs made available through the `srcs` attribute. In many cases, these
subsets do not break down cleanly along dependency-tree lines and there
is no simple way to just pass less content to the `tar` rule.

One prominent example where this occurs is when constructing the tars
for OCI image layers. For instance when [building a Python-based
container image](https://github.com/bazel-contrib/rules_oci/blob/main/docs/python.md),
we might want to split the Python interpreter, third-party dependencies, and
application code into their own layers. This is done by [filtering the
`mtree_spec`](85cb2aaf8c/oci_python_image/py_layer.bzl (L39)).

However, in the operation to construct a `tar` from a subsetted mtree,
it is usually still an unsubsetted tree of `srcs` that gets passed. As
a result, the subset tarball is considered dependent upon a larger set
of sources than is strictly necessary.
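
The over-scoping described above can be illustrated with a minimal Python sketch. It is not the rule's actual implementation (which lives in Starlark and a shell script); the helper names `referenced_paths` and `unused_inputs`, the sample paths, and the simplified one-keyword-per-line parsing are all assumptions made for illustration.

```python
import re

# Simplified assumption: each mtree line names its source file through a
# single content= (or contents=) keyword with no embedded whitespace.
_CONTENT_RE = re.compile(r"(?:^|\s)contents?=(\S+)")


def referenced_paths(mtree_text):
    """Collect the source paths named by content=/contents= keywords."""
    paths = set()
    for line in mtree_text.splitlines():
        match = _CONTENT_RE.search(line)
        if match:
            paths.add(match.group(1))
    return paths


def unused_inputs(srcs, mtree_text):
    """Return the inputs that the (possibly filtered) mtree never references."""
    used = referenced_paths(mtree_text)
    return sorted(path for path in srcs if path not in used)


# Hypothetical inputs: the full set of srcs, and an mtree filtered down to
# the application-code layer only.
srcs = ["app/main.py", "site-packages/requests/api.py", "python/bin/python3"]
app_layer_mtree = "\n".join([
    "#mtree",
    "app/main.py uid=0 gid=0 mode=0755 type=file content=app/main.py",
])

print(unused_inputs(srcs, app_layer_mtree))
```

In this sketch the interpreter and third-party files are still declared inputs of the app-layer tar even though the filtered mtree never reads them, which is the kind of spurious dependency the commit sets out to report.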

This over-scoping runs counter to a very common objective associated with
breaking up an image into layers - isolating churn to a smaller slice of
the application. Because of the spurious relationships established in
Bazel's dependency graph, all tars get rebuilt anytime any content in
the application gets changed. Tar rebuilds can even be triggered by
changes to files that are completely filtered out of all layers of the container.

Redundant creation of archive content is usually not too computationally
intensive, but the archives can be quite large in some cases, and
avoiding a rebuild might free up gigabytes of disk and/or network
bandwidth for better use. In addition, eliminating the spurious
dependency edges removes erroneous constraints applied to the build
action schedule; these tend to push all tar-building operations towards
the end of a build, even when some archive construction could be
scheduled much earlier.

## Risk assessment and mitigation

The `unused_inputs_list` mechanism used to report spurious dependency
relationships is a bit difficult to use. Reporting an actually-used
input as unused can create difficult-to-diagnose problems down the line.

However, the behaviour of the `mtree`-based `tar` rule is sufficiently
simple and self-contained that I am fairly confident that this rule's
used/unused set can be determined accurately in a maintainable fashion.

Out of an abundance of caution I have gated this feature behind a
default-off flag. The `tar` rule will continue to operate as it had
before - typically over-reporting dependencies - unless the
`--@aspect_bazel_lib//lib:tar_compute_unused_inputs` flag is passed.

### Filter accuracy

The `vis` encoding used by the `mtree` format to resiliently handle path
names has a small amount of "play" to it - it is reversible but the
encoded representation of a string is not unique. Two unequal encoded
strings might decode to the same value; this
can happen when at least one of the encoded strings contains unnecessary
escapes that are nevertheless honoured by the decoder.
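
As a rough illustration of this non-uniqueness, the Python sketch below decodes only three-digit octal escapes (`\ooo`), which is a simplified subset of `vis` chosen for the example; the function name `decode_octal_escapes` and the sample paths are hypothetical.

```python
import re

# Simplified assumption: the only escape form handled is a backslash
# followed by exactly three octal digits. Real vis has more forms.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def decode_octal_escapes(encoded):
    """Decode three-digit octal escapes; other characters pass through."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), encoded)


canonical = r"dir/my\040file.txt"      # only the space is escaped
redundant = r"dir/my\040\146ile.txt"   # 'f' (octal 146) is escaped needlessly

# The encoded strings differ, so a comparison on encoded forms mismatches...
print(canonical == redundant)
# ...even though both decode to the same path.
print(decode_octal_escapes(canonical) == decode_octal_escapes(redundant))
```

A filter that compares the encoded forms directly would treat these two spellings as different files, while a decoder would open the same one.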

The unused-inputs set is determined using a filter that compares
`vis`-encoded strings. In the presence of non-canonically-encoded
paths, false mismatches can lead to falsely reporting that an input is
unused.

The only `vis`-encoded path content that is under the control of callers
is the `mtree` content itself; all other `vis`-encoded strings are
constructed internally to this package, not exposed publicly, and are
all derived using the `lib/private/tar.bzl%_vis_encode` function; all of
these paths are expected to compare exactly. Additionally, it is expected that
many/most users will use this package's helpers (e.g. `mtree_spec`) when
crafting their mtree content; such content is also safe. It is only when
the user crafts their own mtree, or modifies an mtree spec's `content=`
fields' encoding in some way, that a risk of inaccurate reporting
arises. The chances for this are expected to be minor since this seems
like an inconvenient and not-particularly-useful thing for a user to go
out of their way to do.

* Also include other bsdtar toolchain files in keep set

* Add tri-state attribute to control unused-inputs behaviour

This control surface provides for granular control of the feature. The
interface is selected to mirror the common behaviour of `stamp` attributes.
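
A minimal Python sketch of how such a tri-state might resolve, assuming it borrows the conventional `stamp` encoding (1 = always on, 0 = always off, -1 = defer to the build flag). The function name and the integer encoding are assumptions for illustration; the commit message does not spell out the attribute's name or values.

```python
def resolve_compute_unused_inputs(attr_value, flag_enabled):
    """Resolve a stamp-style tri-state against a global boolean flag.

    attr_value: 1 forces the feature on, 0 forces it off, -1 defers to
    the flag (hypothetical encoding, modelled on `stamp` attributes).
    flag_enabled: the value of the build-wide flag.
    """
    if attr_value == 1:
        return True
    if attr_value == 0:
        return False
    if attr_value == -1:
        return flag_enabled
    raise ValueError("expected -1, 0 or 1, got %r" % (attr_value,))


# A target that defers to the flag follows it; explicit values override it.
print(resolve_compute_unused_inputs(-1, flag_enabled=False))
print(resolve_compute_unused_inputs(1, flag_enabled=False))
print(resolve_compute_unused_inputs(0, flag_enabled=True))
```

Under this scheme a single target can opt in or out regardless of the flag, while targets left at the deferring value pick up whatever the flag says.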

* Add bzl_library level dep

* Update docs

* pre-commit

* Add reminder to change flag default on major-version bump

* Add note about how to make unused input computation exactly correct

* Add a test for unused_inputs listing

* Support alternate contents= form

This is accepted by bsdtar/libarchive. In fact `contents=` is the only one of
the pair documented in `mtree(5)`; `content=` is an undocumented
alternate form supported by libarchive.

* Don't try to prune the unprunable

Bazel's interpretation of unused_inputs_list cannot accommodate certain
things in filenames. These are also likely to mess up our own
line-oriented protocol in the shell script that produces this file.
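
One way to picture this guard is the Python sketch below. The commit message does not list which filename features are problematic, so the checks here (embedded line breaks, leading or trailing whitespace) are assumptions picked because they would break any one-path-per-line file; `is_prunable` is a hypothetical name.

```python
def is_prunable(path):
    """Return True if a path can be listed safely, one path per line.

    Assumed problem cases (not taken from the commit): embedded line
    breaks, and leading/trailing whitespace that a line reader may strip.
    """
    if "\n" in path or "\r" in path:
        return False
    if path != path.strip():
        return False
    return True


candidates = ["app/main.py", "odd/name\nwith-newline", " leading-space.txt"]

# Paths that fail the check are never reported as unused; they simply
# remain declared inputs, which is the conservative choice.
reportable = [p for p in candidates if is_prunable(p)]
print(reportable)
```

Keeping a questionable path as an input costs only some incrementality, whereas wrongly reporting it as unused could hide a real dependency.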

Co-authored-by: Sahin Yort <thesayyn@gmail.com>

* Rerun docs update

---------

Co-authored-by: Sahin Yort <thesayyn@gmail.com>
2024-10-13 09:58:56 -07:00
Marcel ca80d07fca
Fix unknown repo error with mtree_mutate and Bzlmod (#948)
With Bzlmod, every repo has its own namespace. Using Label() should make sure the label is resolved in the namespace of the .bzl file rather than the caller's.
2024-10-04 15:41:24 +00:00
Alex Eagle 0f5e1dcafd
chore(deps): upgrade stardoc (#894)
* chore(deps): upgrade stardoc

This uses the Bazel 7 'starlark_doc_extract' rule which our docsite expects for slurping data.

* chore: stardoc setup in WORKSPACE too

* chore: skip stardoc on bazel 6 in cases where the legacy extractor produces different docstrings
2024-08-08 12:56:11 -07:00
Alex Eagle 109f32eefb
docs(tar): point to the tests as useful examples (#892)
* docs(tar): point to the tests as useful examples

Improve the content to make it easier to reference as examples of usage.

* fix broken link
2024-08-05 11:18:57 -07:00
Tobias Schlatter 086624ae47
fix(tar): expose package_dir argument in mtree_mutate (#873)
This was likely forgotten in #829 when making the parameters explicit
during review.
2024-07-02 13:29:24 +03:00
Alex Eagle 977f27f7a0
feat(tar): add ergonomic way to strip_prefix (#829) 2024-05-01 12:36:39 -07:00
Sahin Yort a29dd93c0b
fix: srcs is not mandatory (#786) 2024-03-08 10:47:38 -08:00
Sahin Yort 197b2da974
feat: support location expansion in tar (#774) 2024-03-01 14:51:47 -08:00
Alex Eagle 38fecbcbb5
Update tar.bzl (#751)
* Update tar.bzl

Fix header so it's not presented at the same level as the parent (the `tar` macro)

* fix docs
2024-02-08 15:26:05 -08:00
Alex Eagle f65019be4e
chore: improve docs about mtree mutation (#692) 2023-12-13 15:03:32 -08:00
Sahin Yort a219f5260d
fix: expose tar_lib as public (#680) 2023-12-08 10:31:01 -08:00
Derek Cormier 5bd6e5fdd4
fix(ci): fix bzlmod issues and enable on ci (#658) 2023-11-15 15:07:03 -08:00
Alex Eagle 472bf9b122
feat: tar includes runfiles (#595)
* feat: tar includes runfiles

* chore: try to fix red circleci

* fix: tracked down problem

* chore: document tar#srcs supports runfiles

* chore: add comment about logic for trimming manifest suffix

* chore: missed a replacement spot

* chore: give up on the listing test for now
2023-10-09 15:57:52 -07:00
Alex Eagle a283a8216d
feat: add a tar toolchain (#468)
* feat: add a BSD tar toolchain

@thesayyn discovered that it has a feature which should make it a drop-in replacement for pkg_tar
including fine-grained file permissions and symlinks:
https://man.freebsd.org/cgi/man.cgi?mtree(8)

* show example of mtree usage

* feat: introduce tar rule

* cleanup and get test passing

* more cleanup

* chore: add support for compress flags

* chore: add docs

* chore: add docs

* feat: implement linux bsdtar toolchain (#566)

* chore: improve target naming

* WIP: args

* feat: generate mtree spec

Also allow arbitrary args

* refactor: mtree is required

* refactor: style nits

* fix: support mix of source and generated artifacts

* feat: demonstrate strip_prefix

* chore: regen docs

* fix: make host toolchain a fallback toolchain

* fix: include libarchive13.so when installing BSD tar

* chore: buildifier

* fix: aarch64 cpu constraint

* fix(ci): include libarchive13.so when running tar

* chore: add libnettle

* refactor: inputs mutated less

* refactor: remove unneeded substitution arg

* refactor: don't advertise unsupported modes

* fix: hack enough to make it run on my machine

* chore: dynamic libraries included in sh_binary under toolchain

* make sh_binary work

* refactor: drop arm64 for now

* fix toolchain

* fix test

* chore: improve test naming scheme

---------

Co-authored-by: Sahin Yort <thesayyn@gmail.com>
2023-10-03 13:50:55 -07:00