
General-purpose rule to create tar archives.

Unlike pkg_tar from rules_pkg:

  • It does not depend on any Python interpreter setup
  • The "manifest" specification is a mature public API and uses a compact tabular format, fixing https://github.com/bazelbuild/rules_pkg/pull/238
  • It doesn't rely on a custom program to produce the output; instead, we rely on the well-known C program called "tar". Specifically, we use the BSD variant of tar since it provides a means of controlling mtimes, uid, symlinks, etc.

We also provide full control for tar'ring binaries including their runfiles.
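As a minimal sketch, archiving a binary together with its runfiles only requires listing the binary in srcs. Per the srcs attribute of tar_rule, which the tar macro forwards, runfiles of binaries are copied into the archive as well. The label //app:server and the target name server_tar are hypothetical.

```starlark
load("@aspect_bazel_lib//lib:tar.bzl", "tar")

tar(
    name = "server_tar",
    # Hypothetical binary target: the binary and its runfiles tree
    # are both written into the resulting archive.
    srcs = ["//app:server"],
)
```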

The tar binary is hermetic and fully statically-linked. It is fetched as a toolchain from https://github.com/aspect-build/bsdtar-prebuilt.

Important Note

When using compress = "gzip", it's important to disable the non-deterministic timestamp in the gzip header by passing the --options=gzip:!timestamp option to BSD tar, for example through the args attribute.

See: https://datatracker.ietf.org/doc/html/rfc1952#page-5

See: https://github.com/bazel-contrib/bazel-lib/issues/783
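A minimal sketch of a reproducible gzip-compressed archive follows. It passes the option from the note above through the args attribute of tar_rule, which the tar macro forwards. The target name files_tgz is hypothetical and //some:files is the placeholder label used elsewhere on this page.

```starlark
load("@aspect_bazel_lib//lib:tar.bzl", "tar")

tar(
    name = "files_tgz",
    srcs = ["//some:files"],
    compress = "gzip",
    # Ask bsdtar's gzip filter not to record a timestamp in the gzip header,
    # so rebuilding the same inputs yields byte-identical output.
    args = ["--options=gzip:!timestamp"],
)
```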

Examples

See the tar tests for examples of usage.

Mutating the tar contents

The mtree_spec rule can be used to create an mtree manifest for the tar file. Then you can mutate that spec using mtree_mutate and feed the result as the mtree attribute of the tar rule.

For example, to set the owner uid of files in the tar, you could:

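```starlark
load("@aspect_bazel_lib//lib:tar.bzl", "mtree_mutate", "mtree_spec", "tar")

_TAR_SRCS = ["//some:files"]

# Generate the default mtree specification for the sources.
mtree_spec(
    name = "mtree",
    srcs = _TAR_SRCS,
)

# Rewrite the owner uid of every entry to 1000.
mtree_mutate(
    name = "change_owner",
    mtree = ":mtree",
    owner = "1000",
)

# Archive the same sources using the mutated specification.
tar(
    name = "tar",
    srcs = _TAR_SRCS,
    mtree = ":change_owner",
)
```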

TODO:

  • Provide convenience for rules_pkg users to re-use or replace pkg_files trees

mtree_spec

mtree_spec(name, srcs, out)

Create an mtree specification to map a directory hierarchy. See https://man.freebsd.org/cgi/man.cgi?mtree(8)

ATTRIBUTES

| Name | Description | Type | Mandatory | Default |
| --- | --- | --- | --- | --- |
| name | A unique name for this target. | Name | required | |
| srcs | Files that are placed into the tar | List of labels | optional | [] |
| out | Resulting specification file to write | Label | optional | None |

tar_rule

tar_rule(name, srcs, out, args, compress, compute_unused_inputs, mode, mtree)

Rule that executes BSD tar. Most users should use the tar macro, rather than load this directly.

ATTRIBUTES

| Name | Description | Type | Mandatory | Default |
| --- | --- | --- | --- | --- |
| name | A unique name for this target. | Name | required | |
| srcs | Files, directories, or other targets whose default outputs are placed into the tar.<br><br>If any of the srcs are binaries with runfiles, those are copied into the resulting tar as well. | List of labels | optional | [] |
| out | Resulting tar file to write. If absent, [name].tar is written. | Label | optional | None |
| args | Additional flags permitted by BSD tar; see the man page. | List of strings | optional | [] |
| compress | Compress the archive file with a supported algorithm. | String | optional | "" |
| compute_unused_inputs | Whether to discover and prune input files that will not contribute to the archive.<br><br>Unused inputs are discovered by comparing the set of input files in srcs to the set of files referenced by mtree. Files that the mtree specification does not use as content will not be read by the tar tool when creating the archive, so they can be pruned from the input set using the unused_inputs_list mechanism.<br><br>Benefits: pruning unused input files can reduce the amount of work the build system must perform. Pruned files are not included in the action cache key; changes to them do not invalidate the cache entry, which can lead to higher cache hit rates. Actions do not need to block on the availability of pruned inputs, which can increase the available parallelism of builds. Pruned files do not need to be transferred to remote-execution workers, which can reduce network costs.<br><br>Risks: pruning an actually-used input file can lead to unexpected, incorrect results. The comparison performed between srcs and mtree is currently inexact and may fail to handle handwritten or externally-derived mtree specifications. However, it is safe to use this feature when the lines found in mtree are derived from one or more mtree_spec rules, filtered and/or merged on a whole-line basis only.<br><br>Possible values:<br><br>- compute_unused_inputs = 1: Always perform unused input discovery and pruning.<br>- compute_unused_inputs = 0: Never discover or prune unused inputs.<br>- compute_unused_inputs = -1: Discovery and pruning of unused inputs is controlled by the --[no]@aspect_bazel_lib//lib:tar_compute_unused_inputs flag. | Integer | optional | -1 |
| mode | A mode indicator from the following list, copied from the tar manpage:<br><br>- create: Create a new archive containing the specified items.<br>- append: Like create, but new entries are appended to the archive. Note that this only works on uncompressed archives stored in regular files. The -f option is required.<br>- list: List archive contents to stdout.<br>- update: Like append, but new entries are added only if they have a modification date newer than the corresponding entry in the archive. Note that this only works on uncompressed archives stored in regular files. The -f option is required.<br>- extract: Extract to disk from the archive. If a file with the same name appears more than once in the archive, each copy will be extracted, with later copies overwriting (replacing) earlier copies. | String | optional | "create" |
| mtree | An mtree specification file | Label | required | |
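The sketch below shows the case the compute_unused_inputs description calls safe. The specification comes from mtree_spec and is filtered on a whole-line basis only, so files whose lines were dropped become prunable inputs. The genrule, the filter word "testdata" and the target names are assumptions for illustration. Instead of hard-coding compute_unused_inputs = 1, you could leave it at -1 and toggle pruning with the --[no]@aspect_bazel_lib//lib:tar_compute_unused_inputs flag.

```starlark
load("@aspect_bazel_lib//lib:tar.bzl", "mtree_spec", "tar")

mtree_spec(
    name = "mtree",
    srcs = ["//some:files"],
)

# Hypothetical filter: remove whole lines mentioning "testdata".
# Whole-line filtering keeps the result safe for unused-input pruning.
genrule(
    name = "filtered_mtree",
    srcs = [":mtree"],
    outs = ["filtered.mtree"],
    cmd = "grep -v testdata $< > $@",
)

tar(
    name = "tar",
    srcs = ["//some:files"],
    mtree = ":filtered_mtree",
    # Always prune srcs that the filtered specification no longer references.
    compute_unused_inputs = 1,
)
```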

mtree_mutate

mtree_mutate(name, mtree, strip_prefix, package_dir, mtime, owner, ownername, awk_script, kwargs)

Modify metadata in an mtree file.

PARAMETERS

| Name | Description | Default Value |
| --- | --- | --- |
| name | Name of the target; output will be [name].mtree. | none |
| mtree | Input mtree file, typically created by mtree_spec. | none |
| strip_prefix | Prefix to remove from all paths in the tar. Files and directories not under this prefix are dropped. | None |
| package_dir | Directory prefix to add to all paths in the tar. | None |
| mtime | New modification time for all entries. | None |
| owner | New uid for all entries. | None |
| ownername | New uname for all entries. | None |
| awk_script | May be overridden to change the script containing the modification logic. | Label("@aspect_bazel_lib//lib/private:modify_mtree.awk") |
| kwargs | Additional named parameters to pass to the underlying genrule. | none |
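As a sketch of combining several mtree_mutate parameters, the example below relocates every entry under a directory prefix and records root ownership. The prefix "opt/app" and the target names are hypothetical. The exact form package_dir expects (for example, with or without a leading slash) is an assumption worth checking against your output.

```starlark
load("@aspect_bazel_lib//lib:tar.bzl", "mtree_mutate", "mtree_spec", "tar")

mtree_spec(
    name = "mtree",
    srcs = ["//some:files"],
)

mtree_mutate(
    name = "relocated",  # writes relocated.mtree
    mtree = ":mtree",
    # Hypothetical prefix: every path in the tar is placed under opt/app.
    package_dir = "opt/app",
    # Record uid 0 / uname "root" for every entry.
    owner = "0",
    ownername = "root",
)

tar(
    name = "tar",
    srcs = ["//some:files"],
    mtree = ":relocated",
)
```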

tar

tar(name, mtree, stamp, kwargs)

Wrapper macro around tar_rule.

Options for mtree

mtree provides the "specification" or manifest of a tar file. See https://man.freebsd.org/cgi/man.cgi?mtree(8).

Because BSD tar doesn't have a flag to set modification times to a constant, we must always supply an mtree input to get reproducible builds. See https://reproducible-builds.org/docs/archives/ for more explanation.

  1. By default, mtree is "auto", which causes the macro to create an mtree_spec rule.

  2. mtree may be supplied as an array literal of lines, e.g.

     ```starlark
     mtree = [
         "usr/bin uid=0 gid=0 mode=0755 type=dir",
         "usr/bin/ls uid=0 gid=0 mode=0755 time=0 type=file content={}/a".format(package_name()),
     ],
     ```

     For the format of a line, see "There are four types of lines in a specification" on the man page for BSD mtree, https://man.freebsd.org/cgi/man.cgi?mtree(8)

  3. mtree may be a label of a file containing the specification lines.
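When writing many lines by hand, a small helper in a .bzl file can keep the fields consistent. The sketch below is hypothetical and not part of bazel-lib: the name mtree_line, its parameters and the //tools:mtree.bzl location are illustrative assumptions. It only emits keywords already used in the literal above, and it is written in the subset of Starlark that is also valid Python.

```starlark
def mtree_line(path, mode, entry_type, content = None):
    """Hypothetical helper: format one mtree line with uid/gid 0 and time=0."""
    fields = [path, "uid=0", "gid=0", "mode=" + mode, "time=0", "type=" + entry_type]
    if content != None:
        # Only file entries need a content= keyword pointing at the source file.
        fields.append("content=" + content)
    return " ".join(fields)

# Usage in a BUILD file, after load("//tools:mtree.bzl", "mtree_line"):
#     mtree = [
#         mtree_line("usr/bin", "0755", "dir"),
#         mtree_line("usr/bin/ls", "0755", "file", "{}/a".format(package_name())),
#     ],
```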

PARAMETERS

| Name | Description | Default Value |
| --- | --- | --- |
| name | Name of the resulting tar_rule. | none |
| mtree | "auto", or an array of specification lines, or a label of a file that contains the lines. Subject to $(location) and "Make variable" substitution. | "auto" |
| stamp | Whether the mtree attribute should be stamped. | 0 |
| kwargs | Additional named parameters to pass to tar_rule. | none |

tar_lib.common.add_compression_args

tar_lib.common.add_compression_args(compress, args)

PARAMETERS

| Name | Description | Default Value |
| --- | --- | --- |
| compress | - | none |
| args | - | none |

tar_lib.implementation

tar_lib.implementation(ctx)

PARAMETERS

| Name | Description | Default Value |
| --- | --- | --- |
| ctx | - | none |

tar_lib.mtree_implementation

tar_lib.mtree_implementation(ctx)

PARAMETERS

| Name | Description | Default Value |
| --- | --- | --- |
| ctx | - | none |