Although #17669 fixed the permissions of the release pipeline to push new
commits, there was still an error when invoking the `build` workflow.
The format of the reference was changed in #17103 such that we're sending the
git ref (a SHA) and not the "--ref" argument required by the GH actions workflow
API, which in this case is apparently specially defined as "The branch or tag
name which contains the version of the workflow file you'd like to run" and not
what git calls a "ref".
This changeset:
* Removes the third-party action entirely so that we're using GitHub's own
tooling. This removes one more thing from the supply chain to pin and ensures a
1:1 mapping of args to what's documented by GitHub.
* Removes the `--ref` argument entirely, which causes it to default to the
current branch that the release workflow is running on (which is always what
we want).
Since the matrix exercises different test cases, it's better to allow
all partitions to completely run, even if one of them fails, so it's
easier to catch multiple test failures.
In #17103 we set read-only permissions on all the workflows. Unfortunately we
missed that the `release` workflow makes git commits and pushes them to the
repository, so it needs to have write permissions.
Some of the paths ignored by `test-core.yaml` need to be checked by
`make check`. The `checks.yaml` workflow run on these paths and can also
be used as a reusable workflow.
This seems to finish in about 20 minutes, or run for 6+ hours until hitting
a default timeout. Set a timeout to 30 minutes so we aren't wasting time
and runners.
* Exam to parallelize tests
* Logging to try to solve test flakiness
* Logging in another failure
* Hardening for one test and snapshot for another
* Explicitly set the first one as the servicedAlloc instead of randomly picking
* A wild CircleCI test failure appears
* de-log
namely, these workflows:
test-e2e, test-ui, and test-windows
extra-curricularly, as part of the overall
migration effort company-wide, this also includes
some standardization such as:
* explicit permissions:read on various workflows
* pinned action version shas (per https://github.com/hashicorp/security-public-tsccr)
* actionlint, which among other things runs
shellcheck on GHA run steps
Co-authored-by: emilymianeil <eneil@hashicorp.com>
Co-authored-by: Daniel Kimsey <daniel.kimsey@hashicorp.com>
The 32-bit Intel builds (aka "386") are not tested and likely have bugs
involving platform-sized integers when operated at any non-trivial scale. Remove
these builds from the upcoming Nomad 1.6.0 and provide recommendations in the
upgrade notes for those users who might have hobbyist boards running 32-bit
ARM (this will primarily be the RaspberryPi Zero or older spins of the RaspPi).
DO NOT BACKPORT TO 1.5.x OR EARLIER!
The file path in the TSCCR repo for the `returntocorp/semgrep` action was
incorrect, so the pinning tool was not able to find the correct entry and it was
not pinned in #17238.
The repository is fixed in https://github.com/hashicorp/security-tsccr/pull/431
The `backspace/ember-asset-size` action we're using is unmaintained and has a
bunch of vulns in it, so it won't pass security screening (this is a NodeJS
action so it has piles of dependencies, 99% of which won't be in use but fails
automated screening anyways). Move this to the upstream version.
The `machine-learning-apps/pr-comment` action also presents a problem for the
ProdSec security screening because it's archived and also runs an external
Docker image. Move this to a likely-ok maintained action for now, until we can
spare some time to remove this in lieu of something more reasonable that isn't a
GitHub action.
Many of the GitHub Actions from the build pipeline are written in a truly
ancient version of NodeJS. Upgrade to more recent versions.
Remove RelEng from codeowners
This reverts commit 1721e687c0832bea3d9b7eec5dcd3c4e7a924d71.
The change was expected to solve the sporadic problems we were having
with Backport Assistant, but it end up creating even more failures.
Examples in the documentation frequently include tokens, including Vault tokens
which end up triggering GitHub's secret scanner. Remove these from consideration
so that we don't get false positive reports.
Instead of attempting to pick each individual commit in a PR using
`BACKPORT_MERGE_COMMIT` only picks the commit that was merged into
`main`.
This reduces the amount of work done during a backport, generating
cleaner merges and avoiding potential issues on specific commits.
With this setting PRs that are not squashed will fail to backport and
must be handled manually, but those are considered exceptions.
Running tests `on: push` prevents GitHub from showing the workflow approval
button, which prevents tests from being run on community-contributed (or even
just non-Nomad HashiCorp folks) PRs. Running `on: pull_request` automatically
picks up opened, reopened, and synchronize hooks (where "synchronize" means a
push to HEAD on the PR's branch, so that'll pick up rebases and updates).
But we also want to run tests on `main` and the various `release` backport
branches, so retain a `on: push` for those.
This PR tries to make API tests run fast, as an experiment to later apply
to all packages. Key changes include
- Swapping freeport for test/portal for port allocations
- Swappng some uses of WaitForResult with test/wait
- Turning on parallelism in api/testutil/slow.go
- Switching to custom public runner (32 vcpu)
There's also chunk of cleanup brought in for the ride
The `ubuntu-latest` runner has been migrated to Ubuntu 22.04, which doesn't have
all the same multilib packages as 20.04. Although we'll probably want to migrate
eventually, we should ship Nomad 1.4.3 with the same toolchain as we did
previously so that we're not introducing new issues.
This PR changes test-core to make use of
https://github.com/hashicorp/setup-golang
to consolidate the setting up of the Go compiler and the Go modules cache
used for the CI job.
Fixes: #14905
* [no ci] use json for grouping packages for testing
* [no ci] able to get packages in group
* [no ci] able to run groups of tests
* [no ci] more
* [no ci] try disable circle unit tests
* ci: use actions/checkout@v3
* ci: rename to quick
* ci: need make dev in mods cache step
* ci: make compile step depend on checks step
* ci: bump consul and vault versions
* ci: need make dev for group tests
* ci: update ci unit testing docs
* docs: spell plumbing correctly
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
When community members comment on long-closed issues, there's a number of
failure modes that make for a bad experience for them:
* Their comments are often missed entirely because notification settings make it
impractical for most developers to read comments on inactive issues.
* In our experience, the problem is only rarely a regression; because failures
are complex, totally different code paths can result in symptoms that initially
appear to be the same but turn out to be completely different under close
examination. This is particularly the case for issues fixed in very old
versions (sometimes 2 or more years old).
The Terraform core team uses a bot that locks issues after only 30 days. But
because we typically close issues automatically on PR merge but don't have
rolling releases, it'd frequently happen that unrelease fixes will have locked
comments, which isn't a good experience either. I've looked through the pace of
releases since Nomad 0.9.0 and the longest window between releases was 3
months. Set the window for the lock bot to 120 days to give us plenty of
breathing room so it doesn't feel like we're shutting down discussion
prematurely.