Mahmood Ali 07a30580ac health: fail health if any task is pending
Fixes a bug where an allocation is considered healthy even if some of
its tasks are being restarted and, as such, their checks aren't tracked
by the Consul agent client.

Here, we fix the immediate case by ensuring that an alloc is healthy
only if tasks are running and the registered checks at the time are
healthy.

Previously, the health tracker tracked task "health" independently from
checks, which led to problems when a task restarted.  Consider the
following series of events:

1. all tasks start running -> `tracker.tasksHealthy` is true
2. one task has unhealthy checks and gets restarted
3. remaining checks are healthy -> `tracker.checksHealthy` is true
4. health status is propagated as healthy now that both
`tracker.tasksHealthy` and `tracker.checksHealthy` are true, even though
one task is still restarting.

This change ensures that we accurately use the latest status of tasks
and checks regardless of their status changes.

It also ensures that we only consider check health after tasks are
considered healthy; otherwise we risk trusting incomplete checks.

This approach accommodates task dependencies well.  Service jobs can have
short-lived prestart tasks that terminate before the main process runs.
Dead tasks that completed successfully will not negate health
status.
2020-03-22 11:13:41 -04:00
.circleci update website docker job to use proper git url 2020-03-20 15:43:36 -04:00
.github Revert "Add the digital marketing team as the code owners for the website dir" 2020-02-19 12:25:42 -06:00
.netlify Remove most Netlify configuration (#6194) 2019-08-22 15:54:23 -05:00
acl acl: check ACL against object namespace 2019-10-08 12:59:22 -04:00
api change jobspec lifecycle stanza to use sidecar attribute instead of 2020-03-21 17:52:57 -04:00
client health: fail health if any task is pending 2020-03-22 11:13:41 -04:00
command change jobspec lifecycle stanza to use sidecar attribute instead of 2020-03-21 17:52:57 -04:00
contributing add note to check for job diff 2020-03-04 12:06:32 -05:00
demo ci: bump consul and vault 2020-03-15 11:01:55 +01:00
dev dev: Tweaks to cluster dev scripts 2020-02-03 11:50:43 -05:00
devices/gpu/nvidia Update devices/gpu/nvidia/README.md 2019-01-23 17:44:24 -08:00
dist Set OOMScoreAdjust within systemd dist example (#6679) 2019-11-12 08:30:54 -05:00
drivers Merge pull request #7236 from hashicorp/b-remove-rkt 2020-03-17 09:07:35 -04:00
e2e e2e: use unique CSI token 2020-03-15 21:55:26 -04:00
helper Remove rkt as a built-in driver 2020-02-26 22:16:41 -05:00
integrations spelling: registrations 2018-03-11 18:40:53 +00:00
internal/testing/apitests update rest of consul packages 2020-02-16 16:25:04 -06:00
jobspec fix one test usage of BlockUntil 2020-03-21 18:25:13 -04:00
lib circbufwritter: add defer to stop ticker in flush loop 2019-01-28 14:33:20 -05:00
nomad change jobspec lifecycle stanza to use sidecar attribute instead of 2020-03-21 17:52:57 -04:00
plugins fix typo in comment 2020-03-13 09:09:46 -05:00
scheduler fix bug in lifecycle scheduler test mocks 2020-03-21 17:52:51 -04:00
scripts Merge pull request #7236 from hashicorp/b-remove-rkt 2020-03-17 09:07:35 -04:00
terraform separate vars and outputs into their own files and update default link in nomad binary variable to 0.10.0 release (#6550) 2019-10-25 14:15:30 -04:00
testutil tests: wait until leadership loop finishes 2020-03-06 14:41:59 -05:00
ui Remove the question mark from the Volume th 2020-02-14 16:56:51 -08:00
vendor fixup! vendor: add golang.org/x/crypto/ed25519 2020-03-21 18:03:09 +01:00
version release: prep for next dev cycle 2020-02-19 14:31:31 -05:00
website Merge pull request #7313 from hashicorp/docs-gh-7294 2020-03-18 15:43:41 +01:00
.gitattributes Remove invalid gitattributes 2018-02-14 14:47:43 -08:00
.gitignore gitignore: only ignore toplevel tags and bin 2019-12-03 13:36:54 -05:00
.golangci.yml chore: Switch from gometalinter to golangci-lint 2019-12-05 18:58:13 -06:00
appveyor.yml use golang 1.14 2020-03-02 13:55:02 -05:00
build_linux_arm.go Fix 32bit arm build 2017-02-09 11:22:17 -08:00
CHANGELOG.md Update CHANGELOG.md with #5970 entry. 2020-03-12 10:10:33 +01:00
GNUmakefile Merge pull request #7254 from jboero/patch-1 2020-03-19 11:07:20 -07:00
LICENSE Initial commit 2015-06-01 12:21:00 +02:00
main.go fix comment typo 2019-09-18 09:11:08 -04:00
main_test.go Adding initial skeleton 2015-06-01 13:46:21 +02:00
README.md use golang 1.14 2020-03-02 13:55:02 -05:00
Vagrantfile Remove rkt as a built-in driver 2020-02-26 22:16:41 -05:00

Nomad

Overview

Nomad is an easy-to-use, flexible, and performant workload orchestrator that deploys containers, legacy applications, and batch applications.

Nomad enables developers to use declarative infrastructure-as-code for deploying their applications (jobs). Nomad uses bin packing to efficiently schedule jobs and optimize for resource utilization. Nomad is supported on macOS, Windows, and Linux.

Nomad is widely adopted and used in production by PagerDuty, Target, Citadel, Trivago, SAP, Pandora, Roblox, eBay, Deluxe Entertainment, and more.

  • Deploy Containers and Legacy Applications: Nomad's flexibility as an orchestrator enables an organization to run containers, legacy, and batch applications together on the same infrastructure. Through pluggable task drivers, Nomad brings core orchestration benefits to legacy applications without needing to containerize them.

  • Simple & Reliable: Nomad runs as a single 75MB binary and is entirely self-contained, combining resource management and scheduling into a single system. Nomad does not require any external services for storage or coordination. Nomad automatically handles application, node, and driver failures. Nomad is distributed and resilient, using leader election and state replication to provide high availability in the event of failures.

  • Device Plugins & GPU Support: Nomad offers built-in support for GPU workloads such as machine learning (ML) and artificial intelligence (AI). Nomad uses device plugins to automatically detect and utilize resources from hardware devices such as GPUs, FPGAs, and TPUs.

  • Federation for Multi-Region, Multi-Cloud: Nomad was designed to support infrastructure at a global scale. Nomad supports federation out-of-the-box and can deploy jobs across multiple regions and clouds.

  • Proven Scalability: Nomad is optimistically concurrent, which increases throughput and reduces latency for workloads. Nomad has been proven to scale to clusters of 10K+ nodes in real-world production environments.

  • HashiCorp Ecosystem: Nomad integrates seamlessly with Terraform, Consul, and Vault for provisioning, service discovery, and secrets management.

Getting Started

Get started with Nomad quickly in a sandbox environment on the public cloud or on your computer.

These methods are not meant for production.

Documentation & Guides

Documentation is available on the Nomad website.

Resources

Who Uses Nomad

...and more!

Contributing to Nomad

If you wish to contribute to Nomad, you will need Go installed on your machine (version 1.14+ is required, and gcc-go is not supported).

See the contributing directory for more developer documentation.

Developing with Vagrant

There is an included Vagrantfile that can help bootstrap the process. The created virtual machine is based on Ubuntu 16 and installs several of the base libraries that can be used by Nomad.

To use this virtual machine, check out Nomad and run vagrant up from the root of the repository:

$ git clone https://github.com/hashicorp/nomad.git
$ cd nomad
$ vagrant up

The virtual machine will launch, and a provisioning script will install the needed dependencies.

Developing locally

For local development, first make sure Go is properly installed, including setting up a GOPATH. After setting up Go, clone this repository into $GOPATH/src/github.com/hashicorp/nomad. Then you can download the required build tools, such as vet, cover, and godep, by bootstrapping your environment.

$ make bootstrap
...

Nomad creates many file handles for communicating with tasks, log handlers, etc. In some development environments, particularly macOS, the default number of file descriptors is too small to run Nomad's test suite. You should set ulimit -n 1024 or higher in your shell. This setting is scoped to your current shell and doesn't affect other running shells or future shells.
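The limit check described above can be sketched as a small shell function. This is a minimal sketch, assuming bash or a similar shell whose ulimit builtin supports -n and -S. The name ensure_fd_limit is hypothetical and not part of the Nomad repository.

```shell
# Hypothetical helper (not part of Nomad): raise the soft open-file limit
# of the current shell to at least the requested value.
ensure_fd_limit() {
  want="${1:-1024}"
  current="$(ulimit -n)"
  # "unlimited" is already high enough; otherwise compare numerically.
  if [ "$current" != "unlimited" ] && [ "$current" -lt "$want" ]; then
    # -S changes only the soft limit and leaves the hard limit untouched.
    ulimit -S -n "$want" || return 1
  fi
  echo "open file limit: $(ulimit -n)"
}

# Call it directly (not inside a subshell) so the new limit applies to this shell.
ensure_fd_limit 1024
```

The function leaves the limit alone when it is already at or above the requested value, so it should be safe to run in a shell that was configured with a higher limit.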

Afterwards, run make test to run the tests. If it exits with status 0, everything is working!

$ make test
...
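For scripted runs, the exit status check can be made explicit. The following is a minimal sketch, assuming a POSIX-style shell. The helper name run_and_report is hypothetical and is not provided by the Nomad repository.

```shell
# Hypothetical helper (not part of Nomad): run a command, report whether
# it exited with status 0, and pass the exit status through to the caller.
run_and_report() {
  if "$@"; then
    status=0
  else
    status=$?
  fi
  if [ "$status" -eq 0 ]; then
    echo "PASS: $*"
  else
    echo "FAIL (exit status $status): $*"
  fi
  return "$status"
}

# Usage from the repository root:
#   run_and_report make test
```

Because the helper returns the original exit status, it can be used in a larger script or CI step without hiding failures.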

To compile a development version of Nomad, run make dev. This will put the Nomad binary in the bin and $GOPATH/bin folders:

$ make dev

Optionally run Consul to enable service discovery and health checks:

$ sudo consul agent -dev

And finally, start the Nomad agent:

$ sudo bin/nomad agent -dev

If the Nomad UI is desired in the development version, run make dev-ui. This will build the UI from source and compile it into the dev binary.

$ make dev-ui
...
$ bin/nomad
...

Compiling protobuf files requires protoc. See
https://github.com/google/protobuf for more information.

Note: Building the Nomad UI from source requires Node, Yarn, and Ember CLI. These tools are already in the Vagrant VM. Read the UI README for more info.

To cross-compile Nomad, run make prerelease and make release. This will generate all the static assets, compile Nomad for multiple platforms, and place the resulting binaries into the ./pkg directory:

$ make prerelease
$ make release
...
$ ls ./pkg
...