Commit graph

2316 commits

Author SHA1 Message Date
Alex Dadgar 6c2782f037 move catalog + grpcutils 2019-01-22 15:11:57 -08:00
Mahmood Ali 05e32fb525
Merge pull request #5213 from hashicorp/b-api-separate
Slimmer /api package
2019-01-18 20:52:53 -05:00
Mahmood Ali 5df63fda7c
Merge pull request #5190 from hashicorp/f-memory-usage
Track Basic Memory Usage as reported by cgroups
2019-01-18 16:46:02 -05:00
Mahmood Ali 6bdb9864de api: remove MockJob from exported functions
`api.MockJob` is a test utility, that's only used by `command/agent`
package.  This moves it to the package and removes it from the public
API.
2019-01-18 14:51:31 -05:00
Michael Schurter 48afda786b
Merge pull request #5187 from hashicorp/test-consul
Port a bunch of pre-0.9 Consul tests to 0.9
2019-01-15 07:41:50 -08:00
Alex Dadgar 471fdb3ccf
Merge pull request #5173 from hashicorp/b-log-levels
Plugins use parent loggers
2019-01-14 16:14:30 -08:00
Mahmood Ali 9909d98bee Track Basic Memory Usage as reported by cgroups
Track current memory usage, `memory.usage_in_bytes`, in addition to
`memory.max_memory_usage_in_bytes` and friends.  This number is closer
what Docker reports.

Related to https://github.com/hashicorp/nomad/issues/5165 .
2019-01-14 18:47:52 -05:00
Michael Schurter fc1bb95ef8 Remove old comment; it's been fixed! 2019-01-14 09:56:53 -08:00
Preetha Appan 7bd1440710
REfactor statedb factory config to set it directly in client config 2019-01-12 10:38:20 -06:00
Preetha Appan f059ef8a47
Modified destroy failure handling to rely on allocrunner's destroy method
Added a unit test with custom statedb implementation that errors, to
use to verify destroy errors
2019-01-12 10:37:12 -06:00
Alex Dadgar 5621086f50 Enable json logs 2019-01-11 11:36:37 -08:00
Alex Dadgar 14ed757a56 Plugins use parent loggers
This PR fixes various instances of plugins being launched without using
the parent loggers. This meant that logs would not all go to the same
output, break formatting etc.
2019-01-11 11:36:37 -08:00
Preetha Appan b46728a88b
Make spread weight a pointer with default value if unset 2019-01-11 10:31:21 -06:00
Nick Wales 7a7b5da0df Adds optional Consul service tags to nomad server and agent services, gh#4297 2019-01-09 22:02:46 +00:00
Chris Baker e9db2ae822 Merge branch 'master' of github.com:hashicorp/nomad into f-1157-validate-node-meta-variables 2019-01-09 18:56:49 +00:00
Chris Baker d5b1a56f3b increased config validation coverage for dev mode 2019-01-09 18:56:40 +00:00
Michael Schurter ac169008f0
Merge pull request #5045 from hashicorp/b-drivermanager-tests-drain
drain: fix node drain monitoring
2019-01-09 10:23:28 -08:00
Mahmood Ali 90f3cea187
Merge pull request #5157 from hashicorp/r-drivers-no-cstructs
drivers: avoid referencing client/structs package
2019-01-09 13:06:46 -05:00
Mahmood Ali 03a9e812c8 cli: support hitting pre-0.9 nomad agents
node.NodeResources is nil when operating against pre-0.9.
2019-01-08 19:32:26 -05:00
Chris Baker d8a3a74c43 move if dev check into config validation, to support dev-mod
validation in the future
2019-01-08 22:21:48 +00:00
Michael Schurter 8a6b1acaa6 drain: fix node drain monitoring
The whole approach to monitoring drains has ordering issues and lacks
state to output useful error messages.

AFAICT to get the tests passing reliably I needed to change the behavior
of monitoring.

Parts of these tests are skipped in CI, and they should be rewritten as
e2e tests.
2019-01-08 09:35:16 -08:00
Chris Baker 220e9e838f refactored config validation into a new method, modified Meta.Client
tests appropriately
2019-01-08 15:07:36 +00:00
Mahmood Ali 916a40bb9e move cstructs.DeviceNetwork to drivers pkg 2019-01-08 09:11:47 -05:00
Chris Baker 91449d6809 Merge branch 'master' of github.com:hashicorp/nomad into f-1157-validate-node-meta-variables 2019-01-08 02:17:35 +00:00
Chris Baker bf00f93d87 moved interp key regex out to a helper function 2019-01-08 00:11:47 +00:00
Alex Dadgar 8a35d7b1dd Test recovery 2019-01-07 14:49:41 -08:00
Chris Baker f99e18aaf4 gofmt to make check happy 2019-01-07 18:01:59 +00:00
Chris Baker a61afad5bb added validation on client metadata keys 2019-01-07 17:16:38 +00:00
Nick Ethier ab3c5c0a8b
fix test 2018-12-20 13:54:29 -05:00
Nick Ethier fad553ab6a
command: wait for drivers to be ready before test 2018-12-20 13:52:33 -05:00
Nick Ethier 5b9bba08c6
fix tests 2018-12-20 01:05:17 -05:00
Nick Ethier 060ceb3635
fix test 2018-12-20 01:01:53 -05:00
Nick Ethier a96afb6c91
fix tests that fail as a result of async client startup 2018-12-20 00:53:44 -05:00
Nick Ethier 82175d1328
client/drivermananger: add driver manager
The driver manager is modeled after the device manager and is started by the client.
It's responsible for handling driver lifecycle and reattachment state, as well as
processing the incomming fingerprint and task events from each driver. The mananger
exposes a method for registering event handlers for task events that is used by the
task runner to update the server when a task has been updated with an event.

Since driver fingerprinting has been implemented by the driver manager, it is no
longer needed in the fingerprint mananger and has been removed.
2018-12-18 22:55:18 -05:00
Alex Dadgar 4c57d2ec4d Add plugin API versioning to plugin loader and plugins 2018-12-18 16:48:00 -08:00
Nick Ethier 09dadf0a23
Merge branch 'master' into f-grpc-executor
* master: (71 commits)
  Fix output of 'nomad deployment fail' with no arg
  Always create a running allocation when testing task state
  tests: ensure exec tests pass valid task resources (#4992)
  some changes for more idiomatic code
  fix iops related tests
  fixed bug in loop delay
  gofmt
  improved code for readability
  client: updateAlloc release lock after read
  fixup! device attributes in `nomad node status -verbose`
  drivers/exec: support device binds and mounts
  fix iops bug and increase test matrix coverage
  tests: tag image explicitly
  changelog
  ci: install lxc-templates explicitly
  tests: skip checking rdma cgroup
  ci: use Ubuntu 16.04 (Xenial) in TravisCI
  client: update driver info on new fingerprint
  drivers/docker: enforce volumes.enabled (#4983)
  client: Style: use fluent style for building loggers
  ...
2018-12-13 14:41:09 -05:00
Brian Lalor 31ef34838e
Fix output of 'nomad deployment fail' with no arg 2018-12-13 13:22:17 -05:00
Mahmood Ali d497729826
Merge pull request #4978 from hashicorp/f-device-tweaks
Display device attributes in `nomad node status -verbose`
2018-12-12 19:45:07 -05:00
Mahmood Ali 00c9385a2b fixup! device attributes in nomad node status -verbose 2018-12-12 09:17:31 -05:00
Alex Dadgar 86d9ad4397 fix iops bug and increase test matrix coverage 2018-12-11 15:28:21 -08:00
Mahmood Ali 69b2355274
Merge pull request #4975 from hashicorp/fix-master-20181209
Some test fixes and remedies
2018-12-11 18:00:21 -05:00
Alex Dadgar 1531b6d534
Merge pull request #4970 from hashicorp/f-no-iops
Deprecate IOPS
2018-12-11 12:51:22 -08:00
Mahmood Ali 5a487ac884 tests: prevent indefinite blocking in some tests
Noticed few places where tests seem to block indefinitely and panic
after the test run reaches the test package timeout.

I intend to follow up with the proper fix later, but timing out is much
better than indefinitely blocking.
2018-12-11 09:35:26 -05:00
Michael Schurter 8808ab9cea
Merge pull request #4953 from hashicorp/b-script-context-wrapper
consul: add ScriptExecutor context wrapper
2018-12-10 17:22:53 -08:00
Michael Schurter 4c5f3ae82c
Merge pull request #4952 from hashicorp/b-script-context
consul: fix script checks exiting after 1 run
2018-12-10 17:22:15 -08:00
Mahmood Ali 14668f48d1 device attributes in nomad node status -verbose
This reports device attributes like the following:

```
$ nomad node status -self -verbose
ID          = f7adb958-29e1-2a5a-2303-9d61ffaab33a
Name        = mars.local
Class       = <none>
DC          = dc1
Drain       = false
Eligibility = eligible
Status      = ready
Uptime      = 12h40m13s

Drivers
Driver       Detected  Healthy  Message                               Time
docker       true      true     healthy                               2018-12-10T11:47:19-05:00
...

Attributes
cpu.arch                      = amd64
cpu.frequency                 = 2200
cpu.modelname                 = Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
cpu.numcores                  = 12
...

Device Group Attributes
Device Group = nomad/file/mock
block_device = sda1
filesystem   = ext4
size         = 63.2 GB

Meta
```
2018-12-10 12:18:24 -05:00
Mahmood Ali 9f69b8bfec Rename helper_stats -> helper_devices 2018-12-10 12:18:24 -05:00
Nick Ethier 47df1dde10
Merge branch 'master' into f-grpc-executor 2018-12-06 21:42:38 -05:00
Nick Ethier 29ef54c0ee
executor: merge plugin shim with executor package 2018-12-06 21:13:45 -05:00
Alex Dadgar c918a96490 Warn if IOPS is being used 2018-12-06 16:17:09 -08:00
Alex Dadgar 1e3c3cb287 Deprecate IOPS
IOPS have been modelled as a resource since Nomad 0.1 but has never
actually been detected and there is no plan in the short term to add
detection. This is because IOPS is a bit simplistic of a unit to define
the performance requirements from the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.
2018-12-06 15:09:26 -08:00
Nick Ethier 8b20de4801
executor: use grpc instead of netrpc as plugin protocol
* Added protobuf spec for executor
 * Seperated executor structs into their own package
2018-12-05 11:03:56 -05:00
Mahmood Ali f8efc40b8b tests: stop integration tests tasks explicitly
Also update the new recommended `nomad job` subcommands
2018-12-04 11:50:59 -05:00
Michael Schurter 8fa5e90095 consul: add ScriptExecutor context wrapper
Since d335a82859ca2177bc6deda0c2c85b559daf2db3 ScriptExecutors now take
a timeout duration instead of a context. This broke the script check
removal code which used context cancelation propagation to remove
script checks while they were executing.

This commit adds a wrapper around ScriptExecutors that obeys context
cancelation again. The only downside is that it leaks a goroutine until
the underlying Exec call completes or timeouts.

Since check removal is relatively rare, check timeouts usually low, and
scripts usually fast, the risk of leaking a goroutine seems very small.
2018-12-03 20:26:31 -08:00
Michael Schurter 6459c19ffc consul: fix script checks exiting after 1 run
Fixes a regression caused in d335a82859ca2177bc6deda0c2c85b559daf2db3

The removal of the inner context made the remaining cancels cancel the
outer context and cause script checks to exit prematurely.
2018-12-03 18:50:02 -08:00
Danielle Tomlinson 51a9f7369e
Merge pull request #4936 from hashicorp/f-legacy-refactor
Refactor and repackage client/driver
2018-11-30 13:38:06 +01:00
Danielle Tomlinson ffc5e5d56b executors: Unify go-plugin handshake 2018-11-30 10:59:23 +01:00
Danielle Tomlinson fdfe93aa25 fixup: executorplugin: fix rkt build 2018-11-30 10:47:08 +01:00
Danielle Tomlinson d26a310db0 client: Move executor plugins into own package 2018-11-30 10:46:13 +01:00
Danielle Tomlinson 9b3e731f88 command: Remove Extraneous field in nodedrain test 2018-11-30 10:46:13 +01:00
Nick Ethier 80ae7e34f4
Merge pull request #4906 from hashicorp/f-metric-prefix-master
Port metric prefix filtering to master
2018-11-29 22:27:47 -05:00
Nick Ethier b1484aec33
nomad: fix hclog usage 2018-11-29 22:27:39 -05:00
Alex Dadgar 4ee603c382 Device hook and devices affect computed node class
This PR introduces a device hook that retrieves the device mount
information for an allocation. It also updates the computed node class
computation to take into account devices.

TODO Fix the task runner unit test. The environment variable is being
lost even though it is being properly set in the prestart hook.
2018-11-27 17:25:33 -08:00
Nick Ethier ed65610ec6
command/agent: additional tests for telemetry config parsing 2018-11-19 23:22:33 -05:00
Nick Ethier b81e4e18f0
agent: suppose filter_default telemetry option 2018-11-19 23:21:48 -05:00
Nick Ethier 85b221a1d6
nomad: add flag to disable publishing of job_summary metrics for dispatched jobs 2018-11-19 23:21:19 -05:00
Nick Ethier 9e64ce7d73
docker: properly launch docker logger process 2018-11-19 22:59:12 -05:00
Mahmood Ali 9479015f51
Merge pull request #4884 from hashicorp/f-alloc-devices-cli
Report alloc device statistics in API and CLI
2018-11-16 18:04:54 -05:00
Mahmood Ali 6f9126f475 show Device Stats header in alloc status 2018-11-16 17:34:37 -05:00
Mahmood Ali 00ffd02ced Show stable order of device attributes 2018-11-16 17:34:37 -05:00
Preetha 5f094633fa
Merge pull request #4889 from hashicorp/f-service-meta
Pass service metadata "external-source" for consul UI integration
2018-11-16 12:24:21 -06:00
Preetha Appan 18708d3f0b
Pass service metadata "external-source" for consul UI integration 2018-11-16 11:28:56 -06:00
Mahmood Ali 33c96a803a tweak whitespace in device stats output 2018-11-16 10:37:39 -05:00
Mahmood Ali 159b8f866a Display device stats in nomad alloc status 2018-11-16 10:26:32 -05:00
Mahmood Ali d88a3f8413 Prepare to reuse device resources printing 2018-11-16 10:26:32 -05:00
Michael Schurter 6f3712ed48 gofmt -s -w command/helper_stats_test.go
Fixes the static checks build
2018-11-15 14:14:05 -08:00
Mahmood Ali 24b37e0aaf Display StatsObject nested objects as well 2018-11-15 08:09:54 -05:00
Mahmood Ali ee9353fbd6 Use disk display format for devices 2018-11-14 22:13:23 -05:00
Mahmood Ali 0712be643f Print verbose device in nomad node status -stats 2018-11-14 22:13:23 -05:00
Mahmood Ali 93e8fc53f9 device stats summary in node status
Sample output with a mock device:

```
Host Resource Utilization
CPU             Memory          Disk
2651/26400 MHz  9.6 GiB/16 GiB  98 GiB/234 GiB

Device Resource Utilization
nomad/file/mock[README.md]    511 bytes
nomad/file/mock[e2e.go]       239 bytes
nomad/file/mock[e2e_test.go]  128 bytes

Allocations
No allocations placed
```
2018-11-14 22:13:23 -05:00
Mahmood Ali c62ec124c0 Set clean config for mock driver
The default job here contains some exec task config (for setting
command and args) that aren't used for mock driver.  Now, the alloc
runner seems stricter about validating fields and errors on unexpected
fields.

Updating configs in tests so we can have an explicit task config
whenever driver is set explicitly.
2018-11-13 10:21:40 -05:00
Mahmood Ali c7610d8c22 mark and skip failing consul failing tests 2018-11-13 10:21:40 -05:00
Preetha Appan fd0ba320da
change path to v1/scheduler/configuration 2018-11-12 15:57:45 -06:00
Preetha Appan 3a10a589d7
Fix failing test 2018-11-10 19:53:47 -06:00
Preetha Appan 7ef126a027
Smaller methods, and added tests for RPC layer 2018-11-10 17:37:33 -06:00
Preetha Appan 75662b50d1
Use response object/querymeta/writemeta in scheduler config API 2018-11-10 10:31:10 -06:00
Alex Dadgar 98398a8a44
Merge pull request #4842 from hashicorp/b-deployment-progress-deadline
Fix multiple bugs with progress deadline handling
2018-11-08 13:31:54 -08:00
Preetha Appan f1a69e529c
Fix vet error 2018-11-08 09:48:43 -06:00
Preetha Appan da75388a9b
review feedback 2018-11-08 09:48:43 -06:00
Preetha Appan d88ec0267f
Comments 2018-11-08 09:48:43 -06:00
Preetha Appan 5f0a9d2cfd
Show preemption output in plan CLI 2018-11-08 09:48:43 -06:00
Alex Dadgar 204ca8230c Device manager
Introduce a device manager that manages the lifecycle of device plugins
on the client. It fingerprints, collects stats, and forwards Reserve
requests to the correct plugin. The manager, also handles device plugins
failing and validates their output.
2018-11-07 10:43:15 -08:00
Michael Schurter 392d548b85
Merge pull request #4828 from hashicorp/b-restore
Implement client agent restarting
2018-11-05 18:50:15 -06:00
Alex Dadgar 1c31970464 Fix multiple tgs with progress deadline handling
Fix an issue in which the deployment watcher would fail the deployment
based on the earliest progress deadline of the deployment regardless of
if the task group has finished.

Further fix an issue where the blocked eval optimization would make it
so no evals were created to progress the deployment. To reproduce this
issue, prior to this commit, you can create a job with two task groups.
The first group has count 1 and resources such that it can not be
placed. The second group has count 3, max_parallel=1, and can be placed.
Run this first and then update the second group to do a deployment. It
will place the first of three, but never progress since there exists a
blocked eval. However, that doesn't capture the fact that there are two
groups being deployed.
2018-11-05 16:06:17 -08:00
Michael Schurter 6bdbfb8129 tests: get consul integration tests building 2018-11-05 12:32:05 -08:00
Preetha b2b52b1ada
Merge pull request #4794 from hashicorp/f-preemption-systemjobs
Preemption for system jobs
2018-11-02 16:28:06 -05:00
Preetha Appan d03201adf8
Fix formatting of allocation score metrics 2018-10-30 12:03:23 -05:00
Preetha Appan 9d316cbbef
Fix return type in tests after refactor 2018-10-30 11:10:46 -05:00
Preetha Appan 8f7eb61823
Introduce a response object for scheduler configuration 2018-10-30 11:06:32 -05:00
Preetha Appan c1c1c230e4
Make preemption config a struct to allow for enabling based on scheduler type 2018-10-30 11:06:32 -05:00
Preetha Appan bd34cbb1f7
Support for new scheduler config API, first use case is to disable preemption 2018-10-30 11:06:32 -05:00
Michael Schurter d71a1b4547 tests: more fixes due to api changes 2018-10-29 15:25:22 -07:00
Michael Schurter 2b1b3d7e1e tests: get tests building if not yet passing 2018-10-16 16:56:57 -07:00
Michael Schurter 1a29337e48 register drivers by default
Do not register mock_driver on release builds.
2018-10-16 16:56:56 -07:00
Nick Ethier 3183b33d24 client: review comments and fixup/skip tests 2018-10-16 16:56:56 -07:00
Nick Ethier f192c3752a client: refactor post allocrunnerv2 finalization 2018-10-16 16:56:56 -07:00
Nick Ethier 4a4c7dbbfc client: begin driver plugin integration
client: fingerprint driver plugins
2018-10-16 16:56:56 -07:00
Alex Dadgar 7946a14aa8 Fix lints 2018-10-16 16:56:56 -07:00
Alex Dadgar 45e41cca03 allocrunnerv2 -> allocrunner 2018-10-16 16:56:56 -07:00
Alex Dadgar 6c9d9d5173 move files around 2018-10-16 16:56:55 -07:00
Michael Schurter f279b1d1b1 tests: test logs endpoint against pending task
Although the really exciting change is making WaitForRunning return the
allocations that it started. This should cut down test boilerplate
significantly.
2018-10-16 16:56:55 -07:00
Michael Schurter 6bcf772f3c tests: test via ServeMux so http codes are set 2018-10-16 16:56:55 -07:00
Michael Schurter 960f3be76c client: expose task state to client
The interesting decision in this commit was to expose AR's state and not
a fully materialized Allocation struct. AR.clientAlloc builds an Alloc
that contains the task state, so I considered simply memoizing and
exposing that method.

However, that would lead to AR having two awkwardly similar methods:
 - Alloc() - which returns the server-sent alloc
 - ClientAlloc() - which returns the fully materialized client alloc

Since ClientAlloc() could be memoized it would be just as cheap to call
as Alloc(), so why not replace Alloc() entirely?

Replacing Alloc() entirely would require Update() to immediately
materialize the task states on server-sent Allocs as there may have been
local task state changes since the server received an Alloc update.

This quickly becomes difficult to reason about: should Update hooks use
the TaskStates? Are state changes caused by TR Update hooks immediately
reflected in the Alloc? Should AR persist its copy of the Alloc? If so,
are its TaskStates canonical or the TaskStates on TR?

So! Forget that. Let's separate the static Allocation from the dynamic
AR & TR state!

 - AR.Alloc() is for static Allocation access (often for the Job)
 - AR.AllocState() is for the dynamic AR & TR runtime state (deployment
   status, task states, etc).

If code needs to know the status of a task: AllocState()
If code needs to know the names of tasks: Alloc()

It should be very easy for a developer to reason about which method they
should call and what they can do with the return values.
2018-10-16 16:56:55 -07:00
Michael Schurter 1c9ccdeab5 tests: fix races caused by sharing a buffer
httptest.ResponseRecorder exposes a bytes.Buffer which we were reading
and writing concurrently to test streaming log APIs. This is a race, so
I wrapped the struct in a lock with some helpers.
2018-10-16 16:56:55 -07:00
Alex Dadgar 84ce8c3487 extra logging 2018-10-16 16:56:55 -07:00
Alex Dadgar 6f0ed6184b Fix client reloading and pass the plugin loaders to server and client 2018-10-16 16:56:55 -07:00
Alex Dadgar 183561cf82 Plugin loader initialization 2018-10-16 16:54:12 -07:00
Nick Ethier 5dee1141d1 executor v2 (#4656)
* client/executor: refactor client to remove interpolation

* executor: POC libcontainer based executor

* vendor: use hashicorp libcontainer fork

* vendor: add libcontainer/nsenter dep

* executor: updated executor interface to simplify operations

* executor: implement logging pipe

* logmon: new logmon plugin to manage task logs

* driver/executor: use logmon for log management

* executor: fix tests and windows build

* executor: fix logging key names

* executor: fix test failures

* executor: add config field to toggle between using libcontainer and standard executors

* logmon: use discover utility to discover nomad executable

* executor: only call libcontainer-shim on main in linux

* logmon: use seperate path configs for stdout/stderr fifos

* executor: windows fixes

* executor: created reusable pid stats collection utility that can be used in an executor

* executor: update fifo.Open calls

* executor: fix build

* remove executor from docker driver

* executor: Shutdown func to kill and cleanup executor and its children

* executor: move linux specific universal executor funcs to seperate file

* move logmon initialization to a task runner hook

* client: doc fixes and renaming from code review


* taskrunner: use shared config struct for logmon fifo fields

* taskrunner: logmon only needs to be started once per task
2018-10-16 16:53:31 -07:00
Michael Schurter a4b4d7b266 consul service hook
Deregistration works but difficult to test due to terminal updates not
being fully implemented in the new client/ar/tr.
2018-10-16 16:53:29 -07:00
Alex Dadgar a78cefec18 use int64 2018-10-16 15:34:32 -07:00
Preetha Appan 7c0d8c646c
Change CPU/Disk/MemoryMB to int everywhere in new resource structs 2018-10-16 16:21:42 -05:00
Alex Dadgar 5a07f9f96e parse affinities and constraints on devices 2018-10-11 14:05:19 -07:00
Alex Dadgar 87cacb427f parse devices 2018-10-08 16:09:41 -07:00
Alex Dadgar 6b08b9d6b6 Define device request structs 2018-10-08 15:38:03 -07:00
Alex Dadgar 01f8e5b95f renames 2018-10-04 14:57:25 -07:00
Alex Dadgar 52f9cd7637 fixing tests 2018-10-04 14:26:19 -07:00
Alex Dadgar bac5cb1e8b Scheduler uses allocated resources 2018-10-02 17:08:25 -07:00
Alex Dadgar 5c8697667e Node reserved resources 2018-09-29 18:44:55 -07:00
Alex Dadgar c031b22d03 Fix autopilot set enable custom upgrades flag 2018-09-25 13:49:35 -07:00
Alex Dadgar ca28afa3b2 small fixes 2018-09-15 16:42:38 -07:00
Alex Dadgar 3c19d01d7a server 2018-09-15 16:23:13 -07:00
Alex Dadgar 7739ef51ce agent + consul 2018-09-13 10:43:40 -07:00
Alex Dadgar 4f89cabd34
Merge pull request #4631 from hashicorp/f-plugin-config
Parse plugin configs
2018-09-04 17:04:13 -07:00
Alex Dadgar cc92cd92cd
Merge pull request #4642 from hashicorp/b-vet
Fix vet errors and use newer go version in travis
2018-09-04 17:04:02 -07:00
Alex Dadgar c6576ddac1 Fix make check errors 2018-09-04 16:03:52 -07:00
Preetha Appan 254e90ba0e
Fix linting 2018-09-04 16:10:11 -05:00
Preetha Appan 4f8e925b54
Move topk and delay heap to separate packages under lib 2018-09-04 16:10:11 -05:00
Preetha Appan 9bc0962527
Track top k nodes by norm score rather than top k nodes per scorer 2018-09-04 16:10:11 -05:00
Preetha Appan 063004502a
Fix linting 2018-09-04 16:10:11 -05:00
Preetha Appan 6ed527c636
Use heap to store top K scoring nodes.
Scoring metadata is now aggregated by scorer type to make it easier
to parse when reading it in the CLI.
2018-09-04 16:10:11 -05:00
Preetha Appan 659cfa3f64
Parsing and API layer for spread stanza 2018-09-04 16:10:11 -05:00
Preetha Appan f3c4eead91
Refactor method to return affinity struct, and add extra test at task level 2018-09-04 16:10:11 -05:00
Preetha Appan 9f0caa9c3d
Affinity parsing, api and structs 2018-09-04 16:10:11 -05:00
Alex Dadgar c0de218747 plugin dir parsing 2018-08-30 13:43:09 -07:00
Alex Dadgar bff1669ee4 Plugin config parsing 2018-08-29 17:06:01 -07:00
Wyatt Anderson 9dccb62489 Add documentation for eligibility toggle endpoint 2018-08-24 10:50:12 -04:00
Chelsea Komlo 0a69cdb304
Merge pull request #4565 from hashicorp/b-compare-cert-alg
Error if TLS Certificate signature algorithm isn't supported in cipher suites
2018-08-15 16:09:46 -04:00
Chelsea Holland Komlo 71a4ced04c fix up test failure due to keyloader instantiated on tls config during parsing 2018-08-15 00:59:29 -04:00
Chelsea Komlo a936c452b5
Merge pull request #4577 from hashicorp/b-panic-job-history
Fix for panic when submitting non-existent version for job history CLI command
2018-08-14 17:34:05 -04:00
Chelsea Holland Komlo ba7a46471f spelling fix 2018-08-14 14:06:04 -04:00
Chelsea Holland Komlo 3e85a197b8 fix panic for job history cli command when used with non-existent job version 2018-08-13 16:57:36 -04:00
Chelsea Holland Komlo e8379c9059 skip update checking if DisableUpdateCheck is set to true 2018-08-10 13:08:13 -04:00
Chelsea Holland Komlo b92098fd08 change function signature to take entire tls config object 2018-08-10 12:37:21 -04:00
Chelsea Holland Komlo 75d631a1c8 fix reload issue for tls certificates in dev mode 2018-07-05 17:08:31 -04:00
Dirk Kok 0cb04c2cbf
Fix typo in nomad node help text
The command `nomad node eligibility` doesn't accept the `-disabled` option, this should be `-disable`.
2018-06-14 15:48:01 +02:00
Alex Dadgar b61051b3cd
Merge pull request #4409 from hashicorp/r-client-packages
Refactor client packages
2018-06-13 17:32:25 -07:00
Alex Dadgar 300b1a7a15 Tests only use testlog package logger 2018-06-13 15:40:56 -07:00
Chelsea Komlo 03075b603a
Merge pull request #4399 from hashicorp/r-reload-refactor
Refactor logic for dynamic reloading
2018-06-13 13:35:12 -04:00
Alex Dadgar 90c2108bfb Fix gc tests + parallel destroy + small test fixes 2018-06-12 10:23:45 -07:00
Alex Dadgar f5ff509fa5 Refactor - wip 2018-06-12 10:23:45 -07:00
Alex Dadgar af5753d2cd bump version + generated files 2018-06-11 13:39:42 -07:00
Chelsea Holland Komlo 3b5d5c7be8 remove logic to reload RPC connections from agent 2018-06-08 13:14:40 -04:00
Alex Dadgar 0181f5defc test less of the monitor on travis 2018-06-07 15:47:03 -07:00
Alex Dadgar 8efe9696ad move log line 2018-06-07 15:12:51 -07:00
Chelsea Komlo d738976234
Merge pull request #4395 from hashicorp/b-vault-second
Fix for dynamically reloading vault
2018-06-07 18:03:00 -04:00
Chelsea Holland Komlo dcc9cdfeb7 fixup! comment and move to always log server reload operation 2018-06-07 17:12:36 -04:00
Chelsea Holland Komlo 9f6bd7bf3a move logic for testing equality for vault config 2018-06-07 16:23:50 -04:00
Chelsea Holland Komlo 282f37b1ee fix for dynamically reloading vault 2018-06-07 15:34:18 -04:00
Alex Dadgar cfaa52e55e
Merge pull request #4380 from hashicorp/b-drain-monitor
Monitoring non-draining node exits
2018-06-06 17:50:30 -07:00
Michael Schurter 0fc624133d
Merge pull request #4384 from hashicorp/b-global-log-flags
agent: global logger should use the same flags
2018-06-06 15:15:15 -07:00
Alex Dadgar 72effb8632 code review 2018-06-06 14:52:26 -07:00
Alex Dadgar c441c17927
Merge pull request #4382 from hashicorp/b-init
Progress deadline included in nomad init
2018-06-06 14:49:10 -07:00
Alex Dadgar d478b50393 indentation 2018-06-06 14:48:51 -07:00
Alex Dadgar 217231347f Handle force draining 2018-06-06 13:05:39 -07:00
Michael Schurter f8e12e6ee7 agent: global logger should use the same flags
Prior to this change logs from the global logger only used seconds:

```
2018/06/06 18:25:58 http: TLS handshake error from ...
```

After this change they properly use the microseconds flag:

```
2018/06/06 18:39:50.702447 http: TLS handshake error ...
```

They still lack a log level unfortunately.
2018-06-06 11:40:08 -07:00
Alex Dadgar 14c1bec157 progress deadline init 2018-06-06 10:30:47 -07:00
Alex Dadgar f4fccd7ed2 Monitoring non-draining node exits 2018-06-05 17:58:44 -07:00
Preetha Appan 82837839eb
Fix bug with determining when agent is a client
This fixes a bug introduced in commit e27caadca6 that sets a boolean flag
when the agent is a client. It incorrectly checked state before initializing
the client. This leads to Nomad clients not deregistering any services registered
in Consul after allocs are destroyed
2018-06-05 19:19:52 -05:00
Alex Dadgar c0386819b3 bump version/lint/generated files 2018-06-01 15:23:10 -07:00
Alex Dadgar 247f1edb11 spelling 2018-06-01 14:53:08 -07:00
Preetha Appan ce6d4a8d7a
Fix tests and move isClient to constructor 2018-06-01 15:59:53 -05:00
Preetha Appan a5bfaa098c
Fix unnecessary deregistration in consul sync
This commit fixes an issue where if a nomad client and server shared the same consul instance, the server would deregister any services and checks registered by clients for running tasks.
2018-06-01 14:48:25 -05:00
Alex Dadgar 40fec81315
Merge pull request #4277 from hashicorp/f-retry-join-clients
Add go-discover support to Nomad clients
2018-06-01 16:57:40 +00:00
Alex Dadgar 62665d8619 Fix node drain monitor 2018-05-31 15:50:05 -07:00
Alex Dadgar aca8d5cece Actually disable the schedulers 2018-05-31 13:11:11 -07:00
Alex Dadgar d098885b79 Disable schedulers for TestHTTP_AllocSnapshot_Atomic 2018-05-31 12:05:44 -07:00
Alex Dadgar 4765b62284 Improve validation/defaulting, handle start-join
This commit:
* Improves how we combine the old retry-* fields and the new stanza and
how it is validated
* Handles the new stanza setting start_join
* Fixes integration test to not bind to the standard port and instead be
randomized.
* Simplifies parsing of the old retry_interval
* Fixes the errors from retry join being masked
* Flags get parsed into new server_join stanza
2018-05-31 10:53:26 -07:00
Alex Dadgar e1bf8780b5 validation errors 2018-05-31 10:53:26 -07:00
Alex Dadgar a02fbe3e0f indentation 2018-05-31 10:53:26 -07:00
Chelsea Holland Komlo 2bf2af4378 ensure default value of 30s is set for server_join stanza 2018-05-31 10:50:04 -07:00
Chelsea Holland Komlo 307458d4a3 ignore default values for retry interval
add additional validation case
2018-05-31 10:50:04 -07:00
Chelsea Holland Komlo ebc758aa0e add stronger protections for nil pointers in server join merge 2018-05-31 10:50:04 -07:00
Chelsea Holland Komlo 10aff14509 update config parse test
documentation fixes
2018-05-31 10:50:04 -07:00
Chelsea Holland Komlo ac1411ce95 RetryInterval should be a time.Duration 2018-05-31 10:50:04 -07:00
Chelsea Holland Komlo e79bc29e1a set retryInterval and other code feedback 2018-05-31 10:50:04 -07:00
Chelsea Holland Komlo de03c884bc add further configuration validation for server_join 2018-05-31 10:50:04 -07:00
Chelsea Holland Komlo df7539b9d0 update documentation for server_join 2018-05-31 10:50:04 -07:00
Chelsea Holland Komlo a4e514e07f update server_join naming and improve logging 2018-05-31 10:50:03 -07:00
Chelsea Holland Komlo 064b5481e0 add server join info to server and client 2018-05-31 10:50:03 -07:00
Preetha Appan 7414395daa
Use constant in test 2018-05-30 17:27:04 -05:00
Preetha Appan 6cbd25945c
Add unit test to verify compatibility code for node drains 2018-05-30 17:14:53 -05:00
Preetha Appan 4f835790d7
Set node eligibility to true when old client calls disable 2018-05-30 16:54:07 -05:00
Preetha Appan 34db410b74
Fix failing test TestClientStatusRequest 2018-05-30 15:11:54 -05:00
Preetha Appan 2752204f26
Fix failing test TestHTTP_AllocAllGC 2018-05-30 15:11:54 -05:00
Chelsea Holland Komlo 19e4a5489b add support for tls PreferServerCipherSuites
add further tests for tls configuration
2018-05-25 13:20:00 -04:00
Chelsea Komlo af15dda45a
Merge pull request #4328 from hashicorp/r-single-tls-config-constructor
Refactor to prefer using NewTLSConfiguration constructor
2018-05-24 13:46:29 -04:00
Alex Dadgar b1de61e012
Merge pull request #4321 from hashicorp/f-network-info
Display bind/advertise addresses on agent startup
2018-05-24 17:30:56 +00:00
Charlie Voiselle bbbd385dff Fixed typo in deployment help text 2018-05-24 12:44:21 -04:00
Nick Ethier b62825b49c
command: fix node drain monitor case 2018-05-24 06:39:12 -04:00
Nick Ethier b1d2437cf6
command: add docs for node drain -monitor flag 2018-05-24 06:37:28 -04:00
Nick Ethier 3c55f89738
command: use 0 as index for monitor request 2018-05-24 06:37:28 -04:00
Nick Ethier b52d2e3e74
command: add '-monitor' flag to node drain 2018-05-24 06:37:25 -04:00
Chelsea Holland Komlo 38f611a7f2 refactor NewTLSConfiguration to pass in verifyIncoming/verifyOutgoing
add missing fields to TLS merge method
2018-05-23 18:35:30 -04:00
Alex Dadgar 51e67daf69 Use Tags when CanaryTags isn't specified
This PR fixes a bug where we weren't defaulting to `tags` when
`canary_tags` was empty and adds documentation.
2018-05-23 13:07:47 -07:00
Alex Dadgar dd52ec402c Display bind/advertise addresses on agent startup
Sample outputs from demo/vagrant/(server/client1).hcl and `nomad agent -dev` mode

Server:

```
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 192.168.1.75:4646; RPC: 192.168.1.75:4647; Serf: 192.168.1.75:4648
            Bind Addrs: HTTP: 0.0.0.0:4646; RPC: 0.0.0.0:4647; Serf: 0.0.0.0:4648
                Client: false
             Log Level: DEBUG
                Region: global (DC: dc1)
                Server: true
               Version: 0.8.4-dev
```

Client:

```
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 192.168.1.75:5656
            Bind Addrs: HTTP: 0.0.0.0:5656
                Client: true
             Log Level: DEBUG
                Region: global (DC: dc1)
                Server: false
               Version: 0.8.4-dev
```

Dev:

```
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 127.0.0.1:4646; RPC: 127.0.0.1:4647; Serf: 127.0.0.1:4648
            Bind Addrs: HTTP: 127.0.0.1:4646; RPC: 127.0.0.1:4647; Serf: 127.0.0.1:4648
                Client: true
             Log Level: DEBUG
                Region: global (DC: dc1)
                Server: true
               Version: 0.8.4-dev
```
2018-05-22 15:14:33 -07:00
Alex Dadgar 44697efd9a safety guard 2018-05-22 14:45:34 -07:00
Alex Dadgar 586895965c Unit test for dev agent 2018-05-22 14:45:34 -07:00
Alex Dadgar 58d2a4c7c2 Do not bypass normal RPC codepath when running both client and server at once 2018-05-22 14:45:34 -07:00
Alex Dadgar 21c5ed850d Register events 2018-05-22 14:06:33 -07:00
Preetha 159888a856
Merge pull request #4274 from hashicorp/f-force-rescheduling
Add CLI and API support for forcing rescheduling of failed allocs
2018-05-21 16:24:22 -07:00
Preetha Appan 64ae37e19f
remove extra return 2018-05-21 18:00:14 -05:00
Chelsea Holland Komlo f0a5018a91 Add autocomplete where missing 2018-05-11 18:05:43 -04:00
Preetha Appan 3a8040e36f
Add new method EvaluateWithOptions to avoid breaking go API client 2018-05-11 14:18:53 -05:00
Preetha Appan e7d8ae70b2
more review feedback 2018-05-11 13:39:55 -05:00
Chelsea Komlo 687c26093c
Merge pull request #4269 from hashicorp/f-tls-remove-weak-standards
Configurable TLS cipher suites and versions; disallow weak ciphers
2018-05-11 08:11:46 -04:00
Nick Ethier b3612824ed
Merge pull request #4279 from hashicorp/f-short-init
Add job init '-short' command docs to website
2018-05-10 23:20:59 -04:00
Nick Ethier 29ddef040d
command: add autocomplete for init -short flag 2018-05-10 23:19:08 -04:00
Preetha Appan 24115138e8
unit test for job eval should detach 2018-05-10 15:30:44 -05:00
Preetha Appan e4ea18aee7
Add support for monitoring evals, and -detach/-verbose support 2018-05-10 15:02:58 -05:00
Preetha Appan bfa0937bbb
Code review feedback 2018-05-10 14:42:24 -05:00
Nick Ethier 5881e785c5
command: remove ephemeral disk from short init jobspec 2018-05-10 13:16:45 -04:00
Chelsea Holland Komlo 44f536f18e add support for configurable TLS minimum version 2018-05-09 18:07:12 -04:00
Chelsea Holland Komlo 796bae6f1b allow configurable cipher suites
disallow 3DES and RC4 ciphers

add documentation for tls_cipher_suites
2018-05-09 17:15:31 -04:00
Preetha Appan b12df3c64b
Added CLI for evaluating job given ID, and modified client API for evaluate to take a request payload 2018-05-09 15:04:27 -05:00
Preetha Appan c1b92c284e
Work in progress - force rescheduling of failed allocs 2018-05-08 17:26:57 -05:00
Preetha e7ae6e98d9
Merge pull request #4259 from hashicorp/f-deployment-improvements 2018-05-08 16:37:10 -05:00
Chelsea Holland Komlo 136635f04d only write error log line on error 2018-05-07 16:57:07 -04:00
Chelsea Holland Komlo 30584639b5 remove log line for empty addresses which could confuse on initalization 2018-05-07 16:57:07 -04:00
Chelsea Holland Komlo 24ff40df01 retry until all options are exhausted 2018-05-07 16:57:07 -04:00
Chelsea Holland Komlo ec4be4f871 ensure provider= is always the string prefix 2018-05-07 16:57:07 -04:00
Chelsea Holland Komlo 5422b1b088 update test for more realistic IP address from go-discover 2018-05-07 16:57:07 -04:00
Chelsea Holland Komlo 7e4d4f8088 comments and other fixups 2018-05-07 16:57:06 -04:00
Chelsea Holland Komlo 8f584f6474 add go-discover 2018-05-07 16:57:06 -04:00
Chelsea Holland Komlo 25ad6eaf96 refactor to retryJoiner interface 2018-05-07 16:57:06 -04:00
Michael Schurter f1d13683e6
consul: remove services with/without canary tags
Guard against Canary being set to false at the same time as an
allocation is being stopped: this could cause RemoveTask to be called
with the wrong Canary value and leaking a service.

Deleting both Canary values is the safest route.
2018-05-07 14:55:01 -05:00
Michael Schurter 50e04c976e
consul: support canary tags for services
Also refactor Consul ServiceClient to take a struct instead of a massive
set of arguments. Meant updating a lot of code but it should be far
easier to extend in the future as you will only need to update a single
struct instead of every single call site.

Adds an e2e test for canary tags.
2018-05-07 14:55:01 -05:00
Alex Dadgar f4af30fbb5
Canary tags structs 2018-05-07 14:50:01 -05:00
Alex Dadgar f95ab4ade8
Mark canaries on creation, and unmark on promotion 2018-05-07 14:50:01 -05:00
Alex Dadgar 6f92e0711c
CLI 2018-05-07 14:50:01 -05:00
Alex Dadgar ee50789c22
Initial implementation 2018-05-07 14:50:01 -05:00