open-nomad/client
Danielle Lancashire 4f2343e1c0
client: Return empty values when host stats fail
Currently, there is an issue when running on Windows whereby, under some
circumstances, the Windows stats APIs will begin to return errors (such
as internal timeouts) when a client is under high load, and potentially
under other forms of resource contention, other system states, or other
unknown cases.

When an error occurs during this collection, we short-circuit all
further metrics emission from the client until the next interval.

This can be problematic if it happens for a sustained number of
intervals, as our metrics aggregator will begin to age out older
metrics, and we will eventually stop emitting various types of metrics
including `nomad.client.unallocated.*` metrics.

However, when metrics collection fails on Linux, gopsutil will in many cases
(e.g. cpu.Times) silently return 0 values rather than an error.

Here, we switch to returning empty metrics on these failures and
logging the error at the source. This brings the behaviour in line
with Linux/Unix platforms and, although it makes aggregation a little
sadder on intermittent failures, results in the more desirable overall
behaviour of keeping metrics available for further investigation if
things look unusual.
2019-09-19 01:22:07 +02:00
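A minimal Go sketch contrasting the two behaviours described in the commit message. Everything here is hypothetical and is not Nomad's actual client code: `HostCPUStats`, `collectCPU`, `emitShortCircuit`, `emitWithEmptyValues`, the metric names, and the constant `1000` are invented for illustration. The old behaviour aborts the whole emission pass on a collection error; the new behaviour logs the error where it occurs and substitutes zero values so every metric is still emitted.

```go
package main

import (
	"errors"
	"log"
)

// HostCPUStats is a hypothetical, simplified stand-in for host CPU stats.
type HostCPUStats struct {
	User   float64
	System float64
	Idle   float64
}

// collectCPU is a hypothetical collector simulating a platform stats API
// call that may fail (e.g. an internal timeout under high load).
func collectCPU(fail bool) (HostCPUStats, error) {
	if fail {
		return HostCPUStats{}, errors.New("stats API timeout")
	}
	return HostCPUStats{User: 12.5, System: 3.0, Idle: 84.5}, nil
}

// emitShortCircuit models the old behaviour: a collection error aborts the
// whole pass, so unrelated metrics (such as unallocated resources) are not
// emitted for this interval either.
func emitShortCircuit(fail bool, emit func(name string, v float64)) error {
	cpu, err := collectCPU(fail)
	if err != nil {
		return err
	}
	emit("host.cpu.user", cpu.User)
	emit("unallocated.cpu", 1000) // hypothetical unallocated value
	return nil
}

// emitWithEmptyValues models the new behaviour: the error is logged at the
// source and zero values are used, so emission always continues.
func emitWithEmptyValues(fail bool, emit func(name string, v float64)) {
	cpu, err := collectCPU(fail)
	if err != nil {
		log.Printf("failed to collect CPU stats: %v", err)
		cpu = HostCPUStats{} // empty values, similar to gopsutil returning 0s
	}
	emit("host.cpu.user", cpu.User)
	emit("unallocated.cpu", 1000) // hypothetical unallocated value
}

func main() {
	var emitted []string
	record := func(name string, v float64) { emitted = append(emitted, name) }

	_ = emitShortCircuit(true, record)
	log.Printf("short-circuit emitted %d metrics", len(emitted))

	emitted = nil
	emitWithEmptyValues(true, record)
	log.Printf("empty-values emitted %d metrics", len(emitted))
}
```

With a simulated failure, the short-circuit version emits nothing, while the empty-values version still emits both metrics (the CPU one as 0), so an aggregator keeps seeing them instead of aging them out.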
allocdir connect: add unix socket to proxy grpc for envoy (#6232) 2019-09-03 08:43:38 -07:00
allochealth connect: add group.service stanza support 2019-07-31 01:04:05 -04:00
allocrunner client: Return empty values when host stats fail 2019-09-19 01:22:07 +02:00
allocwatcher goimports 2019-01-22 15:44:31 -08:00
config test: expand symlink for temp dir for macOS compatibility (#6303) 2019-09-10 12:20:09 -04:00
consul support script checks for task group services (#6197) 2019-09-03 15:09:04 -04:00
devicemanager initialize device manager stats interval 2019-08-23 14:58:34 -04:00
fingerprint Merge pull request #6260 from hashicorp/c-circleci-tweak-20190903 2019-09-11 11:17:10 -07:00
interfaces Populate alloc stats API with device stats 2018-11-16 10:26:32 -05:00
lib ar: plumb client config for networking into the network hook 2019-07-31 01:04:06 -04:00
logmon close file handle when FileRotator object will closed. Fixes https://github.com/hashicorp/nomad/issues/6309 (#6323) 2019-09-13 10:31:13 -04:00
pluginmanager implement client endpoint of nomad exec 2019-05-09 16:49:08 -04:00
servers client: drop unused DC field from servers list 2019-05-20 14:19:15 -07:00
state test: fix NewMemDB API change 2019-03-04 13:37:20 -08:00
stats client: Return empty values when host stats fail 2019-09-19 01:22:07 +02:00
structs remove generated code 2019-09-06 19:24:15 +00:00
taskenv Merge pull request #6080 from lchayoun/bug-6079 2019-09-11 11:17:24 -07:00
testutil connect: task hook for bootstrapping envoy sidecar 2019-08-22 08:15:32 -07:00
vaultclient vault: fix data races 2019-04-16 11:22:44 -07:00
acl.go aux: helper method that returns token as well as ACL policy 2019-04-30 10:23:56 -04:00
acl_test.go tests: explicitly cleanup after clients 2018-10-17 10:06:59 -07:00
alloc_endpoint.go client config flag to disable remote exec 2019-06-03 15:31:39 -04:00
alloc_endpoint_test.go client config flag to disable remote exec 2019-06-03 15:31:39 -04:00
alloc_watcher_e2e_test.go tests: enable and fix tests requiring mock driver 2019-01-10 10:10:11 -05:00
client.go client: Return empty values when host stats fail 2019-09-19 01:22:07 +02:00
client_stats_endpoint.go Server side impl + touch ups 2018-02-15 13:59:02 -08:00
client_stats_endpoint_test.go tests: explicitly cleanup after clients 2018-10-17 10:06:59 -07:00
client_test.go rename to hasLocalState, and ignore clientstate 2019-08-28 11:44:48 -04:00
driver_manager_test.go tests: fix data race in client TestDriverManager_Fingerprint_Periodic 2019-05-21 09:49:56 -04:00
fingerprint_manager.go goimports until make check is happy 2019-01-23 06:27:14 -08:00
fingerprint_manager_test.go client/drivermananger: add driver manager 2018-12-18 22:55:18 -05:00
fs_endpoint.go implement client endpoint of nomad exec 2019-05-09 16:49:08 -04:00
fs_endpoint_test.go Infer content type in alloc fs stat endpoint 2019-06-28 20:31:28 -05:00
gc.go Plugins use parent loggers 2019-01-11 11:36:37 -08:00
gc_test.go test: copy AR's Alloc before mutating 2018-12-19 15:48:02 -08:00
node_updater.go client: wait for batched driver updated 2019-04-19 09:00:24 -04:00
rpc.go implement client endpoint of nomad exec 2019-05-09 16:49:08 -04:00
rpc_test.go tests: explicitly cleanup after clients 2018-10-17 10:06:59 -07:00
testing.go goimports until make check is happy 2019-01-23 06:27:14 -08:00
util.go client: defensive against getting stale alloc updates 2019-06-29 04:17:35 -05:00
util_test.go Update state with server 2018-10-16 16:53:29 -07:00