Commit graph

296 commits

Author SHA1 Message Date
Seth Hoenig f71d1755a6 env/aws: update ec2 cpu data
using tools/ec2info

```
$ go run .
```
2021-07-22 09:32:46 -05:00
Seth Hoenig 209e2d6d81 consul: pr cleanup namespace probe function signatures 2021-06-07 15:41:01 -05:00
Seth Hoenig 519429a2de consul: probe consul namespace feature before using namespace api
This PR changes Nomad's wrapper around the Consul NamespaceAPI so that
it will detect if the Consul Namespaces feature is enabled before making
a request to the Namespaces API. Namespaces are not enabled in Consul OSS,
and require a suitable license to be used with Consul ENT.

Previously Nomad would check for a 404 status code when makeing a request
to the Namespaces API to "detect" if Consul OSS was being used. This does
not work for Consul ENT with Namespaces disabled, which returns a 500.

Now we avoid requesting the namespace API altogether if Consul is detected
to be the OSS sku, or if the Namespaces feature is not licensed. Since
Consul can be upgraded from OSS to ENT, or a new license applied, we cache
the value for 1 minute, refreshing on demand if expired.

Fixes https://github.com/hashicorp/nomad-enterprise/issues/575

Note that the ticket originally describes using attributes from https://github.com/hashicorp/nomad/issues/10688.
This turns out not to be possible due to a chicken-egg situation between
bootstrapping the agent and setting up the consul client. Also fun: the
Consul fingerprinter creates its own Consul client, because there is no
[currently] no way to pass the agent's client through the fingerprint factory.
2021-06-07 12:19:25 -05:00
Seth Hoenig 3346432d58 client/fingerprint/consul: add new attributes to consul fingerprinter
This PR adds new probes for detecting these new Consul related attributes:

Consul namespaces are a Consul enterprise feature that may be disabled depending
on the enterprise license associated with the Consul servers. Having this attribute
available will enable Nomad to properly decide whether to query the Consul Namespace
API.

Consul connect must be explicitly enabled before Connect APIs will work. Currently
Nomad only checks for a minimum Consul version. Having this attribute available will
enable Nomad to properly schedule Connect tasks only on nodes with a Consul agent that
has Connect enabled.

Consul connect requires the grpc port to be explicitly set before Connect APIs will work.
Currently Nomad only checks for a minimal Consul version. Having this attribute available
will enable Nomad to schedule Connect tasks only on nodes with a Consul agent that has
the grpc listener enabled.
2021-06-03 12:49:22 -05:00
Seth Hoenig b548cf6816 client/fingerprint/consul: refactor the consul fingerprinter to test individual attributes
This PR refactors the ConsulFingerprint implementation, breaking individual attributes
into individual functions to make testing them easier. This is in preparation for
additional extractors about to be added. Behavior should be otherwise unchanged.

It adds the attribute consul.sku, which can be used to differentiate between Consul
OSS vs Consul ENT.
2021-06-03 12:48:39 -05:00
Seth Hoenig f53c30c684 aws_env: update ec2 instances
Generate updated list using tools/ec2info
2021-04-22 11:33:51 -06:00
Nick Ethier 155a2ca5fb client/ar: thread through cpuset manager 2021-04-13 13:28:36 -04:00
Nick Ethier b6b74a98a9 client/fingerprint: move existing cgroup concerns to cgutil 2021-04-13 13:28:36 -04:00
Nick Ethier edc0da9040 client: only fingerprint reservable cores via cgroups, allowing manual override for other platforms 2021-04-13 13:28:15 -04:00
Nick Ethier bed4e92b61 fingerprint: implement client fingerprinting of reservable cores
on Linux systems this is derived from the configure cpuset cgroup parent (defaults to /nomad)
for non Linux systems and Linux systems where cgroups are not enabled, the client defaults to using all cores
2021-04-13 13:28:15 -04:00
Andrii Chubatiuk d8df568f10
support multiple host network aliases for the same interface 2021-04-13 09:33:33 -04:00
Yoan Blanc ac0d5d8bd3
chore: bump golangci-lint from v1.24 to v1.39
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2021-04-03 09:50:23 +02:00
Tim Gross f820021f9e deps: bump gopsutil to v3.21.2 2021-03-30 16:02:51 -04:00
Florian Apolloner df7e22362d Properly detect unloaded dynamic modules on RHEL derivates. Fixes #9776
The modules.dep file on RHEL includes .xz for compressed kernel modules.
2021-01-12 18:28:00 +01:00
Joel May 13faf0d79e Allow client.cpu_total_compute to override attr.cpu.totalcompute 2021-01-07 15:31:11 -05:00
Kris Hicks 0a3a748053
Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
Seth Hoenig eb7cdce52b client/fingerprint/cpu: use fallback total compute value if cpu not detected
Previously, Nomad would fail to startup if the CPU fingerprinter could
not detect the cpu total compute (i.e. cores * mhz). This is common on
some EC2 instance types (graviton class), where the env_aws fingerprinter
will override the detected CPU performance with a more accurate value
anyway.

Instead of crashing on startup, have Nomad use a low default for available
cpu performance of 1000 ticks (e.g. 1 core * 1 GHz). This enables Nomad
to get past the useless cpu fingerprinting on those EC2 instances. The
crashing error message is now a log statement suggesting the setting of
cpu_total_compute in client config.

Fixes #7989
2020-12-09 10:35:58 -06:00
Seth Hoenig 1ca5ea3240 env_aws: run ec2info to update ec2 info
Use `tools/ec2info` to update the generated table of instance types.
`$ go run .`
2020-12-02 09:35:03 -06:00
Roman Vynar b957f87cd7 Add compute/zone to Azure fingerprinting 2020-11-26 13:26:51 +02:00
Seth Hoenig 9960f96446 client/fingerprint: detect unloaded dynamic bridge kernel module
In Nomad v0.12.0, the client added additional fingerprinting around the
presense of the bridge kernel module. The fingerprinter only checked in
`/proc/modules` which is a list of loaded modules. In some cases, the
bridge kernel module is builtin rather than dynamically loaded. The fix
for that case is in #8721. However we were still missing the case where
the bridge module is dynamically loaded, but not yet loaded during the
startup of the Nomad agent. In this case the fingerprinter would believe
the bridge module was unavailable when really it gets loaded on demand.

This PR now has the fingerprinter scan the kernel module dependency file,
which will contain an entry for the bridge module even if it is not yet
loaded.

In summary, the client now looks for the bridge kernel module in
 - /proc/modules
 - /lib/modules/<kernel>/modules.builtin
 - /lib/modules/<kernel>/modules.dep

Closes #8423
2020-11-09 13:56:14 -06:00
Seth Hoenig 9b555fe6d5 env_aws: fixup test case node attr detection 2020-10-08 12:59:07 -05:00
Seth Hoenig e693d15a5b env_aws: get ec2 cpu perf data from AWS API
Previously, Nomad was using a hand-made lookup table for looking
up EC2 CPU performance characteristics (core count + speed = ticks).

This data was incomplete and incorrect depending on region. The AWS
API has the correct data but requires API keys to use (i.e. should not
be queried directly from Nomad).

This change introduces a lookup table generated by a small command line
tool in Nomad's tools module which uses the Amazon AWS API.

Running the tool requires AWS_* environment variables set.
  $ # in nomad/tools/cpuinfo
  $ go run .

Going forward, Nomad can incorporate regeneration of the lookup table
somewhere in the CI pipeline so that we remain up-to-date on the latest
offerings from EC2.

Fixes #7830
2020-10-08 12:01:09 -05:00
Landan Cheruka 023a2d36b7
fingerprint: changed unique.platform.azure.hostname to unique.platform.azure.name (#9016) 2020-10-02 16:50:12 -04:00
Javier Heredia 103ac0a37f
Add consul segment fingerprint (#7214) 2020-10-02 15:15:59 -04:00
Landan Cheruka 3df1802119
client: added azure fingerprinting support (#8979) 2020-10-01 09:10:27 -04:00
Joel May 2adc5bdec7
fingerprinting: add AWS MAC and public-ipv6 (#8887) 2020-09-17 09:03:01 -04:00
Mark Lee cd23fd7ca2 refactor lookup code 2020-08-24 12:24:16 +09:00
Mark Lee cd7aabca72 lookup kernel builtin modules too 2020-08-24 11:09:13 +09:00
Shengjing Zhu 7a4f48795d Adjust cgroup change in libcontainer 2020-08-20 00:31:07 +08:00
Mahmood Ali 91ba9ccefe honor config.NetworkInterface in NodeNetworks 2020-07-21 15:43:45 -04:00
Mahmood Ali 026d8c6eed tests: avoid os.Exit in a test 2020-07-01 15:25:13 -04:00
Nick Ethier f0559a8162
multi-interface network support 2020-06-19 09:42:10 -04:00
Nick Ethier 4a44deaa5c CNI Implementation (#7518) 2020-06-18 11:05:29 -07:00
Seth Hoenig 880c4e23d3 env_aws: combine 3 log lines into 1 2020-04-29 10:47:36 -06:00
Seth Hoenig 67303b666c
env_aws: downgrade log line
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2020-04-29 10:34:26 -06:00
Seth Hoenig 5ddc607701
env_aws: fixup log line
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2020-04-29 10:33:53 -06:00
Seth Hoenig f8596a3602 env_aws: use best-effort lookup table for CPU performance in EC2
Fixes #7681

The current behavior of the CPU fingerprinter in AWS is that it
reads the **current** speed from `/proc/cpuinfo` (`CPU MHz` field).

This is because the max CPU frequency is not available by reading
anything on the EC2 instance itself. Normally on Linux one would
look at e.g. `sys/devices/system/cpu/cpuN/cpufreq/cpuinfo_max_freq`
or perhaps parse the values from the `CPU max MHz` field in
`/proc/cpuinfo`, but those values are not available.

Furthermore, no metadata about the CPU is made available in the
EC2 metadata service.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-categories.html

Since `go-psutil` cannot determine the max CPU speed it defaults to
the current CPU speed, which could be basically any number between
0 and the true max. This is particularly bad on large, powerful
reserved instances which often idle at ~800 MHz while Nomad does
its fingerprinting (typically IO bound), which Nomad then uses as
the max, which results in severe loss of available resources.

Since the CPU specification is unavailable programmatically (at least
not without sudo) use a best-effort lookup table. This table was
generated by going through every instance type in AWS documentation
and copy-pasting the numbers.
https://aws.amazon.com/ec2/instance-types/

This approach obviously is not ideal as future instance types will
need to be added as they are introduced to AWS. However, using the
table should only be an improvement over the status quo since right
now Nomad miscalculates available CPU resources on all instance types.
2020-04-28 19:01:33 -06:00
Mahmood Ali 7985b1893f fixup! tests: Add tests for EC2 Metadata immitation cases 2020-03-26 11:37:54 -04:00
Mahmood Ali a1e7378c7b fixup! tests: Add tests for EC2 Metadata immitation cases 2020-03-26 11:33:44 -04:00
Mahmood Ali 1d50379bc6 fingerprint: handle incomplete AWS immitation APIs
Fix a regression where we accidentally started treating non-AWS
environments as AWS environments, resulting in bad networking settings.

Two factors some at play:

First, in [1], we accidentally switched the ultimate AWS test from
checking `ami-id` to `instance-id`.  This means that nomad started
treating more environments as AWS; e.g. Hetzner implements `instance-id`
but not `ami-id`.

Second, some of these environments return empty values instead of
errors!  Hetzner returns empty 200 response for `local-ipv4`, resulting
into bad networking configuration.

This change fix the situation by restoring the check to `ami-id` and
ensuring that we only set network configuration when the ip address is
not-empty.  Also, be more defensive around response whitespace input.

[1] https://github.com/hashicorp/nomad/pull/6779
2020-03-26 11:23:15 -04:00
Mahmood Ali b3de5d5721 tests: Add tests for EC2 Metadata immitation cases
Test that nomad doesn't set empty/bad network configuration when in an
environment that does incomplete immitation of EC2 Metadata API.
2020-03-26 11:13:21 -04:00
Danielle b006be623d
Update client/fingerprint/env_aws.go
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2019-12-16 14:48:52 +01:00
Danielle Lancashire 5a87b3ab4b
env_aws: Disable Retries and set Session cfg
Previously, Nomad used hand rolled HTTP requests to interact with the
EC2 metadata API. Recently however, we switched to using the AWS SDK for
this fingerprinting.

The default behaviour of the AWS SDK is to perform retries with
exponential backoff when a request fails. This is problematic for Nomad,
because interacting with the EC2 API is in our client start path.

Here we revert to our pre-existing behaviour of not performing retries
in the fast path, as if the metadata service is unavailable, it's likely
that nomad is not running in AWS.
2019-12-16 10:56:32 +01:00
Mahmood Ali 293276a457 fingerprint code refactor
Some code cleanup:

* Use a field for setting EC2 metadata instead of env-vars in testing;
but keep environment variables for backward compatibility reasons

* Update tests to use testify
2019-11-26 10:51:28 -05:00
Mahmood Ali 1e48f8e20d fingerprint: avoid api query if config overrides it 2019-11-26 10:51:28 -05:00
Mahmood Ali 5bb9089431 fingerprint: use ec2metadata package 2019-11-26 10:51:27 -05:00
Mahmood Ali 262dcb0842 Revert "lint: ignore generated windows syscall wrappers"
This reverts commit 482862e6ab0f8db748367bb1eefc2efd11fbe11a.
2019-10-22 08:23:44 -04:00
Michael Schurter 2992cb80b0 Remove 0.10.0-rc1 generated files 2019-10-10 13:31:42 -07:00
Mahmood Ali b6bf7f9a6c
Merge pull request #6260 from hashicorp/c-circleci-tweak-20190903
ci: ignore nested pkgs in GOTEST_PKGS_EXCLUDE
2019-09-11 11:17:10 -07:00
Danielle aa5605fce1
fingerprint: Add backwards compatibility comment
Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>
2019-09-04 17:38:35 +02:00