Related to #4280
This PR adds
`client.allocs.<job>.<group>.<alloc>.<task>.memory.allocated` as a gauge
in bytes to metrics to ease calculating how close a task is to OOMing.
```
'nomad.client.allocs.memory.allocated.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 268435456.000
'nomad.client.allocs.memory.cache.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 5677056.000
'nomad.client.allocs.memory.kernel_max_usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 0.000
'nomad.client.allocs.memory.kernel_usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 0.000
'nomad.client.allocs.memory.max_usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 8908800.000
'nomad.client.allocs.memory.rss.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 876544.000
'nomad.client.allocs.memory.swap.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 0.000
'nomad.client.allocs.memory.usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 8208384.000
```
Adds nomad exec support in our API, by hitting the websocket endpoint.
We introduce API structs that correspond to the drivers streaming exec structs.
For creating the websocket connection, we reuse the transport setting from api
http client.
This adds a websocket endpoint for handling `nomad exec`.
The endpoint is a websocket interface, as we require a bi-directional
streaming (to handle both input and output), which is not very appropriate for
plain HTTP 1.0. Using websocket makes implementing the web ui a bit simpler. I
considered using golang http hijack capability to treat http request as a plain
connection, but the web interface would be too complicated potentially.
Furthermore, the API endpoint operates against the raw core nomad exec streaming
datastructures, defined in protobuf, with json serializer. Our APIs use json
interfaces in general, and protobuf generates json friendly golang structs.
Reusing the structs here simplify interface and reduce conversion overhead.
Since one allocation is preempted, the alloc factory creates a new alloc
that wasn't guaranteed to be running. When it is the first alloc row in
the table, then the alloc row detail test fails because non-running
allocs don't have metrics. The fix was to manually update all the alloc
clientStatuses.
- Revised "What is Nomad" copy
- Added "Key Features" section with links to task drivers & device plugins with lift-and-shift from README
- Added "Who Uses Nomad" section with users, talks, blog posts
- Removed Hadoop YARN, Docker Swarm, HTCondor from comparisons
- Revamped Guides section
- Inserted "Installing Nomad", "Upgrading", "Integrations" as persistent in Guides navbar
- Split Installing Nomad into two paths for users (one for Sandbox with "Quickstart", one for Production)
- Surfaced "Upgrading" and "Integrations" section from documentation
- Changed "Job Lifecycle" section into "Deploying & Managing Applications"
- Reworked "Operations" into "Operating Nomad"
- Reworked "Security" into "Securing Nomad"
- Segmented Namespaces, Resource Quotas, Sentinel into "Governance & Policy" subsection
- Reworked "Spark integration" into its own "Analytical Workloads" section