CVE-2019-5736: Update libcontainer depedencies (#5334)

* CVE-2019-5736: Update libcontainer depedencies

Libcontainer is vulnerable to a runc container breakout, that was
reported as CVE-2019-5736[1].  Upgrading vendored libcontainer with the fix.

The runc changes are captured in 369b920277 .

[1] https://seclists.org/oss-sec/2019/q1/119
This commit is contained in:
Mahmood Ali 2019-02-19 20:21:18 -05:00 committed by GitHub
parent 2bd79d757a
commit a394cd63f4
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
39 changed files with 1580 additions and 794 deletions

View File

@ -649,7 +649,7 @@ func configureBasicCgroups(cfg *lconfigs.Config) error {
freezer := cgroupFs.FreezerGroup{}
subsystem := freezer.Name()
path, err := cgroups.FindCgroupMountpoint(subsystem)
path, err := cgroups.FindCgroupMountpoint("", subsystem)
if err != nil {
return fmt.Errorf("failed to find %s cgroup mountpoint: %v", subsystem, err)
}
@ -693,7 +693,7 @@ func JoinRootCgroup(subsystems []string) error {
mErrs := new(multierror.Error)
paths := map[string]string{}
for _, s := range subsystems {
mnt, _, err := cgroups.FindCgroupMountpointAndRoot(s)
mnt, _, err := cgroups.FindCgroupMountpointAndRoot("", s)
if err != nil {
multierror.Append(mErrs, fmt.Errorf("error getting cgroup path for subsystem: %s", s))
continue

View File

@ -148,6 +148,7 @@ config := &configs.Config{
{Type: configs.NEWPID},
{Type: configs.NEWUSER},
{Type: configs.NEWNET},
{Type: configs.NEWCGROUP},
}),
Cgroups: &configs.Cgroup{
Name: "test-container",
@ -323,5 +324,7 @@ generated when building libcontainer with docker.
## Copyright and license
Code and documentation copyright 2014 Docker, inc. Code released under the Apache 2.0 license.
Docs released under Creative commons.
Code and documentation copyright 2014 Docker, inc.
The code and documentation are released under the [Apache 2.0 license](../LICENSE).
The documentation is also released under Creative Commons Attribution 4.0 International License.
You may obtain a copy of the license, titled CC-BY-4.0, at http://creativecommons.org/licenses/by/4.0/.

View File

@ -2,7 +2,7 @@
This is the standard configuration for version 1 containers. It includes
namespaces, standard filesystem setup, a default Linux capability set, and
information about resource reservations. It also has information about any
information about resource reservations. It also has information about any
populated environment settings for the processes running inside a container.
Along with the configuration of how a container is created the standard also
@ -21,16 +21,17 @@ Minimum requirements:
### Namespaces
| Flag | Enabled |
| ------------ | ------- |
| CLONE_NEWPID | 1 |
| CLONE_NEWUTS | 1 |
| CLONE_NEWIPC | 1 |
| CLONE_NEWNET | 1 |
| CLONE_NEWNS | 1 |
| CLONE_NEWUSER | 1 |
| Flag | Enabled |
| --------------- | ------- |
| CLONE_NEWPID | 1 |
| CLONE_NEWUTS | 1 |
| CLONE_NEWIPC | 1 |
| CLONE_NEWNET | 1 |
| CLONE_NEWNS | 1 |
| CLONE_NEWUSER | 1 |
| CLONE_NEWCGROUP | 1 |
Namespaces are created for the container via the `clone` syscall.
Namespaces are created for the container via the `unshare` syscall.
### Filesystem
@ -41,10 +42,10 @@ the binaries and system libraries are local to that directory. Any binaries
to be executed must be contained within this rootfs.
Mounts that happen inside the container are automatically cleaned up when the
container exits as the mount namespace is destroyed and the kernel will
container exits as the mount namespace is destroyed and the kernel will
unmount all the mounts that were setup within that namespace.
For a container to execute properly there are certain filesystems that
For a container to execute properly there are certain filesystems that
are required to be mounted within the rootfs that the runtime will setup.
| Path | Type | Flags | Data |
@ -57,7 +58,7 @@ are required to be mounted within the rootfs that the runtime will setup.
| /sys | sysfs | MS_NOEXEC,MS_NOSUID,MS_NODEV,MS_RDONLY | |
After a container's filesystems are mounted within the newly created
After a container's filesystems are mounted within the newly created
mount namespace `/dev` will need to be populated with a set of device nodes.
It is expected that a rootfs does not need to have any device nodes specified
for `/dev` within the rootfs as the container will setup the correct devices
@ -75,25 +76,25 @@ that are required for executing a container's process.
**ptmx**
`/dev/ptmx` will need to be a symlink to the host's `/dev/ptmx` within
the container.
the container.
The use of a pseudo TTY is optional within a container and it should support both.
If a pseudo is provided to the container `/dev/console` will need to be
If a pseudo is provided to the container `/dev/console` will need to be
setup by binding the console in `/dev/` after it has been populated and mounted
in tmpfs.
| Source | Destination | UID GID | Mode | Type |
| --------------- | ------------ | ------- | ---- | ---- |
| *pty host path* | /dev/console | 0 0 | 0600 | bind |
| *pty host path* | /dev/console | 0 0 | 0600 | bind |
After `/dev/null` has been setup we check for any external links between
the container's io, STDIN, STDOUT, STDERR. If the container's io is pointing
to `/dev/null` outside the container we close and `dup2` the `/dev/null`
to `/dev/null` outside the container we close and `dup2` the `/dev/null`
that is local to the container's rootfs.
After the container has `/proc` mounted a few standard symlinks are setup
After the container has `/proc` mounted a few standard symlinks are setup
within `/dev/` for the io.
| Source | Destination |
@ -103,7 +104,7 @@ within `/dev/` for the io.
| /proc/self/fd/1 | /dev/stdout |
| /proc/self/fd/2 | /dev/stderr |
A `pivot_root` is used to change the root for the process, effectively
A `pivot_root` is used to change the root for the process, effectively
jailing the process inside the rootfs.
```c
@ -150,23 +151,28 @@ so that containers can be paused and resumed.
The parent process of the container's init must place the init pid inside
the correct cgroups before the initialization begins. This is done so
that no processes or threads escape the cgroups. This sync is
that no processes or threads escape the cgroups. This sync is
done via a pipe ( specified in the runtime section below ) that the container's
init process will block waiting for the parent to finish setup.
### IntelRdt
Intel platforms with new Xeon CPU support Intel Resource Director Technology
(RDT). Cache Allocation Technology (CAT) is a sub-feature of RDT, which
currently supports L3 cache resource allocation.
Intel platforms with new Xeon CPU support Resource Director Technology (RDT).
Cache Allocation Technology (CAT) and Memory Bandwidth Allocation (MBA) are
two sub-features of RDT.
This feature provides a way for the software to restrict cache allocation to a
defined 'subset' of L3 cache which may be overlapping with other 'subsets'.
The different subsets are identified by class of service (CLOS) and each CLOS
has a capacity bitmask (CBM).
Cache Allocation Technology (CAT) provides a way for the software to restrict
cache allocation to a defined 'subset' of L3 cache which may be overlapping
with other 'subsets'. The different subsets are identified by class of
service (CLOS) and each CLOS has a capacity bitmask (CBM).
It can be used to handle L3 cache resource allocation for containers if
hardware and kernel support Intel RDT/CAT.
Memory Bandwidth Allocation (MBA) provides indirect and approximate throttle
over memory bandwidth for the software. A user controls the resource by
indicating the percentage of maximum memory bandwidth or memory bandwidth limit
in MBps unit if MBA Software Controller is enabled.
It can be used to handle L3 cache and memory bandwidth resources allocation
for containers if hardware and kernel support Intel RDT CAT and MBA features.
In Linux 4.10 kernel or newer, the interface is defined and exposed via
"resource control" filesystem, which is a "cgroup-like" interface.
@ -175,6 +181,9 @@ Comparing with cgroups, it has similar process management lifecycle and
interfaces in a container. But unlike cgroups' hierarchy, it has single level
filesystem layout.
CAT and MBA features are introduced in Linux 4.10 and 4.12 kernel via
"resource control" filesystem.
Intel RDT "resource control" filesystem hierarchy:
```
mount -t resctrl resctrl /sys/fs/resctrl
@ -182,63 +191,101 @@ tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
| |-- L3
| |-- cbm_mask
| |-- min_cbm_bits
| | |-- cbm_mask
| | |-- min_cbm_bits
| | |-- num_closids
| |-- MB
| |-- bandwidth_gran
| |-- delay_linear
| |-- min_bandwidth
| |-- num_closids
|-- cpus
|-- ...
|-- schemata
|-- tasks
|-- <container_id>
|-- cpus
|-- ...
|-- schemata
|-- tasks
```
For runc, we can make use of `tasks` and `schemata` configuration for L3 cache
resource constraints.
For runc, we can make use of `tasks` and `schemata` configuration for L3
cache and memory bandwidth resources constraints.
The file `tasks` has a list of tasks that belongs to this group (e.g.,
<container_id>" group). Tasks can be added to a group by writing the task ID
to the "tasks" file (which will automatically remove them from the previous
to the "tasks" file (which will automatically remove them from the previous
group to which they belonged). New tasks created by fork(2) and clone(2) are
added to the same group as their parent. If a pid is not in any sub group, it
is in root group.
added to the same group as their parent.
The file `schemata` has allocation masks/values for L3 cache on each socket,
which contains L3 cache id and capacity bitmask (CBM).
The file `schemata` has a list of all the resources available to this group.
Each resource (L3 cache, memory bandwidth) has its own line and format.
L3 cache schema:
It has allocation bitmasks/values for L3 cache on each socket, which
contains L3 cache id and capacity bitmask (CBM).
```
Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
```
For example, on a two-socket machine, L3's schema line could be `L3:0=ff;1=c0`
Which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0.
For example, on a two-socket machine, the schema line could be "L3:0=ff;1=c0"
which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0.
The valid L3 cache CBM is a *contiguous bits set* and number of bits that can
be set is less than the max bit. The max bits in the CBM is varied among
supported Intel Xeon platforms. In Intel RDT "resource control" filesystem
layout, the CBM in a group should be a subset of the CBM in root. Kernel will
check if it is valid when writing. e.g., 0xfffff in root indicates the max bits
of CBM is 20 bits, which mapping to entire L3 cache capacity. Some valid CBM
values to set in a group: 0xf, 0xf0, 0x3ff, 0x1f00 and etc.
supported Intel CPU models. Kernel will check if it is valid when writing.
e.g., default value 0xfffff in root indicates the max bits of CBM is 20
bits, which mapping to entire L3 cache capacity. Some valid CBM values to
set in a group: 0xf, 0xf0, 0x3ff, 0x1f00 and etc.
For more information about Intel RDT/CAT kernel interface:
Memory bandwidth schema:
It has allocation values for memory bandwidth on each socket, which contains
L3 cache id and memory bandwidth.
```
Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
```
For example, on a two-socket machine, the schema line could be "MB:0=20;1=70"
The minimum bandwidth percentage value for each CPU model is predefined and
can be looked up through "info/MB/min_bandwidth". The bandwidth granularity
that is allocated is also dependent on the CPU model and can be looked up at
"info/MB/bandwidth_gran". The available bandwidth control steps are:
min_bw + N * bw_gran. Intermediate values are rounded to the next control
step available on the hardware.
If MBA Software Controller is enabled through mount option "-o mba_MBps"
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
We could specify memory bandwidth in "MBps" (Mega Bytes per second) unit
instead of "percentages". The kernel underneath would use a software feedback
mechanism or a "Software Controller" which reads the actual bandwidth using
MBM counters and adjust the memory bandwidth percentages to ensure:
"actual memory bandwidth < user specified memory bandwidth".
For example, on a two-socket machine, the schema line could be
"MB:0=5000;1=7000" which means 5000 MBps memory bandwidth limit on socket 0
and 7000 MBps memory bandwidth limit on socket 1.
For more information about Intel RDT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt
An example for runc:
```
An example for runc:
Consider a two-socket machine with two L3 caches where the default CBM is
0xfffff and the max CBM length is 20 bits. With this configuration, tasks
inside the container only have access to the "upper" 80% of L3 cache id 0 and
the "lower" 50% L3 cache id 1:
0x7ff and the max CBM length is 11 bits, and minimum memory bandwidth of 10%
with a memory bandwidth granularity of 10%.
Tasks inside the container only have access to the "upper" 7/11 of L3 cache
on socket 0 and the "lower" 5/11 L3 cache on socket 1, and may use a
maximum memory bandwidth of 20% on socket 0 and 70% on socket 1.
"linux": {
"intelRdt": {
"l3CacheSchema": "L3:0=ffff0;1=3ff"
}
"intelRdt": {
"closID": "guaranteed_group",
"l3CacheSchema": "L3:0=7f0;1=1f",
"memBwSchema": "MB:0=20;1=70"
}
}
```
### Security
### Security
The standard set of Linux capabilities that are set in a container
provide a good default for security and flexibility for the applications.
@ -288,8 +335,8 @@ provide a good default for security and flexibility for the applications.
Additional security layers like [apparmor](https://wiki.ubuntu.com/AppArmor)
and [selinux](http://selinuxproject.org/page/Main_Page) can be used with
the containers. A container should support setting an apparmor profile or
selinux process and mount labels if provided in the configuration.
the containers. A container should support setting an apparmor profile or
selinux process and mount labels if provided in the configuration.
Standard apparmor profile:
```c
@ -324,17 +371,17 @@ profile <profile_name> flags=(attach_disconnected,mediate_deleted) {
### Runtime and Init Process
During container creation the parent process needs to talk to the container's init
During container creation the parent process needs to talk to the container's init
process and have a form of synchronization. This is accomplished by creating
a pipe that is passed to the container's init. When the init process first spawns
a pipe that is passed to the container's init. When the init process first spawns
it will block on its side of the pipe until the parent closes its side. This
allows the parent to have time to set the new process inside a cgroup hierarchy
and/or write any uid/gid mappings required for user namespaces.
allows the parent to have time to set the new process inside a cgroup hierarchy
and/or write any uid/gid mappings required for user namespaces.
The pipe is passed to the init process via FD 3.
The application consuming libcontainer should be compiled statically. libcontainer
does not define any init process and the arguments provided are used to `exec` the
process inside the application. There should be no long running init within the
process inside the application. There should be no long running init within the
container spec.
If a pseudo tty is provided to a container it will open and `dup2` the console
@ -344,10 +391,10 @@ as `/dev/console`.
An extra set of mounts are provided to a container and setup for use. A container's
rootfs can contain some non portable files inside that can cause side effects during
execution of a process. These files are usually created and populated with the container
specific information via the runtime.
specific information via the runtime.
**Extra runtime files:**
* /etc/hosts
* /etc/hosts
* /etc/resolv.conf
* /etc/hostname
* /etc/localtime
@ -360,7 +407,7 @@ these apply to processes within a container.
| Type | Value |
| ------------------- | ------------------------------ |
| Parent Death Signal | SIGKILL |
| Parent Death Signal | SIGKILL |
| UID | 0 |
| GID | 0 |
| GROUPS | 0, NULL |
@ -373,15 +420,15 @@ these apply to processes within a container.
## Actions
After a container is created there is a standard set of actions that can
be done to the container. These actions are part of the public API for
be done to the container. These actions are part of the public API for
a container.
| Action | Description |
| -------------- | ------------------------------------------------------------------ |
| Get processes | Return all the pids for processes running inside a container |
| Get processes | Return all the pids for processes running inside a container |
| Get Stats | Return resource statistics for the container as a whole |
| Wait | Waits on the container's init process ( pid 1 ) |
| Wait Process | Wait on any of the container's processes returning the exit status |
| Wait Process | Wait on any of the container's processes returning the exit status |
| Destroy | Kill the container's init process and remove any filesystem state |
| Signal | Send a signal to the container's init process |
| Signal Process | Send a signal to any of the container's processes |

View File

@ -65,7 +65,7 @@ type subsystem interface {
type Manager struct {
mu sync.Mutex
Cgroups *configs.Cgroup
Rootless bool
Rootless bool // ignore permission-related errors
Paths map[string]string
}
@ -174,7 +174,7 @@ func (m *Manager) Apply(pid int) (err error) {
m.Paths[sys.Name()] = p
if err := sys.Apply(d); err != nil {
// In the case of rootless, where an explicit cgroup path hasn't
// In the case of rootless (including euid=0 in userns), where an explicit cgroup path hasn't
// been set, we don't bail on error in case of permission problems.
// Cases where limits have been set (and we couldn't create our own
// cgroup) are handled by Set.
@ -236,6 +236,12 @@ func (m *Manager) Set(container *configs.Config) error {
for _, sys := range subsystems {
path := paths[sys.Name()]
if err := sys.Set(path, container.Cgroups); err != nil {
if m.Rootless && sys.Name() == "devices" {
continue
}
// When m.Rootless is true, errors from the device subsystem are ignored because it is really not expected to work.
// However, errors from other subsystems are not ignored.
// see @test "runc create (rootless + limits + no cgrouppath + no permission) fails with informative error"
if path == "" {
// We never created a path for this cgroup, so we cannot set
// limits for it (though we have already tried at this point).
@ -311,7 +317,7 @@ func getCgroupData(c *configs.Cgroup, pid int) (*cgroupData, error) {
}
func (raw *cgroupData) path(subsystem string) (string, error) {
mnt, err := cgroups.FindCgroupMountpoint(subsystem)
mnt, err := cgroups.FindCgroupMountpoint(raw.root, subsystem)
// If we didn't mount the subsystem, there is no point we make the path.
if err != nil {
return "", err

View File

@ -46,11 +46,7 @@ func (s *CpuGroup) ApplyDir(path string, cgroup *configs.Cgroup, pid int) error
}
// because we are not using d.join we need to place the pid into the procs file
// unlike the other subsystems
if err := cgroups.WriteCgroupProc(path, pid); err != nil {
return err
}
return nil
return cgroups.WriteCgroupProc(path, pid)
}
func (s *CpuGroup) SetRtSched(path string, cgroup *configs.Cgroup) error {
@ -83,11 +79,7 @@ func (s *CpuGroup) Set(path string, cgroup *configs.Cgroup) error {
return err
}
}
if err := s.SetRtSched(path, cgroup); err != nil {
return err
}
return nil
return s.SetRtSched(path, cgroup)
}
func (s *CpuGroup) Remove(d *cgroupData) error {

View File

@ -77,18 +77,14 @@ func (s *CpusetGroup) ApplyDir(dir string, cgroup *configs.Cgroup, pid int) erro
// The logic is, if user specified cpuset configs, use these
// specified configs, otherwise, inherit from parent. This makes
// cpuset configs work correctly with 'cpuset.cpu_exclusive', and
// keep backward compatbility.
// keep backward compatibility.
if err := s.ensureCpusAndMems(dir, cgroup); err != nil {
return err
}
// because we are not using d.join we need to place the pid into the procs file
// unlike the other subsystems
if err := cgroups.WriteCgroupProc(dir, pid); err != nil {
return err
}
return nil
return cgroups.WriteCgroupProc(dir, pid)
}
func (s *CpusetGroup) getSubsystemSettings(parent string) (cpus []byte, mems []byte, err error) {

View File

@ -0,0 +1,62 @@
// +build linux,!nokmem
package fs
import (
"errors"
"fmt"
"io/ioutil"
"os"
"path/filepath"
"strconv"
"syscall" // for Errno type only
"github.com/opencontainers/runc/libcontainer/cgroups"
"golang.org/x/sys/unix"
)
const cgroupKernelMemoryLimit = "memory.kmem.limit_in_bytes"
func EnableKernelMemoryAccounting(path string) error {
// Ensure that kernel memory is available in this kernel build. If it
// isn't, we just ignore it because EnableKernelMemoryAccounting is
// automatically called for all memory limits.
if !cgroups.PathExists(filepath.Join(path, cgroupKernelMemoryLimit)) {
return nil
}
// We have to limit the kernel memory here as it won't be accounted at all
// until a limit is set on the cgroup and limit cannot be set once the
// cgroup has children, or if there are already tasks in the cgroup.
for _, i := range []int64{1, -1} {
if err := setKernelMemory(path, i); err != nil {
return err
}
}
return nil
}
func setKernelMemory(path string, kernelMemoryLimit int64) error {
if path == "" {
return fmt.Errorf("no such directory for %s", cgroupKernelMemoryLimit)
}
if !cgroups.PathExists(filepath.Join(path, cgroupKernelMemoryLimit)) {
// We have specifically been asked to set a kmem limit. If the kernel
// doesn't support it we *must* error out.
return errors.New("kernel memory accounting not supported by this kernel")
}
if err := ioutil.WriteFile(filepath.Join(path, cgroupKernelMemoryLimit), []byte(strconv.FormatInt(kernelMemoryLimit, 10)), 0700); err != nil {
// Check if the error number returned by the syscall is "EBUSY"
// The EBUSY signal is returned on attempts to write to the
// memory.kmem.limit_in_bytes file if the cgroup has children or
// once tasks have been attached to the cgroup
if pathErr, ok := err.(*os.PathError); ok {
if errNo, ok := pathErr.Err.(syscall.Errno); ok {
if errNo == unix.EBUSY {
return fmt.Errorf("failed to set %s, because either tasks have already joined this cgroup or it has children", cgroupKernelMemoryLimit)
}
}
}
return fmt.Errorf("failed to write %v to %v: %v", kernelMemoryLimit, cgroupKernelMemoryLimit, err)
}
return nil
}

View File

@ -0,0 +1,15 @@
// +build linux,nokmem
package fs
import (
"errors"
)
func EnableKernelMemoryAccounting(path string) error {
return nil
}
func setKernelMemory(path string, kernelMemoryLimit int64) error {
return errors.New("kernel memory accounting disabled in this runc build")
}

View File

@ -5,23 +5,18 @@ package fs
import (
"bufio"
"fmt"
"io/ioutil"
"os"
"path/filepath"
"strconv"
"strings"
"syscall" // only for Errno
"github.com/opencontainers/runc/libcontainer/cgroups"
"github.com/opencontainers/runc/libcontainer/configs"
"golang.org/x/sys/unix"
)
const (
cgroupKernelMemoryLimit = "memory.kmem.limit_in_bytes"
cgroupMemorySwapLimit = "memory.memsw.limit_in_bytes"
cgroupMemoryLimit = "memory.limit_in_bytes"
cgroupMemorySwapLimit = "memory.memsw.limit_in_bytes"
cgroupMemoryLimit = "memory.limit_in_bytes"
)
type MemoryGroup struct {
@ -67,44 +62,6 @@ func (s *MemoryGroup) Apply(d *cgroupData) (err error) {
return nil
}
func EnableKernelMemoryAccounting(path string) error {
// Check if kernel memory is enabled
// We have to limit the kernel memory here as it won't be accounted at all
// until a limit is set on the cgroup and limit cannot be set once the
// cgroup has children, or if there are already tasks in the cgroup.
for _, i := range []int64{1, -1} {
if err := setKernelMemory(path, i); err != nil {
return err
}
}
return nil
}
func setKernelMemory(path string, kernelMemoryLimit int64) error {
if path == "" {
return fmt.Errorf("no such directory for %s", cgroupKernelMemoryLimit)
}
if !cgroups.PathExists(filepath.Join(path, cgroupKernelMemoryLimit)) {
// kernel memory is not enabled on the system so we should do nothing
return nil
}
if err := ioutil.WriteFile(filepath.Join(path, cgroupKernelMemoryLimit), []byte(strconv.FormatInt(kernelMemoryLimit, 10)), 0700); err != nil {
// Check if the error number returned by the syscall is "EBUSY"
// The EBUSY signal is returned on attempts to write to the
// memory.kmem.limit_in_bytes file if the cgroup has children or
// once tasks have been attached to the cgroup
if pathErr, ok := err.(*os.PathError); ok {
if errNo, ok := pathErr.Err.(syscall.Errno); ok {
if errNo == unix.EBUSY {
return fmt.Errorf("failed to set %s, because either tasks have already joined this cgroup or it has children", cgroupKernelMemoryLimit)
}
}
}
return fmt.Errorf("failed to write %v to %v: %v", kernelMemoryLimit, cgroupKernelMemoryLimit, err)
}
return nil
}
func setMemoryAndSwap(path string, cgroup *configs.Cgroup) error {
// If the memory update is set to -1 we should also
// set swap to -1, it means unlimited memory.

View File

@ -5,6 +5,7 @@ package systemd
import (
"errors"
"fmt"
"io/ioutil"
"math"
"os"
"path/filepath"
@ -71,13 +72,11 @@ const (
)
var (
connLock sync.Mutex
theConn *systemdDbus.Conn
hasStartTransientUnit bool
hasStartTransientSliceUnit bool
hasTransientDefaultDependencies bool
hasDelegateScope bool
hasDelegateSlice bool
connLock sync.Mutex
theConn *systemdDbus.Conn
hasStartTransientUnit bool
hasStartTransientSliceUnit bool
hasDelegateSlice bool
)
func newProp(name string, units interface{}) systemdDbus.Property {
@ -115,53 +114,6 @@ func UseSystemd() bool {
}
}
// Ensure the scope name we use doesn't exist. Use the Pid to
// avoid collisions between multiple libcontainer users on a
// single host.
scope := fmt.Sprintf("libcontainer-%d-systemd-test-default-dependencies.scope", os.Getpid())
testScopeExists := true
for i := 0; i <= testScopeWait; i++ {
if _, err := theConn.StopUnit(scope, "replace", nil); err != nil {
if dbusError, ok := err.(dbus.Error); ok {
if strings.Contains(dbusError.Name, "org.freedesktop.systemd1.NoSuchUnit") {
testScopeExists = false
break
}
}
}
time.Sleep(time.Millisecond)
}
// Bail out if we can't kill this scope without testing for DefaultDependencies
if testScopeExists {
return hasStartTransientUnit
}
// Assume StartTransientUnit on a scope allows DefaultDependencies
hasTransientDefaultDependencies = true
ddf := newProp("DefaultDependencies", false)
if _, err := theConn.StartTransientUnit(scope, "replace", []systemdDbus.Property{ddf}, nil); err != nil {
if dbusError, ok := err.(dbus.Error); ok {
if strings.Contains(dbusError.Name, "org.freedesktop.DBus.Error.PropertyReadOnly") {
hasTransientDefaultDependencies = false
}
}
}
// Not critical because of the stop unit logic above.
theConn.StopUnit(scope, "replace", nil)
// Assume StartTransientUnit on a scope allows Delegate
hasDelegateScope = true
dlScope := newProp("Delegate", true)
if _, err := theConn.StartTransientUnit(scope, "replace", []systemdDbus.Property{dlScope}, nil); err != nil {
if dbusError, ok := err.(dbus.Error); ok {
if strings.Contains(dbusError.Name, "org.freedesktop.DBus.Error.PropertyReadOnly") {
hasDelegateScope = false
}
}
}
// Assume we have the ability to start a transient unit as a slice
// This was broken until systemd v229, but has been back-ported on RHEL environments >= 219
// For details, see: https://bugzilla.redhat.com/show_bug.cgi?id=1370299
@ -206,7 +158,6 @@ func UseSystemd() bool {
}
// Not critical because of the stop unit logic above.
theConn.StopUnit(scope, "replace", nil)
theConn.StopUnit(slice, "replace", nil)
}
return hasStartTransientUnit
@ -267,9 +218,8 @@ func (m *Manager) Apply(pid int) error {
properties = append(properties, newProp("Delegate", true))
}
} else {
if hasDelegateScope {
properties = append(properties, newProp("Delegate", true))
}
// Assume scopes always support delegation.
properties = append(properties, newProp("Delegate", true))
}
// Always enable accounting, this gets us the same behaviour as the fs implementation,
@ -279,10 +229,9 @@ func (m *Manager) Apply(pid int) error {
newProp("CPUAccounting", true),
newProp("BlockIOAccounting", true))
if hasTransientDefaultDependencies {
properties = append(properties,
newProp("DefaultDependencies", false))
}
// Assume DefaultDependencies= will always work (the check for it was previously broken.)
properties = append(properties,
newProp("DefaultDependencies", false))
if c.Resources.Memory != 0 {
properties = append(properties,
@ -319,6 +268,12 @@ func (m *Manager) Apply(pid int) error {
newProp("BlockIOWeight", uint64(c.Resources.BlkioWeight)))
}
if c.Resources.PidsLimit > 0 {
properties = append(properties,
newProp("TasksAccounting", true),
newProp("TasksMax", uint64(c.Resources.PidsLimit)))
}
// We have to set kernel memory here, as we can't change it once
// processes have been attached to the cgroup.
if c.Resources.KernelMemory != 0 {
@ -464,7 +419,7 @@ func ExpandSlice(slice string) (string, error) {
}
func getSubsystemPath(c *configs.Cgroup, subsystem string) (string, error) {
mountpoint, err := cgroups.FindCgroupMountpoint(subsystem)
mountpoint, err := cgroups.FindCgroupMountpoint(c.Path, subsystem)
if err != nil {
return "", err
}
@ -584,6 +539,15 @@ func setKernelMemory(c *configs.Cgroup) error {
if err := os.MkdirAll(path, 0755); err != nil {
return err
}
// do not try to enable the kernel memory if we already have
// tasks in the cgroup.
content, err := ioutil.ReadFile(filepath.Join(path, "tasks"))
if err != nil {
return err
}
if len(content) > 0 {
return nil
}
return fs.EnableKernelMemoryAccounting(path)
}

View File

@ -13,40 +13,51 @@ import (
"strings"
"time"
"github.com/docker/go-units"
units "github.com/docker/go-units"
"golang.org/x/sys/unix"
)
const (
cgroupNamePrefix = "name="
CgroupNamePrefix = "name="
CgroupProcesses = "cgroup.procs"
)
// https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
func FindCgroupMountpoint(subsystem string) (string, error) {
mnt, _, err := FindCgroupMountpointAndRoot(subsystem)
func FindCgroupMountpoint(cgroupPath, subsystem string) (string, error) {
mnt, _, err := FindCgroupMountpointAndRoot(cgroupPath, subsystem)
return mnt, err
}
func FindCgroupMountpointAndRoot(subsystem string) (string, string, error) {
func FindCgroupMountpointAndRoot(cgroupPath, subsystem string) (string, string, error) {
// We are not using mount.GetMounts() because it's super-inefficient,
// parsing it directly sped up x10 times because of not using Sscanf.
// It was one of two major performance drawbacks in container start.
if !isSubsystemAvailable(subsystem) {
return "", "", NewNotFoundError(subsystem)
}
f, err := os.Open("/proc/self/mountinfo")
if err != nil {
return "", "", err
}
defer f.Close()
scanner := bufio.NewScanner(f)
return findCgroupMountpointAndRootFromReader(f, cgroupPath, subsystem)
}
func findCgroupMountpointAndRootFromReader(reader io.Reader, cgroupPath, subsystem string) (string, string, error) {
scanner := bufio.NewScanner(reader)
for scanner.Scan() {
txt := scanner.Text()
fields := strings.Split(txt, " ")
for _, opt := range strings.Split(fields[len(fields)-1], ",") {
if opt == subsystem {
return fields[4], fields[3], nil
fields := strings.Fields(txt)
if len(fields) < 5 {
continue
}
if strings.HasPrefix(fields[4], cgroupPath) {
for _, opt := range strings.Split(fields[len(fields)-1], ",") {
if opt == subsystem {
return fields[4], fields[3], nil
}
}
}
}
@ -103,7 +114,7 @@ func FindCgroupMountpointDir() (string, error) {
}
if postSeparatorFields[0] == "cgroup" {
// Check that the mount is properly formated.
// Check that the mount is properly formatted.
if numPostFields < 3 {
return "", fmt.Errorf("Error found less than 3 fields post '-' in %q", text)
}
@ -151,19 +162,20 @@ func getCgroupMountsHelper(ss map[string]bool, mi io.Reader, all bool) ([]Mount,
Root: fields[3],
}
for _, opt := range strings.Split(fields[len(fields)-1], ",") {
if !ss[opt] {
seen, known := ss[opt]
if !known || (!all && seen) {
continue
}
if strings.HasPrefix(opt, cgroupNamePrefix) {
m.Subsystems = append(m.Subsystems, opt[len(cgroupNamePrefix):])
} else {
m.Subsystems = append(m.Subsystems, opt)
}
if !all {
numFound++
ss[opt] = true
if strings.HasPrefix(opt, CgroupNamePrefix) {
opt = opt[len(CgroupNamePrefix):]
}
m.Subsystems = append(m.Subsystems, opt)
numFound++
}
if len(m.Subsystems) > 0 || all {
res = append(res, m)
}
res = append(res, m)
}
if err := scanner.Err(); err != nil {
return nil, err
@ -187,7 +199,7 @@ func GetCgroupMounts(all bool) ([]Mount, error) {
allMap := make(map[string]bool)
for s := range allSubsystems {
allMap[s] = true
allMap[s] = false
}
return getCgroupMountsHelper(allMap, f, all)
}
@ -256,13 +268,13 @@ func GetInitCgroupPath(subsystem string) (string, error) {
}
func getCgroupPathHelper(subsystem, cgroup string) (string, error) {
mnt, root, err := FindCgroupMountpointAndRoot(subsystem)
mnt, root, err := FindCgroupMountpointAndRoot("", subsystem)
if err != nil {
return "", err
}
// This is needed for nested containers, because in /proc/self/cgroup we
// see pathes from host, which don't exist in container.
// see paths from host, which don't exist in container.
relCgroup, err := filepath.Rel(root, cgroup)
if err != nil {
return "", err
@ -342,7 +354,7 @@ func getControllerPath(subsystem string, cgroups map[string]string) (string, err
return p, nil
}
if p, ok := cgroups[cgroupNamePrefix+subsystem]; ok {
if p, ok := cgroups[CgroupNamePrefix+subsystem]; ok {
return p, nil
}
@ -453,10 +465,39 @@ func WriteCgroupProc(dir string, pid int) error {
}
// Dont attach any pid to the cgroup if -1 is specified as a pid
if pid != -1 {
if err := ioutil.WriteFile(filepath.Join(dir, CgroupProcesses), []byte(strconv.Itoa(pid)), 0700); err != nil {
return fmt.Errorf("failed to write %v to %v: %v", pid, CgroupProcesses, err)
}
if pid == -1 {
return nil
}
cgroupProcessesFile, err := os.OpenFile(filepath.Join(dir, CgroupProcesses), os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0700)
if err != nil {
return fmt.Errorf("failed to write %v to %v: %v", pid, CgroupProcesses, err)
}
defer cgroupProcessesFile.Close()
for i := 0; i < 5; i++ {
_, err = cgroupProcessesFile.WriteString(strconv.Itoa(pid))
if err == nil {
return nil
}
// EINVAL might mean that the task being added to cgroup.procs is in state
// TASK_NEW. We should attempt to do so again.
if isEINVAL(err) {
time.Sleep(30 * time.Millisecond)
continue
}
return fmt.Errorf("failed to write %v to %v: %v", pid, CgroupProcesses, err)
}
return err
}
func isEINVAL(err error) bool {
switch err := err.(type) {
case *os.PathError:
return err.Err == unix.EINVAL
default:
return false
}
return nil
}

View File

@ -186,12 +186,19 @@ type Config struct {
// callers keyring in this case.
NoNewKeyring bool `json:"no_new_keyring"`
// Rootless specifies whether the container is a rootless container.
Rootless bool `json:"rootless"`
// IntelRdt specifies settings for Intel RDT/CAT group that the container is placed into
// to limit the resources (e.g., L3 cache) the container has available
// IntelRdt specifies settings for Intel RDT group that the container is placed into
// to limit the resources (e.g., L3 cache, memory bandwidth) the container has available
IntelRdt *IntelRdt `json:"intel_rdt,omitempty"`
// RootlessEUID is set when the runc was launched with non-zero EUID.
// Note that RootlessEUID is set to false when launched with EUID=0 in userns.
// When RootlessEUID is set, runc creates a new userns for the container.
// (config.json needs to contain userns settings)
RootlessEUID bool `json:"rootless_euid,omitempty"`
// RootlessCgroups is set when unlikely to have the full access to cgroups.
// When RootlessCgroups is set, cgroups errors are ignored.
RootlessCgroups bool `json:"rootless_cgroups,omitempty"`
}
type Hooks struct {
@ -265,26 +272,23 @@ func (hooks Hooks) MarshalJSON() ([]byte, error) {
})
}
// HookState is the payload provided to a hook on execution.
type HookState specs.State
type Hook interface {
// Run executes the hook with the provided state.
Run(HookState) error
Run(*specs.State) error
}
// NewFunctionHook will call the provided function when the hook is run.
func NewFunctionHook(f func(HookState) error) FuncHook {
func NewFunctionHook(f func(*specs.State) error) FuncHook {
return FuncHook{
run: f,
}
}
type FuncHook struct {
run func(HookState) error
run func(*specs.State) error
}
func (f FuncHook) Run(s HookState) error {
func (f FuncHook) Run(s *specs.State) error {
return f.run(s)
}
@ -307,7 +311,7 @@ type CommandHook struct {
Command
}
func (c Command) Run(s HookState) error {
func (c Command) Run(s *specs.State) error {
b, err := json.Marshal(s)
if err != nil {
return err

View File

@ -4,4 +4,10 @@ type IntelRdt struct {
// The schema for L3 cache id and capacity bitmask (CBM)
// Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
L3CacheSchema string `json:"l3_cache_schema,omitempty"`
// The schema of memory bandwidth per L3 cache id
// Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
// The unit of memory bandwidth is specified in "percentages" by
// default, and in "MBps" if MBA Software Controller is enabled.
MemBwSchema string `json:"memBwSchema,omitempty"`
}

View File

@ -7,12 +7,13 @@ import (
)
const (
NEWNET NamespaceType = "NEWNET"
NEWPID NamespaceType = "NEWPID"
NEWNS NamespaceType = "NEWNS"
NEWUTS NamespaceType = "NEWUTS"
NEWIPC NamespaceType = "NEWIPC"
NEWUSER NamespaceType = "NEWUSER"
NEWNET NamespaceType = "NEWNET"
NEWPID NamespaceType = "NEWPID"
NEWNS NamespaceType = "NEWNS"
NEWUTS NamespaceType = "NEWUTS"
NEWIPC NamespaceType = "NEWIPC"
NEWUSER NamespaceType = "NEWUSER"
NEWCGROUP NamespaceType = "NEWCGROUP"
)
var (
@ -35,6 +36,8 @@ func NsName(ns NamespaceType) string {
return "user"
case NEWUTS:
return "uts"
case NEWCGROUP:
return "cgroup"
}
return ""
}
@ -68,6 +71,7 @@ func NamespaceTypes() []NamespaceType {
NEWNET,
NEWPID,
NEWNS,
NEWCGROUP,
}
}

View File

@ -9,12 +9,13 @@ func (n *Namespace) Syscall() int {
}
var namespaceInfo = map[NamespaceType]int{
NEWNET: unix.CLONE_NEWNET,
NEWNS: unix.CLONE_NEWNS,
NEWUSER: unix.CLONE_NEWUSER,
NEWIPC: unix.CLONE_NEWIPC,
NEWUTS: unix.CLONE_NEWUTS,
NEWPID: unix.CLONE_NEWPID,
NEWNET: unix.CLONE_NEWNET,
NEWNS: unix.CLONE_NEWNS,
NEWUSER: unix.CLONE_NEWUSER,
NEWIPC: unix.CLONE_NEWIPC,
NEWUTS: unix.CLONE_NEWUTS,
NEWPID: unix.CLONE_NEWPID,
NEWCGROUP: unix.CLONE_NEWCGROUP,
}
// CloneFlags parses the container's Namespaces options to set the correct

View File

@ -2,23 +2,18 @@ package validate
import (
"fmt"
"os"
"reflect"
"strings"
"github.com/opencontainers/runc/libcontainer/configs"
)
var (
geteuid = os.Geteuid
getegid = os.Getegid
)
func (v *ConfigValidator) rootless(config *configs.Config) error {
if err := rootlessMappings(config); err != nil {
// rootlessEUID makes sure that the config can be applied when runc
// is being executed as a non-root user (euid != 0) in the current user namespace.
func (v *ConfigValidator) rootlessEUID(config *configs.Config) error {
if err := rootlessEUIDMappings(config); err != nil {
return err
}
if err := rootlessMount(config); err != nil {
if err := rootlessEUIDMount(config); err != nil {
return err
}
@ -38,46 +33,24 @@ func hasIDMapping(id int, mappings []configs.IDMap) bool {
return false
}
func rootlessMappings(config *configs.Config) error {
if euid := geteuid(); euid != 0 {
if !config.Namespaces.Contains(configs.NEWUSER) {
return fmt.Errorf("rootless containers require user namespaces")
}
if len(config.UidMappings) == 0 {
return fmt.Errorf("rootless containers requires at least one UID mapping")
}
if len(config.GidMappings) == 0 {
return fmt.Errorf("rootless containers requires at least one GID mapping")
}
func rootlessEUIDMappings(config *configs.Config) error {
if !config.Namespaces.Contains(configs.NEWUSER) {
return fmt.Errorf("rootless container requires user namespaces")
}
return nil
}
// cgroup verifies that the user isn't trying to set any cgroup limits or paths.
func rootlessCgroup(config *configs.Config) error {
// Nothing set at all.
if config.Cgroups == nil || config.Cgroups.Resources == nil {
return nil
if len(config.UidMappings) == 0 {
return fmt.Errorf("rootless containers requires at least one UID mapping")
}
// Used for comparing to the zero value.
left := reflect.ValueOf(*config.Cgroups.Resources)
right := reflect.Zero(left.Type())
// This is all we need to do, since specconv won't add cgroup options in
// rootless mode.
if !reflect.DeepEqual(left.Interface(), right.Interface()) {
return fmt.Errorf("cannot specify resource limits in rootless container")
if len(config.GidMappings) == 0 {
return fmt.Errorf("rootless containers requires at least one GID mapping")
}
return nil
}
// mount verifies that the user isn't trying to set up any mounts they don't have
// the rights to do. In addition, it makes sure that no mount has a `uid=` or
// `gid=` option that doesn't resolve to root.
func rootlessMount(config *configs.Config) error {
func rootlessEUIDMount(config *configs.Config) error {
// XXX: We could whitelist allowed devices at this point, but I'm not
// convinced that's a good idea. The kernel is the best arbiter of
// access control.

View File

@ -38,14 +38,17 @@ func (v *ConfigValidator) Validate(config *configs.Config) error {
if err := v.usernamespace(config); err != nil {
return err
}
if err := v.cgroupnamespace(config); err != nil {
return err
}
if err := v.sysctl(config); err != nil {
return err
}
if err := v.intelrdt(config); err != nil {
return err
}
if config.Rootless {
if err := v.rootless(config); err != nil {
if config.RootlessEUID {
if err := v.rootlessEUID(config); err != nil {
return err
}
}
@ -116,6 +119,15 @@ func (v *ConfigValidator) usernamespace(config *configs.Config) error {
return nil
}
func (v *ConfigValidator) cgroupnamespace(config *configs.Config) error {
if config.Namespaces.Contains(configs.NEWCGROUP) {
if _, err := os.Stat("/proc/self/ns/cgroup"); os.IsNotExist(err) {
return fmt.Errorf("cgroup namespaces aren't enabled in the kernel")
}
}
return nil
}
// sysctl validates that the specified sysctl keys are valid or not.
// /proc/sys isn't completely namespaced and depending on which namespaces
// are specified, a subset of sysctls are permitted.
@ -169,11 +181,22 @@ func (v *ConfigValidator) sysctl(config *configs.Config) error {
func (v *ConfigValidator) intelrdt(config *configs.Config) error {
if config.IntelRdt != nil {
if !intelrdt.IsEnabled() {
return fmt.Errorf("intelRdt is specified in config, but Intel RDT feature is not supported or enabled")
if !intelrdt.IsCatEnabled() && !intelrdt.IsMbaEnabled() {
return fmt.Errorf("intelRdt is specified in config, but Intel RDT is not supported or enabled")
}
if config.IntelRdt.L3CacheSchema == "" {
return fmt.Errorf("intelRdt is specified in config, but intelRdt.l3CacheSchema is empty")
if !intelrdt.IsCatEnabled() && config.IntelRdt.L3CacheSchema != "" {
return fmt.Errorf("intelRdt.l3CacheSchema is specified in config, but Intel RDT/CAT is not enabled")
}
if !intelrdt.IsMbaEnabled() && config.IntelRdt.MemBwSchema != "" {
return fmt.Errorf("intelRdt.memBwSchema is specified in config, but Intel RDT/MBA is not enabled")
}
if intelrdt.IsCatEnabled() && config.IntelRdt.L3CacheSchema == "" {
return fmt.Errorf("Intel RDT/CAT is enabled and intelRdt is specified in config, but intelRdt.l3CacheSchema is empty")
}
if intelrdt.IsMbaEnabled() && config.IntelRdt.MemBwSchema == "" {
return fmt.Errorf("Intel RDT/MBA is enabled and intelRdt is specified in config, but intelRdt.memBwSchema is empty")
}
}

View File

@ -9,6 +9,7 @@ import (
"time"
"github.com/opencontainers/runc/libcontainer/configs"
"github.com/opencontainers/runtime-spec/specs-go"
)
// Status is the status of a container.
@ -85,6 +86,12 @@ type BaseContainer interface {
// SystemError - System error.
State() (*State, error)
// OCIState returns the current container's state information.
//
// errors:
// SystemError - System error.
OCIState() (*specs.State, error)
// Returns the current config of the container.
Config() configs.Config

View File

@ -25,6 +25,7 @@ import (
"github.com/opencontainers/runc/libcontainer/intelrdt"
"github.com/opencontainers/runc/libcontainer/system"
"github.com/opencontainers/runc/libcontainer/utils"
"github.com/opencontainers/runtime-spec/specs-go"
"github.com/golang/protobuf/proto"
"github.com/sirupsen/logrus"
@ -59,7 +60,8 @@ type State struct {
// Platform specific fields below here
// Specifies if the container was started under the rootless mode.
// Specified if the container was started under the rootless mode.
// Set to true if BaseState.Config.RootlessEUID && BaseState.Config.RootlessCgroups
Rootless bool `json:"rootless"`
// Path to all the cgroups setup for a container. Key is cgroup subsystem name
@ -155,6 +157,12 @@ func (c *linuxContainer) State() (*State, error) {
return c.currentState()
}
func (c *linuxContainer) OCIState() (*specs.State, error) {
c.m.Lock()
defer c.m.Unlock()
return c.currentOCIState()
}
func (c *linuxContainer) Processes() ([]int, error) {
pids, err := c.cgroupManager.GetAllPids()
if err != nil {
@ -348,13 +356,9 @@ func (c *linuxContainer) start(process *Process) error {
c.initProcessStartTime = state.InitProcessStartTime
if c.config.Hooks != nil {
bundle, annotations := utils.Annotations(c.config.Labels)
s := configs.HookState{
Version: c.config.Version,
ID: c.id,
Pid: parent.pid(),
Bundle: bundle,
Annotations: annotations,
s, err := c.currentOCIState()
if err != nil {
return err
}
for i, hook := range c.config.Hooks.Poststart {
if err := hook.Run(s); err != nil {
@ -373,10 +377,18 @@ func (c *linuxContainer) Signal(s os.Signal, all bool) error {
if all {
return signalAllProcesses(c.cgroupManager, s)
}
if err := c.initProcess.signal(s); err != nil {
return newSystemErrorWithCause(err, "signaling init process")
status, err := c.currentStatus()
if err != nil {
return err
}
return nil
// to avoid a PID reuse attack
if status == Running || status == Created || status == Paused {
if err := c.initProcess.signal(s); err != nil {
return newSystemErrorWithCause(err, "signaling init process")
}
return nil
}
return newGenericError(fmt.Errorf("container not running"), ContainerNotRunning)
}
func (c *linuxContainer) createExecFifo() error {
@ -399,10 +411,7 @@ func (c *linuxContainer) createExecFifo() error {
return err
}
unix.Umask(oldMask)
if err := os.Chown(fifoName, rootuid, rootgid); err != nil {
return err
}
return nil
return os.Chown(fifoName, rootuid, rootgid)
}
func (c *linuxContainer) deleteExecFifo() {
@ -494,7 +503,7 @@ func (c *linuxContainer) newInitProcess(p *Process, cmd *exec.Cmd, parentPipe, c
if err != nil {
return nil, err
}
return &initProcess{
init := &initProcess{
cmd: cmd,
childPipe: childPipe,
parentPipe: parentPipe,
@ -505,7 +514,9 @@ func (c *linuxContainer) newInitProcess(p *Process, cmd *exec.Cmd, parentPipe, c
process: p,
bootstrapData: data,
sharePidns: sharePidns,
}, nil
}
c.initProcess = init
return init, nil
}
func (c *linuxContainer) newSetnsProcess(p *Process, cmd *exec.Cmd, parentPipe, childPipe *os.File) (*setnsProcess, error) {
@ -521,14 +532,15 @@ func (c *linuxContainer) newSetnsProcess(p *Process, cmd *exec.Cmd, parentPipe,
return nil, err
}
return &setnsProcess{
cmd: cmd,
cgroupPaths: c.cgroupManager.GetPaths(),
intelRdtPath: state.IntelRdtPath,
childPipe: childPipe,
parentPipe: parentPipe,
config: c.newInitConfig(p),
process: p,
bootstrapData: data,
cmd: cmd,
cgroupPaths: c.cgroupManager.GetPaths(),
rootlessCgroups: c.config.RootlessCgroups,
intelRdtPath: state.IntelRdtPath,
childPipe: childPipe,
parentPipe: parentPipe,
config: c.newInitConfig(p),
process: p,
bootstrapData: data,
}, nil
}
@ -544,7 +556,8 @@ func (c *linuxContainer) newInitConfig(process *Process) *initConfig {
PassedFilesCount: len(process.ExtraFiles),
ContainerId: c.ID(),
NoNewPrivileges: c.config.NoNewPrivileges,
Rootless: c.config.Rootless,
RootlessEUID: c.config.RootlessEUID,
RootlessCgroups: c.config.RootlessCgroups,
AppArmorProfile: c.config.AppArmorProfile,
ProcessLabel: c.config.ProcessLabel,
Rlimits: c.config.Rlimits,
@ -612,16 +625,16 @@ func (c *linuxContainer) Resume() error {
func (c *linuxContainer) NotifyOOM() (<-chan struct{}, error) {
// XXX(cyphar): This requires cgroups.
if c.config.Rootless {
return nil, fmt.Errorf("cannot get OOM notifications from rootless container")
if c.config.RootlessCgroups {
logrus.Warn("getting OOM notifications may fail if you don't have the full access to cgroups")
}
return notifyOnOOM(c.cgroupManager.GetPaths())
}
func (c *linuxContainer) NotifyMemoryPressure(level PressureLevel) (<-chan struct{}, error) {
// XXX(cyphar): This requires cgroups.
if c.config.Rootless {
return nil, fmt.Errorf("cannot get memory pressure notifications from rootless container")
if c.config.RootlessCgroups {
logrus.Warn("getting memory pressure notifications may fail if you don't have the full access to cgroups")
}
return notifyMemoryPressure(c.cgroupManager.GetPaths(), level)
}
@ -861,16 +874,41 @@ func waitForCriuLazyServer(r *os.File, status string) error {
return nil
}
func (c *linuxContainer) handleCriuConfigurationFile(rpcOpts *criurpc.CriuOpts) {
// CRIU will evaluate a configuration starting with release 3.11.
// Settings in the configuration file will overwrite RPC settings.
// Look for annotations. The annotation 'org.criu.config'
// specifies if CRIU should use a different, container specific
// configuration file.
_, annotations := utils.Annotations(c.config.Labels)
configFile, exists := annotations["org.criu.config"]
if exists {
// If the annotation 'org.criu.config' exists and is set
// to a non-empty string, tell CRIU to use that as a
// configuration file. If the file does not exist, CRIU
// will just ignore it.
if configFile != "" {
rpcOpts.ConfigFile = proto.String(configFile)
}
// If 'org.criu.config' exists and is set to an empty
// string, a runc specific CRIU configuration file will
// be not set at all.
} else {
// If the mentioned annotation has not been found, specify
// a default CRIU configuration file.
rpcOpts.ConfigFile = proto.String("/etc/criu/runc.conf")
}
}
func (c *linuxContainer) Checkpoint(criuOpts *CriuOpts) error {
c.m.Lock()
defer c.m.Unlock()
// Checkpoint is unlikely to work if os.Geteuid() != 0 || system.RunningInUserNS().
// (CLI prints a warning)
// TODO(avagin): Figure out how to make this work nicely. CRIU 2.0 has
// support for doing unprivileged dumps, but the setup of
// rootless containers might make this complicated.
if c.config.Rootless {
return fmt.Errorf("cannot checkpoint a rootless container")
}
// criu 1.5.2 => 10502
if err := c.checkCriuVersion(10502); err != nil {
@ -927,6 +965,8 @@ func (c *linuxContainer) Checkpoint(criuOpts *CriuOpts) error {
LazyPages: proto.Bool(criuOpts.LazyPages),
}
c.handleCriuConfigurationFile(&rpcOpts)
// If the container is running in a network namespace and has
// a path to the network namespace configured, we will dump
// that network namespace as an external namespace and we
@ -1104,11 +1144,10 @@ func (c *linuxContainer) Restore(process *Process, criuOpts *CriuOpts) error {
var extraFiles []*os.File
// Restore is unlikely to work if os.Geteuid() != 0 || system.RunningInUserNS().
// (CLI prints a warning)
// TODO(avagin): Figure out how to make this work nicely. CRIU doesn't have
// support for unprivileged restore at the moment.
if c.config.Rootless {
return fmt.Errorf("cannot restore a rootless container")
}
// criu 1.5.2 => 10502
if err := c.checkCriuVersion(10502); err != nil {
@ -1178,6 +1217,8 @@ func (c *linuxContainer) Restore(process *Process, criuOpts *CriuOpts) error {
},
}
c.handleCriuConfigurationFile(req.Opts)
// Same as during checkpointing. If the container has a specific network namespace
// assigned to it, this now expects that the checkpoint will be restored in a
// already created network namespace.
@ -1196,7 +1237,7 @@ func (c *linuxContainer) Restore(process *Process, criuOpts *CriuOpts) error {
netns, err := os.Open(nsPath)
defer netns.Close()
if err != nil {
logrus.Error("If a specific network namespace is defined it must exist: %s", err)
logrus.Errorf("If a specific network namespace is defined it must exist: %s", err)
return fmt.Errorf("Requested network namespace %v does not exist", nsPath)
}
inheritFd := new(criurpc.InheritFd)
@ -1538,14 +1579,11 @@ func (c *linuxContainer) criuNotifications(resp *criurpc.CriuResp, process *Proc
}
case notify.GetScript() == "setup-namespaces":
if c.config.Hooks != nil {
bundle, annotations := utils.Annotations(c.config.Labels)
s := configs.HookState{
Version: c.config.Version,
ID: c.id,
Pid: int(notify.GetPid()),
Bundle: bundle,
Annotations: annotations,
s, err := c.currentOCIState()
if err != nil {
return nil
}
s.Pid = int(notify.GetPid())
for i, hook := range c.config.Hooks.Prestart {
if err := hook.Run(s); err != nil {
return newSystemErrorWithCausef(err, "running prestart hook %d", i)
@ -1716,7 +1754,7 @@ func (c *linuxContainer) currentState() (*State, error) {
InitProcessStartTime: startTime,
Created: c.created,
},
Rootless: c.config.Rootless,
Rootless: c.config.RootlessEUID && c.config.RootlessCgroups,
CgroupPaths: c.cgroupManager.GetPaths(),
IntelRdtPath: intelRdtPath,
NamespacePaths: make(map[configs.NamespaceType]string),
@ -1739,11 +1777,31 @@ func (c *linuxContainer) currentState() (*State, error) {
return state, nil
}
func (c *linuxContainer) currentOCIState() (*specs.State, error) {
bundle, annotations := utils.Annotations(c.config.Labels)
state := &specs.State{
Version: specs.Version,
ID: c.ID(),
Bundle: bundle,
Annotations: annotations,
}
status, err := c.currentStatus()
if err != nil {
return nil, err
}
state.Status = status.String()
if status != Stopped {
if c.initProcess != nil {
state.Pid = c.initProcess.pid()
}
}
return state, nil
}
// orderNamespacePaths sorts namespace paths into a list of paths that we
// can setns in order.
func (c *linuxContainer) orderNamespacePaths(namespaces map[configs.NamespaceType]string) ([]string, error) {
paths := []string{}
for _, ns := range configs.NamespaceTypes() {
// Remove namespaces that we don't need to join.
@ -1817,7 +1875,7 @@ func (c *linuxContainer) bootstrapData(cloneFlags uintptr, nsMaps map[configs.Na
if !joinExistingUser {
// write uid mappings
if len(c.config.UidMappings) > 0 {
if c.config.Rootless && c.newuidmapPath != "" {
if c.config.RootlessEUID && c.newuidmapPath != "" {
r.AddData(&Bytemsg{
Type: UidmapPathAttr,
Value: []byte(c.newuidmapPath),
@ -1843,7 +1901,7 @@ func (c *linuxContainer) bootstrapData(cloneFlags uintptr, nsMaps map[configs.Na
Type: GidmapAttr,
Value: b,
})
if c.config.Rootless && c.newgidmapPath != "" {
if c.config.RootlessEUID && c.newgidmapPath != "" {
r.AddData(&Bytemsg{
Type: GidmapPathAttr,
Value: []byte(c.newgidmapPath),
@ -1868,8 +1926,8 @@ func (c *linuxContainer) bootstrapData(cloneFlags uintptr, nsMaps map[configs.Na
// write rootless
r.AddData(&Boolmsg{
Type: RootlessAttr,
Value: c.config.Rootless,
Type: RootlessEUIDAttr,
Value: c.config.RootlessEUID,
})
return bytes.NewReader(r.Serialize()), nil

View File

@ -1,6 +1,5 @@
// Code generated by protoc-gen-go.
// Code generated by protoc-gen-go. DO NOT EDIT.
// source: criurpc.proto
// DO NOT EDIT!
/*
Package criurpc is a generated protocol buffer package.
@ -393,6 +392,7 @@ type CriuOpts struct {
LazyPages *bool `protobuf:"varint,48,opt,name=lazy_pages,json=lazyPages" json:"lazy_pages,omitempty"`
StatusFd *int32 `protobuf:"varint,49,opt,name=status_fd,json=statusFd" json:"status_fd,omitempty"`
OrphanPtsMaster *bool `protobuf:"varint,50,opt,name=orphan_pts_master,json=orphanPtsMaster" json:"orphan_pts_master,omitempty"`
ConfigFile *string `protobuf:"bytes,51,opt,name=config_file,json=configFile" json:"config_file,omitempty"`
XXX_unrecognized []byte `json:"-"`
}
@ -748,6 +748,13 @@ func (m *CriuOpts) GetOrphanPtsMaster() bool {
return false
}
func (m *CriuOpts) GetConfigFile() string {
if m != nil && m.ConfigFile != nil {
return *m.ConfigFile
}
return ""
}
type CriuDumpResp struct {
Restored *bool `protobuf:"varint,1,opt,name=restored" json:"restored,omitempty"`
XXX_unrecognized []byte `json:"-"`
@ -1062,117 +1069,118 @@ func init() {
func init() { proto.RegisterFile("criurpc.proto", fileDescriptor0) }
var fileDescriptor0 = []byte{
// 1781 bytes of a gzipped FileDescriptorProto
// 1795 bytes of a gzipped FileDescriptorProto
0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0xff, 0x8c, 0x56, 0xdd, 0x72, 0x5b, 0xb7,
0x11, 0x0e, 0x29, 0xfe, 0x1c, 0x82, 0x3f, 0xa6, 0x10, 0xdb, 0x81, 0x93, 0xda, 0x62, 0xe8, 0x28,
0x51, 0x15, 0x97, 0x4d, 0x58, 0x3b, 0xae, 0x33, 0xed, 0x85, 0x47, 0x22, 0x5d, 0x36, 0x92, 0xc8,
0x01, 0x25, 0xcf, 0xe4, 0x0a, 0x73, 0x74, 0x0e, 0x48, 0xc1, 0x3c, 0x7f, 0x05, 0x40, 0x45, 0xf2,
0x83, 0xf4, 0x29, 0xfa, 0x0c, 0x7d, 0x84, 0xbe, 0x4e, 0x6f, 0x3b, 0xbb, 0x00, 0x65, 0x29, 0xc9,
0xb4, 0xbd, 0xc3, 0x7e, 0x58, 0x00, 0xbb, 0xfb, 0xed, 0x0f, 0x48, 0x3b, 0xd2, 0x6a, 0xad, 0x8b,
0x68, 0x50, 0xe8, 0xdc, 0xe6, 0xfd, 0x25, 0x79, 0x00, 0x80, 0x28, 0xc2, 0xa5, 0x14, 0x46, 0xea,
0x4b, 0xa9, 0x85, 0xca, 0x16, 0x39, 0x65, 0xa4, 0x1e, 0xc6, 0xb1, 0x96, 0xc6, 0xb0, 0x52, 0xaf,
0xb4, 0xd7, 0xe0, 0x1b, 0x91, 0x52, 0x52, 0x29, 0x72, 0x6d, 0x59, 0xb9, 0x57, 0xda, 0xab, 0x72,
0x5c, 0xd3, 0x2e, 0xd9, 0x2a, 0x54, 0xcc, 0xb6, 0x10, 0x82, 0x25, 0xed, 0x90, 0xf2, 0x22, 0x66,
0x15, 0x04, 0xca, 0x8b, 0xb8, 0xff, 0x27, 0xd2, 0xc1, 0x87, 0x2e, 0xa5, 0xbd, 0x10, 0x45, 0xa8,
0x34, 0xfd, 0x98, 0x54, 0xd5, 0x42, 0xa8, 0x8c, 0x95, 0x7a, 0xe5, 0xbd, 0x06, 0xaf, 0xa8, 0xc5,
0x24, 0xa3, 0x0f, 0x48, 0x4d, 0x2d, 0x44, 0xbe, 0x86, 0xeb, 0x01, 0xad, 0xaa, 0xc5, 0x74, 0x6d,
0xfb, 0x7f, 0x20, 0x6d, 0x79, 0x65, 0x45, 0x9a, 0xaf, 0x33, 0x2b, 0xd2, 0xb0, 0x80, 0x07, 0x57,
0xf2, 0xda, 0x1f, 0x85, 0x25, 0x20, 0x97, 0x61, 0xe2, 0x8f, 0xc1, 0xb2, 0xff, 0x96, 0x74, 0xde,
0xe5, 0x2a, 0x13, 0x59, 0x98, 0x4a, 0x53, 0x84, 0x91, 0x04, 0xa3, 0x32, 0xe3, 0x0f, 0x95, 0x33,
0x43, 0x3f, 0x21, 0xf5, 0xcc, 0x88, 0x85, 0x4a, 0xa4, 0x3f, 0x57, 0xcb, 0xcc, 0x58, 0x25, 0x92,
0x7e, 0x46, 0x1a, 0xf2, 0xca, 0xea, 0x50, 0xe4, 0x85, 0x45, 0xaf, 0x1a, 0x3c, 0x40, 0x60, 0x5a,
0xd8, 0xfe, 0x80, 0x10, 0x95, 0x5d, 0x48, 0xad, 0xac, 0x58, 0xc4, 0xbf, 0x62, 0x89, 0x73, 0x1d,
0x2e, 0x74, 0xae, 0xbf, 0x20, 0xcd, 0x68, 0xa9, 0xf3, 0x75, 0x21, 0x74, 0x9e, 0x5b, 0x88, 0x5f,
0x64, 0x75, 0xe2, 0xc3, 0x8a, 0x6b, 0x8c, 0x69, 0x68, 0x2f, 0xbc, 0x15, 0xb8, 0xee, 0xef, 0x90,
0xfa, 0x3a, 0x53, 0x57, 0xc2, 0xac, 0xe8, 0x7d, 0x52, 0x55, 0x59, 0x1e, 0x4b, 0x7c, 0xa5, 0xcd,
0x9d, 0xd0, 0xff, 0x57, 0x9b, 0x34, 0x30, 0xa6, 0x79, 0x61, 0x0d, 0xed, 0x93, 0xb6, 0x4a, 0xc3,
0xa5, 0x34, 0x22, 0x56, 0x5a, 0x2c, 0x62, 0xd4, 0xad, 0xf2, 0xa6, 0x03, 0x0f, 0x95, 0x1e, 0xc7,
0x1b, 0x9a, 0xca, 0x1f, 0x68, 0x7a, 0x4a, 0xda, 0x89, 0x0c, 0x2f, 0xa5, 0xd0, 0xeb, 0x2c, 0x53,
0xd9, 0x12, 0x9d, 0x0d, 0x78, 0x0b, 0x41, 0xee, 0x30, 0xfa, 0x84, 0x34, 0x21, 0xfa, 0xde, 0x1a,
0x24, 0x35, 0xe0, 0x10, 0xa0, 0xb3, 0x4c, 0x5d, 0xcd, 0x57, 0xf4, 0x2b, 0x72, 0xcf, 0x46, 0x85,
0x90, 0xc6, 0x86, 0xe7, 0x89, 0x32, 0x17, 0x32, 0x66, 0x55, 0xd4, 0xe9, 0xd8, 0xa8, 0x18, 0x7d,
0x40, 0x41, 0x51, 0x5e, 0x86, 0x46, 0x5d, 0x4a, 0x11, 0xcb, 0x4b, 0x15, 0x49, 0xc3, 0x6a, 0x4e,
0xd1, 0xc3, 0x87, 0x0e, 0x85, 0xf8, 0x9b, 0x0b, 0x99, 0x24, 0xe2, 0x5d, 0x7e, 0xce, 0xea, 0xa8,
0x12, 0x20, 0xf0, 0xd7, 0xfc, 0x9c, 0x3e, 0x26, 0x04, 0x28, 0x13, 0x49, 0x1e, 0xad, 0x0c, 0x0b,
0x9c, 0x35, 0x80, 0x1c, 0x01, 0x40, 0x9f, 0x90, 0x46, 0x92, 0x2f, 0x45, 0x22, 0x2f, 0x65, 0xc2,
0x1a, 0xe0, 0xea, 0xf7, 0xa5, 0x21, 0x0f, 0x92, 0x7c, 0x79, 0x04, 0x10, 0x7d, 0x44, 0x60, 0xed,
0x58, 0x27, 0x2e, 0xb5, 0x93, 0x7c, 0x89, 0xb4, 0x7f, 0x49, 0xca, 0x85, 0x61, 0xcd, 0x5e, 0x69,
0xaf, 0x39, 0x7c, 0x38, 0xf8, 0xd5, 0xc2, 0xe0, 0xe5, 0xc2, 0xd0, 0x5d, 0xd2, 0xc9, 0x72, 0xab,
0x16, 0xd7, 0xc2, 0x44, 0x5a, 0x15, 0xd6, 0xb0, 0x16, 0x5a, 0xd1, 0x76, 0xe8, 0xdc, 0x81, 0xc0,
0x2a, 0x30, 0xce, 0xda, 0x8e, 0x69, 0x64, 0xff, 0x31, 0x21, 0x45, 0xa8, 0x65, 0x66, 0x85, 0x4a,
0x97, 0xac, 0x83, 0x3b, 0x0d, 0x87, 0x4c, 0xd2, 0x25, 0x38, 0x6e, 0x75, 0x18, 0xad, 0x44, 0x2a,
0x53, 0x76, 0xcf, 0x39, 0x8e, 0xc0, 0xb1, 0x4c, 0xe1, 0x6c, 0xb8, 0xb6, 0xb9, 0x88, 0x65, 0xbc,
0x2e, 0x58, 0xd7, 0x39, 0x0e, 0xc8, 0x21, 0x00, 0x40, 0xd3, 0x4f, 0xb9, 0x5e, 0x6d, 0xf8, 0xdf,
0x46, 0x96, 0x1b, 0x00, 0x39, 0xf6, 0x1f, 0x13, 0x92, 0xa8, 0x6c, 0x25, 0xb4, 0x4c, 0xc3, 0x82,
0x51, 0x77, 0x1c, 0x10, 0x0e, 0x00, 0xdd, 0x25, 0x55, 0x28, 0x4e, 0xc3, 0x3e, 0xee, 0x6d, 0xed,
0x35, 0x87, 0xf7, 0x06, 0x77, 0xeb, 0x95, 0xbb, 0x5d, 0xfa, 0x94, 0xd4, 0xa3, 0x62, 0x2d, 0xa2,
0xb0, 0x60, 0xf7, 0x7b, 0xa5, 0xbd, 0xf6, 0xf7, 0xe4, 0xf9, 0xf0, 0xd5, 0xf3, 0x57, 0xdf, 0xbd,
0x1c, 0xbe, 0x7a, 0xc1, 0x6b, 0x51, 0xb1, 0x3e, 0x08, 0x0b, 0xba, 0x43, 0x9a, 0x8b, 0x5c, 0x47,
0x52, 0x28, 0x0d, 0x6f, 0x3d, 0xc0, 0xb7, 0x08, 0x42, 0x13, 0x40, 0x80, 0x04, 0x79, 0x25, 0x23,
0x11, 0xa5, 0x31, 0x7b, 0xd8, 0xdb, 0x02, 0x12, 0x40, 0x3e, 0x48, 0x21, 0x49, 0xea, 0x58, 0xeb,
0x99, 0x65, 0x9f, 0xa0, 0x25, 0x9d, 0xc1, 0x9d, 0xda, 0xe7, 0x35, 0x79, 0x65, 0x8f, 0x33, 0x0b,
0x2c, 0xa4, 0x61, 0x06, 0xfc, 0xb8, 0xf2, 0x32, 0x8c, 0x39, 0x16, 0x1c, 0x7a, 0xe0, 0x40, 0xba,
0x4b, 0xea, 0xd1, 0x12, 0x4b, 0x8f, 0x3d, 0xc2, 0xfb, 0x5a, 0x83, 0x5b, 0xe5, 0xc8, 0x6b, 0xd1,
0x92, 0x03, 0x31, 0x3b, 0xa4, 0xa9, 0x8d, 0x15, 0x46, 0x9d, 0x27, 0x50, 0x07, 0x9f, 0x3a, 0x93,
0xb5, 0xb1, 0x73, 0x87, 0xd0, 0xfd, 0xdb, 0x65, 0xcf, 0x3e, 0xc3, 0xab, 0x9a, 0x83, 0x0f, 0x10,
0x6f, 0xf8, 0xf5, 0x38, 0xa6, 0x3d, 0xd2, 0x42, 0xa6, 0x36, 0x8e, 0xfc, 0xc6, 0xdd, 0x06, 0xd8,
0xc8, 0x19, 0xbf, 0xe3, 0x6a, 0xca, 0x5c, 0x84, 0x1a, 0x9e, 0x7b, 0xec, 0x14, 0xe4, 0x95, 0x9d,
0x3b, 0x64, 0xa3, 0x90, 0x86, 0xc6, 0x4a, 0x6d, 0xd8, 0x93, 0x1b, 0x85, 0x63, 0x87, 0x40, 0x08,
0xcd, 0x4a, 0x15, 0x78, 0xff, 0x8e, 0x0b, 0x21, 0xc8, 0x70, 0x39, 0xb4, 0xaf, 0x2c, 0x3c, 0x4f,
0xa4, 0x58, 0x18, 0xd6, 0xc3, 0xbd, 0xc0, 0x01, 0x63, 0x43, 0xf7, 0x48, 0xd3, 0x57, 0xb2, 0x50,
0x59, 0xce, 0x3e, 0x47, 0x47, 0x82, 0x81, 0xc7, 0x78, 0x63, 0x8d, 0x45, 0x3d, 0xc9, 0x72, 0xfa,
0x67, 0xf2, 0xf1, 0xdd, 0x00, 0x8b, 0x14, 0x9a, 0x50, 0xbf, 0x57, 0xda, 0xeb, 0x0c, 0xdb, 0x2e,
0x3f, 0xa2, 0x25, 0x82, 0x7c, 0xfb, 0x4e, 0xd0, 0x8f, 0xf3, 0x58, 0xc2, 0x43, 0xcb, 0x8b, 0xdc,
0x58, 0x91, 0xa8, 0x54, 0x59, 0xf6, 0x14, 0xb3, 0xa5, 0xfe, 0xed, 0x37, 0xcf, 0xff, 0xf8, 0xe2,
0xe5, 0x77, 0x9c, 0xe0, 0xde, 0x11, 0x6c, 0xd1, 0x3d, 0xd2, 0xc5, 0x44, 0x11, 0x26, 0x0a, 0x33,
0x01, 0xdd, 0xcf, 0xb0, 0x2f, 0xd0, 0xec, 0x0e, 0xe2, 0xf3, 0x28, 0xcc, 0x66, 0x80, 0xd2, 0x4f,
0x21, 0x6f, 0xac, 0xd4, 0x59, 0x98, 0xb0, 0x5d, 0xef, 0x98, 0x97, 0x31, 0xa7, 0xd2, 0xc2, 0x5e,
0x8b, 0xcc, 0xb0, 0x2f, 0xe1, 0x31, 0x5e, 0x47, 0xf9, 0x04, 0x7c, 0xae, 0xbb, 0x51, 0x60, 0xd8,
0x57, 0x3e, 0xbb, 0xef, 0x8e, 0x06, 0x5e, 0x03, 0xf9, 0xc4, 0xd0, 0xcf, 0x49, 0xcb, 0x67, 0x47,
0xa1, 0xf3, 0xc2, 0xb0, 0xdf, 0x62, 0x85, 0xfa, 0x06, 0x3e, 0x03, 0x88, 0xee, 0x93, 0xed, 0xdb,
0x2a, 0xae, 0x93, 0xec, 0xa3, 0xde, 0xbd, 0x5b, 0x7a, 0xd8, 0x51, 0x9e, 0x93, 0x87, 0x5e, 0x37,
0x5e, 0xa7, 0x85, 0x88, 0xf2, 0xcc, 0xea, 0x3c, 0x49, 0xa4, 0x66, 0x5f, 0xa3, 0xf5, 0xf7, 0xdd,
0xee, 0xe1, 0x3a, 0x2d, 0x0e, 0x6e, 0xf6, 0xa0, 0x2b, 0x2f, 0xb4, 0x94, 0xef, 0x37, 0x81, 0x67,
0xcf, 0xf0, 0xf6, 0x96, 0x03, 0x5d, 0x8c, 0x61, 0x42, 0x5b, 0x95, 0x4a, 0x98, 0x95, 0xbf, 0x73,
0xde, 0x7a, 0x91, 0x7e, 0x4d, 0x28, 0xf4, 0x63, 0xcc, 0x0e, 0x95, 0x89, 0x45, 0xa2, 0x96, 0x17,
0x96, 0x0d, 0x30, 0x83, 0xa0, 0x53, 0xcf, 0x57, 0xaa, 0x98, 0x64, 0x63, 0x84, 0xc1, 0xe1, 0x9f,
0x64, 0xb8, 0x12, 0xe6, 0xda, 0x44, 0x36, 0x31, 0xec, 0xf7, 0xa8, 0xd6, 0x04, 0x6c, 0xee, 0x20,
0x6c, 0x1c, 0xe1, 0xfb, 0x6b, 0xec, 0x85, 0x86, 0x7d, 0xe3, 0x1b, 0x47, 0xf8, 0xfe, 0x7a, 0x06,
0x00, 0x36, 0x6b, 0x1b, 0xda, 0xb5, 0x81, 0xba, 0xf8, 0x16, 0xbb, 0x4e, 0xe0, 0x80, 0x71, 0x0c,
0xc1, 0xca, 0x75, 0x71, 0x01, 0xb4, 0x5a, 0xe3, 0xb3, 0x99, 0x0d, 0x9d, 0x29, 0x6e, 0x63, 0x66,
0x8d, 0x4b, 0xe9, 0xfe, 0x33, 0xff, 0x47, 0xc0, 0x50, 0x69, 0x69, 0x0a, 0xa0, 0x5b, 0x4b, 0x63,
0x73, 0x2d, 0x63, 0x9c, 0x97, 0x01, 0xbf, 0x91, 0xfb, 0xbb, 0x64, 0x1b, 0xb5, 0x3d, 0xe0, 0x0e,
0xf8, 0x09, 0xe7, 0x66, 0x1f, 0x2c, 0xfb, 0x2f, 0x49, 0x13, 0xd5, 0x5c, 0x6b, 0xa6, 0x0f, 0x49,
0xcd, 0xf5, 0x6c, 0x3f, 0x7f, 0xbd, 0xf4, 0xcb, 0xd1, 0xd8, 0xff, 0xc1, 0xfd, 0x95, 0xc4, 0x42,
0x86, 0x76, 0xad, 0x9d, 0x9f, 0xa9, 0x4c, 0x05, 0xb6, 0xe3, 0x8d, 0x35, 0xa9, 0x4c, 0x4f, 0x41,
0xfe, 0x59, 0x8c, 0xca, 0x3f, 0x8b, 0x51, 0xff, 0x9f, 0x25, 0x12, 0x78, 0x6b, 0xff, 0x46, 0xfb,
0xa4, 0x62, 0xaf, 0x0b, 0x37, 0xcd, 0x3b, 0xc3, 0xce, 0x60, 0xb3, 0x21, 0x00, 0xe5, 0xb8, 0x47,
0x9f, 0x90, 0x0a, 0x8c, 0x75, 0xbc, 0xa9, 0x39, 0x24, 0x83, 0x9b, 0x41, 0xcf, 0x11, 0xbf, 0x3d,
0x82, 0xd6, 0x51, 0x04, 0xdf, 0xb4, 0xad, 0x3b, 0x23, 0xc8, 0x81, 0x60, 0xf3, 0x4a, 0xca, 0x42,
0xe4, 0x85, 0xcc, 0xfc, 0xe0, 0x0e, 0x00, 0x98, 0x16, 0x32, 0xa3, 0xfb, 0x24, 0xd8, 0x38, 0x87,
0x03, 0xbb, 0xb9, 0xb1, 0x65, 0x83, 0xf2, 0x9b, 0xfd, 0xfe, 0xbf, 0xcb, 0xfe, 0xb3, 0x81, 0x61,
0xfe, 0x7f, 0x3c, 0x60, 0xa4, 0xbe, 0x31, 0x0d, 0xbe, 0x35, 0x01, 0xdf, 0x88, 0xf4, 0x29, 0xa9,
0x00, 0xc5, 0x68, 0xf1, 0xcd, 0xa0, 0xb9, 0x21, 0x9d, 0xe3, 0x26, 0x7d, 0x46, 0xea, 0x9e, 0x59,
0xb4, 0xbb, 0x39, 0xa4, 0x83, 0x5f, 0xd0, 0xcd, 0x37, 0x2a, 0xf4, 0x0b, 0x52, 0x73, 0x8e, 0x7b,
0x47, 0x5a, 0x83, 0x5b, 0xa4, 0x73, 0xbf, 0xe7, 0xe7, 0x7b, 0xed, 0x7f, 0xce, 0xf7, 0x47, 0x40,
0x96, 0x90, 0x5a, 0x67, 0x39, 0xfe, 0x3e, 0xaa, 0xbc, 0x1e, 0xe9, 0x11, 0x88, 0x77, 0x62, 0x16,
0xfc, 0xf7, 0x98, 0x41, 0xf0, 0xdd, 0x35, 0xa9, 0x59, 0xe2, 0x4f, 0xa4, 0xc1, 0x03, 0xbc, 0x27,
0x35, 0x4b, 0x18, 0x73, 0x97, 0x52, 0x1b, 0x95, 0x67, 0xf8, 0x0b, 0x69, 0x6e, 0x1a, 0xaa, 0x07,
0xf9, 0x66, 0xb7, 0xff, 0xf7, 0x12, 0x69, 0xdd, 0xde, 0x81, 0xdf, 0x60, 0x1a, 0xbe, 0xcb, 0xb5,
0xcf, 0x72, 0x27, 0x20, 0xaa, 0xb2, 0x5c, 0xfb, 0x8f, 0xa7, 0x13, 0x00, 0x5d, 0x2a, 0xeb, 0xbf,
0xe6, 0x0d, 0xee, 0x04, 0x28, 0x2b, 0xb3, 0x3e, 0x77, 0x3f, 0xa4, 0x8a, 0x2f, 0x58, 0x2f, 0xc3,
0x09, 0xfc, 0xe9, 0x62, 0x20, 0xab, 0xdc, 0x09, 0xf0, 0x95, 0x81, 0x5e, 0x89, 0xb1, 0x6b, 0x70,
0x5c, 0xef, 0x0b, 0x6f, 0x97, 0x1f, 0x01, 0x94, 0x90, 0xda, 0xe4, 0xcd, 0xc9, 0x94, 0x8f, 0xba,
0x1f, 0xd1, 0x26, 0xa9, 0x1f, 0xbc, 0x11, 0x27, 0xd3, 0x93, 0x51, 0xb7, 0x44, 0x1b, 0xa4, 0x3a,
0xe3, 0xd3, 0xd9, 0xbc, 0x5b, 0xa6, 0x01, 0xa9, 0xcc, 0xa7, 0xe3, 0xd3, 0xee, 0x16, 0xac, 0xc6,
0x67, 0x47, 0x47, 0xdd, 0x0a, 0x9c, 0x9b, 0x9f, 0xf2, 0xc9, 0xc1, 0x69, 0xb7, 0x0a, 0xe7, 0x0e,
0x47, 0xe3, 0xd7, 0x67, 0x47, 0xa7, 0xdd, 0xda, 0xfe, 0x3f, 0x4a, 0xbe, 0x04, 0x37, 0x99, 0x05,
0x37, 0x8d, 0x8e, 0x67, 0xa7, 0x3f, 0x76, 0x3f, 0x82, 0xf3, 0x87, 0x67, 0xc7, 0xb3, 0x6e, 0x09,
0xce, 0xf0, 0xd1, 0xfc, 0x14, 0x1e, 0x2e, 0x83, 0xc6, 0xc1, 0x5f, 0x46, 0x07, 0x3f, 0x74, 0xb7,
0x68, 0x8b, 0x04, 0x33, 0x3e, 0x12, 0xa8, 0x55, 0xa1, 0xf7, 0x48, 0x73, 0xf6, 0xfa, 0xcd, 0x48,
0xcc, 0x47, 0xfc, 0xed, 0x88, 0x77, 0xab, 0xf0, 0xec, 0xc9, 0xf4, 0x74, 0x32, 0xfe, 0xb1, 0x5b,
0xa3, 0x5d, 0xd2, 0x3a, 0x98, 0x9d, 0x4d, 0x4e, 0xc6, 0x53, 0xa7, 0x5e, 0xa7, 0xdb, 0xa4, 0xbd,
0x41, 0xdc, 0x7d, 0x01, 0x40, 0xe3, 0xd1, 0xeb, 0xd3, 0x33, 0x3e, 0xf2, 0x50, 0x03, 0x9e, 0x7e,
0x3b, 0xe2, 0xf3, 0xc9, 0xf4, 0xa4, 0x4b, 0xfe, 0x13, 0x00, 0x00, 0xff, 0xff, 0x5f, 0x2a, 0xaf,
0x49, 0x5b, 0x0d, 0x00, 0x00,
0x11, 0x0e, 0x29, 0xf1, 0x0f, 0xfc, 0x31, 0x0d, 0xff, 0x04, 0x4e, 0x6a, 0x9b, 0xa1, 0xe3, 0x44,
0x55, 0x5c, 0x36, 0x61, 0xec, 0xb8, 0xce, 0xb4, 0x17, 0x1e, 0x89, 0x74, 0xd9, 0x48, 0x22, 0x07,
0x94, 0x3c, 0x93, 0x2b, 0xcc, 0xd1, 0x39, 0x20, 0x05, 0xf3, 0x1c, 0x9c, 0x53, 0x00, 0x54, 0x24,
0x3f, 0x48, 0x9f, 0xa2, 0xcf, 0xd0, 0x57, 0xea, 0x65, 0x6f, 0x3b, 0xbb, 0x00, 0x65, 0x29, 0xc9,
0xb4, 0xb9, 0xc3, 0x7e, 0xbb, 0x00, 0xf6, 0x7f, 0x97, 0xb4, 0x63, 0xa3, 0xd6, 0xa6, 0x88, 0x07,
0x85, 0xc9, 0x5d, 0xde, 0x5f, 0x92, 0x7b, 0x00, 0x88, 0x22, 0x5a, 0x4a, 0x61, 0xa5, 0x39, 0x97,
0x46, 0x28, 0xbd, 0xc8, 0x29, 0x23, 0xb5, 0x28, 0x49, 0x8c, 0xb4, 0x96, 0x95, 0x7a, 0xa5, 0x9d,
0x06, 0xdf, 0x90, 0x94, 0x92, 0xed, 0x22, 0x37, 0x8e, 0x95, 0x7b, 0xa5, 0x9d, 0x0a, 0xc7, 0x33,
0xed, 0x92, 0xad, 0x42, 0x25, 0x6c, 0x0b, 0x21, 0x38, 0xd2, 0x0e, 0x29, 0x2f, 0x12, 0xb6, 0x8d,
0x40, 0x79, 0x91, 0xf4, 0xff, 0x4c, 0x3a, 0xf8, 0xd1, 0xb9, 0x74, 0x67, 0xa2, 0x88, 0x94, 0xa1,
0x77, 0x48, 0x45, 0x2d, 0x84, 0xd2, 0xac, 0xd4, 0x2b, 0xef, 0x34, 0xf8, 0xb6, 0x5a, 0x4c, 0x34,
0xbd, 0x47, 0xaa, 0x6a, 0x21, 0xf2, 0x35, 0x3c, 0x0f, 0x68, 0x45, 0x2d, 0xa6, 0x6b, 0xd7, 0xff,
0x96, 0xb4, 0xe5, 0x85, 0x13, 0x59, 0xbe, 0xd6, 0x4e, 0x64, 0x51, 0x01, 0x1f, 0xae, 0xe4, 0x65,
0xb8, 0x0a, 0x47, 0x40, 0xce, 0xa3, 0x34, 0x5c, 0x83, 0x63, 0xff, 0x2d, 0xe9, 0xbc, 0xcb, 0x95,
0x16, 0x3a, 0xca, 0xa4, 0x2d, 0xa2, 0x58, 0x82, 0x52, 0xda, 0x86, 0x4b, 0x65, 0x6d, 0xe9, 0xc7,
0xa4, 0xa6, 0xad, 0x58, 0xa8, 0x54, 0x86, 0x7b, 0x55, 0x6d, 0xc7, 0x2a, 0x95, 0xf4, 0x53, 0xd2,
0x90, 0x17, 0xce, 0x44, 0x22, 0x2f, 0x1c, 0x5a, 0xd5, 0xe0, 0x75, 0x04, 0xa6, 0x85, 0xeb, 0x0f,
0x08, 0x51, 0xfa, 0x4c, 0x1a, 0xe5, 0xc4, 0x22, 0xf9, 0x15, 0x4d, 0xbc, 0xe9, 0xf0, 0xa0, 0x37,
0xfd, 0x05, 0x69, 0xc6, 0x4b, 0x93, 0xaf, 0x0b, 0x61, 0xf2, 0xdc, 0x81, 0xff, 0x62, 0x67, 0xd2,
0xe0, 0x56, 0x3c, 0xa3, 0x4f, 0x23, 0x77, 0x16, 0xb4, 0xc0, 0x73, 0xff, 0x31, 0xa9, 0xad, 0xb5,
0xba, 0x10, 0x76, 0x45, 0xef, 0x92, 0x8a, 0xd2, 0x79, 0x22, 0xf1, 0x97, 0x36, 0xf7, 0x44, 0xff,
0xdf, 0x6d, 0xd2, 0x40, 0x9f, 0xe6, 0x85, 0xb3, 0xb4, 0x4f, 0xda, 0x2a, 0x8b, 0x96, 0xd2, 0x8a,
0x44, 0x19, 0xb1, 0x48, 0x50, 0xb6, 0xc2, 0x9b, 0x1e, 0xdc, 0x57, 0x66, 0x9c, 0x6c, 0xc2, 0x54,
0xfe, 0x10, 0xa6, 0x27, 0xa4, 0x9d, 0xca, 0xe8, 0x5c, 0x0a, 0xb3, 0xd6, 0x5a, 0xe9, 0x25, 0x1a,
0x5b, 0xe7, 0x2d, 0x04, 0xb9, 0xc7, 0xe8, 0x23, 0xd2, 0x04, 0xef, 0x07, 0x6d, 0x30, 0xa8, 0x75,
0x0e, 0x0e, 0x3a, 0xd1, 0xea, 0x62, 0xbe, 0xa2, 0x5f, 0x92, 0x5b, 0x2e, 0x2e, 0x84, 0xb4, 0x2e,
0x3a, 0x4d, 0x95, 0x3d, 0x93, 0x09, 0xab, 0xa0, 0x4c, 0xc7, 0xc5, 0xc5, 0xe8, 0x03, 0x0a, 0x82,
0xf2, 0x3c, 0xb2, 0xea, 0x5c, 0x8a, 0x44, 0x9e, 0xab, 0x58, 0x5a, 0x56, 0xf5, 0x82, 0x01, 0xde,
0xf7, 0x28, 0xf8, 0xdf, 0x9e, 0xc9, 0x34, 0x15, 0xef, 0xf2, 0x53, 0x56, 0x43, 0x91, 0x3a, 0x02,
0x7f, 0xcb, 0x4f, 0xe9, 0x43, 0x42, 0x20, 0x64, 0x22, 0xcd, 0xe3, 0x95, 0x65, 0x75, 0xaf, 0x0d,
0x20, 0x07, 0x00, 0xd0, 0x47, 0xa4, 0x91, 0xe6, 0x4b, 0x91, 0xca, 0x73, 0x99, 0xb2, 0x06, 0x98,
0xfa, 0x7d, 0x69, 0xc8, 0xeb, 0x69, 0xbe, 0x3c, 0x00, 0x88, 0x3e, 0x20, 0x70, 0xf6, 0x51, 0x27,
0x3e, 0xb5, 0xd3, 0x7c, 0x89, 0x61, 0xff, 0x82, 0x94, 0x0b, 0xcb, 0x9a, 0xbd, 0xd2, 0x4e, 0x73,
0x78, 0x7f, 0xf0, 0xab, 0x85, 0xc1, 0xcb, 0x85, 0xa5, 0x4f, 0x49, 0x47, 0xe7, 0x4e, 0x2d, 0x2e,
0x85, 0x8d, 0x8d, 0x2a, 0x9c, 0x65, 0x2d, 0xd4, 0xa2, 0xed, 0xd1, 0xb9, 0x07, 0x21, 0xaa, 0x10,
0x71, 0xd6, 0xf6, 0x91, 0xc6, 0xe8, 0x3f, 0x24, 0xa4, 0x88, 0x8c, 0xd4, 0x4e, 0xa8, 0x6c, 0xc9,
0x3a, 0xc8, 0x69, 0x78, 0x64, 0x92, 0x2d, 0xc1, 0x70, 0x67, 0xa2, 0x78, 0x25, 0x32, 0x99, 0xb1,
0x5b, 0xde, 0x70, 0x04, 0x0e, 0x65, 0x06, 0x77, 0xa3, 0xb5, 0xcb, 0x45, 0x22, 0x93, 0x75, 0xc1,
0xba, 0xde, 0x70, 0x40, 0xf6, 0x01, 0x80, 0x30, 0xfd, 0x94, 0x9b, 0xd5, 0x26, 0xfe, 0xb7, 0x31,
0xca, 0x0d, 0x80, 0x7c, 0xf4, 0x1f, 0x12, 0x92, 0x2a, 0xbd, 0x12, 0x46, 0x66, 0x51, 0xc1, 0xa8,
0xbf, 0x0e, 0x08, 0x07, 0x80, 0x3e, 0x25, 0x15, 0x28, 0x4e, 0xcb, 0xee, 0xf4, 0xb6, 0x76, 0x9a,
0xc3, 0x5b, 0x83, 0x9b, 0xf5, 0xca, 0x3d, 0x97, 0x3e, 0x21, 0xb5, 0xb8, 0x58, 0x8b, 0x38, 0x2a,
0xd8, 0xdd, 0x5e, 0x69, 0xa7, 0xfd, 0x3d, 0x79, 0x3e, 0x7c, 0xf5, 0xfc, 0xd5, 0x77, 0x2f, 0x87,
0xaf, 0x5e, 0xf0, 0x6a, 0x5c, 0xac, 0xf7, 0xa2, 0x82, 0x3e, 0x26, 0xcd, 0x45, 0x6e, 0x62, 0x29,
0x94, 0x81, 0xbf, 0xee, 0xe1, 0x5f, 0x04, 0xa1, 0x09, 0x20, 0x10, 0x04, 0x79, 0x21, 0x63, 0x11,
0x67, 0x09, 0xbb, 0xdf, 0xdb, 0x82, 0x20, 0x00, 0xbd, 0x97, 0x41, 0x92, 0xd4, 0xb0, 0xd6, 0xb5,
0x63, 0x1f, 0xa3, 0x26, 0x9d, 0xc1, 0x8d, 0xda, 0xe7, 0x55, 0x79, 0xe1, 0x0e, 0xb5, 0x83, 0x28,
0x64, 0x91, 0x86, 0xf8, 0xf8, 0xf2, 0xb2, 0x8c, 0xf9, 0x28, 0x78, 0x74, 0xcf, 0x83, 0xf4, 0x29,
0xa9, 0xc5, 0x4b, 0x2c, 0x3d, 0xf6, 0x00, 0xdf, 0x6b, 0x0d, 0xae, 0x95, 0x23, 0xaf, 0xc6, 0x4b,
0x0e, 0x81, 0x79, 0x4c, 0x9a, 0xc6, 0x3a, 0x61, 0xd5, 0x69, 0x0a, 0x75, 0xf0, 0x89, 0x57, 0xd9,
0x58, 0x37, 0xf7, 0x08, 0xdd, 0xbd, 0x5e, 0xf6, 0xec, 0x53, 0x7c, 0xaa, 0x39, 0xf8, 0x00, 0xf1,
0x46, 0x38, 0x8f, 0x13, 0xda, 0x23, 0x2d, 0x8c, 0xd4, 0xc6, 0x90, 0xdf, 0xf9, 0xd7, 0x00, 0x1b,
0x79, 0xe5, 0x1f, 0xfb, 0x9a, 0xb2, 0x67, 0x91, 0x81, 0xef, 0x1e, 0x7a, 0x01, 0x79, 0xe1, 0xe6,
0x1e, 0xd9, 0x08, 0x64, 0x91, 0x75, 0xd2, 0x58, 0xf6, 0xe8, 0x4a, 0xe0, 0xd0, 0x23, 0xe0, 0x42,
0xbb, 0x52, 0x05, 0xbe, 0xff, 0xd8, 0xbb, 0x10, 0x68, 0x78, 0x1c, 0xda, 0x97, 0x8e, 0x4e, 0x53,
0x29, 0x16, 0x96, 0xf5, 0x90, 0x57, 0xf7, 0xc0, 0xd8, 0xd2, 0x1d, 0xd2, 0x0c, 0x95, 0x2c, 0x94,
0xce, 0xd9, 0x67, 0x68, 0x48, 0x7d, 0x10, 0x30, 0xde, 0x58, 0x63, 0x51, 0x4f, 0x74, 0x4e, 0xff,
0x42, 0xee, 0xdc, 0x74, 0xb0, 0xc8, 0xa0, 0x09, 0xf5, 0x7b, 0xa5, 0x9d, 0xce, 0xb0, 0xed, 0xf3,
0x23, 0x5e, 0x22, 0xc8, 0x6f, 0xdf, 0x70, 0xfa, 0x61, 0x9e, 0x48, 0xf8, 0x68, 0x79, 0x96, 0x5b,
0x27, 0x52, 0x95, 0x29, 0xc7, 0x9e, 0x60, 0xb6, 0xd4, 0xbe, 0xf9, 0xfa, 0xf9, 0x9f, 0x5e, 0xbc,
0xfc, 0x8e, 0x13, 0xe4, 0x1d, 0x00, 0x8b, 0xee, 0x90, 0x2e, 0x26, 0x8a, 0xb0, 0x71, 0xa4, 0x05,
0x74, 0x3f, 0xcb, 0x3e, 0x47, 0xb5, 0x3b, 0x88, 0xcf, 0xe3, 0x48, 0xcf, 0x00, 0xa5, 0x9f, 0x40,
0xde, 0x38, 0x69, 0x74, 0x94, 0xb2, 0xa7, 0xc1, 0xb0, 0x40, 0x63, 0x4e, 0x65, 0x85, 0xbb, 0x14,
0xda, 0xb2, 0x2f, 0xe0, 0x33, 0x5e, 0x43, 0xfa, 0x08, 0x6c, 0xae, 0xf9, 0x51, 0x60, 0xd9, 0x97,
0x21, 0xbb, 0x6f, 0x8e, 0x06, 0x5e, 0x05, 0xfa, 0xc8, 0xd2, 0xcf, 0x48, 0x2b, 0x64, 0x47, 0x61,
0xf2, 0xc2, 0xb2, 0xdf, 0x63, 0x85, 0x86, 0x06, 0x3e, 0x03, 0x88, 0xee, 0x92, 0xdb, 0xd7, 0x45,
0x7c, 0x27, 0xd9, 0x45, 0xb9, 0x5b, 0xd7, 0xe4, 0xb0, 0xa3, 0x3c, 0x27, 0xf7, 0x83, 0x6c, 0xb2,
0xce, 0x0a, 0x11, 0xe7, 0xda, 0x99, 0x3c, 0x4d, 0xa5, 0x61, 0x5f, 0xa1, 0xf6, 0x77, 0x3d, 0x77,
0x7f, 0x9d, 0x15, 0x7b, 0x57, 0x3c, 0xe8, 0xca, 0x0b, 0x23, 0xe5, 0xfb, 0x8d, 0xe3, 0xd9, 0x33,
0x7c, 0xbd, 0xe5, 0x41, 0xef, 0x63, 0x98, 0xd0, 0x4e, 0x65, 0x12, 0x66, 0xe5, 0x1f, 0xbc, 0xb5,
0x81, 0xa4, 0x5f, 0x11, 0x0a, 0xfd, 0x18, 0xb3, 0x43, 0x69, 0xb1, 0x48, 0xd5, 0xf2, 0xcc, 0xb1,
0x01, 0x66, 0x10, 0x74, 0xea, 0xf9, 0x4a, 0x15, 0x13, 0x3d, 0x46, 0x18, 0x0c, 0xfe, 0x49, 0x46,
0x2b, 0x61, 0x2f, 0x6d, 0xec, 0x52, 0xcb, 0xfe, 0x88, 0x62, 0x4d, 0xc0, 0xe6, 0x1e, 0xc2, 0xc6,
0x11, 0xbd, 0xbf, 0xc4, 0x5e, 0x68, 0xd9, 0xd7, 0xa1, 0x71, 0x44, 0xef, 0x2f, 0x67, 0x00, 0x60,
0xb3, 0x76, 0x91, 0x5b, 0x5b, 0xa8, 0x8b, 0x6f, 0xb0, 0xeb, 0xd4, 0x3d, 0x30, 0x4e, 0xc0, 0x59,
0xb9, 0x29, 0xce, 0x20, 0xac, 0xce, 0x86, 0x6c, 0x66, 0x43, 0xaf, 0x8a, 0x67, 0xcc, 0x9c, 0xf5,
0x29, 0x0d, 0x29, 0x1f, 0xe7, 0x7a, 0xa1, 0x42, 0x73, 0xfe, 0x16, 0x8d, 0x26, 0x1e, 0x02, 0x6f,
0xf6, 0x9f, 0x85, 0x25, 0x02, 0x7d, 0x69, 0xa4, 0x2d, 0x20, 0x1f, 0x8c, 0xb4, 0x2e, 0x37, 0x32,
0xc1, 0x81, 0x5a, 0xe7, 0x57, 0x74, 0xff, 0x29, 0xb9, 0x8d, 0xd2, 0x01, 0xf0, 0x17, 0xc2, 0x08,
0xf4, 0xc3, 0x11, 0x8e, 0xfd, 0x97, 0xa4, 0x89, 0x62, 0xbe, 0x77, 0xd3, 0xfb, 0xa4, 0xea, 0x9b,
0x7a, 0x18, 0xd0, 0x81, 0xfa, 0xe5, 0xec, 0xec, 0xff, 0xe0, 0x97, 0x29, 0xb1, 0x90, 0x91, 0x5b,
0x1b, 0xef, 0x88, 0x4c, 0x66, 0x02, 0xfb, 0xf5, 0x46, 0x9b, 0x4c, 0x66, 0xc7, 0x40, 0xff, 0xcc,
0x89, 0xe5, 0x9f, 0x39, 0xb1, 0xff, 0xaf, 0x12, 0xa9, 0x07, 0x6d, 0xff, 0x4e, 0xfb, 0x64, 0xdb,
0x5d, 0x16, 0x7e, 0xdc, 0x77, 0x86, 0x9d, 0xc1, 0x86, 0x21, 0x00, 0xe5, 0xc8, 0xa3, 0x8f, 0xc8,
0x36, 0xcc, 0x7d, 0x7c, 0xa9, 0x39, 0x24, 0x83, 0xab, 0x4d, 0x80, 0x23, 0x7e, 0x7d, 0x46, 0xad,
0xe3, 0x18, 0xf6, 0xb8, 0xad, 0x1b, 0x33, 0xca, 0x83, 0xa0, 0xf3, 0x4a, 0xca, 0x42, 0xe4, 0x85,
0xd4, 0x61, 0xb2, 0xd7, 0x01, 0x98, 0x16, 0x52, 0xd3, 0x5d, 0x52, 0xdf, 0x18, 0x87, 0x13, 0xbd,
0xb9, 0xd1, 0x65, 0x83, 0xf2, 0x2b, 0x7e, 0xff, 0x3f, 0xe5, 0xb0, 0x8d, 0xa0, 0x9b, 0x7f, 0x8b,
0x05, 0x8c, 0xd4, 0x36, 0xaa, 0xc1, 0xde, 0x53, 0xe7, 0x1b, 0x92, 0x3e, 0x21, 0xdb, 0x10, 0x62,
0xd4, 0xf8, 0x6a, 0x12, 0x5d, 0x05, 0x9d, 0x23, 0x93, 0x3e, 0x23, 0xb5, 0x10, 0x59, 0xd4, 0xbb,
0x39, 0xa4, 0x83, 0x5f, 0x84, 0x9b, 0x6f, 0x44, 0xe8, 0xe7, 0xa4, 0xea, 0x0d, 0x0f, 0x86, 0xb4,
0x06, 0xd7, 0x82, 0xce, 0x03, 0x2f, 0x2c, 0x00, 0xd5, 0xff, 0xbb, 0x00, 0x3c, 0x80, 0x60, 0x09,
0x69, 0x8c, 0xce, 0x71, 0x3d, 0xa9, 0xf0, 0x5a, 0x6c, 0x46, 0x40, 0xde, 0xf0, 0x59, 0xfd, 0x7f,
0xfb, 0x0c, 0x9c, 0xef, 0x9f, 0xc9, 0xec, 0x12, 0x57, 0x95, 0x06, 0xaf, 0xe3, 0x3b, 0x99, 0x5d,
0xc2, 0x1c, 0x3c, 0x97, 0xc6, 0xaa, 0x5c, 0xe3, 0x9a, 0xd2, 0xdc, 0x74, 0xdc, 0x00, 0xf2, 0x0d,
0xb7, 0xff, 0x8f, 0x12, 0x69, 0x5d, 0xe7, 0xc0, 0xba, 0x98, 0x45, 0xef, 0x72, 0x13, 0xb2, 0xdc,
0x13, 0x88, 0x2a, 0x9d, 0x9b, 0xb0, 0x99, 0x7a, 0x02, 0xd0, 0xa5, 0x72, 0x61, 0x77, 0x6f, 0x70,
0x4f, 0x40, 0x59, 0xd9, 0xf5, 0xa9, 0x5f, 0xa1, 0xb6, 0x43, 0x45, 0x07, 0x1a, 0x6e, 0xe0, 0x2a,
0x8c, 0x8e, 0xac, 0x70, 0x4f, 0xc0, 0xae, 0x03, 0xcd, 0x14, 0x7d, 0xd7, 0xe0, 0x78, 0xde, 0x15,
0x41, 0xaf, 0x30, 0x23, 0x28, 0x21, 0xd5, 0xc9, 0x9b, 0xa3, 0x29, 0x1f, 0x75, 0x3f, 0xa2, 0x4d,
0x52, 0xdb, 0x7b, 0x23, 0x8e, 0xa6, 0x47, 0xa3, 0x6e, 0x89, 0x36, 0x48, 0x65, 0xc6, 0xa7, 0xb3,
0x79, 0xb7, 0x4c, 0xeb, 0x64, 0x7b, 0x3e, 0x1d, 0x1f, 0x77, 0xb7, 0xe0, 0x34, 0x3e, 0x39, 0x38,
0xe8, 0x6e, 0xc3, 0xbd, 0xf9, 0x31, 0x9f, 0xec, 0x1d, 0x77, 0x2b, 0x70, 0x6f, 0x7f, 0x34, 0x7e,
0x7d, 0x72, 0x70, 0xdc, 0xad, 0xee, 0xfe, 0xb3, 0x14, 0x4a, 0x70, 0x93, 0x59, 0xf0, 0xd2, 0xe8,
0x70, 0x76, 0xfc, 0x63, 0xf7, 0x23, 0xb8, 0xbf, 0x7f, 0x72, 0x38, 0xeb, 0x96, 0xe0, 0x0e, 0x1f,
0xcd, 0x8f, 0xe1, 0xe3, 0x32, 0x48, 0xec, 0xfd, 0x75, 0xb4, 0xf7, 0x43, 0x77, 0x8b, 0xb6, 0x48,
0x7d, 0xc6, 0x47, 0x02, 0xa5, 0xb6, 0xe9, 0x2d, 0xd2, 0x9c, 0xbd, 0x7e, 0x33, 0x12, 0xf3, 0x11,
0x7f, 0x3b, 0xe2, 0xdd, 0x0a, 0x7c, 0x7b, 0x34, 0x3d, 0x9e, 0x8c, 0x7f, 0xec, 0x56, 0x69, 0x97,
0xb4, 0xf6, 0x66, 0x27, 0x93, 0xa3, 0xf1, 0xd4, 0x8b, 0xd7, 0xe8, 0x6d, 0xd2, 0xde, 0x20, 0xfe,
0xbd, 0x3a, 0x40, 0xe3, 0xd1, 0xeb, 0xe3, 0x13, 0x3e, 0x0a, 0x50, 0x03, 0xbe, 0x7e, 0x3b, 0xe2,
0xf3, 0xc9, 0xf4, 0xa8, 0x4b, 0xfe, 0x1b, 0x00, 0x00, 0xff, 0xff, 0xc2, 0x38, 0x55, 0x41, 0x7c,
0x0d, 0x00, 0x00,
}

View File

@ -111,6 +111,7 @@ message criu_opts {
optional bool lazy_pages = 48;
optional int32 status_fd = 49;
optional bool orphan_pts_master = 50;
optional string config_file = 51;
}
message criu_dump_resp {

View File

@ -80,7 +80,7 @@ func Cgroupfs(l *LinuxFactory) error {
// containers that use the native cgroups filesystem implementation to create
// and manage cgroups. The difference between RootlessCgroupfs and Cgroupfs is
// that RootlessCgroupfs can transparently handle permission errors that occur
// during rootless container setup (while still allowing cgroup usage if
// during rootless container (including euid=0 in userns) setup (while still allowing cgroup usage if
// they've been set up properly).
func RootlessCgroupfs(l *LinuxFactory) error {
l.NewCgroupsManager = func(config *configs.Cgroup, paths map[string]string) cgroups.Manager {
@ -95,7 +95,7 @@ func RootlessCgroupfs(l *LinuxFactory) error {
// IntelRdtfs is an options func to configure a LinuxFactory to return
// containers that use the Intel RDT "resource control" filesystem to
// create and manage Intel Xeon platform shared resources (e.g., L3 cache).
// create and manage Intel RDT resources (e.g., L3 cache, memory bandwidth).
func IntelRdtFs(l *LinuxFactory) error {
l.NewIntelRdtManager = func(config *configs.Config, id string, path string) intelrdt.Manager {
return &intelrdt.IntelRdtManager{
@ -141,7 +141,7 @@ func New(root string, options ...func(*LinuxFactory) error) (Factory, error) {
l := &LinuxFactory{
Root: root,
InitPath: "/proc/self/exe",
InitArgs: []string{os.Args[0], "init"},
InitArgs: []string{"init"},
Validator: validate.New(),
CriuPath: "criu",
}
@ -225,7 +225,7 @@ func (l *LinuxFactory) Create(id string, config *configs.Config) (Container, err
newgidmapPath: l.NewgidmapPath,
cgroupManager: l.NewCgroupsManager(config.Cgroups, nil),
}
if intelrdt.IsEnabled() {
if intelrdt.IsCatEnabled() || intelrdt.IsMbaEnabled() {
c.intelRdtManager = l.NewIntelRdtManager(config, id, "")
}
c.state = &stoppedState{c: c}
@ -271,7 +271,7 @@ func (l *LinuxFactory) Load(id string) (Container, error) {
if err := c.refreshState(); err != nil {
return nil, err
}
if intelrdt.IsEnabled() {
if intelrdt.IsCatEnabled() || intelrdt.IsMbaEnabled() {
c.intelRdtManager = l.NewIntelRdtManager(&state.Config, id, state.IntelRdtPath)
}
return c, nil

View File

@ -6,6 +6,7 @@ import (
"encoding/json"
"fmt"
"io"
"io/ioutil"
"net"
"os"
"strings"
@ -65,7 +66,8 @@ type initConfig struct {
CreateConsole bool `json:"create_console"`
ConsoleWidth uint16 `json:"console_width"`
ConsoleHeight uint16 `json:"console_height"`
Rootless bool `json:"rootless"`
RootlessEUID bool `json:"rootless_euid,omitempty"`
RootlessCgroups bool `json:"rootless_cgroups,omitempty"`
}
type initer interface {
@ -218,11 +220,7 @@ func syncParentReady(pipe io.ReadWriter) error {
}
// Wait for parent to give the all-clear.
if err := readSync(pipe, procRun); err != nil {
return err
}
return nil
return readSync(pipe, procRun)
}
// syncParentHooks sends to the given pipe a JSON payload which indicates that
@ -235,11 +233,7 @@ func syncParentHooks(pipe io.ReadWriter) error {
}
// Wait for parent to give the all-clear.
if err := readSync(pipe, procResume); err != nil {
return err
}
return nil
return readSync(pipe, procResume)
}
// setupUser changes the groups, gid, and uid for the user inside the container
@ -283,7 +277,7 @@ func setupUser(config *initConfig) error {
return fmt.Errorf("cannot set gid to unmapped user in user namespace")
}
if config.Rootless {
if config.RootlessEUID {
// We cannot set any additional groups in a rootless container and thus
// we bail if the user asked us to do so. TODO: We currently can't do
// this check earlier, but if libcontainer.Process.User was typesafe
@ -299,11 +293,18 @@ func setupUser(config *initConfig) error {
return err
}
setgroups, err := ioutil.ReadFile("/proc/self/setgroups")
if err != nil && !os.IsNotExist(err) {
return err
}
// This isn't allowed in an unprivileged user namespace since Linux 3.19.
// There's nothing we can do about /etc/group entries, so we silently
// ignore setting groups here (since the user didn't explicitly ask us to
// set the group).
if !config.Rootless {
allowSupGroups := !config.RootlessEUID && strings.TrimSpace(string(setgroups)) != "deny"
if allowSupGroups {
suppGroups := append(execUser.Sgids, addGroups...)
if err := unix.Setgroups(suppGroups); err != nil {
return err

View File

@ -16,20 +16,26 @@ import (
)
/*
* About Intel RDT/CAT feature:
* About Intel RDT features:
* Intel platforms with new Xeon CPU support Resource Director Technology (RDT).
* Intel Cache Allocation Technology (CAT) is a sub-feature of RDT. Currently L3
* Cache is the only resource that is supported in RDT.
* Cache Allocation Technology (CAT) and Memory Bandwidth Allocation (MBA) are
* two sub-features of RDT.
*
* This feature provides a way for the software to restrict cache allocation to a
* defined 'subset' of L3 cache which may be overlapping with other 'subsets'.
* The different subsets are identified by class of service (CLOS) and each CLOS
* has a capacity bitmask (CBM).
* Cache Allocation Technology (CAT) provides a way for the software to restrict
* cache allocation to a defined 'subset' of L3 cache which may be overlapping
* with other 'subsets'. The different subsets are identified by class of
* service (CLOS) and each CLOS has a capacity bitmask (CBM).
*
* For more information about Intel RDT/CAT can be found in the section 17.17
* of Intel Software Developer Manual.
* Memory Bandwidth Allocation (MBA) provides indirect and approximate throttle
* over memory bandwidth for the software. A user controls the resource by
* indicating the percentage of maximum memory bandwidth or memory bandwidth
* limit in MBps unit if MBA Software Controller is enabled.
*
* About Intel RDT/CAT kernel interface:
* More details about Intel RDT CAT and MBA can be found in the section 17.18
* of Intel Software Developer Manual:
* https://software.intel.com/en-us/articles/intel-sdm
*
* About Intel RDT kernel interface:
* In Linux 4.10 kernel or newer, the interface is defined and exposed via
* "resource control" filesystem, which is a "cgroup-like" interface.
*
@ -37,59 +43,98 @@ import (
* interfaces in a container. But unlike cgroups' hierarchy, it has single level
* filesystem layout.
*
* CAT and MBA features are introduced in Linux 4.10 and 4.12 kernel via
* "resource control" filesystem.
*
* Intel RDT "resource control" filesystem hierarchy:
* mount -t resctrl resctrl /sys/fs/resctrl
* tree /sys/fs/resctrl
* /sys/fs/resctrl/
* |-- info
* | |-- L3
* | |-- cbm_mask
* | |-- min_cbm_bits
* | | |-- cbm_mask
* | | |-- min_cbm_bits
* | | |-- num_closids
* | |-- MB
* | |-- bandwidth_gran
* | |-- delay_linear
* | |-- min_bandwidth
* | |-- num_closids
* |-- cpus
* |-- ...
* |-- schemata
* |-- tasks
* |-- <container_id>
* |-- cpus
* |-- ...
* |-- schemata
* |-- tasks
*
* For runc, we can make use of `tasks` and `schemata` configuration for L3 cache
* resource constraints.
* For runc, we can make use of `tasks` and `schemata` configuration for L3
* cache and memory bandwidth resources constraints.
*
* The file `tasks` has a list of tasks that belongs to this group (e.g.,
* The file `tasks` has a list of tasks that belongs to this group (e.g.,
* <container_id>" group). Tasks can be added to a group by writing the task ID
* to the "tasks" file (which will automatically remove them from the previous
* to the "tasks" file (which will automatically remove them from the previous
* group to which they belonged). New tasks created by fork(2) and clone(2) are
* added to the same group as their parent. If a pid is not in any sub group, it is
* in root group.
* added to the same group as their parent.
*
* The file `schemata` has allocation bitmasks/values for L3 cache on each socket,
* which contains L3 cache id and capacity bitmask (CBM).
* The file `schemata` has a list of all the resources available to this group.
* Each resource (L3 cache, memory bandwidth) has its own line and format.
*
* L3 cache schema:
* It has allocation bitmasks/values for L3 cache on each socket, which
* contains L3 cache id and capacity bitmask (CBM).
* Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
* For example, on a two-socket machine, L3's schema line could be `L3:0=ff;1=c0`
* For example, on a two-socket machine, the schema line could be "L3:0=ff;1=c0"
* which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0.
*
* The valid L3 cache CBM is a *contiguous bits set* and number of bits that can
* be set is less than the max bit. The max bits in the CBM is varied among
* supported Intel Xeon platforms. In Intel RDT "resource control" filesystem
* layout, the CBM in a group should be a subset of the CBM in root. Kernel will
* check if it is valid when writing. e.g., 0xfffff in root indicates the max bits
* of CBM is 20 bits, which mapping to entire L3 cache capacity. Some valid CBM
* values to set in a group: 0xf, 0xf0, 0x3ff, 0x1f00 and etc.
* supported Intel CPU models. Kernel will check if it is valid when writing.
* e.g., default value 0xfffff in root indicates the max bits of CBM is 20
* bits, which mapping to entire L3 cache capacity. Some valid CBM values to
* set in a group: 0xf, 0xf0, 0x3ff, 0x1f00 and etc.
*
* For more information about Intel RDT/CAT kernel interface:
* Memory bandwidth schema:
* It has allocation values for memory bandwidth on each socket, which contains
* L3 cache id and memory bandwidth.
* Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
* For example, on a two-socket machine, the schema line could be "MB:0=20;1=70"
*
* The minimum bandwidth percentage value for each CPU model is predefined and
* can be looked up through "info/MB/min_bandwidth". The bandwidth granularity
* that is allocated is also dependent on the CPU model and can be looked up at
* "info/MB/bandwidth_gran". The available bandwidth control steps are:
* min_bw + N * bw_gran. Intermediate values are rounded to the next control
* step available on the hardware.
*
* If MBA Software Controller is enabled through mount option "-o mba_MBps":
* mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
* We could specify memory bandwidth in "MBps" (Mega Bytes per second) unit
* instead of "percentages". The kernel underneath would use a software feedback
* mechanism or a "Software Controller" which reads the actual bandwidth using
* MBM counters and adjust the memory bandwidth percentages to ensure:
* "actual memory bandwidth < user specified memory bandwidth".
*
* For example, on a two-socket machine, the schema line could be
* "MB:0=5000;1=7000" which means 5000 MBps memory bandwidth limit on socket 0
* and 7000 MBps memory bandwidth limit on socket 1.
*
* For more information about Intel RDT kernel interface:
* https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt
*
* An example for runc:
* Consider a two-socket machine with two L3 caches where the default CBM is
* 0xfffff and the max CBM length is 20 bits. With this configuration, tasks
* inside the container only have access to the "upper" 80% of L3 cache id 0 and
* the "lower" 50% L3 cache id 1:
* 0x7ff and the max CBM length is 11 bits, and minimum memory bandwidth of 10%
* with a memory bandwidth granularity of 10%.
*
* Tasks inside the container only have access to the "upper" 7/11 of L3 cache
* on socket 0 and the "lower" 5/11 L3 cache on socket 1, and may use a
* maximum memory bandwidth of 20% on socket 0 and 70% on socket 1.
*
* "linux": {
* "intelRdt": {
* "l3CacheSchema": "L3:0=ffff0;1=3ff"
* "intelRdt": {
* "l3CacheSchema": "L3:0=7f0;1=1f",
* "memBwSchema": "MB:0=20;1=70"
* }
* }
*/
@ -129,8 +174,12 @@ var (
intelRdtRoot string
intelRdtRootLock sync.Mutex
// The flag to indicate if Intel RDT is supported
isEnabled bool
// The flag to indicate if Intel RDT/CAT is enabled
isCatEnabled bool
// The flag to indicate if Intel RDT/MBA is enabled
isMbaEnabled bool
// The flag to indicate if Intel RDT/MBA Software Controller is enabled
isMbaScEnabled bool
)
type intelRdtData struct {
@ -139,19 +188,40 @@ type intelRdtData struct {
pid int
}
// Check if Intel RDT is enabled in init()
// Check if Intel RDT sub-features are enabled in init()
func init() {
// 1. Check if hardware and kernel support Intel RDT/CAT feature
// "cat_l3" flag is set if supported
isFlagSet, err := parseCpuInfoFile("/proc/cpuinfo")
if !isFlagSet || err != nil {
isEnabled = false
// 1. Check if hardware and kernel support Intel RDT sub-features
// "cat_l3" flag for CAT and "mba" flag for MBA
isCatFlagSet, isMbaFlagSet, err := parseCpuInfoFile("/proc/cpuinfo")
if err != nil {
return
}
// 2. Check if Intel RDT "resource control" filesystem is mounted
// The user guarantees to mount the filesystem
isEnabled = isIntelRdtMounted()
if !isIntelRdtMounted() {
return
}
// 3. Double check if Intel RDT sub-features are available in
// "resource control" filesystem. Intel RDT sub-features can be
// selectively disabled or enabled by kernel command line
// (e.g., rdt=!l3cat,mba) in 4.14 and newer kernel
if isCatFlagSet {
if _, err := os.Stat(filepath.Join(intelRdtRoot, "info", "L3")); err == nil {
isCatEnabled = true
}
}
if isMbaScEnabled {
// We confirm MBA Software Controller is enabled in step 2,
// MBA should be enabled because MBA Software Controller
// depends on MBA
isMbaEnabled = true
} else if isMbaFlagSet {
if _, err := os.Stat(filepath.Join(intelRdtRoot, "info", "MB")); err == nil {
isMbaEnabled = true
}
}
}
// Return the mount point path of Intel RDT "resource control" filesysem
@ -177,11 +247,16 @@ func findIntelRdtMountpointDir() (string, error) {
}
if postSeparatorFields[0] == "resctrl" {
// Check that the mount is properly formated.
// Check that the mount is properly formatted.
if numPostFields < 3 {
return "", fmt.Errorf("Error found less than 3 fields post '-' in %q", text)
}
// Check if MBA Software Controller is enabled through mount option "-o mba_MBps"
if strings.Contains(postSeparatorFields[2], "mba_MBps") {
isMbaScEnabled = true
}
return fields[4], nil
}
}
@ -223,30 +298,40 @@ func isIntelRdtMounted() bool {
return true
}
func parseCpuInfoFile(path string) (bool, error) {
func parseCpuInfoFile(path string) (bool, bool, error) {
isCatFlagSet := false
isMbaFlagSet := false
f, err := os.Open(path)
if err != nil {
return false, err
return false, false, err
}
defer f.Close()
s := bufio.NewScanner(f)
for s.Scan() {
if err := s.Err(); err != nil {
return false, err
return false, false, err
}
text := s.Text()
flags := strings.Split(text, " ")
line := s.Text()
// "cat_l3" flag is set if Intel RDT/CAT is supported
for _, flag := range flags {
if flag == "cat_l3" {
return true, nil
// Search "cat_l3" and "mba" flags in first "flags" line
if strings.Contains(line, "flags") {
flags := strings.Split(line, " ")
// "cat_l3" flag for CAT and "mba" flag for MBA
for _, flag := range flags {
switch flag {
case "cat_l3":
isCatFlagSet = true
case "mba":
isMbaFlagSet = true
}
}
return isCatFlagSet, isMbaFlagSet, nil
}
}
return false, nil
return isCatFlagSet, isMbaFlagSet, nil
}
func parseUint(s string, base, bitSize int) (uint64, error) {
@ -292,30 +377,6 @@ func getIntelRdtParamString(path, file string) (string, error) {
return strings.TrimSpace(string(contents)), nil
}
func readTasksFile(dir string) ([]int, error) {
f, err := os.Open(filepath.Join(dir, IntelRdtTasks))
if err != nil {
return nil, err
}
defer f.Close()
var (
s = bufio.NewScanner(f)
out = []int{}
)
for s.Scan() {
if t := s.Text(); t != "" {
pid, err := strconv.Atoi(t)
if err != nil {
return nil, err
}
out = append(out, pid)
}
}
return out, nil
}
func writeFile(dir, file, data string) error {
if dir == "" {
return fmt.Errorf("no such directory for %s", file)
@ -368,13 +429,64 @@ func getL3CacheInfo() (*L3CacheInfo, error) {
return l3CacheInfo, nil
}
// Get the read-only memory bandwidth information
func getMemBwInfo() (*MemBwInfo, error) {
memBwInfo := &MemBwInfo{}
rootPath, err := getIntelRdtRoot()
if err != nil {
return memBwInfo, err
}
path := filepath.Join(rootPath, "info", "MB")
bandwidthGran, err := getIntelRdtParamUint(path, "bandwidth_gran")
if err != nil {
return memBwInfo, err
}
delayLinear, err := getIntelRdtParamUint(path, "delay_linear")
if err != nil {
return memBwInfo, err
}
minBandwidth, err := getIntelRdtParamUint(path, "min_bandwidth")
if err != nil {
return memBwInfo, err
}
numClosids, err := getIntelRdtParamUint(path, "num_closids")
if err != nil {
return memBwInfo, err
}
memBwInfo.BandwidthGran = bandwidthGran
memBwInfo.DelayLinear = delayLinear
memBwInfo.MinBandwidth = minBandwidth
memBwInfo.NumClosids = numClosids
return memBwInfo, nil
}
// Get diagnostics for last filesystem operation error from file info/last_cmd_status
func getLastCmdStatus() (string, error) {
rootPath, err := getIntelRdtRoot()
if err != nil {
return "", err
}
path := filepath.Join(rootPath, "info")
lastCmdStatus, err := getIntelRdtParamString(path, "last_cmd_status")
if err != nil {
return "", err
}
return lastCmdStatus, nil
}
// WriteIntelRdtTasks writes the specified pid into the "tasks" file
func WriteIntelRdtTasks(dir string, pid int) error {
if dir == "" {
return fmt.Errorf("no such directory for %s", IntelRdtTasks)
}
// Dont attach any pid if -1 is specified as a pid
// Don't attach any pid if -1 is specified as a pid
if pid != -1 {
if err := ioutil.WriteFile(filepath.Join(dir, IntelRdtTasks), []byte(strconv.Itoa(pid)), 0700); err != nil {
return fmt.Errorf("failed to write %v to %v: %v", pid, IntelRdtTasks, err)
@ -383,9 +495,19 @@ func WriteIntelRdtTasks(dir string, pid int) error {
return nil
}
// Check if Intel RDT is enabled
func IsEnabled() bool {
return isEnabled
// Check if Intel RDT/CAT is enabled
func IsCatEnabled() bool {
return isCatEnabled
}
// Check if Intel RDT/MBA is enabled
func IsMbaEnabled() bool {
return isMbaEnabled
}
// Check if Intel RDT/MBA Software Controller is enabled
func IsMbaScEnabled() bool {
return isMbaScEnabled
}
// Get the 'container_id' path in Intel RDT "resource control" filesystem
@ -425,7 +547,7 @@ func (m *IntelRdtManager) Apply(pid int) (err error) {
func (m *IntelRdtManager) Destroy() error {
m.mu.Lock()
defer m.mu.Unlock()
if err := os.RemoveAll(m.Path); err != nil {
if err := os.RemoveAll(m.GetPath()); err != nil {
return err
}
m.Path = ""
@ -452,65 +574,143 @@ func (m *IntelRdtManager) GetStats() (*Stats, error) {
defer m.mu.Unlock()
stats := NewStats()
// The read-only L3 cache information
l3CacheInfo, err := getL3CacheInfo()
if err != nil {
return nil, err
}
stats.L3CacheInfo = l3CacheInfo
// The read-only L3 cache schema in root
rootPath, err := getIntelRdtRoot()
if err != nil {
return nil, err
}
// The read-only L3 cache and memory bandwidth schemata in root
tmpRootStrings, err := getIntelRdtParamString(rootPath, "schemata")
if err != nil {
return nil, err
}
// L3 cache schema is in the first line
schemaRootStrings := strings.Split(tmpRootStrings, "\n")
stats.L3CacheSchemaRoot = schemaRootStrings[0]
// The L3 cache schema in 'container_id' group
// The L3 cache and memory bandwidth schemata in 'container_id' group
tmpStrings, err := getIntelRdtParamString(m.GetPath(), "schemata")
if err != nil {
return nil, err
}
// L3 cache schema is in the first line
schemaStrings := strings.Split(tmpStrings, "\n")
stats.L3CacheSchema = schemaStrings[0]
if IsCatEnabled() {
// The read-only L3 cache information
l3CacheInfo, err := getL3CacheInfo()
if err != nil {
return nil, err
}
stats.L3CacheInfo = l3CacheInfo
// The read-only L3 cache schema in root
for _, schemaRoot := range schemaRootStrings {
if strings.Contains(schemaRoot, "L3") {
stats.L3CacheSchemaRoot = strings.TrimSpace(schemaRoot)
}
}
// The L3 cache schema in 'container_id' group
for _, schema := range schemaStrings {
if strings.Contains(schema, "L3") {
stats.L3CacheSchema = strings.TrimSpace(schema)
}
}
}
if IsMbaEnabled() {
// The read-only memory bandwidth information
memBwInfo, err := getMemBwInfo()
if err != nil {
return nil, err
}
stats.MemBwInfo = memBwInfo
// The read-only memory bandwidth information
for _, schemaRoot := range schemaRootStrings {
if strings.Contains(schemaRoot, "MB") {
stats.MemBwSchemaRoot = strings.TrimSpace(schemaRoot)
}
}
// The memory bandwidth schema in 'container_id' group
for _, schema := range schemaStrings {
if strings.Contains(schema, "MB") {
stats.MemBwSchema = strings.TrimSpace(schema)
}
}
}
return stats, nil
}
// Set Intel RDT "resource control" filesystem as configured.
func (m *IntelRdtManager) Set(container *configs.Config) error {
path := m.GetPath()
// About L3 cache schema file:
// The schema has allocation masks/values for L3 cache on each socket,
// About L3 cache schema:
// It has allocation bitmasks/values for L3 cache on each socket,
// which contains L3 cache id and capacity bitmask (CBM).
// Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
// For example, on a two-socket machine, L3's schema line could be:
// L3:0=ff;1=c0
// Which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0.
// Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
// For example, on a two-socket machine, the schema line could be:
// L3:0=ff;1=c0
// which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM
// is 0xc0.
//
// About L3 cache CBM validity:
// The valid L3 cache CBM is a *contiguous bits set* and number of
// bits that can be set is less than the max bit. The max bits in the
// CBM is varied among supported Intel Xeon platforms. In Intel RDT
// "resource control" filesystem layout, the CBM in a group should
// be a subset of the CBM in root. Kernel will check if it is valid
// when writing.
// e.g., 0xfffff in root indicates the max bits of CBM is 20 bits,
// which mapping to entire L3 cache capacity. Some valid CBM values
// to set in a group: 0xf, 0xf0, 0x3ff, 0x1f00 and etc.
// CBM is varied among supported Intel CPU models. Kernel will check
// if it is valid when writing. e.g., default value 0xfffff in root
// indicates the max bits of CBM is 20 bits, which mapping to entire
// L3 cache capacity. Some valid CBM values to set in a group:
// 0xf, 0xf0, 0x3ff, 0x1f00 and etc.
//
//
// About memory bandwidth schema:
// It has allocation values for memory bandwidth on each socket, which
// contains L3 cache id and memory bandwidth.
// Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
// For example, on a two-socket machine, the schema line could be:
// "MB:0=20;1=70"
//
// The minimum bandwidth percentage value for each CPU model is
// predefined and can be looked up through "info/MB/min_bandwidth".
// The bandwidth granularity that is allocated is also dependent on
// the CPU model and can be looked up at "info/MB/bandwidth_gran".
// The available bandwidth control steps are: min_bw + N * bw_gran.
// Intermediate values are rounded to the next control step available
// on the hardware.
//
// If MBA Software Controller is enabled through mount option
// "-o mba_MBps": mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
// We could specify memory bandwidth in "MBps" (Mega Bytes per second)
// unit instead of "percentages". The kernel underneath would use a
// software feedback mechanism or a "Software Controller" which reads
// the actual bandwidth using MBM counters and adjust the memory
// bandwidth percentages to ensure:
// "actual memory bandwidth < user specified memory bandwidth".
//
// For example, on a two-socket machine, the schema line could be
// "MB:0=5000;1=7000" which means 5000 MBps memory bandwidth limit on
// socket 0 and 7000 MBps memory bandwidth limit on socket 1.
if container.IntelRdt != nil {
path := m.GetPath()
l3CacheSchema := container.IntelRdt.L3CacheSchema
if l3CacheSchema != "" {
memBwSchema := container.IntelRdt.MemBwSchema
// Write a single joint schema string to schemata file
if l3CacheSchema != "" && memBwSchema != "" {
if err := writeFile(path, "schemata", l3CacheSchema+"\n"+memBwSchema); err != nil {
return NewLastCmdError(err)
}
}
// Write only L3 cache schema string to schemata file
if l3CacheSchema != "" && memBwSchema == "" {
if err := writeFile(path, "schemata", l3CacheSchema); err != nil {
return err
return NewLastCmdError(err)
}
}
// Write only memory bandwidth schema string to schemata file
if l3CacheSchema == "" && memBwSchema != "" {
if err := writeFile(path, "schemata", memBwSchema); err != nil {
return NewLastCmdError(err)
}
}
}
@ -521,11 +721,11 @@ func (m *IntelRdtManager) Set(container *configs.Config) error {
func (raw *intelRdtData) join(id string) (string, error) {
path := filepath.Join(raw.root, id)
if err := os.MkdirAll(path, 0755); err != nil {
return "", err
return "", NewLastCmdError(err)
}
if err := WriteIntelRdtTasks(path, raw.pid); err != nil {
return "", err
return "", NewLastCmdError(err)
}
return path, nil
}
@ -551,3 +751,23 @@ func IsNotFound(err error) bool {
_, ok := err.(*NotFoundError)
return ok
}
type LastCmdError struct {
LastCmdStatus string
Err error
}
func (e *LastCmdError) Error() string {
return fmt.Sprintf(e.Err.Error() + ", last_cmd_status: " + e.LastCmdStatus)
}
func NewLastCmdError(err error) error {
lastCmdStatus, err1 := getLastCmdStatus()
if err1 == nil {
return &LastCmdError{
LastCmdStatus: lastCmdStatus,
Err: err,
}
}
return err
}

View File

@ -8,6 +8,13 @@ type L3CacheInfo struct {
NumClosids uint64 `json:"num_closids,omitempty"`
}
type MemBwInfo struct {
BandwidthGran uint64 `json:"bandwidth_gran,omitempty"`
DelayLinear uint64 `json:"delay_linear,omitempty"`
MinBandwidth uint64 `json:"min_bandwidth,omitempty"`
NumClosids uint64 `json:"num_closids,omitempty"`
}
type Stats struct {
// The read-only L3 cache information
L3CacheInfo *L3CacheInfo `json:"l3_cache_info,omitempty"`
@ -17,6 +24,15 @@ type Stats struct {
// The L3 cache schema in 'container_id' group
L3CacheSchema string `json:"l3_cache_schema,omitempty"`
// The read-only memory bandwidth information
MemBwInfo *MemBwInfo `json:"mem_bw_info,omitempty"`
// The read-only memory bandwidth schema in root
MemBwSchemaRoot string `json:"mem_bw_schema_root,omitempty"`
// The memory bandwidth schema in 'container_id' group
MemBwSchema string `json:"mem_bw_schema,omitempty"`
}
func NewStats() *Stats {

View File

@ -7,6 +7,8 @@ import (
"strconv"
"strings"
"github.com/pkg/errors"
"golang.org/x/sys/unix"
)
@ -15,7 +17,7 @@ type KeySerial uint32
func JoinSessionKeyring(name string) (KeySerial, error) {
sessKeyId, err := unix.KeyctlJoinSessionKeyring(name)
if err != nil {
return 0, fmt.Errorf("could not create session key: %v", err)
return 0, errors.Wrap(err, "create session key")
}
return KeySerial(sessKeyId), nil
}
@ -42,9 +44,5 @@ func ModKeyringPerm(ringId KeySerial, mask, setbits uint32) error {
perm := (uint32(perm64) & mask) | setbits
if err := unix.KeyctlSetperm(int(ringId), perm); err != nil {
return err
}
return nil
return unix.KeyctlSetperm(int(ringId), perm)
}

View File

@ -10,16 +10,16 @@ import (
// list of known message types we want to send to bootstrap program
// The number is randomly chosen to not conflict with known netlink types
const (
InitMsg uint16 = 62000
CloneFlagsAttr uint16 = 27281
NsPathsAttr uint16 = 27282
UidmapAttr uint16 = 27283
GidmapAttr uint16 = 27284
SetgroupAttr uint16 = 27285
OomScoreAdjAttr uint16 = 27286
RootlessAttr uint16 = 27287
UidmapPathAttr uint16 = 27288
GidmapPathAttr uint16 = 27289
InitMsg uint16 = 62000
CloneFlagsAttr uint16 = 27281
NsPathsAttr uint16 = 27282
UidmapAttr uint16 = 27283
GidmapAttr uint16 = 27284
SetgroupAttr uint16 = 27285
OomScoreAdjAttr uint16 = 27286
RootlessEUIDAttr uint16 = 27287
UidmapPathAttr uint16 = 27288
GidmapPathAttr uint16 = 27289
)
type Int32msg struct {

View File

@ -1,17 +1,17 @@
## nsenter
The `nsenter` package registers a special init constructor that is called before
the Go runtime has a chance to boot. This provides us the ability to `setns` on
existing namespaces and avoid the issues that the Go runtime has with multiple
threads. This constructor will be called if this package is registered,
The `nsenter` package registers a special init constructor that is called before
the Go runtime has a chance to boot. This provides us the ability to `setns` on
existing namespaces and avoid the issues that the Go runtime has with multiple
threads. This constructor will be called if this package is registered,
imported, in your go application.
The `nsenter` package will `import "C"` and it uses [cgo](https://golang.org/cmd/cgo/)
package. In cgo, if the import of "C" is immediately preceded by a comment, that comment,
package. In cgo, if the import of "C" is immediately preceded by a comment, that comment,
called the preamble, is used as a header when compiling the C parts of the package.
So every time we import package `nsenter`, the C code function `nsexec()` would be
called. And package `nsenter` is now only imported in `main_unix.go`, so every time
before we call `cmd.Start` on linux, that C code would run.
So every time we import package `nsenter`, the C code function `nsexec()` would be
called. And package `nsenter` is only imported in `init.go`, so every time the runc
`init` command is invoked, that C code is run.
Because `nsexec()` must be run before the Go runtime in order to use the
Linux kernel namespace, you must `import` this library into a package if
@ -37,5 +37,8 @@ the parent `nsexec()` will exit and the child `nsexec()` process will
return to allow the Go runtime take over.
NOTE: We do both `setns(2)` and `clone(2)` even if we don't have any
CLONE_NEW* clone flags because we must fork a new process in order to
`CLONE_NEW*` clone flags because we must fork a new process in order to
enter the PID namespace.

View File

@ -0,0 +1,265 @@
/*
* Copyright (C) 2019 Aleksa Sarai <cyphar@cyphar.com>
* Copyright (C) 2019 SUSE LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#define _GNU_SOURCE
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <limits.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/vfs.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/syscall.h>
/* Use our own wrapper for memfd_create. */
#if !defined(SYS_memfd_create) && defined(__NR_memfd_create)
# define SYS_memfd_create __NR_memfd_create
#endif
#ifdef SYS_memfd_create
# define HAVE_MEMFD_CREATE
/* memfd_create(2) flags -- copied from <linux/memfd.h>. */
# ifndef MFD_CLOEXEC
# define MFD_CLOEXEC 0x0001U
# define MFD_ALLOW_SEALING 0x0002U
# endif
int memfd_create(const char *name, unsigned int flags)
{
return syscall(SYS_memfd_create, name, flags);
}
#endif
/* This comes directly from <linux/fcntl.h>. */
#ifndef F_LINUX_SPECIFIC_BASE
# define F_LINUX_SPECIFIC_BASE 1024
#endif
#ifndef F_ADD_SEALS
# define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
# define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
#endif
#ifndef F_SEAL_SEAL
# define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */
# define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
# define F_SEAL_GROW 0x0004 /* prevent file from growing */
# define F_SEAL_WRITE 0x0008 /* prevent writes */
#endif
#define RUNC_SENDFILE_MAX 0x7FFFF000 /* sendfile(2) is limited to 2GB. */
#ifdef HAVE_MEMFD_CREATE
# define RUNC_MEMFD_COMMENT "runc_cloned:/proc/self/exe"
# define RUNC_MEMFD_SEALS \
(F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE)
#endif
static void *must_realloc(void *ptr, size_t size)
{
void *old = ptr;
do {
ptr = realloc(old, size);
} while(!ptr);
return ptr;
}
/*
* Verify whether we are currently in a self-cloned program (namely, is
* /proc/self/exe a memfd). F_GET_SEALS will only succeed for memfds (or rather
* for shmem files), and we want to be sure it's actually sealed.
*/
static int is_self_cloned(void)
{
int fd, ret, is_cloned = 0;
fd = open("/proc/self/exe", O_RDONLY|O_CLOEXEC);
if (fd < 0)
return -ENOTRECOVERABLE;
#ifdef HAVE_MEMFD_CREATE
ret = fcntl(fd, F_GET_SEALS);
is_cloned = (ret == RUNC_MEMFD_SEALS);
#else
struct stat statbuf = {0};
ret = fstat(fd, &statbuf);
if (ret >= 0)
is_cloned = (statbuf.st_nlink == 0);
#endif
close(fd);
return is_cloned;
}
/*
* Basic wrapper around mmap(2) that gives you the file length so you can
* safely treat it as an ordinary buffer. Only gives you read access.
*/
static char *read_file(char *path, size_t *length)
{
int fd;
char buf[4096], *copy = NULL;
if (!length)
return NULL;
fd = open(path, O_RDONLY | O_CLOEXEC);
if (fd < 0)
return NULL;
*length = 0;
for (;;) {
int n;
n = read(fd, buf, sizeof(buf));
if (n < 0)
goto error;
if (!n)
break;
copy = must_realloc(copy, (*length + n) * sizeof(*copy));
memcpy(copy + *length, buf, n);
*length += n;
}
close(fd);
return copy;
error:
close(fd);
free(copy);
return NULL;
}
/*
* A poor-man's version of "xargs -0". Basically parses a given block of
* NUL-delimited data, within the given length and adds a pointer to each entry
* to the array of pointers.
*/
static int parse_xargs(char *data, int data_length, char ***output)
{
int num = 0;
char *cur = data;
if (!data || *output != NULL)
return -1;
while (cur < data + data_length) {
num++;
*output = must_realloc(*output, (num + 1) * sizeof(**output));
(*output)[num - 1] = cur;
cur += strlen(cur) + 1;
}
(*output)[num] = NULL;
return num;
}
/*
* "Parse" out argv from /proc/self/cmdline.
* This is necessary because we are running in a context where we don't have a
* main() that we can just get the arguments from.
*/
static int fetchve(char ***argv)
{
char *cmdline = NULL;
size_t cmdline_size;
cmdline = read_file("/proc/self/cmdline", &cmdline_size);
if (!cmdline)
goto error;
if (parse_xargs(cmdline, cmdline_size, argv) <= 0)
goto error;
return 0;
error:
free(cmdline);
return -EINVAL;
}
static int clone_binary(void)
{
int binfd, memfd;
ssize_t sent = 0;
#ifdef HAVE_MEMFD_CREATE
memfd = memfd_create(RUNC_MEMFD_COMMENT, MFD_CLOEXEC | MFD_ALLOW_SEALING);
#else
memfd = open("/tmp", O_TMPFILE | O_EXCL | O_RDWR | O_CLOEXEC, 0711);
#endif
if (memfd < 0)
return -ENOTRECOVERABLE;
binfd = open("/proc/self/exe", O_RDONLY | O_CLOEXEC);
if (binfd < 0)
goto error;
sent = sendfile(memfd, binfd, NULL, RUNC_SENDFILE_MAX);
close(binfd);
if (sent < 0)
goto error;
#ifdef HAVE_MEMFD_CREATE
int err = fcntl(memfd, F_ADD_SEALS, RUNC_MEMFD_SEALS);
if (err < 0)
goto error;
#else
/* Need to re-open "memfd" as read-only to avoid execve(2) giving -EXTBUSY. */
int newfd;
char *fdpath = NULL;
if (asprintf(&fdpath, "/proc/self/fd/%d", memfd) < 0)
goto error;
newfd = open(fdpath, O_RDONLY | O_CLOEXEC);
free(fdpath);
if (newfd < 0)
goto error;
close(memfd);
memfd = newfd;
#endif
return memfd;
error:
close(memfd);
return -EIO;
}
/* Get cheap access to the environment. */
extern char **environ;
int ensure_cloned_binary(void)
{
int execfd;
char **argv = NULL;
/* Check that we're not self-cloned, and if we are then bail. */
int cloned = is_self_cloned();
if (cloned > 0 || cloned == -ENOTRECOVERABLE)
return cloned;
if (fetchve(&argv) < 0)
return -EINVAL;
execfd = clone_binary();
if (execfd < 0)
return -EIO;
fexecve(execfd, argv, environ);
return -ENOEXEC;
}

View File

@ -42,6 +42,12 @@ enum sync_t {
SYNC_ERR = 0xFF, /* Fatal error, no turning back. The error code follows. */
};
/*
* Synchronisation value for cgroup namespace setup.
* The same constant is defined in process_linux.go as "createCgroupns".
*/
#define CREATECGROUPNS 0x80
/* longjmp() arguments. */
#define JUMP_PARENT 0x00
#define JUMP_CHILD 0xA0
@ -82,7 +88,7 @@ struct nlconfig_t {
uint8_t is_setgroup;
/* Rootless container settings. */
uint8_t is_rootless;
uint8_t is_rootless_euid; /* boolean */
char *uidmappath;
size_t uidmappath_len;
char *gidmappath;
@ -100,7 +106,7 @@ struct nlconfig_t {
#define GIDMAP_ATTR 27284
#define SETGROUP_ATTR 27285
#define OOM_SCORE_ADJ_ATTR 27286
#define ROOTLESS_ATTR 27287
#define ROOTLESS_EUID_ATTR 27287
#define UIDMAPPATH_ATTR 27288
#define GIDMAPPATH_ATTR 27289
@ -211,7 +217,7 @@ static int try_mapping_tool(const char *app, int pid, char *map, size_t map_len)
/*
* If @app is NULL, execve will segfault. Just check it here and bail (if
* we're in this path, the caller is already getting desparate and there
* we're in this path, the caller is already getting desperate and there
* isn't a backup to this failing). This usually would be a configuration
* or programming issue.
*/
@ -419,8 +425,8 @@ static void nl_parse(int fd, struct nlconfig_t *config)
case CLONE_FLAGS_ATTR:
config->cloneflags = readint32(current);
break;
case ROOTLESS_ATTR:
config->is_rootless = readint8(current);
case ROOTLESS_EUID_ATTR:
config->is_rootless_euid = readint8(current); /* boolean */
break;
case OOM_SCORE_ADJ_ATTR:
config->oom_score_adj = current;
@ -528,6 +534,9 @@ void join_namespaces(char *nslist)
free(namespaces);
}
/* Defined in cloned_binary.c. */
extern int ensure_cloned_binary(void);
void nsexec(void)
{
int pipenum;
@ -543,6 +552,14 @@ void nsexec(void)
if (pipenum == -1)
return;
/*
* We need to re-exec if we are not in a cloned binary. This is necessary
* to ensure that containers won't be able to access the host binary
* through /proc/self/exe. See CVE-2019-5736.
*/
if (ensure_cloned_binary() < 0)
bail("could not ensure we are a cloned binary");
/* Parse all of the netlink configuration. */
nl_parse(pipenum, &config);
@ -640,7 +657,6 @@ void nsexec(void)
case JUMP_PARENT:{
int len;
pid_t child, first_child = -1;
char buf[JSON_MAX];
bool ready = false;
/* For debugging. */
@ -687,7 +703,7 @@ void nsexec(void)
* newuidmap/newgidmap shall be used.
*/
if (config.is_rootless && !config.is_setgroup)
if (config.is_rootless_euid && !config.is_setgroup)
update_setgroups(child, SETGROUPS_DENY);
/* Set up mappings. */
@ -716,6 +732,18 @@ void nsexec(void)
kill(child, SIGKILL);
bail("failed to sync with child: write(SYNC_RECVPID_ACK)");
}
/* Send the init_func pid back to our parent.
*
* Send the init_func pid and the pid of the first child back to our parent.
* We need to send both back because we can't reap the first child we created (CLONE_PARENT).
* It becomes the responsibility of our parent to reap the first child.
*/
len = dprintf(pipenum, "{\"pid\": %d, \"pid_first\": %d}\n", child, first_child);
if (len < 0) {
kill(child, SIGKILL);
bail("unable to generate JSON for child pid");
}
}
break;
case SYNC_CHILD_READY:
@ -759,23 +787,6 @@ void nsexec(void)
bail("unexpected sync value: %u", s);
}
}
/*
* Send the init_func pid and the pid of the first child back to our parent.
*
* We need to send both back because we can't reap the first child we created (CLONE_PARENT).
* It becomes the responsibility of our parent to reap the first child.
*/
len = snprintf(buf, JSON_MAX, "{\"pid\": %d, \"pid_first\": %d}\n", child, first_child);
if (len < 0) {
kill(child, SIGKILL);
bail("unable to generate JSON for child pid");
}
if (write(pipenum, buf, len) != len) {
kill(child, SIGKILL);
bail("unable to send child pid to bootstrapper");
}
exit(0);
}
@ -862,14 +873,17 @@ void nsexec(void)
if (setresuid(0, 0, 0) < 0)
bail("failed to become root in user namespace");
}
/*
* Unshare all of the namespaces. Note that we don't merge this
* with clone() because there were some old kernel versions where
* clone(CLONE_PARENT | CLONE_NEWPID) was broken, so we'll just do
* it the long way.
* Unshare all of the namespaces. Now, it should be noted that this
* ordering might break in the future (especially with rootless
* containers). But for now, it's not possible to split this into
* CLONE_NEWUSER + [the rest] because of some RHEL SELinux issues.
*
* Note that we don't merge this with clone() because there were
* some old kernel versions where clone(CLONE_PARENT | CLONE_NEWPID)
* was broken, so we'll just do it the long way anyway.
*/
if (unshare(config.cloneflags) < 0)
if (unshare(config.cloneflags & ~CLONE_NEWCGROUP) < 0)
bail("failed to unshare namespaces");
/*
@ -953,11 +967,23 @@ void nsexec(void)
if (setgid(0) < 0)
bail("setgid failed");
if (!config.is_rootless && config.is_setgroup) {
if (!config.is_rootless_euid && config.is_setgroup) {
if (setgroups(0, NULL) < 0)
bail("setgroups failed");
}
/* ... wait until our topmost parent has finished cgroup setup in p.manager.Apply() ... */
if (config.cloneflags & CLONE_NEWCGROUP) {
uint8_t value;
if (read(pipenum, &value, sizeof(value)) != sizeof(value))
bail("read synchronisation value failed");
if (value == CREATECGROUPNS) {
if (unshare(CLONE_NEWCGROUP) < 0)
bail("failed to unshare cgroup namespace");
} else
bail("received unknown synchronisation value");
}
s = SYNC_CHILD_READY;
if (write(syncfd, &s, sizeof(s)) != sizeof(s))
bail("failed to sync with patent: write(SYNC_CHILD_READY)");

View File

@ -22,6 +22,10 @@ import (
"golang.org/x/sys/unix"
)
// Synchronisation value for cgroup namespace setup.
// The same constant is defined in nsexec.c as "CREATECGROUPNS".
const createCgroupns = 0x80
type parentProcess interface {
// pid returns the pid for the running process.
pid() int
@ -46,15 +50,16 @@ type parentProcess interface {
}
type setnsProcess struct {
cmd *exec.Cmd
parentPipe *os.File
childPipe *os.File
cgroupPaths map[string]string
intelRdtPath string
config *initConfig
fds []string
process *Process
bootstrapData io.Reader
cmd *exec.Cmd
parentPipe *os.File
childPipe *os.File
cgroupPaths map[string]string
rootlessCgroups bool
intelRdtPath string
config *initConfig
fds []string
process *Process
bootstrapData io.Reader
}
func (p *setnsProcess) startTime() (uint64, error) {
@ -86,7 +91,7 @@ func (p *setnsProcess) start() (err error) {
return newSystemErrorWithCause(err, "executing setns process")
}
if len(p.cgroupPaths) > 0 {
if err := cgroups.EnterPid(p.cgroupPaths, p.pid()); err != nil {
if err := cgroups.EnterPid(p.cgroupPaths, p.pid()); err != nil && !p.rootlessCgroups {
return newSystemErrorWithCausef(err, "adding pid %d to cgroups", p.pid())
}
}
@ -224,12 +229,17 @@ func (p *initProcess) externalDescriptors() []string {
return p.fds
}
// execSetns runs the process that executes C code to perform the setns calls
// because setns support requires the C process to fork off a child and perform the setns
// before the go runtime boots, we wait on the process to die and receive the child's pid
// over the provided pipe.
// This is called by initProcess.start function
func (p *initProcess) execSetns() error {
// getChildPid receives the final child's pid over the provided pipe.
func (p *initProcess) getChildPid() (int, error) {
var pid pid
if err := json.NewDecoder(p.parentPipe).Decode(&pid); err != nil {
p.cmd.Wait()
return -1, err
}
return pid.Pid, nil
}
func (p *initProcess) waitForChildExit(childPid int) error {
status, err := p.cmd.Process.Wait()
if err != nil {
p.cmd.Wait()
@ -239,22 +249,8 @@ func (p *initProcess) execSetns() error {
p.cmd.Wait()
return &exec.ExitError{ProcessState: status}
}
var pid *pid
if err := json.NewDecoder(p.parentPipe).Decode(&pid); err != nil {
p.cmd.Wait()
return err
}
// Clean up the zombie parent process
firstChildProcess, err := os.FindProcess(pid.PidFirstChild)
if err != nil {
return err
}
// Ignore the error in case the child has already been reaped for any reason
_, _ = firstChildProcess.Wait()
process, err := os.FindProcess(pid.Pid)
process, err := os.FindProcess(childPid)
if err != nil {
return err
}
@ -296,19 +292,47 @@ func (p *initProcess) start() error {
if _, err := io.Copy(p.parentPipe, p.bootstrapData); err != nil {
return newSystemErrorWithCause(err, "copying bootstrap data to pipe")
}
if err := p.execSetns(); err != nil {
return newSystemErrorWithCause(err, "running exec setns process for init")
childPid, err := p.getChildPid()
if err != nil {
return newSystemErrorWithCause(err, "getting the final child's pid from pipe")
}
// Save the standard descriptor names before the container process
// can potentially move them (e.g., via dup2()). If we don't do this now,
// we won't know at checkpoint time which file descriptor to look up.
fds, err := getPipeFds(p.pid())
fds, err := getPipeFds(childPid)
if err != nil {
return newSystemErrorWithCausef(err, "getting pipe fds for pid %d", p.pid())
return newSystemErrorWithCausef(err, "getting pipe fds for pid %d", childPid)
}
p.setExternalDescriptors(fds)
// Do this before syncing with child so that no children
// can escape the cgroup
if err := p.manager.Apply(childPid); err != nil {
return newSystemErrorWithCause(err, "applying cgroup configuration for process")
}
if p.intelRdtManager != nil {
if err := p.intelRdtManager.Apply(childPid); err != nil {
return newSystemErrorWithCause(err, "applying Intel RDT configuration for process")
}
}
// Now it's time to setup cgroup namesapce
if p.config.Config.Namespaces.Contains(configs.NEWCGROUP) && p.config.Config.Namespaces.PathOf(configs.NEWCGROUP) == "" {
if _, err := p.parentPipe.Write([]byte{createCgroupns}); err != nil {
return newSystemErrorWithCause(err, "sending synchronization value to init process")
}
}
// Wait for our first child to exit
if err := p.waitForChildExit(childPid); err != nil {
return newSystemErrorWithCause(err, "waiting for our first child to exit")
}
defer func() {
if err != nil {
// TODO: should not be the responsibility to call here
p.manager.Destroy()
}
}()
if err := p.createNetworkInterfaces(); err != nil {
return newSystemErrorWithCause(err, "creating network interfaces")
}
@ -341,14 +365,13 @@ func (p *initProcess) start() error {
}
if p.config.Config.Hooks != nil {
bundle, annotations := utils.Annotations(p.container.config.Labels)
s := configs.HookState{
Version: p.container.config.Version,
ID: p.container.id,
Pid: p.pid(),
Bundle: bundle,
Annotations: annotations,
s, err := p.container.currentOCIState()
if err != nil {
return err
}
// initProcessStartTime hasn't been set yet.
s.Pid = p.cmd.Process.Pid
s.Status = "creating"
for i, hook := range p.config.Config.Hooks.Prestart {
if err := hook.Run(s); err != nil {
return newSystemErrorWithCausef(err, "running prestart hook %d", i)
@ -372,14 +395,13 @@ func (p *initProcess) start() error {
}
}
if p.config.Config.Hooks != nil {
bundle, annotations := utils.Annotations(p.container.config.Labels)
s := configs.HookState{
Version: p.container.config.Version,
ID: p.container.id,
Pid: p.pid(),
Bundle: bundle,
Annotations: annotations,
s, err := p.container.currentOCIState()
if err != nil {
return err
}
// initProcessStartTime hasn't been set yet.
s.Pid = p.cmd.Process.Pid
s.Status = "creating"
for i, hook := range p.config.Config.Hooks.Prestart {
if err := hook.Run(s); err != nil {
return newSystemErrorWithCausef(err, "running prestart hook %d", i)
@ -537,7 +559,7 @@ func (p *Process) InitializeIO(rootuid, rootgid int) (i *IO, err error) {
}
fds = append(fds, r.Fd(), w.Fd())
p.Stderr, i.Stderr = w, r
// change ownership of the pipes incase we are in a user namespace
// change ownership of the pipes in case we are in a user namespace
for _, fd := range fds {
if err := unix.Fchown(int(fd), rootuid, rootgid); err != nil {
return nil, err

View File

@ -42,16 +42,11 @@ func needsSetupDev(config *configs.Config) bool {
// finalizeRootfs after this function to finish setting up the rootfs.
func prepareRootfs(pipe io.ReadWriter, iConfig *initConfig) (err error) {
config := iConfig.Config
if config.Rootfs == "/" {
if err := unix.Chdir(config.Rootfs); err != nil {
return newSystemErrorWithCausef(err, "changing dir to %q", config.Rootfs)
}
return nil
}
if err := prepareRoot(config); err != nil {
return newSystemErrorWithCause(err, "preparing rootfs")
}
hasCgroupns := config.Namespaces.Contains(configs.NEWCGROUP)
setupDev := needsSetupDev(config)
for _, m := range config.Mounts {
for _, precmd := range m.PremountCmds {
@ -59,8 +54,7 @@ func prepareRootfs(pipe io.ReadWriter, iConfig *initConfig) (err error) {
return newSystemErrorWithCause(err, "running premount command")
}
}
if err := mountToRootfs(m, config.Rootfs, config.MountLabel); err != nil {
if err := mountToRootfs(m, config.Rootfs, config.MountLabel, hasCgroupns); err != nil {
return newSystemErrorWithCausef(err, "mounting %q to rootfs %q at %q", m.Source, config.Rootfs, m.Destination)
}
@ -135,9 +129,6 @@ func prepareRootfs(pipe io.ReadWriter, iConfig *initConfig) (err error) {
// finalizeRootfs sets anything to ro if necessary. You must call
// prepareRootfs first.
func finalizeRootfs(config *configs.Config) (err error) {
if config.Rootfs == "/" {
return nil
}
// remount dev as ro if specified
for _, m := range config.Mounts {
if libcontainerUtils.CleanPath(m.Destination) == "/dev" {
@ -191,7 +182,7 @@ func mountCmd(cmd configs.Command) error {
return nil
}
func mountToRootfs(m *configs.Mount, rootfs, mountLabel string) error {
func mountToRootfs(m *configs.Mount, rootfs, mountLabel string, enableCgroupns bool) error {
var (
dest = m.Destination
)
@ -328,12 +319,33 @@ func mountToRootfs(m *configs.Mount, rootfs, mountLabel string) error {
Data: "mode=755",
PropagationFlags: m.PropagationFlags,
}
if err := mountToRootfs(tmpfs, rootfs, mountLabel); err != nil {
if err := mountToRootfs(tmpfs, rootfs, mountLabel, enableCgroupns); err != nil {
return err
}
for _, b := range binds {
if err := mountToRootfs(b, rootfs, mountLabel); err != nil {
return err
if enableCgroupns {
subsystemPath := filepath.Join(rootfs, b.Destination)
if err := os.MkdirAll(subsystemPath, 0755); err != nil {
return err
}
flags := defaultMountFlags
if m.Flags&unix.MS_RDONLY != 0 {
flags = flags | unix.MS_RDONLY
}
cgroupmount := &configs.Mount{
Source: "cgroup",
Device: "cgroup",
Destination: subsystemPath,
Flags: flags,
Data: filepath.Base(subsystemPath),
}
if err := mountNewCgroup(cgroupmount); err != nil {
return err
}
} else {
if err := mountToRootfs(b, rootfs, mountLabel, enableCgroupns); err != nil {
return err
}
}
}
for _, mc := range merged {
@ -736,6 +748,41 @@ func pivotRoot(rootfs string) error {
}
func msMoveRoot(rootfs string) error {
mountinfos, err := mount.GetMounts()
if err != nil {
return err
}
absRootfs, err := filepath.Abs(rootfs)
if err != nil {
return err
}
for _, info := range mountinfos {
p, err := filepath.Abs(info.Mountpoint)
if err != nil {
return err
}
// Umount every syfs and proc file systems, except those under the container rootfs
if (info.Fstype != "proc" && info.Fstype != "sysfs") || filepath.HasPrefix(p, absRootfs) {
continue
}
// Be sure umount events are not propagated to the host.
if err := unix.Mount("", p, "", unix.MS_SLAVE|unix.MS_REC, ""); err != nil {
return err
}
if err := unix.Unmount(p, unix.MNT_DETACH); err != nil {
if err != unix.EINVAL && err != unix.EPERM {
return err
} else {
// If we have not privileges for umounting (e.g. rootless), then
// cover the path.
if err := unix.Mount("tmpfs", p, "tmpfs", 0, ""); err != nil {
return err
}
}
}
}
if err := unix.Mount(rootfs, "/", "", unix.MS_MOVE, ""); err != nil {
return err
}
@ -837,10 +884,7 @@ func remount(m *configs.Mount, rootfs string) error {
if !strings.HasPrefix(dest, rootfs) {
dest = filepath.Join(rootfs, dest)
}
if err := unix.Mount(m.Source, dest, m.Device, uintptr(m.Flags|unix.MS_REMOUNT), ""); err != nil {
return err
}
return nil
return unix.Mount(m.Source, dest, m.Device, uintptr(m.Flags|unix.MS_REMOUNT), "")
}
// Do the mount operation followed by additional mounts required to take care
@ -871,3 +915,18 @@ func mountPropagate(m *configs.Mount, rootfs string, mountLabel string) error {
}
return nil
}
func mountNewCgroup(m *configs.Mount) error {
var (
data = m.Data
source = m.Source
)
if data == "systemd" {
data = cgroups.CgroupNamePrefix + data
source = "systemd"
}
if err := unix.Mount(source, m.Destination, m.Device, uintptr(m.Flags), data); err != nil {
return err
}
return nil
}

View File

@ -5,12 +5,14 @@ package libcontainer
import (
"fmt"
"os"
"runtime"
"github.com/opencontainers/runc/libcontainer/apparmor"
"github.com/opencontainers/runc/libcontainer/keys"
"github.com/opencontainers/runc/libcontainer/seccomp"
"github.com/opencontainers/runc/libcontainer/system"
"github.com/opencontainers/selinux/go-selinux/label"
"github.com/pkg/errors"
"golang.org/x/sys/unix"
)
@ -28,10 +30,19 @@ func (l *linuxSetnsInit) getSessionRingName() string {
}
func (l *linuxSetnsInit) Init() error {
runtime.LockOSThread()
defer runtime.UnlockOSThread()
if !l.config.Config.NoNewKeyring {
// do not inherit the parent's session keyring
// Do not inherit the parent's session keyring.
if _, err := keys.JoinSessionKeyring(l.getSessionRingName()); err != nil {
return err
// Same justification as in standart_init_linux.go as to why we
// don't bail on ENOSYS.
//
// TODO(cyphar): And we should have logging here too.
if errors.Cause(err) != unix.ENOSYS {
return errors.Wrap(err, "join session keyring")
}
}
}
if l.config.CreateConsole {
@ -47,6 +58,10 @@ func (l *linuxSetnsInit) Init() error {
return err
}
}
if err := label.SetProcessLabel(l.config.ProcessLabel); err != nil {
return err
}
defer label.SetProcessLabel("")
// Without NoNewPrivileges seccomp is a privileged operation, so we need to
// do this before dropping capabilities; otherwise do it as late as possible
// just before execve so as few syscalls take place after it as possible.
@ -61,9 +76,6 @@ func (l *linuxSetnsInit) Init() error {
if err := apparmor.ApplyProfile(l.config.AppArmorProfile); err != nil {
return err
}
if err := label.SetProcessLabel(l.config.ProcessLabel); err != nil {
return err
}
// Set seccomp as close to execve as possible, so as few syscalls take
// place afterward (reducing the amount of syscalls that users need to
// enable in their seccomp profiles).

View File

@ -6,6 +6,7 @@ import (
"fmt"
"os"
"os/exec"
"runtime"
"syscall" //only for Exec
"github.com/opencontainers/runc/libcontainer/apparmor"
@ -44,17 +45,31 @@ func (l *linuxStandardInit) getSessionRingParams() (string, uint32, uint32) {
}
func (l *linuxStandardInit) Init() error {
runtime.LockOSThread()
defer runtime.UnlockOSThread()
if !l.config.Config.NoNewKeyring {
ringname, keepperms, newperms := l.getSessionRingParams()
// Do not inherit the parent's session keyring.
sessKeyId, err := keys.JoinSessionKeyring(ringname)
if err != nil {
return errors.Wrap(err, "join session keyring")
}
// Make session keyring searcheable.
if err := keys.ModKeyringPerm(sessKeyId, keepperms, newperms); err != nil {
return errors.Wrap(err, "mod keyring permissions")
if sessKeyId, err := keys.JoinSessionKeyring(ringname); err != nil {
// If keyrings aren't supported then it is likely we are on an
// older kernel (or inside an LXC container). While we could bail,
// the security feature we are using here is best-effort (it only
// really provides marginal protection since VFS credentials are
// the only significant protection of keyrings).
//
// TODO(cyphar): Log this so people know what's going on, once we
// have proper logging in 'runc init'.
if errors.Cause(err) != unix.ENOSYS {
return errors.Wrap(err, "join session keyring")
}
} else {
// Make session keyring searcheable. If we've gotten this far we
// bail on any error -- we don't want to have a keyring with bad
// permissions.
if err := keys.ModKeyringPerm(sessKeyId, keepperms, newperms); err != nil {
return errors.Wrap(err, "mod keyring permissions")
}
}
}
@ -96,9 +111,6 @@ func (l *linuxStandardInit) Init() error {
if err := apparmor.ApplyProfile(l.config.AppArmorProfile); err != nil {
return errors.Wrap(err, "apply apparmor profile")
}
if err := label.SetProcessLabel(l.config.ProcessLabel); err != nil {
return errors.Wrap(err, "set process label")
}
for key, value := range l.config.Config.Sysctl {
if err := writeSystemProperty(key, value); err != nil {
@ -130,6 +142,10 @@ func (l *linuxStandardInit) Init() error {
if err := syncParentReady(l.pipe); err != nil {
return errors.Wrap(err, "sync ready")
}
if err := label.SetProcessLabel(l.config.ProcessLabel); err != nil {
return errors.Wrap(err, "set process label")
}
defer label.SetProcessLabel("")
// Without NoNewPrivileges seccomp is a privileged operation, so we need to
// do this before dropping capabilities; otherwise do it as late as possible
// just before execve so as few syscalls take place after it as possible.

View File

@ -8,7 +8,6 @@ import (
"path/filepath"
"github.com/opencontainers/runc/libcontainer/configs"
"github.com/opencontainers/runc/libcontainer/utils"
"github.com/sirupsen/logrus"
"golang.org/x/sys/unix"
@ -63,12 +62,9 @@ func destroy(c *linuxContainer) error {
func runPoststopHooks(c *linuxContainer) error {
if c.config.Hooks != nil {
bundle, annotations := utils.Annotations(c.config.Labels)
s := configs.HookState{
Version: c.config.Version,
ID: c.id,
Bundle: bundle,
Annotations: annotations,
s, err := c.currentOCIState()
if err != nil {
return err
}
for _, hook := range c.config.Hooks.Poststop {
if err := hook.Run(s); err != nil {

View File

@ -41,10 +41,7 @@ type syncT struct {
// writeSync is used to write to a synchronisation pipe. An error is returned
// if there was a problem writing the payload.
func writeSync(pipe io.Writer, sync syncType) error {
if err := utils.WriteJSON(pipe, syncT{sync}); err != nil {
return err
}
return nil
return utils.WriteJSON(pipe, syncT{sync})
}
// readSync is used to read from a synchronisation pipe. An error is returned

View File

@ -5,6 +5,7 @@ package user
import (
"io"
"os"
"strconv"
"golang.org/x/sys/unix"
)
@ -115,22 +116,23 @@ func CurrentGroup() (Group, error) {
return LookupGid(unix.Getgid())
}
func CurrentUserSubUIDs() ([]SubID, error) {
func currentUserSubIDs(fileName string) ([]SubID, error) {
u, err := CurrentUser()
if err != nil {
return nil, err
}
return ParseSubIDFileFilter("/etc/subuid",
func(entry SubID) bool { return entry.Name == u.Name })
filter := func(entry SubID) bool {
return entry.Name == u.Name || entry.Name == strconv.Itoa(u.Uid)
}
return ParseSubIDFileFilter(fileName, filter)
}
func CurrentGroupSubGIDs() ([]SubID, error) {
g, err := CurrentGroup()
if err != nil {
return nil, err
}
return ParseSubIDFileFilter("/etc/subgid",
func(entry SubID) bool { return entry.Name == g.Name })
func CurrentUserSubUIDs() ([]SubID, error) {
return currentUserSubIDs("/etc/subuid")
}
func CurrentUserSubGIDs() ([]SubID, error) {
return currentUserSubIDs("/etc/subgid")
}
func CurrentProcessUIDMap() ([]IDMap, error) {

View File

@ -1,8 +1,6 @@
package utils
import (
"crypto/rand"
"encoding/hex"
"encoding/json"
"io"
"os"
@ -17,19 +15,6 @@ const (
exitSignalOffset = 128
)
// GenerateRandomName returns a new name joined with a prefix. This size
// specified is used to truncate the randomly generated value
func GenerateRandomName(prefix string, size int) (string, error) {
id := make([]byte, 32)
if _, err := io.ReadFull(rand.Reader, id); err != nil {
return "", err
}
if size > 64 {
size = 64
}
return prefix + hex.EncodeToString(id)[:size], nil
}
// ResolveRootfs ensures that the current working directory is
// not a symlink and returns the absolute path to the rootfs
func ResolveRootfs(uncleanRootfs string) (string, error) {

36
vendor/vendor.json vendored
View File

@ -298,24 +298,24 @@
{"path":"github.com/opencontainers/go-digest","checksumSHA1":"NTperEHVh1uBqfTy9+oKceN4tKI=","revision":"21dfd564fd89c944783d00d069f33e3e7123c448","revisionTime":"2017-01-11T18:16:59Z"},
{"path":"github.com/opencontainers/image-spec/specs-go","checksumSHA1":"ZGlIwSRjdLYCUII7JLE++N4w7Xc=","revision":"89b51c794e9113108a2914e38e66c826a649f2b5","revisionTime":"2017-11-03T11:36:04Z"},
{"path":"github.com/opencontainers/image-spec/specs-go/v1","checksumSHA1":"jdbXRRzeu0njLE9/nCEZG+Yg/Jk=","revision":"89b51c794e9113108a2914e38e66c826a649f2b5","revisionTime":"2017-11-03T11:36:04Z"},
{"path":"github.com/opencontainers/runc/libcontainer","checksumSHA1":"6qP/ejjcc/+HelbVHmtl6cyUZmc=","origin":"github.com/hashicorp/runc/libcontainer","revision":"8df81be073884e4e4c613d893f5877310136820f","revisionTime":"2018-09-10T14:23:11Z","version":"nomad","versionExact":"nomad"},
{"path":"github.com/opencontainers/runc/libcontainer/apparmor","checksumSHA1":"gVVY8k2G3ws+V1czsfxfuRs8log=","origin":"github.com/hashicorp/runc/libcontainer/apparmor","revision":"8df81be073884e4e4c613d893f5877310136820f","revisionTime":"2018-09-10T14:23:11Z","version":"nomad","versionExact":"nomad"},
{"path":"github.com/opencontainers/runc/libcontainer/cgroups","checksumSHA1":"eA9qNw7Tkpi1GLojT/Vxa99aL00=","origin":"github.com/hashicorp/runc/libcontainer/cgroups","revision":"8df81be073884e4e4c613d893f5877310136820f","revisionTime":"2018-09-10T14:23:11Z","version":"nomad","versionExact":"nomad"},
{"path":"github.com/opencontainers/runc/libcontainer/cgroups/fs","checksumSHA1":"YgPihDRi3OxI0qv7CxTxrqZuvfU=","origin":"github.com/hashicorp/runc/libcontainer/cgroups/fs","revision":"8df81be073884e4e4c613d893f5877310136820f","revisionTime":"2018-09-10T14:23:11Z","version":"nomad","versionExact":"nomad"},
{"path":"github.com/opencontainers/runc/libcontainer/cgroups/systemd","checksumSHA1":"RVQixM4pF56oCNymhNY67I5eS0Y=","origin":"github.com/hashicorp/runc/libcontainer/cgroups/systemd","revision":"8df81be073884e4e4c613d893f5877310136820f","revisionTime":"2018-09-10T14:23:11Z","version":"nomad","versionExact":"nomad"},
{"path":"github.com/opencontainers/runc/libcontainer/configs","checksumSHA1":"PUv5rdj6oEGJoBij/9Elx8VO6bs=","origin":"github.com/hashicorp/runc/libcontainer/configs","revision":"8df81be073884e4e4c613d893f5877310136820f","revisionTime":"2018-09-10T14:23:11Z","version":"nomad","versionExact":"nomad"},
{"path":"github.com/opencontainers/runc/libcontainer/configs/validate","checksumSHA1":"YJq+f9izqizSzG/OuVFUOfloNEk=","origin":"github.com/hashicorp/runc/libcontainer/configs/validate","revision":"8df81be073884e4e4c613d893f5877310136820f","revisionTime":"2018-09-10T14:23:11Z","version":"nomad","versionExact":"nomad"},
{"path":"github.com/opencontainers/runc/libcontainer/criurpc","checksumSHA1":"I1iwXoDUJeDXviilCtkvDpEF/So=","origin":"github.com/hashicorp/runc/libcontainer/criurpc","revision":"8df81be073884e4e4c613d893f5877310136820f","revisionTime":"2018-09-10T14:23:11Z","version":"nomad","versionExact":"nomad"},
{"path":"github.com/opencontainers/runc/libcontainer/devices","checksumSHA1":"I1iwXoDUJeDXviilCtkvDpEF/So=","origin":"github.com/hashicorp/runc/libcontainer/devices","revision":"8df81be073884e4e4c613d893f5877310136820f","revisionTime":"2018-09-10T14:23:11Z","version":"nomad","versionExact":"nomad"},
{"path":"github.com/opencontainers/runc/libcontainer/intelrdt","checksumSHA1":"bAWJX1XUDMd4GqPLSrCkUcdiTbg=","origin":"github.com/hashicorp/runc/libcontainer/intelrdt","revision":"459bfaec1fc6c17d8bfb12d0a0f69e7e7271ed2a","revisionTime":"2018-08-23T14:46:37Z","version":"nomad"},
{"path":"github.com/opencontainers/runc/libcontainer/keys","checksumSHA1":"QXuHZwxlqhoq/oHRJFbTi6+AWLY=","origin":"github.com/hashicorp/runc/libcontainer/keys","revision":"459bfaec1fc6c17d8bfb12d0a0f69e7e7271ed2a","revisionTime":"2018-08-23T14:46:37Z","version":"nomad"},
{"path":"github.com/opencontainers/runc/libcontainer/mount","checksumSHA1":"MJiogPDUU2nFr1fzQU6T+Ry1W8o=","origin":"github.com/hashicorp/runc/libcontainer/mount","revision":"459bfaec1fc6c17d8bfb12d0a0f69e7e7271ed2a","revisionTime":"2018-08-23T14:46:37Z","version":"nomad"},
{"path":"github.com/opencontainers/runc/libcontainer/nsenter","checksumSHA1":"5p0YhO25gaLG+GUdTzqgvcRHjkE=","origin":"github.com/hashicorp/runc/libcontainer/nsenter","revision":"8df81be073884e4e4c613d893f5877310136820f","revisionTime":"2018-09-10T14:23:11Z","version":"nomad","versionExact":"nomad"},
{"path":"github.com/opencontainers/runc/libcontainer/seccomp","checksumSHA1":"I1Qw/btE1twMqKHpYNsC98cteak=","origin":"github.com/hashicorp/runc/libcontainer/seccomp","revision":"8df81be073884e4e4c613d893f5877310136820f","revisionTime":"2018-09-10T14:23:11Z","version":"nomad","versionExact":"nomad"},
{"path":"github.com/opencontainers/runc/libcontainer/stacktrace","checksumSHA1":"yp/kYBgVqKtxlnpq4CmyxLFMAE4=","origin":"github.com/hashicorp/runc/libcontainer/stacktrace","revision":"459bfaec1fc6c17d8bfb12d0a0f69e7e7271ed2a","revisionTime":"2018-08-23T14:46:37Z","version":"nomad"},
{"path":"github.com/opencontainers/runc/libcontainer/system","checksumSHA1":"cjg/UcueM1/2/ExZ3N7010sa+hI=","origin":"github.com/hashicorp/runc/libcontainer/system","revision":"459bfaec1fc6c17d8bfb12d0a0f69e7e7271ed2a","revisionTime":"2018-08-23T14:46:37Z","version":"nomad"},
{"path":"github.com/opencontainers/runc/libcontainer/user","checksumSHA1":"XtLpcP6ca9SQG218re7E7UcOj3Y=","origin":"github.com/hashicorp/runc/libcontainer/user","revision":"459bfaec1fc6c17d8bfb12d0a0f69e7e7271ed2a","revisionTime":"2018-08-23T14:46:37Z","version":"nomad"},
{"path":"github.com/opencontainers/runc/libcontainer/utils","checksumSHA1":"Sb296YW4V+esieanrx4Nzt2L5lU=","origin":"github.com/hashicorp/runc/libcontainer/utils","revision":"459bfaec1fc6c17d8bfb12d0a0f69e7e7271ed2a","revisionTime":"2018-08-23T14:46:37Z","version":"nomad"},
{"path":"github.com/opencontainers/runc/libcontainer","checksumSHA1":"qv6jtvdzSa/0N58fGCNwlYWu7z8=","origin":"github.com/hashicorp/runc/libcontainer","revision":"369b920277d27630441336775cd728bc0f19e496","revisionTime":"2018-09-07T18:53:11Z","version":"nomad-20190219","versionExact":"nomad-20190219"},
{"path":"github.com/opencontainers/runc/libcontainer/apparmor","checksumSHA1":"gVVY8k2G3ws+V1czsfxfuRs8log=","origin":"github.com/hashicorp/runc/libcontainer/apparmor","revision":"369b920277d27630441336775cd728bc0f19e496","revisionTime":"2018-09-07T18:53:11Z","version":"nomad-20190219","versionExact":"nomad-20190219"},
{"path":"github.com/opencontainers/runc/libcontainer/cgroups","checksumSHA1":"Ku9h5AOZZyF7LIoruJ26Ut+1WRI=","origin":"github.com/hashicorp/runc/libcontainer/cgroups","revision":"369b920277d27630441336775cd728bc0f19e496","revisionTime":"2018-09-07T18:53:11Z","version":"nomad-20190219","versionExact":"nomad-20190219"},
{"path":"github.com/opencontainers/runc/libcontainer/cgroups/fs","checksumSHA1":"OnnBJ2WfB/Y9EQpABKetBedf6ts=","origin":"github.com/hashicorp/runc/libcontainer/cgroups/fs","revision":"369b920277d27630441336775cd728bc0f19e496","revisionTime":"2018-09-07T18:53:11Z","version":"nomad-20190219","versionExact":"nomad-20190219"},
{"path":"github.com/opencontainers/runc/libcontainer/cgroups/systemd","checksumSHA1":"941jSDfCIl+b1pIQwZ9r+wj8wvM=","origin":"github.com/hashicorp/runc/libcontainer/cgroups/systemd","revision":"369b920277d27630441336775cd728bc0f19e496","revisionTime":"2018-09-07T18:53:11Z","version":"nomad-20190219","versionExact":"nomad-20190219"},
{"path":"github.com/opencontainers/runc/libcontainer/configs","checksumSHA1":"v9sgw4eYRNSsJUSG33OoFIwLqRI=","origin":"github.com/hashicorp/runc/libcontainer/configs","revision":"369b920277d27630441336775cd728bc0f19e496","revisionTime":"2018-09-07T18:53:11Z","version":"nomad-20190219","versionExact":"nomad-20190219"},
{"path":"github.com/opencontainers/runc/libcontainer/configs/validate","checksumSHA1":"hUveFGK1HhGenf0OVoYZWccoW9I=","origin":"github.com/hashicorp/runc/libcontainer/configs/validate","revision":"369b920277d27630441336775cd728bc0f19e496","revisionTime":"2018-09-07T18:53:11Z","version":"nomad-20190219","versionExact":"nomad-20190219"},
{"path":"github.com/opencontainers/runc/libcontainer/criurpc","checksumSHA1":"n7G7Egz/tOPacXuq+nkvnFai3eU=","origin":"github.com/hashicorp/runc/libcontainer/criurpc","revision":"369b920277d27630441336775cd728bc0f19e496","revisionTime":"2018-09-07T18:53:11Z","version":"nomad-20190219","versionExact":"nomad-20190219"},
{"path":"github.com/opencontainers/runc/libcontainer/devices","checksumSHA1":"2CwtFvz9kB0RSjFlcCkmq4taJ9U=","origin":"github.com/hashicorp/runc/libcontainer/devices","revision":"369b920277d27630441336775cd728bc0f19e496","revisionTime":"2018-09-07T18:53:11Z","version":"nomad-20190219","versionExact":"nomad-20190219"},
{"path":"github.com/opencontainers/runc/libcontainer/intelrdt","checksumSHA1":"sAbowQ7hjveSH5ADUD9IYXnEAJM=","origin":"github.com/hashicorp/runc/libcontainer/intelrdt","revision":"369b920277d27630441336775cd728bc0f19e496","revisionTime":"2018-09-07T18:53:11Z","version":"nomad-20190219","versionExact":"nomad-20190219"},
{"path":"github.com/opencontainers/runc/libcontainer/keys","checksumSHA1":"mKxBw0il2IWjWYgksX+17ufDw34=","origin":"github.com/hashicorp/runc/libcontainer/keys","revision":"369b920277d27630441336775cd728bc0f19e496","revisionTime":"2018-09-07T18:53:11Z","version":"nomad-20190219","versionExact":"nomad-20190219"},
{"path":"github.com/opencontainers/runc/libcontainer/mount","checksumSHA1":"MJiogPDUU2nFr1fzQU6T+Ry1W8o=","origin":"github.com/hashicorp/runc/libcontainer/mount","revision":"369b920277d27630441336775cd728bc0f19e496","revisionTime":"2018-09-07T18:53:11Z","version":"nomad-20190219","versionExact":"nomad-20190219"},
{"path":"github.com/opencontainers/runc/libcontainer/nsenter","checksumSHA1":"hRRDwZprEmMomgf9L/ymYTJmA/U=","origin":"github.com/hashicorp/runc/libcontainer/nsenter","revision":"369b920277d27630441336775cd728bc0f19e496","revisionTime":"2018-09-07T18:53:11Z","version":"nomad-20190219","versionExact":"nomad-20190219"},
{"path":"github.com/opencontainers/runc/libcontainer/seccomp","checksumSHA1":"I1Qw/btE1twMqKHpYNsC98cteak=","origin":"github.com/hashicorp/runc/libcontainer/seccomp","revision":"369b920277d27630441336775cd728bc0f19e496","revisionTime":"2018-09-07T18:53:11Z","version":"nomad-20190219","versionExact":"nomad-20190219"},
{"path":"github.com/opencontainers/runc/libcontainer/stacktrace","checksumSHA1":"yp/kYBgVqKtxlnpq4CmyxLFMAE4=","origin":"github.com/hashicorp/runc/libcontainer/stacktrace","revision":"369b920277d27630441336775cd728bc0f19e496","revisionTime":"2018-09-07T18:53:11Z","version":"nomad-20190219","versionExact":"nomad-20190219"},
{"path":"github.com/opencontainers/runc/libcontainer/system","checksumSHA1":"cjg/UcueM1/2/ExZ3N7010sa+hI=","origin":"github.com/hashicorp/runc/libcontainer/system","revision":"369b920277d27630441336775cd728bc0f19e496","revisionTime":"2018-09-07T18:53:11Z","version":"nomad-20190219","versionExact":"nomad-20190219"},
{"path":"github.com/opencontainers/runc/libcontainer/user","checksumSHA1":"mdUukOXCVJxmT0CufSKDeMg5JFM=","origin":"github.com/hashicorp/runc/libcontainer/user","revision":"369b920277d27630441336775cd728bc0f19e496","revisionTime":"2018-09-07T18:53:11Z","version":"nomad-20190219","versionExact":"nomad-20190219"},
{"path":"github.com/opencontainers/runc/libcontainer/utils","checksumSHA1":"PqGgeBjTHnyGrTr5ekLFEXpC3iQ=","origin":"github.com/hashicorp/runc/libcontainer/utils","revision":"369b920277d27630441336775cd728bc0f19e496","revisionTime":"2018-09-07T18:53:11Z","version":"nomad-20190219","versionExact":"nomad-20190219"},
{"path":"github.com/opencontainers/runtime-spec/specs-go","checksumSHA1":"AMYc2X2O/IL6EGrq6lTl5vEhLiY=","origin":"github.com/opencontainers/runc/vendor/github.com/opencontainers/runtime-spec/specs-go","revision":"459bfaec1fc6c17d8bfb12d0a0f69e7e7271ed2a","revisionTime":"2018-08-23T14:46:37Z"},
{"path":"github.com/opencontainers/selinux/go-selinux","checksumSHA1":"j9efF9bPmCCag+LzqwjyB8a44B8=","origin":"github.com/opencontainers/runc/vendor/github.com/opencontainers/selinux/go-selinux","revision":"459bfaec1fc6c17d8bfb12d0a0f69e7e7271ed2a","revisionTime":"2018-08-23T14:46:37Z"},
{"path":"github.com/opencontainers/selinux/go-selinux/label","checksumSHA1":"QbeVoKIoaJWZDH8V/588i8/Pjjs=","origin":"github.com/opencontainers/runc/vendor/github.com/opencontainers/selinux/go-selinux/label","revision":"459bfaec1fc6c17d8bfb12d0a0f69e7e7271ed2a","revisionTime":"2018-08-23T14:46:37Z"},