Docker Metrics with InstrumentalD

The best way to get Docker metrics into Instrumental is with InstrumentalD, the fast and reliable server agent created by the Instrumental team. By using InstrumentalD to collect Docker metrics, you'll get premade Docker graphs and unlock the full power of our Query Language.

Quick Start

Check out our Installation Instructions for more details. Otherwise, here's the bare minimum to get up and running.

Homebrew:

brew install instrumental/instrumentald/instrumentald
echo 'docker = ["unix:///var/run/docker.sock"]' >> instrumentald.toml
instrumentald -c instrumentald.toml -k PROJECT_TOKEN

Debian/Ubuntu (apt):

curl https://packagecloud.io/install/repositories/expectedbehavior/instrumental/script.deb.sh | sudo bash
sudo apt-get install instrumentald
echo 'project_token = "PROJECT_TOKEN"' | sudo tee /etc/instrumentald.toml
echo 'docker = ["unix:///var/run/docker.sock"]' | sudo tee -a /etc/instrumentald.toml
sudo systemctl start instrumentald

RHEL/CentOS (yum):

curl https://packagecloud.io/install/repositories/expectedbehavior/instrumental/script.rpm.sh | sudo bash
sudo yum install instrumentald
echo 'project_token = "PROJECT_TOKEN"' | sudo tee /etc/instrumentald.toml
echo 'docker = ["unix:///var/run/docker.sock"]' | sudo tee -a /etc/instrumentald.toml
sudo service instrumentald restart

Configuring InstrumentalD

InstrumentalD collects the metrics below from every Docker endpoint listed in its configuration. Here's a basic example of the Docker config:

docker = ["unix:///var/run/docker.sock"]
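
Because the `docker` setting is a list, several endpoints can sit on the same line. As a rough sketch, the Python helper below renders that line from a list of endpoints. The function name `render_docker_config` and the second socket path are hypothetical and used only for illustration; they are not part of InstrumentalD.

```python
import json


def render_docker_config(endpoints):
    """Render the `docker = [...]` line for instrumentald.toml.

    Assumption: endpoints are plain ASCII URLs, for which JSON string
    quoting is also valid TOML basic-string quoting.
    """
    endpoints = list(endpoints)
    if not endpoints:
        raise ValueError("at least one Docker endpoint is required")
    return "docker = " + json.dumps(endpoints)


if __name__ == "__main__":
    print(render_docker_config([
        "unix:///var/run/docker.sock",
        "unix:///var/run/other-docker.sock",  # hypothetical second endpoint
    ]))
```

With a single endpoint, the helper produces the same line shown in the basic example.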

Metrics Collected

InstrumentalD collects both container-related metrics and host-specific metrics.

Container-related metrics collected by InstrumentalD follow this pattern:

docker.container.<image name>.<container name>.<metric>
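
To make the pattern concrete, here is a minimal Python sketch that assembles a metric name from its parts. The helper `container_metric` and the sample image and container names are hypothetical, and the sketch assumes names are used verbatim; any escaping InstrumentalD applies to unusual characters is not modeled.

```python
def container_metric(image_name, container_name, metric):
    """Join the parts of the documented container metric pattern.

    Assumption: image and container names are inserted as-is.
    """
    return ".".join(["docker", "container", image_name, container_name, metric])


if __name__ == "__main__":
    # Hypothetical image "nginx" running as container "web-1".
    print(container_metric("nginx", "web-1", "usage"))
```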

Container Memory Metrics

The following memory metrics are collected:
fail_count Number of times memory usage has hit limits
limit Maximum memory allowed for the container in bytes
total_cache Size of the page cache in bytes
total_pgfault Number of times a process in the cgroup triggered a "page fault". A page fault happens when a process accesses a part of its virtual memory space that is nonexistent or protected
total_pgmajfault Number of times a process in the cgroup triggered a "major fault". "Major" faults happen when the kernel actually has to read the data from disk. When it just has to duplicate an existing page, or allocate an empty page, it's a regular (or "minor") fault
total_rss The amount of memory that doesn’t correspond to anything on disk: stacks, heaps, and anonymous memory maps
total_unevictable The amount of memory that cannot be reclaimed; generally, it accounts for memory that has been "locked" with mlock. It is often used by crypto frameworks to make sure that secret keys and other sensitive material never get swapped out to disk
usage Memory usage in bytes
usage_percent Memory usage as a percent of total available memory

Container CPU Metrics

Container CPU metrics follow the general pattern described above, with an additional `cpu-total` part before the metric name:

docker.container.<image name>.<container name>.cpu-total.<metric>

The following CPU-related metrics are collected (listed with their `cpu-total` prefix):
cpu-total.throttling_periods The total number of times the container could have been throttled
cpu-total.throttling_throttled_periods The total number of times the container was throttled
cpu-total.throttling_throttled_time The amount of time the container was throttled, in microseconds
cpu-total.usage_percent CPU usage as a percent of total available
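
One way to read the two period counts together is as a ratio: the share of periods in which the container was actually throttled. The Python sketch below shows that arithmetic. The function name is hypothetical, and it assumes both values are counts sampled at the same moment.

```python
def throttled_fraction(throttling_periods, throttling_throttled_periods):
    """Fraction of periods in which the container was throttled.

    Returns 0.0 when no periods have been recorded, to avoid dividing
    by zero.
    """
    if throttling_periods <= 0:
        return 0.0
    return throttling_throttled_periods / throttling_periods


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    print(throttled_fraction(200, 50))
```

A value close to 1.0 would suggest the container is hitting its CPU limit in most periods.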

Container Network Metrics

Container network metrics follow the general pattern described above, with an additional part for the interface name (e.g. `eth0`):

docker.container.<image name>.<container name>.<interface name>.<metric>

The following network-related metrics are collected:
rx_bytes Bytes received
rx_dropped Inbound packets dropped
rx_errors Inbound packet errors
rx_packets Packets received
tx_bytes Bytes sent
tx_dropped Outbound packets dropped
tx_errors Outbound packet errors
tx_packets Packets sent
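
If these values are cumulative counters (an assumption; check how they appear in your own graphs), a throughput figure can be derived from two samples. The Python sketch below is one hedged way to do that; the function name and sample numbers are hypothetical.

```python
def per_second_rate(previous, current, interval_seconds):
    """Average per-second rate between two counter samples.

    Assumption: the metric (e.g. rx_bytes) only ever increases.
    Returns None when the counter went backwards (for example after a
    container restart) or when the interval is not positive.
    """
    if interval_seconds <= 0 or current < previous:
        return None
    return (current - previous) / interval_seconds


if __name__ == "__main__":
    # Two made-up rx_bytes samples taken 60 seconds apart.
    print(per_second_rate(1000, 4000, 60))
```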

Container Block I/O Metrics

Container block I/O metrics follow the general pattern described above, with an additional part for the device major/minor numbers (e.g. `254_0`):

docker.container.<image name>.<container name>.<major_minor>.<metric>

The following metrics are collected for each block I/O device. If a container has no block I/O devices, these metrics will not be collected.
io_service_bytes_recursive_async Volume of serviced asynchronous block I/O requests, in bytes
io_service_bytes_recursive_read Volume read from block devices, in bytes
io_service_bytes_recursive_sync Volume of serviced synchronous block I/O requests, in bytes
io_service_bytes_recursive_write Volume written to block devices, in bytes
io_serviced_recursive_async Count of serviced asynchronous block I/O requests
io_serviced_recursive_read Count of read requests from block devices serviced
io_serviced_recursive_sync Count of serviced synchronous block I/O requests
io_serviced_recursive_write Count of write requests to block devices serviced
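
When post-processing metric names, the `<major_minor>` part can be split back into its two device numbers. The Python sketch below shows one way to do it; `parse_device_part` is a hypothetical helper, and it assumes the part is always two non-negative integers joined by an underscore, as in the `254_0` example.

```python
def parse_device_part(part):
    """Split a major_minor part such as "254_0" into (major, minor)."""
    major, sep, minor = part.partition("_")
    if not sep or not major.isdigit() or not minor.isdigit():
        raise ValueError(f"not a major_minor device part: {part!r}")
    return int(major), int(minor)


if __name__ == "__main__":
    print(parse_device_part("254_0"))
```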

Host-Specific Metrics

Host-specific metrics collected by InstrumentalD follow this pattern:

docker.host.<hostname>.<metric>

The following host metrics are collected:
bytes.memory_total Total memory allocated for all containers
n_containers Total number of running containers
n_cpus Total number of CPUs available to Docker
n_images Total number of images
n_listener_events Current number of listeners connected to Docker
n_used_file_descriptors Total number of file descriptors in use by Docker