
Docker Container and Host Monitoring

Monitoring Docker

The advantages of containers mean that more and more applications are being containerized, which makes monitoring container health increasingly important. This article records how to build a monitoring system for Docker applications around host and container metrics such as CPU, memory, disk, and network.

组件准备

The goal of this article is to build a monitoring system for Docker, so besides the components listed below, the system being monitored must already have Docker installed and in use:

  1. node-exporter
  2. cAdvisor
  3. prometheus
  4. grafana

node-exporter

node-exporter collects system metrics from the host machine.

Start node-exporter

docker run -d --name=node-exporter -p 9100:9100 \
-v "/proc:/host/proc" \
-v "/sys:/host/sys" \
-v "/:/rootfs" \
prom/node-exporter \
--path.procfs /host/proc \
--path.sysfs /host/sys \
--path.rootfs /rootfs
To verify the metrics, open http://localhost:9100/metrics in a browser:

# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 8.54e-05
...
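If curl is available on the host, the same check can also be done from a terminal. As a small sketch, node_load1 (the 1-minute load average) is one of the metrics node-exporter exports by default:

# Fetch the metrics endpoint and keep only the 1-minute load average
curl -s http://localhost:9100/metrics | grep "^node_load1"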

cAdvisor

cAdvisor is short for Container Advisor; it collects system metrics from running containers. By default the metrics are kept locally for 2 minutes.

Start cAdvisor

docker run -d --name=cadvisor -p 8080:8080 \
-v /:/rootfs:ro \
-v /var/run/docker.sock:/var/run/docker.sock:rw \
-v /sys:/sys:ro \
-v /var/lib/docker/:/var/lib/docker:ro \
-v /dev/disk/:/dev/disk:ro \
--privileged \
--device=/dev/kmsg \
google/cadvisor
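The 2-minute retention mentioned above is controlled by cAdvisor's --storage_duration flag. As a sketch (flag name taken from the cAdvisor documentation, so verify it against the image version in use), it can be appended after the image name:

# Same command as above, but keep samples for 5 minutes instead of the default 2
docker run -d --name=cadvisor -p 8080:8080 \
-v /:/rootfs:ro \
-v /var/run/docker.sock:/var/run/docker.sock:rw \
-v /sys:/sys:ro \
-v /var/lib/docker/:/var/lib/docker:ro \
-v /dev/disk/:/dev/disk:ro \
--privileged \
--device=/dev/kmsg \
google/cadvisor \
--storage_duration=5m0s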
To verify the metrics, open http://localhost:8080/ in a browser:

/
root
Docker Containers
Subcontainers
/000-dhcpcd
/001-sysfs
/002-sysctl
...
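Besides the web UI, cAdvisor serves the same data in Prometheus text format at /metrics, which is the endpoint Prometheus will scrape below. A quick command-line check (container_cpu_usage_seconds_total is one of cAdvisor's standard per-container metrics):

# Filter the Prometheus-format output for the per-container CPU counter
curl -s http://localhost:8080/metrics | grep container_cpu_usage_seconds_total | head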

Prometheus

Prometheus is a monitoring application that stores the metrics it collects in a time-series database (TSDB) in real time; by default the metric data is kept on local disk.

The contents of prometheus.yml are as follows:

global:
  scrape_interval: 120s     # Scrape targets every 120 seconds.
  evaluation_interval: 120s # Evaluate rules every 120 seconds.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'prometheus-test'

# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:
  # - "alert.rules"
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    scrape_interval: 30s
    static_configs:
      # Replace host_ip with the IP address of the Docker host.
      - targets: ['host_ip:9090','host_ip:8080','host_ip:9100']

Start Prometheus

docker run -d --name prometheus -p 9090:9090 \
-v ~/prometheus/config/prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus
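Once Prometheus is up, it is worth confirming that all three targets are actually being scraped. Besides the Status > Targets page in the web UI, the standard v1 HTTP API can be queried:

# List the scrape targets and their health as JSON
curl -s http://localhost:9090/api/v1/targets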
To verify the metrics, open http://localhost:9090/ in a browser and query machine_cpu_cores:

machine_cpu_cores{instance="172.17.0.1:8080", job="prometheus"} 2
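A few more queries that are useful as building blocks for dashboards and alerts. The metric names below follow recent cAdvisor and node-exporter releases (older node-exporter versions used names such as node_cpu instead of node_cpu_seconds_total), so treat them as a sketch and adjust to the versions actually in use:

# Per-container CPU usage in cores, averaged over the last minute
rate(container_cpu_usage_seconds_total{image!=""}[1m])

# Per-container memory usage in bytes
container_memory_usage_bytes{image!=""}

# Host CPU time spent outside idle, per mode, averaged over the last minute
rate(node_cpu_seconds_total{mode!="idle"}[1m])

# Host memory still available to applications
node_memory_MemAvailable_bytes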

Grafana

Grafana is used to turn the collected metrics into charts, with Prometheus as the data source. For details on working with Grafana, see the earlier post Docker搭建监控系统 (Building a Monitoring System with Docker).
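For reference, a minimal way to start Grafana with Docker (the grafana/grafana image serves its web UI on port 3000; persistence and admin credentials are not covered here):

# Start Grafana and expose its web UI on port 3000
docker run -d --name=grafana -p 3000:3000 grafana/grafana

After logging in, add a Prometheus data source pointing at http://host_ip:9090 and build dashboards on the metrics collected above.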

Appendix

docker-compose.yml

Using Docker Compose, the three services above can be started together. The docker-compose.yml and the matching prometheus.yml are listed below; the startup commands follow at the end of this appendix.

version: '3'

volumes:
  prometheus_data:
networks:
  monitor-net:
    driver: bridge

services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus_service
    volumes:
      - ~/Projects/monitor/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    ports:
      - 9090:9090
    networks:
      - monitor-net
    labels:
      org.label-schema.group: "monitoring"

  cAdvisor:
    image: google/cadvisor
    container_name: cAdvisor_service
    # Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: operation not permitted
    privileged: true
    volumes:
      - /:/rootfs:ro
      # - /var/run:/var/run:ro # failed to get docker info: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
      - /var/run/docker.sock:/var/run/docker.sock:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    devices:
      - /dev/kmsg:/dev/kmsg
    ports:
      - 8080:8080
    networks:
      - monitor-net
    labels:
      org.label-schema.group: "monitoring"

  node-exporter:
    image: prom/node-exporter
    container_name: node_exporter_service
    volumes:
      - /proc:/host/proc
      - /sys:/host/sys
      - /:/rootfs
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
    ports:
      - 9100:9100
    networks:
      - monitor-net
    labels:
      org.label-schema.group: "monitoring"

prometheus.yml

global:
  scrape_interval: 120s     # Scrape targets every 120 seconds.
  evaluation_interval: 120s # Evaluate rules every 120 seconds.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'monitoring'

# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:
  # - "alert.rules"
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    scrape_interval: 30s
    static_configs:
      # Inside the compose network the other services are reachable by container name.
      - targets: ['localhost:9090','cAdvisor_service:8080','node_exporter_service:9100']
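With both files in place (adjust the prometheus.yml path under the prometheus service's volumes to wherever the file actually lives), the whole stack can be brought up and torn down from the directory containing docker-compose.yml:

# Start all three services in the background
docker-compose up -d

# Check that the containers are running
docker-compose ps

# Stop and remove the stack when finished
docker-compose down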
