Installing and configuring blackbox_exporter


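Installation: download the binary release from the official site and extract it. The systemd unit below assumes it was extracted to /usr/local/blackbox_exporter.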

[Unit]
Description=blackbox_exporter
After=network.target

[Service]
Type=simple
User=root
Group=root
ExecStart=/usr/local/blackbox_exporter/blackbox_exporter \
    --config.file=/usr/local/blackbox_exporter/blackbox.yml \
    --web.listen-address ":9115"

Restart=on-failure

[Install]
WantedBy=multi-user.target

blackbox_exporter configuration (blackbox.yml):

modules:
  http_2xx:
    prober: http
    timeout: 8s
    http:
      method: GET
      preferred_ip_protocol: "ip4" # force IPv4; when this is omitted the default is ip6
      ip_protocol_fallback: false
  http_post_2xx:
    prober: http
    http:
      method: POST
      preferred_ip_protocol: "ip4"
      ip_protocol_fallback: false
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^SSH-2.0-"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp

Upstream example configuration for blackbox_exporter:

https://github.com/prometheus/blackbox_exporter/blob/master/example.yml


Prometheus configuration


HTTP probing: once this job is configured, the probed targets show up on the Prometheus Targets page.

  - job_name: 'http_status'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
    - targets:
      - 10.47.12.223
      - 10.47.12.224
      - www.baidu.com
    # The relabeling below ends up producing a request equivalent to: curl 'localhost:9115/probe?target=10.47.12.224&module=http_2xx'
    relabel_configs: # rewrite labels before the scrape
      - source_labels: [__address__] # source label
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115 # the value to write; when a regex is used, capture groups such as $1 can be referenced here
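
The three relabel rules in the job above can be traced outside Prometheus. The following is a minimal Python sketch, not how Prometheus implements relabeling: the function names relabel and scrape_url are hypothetical, and the starting label set is an assumption based on the first target in the job. It only shows how the labels end up forming the request sent to the exporter.

```python
from urllib.parse import urlencode


def relabel(labels):
    """Apply the three relabel rules from the http_status job to a label dict."""
    labels = dict(labels)
    # Rule 1: copy __address__ into the "target" URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Rule 2: expose the probed target as the instance label.
    labels["instance"] = labels["__param_target"]
    # Rule 3: point the scrape at the exporter instead of the target.
    labels["__address__"] = "localhost:9115"
    return labels


def scrape_url(labels):
    """Assemble the scrape URL from the internal labels."""
    prefix = "__param_"
    params = {k[len(prefix):]: v for k, v in labels.items() if k.startswith(prefix)}
    query = urlencode(params, safe=":/")
    return f'{labels["__scheme__"]}://{labels["__address__"]}{labels["__metrics_path__"]}?{query}'


# Assumed label set for the first target before relabeling.
before = {
    "__scheme__": "http",
    "__address__": "10.47.12.223",
    "__metrics_path__": "/probe",
    "__param_module": "http_2xx",
}
after = relabel(before)
print(scrape_url(after))
print(after["instance"])
```

The order of the query parameters in the printed URL may differ from the curl example; the exporter does not depend on it.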

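How the relabeling works:

The scrape URL before relabeling:

http://10.47.12.223/probe?module=http_2xx

First, the internal labels that Prometheus assembles the scrape URL from:

__scheme__    the protocol (http or https)

__address__    the host of the target, with an optional port (not only an IP address)

__metrics_path__      the request path

__param_<name>    a URL query parameter, where <name> is the parameter name, as in ?<name>=123.

The relabeling proceeds as follows:

Step 1: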

  relabel_configs:
    - source_labels: [__address__] # read the value of this label
      target_label: __param_target # and write it to a new label; the "target" in __param_target is the name of the URL parameter

The URL after this step:

http://10.47.12.223/probe?target=10.47.12.223&module=http_2xx

Step 2: target_label names the label to create.

  relabel_configs:
    - source_labels: [__param_target] # take the value of the target request parameter
      target_label: instance # and write it to a new label, so the scraped series carry instance="10.47.12.223"

Step 3:

  relabel_configs:
    - target_label: __address__
      replacement: localhost:9115 # overwrite the value of the __address__ label with localhost:9115

The URL after this step, which can be requested directly with curl:

"http://localhost:9115/probe?target=10.47.12.223&module=http_2xx"

Example request:

$ curl 'localhost:9115/probe?target=10.47.12.223&module=http_2xx'

# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 2.2923e-05
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.116946177
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_http_content_length Length of http content response
# TYPE probe_http_content_length gauge
probe_http_content_length 56788
# HELP probe_http_duration_seconds Duration of http request by phase, summed over all redirects
# TYPE probe_http_duration_seconds gauge
probe_http_duration_seconds{phase="connect"} 0.011409116
probe_http_duration_seconds{phase="processing"} 0.018658983
probe_http_duration_seconds{phase="resolve"} 0.005563115
probe_http_duration_seconds{phase="tls"} 0.081830561
probe_http_duration_seconds{phase="transfer"} 0.005295138
# HELP probe_http_redirects The number of redirects
# TYPE probe_http_redirects gauge
probe_http_redirects 2
# HELP probe_http_ssl Indicates if SSL was used for the final redirect
# TYPE probe_http_ssl gauge
probe_http_ssl 1
# HELP probe_http_status_code Response HTTP status code
# TYPE probe_http_status_code gauge
probe_http_status_code 200
# HELP probe_http_uncompressed_body_length Length of uncompressed response body
# TYPE probe_http_uncompressed_body_length gauge
probe_http_uncompressed_body_length 56788
# HELP probe_http_version Returns the version of HTTP of the probe response
# TYPE probe_http_version gauge
probe_http_version 1.1
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 3.42328796e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_ssl_earliest_cert_expiry Returns earliest SSL cert expiry in unixtime
# TYPE probe_ssl_earliest_cert_expiry gauge
probe_ssl_earliest_cert_expiry 1.647056768e+09
# HELP probe_ssl_last_chain_expiry_timestamp_seconds Returns last SSL chain expiry in timestamp seconds
# TYPE probe_ssl_last_chain_expiry_timestamp_seconds gauge
probe_ssl_last_chain_expiry_timestamp_seconds 1.647056768e+09
# HELP probe_ssl_last_chain_info Contains SSL leaf certificate information
# TYPE probe_ssl_last_chain_info gauge
probe_ssl_last_chain_info{fingerprint_sha256="20a457047e25007253f82d526bfd6ff48ecdef5156f1f59f71db9132cac7d7d3"} 1
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
# HELP probe_tls_version_info Contains the TLS version used
# TYPE probe_tls_version_info gauge
probe_tls_version_info{version="TLS 1.2"} 1
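
To consume this output from a script, the text format can be read line by line. The following is a rough Python sketch under simplifying assumptions: the helper name parse_metrics is hypothetical, the sample text is a shortened copy of the output above, and lines are assumed to carry no trailing timestamp. A real client library would be more robust.

```python
def parse_metrics(text):
    """Map each sample line (name plus labels) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the # HELP / # TYPE comments.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field; label values may contain spaces.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# Shortened copy of the probe output shown above.
sample = """\
# HELP probe_http_status_code Response HTTP status code
# TYPE probe_http_status_code gauge
probe_http_status_code 200
probe_http_duration_seconds{phase="tls"} 0.081830561
# TYPE probe_success gauge
probe_success 1
probe_tls_version_info{version="TLS 1.2"} 1
"""

metrics = parse_metrics(sample)
if metrics["probe_success"] == 1.0:
    print("probe succeeded, status", int(metrics["probe_http_status_code"]))
else:
    print("probe failed")
```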


TCP port probing:

  - job_name: 'port_status'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
    - targets:
      - 10.47.12.223:25
      - 10.47.12.224:25
      labels:
        group: 'tcp'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

This produces the following request:

http://localhost:9115/probe?target=10.47.12.224:25&module=tcp_connect
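
For a quick manual check of a port without going through the exporter, a plain TCP connect gives a comparable yes/no answer. This is a Python sketch of that idea only, not the exporter's code: the function name tcp_connect_ok and the 5 second default timeout are assumptions, and the demo uses a throwaway local listener so it runs without access to the hosts above.

```python
import socket


def tcp_connect_ok(target, timeout=5.0):
    """Return True if a TCP connection to 'host:port' can be opened."""
    host, _, port = target.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, or name resolution failed.
        return False


# Demo against a temporary local listener on a port chosen by the OS.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
demo_port = server.getsockname()[1]
print(tcp_connect_ok(f"127.0.0.1:{demo_port}"))
server.close()
```

Against the job above, the call would be tcp_connect_ok("10.47.12.224:25").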


Ping (ICMP) probing:

  - job_name: 'ping_status'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
        - 10.47.12.223
        - 10.47.12.224
        labels:
          group: 'icmp'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115



Example Prometheus query:

probe_success

Grafana dashboard:

https://grafana.com/grafana/dashboards/9965


References:

infoq.cn/article/sxextntuttxduedeagiq

How the blackbox exporter works; the last section walks through a relabeling example.

https://prometheus.io/docs/guides/multi-target-exporter/

Example Prometheus configuration:

# my global config
global:
  scrape_interval: 30s
  evaluation_interval: 10s

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
       - localhost:9093

rule_files:
  - "rules/*.yml"
  # - "second_rules.yml"

scrape_configs:
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'nodes'
    file_sd_configs:
    - files:
      - target/nodes.yml
      refresh_interval: 60m

  - job_name: 'demo-metrics'
    scrape_interval: 15s
    static_configs:
    - targets: ['192.168.1.9:9176']

  - job_name: 'kubernetes-pods-eureka'
    metrics_path: /actuator/prometheus
    basic_auth:
      username: 'user'
      password: '4yDGdTO8'
    eureka_sd_configs:
    - server: 'http://autotest-eureka.demo.com/eureka'
    relabel_configs:
    - source_labels: 
      - __meta_eureka_app_name
      separator: ;
      regex: (.*)
      target_label: appname
      replacement: $1
      action: replace
    - action: labelmap
      regex: __meta_eureka_app_instance_(.+)
    - regex: metadata_user_(.+)
      action: labeldrop

  - job_name: 'ping_status'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
        - 192.168.1.136
        labels:
          group: 'icmp'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

  - job_name: 'port_status'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
    - targets:
      - api.demo.com:80
      labels:
        group: 'tcp'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

  - job_name: "http_status"
    metrics_path: /probe
    params:
      module: [http_2xx]
    file_sd_configs: 
    - refresh_interval: 1m
      files: 
      - target/http.yml
    relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: localhost:9115


  - job_name: 'vmware_vcenter'
    metrics_path: '/metrics'
    static_configs:
      - targets:
        - 'vcenter.demo.com'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9272

  - job_name: 'mysqld'     
    file_sd_configs:
      - files: ['/usr/local/prometheus/target/mysqld.yml']
        refresh_interval: 1m

  - job_name: 'redis'
    file_sd_configs:
      - files: ['/usr/local/prometheus/target/redis.yml']
        refresh_interval: 1m

  - job_name: 'windows'
    file_sd_configs:
      - files: ['/usr/local/prometheus/target/windows.yml']
        refresh_interval: 1m