ClickHouse business log alerting (obsolete)

Published: 2026/5/20 5:31:48

1. Requirement

Alert on the business logs ingested into ClickHouse: once a threshold is reached, send an alert to WeCom (企业微信).

Method 1: fluent-bit -> ClickHouse (HTTP) -> shell script that fetches the analysis result every minute -> result saved under /dev/shm/ -> node_exporter reads the metrics and Prometheus scrapes them -> rules generate alerts according to the alerting rules -> alertmanager -> webhook -> WeCom.

Method 2: fluent-bit -> ClickHouse (HTTP) -> Python script that fetches the analysis result every minute -> pushgateway -> metrics stored in Prometheus -> rules generate alerts according to the alerting rules -> alertmanager -> webhook -> WeCom.

2. Alerting components

- clickhouse
- prometheus
- alertmanager
- node_exporter
- query script (shell), or Python script (with pushgateway)
- webhook

3. ClickHouse setup and creation of the business log table

4. node_exporter

Add --collector.textfile.directory=/dev/shm/ to the startup parameters:

    [Unit]
    Description=node_exporter Service
    After=network.target
    After=network-online.target
    Wants=network-online.target

    [Service]
    Type=simple
    WorkingDirectory=/data/node_exporter
    ExecStart=/data/node_exporter/node_exporter \
      --web.config.file=/data/node_exporter/etc/config.yml \
      --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc|var/lib/docker/.+|var/lib/kubelet/.+)($|/) \
      --collector.systemd \
      --collector.systemd.unit-include=(docker|sshd|isg|sgadmin).service \
      --web.listen-address=:19100 \
      --collector.textfile.directory=/dev/shm/ \
      --web.telemetry-path=/metrics
    Restart=always
    RestartSec=5

    [Install]
    WantedBy=multi-user.target

5. Shell script

Run it once a minute from crontab.

    #!/usr/bin/env bash
    #
    # Generate the site_rate and api_rate metrics,
    # which are not handled by node_exporter's own collectors
    set -e

    # ClickHouse IP
    ch_host="xx.xx.xx.xx"
    # ClickHouse port
    ch_port="9000"
    # ClickHouse user
    ch_user="xxxx"
    # ClickHouse password
    ch_password="xxxxxxxxxxxxxxxxxxxx"
    # ClickHouse database
    ch_database="xxxxxxxxxxxxxx"
    # ClickHouse table
    ch_table="xxxxxxxxxxxxx"
    # Query delay in seconds: ingestion is slow, so query the minute before the last one
    query_delay=60

    # Aggregation per site
    site_sql="SELECT splitByChar('/', req_path)[2] AS paasid,
      round(sum(if((toInt64(res_statuscode) >= 200) AND (toInt64(res_statuscode) < 400), 1, 0))) AS suc,
      count(1) AS total,
      round(sum(if((toInt64(res_statuscode) >= 200) AND (toInt64(res_statuscode) < 400), 1, 0)) / count(1) * 100, 5) AS val
    FROM ${ch_database}.${ch_table}
    PREWHERE (create_time >= toDateTime(now() - 60 - ${query_delay}))
      AND (create_time < toDateTime(now() - ${query_delay}))
    GROUP BY paasid HAVING total > 5 ORDER BY val DESC"

    SITE_ARRAY=($(docker exec -i ch clickhouse-client --user="${ch_user}" --password="${ch_password}" --host="${ch_host}" --port="${ch_port}" -n -m -q "${site_sql}" | tr -d '\r'))
    site_num=${#SITE_ARRAY[@]}

    cat <<EOS > /dev/shm/site_rate.prom.tmp
    # HELP site_rate
    # TYPE site_rate gauge
    EOS

    # Each result row contributes four array elements: path, suc, total, val
    for ((i = 0; i < site_num; i += 4)); do
      REQ_PATH=${SITE_ARRAY[i]}
      SUC=${SITE_ARRAY[i+1]}
      TOL=${SITE_ARRAY[i+2]}
      VAL=${SITE_ARRAY[i+3]}
      cat <<EOS >> /dev/shm/site_rate.prom.tmp
    site_rate{site_path="${REQ_PATH}",suc="${SUC}",total="${TOL}"} ${VAL}
    EOS
    done

    # Rename at the end so node_exporter never reads a half-written file
    \mv /dev/shm/site_rate.prom.tmp /dev/shm/site_rate.prom

    #------------------------------------
    # Per API endpoint
    api_sql="SELECT req_path,
      round(sum(if((toInt64(res_statuscode) >= 200) AND (toInt64(res_statuscode) < 400), 1, 0))) AS suc,
      count(1) AS total,
      round(sum(if((toInt64(res_statuscode) >= 200) AND (toInt64(res_statuscode) < 400), 1, 0)) / count(1) * 100, 5) AS val
    FROM ${ch_database}.${ch_table}
    PREWHERE req_path LIKE '/ebus/%'
      AND (create_time >= toDateTime(now() - 60 - ${query_delay}))
      AND (create_time < toDateTime(now() - ${query_delay}))
    GROUP BY req_path HAVING total > 3 ORDER BY val DESC"

    API_ARRAY=($(docker exec -i ch clickhouse-client --user="${ch_user}" --password="${ch_password}" --host="${ch_host}" --port="${ch_port}" -n -m -q "${api_sql}" | tr -d '\r'))
    api_num=${#API_ARRAY[@]}

    cat <<EOS > /dev/shm/api_rate.prom.tmp
    # HELP api_rate
    # TYPE api_rate gauge
    EOS

    for ((i = 0; i < api_num; i += 4)); do
      REQ_PATH=${API_ARRAY[i]}
      SUC=${API_ARRAY[i+1]}
      TOL=${API_ARRAY[i+2]}
      VAL=${API_ARRAY[i+3]}
      # Append to api_rate.prom.tmp, the same file that holds the header and is renamed below
      cat <<EOS >> /dev/shm/api_rate.prom.tmp
    api_rate{api_path="${REQ_PATH}",suc="${SUC}",total="${TOL}"} ${VAL}
    EOS
    done

    \mv /dev/shm/api_rate.prom.tmp /dev/shm/api_rate.prom

Output generated by the script:

    cat /dev/shm/site_rate.prom
    # HELP site_rate
    # TYPE site_rate gauge
    site_rate{site_path="/metrics/",suc="49",total="49"} 100
    site_rate{site_path="/grafana/",suc="9",total="9"} 100
    site_rate{site_path="/dail_healthcheck/",suc="16",total="16"} 100
    site_rate{site_path="/abcyhzx5/",suc="64",total="64"} 100
    site_rate{site_path="/abcapm/",suc="30",total="32"} 93.75
    site_rate{site_path="/abc/",suc="333",total="370"} 90
    site_rate{site_path="/ebus/",suc="2",total="14"} 14.28571

6. Prometheus alerting rules

    groups:
    - name: API success rate - monitoring alerts
      rules:
      - alert: API success rate below 85%
        expr: avg by(api_path,suc,total)(api_rate) < 85
        for: 0m
        labels:
          severity: 一般    # "general"; kept as-is because it is a label value
          alert: api
        annotations:
          description: "API success rate below 85%\n(suc:{{ $labels.suc }} total:{{ $labels.total }})\nSuccess rate: {{ printf \"%.0f\" $value }}%"
      - alert: Site success rate below 85%
        expr: avg by(site_path,suc,total)(site_rate) < 85
        for: 0m
        labels:
          severity: 一般
          alert: api
        annotations:
          description: "Site success rate below 85%\n(suc:{{ $labels.suc }} total:{{ $labels.total }})\nSuccess rate: {{ printf \"%.0f\" $value }}%"

7. alertmanager

    global:
      resolve_timeout: 1m
      smtp_from: 'xxxxxxxx@qq.com'
      smtp_smarthost: 'smtp.qq.com:465'
      smtp_auth_username: 'xxxxxxq@qq.com'
      smtp_auth_password: 'XXXXXX'
      smtp_require_tls: false
      smtp_hello: 'qq.com'
    templates:
      - '/etc/alertmanager/email.tmpl'    # email template file, path inside the container
    route:
      receiver: 'ding2wechat'
      # Group alerts by alertname (other labels can be added)
      group_by: ['alertname']
      # Alerts of the same group that arrive within this window are sent together
      group_wait: 1m
      # Interval between notifications for a group
      group_interval: 10m
      # An alert identical to the previous one is re-sent only after 30m; in practice about (10+30)m
      repeat_interval: 30m
      routes:
      # match_re can be used for regular-expression matching
      - match:
          severity: 严重    # "severe"; kept as-is because it is a label value
        # On a match, send to the receiver named ding2wechat
        receiver: ding2wechat
      - match:
          alert: api
        # On a match, send to the receiver named api_ding2wechat
        receiver: api_ding2wechat
        repeat_interval: 24h
        group_interval: 1m
    receivers:
    ## WeCom bot 2: goes through prometheus-webhook-dingtalk, then through ding2wechat
    - name: 'ding2wechat'
      webhook_configs:
      - url: 'http://172.xxx.xxx.xxx:8060/dingtalk/ding2wechat/send'
        send_resolved: true
    - name: 'api_ding2wechat'
      webhook_configs:
      # No "resolved" notification is needed here
      - url: 'http://172.xxx.xxx.xxx:8060/dingtalk/ding2wechat/send'
        send_resolved: false
    - name: 'email'
      email_configs:
      - to: 'xxxxxxxx@qq.com'
        html: '{{ template "email.jwolf.html" . }}'
        send_resolved: true
    # Inhibit rule: while a critical alert is firing, suppress the matching warning alert
    inhibit_rules:
    - source_match:
        severity: 'critical'
      target_match:
        severity: 'warning'
      equal: ['alertname', 'instance']
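
The requirement section names a Python + pushgateway variant (method 2), but the post only gives the shell script. The following is a minimal sketch of what that variant might look like, not the author's script. It assumes the ClickHouse query result arrives as tab-separated rows of (path, suc, total, val), the same four columns the shell script reads. The function names, the gateway address `http://localhost:9091` and the job name `clickhouse_log_rate` are all hypothetical. It keeps the same `suc` and `total` labels so that the alerting rules above would still match.

```python
import urllib.request


def escape_label(value):
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def parse_tsv(output):
    """Parse tab-separated rows of (path, suc, total, val); skip malformed lines."""
    rows = []
    for line in output.splitlines():
        fields = line.rstrip("\r").split("\t")
        if len(fields) == 4:
            rows.append(tuple(fields))
    return rows


def render_metrics(metric, label, rows):
    """Render rows as one gauge family in the text exposition format."""
    lines = [
        f"# HELP {metric} Success rate in percent for the queried minute",
        f"# TYPE {metric} gauge",
    ]
    for path, suc, total, val in rows:
        lines.append(
            f'{metric}{{{label}="{escape_label(path)}",suc="{suc}",total="{total}"}} {val}'
        )
    # The format requires every line, including the last, to end with a newline
    return "\n".join(lines) + "\n"


def push(text, gateway, job):
    """PUT the metrics to the Pushgateway, replacing the previous push for this job."""
    req = urllib.request.Request(
        f"{gateway}/metrics/job/{job}",
        data=text.encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


if __name__ == "__main__":
    # Sample rows taken from the script output shown in the post
    sample = "/abc/\t333\t370\t90\n/ebus/\t2\t14\t14.28571\n"
    body = render_metrics("site_rate", "site_path", parse_tsv(sample))
    print(body, end="")
    # Hypothetical gateway address and job name; uncomment when a Pushgateway is reachable
    # push(body, "http://localhost:9091", "clickhouse_log_rate")
```

Using PUT rather than POST means each run replaces everything previously pushed for the job, so paths that drop out of the query result do not linger with stale values. Unlike the textfile approach, the last pushed values stay on the Pushgateway until they are overwritten or deleted.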
