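Tutorial: Monitoring heartbeat with Nagios
Date: 2015-02-08 23:16  Source: linux.it.net.cn  Author: IT
First, a look at a few commands. They are installed automatically with heartbeat, and the monitoring script relies on them: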
[root@usvr-210 libexec]# which cl_status
/usr/bin/cl_status
[root@usvr-210 libexec]# cl_status listnodes # list the nodes in the current heartbeat cluster
192.168.3.1
usvr-211
usvr-210
[root@usvr-210 libexec]# cl_status nodestatus usvr-211 # show the status of a node
active
[root@usvr-210 libexec]# cl_status nodestatus 192.168.3.1 # show the status of a node
ping
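The two commands above can be combined into a single overview of the cluster. The following is a minimal sketch, assuming cl_status behaves as shown in the transcript; list_node_status is a hypothetical helper name, and the command to run is passed as an argument so the function can also be tried with a stand-in.

```shell
#!/bin/bash
# Print "node<TAB>status" for every node that heartbeat knows about.
# list_node_status is a hypothetical helper name used for illustration.
# $1 is the cl_status command to run (defaults to the path shown above).
list_node_status () {
    local cl="${1:-/usr/bin/cl_status}"
    local node
    for node in $("$cl" listnodes); do
        printf '%s\t%s\n' "$node" "$("$cl" nodestatus "$node")"
    done
}

# Only query the cluster when cl_status is actually installed.
if [ -x /usr/bin/cl_status ]; then
    list_node_status /usr/bin/cl_status
fi
```

On the cluster shown above this would print one line per node, with 192.168.3.1 reported as ping and the two servers as active.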
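check_heartbeat.sh works by listing every node in the cluster and checking whether each node's status is healthy. In this test setup the healthy statuses are ping and active.
When the number of active + ping nodes is 0, the result is critical
When the number of active + ping nodes is less than the total number of nodes, the result is warning
When the number of active + ping nodes equals the total number of nodes, the result is ok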
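The three rules can be sketched as a small function that maps the healthy count and the total count to a Nagios state. This is only an illustration of the decision logic; heartbeat_state is a hypothetical name, and the exit codes 0, 1 and 2 are the standard Nagios codes for OK, WARNING and CRITICAL.

```shell
#!/bin/bash
# Map "healthy nodes / total nodes" to a Nagios state line and exit code.
# heartbeat_state is a hypothetical helper name used for illustration.
heartbeat_state () {
    local healthy=$1 total=$2
    if [ "$healthy" -eq 0 ]; then
        echo "Heartbeat CRITICAL: $healthy/$total"
        return 2
    elif [ "$healthy" -lt "$total" ]; then
        echo "Heartbeat WARNING: $healthy/$total"
        return 1
    else
        echo "Heartbeat OK: $healthy/$total"
        return 0
    fi
}

# Walk through the three cases for a three-node cluster.
for healthy in 3 2 0; do
    rc=0
    heartbeat_state "$healthy" 3 || rc=$?
    echo "exit code: $rc"
done
```

Keeping the decision in one function makes it easy to see that a cluster with a single failed node degrades to WARNING, and only a cluster with no healthy node at all becomes CRITICAL.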
Example:
[root@usvr-210 libexec]# cat check_heartbeat.sh
#!/bin/bash
# Author: Emmanuel Bretelle
# Date: 12/03/2010
# Description: Retrieve Linux HA cluster status using cl_status
# Based on http://www.randombugs.com/linux/howto-monitor-linux-heartbeat-snmp.html
#
# Author: Stanila Constantin Adrian
# Date: 20/03/2009
# Description: Check the number of active heartbeats
# http://www.randombugs.com
# Get program path
REVISION=1.3
PROGNAME=`/bin/basename "$0"`
PROGPATH=`echo "$0" | /bin/sed -e 's,[\\/][^\\/][^\\/]*$,,'`
NODE_NAME=`uname -n`
CL_ST='/usr/bin/cl_status'
# Nagios exit codes, defined here because utils.sh is not sourced
#. $PROGPATH/utils.sh
OK=0
WARNING=1
CRITICAL=2
UNKNOWN=3
usage () {
    echo "\
Nagios plugin to check heartbeat.
Usage:
$PROGNAME
$PROGNAME [--help | -h]
$PROGNAME [--version | -v]
Options:
--help -h Print this help information
--version -v Print version of plugin
"
}
# print_revision normally comes from utils.sh; a local stand-in is
# defined here so that --help and --version work without it.
print_revision () {
    echo "$1 v$2"
}
help () {
    print_revision "$PROGNAME" "$REVISION"
    echo; usage
}
while test -n "$1"
do
    case "$1" in
        --help | -h)
            help
            exit $OK;;
        --version | -v)
            print_revision "$PROGNAME" "$REVISION"
            exit $OK;;
        # -H)
        # shift
        # HOST=$1;;
        # -C)
        # shift
        # COMMUNITY=$1;;
        *)
            echo "Heartbeat UNKNOWN: Wrong command usage"; exit $UNKNOWN;;
    esac
    shift
done
# Stop early if heartbeat itself is not running on this node
$CL_ST hbstatus > /dev/null
res=$?
if [ $res -ne 0 ]
then
    echo "Heartbeat CRITICAL: Heartbeat is not running on this node"
    exit $CRITICAL
fi
# I counts all nodes, A counts the healthy ones
declare -i I=0
declare -i A=0
NODES=`$CL_ST listnodes`
for node in $NODES
do
    status=`$CL_ST nodestatus "$node"`
    let I=$I+1
    # The original check counted only "active" nodes:
    #   if [ $status == "active" ]
    # "ping" is also a healthy status, so both are accepted here.
    if [ "$status" = "active" ] || [ "$status" = "ping" ]
    then
        let A=$A+1
    fi
done
if [ $A -eq 0 ]
then
    echo "Heartbeat CRITICAL: $A/$I"
    exit $CRITICAL
elif [ $A -ne $I ]
then
    echo "Heartbeat WARNING: $A/$I"
    exit $WARNING
else
    echo "Heartbeat OK: $A/$I"
    exit $OK
fi
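Before wiring the plugin into NRPE, it can help to run it by hand and look at both its output line and its exit code, since Nagios decides the service state from the exit code alone. The following is a rough sketch; run_plugin is a hypothetical helper that is not part of Nagios, and the plugin path is the install location used in this tutorial.

```shell
#!/bin/bash
# Run a Nagios plugin and report its exit code together with the state name.
# run_plugin is a hypothetical helper, not part of Nagios.
run_plugin () {
    local output state rc=0
    output=$("$@" 2>&1) || rc=$?
    case $rc in
        0) state=OK ;;
        1) state=WARNING ;;
        2) state=CRITICAL ;;
        *) state=UNKNOWN ;;
    esac
    printf '%s (exit %d, %s)\n' "$output" "$rc" "$state"
}

# Assumed install location of the plugin; skipped when it is not present.
plugin=/usr/local/nagios/libexec/check_heartbeat.sh
if [ -x "$plugin" ]; then
    run_plugin "$plugin"
fi
```

Running this as the same user that NRPE uses is the more telling test, because it shows whether that user is allowed to call cl_status at all.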
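The Nagios clients are the two nodes of the LVS cluster, usvr-210 and usvr-211. The Nagios server retrieves their monitoring data through check_nrpe.
I. Nagios client
1, Copy the script into the Nagios plugin directory and set its permissions
cp check_heartbeat.sh /usr/local/nagios/libexec/
chmod a+x /usr/local/nagios/libexec/check_heartbeat.sh
chown nagios:nagios /usr/local/nagios/libexec/check_heartbeat.sh
2, Add the check command to the NRPE configuration file on the Nagios client.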
vim /usr/local/nagios/etc/nrpe.cfg
command[check_heartbeat]=/usr/local/nagios/libexec/check_heartbeat.sh
3, Reload the configuration.
service xinetd reload
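Once the client side is in place, the new command can be queried from the Nagios server before any service is defined. The sketch below assumes check_nrpe is installed under /usr/local/nagios/libexec on the server, which matches the prefix used in this tutorial but is not confirmed by it; check_nodes is a hypothetical helper name.

```shell
#!/bin/bash
# Query the check_heartbeat command on each LVS node through NRPE.
# CHECK_NRPE is an assumed install path; adjust it to your system.
CHECK_NRPE=${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}

# check_nodes is a hypothetical helper: it runs the check against every
# host given as an argument and prints the exit code of each call.
check_nodes () {
    local host rc
    for host in "$@"; do
        rc=0
        "$CHECK_NRPE" -H "$host" -c check_heartbeat || rc=$?
        echo "$host: exit code $rc"
    done
}

# Only contact the nodes when check_nrpe is actually installed.
if [ -x "$CHECK_NRPE" ]; then
    check_nodes usvr-210 usvr-211
fi
```

If a node answers with an NRPE error instead of a Heartbeat line, the command definition in nrpe.cfg or the access rules on that node are the first things to check.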
II. Nagios server
1, Add the monitoring services
define service {
use local-service
service_description heartbeat-lvs-master
check_command check_nrpe!check_heartbeat
service_groups heartbeat_services
host_name usvr-210
check_interval 5
notifications_enabled 1
notification_interval 30
contact_groups admins
}
define service {
use local-service
service_description heartbeat-lvs-slave
check_command check_nrpe!check_heartbeat
service_groups heartbeat_services
host_name usvr-211
check_interval 5
notifications_enabled 1
notification_interval 30
contact_groups admins
}
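2, Check and reload the configuration (nagioscheck is presumably a local alias for the Nagios configuration check, that is, nagios -v followed by the path to nagios.cfg)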
nagioscheck
service nagios reload
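The check and the reload can be chained so that a broken configuration never reaches the running daemon. This is a minimal sketch: both paths are assumptions derived from the /usr/local/nagios prefix used in this tutorial, and safe_reload is a hypothetical helper name. nagios -v is the standard configuration check of Nagios Core.

```shell
#!/bin/bash
# Reload Nagios only if the configuration passes the pre-flight check.
# Both paths are assumptions based on the /usr/local/nagios prefix.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# safe_reload is a hypothetical helper name used for illustration.
safe_reload () {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" > /dev/null; then
        service nagios reload
    else
        echo "Configuration check failed; not reloading" >&2
        return 1
    fi
}

# Only act when the Nagios binary is actually installed.
if [ -x "$NAGIOS_BIN" ]; then
    safe_reload
fi
```

When the check fails, running the same nagios -v command without the redirect shows which object definition is at fault.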
Heartbeat monitoring is now complete.
Reference:
http://wiki.debuntu.org/wiki/Linux_HA_Heartbeat/Monitoring_with_Nagios
(Editor: IT)