Tutorial: Configuring Nagios to Monitor Heartbeat

Date: 2015-02-08 23:16  Source: linux.it.net.cn  Author: IT

First, look at a few commands that are installed automatically with heartbeat. The monitoring script relies on them:

Code example:

	[root@usvr-210 libexec]# which cl_status
	/usr/bin/cl_status
	[root@usvr-210 libexec]# cl_status listnodes   # list the nodes in the current heartbeat cluster
	192.168.3.1
	usvr-211
	usvr-210
	[root@usvr-210 libexec]# cl_status nodestatus usvr-211  # show the status of a node
	active
	[root@usvr-210 libexec]# cl_status nodestatus 192.168.3.1  # show the status of a node
	ping

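check_heartbeat.sh works by listing every node in the cluster and checking whether each node's status is healthy. In this setup the healthy statuses are ping and active.
When the number of active+ping nodes is 0, the result is CRITICAL.
When the number of active+ping nodes is less than the total number of nodes, the result is WARNING.
When the number of active+ping nodes equals the total number of nodes, the result is OK.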
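
The three rules above can be sketched as a small shell function. This is only an illustration and is not part of check_heartbeat.sh: the function name classify_heartbeat and its two arguments (healthy node count, total node count) are assumptions made for the example. The return values follow the usual Nagios plugin convention of 0 for OK, 1 for WARNING and 2 for CRITICAL.

```shell
#!/bin/bash
# Hypothetical helper (not part of check_heartbeat.sh): map the number of
# healthy nodes (active + ping) and the total node count to a Nagios state.
classify_heartbeat() {
  local healthy=$1 total=$2
  if [ "$healthy" -eq 0 ]; then
    echo "Heartbeat CRITICAL: $healthy/$total"
    return 2
  elif [ "$healthy" -lt "$total" ]; then
    echo "Heartbeat WARNING: $healthy/$total"
    return 1
  else
    echo "Heartbeat OK: $healthy/$total"
    return 0
  fi
}
```

For example, with three nodes of which two are healthy, classify_heartbeat 2 3 prints "Heartbeat WARNING: 2/3" and returns 1.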

Worked example:

Code example:

	[root@usvr-210 libexec]# cat check_heartbeat.sh
	#!/bin/bash
	# Author: Emmanuel Bretelle
	# Date: 12/03/2010
	# Description: Retrieve Linux HA cluster status using cl_status
	# Based on http://www.randombugs.com/linux/howto-monitor-linux-heartbeat-snmp.html
	#
	# Author: Stanila Constantin Adrian
	# Date: 20/03/2009
	# Description: Check the number of active heartbeats
	# http://www.randombugs.com

	# Get program path
	REVISION=1.3
	PROGNAME=`/bin/basename $0`
	PROGPATH=`echo $0 | /bin/sed -e 's,[\\/][^\\/][^\\/]*$,,'`
	NODE_NAME=`uname -n`
	CL_ST='/usr/bin/cl_status'

	# Nagios exit codes, defined locally because utils.sh is not sourced
	#. $PROGPATH/utils.sh
	OK=0
	WARNING=1
	CRITICAL=2
	UNKNOWN=3

	usage () {
	    echo "\
	Nagios plugin to heartbeat.
	Usage:
	  $PROGNAME
	  $PROGNAME [--help | -h]
	  $PROGNAME [--version | -v]
	Options:
	  --help -h     Print this help information
	  --version -v  Print version of plugin
	"
	}

	help () {
	    # print_revision and support are defined in utils.sh, which is not
	    # sourced here, so print the version directly instead.
	    echo "$PROGNAME $REVISION"
	    echo; usage; echo
	}

	while test -n "$1"
	do
	  case "$1" in
	    --help | -h)
	      help
	      exit $OK;;
	    --version | -v)
	      echo "$PROGNAME $REVISION"
	      exit $OK;;
	#    -H)
	#      shift
	#      HOST=$1;;
	#    -C)
	#      shift
	#      COMMUNITY=$1;;
	    *)
	      echo "Heartbeat UNKNOWN: Wrong command usage"; exit $UNKNOWN;;
	  esac
	  shift
	done

	# Heartbeat must be running on the local node
	$CL_ST hbstatus > /dev/null
	res=$?
	if [ $res -ne 0 ]
	then
	  echo "Heartbeat CRITICAL: Heartbeat is not running on this node"
	  exit $CRITICAL
	fi

	declare -i I=0   # total number of nodes
	declare -i A=0   # number of healthy nodes
	NODES=`$CL_ST listnodes`
	for node in $NODES
	do
	  status=`$CL_ST nodestatus $node`
	  let I=$I+1
	  # The original test was [ $status == "active" ], which counts only active
	  # nodes. ping is also a healthy state, so both are accepted here.
	  if [ "$status" = "active" ] || [ "$status" = "ping" ]
	  then
	    let A=$A+1
	  fi
	done

	if [ $A -eq 0 ]
	then
	  echo "Heartbeat CRITICAL: $A/$I"
	  exit $CRITICAL
	elif [ $A -ne $I ]
	then
	  echo "Heartbeat WARNING: $A/$I"
	  exit $WARNING
	else
	  echo "Heartbeat OK: $A/$I"
	  exit $OK
	fi
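
To see how the counting loop in check_heartbeat.sh behaves without a running cluster, it can be exercised against a stand-in for cl_status. The sketch below is a test harness under stated assumptions: the fake_cl_status function, its canned answers (including the "dead" status for usvr-211) and the count_healthy function are all hypothetical. Only the listnodes and nodestatus subcommands and the node names are taken from the article.

```shell
#!/bin/bash
# Hypothetical stand-in for /usr/bin/cl_status with canned answers.
fake_cl_status() {
  case "$1" in
    listnodes)
      printf '%s\n' 192.168.3.1 usvr-211 usvr-210 ;;
    nodestatus)
      case "$2" in
        192.168.3.1) echo ping ;;
        usvr-211)    echo dead ;;    # simulated failed node
        *)           echo active ;;
      esac ;;
  esac
}

# Count healthy (active or ping) nodes using the given cl_status command.
# Prints "healthy/total", the same A/I pair the plugin reports.
count_healthy() {
  local cl_st=$1 total=0 healthy=0 node status
  for node in $("$cl_st" listnodes); do
    status=$("$cl_st" nodestatus "$node")
    total=$((total + 1))
    if [ "$status" = "active" ] || [ "$status" = "ping" ]; then
      healthy=$((healthy + 1))
    fi
  done
  echo "$healthy/$total"
}

RESULT=$(count_healthy fake_cl_status)
echo "$RESULT"   # prints 2/3
```

With one of three nodes reported as dead, the plugin's own logic would turn 2/3 into a WARNING.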

The Nagios clients are the LVS cluster nodes usvr-210 and usvr-211. The Nagios server retrieves their monitoring data through check_nrpe.

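I. Nagios client
1. Copy the script into the Nagios plugin directory and set its permissions.

cp check_heartbeat.sh /usr/local/nagios/libexec/
chmod a+x /usr/local/nagios/libexec/check_heartbeat.sh
chown nagios:nagios /usr/local/nagios/libexec/check_heartbeat.sh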

2. Add the check command to the NRPE configuration file on the Nagios client.
vim /usr/local/nagios/etc/nrpe.cfg

command[check_heartbeat]=/usr/local/nagios/libexec/check_heartbeat.sh

3. Reload the configuration.

service xinetd reload
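
Before relying on NRPE, it can help to run the plugin by hand and look at its exit code, since Nagios derives the service state from it (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN, matching the values set in the script). The helper below is a hypothetical convenience wrapper, not something shipped with Nagios or NRPE: it runs any command passed to it and prefixes the output with the state implied by the exit code.

```shell
#!/bin/bash
# Hypothetical helper: run a plugin command and report the Nagios state
# implied by its exit code. Returns the plugin's own exit code.
run_plugin() {
  local output state rc=0
  output=$("$@") || rc=$?
  case "$rc" in
    0) state=OK ;;
    1) state=WARNING ;;
    2) state=CRITICAL ;;
    *) state=UNKNOWN ;;
  esac
  echo "[$state] $output"
  return "$rc"
}
```

On a client this could be called as run_plugin /usr/local/nagios/libexec/check_heartbeat.sh. From the server, the same wrapper can go around check_nrpe, for example run_plugin /usr/local/nagios/libexec/check_nrpe -H usvr-210 -c check_heartbeat, assuming check_nrpe is installed in that directory.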

II. Nagios server
1. Add the monitoring services.

define service {
    use                     local-service
    service_description     heartbeat-lvs-master
    check_command           check_nrpe!check_heartbeat
    service_groups          heartbeat_services
    host_name               usvr-210
    check_interval          5
    notifications_enabled   1
    notification_interval   30
    contact_groups          admins
}
define service {
    use                     local-service
    service_description     heartbeat-lvs-slave
    check_command           check_nrpe!check_heartbeat
    service_groups          heartbeat_services
    host_name               usvr-211
    check_interval          5
    notifications_enabled   1
    notification_interval   30
    contact_groups          admins
}

2. Check and reload the configuration.

nagioscheck
service nagios reload
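
nagioscheck is not a stock Nagios command; it is presumably a local alias or wrapper for the configuration check, which is commonly done with nagios -v against the main configuration file. A minimal sketch of a validate-then-reload wrapper follows. The function name, the two paths (a source install under /usr/local/nagios) and the RELOAD_CMD variable are assumptions and may need adjusting.

```shell
#!/bin/bash
# Sketch: reload Nagios only if the configuration validates.
# NAGIOS_BIN and NAGIOS_CFG are assumed paths for a source install;
# RELOAD_CMD is the reload command used in this article.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}
RELOAD_CMD=${RELOAD_CMD:-service nagios reload}

check_and_reload() {
  # nagios -v verifies the configuration and exits non-zero on errors
  if "$NAGIOS_BIN" -v "$NAGIOS_CFG" > /dev/null; then
    $RELOAD_CMD
  else
    echo "Configuration check failed; not reloading" >&2
    return 1
  fi
}
```

Calling check_and_reload avoids reloading a running Nagios with a broken service definition.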

Once both services appear in the Nagios web interface, heartbeat monitoring is complete.

Reference:
http://wiki.debuntu.org/wiki/Linux_HA_Heartbeat/Monitoring_with_Nagios

(Editor: IT)