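I. Introduction
1. About NRPE
NRPE is an add-on for Nagios that executes plugins on remote Linux/Unix hosts. With the NRPE daemon and the Nagios plugins installed on a remote server, that server can report its local state to the Nagios monitoring platform: CPU load, memory usage, disk usage, and so on. In this article the monitoring host is called the Nagios server, and each monitored remote host is called a Nagios client.
Nagios can monitor remote hosts in several ways, including SNMP, NRPE, SSH, and NSCA. This article covers monitoring remote Linux hosts through NRPE.
NRPE (Nagios Remote Plugin Executor) is a daemon that runs check commands on a remote server. It lets the Nagios server trigger check commands on the remote host through the agent installed there, and it returns the results to the server. Its overhead is much lower than that of SSH-based checks, and because the checks need no system account credentials on the remote host, it is also considered safer than the SSH approach.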
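2. How NRPE works
NRPE consists of two parts:
check_nrpe plugin: resides on the monitoring host
nrpe daemon: runs on the remote host and acts as the agent on the monitored side
Note: the nrpe daemon depends on the Nagios plugins (nagios-plugins); without them the daemon cannot perform any checks.
How NRPE works in detail
When Nagios needs to check a service or resource on a remote Linux host:
First: Nagios runs the check_nrpe plugin and tells it what to check;
Next: check_nrpe connects to the NRPE daemon on the remote host over SSL;
Then: the NRPE daemon runs the corresponding Nagios plugin to perform the check;
Finally: the NRPE daemon returns the result to check_nrpe, which hands it to Nagios for processing.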
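The four steps above can be pictured with a small local simulation. This is only a sketch: it involves no network and no SSL, and `run_remote_check`, the temporary config file, and the `check_demo` command are hypothetical names invented for illustration, not part of NRPE. It looks up a command name in an nrpe.cfg-style file, runs the associated command line, and passes the output and exit code back to the caller, which is roughly the role the NRPE daemon plays for check_nrpe.

```shell
#!/bin/sh
# Hypothetical helper: mimic what the NRPE daemon does with one request.
# $1 = nrpe.cfg-style file, $2 = command name requested by check_nrpe.
run_remote_check() {
    # Find the line "command[<name>]=<command line>" and keep the right-hand side.
    cmdline=$(sed -n "s/^command\[$2]=//p" "$1" | head -n 1)
    if [ -z "$cmdline" ]; then
        echo "command '$2' is not defined"
        return 3    # this sketch reports an unknown command as UNKNOWN (3)
    fi
    # Run the plugin; its stdout and exit code go back to the caller unchanged.
    sh -c "$cmdline"
}

# Demo with a stand-in "plugin" that prints one line and exits with 0 (OK).
demo_cfg=$(mktemp)
echo "command[check_demo]=echo 'DEMO OK - nothing to report'; exit 0" > "$demo_cfg"
run_remote_check "$demo_cfg" check_demo
echo "exit code: $?"
rm -f "$demo_cfg"
```

In the real setup the lookup happens on the monitored host and the result travels back over the SSL connection; only the lookup-and-run logic is imitated here.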
II. Installing nagios-plugins and NRPE on the monitored host
1. Add the nagios user
[root@ClientNrpe ~]# useradd -s /sbin/nologin nagios
2. Install nagios-plugins, which NRPE depends on
[root@ClientNrpe ~]
[root@ClientNrpe ~]
[root@ClientNrpe ~]
[root@ClientNrpe nagios-plugins-2.0.3]
[root@ClientNrpe nagios-plugins-2.0.3]
3. Install NRPE
[root@ClientNrpe ~]
[root@ClientNrpe ~]
[root@ClientNrpe nrpe-2.15]
> --with-nrpe-group=nagios \
> --with-nagios-user=nagios \
> --with-nagios-group=nagios \
> --enable-command-args \
> --enable-ssl
[root@ClientNrpe nrpe-2.15]
[root@ClientNrpe nrpe-2.15]
[root@ClientNrpe nrpe-2.15]
[root@ClientNrpe nrpe-2.15]
4. Configure NRPE
[root@ClientNrpe ~]
log_facility=daemon
pid_file=/var/run/nrpe.pid
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=192.168.0.105

dont_blame_nrpe=0
allow_bash_command_substitution=0
debug=0
command_timeout=60
connection_timeout=300
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
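Each `command[name]=...` line in the configuration defines a check that the monitoring host can request by name. As a quick way to see which names a given file exposes, the following sketch extracts them with sed. The function name `list_nrpe_commands` is made up for this example, and the sample file is an inline excerpt rather than the real configuration.

```shell
#!/bin/sh
# Hypothetical helper: print the command names defined in an nrpe.cfg-style file.
# Commented-out lines are skipped because the pattern is anchored at line start.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}

# Demo on an inline sample, so the sketch runs without NRPE installed.
sample=$(mktemp)
cat > "$sample" <<'EOF'
server_port=5666
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
#command[check_old]=/bin/true
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
EOF
list_nrpe_commands "$sample"
rm -f "$sample"
```

On the monitored host, assuming the install prefix used in this article, the call would be `list_nrpe_commands /usr/local/nagios/etc/nrpe.cfg`.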
5. Start NRPE
[root@ClientNrpe ~]
[root@ClientNrpe ~]
tcp        0      0 0.0.0.0:5666     0.0.0.0:*        LISTEN  22597/nrpe
tcp        0      0 :::5666          :::*             LISTEN  22597/nrpe
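nrpe has two run modes, which also determine how the service is managed:
-i    run as a service under inetd or xinetd
-d    run as a standalone daemon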
You can write an init script for nrpe so that it runs in standalone mode:
[root@ClientNrpe ~]
#!/bin/bash
# Paths of the nrpe binary and its configuration file
NRPE=/usr/local/nagios/bin/nrpe
NRPECONF=/usr/local/nagios/etc/nrpe.cfg

case "$1" in
    start)
        echo -n "Starting NRPE daemon..."
        # -d starts nrpe as a standalone daemon
        "$NRPE" -c "$NRPECONF" -d
        echo " done."
        ;;
    stop)
        echo -n "Stopping NRPE daemon..."
        # stop every nrpe process owned by the nagios user
        pkill -u nagios nrpe
        echo " done."
        ;;
    restart)
        $0 stop
        sleep 2
        $0 start
        ;;
    *)
        echo "Usage: $0 start|stop|restart"
        exit 1
        ;;
esac
exit 0
[root@ClientNrpe ~]
[root@ClientNrpe ~]
[root@ClientNrpe ~]
[root@ClientNrpe ~]
Starting NRPE daemon... done.
[root@ClientNrpe ~]
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address    Foreign Address  State   PID/Program name
tcp        0      0 0.0.0.0:22       0.0.0.0:*        LISTEN  1031/sshd
tcp        0      0 127.0.0.1:25     0.0.0.0:*        LISTEN  1108/master
tcp        0      0 0.0.0.0:5666     0.0.0.0:*        LISTEN  22597/nrpe
tcp        0      0 :::22            :::*             LISTEN  1031/sshd
tcp        0      0 ::1:25           :::*             LISTEN  1108/master
tcp        0      0 :::5666          :::*             LISTEN  22597/nrpe
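The listing confirms that nrpe is listening on port 5666. For a scripted check, the same information can be pulled out of netstat-style text. The sketch below is one possible approach, not part of NRPE: `port_is_listening` is a hypothetical function name, and it assumes the column layout shown above (local address in field 4, state in field 6).

```shell
#!/bin/sh
# Hypothetical helper: succeed if netstat-style text on stdin shows a TCP
# listener on the given port. Field 4 is the local address, field 6 the state.
port_is_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            n = split($4, parts, ":")      # the port is the last ":"-separated piece
            if (parts[n] == port) found = 1
        }
        END { exit found ? 0 : 1 }
    '
}

# Demo on two lines copied from the listing above.
if port_is_listening 5666 <<'EOF'
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1031/sshd
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 22597/nrpe
EOF
then
    echo "nrpe port 5666 is listening"
else
    echo "nothing is listening on port 5666"
fi
```

On a live host the input would come from a command such as `netstat -tnlp | port_is_listening 5666`.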
III. Installing NRPE on the monitoring host
1. Install NRPE
[root@Nagios ~]
[root@Nagios ~]
[root@Nagios nrpe-2.15]
> --with-nrpe-user=nagios \
> --with-nrpe-group=nagios \
> --with-nagios-user=nagios \
> --with-nagios-group=nagios \
> --enable-command-args \
> --enable-ssl
[root@Nagios nrpe-2.15]
[root@Nagios nrpe-2.15]

[root@Nagios ~]
[root@Nagios libexec]
-rwxrwxr-x. 1 nagios nagios 76769 Sep 28 08:07 check_nrpe
2. check_nrpe usage
[root@Nagios libexec]

NRPE Plugin for Nagios
Copyright (c) 1999-2008 Ethan Galstad (nagios@nagios.org)
Version: 2.15
Last Modified: 09-06-2013
License: GPL v2 with exemptions (-l for more info)
SSL/TLS Available: Anonymous DH Mode, OpenSSL 0.9.6 or higher required

Usage: check_nrpe -H <host> [ -b <bindaddr> ] [-4] [-6] [-n] [-u] [-p <port>] [-t <timeout>] [-c <command>] [-a <arglist...>]

Options:
 -n         = Do no use SSL
 -u         = Make socket timeouts return an UNKNOWN state instead of CRITICAL
 <host>     = The address of the host running the NRPE daemon
 <bindaddr> = bind to local address
 -4         = user ipv4 only
 -6         = user ipv6 only
 [port]     = The port on which the daemon is running (default=5666)
 [timeout]  = Number of seconds before connection times out (default=10)
 [command]  = The name of the command that the remote daemon should run
 [arglist]  = Optional arguments that should be passed to the command. Multiple
              arguments should be separated by a space. If provided, this must be
              the last option supplied on the command line.

Note:
This plugin requires that you have the NRPE daemon running on the remote host.
You must also have configured the daemon to associate a specific plugin command
with the [command] option you are specifying here. Upon receipt of the
[command] argument, the NRPE daemon will run the appropriate plugin command and
send the plugin output and return code back to *this* plugin. This allows you
to execute plugins on remote hosts and 'fake' the results to make Nagios think
the plugin is being run locally.
Remote Linux hosts are monitored through NRPE with the check_nrpe plugin, whose syntax is:
check_nrpe -H <host> [-n] [-u] [-p <port>] [-t <timeout>] [-c <command>] [-a <arglist...>]
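Like other Nagios plugins, check_nrpe reports its result through the exit code: 0 means OK, 1 WARNING, 2 CRITICAL, and 3 UNKNOWN. When testing by hand it can help to see that code spelled out. The sketch below is an illustration only; `state_name` and `run_check` are hypothetical helper names, and the demo runs a stand-in command so that it works without check_nrpe installed.

```shell
#!/bin/sh
# Map a plugin exit code to the state name Nagios displays.
# Any code outside 0-2 is shown as UNKNOWN in this sketch.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Hypothetical wrapper: run any plugin command line and label its result.
run_check() {
    output=$("$@")
    code=$?
    echo "[$(state_name "$code")] $output"
    return "$code"
}

# Demo with a stand-in command instead of a real plugin.
run_check sh -c 'echo "USERS OK - 1 users currently logged in"; exit 0'
```

With the paths and addresses used in this article, a real call would look like `run_check /usr/local/nagios/libexec/check_nrpe -H 192.168.0.81 -c check_users`.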
[root@Nagios libexec]
NRPE v2.15
3. Define the command
[root@Nagios ~]
[root@Nagios objects]

define command{
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H "$HOSTADDRESS$" -c "$ARG1$"
}
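When a service refers to this command as `check_nrpe!check_users`, Nagios takes the text after the `!` as `$ARG1$` and substitutes the macros in `command_line` before running it. The sketch below imitates that substitution with sed to show the resulting command line. It is not how Nagios is implemented: `expand_command_line` is a hypothetical function, and the value given for `$USER1$` is an assumption based on the plugin directory used in this article.

```shell
#!/bin/sh
# Hypothetical helper: imitate Nagios macro substitution for one check.
# $1 = command_line template, $2 = value of $USER1$,
# $3 = host address,          $4 = value of $ARG1$.
# Values must not contain "|" or "&", which sed would treat specially.
expand_command_line() {
    printf '%s\n' "$1" | sed \
        -e 's|\$USER1\$|'"$2"'|g' \
        -e 's|\$HOSTADDRESS\$|'"$3"'|g' \
        -e 's|\$ARG1\$|'"$4"'|g'
}

# Demo: the command_line above, expanded for check_nrpe!check_users.
template='$USER1$/check_nrpe -H "$HOSTADDRESS$" -c "$ARG1$"'
expand_command_line "$template" /usr/local/nagios/libexec 192.168.0.81 check_users
```

The demo prints `/usr/local/nagios/libexec/check_nrpe -H "192.168.0.81" -c "check_users"`, which is the command the monitoring host would run for that service.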
4. Define the services
[root@Nagios objects]
[root@Nagios objects]
define host{
    use                     linux-server
    host_name               linhost
    alias                   My Linux Server
    address                 192.168.0.81
}
define service{
    use                     generic-service
    host_name               linhost
    service_description     CHECK USER
    check_command           check_nrpe!check_users
}
define service{
    use                     generic-service
    host_name               linhost
    service_description     Load
    check_command           check_nrpe!check_load
}
define service{
    use                     generic-service
    host_name               linhost
    service_description     SDA1
    check_command           check_nrpe!check_hda1
}
define service{
    use                     generic-service
    host_name               linhost
    service_description     Zombie
    check_command           check_nrpe!check_zombie_procs
}
define service{
    use                     generic-service
    host_name               linhost
    service_description     Total procs
    check_command           check_nrpe!check_total_procs
}
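One point deserves emphasis: the check commands used in the service definitions on the Nagios server must correspond exactly to the command names defined in NRPE on the monitored host (the command[...] entries in nrpe.cfg), as shown in the figure below.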
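Because the two sides must agree on command names, a mismatch is an easy mistake to make. The following sketch compares the `check_nrpe!<name>` references in a Nagios object file with the `command[<name>]` entries in an nrpe.cfg file and prints any name that has no definition. It assumes a copy of the client's nrpe.cfg is available next to the server's object file; `missing_nrpe_commands` and the `check_swap` entry in the demo are hypothetical, chosen only to show a mismatch.

```shell
#!/bin/sh
# Hypothetical helper: list check_nrpe!<name> references in a Nagios object
# file ($1) that have no matching command[<name>] line in an nrpe.cfg copy ($2).
missing_nrpe_commands() {
    sed -n 's/^[[:space:]]*check_command[[:space:]]\{1,\}check_nrpe!\([^![:space:]]*\).*/\1/p' "$1" |
    sort -u |
    while read -r name; do
        grep -q "^command\[$name]=" "$2" || echo "$name"
    done
}

# Demo with two small inline files.
services=$(mktemp)
nrpecfg=$(mktemp)
cat > "$services" <<'EOF'
define service{
    check_command check_nrpe!check_users
}
define service{
    check_command check_nrpe!check_swap
}
EOF
echo 'command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10' > "$nrpecfg"
missing_nrpe_commands "$services" "$nrpecfg"
rm -f "$services" "$nrpecfg"
```

The demo prints `check_swap`, the one reference without a definition. Empty output means every referenced name is defined.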
5. Enable the defined commands and services
[root@Nagios ~]

cfg_file=/usr/local/nagios/etc/objects/linhost.cfg
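Nagios reads an object file only if the main configuration file contains a `cfg_file=` line for it. If this step is scripted, it is worth making it idempotent so that repeated runs do not add duplicate lines. A minimal sketch, in which `ensure_cfg_file` is a hypothetical helper and the demo writes to a temporary file rather than a real configuration:

```shell
#!/bin/sh
# Hypothetical helper: append "cfg_file=<path>" to a main config file ($1)
# unless exactly that line is already present.
ensure_cfg_file() {
    line="cfg_file=$2"
    grep -qxF "$line" "$1" || printf '%s\n' "$line" >> "$1"
}

# Demo on a temporary file; the second call changes nothing.
main_cfg=$(mktemp)
ensure_cfg_file "$main_cfg" /usr/local/nagios/etc/objects/linhost.cfg
ensure_cfg_file "$main_cfg" /usr/local/nagios/etc/objects/linhost.cfg
cat "$main_cfg"
rm -f "$main_cfg"
```

On the Nagios server the first argument would be the main configuration file, presumably /usr/local/nagios/etc/nagios.cfg with the install prefix used here; that path is an assumption, since the article does not show it.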
6. Check the configuration syntax
[root@Nagios ~]

Nagios Core 4.0.7
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 06-03-2014
License: GPL

Website: http://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
    Checked 20 services.
    Checked 3 hosts.
    Checked 2 host groups.
    Checked 0 service groups.
    Checked 1 contacts.
    Checked 1 contact groups.
    Checked 26 commands.
    Checked 5 time periods.
    Checked 0 host escalations.
    Checked 0 service escalations.
Checking for circular paths...
    Checked 3 hosts
    Checked 0 service dependencies
    Checked 0 host dependencies
    Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors: 0

Things look okay - No serious problems were detected during the pre-flight check
Object precache file created:
/usr/local/nagios/var/objects.precache
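The two lines that matter in this output are "Total Warnings" and "Total Errors". If the check runs from a script, for example before an automated restart, those totals can be extracted and turned into an exit status. The sketch below shows one way to do that; `preflight_ok` is a hypothetical function name, and the demo feeds it two lines copied from the output above instead of running Nagios.

```shell
#!/bin/sh
# Hypothetical helper: read pre-flight check output on stdin and succeed only
# if it reports "Total Errors: 0". Prints both totals on one line.
preflight_ok() {
    awk -F': *' '
        /^[ \t]*Total Warnings:/ { warnings = $2 }
        /^[ \t]*Total Errors:/   { errors = $2 }
        END {
            if (errors == "") { print "no totals found"; exit 2 }
            printf "warnings=%d errors=%d\n", warnings, errors
            exit (errors + 0 == 0) ? 0 : 1
        }
    '
}

# Demo on the totals from the output above.
preflight_ok <<'EOF'
Total Warnings: 0
Total Errors: 0
EOF
```

On the server the input would come from the verification run itself, for instance `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | preflight_ok`. Both paths are assumptions based on the install prefix used in this article.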
7. Restart the nagios service
[root@Nagios ~]
Running configuration check...
Stopping nagios: done.
Starting nagios: done.
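8. Open the Nagios web interface
1) First click [Hosts] and check that the monitored host's status is UP
2) Then click [Services] and check that each monitored service's status is OK
Note: after the new host linhost is added, one of its services shows CRITICAL with the message "No such file or directory". The check_hda1 command points at /dev/hda1, which does not exist on this host; the fix below switches the check to /dev/sda1 on both the monitored host and the Nagios server.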
[root@ClientNrpe etc]
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda1
[root@ClientNrpe etc]

[root@Nagios objects]

define service{
    use                     generic-service
    host_name               linhost
    service_description     SDA1
    check_command           check_nrpe!check_sda1
}
[root@Nagios ~]
Running configuration check...
Stopping nagios: done.
Starting nagios: done.
[root@Nagios ~]
Stopping httpd: [ OK ]
Starting httpd: [ OK ]
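This kind of error can be caught on the monitored host before Nagios ever reports it, by checking that every device or mount point passed to check_disk with `-p` actually exists. The sketch below is one possible pre-check, not something shipped with NRPE: `missing_disk_targets` is a hypothetical function, and the demo uses an inline sample with the original /dev/hda1 entry plus `/`, which always exists.

```shell
#!/bin/sh
# Hypothetical helper: for every check_disk command in an nrpe.cfg-style file,
# print the "-p" targets that do not exist on this host.
missing_disk_targets() {
    grep '^command\[.*check_disk' "$1" |
    sed -n 's/.* -p \([^[:space:]]*\).*/\1/p' |
    while read -r target; do
        [ -e "$target" ] || echo "$target"
    done
}

# Demo: one entry from the original configuration and one for "/".
sample=$(mktemp)
cat > "$sample" <<'EOF'
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_root]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
EOF
missing_disk_targets "$sample"
rm -f "$sample"
```

On a host without /dev/hda1, such as the client in this article, the demo prints that path. Empty output means every target was found.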
Click [Services] again to refresh the page; the result is shown in the figure below:
Source: http://467754239.blog.51cto.com/4878013/1558897