nagios配置文件详解

主配置文件 nagios.cfg 需要更改的地方：
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
interval_length=1 ; 间隔时间基准由 60s 改为 1s
command_check_interval=10s ; 命令检查时间间隔，-1 表示尽可能频繁的进行检查
date_format=iso8601 ; 日期格式
objects/contacts.cfg 用来定义联系人：

复制代码代码如下:

	define contact {

	 contact_name sa

	 alias System Administrator

	 service_notification_period 24×7

	 host_notification_period 24×7

	 service_notification_options w,u,c,r

	 host_notification_options d,u,r

	 service_notification_commands notify-service-by-email

	 host_notification_commands notify-host-by-email

	 email

	 }

定义联系人组

复制代码代码如下:

	define contactgroup {

	 contactgroup_name admins

	 alias Administrator Group

	 members sa    ; 添加其它联系人用 “,” 分隔

	 }

主机监控的配置

复制代码代码如下:

	define host {

	 host_name host_name    ; 简短的主机名称。

	 alias alias    ; 别名，可以更详细的说明主机。

	 address address    ; IP 地址，也可以写主机名。如果不定义这个值， nagio 将会用 host_name 去寻找主机。

	 parents host_names    ; 上一节点的名称，也就是指从 nagios 服务器到被监控主机之间经过的节点，可以是路由器、交换机、主机等等。

	 hostgroups hostgroup_names    ; 简短的主机组名称。

	 check_command command_name    ; 检查命令的简短名称，如果此项留空， nagios 将不会去判断主机是否 alive 。

	 max_check_attempts 整数    ; 当检查命令的返回值不是 “OK” 时，重试的次数。

	 check_interval 数字    ; 循环检查的间隔时间。

	 active_checks_enabled [0/1]    ; 是否启用 “active_checks”

	 passive_checks_enabled [0/1]    ; 是否启用 “passive_checks” ，及“被动检查”

	 check_period timeperiod_name    ; 检测时间段简短名称，这只是个名称，具体的时间段要写在其他的配置文件中。

	 obsess_over_host [0/1]    ; 是否启用主机操作系统探测。

	 check_freshness [0/1]    ; 是否启用 freshness 检查。freshness 检查是对于启用被动检查模式的主机而言的，其作用是定期检查主机报告的状态信息，如果该状态信息已经过期，freshness 将会强制做主机检查。

	 freshness_threshold 数字     ; fressness 的临界值，单位为秒。 如果定义为 “0″ ，则为自动定义。

	 event_handler command_name    ; 当主机发生状态改变时，采用的处理命令的简短的名字（可以在 commands.cfg 中对其定义）

	 event_handler_enabled [0/1]    ; 是否启用 event_handler

	 low_flap_threshold 数字    ; 抖动的下限值。抖动，即在一段时间内，主机（或服务）的状态值频繁的发生变化。

	 high_flap_threshold 数字   ; 抖动的上限值。

	 flap_detection_enabled [0/1]    ; 是否启用抖动检查。

	 process_perf_data [0/1]    ; 是否启用 processing of performance data

	 retain_status_information [0/1]    ; 程序重启时，是否保持主机状态相关的信息。

	 retain_nonstatus_information [0/1]    ; 程序重启时，是否保持主机状态无关的信息。

	 contact_groups contact_groups    ; 联系人组，在此组中的联系人都会收到主机的提醒信息。

	 notification_interval 整数    ; 重复发送提醒信息的最短间隔时间。默认间隔时间是 “60″ 分钟。如果这个值设置为 “0″ ，将不会发送重复提醒。

	 notification_period timeperiod_name   ; 发送提醒的时间段。非常重要的主机（服务）定义为 24×7 ，一般的主机（服务）就定义为上班时间。如果不在定义的时间段内，无论发生什么问题，都不会发送提醒。

	 notification_options [d,u,r,f]    ; 发送提醒包括的情况： d = 状态为 DOWN , u = 状态为 UNREACHABLE , r = 状态恢复为 OK , f = flapping

	 notifications_enabled [0/1]    ; 是否开启提醒功能。”1″ 为开启，”0″ 为禁用。一般，这个选项会在主配置文件 (nagios.cfg) 中定义，效果相同。

	 stalking_options [o,d,u]    ; 持续状态检测参数，o = 持续的 UP 状态 , d = 持续的 DOWN 状态 , u = 持续的 UNREACHABLE 状态

	 }

服务监控的配置

复制代码代码如下:

	define service {

	 host_name host_name

	 service_description service_description

	 servicegroups servicegroup_names

	 is_volatile [0/1]

	 check_command command_name

	 max_check_attempts

	 normal_check_interval

	 retry_check_interval

	 active_checks_enabled [0/1]

	 passive_checks_enabled [0/1]

	 check_period timeperiod_name

	 parallelize_check [0/1]

	 obsess_over_service [0/1]

	 check_freshness [0/1]

	 freshness_threshold

	 event_handler command_name

	 event_handler_enabled [0/1]

	 low_flap_threshold

	 high_flap_threshold

	 flap_detection_enabled [0/1]

	 process_perf_data [0/1]

	 retain_status_information [0/1]

	 retain_nonstatus_information [0/1]

	 notification_interval

	 notification_period timeperiod_name n

	 otification_options [w,u,c,r,f]

	 notifications_enabled [0/1]

	 contact_groups contact_groups

	 stalking_options [o,w,u,c]

	 }

服务监控的配置和主机监控的配置较为相似。
间隔时间的计算方法为：
normal_check_interval x interval_length 秒
retry_check_interval x interval_length 秒
notification_interval x interval_length 秒
主机监控配置

复制代码代码如下:

	define host {

	 host_name web1

	 alias web1

	 address 192.168.0.101

	 contact_groups admins

	 check_command check-host-alive

	 max_check_attempts 5

	 notification_interval 0

	 notification_period 24×7

	 notification_options d,u,r

	 }

对主机 web1 进行 24×7 的监控，默认会每 10 秒检查一次状态，累计五次失败就发送提醒，并且不再重复发送提醒。
服务监控配置

复制代码代码如下:

	define service {

	 host_name web1

	 service_description check_http

	 check_period 24×7

	 max_check_attempts 3

	 normal_check_interval 30

	 contact_groups admins

	 retry_check_interval 15

	 notification_interval 3600

	 notification_period 24×7

	 notification_options w,u,c,r

	 check_command check_http

	 }

配置解释： 24×7 监控 web1 主机上的 HTTP 服务，检查间隔为 30 秒，检查失败后每 15 秒再进行一次检查，累计三次失败就认定是故障并发送提醒。
联系人组是 admins 。提醒后恢复到 30 秒一次的 normal_check_interval 检查。如果服务仍然没有被恢复，每个小时发送一次提醒。
如果要检测其他服务，例如，要检查 ssh 服务是否开启，更改如下两行：
service_description check_ssh
check_command check_ssh
为方便管理，对配置文件的分布做了如下修改：

复制代码代码如下:

	 nagios.cfg 中增加：

	 cfg_dir=/usr/local/nagios/etc/hosts

	 cfg_dir=/usr/local/nagios/etc/services

在 hosts 目录中，为不同类型的主机创建了配置文件，如： app.cfg cache.cfg mysql.cfg web.cfg
并创建了 hostgroup.cfg 文件对主机进行分组，如：

复制代码代码如下:

	define hostgroup {

	 hostgroup_name app-hosts

	 alias APP Hosts

	 members app1,app2

	 }

在 services 目录中创建了各种服务的配置文件，如： disk.cfg http.cfg load.cfg mysql.cfg
并创建了 servicegroup.cfg 文件对服务进行分组，如：

复制代码代码如下:

	define servicegroup {

	 servicegroup_name disk

	 alias DISK

	 members cache1,check_disk,cache2,check_disk

	 }

(责任编辑：IT)

搜索

热门标签:

nagios配置文件详解