Contents of /puppet/modules/xymon/templates/hobbit-clients.cfg
Revision 3297
Sun Jul 21 22:38:33 2013 UTC by tmb
File size: 21672 byte(s)
add exception for sucuk autobuild cpu usage
# hobbit-clients.cfg - configuration file for clients reporting to Xymon
#
# This file is used by the hobbitd_client module, when it builds the
# cpu, disk, files, memory, msgs and procs status messages from the
# information reported by clients running on the monitored systems.
#
# This file must be installed on the Xymon server - client installations
# do not need this file.
#
# The file defines a series of rules:
#    UP     : Changes the "cpu" status when the system has rebooted recently,
#             or when it has been running for too long.
#    LOAD   : Changes the "cpu" status according to the system load.
#    CLOCK  : Changes the "cpu" status if the client system clock is
#             not synchronized with the clock of the Xymon server.
#    DISK   : Changes the "disk" status, depending on the amount of space
#             used on filesystems.
#    MEMPHYS: Changes the "memory" status, based on the percentage of real
#             memory used.
#    MEMACT : Changes the "memory" status, based on the percentage of "actual"
#             memory used. Note: Not all systems report an "actual" value.
#    MEMSWAP: Changes the "memory" status, based on the percentage of swap
#             space used.
#    PROC   : Changes the "procs" status according to which processes were found
#             in the "ps" listing from the client.
#    LOG    : Changes the "msgs" status according to entries in text-based logfiles.
#             Note: The "client-local.cfg" file controls which logfiles the client will report.
#    FILE   : Changes the "files" status according to meta-data for files.
#             Note: The "client-local.cfg" file controls which files the client will report.
#    DIR    : Changes the "files" status according to the size of a directory.
#             Note: The "client-local.cfg" file controls which directories the client will report.
#    PORT   : Changes the "ports" status according to which tcp ports were found
#             in the "netstat" listing from the client.
#    DEFAULT: Set the default values that apply if no other rules match.
#
# All rules can be qualified so they apply only to certain hosts, or at certain
# times of the day (see below).
#
# Each type of rule takes a number of parameters:
#    UP bootlimit toolonglimit
#        The cpu status goes yellow if the system has been up for less than
#        "bootlimit" time, or longer than "toolonglimit". The time is in
#        minutes, or you can add h/d/w for hours/days/weeks - e.g. "2h" for
#        two hours, or "4w" for 4 weeks.
#        Defaults: bootlimit=1h, toolonglimit=-1 (infinite).
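#        Example (an illustrative sketch; the limits are assumptions, not
#        recommendations): go yellow for two hours after a reboot, or once
#        the system has been up for more than four weeks:
#            UP 2h 4w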
#
#    LOAD warnlevel paniclevel
#        If the system load exceeds "warnlevel" or "paniclevel", the "cpu"
#        status will go yellow or red, respectively. These are decimal
#        numbers.
#        Defaults: warnlevel=5.0, paniclevel=10.0
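#        Example (an illustrative sketch for a hypothetical 8-core host,
#        assuming the 1.5x/2x-cores rule of thumb used for the build hosts
#        at the end of this file): go yellow above 12.0 and red above 16.0:
#            LOAD 12.0 16.0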
#
#    CLOCK maximum-offset
#        If the system clock of the client differs from that of the Xymon
#        server by more than "maximum-offset" seconds, then the CPU status
#        column will go yellow. Note that the accuracy of this test is limited,
#        since it is affected by the time it takes a client status report to
#        go from the client to the Xymon server and be processed. You should
#        therefore allow for a few seconds (5-10) of slack when you define
#        your max. offset.
#        It is not wise to use this test, unless your servers are synchronized
#        to a common clock, e.g. through NTP.
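#        Example (an illustrative sketch; the offset is an assumption):
#        go yellow if the client clock is more than 30 seconds off, which
#        leaves room for the 5-10 seconds of slack mentioned above:
#            CLOCK 30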
#
#    DISK filesystem warnlevel paniclevel
#    DISK filesystem IGNORE
#        If the utilization of "filesystem" is reported to exceed "warnlevel"
#        or "paniclevel", the "disk" status will go yellow or red, respectively.
#        "warnlevel" and "paniclevel" are either the percentage used, or the
#        space available as reported by the local "df" command on the host.
#        For the latter type of check, the "warnlevel" must be followed by the
#        letter "U", e.g. "1024U".
#        The special keyword "IGNORE" causes this filesystem to be ignored
#        completely, i.e. it will not appear in the "disk" status column and
#        it will not be tracked in a graph. This is useful for e.g. removable
#        devices, backup-disks and similar hardware.
#        "filesystem" is the mount-point where the filesystem is mounted, e.g.
#        "/usr" or "/home". A filesystem-name that begins with "%" is interpreted
#        as a Perl-compatible regular expression; e.g. "%^/oracle.*/" will match
#        any filesystem whose mountpoint begins with "/oracle".
#        Defaults: warnlevel=90%, paniclevel=95%
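#        Examples (illustrative sketches; the mount points and thresholds
#        are assumptions): tighter limits for /home, looser limits for all
#        "/oracle" filesystems, and a backup disk that is ignored entirely:
#            DISK /home 80 90
#            DISK %^/oracle.*/ 95 98
#            DISK /mnt/backup IGNORE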
#
#    MEMPHYS warnlevel paniclevel
#    MEMACT warnlevel paniclevel
#    MEMSWAP warnlevel paniclevel
#        If the memory utilization exceeds the "warnlevel" or "paniclevel", the
#        "memory" status will change to yellow or red, respectively.
#        Note: The words "PHYS", "ACT" and "SWAP" are also recognized.
#        Defaults: MEMPHYS warnlevel=100 paniclevel=101 (i.e. it will never go red)
#                  MEMSWAP warnlevel=50  paniclevel=80
#                  MEMACT  warnlevel=90  paniclevel=97
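#        Example (an illustrative sketch; the percentages are assumptions):
#        warn earlier than the defaults on swap and "actual" memory usage:
#            MEMSWAP 30 60
#            MEMACT 85 95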
#
#    PROC processname minimumcount maximumcount color [TRACK=id] [TEXT=displaytext]
#        The "ps" listing sent by the client will be scanned for how many
#        processes containing "processname" are running, and this is then
#        matched against the min/max settings defined here. If the running
#        count is outside the thresholds, the color of the "procs" status
#        changes to "color".
#        To check for a process that must NOT be running: Set minimum and
#        maximum to 0.
#
#        "processname" can be a simple string, in which case this string must
#        show up in the "ps" listing as a command. The scanner will find
#        a ps-listing of e.g. "/usr/sbin/cron" if you only specify "processname"
#        as "cron".
#        "processname" can also be a Perl-compatible regular expression, e.g.
#        "%java.*inst[0123]" can be used to find entries in the ps-listing for
#        "java -Xmx512m inst2" and "java -Xmx256 inst3". In that case,
#        "processname" must begin with "%" followed by the reg.expression.
#        If "processname" contains whitespace (blanks or TAB), you must enclose
#        the full string in double quotes - including the "%" if you use regular
#        expression matching. E.g.
#            PROC "%hobbitd_channel --channel=data.*hobbitd_rrd" 1 1 yellow
#        or
#            PROC "java -DCLASSPATH=/opt/java/lib" 2 5
#
#        You can have multiple "PROC" entries for the same host; all of the
#        checks are merged into the "procs" status and the most severe
#        check defines the color of the status.
#
#        The TRACK=id option causes the number of processes found to be recorded
#        in an RRD file, with "id" as part of the filename. This graph will then
#        appear on the "procs" page as well as on the "trends" page. Note that
#        "id" must be unique among the processes tracked for each host.
#
#        The TEXT=displaytext option affects how the process appears on the
#        "procs" status page. By default, the process is listed with the
#        "processname" as identification, but if this is a regular expression
#        it may be a bit difficult to understand. You can then use e.g.
#        "TEXT=Apache" to make these processes appear with the name "Apache"
#        instead.
#
#        Defaults: mincount=1, maxcount=-1 (unlimited), color="red".
#        Note: No processes are checked by default.
#
#        Example: Check that "cron" is running:
#            PROC cron
#        Example: Check that at least 5 "httpd" processes are running, but
#        not more than 20:
#            PROC httpd 5 20
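#        Example (an illustrative sketch; the pattern, counts and the
#        "apache" id are assumptions): the same kind of check using a regular
#        expression, going yellow instead of red, graphing the process count
#        and showing the friendlier name "Apache" on the "procs" page:
#            PROC %(httpd|apache2) 5 20 yellow TRACK=apache TEXT=Apache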
#
#    LOG filename match-pattern [COLOR=color] [IGNORE=ignore-pattern] [TEXT=displaytext]
#        In the "client-local.cfg" file, you can list any number of files
#        that the client will collect log data from. These are sent to the
#        Xymon server together with the other client data, and you can then
#        choose how to analyze the log data with LOG entries.
#
#        ************ IMPORTANT ***************
#        To monitor a logfile, you *MUST* configure both client-local.cfg
#        and hobbit-clients.cfg. If you configure only the client-local.cfg
#        file, the client will collect the log data and you can view it in
#        the "client data" display, but it will not affect the color of the
#        "msgs" status. On the other hand, if you configure only the
#        hobbit-clients.cfg file, then there will be no log data to inspect,
#        and you will not see any updates of the "msgs" status either.
#
#        "filename" is a filename or pattern. The set of files reported by
#        the client is matched against "filename", and if they match then
#        this LOG entry is processed against the data from that file.
#
#        "match-pattern": The log data is matched against this pattern. If
#        there is a match, this log file causes a status change to "color".
#
#        "ignore-pattern": The log data that matched "match-pattern" is also
#        matched against "ignore-pattern". If the data matches the "ignore-pattern",
#        this line of data does not affect the status color. In other words,
#        the "ignore-pattern" can be used to refine the strings which cause
#        a match.
#        Note: The "ignore-pattern" is optional.
#
#        "color": The color which this match will trigger.
#        Note: "color" is optional; if omitted, "red" will be used.
#
#        Example: Go yellow if the text "WARNING" shows up in any logfile.
#            LOG %.* WARNING COLOR=yellow
#
#        Example: Go red if the text "I/O error" or "read error" appears.
#            LOG %/var/(adm|log)/messages %(I/O|read).error COLOR=red
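#
#        Example (an illustrative sketch; the file name and both patterns
#        are assumptions): go yellow on "WARNING" lines in /var/log/messages,
#        except for lines that also mention "ntpd":
#            LOG /var/log/messages WARNING COLOR=yellow IGNORE=ntpd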
#
#    FILE filename [color] [things to check] [TRACK]
#        NB: The files you wish to monitor must be listed in a "file:..."
#        entry in the client-local.cfg file, in order for the client to
#        report any data about them.
#
#        "filename" is a filename or pattern. The set of files reported by
#        the client is matched against "filename", and if they match then
#        this FILE entry is processed against the data from that file.
#
#        [things to check] can be one or more of the following:
#        - "NOEXIST" triggers a warning if the file exists. By default,
#          a warning is triggered for files that have a FILE entry, but
#          which do not exist.
#        - "TYPE=type" where "type" is one of "file", "dir", "char", "block",
#          "fifo", or "socket". Triggers a warning if the file is not of the
#          specified type.
#        - "OWNERID=owner" and "GROUPID=group" trigger a warning if the owner
#          or group does not match what is listed here. "owner" and "group" are
#          specified either with the numeric uid/gid, or the user/group name.
#        - "MODE=mode" triggers a warning if the file permissions are not
#          as listed. "mode" is written in the standard octal notation, e.g.
#          "644" for the rw-r--r-- permissions.
#        - "SIZE<max.size" and "SIZE>min.size" trigger a warning if the file
#          size is greater than "max.size" or less than "min.size", respectively.
#          You can append "K" (KB), "M" (MB), "G" (GB) or "T" (TB) to the size.
#          If there is no such modifier, KB is assumed.
#          E.g. to warn if a file grows larger than 1MB (1024 KB): "SIZE<1M".
#        - "SIZE=size" triggers a warning if the file size is not what is listed.
#        - "MTIME>min.mtime" and "MTIME<max.mtime" check how long ago the file
#          was last modified (in seconds). E.g. to check if a file was updated
#          within the past 10 minutes (600 seconds): "MTIME<600". Or to check
#          that a file has NOT been updated in the past 24 hours: "MTIME>86400".
#        - "MTIME=timestamp" checks if a file was last modified at "timestamp".
#          "timestamp" is a unix epoch time (seconds since midnight Jan 1 1970 UTC).
#        - "CTIME>min.ctime", "CTIME<max.ctime", "CTIME=timestamp" act as the
#          mtime checks, but for the ctime timestamp (when the file's directory
#          entry was last changed, e.g. by chown, chgrp or chmod).
#        - "MD5=md5sum", "SHA1=sha1sum", "RMD160=rmd160sum" trigger a warning
#          if the file checksum using the MD5, SHA1 or RMD160 message digest
#          algorithm does not match the one configured here. Note: The "file"
#          entry in the client-local.cfg file must specify which algorithm to use.
#
#        "TRACK" causes the size of this file to be tracked in an RRD file, and
#        shown on the graph on the "files" display.
#
#        Example: Check that the /var/log/messages file is not empty and was updated
#        within the past 10 minutes, and go yellow if either fails:
#            FILE /var/log/messages SIZE>0 MTIME<600 yellow
#
#        Example: Check the timestamp, size and SHA-1 hash of the /bin/sh program:
#            FILE /bin/sh MTIME=1128514608 SIZE=645140 SHA1=5bd81afecf0eb93849a2fd9df54e8bcbe3fefd72
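#
#        Example (an illustrative sketch; the file, owner and mode are
#        assumptions): go red unless /etc/shadow is a regular file owned by
#        root with rw------- permissions:
#            FILE /etc/shadow red TYPE=file OWNERID=root MODE=600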
#
#    DIR directory [color] [SIZE<maxsize] [SIZE>minsize] [TRACK]
#        NB: The directories you wish to monitor must be listed in a "dir:..."
#        entry in the client-local.cfg file, in order for the client to
#        report any data about them.
#
#        "directory" is a filename or pattern. The set of directories reported by
#        the client is matched against "directory", and if they match then
#        this DIR entry is processed against the data for that directory.
#
#        "SIZE<maxsize" and "SIZE>minsize" define the size limits that the
#        directory must stay within. If it goes outside these limits, a warning
#        will trigger. Note that Xymon uses the raw number reported by the
#        local "du" command on the client. This is commonly KB, but it may be
#        disk blocks, which are often 512 bytes.
#
#        "TRACK" causes the size of this directory to be tracked in an RRD file,
#        and shown on the graph on the "files" display.
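#
#        Example (an illustrative sketch; the directory and limit are
#        assumptions, and the limit presumes that "du" on the client reports
#        KB): go yellow if the mail queue grows beyond roughly 100 MB, and
#        graph its size:
#            DIR /var/spool/mqueue yellow SIZE<102400 TRACK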
#
#    PORT [LOCAL=addr] [EXLOCAL=addr] [REMOTE=addr] [EXREMOTE=addr] [STATE=state] [EXSTATE=state] [MIN=mincount] [MAX=maxcount] [COLOR=color] [TRACK=id] [TEXT=displaytext]
#        The "netstat" listing sent by the client will be scanned for how many
#        sockets match the criteria listed.
#        "addr" is a (partial) address specification in the format used on
#        the output from netstat. This is typically "10.0.0.1:80" for the IP
#        10.0.0.1, port 80. Or "*:80" for any local address, port 80.
#        NB: The Xymon clients normally report only the numeric data for
#        IP addresses and port numbers, so you must specify the port
#        number (e.g. "80") instead of the service name ("www").
#        "state" causes only the sockets in the specified state to be included;
#        it is usually LISTEN or ESTABLISHED.
#        The socket count is then matched against the min/max settings defined
#        here. If the count is outside the thresholds, the color of the "ports"
#        status changes to "color".
#        To check for a socket that must NOT exist: Set minimum and
#        maximum to 0.
#
#        "addr" and "state" can be simple strings, in which case these strings must
#        show up in the "netstat" listing in the appropriate column.
#        "addr" and "state" can also be Perl-compatible regular expressions, e.g.
#        "LOCAL=%(:80|:443)" can be used to find entries in the netstat local port for
#        both http (port 80) and https (port 443). In that case, "addr" or "state" must
#        begin with "%" followed by the reg.expression.
#
#        The TRACK=id option causes the number of sockets found to be recorded
#        in an RRD file, with "id" as part of the filename. This graph will then
#        appear on the "ports" page as well as on the "trends" page. Note that
#        "id" must be unique among the ports tracked for each host.
#
#        The TEXT=displaytext option affects how the port appears on the
#        "ports" status page. By default, the port is listed with the
#        local/remote/state rules as identification, but this may be somewhat
#        difficult to understand. You can then use e.g. "TEXT=Secure Shell" to make
#        these ports appear with the name "Secure Shell" instead.
#
#        Defaults: state="LISTEN", mincount=1, maxcount=-1 (unlimited), color="red".
#        Note: No ports are checked by default.
#
#        Example: Check that there is someone listening on the https port:
#            PORT "LOCAL=%([.:]443)$" state=LISTEN TEXT=https
#
#        Example: Check that at least 5 "ssh" connections are established, but
#        not more than 20; warn but do not error; graph the connection count:
#            PORT "LOCAL=%([.:]22)$" state=ESTABLISHED min=5 max=20 color=yellow TRACK=ssh "TEXT=SSH logins"
#
#        Example: Check that ONLY ports 22, 80 and 443 are open for incoming connections:
#            PORT STATE=LISTEN LOCAL=%0.0.0.0[.:].* EXLOCAL=%[.:](22|80|443)$ MAX=0 "TEXT=Bad listeners"
#
#
# To apply rules to specific hosts, you can use the "HOST=", "EXHOST=", "PAGE=",
# "EXPAGE=", "CLASS=" or "EXCLASS=" qualifiers. (These act just as in the
# hobbit-alerts.cfg file).
#
# Hostnames are either a comma-separated list of hostnames (from the bb-hosts file),
# "*" to indicate "all hosts", or a Perl-compatible regular expression.
# E.g. "HOST=dns.foo.com,www.foo.com" identifies two specific hosts;
# "HOST=%www.*.foo.com EXHOST=www-test.foo.com" matches all hosts with a name
# beginning with "www", except the "www-test" host.
# "PAGE" and "EXPAGE" match hosts against the page on which they are
# located in the bb-hosts file, via the bb-hosts' page/subpage/subparent
# directives. This can be convenient to pick out all hosts on a specific page.
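# Example (an illustrative sketch; the process counts are assumptions):
# apply a process check to every "www" host except the test host:
#    PROC httpd 5 20 HOST=%www.*.foo.com EXHOST=www-test.foo.com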
#
# Rules can be dependent on time-of-day, using the standard Xymon syntax
# (see the bb-hosts(5) man page about the NKTIME parameter). E.g.
# "TIME=W:0800:2200" applied to a rule will make this rule active only on
# week-days between 8AM and 10PM.
#
# You can also associate a GROUP id with a rule. The group-id is passed to
# the alert module, which can then use it to control who gets an alert when
# a failure occurs. E.g. the following associates the "httpd" process check
# with the "web" group, and the "sshd" check with the "admins" group:
#    PROC httpd 5 GROUP=web
#    PROC sshd 1 GROUP=admins
# In the hobbit-alerts.cfg file, you could then have rules like
#    GROUP=web
#        MAIL webmaster@foo.com
#    GROUP=admins
#        MAIL root@foo.com
#
# Qualifiers must be placed after each rule, e.g.
#    LOAD 8.0 12.0 HOST=db.foo.com TIME=*:0800:1600
#
# If you have multiple rules that you want to apply the same qualifiers to,
# you can write the qualifiers *only* on one line, followed by the rules. E.g.
#    HOST=%db.*.foo.com TIME=W:0800:1600
#        LOAD 8.0 12.0
#        DISK /db 98 100
#        PROC mysqld 1
# will apply the three rules to all of the "db" hosts on week-days between 8AM
# and 4PM. This can be combined with per-rule qualifiers, in which case the
# per-rule qualifier overrides the general qualifier; e.g.
#    HOST=%.*.foo.com
#        LOAD 7.0 12.0 HOST=bax.foo.com
#        LOAD 3.0 8.0
# will result in the load-limits being 7.0/12.0 for the "bax.foo.com" host,
# and 3.0/8.0 for all other foo.com hosts.
#
# The special DEFAULT section can modify the built-in defaults - this must
# be placed at the end of the file.

HOST=rabbit.<%= domain %>
        DISK %.*stage2$ IGNORE

# jonund has 24 cores and we try to utilise it as much as possible;
# a load average of up to 1.5*cores is probably not problematic
HOST=jonund.<%= domain %>
        LOAD 36.0 48.0

# ecosse has 24 cores, is a builder, and we try to use them all
HOST=ecosse.<%= domain %>
        LOAD 36.0 48.0

# rabbit has 12 cores and mksquashfs uses all of them
HOST=rabbit.<%= domain %>
        LOAD 18.0 24.0

# sucuk has 12 cores and autobuilder uses all of them
HOST=sucuk.<%= domain %>
        LOAD 18.0 24.0

DEFAULT
        # These are the built-in defaults.
        UP 1h
        LOAD 5.0 10.0
        DISK %^/mnt/cdrom 101 101
        DISK * 90 95
        MEMPHYS 100 101
        MEMSWAP 50 80
        MEMACT 90 97
        CLOCK 60
        FILE /var/lib/puppet/state/state.yaml yellow mtime<5400
        PORT state=LISTEN "LOCAL=%([.:]22)$" MIN=1 TEXT=ssh
        PROC puppetd 0 3 red
        # allow up to 10, just in case something goes wrong
        PROC crond 1 10 red