Argus

A computer network monitor.

Table of Contents

  1. Overview
  2. Download
  3. Architecture
    1. The Head
    2. The Eyes
  4. The Head
    1. Installation
    2. Building from Source
    3. Configuration
    4. Using the Web Application
    5. Triggers
  5. The Eyes
    1. Installation
    2. Configuration
    3. Probes
    4. Alternative Eyes
  6. Future Work
  7. Resources
  8. Acknowledgements
  9. Appendix: RRD4J Graph Template Options XML Format


Copyright (C) 2007 XLab d.o.o.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Overview

Argus is a monitor for computer networks. Well, it is primarily used to monitor a set of servers and their various attributes, such as disk space, available memory, CPU and network usage, etc.

Argus started out as an in-house project, used for monitoring the servers in the ISL Light Grid. However, thanks to its extensible architecture, it seems useful for monitoring just about any set of machines: the author uses Argus to monitor two mid-size clusters used for testing distributed applications, as well as a bunch of machines he maintains at home, at work and in several other places all over the world.

You might be inclined to ask: whence the name? Well, in order to answer that, I suggest you take a peek at Greek mythology. Briefly: Argus Panoptes (that is: Argus the All-seeing) was a hundred-eyed giant in Greek mythology, a servant of Hera, who set him to watch over a certain cow she suspected was more than just a cow. Actually, the cow was really Io, another of Zeus' many girls, and Zeus, horny as he always was, sent Hermes, the messenger of the gods, to bore Argus with his stories until all his eyes closed and he fell asleep. Hermes then slew the sleeping giant, freeing Io for his master. Of course, in contrast to the mythological Argus, our Argus never falls asleep, no matter how boring the machines he watches over are!

Actually, there are many Argi in the world of computer software: the hundred-eyes metaphor fits monitoring software so nicely. So, there are also two other Arguses out there. I know: why a third one, in spite of all the confusion? Well, as stated above, the metaphor fits so nicely... Besides, the name was all over the code and documentation before I found out about those. So, yes, we have a third Argus. Perhaps you should also check whether one of the other two fits your needs better: they both have to do with monitoring hosts and networks.

More questions? Is it the logo that has you wondering what butterflies have to do with Greek myths? Well, the winged little bugger in the top left corner is actually a Meadow Argus butterfly, Junonia villida, named for the resemblance of patterns on its wings to eyes. I hope this satisfies your inquisitive mind, otherwise you should remember what curiosity did to the cat...

Download

Argus Head

Binaries

Built web application: argus-j-0.9.0.war. Please note that it was built with a Java 1.6 compiler.

Sources

Source distribution: argus-j-0.9.0.tar.gz.

Argus Eye

Installable version

Installable Perl scripts: argus-0.9.0.tar.gz.

Sources

Source distribution: argus-0.9.0-src.tar.gz.

Architecture

Argus' architecture was designed with one single goal in mind: to let the sysadmin add new machines to be monitored, and even new probes providing new information, with zero configuration on the server side.

Thus, Argus consists of its Head, a central web application that collects reports from the monitored machines and presents them to the user through a set of web pages, and an arbitrary number of its Eyes, one on each monitored machine.

The Head

The Head stores the last report from each machine in memory and can present it to the user either as raw XML or as a nicely formatted web page, produced by an XSLT transformation of the raw XML. Parts of each report that are marked as data sources are also stored in a time series database, so that the sysadmin can view their history for the last day, month and year.

The Eyes

Each Eye runs on a monitored machine and periodically pushes information about that machine to the Head. This information should conform to the Argus report schema, which is extensible beyond your wildest dreams. The schema specifies only the container elements on the top two levels of the pushed document: the report element represents a report from a single host and groups a number of probe elements. The contents of each probe are left entirely to the user, as long as they constitute a well-formed XML document fragment. The schema also prescribes attributes that let the probes themselves specify which elements' contents should be used as data sources, to be stored in a time series database.

In this manner, when a new machine is to be monitored, the sysadmin simply installs and configures the Eye on that machine. The Head will store its report as an XML document, regardless of whether it is aware of the actual semantics of the individual probe contents, and the sysadmin will be able to inspect the report at least as a raw XML document. The Head will also automatically create time series databases for new data sources from such a host, along with the relevant graphs for the sysadmin to view.
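To make the two-level structure and the data source attributes concrete, here is a minimal Java sketch, using only the JDK's DOM API, of how a consumer might walk a report and collect the values marked as data sources. It is an illustration, not the Head's actual code. The namespace URI http://example.org/argus/report is a placeholder (the real one is defined by the report schema), and the assumption that report and probe live in that namespace is mine; the attribute names datasource and datasourcedef and the probe's name attribute are taken from the Probes section of this document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DataSourceScan {
    // Placeholder namespace URI; the real one comes from the Argus report schema.
    static final String ARG_NS = "http://example.org/argus/report";

    /** Returns "probe/datasource" -> value for every element marked as a data source. */
    static Map<String, Double> scan(String reportXml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // needed so that getAttributeNS() can see the arg:* attributes
        Document doc = f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(reportXml.getBytes(StandardCharsets.UTF_8)));

        Map<String, Double> result = new LinkedHashMap<String, Double>();
        NodeList probes = doc.getElementsByTagNameNS(ARG_NS, "probe");
        for (int i = 0; i < probes.getLength(); i++) {
            Element probe = (Element) probes.item(i);
            String probeName = probe.getAttribute("name");
            // Probe contents are free-form, so inspect every descendant element.
            NodeList all = probe.getElementsByTagName("*");
            for (int j = 0; j < all.getLength(); j++) {
                Element e = (Element) all.item(j);
                if (e.hasAttributeNS(ARG_NS, "datasource")) {
                    String ds = e.getAttributeNS(ARG_NS, "datasource");
                    result.put(probeName + "/" + ds, Double.valueOf(e.getTextContent().trim()));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<arg:report xmlns:arg='" + ARG_NS + "' xmlns:m='urn:example:mem'>"
                + "<arg:probe name='mem'><m:physical>"
                + "<m:used arg:datasource='physused' arg:datasourcedef='GAUGE'>42.5</m:used>"
                + "</m:physical></arg:probe></arg:report>";
        System.out.println(scan(xml)); // prints {mem/physused=42.5}
    }
}
```

Note that the scan never needs to understand the probe's own namespace (urn:example:mem here, also made up): the marking attributes alone identify the data sources, which is what lets new probes work without Head configuration.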

Not only new machines, but also new probes can be added in such a manner, with no configuration on the side of the Head. Of course, with a bit of Head configuration improvements, this new information can be presented better. When adding new probes to (new or existing) monitored machines, the sysadmin should (but is not required to) tweak the configuration of the Head in the following ways:

  1. Improve the XSLT transformation used to transform report XML documents to HTML so that new, yet unknown probe contents will be transformed properly. Thus, the most recent information will be presented to the sysadmin as a nicely formatted, shiny and coloured HTML instead of raw XML.
  2. Add data source descriptions and graph options to the configuration files, so that new probes will have nicer graphs.
More on this in the Head Configuration section.

The Head

Installation

Installation should be really simple: just deploy the WAR archive in your favourite Java Servlet container.

Before posting the first report, you should check the deployed configuration files and edit them to suit your system, as described in the Configuration section below. Then reload the application, or even the whole container, since the configuration is only read when the application loads. To make sure Argus has picked up your changes, you can inspect the configuration it uses in the web application itself: see Using the Web Application below.

NOTE: the WAR archive available for download is built for Java 6 virtual machines. Deploying it in a container that runs on an older virtual machine will result in general mayhem. In that case, you will have to build your own.

Building from source

Make sure you have all the required libraries: RRD4J, Apache Commons Logging, Apache Tomcat, Java Mail and Log4J. There are links to all of them in the acknowledgements section, where the versions used in development are noted as well. Later versions will probably do too, but you might have to fix the Ant build script (this is at least a known problem with Commons Logging v1.1, which is split into several jar archives).

You will also need Java (at least Java 5, as generics are used) and a recent Ant (v1.6.x was used in development), with Tomcat tasks enabled. For the latter, you should copy catalina-ant.jar from your Tomcat installation into the lib/ directory of your Ant installation.

Unpack the source archive.

Edit the build.properties file so that the paths to the required libraries point to the proper directories on your local system. If you use newer versions of the required libraries, you might also want to tweak the build.xml file in order to fix the locations of the corresponding jar archives: look at the compile.classpath path element and the prepare target.

At this point you might also want to edit the log4j.properties and example/argus-config.xml files so that they contain the proper configuration values for your deployment. This saves you from reconfiguring and reloading the application after deployment.

Simply running ant dist should leave you with a newly compiled WAR archive in the dist/ directory.

Configuration

Basic configuration of Argus is specified in the main Argus configuration file. You will also want to configure logging facilities.

Advanced users, especially if introducing new probes, will also want to tweak the report presentation by editing the report XSL transformation and the looks of the Web pages by adjusting the CSS styles.

Basic Configuration: The Argus Configuration File

The example configuration file includes extensive comments on individual configuration sections and parameters: editing and adapting it to your needs should therefore be self-explanatory. However, for those of you who tend to RTFM (although I've probably never encountered such a person) before trying your luck with editing configuration, here is a brief description of the configuration parameters.

The configuration file of a deployed Argus web application is found at WEB-INF/classes/argus-config.xml in the web application directory tree.

The RRD section

The RRD section contains configuration options related to storage of reported values into round-robin databases.

The path element specifies the full path to the directory where RRD (time series round-robin database) files will be stored. This directory should be writable by the user under whose privileges the Servlet container runs.

The step element specifies the RRD time step in seconds. This parameter should generally be lower than the heartbeat of the probes on any of the monitored machines.

The graph-width and graph-height elements specify the width and height of the graph in the data source view.

The overview-graph-width and overview-graph-height elements specify the width and height of the data source graphs in the host view.

The graphs-per-row element specifies the number of data source graphs in a single row of the host view.

The datasources element groups options for various data sources. Options are given in datasource elements, each of which contains the following information:

  1. the rx element contains a Java regular expression. The options are applied to a data source if and only if the regular expression matches the data source name. If multiple regular expressions (from multiple datasource elements) match, only the first match is used. Although the example configuration contains regular expressions that apply a single datasource element to multiple reported data sources, I advocate writing each rx element so that it matches one and only one of the reported data sources, in order to avoid confusion.
  2. the description element contains a brief description of the data source. It will be shown in the data source view.
  3. the graph element contains optional settings for the data source graph: the only allowed content is the options element as defined by the RRD4J XML template API. For completeness' sake, you can find the excerpt from the RRD4J documentation in the appendix.
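The first-match rule is easy to get wrong when a general rx precedes a specific one. The following Java sketch shows the selection logic as described above: entries are tried in configuration order and the first matching one wins. It is not the Head's code; the data source names are hypothetical, and whether Argus requires the whole name to match (Matcher.matches()) or only part of it (Matcher.find()) is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class DatasourceOptions {
    /** One datasource entry: the compiled rx and its description. */
    static final class Entry {
        final Pattern rx;
        final String description;

        Entry(String rx, String description) {
            this.rx = Pattern.compile(rx);
            this.description = description;
        }
    }

    private final List<Entry> entries = new ArrayList<Entry>();

    /** Appends an entry; order matters, just like in the configuration file. */
    void add(String rx, String description) {
        entries.add(new Entry(rx, description));
    }

    /** Description of the first entry whose rx matches, or null if none matches. */
    String describe(String dataSourceName) {
        for (Entry e : entries) {
            // Assumption: the whole name must match, not just a substring.
            if (e.rx.matcher(dataSourceName).matches()) {
                return e.description;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        DatasourceOptions opts = new DatasourceOptions();
        opts.add("cpu_.*", "Some CPU value");
        opts.add("cpu_load1", "1-minute load average");
        // The later, more specific entry is never reached: the general one matches first.
        System.out.println(opts.describe("cpu_load1")); // Some CPU value
        System.out.println(opts.describe("disk_root")); // null
    }
}
```

Writing each rx so that it matches exactly one data source, as recommended above, makes the order of the datasource elements irrelevant.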

The Triggers section

The triggers section contains the configuration of triggers. A trigger consists of an arbitrary number of conditions and actions. If any of the conditions is fulfilled, the trigger fires and all its actions are executed.

The condition element contains definition of a condition. It contains the following:

  1. the name element contains a unique condition name. The name will be used in trigger definitions to reference the condition.
  2. the class element contains a fully qualified Java class name of a class that implements the si.xlab.research.argus.trigger.ITriggerCondition interface.
  3. the parameters element contains constructor parameters for the condition instance. Its contents can be just about anything. The condition class must provide a public constructor that takes a single parameter of type org.w3c.dom.Element. This parameter represents the parameters element from the condition configuration.

The action element contains definition of an action. It contains the following:

  1. the name element contains a unique action name. The name will be used in trigger definitions to reference the action.
  2. the class element contains a fully qualified Java class name of a class that implements the si.xlab.research.argus.trigger.ITriggerAction interface.
  3. the parameters element contains constructor parameters for the action instance. Its contents can be just about anything. The action class must provide a public constructor that takes a single parameter of type org.w3c.dom.Element. This parameter represents the parameters element from the action configuration.

The trigger element finally defines a trigger. It contains the following:

  1. the name element contains a unique trigger name.
  2. the description element contains a description of the trigger.
  3. the condition elements that follow reference all conditions of the trigger by their names. If any of the conditions is fulfilled, the trigger will fire.
  4. the action elements that come last reference all actions of the trigger by their names. If the trigger fires, all actions will be executed.
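The any-condition/all-actions semantics can be summarised in a few lines of Java. This is a sketch, not the Argus implementation: the report is a plain String instead of a DOM document, and the Condition and Action types are hypothetical simplifications of ITriggerCondition and ITriggerAction. The text does not say whether actions run once per fulfilled condition or once per firing; this sketch assumes the latter and stops at the first fulfilled condition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class TriggerSketch {
    /** Hypothetical simplification of ITriggerAction. */
    interface Action {
        void run(String report, String info);
    }

    /** Hypothetical simplification of a named ITriggerCondition. */
    static final class Condition {
        final String description;
        final Predicate<String> test;

        Condition(String description, Predicate<String> test) {
            this.description = description;
            this.test = test;
        }
    }

    final List<Condition> conditions;
    final List<Action> actions;

    TriggerSketch(List<Condition> conditions, List<Action> actions) {
        this.conditions = conditions;
        this.actions = actions;
    }

    /** Fires if ANY condition holds, then runs ALL actions; returns whether it fired. */
    boolean evaluate(String report) {
        for (Condition c : conditions) {
            if (c.test.test(report)) {
                for (Action a : actions) {
                    a.run(report, c.description);
                }
                return true; // assumption: one firing per report, at the first fulfilled condition
            }
        }
        return false;
    }

    public static void main(String[] args) {
        final List<String> log = new ArrayList<String>();
        Action logAction = (report, info) -> log.add("log: " + info);
        Action mailAction = (report, info) -> log.add("mail: " + info);
        TriggerSketch trigger = new TriggerSketch(
                Arrays.asList(new Condition("disk almost full", r -> r.contains("used=99")),
                              new Condition("swap in use", r -> r.contains("swap=on"))),
                Arrays.asList(logAction, mailAction));
        System.out.println(trigger.evaluate("host=a used=99 swap=on")); // true
        System.out.println(log); // [log: disk almost full, mail: disk almost full]
        System.out.println(trigger.evaluate("host=b used=10 swap=off")); // false
    }
}
```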

The Namespaces section

The namespaces section lists all namespaces used by different probes in the report XML documents, together with their prefixes. This is especially important for XPathCondition trigger conditions, because the proper namespaces need to be referenced via prefixes in the XPath expressions.

The namespace element defines one namespace; it contains:

  1. the prefix element contains the prefix for the namespace, used to reference it in XPath queries.
  2. the uri element contains the namespace URI.

Logging: The log4j Configuration File

This file contains log4j configuration and in turn defines where Argus will log events. We suggest defining two separate logs: one for Argus events and one for logging trigger actions (see LogAction). The latter should have its log level set to INFO. The former should have the log level set to INFO as well during normal operation. However, when reproducing a bug, it should be set to DEBUG, so that full logs may be sent to us.

The log4j configuration file of a deployed Argus web application is found at WEB-INF/classes/log4j.properties in the web application directory tree.

For more information on log4j configuration, see the log4j documentation.

Report Presentation: The Report XSL Transformation

This file contains the XSL transformation used to transform a single report document into HTML that is shown at the top of the host view. If you introduce new probes that contain new elements (from new namespaces), you will want to add new XSL templates for these elements and add an apply-templates directive in the template for the probe element near the top of the existing transformation.

The XSLT file of a deployed Argus web application is found at WEB-INF/classes/report2html.xsl in the web application directory tree.

The Looks: The Argus CSS File

This file contains the style information used by Argus web pages. You may change it as you please. Also, if you change the XSL transformation and introduce new CSS classes, you will want to add definitions of those classes here.

The CSS file of a deployed Argus web application is found at argus.css in the web application directory tree.

Using the Web Application

For the sake of this example, we will assume that the Argus Head web application is accessible at http://localhost/argus-j/.

Hosts view

The front page, available at http://localhost/argus-j/report, lists all monitored hosts along with their name and status. Status can be one of:

Clicking on the host name takes you to the host view.

At the bottom of the page, the following options are available:

  1. Argus report test page takes you to a page used for testing reports from hosts. It shows the last report that has been posted to that page. It is to be used to test and debug output from the Eyes on the monitored machines and is especially useful when you are attempting to introduce new probes, etc. You can post a report to the test page by instructing the Eye to send the report to http://localhost/argus-j/test.
  2. Argus report collection RSS 2.0 feed fetches an RSS feed of the latest reports from all hosts.
  3. Argus Head configuration overview shows the current configuration.
  4. Argus documentation takes you to this very page.

Host view

This view first presents the last report. The presentation is prepared by transforming the last report with the Argus report to HTML XSL transformation.

The last report is followed by graphs of all data sources, which present the data source values for the last day with a resolution of the host-specified heartbeat. Clicking on any graph takes you to the data source view.

Finally, the available actions follow:

Data Source View

The data source view presents graphs of the value of the selected data source for the last day with a resolution of the host-specified heartbeat period, hourly average values for the last month, and daily average values for the last year.

Triggers

This section describes the trigger condition and action classes that are available in the Argus distribution, and explains how to implement a new action or condition class.

Conditions

The following condition classes are available in the Argus distribution:

  1. XPathCondition

XPathCondition

The XPathCondition allows the user to specify an XPath query: if the query returns any XML node (that is: the returned NODESET is non-empty), the condition is fulfilled. It requires two parameters in the configuration file:

  1. expression contains the XPath query,
  2. description contains a human-readable description that will be passed to the actions: the logging action will log this description along with other information, the alarm action will set the host info to this description, etc.
Examples of XPath queries are in the example Argus configuration file.

NOTE: it is of the utmost importance that you prefix the element names in the XPath expression with the proper namespace prefixes. See the section on namespaces configuration.
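Why the prefixes matter is easiest to see with the JDK's own javax.xml.xpath API. The sketch below mimics the XPathCondition rule (a non-empty NODESET means the condition is fulfilled) and maps prefixes to URIs the way the namespaces section does. It is not the Argus class itself; the urn:example:disk namespace and the d prefix are hypothetical, and for brevity the report and probe elements are left without a namespace.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathCheck {
    /** Resolves prefixes from a prefix -> URI map, like the namespaces configuration section. */
    static NamespaceContext context(final Map<String, String> prefixes) {
        return new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                String uri = prefixes.get(prefix);
                return uri != null ? uri : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) { throw new UnsupportedOperationException(); }
            public Iterator<String> getPrefixes(String uri) { throw new UnsupportedOperationException(); }
        };
    }

    /** True if the expression selects at least one node, i.e. the NODESET is non-empty. */
    static boolean fulfilled(Document report, String expression, Map<String, String> prefixes)
            throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(context(prefixes));
        NodeList nodes = (NodeList) xpath.evaluate(expression, report, XPathConstants.NODESET);
        return nodes.getLength() > 0;
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // without this, prefixed XPath steps never match
        return f.newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> ns = new HashMap<String, String>();
        ns.put("d", "urn:example:disk"); // hypothetical probe namespace
        Document doc = parse("<report xmlns:d='urn:example:disk'><probe name='disk'>"
                + "<d:partition><d:used>97</d:used></d:partition></probe></report>");
        System.out.println(fulfilled(doc, "//d:partition[d:used > 90]", ns)); // true
        // Unprefixed names only match elements in no namespace, so this finds nothing.
        System.out.println(fulfilled(doc, "//partition[used > 90]", ns));     // false
    }
}
```

The second query fails silently rather than raising an error, which is exactly why a missing prefix is such an easy mistake to make.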

Implementing a new condition class

Implementing a new condition class is simple. A class that can be used as a condition must meet the following two requirements:

  1. it should implement the si.xlab.research.argus.trigger.ITriggerCondition interface; the check() method is passed the report XML document as a parameter and should return true if the condition was fulfilled, in which case it should also record why the condition was fulfilled in the TriggerConditionResult parameter, using its setInfo() method.
  2. it should provide a public constructor that takes an org.w3c.dom.Element parameter. This parameter represents the parameters element from the condition configuration.
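As an illustration of the two requirements, here is a sketch of a hypothetical condition that fires when a given probe is missing from a report. The ITriggerCondition interface and TriggerConditionResult class below are local stand-ins written from the description above; the real types live in si.xlab.research.argus.trigger and their exact signatures may differ. The probe-name parameter element is likewise made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical stand-ins for the Argus trigger types; real signatures may differ.
interface ITriggerCondition {
    boolean check(Document report, TriggerConditionResult result);
}

class TriggerConditionResult {
    private String info;
    public void setInfo(String info) { this.info = info; }
    public String getInfo() { return info; }
}

/** Fulfilled when the probe named in the (hypothetical) probe-name parameter is absent. */
public class MissingProbeCondition implements ITriggerCondition {
    private final String probeName;

    // Requirement 2: a public constructor taking the parameters element.
    public MissingProbeCondition(Element parameters) {
        NodeList n = parameters.getElementsByTagName("probe-name");
        if (n.getLength() == 0) {
            throw new IllegalArgumentException("missing probe-name parameter");
        }
        this.probeName = n.item(0).getTextContent().trim();
    }

    // Requirement 1: return true when fulfilled and explain why via setInfo().
    public boolean check(Document report, TriggerConditionResult result) {
        NodeList probes = report.getElementsByTagNameNS("*", "probe"); // any namespace
        for (int i = 0; i < probes.getLength(); i++) {
            if (probeName.equals(((Element) probes.item(i)).getAttribute("name"))) {
                return false;
            }
        }
        result.setInfo("probe '" + probeName + "' missing from report");
        return true;
    }

    static Element parseElement(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        ITriggerCondition c = new MissingProbeCondition(
                parseElement("<parameters><probe-name>disk</probe-name></parameters>"));
        Document report = parseElement("<report><probe name='cpu'/></report>").getOwnerDocument();
        TriggerConditionResult r = new TriggerConditionResult();
        System.out.println(c.check(report, r) + " " + r.getInfo());
        // prints: true probe 'disk' missing from report
    }
}
```

Validating the parameters in the constructor means a misconfigured condition is reported when the configuration is loaded, rather than each time a report arrives.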

Actions

The following action classes are available in the Argus distribution:

  1. LogAction
  2. MailAction
  3. AlarmAction

LogAction

The LogAction will log the report and the information from the fulfilled condition. It requires exactly one parameter in the configuration file:

  1. log-id contains the name of the log4j logging facility that should be used for logging.

MailAction

The MailAction will send an e-mail when a trigger fires. It requires at least three parameters in the configuration file:

  1. the smtp element contains the host name or IP address of the SMTP server to use.
  2. the sender element contains the e-mail address that will be used in the From field of the e-mail.
  3. the recipient element contains the e-mail address of the recipient. This element may appear any number of times, but at least once.

AlarmAction

The AlarmAction will set the host status to ALARM when a trigger fires. It supports no parameters.

Implementing a new action class

Implementing a new action class is simple. A class that can be used as an action must meet the following two requirements:

  1. it should implement the si.xlab.research.argus.trigger.ITriggerAction interface; the run() method can do just about anything, based on the report XML document that caused the trigger to fire and a TriggerConditionResult parameter that was filled in by the condition that was fulfilled.
  2. it should provide a public constructor that takes an org.w3c.dom.Element parameter. This parameter represents the parameters element from the action configuration.
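A matching sketch for actions: a hypothetical action that prints one line per firing, with a line prefix taken from an optional prefix parameter. As before, ITriggerAction and TriggerConditionResult are local stand-ins written from the description above, not the real Argus types, whose signatures may differ. The second, package-private constructor exists only so the output stream can be redirected.

```java
import java.io.ByteArrayInputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical stand-ins for the Argus trigger types; real signatures may differ.
interface ITriggerAction {
    void run(Document report, TriggerConditionResult result);
}

class TriggerConditionResult {
    private String info;
    public void setInfo(String info) { this.info = info; }
    public String getInfo() { return info; }
}

/** Prints one line per firing; the prefix comes from an optional prefix parameter. */
public class PrintAction implements ITriggerAction {
    private final String prefix;
    private final PrintStream out;

    // The constructor Argus would call: it receives the parameters element.
    public PrintAction(Element parameters) {
        this(parameters, System.out);
    }

    // Extra constructor so the output stream can be swapped, e.g. in tests.
    PrintAction(Element parameters, PrintStream out) {
        NodeList n = parameters.getElementsByTagName("prefix");
        this.prefix = n.getLength() > 0 ? n.item(0).getTextContent().trim() : "ARGUS";
        this.out = out;
    }

    public void run(Document report, TriggerConditionResult result) {
        int probes = report.getElementsByTagNameNS("*", "probe").getLength();
        out.println(prefix + ": " + result.getInfo() + " (" + probes + " probe(s) in report)");
    }

    static Element parseElement(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        ITriggerAction action = new PrintAction(
                parseElement("<parameters><prefix>ALERT</prefix></parameters>"));
        TriggerConditionResult result = new TriggerConditionResult();
        result.setInfo("disk almost full");
        action.run(parseElement("<report><probe name='disk'/></report>").getOwnerDocument(), result);
        // prints: ALERT: disk almost full (1 probe(s) in report)
    }
}
```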

The Eye

Installation

Simply untar the archive to a directory of your choice. Configure the Eye by editing the argus-probes.txt file and post a report to the test page:

     ./argus-eye.pl -u http://your.head.host/argus-j-0.1.0/test -c argus-probes.txt -h 30

You can inspect the posted report by pointing your browser at the test page URI. Keep tweaking the configuration until the posted report suits your needs. Finally, add a cron job that will run the Eye periodically; the following line in your crontab should run the Eye every minute:

     * * * * * /path/to/your/argus/installation/argus-eye.pl -u http://your.head.host/argus-j-0.1.0/store -c /path/to/your/argus/installation/argus-probes.txt -h 120

Argus Eye requires the following parameters:

  1. -u <head-web-application-uri> gives the URI where the report should be posted. This is the base URI of your Head deployment, ending in /test if you want to test the report, or /store for the real Head storage. HTTP and HTTPS URIs are allowed.
  2. -c <path-to-configuration> gives the path to the configuration file to use.
  3. -h <heartbeat-in-seconds> states the heartbeat: if an Eye does not report within that many seconds, the Head considers the data source values for that host in the last period unknown. In general, the heartbeat should be somewhat higher than the real period of running the Eye; twice the period seems an appropriate choice.
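The timing rules scattered through this manual (heartbeat above the Eye's run period, roughly twice it; RRD step, from the Head's RRD section, below the heartbeat) can be checked mechanically. This small Java sketch encodes just those two rules of thumb; the class and method names are made up, and the 300-second step in the second example is an arbitrary bad value, not a recommended default.

```java
import java.util.ArrayList;
import java.util.List;

public class TimingCheck {
    /** Heartbeat suggested by the text: twice the Eye's real run period. */
    static int suggestedHeartbeat(int eyePeriodSeconds) {
        return 2 * eyePeriodSeconds;
    }

    /** Returns warnings for the given settings; an empty list means they look sane. */
    static List<String> check(int eyePeriodSeconds, int heartbeatSeconds, int rrdStepSeconds) {
        List<String> warnings = new ArrayList<String>();
        if (heartbeatSeconds <= eyePeriodSeconds) {
            // With no slack, a report that arrives slightly late leaves that period unknown.
            warnings.add("heartbeat " + heartbeatSeconds + "s should exceed the Eye period "
                    + eyePeriodSeconds + "s");
        }
        if (rrdStepSeconds >= heartbeatSeconds) {
            warnings.add("RRD step " + rrdStepSeconds + "s should be lower than the heartbeat "
                    + heartbeatSeconds + "s");
        }
        return warnings;
    }

    public static void main(String[] args) {
        // The crontab example: run every 60 s with -h 120.
        System.out.println(suggestedHeartbeat(60)); // 120
        System.out.println(check(60, 120, 60));     // []
        System.out.println(check(60, 60, 300).size()); // 2
    }
}
```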

Probes

Probes are simple executables that print an XML document fragment representing data from a single probe to stdout. The Eye runs all configured probes and combines their output into a single report. You can safely run the probes directly and see their output on the command line; this is especially useful for checking the output of a probe you have changed or newly implemented.

Marking the data sources

The default set of probes marks data sources in their output. However, you may change which values are reported as data sources. A data source is marked in the following manner:

  1. only elements whose sole content is a single real number may be marked as a data source.
  2. such an element must be given two attributes in order to mark it as a data source to be stored in the RRD databases and have its history available as graphs (the arg prefix corresponds to the Argus schema namespace):
    1. arg:datasource with its value set to the name of the data source, and
    2. arg:datasourcedef with its value set to GAUGE or COUNTER in order to determine the data source type. GAUGE corresponds to data sources which report the desired value for the last interval by themselves (load average or percentage of used space). COUNTER corresponds to data sources which constantly increment (total number of bytes received on a network interface).
NOTE: the total length of the probe name (reported in the name attribute of the probe element) and the data source name must not exceed 19 characters. This is due to limitations of RRD4J.
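Since the Head silently depends on these rules, it can pay to check new probe output before deploying it. The Java sketch below validates one marked element against the three rules above. What exactly counts as "a single real number" is an assumption here (optional sign, digits, optional fraction and exponent, surrounding whitespace ignored); the sketch deliberately rejects values such as NaN.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class DataSourceRules {
    static final int MAX_COMBINED_LENGTH = 19; // RRD4J limit quoted in the text

    // Assumed notion of "a single real number": sign, digits, optional fraction and exponent.
    private static final Pattern REAL =
            Pattern.compile("[-+]?(\\d+(\\.\\d*)?|\\.\\d+)([eE][-+]?\\d+)?");

    /** Validates one marked element; returns the problems found (empty list means OK). */
    static List<String> validate(String probeName, String dsName, String dsDef, String content) {
        List<String> problems = new ArrayList<String>();
        if (probeName.length() + dsName.length() > MAX_COMBINED_LENGTH) {
            problems.add("'" + probeName + "' + '" + dsName + "' exceeds "
                    + MAX_COMBINED_LENGTH + " characters");
        }
        if (!"GAUGE".equals(dsDef) && !"COUNTER".equals(dsDef)) {
            problems.add("datasourcedef must be GAUGE or COUNTER, got '" + dsDef + "'");
        }
        if (!REAL.matcher(content.trim()).matches()) {
            problems.add("content '" + content + "' is not a single real number");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate("disk", "root_used_pct", "GAUGE", " 87.5 ")); // []
        // 7 + 19 = 26 characters: one problem, the combined name is too long.
        System.out.println(validate("network", "eth0_bytes_received", "COUNTER", "123456"));
    }
}
```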

Disk probe

The disk probe reports the status of mounted partitions: total space, free space and the percentage of used space. It accepts an arbitrary number of command line parameters, all of which are regular expressions. If no command line parameters are given, all partitions are reported. Otherwise, only those that match at least one of the regular expressions are reported.
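The probe itself is a Perl script, but the filtering rule is language-neutral; here it is as a short Java sketch. It assumes the expressions are matched against mount points and that a match anywhere in the name counts, as with an unanchored Perl regular expression; neither detail is stated in this manual. The mount points are made-up examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

public class ProbeFilter {
    /** All names when no patterns are given; otherwise names matched by at least one pattern. */
    static List<String> select(List<String> names, List<String> regexes) {
        if (regexes.isEmpty()) {
            return new ArrayList<String>(names);
        }
        List<Pattern> patterns = new ArrayList<Pattern>();
        for (String rx : regexes) {
            patterns.add(Pattern.compile(rx));
        }
        List<String> kept = new ArrayList<String>();
        for (String name : names) {
            for (Pattern p : patterns) {
                // Assumption: unanchored matching, so find() rather than matches().
                if (p.matcher(name).find()) {
                    kept.add(name);
                    break;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> mounts = Arrays.asList("/", "/home", "/boot", "/mnt/backup");
        System.out.println(select(mounts, Collections.<String>emptyList())); // [/, /home, /boot, /mnt/backup]
        System.out.println(select(mounts, Arrays.asList("^/$", "^/home")));   // [/, /home]
    }
}
```

The network probe described below applies the same rule to interface names.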

Memory probe

The memory probe reports memory status: total amount, free amount and the percentage of used memory. It does so separately for physical memory and swap. It accepts no command line parameters.

CPU probe

The CPU probe reports the 1, 5 and 15 minute load averages, and the number of CPU ticks spent since boot on behalf of processes in user and kernel space, on niced processes, and idly doing nothing. No command line parameters are accepted.

Network probe

The network probe reports the total number of bytes received and transmitted for all network interfaces. It accepts an arbitrary number of command line parameters, all of which are regular expressions. If no command line parameters are given, all interfaces are reported. Otherwise, only those that match at least one of the regular expressions are reported.
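Byte totals like these are what the COUNTER type is meant for: the stored quantity of interest is the rate of change between two reports, not the raw total. The Java sketch below shows that arithmetic for two samples. It is an illustration of the idea only; in particular, it treats a counter that went backwards (for example after a reboot) as unknown, whereas RRD tools may handle counter wraparound differently. The sample numbers are made up.

```java
public class CounterRate {
    /** Per-second rate between two COUNTER samples; NaN if the counter went backwards. */
    static double rate(long previousValue, long previousTimeSec, long currentValue, long currentTimeSec) {
        long elapsed = currentTimeSec - previousTimeSec;
        if (elapsed <= 0) {
            throw new IllegalArgumentException("samples must be in chronological order");
        }
        long delta = currentValue - previousValue;
        if (delta < 0) {
            // Counter reset (e.g. after a reboot): this sketch treats the interval as unknown.
            return Double.NaN;
        }
        return (double) delta / elapsed;
    }

    public static void main(String[] args) {
        // Two reports 60 s apart for one interface (made-up numbers).
        System.out.println(rate(1_000_000L, 0L, 1_600_000L, 60L)); // 10000.0 bytes per second
    }
}
```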

Configuration

The configuration file for the Eye lists command lines for running probes, one on each line. That is: path to the probe executable, followed by (optional) command line parameters. As simple as that.

NOTE: it is advisable that you list absolute paths to probe executables in the configuration file, especially when running the Eye from cron. Relative paths might prove a problem in this case.
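The configuration format is simple enough to parse in a few lines, and the absolute-path advice can be checked at the same time. The Java sketch below splits each line into an executable and its arguments and reports executables given as relative paths. Ignoring blank lines and splitting arguments on whitespace without shell-style quoting are assumptions about how the Eye reads the file, as are the probe paths in the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EyeConfig {
    /** Splits the configuration into command lines: executable first, then its arguments. */
    static List<List<String>> parse(String configText) throws IOException {
        List<List<String>> commands = new ArrayList<List<String>>();
        BufferedReader reader = new BufferedReader(new StringReader(configText));
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            // Assumption: arguments are separated by whitespace, with no shell-style quoting.
            commands.add(Arrays.asList(trimmed.split("\\s+")));
        }
        return commands;
    }

    /** Returns the executables that are not absolute (Unix-style) paths. */
    static List<String> relativeExecutables(List<List<String>> commands) {
        List<String> relative = new ArrayList<String>();
        for (List<String> command : commands) {
            if (!command.get(0).startsWith("/")) {
                relative.add(command.get(0));
            }
        }
        return relative;
    }

    public static void main(String[] args) throws IOException {
        String config = "/opt/argus/probes/disk.pl ^/$ ^/home\n"
                      + "probes/cpu.pl\n";
        List<List<String>> commands = parse(config);
        System.out.println(commands);
        // [[/opt/argus/probes/disk.pl, ^/$, ^/home], [probes/cpu.pl]]
        System.out.println(relativeExecutables(commands)); // [probes/cpu.pl]
    }
}
```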

Alternative Eyes

Well, to be honest, the current Eye and the probes are fairly simple hacks. You might want to write an alternative Eye and probes (especially for other operating systems, and especially for those where a Perl interpreter is not commonly present). It should be a simple task; just make sure that the output posted to the Head is a well-formed XML document that conforms to the report schema.

Future Work

There is a lot to do in order to improve Argus. Well, since this is a strictly focused project, the author only intends to add functionality (a) if it is needed here at XLab, or (b) if someone asks nicely. Option (b) might require some monetary compensation to accompany the plea as well, if it proves to be a lot of work. ;)

Then again, as Argus is free software, both as in free speech and as in free beer, and comes with full source code, you are welcome to improve and enhance it yourself. If you do, please share your enhancements with us and the rest of the world, even if they are for your personal use and pleasure only.

Anyway, among the improvements that have been envisioned are the following:

  1. Allow for server-side (Head) probes: these would be used not so much to monitor machine attributes (such as CPU usage and disk usage) but services on a machine and their availability from a remote machine's point of view: web, ssh, ftp servers and such. Perhaps even the correct functioning of web applications (via checking of web page contents).
  2. Improve graph control through configuration: make it possible to group multiple data sources on a single graph, hide graphs, etc.
  3. Add a basic web-based administrative interface: configuration file editing, reconfiguration and removal of hosts and their databases.
  4. Triggers on the RRD databases (including data source history), not just on the last reported values.

Bug reports, feature requests, much appreciated patches or just simple thank-yous accompanied by a brief blurb about what you use Argus for should be directed to e-mail address argus at lists dot xlab dot si.

Resources

  1. Report schema: documentation and schema itself
  2. Head configuration schema: documentation and schema itself

Acknowledgements

Argus was developed and runs with a little help, not from my friends, but from the following pieces of software (versions used for development are indicated in parentheses):

  1. time series round-robin databases are maintained by the wonderful Java port of RRDTool, RRD4J, which also takes care of drawing the graphs (RRD4J v2.0.5),
  2. Apache Commons logging library and log4j take care of all the logging requirements of Argus (commons logging v1.0.4, log4j v1.2.13),
  3. Sun Java mail library takes care of e-mail notifications (javamail v1.4),
  4. Argus was developed with Eclipse and Ant, and was tested on the Apache Tomcat Java Servlet container.
Thank you.

Appendix: RRD4J Graph Template Options XML Format

Here is an example of a properly formatted XML template with all options that may be used with Argus. Note that RRD4J allows other options as well and that Argus does not check whether those are present; if you use them, the behaviour of Argus is undefined:

     <options>
         <anti_aliasing>true</anti_aliasing>
         <time_grid>
             <show_grid>true</show_grid>
             <!-- allowed units: second, minute, hour, day, week, month, year -->
             <minor_grid_unit>minute</minor_grid_unit>
             <minor_grid_unit_count>60</minor_grid_unit_count>
             <major_grid_unit>hour</major_grid_unit>
             <major_grid_unit_count>2</major_grid_unit_count>
             <label_unit>hour</label_unit>
             <label_unit_count>2</label_unit_count>
             <label_span>1200</label_span>
             <!-- use SimpleDateFormat or strftime-like format to format labels -->
             <label_format>dd-MMM-yy</label_format>
         </time_grid>
         <value_grid>
             <show_grid>true</show_grid>
             <grid_step>100.0</grid_step>
             <label_factor>5</label_factor>
         </value_grid>
         <no_minor_grid>true</no_minor_grid>
         <alt_y_grid>true</alt_y_grid>
         <alt_y_mrtg>true</alt_y_mrtg>
         <alt_autoscale>true</alt_autoscale>
         <alt_autoscale_max>true</alt_autoscale_max>
         <units_exponent>3</units_exponent>
         <units_length>13</units_length>
         <vertical_label>Speed (kbits/sec)</vertical_label>
         <background_image>luka.png</background_image>
         <overlay_image>luka.png</overlay_image>
         <unit>kilos</unit>
         <lazy>false</lazy>
         <min_value>0</min_value>
         <max_value>5000</max_value>
         <rigid>true</rigid>
         <base>1000</base>
         <logarithmic>false</logarithmic>
         <colors>
             <canvas>#FFFFFF</canvas>
             <back>#FFFFFF</back>
             <shadea>#AABBCC</shadea>
             <shadeb>#DDDDDD</shadeb>
             <grid>#FF0000</grid>
             <mgrid>#00FF00</mgrid>
             <font>#FFFFFF</font>
             <frame>#EE00FF</frame>
             <arrow>#FF0000</arrow>
         </colors>
         <no_legend>false</no_legend>
         <only_graph>false</only_graph>
         <force_rules_legend>false</force_rules_legend>
         <title>This is a title</title>
         <fonts>
             <small_font>
                 <name>Courier</name>
                 <style>bold italic</style>
                 <size>12</size>
             </small_font>
             <large_font>
                 <name>Courier</name>
                 <style>plain</style>
                 <size>11</size>
             </large_font>
         </fonts>
         <first_day_of_week>SUNDAY</first_day_of_week>
     </options>
 
Notes on the template syntax: