Argus is a monitor for computer networks. Well, it is primarily used to monitor a set of servers and their various attributes, such as disk space, available memory, CPU and network usage, etc.
Argus started as an in-house project, used for monitoring the servers in the ISL Light Grid. However, thanks to its extensible architecture, it is useful for monitoring just about any set of machines: the author uses Argus to monitor two mid-size clusters used for testing distributed applications, as well as a bunch of machines he maintains at home, at work and in several other places all over the world.
You might be inclined to ask: whence the name? Well, in order to answer that, I suggest you take a peek at Greek mythology. This might be a nice place to start. Briefly: Argus Panoptes (that is: Argus the All-seeing) was a hundred-eyed giant in Greek mythology, a servant of Hera, who set him to watch over a certain cow she suspected was more than just a cow. Actually, the cow was really Io, another of Zeus' many girls, and Zeus, horny as he always was, sent Hermes, the messenger, to bore Argus with his stories until all his eyes closed and he fell asleep. Hermes finally slew the sleeping giant, freeing Io for his master. Of course, in contrast to the mythological Argus, our Argus never falls asleep, no matter how boring the machines he watches over are!
Actually, there are many Argi in the world of computer software: the hundred-eyes metaphor fits monitoring software so nicely. So, there is also this Argus and this one as well. I know: why a third one in spite of all the confusion? Well, as stated above, the metaphor fits so nicely... Besides, the name was all over the code and documentation before I found out about those. So, yes, we have a third Argus. Perhaps you should also check the other two to see if they fit your needs better: they both have to do with monitoring hosts and networks.
More questions? Is it the logo that has you wondering what butterflies have to do with Greek myths? Well, the winged little bugger in the top left corner is actually a Meadow Argus butterfly, Junonia villida, named for the resemblance of patterns on its wings to eyes. I hope this satisfies your inquisitive mind, otherwise you should remember what curiosity did to the cat...
Built web application: argus-j-0.9.0.war. Please note that it was built with the Java 1.6 compiler.
Source distribution: argus-j-0.9.0.tar.gz.
Installable Perl scripts: argus-0.9.0.tar.gz.
Source distribution: argus-0.9.0-src.tar.gz.
Argus' architecture was designed with one single goal in mind: to let the sysadmin add new machines to be monitored, and even new probes providing new information with zero configuration on the server side.
Thus, Argus consists of its Head, a central web application that collects reports from monitored machines and presents them to the user through a set of web pages, and an arbitrary number of its Eyes, one on each monitored machine.
The Head stores the last report from each machine in memory and is able to present it to the user either as raw XML or as a nicely formatted web page by using an XSLT transformation of the raw XML. Parts of each report that are marked as data sources are also stored in a time series database, so that the sysadmin can view the history for the last day, month and year.
Each Eye runs on a monitored machine and periodically pushes information on that machine to the Head. This information should conform to the Argus report schema, which is extensible beyond your wildest dreams. The schema specifies only the container elements on the top two levels of the pushed document: the report element represents a report from a single host and groups a number of probe elements. The contents of each probe are left completely to the user, as long as they constitute a well-formed XML document fragment. The schema also prescribes attributes that let the probes themselves specify which elements' contents should be used as data sources that will be stored into a time series database.
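To make the two-level structure a bit more tangible, here is a rough Python sketch that wraps two probe fragments into one report. Only the report and probe element names come from the description above; the host and name attributes, the child elements and the absence of a namespace are assumptions made for this example, so check the actual report schema for the real names.

```python
import xml.etree.ElementTree as ET


def build_report(host, probe_fragments):
    """Wrap well-formed probe fragments in a single report element.

    The 'host' attribute is hypothetical; the real schema may differ.
    """
    report = ET.Element("report", {"host": host})
    for fragment in probe_fragments:
        # ET.fromstring raises ParseError if a fragment is not well-formed,
        # so malformed probe output is rejected before it is posted.
        report.append(ET.fromstring(fragment))
    return ET.tostring(report, encoding="unicode")


# Probe contents are free-form; these children are made up for illustration.
fragments = [
    '<probe name="disk"><partition mount="/"><used-percent>42</used-percent></partition></probe>',
    '<probe name="memory"><free-kb>123456</free-kb></probe>',
]
report_xml = build_report("node01", fragments)
```

The point of the sketch is merely that the outer two levels are fixed while everything below a probe element is up to the probe.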
In this manner, when a new machine is to be monitored, the sysadmin simply installs and configures the Eye on that machine. The Head will store its report as an XML document, regardless of whether it is aware of the actual semantics of the individual probe contents, and the sysadmin will be able to inspect the report at least as a raw XML document. Also, the Head will automatically create time series databases for new datasources from such a host and create relevant graphs for the sysadmin to view.
Not only new machines, but also new probes can be added in such a manner, with no configuration on the side of the Head. Of course, with a bit of Head configuration improvements, this new information can be presented better. When adding new probes to (new or existing) monitored machines, the sysadmin should (but is not required to) tweak the configuration of the Head in the following ways:
Installation should be really simple: simply deploy the WAR archive in your favourite Java Servlet container.
Prior to posting the first report, you should check the deployed configuration files, and edit them to suit your system as described here. Then, reload the application or even the whole container (since configuration is only read when the application loads). You can inspect the configuration used by Argus in the web application itself to make sure it has picked up your changes: read about using the Argus web application.
NOTE: the WAR archive that is available for download is built for Java 6 virtual machines. Deploying it in a container that runs in an older virtual machine will result in general mayhem. In that case, you will have to build your own.
Make sure you have all the required libraries: RRD4J, Apache Commons Logging, Apache Tomcat, Java Mail and Log4J. There are links to all of them in the acknowledgements section. Versions used in development are noted there as well. Later versions will probably do as well, but you might have to fix the Ant build script (at least this is a known problem when using Commons Logging v1.1 that is divided into more jar archives).
You will also need Java (at least Java 5, as generics are used) and a recent Ant (v1.6.x was used in development), with Tomcat tasks enabled. For the latter, you should copy catalina-ant.jar from your Tomcat installation into the lib/ directory of your Ant installation.
Unpack the source archive.
Edit the build.properties file so that the paths to the required libraries point to the proper directories on your local system. If you use newer versions of the required libraries, you might also want to tweak the build.xml file in order to fix the location of the corresponding jar archives: you should look at the compile.classpath path element and the prepare target.
At this point you might also want to edit log4j.properties and example/argus-config.xml files to contain the proper configuration values for your deployment. This will save you later configuration and reloading of the application after deployment.
Simply running ant dist should leave you with a newly compiled WAR archive in the dist/ directory.
Basic configuration of Argus is specified in the main Argus configuration file. You will also want to configure logging facilities.
Advanced users, especially if introducing new probes, will also want to tweak the report presentation by editing the report XSL transformation and the looks of the Web pages by adjusting the CSS styles.
The example configuration file includes extensive comments on individual configuration sections and parameters: editing and adapting it to your needs should therefore be self-explanatory. However, for those of you who tend to RTFM (although I've probably never encountered such a person) before trying your luck with editing configuration, here is a brief description of the configuration parameters.
The configuration file of a deployed Argus web application is found at WEB-INF/classes/argus-config.xml in the web application directory tree.
The RRD section contains configuration options related to storage of reported values into round-robin databases.
The path element specifies the full path to the directory where RRD (time series round-robin database) related files will be stored. This directory should be writable for the user under whose privileges the Servlet Container runs.
The step element specifies the RRD time step in seconds. This parameter should generally be lower than the heartbeat of the probes on any of the monitored machines.
The graph-width and graph-height elements specify the width and height of the graph in the data source view.
The overview-graph-width and overview-graph-height elements specify the width and height of the data source graphs in the host view.
The graphs-per-row element specifies the number of data source graphs in a single row of the host view.
The datasources element groups options for various data sources. Options are given in datasource elements, each of which contains the following information:
The triggers section contains the configuration of triggers. A trigger consists of an arbitrary number of conditions and actions. If any of the conditions is fulfilled, the trigger fires and all its actions are executed.
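The firing rule just described can be sketched in a few lines of Python. This is only a model of the semantics (any condition fulfilled, then all actions run); the Trigger class, its method name and the use of plain callables are hypothetical and say nothing about how the Head actually implements triggers.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Trigger:
    """Hypothetical model of a trigger: fires when any condition holds."""
    conditions: List[Callable[[str], bool]] = field(default_factory=list)
    actions: List[Callable[[str], None]] = field(default_factory=list)

    def process(self, report):
        # any() mirrors "if any of the conditions is fulfilled"
        if any(condition(report) for condition in self.conditions):
            # once fired, every action runs, in configuration order
            for action in self.actions:
                action(report)
            return True
        return False


fired = []
trigger = Trigger(
    conditions=[lambda r: "ALARM" in r, lambda r: "disk-full" in r],
    actions=[fired.append, lambda r: fired.append("mailed")],
)
```

Calling trigger.process() with a report that satisfies just the second condition still runs both actions; a report that satisfies neither runs none.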
The condition element contains the definition of a condition. It contains the following:
The action element contains the definition of an action. It contains the following:
The trigger element finally defines a trigger. It contains the following:
The namespace element defines one namespace; it contains:
This file contains log4j configuration and in turn defines where Argus will log events. We suggest defining two separate logs: one for Argus events and one for logging trigger actions (see LogAction). The latter should have its log level set to INFO. The former should have the log level set to INFO as well during normal operation. However, when reproducing a bug, it should be set to DEBUG, so that full logs may be sent to us.
The log4j configuration file of a deployed Argus web application is found at WEB-INF/classes/log4j.properties in the web application directory tree.
For more info on log4j configuration look here.
This file contains the XSL transformation used to transform a single report document into HTML that is shown at the top of the host view. If you introduce new probes that contain new elements (from new namespaces), you will want to add new XSL templates for these elements and add an apply-templates directive in the template for the probe element near the top of the existing transformation.
The XSLT file of a deployed Argus web application is found at WEB-INF/classes/report2html.xsl in the web application directory tree.
This file contains the style information used by Argus web pages. You may change it as you please. Also, if you change the XSL transformation and introduce new CSS classes, you will want to add definitions of those classes here.
The CSS file of a deployed Argus web application is found at argus.css in the web application directory tree.
For the sake of this example, we will assume that the Argus Head web application is accessible at http://localhost/argus-j/.
The front page, available at http://localhost/argus-j/report, lists all monitored hosts, reporting their name and status. Status can be one of:
At the bottom of the page, the following options are available:
In this view, first the last report is presented. The report presentation is prepared by transforming the last report with the Argus report to HTML XSL transformation.
The last report is followed by graphs of all data sources, which present the data source values for the last day with a resolution of the host-specified heartbeat. Clicking on any graph brings you to the data source view.
Finally, actions follow:
The data source view presents graphs of the value of the selected data source for the last day with a resolution of the host-specified heartbeat period, hourly average values for the last month, and daily average values for the last year.
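As a back-of-the-envelope illustration of those three resolutions, the following Python sketch counts how many values each graph covers. How Argus actually sizes its round-robin archives is not stated here; the 31-day month and 366-day year are assumed upper bounds, and the function name is made up.

```python
def graph_points(heartbeat_seconds):
    """Number of values behind the day, month and year graphs.

    Assumes a 31-day month and a 366-day year (upper bounds chosen for
    this sketch, not values taken from the Argus configuration).
    """
    seconds_per_day = 24 * 60 * 60
    return {
        "last_day": seconds_per_day // heartbeat_seconds,  # one value per heartbeat
        "last_month": 31 * 24,                             # one hourly average per hour
        "last_year": 366,                                  # one daily average per day
    }
```

With a heartbeat of 120 seconds, for example, the daily graph covers 720 values, while the monthly and yearly graphs stay at 744 and 366 regardless of the heartbeat.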
This section describes the trigger condition and action classes that are available in the Argus distribution, and explains how to implement a new action or condition class.
The following condition classes are available in the Argus distribution:
The XPathCondition allows the user to specify an XPath query: if the query returns any XML node (that is: the returned NODESET is non-empty), the condition is fulfilled. It requires two parameters in the configuration file:
NOTE: it is of utter importance that you prefix the element names in the XPath expression with proper namespace prefixes. See the section on namespaces configuration.
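To see why the prefixes matter, consider this small Python sketch. It uses the limited XPath subset of Python's xml.etree.ElementTree rather than the XPath engine used by the Head, and the namespace URI, prefix and element names are invented for the example. The condition counts as fulfilled when the query selects at least one node.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and element names, for illustration only.
NAMESPACES = {"d": "http://example.org/argus/disk"}

REPORT = """
<report>
  <probe>
    <d:partition xmlns:d="http://example.org/argus/disk">
      <d:status>full</d:status>
    </d:partition>
  </probe>
</report>
"""


def condition_fulfilled(report_xml, xpath, namespaces):
    """Return True if the query selects at least one node."""
    root = ET.fromstring(report_xml)
    return len(root.findall(xpath, namespaces)) > 0


# Same query, with and without namespace prefixes on the element names.
with_prefix = condition_fulfilled(REPORT, ".//d:partition[d:status='full']", NAMESPACES)
without_prefix = condition_fulfilled(REPORT, ".//partition[status='full']", NAMESPACES)
```

Here with_prefix is True and without_prefix is False: the unprefixed query looks for elements in no namespace, finds none, and the condition silently never fires.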
Implementing a new condition class is simple. A class that can be used as a condition should adhere to the following two:
The following action classes are available in the Argus distribution:
The LogAction will log the report and the information from the fulfilled condition. It requires exactly one parameter in the configuration file:
The MailAction will send a mail when a trigger fires. It requires at least three parameters in the configuration file:
The AlarmAction will set the host status to ALARM when a trigger fires. It supports no parameters.
Implementing a new action class is simple. A class that can be used as an action should adhere to the following two:
Simply untar the archive to a directory of your choice. Configure the Eye by editing the argus-probes.txt file and post a report to the test page:
./argus-eye.pl -u http://your.head.host/argus-j-0.1.0/test -c argus-probes.txt -h 30
You can inspect the posted report by pointing your browser at the test page URI. Keep tweaking the configuration until the posted report suits your needs. Finally, add a cron job that will periodically run the Eye; the following line in your crontab should run the Eye every minute:
* * * * * /path/to/your/argus/installation/argus-eye.pl -u http://your.head.host/argus-j-0.1.0/store -c argus-probes.txt -h 120
Argus Eye requires the following parameters:
Probes are simple executables that print an XML document fragment representing data from a single probe to stdout. The Eye runs all configured probes and combines all their output into a single report. You can safely run the probes directly and see their output on the command line; this is especially useful if you want to check probe output if you have changed the probe or implemented a new one.
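A probe does not have to be written in Perl. As a hedged illustration, here is what a minimal probe could look like in Python: it prints one probe fragment to stdout and nothing else. The probe name and the child element names are made up, and the fragment carries no data source markup.

```python
import os
import xml.etree.ElementTree as ET


def load_probe_fragment(load1, load5, load15):
    """Render load averages as a probe fragment (element names are made up)."""
    probe = ET.Element("probe", {"name": "load"})
    for tag, value in (("load1", load1), ("load5", load5), ("load15", load15)):
        ET.SubElement(probe, tag).text = f"{value:.2f}"
    return ET.tostring(probe, encoding="unicode")


if __name__ == "__main__":
    # os.getloadavg() is available on Unix-like systems only.
    print(load_probe_fragment(*os.getloadavg()))
```

Running the script directly on the command line shows exactly what the Eye would pick up from it.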
The default set of probes marks data sources in their output. However, you might change which values are reported as data sources. A data source is marked in the following manner:
The disk probe reports the status of mounted partitions: total space, free space and the percentage of used space. It accepts an arbitrary number of command line parameters, all of which are regular expressions. If no command line parameters are given, all partitions are reported. Otherwise, only those that match one of the regular expressions are reported.
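The selection rule (no parameters means everything, otherwise anything matching at least one expression) can be sketched in Python as follows. Whether the real probe matches against device names or mount points, and whether its expressions are anchored, is not stated here; this sketch assumes unanchored matching against whatever names it is given.

```python
import re


def select_names(names, patterns):
    """Return all names if no patterns are given, else those matching any pattern."""
    if not patterns:
        return list(names)
    compiled = [re.compile(pattern) for pattern in patterns]
    # re.search is unanchored: a pattern may match anywhere in the name
    return [name for name in names if any(rx.search(name) for rx in compiled)]
```

For example, select_names(["/dev/sda1", "/dev/sdb1", "tmpfs"], ["^/dev/sda", "tmp"]) keeps /dev/sda1 and tmpfs and drops /dev/sdb1.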
The memory probe reports memory status: total amount, free amount and the percentage of used memory. It does so separately for physical memory and swap. It accepts no command line parameters.
The CPU probe reports the 1, 5 and 15 minute load average values, and the number of ticks since boot that the CPU spent on behalf of processes in user and kernel space, on niced processes, and idly doing nothing. No command line parameters are accepted.
The network probe reports the total number of bytes received and transferred for all network interfaces. It accepts an arbitrary number of command line parameters, all of which are regular expressions. If no command line parameters are given, all interfaces are reported. Otherwise, only those that match one of the regular expressions are reported.
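Since the tick counts are cumulative since boot, a single report says little about how busy the CPU is right now; the difference between two consecutive reports does. The following Python sketch shows that arithmetic. The dictionary keys are assumptions of the example and not the element names used by the probe.

```python
def cpu_usage_percent(previous, current):
    """Percentage of non-idle ticks between two samples.

    Each sample is a dict of cumulative tick counts with the keys
    'user', 'nice', 'system' and 'idle' (key names are an assumption).
    """
    keys = ("user", "nice", "system", "idle")
    deltas = {key: current[key] - previous[key] for key in keys}
    total = sum(deltas.values())
    if total <= 0:
        return 0.0  # no ticks elapsed (or counters were reset by a reboot)
    return 100.0 * (total - deltas["idle"]) / total
```

Two samples that differ by 60 user, 10 nice, 30 system and 100 idle ticks give 100 busy ticks out of 200, that is 50 percent.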
The configuration file for the Eye lists command lines for running probes, one on each line. That is: path to the probe executable, followed by (optional) command line parameters. As simple as that.
NOTE: it is advisable that you list absolute paths to probe executables in the configuration file, especially when running the Eye from cron. Relative paths might prove a problem in this case.
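The format is simple enough to check mechanically. The Python sketch below splits each line into a command and its parameters and flags probe paths that are not absolute. It is not part of the distribution; skipping blank lines and using shell-like quoting rules (via shlex) are assumptions of the sketch, and the paths in the usage note are made up.

```python
import os
import shlex


def parse_probe_config(text):
    """Return (commands, warnings) for the contents of an Eye configuration file."""
    commands, warnings = [], []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skipping blank lines is an assumption
        argv = shlex.split(line)  # executable first, then optional parameters
        if not os.path.isabs(argv[0]):
            warnings.append(f"line {number}: '{argv[0]}' is not an absolute path")
        commands.append(argv)
    return commands, warnings
```

Feeding it a file with the lines "/opt/argus/probes/disk.pl ^/dev/sd" and "probes/memory.pl" yields two commands and one warning about the second path.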
Well, to be honest, the current Eye and the probes are fairly simple hacks. You might want to write an alternative Eye and probes (especially for other operating systems, and especially for those where the presence of a Perl interpreter is not common). It should be a simple task: just make sure that the output posted to the Head is a well-formed XML document that conforms to the report schema.
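The core of such an alternative Eye might look like the following Python sketch, which runs a list of probe commands and keeps only output that parses as well-formed XML. It deliberately stops short of wrapping the fragments in a report and posting them, since the exact request expected by the Head is not described here. The function name is hypothetical.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET


def run_probes(commands):
    """Run each probe command (an argv list) and keep well-formed output only."""
    fragments = []
    for argv in commands:
        result = subprocess.run(argv, capture_output=True, text=True)
        try:
            ET.fromstring(result.stdout)  # well-formedness check only
        except ET.ParseError as error:
            # report the broken probe on stderr and carry on with the rest
            print(f"skipping {argv[0]}: {error}", file=sys.stderr)
            continue
        fragments.append(result.stdout.strip())
    return fragments
```

Dropping a misbehaving probe instead of aborting is a design choice of this sketch: one broken probe then does not keep the rest of the report from reaching the Head.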
There is a lot to do in order to improve Argus. Well, since this is a strictly focused project, the author only intends to add functionality (a) if it is needed here at XLab, or (b) if someone asks nicely. Option (b) might require some monetary compensation to accompany the plea as well, if it proves to be a lot of work. ;)
Then again, as Argus is free software, both as in free speech and as in free beer, and comes with full source code, you are welcome to improve and enhance it yourself. If you do, do share your enhancements with us and the rest of the world, even if they are for your personal use and pleasure only.
Anyway, among the improvements that have been envisioned are the following:
Bug reports, feature requests, much appreciated patches or just simple thank-yous accompanied by a brief blurb about what you use Argus for should be directed to e-mail address argus at lists dot xlab dot si.
Argus was developed and runs with a little help, not from my friends, but from the following pieces of software (versions used for development are indicated in parentheses):
Here is an example of a properly formatted XML template with all options that may be used with Argus (note that RRD4J allows other options as well and that Argus won't check if those are present - when using them, the behaviour of Argus is undefined):
<options>
  <anti_aliasing>true</anti_aliasing>
  <time_grid>
    <show_grid>true</show_grid>
    <!-- allowed units: second, minute, hour, day, week, month, year -->
    <minor_grid_unit>minute</minor_grid_unit>
    <minor_grid_unit_count>60</minor_grid_unit_count>
    <major_grid_unit>hour</major_grid_unit>
    <major_grid_unit_count>2</major_grid_unit_count>
    <label_unit>hour</label_unit>
    <label_unit_count>2</label_unit_count>
    <label_span>1200</label_span>
    <!-- use SimpleDateFormat or strftime-like format to format labels -->
    <label_format>dd-MMM-yy</label_format>
  </time_grid>
  <value_grid>
    <show_grid>true</show_grid>
    <grid_step>100.0</grid_step>
    <label_factor>5</label_factor>
  </value_grid>
  <no_minor_grid>true</no_minor_grid>
  <alt_y_grid>true</alt_y_grid>
  <alt_y_mrtg>true</alt_y_mrtg>
  <alt_autoscale>true</alt_autoscale>
  <alt_autoscale_max>true</alt_autoscale_max>
  <units_exponent>3</units_exponent>
  <units_length>13</units_length>
  <vertical_label>Speed (kbits/sec)</vertical_label>
  <background_image>luka.png</background_image>
  <overlay_image>luka.png</overlay_image>
  <unit>kilos</unit>
  <lazy>false</lazy>
  <min_value>0</min_value>
  <max_value>5000</max_value>
  <rigid>true</rigid>
  <base>1000</base>
  <logarithmic>false</logarithmic>
  <colors>
    <canvas>#FFFFFF</canvas>
    <back>#FFFFFF</back>
    <shadea>#AABBCC</shadea>
    <shadeb>#DDDDDD</shadeb>
    <grid>#FF0000</grid>
    <mgrid>#00FF00</mgrid>
    <font>#FFFFFF</font>
    <frame>#EE00FF</frame>
    <arrow>#FF0000</arrow>
  </colors>
  <no_legend>false</no_legend>
  <only_graph>false</only_graph>
  <force_rules_legend>false</force_rules_legend>
  <title>This is a title</title>
  <fonts>
    <small_font>
      <name>Courier</name>
      <style>bold italic</style>
      <size>12</size>
    </small_font>
    <large_font>
      <name>Courier</name>
      <style>plain</style>
      <size>11</size>
    </large_font>
  </fonts>
  <first_day_of_week>SUNDAY</first_day_of_week>
</options>
Notes on the template syntax:
Use true, on, yes, y, or 1 to specify a boolean true value (anything else will be treated as false).
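The boolean rule above amounts to a simple membership test, sketched here in Python. Whether the template parser ignores case or surrounding whitespace is not stated, so both behaviours are assumptions of this sketch, as is the function name.

```python
TRUE_WORDS = {"true", "on", "yes", "y", "1"}


def parse_template_boolean(text):
    """True for true/on/yes/y/1, False for anything else.

    Trimming whitespace and ignoring case are assumptions of this sketch.
    """
    return text.strip().lower() in TRUE_WORDS
```

Note the consequence of "anything else is false": a value such as enabled does not raise an error, it simply switches the option off.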