Defining log statistics
Anomalies are defined as deviations in the number of occurrences
relative to previously computed averages.
Statistics for a log are computed for every hour of the week and
take into account both the total number of events in the log and
the number of occurrences of each unique value in selected log
columns ("Column statistics"). For instance, if a log contains a
"priority" column, statistics may be computed for each priority
value, such as "warn", "error" or "fatal". The statistics then
hold the number of "warn", "error" and "fatal" occurrences for
each hour of the week. If, in a certain hour, there are
noticeably more "error" events than the statistics record for
that hour of the week, this will be displayed in the dashboard as
an anomaly. Column statistics can be defined in two ways. The
first method computes separate statistics for each value of the
column; when that column's anomaly is computed, the number of
occurrences of each value in the generated data is compared to
that value's own statistics. The second method computes only the
average number of occurrences across the column's values; when
that column's anomaly is computed, the number of occurrences of
each value is compared to that average.
The first method should be used when a limited and known number
of values is expected for a column, such as in the case of
priority, while the second method should be used when a large or
unknown number of values is expected, such as in a "URL" column
of a proxy log or in an "IP" column of an access log. With the
second method, if an IP occurs a given number of times in a
particular hour within the time frame used to generate the
dashboard data, this number is compared to the average number of
occurrences of unique IPs in that hour of the week, and not to
that specific IP’s statistics.
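The hour-of-week statistics described above can be sketched in a
few lines of Python. This is only an illustration, not the
product's implementation: the function names, the (timestamp,
value) input format and the way weeks are averaged are all
assumptions made for the example. It covers column statistics
only, not the total event count.

```python
from collections import Counter, defaultdict
from datetime import datetime


def hour_of_week(ts: datetime) -> int:
    """Return a slot from 0 to 167, where 0 is Monday 00:00-00:59."""
    return ts.weekday() * 24 + ts.hour


def build_statistics(events):
    """Build column statistics from (timestamp, column_value) pairs.

    Returns two dicts keyed by hour of week:
      per_value[h][v]: average occurrences of value v (first method)
      average[h]: average occurrences per unique value (second method)
    Assumption: a week only counts for a slot if it had events there.
    """
    # Count occurrences per (ISO week, hour of week) and value.
    weekly = defaultdict(Counter)
    for ts, value in events:
        week = tuple(ts.isocalendar())[:2]  # (ISO year, ISO week)
        weekly[(week, hour_of_week(ts))][value] += 1

    # Sum the counts over all weeks, tracking weeks seen per slot.
    totals = defaultdict(Counter)
    weeks_seen = Counter()
    for (_week, hour), counts in weekly.items():
        totals[hour].update(counts)
        weeks_seen[hour] += 1

    per_value = {
        hour: {v: c / weeks_seen[hour] for v, c in counts.items()}
        for hour, counts in totals.items()
    }
    average = {
        hour: sum(vals.values()) / len(vals)
        for hour, vals in per_value.items()
    }
    return per_value, average


# Hypothetical sample: two Mondays, all events between 10:00 and 10:59.
events = [
    (datetime(2024, 1, 1, 10, 15), "error"),
    (datetime(2024, 1, 1, 10, 30), "error"),
    (datetime(2024, 1, 1, 10, 45), "warn"),
    (datetime(2024, 1, 8, 10, 5), "error"),
]
per_value, average = build_statistics(events)
print(per_value[10])  # {'error': 1.5, 'warn': 0.5}
print(average[10])    # 1.0
```

With the first method, a new count of "error" events on a Monday
between 10:00 and 11:00 would be compared to 1.5; with the second
method, it would be compared to 1.0.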
To set the statistics parameters of a log, open that log’s edit
wizard. See 'Log Configuration' for more details on how to define
logs. Click 'next' on the log’s general definition page to get to
the 'Log pattern administration' page, and click 'next' again to
get to the 'Log field admin' page. In the 'statistics settings'
section, select the 'Compute statistics' option to enable
statistical evaluation of the log. When this option is selected,
a 'Compute statistics' combo box is displayed next to each of
the log’s fields (columns). Select one of the following options
for each field:
- 'do not compute' – no statistics (and no anomaly) will be
computed for this column.
- 'compute unique values' – statistics will be computed for
each of this column's values.
- 'compute average' – only the average number of occurrences of
unique values will be computed. The anomaly for each unique
value is then computed against this average.
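The effect of the three options on the anomaly computation can be
sketched as follows. The function name, its parameters and the
use of a plain difference as the deviation measure are
assumptions made for illustration; the actual anomaly formula is
not described here.

```python
def column_deviation(option, value, observed, per_value_stats, average_stat):
    """Return observed minus expected count for one column value.

    option: the combo box setting chosen for the column.
    per_value_stats: hypothetical dict of value -> expected count
        for the current hour of the week.
    average_stat: hypothetical expected count per unique value for
        the current hour of the week.
    Returns None when no anomaly is computed for the column.
    """
    if option == "do not compute":
        return None
    if option == "compute unique values":
        # Compare against this value's own statistics (0 if unseen).
        expected = per_value_stats.get(value, 0.0)
    elif option == "compute average":
        # Compare against the average over all unique values.
        expected = average_stat
    else:
        raise ValueError(f"unknown option: {option!r}")
    return observed - expected


# Hypothetical statistics for a "priority" column in one hour slot.
stats = {"warn": 4.0, "error": 1.5, "fatal": 0.5}
avg = sum(stats.values()) / len(stats)  # 2.0

print(column_deviation("do not compute", "error", 6, stats, avg))         # None
print(column_deviation("compute unique values", "error", 6, stats, avg))  # 4.5
print(column_deviation("compute average", "error", 6, stats, avg))        # 4.0
```

The same six "error" events deviate by 4.5 from the value's own
statistics but by 4.0 from the column average, which is why the
choice of option affects what is reported as an anomaly.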
Selecting 'compute unique values' for a column where many unique
values are expected will result in large statistics data and
less accurate anomalies, so in these cases consider using
'compute average' instead.
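As a rough illustration of the size difference, the sketch below
counts stored entries under the assumption that one number is
kept per value per hour of the week. The actual storage format is
not documented here, so the function and its figures are
hypothetical.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hour-of-week slots


def stored_entries(option: str, unique_values: int) -> int:
    """Estimate the number of stored statistics entries for a column."""
    if option == "compute unique values":
        # One entry per unique value for every hour of the week.
        return HOURS_PER_WEEK * unique_values
    if option == "compute average":
        # A single average per hour of the week.
        return HOURS_PER_WEEK
    return 0  # 'do not compute' stores nothing


print(stored_entries("compute unique values", 3))       # 504
print(stored_entries("compute unique values", 50_000))  # 8400000
print(stored_entries("compute average", 50_000))        # 168
```

Under this assumption, a three-value "priority" column stays
small with either option, while a column with tens of thousands
of distinct URLs or IPs grows by several orders of magnitude with
'compute unique values'.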