SIEM projects are notoriously resource-hungry, and your CIO or CISO will want to hear not only about the benefits the project will provide but also about its direct costs (software licensing, server investment, etc.) and indirect costs (archive storage, support) for at least the next three years.
In this article I will, of course, give the basic sizing formulas. The most important information, however, is real-life experience from a live operation: the values actually observed and how they compare to the benchmarks.
The first unit, which forms the basis of all our calculations, is the Events per Second (EPS) value that each source system generates. EPS depends heavily on two factors: the audit policy or rules applied on the source system, and how busy the system is. A server with “Object Access” audit rules enabled and a web server role configured will naturally not generate the same number of logs as a standard server. Windows servers also tend to generate far more logs than Linux and UNIX servers when all are left at their default configurations.
Having calculated the EPS for each source asset group, the next step is to calculate the Events per Day (EPD) value by summing the EPS of all groups and multiplying by the 86,400 seconds in a day:
EPD = (∑ EPS) * 86400
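As a minimal sketch of the EPD formula, the snippet below sums the EPS of a few asset groups and converts the total to events per day. The group names and EPS figures are hypothetical placeholders, not measurements; substitute the values you observe in your own environment.

```python
SECONDS_PER_DAY = 86_400

# Hypothetical average EPS per source asset group (illustrative values only).
eps_by_group = {
    "windows_servers": 300,
    "linux_servers": 100,
    "firewalls": 500,
    "databases": 100,
}


def events_per_day(eps_values):
    """EPD = (sum of EPS over all source groups) * 86,400 seconds."""
    return sum(eps_values) * SECONDS_PER_DAY


total_eps = sum(eps_by_group.values())
epd = events_per_day(eps_by_group.values())

print(f"Total EPS: {total_eps}")  # Total EPS: 1000
print(f"EPD: {epd:,}")            # EPD: 86,400,000
```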
Once the EPD value is calculated, we have to decide on an average log message size to know how much storage we will need each day. Log messages range from around 200 bytes on network and infrastructure devices to 10 kilobytes or more on the application and database side. The syslog standard (RFC 5424) does not set a hard maximum, but it recommends that receivers accept messages of up to 2 kilobytes (2,048 octets). In light of this, it is reasonable to assume an average raw log message size of 500 bytes.
With the average raw log message size set to 500 bytes, the daily volume of log messages in GB is calculated as follows:
Daily Raw Log Size (GB) = EPD * 500 / 1024^3
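To make the unit conversion concrete, here is a small sketch of the daily raw log size calculation. The 1,000 EPS input is a hypothetical figure chosen for easy arithmetic; the 500-byte average is the assumption made in this article.

```python
BYTES_PER_GB = 1024 ** 3
AVG_RAW_EVENT_BYTES = 500  # average raw message size assumed in the article


def daily_raw_log_size_gb(epd, avg_event_bytes=AVG_RAW_EVENT_BYTES):
    """Daily Raw Log Size (GB) = EPD * average event size / 1024^3."""
    return epd * avg_event_bytes / BYTES_PER_GB


# Hypothetical environment producing 1,000 EPS on average.
epd = 1000 * 86_400
raw_gb = daily_raw_log_size_gb(epd)

print(f"Daily raw log size: {raw_gb:.2f} GB")  # Daily raw log size: 40.23 GB
```

If your sources are dominated by application or database logs, passing a larger `avg_event_bytes` shows how quickly the daily volume grows.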
Log management appliances modify log messages to make them understandable and meaningful. This operation is called “Normalization”, and it increases the log size by an amount that depends on the solution you use. In my personal experience with HP ArcSight, normalization increased the log size by about 90% to 100%. Others have reported increases of up to 200%. Taking a 100% increase, which doubles the size, we obtain the following formula for the daily normalized log size:
Daily Normalized Log Size = Daily Raw Log Size * 2
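Because the normalization overhead varies by product, it can help to see how sensitive the result is to it. The sketch below expresses the overhead as a fractional increase (1.0 means +100%, i.e. the factor of 2 used in the formula) and compares the 90%, 100% and 200% figures mentioned above. The 40 GB raw input is a hypothetical value.

```python
def daily_normalized_log_size_gb(daily_raw_gb, overhead=1.0):
    """Apply the normalization overhead to the raw size.

    overhead is the fractional size increase: 1.0 means +100%,
    which doubles the raw size as in the article's formula.
    """
    return daily_raw_gb * (1 + overhead)


daily_raw_gb = 40.0  # hypothetical daily raw log size in GB

for overhead in (0.9, 1.0, 2.0):
    normalized = daily_normalized_log_size_gb(daily_raw_gb, overhead)
    print(f"+{overhead:.0%} overhead -> {normalized:.1f} GB/day")
```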
The calculated value still does not represent the actual daily storage requirement of a log management system. Many vendors have developed proprietary compression solutions and claim to compress logs tenfold (10:1), which is rather optimistic. It is, however, safe to assume a ratio of 8:1 for calculations. So the formula becomes:
Daily Storage Requirement = Daily Normalized Log Size / 8
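The effect of the compression assumption can be sketched as follows. The 80 GB normalized input is hypothetical; the 8:1 default is the conservative ratio suggested above, and 10:1 is the vendor claim shown for comparison.

```python
def daily_storage_requirement_gb(daily_normalized_gb, compression_ratio=8):
    """Daily Storage Requirement = Daily Normalized Log Size / compression ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return daily_normalized_gb / compression_ratio


daily_normalized_gb = 80.0  # hypothetical daily normalized log size in GB

conservative = daily_storage_requirement_gb(daily_normalized_gb)       # 8:1
vendor_claim = daily_storage_requirement_gb(daily_normalized_gb, 10)   # 10:1

print(f"8:1  -> {conservative:.1f} GB/day")  # 8:1  -> 10.0 GB/day
print(f"10:1 -> {vendor_claim:.1f} GB/day")  # 10:1 -> 8.0 GB/day
```

In this example, planning with 8:1 instead of the claimed 10:1 reserves 25% more disk per day, which is the safety margin the conservative ratio buys.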
If you want your calculations to be on the safe side, the annual storage need is simply 365 times the Daily Storage Requirement. In practice, however, EPS numbers fall significantly during weekends and holidays. Observe how much your average EPS decreases in such periods and adjust your annual calculations accordingly.
Annual Storage Requirement = Daily Storage Requirement * 365
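The sketch below contrasts the safe-side annual figure with one adjusted for quiet days. The adjustment model is an assumption made for illustration, not a formula from this article: it treats 104 weekend days as "quiet" and assumes log volume on those days is half the weekday volume. Replace both parameters with what you actually observe.

```python
def annual_storage_gb(daily_storage_gb, days=365):
    """Safe-side estimate: every day counts as a full-volume day."""
    return daily_storage_gb * days


def annual_storage_adjusted_gb(daily_storage_gb, quiet_days=104,
                               quiet_factor=0.5, days=365):
    """Adjusted estimate (illustrative model).

    quiet_days   -- number of low-traffic days per year (assumed: 52 weekends)
    quiet_factor -- volume on a quiet day relative to a normal day (assumed)
    """
    busy_days = days - quiet_days
    return daily_storage_gb * (busy_days + quiet_days * quiet_factor)


daily_storage_gb = 10.0  # hypothetical daily storage requirement in GB

safe = annual_storage_gb(daily_storage_gb)
adjusted = annual_storage_adjusted_gb(daily_storage_gb)

print(f"Safe-side annual storage: {safe:.0f} GB")      # 3650 GB
print(f"Adjusted annual storage:  {adjusted:.0f} GB")  # 3130 GB
```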
The last important point when planning future storage investments is the retention period. Two factors determine the retention period: compliance requirements and security requirements.
Compliance requirements concern only log management systems (in HP’s case, Logger), and there is not much to decide: you have to configure whatever the legislation mandates.
For security needs, which are addressed by HP’s ESM system, the decision is yours. I have seen many decision makers try to stay on the very safe side and set unnecessarily long retention periods.
According to Mandiant, the median number of days attackers were present on a victim network before being discovered was 205 in 2014, down from 229 in 2013 and 243 in 2012. This leads me to conclude that the retention period for security alert creation, monitoring, trending and forensics should be at least one year and no longer than three years. Outliers do exist: also according to Mandiant, “The longest time an attacker was present before being detected in 2013 was six years and three months.” Last but not least, the retention period naturally depends on the sector of activity, with defense having the longest and strictest requirements, followed by financial institutions.
A rough estimate of the required storage IOPS can be calculated with the following formulas:
Storage IOPS Needed (Direct Attached Storage) = EPS * 1.2
Storage IOPS Needed (SAN) = EPS * 2.5
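As a quick illustration of the two rules of thumb, the following sketch computes both IOPS estimates for a given EPS. The multipliers are the ones given above; the 1,000 EPS input and the `das`/`san` keys are hypothetical names chosen for this example.

```python
# IOPS-per-EPS multipliers from the article's rule-of-thumb formulas.
IOPS_PER_EPS = {
    "das": 1.2,  # Direct Attached Storage
    "san": 2.5,  # Storage Area Network
}


def storage_iops_needed(eps, storage_type):
    """Rough IOPS estimate: EPS multiplied by the storage-type factor."""
    try:
        factor = IOPS_PER_EPS[storage_type]
    except KeyError:
        raise ValueError(f"unknown storage type: {storage_type!r}") from None
    return eps * factor


eps = 1000  # hypothetical sustained EPS

for storage_type in ("das", "san"):
    iops = storage_iops_needed(eps, storage_type)
    print(f"{storage_type.upper()}: {iops:.0f} IOPS")
```

Since these are rough estimates, it is worth sizing against your peak EPS rather than the daily average.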