Thoughts about Winning in Incident Detection

Illustration: Bigstock

You've been compromised.

Not recently. Actually, 134 days ago.

You'll hear about it in the news or when you receive a gracious offer to assist you in disinfecting your environment or decrypting your files.

Well, strictly speaking, you were not compromised. One of your employees, computers, servers, routers, printers or phones was, or maybe the smart refrigerator you recently acquired for the company.

They gave you 2 million dollars to invest in cyber last year, and you still got owned. You thought that all those fancy, super machine learning, AI, layer 9, anti-everything prevention and detection solutions would save you the time and trouble of actually examining your environment, but they did not. Your team tried to examine, investigate and hunt, but it was just too much for them: too many tasks, too many alerts, too many logs. So many logs: 100 million per day. The vendor told you that was not much for a company your size (roughly 1,100 events per second) and that their solution supports 5,000 events per second, so you were supposed to be covered for years.
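The numbers in the story are easy to check. The sketch below is a back-of-the-envelope calculation that uses only the figures quoted above (100 million events per day, a rated 5,000 events per second) to turn a daily volume into an average rate; the function names are made up for illustration.

```python
# Back-of-the-envelope sizing: convert a daily log volume into an average
# events-per-second (EPS) rate and compare it with a rated capacity.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_eps(events_per_day: int) -> float:
    """Average events per second, spread evenly over a full day."""
    return events_per_day / SECONDS_PER_DAY


def headroom(events_per_day: int, rated_eps: int) -> float:
    """How many times the rated capacity exceeds the average load."""
    return rated_eps / average_eps(events_per_day)


daily_events = 100_000_000  # figure from the story above
rated_capacity = 5_000      # rated EPS from the story above

print(f"average load: {average_eps(daily_events):,.0f} EPS")
print(f"headroom:     {headroom(daily_events, rated_capacity):.1f}x")
```

It prints an average of about 1,157 events per second and roughly 4.3x headroom, so on paper the sizing was fine. A daily average also hides bursts, so a real sizing exercise would look at peak rates too; either way, ingestion capacity says nothing about whether anyone has time to look at what comes out.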

Later, while investigating the breach, you'll find out that the attack was logged by your devices, but it was tagged as lower priority, less critical than your "3 consecutive failed logins" alert, and your team missed it. They meant to get to it, but the backlog was so large that they never did.

There is no real moral to this story, except that it is a common one.

Building a Security Operations Center

The post-breach reaction usually involves spending a lot of cash on building or buying a security operations center (SOC).

There is some confusion around what a SOC does; most agree that it is supposed to detect incidents. This is usually where the agreement ends.

Some SOCs are allowed to reject financial transactions, while others do not even have local admin rights on their own computers. Some limit themselves to notifying and dispatching, and some actually repair, patch and reconfigure.

Many of the misgivings about SOCs stem from the fact that SOCs are relatively new compared with security operations as a whole.

While the "O" in SOC stands for "Operations," in most cases we have seen, the SOC is not part of the operations team.

Since we want to focus on winning at detection, suffice it to say that it is important to have dedicated people, processes and technology in order to detect, and respond to, incidents.

Events, Alerts and Incidents

When detecting, it is customary to have at least two statuses for the data sent to the SIEM / Big Data / NG AI magic device. One is the "event": the audit data (logs, database records, flows) sent by the different devices. When we decide that one or more of these events is of interest, it becomes an alert or an incident.

For example, a single IP accessing a specific port on a server is an event. Ten events of a single IP accessing ten different ports on the same server can be considered a "host scan" alert. Because not every alert gets investigated, SOCs usually escalate the ones that do to "cases" or "incidents."
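To make the event/alert distinction concrete, here is a minimal sketch of the "host scan" example. It assumes connection logs have already been parsed into (source IP, destination IP, destination port) records and that the batch covers a single time window; the `Event` fields, the `host_scan_alerts` function and the addresses are hypothetical, not taken from any particular SIEM.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    """One raw connection log line (hypothetical field names)."""
    src_ip: str
    dst_ip: str
    dst_port: int


def host_scan_alerts(events, min_ports=10):
    """Raise a 'host scan' alert for every source that touched at least
    `min_ports` distinct ports on the same destination within the batch."""
    ports_seen = defaultdict(set)  # (src, dst) -> distinct destination ports
    for e in events:
        ports_seen[(e.src_ip, e.dst_ip)].add(e.dst_port)
    return [
        {"type": "host scan", "src_ip": src, "dst_ip": dst, "ports": sorted(ports)}
        for (src, dst), ports in ports_seen.items()
        if len(ports) >= min_ports
    ]


# One source probing ten ports on one server, plus one ordinary connection.
batch = [Event("10.0.0.5", "10.0.1.20", port) for port in range(20, 30)]
batch.append(Event("10.0.0.7", "10.0.1.20", 443))

for alert in host_scan_alerts(batch):
    print(alert["type"], alert["src_ip"], "->", alert["dst_ip"], len(alert["ports"]), "ports")
```

Eleven events go in and a single alert comes out. A production rule would normally bound the time window explicitly rather than rely on how the batch was cut.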

From Logs to Incidents

There are a few steps to successful detection: (1) People have to decide what they want to collect; (2) People have to configure the relevant auditing; (3) The log has to be recorded on the originating device; (4) Technology has to collect the log (database, flow); (5) People have to decide which logs are considered "an alert" (preferably as part of step 1); (6) Technology has to be configured to raise the alert; and (7) People (the SOC) have to react, using processes and technologies.
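The seven steps form a chain: if any link is missing, no alert is ever raised. A minimal sketch, assuming a hypothetical per-use-case status flag for each step, that finds the first broken link; the owner labels follow the list above, with step 3 labeled "Device" since the text attributes it to the originating device.

```python
from typing import NamedTuple, Optional


class Step(NamedTuple):
    number: int
    owner: str        # "People", "Device" or "Technology", following the list above
    description: str
    in_place: bool    # hypothetical status of this step for one use case


def first_gap(steps) -> Optional[Step]:
    """Detection only works end to end, so return the first missing step (or None)."""
    for step in sorted(steps, key=lambda s: s.number):
        if not step.in_place:
            return step
    return None


pipeline = [
    Step(1, "People", "decide what to collect", True),
    Step(2, "People", "configure the relevant auditing", True),
    Step(3, "Device", "record the log on the originating device", True),
    Step(4, "Technology", "collect the log", False),  # e.g. collector never pointed at this source
    Step(5, "People", "decide which logs are an alert", True),
    Step(6, "Technology", "configure the alert", True),
    Step(7, "People", "react (SOC)", True),
]

gap = first_gap(pipeline)
print(f"broken at step {gap.number}: {gap.description}")
print("steps owned by people:", sum(s.owner == "People" for s in pipeline), "of", len(pipeline))
```

Here the chain breaks at step 4, so everything downstream of it is moot, however well it is configured. The count also shows that four of the seven steps belong to people.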

Note that "People" appear in four of these seven steps; they are central to this process.

What turns an event into an alert? There are two major approaches to alert creation: one works from the bottom up and the other, naturally, from the top down.

The Bottom Up approach basically proclaims: "let's use the wheel, not reinvent it." Smart people (vendors) have already given this a lot of thought, and their tools will provide us with the alerts we need to handle.

Those who advocate this approach might also assert that "these projects have been going on for two decades; let's use the accumulated knowledge on what is relevant and effective."

The Top Down approach is harder, and basically says: "let's look at our environment and our risks, and build use cases and scenarios."

In many cases, combining the two approaches is inevitable: the output of the top down approach in company A becomes "bottom up" content for company B, since many of the risks, use cases and systems are identical.
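In practice the two approaches meet in the middle. The sketch below assumes a shared catalogue of use cases (the bottom up part, whether it comes from vendors or from other organizations) and filters it by the organization's own risk list (the top down part) to decide which use cases to run and which log sources to collect. Every catalogue entry, risk name and log source name is a made-up example.

```python
# Shared, "bottom up" catalogue of use cases someone else already built.
# All entries are hypothetical examples.
CATALOG = {
    "host scan": {"risk": "internal recon", "sources": {"firewall"}},
    "malware then scanning": {"risk": "endpoint compromise", "sources": {"antivirus", "firewall"}},
    "impossible travel": {"risk": "account takeover", "sources": {"vpn", "identity"}},
    "3 failed logins": {"risk": "password guessing", "sources": {"identity"}},
}


def plan(our_risks):
    """Top down: start from our own risks, reuse the catalogue use cases that
    address them, and derive only the log sources those use cases need."""
    use_cases = sorted(name for name, uc in CATALOG.items() if uc["risk"] in our_risks)
    sources = set().union(*(CATALOG[name]["sources"] for name in use_cases))
    covered = {CATALOG[name]["risk"] for name in use_cases}
    uncovered = sorted(set(our_risks) - covered)  # still needs home-grown use cases
    return use_cases, sorted(sources), uncovered


use_cases, sources, gaps = plan({"endpoint compromise", "internal recon", "fraudulent wire transfer"})
print("use cases:", use_cases)
print("collect:  ", sources)
print("build:    ", gaps)
```

With these inputs it picks "host scan" and "malware then scanning," needs only antivirus and firewall logs, and flags "fraudulent wire transfer" as a risk no catalogue entry covers, which is where the top down work starts. Sources nobody asked for, such as the VPN and identity logs here, are not collected at all.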

How Do We Detect

Rules are "old school," right? Machine learning and algorithms can automagically detect and resolve incidents, right? Wrong.

What is the main difference between detecting using anomalies and detecting using rules? Our answer: a hypothesis.

When we create a rule saying "a specific machine had a virus that was deleted, and that same machine is now scanning the network or accessing a low-reputation website," we are encoding a hypothesis: "a machine whose virus was presumably deleted may still be infected."
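The virus-then-suspicious-behaviour rule fits in a few lines. This is a minimal illustration assuming events arrive as (timestamp, host, event type) tuples with made-up type names, and assuming a 24-hour watch window after each cleanup; neither detail comes from the text.

```python
from datetime import datetime, timedelta

# Hypothesis: "a machine whose virus was presumably deleted may still be infected."
FOLLOW_UPS = {"network_scan", "low_reputation_site"}  # hypothetical event types
WINDOW = timedelta(hours=24)  # assumption: how long we keep watching after a cleanup


def still_infected(events):
    """events: iterable of (timestamp, host, event_type) tuples.
    Return hosts showing a follow-up behaviour within WINDOW after a virus deletion."""
    last_cleanup = {}
    suspects = set()
    for ts, host, kind in sorted(events):  # process in time order
        if kind == "virus_deleted":
            last_cleanup[host] = ts
        elif (kind in FOLLOW_UPS and host in last_cleanup
              and ts - last_cleanup[host] <= WINDOW):
            suspects.add(host)
    return suspects


t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    (t0, "pc-17", "virus_deleted"),
    (t0 + timedelta(hours=2), "pc-17", "network_scan"),          # fits the hypothesis
    (t0 + timedelta(hours=1), "pc-42", "low_reputation_site"),   # no prior cleanup
    (t0, "pc-99", "virus_deleted"),
    (t0 + timedelta(days=3), "pc-99", "network_scan"),           # outside the window
]
print(still_infected(sample))
```

Only `pc-17` is flagged: `pc-42` never had a cleanup, and `pc-99` started scanning outside the window. Because the hypothesis is written into the rule, whoever receives the alert knows why it fired.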

When we use algorithms, we may get findings like "there are machines accessing a popular website that are also accessing port 8000 through the firewall." Now we have to investigate what that actually means.
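By contrast, a simple co-occurrence count, one of the most basic ways an algorithm can surface "interesting" patterns, produces findings like the one above with no hypothesis attached. A minimal sketch; the behaviour labels, hosts and the `min_hosts` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(behaviour_by_host, min_hosts=3):
    """Count, for every pair of behaviours, how many hosts show both.
    Return the pairs seen on at least `min_hosts` hosts, most common first."""
    pair_counts = Counter()
    for behaviours in behaviour_by_host.values():
        pair_counts.update(combinations(sorted(behaviours), 2))
    return [(pair, n) for pair, n in pair_counts.most_common() if n >= min_hosts]


observed = {
    "pc-01": {"popular_site", "fw_port_8000"},
    "pc-02": {"popular_site", "fw_port_8000", "printer"},
    "pc-03": {"popular_site", "fw_port_8000"},
    "pc-04": {"popular_site", "printer"},
}
for (a, b), n in frequent_pairs(observed):
    print(f"{n} hosts show both {a!r} and {b!r}")
```

It reports that three hosts both visit the popular site and hit port 8000, and nothing more. Whether that is harmless or malicious is exactly the question the analyst is left with.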

Correlation needs a hypothesis to become causation. In other words, knowing that two things are related does not imply that one causes the other, especially when you have only a partial picture of the environment (and it's always partial).

Seeing Does Not Equal Understanding

There is a famous picture of a vase portraying an optical illusion: while some see an intimate couple, others see swimming dolphins. The picture is the same; only the interpretation differs.

Only after we understand the relation between events and give it meaning can we write a rule that specifies what the alert is and, maybe more importantly, what to do when it happens.

How to Win in Detection?

First, let's define winning. If you were diagnosed with cancer at an early stage and recovered, did you win? We think so.

Verizon's "Data Breach Investigations Report" comes out every year, highlighting the gap between compromise and detection.

Compromise is quick (minutes), but the steps that follow take longer: internal recon, lateral movement and exfiltration of data might take days, weeks, months or even years. So winning can basically be defined as mitigating the threat as close as possible to the initial compromise, before damage is done.
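This definition of winning can be expressed as a check on an attack timeline. A minimal sketch, assuming we know (after the fact) when each stage began and treating exfiltration as the point where damage is done; the stage names, dates and the `assess` function are invented for illustration.

```python
from datetime import datetime


def assess(timeline, detected_at, damaging=("exfiltration",)):
    """timeline: dict of attack stage -> datetime the stage began (hypothetical stages).
    Return (days from compromise to detection, whether detection beat the first
    damaging stage). With no damaging stage in the timeline, detection counts as a win."""
    dwell_days = (detected_at - timeline["compromise"]).days
    damage_times = [timeline[stage] for stage in damaging if stage in timeline]
    won = not damage_times or detected_at < min(damage_times)
    return dwell_days, won


attack = {
    "compromise": datetime(2024, 1, 1),
    "internal_recon": datetime(2024, 1, 3),
    "lateral_movement": datetime(2024, 1, 20),
    "exfiltration": datetime(2024, 3, 1),
}
print(assess(attack, datetime(2024, 1, 5)))   # caught during internal recon
print(assess(attack, datetime(2024, 5, 14)))  # caught long after the data left
```

Detection four days in, during internal recon, counts as a win; detection after 134 days, well after the data left, does not.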

Focusing on What and Where It Hurts

Have you ever heard your boss say, "Everything is important"? Is that any different from "Nothing is important"?

The "gracious" King Joffrey Baratheon (from the TV series Game of Thrones) offered a singer who had sung an anti-royal song a choice: lose either his fingers or his tongue. After the singer replied "every man needs two hands," he lost his tongue.

Many organizations start by trying to log and alert on as much as possible, and soon find themselves saturated and understaffed.
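One way to stop treating everything as equally important is to rank alerts by what they threaten and cut the queue at what the team can actually handle. A minimal sketch: the 1 to 5 scales, the severity-times-asset-value score, the `triage` function and the alert names are all hypothetical, not a standard scoring model.

```python
from typing import NamedTuple


class Alert(NamedTuple):
    name: str
    severity: int     # 1 (low) .. 5 (critical), as assigned by the rule
    asset_value: int  # 1 (lab PC) .. 5 (crown jewels), from our own risk assessment


def triage(alerts, capacity):
    """Rank by severity x asset value (a hypothetical scoring scheme) and split
    into what the team can handle this shift and what goes to the backlog."""
    ranked = sorted(alerts, key=lambda a: a.severity * a.asset_value, reverse=True)
    return ranked[:capacity], ranked[capacity:]


queue = [
    Alert("3 consecutive failed logins", severity=4, asset_value=1),
    Alert("unusual outbound transfer", severity=2, asset_value=5),
    Alert("host scan", severity=3, asset_value=2),
    Alert("AV signature update failed", severity=1, asset_value=1),
]
handle, backlog = triage(queue, capacity=2)
print("handle now:", [a.name for a in handle])
print("backlog:   ", [a.name for a in backlog])
```

The low-severity transfer from a crown-jewel system moves ahead of the failed-login alert on a low-value machine, the reverse of what happened in the opening story.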

Points Offered for Winning in Detection

Entire books and articles have been written about detection, and we haven't even started to discuss response. Here are a few points for winning in detection: (1) Not everything is important: don't collect everything; (2) Understand your specific risks and use relevant use cases; (3) Continuously check what you are doing. Detection is not a project; it's a function; (4) Build mechanisms to validate your environment frequently, because detection breaks easily; (5) Products are great, but they still need people to help them succeed.
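Point (4) is one of the easier ones to automate. A minimal sketch of one such validation, assuming each log source has an expected maximum gap between events and that we can look up when each source last reported; the source names, thresholds and the `silent_sources` function are hypothetical.

```python
from datetime import datetime, timedelta


def silent_sources(expected, last_seen, now):
    """expected: source name -> longest acceptable gap between events.
    last_seen: source name -> timestamp of the newest event received.
    Return sources that never reported or have been quiet for too long."""
    problems = []
    for source, max_gap in sorted(expected.items()):
        seen = last_seen.get(source)
        if seen is None:
            problems.append((source, "never reported"))
        elif now - seen > max_gap:
            problems.append((source, f"silent for {now - seen}"))
    return problems


now = datetime(2024, 6, 1, 12, 0)
expected = {
    "domain_controllers": timedelta(minutes=15),
    "firewall": timedelta(minutes=5),
    "edr": timedelta(hours=1),
}
last_seen = {
    "firewall": now - timedelta(minutes=2),
    "domain_controllers": now - timedelta(hours=6),
}
for source, problem in silent_sources(expected, last_seen, now):
    print(source, problem)
```

Here the domain controllers have been silent for six hours and the EDR feed never reported at all, gaps that no detection rule would reveal on its own. A fuller check would also generate a harmless test event at the source and confirm that the matching alert reaches the SOC.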

And finally, one more thing: consider sharing the load. Managed detection and response services can get the job done; find a service that is transparent about what it detects and how, and continuously validate it.


The authors are with BDO Cyber Defense Center
