Risk Based Alerting (RBA) is Here to Stay

The security talks at Splunk®’s annual .conf this year focused heavily on Risk Based Alerting (RBA). My co-founder gave the very first talk on RBA at .conf 2018. Since then it has gained viral popularity, and Splunk formally included some RBA features in Enterprise Security’s fall 2020 release.

The focus on shifting to RBA was apparent in the .conf talks this year, as well as in the product architecture itself. It’s clear Splunk is making a long-term investment in RBA for security alerting.

And we believe Splunk should. As an RBA pioneer, we have seen it make a huge impact on the security operations of some of the world’s largest companies. We consistently see performance metrics like alert volume, true positive percentage, and mean time to resolution (MTTR) improve dramatically just weeks after a SOC goes fully live with RBA. Perhaps even more significant is the alignment we see across the security teams: from Splunk admins, to the threat intel teams, to the SOC analysts themselves.

The operational gains and the team alignment combine to change how security teams do their daily work. This adds up to months and quarters of strong metrics to present to the board, as well as increasing cyber resiliency across the entire company.

We’ve released a technical guide entitled “Getting Started with RBA in Splunk® Enterprise Security”: https://outpost-security.com/rba-getting-started . Following the guide will deliver next-generation alerting for most small to mid-sized companies. For larger enterprises, however, the size and complexity of the IT environment, as well as the distribution and diversity of the security teams, will introduce some challenges.

Unfortunately, in some companies, we’ve seen these challenges significantly stall progress and eventually kill the hopes of making RBA successful.

The purpose of this article is to take a step back and outline the three main principles that make RBA the future of security alerting. These are first principles that, when executed consistently, allow you to present the most relevant information to a security analyst, enabling them to make the best decisions in the least amount of time.

These principles are – Expand, Relate, Enrich. I’ll discuss each one in detail. To be honest, since we put these principles to paper, I’ve had a hard time remembering them. They can be stated another way – and we’ll use that version to start:

Blow-it up – Stitch-it up – Roll-it up

That’s right – the three steps in implementing the future of security alerting are: Blow-it up – Stitch-it up – Roll-it up

Let’s get started…

Step 1 – Blow-it Up – See All the Events

One of the most interesting aspects of cybersecurity to me is that WE HAVE ALL THE DATA. We can see EVERYTHING. Everything is logged across the entirety of the IT stacks – and Splunk makes it possible to search and find things in these logs IN NEAR REAL TIME.

Seriously, that is amazing. Almost perfect visibility.

But the challenge of course is using all that information effectively. We’ll address that in the next steps, but for now I’ll ask you to wrap your head around this single paradigm shift.

Security engineers and our existing alerting toolsets are currently focused on looking at indicators of compromise (IOCs) that may indicate a potential threat. This is how detections are written and how alerts are triggered. The trade-off is we ignore a lot of data that is too noisy – or contains a lot of “business as usual” events that are hard to distinguish from actual bad actors.

The first step is to revisit this data in its entirety. Risk Based Alerting gives us the capability to filter the noise automatically – so we can broaden our detections and look at any event of interest – even if it has a low probability of being malicious.

One good example is “First time logon”. It is incredibly noisy, especially in large environments. We’d never look at this event if each occurrence triggered its own alert, but with Risk Based Alerting we can record it, give it a low risk score, and use it to stitch together a pattern of behavior when correlated with other “risk” events.

We call this a shift from “one-to-one” detections, to “many-to-one” detections.

The takeaway from step 1 – expand your detections to look for as many clues as possible – automatically rank those clues using the RBA scoring methodology.
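To make the “many-to-one” idea concrete, here is a minimal Python sketch of what recording a scored risk event (instead of firing an alert) might look like. The detection names, scores, and the RiskEvent structure are assumptions made up for illustration; they are not Splunk’s data model or the scores we use with customers.

```python
from dataclasses import dataclass

# Hypothetical detections and scores, chosen only for illustration.
RISK_SCORES = {
    "first_time_logon": 5,             # noisy, low probability of being malicious
    "powershell_encoded_command": 30,
    "mass_file_rename": 60,
}
DEFAULT_SCORE = 10  # assumed fallback for detections that have no tuned score yet


@dataclass(frozen=True)
class RiskEvent:
    detection: str    # which detection fired (the "what")
    risk_object: str  # the user or system it fired on
    score: int        # how much risk this single clue contributes
    message: str      # plain-English explanation for the analyst


def record_risk_event(detection: str, risk_object: str, message: str) -> RiskEvent:
    """Turn a detection hit into a scored risk event rather than an alert."""
    score = RISK_SCORES.get(detection, DEFAULT_SCORE)
    return RiskEvent(detection, risk_object, score, message)


event = record_risk_event(
    "first_time_logon", "janedoe2", "First time logon to host FIN-DB-01"
)
print(event)
```

Nothing here pages an analyst. The event is simply stored with its score so it can contribute to a larger pattern later.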

But remember, this is just the collection part of the process; there is more to the novel approach of RBA.

Step 2 – Stitch-it Up – Relate the “Who”

Sometimes I think Risk Based Alerting is a misnomer. I remember we once had a discussion around the name RBA. In some industries – like banking – the word “risk” is taboo – and for compliance reasons – not just cultural and business model reasons.

The truth is the “R” should really stand for “Relational” – because that is the key element of RBA – and where RBA gets its true power.

In step 1 we expanded our collection of events. We’ll call that the “what”. Step 2 addresses the “who”.

Another challenge of log and security data is the variety of sources and technology stacks that it originates from. While a user accesses the IT environment from an endpoint, the information they generate and consume travels across these stacks: authentication systems, email, network, firewalls, the web itself.

The simple disconnect of knowing a “user” by a single name adds complexity right out of the gate. Jane Doe – is it janedoe2, jdoe2@outpost-security.com, Jane R Doe – CIO, or Employee ID 8675309? The problem also extends to systems – “what is the IP address of this machine name?”

Every event has at least one “who”, and most events have more than one. A “user” and a “system” is the most elemental pairing. (But you can make this instantly smarter by tagging the system as a source or a destination. Why not add vector physics to our alerting searches!)

If we have a source of truth that identifies our known users and systems across all technology stacks and log sources, we have achieved a massive simplification win.

As we find and record risk events, we can identify the “who” immediately and consistently, and record that along with the event or behavior itself. Don’t forget we have a risk score added to each event as well. Even if we see a user or a system that we don’t know, we can still remember their identity, and use it to find evidence of their actions in other events.
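As a rough illustration of the “source of truth” idea, the Python sketch below maps the different names a user appears under to one canonical object, and still keeps a consistent identity for anything it does not recognize. The alias table, the canonical name "jane.doe", and the function names are hypothetical; in a real deployment this lookup would come from your asset and identity data rather than a hard-coded dictionary.

```python
from typing import Tuple

# Hypothetical identity table: every known alias maps to one canonical object.
ALIASES = {
    "janedoe2": "jane.doe",
    "jdoe2@outpost-security.com": "jane.doe",
    "jane r doe": "jane.doe",
    "8675309": "jane.doe",  # employee ID
}


def normalize(raw: str) -> str:
    """Trim and lowercase so the same name always compares equal."""
    return raw.strip().lower()


def resolve_identity(raw: str) -> Tuple[str, bool]:
    """Return (canonical_object, is_known) for a name seen in a log event."""
    key = normalize(raw)
    if key in ALIASES:
        return ALIASES[key], True
    # Unknown identity: keep the normalized value so later events
    # from the same "who" can still be related to this one.
    return key, False


for seen in ["JaneDoe2", "jdoe2@outpost-security.com", "8675309", "Mystery-Host"]:
    print(seen, "->", resolve_identity(seen))
```

The point of the design is that every risk event gets tagged with the same object name no matter which log source it came from.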

The takeaway from step 2 – relate everything together by the person or thing doing the actions (the “object” – e.g. user, endpoint, IP address, email sender).

Step 3 – Roll-it Up – See the “who”, all the “what”, and “who else” all at Once

We now have a gigantic collection of the “what” and the “who”. The final step is to bring everything together.

Traditionally this is called correlation. The difference that you see immediately in RBA though is that we’ve taken the concept of correlation and supercharged it.

First and foremost, we don’t trigger an alert for review until we see enough “risk” events accumulate on a single object. There are a couple of ways to calculate this, but for this article, we don’t need to go to that level of detail.
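For readers who do want a feel for it, here is one simple way it could work, sketched in Python: sum the risk scores per object over a recent time window and only surface objects whose total crosses a threshold. The threshold of 100, the 24-hour window, and the sample events are assumptions for illustration only, not recommended values.

```python
from collections import defaultdict
from datetime import datetime, timedelta

THRESHOLD = 100               # assumed total risk score that triggers an alert
WINDOW = timedelta(hours=24)  # assumed look-back window


def objects_to_alert(events, now):
    """events: iterable of (timestamp, risk_object, score) tuples.

    Returns {risk_object: total_score} for objects at or above THRESHOLD.
    """
    totals = defaultdict(int)
    for timestamp, risk_object, score in events:
        if now - WINDOW <= timestamp <= now:
            totals[risk_object] += score
    return {obj: total for obj, total in totals.items() if total >= THRESHOLD}


now = datetime(2021, 11, 1, 12, 0)
events = [
    (now - timedelta(hours=5), "jane.doe", 5),    # first time logon
    (now - timedelta(hours=3), "jane.doe", 30),
    (now - timedelta(hours=1), "jane.doe", 70),
    (now - timedelta(hours=2), "john.roe", 30),
    (now - timedelta(days=3), "john.roe", 90),    # outside the window, ignored
]
alerts = objects_to_alert(events, now)
print(alerts)
```

No single event in this sample would justify an alert on its own. Only the accumulation on one object does.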

The important thing is what you (or your security analyst) see in this alert – a narrative of recent events of interest, generated by your broad detection sets. This reads like a literal script of what this user did, when they did it, and what they did next.

In this screenshot we see a series of risk events for a user and system that individually could be benign, but when we see them together, another story becomes clear. We also make it really easy by adding some handy risk messages that explain in plain English what each event means. Again, what would have been difficult to see from individual events becomes obvious almost immediately: this is a ransomware attack firing.

(Thank you Splunk® for attack_data, https://github.com/splunk/attack_data )

In practice, this has reduced MTTR for our customers from hours to less than 20 minutes.

The takeaway from step 3 – using the expanded detection data that is all tagged with at least one object, create a single pane of glass that shows all of the risk activity logged on an object over time. Also show any objects related to those risk activities.
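As a rough sketch of that roll-up, the Python below gathers every risk event tagged with a given object, orders the events in time to form the narrative, and lists the other objects that appear alongside it. The sample events, field names, and output format are invented for illustration and are not taken from the screenshot or from any product.

```python
from datetime import datetime

# Hypothetical risk events; each is already tagged with a user and a system.
RISK_EVENTS = [
    {"time": datetime(2021, 11, 1, 9, 40), "user": "jane.doe", "system": "LAPTOP-17",
     "score": 30, "message": "Encoded PowerShell command executed"},
    {"time": datetime(2021, 11, 1, 9, 5), "user": "jane.doe", "system": "FIN-DB-01",
     "score": 5, "message": "First time logon to this system"},
    {"time": datetime(2021, 11, 1, 10, 15), "user": "jane.doe", "system": "LAPTOP-17",
     "score": 60, "message": "Large number of files renamed"},
    {"time": datetime(2021, 11, 1, 9, 50), "user": "john.roe", "system": "LAPTOP-22",
     "score": 5, "message": "First time logon to this system"},
]


def roll_up(events, risk_object):
    """Build one analyst view: timeline, total score, and related objects."""
    matching = [e for e in events if risk_object in (e["user"], e["system"])]
    matching.sort(key=lambda e: e["time"])  # oldest first, so it reads as a story
    timeline = [
        f'{e["time"]:%H:%M} [{e["score"]:>3}] {e["message"]} ({e["user"]} on {e["system"]})'
        for e in matching
    ]
    seen = {e["user"] for e in matching} | {e["system"] for e in matching}
    return {
        "object": risk_object,
        "total_score": sum(e["score"] for e in matching),
        "timeline": timeline,
        "related_objects": sorted(seen - {risk_object}),
    }


view = roll_up(RISK_EVENTS, "jane.doe")
print("\n".join(view["timeline"]))
print("Related:", view["related_objects"])
```

The same function works when the object is a system instead of a user, which is what lets an analyst pivot from the “who” to the “who else”.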

The Future of Security Alerting at Scale

Hopefully you have a better understanding of how RBA works in principle. As I mentioned before, it is consistent execution of these core principles that allows RBA to be successful at scale.

Why does that matter? Execution at scale not only means security alerting metrics that will be the envy of every CISO, but it also means reduced overhead, increased bandwidth for your top talent, and the ability to build resiliency without adding staff or outsourcing to 3rd parties who may not be as effective.

Finally, Highland Defense was founded to help you achieve all of these things in your company. Reach out for a demo so we can show you how you can make the future of security alerting a reality in your organization in about 10 weeks.