RSA Conference 2013: Big Data is a Big Deal

March 13, 2013 - By Chris Larsen

In my Friday morning presentation at this year's RSA Conference, I started off by asking the audience, "So, how many of you are sick of hearing about 'Big Data'?" and got a nice laugh. "Big Data" was clearly the "Big Buzzword" this year.

And, to be fair, Big Data is cool stuff -- it's what the WebPulse research team plays in every day. (I hope that none of them consider it to be "work" -- I think it's the funnest job in the world.)

So if I love Big Data so much, why was I so snarky about it? Because I see it being used more as a marketing ploy, in an attempt to sell a bunch of new "stuff" to people, when the fact of the matter is that they already HAVE "Big Data" -- they've got logs coming out of their ears. What they need, and aren't getting enough of, is guidance from the security industry on how to use that log data more effectively, and thereby get more use out of the tools they've already got. (At least, that was the premise of the presentation.)

 

Haystacks, Needles, and Foolish Zebras

Indeed, it was thinking about these issues that led to the research I did. Since my work involves looking through a huge haystack of traffic logs for malicious behavior, it has a lot in common with what our customers face, since they have their own haystacks to deal with. And the bigger the haystack, the bigger the problem... Sure, it's easy to wave a magnet over the haystack and get the steel needles. But what about the high-tech, polymer composite needles that have been specifically engineered to look like pieces of hay? How do you find those?

Wouldn't it be cool if there were some general principles that could be used to help winnow a haystack down to a much smaller size? And do it while maintaining a high confidence that your new, smaller haystack still had plenty of interesting needles to look for? (And when I say "general principles" I mean principles that would work for any type of user traffic. Regardless of vendor. Regardless of the system that generated the logs.)

 

This is where the "Foolish Zebras" come in...

I've been using the analogy of zebra herds and predators for years. Zebras like to live in big herds. In a big herd, the odds of survival go up for individual zebras -- the bigger the better. And, the odds get even better for the smart zebras if their herd contains some young and foolish zebras. Those are the zebras who play games like "Last one to the water hole in the morning is a rotten egg!" and "Hey, this is boring. Let's go play tag over in those bushes!"

 

Fortunately (for our purposes), the "Foolish Zebra" gene seems to be fairly common, and deeply embedded, in the human genome. IT folks (especially Tech Support and Help Desk workers) love telling Foolish Zebra stories. (Although they generally have less-polite terms for their "zebras"...) My personal favorite story is "The K9-using Zebra Who Blew Through Not One, But Two, Malware Warning Pages to Get His Download", but I have a lot more...

In fact, it was while pondering "Why do users do that?" that the zebra herd analogy was born. Since this sort of behavior doesn't seem to be something that we can easily change, why not take advantage of it, and thereby keep the rest of the herd safer? In the case of the aforementioned anonymous K9 user, who got his computer infected in spite of two warning screens, we realized that the now-infected computer still had K9 on it, and would be sending us traffic that none of our Large Organization customers could (since their end users don't have admin passwords to override Block Pages). We could look at the anonymous zebra's traffic and probably find some interesting sites. (And we did.)

 

Anyway, the heart of this research project was as follows:

  • Come up with ways to identify Foolish Zebras (not so hard)
  • Pull out a small group of FZ traffic (this is the much-smaller haystack)
  • Identify overall similarities in the FZ traffic (i.e., common threats)
  • Focus more heavily on "oddballs" within the FZ group (i.e., uncommon threats)
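
As a rough illustration of the four steps above, here is a minimal Python sketch. It is not the actual WebPulse tooling. The log record layout (user, host, blocked flag), the sample data, the function names, and the "at least two blocked requests" rule for spotting a Foolish Zebra are all assumptions made up for this example. Any heuristic that singles out risky users could be swapped in.

```python
from collections import Counter, defaultdict

# Hypothetical log records: (user_id, host, was_blocked).
# The fields and values are invented for illustration only.
LOGS = [
    ("u1", "news.example", False),
    ("u1", "mail.example", False),
    ("u2", "news.example", False),
    ("u2", "warez.example", True),
    ("u2", "freecodec.example", True),
    ("u2", "odd-one.example", False),
    ("u3", "warez.example", True),
    ("u3", "freecodec.example", True),
    ("u3", "news.example", False),
    ("u4", "mail.example", False),
]


def find_foolish_zebras(logs, min_blocked=2):
    """Step 1: flag users with at least min_blocked blocked requests.

    The threshold is an assumed heuristic, not a recommendation.
    """
    blocked = Counter(user for user, _host, was_blocked in logs if was_blocked)
    return {user for user, count in blocked.items() if count >= min_blocked}


def zebra_traffic(logs, zebras):
    """Step 2: keep only the flagged users' traffic (the smaller haystack)."""
    return [record for record in logs if record[0] in zebras]


def split_common_and_oddball(fz_logs):
    """Steps 3 and 4: hosts seen from several zebras vs. from only one."""
    zebras_by_host = defaultdict(set)
    for user, host, _was_blocked in fz_logs:
        zebras_by_host[host].add(user)
    common = sorted(h for h, users in zebras_by_host.items() if len(users) > 1)
    oddballs = sorted(h for h, users in zebras_by_host.items() if len(users) == 1)
    return common, oddballs


zebras = find_foolish_zebras(LOGS)
fz_logs = zebra_traffic(LOGS, zebras)
common_hosts, oddball_hosts = split_common_and_oddball(fz_logs)

print("Foolish Zebras:", sorted(zebras))
print("Common hosts:  ", common_hosts)
print("Oddball hosts: ", oddball_hosts)
```

Note that the "common" list will still include perfectly ordinary sites that everyone visits. In practice you would want some way of setting those aside before digging in, and the oddball list is where a human analyst would likely spend the most time.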

 

Since I have access to a ton of Web traffic logs, that's what I used. And that's what the follow-up posts in this series will look at, as I present what I learned from "radio collaring" some of the Foolish Zebras in my herd.

However, I have such faith in the Foolish Zebra principle (as I said, it seems to be deeply rooted in the human genome) that I'm very confident the same principles can be used with logs from firewalls, IDS/IPS, or any other security solution -- because we're dealing with human behavior, not technology per se....

 

--C.L.

@bc_malware_guy