Blue Coat Labs

Search Engine Poisoning (SEP) Update: Dangerous Searches

Chris Larsen

A couple of years ago, we published an in-depth series of blog posts looking at the world of Search Engine Poisoning attacks (SEP). For the record, here are the links: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7.

During the Christmas break, I spent a good bit of time in our SEP logs, looking at the current state of the research question from Part 4 -- "What types of searches are the most dangerous?"

This is a worthwhile topic, since one of our competitors persists in doing an annual "most dangerous celebrity" news release (without explaining the methodology behind it) -- even though we've debunked this notion many times. (See Part 5 for the original research, and this recent update.)



Our SEP research follows a standard methodology. Simply put, for an SEP attack to work on a particular topic, the following things all have to happen:

  • The Bad Guys have to create bogus content about the topic.
  • They need to get the search engines to index that content, which means the site hosting the bogus content needs to be sufficiently trusted by the search engines.
  • A victim needs to search for the target topic, using an engine that trusted the content and host site.
  • The search engine needs to think that the bogus content is important enough (i.e., relevant and popular) to put in the top results.
  • The victim needs to see the bogus content, meaning it has to be relatively high in the results, since most people don't scan very deeply most of the time. (This is where a lot of the failure happens, since for any Big Event or Big Name search, there will be a ton of legitimate sites with good content -- and the search engines know and trust those sites.)
  • The victim needs to be fooled by the domain name and the snippet of content that the search engine displays as context.
  • Finally, the victim needs to click the link.

Consequently, instead of simply searching for a name or a topic, and then going through page after page of results, from multiple search engines, looking for links that might be dangerous, we start at the other end: users who've clicked on a dangerous search result. We block the victim from going to the evil site, but log the search terms they had used to find it, along with metadata like "Which search engine were they using?" and "Was it an Image search or a Text search?"
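To make the logging step concrete, here is a minimal sketch of how the search terms and metadata might be pulled out of the referring search URL for a blocked click. It is not our production pipeline: the function name `describe_search`, the `ENGINE_QUERY_PARAMS` table, and the Image-search heuristics are all assumptions made for illustration.

```python
from urllib.parse import urlparse, parse_qs

# Assumed mapping: search engine name -> URL parameter holding the search terms.
ENGINE_QUERY_PARAMS = {
    "google": "q",
    "bing": "q",
    "yahoo": "p",
}


def describe_search(referer):
    """Return engine, search type, and terms for a search referrer URL.

    Returns None when the referrer is not a recognized search engine.
    """
    parsed = urlparse(referer)
    labels = (parsed.hostname or "").lower().split(".")
    engine = next((name for name in ENGINE_QUERY_PARAMS if name in labels), None)
    if engine is None:
        return None

    params = parse_qs(parsed.query)
    terms = params.get(ENGINE_QUERY_PARAMS[engine], [""])[0]

    # Assumed heuristics for spotting an Image search; real engines vary.
    is_image = (
        params.get("tbm", [""])[0] == "isch"
        or parsed.path.startswith("/images")
        or "images" in labels
    )
    return {
        "engine": engine,
        "type": "image" if is_image else "text",
        "terms": terms,
    }
```

Calling `describe_search("http://www.bing.com/search?q=sample+letter")` would yield the engine, the search type, and the decoded terms, which is the kind of record we tabulate in the rest of this post.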

The data for this study was drawn from the second week in November and the second week in December.


Dangerous Search Topics

The list of topics to tabulate was taken from the list used before, in the "Part 4" post linked above, with the addition of several new topics that we wanted to try. Here's how often they turned up, with the latest numbers first and the older numbers in parentheses:

Topic                 Latest    (Previous)
App/Software          4.1%      (5.8%)
Celeb                 1.9%      (2.7%)
Gaming                1.9%      (n/a)
Health                3.9%      (n/a)
Holiday               0.2%      (5.3%)
Letters               0.4%      (n/a)
Non-celeb             2.0%      (n/a)
Non-English           27%       (18.1%)
Porn                  2.8%      (11.2%)
Proxy/Unblocker       0.3%      (2.0%)
Shopping              2.8%      (n/a)
Specific Site         21.0%     (9.5%)
Video/Audio/Stream    3.4%      (3.5%)
Misc                  *28.5%    (42.0%)


Initial Observations:

The first thing that jumped out of the logs, even before doing the statistical count, was the increased amount of non-English search content in our SEP logs. Simply put, there is a lot of SEP activity that begins with search terms entered in other languages. And there are a number of SEP networks that appear to only be doing non-English content. This is an area that needs more research in 2015. (We pointed out, back in Part 3 of the original research, that the anti-junk filters in all of the major search engines seem to do better against English junk than non-English junk...)

The second thing that caught my eye involved searches for nothing more than a domain name (instead of all that hard work of remembering the whole domain name, and actually typing it into your browser's address bar!). It was fascinating to see the patterns. The one I'd like to highlight is the first one that jumped out (and kept jumping out!) -- Instagram.

In the November logs, I found 32 searches for variants of "Instagram", often misspelled in creative ways ("instgram", "instgramm", "instagrma", "instragam", "instergram", "intagram", "istagram", "insagram", "imstagram" and more) -- typo squatters of the world, take note!

(Actually, the typo squatters have already taken note -- all of those odd variants exist as domains, and are flagged as Suspicious in our database -- as well as several others I found in the logs...)
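For anyone who wants to hunt for this kind of misspelling in their own logs, here is a rough sketch using edit distance. I actually spotted these by eye, so treat this as one possible approach rather than the method used above: the function names and the two-edit threshold are assumptions chosen so that every variant quoted above is caught.

```python
def osa_distance(a, b):
    """Optimal string alignment distance between two strings.

    Allowed edits: insert, delete, substitute, or swap two adjacent characters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "ma" typed as "am".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def looks_like_typo(term, target="instagram", max_edits=2):
    """True if term is a near-miss of target (but not target itself)."""
    distance = osa_distance(term.lower(), target)
    return 0 < distance <= max_edits
```

A threshold of two edits is deliberately loose. On short brand names it would produce false positives, so it is best treated as a first-pass filter before a human looks at the results.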


Definitions & Details:

- App/Software: Includes people searching for apps for their device, as well as for more traditional software. As with Video/Audio/Stream, many of these searches were obviously from people looking for free versions. Not much change from before.

- Celebrity: If the search term set was something like "[celebrity name] [porn terms]" then it was counted as Porn, not Celebrity. This eliminated most of the celebrities you've actually heard of. The "celebs" that were left were mostly people I'd never heard of -- I had to google them to find out who they were. This wasn't a huge topic before, and this year it was even smaller. Compare to the new "non-celeb" topic.

- Gaming: A new topic (based on earlier research observations), broken out of Miscellaneous. This includes game-related searches other than those looking for specific game software, which were counted in the App/Software group.

- Health: A new topic (based on earlier research observations), broken out of Miscellaneous. This includes all sorts of health-related issues, although most searches were about specific diseases or conditions.

- Holiday: Since I was doing this research in November and December logs, this topic included Thanksgiving and Christmas (which each had a couple of related searches), and other wintertime holidays like Hanukkah, Kwanzaa, and the New Year -- although no searches were observed for terms related to these holidays. This topic has shrunk to almost zero. If I had to guess at why, I think it's due to the popularity of sites like Instagram and Pinterest -- there's less need to venture out onto the World Wide Web for cute holiday decorations and yummy recipes.

- Letters: A new topic (based on earlier research observations), broken out of Miscellaneous. People are sometimes faced with a need to write a formal letter of some sort, but may not be sure how to proceed with something like that. So they search for something like "sample letter for friend going through divorce". And again, the Bad Guys are attuned to this, and have prepared SEP content targeting those types of searches. However, this sort of search was much less frequent this time around.

- Non-celeb: A new topic (based on earlier research observations), broken out of Miscellaneous. About half of the time, when I googled for a particular name to see who the person was, I found not a minor celebrity, but just some random person. Often, this was for a local crime report or obituary -- but that doesn't count for celebrity! It's interesting that this was basically tied with the Celeb category in frequency.

- Non-English: One small change this time around was to include non-English searches for pornographic content in this category, instead of counting them in the Porn bucket, but that can't account for all of the jump. (Although it was interesting to note that there were a couple of active SEP networks focused on porn-type content in this area, chiefly for Russian and Turkish content.)

- Porn: Includes everything from the soft-porn area of "adult" content on up, but only in English, as noted in the previous item. Had non-English porn searches been included, this number would have been higher.

- Proxy/Unblocker: Basically, these were people searching for ways around the office/school Web Filter, and the SEP gangs were waiting for them. Not as much of this, these days.

- Shopping: A new topic (based on earlier research observations), broken out of Miscellaneous. Generally clothing or fashion related. Some of this is probably holiday related (hello, Black Friday!), but didn't go into the Holiday bucket unless it was explicitly linked to something like Christmas or Black Friday in the search terms.

- Specific Site: There are apparently a lot of people who type domain names into Google or Bing, and then click on the top result, instead of simply typing the domain into their browser's address bar. (My kids do this. It used to drive me crazy.) The Bad Guys seem to know this, and have content designed to show up in these sorts of searches.

- Video/Audio/Stream: People looking to watch movies, TV shows, or anime on-line. (Many of these searches were clearly for copyrighted material, in case you're wondering.)

- *Misc: This category has dropped some, this time around, but most of that is due to splitting out five new topic areas. If all of the new topics had been left in Misc, it would have been close to 40% of the SEP activity -- essentially unchanged from before.
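The arithmetic behind that "close to 40%" figure is easy to check. This short snippet simply adds the five new topics back into Misc, using the latest percentages from the table above (the variable names are my own):

```python
# Latest shares (in percent) for Misc and the five topics split out of it.
latest_share = {
    "Misc": 28.5,
    "Gaming": 1.9,
    "Health": 3.9,
    "Letters": 0.4,
    "Non-celeb": 2.0,
    "Shopping": 2.8,
}

combined = sum(latest_share.values())
print(f"Misc with new topics folded back in: {combined:.1f}%")  # 39.5%
```

That 39.5% compares with 42.0% in the earlier study.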


Safety Measures:

A lot of these attacks can be avoided with a few simple precautions:

- Take time to look at the TLD (top level domain) of the site in the link you're about to click on: is it a familiar, well-known one, like .COM or .NET? Or is it a lower-value TLD, like .INFO or .CLICK? (Yes, that's a new one, and yes, it has a bunch of shady domains on it already.)

- Or is the TLD a two-letter country code that you may not recognize, like .RU (Russia), .IN (India), or .PW (Palau)? If it is for a different country, ask yourself if the content you are searching for ought to be found there...

- How about the domain name itself? Does it look "normal"? Or "weird"?

- Take time to read the two lines of "context" that the search engines typically provide for each result: do any of the words appear random, or to have little or nothing to do with your search terms?

If anything looks shady, then don't click. There are usually plenty of other results to choose from.
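
The TLD check in particular is simple enough to automate. Here is a minimal sketch, assuming the only "familiar" and "lower-value" TLDs are the handful of examples named above. The function name `tld_note` and both lists are illustrative; a real filter would need far more complete data.

```python
from urllib.parse import urlparse

# Illustrative lists only: just the examples mentioned in this post.
FAMILIAR_TLDS = {"com", "net"}
LOWER_VALUE_TLDS = {"info", "click"}


def tld_note(url):
    """Give a rough label for the top-level domain of a full URL."""
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()

    if tld in FAMILIAR_TLDS:
        return "familiar"
    if tld in LOWER_VALUE_TLDS:
        return "lower-value"
    # Two-letter TLDs are country codes, such as .RU, .IN, or .PW.
    if len(tld) == 2 and tld.isalpha():
        return "country-code"
    return "unrecognized"
```

A "country-code" or "lower-value" label is not proof of anything shady. It is just a prompt to ask whether the content you searched for ought to live there.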