Uncovering the Mysteries of the Deep Web: A Major New Trend Micro Study

by Ross Dyer

Most IT professionals worth their salt will have heard of the Deep Web. But beyond the salacious reports and hearsay, how many of us really understand what happens on this vast un-indexed area of the web? At Trend Micro we always try to stay one step ahead of the cyber criminals. This is easier said than done, of course, but one strategy we hit upon was to dedicate significant time and resource to uncovering the secrets of the Deep Web.

So that’s exactly what we’ve done. Hopefully the findings of this major new report will help us, and the industry as a whole, better understand the enemy we all face online.

What’s the Deep Web?
There’s so much misinformation about this subject that it makes sense to start with a few definitions. The Deep Web, as we define it, isn’t the same as the ‘darknet’ or the Dark Web. The latter two are made up of limited-access networks, such as I2P and Tor, which rely on connections made between trusted peers. Granted, these sit particularly deep down in the ‘Deep Web’, but they are not the same thing.

The Deep Web more properly refers to any content that search engines can’t index. This includes dynamic web pages, blocked sites (like those that ask you to answer a CAPTCHA to access them), unlinked sites, private sites (like those that require login credentials), non-HTML/-contextual/-scripted content, and limited-access networks – including those aforementioned darknet sites. Limited-access networks aren’t restricted to the likes of Tor and I2P, however. They may also include sites with domain names registered on DNS roots not managed by ICANN, or sites which have registered their domain name on a completely different system from DNS, like .BIT, which is built on Namecoin, a Bitcoin-derived blockchain.
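
To make the distinction between these limited-access networks a little more concrete, here is a minimal Python sketch that guesses which network a hostname belongs to from its suffix. The mapping table and the `classify_host` function are our own illustrative names, not part of any Trend Micro tool, and a real classifier would need far more than a suffix check.

```python
# Hypothetical mapping of hostname suffixes to the limited-access
# network (or alternative naming system) they are associated with.
NETWORK_BY_SUFFIX = {
    ".onion": "Tor",
    ".i2p": "I2P",
    ".bit": "Namecoin",
}


def classify_host(hostname: str) -> str:
    """Return the network a hostname appears to belong to, judged by suffix only."""
    # Normalise case and drop a trailing root dot, e.g. "Example.I2P."
    host = hostname.lower().rstrip(".")
    for suffix, network in NETWORK_BY_SUFFIX.items():
        if host.endswith(suffix):
            return network
    return "standard DNS (or unknown)"
```

Anything that does not match one of the listed suffixes falls through to the default label, which is deliberately vague: sites on alternative DNS roots can use almost any suffix.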

What lies beneath
To get a better picture of the true scale of the Deep Web and the content therein, our Forward-Looking Threat Research (FTR) team built a Deep Web Analyzer tool which collects URLs, non-standard domains and the like, and tries to extract content, HTTP headers, links and so on from each.
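
For readers wondering what the "extract content and links" step might look like, the following is a rough Python sketch using only the standard library. It is not the Deep Web Analyzer itself: the `LinkExtractor` class and `extract_page` function are hypothetical names, and the sketch assumes the HTML has already been fetched by some other means.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the page title and every <a href> target, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors with no (or an empty) href
                self.links.append(urljoin(self.base_url, href))
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_page(base_url, html):
    """Return a small record describing one already-downloaded page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return {"url": base_url, "title": parser.title.strip(), "links": parser.links}
```

The links returned for one page can then be fed back in as new URLs to visit, which is roughly how any crawler widens its view over time.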

Over the past two years we’ve managed to collect more than 38 million events that account for 576,000 URLs – 244,000 of which bear actual HTML content. Out of these, we identified over 8,000 suspicious pages, including those pointing to child exploitation, phishing, password cracking and command-and-control (C&C) servers.
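
To put those figures in proportion, the short Python snippet below works out the rough shares. The inputs are the rounded numbers quoted above ("more than" and "over"), so the percentages are approximations only; `share` is just a helper name for this example.

```python
def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


urls_total = 576_000   # URLs collected
urls_html = 244_000    # URLs bearing actual HTML content
suspicious = 8_000     # pages flagged as suspicious (lower bound)

print(f"HTML content: {share(urls_html, urls_total):.1f}% of URLs")
print(f"Suspicious:   {share(suspicious, urls_html):.1f}% of HTML pages")
```

In other words, a little over two in five of the collected URLs returned HTML, and roughly one in thirty of those pages was flagged as suspicious.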

Here’s a brief summary of some of the key findings:

  • It’s not all about bad stuff. Personal blogs, news sites, discussion forums, religious sites, and even radio stations can be found – all they’re after is anonymity, which doesn’t necessarily mean nefarious behaviour.
  • English is the main language of choice, accounting for 62% of domains. But Russian wins based on URLs – accounting for 41% versus 40% for English.
  • Cannabis is the most widely exchanged product based on analysis of 15 top vendor sites, followed by pharmaceuticals like Ritalin and Xanax, and then MDMA, LSD and meth.
  • Most domains were associated with HTTP or HTTPS, but filter those out and more than 100 domains use IRC or IRCS – chat protocols commonly used by malicious actors as a meeting place.
  • Assassination services can also be found on the Deep Web, although we couldn’t vouch for the authenticity of these.
  • The Deep Web is used by cybercriminals to host malware infrastructure, run money laundering services, and trade in stolen credentials and identities – but we all knew that.
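
The protocol finding above comes down to grouping domains by URL scheme and setting aside the web ones. A minimal Python sketch of that idea follows; `non_web_domains` is a hypothetical helper, and the URLs in the usage example are made up for illustration rather than taken from the report's data.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Schemes we treat as ordinary web traffic and filter out.
WEB_SCHEMES = {"http", "https"}


def non_web_domains(urls):
    """Group hostnames by URL scheme, ignoring HTTP and HTTPS."""
    domains = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        # urlsplit lowercases both the scheme and the hostname for us.
        if parts.scheme and parts.hostname and parts.scheme not in WEB_SCHEMES:
            domains[parts.scheme].add(parts.hostname)
    return {scheme: sorted(hosts) for scheme, hosts in domains.items()}


# Made-up sample URLs, for illustration only.
sample = [
    "http://a.onion/",
    "https://b.onion/x",
    "irc://c.onion:6667/chan",
    "ircs://d.onion/",
    "irc://C.onion/other",
]
print(non_web_domains(sample))
```

Counting the hostnames under each remaining scheme gives the kind of per-protocol domain tally described in the findings.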

As malware authors increasingly hide their infrastructure on the Deep Web, and scammers trade in stolen identities there, we must do more to shine a light on this murky world. It’s important to remember, though, that many innocent users seek out parts of the Dark Web for legitimate reasons, such as journalists protecting sources and activists hiding from government snoops.

With more research like this, we as ‘industry experts’ can hopefully do a better job in future of separating one from the other.

More info here!
