Wednesday, April 21, 2010

Scam Constellations and Spam Link Architectures

Scams come in all shapes and sizes, even on the Internet. One thing I've noticed while doing research for RescueTheWeb is that scammers keep inventing new architectures to meet their needs. This article is a brief survey of the scam link architectures I've found.

When I talk about scam architecture, I'm referring to the link structure that scammers use to 1) raise their Google PageRank and 2) draw you to their scam websites.

1. The Point Source Scam Site:
In this architecture there is a single scam site that doesn't necessarily hack other websites to create links to itself. The site stands alone, probably has a low Google PageRank, and doesn't rank highly in search results. There are probably not many scam sites of this type, since they would be hard to find and would attract few visitors.

2. The 2-tier Scam Architecture:
In this architecture there is a single scam site that hacks other websites to scatter links around the Internet that point back to it. This technique raises the site's Google PageRank. Often the injected content on the breached websites is crafted to be visible only to Google/Yahoo!/Bing, so that the page rank is raised without arousing the suspicion of the rightful website owner.

3. The 3-tier Scam Architecture:
In this architecture there is a single scam site that uses two layers of hacked websites to raise its page rank and cast a wide net for potential victims. This architecture is unusual in that it uses a combination of redirects and links to bring the user to the target scam site.

4. Another 3-tier Scam Architecture:
The problem with the previous architecture, from the scammers' perspective, is that it requires the user to click on that first link. Typically, the link-based infections (those that actually show links to visitors) are a little sloppy and probably don't have a high click-through rate. To increase their click-through rate (which they appear to be watching, judging by the tracking parameters in their URLs), the scammers have also built scam search engines. Typically these are faux-Google search engines in which the scammer owns all the search results. This is very convenient for the scammer, since they can direct the visitor anywhere they want.

To trick visitors, they place convenient keywords (the bait) on the breached sites so that those pages rank highly in Google. Then, when the visitor clicks on the Google result, they are taken to the faux-Google (the switch), which contains only links to scam businesses.

The transition from the real Google to the fake Google is nearly instant and probably not obvious to most users, who will simply think it's a glitch and continue browsing.

To hide these fake Googles from search engine crawlers and curious people, the fake results are shown only when you arrive from a link embedded in an infected website.

5. Scam Constellations:
As if the prior architecture wasn't good enough, the scammers wanted to make it robust against detection and shutdowns. Scammers are now creating constellations of scam sites that work together to direct visitors to their scam businesses. In the examples I've seen, the domain names in a constellation are nearly identical, with only one character changing between them. For example:,,, etc... One constellation had 24 nearly identical domain names within it.
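One way to spot a constellation like this is to group domain names that differ by exactly one character. The sketch below is a hypothetical illustration, not the RescueTheWeb analysis engine: it assumes same-length names, treats a single substituted character as the link between two members, and uses placeholder .example domains.

```python
from itertools import combinations
from typing import List, Set


def differs_by_one_char(a: str, b: str) -> bool:
    """True if a and b are the same length and differ at exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def find_constellations(domains: List[str]) -> List[Set[str]]:
    """Group domains into clusters connected by single-character differences."""
    # Build an undirected graph: an edge joins two names one substitution apart.
    neighbors = {d: set() for d in domains}
    for a, b in combinations(neighbors, 2):
        if differs_by_one_char(a, b):
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Collect connected components with an iterative depth-first search.
    seen: Set[str] = set()
    clusters: List[Set[str]] = []
    for start in neighbors:
        if start in seen:
            continue
        cluster: Set[str] = set()
        stack = [start]
        while stack:
            d = stack.pop()
            if d not in cluster:
                cluster.add(d)
                stack.extend(neighbors[d] - cluster)
        seen |= cluster
        if len(cluster) > 1:  # a lone domain is not a constellation
            clusters.append(cluster)
    return clusters


# Hypothetical domain list, for illustration only.
print(find_constellations(
    ["pillshop1.example", "pillshop2.example", "pillshop3.example", "news.example"]
))
```

Because clusters are connected components, a chain of one-character steps links names whose endpoints may differ by more than one character, which matches how a family of 24 sequential names would look.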

These are the scam link architectures I've seen on the Internet recently. There are probably many more. Please send me your observations so we can add them to the RescueTheWeb analysis engine.

Friday, April 16, 2010

When Google isn't helping, just make your own...

Google vs. Googpillc
During our research into infected websites we ran into this interesting find: a website that looks nearly identical to Google, but isn't Google.

This is an interesting twist on the typical approach of simply redirecting the user to questionable pharmacies. With the faux-Google approach, the attackers are trying to regain the victim's trust by presenting a Google-like appearance and even offering a list of (fraudulent) pharmacies to visit.

The faux-Google even includes advertisements on the right side of the screen. Just think: the miscreants might be selling advertising positions to their cohorts. Underground AdWords.

Website Access
If you visit the website directly, you will see only an 'under construction' notice, and nothing will appear abnormal.

However, if you arrive at the website from another infected website, the faux-Google facade appears.

Outbound Links
The set of websites presented to the visitor appears to be small; the same sites are reused several times in the search results. The search results vary, but the outbound links are limited.

Browsing Caution
As if there weren't already enough reasons for caution while browsing the Internet, here is one more. Always keep an eye on the URL of the page you are visiting; it might have jumped to another website without your knowing it.

Tuesday, March 2, 2010

Analysis of 58,000 PHP websites shows 80% run PHP versions with CVSS vulnerability scores of 10.

To get a feel for how easy it is for hackers to take control of arbitrary webservers, I wanted to see what percentage of real websites are highly vulnerable to attack. To answer the question, I'm working from the typical attack pattern of today's hackers. Today's hackers use custom 'hacking software' that scans websites for known vulnerabilities. Once a vulnerable website is found, the vulnerability is used to inject 'remote control' software onto the server. Once the website is infected, the hacker has total control over the webserver and can use it for whatever endeavors they choose.

Of course, there are many different kinds of vulnerabilities. Some are created by the programmer or installer while implementing some piece of functionality. These are typically found only when the attacker specifically looks for a common error, such as those identified in the SANS Top-25 list of programming errors. Others come with the software you use to build your website. Software such as IIS, Apache, ASP, PHP and WordPress all come with their own set of free vulnerabilities.

The vulnerabilities that come with off-the-shelf software (both open-source and closed-source) are nothing short of pervasive. The National Vulnerability Database is logging over 5,000 new vulnerabilities per year. These are the vulnerabilities the good guys have found and reported to the NVD. Those that have been found but not reported are likely the ones being sold to hackers on the black market.

In the National Vulnerability Database (NVD), you will see that most vulnerabilities are scored. The scoring is broken down in various ways according to the guidelines of the Common Vulnerability Scoring System (CVSS). The higher the score, the worse the vulnerability. NVD rates any CVSS v2 score of 7.0 or higher as HIGH severity.

Each vulnerability is associated with the specific software versions in which it exists. For example, vulnerability "CVE-2009-4024" has a base CVSS score of "10.0" and exists in ten released versions of "pear" (a PHP library).

If we look at all the NVD vulnerabilities that apply to each version of PHP (as an example), we can see how pervasive vulnerabilities are. Every version of PHP has multiple documented vulnerabilities, except versions 5.2.13 and 5.3.1 (as of today). If we add up the CVSS scores of each version's vulnerabilities, we can easily see that vulnerabilities are pervasive throughout the history of the software. This issue is not specific to PHP. All software packages have similar issues, and some are worse.
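Adding up CVSS scores per version is a simple aggregation. Here is a minimal sketch, assuming the NVD entries have already been reduced to (CVE id, base score, affected versions) tuples; the records and IDs below are hypothetical placeholders, not real NVD data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical NVD-style records: (CVE id, CVSS base score, affected PHP versions).
# Placeholder values for illustration only, not real NVD entries.
RECORDS: List[Tuple[str, float, List[str]]] = [
    ("CVE-XXXX-0001", 10.0, ["5.1.2", "5.2.0"]),
    ("CVE-XXXX-0002", 7.5, ["5.2.0", "5.2.6"]),
    ("CVE-XXXX-0003", 5.0, ["5.1.2"]),
]


def cvss_totals(records) -> Dict[str, float]:
    """Add up the base scores of every vulnerability affecting each version."""
    totals: Dict[str, float] = defaultdict(float)
    for _cve_id, score, versions in records:
        for version in versions:
            totals[version] += score
    return dict(totals)


print(cvss_totals(RECORDS))  # {'5.1.2': 15.0, '5.2.0': 17.5, '5.2.6': 7.5}
```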

To get a feel for how exposed a typical website is to attack, we can correlate the NVD's CVSS scores with web crawl data. I obtained web crawl data from a public source containing copies of 127,000 webpages (one page per site). To keep the analysis manageable, I focused only on sites that use PHP. I extracted only the webpages that were served by PHP servers, along with the version of PHP used to create each page. The PHP version is easily obtained from the HTTP response header, which typically contains something like this: X-Powered-By: PHP/5.1.2-gentoo. The PHP version (5.1.2 in this example) is plain to see.
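Pulling the version out of the header can be sketched as below. This is an illustrative helper (the name php_version is hypothetical), assuming the crawl stores response headers as a plain dict. Note that PHP's expose_php setting can suppress X-Powered-By, so a count built this way is a lower bound on PHP sites.

```python
import re
from typing import Dict, Optional

# Matches "PHP/5.1.2-gentoo", "PHP/5.2.6", etc. and keeps only the numeric part.
_PHP_RE = re.compile(r"\bPHP/(\d+(?:\.\d+)*)", re.IGNORECASE)


def php_version(headers: Dict[str, str]) -> Optional[str]:
    """Return the PHP version advertised in X-Powered-By, or None if absent."""
    # HTTP header names are case-insensitive, so normalize keys before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    match = _PHP_RE.search(lowered.get("x-powered-by", ""))
    return match.group(1) if match else None


print(php_version({"X-Powered-By": "PHP/5.1.2-gentoo"}))  # 5.1.2
print(php_version({"Server": "Apache"}))                  # None
```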

Looking at my small sample of web crawl data, I found that 46% of the webpages were served by PHP servers. Looking closer at those PHP servers, we can plot the number of webservers running each version of PHP.
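The PHP share and the per-version counts behind such a plot could be computed roughly as follows. This is a sketch under the assumption that each crawled page has already been reduced to its PHP version string, or None when no PHP header was found; the sample list is hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def php_share_and_histogram(versions: Sequence[Optional[str]]) -> Tuple[float, Counter]:
    """Given one entry per crawled page (None = no PHP header), return the
    fraction of pages served by PHP and a count of pages per PHP version."""
    php_versions = [v for v in versions if v is not None]
    share = len(php_versions) / len(versions) if versions else 0.0
    return share, Counter(php_versions)


# Hypothetical crawl sample, for illustration only.
share, histogram = php_share_and_histogram(["5.2.6", None, "5.2.6", "4.4.9"])
print(share)                    # 0.75
print(histogram.most_common())  # [('5.2.6', 2), ('4.4.9', 1)]
```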

The broad distribution of PHP versions and their usage levels implies that people do not apply software updates to their websites once the site is set up. This is a huge problem and should be addressed by all software development groups.
If all software has vulnerabilities (and it appears that it does), and no one (neither website owners nor maintainers) updates their software after installing it (as this chart implies), then every website that is more than one release cycle old is vulnerable.
Now that we have the CVSS ratings for each version of PHP and know which version each website is using, we could estimate the overall risk of PHP websites as a whole. However, low CVSS scores tend not to be as risky for the type of attack I'm focusing on. So, for this analysis I focused only on CVSS scores of "10.0", the worst of the worst. These are the vulnerabilities that can be exploited remotely (i.e., across the Internet), without authentication (no login required), and with low complexity (meaning most hackers could take advantage of them), and that completely compromise the confidentiality, integrity and availability of the system. For PHP (again, simply as an example) there were 361 scores of 10.0 between versions 3.0 and 5.2.6.

If we look at only these 361 CVSS 10.0 vulnerabilities and correlate them with the data from the 58,000 PHP websites I have copies of, it turns out that 80% of those sites are running PHP versions with at least one CVSS score of 10.0. Most are using versions of PHP that contain more than three CVSS 10's. The average number of CVSS 10's per PHP version was 3.6 (among versions that have at least one CVSS 10).
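The correlation step amounts to a lookup per site plus an average over versions. Below is a minimal sketch under stated assumptions: the per-version count of CVSS 10.0 entries and the per-site PHP versions are already known, and the inputs shown are hypothetical, not the 58,000-site data set.

```python
from typing import Mapping, Sequence, Tuple


def exposure_summary(tens_per_version: Mapping[str, int],
                     site_versions: Sequence[str]) -> Tuple[float, float]:
    """Return (share of sites on a version with >= 1 CVSS 10.0,
    average number of CVSS 10.0 entries among versions that have any)."""
    # Versions missing from the map are treated as having no CVSS 10.0 entries.
    exposed = sum(1 for v in site_versions if tens_per_version.get(v, 0) >= 1)
    site_share = exposed / len(site_versions) if site_versions else 0.0
    affected = [n for n in tens_per_version.values() if n >= 1]
    avg_tens = sum(affected) / len(affected) if affected else 0.0
    return site_share, avg_tens


# Hypothetical inputs, for illustration only.
tens = {"4.4.9": 5, "5.2.6": 2, "5.2.13": 0}
sites = ["5.2.6", "5.2.13", "4.4.9", "5.2.6"]
print(exposure_summary(tens, sites))  # (0.75, 3.5)
```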

So what does all this mean? On the one hand, not every CVSS 10 is a wide-open door. Some websites use application firewalls (for example mod_security) and other tools to help reduce the threat of attack. Some website owners might even have written custom fixes for some of the known issues. Looking at only 58K websites from a cursory perspective is not representative of the whole web and doesn't provide a completely accurate picture.

On the other hand, not all vulnerabilities are in the NVD; what about those? Also, each PHP installation runs on an operating system, which also has vulnerabilities. Plus, PHP is typically used as part of some web application, such as WordPress or Joomla!, which has its own vulnerabilities. These additional OS and application-layer vulnerabilities are not considered in this simple analysis. So my fear is that the true number of vulnerable websites out there is quite high.

As RescueTheWeb obtains more complete web crawl data, we will continue to analyze the situation and provide more data.

Thursday, February 11, 2010

We clearly have a security problem when 64 million webpages contain the exact same scam phrase.

While inspecting hacked website data, I found a phrase that appears to be the signature of a specific type of website infection. The phrase was "buy-phentermine- 37.5mg-without-prescription". This is a lengthy phrase, and a reputable pharmacy is unlikely to reproduce it in this exact form.

When I used Google to find out how many websites contained this phrase, I was shocked. Google reported that 64 million webpages contain the exact phrase "buy-phentermine- 37.5mg-without-prescription".

Maybe Google's reported figure of 64 million is high: Bing reported 24 million and Yahoo! 22 thousand. Either way, there are probably millions of hacked websites out there with just this one infection. Since this one hack alone has resulted in so many infections, we have to be concerned. Widespread hacking is a serious problem because hacked websites lower the quality, trustworthiness and safety of the whole Internet. Depending on the exact attack used, many of these websites could also be drones in botnets or be leaking confidential business or personal information to third parties.

Looking at these infections, I see that they occur on a variety of platforms and are presented in different ways. Some use 'display:none' styling to hide the links, some use 'position:absolute;left:-2000px;' to hide them, and some don't hide their links at all. Some infections aim to take you to a (fake) online pharmacy to buy the drugs, while others seem to be after search engine ranking inflation. Some don't seem to have any purpose, since they simply link to another hacked website. This type might also be a form of search rank inflation, if the linked sites eventually link back to the (fake) online pharmacy.
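The two hiding tricks above can be flagged with a small HTML scan. This is a rough sketch built on Python's standard html.parser, not the RescueTheWeb scanner: it only inspects inline style attributes, and the two patterns are assumptions matching the examples in this post, so links hidden through external CSS or scripts would be missed.

```python
import re
from html.parser import HTMLParser
from typing import List, Tuple

# Inline-style patterns for the two hiding tricks described above (assumed, not exhaustive).
HIDING_STYLES = [
    re.compile(r"display\s*:\s*none", re.I),
    re.compile(r"left\s*:\s*-\d{3,}px", re.I),  # e.g. position:absolute;left:-2000px;
]
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def is_hidden_style(style: str) -> bool:
    return any(pattern.search(style) for pattern in HIDING_STYLES)


class HiddenLinkFinder(HTMLParser):
    """Collect href values of <a> tags that are hidden or sit inside a hidden element."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[Tuple[str, bool]] = []  # (tag, hidden?) for each open element
        self.hidden_links: List[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        hidden = is_hidden_style(attrs.get("style") or "")
        inside_hidden = any(h for _, h in self.stack)
        if tag == "a" and (hidden or inside_hidden) and attrs.get("href"):
            self.hidden_links.append(attrs["href"])
        if tag not in VOID_TAGS:
            self.stack.append((tag, hidden))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def find_hidden_links(html: str) -> List[str]:
    finder = HiddenLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_links


page = ('<p>Hi</p><div style="display:none"><a href="http://spam.example/1">x</a></div>'
        '<a href="/ok">ok</a>'
        '<span style="position:absolute;left:-2000px;"><a href="http://spam.example/2">y</a></span>')
print(find_hidden_links(page))  # ['http://spam.example/1', 'http://spam.example/2']
```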

This pervasiveness is the reason is out to find these hacked websites, inform their owners, and get them fixed. The widespread hacking we see on the Internet is why the word 'Rescue' is in the name of RescueTheWeb. This is a rescue mission. Anytime there are 64 million of anything, you need to pay attention.

Website Infections that only express themselves when the HTTP Referrer is Google.

While looking for infected websites, I found a MoinMoin-based website that was infected by either an .htaccess hack or a software injection hack. The interesting part about this hack was that it manifested itself only when the HTTP referrer was set to

There have been recent articles about malware that appears only when the visitor reaches the infected website through a Google Images frame. This new twist, however, applies more broadly to any web content reached through a Google search.

Presumably, the purpose of this approach is to make it harder for website owners to find the infected webpage, since owners usually visit their own site directly rather than through a Google search.

The infected content sends the user to the website using an "HTTP 302 Found" redirect.
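A site owner could check for this kind of referrer-gated redirect by requesting the same page twice, once directly and once with a search-engine Referer header, and comparing the responses without following redirects. The sketch below uses only Python's standard urllib; the function names and the example Referer value are assumptions for illustration, since the exact referrer string from this infection is not preserved here.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple

Probe = Tuple[int, Optional[str]]  # (HTTP status, Location header if any)


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Make urllib stop at 3xx responses instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx status


_OPENER = urllib.request.build_opener(_NoRedirect)


def probe(url: str, referer: Optional[str] = None, timeout: float = 10.0) -> Probe:
    """Fetch url once, optionally sending a Referer header, without following redirects."""
    headers = {"User-Agent": "Mozilla/5.0"}
    if referer:
        headers["Referer"] = referer  # the header name is spelled "Referer" in HTTP
    request = urllib.request.Request(url, headers=headers)
    try:
        with _OPENER.open(request, timeout=timeout) as response:
            return response.getcode(), None
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")


def cloaking_suspected(direct: Probe, referred: Probe) -> bool:
    """Flag pages that redirect only when the visitor appears to come from a search."""
    status, _location = referred
    return 300 <= status < 400 and direct != referred


# Hypothetical usage (performs real network requests):
# url = "http://site-under-test.example/"
# direct = probe(url)
# referred = probe(url, referer="https://www.google.com/search?q=example")
# print(cloaking_suspected(direct, referred))
```

Comparing the full (status, Location) pair also catches the case where both requests redirect but to different destinations.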