Tuesday, March 2, 2010

Analysis of 58,000 PHP websites shows 80% have CVSS vulnerability scores of 10.

To get a feel for how easy it is for hackers to take control of arbitrary webservers, I wanted to see what percentage of real websites are highly vulnerable to attack. To answer the question, I'm working from the typical attack pattern of today's hacker. Today's hackers use custom 'hacking software' systems which scan websites for known vulnerabilities. Once a vulnerable website is found, the vulnerability is used to inject 'remote control' software onto the server. Once the website is infected, the hacker has total control over the webserver and uses it to assist in wherever their endeavors take them.

Of course, there are lots of different kinds of vulnerabilities. There are those that the programmer or installer creates while implementing some piece of functionality. These are typically found only when the attacker specifically looks for a common error, such as those identified in the SANS Top-25 list of programming errors. There are also vulnerabilities that come with the software you use to build your website. Software such as IIS, Apache, ASP, PHP and WordPress all come with their own set of free vulnerabilities.

The vulnerabilities that come with off-the-shelf software (both Open-Source and Closed-Source) are nothing short of pervasive. The National Vulnerability Database is logging over 5,000 new vulnerabilities per year. These vulnerabilities are the ones the good guys have found and reported to the NVD. The ones that have been found, but not reported, are the ones being sold on the black market to hackers.

Looking at the National Vulnerability Database (NVD), you will see that most vulnerabilities are scored. The scoring is broken down in various ways according to guidelines set forth in the Common Vulnerability Scoring System (CVSS). The higher the score, the worse the vulnerability. NVD rates any score of 7.0 or higher as HIGH severity.

Each vulnerability is associated with the specific versions of software in which it exists. For example, vulnerability "CVE-2009-4024" has a Base CVSS score of "10.0" and exists in ten released versions of "pear" (a PHP library).

If we look at all the NVD vulnerabilities that apply to each version of PHP (as an example), we can see how pervasive vulnerabilities are. All versions of PHP have multiple vulnerabilities documented, except versions 5.2.13 and 5.3.1 (as of today). If we add together each vulnerability's CVSS score, for each version of PHP, we can easily see that vulnerabilities are pervasive throughout the history of the software. This issue is not specific to PHP. All software packages have similar issues. Some are worse.
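
The per-version summation described above can be sketched roughly as follows. This is only an illustration: it assumes the NVD entries have already been reduced to (CVE id, base score, affected versions) records, and the records shown are made-up placeholders, not real NVD data. The names SAMPLE_RECORDS and total_score_per_version are hypothetical.

```python
from collections import defaultdict

# Made-up placeholder records: (CVE id, CVSS base score, affected PHP versions).
# Real values would have to come from the NVD data itself.
SAMPLE_RECORDS = [
    ("CVE-EXAMPLE-0001", 10.0, ["5.1.2", "5.2.0"]),
    ("CVE-EXAMPLE-0002", 7.5, ["5.2.0"]),
    ("CVE-EXAMPLE-0003", 4.3, ["5.1.2", "5.2.0", "5.2.6"]),
]


def total_score_per_version(records):
    """Add up the CVSS base scores of every vulnerability affecting each version."""
    totals = defaultdict(float)
    for _cve_id, score, versions in records:
        for version in versions:
            totals[version] += score
    return dict(totals)


totals = total_score_per_version(SAMPLE_RECORDS)
for version in sorted(totals):
    print(version, round(totals[version], 1))
```

A summed score is a crude measure, since it treats ten minor issues like one severe issue, but it is enough to show how exposure accumulates across versions.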

To get a feel for how exposed a typical website is to attack, we can use the NVD's CVSS scores and correlate them with web crawl data. I obtained web crawl data from a public source which contained copies of 127,000 webpages (one page per site). To keep the analysis within reason, I focused only on those sites that use PHP. I extracted only the webpages that were served by PHP servers, along with the version of PHP used to create each webpage. The PHP version is easily obtained by looking at the HTTP response headers, which typically contain something like this: X-Powered-By: PHP/5.1.2-gentoo. The PHP version can easily be seen (5.1.2 in this example).
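
As a minimal sketch of that extraction step, assuming the response headers for a page are available as a plain dictionary. The function name php_version_from_headers and the regular expression are illustrative choices, not the tooling actually used for this analysis.

```python
import re

# Captures the dotted version number after "PHP/" and ignores
# distribution suffixes such as "-gentoo".
PHP_VERSION_RE = re.compile(r"PHP/(\d+(?:\.\d+)*)", re.IGNORECASE)


def php_version_from_headers(headers):
    """Return the PHP version advertised in X-Powered-By, or None if absent."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "x-powered-by":
            match = PHP_VERSION_RE.search(value)
            if match:
                return match.group(1)
    return None


print(php_version_from_headers({"X-Powered-By": "PHP/5.1.2-gentoo"}))
```

Servers configured with expose_php = Off do not send this header at all, so a header-based count will miss some PHP sites.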

Looking at my small amount of web crawl data, I found that 46% of the webpages were served by PHP servers. Looking closer at those PHP servers, we can plot the number of webservers hosting each version of PHP.
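
The tally behind such a plot could be sketched like this, assuming one extracted version string per crawled site (None where no PHP header was seen). The site_versions list is a made-up placeholder, not the crawl data.

```python
from collections import Counter

# Made-up placeholder input: one entry per crawled site,
# None when the site did not advertise a PHP version.
site_versions = ["5.2.6", "4.4.9", None, "5.2.6", "5.1.2", None, "5.2.6", "4.4.9"]

php_versions = [v for v in site_versions if v is not None]
php_share = len(php_versions) / len(site_versions)
version_counts = Counter(php_versions)

print(f"PHP share: {php_share:.0%}")
# Most widely deployed versions first.
for version, count in version_counts.most_common():
    print(version, count)
```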

The broad distribution of PHP versions and their usage levels implies that people do not apply software updates once their website is set up. This is a huge problem and should be addressed by all software development groups.

If all software has vulnerabilities (and it appears that it does), and no one (website owners and maintainers) is updating their software once they install it (which this chart implies), then the result is that all websites that are more than one release cycle old are vulnerable.

Now that we have the CVSS ratings for each version of PHP and know which version of PHP each website is using, we could find the overall risk of PHP websites as a whole. However, low CVSS scores tend not to be as risky for the type of attack I'm focusing on. So, for this analysis I focused only on the CVSS scores of "10.0", the worst of the worst. These are the ones that can be exploited remotely (i.e., across the Internet), without authentication (login not required) and with low complexity (meaning most hackers could take advantage of the issue). For PHP (again, as simply an example) there were 361 CVSS 10s between versions 3.0 and 5.2.6.
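
To make the meaning of a 10.0 concrete, here is a minimal sketch of the CVSS version 2 base equation, assuming version 2 is the scoring scheme behind these numbers (it was the version NVD used in 2010). The function name cvss2_base_score is an illustrative choice.

```python
# Metric weights from the CVSS v2 base equation.
ACCESS_VECTOR = {"L": 0.395, "A": 0.646, "N": 1.0}       # local, adjacent, network
ACCESS_COMPLEXITY = {"H": 0.35, "M": 0.61, "L": 0.71}    # high, medium, low
AUTHENTICATION = {"M": 0.45, "S": 0.56, "N": 0.704}      # multiple, single, none
IMPACT = {"N": 0.0, "P": 0.275, "C": 0.660}              # none, partial, complete


def cvss2_base_score(vector):
    """Compute the CVSS v2 base score from a vector like 'AV:N/AC:L/Au:N/C:C/I:C/A:C'."""
    m = dict(part.split(":") for part in vector.split("/"))
    impact = 10.41 * (
        1 - (1 - IMPACT[m["C"]]) * (1 - IMPACT[m["I"]]) * (1 - IMPACT[m["A"]])
    )
    exploitability = (
        20 * ACCESS_VECTOR[m["AV"]] * ACCESS_COMPLEXITY[m["AC"]] * AUTHENTICATION[m["Au"]]
    )
    f_impact = 0 if impact == 0 else 1.176
    return round((0.6 * impact + 0.4 * exploitability - 1.5) * f_impact, 1)


# Remote, low complexity, no authentication, complete impact.
print(cvss2_base_score("AV:N/AC:L/Au:N/C:C/I:C/A:C"))
# Same exploitability, but only partial impact.
print(cvss2_base_score("AV:N/AC:L/Au:N/C:P/I:P/A:P"))
```

Under this equation, a 10.0 requires more than easy remote exploitation. It also requires complete confidentiality, integrity, and availability impact, which is why the same exploitability with only partial impact scores 7.5.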

If we look at only these 361 CVSS 10.0 vulnerabilities and correlate them with the data from the 58,000 PHP websites that I have copies of, it turns out that 80% of those sites are running PHP versions with at least one CVSS score of 10.0. Most are using versions of PHP that contain more than three CVSS 10s. The average number of CVSS 10s per PHP version was 3.6 (for those versions that have at least one CVSS of 10).
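
The correlation itself amounts to a join on the PHP version string. A rough sketch, with made-up placeholder numbers that are not the figures from this analysis and hypothetical variable names:

```python
# Made-up placeholder inputs; the real analysis used NVD data and crawl data.
# Number of CVSS 10.0 vulnerabilities recorded against each PHP version.
tens_per_version = {"4.4.9": 2, "5.1.2": 4, "5.2.6": 3, "5.2.13": 0}
# Number of crawled sites running each version.
sites_per_version = {"4.4.9": 10, "5.1.2": 15, "5.2.6": 50, "5.2.13": 25}

total_sites = sum(sites_per_version.values())
# A site counts as exposed if its version has at least one CVSS 10.0 entry.
exposed_sites = sum(
    count
    for version, count in sites_per_version.items()
    if tens_per_version.get(version, 0) >= 1
)
exposed_share = exposed_sites / total_sites

# Average only over versions that have at least one CVSS 10.0 entry.
affected = [n for n in tens_per_version.values() if n >= 1]
mean_tens = sum(affected) / len(affected)

print(f"{exposed_share:.0%} of sites run a version with a CVSS 10.0")
print(f"{mean_tens:.1f} CVSS 10.0 entries per affected version")
```

Note that the share is weighted by site count while the average is taken per version, so the two figures answer different questions.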

So what does all this mean? On the one hand, not all CVSS scores of 10 are wide-open doors. Some websites incorporate application firewalls (for example mod_security) and other tools to help reduce the threat of attacks. Some website owners might even have written custom solutions to resolve some of the known issues. A cursory look at only 58K websites is not representative of the whole web and doesn't provide a completely accurate picture.

On the other hand, not all vulnerabilities are in the NVD. What about those? Also, each PHP system runs within an O/S, which also has vulnerabilities. Plus, PHP is always used as part of some web application, such as WordPress or Joomla!, which again have their own vulnerabilities. The additional O/S and application layer vulnerabilities are not considered in this simple analysis. So, my fear is that the true number of vulnerable websites out there is fairly high.

As RescueTheWeb obtains more complete web crawl data, we will continue to analyze the situation and provide more data.