Dot Net analyses Wikipedia in the cloud

Dot Net Solutions

If proof were needed that Microsoft's Azure cloud computing platform can handle gargantuan computing projects, it has been delivered by a British software developer.

Wikipedia Explorer is an application developed by Windsor-based Dot Net Solutions as a proof of concept for high performance computing in the cloud.

The application analyses the data within Wikipedia, the user-generated encyclopaedia, revealing the links that form between its content.

"Wikipedia Explorer is all about visualising the various links and relationships between all the pages and data within the Wikipedia repository," Dan Scarfe, chief executive of Dot Net Solutions, told IT PRO at Microsoft's Professional Developers Conference in Los Angeles.

"To do any kind of analysis on Wikipedia now is impossible in a client/server environment; the amounts of data being worked on are too big for that setup. We had hit a brick wall in terms of processing capability."

To tackle the problem, the company redeveloped the application, which was first written 18 months ago, to run on Microsoft's Windows Azure cloud platform. The result is a cloud service with access to almost unlimited processing power and far more server resources than it takes to run Wikipedia itself.

"The latest version, which sits in the cloud on Azure, allows us to tap into a huge amount of storage and processing, enabling us to harness a hundred or a thousand or thousands of servers as needed. We can do things that were simply impossible before without millions of pounds to spend on your own data centre and hardware."

To give a sense of the scale of this cloud, Microsoft is purchasing around 10,000 new servers every month as it continues to fit out the Azure data centres around the globe. By comparison, the whole of Facebook runs on only 10,000 servers.

Dot Net is also working on another project, involving facial recognition technology, which it is looking to put into the cloud in order to harness on-demand processing and storage resources.
