Cloudhelix shows how to scale - without losing performance

The problem

  • How to scale up to cope with additional demand without losing performance.
  • RAID 10 was ruled out because it sacrifices capacity
  • Adding more SANs would simply be throwing money at the problem
  • Ripping out existing technology and starting again was not feasible

The solution

  • PernixData's software provided a way to fix the issue
  • No overhead
  • Better metrics
  • More automation
  • Future-proofed

If you run a cloud, be it public or private, your end users have certain expectations of service and performance. If you are a start-up in cloud provision, offering high performance can be a costly business, particularly when you start scaling up.

For Cloudhelix, the challenge was to scale performance independently of costly capacity.

“Out of the box, most SANs and subsystems perform really, really well,” says Angus Malcolm, Technical Director at Cloudhelix. “When you start loading them up and get to 70 to 80 per cent of the footprint being consumed, the performance tends to scale off because you have a finite amount of computing inside the storage processors and as you load it up, the performance tails off. So that is really what we were trying to overcome. How do we scale out the size and footprint and also the performance?”

The firm’s vCloud Director platform spans two UK datacentre locations and is based on VMware ESXi v5.5.1, overlaid with orchestration and multi-tenancy features delivered through VMware vCloud Director v5.5. The storage subsystem it uses to support its cloud platform is built on Dell’s EqualLogic platform.

Jack Jones, lead technical analyst at Cloudhelix, says that when you want high performance from your SAN, the traditional route is to use RAID 10, but that comes at the cost of sacrificing a lot of capacity.

“You don't necessarily want to put in large amounts of extra storage units simply to give you the performance that you need,” says Jones. “Ideally, we want to be in a position where we could be adding storage units simply to provide capacity without having to add storage units to provide extra performance in terms of load balancing or better RAID levels.”

Appraising the alternatives

The firm quickly discounted the idea of putting in an additional traditional SAN. “I think it is just because why would you want to throw in another 50, 60 spindles just to increase a bit of performance,” Malcolm explains. “We wanted to abstract the performance from the storage, so that was not going to fly. It is contrary to what we wanted to do.”

The firm also looked at other hybrid arrays that have onboard caching mechanisms. But as the firm is still a relatively new start-up, Malcolm quickly discounted that option too. “We have only been going a year, there was no way we were going to rip out significant investment and do a U-turn technically just purely for that,” says Malcolm.

Malcolm and his team considered hybrid and all-flash arrays from vendors such as Tegile and Pure Storage, but for Cloudhelix these weren’t architecturally the way forward either. “For us, it was down to Cap-ex and also we don't know what storage we are going to absorb.”

He says that as the firm is handling various clients with varying demands, it is hard to judge what the data footprint will look like as each customer’s needs are different. “We will only know that once we actually bring them onto the platform. So it is quite hard to capacity plan, it is quite hard to quantify what that data is actually going to do.”

The firm decided to go with PernixData because it allows the firm to quantify and scale out performance as and when it needs to. Malcolm says that he can now drill into the metrics and get an accurate picture of any issues with latency and I/O performance. “That gives us more of a handle on how we scale performance out and how our clients use it.”

Implementation

Jones says that setting up PernixData was not too difficult. It sits in front of the existing storage, meaning that the company did not need to rip everything out.

“First of all, you need a flash device of some kind installed into your ESXi host. That might be an SSD or even be one of the newer PCIe flash devices like Fusion-io,” he says.

After that, the team installed PernixData software onto a Windows Server, which then becomes the management server. This VM with PernixData running on it can fail without impacting the flash cluster, according to Jones. “It is similar to vCenter, if your vCenter server blows up then your VMs and your hosts will continue running OK. It is used for management of the flash clusters. For instance, if you were changing the caching policy from read to write or you might want to look at some of the graphs. So that's what the Windows software is for.”

Next comes the installation of the actual package onto each ESXi host. “Even though it is third-party software, you install it using the ESX CLI onto the host and it then acts as a hypervisor module,” he says.

Jones explains this is why the software is high performing. “It [is] essentially intercepting your reads and writes at the hypervisor level.”

The software manages these reads and writes intelligently: it caches them on flash, de-stages writes from flash to the SAN, and optionally mirrors pending writes to other hosts.
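The read and write path described above can be illustrated with a toy model. This is only a sketch of the general write-back caching idea, not PernixData's implementation: the class name `WriteBackCache`, its methods, and the use of plain dictionaries to stand in for the SAN and for peer hosts are all assumptions made for illustration.

```python
from collections import OrderedDict


class WriteBackCache:
    """Toy host-side flash cache in front of a slower backing store (the SAN)."""

    def __init__(self, backing_store, peers=None):
        self.backing_store = backing_store  # dict standing in for the SAN
        self.flash = {}                     # block -> data held on local flash
        self.pending = OrderedDict()        # writes not yet de-staged, oldest first
        # Each peer is a dict standing in for a replica on another host.
        self.peers = peers if peers is not None else []

    def write(self, block, data):
        # The write lands on local flash and is mirrored to any peers;
        # the SAN is not touched until destage() runs.
        self.flash[block] = data
        self.pending[block] = data
        self.pending.move_to_end(block)
        for replica in self.peers:
            replica[block] = data

    def read(self, block):
        # Serve from flash on a hit; on a miss, fetch from the SAN and cache it.
        if block in self.flash:
            return self.flash[block]
        data = self.backing_store[block]
        self.flash[block] = data
        return data

    def destage(self):
        # Flush pending writes to the SAN, oldest first, then drop the
        # peer copies because the data is now safe on the SAN.
        while self.pending:
            block, data = self.pending.popitem(last=False)
            self.backing_store[block] = data
            for replica in self.peers:
                replica.pop(block, None)
```

In this model a write is visible to later reads as soon as it reaches flash, while the SAN only sees it once `destage()` runs. The peer copies exist so that pending writes survive the loss of a single host.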

The software running on the Windows Server communicates with vCenter and provides a plug-in to that as well. “From the word go, it was quite impressive because it is all very integrated with your existing vCenter admin stuff,” says Jones.

He says that when he selects the cluster in vCenter, a new tab appears alongside the normal ones, such as data, configuration, tasks and events. “There is a nice new tab called PernixData,” says Jones. “When you select that you can tweak the flash cluster by fusing the hosts and the flash devices that you have installed into them.”

He adds that the policy can be set to either read caching or write caching, and that he can choose which virtual machines to accelerate. “Or, which is a big plus for us, you can choose an entire datastore that you want to accelerate.”

“Customers can be going on and merrily building their own VMs, deleting them, building some more, it wouldn’t be practical for us to configure that on a per VM basis. But if we can configure it at the level of the datastore then it is taken care of for us.”
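The appeal of a datastore-level setting can be shown with a small lookup. This is a hypothetical sketch, not the product's configuration model: the function `effective_policy`, the policy strings `"read"` and `"write"`, the example names, and the rule that a per-VM setting takes precedence over the datastore setting are all assumptions.

```python
from typing import Dict, Optional


def effective_policy(
    vm_name: str,
    datastore: str,
    vm_policies: Dict[str, str],
    datastore_policies: Dict[str, str],
) -> Optional[str]:
    """Return the caching policy that applies to a VM, or None if unaccelerated."""
    # Assumed precedence: an explicit per-VM setting wins if one exists.
    if vm_name in vm_policies:
        return vm_policies[vm_name]
    # Otherwise the VM inherits whatever is set on its datastore.
    return datastore_policies.get(datastore)


# Hypothetical example: one datastore-level setting covers VMs created later.
datastore_policies = {"tenant-ds-01": "write"}
vm_policies = {"reporting-vm": "read"}
```

With these example values, a VM that a customer builds tomorrow on `tenant-ds-01` inherits the datastore policy without anyone touching the configuration, which is the behaviour Jones describes.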

Benefits

Jones says that since implementing PernixData they can get back to “doing something more worthy of our time.”

“The idea is we are a small, young start-up, we don't have hundreds of staff. Where we can automate something, we need to automate it.”

But the main benefit for the firm is that they have gained very high performing storage.

“We have seen something like 15 to 20 thousand IOPS from a single VM when we have been manually running. Prior to the solution, some of our storage may have been getting say a maximum of one to two thousand IOPS at the most,” says Jones.

He says it brought about a massive increase in performance and, most importantly, one that is abstracted from capacity.

“We can just look at adding a new storage unit purely when we need more capacity and we can run that in a slightly more efficient RAID level, such as RAID 6 as opposed to RAID 10,” says Jones.
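The capacity difference between the two RAID levels follows from standard arithmetic: RAID 10 mirrors every disk, so half the raw capacity is usable, while RAID 6 gives up the equivalent of two disks to parity. The sketch below works this through. The function name and the 16-disk, 1 TB example are illustrative assumptions, and the calculation ignores hot spares and formatting overhead, which real arrays also reserve.

```python
def usable_capacity_tb(level: str, disks: int, disk_tb: float) -> float:
    """Approximate usable capacity of a single RAID group, in terabytes."""
    if level == "raid10":
        # Mirrored pairs: an even number of disks, half of them hold copies.
        if disks < 4 or disks % 2 != 0:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return disks / 2 * disk_tb
    if level == "raid6":
        # Dual parity: the equivalent of two disks is lost to parity data.
        if disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return (disks - 2) * disk_tb
    raise ValueError(f"unsupported RAID level: {level}")


# Hypothetical storage unit: 16 disks of 1 TB each.
raid10_tb = usable_capacity_tb("raid10", 16, 1.0)  # 8.0 TB usable
raid6_tb = usable_capacity_tb("raid6", 16, 1.0)    # 14.0 TB usable
```

For this hypothetical unit, moving from RAID 10 to RAID 6 raises usable capacity from 8 TB to 14 TB on the same disks, which is the kind of gain Jones refers to once performance no longer dictates the RAID level.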

He says that the firm now gets a lot more capacity out of the same storage unit, “purely because we don't have to consider performance at the SAN level because performance is being taken care of at the closest possible place [to] the VM storage needs.”

“Now we don't have to worry about delivering a storage unit for performance, we can deliver it solely for capacity, which I think is quite important for cloud hosting because we have customers that are loading on terabytes of data and that's right to do so especially as they are paying for it. It means we don't have to worry too much about the performance,” says Jones.

Next steps

With the system fully implemented, Malcolm says it has been such a “game changer” for the company that the firm will now promote PernixData to its “end clients and the wider community”.

“Some customers have onsite deployment of it as well as services on our own cloud platform,” says Jones. “They see the performance of our platform and compare it to their on-premise performance. Because of that, we are probably going to be offering a bundle where we resell them the Pernix licences as well as the SSDs and some consultancy to set it all up.”

Company profile - Cloudhelix

Based in the Innovation Centre at the University of Sussex, Cloudhelix is a service integrator that specialises in hybrid cloud and virtualisation implementations. Founded in April 2013 as a virtual company, it moved to the Sussex campus at the end of the year. The company is a VMware partner but will work to develop a cloud infrastructure from a variety of providers, including Amazon, VMware, Google and Microsoft. The company was founded by a group of industry veterans from the ISP and managed services

Rene Millman

Rene Millman is a freelance writer and broadcaster who covers cybersecurity, AI, IoT, and the cloud. He also works as a contributing analyst at GigaOm and has previously worked as an analyst for Gartner covering the infrastructure market. He has made numerous television appearances to give his views and expertise on technology trends and companies that affect and shape our lives.