Montreux Jazz Festival: Storage in a different light

What if a client were to completely change the way you do business? This was the concept that knocked me back on my heels during a conversation with a team of guys from storage startup Amplidata.


Storage as a business has a very curious problem: making it work takes some really hard thinking. Storage is inherently unsexy, but absolutely necessary. It can be a hard sell into businesses, with those involved in decision making - thanks largely to having to deal with large, unexciting boxes - being a little dry or hard to excite.

Then there's the emerging world of cloud storage. Some presume initially that anything in "the cloud" is inherently safer, cheaper and just all around better than what they can do with their own hands plugged into their own power supply, cooled by their own machinery and upkept by their own guys. Cloud storage confidence could just be because they are not clear about the sheer scale of disaster that potentially awaits the unwary adopter of big data cloud platforms (permute any other buzzwords you fancy here).

The underlying point of painful public data loss stories is that, yes, it really is possible for a global name to come out with a storage platform in the cloud which can unravel like an old sweater in the jaws of an excited Terrier when presented with just the wrong patch, update, guest or user. Yet storage people have, faced with a massive rise in the quantity of private data, chosen mostly to promote public resources to fix the problem. Amplidata's revelatory confession to me was that it started out sure its massively distributed RAID/RAIN storage system was perfectly suited to the cloud storage marketplace, and that's where the future lay. Until, that is, it got inside a particular need from a very particular client: École Polytechnique Fédérale de Lausanne (EPFL). It's a fascinating place, to be sure, which seems to have a relationship with Lausanne and the surrounding Swiss towns much like Cambridge University does with its host city.

But this was not a dry academic test-bed job. There's no drum-roll big enough for the actual nature of the underlying project, so I'll just lay it out bare so we can get to the nitty gritty. The work EPFL was looking to do on a petabyte-plus array with effectively faultless, eternal reliability is part of its responsibility to preserve the archive of performances from the Montreux Jazz Festival. I visited EPFL to talk to Alex Delidais, the man in charge of the project, to find out more about what they are doing and why he chose Amplidata. After all, with this kind of backing and this kind of profile, it's not as if he couldn't have taken his pick of the world's storage suppliers.

I don't think it's an exaggeration to say that the current age-group of the whole world's enterprise and technology company CEOs are very likely to know and like the kind of music that Montreux has been presenting for the last 40 years. Actually, scrub that: "like" is too mild a word. I think it's safe to say that the demographic for the subject matter of this project attracts an almost unprecedented degree of obsessive interest from influential people that ought to guarantee an open-door welcome to the project from every high-tech business on the planet. But - and here's the point that bumped Amplidata's understanding of its own products off to one side of the cloud hype - Alex and the EPFL team really understood how much data they were faced with, and how important it all was. They understood that the 40-year-old analogue tape stock was tobogganing down a slope of self-initiated unstoppable destruction even as they had started work digitising the very worst cases. What's more, they recognised that quite apart from establishing a more copyable, more error-tolerant tape format based on LTO to fix the analogue decay crisis, they were also charged with producing a high-speed, cleanly accessible version of the whole archive. This was imperative so that Montreux, which had retained the rights to performances of many artists appearing on the stage of the Festival, could show the archive on demand within its burgeoning network of Montreux Jazz Cafes all across the world.

This was not a case of saying "take a copy of those" and pointing to a stack of LTOs. Alex was showing me truckloads of media, in EPFL's loading bay, fresh from Montreux, just across the lake. It's no good having several thousand LTOs in even the world's fastest autoloader when your average time-to-mount for a randomly chosen chunk of performance, ready for streaming, is in the order of a week.

Montreux needed something that responded faster, but at the same time didn't involve the long load-up times, high power consumption and, frankly, unexplored failure modes that it felt characterised the rest of the storage business. The general view in storage is that the older the information is, the less it is accessed; that lots of it is repeats at a block level; that much of it is eminently compressible, and so on. Not so for Montreux. Everyone wants to test the archive with a random selection: with, for example, Bob Dylan's earliest and most recent performances, or everything featuring Memphis Slim. This is not your traditional database query requirement.

Compression of music is anathema to Jazz & Rock types who straddle the vinyl/CD conversion era and have entrenched views. The algorithms that come with a business storage array are not sound-waveform aware, in any case. Compressing company logos in Word has no algorithm crossover to compressing guitar solos in Final Cut.

Amplidata does its RAID a little differently. If you dip into the RAID business then it is very easy to get lost in all the RAID numbers, and the various painful tribally-defined allegiances to ultimately unhelpful concepts, with banners that read "NTFS doesn't require defragmentation" and "LINUX software RAID is superior" and "file based storage is inferior." All of these assume an almost Talmudic obsession with the incredibly fine distinctions found in megabytes of research papers. One might reasonably expect a technical university both to grasp, and to be able to make a selection from, what is assumed to be a pretty mature and well-embedded industry.

But it didn't. It decided that the hype was indeed hype, and that the mismatches it suspected could be problematical with thin provisioning and de-duplication for its data were real. It went with the startup product instead.

It seems there's a clear advantage to the massively distributed, bit-level, multiple Intel Atom-powered "erasure coding" RAIN that Amplidata invented. Therefore, if you get the chance to look at anything related to the Montreux Jazz Festival Archive, I can give you the wherewithal to airily say "Oh yes, the 10:1 compressed working copy there is a Petabyte, you know". This is only a fraction of the story whether you look at it from the Montreux/EPFL perspective, or the Amplidata one. Montreux's commitment to preserving what it's got is thankfully backed up by impressive cash resources, because the fan base are certainly not taking any prisoners over the long term significance of those Jazz, avant-garde Rock and Performance moments.

Amplidata was helped in understanding this part of the project by a simple hint in a forum that Montreux might be working on the archive, and this kind of breadth of motivations in the audience was instrumental (pardon the pun) in letting it get free of the cloud hype in a storage market. It realised that its strength was much more about securing vital resources of data for the longer term than it was about providing abstracted, net-ready, scaled-out (and therefore me-too) cloud object storage. Montreux was a confidence builder for Amplidata. For EPFL, this data set is a massive and tempting playpen. The project extends into many different areas of audio and video data, and we could think about all that for quite long enough to fill this and 10-plus more articles besides, but that's not the end of the story.

Securing a stream of clearly piracy-worthy information (in this case, the recordings) as it is supplied, on demand, to a globally distributed set of showcase cafes is not a trivial problem. Nor is putting together the sheer diversity of sources of data, nor is deciding what's important about this or that apparently trivial scrap of trash. And I do mean, trash: Scuffed, scribbled stage plans, handwritten running orders, invoices for trivial items, ticket stubs: sounds, video, "document" scans, it's ALL part of the archive. While you and I might not be too bothered by knowing exactly where Brian May stood in 1979, or by clumping together all the archive entries by the name of the sound engineer, these are all legitimate queries for a general-purpose search engine operating on the body of data preserved by Montreux.

With so many different ways of looking at this data, I asked Alex how many people are working on the project and he looked a little apologetic, because it varies so much. Sometimes it's a core team of five or so and at other times it can be 30.

The playpen label is a bit of a puzzle for a while, until you consider the difficult situation of the reality of music-as-data in the 21st century. Universities everywhere are understandably worried by the dubious legal status of download data travelling over student networks, with no intention to get dragged into any of the Digital Rights arguments that waste so much time and effort.

A petabyte of music and video data whose rights are clearly defined, assigned and understood really opens up the opportunities for original research and development work. Amplidata was looking a bit more forward in my conversations on this topic than EPFL was. It pointed out that the deep pockets at Montreux aren't only open to the archive project. Indeed, the performances are continuing, and so are the recordings. While the archive is understandably focusing on deteriorating analogue formats, the Festival is still eagerly buying in to leading edge cameras, systems, and audio mixing gadgets which look set to bump the size of the newer archive entries up by five orders of magnitude. That's going to be some playpen.

The biggest problem in looking at this story - which started from a one-sentence aside during an interview at last year's SNW Europe show - is keeping a lid on the excitement that the topic generates. It's surprising to reflect on how little advancement there has been in the fields EPFL has started to look at with this work, given how enthusiastic everyone is about the subject matter. It's like we're all in thrall to grumpy lawyers intent on draconian pursuit of petty criminals every time the music topic comes up.

In storage, too, there's a certain malaise about the success of de-duplicators, compressors and other gadgets that depend on the dull repeatedness of the data they squish down.

Amplidata's key realisation - which it claims has helped it get inside even bigger fish than Montreux with its range of storage devices - is that not everybody is keeping Office data. This is a breath of fresh air for me and, I suspect, for a lot of people trying to make decisions about what type of storage they buy, and from whom.