Information archaeology

And then there are in-house applications. You might be able to import the data from an app written in a DOS version of FoxPro, but could you get the custom code that generates reports working?

Finding solutions

The best answer so far is to virtualise the operating system and applications needed to open the files. Natalie Ceeney says 'the vast majority' of government information is in Microsoft formats, so the National Archives worked with Microsoft to develop a system to make documents in older versions easily accessible in their original format.

It uses Virtual PC 2007 to run all previous versions of Microsoft Office on Windows 3.1, Windows 95, Windows 98 and Vista as required, on a single PC.

"Today it's reasonably simple to convert a document to the latest versions of the Office file format," she points out. "This is also about protecting the digital integrity of the document and making sure it can still be viewed and seen the way it was intended."

Email archiving is a related problem. For one thing, PST archives on individual hard drives aren't easy to search or even to find. As James Blake, product manager for email archiving service Mimecast, puts it: "That's all the intelligence of your business spread across the world and scattered among road warriors who may or may not lose laptops or have them stolen."

There's the same file format issue with attachments, and you can also run into problems with archiving messages from Exchange, says Blake. "In the last ten years we've gone through Exchange 5, Exchange 5.5, Exchange 2000, 2003 and now 2007. Over a ten-year retention period, you have to manage the migration of all these different email platforms and the underlying stored data in your archive. If I archived my data on a previous version and I'm now on Exchange 2003, I have no way to import that data to examine it. You have to install Exchange 5.5 or upgrade the data to an intermediate format."

With that in mind, Mimecast's service reads in email that arrives as SMTP as well as the native Exchange format and stores it in a custom XML format that splits the message into component parts. "The whole message is cryptographically hashed so we can prove when we rebuild the message that it hasn't been tampered with. If it's been forwarded to one person without the original attachment we can single instance store that and note how it was forwarded. We also store notes on how business processes like approvals and scanning have changed a message."
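To make the idea concrete, here is a minimal sketch in Python of the two techniques Blake describes: hashing the whole message so tampering can be detected later, and storing each distinct attachment only once. It is not Mimecast's implementation or its XML format. The class ArchiveSketch, the helper build_message, the record fields and the choice of SHA-256 are all assumptions made for illustration, and a plain dictionary stands in for the real store.

```python
import hashlib
from email import message_from_bytes, policy
from email.message import EmailMessage


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


class ArchiveSketch:
    """Hypothetical archive: one record per message, one blob per distinct attachment."""

    def __init__(self):
        self.blobs = {}    # attachment digest -> bytes, kept once however often it is sent
        self.records = []  # component parts of each archived message

    def archive(self, raw: bytes) -> dict:
        msg = message_from_bytes(raw, policy=policy.default)
        digests = []
        for part in msg.iter_attachments():
            payload = part.get_payload(decode=True) or b""
            digest = sha256_hex(payload)
            # Single-instance storage: only the first copy of a blob is kept.
            self.blobs.setdefault(digest, payload)
            digests.append(digest)
        record = {
            "message_sha256": sha256_hex(raw),  # hash of the whole message
            "subject": msg["Subject"],
            "attachments": digests,
        }
        self.records.append(record)
        return record

    @staticmethod
    def verify(raw: bytes, record: dict) -> bool:
        """True if raw still matches the hash taken at archive time."""
        return sha256_hex(raw) == record["message_sha256"]


def build_message(subject: str, attachment: bytes) -> bytes:
    """Build a small sample message with one attachment (illustration only)."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg.set_content("See attached.")
    msg.add_attachment(attachment, maintype="application",
                       subtype="octet-stream", filename="report.doc")
    return msg.as_bytes()


store = ArchiveSketch()
first = build_message("Quarterly report", b"report-bytes")
second = build_message("Fwd: Quarterly report", b"report-bytes")
first_record = store.archive(first)
second_record = store.archive(second)
```

In this sketch, archiving the original and the forwarded copy produces two records that point at the same stored attachment, and verify returns False for any message whose bytes have changed since it was archived.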

The XML format is more efficient for searching. Blake claims you can search ten years of email using the Mimecast Outlook plug-in faster than you can search a local PST. And speed matters to your users more than the cost of tape versus hard drives: "People are emailing themselves documents so they know in five to 10 years they can get them back in seconds, as opposed to internal backups on tape where they have to wait three to four hours for the tape to come back on a truck from Iron Mountain and then wait again for you to load the tape."

Mary Branscombe
