Generative AI could help get your unstructured data in shape, but businesses need a centralized strategy first

Unstructured data has become one of the cornerstones of business success. How well an organization can harness and use it is increasingly a make-or-break factor.

Unstructured data promises significant opportunities, such as cost reduction, process simplification, security and compliance enhancements, and increased productivity.

Right now, though, it's often undervalued. Organizations allocate 60% of their "tech spend" to structured data and only 40% to unstructured data*, according to a whitepaper from IDC researchers sponsored by cloud content management company Box.

To reap the rewards, companies need to change their thinking and adopt a centralized strategy for managing their unstructured data.

Businesses also need to hone their data postures if they want to make use of generative AI, which depends on well-organized data repositories to reach its potential.

This process can't be rushed, however. Companies need to give themselves breathing room while they adopt more coherent strategies for their unstructured data.

What is unstructured data and why is it so important?

According to the IDC whitepaper, 90% of the average organization's data was unstructured as of 2022, spread across many different systems and formats throughout the company.

This type of data consists of content that teams work with daily but that is not loaded into structured databases, such as documents, PDFs, videos, images, and audio clips.

Because unstructured data varies widely even within a single format, it's difficult to organize and analyze: it can't always be sorted or labeled easily or consistently.

Structured data, typically stored in relational databases, is much more manageable by comparison and therefore much easier to turn into business value. It is organized into tables and columns that can carry constraints to reject data that doesn't belong there.

This makes data navigation a breeze for employees, but there are drawbacks, particularly with regard to AI. Most obviously, structured data is largely inflexible, placing limits on the variety of data that can be stored.

Unstructured data, by comparison, covers everything that falls outside those limits. The range of decision-making possibilities this data opens up could be transformative for businesses.

In a Tata Consultancy Services blog post, Kumar Amitesh, head of the company's Google business unit, cites a few relevant unstructured data use cases, explaining how a manufacturing system could analyze images to find flaws in glass, or how a product development organization could use customer support call transcripts to identify ideas for new products.

"These AI technologies improve with the volume of data they process, learning from them and becoming smarter over time," Amitesh said. "They also have enormous potential to unlock the value of unstructured data."

Analysts at McKinsey have also stressed the importance of unstructured data, recommending that companies start building capabilities and interventions into the data life cycle that focus specifically on unstructured data.

“Most of our knowledge is captured, curated, and shared in the form of unstructured data,” IDC says in its whitepaper. “Content is therefore essential to running a business, enabling organizations to embrace complexity, manage business risk, and increase productivity in the era of data and artificial intelligence.”

What are the barriers to utilizing unstructured data?

Because unstructured data is scattered throughout an organization's data infrastructure, it's difficult even to locate, let alone organize into a coherent structure that can be put to use.

Sifting through and organizing data that is far less consistently classified than its structured counterpart is arduous, making it near impossible to extract the meaningful insights that could support generative AI deployments.

Generative AI offers a partial solution, promising to automate these search processes and ease the burden of working with unstructured data, but businesses still face myriad challenges.

Compliance is one such challenge. These vast unstructured repositories may contain sensitive and proprietary data. If that data is handled or managed incorrectly, particularly by AI, the consequences could be damaging.

IDC's whitepaper shows the scale of the problem: 51% of businesses* reported non-compliance with external regulations stemming from unstructured data.

Another problem organizations highlight is the "staggering" cost of unstructured data breaches. IDC found that companies with more "fragmented unstructured data approaches" paid $4.5 million for data breaches, compared with $2.2 million at companies with less fragmentation.

How businesses can start to navigate unstructured data through centralization

To move past these problems, businesses need to rethink their approach to data centralization entirely. Only then can they consider bringing in generative AI tools to automate unstructured data processes.

With 31.5% of businesses in the UK and US saying their data is growing too fast and is too expensive to store and back up, according to Statista, getting a grip on where data is located is vital.

The overarching problem is that many companies lack visibility into where their unstructured data is stored and who has permission to access it, according to Box. Establishing a central content repository with access control and automatic data classification has become essential, the firm says.

Such a solution provides a persistent, centralized home for an organization's data, which the company can draw on when making decisions in the future. Box the Content Cloud offers such a solution, giving organizations a place to manage all their most valuable information.

According to the firm, its platform offers "comprehensive security measures and compliance protocols" by centralizing an organization's content on a secure platform while also implementing "robust content strategies."

Box says its AI tools offer the best of both worlds: its AI models operate with Box the Content Cloud to "ensure privacy while maintaining the integrity of your processes and data." According to the firm, this lets businesses navigate their unstructured data through automation without exposing that data to compliance or security risks.

Find out more about Box the Content Cloud, the intelligence platform for secure content management and collaboration.

*Source: IDC White Paper, sponsored by Box, “Untapped Value: What Every Executive Needs to Know About Unstructured Data,” Doc #US51128223, August 2023


ITPro is a global business technology website providing the latest news, analysis, and business insight for IT decision-makers. Whether it's cyber security, cloud computing, IT infrastructure, or business strategy, we aim to equip leaders with the data they need to make informed IT investments.