Data integration for open table formats


By ITPro

Any modern enterprise generates reams of data, which can be difficult to correctly store, process, and use without the right approach.

This has become even more important in the face of AI. Before any leader can begin implementing AI across their workflow, they must first ensure they’ve prepared data at the right scale – and across its entire lifecycle.

Data integration, in which data from across a business is combined into a single stream, can be a huge help here. But where do you start? And where do open table formats fit into the mix?

In this episode, in association with Informatica, Jane and Rory are joined by Eric Ho, senior director of product management at Informatica, to discuss data integration at scale and explore the evolution of open table formats for modern data architectures and data warehousing.

Highlights

“So there's now more data, there are more data sources additionally, and more data formats that need to be supported in terms of structured, semi-structured, as well as unstructured data. Now, with the data being in the cloud, it makes it even more important that the data is secure and private and the quality of the data needs to be proper, otherwise with poor data quality, poor decisions will be made.”

“Open table formats are essentially specialized formats, which actually manage and organize data in a lakehouse architecture. And what it essentially is, is it's the logical data structures created via metadata to manage and maintain and address the physical data objects that are in the lakehouse architecture.”

“Informatica supports open table formats in multiple ways. At the end of the day, we want to ensure we support every single permutation of open table formats, but we support the different file formats, the different storage components, as well as the different catalogs. When I say the different file formats, we support Iceberg, Delta, as well as Hudi on the roadmap. In terms of storage, we support all hyperscalers, AWS, Microsoft Fabric, OneLake, as well as [Azure Data Lake Storage] (ADLS) gen two, as well as Google Cloud Storage.”

“So what open table formats are designed specifically to do is to be able to leverage processing engines to be able to process large amounts of data, like Spark, and make it available at a convenient cost. Another example or use case, why folks consider using open table is because they might need to audit their records. So for example, if you need to audit your invoices or expenses, every single update of open table formats creates a snapshot in time, so snapshots can be retrieved easily in the lakehouse to be able to answer the audit questions.”
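
The snapshot behaviour described in the highlight above can be illustrated with a toy model. The following is a minimal sketch in Python, not how Iceberg, Delta, or Hudi are actually implemented and not any vendor's API. The names ToyTable, Snapshot, commit, and read are hypothetical, and the invoice records are made up for illustration. It assumes only what the quotes state: a metadata layer tracks physical data objects, and every update produces a new point-in-time snapshot that can be read back later.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Metadata only: which data files were visible at one point in time."""
    snapshot_id: int
    data_files: Tuple[str, ...]


@dataclass
class ToyTable:
    """Hypothetical stand-in for an open table format (illustration only)."""
    files: Dict[str, List[dict]] = field(default_factory=dict)  # fake object storage
    snapshots: List[Snapshot] = field(default_factory=list)     # metadata log

    def commit(self, rows: List[dict], replace: bool = False) -> int:
        """Write rows to a new immutable file and record a new snapshot."""
        file_name = f"data-{len(self.files):04d}.json"
        self.files[file_name] = list(rows)
        # An append keeps the previous snapshot's files; a replace starts fresh.
        if replace or not self.snapshots:
            previous: Tuple[str, ...] = ()
        else:
            previous = self.snapshots[-1].data_files
        snapshot = Snapshot(len(self.snapshots) + 1, previous + (file_name,))
        self.snapshots.append(snapshot)
        return snapshot.snapshot_id

    def read(self, snapshot_id: Optional[int] = None) -> List[dict]:
        """Read the latest snapshot, or an earlier one for audit purposes."""
        if not self.snapshots:
            return []
        if snapshot_id is None:
            snapshot = self.snapshots[-1]
        elif 1 <= snapshot_id <= len(self.snapshots):
            snapshot = self.snapshots[snapshot_id - 1]
        else:
            raise KeyError(f"unknown snapshot {snapshot_id}")
        return [row for name in snapshot.data_files for row in self.files[name]]


table = ToyTable()
v1 = table.commit([{"invoice": "INV-1", "amount": 100}])
v2 = table.commit([{"invoice": "INV-2", "amount": 250}])
# A correction rewrites the table contents but leaves old files and snapshots intact.
v3 = table.commit(
    [{"invoice": "INV-1", "amount": 120}, {"invoice": "INV-2", "amount": 250}],
    replace=True,
)

print("as first recorded:", table.read(v1))
print("current state:   ", table.read())
```

Because old data files are never modified in this sketch, an auditor can ask what an invoice looked like at snapshot v1 even after it was corrected in v3. Real table formats add much more on top of this idea, such as schemas, partitioning, and concurrent writers, none of which is modelled here.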

