
What is Apache Kafka?

Learn about the distributed stream processing engine powering thousands of well-known companies


Over the past few years, organisations across many industries have discovered an increasingly important gap in their data infrastructure. Traditionally, they have focused on providing a place to store data, but to make use of that data they also need a way of moving it to the applications and services that consume it. This gap is being filled by streaming platforms such as Apache Kafka.

Apache Kafka was first created by LinkedIn and open-sourced in 2011. It is a distributed stream processing engine for building real-time data pipelines and streaming applications, and thousands of well-known companies, including Airbnb, Netflix and Goldman Sachs, are now built on it.

How it works

At a basic level, Apache Kafka is a central hub for data streams. It brings an ever-increasing number of data producers and consumers together on a simple, unified streaming platform at the centre of an organisation. Any team can join the platform while a central team manages the service, and it can scale to trillions of messages per day while delivering messages in real time.

Central to Kafka is a log that behaves, in many ways, like a traditional messaging system. It is a broker-based technology, accepting messages and placing them into topics. Any service can subscribe to a topic and listen for the messages sent to it. Because Kafka is itself a distributed log, however, it differs from a traditional messaging system in offering better scalability, availability and data retention.
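As a rough sketch of what a topic looks like in practice, the example below uses the Java kafka-clients AdminClient to create one. The topic name, partition count, replication factor, retention period and broker address are all assumptions chosen for illustration rather than anything Kafka prescribes.

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;

    public class CreateOrdersTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Assumed address of one broker in the cluster
            props.put("bootstrap.servers", "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // Six partitions spread the topic across the cluster for scalability;
                // a replication factor of 3 keeps copies on three brokers for availability
                // (this assumes the cluster has at least three brokers).
                NewTopic orders = new NewTopic("orders", 6, (short) 3)
                        // Keep messages for seven days, illustrating Kafka's data retention
                        .configs(Map.of("retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000)));

                admin.createTopics(Collections.singletonList(orders)).all().get();
            }
        }
    }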

The overall architecture for an Apache Kafka environment includes producing services, Kafka itself and consuming services. What differentiates this architecture is that it is designed so that none of the three layers becomes a bottleneck. Kafka receives messages and shards them across a set of servers inside the Kafka cluster (these shards are known as partitions in Kafka terminology).

Each shard is modelled as an individual queue. Producers can specify a key that controls which shard a message is routed to, ensuring strong ordering for messages that share the same key.
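To make that concrete, here is a minimal producer sketch using Kafka's Java client; the topic name ("orders"), the key ("customer-42") and the broker address are assumptions for illustration. Every message sent with the same key lands on the same partition, so events for that customer are read back in the order they were produced.

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    public class OrderProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // The key determines the partition, so both events for "customer-42"
                // go to the same partition and keep their relative order.
                producer.send(new ProducerRecord<>("orders", "customer-42", "order created"));
                producer.send(new ProducerRecord<>("orders", "customer-42", "order shipped"));
            }
        }
    }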

On the consumption side, Kafka can balance data from a single topic across a set of consuming services, greatly increasing the processing throughput for that topic.
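A sketch of the consuming side, again assuming the Java client, a topic called "orders" and a local broker: every consumer started with the same group.id joins the same consumer group, and Kafka divides the topic's partitions among the group's members, so running more copies of this service increases throughput.

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class OrderConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("group.id", "billing-service");          // consumers sharing this id split the partitions
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("orders"));
                while (true) {
                    // Fetch whatever has arrived on this consumer's share of the partitions
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("partition=%d key=%s value=%s%n",
                                record.partition(), record.key(), record.value());
                    }
                }
            }
        }
    }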

The result of these two architectural elements is a linearly scalable cluster, from the perspective of both incoming and outgoing data. This is often difficult to achieve with conventional message-based approaches.

Apache Kafka also offers high availability. If one of the services fails, the environment will detect the fault and re-route shards to another service, ensuring that processing continues uninterrupted by the fault.

Uses

Kafka takes on legacy technology across many different areas, including ETL, data warehouses, Hadoop, messaging middleware and data integration technologies, to substantially simplify an organisation's infrastructure. In many cases, Kafka can replace or augment an existing system to make data more consistently available, faster and less costly to deliver.

The use of Apache Kafka is on the rise. A recent survey by Confluent revealed that 52% of organisations have at least six systems running Kafka, with over a fifth having more than 20.

Kafka is widely used in the cloud, most commonly in some combination of virtual private cloud, public cloud and on-premises deployments.

Apache Kafka can be used in a variety of different ways for many different use cases. The past few years have seen a surge in the number of companies adopting streaming platforms. With this approach, they are able to build mission-critical, real-time applications that power their core business, from small deployments to large-scale use cases that handle millions of events per second.

Many organisations are seeing significant benefits from their use of Kafka. Because data is available, shared and immediate, companies can create new products and significantly transform existing ones to take advantage of new market opportunities.

In addition to creating new opportunities, companies are leveraging Kafka to become more efficient and transform existing processes. It makes building data-driven applications and managing complex back-end systems simpler. Other business benefits include reduced operating costs (reported by 47% of organisations surveyed by Confluent), improved customer experience and reduced risk.
