What is Apache Kafka?
Learn about the distributed stream processing engine powering thousands of well-known companies


Over the past few years, organisations across many industries have discovered an increasingly important gap in their data infrastructure. Traditionally, they have focused on providing a place to store data, but to make use of that data they also need a way of moving it to destinations such as applications. This gap is being filled by streaming platforms like Apache Kafka.
Apache Kafka was originally created at LinkedIn and open sourced in 2011. Thousands of well-known companies are now built on it, including Airbnb, Netflix and Goldman Sachs. It is a distributed stream processing engine for building real-time data pipelines and streaming applications.
How it works
At a basic level, Apache Kafka acts as a central hub for data streams. It brings an ever-increasing number of data producers and consumers together on a single, unified streaming platform at the centre of an organisation. Any team can join the platform while central teams manage the service, and it can scale to trillions of messages per day while delivering messages in real time.
Central to Kafka is a log that behaves, in many ways, like a traditional messaging system. It is a broker-based technology, accepting messages and placing them into topics. Any service can subscribe to a topic and listen for the messages sent to it. But because that log is distributed, Kafka offers better scalability, availability and data retention than a traditional messaging system.
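To make topics concrete, the sketch below uses the AdminClient from the official Java kafka-clients library to create one. The broker address, the topic name "orders" and the partition and replication counts are illustrative assumptions for the example, not recommendations.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // broker address (assumed for this sketch)

        try (AdminClient admin = AdminClient.create(props)) {
            // A topic split into six shards (partitions), each replicated to three brokers
            NewTopic orders = new NewTopic("orders", 6, (short) 3);
            admin.createTopics(Collections.singleton(orders)).all().get();
        }
    }
}
```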
The overall architecture for an Apache Kafka environment includes producing services, Kafka itself and consuming services. What differentiates this architecture is that it's completely free of bottlenecks in all three layers. Kafka receives messages and shards them across a set of servers inside the Kafka cluster.
Each shard (known in Kafka as a partition) is modelled as an individual queue. The producing service can specify a key which controls which shard data is routed to, ensuring strong ordering for messages that share the same key, as the sketch below shows.
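A minimal producer sketch, again using the Java kafka-clients library: two records are sent with the same key, so they hash to the same shard and arrive in the order they were sent. The broker address, topic and key values are assumptions for illustration.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class KeyedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");              // broker address (assumed)
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records sharing the key "customer-42" hash to the same partition,
            // so consumers see them in the order they were produced
            producer.send(new ProducerRecord<>("orders", "customer-42", "order created"));
            producer.send(new ProducerRecord<>("orders", "customer-42", "order shipped"));
        }
    }
}
```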
On the consumption side, Kafka can balance data from a single topic across a set of consuming services, greatly increasing the processing throughput for that topic.
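This balancing is expressed through consumer groups: every consumer started with the same group.id shares the topic's shards between them. The sketch below, using the Java kafka-clients consumer, assumes the same "orders" topic plus a hypothetical group id and broker address.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class GroupConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // broker address (assumed)
        props.put("group.id", "order-processors");          // consumers sharing this id split the partitions
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("orders"));
            while (true) {
                // Poll for new records on the partitions assigned to this instance
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d key=%s value=%s%n",
                            record.partition(), record.key(), record.value());
                }
            }
        }
    }
}
```

Running several copies of this process causes Kafka to rebalance the shards across them automatically, which is also what underpins the failover behaviour described below.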
The result of these two architectural elements is a cluster that scales linearly for both incoming and outgoing data. This is often difficult to achieve with conventional message-based approaches.
Apache Kafka also offers high availability. If one of the services fails, the environment detects the fault and reassigns its shards to another service, so processing continues uninterrupted.
Uses
Kafka takes on legacy technology across many different areas, including ETL, data warehouses, Hadoop, messaging middleware and data integration technologies, to substantially simplify an organisation's infrastructure. In many cases, Kafka can replace or augment an existing system to make data more consistently available, faster and less costly to deliver.
The use of Apache Kafka is on the rise. A recent survey by Confluent revealed that 52% of organisations have at least six systems running Kafka, with over a fifth having more than 20.
Kafka is used broadly in the cloud, most commonly across some combination of virtual private clouds, public clouds and on-premises infrastructure.
Apache Kafka can be used in a variety of ways across many different use cases. The past few years have seen a surge in the number of companies adopting streaming platforms. With this approach, they are able to build mission-critical, real-time applications that power their core business - from small deployments to large-scale use cases handling millions of events per second.
Many organisations are seeing significant benefits from their use of Kafka. Because data is available, shared and immediate, companies can create new products and significantly transform existing ones to take advantage of new market opportunities.
In addition to creating new opportunities, companies are leveraging Kafka to become more efficient and transform existing processes. It simplifies building data-driven applications and managing complex back-end systems. Other business benefits include reduced operating costs (cited by 47% of organisations surveyed by Confluent), improved customer experience and reduced risk.