Getting Started with Apache Kafka

Getting Started with Apache Kafka is a complete tutorial for beginners. Kafka is a fast, scalable, durable messaging system and a distributed streaming platform. It is often used as a replacement for traditional JMS-style message brokers because of its reliability and replication.

Kafka components –

  1. Producer – Allows an application to publish a stream of records (messages) to one or more Kafka topics. The Producer API provides a producer class whose send() method sends messages to a topic asynchronously.
  2. Consumer – Allows an application to subscribe to one or more topics and process the stream of records produced to them. Each record received from the Kafka cluster is represented by a ConsumerRecord, which carries the topic name, the partition number from which the record was received, and an offset that points to the record within that partition.
  3. Streams – It allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
  4. Connector – Allows building and running reusable producers or consumers that connect Kafka topics to existing applications or databases. For example, a connector to a relational database might capture every change to a table.
  5. Topic – A topic is a named category to which records are published. A topic can have multiple subscribers.
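
The producer, topic, and consumer roles listed above can be illustrated with a small in-memory model. This is only a sketch in plain Python, not the Kafka client API: the `ToyTopic` class and its `send` and `poll` methods are hypothetical names chosen for illustration. It shows that a topic retains published records and that each subscriber reads them independently.

```python
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory stand-in for a single-partition topic."""

    def __init__(self, name):
        self.name = name
        self.records = []                  # append-only list of messages
        self.positions = defaultdict(int)  # next index to read, per subscriber

    def send(self, message):
        """Producer side: append a message and return its position."""
        self.records.append(message)
        return len(self.records) - 1

    def poll(self, subscriber):
        """Consumer side: return the messages this subscriber has not seen yet."""
        start = self.positions[subscriber]
        unread = self.records[start:]
        self.positions[subscriber] = len(self.records)
        return unread


topic = ToyTopic("page-views")
topic.send("user1 viewed /home")
topic.send("user2 viewed /cart")

first_read = topic.poll("analytics")   # both messages published so far
topic.send("user1 viewed /checkout")
second_read = topic.poll("analytics")  # only the newly published message
audit_read = topic.poll("audit")       # a new subscriber starts from the beginning
```

Because each subscriber keeps its own read position, adding the hypothetical "audit" subscriber does not affect what "analytics" has already consumed.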

For each topic, the Kafka cluster maintains a partitioned log. The related cluster concepts are –

  1. Broker – Brokers are the servers that store published data. Brokers do not manage cluster state on their own; they use ZooKeeper to maintain it. Each broker may hold zero or more partitions per topic. For example, if a topic has 10 partitions and there are 10 brokers, an even distribution gives each broker one partition.
  2. ZooKeeper – ZooKeeper provides a distributed configuration service for the Kafka cluster. It is mainly used to notify producers and consumers when a new broker joins the cluster or an existing broker fails.
  3. Offset – Each message within a partition has a unique sequential ID called an offset. For example, a partition holding ten messages has offsets 0 to 9.
  4. Replica – A replica is a backup copy of a partition. Replicas are not used to read or write data; they exist to prevent data loss.
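
The broker, offset, and replica ideas above can be sketched with a toy model. The code below is plain Python rather than Kafka itself, and the function names `assign_partitions` and `append` are hypothetical. It assumes a simple round-robin placement, which is an illustrative simplification and not necessarily how a real cluster assigns partitions.

```python
def assign_partitions(num_partitions, brokers):
    """Spread partition numbers over brokers round-robin (illustrative only)."""
    assignment = {broker: [] for broker in brokers}
    for partition in range(num_partitions):
        broker = brokers[partition % len(brokers)]
        assignment[broker].append(partition)
    return assignment


def append(partition_log, message):
    """Append a message to a partition log and return its offset."""
    partition_log.append(message)
    return len(partition_log) - 1


# 10 partitions over 10 brokers: each broker ends up with exactly one partition.
brokers = [f"broker-{i}" for i in range(10)]
assignment = assign_partitions(10, brokers)

# Ten messages appended to one partition receive offsets 0 through 9.
partition_log = []
offsets = [append(partition_log, f"message-{i}") for i in range(10)]

# In this toy model a replica is simply a copy of the log kept as a backup.
replica = list(partition_log)
```

With fewer brokers than partitions, the same function gives some brokers more than one partition, which matches the statement that a broker may hold zero or more partitions per topic.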


Capabilities of Kafka –

  1. Stream Processing
    1. Kafka lets you publish and subscribe to streams of records, much like an enterprise message queue.
    2. Kafka stores streams of records or messages in different categories called topics.

Command to create Kafka topic –

ease2code:~$ /usr/hdp/current/kafka-broker/bin/ --create --zookeeper --replication-factor 1 --partitions 1 --topic topicName


To view the topic list –

ease2code:~$ /usr/hdp/current/kafka-broker/bin/ --list --zookeeper


To publish a message to a Kafka topic –

ease2code:~$ /usr/hdp/current/kafka-broker/bin/ --broker-list --topic topicName write your message here

To publish messages from a file –



ease2code:~$ /usr/hdp/current/kafka-broker/bin/ --broker-list --topic topicName < file.csv


To read messages from a Kafka topic –

ease2code:~$ bin/ --bootstrap-server --topic topicName --from-beginning


Kafka use cases –

  1. Website activity tracking
  2. Metrics collection and monitoring


