Kafka Basics
Apache Kafka is a distributed streaming platform. It is designed to handle streams of data with high availability and is mainly used to build real-time streaming applications.
A Kafka cluster consists of multiple brokers to achieve fault tolerance and high availability.
Messages (or records) are published to Kafka with the Producer API. To read messages from Kafka, the Consumer API is used.
Messages are organized in topics. A producer submits messages to a topic, and a consumer subscribes to a topic. A topic can be partitioned to balance the work across multiple Kafka brokers. Replicating each partition across brokers provides high availability (see the Create Topic example below for how to create a partitioned and replicated topic).
Basic Kafka commands
Here are a few basic Kafka commands to interact with a Kafka cluster. In these commands, the ZK_QUORUM environment variable defines the ZooKeeper quorum, and the KAFKA_BROKERS environment variable is the list of hosts and ports of the Kafka brokers. The topic variable holds the name of the topic.
Example for the au
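As an illustration, the variables used by the commands in this section could be set as in the sketch below. The host names, ports and topic name are placeholders chosen for this example, not the values of any real cluster, so replace them with your own.

```shell
# Hypothetical hosts and ports; replace them with the values of your own cluster.
export ZK_QUORUM="zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"
export KAFKA_BROKERS="broker1.example.com:6667,broker2.example.com:6667"

# Topic name used by the commands in this section (also a placeholder).
export topic="test-topic"
```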
List Topics
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh \
--zookeeper $ZK_QUORUM \
--list
Create Topic
In this example, we create a topic that is split into 3 partitions, each partition having 2 replicas.
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh \
--create \
--zookeeper $ZK_QUORUM \
--replication-factor 2 \
--partitions 3 \
--topic $topic
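The arithmetic behind these two settings can be sketched as follows. Kafka rejects a replication factor larger than the number of available brokers, so the sketch also compares the replication factor with the number of entries in a broker list. It assumes that KAFKA_BROKERS lists every broker of the cluster, which may not hold if the variable only contains a few bootstrap brokers; the fallback host names and the count_brokers helper are hypothetical.

```shell
partitions=3
replication_factor=2

# Every partition is stored once per replica, so the cluster holds this many partition copies.
total_replicas=$((partitions * replication_factor))
echo "partition replicas in the cluster: $total_replicas"

# Count the comma-separated host:port entries of a broker list (hypothetical helper).
count_brokers() {
  printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

# Hypothetical broker list, used only when KAFKA_BROKERS is not set.
brokers="${KAFKA_BROKERS:-broker1.example.com:6667,broker2.example.com:6667}"
broker_count=$(count_brokers "$brokers")

if [ "$replication_factor" -gt "$broker_count" ]; then
  echo "replication factor $replication_factor exceeds the $broker_count listed brokers" >&2
else
  echo "replication factor $replication_factor fits on $broker_count listed brokers"
fi
```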
Describe Topic
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh \
--zookeeper $ZK_QUORUM \
--describe \
--topic $topic
Console Producer
The kafka-console-producer is a command line utility that uses the Producer API.
/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh \
--broker-list $KAFKA_BROKERS \
--producer-property security.protocol=SASL_PLAINTEXT \
--topic $topic
Once the producer is open, type in a few messages. You should be able to consume them with the kafka-console-consumer.
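Messages do not have to be typed interactively: the console producer reads its standard input and sends each line as one message, so messages can also be piped in. The sketch below wraps this in a hypothetical send_messages function and only calls it when the producer script exists at the path used above. It assumes that KAFKA_BROKERS and topic are already set.

```shell
# Installation path taken from the commands above.
KAFKA_BIN=/usr/hdp/current/kafka-broker/bin

# Send each argument as one message; the console producer reads stdin line by line.
send_messages() {
  printf '%s\n' "$@" | "$KAFKA_BIN/kafka-console-producer.sh" \
    --broker-list "$KAFKA_BROKERS" \
    --producer-property security.protocol=SASL_PLAINTEXT \
    --topic "$topic"
}

# Only call the function when the Kafka tools are installed on this host.
if [ -x "$KAFKA_BIN/kafka-console-producer.sh" ]; then
  send_messages "first message" "second message"
else
  echo "kafka-console-producer.sh not found, skipping" >&2
fi
```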
Console Consumer
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh \
--bootstrap-server $KAFKA_BROKERS \
--consumer-property security.protocol=SASL_PLAINTEXT \
--topic $topic \
--from-beginning
The --from-beginning option is optional; it makes the consumer read all the messages that are stored in Kafka for the given topic. Without it, the console consumer only receives messages produced after it starts. By default, messages are retained for one week.
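To relate the one-week default to Kafka's retention settings, the short calculation below converts it into the units those settings use: hours for the broker setting log.retention.hours and milliseconds for the topic setting retention.ms. The variable names are chosen for this example only.

```shell
# One week expressed in hours (unit of the broker setting log.retention.hours).
retention_hours=$((7 * 24))

# The same duration in milliseconds (unit of the topic setting retention.ms).
retention_ms=$((retention_hours * 60 * 60 * 1000))

echo "$retention_hours hours = $retention_ms ms"
```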
Delete Topic
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh \
--zookeeper $ZK_QUORUM \
--delete \
--topic $topic