Adaltas Cloud Academy

Kafka Basics

Apache Kafka is a distributed streaming platform. It is designed to handle streams of data with high availability and is mainly used to build real-time streaming applications.

A Kafka cluster comprises multiple brokers to achieve fault tolerance and high availability.

Messages (or records) are published to Kafka with the Producer API. To read messages from Kafka, the Consumer API is used.

Messages are organized in topics. A producer submits messages to a topic; a consumer subscribes to a topic. A topic can be partitioned in order to balance the work across multiple Kafka brokers. The replication of partitions across brokers is what provides high availability (see the example below to create a partitioned and replicated topic).

Basic Kafka commands

Here are a few basic Kafka commands to interact with a Kafka cluster. In these commands, the ZK_QUORUM environment variable defines the ZooKeeper quorum, and the KAFKA_BROKERS environment variable is the list of hosts and ports of the Kafka brokers.

Example for the au cluster:

KAFKA_BROKERS="kfk-brk-1.au.adaltas.cloud:6667,kfk-brk-2.au.adaltas.cloud:6667,kfk-brk-3.au.adaltas.cloud:6667"
ZK_QUORUM="zoo-1.au.adaltas.cloud:2181,zoo-2.au.adaltas.cloud:2181,zoo-3.au.adaltas.cloud:2181/kafka"
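The commands below also reference a $topic shell variable holding the topic name. Set it first; the name my-test-topic is just an example:

```shell
# Topic name used by the following commands (example value, pick your own)
topic="my-test-topic"
```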

List Topics

/usr/hdp/current/kafka-broker/bin/kafka-topics.sh \
  --zookeeper $ZK_QUORUM \
  --list

Create Topic

In this example, we create a topic that will be split into 3 partitions, each partition having 2 replicas.

/usr/hdp/current/kafka-broker/bin/kafka-topics.sh \
  --create \
  --zookeeper $ZK_QUORUM \
  --replication-factor 2 \
  --partitions 3 \
  --topic $topic
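Once created, the partition count of a topic can be increased (but never decreased) with the --alter action of the same script. The target of 6 partitions below is just an illustrative value:

```shell
# Increase the partition count of an existing topic.
# Note: existing messages are not moved, so the key-to-partition
# mapping changes for messages produced after the alteration.
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh \
  --alter \
  --zookeeper $ZK_QUORUM \
  --partitions 6 \
  --topic $topic
```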

Describe Topic

/usr/hdp/current/kafka-broker/bin/kafka-topics.sh \
  --zookeeper $ZK_QUORUM \
  --describe \
  --topic $topic

Console Producer

The kafka-console-producer is a command-line utility that uses the Producer API.

/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh \
  --broker-list $KAFKA_BROKERS \
  --producer-property security.protocol=SASL_PLAINTEXT \
  --topic $topic

Once the producer is running, type in a few messages. You should be able to consume them with the kafka-console-consumer.
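The console producer also reads from stdin, so messages can be published non-interactively, for example by piping the output of printf (the message contents here are placeholders):

```shell
# Publish three messages by piping them to the console producer
printf 'msg-1\nmsg-2\nmsg-3\n' | \
  /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh \
    --broker-list $KAFKA_BROKERS \
    --producer-property security.protocol=SASL_PLAINTEXT \
    --topic $topic
```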

Console Consumer

/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh \
  --bootstrap-server $KAFKA_BROKERS \
  --consumer-property security.protocol=SASL_PLAINTEXT \
  --topic $topic \
  --from-beginning

The --from-beginning flag is optional; it reads all the messages stored in Kafka for the given topic instead of only the new ones. By default, the retention of the messages is set to one week.
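By default the console consumer waits indefinitely for new messages. To read a fixed number of records and then exit (handy in scripts), it accepts a --max-messages option:

```shell
# Consume the first 3 messages of the topic, then exit
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh \
  --bootstrap-server $KAFKA_BROKERS \
  --consumer-property security.protocol=SASL_PLAINTEXT \
  --topic $topic \
  --from-beginning \
  --max-messages 3
```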

Delete topic

/usr/hdp/current/kafka-broker/bin/kafka-topics.sh \
  --zookeeper $ZK_QUORUM \
  --delete \
  --topic $topic
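Note that the deletion only takes effect if delete.topic.enable is set to true on the brokers. You can confirm the topic is gone by listing the topics again and filtering on its name:

```shell
# Check whether the topic still exists (prints nothing once deleted)
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh \
  --zookeeper $ZK_QUORUM \
  --list | grep "^${topic}$"
```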