Kafka Tutorial
Apache Kafka is an open-source platform for stream processing. The project aims to provide a unified, high-throughput, low-latency platform for processing real-time data feeds. Kafka can connect to external systems and, with Kafka Streams, offers stream processing in Java.
In this tutorial you will:
- Discover the Kafka basic commands
- Create a topic
- Publish data to the topic
- Consume data from the topic
Before starting this tutorial, make sure you are connected to the edge-1 node via SSH.
Tutorial
We will be using the brokers’ endpoints as well as ZooKeeper a lot in the following tutorial, so let’s start by setting these values as environment variables.
The brokers are the list of nodes that form the Kafka cluster. In our case we have 3 nodes (kfk-brk-1.au.adaltas.cloud, kfk-brk-2.au.adaltas.cloud and kfk-brk-3.au.adaltas.cloud) listening on port 6667, hence:
KAFKA_BROKERS="kfk-brk-1.au.adaltas.cloud:6667,kfk-brk-2.au.adaltas.cloud:6667,kfk-brk-3.au.adaltas.cloud:6667"
Kafka’s metadata (topics, replication state, etc.) is stored in a ZooKeeper quorum. Interactions with topics (creation, deletion, or configuration) happen through ZooKeeper:
ZK_QUORUM="zoo-1.au.adaltas.cloud:2181,zoo-2.au.adaltas.cloud:2181,zoo-3.au.adaltas.cloud:2181/kafka"
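Both values are long comma-separated lists, which are easy to mistype. A quick sanity check (plain shell, no Kafka needed) is to export the broker list and split it on commas, one endpoint per line:

```shell
# Export so that child processes (the Kafka CLI tools) inherit the variable.
export KAFKA_BROKERS="kfk-brk-1.au.adaltas.cloud:6667,kfk-brk-2.au.adaltas.cloud:6667,kfk-brk-3.au.adaltas.cloud:6667"

# Print one broker endpoint per line to double-check the list.
echo "$KAFKA_BROKERS" | tr ',' '\n'
```

You should see exactly three lines, one per broker. The same check works for ZK_QUORUM.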
Note: Kafka uses Kerberos authentication; make sure you have a valid Kerberos ticket before continuing (see here for details).
Topic Creation
The first thing we will do here is create a topic:
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh \
--create \
--zookeeper $ZK_QUORUM \
--replication-factor 2 \
--partitions 3 \
--topic my-tutorial-topic
Created topic "my-tutorial-topic".
This topic has 3 partitions, and each partition has 2 replicas.
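As a quick sanity check on those numbers (plain shell arithmetic, no Kafka involved): 3 partitions with a replication factor of 2 means the brokers collectively host 6 partition replicas, spread across the 3 nodes of the cluster:

```shell
partitions=3
replication_factor=2

# Total number of partition replicas the brokers have to host.
echo "Total replicas: $((partitions * replication_factor))"
```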
Next, we can verify that our topic was correctly created:
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh \
--zookeeper $ZK_QUORUM \
--list | grep my-tutorial-topic
my-tutorial-topic
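Because grep exits with status 0 only when it finds a match, the same verification can be scripted. Below is a minimal sketch that simulates the `--list` output with a fixed string (hypothetical topic names), so it runs without a Kafka cluster; in practice you would pipe the output of kafka-topics.sh into the same grep:

```shell
# Simulated output of kafka-topics.sh --list (hypothetical topic names).
topic_list="my-tutorial-topic
some-other-topic"

# grep -q prints nothing and only sets the exit status, which is ideal for scripting.
if echo "$topic_list" | grep -q '^my-tutorial-topic$'; then
  echo "topic exists"
fi
```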
Consume from a topic
Now, we start a consumer listening on this topic for new incoming messages with the kafka-console-consumer command:
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh \
--bootstrap-server $KAFKA_BROKERS \
--consumer-property security.protocol=SASL_PLAINTEXT \
--topic my-tutorial-topic
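If you prefer not to repeat the security setting on every invocation, the console consumer also accepts a properties file via the --consumer.config flag. A minimal sketch, assuming a hypothetical file name of consumer.properties:

```properties
# consumer.properties (hypothetical file name)
# Same setting as the --consumer-property flag above:
security.protocol=SASL_PLAINTEXT
```

You would then replace the --consumer-property flag with --consumer.config consumer.properties. The console producer supports the equivalent --producer.config flag.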
The command seems to hang and nothing happens… This is normal: we have not yet sent any data into Kafka.
Produce to a topic
To do that, we open a producer process with the kafka-console-producer
command. Keep the consumer running, open another tab, and run the following command (don’t forget to set the environment variables as illustrated above):
/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh \
--broker-list $KAFKA_BROKERS \
--producer-property security.protocol=SASL_PLAINTEXT \
--topic my-tutorial-topic
>
Now type in a few messages and switch back to the previous tab (with the consumer running):
>This is a test
>And another one
>
Delete a topic
Once we are done using the topic, we can delete it:
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh \
--zookeeper $ZK_QUORUM \
--delete \
--topic my-tutorial-topic
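Note that deletion only takes effect if the brokers allow it; otherwise the topic is merely marked for deletion. This is controlled by a broker-side setting in server.properties (the default varies by Kafka version, so check your cluster's configuration):

```properties
# Broker configuration (server.properties): allow topics to actually be deleted.
delete.topic.enable=true
```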
All good! You have learned how to create a Kafka topic, produce messages to it, consume them, and delete the topic when you are done.