You've reached the end of your free preview.
Want to read all 12 pages?
Unformatted text preview: Recitation 1
Apache Kafka
Shreyans Sheth
May 20th, 2020 Agenda
●
●
●
● Kafka Basics
Kafka Architecture Overview
Software Setup
Simple programming exercise Apache Kafka - Introduction
● ● ● What is it ?
○ Essentially, a publish subscribe system in it’s canonical use case
○ Designed for scalability, reliability and high throughput
○ 3 primary concepts - Produces, Topics, Consumers
Why use it ?
○ Used for real time data stream processing
○ Eg. Logging, Metrics, high volume of real time activities, etc.
Why learnt it ?
○ You will need it in your assignments :) Apache Kafka - Architecture Overview ●
●
●
●
●
● ●
● Publishers
Consumers
Clusters
Brokers
Topics
Partitions
○ Why ?
Scalability
Consumers
○ Offsets
Groups
○ One
consumer
per group
per topic Setup
●
● Open a new terminal
Run ssh -L 9092:localhost:9092 [email protected] -NT
○
○
○ ● Install kafkacat (CLI tool for Kafka)
○ ● This command forwards the port 9092 of our server to your local machine (just leave it
running in the background)
Password given during recitation
Alternatively, use the ssh key found here and run ssh -L 9092:localhost:9092
[email protected] -NT -i id_rsa
brew install kafkacat OR sudo apt-get install kafkacat Test your connection
○ kafkacat -b localhost -L Setup - Successful Output Exercise - Bootstrapping the project
1.
2.
3.
4.
5.
6. Create a directory:
● mkdir seai-recitation-1
Navigate to the director:
● cd seai-recitation-1
Install pip:
● (sudo) pip install virtualenv
Setup your virtualenv:
● (python -m) virtualenv -p python3 venv
Activate the virtualenv
● source venv/bin/activate
Install kafka library for python
● (sudo) pip install kafka-python Exercise - Writing to kafka (gist)
Create a python script - touch producer.py
from time import sleep
from json import dumps
from kafka import KafkaProducer
# Create a producer to write data to kafka
producer = KafkaProducer(bootstrap_servers=['localhost:9092'],
value_serializer=lambda x: dumps(x).encode('utf-8'))
# Write data via the producer
for e in range(10):
data = {'number' : e}
producer.send(topic='numtest-<andrewid>', value=data)
sleep(1) Exercise - Test output via kafkacat!
● kafkacat -b localhost -t numtest-<andrewid>
Output Exercise - Reading from Kafka (gist)
from kafka import KafkaConsumer
from json import loads
# Create a consumer to read data from kafka
consumer = KafkaConsumer(
'numtest-<andrewid>',
bootstrap_servers=['localhost:9092'],
# Read from the start of the topic; Default is latest
auto_offset_reset='earliest'
)
# Prints all messages, again and again!
for message in consumer:
# Default message.value type is bytes!
print(loads(message.value)) Exercise - Reading (smartly) from Kafka
How would you make reads more fault tolerant ?
consumer = KafkaConsumer(
'numtest-<andrewid>',
bootstrap_servers=['localhost:9092'],
auto_offset_reset='earliest',
# Consumer group id
group_id='numtest-group-<andrewid>',
# Commit that an offset has been read
enable_auto_commit=True,
# How often to tell Kafka, an offset has been read
auto_commit_interval_ms=1000
)
# Prints messages once, then only new ones. Run again and see!
for message in consumer:
print(loads(message.value)) Refer to official Kafka Python docs for more useful API methods and advanced use
cases - KafkaConsumer Thanks!
References
●
●
...
View
Full Document
- Fall '19