Kafka

Apache Kafka is a distributed, scalable, fault-tolerant, open-source publish-subscribe messaging system. It stores messages durably and can replay them to subscribers.

consumer            kafka-Broker(192.168.0.1:9092)   //1. broker starts
    --subscribe topic(t1)->                          //2. consumer subscribes to topic
while(1){
..
}
                                                      Producer //3. producer publishes a message to topic t1
                        kafka-Broker 
                            <--|topic=t1,payload="test"|--
  <--|topic=t1,payload="test"|-
      

Message delivery methods

Method: Description
1. At-least-once semantics: The producer keeps resending the same message until the broker ACKs it. No message is lost, but duplicates are possible.
2. At-most-once semantics: The producer sends each message only once and never retries, whether or not an ACK arrives. No duplicates, but messages can be lost.
3. Exactly-once semantics (EOS): Each message is seen exactly once by its consumer, even if the producer has to retry.
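
The three delivery methods above can be compared with a small simulation. This is only a sketch, not Kafka client code: ScriptedBroker, its "drop" / "lost_ack" / "ok" outcomes, and the dedupe flag are hypothetical names invented for illustration. Deduplicating by message ID is one simple way to approximate exactly-once on the broker side.

```python
class ScriptedBroker:
    """Toy broker whose network behaviour follows a fixed script."""

    def __init__(self, outcomes, dedupe=False):
        self.outcomes = list(outcomes)  # "ok", "drop", or "lost_ack" per send
        self.dedupe = dedupe            # True: ignore message IDs already stored
        self.seen = set()               # message IDs stored so far
        self.log = []                   # payloads the broker has stored

    def send(self, message_id, payload):
        """Return True if the producer receives an ACK for this attempt."""
        outcome = self.outcomes.pop(0) if self.outcomes else "ok"
        if outcome == "drop":
            return False                # message never reached the broker
        if not (self.dedupe and message_id in self.seen):
            self.seen.add(message_id)
            self.log.append(payload)
        return outcome == "ok"          # "lost_ack": stored, but no ACK seen


def send_at_most_once(broker, message_id, payload):
    """Send once and ignore the result: no duplicates, possible loss."""
    broker.send(message_id, payload)


def send_at_least_once(broker, message_id, payload, max_attempts=5):
    """Resend until ACKed: no loss, possible duplicates."""
    for _ in range(max_attempts):
        if broker.send(message_id, payload):
            return True
    return False


lossy = ScriptedBroker(["drop"])
send_at_most_once(lossy, 1, "test")
print(lossy.log)       # [] : the message was lost

retrying = ScriptedBroker(["lost_ack", "ok"])
send_at_least_once(retrying, 1, "test")
print(retrying.log)    # ['test', 'test'] : delivered, but duplicated

idempotent = ScriptedBroker(["lost_ack", "ok"], dedupe=True)
send_at_least_once(idempotent, 1, "test")
print(idempotent.log)  # ['test'] : the retry was deduplicated
```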

Architecture


|---------------------------------------- Kafka Cluster -----------------------------------------------------------|
|                                                                                                                  |
|   |--------------------------- BROKER-2(master)-----------------------------------|   |--Broker1(master)--|      |
|   | |------------------ Partition-1 ----------------------|                       |   |                   |      |
|   | | |------------- topic = Employee --------------|     |                       |   |-------------------|      |
|   | | | Message-1 {name: "Alice", empid: "1", }     |     |                       |                              |
|   | | | Message-2 {name: "Bob", empid: "2",   }     |     |                       |                              |
|   | | |---------------------------------------------|     |                       |                              |
|   | |                                                     |                       |                              |
|   | | |---------- Messages ------------------|            |                       |                              |
|   | | | msg1->msg2->msg5    msg6->msg9->msg7 |            |                       |                              |
|   | | |--------------------------------------|            |                       |                              |
|   | |-----------------------------------------------------|                       |                              |
|   |                                                                               |                              |
|   |                       |------------------ Partition-2 --------------------|   |                              |
|   |                       | |------------- topic = Orders -----------------|  |   |                              |
|   |                       | | Message-1 {id: 1, product: "toy", cust:"A"}  |  |   |                              |
|   |                       | | Message-2 {id: 2, product: "gun", cust:"B"}  |  |   |                              |
|   |                       | |----------------------------------------------|  |   |                              |
|   |                       |---------------------------------------------------|   |                              |
|   |                                                                               |                              |
|   |              |------------------------- Partition-n ------------------------| |                              |
|   |              | |-------------- topic = __consumer_offsets ---------------|  | |                              |
|   |              | |  Consumer_Name  ID    Topic   Committed_Offset  State   |  | |                              |
|   |              | |  Consumer-1     uid1  Orders  Read till msg7    Alive   |  | |                              |
|   |              | |  Consumer-2     uid1  Orders  Read till msg7    Alive   |  | |                              |
|   |              | |---------------------------------------------------------|  | |                              |
|   |-------------------------------------------------------------------------------|                              |
|                                                                                                                  |
|------------------------------------------------------------------------------------------------------------------|
          

Terms

Kafka cluster
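  Contains multiple brokers. Kafka is a multi-master system: there is no single master node and no slave node. Leadership is assigned per partition, so a broker can be the leader for some partitions and a follower for others.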

Partition
  This is NOT a disk partition. A partition is an ordered log that stores the messages of a topic.
  {1 Partition = 1 Topic}: 1 partition cannot contain 2 topics (but 1 topic can span multiple partitions).

Broker:
A TCP server (listening on port 9092 by default) that receives messages (key=id, value=payload) published to topics and stores them.
Bootstrap broker: On startup, a Kafka client needs the address of at least one broker, called a bootstrap broker. The client connects to the bootstrap brokers listed in the bootstrap.servers configuration property and queries the cluster metadata, which contains the full list of brokers, topics, partitions, and partition leaders in the Kafka cluster.
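
The bootstrap step can be sketched as follows. This is a simulation under stated assumptions, not the real Kafka wire protocol: fetch_metadata, CLUSTER_METADATA, REACHABLE, and every address other than 192.168.0.1:9092 are made up for illustration. It shows that the client only needs one reachable bootstrap broker to learn about the whole cluster.

```python
# Hypothetical cluster state that any broker would return to a client.
CLUSTER_METADATA = {
    "brokers": {1: "192.168.0.1:9092", 2: "192.168.0.2:9092", 3: "192.168.0.3:9092"},
    "leaders": {("t1", 0): 2, ("t1", 1): 3},  # (topic, partition) -> broker id
}

# Brokers that currently accept connections (broker 1 is assumed to be down).
REACHABLE = {"192.168.0.2:9092", "192.168.0.3:9092"}


def fetch_metadata(address):
    """Stand-in for a metadata request sent to one broker."""
    if address not in REACHABLE:
        raise ConnectionError(f"cannot connect to {address}")
    return CLUSTER_METADATA


def bootstrap(bootstrap_servers):
    """Try each bootstrap broker in turn until one returns cluster metadata."""
    for address in bootstrap_servers.split(","):
        try:
            return fetch_metadata(address.strip())
        except ConnectionError:
            continue
    raise ConnectionError("no bootstrap broker reachable")


def leader_address(metadata, topic, partition):
    """Look up the address of the broker leading a given partition."""
    broker_id = metadata["leaders"][(topic, partition)]
    return metadata["brokers"][broker_id]


metadata = bootstrap("192.168.0.1:9092, 192.168.0.2:9092")
print(leader_address(metadata, "t1", 1))  # 192.168.0.3:9092
```

Listing more than one address in bootstrap.servers is what lets the client start even when the first broker in the list is unavailable.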

Topics & Messages
  A topic is a named collection of messages of the same type. Its messages are stored in partitions, in order within each partition.
  Messages usually follow an agreed format or schema (e.g. XML, JSON) so that consumers can interpret them.
  Messages written to a partition are immutable and cannot be updated.
Offset:
  Position of a message inside a partition. 1 topic can be stored on multiple partitions.
  Each message is assigned a sequence number (its offset) that increases monotonically within its partition. Offsets are unique only within a partition.
  This means, to locate a specific message, we need to know the Topic, Partition, and Offset number.
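
A minimal sketch of this addressing scheme, assuming an in-memory log (TopicLog and its methods are hypothetical names, not part of any Kafka API): each (topic, partition) pair has its own append-only list, and the offset is simply the index in that list.

```python
from collections import defaultdict


class TopicLog:
    """Append-only logs, one per (topic, partition) pair."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def append(self, topic, partition, payload):
        """Store a message and return the offset it was assigned."""
        log = self.partitions[(topic, partition)]
        log.append(payload)
        return len(log) - 1  # offsets start at 0 in every partition

    def read(self, topic, partition, offset):
        """Locate one message by its (topic, partition, offset) triple."""
        return self.partitions[(topic, partition)][offset]


log = TopicLog()
print(log.append("Orders", 0, "toy"))   # 0
print(log.append("Orders", 0, "gun"))   # 1
print(log.append("Orders", 1, "book"))  # 0 : offsets restart per partition
print(log.read("Orders", 0, 1))         # gun
```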

Kafka Consumers:
  Kafka consumers are client applications (usually separate processes or machines) that read messages from topics.
  Inside Kafka, each consumer is identified by a unique consumer ID and other attributes.

Consumer Group:
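  A consumer group is a set of consumers that are interested in the same topic and share the work of reading it. Kafka tracks which consumers belong to each group.
  Within a group, each partition is read by exactly 1 consumer. 1 consumer may read several partitions, but 2 consumers of the same group never read the same partition.
Scalability & fault tolerance using consumer groups:
  Consumer1 reads from Partition1 and Consumer2 reads from Partition2, so reads scale with the number of partitions. Partitions are replicated across multiple brokers, and if a consumer fails, its partitions are reassigned to the remaining consumers in the group.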
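
How partitions might be divided among the consumers of one group can be sketched with a simple round-robin rule. This is an illustration only: assign_partitions is a hypothetical helper, and real Kafka assignors may distribute partitions differently. The property it demonstrates is that every partition ends up with exactly one consumer in the group.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer, round-robin."""
    if not consumers:
        return {}
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for index, partition in enumerate(sorted(partitions)):
        owner = ordered[index % len(ordered)]
        assignment[owner].append(partition)
    return assignment


# Two consumers share three partitions.
print(assign_partitions([0, 1, 2], ["c1", "c2"]))  # {'c1': [0, 2], 'c2': [1]}

# c2 fails: its partition is reassigned to the surviving consumer.
print(assign_partitions([0, 1, 2], ["c1"]))        # {'c1': [0, 1, 2]}

# More consumers than partitions: the extra consumer stays idle.
print(assign_partitions([0, 1], ["c1", "c2", "c3"]))  # {'c1': [0], 'c2': [1], 'c3': []}
```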
topic="__consumer_offsets"
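  This is a special internal topic that stores information about the consumers of each group. It holds the committed offsets (per group, topic, and partition) along with group metadata such as consumer IDs, so a restarted consumer can resume from where its group left off.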
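
The role of committed offsets can be sketched with an in-memory stand-in for this topic. OffsetStore and consume are hypothetical names, and the sketch assumes the committed value is the offset of the next message to read.

```python
class OffsetStore:
    """In-memory stand-in for the committed offsets kept by Kafka."""

    def __init__(self):
        self.committed = {}  # (group, topic, partition) -> next offset to read

    def commit(self, group, topic, partition, next_offset):
        self.committed[(group, topic, partition)] = next_offset

    def position(self, group, topic, partition):
        # A group that has never committed starts from the beginning here.
        return self.committed.get((group, topic, partition), 0)


def consume(partition_log, store, group, topic, partition, max_messages):
    """Read a batch from the committed position, then commit the new position."""
    start = store.position(group, topic, partition)
    batch = partition_log[start:start + max_messages]
    store.commit(group, topic, partition, start + len(batch))
    return batch


orders = ["msg0", "msg1", "msg2", "msg3"]
store = OffsetStore()

print(consume(orders, store, "billing", "Orders", 0, 2))  # ['msg0', 'msg1']
# A restarted consumer of the same group continues after the committed offset.
print(consume(orders, store, "billing", "Orders", 0, 2))  # ['msg2', 'msg3']
# A different group has its own offsets and starts from the beginning.
print(consume(orders, store, "audit", "Orders", 0, 2))    # ['msg0', 'msg1']
```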

Stream
  A stream is not a single message passed from one application to another, but a continuous real-time flow of records. Records are (key, value) pairs.

  --|key=a,value=x|---|key=c,value=z|---|key=b,value=y|-->
        

Replication & Fault Tolerance in Kafka

Every topic can be replicated across multiple Kafka brokers to make its data fault-tolerant and highly available.
Each topic partition has one leader broker and zero or more replica (follower) brokers, depending on the topic's replication factor.

Leader Broker

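A leader is responsible for a partition (not for a whole topic).
Every partition has exactly one Kafka broker acting as its leader.
In ZooKeeper-based clusters, partition leader information is stored in ZooKeeper.
By default, all read/write operations are performed by the partition leader. Producers and consumers therefore need the address of the leader of each partition; they obtain it from the cluster metadata returned by the brokers rather than by contacting ZooKeeper directly.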

Follower/Replica Broker

Followers replicate data from the leader to serve as backup copies of the partition, and a follower can become the leader when the current leader goes down.

In-Sync Replica (ISR)

An ISR is a broker that has the latest data for a given partition. A leader is always an in-sync replica. Only ISRs are eligible to become partition leaders.
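
The rule that only ISRs may take over leadership can be sketched as below. elect_leader is a hypothetical helper, and picking the first eligible broker in replica order is an assumption made for this example, not a statement about how Kafka's controller is implemented.

```python
def elect_leader(replicas, isr, failed_leader):
    """Pick the first surviving replica that is still in the ISR set."""
    for broker in replicas:
        if broker != failed_leader and broker in isr:
            return broker
    return None  # no in-sync replica left: the partition has no eligible leader


# Brokers 1, 2, 3 hold the partition; broker 2 has fallen behind (not in ISR).
print(elect_leader([1, 2, 3], {1, 3}, failed_leader=1))  # 3

# The failed leader was the only in-sync replica.
print(elect_leader([1, 2, 3], {1}, failed_leader=1))     # None
```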

High-water mark

When a consumer reads from the leader, the leader never returns messages that have not yet been replicated to all in-sync replicas.
The leader keeps track of the high-water mark: the highest offset in the partition that has been replicated to all in-sync replicas.
Example (in figure below): The leader does not return messages with an offset greater than 4, as offset=4 is the highest offset that has been replicated to all followers.
Partition Replication
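
The high-water mark rule can be sketched in a few lines. This follows the convention used in these notes, where the high-water mark is the highest offset present on the leader and on every in-sync follower. The function names and the follower offsets (4 and 5) are assumptions chosen to reproduce the offset=4 example.

```python
def high_water_mark(leader_last_offset, follower_last_offsets):
    """Highest offset that the leader and every in-sync follower all have."""
    return min([leader_last_offset] + list(follower_last_offsets))


def fetch(leader_log, hwm, from_offset):
    """Return messages from from_offset up to and including the high-water mark."""
    return leader_log[from_offset:hwm + 1]


# The leader holds offsets 0..6; its followers have replicated up to 4 and 5.
leader_log = ["m0", "m1", "m2", "m3", "m4", "m5", "m6"]
hwm = high_water_mark(6, [4, 5])

print(hwm)                        # 4
print(fetch(leader_log, hwm, 3))  # ['m3', 'm4'] : m5 and m6 are not yet visible
```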