What is Transaction
-
A transaction is a way for an application to group several reads and
writes together into a logical unit.
Application sends data packet to DbMgr. DB Manager will not write every packet to database but it groups several reads/writes together into a logical unit called transaction.
Conceptually, all the reads and writes in a transaction are executed as one operation: either the entire transaction succeeds (commit) or it fails (abort, rollback).
If it fails, the application can safely retry.
|----|
Application ---data pkt----> |DB Manager| | DB |
|----|
Why Transaction is needed?1. Atomicity: if Application is writing 20KB json document to DB. After 10KB node crashes, User will see partially updated values which is Wrong. if write is done as part of transaction into Atomic DB, partial write is discarded and db moves to prev state
2. Safety Gurantees
| Dirty Read | Dirty Write | |
|---|---|---|
| 1. Read Committed | No | No |
Terms
Read Related
| Term | Description |
|---|---|
| Dirty Read |
A transaction is writing some row to the
database, but the transaction got aborted inbetween, ie data is not
committed. if another transaction can see that uncommitted data, it is a dirty read To Prevent Dirty Read? Database remembers both the old committed value and the new value set by the transaction. Currently ongoing write transaction holds the write lock. While the transaction is ongoing, any other transactions that read the object are simply given the old value. Only when the new value is committed do transactions switch over to reading the new value. |
| Read Skew / Non repeatable read |
Say Alice has $1,000 of savings at a bank, split across two accounts
with $500 each. Now a transaction transfers $100 from one of her
accounts to the other. If she looks at balance at same moment she
might find 500$ in one, 400$ in another Solution: Snapshot Isolation |
Write Related
| Term | Decription |
|---|---|
| Dirty Write |
Two transactions concurrently try to
update the same row in a database. 1st write transaction was
uncommitted, data in DB is in inconsistent state On top of this incorrect data, 2nd transaction writes, this is Dirty write. To prevent dirty write? Take lock over row then write a transaction, if transaction fails revert the change and then release the lock |
| Write Skew / Phantom |
Race condition caused when 2 transactions update different rows in
same table after reading common data from table Example: There is a hospital where atleast 1 doctor is required to be on shift(always) in hospital. Doctors can giveup their shifts if they are Sick, provided atleast 1 of their collegue stays in hospital. Alice & Bob are 2 doctors(on particular shift) feeling unwell and decide to request leave. Both click "Apply leave" button at same time.
Lock cannot solve the issue, Since both transactions are updating
DIFFERENT ROWS in DB.Solution: - if both transactions should have run one after other, txn2 will get (on_call<2) and write would have been prevented. |
Distributed Transaction
-
Example(Customer ordering Pizza)
To create an order different microservices need to enter related data into THEIR respective databases.
txn1a,txn1b,txn1c are all part of 1 transaction only. All transactions should be committed together ie part of 1 transaction, if any of transaction(txn1a or txn1b or txn1c) fail other transaction should be reverted.
Committing different transactions(txn1a, txn1b, txn1c) into different DBs and all transactions(txn1a + txn1b + txn1c = t1) are part of 1 transaction is called DISTRIBUTED TRANSACTION
Customer
- Pizza 240/-> createOrder()
|-
|-customer-service --read cust id------> DB1
<-----------------
|-Order-service2 --trxn1a(Write order)-----> DB2
|-Kitchen-service3 --trxn1b(Enter item)---> DB3
|-Accounting-service4 --trxn1c(Create Record)--> DB4
2 Phase Commit / 2PC (Way to acheive Distributed Transaction)
XA Transaction / Open XA (eXtended Architecture)
-
This is standard for implementing two phase commit,
supported by many traditional relational databases (including
PostgreSQL, MySQL, DB2, SQL Server, and Oracle)
It talks about interfacing with a transaction coordinator(Broker) using C APIs. Microservice should talk to DB using network driver or client library.
Working of 2PC (A Definite Promise)
-
Coordinator
- Can be external(Zookeeper) or internal(any microservice)
|
Steps 1. Coordinator generates globally unique transactionId and sends to microservices. Now-on-wards microservices will use same id for communication. 2. Coordinator sends Prepare message(with transactionId) asking microservices whether they can commit or not? 3. When microservices recieves prepare, it makes sure that it can commit to DB at any cost. - Even if microservices crashes After sending Yes. Then after recovery it has to commit the transaction. - There is no turning Back. 5. Coordinator recieves response from all nodes. This is called COMMIT POINT 6. Coordinator writes its decision to Transaction log. so that if any microervice responds with No, whole commit can be moved to prev state. 7. Phase-2: Coordinator Send Message to commit. microervice Commit ie write to DB else //if any one says no Phase-2: Send Abort Message to all microervice via coordinator DOUBT/UNCETAIN STATE: - Coordinator sends prepare, microervice voted Yes, and Coordinator crashes. Now microervices are in Doubt state. microervices will wait forever for Coordinator to send Phase-2 message(Commit or Abort). HEURISTICS DECISIONS: Microervices can communicate & decide amongst themseleves once did not hear from Coordinator for particular time. |
Features of Distributed Transaction
-
1. All Commit or All Abort: Either all nodes commit or all nodes
abort.
2. A transaction commit is IRREVOCABLE: ie once node decides to commit, then there is no turning back. Reason: Once data has been committed, it becomes visible to other transactions, and thus other clients may start relying on that data. Suppose 1 microservice dies & fails to commit, then this service should commit again once it comes back, ie never go back.
Problems with Distributed Transaction
-
1. Not supported by modern technologies Eg: NoSQL DB does not
support. Apache Kafka, Cassandra, RabbitMQ does not support.
2. DT is form of Synchronous RPC: Look at Working on 2PC Kitchen microservice is waiting, holding locks If coordinator crashes → Kitchen blocks indefinitely and even cannot write other transactions
Solution: Saga Design Pattern