Saga microservices design pattern

Circuit Breaker Design Pattern

Suppose service1 gets some data from service2. There is always the possibility that service2 is unavailable or dead.
When service2 is dead, service1's threads might keep polling service2 and waiting for a response. This wastes service1's resources.
The failure to get data can cascade to other services throughout the application if they depend on service1.

Problem:

How can service1 quickly detect that service2 is dead, so that it does not waste its resources?

Solution:

Traditional: preconfigured thresholds and timeouts
When the number of consecutive failures crosses a threshold, stop sending requests for a fixed timeout period (for example, x = 10 s).
After those 10 s, send only a limited number of test requests (for example, 5). If those requests succeed, resume normal operation. Otherwise, if any of them fails, the timeout period begins again.
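
The threshold and timeout rules above can be sketched as a small state machine. This is a minimal, single-threaded illustration, not the listing used later in these notes: the class name CircuitBreaker, its methods, and the testRequests parameter are hypothetical, and the caller passes the current time in so the behaviour is easy to check.

```cpp
#include <cassert>
#include <ctime>

enum class BreakerState { Closed, Open, HalfOpen };

// Hypothetical sketch of the fixed threshold/timeout rules. Not thread-safe.
class CircuitBreaker {
public:
    CircuitBreaker(int failureThreshold, std::time_t restTimeout, int testRequests)
        : failureThreshold_(failureThreshold),
          restTimeout_(restTimeout),
          testRequests_(testRequests) {
        assert(failureThreshold > 0 && restTimeout > 0 && testRequests > 0);
    }

    // Returns true if a request may be sent at time `now`.
    bool allowRequest(std::time_t now) {
        if (state_ == BreakerState::Open) {
            if (now - openedAt_ < restTimeout_) return false;  // still resting: fail fast
            state_ = BreakerState::HalfOpen;                    // rest over: start test requests
            probeSuccesses_ = 0;
        }
        return true;
    }

    // Records the outcome of a request that was allowed through.
    void record(bool success, std::time_t now) {
        if (success) {
            consecutiveFailures_ = 0;
            if (state_ == BreakerState::HalfOpen && ++probeSuccesses_ >= testRequests_) {
                state_ = BreakerState::Closed;  // every test request succeeded
            }
            return;
        }
        ++consecutiveFailures_;
        // A failed test request, or too many consecutive failures, (re)starts the timeout.
        if (state_ == BreakerState::HalfOpen || consecutiveFailures_ >= failureThreshold_) {
            state_ = BreakerState::Open;
            openedAt_ = now;
            consecutiveFailures_ = 0;
        }
    }

    BreakerState state() const { return state_; }

private:
    int failureThreshold_;
    std::time_t restTimeout_;
    int testRequests_;
    BreakerState state_ = BreakerState::Closed;
    int consecutiveFailures_ = 0;
    int probeSuccesses_ = 0;
    std::time_t openedAt_ = 0;
};
```

A caller would check allowRequest(now) before each call to service2 and report the outcome with record(success, now). Passing the clock in as an argument is a design choice made here only to keep the sketch deterministic.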

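Modern: adaptive techniques
Using AI and ML, dynamically adjust thresholds based on real-time traffic patterns, anomalies, and historical failure rates. Compared with fixed thresholds, the aim is to open the circuit sooner on genuine outages and to avoid opening it on brief, harmless spikes, which improves resiliency and efficiency.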
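
To make the idea of a threshold that follows observed traffic concrete, here is a deliberately simple stand-in. It uses no AI or ML: it compares two exponentially weighted moving averages of the failure rate, a fast one for recent traffic and a slow one for the historical baseline. The class name AdaptiveTrip and the parameters fastAlpha, slowAlpha and margin are assumptions made for this sketch.

```cpp
#include <cassert>

// Hypothetical sketch: trips when the recent failure rate rises well above
// the long-term baseline, instead of comparing against a fixed count.
class AdaptiveTrip {
public:
    AdaptiveTrip(double fastAlpha, double slowAlpha, double margin)
        : fastAlpha_(fastAlpha), slowAlpha_(slowAlpha), margin_(margin) {
        assert(slowAlpha > 0.0 && fastAlpha > slowAlpha && fastAlpha <= 1.0);
    }

    // Feed the outcome of one request into both averages.
    void record(bool failed) {
        const double x = failed ? 1.0 : 0.0;
        recent_ += fastAlpha_ * (x - recent_);      // reacts quickly to new outcomes
        baseline_ += slowAlpha_ * (x - baseline_);  // drifts slowly: the historical rate
    }

    // The effective threshold is baseline + margin, so it moves with history.
    bool shouldOpen() const { return recent_ > baseline_ + margin_; }

    double recentRate() const { return recent_; }
    double baselineRate() const { return baseline_; }

private:
    double fastAlpha_;
    double slowAlpha_;
    double margin_;
    double recent_ = 0.0;
    double baseline_ = 0.0;
};
```

With fastAlpha = 0.2, slowAlpha = 0.01 and margin = 0.3, a single failure after a healthy period does not trip the breaker, while a short run of failures does. A production system would likely use richer signals, such as latency or traffic volume, than this sketch does.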

HTTP CLIENT


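// g++ -std=c++11 service1.cpp -lpthread -o service1
// ./service1

#include <atomic>
#include <chrono>
#include <ctime>
#include <iostream>
#include <thread>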

// Configuration
constexpr int restTimeOut = 10;     // Don't send anything for 10 sec
constexpr int failureThreshold = 5; // Consecutive 5 failures, endpoint dead
constexpr const char* endpoint = "ip:8081/service2_health";

// State variables
std::atomic <std::time_t> lastFailureTime(0); // Written by both threads, so it must be atomic
std::atomic <int> successCount(0);
std::atomic <int> failureCount(0);
std::atomic <bool> atomicIsAlive(false);

enum CircuitState {
    CLOSED,      // Normal operation - requests pass through
    OPEN,        // Service unavailable - requests fail fast
    HALF_OPEN    // Testing if service is back
};

std::atomic <CircuitState> State(CLOSED);

// Mock HTTP GET function
bool httpGet(const char* url) {
    // Implementation would make actual HTTP request
    // Returns true if service responds successfully
    return false; // Placeholder
}

// Service check thread function
void pingService2() {
    while (true) {
        std::this_thread::sleep_for(std::chrono::seconds(1));
        
        if (State == OPEN) {
            std::time_t currentTime = std::time(nullptr);
            std::time_t elapsedTimeSinceFailure = currentTime - lastFailureTime;
            
            if (elapsedTimeSinceFailure > restTimeOut) {
                State = HALF_OPEN;  // Move to testing state
                successCount = 0;
                failureCount = 0;
            }
        }
        
        if (State == HALF_OPEN || State == CLOSED) {
            // Try to contact the service
            bool resp = httpGet(endpoint);
            
            if (resp) {  // Success
                successCount++;
                failureCount = 0;
                atomicIsAlive = true;
                
                if (State == HALF_OPEN && successCount >= 1) {
                    State = CLOSED;  // Service recovered
                }
            } else {    // Failure
                failureCount++;
                successCount = 0;
                
                if (failureCount >= failureThreshold) {
                    atomicIsAlive = false;
                    lastFailureTime = std::time(nullptr);
                    State = OPEN;  // Circuit breaker opens
                }
            }
        }
    }
}

// Mock service 1 function
bool threadService1() {
    return atomicIsAlive.load();
}

// Circuit breaker protected request function
bool makeRequest() {
    if (State == OPEN) {
        std::time_t currentTime = std::time(nullptr);
        std::time_t elapsedTimeSinceFailure = currentTime - lastFailureTime;
        
        if (elapsedTimeSinceFailure > restTimeOut) {
            State = HALF_OPEN;  // Allow a test request
        } else {
            return false;  // Fail fast
        }
    }
    
    // In CLOSED or HALF_OPEN state, attempt the request
    bool result = httpGet(endpoint);
    
    if (!result) {
        failureCount++;
        if (failureCount >= failureThreshold) {
            lastFailureTime = std::time(nullptr);
            State = OPEN;
        }
    } else {
        successCount++;
        if (State == HALF_OPEN) {
            State = CLOSED;  // Service is back
        }
        failureCount = 0;
    }
    
    return result;
}

int main() {
    std::thread healthCheckThread(pingService2);
    healthCheckThread.detach();  // Run in background
    
    // Main application loop
    while (true) {
        if (threadService1()) {
            std::cout << "Server2 up, sending packet...\n";
            bool success = makeRequest();
            if (!success) {
                std::cout << "Request failed through circuit breaker\n";
            }
        } else {
            std::cout << "Server2 down, circuit breaker open\n";
        }
        
        std::this_thread::sleep_for(std::chrono::seconds(2));
    }
    
    return 0;
}

1. Start the health-check service in its own thread (pingService2).
2. Once per second, check whether the HTTP server (service2) is up and running:
   a. If state = OPEN and the time elapsed since the last failure is greater than the configured restTimeOut, move to HALF_OPEN. No ping probes are sent before restTimeOut has elapsed.
   b. In the CLOSED or HALF_OPEN state, send an HTTP GET to the endpoint.
   c. If the response was a success, increment the success count and reset the failure count to 0.
   d. If the response was a failure, increment the failure count; once it reaches failureThreshold, declare service2 failed and set state = OPEN.
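
One way to make these steps easier to test is to pull a single iteration of the health-check loop out into a function that takes the current time and the probe as arguments. The sketch below mirrors the steps under that assumption. The names HealthCheck and healthCheckStep, and the probe callback, are hypothetical and do not appear in the listing above.

```cpp
#include <cassert>
#include <ctime>
#include <functional>

enum class Circuit { Closed, Open, HalfOpen };

struct HealthCheck {
    Circuit state = Circuit::Closed;
    int successCount = 0;
    int failureCount = 0;
    std::time_t lastFailureTime = 0;
};

// Runs one iteration of the health check. Returns true if a probe was sent.
bool healthCheckStep(HealthCheck& h, std::time_t now, int restTimeOut,
                     int failureThreshold, const std::function<bool()>& probe) {
    assert(failureThreshold > 0);
    if (h.state == Circuit::Open) {
        // Still inside the rest period: send nothing.
        if (now - h.lastFailureTime <= restTimeOut) return false;
        h.state = Circuit::HalfOpen;  // rest period over: start testing
        h.successCount = 0;
        h.failureCount = 0;
    }
    if (probe()) {                    // stands in for the HTTP GET to the endpoint
        ++h.successCount;
        h.failureCount = 0;
        h.state = Circuit::Closed;    // a success in HALF_OPEN closes the circuit
    } else {
        ++h.failureCount;
        if (h.failureCount >= failureThreshold) {
            h.state = Circuit::Open;  // declare service2 failed
            h.lastFailureTime = now;
        }
    }
    return true;
}
```

The background thread would then reduce to a loop that sleeps for one second and calls healthCheckStep with std::time(nullptr) and a lambda wrapping httpGet(endpoint).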

HTTP SERVER


// g++ service2.cpp -lpthread -o service2
// ./service2
int main() {
	// HTTP Server. RESTEndpoint: service2_health
	// HTTP GET ip:8081/service2_health [1=healthy]
}
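
The server stub above can be fleshed out in several ways. The following is a minimal sketch using POSIX sockets, assuming a Linux-like system. The function names handleRequest and serveHealth are hypothetical. The body "1" for healthy follows the comment in the stub, while the "0" body and the 503 and 404 status codes are assumptions of this sketch.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <string>

// Builds the full HTTP response for one raw request (hypothetical helper).
std::string handleRequest(const std::string& request, bool healthy) {
    const bool isHealthPath = request.rfind("GET /service2_health ", 0) == 0;
    const std::string body = isHealthPath ? (healthy ? "1" : "0") : "not found";
    const std::string status = !isHealthPath ? "404 Not Found"
                               : healthy     ? "200 OK"
                                             : "503 Service Unavailable";
    return "HTTP/1.1 " + status + "\r\nContent-Type: text/plain\r\nContent-Length: " +
           std::to_string(body.size()) + "\r\nConnection: close\r\n\r\n" + body;
}

// Accepts one connection at a time and answers it. Returns -1 on setup failure.
int serveHealth(unsigned short port) {
    assert(port != 0);  // port 0 would bind an arbitrary ephemeral port
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) return -1;
    int yes = 1;
    setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        listen(listener, 16) < 0) {
        close(listener);
        return -1;
    }

    for (;;) {
        int client = accept(listener, nullptr, nullptr);
        if (client < 0) break;
        char buf[1024];
        const ssize_t n = recv(client, buf, sizeof(buf), 0);
        if (n > 0) {
            // This sketch always reports healthy; a real service would check its own state.
            const std::string reply =
                handleRequest(std::string(buf, static_cast<std::size_t>(n)), true);
            send(client, reply.data(), reply.size(), 0);
        }
        close(client);
    }
    close(listener);
    return 0;
}
```

Calling serveHealth(8081) from main would expose the endpoint on the port used by service1. The sketch reads at most one buffer per connection and ignores partial sends, which is acceptable for a health probe but not for a general-purpose server.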