GolangStepByStep
Software Engineer

Background Jobs

Job queues, idempotency, retries, poison messages, durability

# The Restaurant Analogy

Imagine you walk into an extremely busy fast-food restaurant. You walk up to the Cashier (the HTTP Handler) and order a complex burger.

What if the Cashier took your money, turned around, slowly cooked the entire burger for 10 minutes, handed it to you, and only then took the next customer's order? The line out the door would be miles long. The server would time out and crash.

This is why we use Background Jobs. The Cashier takes your order very quickly, hands a receipt (a Task Message) to the Cooks in the back (the Background Workers), and immediately serves the next customer. The cooks asynchronously cook the burger and call your number when it's done.

# Level 1: Goroutines (Beginner)

In Go, it is incredibly tempting to just use a standard Goroutine to handle background work because it is so easy to write.

func SignupHandler(w http.ResponseWriter, r *http.Request) {
    user := createUser()
    
    // Start a background Goroutine to send the welcome email
    // This takes 3 seconds, but the user doesn't have to wait!
    go func() {
        sendWelcomeEmail(user.Email)
    }()
    
    w.Write([]byte("Signup complete!")) // Responds instantly!
}

The Fatal Flaw: In-Memory Fragility

What if the server crashes abruptly 1 second after returning the response? The Goroutine dies with the process before the email is sent. The user never gets the email. The task is gone forever. This is acceptable for firing off simple metrics, but completely unacceptable for processing credit cards.
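You can soften this flaw (without fixing it) by draining in-flight goroutines before a planned shutdown. Here is a minimal sketch using sync.WaitGroup; the function name and addresses are made up for illustration. Note that this only protects graceful restarts: a hard crash or power loss still loses the work, which is exactly why Level 2 exists.

```go
package main

import (
	"fmt"
	"sync"
)

// sendAllEmails launches one goroutine per address and blocks until
// every one of them has finished, then reports how many were sent.
func sendAllEmails(addrs []string) int {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		sent int
	)
	for _, addr := range addrs {
		wg.Add(1)
		go func(a string) {
			defer wg.Done()
			// A real sendWelcomeEmail(a) call would go here.
			_ = a
			mu.Lock()
			sent++
			mu.Unlock()
		}(addr)
	}
	// Graceful drain: wait for in-flight goroutines before exiting.
	// This helps on planned restarts only; kill -9 still loses work.
	wg.Wait()
	return sent
}

func main() {
	n := sendAllEmails([]string{"a@example.com", "b@example.com", "c@example.com"})
	fmt.Println("emails sent:", n)
}
```

In a real server you would call wg.Wait() from your SIGTERM handler, after the HTTP listener has stopped accepting new requests.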

# Level 2: Durable Job Queues (Intermediate)

To guarantee that a job survives server crashes, we need Durability. We must save the task to a system running on an entirely separate, reliable machine (like Redis, RabbitMQ, or PostgreSQL).

Architecture flow for a Durable Job Queue:

  1. The Producer: The HTTP Server accepts the request, saves a tiny JSON payload to a Redis Queue (e.g., {"user_id": 123, "action": "email"}), and happily responds 200 OK.
  2. The Broker: Redis acts as a bulletproof safe, holding the message until a worker claims it (and persisting it to disk, if persistence is configured).
  3. The Consumer (Worker): A completely separate Go application constantly asks Redis, "Do you have any tasks for me?" It grabs the data, processes it safely, and only tells Redis "I am done" after the email successfully sends.

If the Go Worker crashes mid-email, the job was never acknowledged, so the broker safely hands it to a different worker!
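The three-step flow above can be sketched end to end. A real Redis client needs a running server, so this sketch uses a buffered Go channel as a stand-in broker (the Task fields, enqueue, and work are all illustrative names); what matters is the shape: the Producer enqueues a tiny JSON payload, and the Consumer acknowledges only after the work succeeds.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Task is the tiny payload the Producer enqueues,
// e.g. {"user_id":123,"action":"email"}.
type Task struct {
	UserID int    `json:"user_id"`
	Action string `json:"action"`
}

// enqueue plays the Producer: serialize and push to the broker.
func enqueue(broker chan<- []byte, t Task) error {
	b, err := json.Marshal(t)
	if err != nil {
		return err
	}
	broker <- b
	return nil
}

// work plays the Consumer: pull one message, process it, and "ack"
// (return nil) only after success. A real worker would loop on a
// blocking pop against Redis instead of a channel receive.
func work(broker <-chan []byte, process func(Task) error) error {
	raw := <-broker
	var t Task
	if err := json.Unmarshal(raw, &t); err != nil {
		return err // bad payload: never acked
	}
	return process(t) // ack happens only if process returns nil
}

func main() {
	broker := make(chan []byte, 100) // stand-in for a Redis list

	_ = enqueue(broker, Task{UserID: 123, Action: "email"})

	err := work(broker, func(t Task) error {
		fmt.Printf("sending %s to user %d\n", t.Action, t.UserID)
		return nil
	})
	fmt.Println("acked:", err == nil)
}
```

The design choice worth copying is that the ack is the last thing that happens: a crash anywhere before it leaves the message unacknowledged and eligible for redelivery.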

# Level 3: Idempotency (Advanced)

Because Queue brokers are designed to retry jobs heavily when workers crash or networks fail, they provide an At-Least-Once Guarantee. This means a single job might mysteriously be executed twice.

If your background job is "Send $500 to Bank Account", and it runs twice, you are fired. You must design your jobs to be Idempotent.

func SendMoneyTask(transferID string, amount int) {
    // IDEMPOTENCY LOCK: We use the database to strictly enforce 
    // that this transferID can ONLY be marked complete one time.
    success := database.MarkTransferComplete(transferID)
    
    // If it was already complete from a previous ghost retry...
    // We abort safely! We do not do the work again.
    if !success {
        log.Println("Transfer already processed. Exiting cleanly.")
        return 
    }
    
    // Now we know it is 100% safe to do the dangerous work.
    // (In production, put the claim and the send in one database
    // transaction, so a failed send can roll the claim back.)
    BankAPI.Send(amount)
}

Idempotency allows you to aggressively hit "Retry" on failing services hundreds of times without any fear of duplicating consequences.
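To make the pattern above concrete and runnable, here is a sketch where a mutex-guarded map stands in for the database's unique-constraint check (TransferStore and its methods are illustrative names, not a real library). Running the same job twice moves money exactly once:

```go
package main

import (
	"fmt"
	"sync"
)

// TransferStore stands in for the database table that enforces
// "this transferID completes exactly once" (e.g. a unique index).
type TransferStore struct {
	mu   sync.Mutex
	done map[string]bool
}

func NewTransferStore() *TransferStore {
	return &TransferStore{done: make(map[string]bool)}
}

// MarkTransferComplete returns true only for the FIRST caller with a
// given ID; every retry after that gets false and must abort.
func (s *TransferStore) MarkTransferComplete(id string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.done[id] {
		return false
	}
	s.done[id] = true
	return true
}

// SendMoneyTask claims the ID, then does the work. It reports
// whether money actually moved on this invocation.
func SendMoneyTask(store *TransferStore, transferID string, send func()) bool {
	if !store.MarkTransferComplete(transferID) {
		fmt.Println("Transfer already processed. Exiting cleanly.")
		return false
	}
	send()
	return true
}

func main() {
	store := NewTransferStore()
	send := func() { fmt.Println("sent $500") }

	// The broker redelivers the same job twice; money moves once.
	fmt.Println("first run sent:", SendMoneyTask(store, "tx-42", send))
	fmt.Println("retry sent:", SendMoneyTask(store, "tx-42", send))
}
```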

# Level 4: Poison Messages and Dead Letter Queues (Expert)

Let's say a developer accidentally pushes a bad task into the queue where the JSON is totally corrupted.

The Go worker pulls the task, attempts to decode it, violently panics, and dies. Redis immediately gives the task to Worker B. Worker B pulls it, panics, and dies.

This task is a "Poison Message" (or Poison Pill). It will crash every single worker in your cluster, one after another, taking down your business.

To solve this, expert architectures implement a Max Retry Limit combined with a Dead Letter Queue (DLQ).

  • When the worker crashes the first time, the broker records it: Attempts: 1/5.
  • When it crashes for the 5th time, the broker says: "This message is poisoned."
  • It instantly removes it from the main queue and drops it into a secondary, inactive holding pen called the Dead Letter Queue.
  • The main queue immediately heals and goes back to processing normal jobs quickly.
  • A developer gets a Slack alert, inspects the DLQ manually, fixes the parsing bug in the code, and redeploys the fix.