>_ Golang Step By Step

Scaling Systems

Horizontal vs vertical scaling, sharding, partitioning, and capacity planning

# Horizontal vs Vertical Scaling

There are two fundamental approaches to handling more load:

Vertical Scaling (Scale Up)         Horizontal Scaling (Scale Out)
┌─────────────────────┐            ┌──────┐ ┌──────┐ ┌──────┐
│                     │            │ App  │ │ App  │ │ App  │
│   ███████████████   │  CPU ↑     │  #1  │ │  #2  │ │  #3  │
│   ███████████████   │  RAM ↑     └──┬───┘ └──┬───┘ └──┬───┘
│   ███████████████   │  Disk ↑       │        │        │
│   BIG SERVER        │            ┌──▼────────▼────────▼──┐
│                     │            │     Load Balancer      │
└─────────────────────┘            └────────────────────────┘

Pros: Simple, no code changes      Pros: No upper limit, redundancy
Cons: Has a ceiling, SPOF          Cons: Complexity, distributed state

Strategy: Scale vertically first (it's simpler). When you hit limits (~$10K+ machines, or need redundancy), go horizontal. Most systems end up using both.
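The "scale out behind a load balancer" half of the diagram can be sketched in a few lines. This is a minimal round-robin picker, the simplest balancing policy; real load balancers add health checks, weighting, and connection draining. The backend names are placeholders.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobin is the core of a horizontal-scaling setup: each incoming
// request is handed to the next backend in rotation, spreading load
// evenly across identical app servers.
type roundRobin struct {
	backends []string
	next     uint64 // monotonically increasing request counter
}

// pick returns the next backend; atomic increment keeps it safe
// under concurrent requests.
func (r *roundRobin) pick() string {
	n := atomic.AddUint64(&r.next, 1)
	return r.backends[(n-1)%uint64(len(r.backends))]
}

func main() {
	lb := &roundRobin{backends: []string{"app-1", "app-2", "app-3"}}
	for i := 0; i < 6; i++ {
		fmt.Println(lb.pick()) // app-1, app-2, app-3, app-1, ...
	}
}
```

Round-robin works because the app servers are stateless copies of each other; any shared state has to live behind them (cache, database), which is where the rest of this page comes in.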

# Sharding

Sharding splits your data across multiple databases. Each shard holds a subset of data and operates independently.

Hash-based sharding:
  shard = hash(user_id) % num_shards

  user_id=42  → hash=7839  → 7839 % 4 = shard 3
  user_id=99  → hash=2104  → 2104 % 4 = shard 0

         ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐
         │Shard 0 │  │Shard 1 │  │Shard 2 │  │Shard 3 │
         │users:  │  │users:  │  │users:  │  │users:  │
         │99,156  │  │23,781  │  │44,512  │  │42,301  │
         └────────┘  └────────┘  └────────┘  └────────┘

Challenges: Cross-shard queries (e.g., "find all orders over $100") require scatter-gather across all shards, so design your shard key to match your most common query patterns. Note also that plain `hash % num_shards` forces most keys to move when the shard count changes; consistent hashing is the standard fix for minimizing that data movement.
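The shard-routing function above is a one-liner in Go. This sketch assumes string user IDs and FNV-1a as the hash (any fast, stable hash works); the resulting shard numbers won't match the illustrative hash values in the diagram.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const numShards = 4

// shardFor maps a key to a shard using hash-based sharding:
// shard = hash(key) % num_shards. FNV-1a is fast and deterministic,
// so the same key always routes to the same shard.
func shardFor(userID string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return h.Sum32() % numShards
}

func main() {
	for _, id := range []string{"user:42", "user:99", "user:156"} {
		fmt.Printf("%s -> shard %d\n", id, shardFor(id))
	}
}
```

Determinism is the whole point: every app server computes the same shard for a given key without any coordination.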

# Caching Layers

Multi-tier caching is the single most effective scaling technique. Each layer reduces load on the next.

Request Flow:

Browser Cache → CDN → API Gateway Cache → App Memory (L1)
                                              │
                                      Redis/Memcached (L2)
                                              │
                                          Database

Layer          │ Latency  │ Hit Rate │ Scope
───────────────┼──────────┼──────────┼────────────
Browser cache  │ 0 ms     │ ~40%     │ Per user
CDN            │ 5 ms     │ ~60%     │ Per region
L1 (in-proc)   │ <1 ms    │ ~80%     │ Per server
L2 (Redis)     │ 1-2 ms   │ ~95%     │ Shared
Database       │ 5-50 ms  │ —        │ Source of truth
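The L1/L2 lookup path can be sketched as a cache-aside read: check in-process memory first, then the shared cache, then fall through to the database and backfill both layers. Here a plain map stands in for Redis/Memcached so the sketch is self-contained; a real L2 would be a network call with TTLs and eviction.

```go
package main

import (
	"fmt"
	"sync"
)

// tieredCache models the L1/L2 layers from the table above.
type tieredCache struct {
	mu sync.Mutex
	l1 map[string]string // per-server, in-process (<1 ms)
	l2 map[string]string // stand-in for shared Redis/Memcached (1-2 ms)
}

// Get checks L1, then L2, then loads from the source of truth.
// Hits at each layer shield the layer below from load.
func (c *tieredCache) Get(key string, loadFromDB func(string) string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.l1[key]; ok {
		return v // L1 hit: no network round trip
	}
	if v, ok := c.l2[key]; ok {
		c.l1[key] = v // promote to L1 for this server
		return v
	}
	v := loadFromDB(key) // miss everywhere: hit the database
	c.l2[key] = v        // backfill both layers
	c.l1[key] = v
	return v
}

func main() {
	c := &tieredCache{l1: map[string]string{}, l2: map[string]string{}}
	dbCalls := 0
	load := func(k string) string { dbCalls++; return "value-for-" + k }
	c.Get("user:42", load)
	c.Get("user:42", load) // served from L1
	fmt.Println("db calls:", dbCalls) // prints: db calls: 1
}
```

The hit rates in the table compound: if L1 absorbs ~80% of requests and L2 ~95% of the remainder, only a few percent of traffic ever reaches the database.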

# Capacity Planning

Good capacity planning prevents both outages (too little) and waste (too much). Start with traffic estimation, then work backwards to resources.

  • Estimate QPS — DAU × actions per user ÷ 86400, then 3x for peak
  • Estimate storage — Records per day × avg size × retention period
  • Estimate bandwidth — QPS × avg response size
  • Add headroom — 2-3x for spikes, plan 6 months ahead
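The QPS and bandwidth bullets are just arithmetic, so they're easy to script. This sketch uses illustrative inputs (1M DAU, 20 actions/user/day, 2 KB responses, 3x peak factor); plug in your own numbers.

```go
package main

import "fmt"

// estimateQPS applies the rule of thumb above:
// avg QPS = DAU x actions per user / 86400 seconds, then a peak multiplier.
func estimateQPS(dau, actionsPerUser int, peakFactor float64) (avg, peak float64) {
	avg = float64(dau*actionsPerUser) / 86400
	return avg, avg * peakFactor
}

func main() {
	// Hypothetical service: 1M DAU, 20 actions/user/day, 2 KB responses.
	avg, peak := estimateQPS(1_000_000, 20, 3)
	const avgRespBytes = 2 * 1024
	bandwidth := peak * avgRespBytes // bytes/sec at peak

	fmt.Printf("avg QPS:  %.0f\n", avg)
	fmt.Printf("peak QPS: %.0f\n", peak)
	fmt.Printf("peak bandwidth: %.1f MB/s\n", bandwidth/1e6)
}
```

For these inputs the math works out to roughly 230 average QPS and ~700 at peak: small numbers, which is exactly why back-of-envelope math is worth doing before reaching for sharding or extra servers.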

⚡ Key Takeaways

  • Scale vertically first, horizontally when you need it
  • Sharding enables horizontal scaling for databases — choose shard keys carefully
  • Multi-layer caching is the #1 scaling lever — cache aggressively
  • Consistent hashing minimizes data movement when adding/removing nodes
  • Always start with back-of-envelope math before building