Vertical Scaling

Vertical Scaling (or Scale Up) means increasing the capacity of a single server by adding more CPU, RAM, disk space, or faster network hardware. It is the simplest way to handle higher load without changing the application architecture.

✅ When is it appropriate

Vertical Scaling is suitable if most of the following apply:

  • the application is stateful or stores data in memory, making it impractical to split across multiple servers
  • the bottleneck is clearly one resource such as CPU or RAM, not application code or database queries
  • peak load is well understood and a larger instance can handle it within provider limits
  • no load balancer or distributed session management is in place and adding them is not planned
  • a brief restart window during resource upgrades is acceptable
  • the application architecture cannot be easily distributed across multiple machines

Vertical scaling is straightforward because it requires no changes to the application code or architecture. The trade-off is a hard ceiling: every cloud provider and hardware vendor offers a maximum instance or machine size. Once that ceiling is reached, the only way to handle more load is to redesign the application to run on multiple servers.

❌ When is it NOT appropriate

Vertical scaling may not be ideal if:

  • the application has already reached the largest instance size available from the cloud provider or hardware vendor
  • availability requirements are high, since a single server crash takes the entire application offline for all users
  • the cost of the next instance tier is significantly higher than running two smaller instances in parallel
  • traffic is unpredictable with sudden spikes that would require repeatedly resizing the server

On bare metal, and in some cloud configurations, upgrading the server's resources requires a restart. During that restart, all users lose access, and there is no standby server to take over if the machine crashes unexpectedly.
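The cost comparison in the list above can be checked with simple arithmetic before committing to an upgrade. A minimal sketch, assuming hypothetical instance names and hourly prices (real provider pricing varies by region and tier):

```python
# Hypothetical hourly prices for illustration only; look up your
# provider's actual price list before deciding.
PRICES = {
    "m.large":  0.20,   # e.g. 2 vCPU, 8 GB RAM
    "m.xlarge": 0.55,   # e.g. 4 vCPU, 16 GB RAM; high tiers often cost more per unit
}

def vertical_upgrade_is_cheaper(current: str, next_tier: str) -> bool:
    """Compare one larger instance against two instances of the current size."""
    return PRICES[next_tier] < 2 * PRICES[current]

print(vertical_upgrade_is_cheaper("m.large", "m.xlarge"))  # → False: 0.55/h > 2 × 0.20/h
```

With these sample prices, two smaller instances are cheaper than the next tier, which is one of the signals that horizontal scaling deserves consideration. Cost is only one factor; availability and operational complexity matter too.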

👍 Advantages

  • simple and quick to implement
  • no changes to application code or architecture are required
  • works well for both stateless and stateful applications
  • a single larger server often performs better than multiple smaller ones for workloads that cannot be parallelised
  • no load balancer or distributed networking configuration is needed

👎 Disadvantages

  • every cloud provider and hardware vendor sets a maximum instance size that cannot be exceeded
  • if the server crashes, all users lose access until the machine recovers with no automatic failover
  • server upgrades may require a restart, causing downtime that affects every user at once
  • cost per unit of CPU or RAM increases significantly at the high end of available instance tiers
  • cannot scale down quickly in response to a traffic drop without another restart or instance change

🛠️ Typical use cases

  • small to medium-sized applications running on a single server
  • predictable or moderate workloads
  • on-premises deployments where adding new servers is difficult
  • early-stage projects or prototypes needing quick performance boosts
  • applications where downtime for upgrades is acceptable

⚠️ Common mistakes (anti-patterns)

  • continuing to upgrade to larger instances when the bottleneck is slow queries or inefficient code, not resource limits
  • running without any backup or failover plan, so a single hardware failure takes the entire application offline
  • not tracking the maximum instance size available, so the team is surprised when there is no larger tier to move to
  • not setting CPU and memory alerts, so resource saturation is discovered only when users report slowdowns
  • storing critical state only in server memory without backups, so a crash loses both uptime and data

A common failure is upgrading to a larger and more expensive instance when the real bottleneck is a slow database query or inefficient application code. Profiling the application first often reveals that no hardware change is needed at all.
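Profiling first is cheap compared to a hardware upgrade. A minimal sketch using Python's built-in cProfile, where slow_query is a hypothetical stand-in for an unindexed database query:

```python
import cProfile
import io
import pstats
import time

def slow_query():
    # Hypothetical stand-in for a slow, unindexed database query.
    time.sleep(0.05)

def handle_request():
    slow_query()
    sum(range(1000))  # the application code itself is cheap by comparison

profiler = cProfile.Profile()
profiler.enable()
for _ in range(10):
    handle_request()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())  # slow_query dominates cumulative time
```

If the profile shows one query dominating cumulative time, a bigger CPU will not help; fixing the query will.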

💡 How to build on it wisely

Recommended approach:

  1. Check CPU, RAM, and disk metrics before upgrading. If CPU is at 90% and RAM is at 40%, adding RAM will not improve performance.
  2. Upgrade one resource at a time rather than moving to the next full instance tier, which may be two to four times more expensive.
  3. Schedule upgrades during the lowest-traffic window and communicate a maintenance window to users in advance.
  4. Plan for horizontal scaling in parallel so that if the application outgrows the largest available instance, the architectural groundwork is already in place.
  5. Record the current instance size, monthly cost, and the maximum instance size available from your provider so the team knows how much headroom remains.
  6. Set CPU and memory alerts at 75% and 90% thresholds so the team can plan the next upgrade before users are affected.
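Steps 1 and 6 above can be combined into a simple threshold check. A minimal sketch with hypothetical sample metrics; in practice the values would come from a monitoring agent or the provider's metrics API:

```python
# Alert thresholds matching the 75% / 90% recommendation above.
WARN, CRITICAL = 75.0, 90.0

def check_saturation(metrics: dict[str, float]) -> dict[str, str]:
    """Return an alert level per resource so the next upgrade can be planned
    before users notice slowdowns."""
    alerts = {}
    for resource, percent in metrics.items():
        if percent >= CRITICAL:
            alerts[resource] = "critical: plan the upgrade now"
        elif percent >= WARN:
            alerts[resource] = "warning: schedule an upgrade window"
    return alerts

# Hypothetical readings: CPU saturated, RAM mostly idle.
sample = {"cpu": 92.0, "ram": 40.0, "disk": 78.0}
print(check_saturation(sample))  # cpu is critical, disk is a warning, ram is fine
```

This also illustrates step 1: with CPU at 92% and RAM at 40%, the check flags only CPU, so upgrading RAM would waste money.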

When the application requires an instance upgrade every few weeks, when the largest available instance tier is already in use, or when the next tier up costs more than two smaller servers running in parallel, it is time to plan for horizontal scaling.
