Archive for the ‘Architecture’ Category

Dimensions of Scalability

Designing for scalability is one of the primary challenges of system and software architecture.  For those of us who practice architecture, it’s also great fun thanks the high number of variables involved, the creativity required to discover exploits, the pattern matching to apply tricks and avoid traps, and the necessity to visualize the system in multiple possible futures.

In the broadest terms, “Is it scalable?” = “Will it break under growth?”  A few manifestations that are a bit more useful include “Will performance hold up as we add more users?”, “Will transaction processing time stay flat as the database grows?”, and “Will batch processing still complete within the allotted window as the size of our account base, data warehouse, or whatever multiplies?”.  Architects imagine the kinds of demand parameters that might occur over the life cycle of the system and incorporate mitigation plans.

These examples all pertain to the performance characteristics of a system.  However, there are other dimensions of scalability that are equally important when considering that system in a business context.

Strategic Dimensions

  1. Performance Scalability:  “An observation about the trend in performance in response to increasing demands.”
    Demand can refer to any of several parameters depending on the system such as number of concurrent users, transactions rates, database size, etc.  Performance measures may include event processing time, batch throughput, user perception, and many others.  In any case, we consider a system to be scalable if we observe a flat or nearly flat performance curve (i.e., little or no performance degradation) as any given demand parameter rises.  In reality, even highly scalable systems tend to be scalable through some finite range of demand beyond which some resource tends to become constrained causing degradation.
  2. Operational Scalability:  “An observation about the trend in effort or risk required to maintain performance in response to increasing demands.”
    This may be best illustrated by example. Consider a web application that is experiencing sharp increases in usage and a mid-tier performance bottleneck as a result.  If the application was designed for mid-tier concurrency, the mitigation effort may be simply adding more application servers (i.e., low effort, low risk).  If not, then significant portions of the application may need to be redesigned and rebuilt (i.e., high effort, high risk).  The former case is operationally scalable.  As with performance scalability, operational scalability occurs in finite ranges.  Continuing the previous example, at some point the database may become the bottleneck typically requiring more extensive remedial action.
  3. Economic Scalability:  “An observation about the trend in cost required to maintain performance in response to increasing demands.”
    We consider a system to be economically scalable if the cost of maintaining its performance, reliability, or other characteristics increases slowly (ideally not at all, but keep dreaming) as compared with increasing loads.  The former types of scalability contribute here.  For example, squeezing maximum performance out of each server means buying fewer servers (i.e., performance scalability) and adding new servers when necessary is cheaper than redeveloping applications (i.e., operational scalability).  However, other independent cost factors can swing things including commodity vs. specialty hardware, open source vs. proprietary software licenses, levels of support contracts, levels of redundancy for fault tolerance, and the complexity of developmental software which impacts testing, maintenance, and release costs.

Rocky Roads

Since the underlying theme of these additional dimensions is business context, it should be noted that rarely does an architect get to mitigate all imaginable scalability risks.  Usually this is simple economics.  In the early days of an application, for example, the focus is functionality without which million-user performance may never get to be an issue.  Furthermore, until its particular financial model is proven, excessive spending on scalability may be premature.

However, a good technology roadmap should project forward to anticipate as many scale factors as possible and have its vision corrected periodically.  Scalability almost always comes down to architecture and an architectural change which is usually pervasive by definition is the last thing you want to treat at a hot-fix.

The Redundancy Principle

Architecting complex systems includes the pursuit of “ilities”; qualities that transcend functional requirements such as scalability, extensibility, reliability, maintainability, and availability.  Performance and security are included as honorary “ilities” since aside from being suffix-challenged, they live in the same family of “critical real-world system qualities other than functionality”.  The urge to include beer-flavored took a lot to conquer.

Reliability, maintainability, and availability have some overlap.  For example, most would agree that availability is a key aspect of reliability in addition to repeatable functional correctness.  Similarly, a highly maintainable system is not only one that is composed of easily replaceable commodity parts, but one that can be serviced while remaining available.

As an architect, designing for availability can be great fun.  It’s like a chess game where you have a set of pieces, in many cases multiples of the same kinds.  Your opponent is a set of failure modes.  You know that in combating these failures, pieces will be lost or sacrificed, but if well played, the game continues.

We [Don’t] Interrupt this Broadcast

Every component in a system is subject to failure.  Hardware components like servers and disk drives carry MBTF (mean time before failure) specifications.  Communication media and external services are essentially compositions of components that can fail.  Even software modules may be subject to latent defects, memory leaks, or other unstable states however statistically rare.  Even the steel on a battleship rusts.  Failures cannot be avoided.  They can, however, be tolerated.

The single most effective weapon in the architect’s availability arsenal is redundancy.  Every high availability system incorporates redundancy in some way, shape, or form.

  • The aging U.S. national power grid provides remarkable uptime to the average household in spite of a desperately needed overhaul. At my house, electrical availability exceeds the IT-coveted five nines (i.e., 99.999%) and most outages can be traced to the local last mile.
  • The U.S. Department of Defense almost always contracts with dual sources for the manufacturing of weapon systems and typically on separate coasts in an attempt to survive disasters, natural or not.
  • The Global Positioning System comprises 27 satellites; 24 operational plus 3 redundant spares. The satellites are arranged such that a GPS receiver can “see” at least 4 of them at any point on earth. However, only 3 are minimally required to determine position albeit with less accuracy.
  • Even the smallest private aircraft have magnetos; essentially small alternators that generate just enough energy to keep spark plugs firing in case an alternator failure causes the battery to drain. Having experienced this particular failure mode as a pilot, I was happy indeed that this redundancy kept my engine available to its user.

Returning to the more grounded world of IT, redundancy can occur at many levels.  Disk drives and power supplies have among the highest failure rates of internal components and thus RAID technology and dual power supply modules in many servers and other devices.  Networks can be designed to enable redundant LAN paths among servers.  Servers can be clustered assuming their applications have been designed accordingly.  Devices such as switches, firewalls, and load balancers can be paired for automatic failover.  The WAN can include multiple geographically disparate hosting sites.

Drawing the Line

The appropriate level of redundancy in any system reduces to an economic decision.  By definition, any expenses incurred to achieve redundancy are in excess of those required to deliver required functionality.  Although in some cases, redundant resources used to increase availability may provide ancillary benefits (e.g., a server cluster can increase availability and throughput).

Redundancy decisions really begin as traditional risk analyses.  Consider the events to be addressed (e.g., an entire site going down, certain capabilities being unavailable, a specific application becoming inaccessible; for some period of time).  Then determine the failure modes that can cause these conditions (e.g., a server locking up, a firewall going down, a lightning strike hitting the building).  Finally, consider the cost of these events each as a function of its impact (e.g., lost revenue, SLA penalties, emergency maintenance, bad press) and the probabilities of its failure modes actually occurring.  The cost of redundancy to tolerate these failure modes can now be made dispassionately against their value.

As technologists, our purist hearts want to build the indestructible system.  Capture my bishops and rooks and my crusading knights will continue processing transactions.  However, the cost-benefit tradeoff drives the inexorable move from pure to real.

The good news is that many forms of redundancy within the data center are inexpensive or at least very reasonable these days given the commoditization of hardware and the pervasiveness of the redundancy principle.  Furthermore, if economics keeps you from realizing total redundancy, do not be disheartened.  We’re all currently subject to the upper bound that we live on only one planet.

Rational Scalability

The Chief Technologist of any highly successful Web x.0 company one day must come to grips with the horror of success.  Such companies can find themselves swept within the compressed bipolar extremes of “make it function and be careful with pre-revenue cash” and “holy crap, half the Internet is hitting us”.  It’s like a wormhole suddenly bringing together two points in space that were previously vastly distant and in the process evaporating the time to prepare for arrival.  A good problem to have, right?  (clearing my throat, loosening my collar)  It certainly beats the alternative, but without a successful crossing, the alternative returns.

Extending the Wormhole

Several factors can influence the apparent flight time through the metaphorical wormhole aside from writing big checks on speculation.

Functional Realism:  Understand what the business does and why.  This one seems more obvious than history would indicate.  A deep understanding of how users will approach the system will help bring likely scale issues into focus.  For example, if operation X is 10 times slower than operation Y, conventional wisdom in isolation says “focus on tuning X”.  However, in the field if Y will be called 10,000 times more often, perhaps Y should get the attention.

Early Warning Radar:  Find leading indicators of usage up-ticks that are solid enough to trigger investment or other proactive steps.  At matchmine, our users created traffic to our platform via a network portfolio of media partners with whom we had B2B relationships.  The time it takes to establish these relationships provides a useful response buffer.

Capacity on Tap:  Maintain as much in reserve infrastructure as possible without paying for it.  For example, the bandwidth contract on our main pipes had minimum charges plus various overage terms.  However, all aspects of the technical infrastructure could handle about 100 times these minimums with only a billing change.  Other areas to consider are cloud computing (e.g., Amazon EC2) and edge caching (e.g., Akamai).

Architecture:  If a young Web x.0 company is fortunate enough to apply some serious architecture practice in its early days, the wormhole effect can be greatly reduced.  CPU, storage, and bandwidth are commodities that by and large can be added by releasing funds.  However, silicon, iron, and glass can only take an inadequate architecture so far and good architecture doesn’t happen overnight or under fire.

Rational Scalability

One of the most challenging aspects of planning for scale is how to rationalize large resources against a need that hasn’t happened yet.  Few things will make a CFO’s spinal chord vibrate like a proposal to build a system that will support 10,000,000 users when you’re still at 1,000.  And is 10,000,000 even the right number?  Could it be an order of magnitude either way?  Overspending to this level of uncertainty simply converts technical risk into financial risk.

The key is to find something on which to rationalize or if possible compartmentalize scale.  This may be along functional lines, classes of users, or qualities of service, for example.  Depending on the application, however, this may take a place of prominence on the easier said than done list.  The user base of many applications is simply huge and rather monolithic often leading to other modes of partitioning (e.g., grouping user accounts into separate databases by alphabetical ranges).

At matchmine, we could exploit the aforementioned B2B layer.  All of our users hit our platform in the context of using one or more of our media partners.  This provides a very natural partitioning opportunity.

Consider the matchmine Discovery Server.  This is the platform component that services content retrieval requests in all their variations.  These operations return only content from the partner’s view of the catalog; the subset of our catalog that they sell, rent, share, discuss, or otherwise service.  These subsets are much smaller than the whole rendering them much easier to memory-cache.

Thus the architectural exploit.  The Discovery Server can be refactored into a controller and a set of servicing nodes.  The controller is the web service endpoint that handles front line security, metrics collection, and request routing to nodes.  The nodes service the actual content retrieval operations, but on a partner-specific and highly cached basis.

The caching dramatically improved performance over the monolithic approach and also provides the perfect hunting ground for rational scalability:

  • We know the sizes of content and supporting data for each partner. Therefore, we can determine the best mapping of partners to nodes based on the objective of memory caching their data. Alternately, we can size virtual servers to specific partners.
  • We know the user base size of each partner. Therefore, we have an upper bound for estimating Discovery Server usage per partner and thus can determine how many nodes to allocate per partner to handle their throughput.
  • We know the growth rates of content and user base of each partner. Therefore, we can predict how the foregoing two points will change over time since Discovery Server usage growth is bounded by these rates.
  • We know the total traffic hitting the Discovery Server. Therefore, we can determine how many controllers we’ll need. While controller traffic is non-partitioned, the controllers are functionally very light, stateless, and thus easy to scale.

As this example illustrates, rational scalability is the art of tying the various dimensions of growth to externally observable and/or predictable factors.  The natural regulators of B2B growth can assist greatly whether the business itself is B2B or a B2B layer wraps the user base.  In a pure B2C play with a single large class of users, this may be somewhere between difficult and impossible, but the importance of trying cannot be overstated.

Performance != Scalability

Performance and scalability are often conflated, sometimes equated, but only somewhat related.  In web services platforms, both characteristics are business critical.  But while each exerts influence over the other, they are highly distinct and have profoundly different business impacts.

Performance & Scalability Revealed

Using a web services platform as context, performance and scalability can be defined as follows:

particlePerformance is a measure of speed from a caller’s perspective; the time it takes for the platform to accept a request, process it, and return a response.  We measure response time at the platform’s front door (i.e., excluding Internet latencies, etc.).  From a platform perspective, performance is also a measure of throughput; the number of operations that can be completed per unit time.  This is a function both of response time and concurrency.

Scalability is an observation about the trend in performance under increasing demands.  For example scalability can be characterized as the rate of response time degradation relative to increasing platform traffic.  Alternately, scalability can be viewed as the increase in throughput attained via an increase in system resources (e.g., if I double my server capacity, will I actually get twice the throughput?).

Note that this definition of scalability is irrespective of absolute performance.  For example, a slow operation may be considered scalable if a 10X increase in load results in only a 1.2X increase in response time.  Conversely, a lightning fast operation may require a resource that can only be accessed serially, thereby causing a concurrency bottleneck and thus may not be scalable.

Clearly performance and scalability are related, but are not equivalent.  In practice however, the faster an operation’s response time, the less system resources it consumes.  This means that it reduces load on other operations and is less affected by the load of others.  Both effects can positively impact throughput and thus scalability; doing more per unit system resource.


In our practical universe, scalability and performance are highly related through economics.  Per the previous point, the faster an operation’s response time, the less system resources it consumes.  Generally speaking, this translates into higher throughput by a given set of physical resources.  Thus, the higher the performance, the lower the rate of capital expenditures required to handle a given rate of traffic growth.  Therefore, scalability and performance both contribute to economic scalability; doing more per unit cost.

Put another way, both higher performance and higher scalability reduce the cost of scaling.

So Who Cares?

Performance as response time is business critical because it directly impacts user experience, without which there is no business.  Our users and our business partners care.

Performance as throughput is business critical because it directly impacts system expenditures and the ongoing total cost of ownership.  I just made the CFO happy.

Scalability is business critical for avoiding infrastructure meltdowns in the face of market success and is one of technology’s key contributions to profitability.  Clearly we all care about this one.

Finally, an early understanding of scalability characteristics is critical to aligning engineering investments with company growth.  Unlike performance tuning, scalability limitations are rarely fixed quickly since they are often a function of architecture rather than implementation.  Many resource constrained startups choose functionality over scalability for good and obvious reasons.  However, the sooner an engineering team can target some serious focus on scalability, the fewer techies will be seeing 3am from the office.