Rational Scalability

The Chief Technologist of any highly successful Web x.0 company one day must come to grips with the horror of success.  Such companies can find themselves swept within the compressed bipolar extremes of “make it function and be careful with pre-revenue cash” and “holy crap, half the Internet is hitting us”.  It’s like a wormhole suddenly bringing together two points in space that were previously vastly distant and in the process evaporating the time to prepare for arrival.  A good problem to have, right?  (clearing my throat, loosening my collar)  It certainly beats the alternative, but without a successful crossing, the alternative returns.

Extending the Wormhole

Several factors can influence the apparent flight time through the metaphorical wormhole aside from writing big checks on speculation.

Functional Realism:  Understand what the business does and why.  This one seems more obvious than history would indicate.  A deep understanding of how users will approach the system will help bring likely scale issues into focus.  For example, if operation X is 10 times slower than operation Y, conventional wisdom in isolation says “focus on tuning X”.  However, in the field if Y will be called 10,000 times more often, perhaps Y should get the attention.

Early Warning Radar:  Find leading indicators of usage up-ticks that are solid enough to trigger investment or other proactive steps.  At matchmine, our users created traffic to our platform via a network portfolio of media partners with whom we had B2B relationships.  The time it takes to establish these relationships provides a useful response buffer.

Capacity on Tap:  Maintain as much in reserve infrastructure as possible without paying for it.  For example, the bandwidth contract on our main pipes had minimum charges plus various overage terms.  However, all aspects of the technical infrastructure could handle about 100 times these minimums with only a billing change.  Other areas to consider are cloud computing (e.g., Amazon EC2) and edge caching (e.g., Akamai).

Architecture:  If a young Web x.0 company is fortunate enough to apply some serious architecture practice in its early days, the wormhole effect can be greatly reduced.  CPU, storage, and bandwidth are commodities that by and large can be added by releasing funds.  However, silicon, iron, and glass can only take an inadequate architecture so far and good architecture doesn’t happen overnight or under fire.

Rational Scalability

One of the most challenging aspects of planning for scale is how to rationalize large resources against a need that hasn’t happened yet.  Few things will make a CFO’s spinal chord vibrate like a proposal to build a system that will support 10,000,000 users when you’re still at 1,000.  And is 10,000,000 even the right number?  Could it be an order of magnitude either way?  Overspending to this level of uncertainty simply converts technical risk into financial risk.

The key is to find something on which to rationalize or if possible compartmentalize scale.  This may be along functional lines, classes of users, or qualities of service, for example.  Depending on the application, however, this may take a place of prominence on the easier said than done list.  The user base of many applications is simply huge and rather monolithic often leading to other modes of partitioning (e.g., grouping user accounts into separate databases by alphabetical ranges).

At matchmine, we could exploit the aforementioned B2B layer.  All of our users hit our platform in the context of using one or more of our media partners.  This provides a very natural partitioning opportunity.

Consider the matchmine Discovery Server.  This is the platform component that services content retrieval requests in all their variations.  These operations return only content from the partner’s view of the catalog; the subset of our catalog that they sell, rent, share, discuss, or otherwise service.  These subsets are much smaller than the whole rendering them much easier to memory-cache.

Thus the architectural exploit.  The Discovery Server can be refactored into a controller and a set of servicing nodes.  The controller is the web service endpoint that handles front line security, metrics collection, and request routing to nodes.  The nodes service the actual content retrieval operations, but on a partner-specific and highly cached basis.

The caching dramatically improved performance over the monolithic approach and also provides the perfect hunting ground for rational scalability:

  • We know the sizes of content and supporting data for each partner. Therefore, we can determine the best mapping of partners to nodes based on the objective of memory caching their data. Alternately, we can size virtual servers to specific partners.
  • We know the user base size of each partner. Therefore, we have an upper bound for estimating Discovery Server usage per partner and thus can determine how many nodes to allocate per partner to handle their throughput.
  • We know the growth rates of content and user base of each partner. Therefore, we can predict how the foregoing two points will change over time since Discovery Server usage growth is bounded by these rates.
  • We know the total traffic hitting the Discovery Server. Therefore, we can determine how many controllers we’ll need. While controller traffic is non-partitioned, the controllers are functionally very light, stateless, and thus easy to scale.

As this example illustrates, rational scalability is the art of tying the various dimensions of growth to externally observable and/or predictable factors.  The natural regulators of B2B growth can assist greatly whether the business itself is B2B or a B2B layer wraps the user base.  In a pure B2C play with a single large class of users, this may be somewhere between difficult and impossible, but the importance of trying cannot be overstated.


No comments yet

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: