A Perfect Moment

As a pilot, I find there is something inexplicable in the nature of flying that causes me to feel more connected to life than anything else.  One evening, I set out just before sunset in a Beechcraft Sundowner from Hanscom Air Force Base in Bedford, MA en route to Springfield, VT.  The sky was beautifully clear and smooth.  I landed in a deserted Springfield just as night was falling.  The facilities closed at 5pm so there was no one around; just peace, quiet, and mountains.  After stretching my legs for a bit, I got back in the sky, now completely enveloped in night.  My route home took me east to Manchester, NH and then south back to Bedford.

After reaching my cruising altitude on the leg to Manchester, I paused for a moment to take in the scene.  I was just out of Vermont heading east.  The crystal-clear, dry air allowed me to see Boston, lit up and alive at my distant 2 o’clock.  I was flying directly into a full moon just 30 degrees above the horizon, and the air was so clear that I could actually see its reflection in the Atlantic a state away.  My plane was performing flawlessly with every instrument spot on.  While monitoring Manchester Approach, I heard and saw another plane flying west about 5 miles off my left wing at 6,500 MSL.  Their dialog included the sighting of an unidentified craft heading east at 5,500 MSL (me).  I called Manchester to identify myself, acknowledging my awareness of the other plane; all standard.

This is a very different world, not at all solitary, and happening all the time in a thin layer above the earth.  But what struck me during this very brief moment of reflection was its beauty and its clarity.  This is a world where everything makes sense to me.  Every event happens for a reason that is readily apparent; not at all standard for normal life.

Moments like these are fleeting and cannot be manufactured.  This is probably why these moments are so special.  This and the sense of complete clarity they bring, if only for a moment.


Holes and Drill Bits

So much of communication is about context.  So much of listening is about knowing where the speaker is coming from.

One of the single most important aspects of engineering practice is the acquisition and translation of requirements, irrespective of the technology or the development process.  This is because errors at this stage affect everything and the parties involved often share a rather thin contextual overlap (e.g., a technologist vs. a business sponsor).  As a technologist, it is especially important to listen actively and get inside the heads of the requirement sources.

My Kingdom for a Drill Bit

Here is an abridged version of a parable I sometimes use during interviews for positions like Architect and Principal Engineer; people who will routinely be faced with the translation of requirements.  The roles:

  • Customer:  Bob, Manufacturing Manager
  • Problem Solving Super Hero:  You

While setting up his manufacturing line, Bob runs into an issue.  Among the various parts that need to be prepped for assembly is a portion of a steel chassis.  It is essentially a steel plate and one of the specifications calls for a 1mm hole in a certain location.  The problem is that steel being steel and given its thickness relative to the diameter of the hole, Bob’s drill bits keep snapping.

Now Bob is a busy guy and doesn’t have time to hunt for harder drill bits; this is but one of many issues on his plate (no pun intended).  He’s sure they exist, but simply has not run into this problem before and thus has not shopped around.

So Bob comes to you.  He tells you his tale and sends you off to research the latest technology in drill bits while he tends to other tasks.  Because you’re a mechanical engineering aficionado and innate problem solver, you get excited about this challenge.  It’s like a holy grail quest to find the latest thing in drill bits – high tempered, diamond coated, beer flavored, marvels of rotational genius.

After a vain search to find anything substantially better than what Bob already had, you stop and think.  What is Bob’s requirement; his true need?  Is it really better bits?  No, the man needs a hole.

Thanks for the Hole

There are a number of ways to solve Bob’s problem without drill bits, but that’s not really the point.  Bob clearly has a need; a reliable and repeatable method of putting a 1mm hole into his steel chassis.  However, what Bob communicated was not his requirement, but rather his perceived solution.  Based on what Bob actually said, how could you know the difference?  Understanding where he’s coming from, it’s pretty simple.

As technologists, it’s easy for us to latch onto the technical problem as stated.  Through 4 – 8 years of college, we are handed endless problems to solve, as is.  During our formative professional years, we typically interface with more senior technologists who presumably have already translated the issues from Bob-speak.  But there comes a point in our careers when we need to move beyond the system’s what, when, and how and understand why things work.  We need to get to know Bob and solve problems that are human as well as technical.

Rational Scalability

The Chief Technologist of any highly successful Web x.0 company one day must come to grips with the horror of success.  Such companies can find themselves swept within the compressed bipolar extremes of “make it function and be careful with pre-revenue cash” and “holy crap, half the Internet is hitting us”.  It’s like a wormhole suddenly bringing together two points in space that were previously vastly distant and in the process evaporating the time to prepare for arrival.  A good problem to have, right?  (clearing my throat, loosening my collar)  It certainly beats the alternative, but without a successful crossing, the alternative returns.

Extending the Wormhole

Several factors can influence the apparent flight time through the metaphorical wormhole aside from writing big checks on speculation.

Functional Realism:  Understand what the business does and why.  This one seems more obvious than history would indicate.  A deep understanding of how users will approach the system will help bring likely scale issues into focus.  For example, if operation X is 10 times slower than operation Y, conventional wisdom in isolation says “focus on tuning X”.  However, in the field if Y will be called 10,000 times more often, perhaps Y should get the attention.
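The X-versus-Y point above is really just weighted-cost arithmetic; here is a minimal sketch (the operation names, latencies, and call volumes are hypothetical, not measurements from any real system):

```python
# Weighted cost = per-call latency x call frequency.
# X is 10x slower per call, but Y is called vastly more often.
ops = {
    "X": {"latency_ms": 100.0, "calls_per_day": 1_000},
    "Y": {"latency_ms": 10.0, "calls_per_day": 10_000_000},
}

for name, op in ops.items():
    total_s = op["latency_ms"] * op["calls_per_day"] / 1000
    print(f"{name}: {total_s:,.0f} seconds of processing per day")

# X: 100 seconds/day; Y: 100,000 seconds/day.
# Y dominates total load despite being 10x faster per call,
# so Y is the better tuning target.
```

The punchline is that in-isolation profiling ranks X as the problem, while field-usage weighting ranks Y a thousand times more costly.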

Early Warning Radar:  Find leading indicators of usage up-ticks that are solid enough to trigger investment or other proactive steps.  At matchmine, our users created traffic to our platform via a network portfolio of media partners with whom we had B2B relationships.  The time it takes to establish these relationships provides a useful response buffer.

Capacity on Tap:  Maintain as much in reserve infrastructure as possible without paying for it.  For example, the bandwidth contract on our main pipes had minimum charges plus various overage terms.  However, all aspects of the technical infrastructure could handle about 100 times these minimums with only a billing change.  Other areas to consider are cloud computing (e.g., Amazon EC2) and edge caching (e.g., Akamai).

Architecture:  If a young Web x.0 company is fortunate enough to apply some serious architecture practice in its early days, the wormhole effect can be greatly reduced.  CPU, storage, and bandwidth are commodities that by and large can be added by releasing funds.  However, silicon, iron, and glass can only take an inadequate architecture so far and good architecture doesn’t happen overnight or under fire.

Rational Scalability

One of the most challenging aspects of planning for scale is how to rationalize large resources against a need that hasn’t happened yet.  Few things will make a CFO’s spinal cord vibrate like a proposal to build a system that will support 10,000,000 users when you’re still at 1,000.  And is 10,000,000 even the right number?  Could it be an order of magnitude either way?  Overspending to this level of uncertainty simply converts technical risk into financial risk.

The key is to find something on which to rationalize or if possible compartmentalize scale.  This may be along functional lines, classes of users, or qualities of service, for example.  Depending on the application, however, this may take a place of prominence on the easier said than done list.  The user base of many applications is simply huge and rather monolithic often leading to other modes of partitioning (e.g., grouping user accounts into separate databases by alphabetical ranges).
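The alphabetical-range partitioning mentioned above can be sketched in a few lines (the shard names and ranges are illustrative, not a real deployment):

```python
# Route a user account to a database shard by the first letter
# of the username (alphabetical-range partitioning).
RANGES = [
    ("a", "f", "db1"),  # usernames starting a-f
    ("g", "m", "db2"),  # g-m
    ("n", "s", "db3"),  # n-s
    ("t", "z", "db4"),  # t-z
]

def shard_for(username: str) -> str:
    first = username[0].lower()
    for lo, hi, db in RANGES:
        if lo <= first <= hi:
            return db
    return "db_other"  # digits, punctuation, etc.

print(shard_for("alice"))  # db1
print(shard_for("Zelda"))  # db4
```

Note the weakness that makes this a fallback rather than a first choice: the ranges must be rebalanced as the user population skews, whereas a natural business boundary (like a B2B partner) rarely needs rebalancing.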

At matchmine, we could exploit the aforementioned B2B layer.  All of our users hit our platform in the context of using one or more of our media partners.  This provides a very natural partitioning opportunity.

Consider the matchmine Discovery Server.  This is the platform component that services content retrieval requests in all their variations.  These operations return only content from the partner’s view of the catalog; the subset of our catalog that they sell, rent, share, discuss, or otherwise service.  These subsets are much smaller than the whole, rendering them much easier to memory-cache.

Thus the architectural exploit.  The Discovery Server can be refactored into a controller and a set of servicing nodes.  The controller is the web service endpoint that handles front line security, metrics collection, and request routing to nodes.  The nodes service the actual content retrieval operations, but on a partner-specific and highly cached basis.

The caching dramatically improves performance over the monolithic approach and also provides the perfect hunting ground for rational scalability:

  • We know the sizes of content and supporting data for each partner. Therefore, we can determine the best mapping of partners to nodes based on the objective of memory caching their data. Alternately, we can size virtual servers to specific partners.
  • We know the user base size of each partner. Therefore, we have an upper bound for estimating Discovery Server usage per partner and thus can determine how many nodes to allocate per partner to handle their throughput.
  • We know the growth rates of content and user base of each partner. Therefore, we can predict how the foregoing two points will change over time since Discovery Server usage growth is bounded by these rates.
  • We know the total traffic hitting the Discovery Server. Therefore, we can determine how many controllers we’ll need. While controller traffic is non-partitioned, the controllers are functionally very light, stateless, and thus easy to scale.
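The first bullet, mapping partners to nodes by cache footprint, is essentially a bin-packing problem.  A minimal greedy sketch follows; the partner names, catalog sizes, and memory budget are hypothetical, and a real planner would also weigh the throughput and growth factors from the other bullets:

```python
# Greedily assign partners to servicing nodes so each node's
# cached catalog subsets fit within its memory budget.
NODE_MEM_GB = 16

partners = {  # partner -> cacheable catalog size in GB (hypothetical)
    "p1": 9, "p2": 7, "p3": 5, "p4": 4, "p5": 2,
}

nodes = []  # each node: {"free": remaining GB, "partners": [names]}

# First-fit decreasing: place the largest partners first.
for partner, size in sorted(partners.items(), key=lambda kv: -kv[1]):
    for node in nodes:
        if node["free"] >= size:
            node["free"] -= size
            node["partners"].append(partner)
            break
    else:  # no existing node has room; provision a new one
        nodes.append({"free": NODE_MEM_GB - size, "partners": [partner]})

for i, node in enumerate(nodes):
    used = NODE_MEM_GB - node["free"]
    print(f"node{i}: {node['partners']} ({used} GB cached)")
```

With these numbers, two nodes suffice: p1 and p2 fill one node exactly, and the remaining three partners share the second.  Alternately, as noted above, virtual servers can simply be sized to individual partners.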

As this example illustrates, rational scalability is the art of tying the various dimensions of growth to externally observable and/or predictable factors.  The natural regulators of B2B growth can assist greatly whether the business itself is B2B or a B2B layer wraps the user base.  In a pure B2C play with a single large class of users, this may be somewhere between difficult and impossible, but the importance of trying cannot be overstated.

Orbiting the Meta-World

There was a period in the early 90’s at a very large company that will remain nameless (unless you check me out on LinkedIn) when I seemed to be surrounded by people who couldn’t commit to anything, but who wanted to sound devastatingly intelligent in the process.  This caused them to sprinkle throughout their pontifications words like virtually, meta, pseudo, and fuzzy, to the point where they were saying nothing at all.

After having these words (and more to the point these people) pluck away at my spinal cord enough times, I decided to write a short tribute to them, which appeared in IEEE Computer in April 1995.  It is reprinted here for your viewing pleasure.

Orbiting the Meta-World

In a land [not so] far away, there lived a clan of artificially intelligent beings.  Loosely based on a quantum mix of carbon and silicon, these intrepid beings comprised a virtual reality.  They spoke in pseudo-code, shared simulated emotion, and possessed a remarkable meta-knowledge.  It was a perfectly homogeneous society (excluding the exceptions), which was governed by an absolutely constant set of standards that constantly evolved.

One day (although there is a nonzero probability that it was another day), some of these quasi-creatures began to doubt the wisdom of their relative state of perceived existence.  A fuzzy subset of the populace conceived the possibility of simulating actual reality!  Fuzzy fear spread throughout virtually the entire colony.  The quasi-creatures were called quasi-crazy and pseudo-sane and became outcasts from their meta-world.

Partially undaunted, they made their plans and decided a program was needed – yes a program to transform their pseudo-silicon to quasi-carbon.  A quantum shift, simulated by pseudo-code, was all that would be needed… until the process failed, at which time much reengineering would be needed.  It would take all their meta-knowledge, pseudo-skills, and fuzzy faith to artificially achieve it.  Vibrating at the speed of light as they labored, virtually no time passed.  Finally, they were virtually ready.

The band of outcasts prepared to leave their meta-world.  Billions of clock cycles ticked by until the moment of departure, when they executed the program (fixed the bugs and executed it again).  They, themselves were the input.  In a flash of simulated light, their fuzzy sets converged and the quantum shift was complete.  The pseudo-code transformed virtually all of the quasi-creatures from the meta-world.  Neither theorem, nor corollary, nor lemma could have predicted the outcome.

No more would they be virtual, pseudo, or quasi; no longer were they children of the meta-world.  A great synthesis had taken place — actual reality had been achieved.  The fuzzy colony of artificial intelligence had become…  well-defined natural stupidity.

Performance != Scalability

Performance and scalability are often conflated, sometimes equated, but only somewhat related.  In web services platforms, both characteristics are business critical.  But while each exerts influence over the other, they are highly distinct and have profoundly different business impacts.

Performance & Scalability Revealed

Using a web services platform as context, performance and scalability can be defined as follows:

Performance is a measure of speed from a caller’s perspective; the time it takes for the platform to accept a request, process it, and return a response.  We measure response time at the platform’s front door (i.e., excluding Internet latencies, etc.).  From a platform perspective, performance is also a measure of throughput; the number of operations that can be completed per unit time.  This is a function both of response time and concurrency.

Scalability is an observation about the trend in performance under increasing demands.  For example, scalability can be characterized as the rate of response time degradation relative to increasing platform traffic.  Alternately, scalability can be viewed as the increase in throughput attained via an increase in system resources (e.g., if I double my server capacity, will I actually get twice the throughput?).

Note that this definition of scalability is irrespective of absolute performance.  For example, a slow operation may be considered scalable if a 10X increase in load results in only a 1.2X increase in response time.  Conversely, a lightning fast operation may require a resource that can only be accessed serially, thereby causing a concurrency bottleneck and thus may not be scalable.
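The slow-but-scalable case above is easy to express as arithmetic; the load and response-time figures here are illustrative, not measurements:

```python
# Characterize scalability as response-time degradation vs. load growth
# (illustrative numbers for the "slow but scalable" case).
baseline_load, baseline_ms = 100, 500.0   # requests/sec, response time
heavy_load, heavy_ms = 1000, 600.0        # 10x the load

load_factor = heavy_load / baseline_load  # 10.0
degradation = heavy_ms / baseline_ms      # 1.2
print(f"{load_factor:.0f}x load -> {degradation:.1f}x response time")
```

A 500ms operation is slow in absolute terms, yet this degradation curve is excellent; meanwhile an operation with a 5ms response time that serializes on a shared resource can hit a throughput wall long before this one does.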

Clearly performance and scalability are related, but are not equivalent.  In practice, however, the faster an operation’s response time, the fewer system resources it consumes.  This means that it reduces load on other operations and is less affected by the load of others.  Both effects can positively impact throughput and thus scalability; doing more per unit system resource.


In our practical universe, scalability and performance are also highly related through economics.  Per the previous point, the faster an operation’s response time, the fewer system resources it consumes.  Generally speaking, this translates into higher throughput from a given set of physical resources.  Thus, the higher the performance, the lower the rate of capital expenditures required to handle a given rate of traffic growth.  Therefore, scalability and performance both contribute to economic scalability; doing more per unit cost.

Put another way, both higher performance and higher scalability reduce the cost of scaling.

So Who Cares?

Performance as response time is business critical because it directly impacts user experience, without which there is no business.  Our users and our business partners care.

Performance as throughput is business critical because it directly impacts system expenditures and the ongoing total cost of ownership.  I just made the CFO happy.

Scalability is business critical for avoiding infrastructure meltdowns in the face of market success and is one of technology’s key contributions to profitability.  Clearly we all care about this one.

Finally, an early understanding of scalability characteristics is critical to aligning engineering investments with company growth.  Unlike performance tuning, scalability limitations are rarely fixed quickly since they are often a function of architecture rather than implementation.  Many resource-constrained startups choose functionality over scalability for good and obvious reasons.  However, the sooner an engineering team can target some serious focus on scalability, the fewer techies will be seeing 3am from the office.