©2020 by Solid Code Solutions Ltd. All rights reserved.

Registered number: 08750436. Registered office: 20 - 22 Wenlock Road, London, England, N1 7GU

  • Matt Burrell

Does your application really scale?

When it comes to scalability, architecture diagrams can give us a false sense of security.


We may have designed and provisioned a slick cloud infrastructure, used an enterprise container orchestration tool, adopted a battle-tested distributed streaming platform. Or gone serverless. But none of these guarantees that our application will scale.


The truth is that no amount of planning or whiteboarding can tell us how many concurrent users our application can support. Nor can it tell us how our application will behave under load, or at what point (or where) it will break.


There are two reasons for this.


Firstly, despite their best intentions, many developers write code that doesn’t scale. Scalability isn’t just about architecture; it’s also about implementation.


Those tricky bugs that keep developers up at night are usually the ones that involve memory leaks, deadlocks and race conditions. These also tend to be the ones that aren’t easily identified using ordinary manual (or automated) functional tests. Instead, they crop up in production having been found by real users.


The second reason is that bottlenecks in our systems often occur in places we haven’t anticipated. We can plan to scale out our applications using extra VMs or rely on Kubernetes to increase the number of replicas. But this won’t help when the problem is a database that’s run out of memory. Or a clogged network. Or because a developer hasn’t used HttpClient correctly.


Ultimately, the only way to find out whether an application truly scales is to performance test it.
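To make the idea concrete, here is a minimal sketch of what a load test measures: fire many concurrent calls at a target and record latency percentiles and errors. It uses only the Python standard library. The names `run_load_test` and `fake_request` are hypothetical, and `fake_request` is a stand-in for a real HTTP call to the system under test. A real performance test would use a dedicated tool and realistic traffic patterns; this only illustrates the principle.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles


def run_load_test(call, concurrency, total_requests):
    """Invoke `call` total_requests times across `concurrency` worker threads.

    Returns the error count and latency figures (in seconds).
    """
    def timed_call(_):
        start = time.perf_counter()
        try:
            call()
            ok = True
        except Exception:
            # Any exception counts as a failed request.
            ok = False
        return time.perf_counter() - start, ok

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(latency for latency, _ in results)
    errors = sum(1 for _, ok in results if not ok)
    cuts = quantiles(latencies, n=100)  # 99 percentile cut points
    return {
        "requests": total_requests,
        "errors": errors,
        "p50": cuts[49],
        "p95": cuts[94],
        "max": latencies[-1],
    }


def fake_request():
    # Assumption: stands in for a real request to the application under test.
    time.sleep(0.01)
```

Calling `run_load_test(fake_request, concurrency, 100)` with increasing values of `concurrency` and watching how `p95` and `errors` change is the simplest way to see where a system starts to degrade.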


The Cost of Ignoring Performance Testing


In my experience, many teams don’t carry out any performance testing. There are many reasons for this, including not being given enough time, not having the tools or in-house expertise, or simply not seeing the need.


Some teams believe that they can rely on functional testing, reference architectures and the auto-scale features of cloud vendors. But functional testing won’t find the performance-related issues associated with realistic high load. Nor will reference architectures and auto-scaling help, because every codebase and implementation is unique.


This ‘scalability fallacy’, i.e. assuming our application will scale because we’ve used a ‘scalable architecture’, has some significant implications. Systems that don’t scale to meet user demand can result in significant revenue losses and degradation in customer experience.


For example, system downtime and slow response times on e-commerce websites lead customers to abandon online purchases. In 2013, Amazon’s servers crashed for 30 minutes, and the company lost an estimated $66,240 per minute. The loss would be much higher with today’s traffic.


Of course, it’s not just tech giants like Amazon that need to worry about performance. ChannelAdvisor estimates that online retailers lose about 4% of a day’s sales for each hour of website downtime. But a site doesn’t need to go completely down to have a negative impact on user experience.
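To put the two downtime figures above in concrete terms, the short sketch below works through the arithmetic. It assumes losses scale linearly with time, which is a simplification. The function names and the retailer's daily sales figure are hypothetical and chosen only for illustration.

```python
def per_minute_loss(loss_per_minute, minutes_down):
    """Total loss for an outage, given an estimated loss per minute."""
    return loss_per_minute * minutes_down


def hourly_rate_loss(daily_sales, hours_down, loss_rate_per_hour=0.04):
    """Loss for an outage using the 4%-of-daily-sales-per-hour estimate."""
    return daily_sales * loss_rate_per_hour * hours_down


# Amazon, 2013: an estimated $66,240 per minute over a 30-minute outage.
amazon_loss = per_minute_loss(66_240, 30)
print(f"Amazon outage: ${amazon_loss:,}")  # Amazon outage: $1,987,200

# Hypothetical retailer: $50,000 in daily sales (an assumed figure), down for 2 hours.
retailer_loss = hourly_rate_loss(50_000, 2)
print(f"Retailer outage: ${retailer_loss:,.0f}")
```

On these assumptions, the 30-minute Amazon outage comes to just under $2 million, and even a small retailer loses a meaningful share of a day's revenue in a couple of hours.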


It’s well known that slower response times increase the bounce rate of any type of website. The BBC, for instance, noticed that they lost 10 per cent of users for every additional second of page load latency. Part of their solution is to gracefully remove less important features from a page when they detect an increase in page load latency.


‘Black Friday’-type peaks in traffic can be planned for and architected around. But only if you know where, when and how your system will break.


The alternative is to wait for your users to unintentionally discover the breaking points. Or worse, for malicious users to find them. System crashes might expose valuable system information to an attacker, create temporary vulnerabilities or expose unprotected files or memory dumps.


The better option is to invest in some professional performance testing. This will be cheaper than the cost of abandoned carts, reputation damage, potential security issues, and data corruption that system downtime may cause.