Obviously there is price scaling, fixed-rate discounts, refunding, rate limiting, and legal implications to this. These are just some thoughts on serverless/usage-based billing.
It always feels like there is a part of my subconscious that is asking:
How can I abuse this? How can I make this work better for me? How can I make this do something it’s not supposed to do? How can I get access to this?
I originally entered the tech world through videogames. First playing them, then making them (poorly, admittedly).
I ultimately entered programming through cybersecurity. In 2016 I discovered the world of BadUSBs. My obsession with engineering started with programming an Arduino Leonardo to behave as a keyboard and write to a new notes file. The lightbulb turned on, and from there it grew into building a cybersecurity startup focused on a novel phishing training platform. That lasted until about 2018, when I left the security industry for more… interesting pursuits.
In fact, one of the only jobs that could have kept me out of entrepreneurship was the “Anti-abuse Engineering” role at Discord. I'm still shocked they didn’t want a 21-year-old without a degree for that position. I was such a great fit!
My groundwork in programming has always been in security and (anti)abuse. It’s what’s allowed me to spot security issues in every PR, and to lead a team of engineers in building the once great Ultimate Arcade, a platform that begged for attackers, yet at 10k MAUs none of them ever succeeded.
Maybe this is not a “problem”, but more of a consequence.
Many services that businesses of all sizes use are usage-based. This has immense benefits, such as only paying for what you use, and it puts powerful services within reach when building at a smaller scale (e.g. a startup or a bootstrapped founder). However, there are a few issues that immediately come to light:
- Unpredictable costs - you never know how much the bill will be until you get the bill
- Uncontrolled costs - different from the above, as now some external factor controls your spend
The second issue, uncontrolled costs, is the one we are targeting in this post.
Tools like AWS Lambda (serverless functions), Algolia (full-text search) and more bill directly on how many times they are used. Often, this usage is entirely determined by actions from users.
One saving grace is that these services often sit behind an auth-wall, which is a strong deterrent to abuse. Yet search is often available to everyone.
Many user analytics platforms bill on the concept of a “Monthly-Tracked User”, or “MTU”.
While many of these products have a generous free tier, our bills for Mixpanel are notably large, and thankfully we have a nice chunk of Segment credits to burn through before we incur real charges.
From Segment’s docs:
If a user returns to your site after the cookie expires, Analytics.js looks for an old ID in the user’s localStorage, and if one is found, sets it as the user’s ID again in a new cookie. If the user clears their cookies and localStorage, all of the IDs are removed. The user gets a completely new anonymousId when they next visit the page.
In most cases, the tally of MTUs is equal to the number of distinct_ids who have performed a tracked event this month. The only exception to this rule is if your users average more than 1000 events each, in which case MTUs are equal to:
Mixpanel's MTU overage rate drops as you use more. Here is a screenshot from an email I received while we were working on the Ultimate Arcade:
$0.0258 per MTU? Whoa, that’s a lot.
See the issue? Let’s talk about the attack.
Knowing how Segment and Mixpanel determine their billing, we can design an attack to maximize MTU consumption. Furthermore, if someone is using Segment, chances are they have at least one (if not many) other usage-based billing services that operate on MTUs downstream. Meaning we have a single entry point to many systems.
- Manually go to the target website and look in the network tab for Segment tracking events firing. Notice that there is no form of hashing or anything that prevents replaying an event beyond changing the
- Copy that request as Node Fetch and load that into a load testing framework like k6
- Add some random ID generation for the messageId property, and add some randomization to the event being sent to Segment.
- Don’t get rate limited or banned
Doing so should generate a new user on every request. Scaling this out to multiple machines, rotating residential proxies, using Puppeteer for better user emulation, and more can rack up the bill quite fast.
Let’s design a small-scale attack and see how much this adds up with the pricing above.
- 10 machines
- Each machine sends 10 requests/second
- 10*10 = 100 MTUs/second
- ~730 hours in a billing period
- 60 minutes/hour
- 60 seconds/minute
- 100*60*60*730 = 262.8 million MTUs in a billing period
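The same arithmetic as a small Python sketch, so the inputs are easy to change. The function and constant names are made up for illustration, and it assumes (as the scenario above does) that every single request is counted as one new MTU.

```python
# Back-of-the-envelope MTU volume, using the figures from the post.
MACHINES = 10
REQUESTS_PER_SECOND_PER_MACHINE = 10
HOURS_PER_BILLING_PERIOD = 730  # roughly one month


def mtus_per_billing_period(machines: int, rps_per_machine: int, hours: int) -> int:
    """Total MTUs, assuming each request is billed as one new MTU."""
    seconds = hours * 60 * 60
    return machines * rps_per_machine * seconds


total_mtus = mtus_per_billing_period(
    MACHINES, REQUESTS_PER_SECOND_PER_MACHINE, HOURS_PER_BILLING_PERIOD
)
print(f"{total_mtus:,} MTUs per billing period")  # 262,800,000 MTUs per billing period
```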
With 262.8 million MTUs, let’s see how much that would cost with the above pricing.
For the Segment billing, we will ignore the first 2 tiers and the included 10k MTUs, since 262.8M dwarfs those numbers.
- 262,800,000/1,000 = 262,800 billable units
- $10/billable unit
- 262,800*$10 = $2,628,000 per month
For Mixpanel, at the $0.0258 per MTU overage rate above:
- 262,800,000*$0.0258 = $6,780,240
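Here is a rough sketch of those two bills in Python. It only uses the rates quoted in this post ($10 per 1,000 MTUs for Segment's top tier, $0.0258 per MTU for Mixpanel overage) and deliberately ignores lower tiers, included MTUs, and volume discounts, so treat it as an upper-bound estimate rather than how either vendor actually invoices. The names are hypothetical.

```python
from decimal import Decimal

TOTAL_MTUS = 262_800_000  # from the small-scale scenario above

# Rates as quoted in this post; real invoices use tiers and discounts.
SEGMENT_USD_PER_1000_MTUS = Decimal("10")
MIXPANEL_USD_PER_MTU = Decimal("0.0258")


def segment_cost(mtus: int) -> Decimal:
    """Flat top-tier estimate: billed per block of 1,000 MTUs."""
    return Decimal(mtus) / 1000 * SEGMENT_USD_PER_1000_MTUS


def mixpanel_cost(mtus: int) -> Decimal:
    """Flat overage estimate: billed per individual MTU."""
    return Decimal(mtus) * MIXPANEL_USD_PER_MTU


segment_total = segment_cost(TOTAL_MTUS)
mixpanel_total = mixpanel_cost(TOTAL_MTUS)
print(f"Segment:  ${segment_total:,.2f}")   # Segment:  $2,628,000.00
print(f"Mixpanel: ${mixpanel_total:,.2f}")  # Mixpanel: $6,780,240.00
```

Decimal is used instead of floats only so the dollar figures come out exact.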
Now I don’t think these companies would be charging millions of dollars for such simple abuse, but you can quickly see the problem here.
This same method can be extrapolated to different platforms.
One that I love, Algolia, admittedly has quite high pricing:
That Premium Tier? $1.50/1000… ouch.
Algolia is designed so that you get a “search as you type” experience: one “search” per keypress. This means that small bursts of many requests are not uncommon. Just messing around on an Algolia-powered website I can easily get away with doing searches of “hello world” 5 times per minute for 10 minutes.
- 12 characters
- 5 searches per minute
- 10 minutes
- 12*5*10 = 600 searches, or $0.60 for the standard tier and $0.90 for the premium tier
In 10 minutes, typing “hello world” once every 12 seconds, I generated a pretty substantial charge. Using a similar formula to the Segment attack, we can very quickly write Puppeteer workers to load the website and type out an enormous number of searches. Puppeteer is used to emulate real users in an attempt to blend in as much as possible and avoid any anti-abuse automation that the site or Algolia might have.
Let’s design a small-scale attack, that we assume will not be blocked:
- 10 concurrent Puppeteer instances
- Each types “hello world” (12 characters) 5 times per minute
- 10*12*5 = 600 searches per minute
- ~730 hours billed in a month
- 730*60 = 43,800 minutes billed
- 600 searches/minute * 43,800 minutes = ~26.3 million searches per month
- 26,300,000 searches / 1,000 searches per dollar = $26,300/month
As we can see, even small-scale attacks can generate extremely large bills for a website making maybe a few hundred dollars in MRR.
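The search-as-you-type math can be sketched the same way. This is only an estimate under the post's assumptions: one billed search per keypress, 12 keypresses per phrase (the post's count), and a standard-tier rate of $1.00 per 1,000 searches, which is inferred from the $0.60 for 600 searches figure above. The premium rate is the $1.50 per 1,000 quoted earlier. All names are hypothetical.

```python
from decimal import Decimal

KEYPRESSES_PER_PHRASE = 12  # the post's count for "hello world"
PHRASES_PER_MINUTE = 5
STANDARD_USD_PER_1000 = Decimal("1.00")  # inferred from $0.60 per 600 searches
PREMIUM_USD_PER_1000 = Decimal("1.50")   # quoted in the post


def billed_searches(instances: int, minutes: int) -> int:
    """One billed search per keypress, per instance, per phrase typed."""
    return instances * KEYPRESSES_PER_PHRASE * PHRASES_PER_MINUTE * minutes


def cost(searches: int, usd_per_1000: Decimal) -> Decimal:
    return Decimal(searches) / 1000 * usd_per_1000


# One person typing for 10 minutes.
ten_minute_searches = billed_searches(instances=1, minutes=10)

# Ten instances running for a ~730-hour month.
monthly_searches = billed_searches(instances=10, minutes=730 * 60)
monthly_cost_standard = cost(monthly_searches, STANDARD_USD_PER_1000)

print(ten_minute_searches, cost(ten_minute_searches, STANDARD_USD_PER_1000))
print(f"{monthly_searches:,} searches -> ${monthly_cost_standard:,.2f}")
```

The unrounded monthly figure is 26,280,000 searches, or $26,280 at the standard rate. The post rounds this to ~26.3 million and $26,300.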
The greatest concern with this attack is that you can probably get away with it under the radar. Unless Algolia starts sending the companies emails, chances are they don’t know about this until they get billed.
Many companies understand this as a consequence of usage-based billing, and are generous with refunding abuse. AWS is well known for refunding extremely large bills when its users slip up, and this tends to be a theme among customer-caring businesses. Just google “aws reddit large bill refund” and you’ll see a plethora of stories.
However, this is not always the case, especially if the attack can be played off like normal growth - it’s hard for the victim to prove. And in some cases, if it’s a pure mistake of the victim, the vendors will hold them responsible for their actions.
Mixpanel, for example, was very generous in refunding over 3k MTUs when we got hit with our first spam attack at Ultimate Arcade. I greatly appreciate that.
In extreme cases, the vendor could ban your account, or charge you anyway.
Not only does realizing that this is happening take time away from building and become a total distraction, but you also need to take time to do 2 major things:
- Build defenses as necessary
- Collect proper evidence to provide to vendors in order to receive refunds
This could take days in extreme cases. Looking at salaries converted to hourly rates, that is thousands of dollars wasted, plus a delay in the product dev cycle.
I should note that I’ve never actually implemented these attacks/tests, nor would I ever encourage anyone to.
While I’ve highlighted some pretty extreme numbers, we all know that at scale the unit pricing comes WAY down, and abusive usage usually gets refunded.
The craziest part is that you could run these attacks off a few Raspberry Pis and some proxies. It's a pretty low-cost way to cause major headaches for a small company.
Maybe my besties over at Algolia will let me pick a random site to test this on to see if this is a valid concern ;)