Note: This is a prototype and the
mainbranch lacks some of the features specified in this post. Whether this becomes a fully production-ready tool depends on many factors such as community/company interest and usage.
See the code here: https://github.com/danthegoodman1/SQLGateway/
To give a little context, it’s no secret that I am a big fan of CockroachDB. I think it’s the database that comes closest to disproving CAP theorem, the people are awesome, and the features are nearly unmatched. Google “Dan Goodman CockroachDB” and you’ll get nearly 2 full pages of me doing events with them (and sometimes AWS too).
I also think Cloudflare Workers are awesome! The billing model, the performance, and Durable Objects are so under-appreciated.
The frustrating limitation is that Workers are on a WASM runtime, meaning they cannot connect to TCP databases like my beloved CockroachDB.
CockroachDB Serverless + Cloudflare Workers would be a match made in heaven, so I set out to make it happen.
🤔 Why Not Alternatives?
There are a few external database solutions that work with the WASM environment.
First we have the HTTP-based databases such as DynamoDB, Firestore, Datastax, etc.
I personally love DyanmoDB, but the problem with them is the strength of their ecosystems: Nothing comes close to Postgres (maybe MySQL, but that has the same problem!)
But if you are already running these databases, then no worries, you’re all set!
Next we have the native databases like Cloudflare D1, KV, Durable Objects
Again the ecosystem argument could be had here, but the other is that they are hard to access from outside of Cloudflare, and totally change the way you think about interfacing with a DB. While this can be good, in many cases people want to build fast using tools and tech they are already familiar with. Changing where some JS code runs is a VERY different leap than changing the database I am used to using.
When D1 comes out of early access it will be interesting, but I am also not the largest fan of primary-writer style DBs. Single points of failure make me uneasy.
Finally, we have some off-shoots like Neon.tech, Supabase HTTP, Planetscale Serverless Driver, etc.
While I think these are really cool technologies, they both have the lock-in, even if they are open source. Who is really going to fork hundreds of thousands of lines of code and maintain it? Unless the company if far smaller than you, it’s not feasible.
And I’m not talking feature additions, I’m talking massive migrations off of managed platforms or a company dies that you rely on.
But these are viable alternatives, and for completeness, again I am not a huge fan of primary-writer DBs.
😄 A Simple Goal
I just want to be able to use Postgres from WASM-based runtimes, while making the HTTP layer feel invisible. Maye also give it some superpowers.
💪 Engineering A Solution
To remedy this, I made
Quickly the features are:
- ☁️ HTTP access for SQL databases enable WASM-based runtimes to use the best DBs without hoops
- 🛡 Connection pooling protects from reconnects and unbound idle connections, while also providing protection with a predictable max-load on the DB
- 🔎 Query and transaction tracing finds slow spots and anomalies, see performance over time
- 🚀 Caching options for a super speed boost
- 🌎 Distributed through Redis so you can add nodes in any region to expand pool size, cache size, and reduce latency
The idea was to keep the HTTP layer out of the way and make it feel like you are talking to a normal SQL database.
Devs running WASM-based runtimes now have far more DB options, and devs already using SQL DBs now have WASM-based runtimes as a viable option to extend their systems. For example now startups running on GCP Cloud SQL or AWS RDS can now use Cloudflare Workers without building API-shims to access their DB.
With SQLGateway you treat HTTP requests like commands to an acquired DB pool connection. You can send single queries, multiple queries will run atomically in a transaction, and starting a transaction will give you a consistent pool connection that allows you to go back and forth between the DB and your code just like normal. Using the client libraries the HTTP layer becomes completely invisible, and just feels like another SQL DB library (but with superpowers).
If you run it clustered with coordination through Redis, it will even intelligently route requests to the correct pod as needed:
Automatic query and transaction tracing is awesome because you get metrics and alerting about the performance of individual queries, as well as entire transactions. If something starts slowing down or something keeps timing out, you can see how that started over time and when it started happening. This becomes super useful as you scale to see where your DB bottlenecks are, and where you can cache.
Caching support protects your DB and speeds up queries, so if you have certain SELECTs that don’t need to be consistent you can have them cache and TTL on the gateway nodes so a trip to the DB isn’t even required. It also supports true stale-while-revalidate, so if desired you never have to hit the DB synchronously more than once per node.
Connection pooling protects you from expensive reconnects and idle connection kidnapping when you have massive spikes in traffic, and now have 800 execution environments holding idle DB connections for the next 15 minutes (looking at you Lambda/Cloud Functions 😜). Creating a new connection to a DB is far more expensive than just opening a new TCP connection, the heavy part is establishing a new session and that’s why PgBouncer (and other connection pooling tools) were originally made, but they aren’t ready for the serverless future we live in. With HTTP-based connection you can use HTTP Keep-Alive to keep connections warm for Lambda-like environments, but don’t risk overloading the DB with new connections or leaving tons of resource-intensive DB sessions idle.
Predictable max throughput protects your database from the thundering herd because the cluster can only process so many queries at a time. This, combined with caching, should mean that your SQLGateway cluster serves as a throttling point to the DB due to a finite number of DB connections. If anything, slowing your service before making your entire database unavailable under unexpected load.
Some Final Notes:
possibly just premptive answers to complaints/questions :P
- Yes I know some WASM runtimes like WASMEdge have TCP support, but only Netlify Functions seems to be running them
- Vercel Serverless functions can talk TCP, but their “Edge Functions” are WASM-based
- Cloudflare Database Connectors are a pain, although they are working on a native solution for early 2023 release to better connect workers and external databases
- Yes I know Postgres is primary-writer, that’s why I really like CRDB
- And yes I know Aurora for Postgres can fail over in under a minute, but I like a few query/transaction retries with CRDB on failover :P