Gildra - TLS Terminating Proxy For Unlimited Domains In Your Cloud

Tags: Infrastructure, cloud
Gildra is a Layer 7 (HTTP) TLS terminating proxy that runs in your cloud, supporting unlimited domains and certificates by pulling them dynamically from a Control Plane.
With Gildra you can power custom websites and software running on virtual machines, containers, serverless functions, or any runtime while keeping all network traffic within your cloud environment.
Gildra natively supports HTTP/1.1, HTTP/2, H2C, HTTP/3 (UDP), and WebSocket traffic.
Check out the GitHub repo: https://github.com/danthegoodman1/Gildra

Motivation

I was recently talking to a fellow founder in my YC batch. I won’t go into specifics, but they host a web-based solution that allows their users to map their custom domain to their cloud product.
Because they are unfamiliar with TLS management and advanced networking, they decided to let the experts at Cloudflare manage this for them.
The annual contract cost for a dedicated IP and bringing their per-cert cost down to $0.60/cert-month? $160k/year!!! 😵
“WHAT!? 😱” I yelled over a Google Meet.
“You know you can do that basically for free with Let’s Encrypt and a free dedicated IP from AWS, right? I literally have a demo right here of provisioning certs using the ACME HTTP challenge dynamically and terminating TLS that I made a few months ago.”
“…I have no idea what ACME is, but I wish I had talked to you before I signed that contract…” he said.
 
Oh well, here I go getting nerd-sniped again!
 
So how do you go about handling arbitrary custom domains for a hosted solution?
Well, routing requests based on a domain name is easy; what’s hard is the TLS certificate management.
TLS certificates are mapped to domain names, so it’s not as simple as asking your cloud provider to use the same certificate for the 2,000 websites you host.
Cloud provider load balancers like the AWS ALB, or self-hosted ones like Nginx and Traefik, allow you to specify custom TLS certificates, but not in a way that scales past a handful of certificates for your customers. Their configuration model is simply not suited to terminating TLS at high levels of multi-tenancy.
To power this we’d need to dynamically create certificates for domains, dynamically terminate TLS traffic, and route it to the target origin based on the domain of an incoming request, all configured through an API (not configuration files and environment variables).
This is not a widespread issue, but for those that need this, they need it like oxygen.

Handling TLS Termination

Gildra will look up TLS Certificates for a requested domain and terminate the traffic before it reaches your service.
Gildra supports all major HTTP protocols, including WebSockets. Because Gildra can speak HTTP/3 to clients regardless of whether your internal services support it, your applications get a direct speed boost. See how Dropbox observed speed boosts just by switching the outbound traffic to HTTP/3 here: https://dropbox.tech/frontend/investigating-the-impact-of-http3-on-network-latency-for-search
Gildra sends requests to your origin as HTTP/1.1 to ensure the widest support. Configurable HTTP/2 and HTTP/3 origin support can be easily added in the future.

Handling Domain Routing

Once we’ve terminated TLS, we now need to know where to route traffic based on the domain.
Each domain has a routing configuration similar to the HTTPRoute resource in Kubernetes. It specifies which origins traffic is routed to and how, and Gildra sends the request through.
Whether you have each domain mapped to its own Lambda function, a dedicated cluster of VM instances, or all domains going to a single Kubernetes service exposed by an internal load balancer, your traffic will be routed wherever you need with the Host header retained so that you always know what domain it’s for.

Clustered Caching

Gildra uses groupcache, which is a shared cache package that allows local peers to share memory for in-memory caching.
This means that within a local DC (region, AZ, etc.) your cache capacity is the sum of the Gildra nodes' memory, and request collapsing and hot-record replication are handled natively.
Your Control Plane will never get slammed with a thundering herd of requests because Gildra knows how to look up which node is responsible for caching a record, and waits for that node to get the result before returning.
In the future I am also looking to extend this caching to disk, for deployments that might use millions of certificates within a caching interval and can’t afford the extra few milliseconds that a Control Plane lookup might take.

The Control Plane

The Control Plane is a separate service that implements 3 endpoints for Gildra to talk to:
  1. Get certificate for domain
  2. Get routing config for domain
  3. Get an HTTP-01 challenge key for a given domain and challenge token (only if you are using the HTTP-01 challenge)
 
The Control Plane is self-implemented, with a pre-made one available. I’ve also made a Go package that makes handling the ACME HTTP-01 challenge very easy.
Being able to make your own Control Plane means you can choose to integrate with any certificate authority, do any ACME challenge, use any language or DB to power it, and run on any platform!
Because Gildra is stateless (with caching), it means as long as you can scale your control plane, you can scale as far as you want!

The HTTP-01 Challenge

Gildra will answer the ACME HTTP-01 challenge for you, meaning no weird L7 mappings to an external service have to be made. This makes getting custom certs a breeze as your origin never sees these requests.
The HTTP-01 challenge is far more elegant than the DNS-01 challenge. It requires only 1 DNS record to be created (vs 2) and the least involvement from end-users: they only have to create a single A or CNAME DNS record for base domains or subdomains.
With the DNS-01 challenge, they must also delegate the ACME challenge to the hosting provider with a second DNS record. If they ever remove that record, you are unable to manage certs for them. With HTTP-01, if they ever change the A or CNAME record, it will be very obvious very fast that they've broken something.
You can still use Gildra with DNS-01 challenge certs though, for example if you want to support wildcard subdomains. However, Gildra won't handle the challenge for you.
Combined with providers like Let’s Encrypt and ZeroSSL that give unlimited certs for free, the cost of 1 cert is the same as the cost of 1,000,000!

Why not managed solutions

There are some solutions that exist such as Cloudflare SSL for SaaS, but there are a handful of issues with that.

Performance

Cloudflare can help your network performance, but not always.
While we were building ultimatearcade.io, we started using Cloudflare Tunnels for our game servers. We noticed that these seemed to have higher latency than direct connections to our game servers, so we decided to build a custom TLS solution and directly connect game clients to our servers.
We dropped from a 48ms ping to 22ms ping average, and broken websocket connections dropped from 12% to <0.1%.
When we tested our website behind Cloudflare, we saw 5-12% higher latency without Argo and 8-15% higher latency with Argo, tested from residential connections in Delaware, San Francisco, Colorado, and Germany to our AWS us-east-1 DC without CloudFront (direct VM/LB IPs).
Gildra doesn’t require you to pass through another cloud provider first; it sits in front of your services in your cloud.

Giving them your traffic

Another consequence is that these companies get to see your unencrypted traffic. Not only does this present privacy and security concerns, but if what you are building requires certain compliance that the providers may not have, this is a non-starter.

Price

Providers range from $0.10/cert/month to $2/cert/month.
10 cents is reasonable, but $2 is INSANE (Cloudflare). I’ve seen $160k/yr Cloudflare contracts only bring that down to $0.60 per cert, which is still high. The reason that they can charge these prices is how desperately this is needed for a product to work, and how hard it is to get over the hurdle of figuring out how provisioning and managing certificates work.
Considering that certs are free to generate from CAs like Let’s Encrypt and ZeroSSL, there’s no reason that it should cost that much.
Plus, if you are doing root domains, you have to pay for a dedicated IP address, which can be extremely expensive from network providers like Cloudflare and Fastly.
Cloud providers such as GCP, AWS, and DigitalOcean give you dedicated IPs for free if they are mapped to something like a load balancer or VM instance.
So you could do this with a $4/mo VM that runs Gildra and uses a free dedicated IP and free TLS certificates, or you could pay hundreds to tens of thousands per month for someone else to…
💡
FWIW I really love Cloudflare products. They make excellent products, have great community support, and are really well priced in most areas (Workers, R2, KV, etc.). I see this as a significant outlier among their product lineup. I’ll also acknowledge that fly.io has the ability to do this mapping for you. Obviously that’s constrained to a single provider, and it’s still using the DNS-01 challenge.

Build it yourself?

Frankly, this is shockingly hard. The ACME protocol requires lots of crypto not found in other typical coding fields, and the security requirements add another layer of complexity. Crack open the ACME RFC and you can see it’s loooooooong. It took me a long time to get this working, and I like to think I am someone who can pick up new concepts relatively fast.

Host on Vercel?

Vercel works if you are simply hosting a website (say you are a low/no-code web builder), but there are a few issues with that:
First, you are completely at their mercy. For example, if they change their billing model from purely traffic based to per-site based, your bill might jump a thousand-fold.
Second, you must follow any changes to their API, use their locations and supported frameworks, and their serverless functions (which are quite slow).
I personally would not take on such platform risk if I were building a product that I had planned to sell.

Future Gildra Functionality

Gildra is not designed to have the same level of functionality as something like Istio when it comes to complex network routing; however, some of the features expected from an L7 load balancer are relatively trivial to implement.
Round-robin, least request, and random routing are relatively simple.
Routing based on path glob or regex matching, weighted routing, optimistic rate limiting, JWT property routing, header-based routing, automatic retries, and more are all relatively trivial to implement once we are in control of the request.