Trust Nobody (Including Yourself) - Zero Trust Pt. 1

This series is on pause as I reevaluate how things are built and maintained in my cluster! It’s still probably a valid read, but it could be a while before I end up updating it.

Intro

In classic “Q.” fashion, I came up with an idea, started implementing it, and then realized how deep of a rabbit hole this was going to be. So rather than just doing it and then trying to remember all of the details myself, I figured I should start by documenting it and then work through the various steps as I go.

Background

Back when I worked for Google, one of my favorite things was the fact that I didn’t have to think about what network I was connected to. There were no VPNs to connect to, no websites to authenticate with (frequently, anyways), and things just “worked”.

“Zero Trust” (or ZT) is a bit of a misnomer. It’s not that there’s no trust, but rather, it’s no trust by default.

Most businesses allow access to services so long as you’re connected to a VPN. The “default” trust is that if you’re connected to the corporate network, something else must have already validated your access. But if an attacker plugs an ethernet cable into a wall jack, or compromises your credentials, then it’s game over.

ZT, ideally, means that every application must validate every connection, and the entire chain must be validated. The Application must trust the Load Balancer, the Load Balancer must trust the Firewall, the Firewall must trust the Client, and the Client must trust the User.

Breaking one link in the chain means that the entire system denies access.
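To make the “every link must hold” idea concrete, here is a minimal Python sketch, assuming each hop can be boiled down to a single yes/no check. The `Request` fields, the check names, and both functions are hypothetical and exist only for illustration; a real setup would verify SSO tokens, device certificates, mTLS, and so on at each hop.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


# Hypothetical per-request facts; in practice each would come from real
# verification (SSO login, device certificate, firewall policy, mTLS, etc.).
@dataclass
class Request:
    user_verified: bool           # the client has authenticated the user
    client_verified: bool         # the firewall recognizes the device
    firewall_verified: bool       # the load balancer saw it pass the firewall
    load_balancer_verified: bool  # the app saw it arrive via the load balancer


# The chain of trust described above, one named check per link.
CHAIN: List[Tuple[str, Callable[[Request], bool]]] = [
    ("client trusts user", lambda r: r.user_verified),
    ("firewall trusts client", lambda r: r.client_verified),
    ("load balancer trusts firewall", lambda r: r.firewall_verified),
    ("application trusts load balancer", lambda r: r.load_balancer_verified),
]


def zero_trust_allows(req: Request) -> Tuple[bool, Optional[str]]:
    """Default deny: grant access only if every link in the chain holds."""
    for name, check in CHAIN:
        if not check(req):
            return False, name  # one broken link denies the whole request
    return True, None


def perimeter_allows(on_corporate_network: bool) -> bool:
    """Perimeter/VPN model for contrast: being on the network is the only check."""
    return on_corporate_network


if __name__ == "__main__":
    # Every link holds, so the request is allowed.
    print(zero_trust_allows(Request(True, True, True, True)))
    # A request that bypassed the load balancer is denied, even though the
    # user and device both checked out.
    print(zero_trust_allows(Request(True, True, True, False)))
```

The `perimeter_allows` function is the wall-jack scenario from the VPN paragraph: it never asks who you are or what device you’re on, only where you’re plugged in.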

So, how does one do that outside of Google? Painfully.

Building Trust

When it comes to building a zero-trust stack from scratch, there are a dozen different options and a lot of dead ends.

Like any decent project, we need to define some goals:

Be reasonable
- We’re not trying to defend against nation-state level actors here. If they wanted access to my services, a wrench is much easier.
Be cost-efficient
- Free is best. I’m not an enterprise; if there’s a page that says “Call Sales”, I’m nopeing the fuck out.
Be low-maintenance
- My job is to be an SRE; I’m not going to do that work pro bono.

Based on my pre-work for this project, this is not going to be a simple objective to complete. But breaking things down into components makes it easier to tackle. Using the chain of trust from earlier as a guide, we’ll define our components and break them down into different solutions:

Application Security
Gateway Security
Network Security
Client Security

Where We’re At

So, what does our environment look like? Well, conveniently, I already have a blog post that covers that, but to summarize: Kubernetes on Talos Linux, Pomerium for ingress, authentication, and authorization, and MetalLB for L2 load balancing.

Go read the post for more detail. For better or worse, most of it is going to be ripped out and replaced.

Where We’re Going

This is going to be a series of blog posts, covering the different components and how to configure and set them up for my particular use case.

List of posts:

Client (MDM, JumpCloud)
Network (Cloudflare ZT)
Gateway (TBD)
Applications (TBD)

Once a post is up, I’ll update the relevant sections and add links. Keep an eye out for other topics as well; I’m working on quite a few posts in the background.