
There’s a moment every creative hits when they realize the tool they’re using is also using them.
Maybe it’s subtle. A response that sounds a little too polished. A suggestion that feels like it came from everyone and no one.
A slow, creeping suspicion that the thing helping you think is also quietly learning from you, feeding your voice, your ideas, and your unpublished work into a machine that belongs to someone else.
I hit that moment a while back.
And I did something about it.
It Started With a Folder and a Microphone
I’m a walker. I think out loud.
Some of my best writing starts as a ramble into my phone on a sidewalk.
So I built a system.
A private, local, completely free audio transcription pipeline. Metal + Whisper, running entirely on my MacBook M1.
The way it works is almost embarrassingly simple: I drop an audio file into a folder, a script notices it, Whisper transcribes it, and a clean text file appears. No cloud. No subscription. No third party sitting between me and my own words.
My voice goes in. My words come out. Nobody else is in the room.
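For anyone who wants to build the same thing, the whole loop fits in a few lines of shell. This is a sketch, not my exact script: it assumes whisper.cpp (which uses Metal on Apple silicon) installed via `brew install whisper-cpp`, the `fswatch` utility, a downloaded ggml model, and placeholder paths.

```shell
#!/usr/bin/env bash
# Watch a drop folder; transcribe each new audio file locally.
# All paths and the model name are placeholders -- adjust to taste.
WATCH_DIR="$HOME/Transcribe/in"
OUT_DIR="$HOME/Transcribe/out"
MODEL="$HOME/models/ggml-base.en.bin"

fswatch -0 --event Created "$WATCH_DIR" | while IFS= read -r -d '' f; do
  case "$f" in
    *.wav)
      base=$(basename "$f" .wav)
      # whisper-cli runs on Metal by default on Apple silicon;
      # -otxt writes a plain text file at "$OUT_DIR/$base.txt"
      whisper-cli -m "$MODEL" -f "$f" -otxt -of "$OUT_DIR/$base"
      ;;
  esac
done
```

One wrinkle: whisper.cpp wants 16 kHz WAV input, so a phone voice memo (.m4a) needs an ffmpeg conversion step in front of the transcription call.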
Once I had that working, I got greedy. In a good way.
The Question That Started All This
I wonder if I can keep all my data and IP completely private, but still get more horsepower than my M1 can give me.
That’s it. That’s the whole origin story.
My laptop is capable. But it has limits. Bigger models, faster inference, running multiple things simultaneously — there’s a ceiling. And I didn’t want to blow through that ceiling by handing my work over to an API that logs everything, trains on everything, and wraps it all in a terms-of-service that no human actually reads.
So I went looking for a middle path.
The Setup (Plain English Version)
What I built is a hybrid. Two ends of a private, encrypted connection:
My laptop runs the AI interface, the part I interact with, where my data lives and where my actual work happens.
An AWS cloud server (EC2) runs the models, the heavy compute, the inference engine, and the muscle. It knows nothing about me. It just does math.
Between them: an encrypted SSH tunnel. Think of it as a private hallway between two rooms. What moves through it is encrypted. Nobody’s listening at the door.
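That hallway is one command. A sketch, assuming an Ubuntu server, a model server listening on port 11434 (Ollama’s default; swap in whatever port yours uses), and placeholder names:

```shell
# Forward the laptop's port 11434 to the same port on the EC2 box.
# -N: no remote shell, just the tunnel.  -L: local port forward.
ssh -N \
    -i ~/.ssh/my-ec2-key.pem \
    -L 11434:localhost:11434 \
    ubuntu@<ec2-public-ip>

# While this runs, anything on the laptop can talk to
# http://localhost:11434 and the traffic rides the encrypted tunnel.
# The model server's port never has to be open to the internet.
```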
The models I’m running, Llama 3, Mistral, and Phi 3, are open source. They don’t phone home. They don’t have a product team reading your outputs. They’re just models, doing model things, on hardware I’m renting by the hour.
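The post doesn’t hinge on any particular serving tool, but Ollama is one common way to run exactly these three models, and getting them is a command each (the model tags below are Ollama’s naming, not mine):

```shell
# Pull the open-source models onto the server, then talk to one.
ollama pull llama3
ollama pull mistral
ollama pull phi3

ollama run llama3 "Give me three titles for an essay about walking."
```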
What “Private” Actually Took to Build
I’m not going to pretend it was smooth. It wasn’t.
Getting SSH to behave took longer than it should have. Wrong key permissions. Wrong username. Security settings that silently blocked everything while looking completely fine. At one point, broken quotes in the terminal left me staring at a > prompt, waiting for something to happen that was never going to happen.
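For anyone hitting the same walls, the fixes were mundane. A sketch with placeholder names:

```shell
# SSH rejects a private key that anyone but you can read.
# Owner read-only fixes the "UNPROTECTED PRIVATE KEY" error:
chmod 400 ~/.ssh/my-ec2-key.pem

# The username is baked into the server image, not your AWS account:
# Ubuntu images use "ubuntu", Amazon Linux uses "ec2-user".
ssh -i ~/.ssh/my-ec2-key.pem ubuntu@<ec2-public-ip>

# -v prints the handshake step by step. When a security group is
# silently dropping packets, this is where you see the hang.
ssh -v -i ~/.ssh/my-ec2-key.pem ubuntu@<ec2-public-ip>
```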
Disk space nearly killed it before it started. The default server came with 8 GB of storage. Llama 3 alone is 4.6 GB. Mistral is another 4.3 GB. The math doesn’t work.
The fix, expanding the storage volume, is trivial once you know it exists. Until you do, everything just looks broken for no reason.
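The expansion is two halves: grow the volume in AWS, then tell the running server its disk got bigger. A sketch with placeholder IDs; on newer instance types the device shows up as /dev/nvme0n1 instead of /dev/xvda:

```shell
# 1. Grow the EBS volume itself (from your laptop, or the console):
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 50

# 2. On the server, the disk is bigger but the partition isn't yet.
#    Grow the partition, then the filesystem, then confirm:
sudo growpart /dev/xvda 1      # extend partition 1 to fill the disk
sudo resize2fs /dev/xvda1      # ext4; XFS uses `sudo xfs_growfs /`
df -h /                        # should now show the new size
```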
Docker had opinions. There are two versions of a key command, the old way and the new way, and using the wrong one produces errors that sound catastrophic but are actually just a version mismatch. Ten minutes of confusion, thirty seconds to fix.
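The split that most often bites people here is Docker Compose, where the difference between the old tool and the new one is literally a hyphen. If that’s the mismatch you hit, it looks like this:

```shell
# Old way: Compose v1, a separate tool, now retired.
docker-compose up -d

# New way: Compose v2, a plugin built into the Docker CLI.
docker compose up -d

# Check which one the box actually has before debugging anything else:
docker compose version
```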
Container networking had one specific gotcha that took the longest to crack: the model server was running internally but not accessible externally, even from my own tunnel. One configuration line fixed it. Finding that line took a while.
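I won’t claim this is my exact line, but the class of bug is classic: a server inside a container listening on 127.0.0.1, which is the container’s own loopback, unreachable from outside. Using Ollama as the example server, the shape of the fix:

```shell
# OLLAMA_HOST=0.0.0.0 makes the server listen on all of the
# container's interfaces, so Docker's port mapping can reach it.
# Publishing on 127.0.0.1 keeps the port off the public internet;
# only local processes (i.e., the SSH tunnel) can connect.
docker run -d \
  -e OLLAMA_HOST=0.0.0.0 \
  -p 127.0.0.1:11434:11434 \
  ollama/ollama
```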
This is the nature of building private infrastructure. The friction is real. It’s front-loaded. And it’s absolutely worth it.
The Moment It Clicked
When the tunnel finally connected, when Agent Zero on my laptop reached through that encrypted hallway and pulled a response from a model running on a cloud server I controlled, using data that never left my possession, the whole thing stopped being theoretical.
It felt like something.
Not magic. More like correct.
Like the tool was finally working for me, completely, with no asterisks.
The Part That Makes It Repeatable
Here’s the move that turned a hard-won setup into a reusable system:
Once everything was stable, I captured a snapshot of the entire server (OS, software, models, and configuration) as an AMI, an Amazon Machine Image.
A saved state. A starting line.
Now when I need compute, I don’t rebuild anything. I launch from the snapshot, open the tunnel, and I’m running in minutes.
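Both halves of that, the capture and the relaunch, are one AWS command each. IDs, names, and the instance type below are placeholders:

```shell
# Once: capture the running, fully configured server as an image.
aws ec2 create-image \
  --instance-id i-0123456789abcdef0 \
  --name "private-llm-base"

# Every time after: launch a fresh copy from that image.
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type g4dn.xlarge \
  --key-name my-ec2-key
```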
What took hours of debugging the first time takes almost no time now.
That’s the real win. Not just that it works, but that it keeps working, every time, without starting over.
Why Any of This Matters (Especially If You’re Not Technical)
The creative AI tools most people use are convenient. They’re also, by design, extractive.
Your prompts, your drafts, your half-formed ideas—they flow upstream into systems you don’t control, toward outcomes you didn’t agree to.
That’s a tradeoff a lot of people are making without realizing they’re making it.
What I built isn’t the only answer.
But it’s an answer. A proof of concept that says you can have capable AI, running on real compute, with your data staying yours. It takes more work upfront. The payoff is a system that doesn’t ask anything of you except electricity.
Your voice is yours.
Your ideas are yours.
Your IP is yours.
Build like it.
Be well.
Bert.