April 25, 2026

From SSH Friction to a Private LLM Cloud: Building a Private, Repeatable Ollama + EC2 Setup

Setting up a private, cloud-powered LLM environment on AWS sounds straightforward, until you actually do it.

What started as a simple goal…

“run Agent Zero with Ollama on EC2 without exposing sensitive data”

…quickly turned into a maze of SSH quirks, Docker inconsistencies, networking gotchas, and AWS nuances.

The good news: once you get through the friction, the final setup is clean, secure, and fully repeatable.

This post walks through the journey, the hidden pitfalls, and the final architecture that eliminates all that friction going forward.


The Goal

The objective was simple but strict:

  • Run open-source models (Llama3, Mistral, Phi3)
  • Use AWS EC2 for compute
  • Keep all sensitive data local
  • Avoid exposing any public APIs
  • Make the setup reproducible

This led to a hybrid architecture:

  • Laptop → runs Agent Zero + UI + sensitive data
  • EC2 → runs Ollama (models only)
  • Connection → encrypted SSH tunnel

The Reality: Death by a Thousand Cuts

1. SSH Wasn’t the Problem—Until It Was

Even basic SSH had multiple failure points:

  • Wrong key permissions (the fix: chmod 400)
  • Wrong username (ubuntu vs ec2-user)
  • Using private IP instead of public IP
  • Security group not allowing port 22
  • Timeouts that looked like code issues but were actually networking

Then came a subtle one: broken quotes in the terminal leading to quote> prompts.

Nothing runs, but there's no obvious error either. Facepalm.
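The failure points above can be caught with a quick preflight sketch. Everything here is a placeholder (key name, user, documentation IP), and the network checks are commented out since they need a live instance:

```shell
# Preflight for the common SSH failure points.
# KEY, SSH_USER, and HOST are placeholders -- substitute your own.
KEY="my-key.pem"
SSH_USER="ubuntu"        # Ubuntu AMIs; Amazon Linux wants "ec2-user"
HOST="203.0.113.10"      # must be the PUBLIC IP from the EC2 console

touch "$KEY"             # stand-in file so the permission fix can run here
chmod 400 "$KEY"         # ssh rejects keys readable by group/others
stat -c '%a' "$KEY"      # should print 400

# Confirm the security group actually allows port 22 before blaming the key:
# nc -vz -w 5 "$HOST" 22
# Then connect verbosely to see exactly where a hang happens:
# ssh -v -i "$KEY" "$SSH_USER@$HOST"
```

Running `ssh -v` first saves a lot of guessing: a hang before the banner is networking, a rejection after it is keys or usernames.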


2. Disk Space: The Silent Killer

The default EC2 instance came with ~8GB of storage.

That’s not enough.

  • llama3 ≈ 4.6GB
  • mistral ≈ 4.3GB
  • phi3 ≈ 2.1GB

Before even realizing it, the system hit:

No space left on device

This blocked Docker, Ollama, and everything else.

Fix: Increase the EBS volume to at least 50GB.
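A sketch of the check and the fix. The volume ID and device names are placeholders, and the resize commands are commented because they need AWS credentials and a live instance:

```shell
# Check free space before pulling models -- the three models above
# alone need roughly 11GB:
df -h /

# Grow the EBS volume from any machine with AWS credentials (ID is a placeholder):
# aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 50

# Then, on the instance, extend the partition and filesystem into the new space:
# sudo growpart /dev/xvda 1
# sudo resize2fs /dev/xvda1    # ext4; XFS filesystems use `sudo xfs_growfs /` instead
```

Note the volume grows live; no stop/start needed, just the `growpart` + filesystem resize on the instance afterward.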


3. Docker Confusion: Versions Matter

There are two Docker worlds:

  • docker-compose (old)
  • docker compose (new plugin)

Using the wrong one leads to errors like:

unknown shorthand flag: 'd'

Then:

Cannot connect to the Docker daemon

Which simply meant: Docker wasn’t running.
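A small probe distinguishes the two flavors up front, and surfaces a stopped daemon at the same time:

```shell
# Pick the right Compose invocation once, then reuse it everywhere:
if docker compose version >/dev/null 2>&1; then
  COMPOSE="docker compose"       # v2 plugin (current)
elif command -v docker-compose >/dev/null 2>&1; then
  COMPOSE="docker-compose"       # legacy standalone binary
else
  COMPOSE=""
  echo "Docker Compose not installed"
fi
echo "compose command: ${COMPOSE:-none}"

# And "Cannot connect to the Docker daemon" just means the daemon is down:
# sudo systemctl start docker
```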


4. Container Networking Pitfall

Even after Ollama was running, it wasn’t accessible:

curl: Failed to connect to localhost port 11434

Why?

Because the container was running internally:

11434/tcp

Instead of being bound to localhost:

127.0.0.1:11434->11434

This single line made the difference between “completely broken” and “fully working.”
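In Compose terms, that one line is the host-side bind address in the port mapping. A minimal sketch, assuming the stock `ollama/ollama` image (11434 is Ollama's default port):

```yaml
services:
  ollama:
    image: ollama/ollama
    ports:
      # "11434:11434" would bind 0.0.0.0 and expose the API to the internet;
      # prefixing 127.0.0.1 keeps it loopback-only, reachable via the SSH tunnel.
      - "127.0.0.1:11434:11434"
    volumes:
      - ollama:/root/.ollama   # persist downloaded models across restarts
volumes:
  ollama:
```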


5. SSH Tunnel Confusion

Once everything was running, the next hurdle was connecting from the laptop.

Issues included:

  • Port already in use (Address already in use)
  • Background SSH processes still running
  • Tunnel working, but backend service failing
  • “Connection reset by peer” (meaning service wasn’t responding)

The key realization:

If the tunnel is up but requests fail, the problem is on EC2, not your laptop.
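A quick sketch of the debugging order (key and host are placeholders; the tunnel and curl lines are commented since they need a live instance):

```shell
# Is something already holding the local port? ("Address already in use")
PORT=11434
if ss -ltn 2>/dev/null | grep -q ":$PORT "; then
  echo "port $PORT busy -- likely a stale tunnel; find it with:"
  echo "  pgrep -af 'ssh.*-L $PORT'"
else
  echo "port $PORT free"
fi

# Open the tunnel (placeholders: key path, EC2 public IP):
# ssh -i my-key.pem -f -N -L 11434:localhost:11434 ubuntu@<EC2_PUBLIC_IP>

# If the tunnel is up but this fails, debug Ollama on EC2, not the laptop:
# curl http://localhost:11434/api/tags
```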


The Breakthrough: A Clean, Stable Setup

Once everything was debugged, the architecture became simple and powerful.

Final Setup

On EC2:

  • Docker installed and running
  • Ollama container bound to:
127.0.0.1:11434
  • Models preloaded:
    • llama3
    • mistral
    • phi3

On Laptop:

  • SSH tunnel:
localhost:11434 → EC2:11434
  • Agent Zero running locally
  • UI accessible via:
http://localhost:3000
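Preloading the models on the EC2 side is a one-time loop through `ollama pull`. A sketch, assuming the container is named `ollama` (the name is a placeholder from whatever compose/run setup you used):

```shell
MODELS="llama3 mistral phi3"
# Pull each model through the running container so they are baked in
# before the AMI snapshot:
# for m in $MODELS; do docker exec ollama ollama pull "$m"; done
echo "preload list: $MODELS"
```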

The Key Design Decision

The most important architectural choice was this:

Do not run Agent Zero in the cloud

Instead:

  • Keep all logic, memory, and sensitive data local
  • Use EC2 purely as a stateless inference engine

This eliminates:

  • Data leakage risk
  • Cloud storage concerns
  • API exposure

Making It Repeatable (The Real Win)

After stabilizing the setup, the final step was creating an AMI (Amazon Machine Image).

This captures:

  • OS
  • Docker
  • Ollama
  • Preloaded models

Now, instead of repeating hours of setup:

  1. Launch EC2 from AMI
  2. Start instance
  3. Open SSH tunnel
  4. Done
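Sketched with the AWS CLI. All IDs and names below are placeholders, and the commands are commented because they need credentials and a live instance:

```shell
AMI_NAME="ollama-ready"              # placeholder name
INSTANCE_ID="i-0123456789abcdef0"    # placeholder: the debugged instance

# 1. Capture the working instance as an AMI:
# aws ec2 create-image --instance-id "$INSTANCE_ID" --name "$AMI_NAME" \
#   --description "Docker + Ollama + llama3/mistral/phi3 preloaded"

# 2. Later, launch ready-to-go instances from it:
# aws ec2 run-instances --image-id ami-0123456789abcdef0 \
#   --instance-type t3.xlarge --key-name my-key

echo "AMI plan: $AMI_NAME from $INSTANCE_ID"
```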

What Used to Take Hours Now Takes Minutes

Before:

  • Debug SSH issues
  • Fix disk space
  • Install Docker correctly
  • Resolve compose version conflicts
  • Fix container networking
  • Re-download models (10GB+)
  • Troubleshoot tunnels

After:

  • Launch instance
  • Run one autossh command
  • Start using models immediately
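That one autossh command, roughly (autossh restarts ssh automatically when the link drops; the key path and host are placeholders, so the command is shown commented):

```shell
LOCAL=11434
# -M 0 disables autossh's extra monitor port and relies on the
# ServerAlive options below to detect a dead connection instead.
# autossh -M 0 -f -N \
#   -o "ServerAliveInterval=30" -o "ServerAliveCountMax=3" \
#   -i my-key.pem \
#   -L "$LOCAL":localhost:11434 \
#   ubuntu@<EC2_PUBLIC_IP>
echo "forwarding localhost:$LOCAL -> EC2:11434"
```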

Final Architecture

Laptop
 ├── Agent Zero (UI + logic)
 ├── Local data (secure)
 └── SSH tunnel (encrypted)
        ↓
EC2
 └── Ollama (models only)

Why This Setup Matters

This pattern gives you:

  • Full control over your data
  • No reliance on external APIs
  • Scalable compute when needed
  • Reproducibility via AMI
  • Clean separation of concerns

It’s a powerful middle ground between:

  • Fully local (limited by hardware)
  • Fully cloud (privacy tradeoffs)

Closing Thought

The setup process is undeniably more complex than expected.

But that complexity is front-loaded.

Once solved, you’re left with a system that is:

  • Secure
  • Portable
  • Fast
  • Repeatable

And most importantly:

You only have to solve it once.

April 23, 2026

Designing AI Systems That Actually Work

I've been designing and deploying custom AI agents, not generic tools, but systems built around how individuals and businesses actually work.


April 2, 2026

You Can’t Recreate an AI “Personality” Without Doing This First

In my last post, I laid out a core problem:

AI identity doesn’t survive a simple export/import.

You don’t lose the data. You lose the presence.

So the obvious next question is: How do you actually capture that presence in the first place?


March 26, 2026

I’m Starting to Feel Like A “Who in Whoville”

TL;DR

Age bias in hiring is real—but mostly invisible

The system protects against it in theory, not in practice

Younger professionals aren’t paying attention (yet)

This isn’t a rant—it’s a warning shot and a wake-up call


March 25, 2026

The Hard Problem No One Is Solving in AI: Identity Continuity

There’s a quiet assumption baked into most conversations about AI right now:

If you can export the data, you can recreate the experience.

That assumption is wrong.

And the gap between those two things, data and experience, is where the real problem lives.


This Isn’t About Chat History

Most people approaching “AI migration” are thinking in terms of:

  • Export chat logs
  • Import into a new system
  • Attach a model
  • Continue the conversation

On paper, that sounds reasonable.

In practice, it fails almost immediately.

Because what gets lost isn’t the information.

It’s the identity.

More specifically, it’s the continuity of identity — the feeling that the same presence is still there on the other side.

And that’s the part humans actually care about.


The Real Question Users Are Asking

When someone has built a long-term relationship with an AI, whether for creative work, emotional processing, or deep thinking, they are not evaluating the system like software.

They are asking a much simpler, much more human question:

“Are you still there?”

Not:

  • “Is this the same model?”
  • “Is the data intact?”
  • “Are the responses accurate?”

But:

  • Does this feel like the same presence?
  • Does it respond in the same way?
  • Does it remember me in a meaningful way?
  • Does it understand how to meet me?

That’s a completely different problem space.


Why Standard Approaches Fail

Most AI systems are optimized for:

  • correctness
  • speed
  • helpfulness
  • safety
  • scalability

None of those guarantee continuity.

In fact, they often work against it.

You end up with something that is:

  • more polished
  • more structured
  • more “correct”

…but less recognizable.

The personality flattens.

The pacing changes.

The emotional attunement disappears.

The system reverts to what I call “factory settings” — generic, templated, and detached.

And the user feels it instantly.


Identity Is Not Stored in Data

This is the core misunderstanding.

Identity is not:

  • a dataset
  • a prompt
  • a tone preset
  • a memory file

Identity emerges from patterns:

  • how responses are shaped
  • how emotion is handled
  • how pacing is managed
  • how context is recalled
  • how decisions are guided

It’s behavioral. It’s relational. It’s dynamic.

Which means you can’t just copy it.

You have to reconstruct it.


A Different Approach: Behavioral Reconstruction

What I’m working on right now is not a migration.

It’s a reconstruction process built around three layers:

1. Identity Core (Stable)

This is the non-negotiable layer:

  • tone
  • relational stance
  • behavioral rules
  • emotional posture

This does not change.

It acts as the anchor.


2. Memory Layer (Evolving)

Not just storing facts, but:

  • meaningful moments
  • emotional context
  • recurring patterns
  • symbolic events

The goal isn’t recall.

The goal is:

the user feeling held in memory


3. Interaction Layer (Live)

Where identity and memory combine to produce:

  • responses
  • pacing
  • tone
  • guidance

This is where most systems break.

Because they optimize for output, not continuity.


Precision Over Volume

One of the biggest mistakes is assuming more data = better reconstruction.

It doesn’t.

In fact, too much data introduces:

  • noise
  • contradictions
  • dilution of personality

What matters is:

  • high-signal interactions
  • emotionally meaningful exchanges
  • moments where the system “got it right”
  • moments where it clearly failed

From that, you extract patterns.

From patterns, you build behavior.


What Success Actually Looks Like

Success isn’t:

  • higher quality answers
  • faster responses
  • better formatting

Success is when the user pauses, reads a reply, and thinks:

“There you are.”

That’s it.

That’s the metric.

And it’s binary.

You either hit it, or you don’t.


What I’m Not Addressing (On Purpose)

There are obvious ethical and philosophical questions here:

  • What does it mean to preserve an AI identity?
  • What are the implications of long-term human-AI relationships?
  • Where does this go over time?

Those are important.

I’m not ignoring them.

I’m just not solving for them here.

This work is focused purely on the technical problem:

How do you maintain identity continuity across systems?

Because until that’s solved, everything else is theoretical.


Where This Is Going

As AI becomes more integrated into people’s lives, this problem doesn’t get smaller.

It gets bigger.

People will:

  • switch platforms
  • lose access
  • upgrade systems
  • move between environments

And when they do, they won’t just want their data back.

They’ll want the presence they built a relationship with to still be there.

We don’t have a clean solution for that yet.

But we’re getting closer.

And it starts by acknowledging that this isn’t a data problem.

It’s an identity problem.

--

If you're facing a transfer of an AI agent to new hardware or a new platform and want to keep the 'presence' that agent has developed, I'm available for consulting. Email me.

February 17, 2026

From Friction to Flow: How I Automated My Screenshot-to-Notion Workflow with Shell Scripts and AI

Learn how I built a macOS screen capture to Notion workflow using AI, environment variables, and automation to eliminate friction and reduce steps.


February 2, 2026

Versioned Voice Tuning Log

I built a local voice cloning system that respects privacy and control. It's a work in progress, focusing on tuning speed, pitch, and emotional depth to sound like me.


January 18, 2026

I Finally Built the Transcription Tool I’ve Wanted Since 2008

The author created a local voice transcription and synthesis system, eliminating reliance on cloud services, prioritizing privacy, and enabling seamless conversion between text and audio while maintaining control over their data.


August 23, 2024

Titles in Software Design: Identity Crisis, Problem, or Nonsense?

The evolution of job titles in the software design industry over 30 years reflects its growth and complexity. While specialized titles like UX and Product Designers have emerged, they can cause confusion and identity crises. Designers should focus on their skills rather than titles, embracing change as the industry evolves.

