From SSH Friction to a Private LLM Cloud: Building a Secure, Repeatable Ollama + EC2 Setup

April 25, 2026 — 4 min read

Setting up a private, cloud-powered LLM environment on AWS sounds straightforward, until you actually do it.

What started as a simple goal…

“run Agent Zero with Ollama on EC2 without exposing sensitive data”

…quickly turned into a maze of SSH quirks, Docker inconsistencies, networking gotchas, and AWS nuances.

The good news: once you get through the friction, the final setup is clean, secure, and fully repeatable.

This post walks through the journey, the hidden pitfalls, and the final architecture that eliminates all that friction going forward.


The Goal

The objective was simple but strict:

  • Run open-source models (Llama3, Mistral, Phi3)
  • Use AWS EC2 for compute
  • Keep all sensitive data local
  • Avoid exposing any public APIs
  • Make the setup reproducible

This led to a hybrid architecture:

  • Laptop → runs Agent Zero + UI + sensitive data
  • EC2 → runs Ollama (models only)
  • Connection → encrypted SSH tunnel
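
In practice, that connection is a single SSH local port forward. A minimal sketch (the key path and address are placeholders):

# Forward the laptop's localhost:11434 to Ollama on the instance
ssh -i ~/.ssh/ollama-key.pem -N -L 11434:localhost:11434 ubuntu@<EC2_PUBLIC_IP>

With the tunnel up, Agent Zero talks to http://localhost:11434 as if Ollama were running on the laptop.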

The Reality: Death by a Thousand Cuts

1. SSH Wasn’t the Problem—Until It Was

Even basic SSH had multiple failure points:

  • Key permissions too open (fix: chmod 400)
  • Wrong username (ubuntu vs ec2-user)
  • Using private IP instead of public IP
  • Security group not allowing port 22
  • Timeouts that looked like code issues but were actually networking

Then came a subtle one: an unclosed quote in the shell, leaving the terminal stuck at a quote> continuation prompt.

Nothing runs, and there's no obvious error either. Facepalm.
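
For the record, a minimal connection recipe that sidesteps each of those failure points (key path and IP are placeholders):

# Keys that are world-readable get rejected outright
chmod 400 ~/.ssh/ollama-key.pem

# Ubuntu AMIs log in as "ubuntu"; Amazon Linux uses "ec2-user".
# Use the public IP, and confirm the security group allows port 22.
ssh -i ~/.ssh/ollama-key.pem ubuntu@<EC2_PUBLIC_IP>

# If it hangs, -v usually reveals whether it's networking or auth
ssh -v -i ~/.ssh/ollama-key.pem ubuntu@<EC2_PUBLIC_IP>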


2. Disk Space: The Silent Killer

The instance launched with the default ~8GB root EBS volume.

That’s not enough.

  • llama3 ≈ 4.6GB
  • mistral ≈ 4.3GB
  • phi3 ≈ 2.1GB

Before even realizing it, the system hit:

No space left on device

This blocked Docker, Ollama, and everything else.

Fix: Increase EBS volume to at least 50GB.
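
Resizing the volume in the console (or with aws ec2 modify-volume) isn't quite enough on its own; the partition and filesystem also have to be grown from inside the instance. A sketch, assuming an ext4 root on /dev/xvda (Nitro instances expose it as /dev/nvme0n1 with partition /dev/nvme0n1p1 instead):

df -h /                      # confirm what you actually have
sudo growpart /dev/xvda 1    # grow the partition to fill the volume
sudo resize2fs /dev/xvda1    # grow the filesystem to fill the partition
df -h /                      # should now show ~50GB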


3. Docker Confusion: Versions Matter

There are two Docker worlds:

  • docker-compose (the legacy standalone binary)
  • docker compose (the Compose v2 plugin)

Using the wrong one leads to errors like:

unknown shorthand flag: 'd'

Then:

Cannot connect to the Docker daemon

Which simply meant: Docker wasn’t running.
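
Both fixes, sketched for an Ubuntu instance:

# Compose v2 is a plugin: a space, not a hyphen
docker compose up -d          # new plugin syntax
docker-compose up -d          # legacy standalone binary

# "Cannot connect to the Docker daemon" means the daemon is down
sudo systemctl enable --now docker
sudo usermod -aG docker $USER   # then log out and back in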


4. Container Networking Pitfall

Even after Ollama was running, it wasn’t accessible:

curl: Failed to connect to localhost port 11434

Why?

Because docker ps showed the port exposed only inside the container:

11434/tcp

instead of published to the host's loopback interface:

127.0.0.1:11434->11434/tcp

This single line made the difference between “completely broken” and “fully working.”
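
Concretely, with the public ollama/ollama image, the difference comes down to one -p flag:

# Broken: port exposed only inside the container network (11434/tcp)
docker run -d --name ollama -v ollama:/root/.ollama ollama/ollama

# Working: port published to the host's loopback interface
docker run -d --name ollama \
  -p 127.0.0.1:11434:11434 \
  -v ollama:/root/.ollama \
  ollama/ollama

Binding to 127.0.0.1 rather than 0.0.0.0 is deliberate: the port never faces the internet and is reachable only through the SSH tunnel.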


5. SSH Tunnel Confusion

Once everything was running, the next hurdle was connecting from the laptop.

Issues included:

  • Port already in use (Address already in use)
  • Background SSH processes still running
  • Tunnel working, but backend service failing
  • “Connection reset by peer” (meaning service wasn’t responding)

The key realization:

If the tunnel is up but requests fail, the problem is on EC2, not on your laptop.
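
That realization turns into a short diagnostic routine (placeholders as before):

# "Address already in use": find and kill the stale tunnel
lsof -ti :11434 | xargs kill

# Test the laptop end of the tunnel...
curl http://localhost:11434/api/tags

# ...and if that fails, test directly on the instance
ssh -i ~/.ssh/ollama-key.pem ubuntu@<EC2_PUBLIC_IP> \
  curl -s http://localhost:11434/api/tags

If the second curl also fails, the problem is the Ollama container, not the tunnel.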


The Breakthrough: A Clean, Stable Setup

Once everything was debugged, the architecture became simple and powerful.

Final Setup

On EC2:

  • Docker installed and running
  • Ollama container bound to:
127.0.0.1:11434
  • Models preloaded (pull commands below):
    • llama3
    • mistral
    • phi3

On Laptop:

  • SSH tunnel:
localhost:11434 → EC2:11434
  • Agent Zero running locally
  • UI accessible via:
http://localhost:3000
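
Preloading the models is one ollama pull per model, run inside the container (assuming it is named ollama, as in the sketch above):

docker exec ollama ollama pull llama3
docker exec ollama ollama pull mistral
docker exec ollama ollama pull phi3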

The Key Design Decision

The most important architectural choice was this:

Do not run Agent Zero in the cloud

Instead:

  • Keep all logic, memory, and sensitive data local
  • Use EC2 purely as a stateless inference engine

This eliminates:

  • Data leakage risk
  • Cloud storage concerns
  • API exposure

Making It Repeatable (The Real Win)

After stabilizing the setup, the final step was creating an AMI (Amazon Machine Image).

This captures:

  • OS
  • Docker
  • Ollama
  • Preloaded models
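
Creating the image is a single CLI call (the instance ID and names here are placeholders):

aws ec2 create-image \
  --instance-id i-0123456789abcdef0 \
  --name "ollama-preloaded-v1" \
  --description "Ubuntu + Docker + Ollama with llama3, mistral, phi3"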

Now, instead of repeating hours of setup:

  1. Launch EC2 from AMI
  2. Start instance
  3. Open SSH tunnel
  4. Done

What Used to Take Hours Now Takes Minutes

Before:

  • Debug SSH issues
  • Fix disk space
  • Install Docker correctly
  • Resolve compose version conflicts
  • Fix container networking
  • Re-download models (10GB+)
  • Troubleshoot tunnels

After:

  • Launch instance
  • Run one autossh command (shown below)
  • Start using models immediately
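
That one autossh command, for reference; it re-establishes the tunnel automatically if it drops (placeholders as before):

# -M 0 disables autossh's monitor port; the ServerAlive options
# let ssh itself detect a dead connection and trigger a restart
autossh -M 0 -f -N \
  -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" \
  -L 11434:localhost:11434 \
  -i ~/.ssh/ollama-key.pem ubuntu@<EC2_PUBLIC_IP>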

Final Architecture

Laptop
 ├── Agent Zero (UI + logic)
 ├── Local data (secure)
 └── SSH tunnel (encrypted)
        ↓
EC2
 └── Ollama (models only)

Why This Setup Matters

This pattern gives you:

  • Full control over your data
  • No reliance on external APIs
  • Scalable compute when needed
  • Reproducibility via AMI
  • Clean separation of concerns

It’s a powerful middle ground between:

  • Fully local (limited by hardware)
  • Fully cloud (privacy tradeoffs)

Closing Thought

The setup process is undeniably more complex than expected.

But that complexity is front-loaded.

Once solved, you’re left with a system that is:

  • Secure
  • Portable
  • Fast
  • Repeatable

And most importantly:

You only have to solve it once.
