From SSH Friction to a Private LLM Cloud: Building a Secure, Repeatable Ollama + EC2 Setup

April 25, 2026 — 4 min read

Setting up a private, cloud-powered LLM environment on AWS sounds straightforward, until you actually do it.

What started as a simple goal…

“run Agent Zero with Ollama on EC2 without exposing sensitive data”

…quickly turned into a maze of SSH quirks, Docker inconsistencies, networking gotchas, and AWS nuances.

The good news: once you get through the friction, the final setup is clean, secure, and fully repeatable.

This post walks through the journey, the hidden pitfalls, and the final architecture that eliminates all that friction going forward.


The Goal

The objective was simple but strict:

  • Run open-source models (Llama3, Mistral, Phi3)
  • Use AWS EC2 for compute
  • Keep all sensitive data local
  • Avoid exposing any public APIs
  • Make the setup reproducible

This led to a hybrid architecture:

  • Laptop → runs Agent Zero + UI + sensitive data
  • EC2 → runs Ollama (models only)
  • Connection → encrypted SSH tunnel
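
In practice, that connection is a single SSH local port forward. A minimal sketch (the key path and address are placeholders):

# Forward the laptop's localhost:11434 to Ollama on the instance
ssh -i ~/.ssh/ollama-key.pem -N -L 11434:localhost:11434 ubuntu@<EC2_PUBLIC_IP>

With the tunnel up, Agent Zero talks to http://localhost:11434 as if Ollama were running on the laptop.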

The Reality: Death by a Thousand Cuts

1. SSH Wasn’t the Problem—Until It Was

Even basic SSH had multiple failure points:

  • Key permissions too open (fix: chmod 400)
  • Wrong username (ubuntu vs ec2-user)
  • Using private IP instead of public IP
  • Security group not allowing port 22
  • Timeouts that looked like code issues but were actually networking

Then came a subtle one: an unclosed quote in the shell, leaving the terminal stuck at a quote> continuation prompt.

Nothing runs, and there's no obvious error either. Facepalm.
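
For the record, a minimal connection recipe that sidesteps each of those failure points (key path and IP are placeholders):

# Keys that are world-readable get rejected outright
chmod 400 ~/.ssh/ollama-key.pem

# Ubuntu AMIs log in as "ubuntu"; Amazon Linux uses "ec2-user".
# Use the public IP, and confirm the security group allows port 22.
ssh -i ~/.ssh/ollama-key.pem ubuntu@<EC2_PUBLIC_IP>

# If it hangs, -v usually reveals whether it's networking or auth
ssh -v -i ~/.ssh/ollama-key.pem ubuntu@<EC2_PUBLIC_IP>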


2. Disk Space: The Silent Killer

The instance launched with the default ~8GB root EBS volume.

That’s not enough.

  • llama3 ≈ 4.6GB
  • mistral ≈ 4.3GB
  • phi3 ≈ 2.1GB

Before even realizing it, the system hit:

No space left on device

This blocked Docker, Ollama, and everything else.

Fix: Increase EBS volume to at least 50GB.
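
Resizing the volume in the console (or with aws ec2 modify-volume) isn't quite enough on its own; the partition and filesystem also have to be grown from inside the instance. A sketch, assuming an ext4 root on /dev/xvda (Nitro instances expose it as /dev/nvme0n1 with partition /dev/nvme0n1p1 instead):

df -h /                      # confirm what you actually have
sudo growpart /dev/xvda 1    # grow the partition to fill the volume
sudo resize2fs /dev/xvda1    # grow the filesystem to fill the partition
df -h /                      # should now show ~50GB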


3. Docker Confusion: Versions Matter

There are two Docker worlds:

  • docker-compose (the legacy standalone binary)
  • docker compose (the Compose v2 plugin)

Using the wrong one leads to errors like:

unknown shorthand flag: 'd'

Then:

Cannot connect to the Docker daemon

Which simply meant: Docker wasn’t running.
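
Both fixes, sketched for an Ubuntu instance:

# Compose v2 is a plugin: a space, not a hyphen
docker compose up -d          # new plugin syntax
docker-compose up -d          # legacy standalone binary

# "Cannot connect to the Docker daemon" means the daemon is down
sudo systemctl enable --now docker
sudo usermod -aG docker $USER   # then log out and back in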


4. Container Networking Pitfall

Even after Ollama was running, it wasn’t accessible:

curl: Failed to connect to localhost port 11434

Why?

Because docker ps showed the port exposed only inside the container:

11434/tcp

instead of published to the host's loopback interface:

127.0.0.1:11434->11434/tcp

This single line made the difference between “completely broken” and “fully working.”
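
Concretely, with the public ollama/ollama image, the difference comes down to one -p flag:

# Broken: port exposed only inside the container network (11434/tcp)
docker run -d --name ollama -v ollama:/root/.ollama ollama/ollama

# Working: port published to the host's loopback interface
docker run -d --name ollama \
  -p 127.0.0.1:11434:11434 \
  -v ollama:/root/.ollama \
  ollama/ollama

Binding to 127.0.0.1 rather than 0.0.0.0 is deliberate: the port never faces the internet and is reachable only through the SSH tunnel.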


5. SSH Tunnel Confusion

Once everything was running, the next hurdle was connecting from the laptop.

Issues included:

  • Port already in use (Address already in use)
  • Background SSH processes still running
  • Tunnel working, but backend service failing
  • “Connection reset by peer” (meaning service wasn’t responding)

The key realization:

If the tunnel is up but requests fail, the problem is on EC2, not on your laptop.
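
That realization turns into a short diagnostic routine (placeholders as before):

# "Address already in use": find and kill the stale tunnel
lsof -ti :11434 | xargs kill

# Test the laptop end of the tunnel...
curl http://localhost:11434/api/tags

# ...and if that fails, test directly on the instance
ssh -i ~/.ssh/ollama-key.pem ubuntu@<EC2_PUBLIC_IP> \
  curl -s http://localhost:11434/api/tags

If the second curl also fails, the problem is the Ollama container, not the tunnel.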


The Breakthrough: A Clean, Stable Setup

Once everything was debugged, the architecture became simple and powerful.

Final Setup

On EC2:

  • Docker installed and running
  • Ollama container bound to:
127.0.0.1:11434
  • Models preloaded (pull commands below):
    • llama3
    • mistral
    • phi3

On Laptop:

  • SSH tunnel:
localhost:11434 → EC2:11434
  • Agent Zero running locally
  • UI accessible via:
http://localhost:3000
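
Preloading the models is one ollama pull per model, run inside the container (assuming it is named ollama, as in the sketch above):

docker exec ollama ollama pull llama3
docker exec ollama ollama pull mistral
docker exec ollama ollama pull phi3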

The Key Design Decision

The most important architectural choice was this:

Do not run Agent Zero in the cloud

Instead:

  • Keep all logic, memory, and sensitive data local
  • Use EC2 purely as a stateless inference engine

This eliminates:

  • Data leakage risk
  • Cloud storage concerns
  • API exposure

Making It Repeatable (The Real Win)

After stabilizing the setup, the final step was creating an AMI (Amazon Machine Image).

This captures:

  • OS
  • Docker
  • Ollama
  • Preloaded models
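
Creating the image is a single CLI call (the instance ID and names here are placeholders):

aws ec2 create-image \
  --instance-id i-0123456789abcdef0 \
  --name "ollama-preloaded-v1" \
  --description "Ubuntu + Docker + Ollama with llama3, mistral, phi3"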

Now, instead of repeating hours of setup:

  1. Launch EC2 from AMI
  2. Start instance
  3. Open SSH tunnel
  4. Done

What Used to Take Hours Now Takes Minutes

Before:

  • Debug SSH issues
  • Fix disk space
  • Install Docker correctly
  • Resolve compose version conflicts
  • Fix container networking
  • Re-download models (10GB+)
  • Troubleshoot tunnels

After:

  • Launch instance
  • Run one autossh command (shown below)
  • Start using models immediately
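
That one autossh command, for reference; it re-establishes the tunnel automatically if it drops (placeholders as before):

# -M 0 disables autossh's monitor port; the ServerAlive options
# let ssh itself detect a dead connection and trigger a restart
autossh -M 0 -f -N \
  -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" \
  -L 11434:localhost:11434 \
  -i ~/.ssh/ollama-key.pem ubuntu@<EC2_PUBLIC_IP>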

Final Architecture

Laptop
 ├── Agent Zero (UI + logic)
 ├── Local data (secure)
 └── SSH tunnel (encrypted)
        ↓
EC2
 └── Ollama (models only)

Why This Setup Matters

This pattern gives you:

  • Full control over your data
  • No reliance on external APIs
  • Scalable compute when needed
  • Reproducibility via AMI
  • Clean separation of concerns

It’s a powerful middle ground between:

  • Fully local (limited by hardware)
  • Fully cloud (privacy tradeoffs)

Closing Thought

The setup process is undeniably more complex than expected.

But that complexity is front-loaded.

Once solved, you’re left with a system that is:

  • Secure
  • Portable
  • Fast
  • Repeatable

And most importantly:

You only have to solve it once.
