Setting up a private, cloud-powered LLM environment on AWS sounds straightforward, until you actually do it.
What started as a simple goal…
“run Agent Zero with Ollama on EC2 without exposing sensitive data”
…quickly turned into a maze of SSH quirks, Docker inconsistencies, networking gotchas, and AWS nuances.
The good news: once you get through the friction, the final setup is clean, secure, and fully repeatable.
This post walks through the journey, the hidden pitfalls, and the final architecture that eliminates all that friction going forward.
The Goal
The objective was simple but strict:
- Run open-source models (Llama3, Mistral, Phi3)
- Use AWS EC2 for compute
- Keep all sensitive data local
- Avoid exposing any public APIs
- Make the setup reproducible
This led to a hybrid architecture:
- Laptop → runs Agent Zero + UI + sensitive data
- EC2 → runs Ollama (models only)
- Connection → encrypted SSH tunnel
The Reality: Death by a Thousand Cuts
1. SSH Wasn’t the Problem—Until It Was
Even basic SSH had multiple failure points:
- Wrong key permissions (the key file must be chmod 400)
- Wrong username (ubuntu on Ubuntu AMIs vs ec2-user on Amazon Linux)
- Connecting to the private IP instead of the public IP
- Security group not allowing port 22
- Timeouts that looked like code issues but were actually networking
Then came a subtle one: an unclosed quote leaving the shell stuck at a quote> prompt.
Nothing runs, and there's no obvious error either. Facepalm.
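The checklist above condenses into a few commands. This is a sketch; the key filename and the EC2 address are placeholders you'd substitute with your own:

```shell
# SSH refuses keys that are group- or world-readable
chmod 400 ~/.ssh/my-ec2-key.pem

# Ubuntu AMIs use "ubuntu"; Amazon Linux AMIs use "ec2-user".
# Use the instance's PUBLIC IP, and make sure the security group allows port 22.
ssh -i ~/.ssh/my-ec2-key.pem ubuntu@<EC2_PUBLIC_IP>

# When a connection hangs, check networking before blaming your code:
# is port 22 reachable at all?
nc -zv -w 5 <EC2_PUBLIC_IP> 22
```

If nc times out here, the problem is almost always the security group or the wrong IP, not SSH itself.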
2. Disk Space: The Silent Killer
The default EC2 instance came with ~8GB of storage.
That’s not enough.
- llama3 ≈ 4.6GB
- mistral ≈ 4.3GB
- phi3 ≈ 2.1GB
Before even realizing it, the system hit:
No space left on device
This blocked Docker, Ollama, and everything else.
Fix: Increase EBS volume to at least 50GB.
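Resizing is a two-step process: grow the EBS volume via the AWS API, then grow the partition and filesystem on the instance. A sketch, with a placeholder volume ID and device names that vary by instance type:

```shell
# Grow the root EBS volume to 50 GB (volume ID is a placeholder)
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 50

# On the instance: extend the partition, then the filesystem
sudo growpart /dev/xvda 1    # device may be /dev/nvme0n1 on Nitro instances
sudo resize2fs /dev/xvda1    # for ext4; XFS roots use: sudo xfs_growfs /
df -h /                      # confirm the new size is visible
```

Do this before pulling any models; resizing after the disk fills up works too, but Docker tends to be left in a messy state.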
3. Docker Confusion: Versions Matter
There are two Docker worlds:
- docker-compose (old)
- docker compose (new plugin)
Using the wrong one leads to errors like:
unknown shorthand flag: 'd'
Then:
Cannot connect to the Docker daemon
Which simply meant: Docker wasn’t running.
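A quick way to see which Compose world you're in, and to rule out the daemon as the culprit:

```shell
# Check which Compose you actually have
docker compose version     # v2 plugin (the current form)
docker-compose --version   # legacy standalone binary

# "Cannot connect to the Docker daemon" usually just means dockerd isn't running
sudo systemctl enable --now docker
sudo systemctl status docker --no-pager
```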
4. Container Networking Pitfall
Even after Ollama was running, it wasn’t accessible:
curl: Failed to connect to localhost port 11434
Why?
Because docker ps showed the port only exposed inside the container:
11434/tcp
Instead of published on the host's loopback interface:
127.0.0.1:11434->11434/tcp
This single line made the difference between “completely broken” and “fully working.”
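The difference comes down to the -p flag. A sketch using the official ollama/ollama image (the volume mount is an optional addition to persist models across container restarts):

```shell
# Exposed only inside the container -> unreachable from the host:
#   docker run -d --name ollama ollama/ollama
# Published on the host loopback -> reachable via localhost, never the internet:
docker run -d --name ollama \
  -p 127.0.0.1:11434:11434 \
  -v ollama:/root/.ollama \
  ollama/ollama

# Verify: PORTS should read "127.0.0.1:11434->11434/tcp"
docker ps --filter name=ollama
curl http://localhost:11434/api/version
```

Binding to 127.0.0.1 rather than 0.0.0.0 is deliberate: only the SSH tunnel can reach the API, so nothing is exposed publicly even if the security group is misconfigured.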
5. SSH Tunnel Confusion
Once everything was running, the next hurdle was connecting from the laptop.
Issues included:
- Port already in use (Address already in use)
- Background SSH processes still running
- Tunnel working, but backend service failing
- “Connection reset by peer” (meaning service wasn’t responding)
The key realization:
If the tunnel is up but requests fail, the problem is on EC2, not your laptop.
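The tunnel itself, plus the two commands that resolved most of these issues (key path and EC2 address are placeholders):

```shell
# Forward local port 11434 to the EC2 host's loopback 11434 (-N: no remote shell)
ssh -N -L 11434:localhost:11434 -i ~/.ssh/my-ec2-key.pem ubuntu@<EC2_PUBLIC_IP>

# "Address already in use"? A stale tunnel still holds the port -- find and kill it
lsof -ti tcp:11434 | xargs kill

# "Connection reset by peer"? Test the service ON the EC2 instance itself;
# if this fails there, the tunnel is fine and the backend is the problem
curl http://localhost:11434/api/version
```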
The Breakthrough: A Clean, Stable Setup
Once everything was debugged, the architecture became simple and powerful.
Final Setup
On EC2:
- Docker installed and running
- Ollama container bound to:
127.0.0.1:11434
- Models preloaded:
- llama3
- mistral
- phi3
On Laptop:
- SSH tunnel:
localhost:11434 → EC2:11434
- Agent Zero running locally
- UI accessible via:
http://localhost:3000
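Preloading the models on the EC2 side is a few docker exec calls, assuming the Ollama container from the setup above is named "ollama" and already running:

```shell
# Pull each model inside the running Ollama container
docker exec ollama ollama pull llama3
docker exec ollama ollama pull mistral
docker exec ollama ollama pull phi3

# Confirm all three are present
docker exec ollama ollama list
```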
The Key Design Decision
The most important architectural choice was this:
Do not run Agent Zero in the cloud
Instead:
- Keep all logic, memory, and sensitive data local
- Use EC2 purely as a stateless inference engine
This eliminates:
- Data leakage risk
- Cloud storage concerns
- API exposure
Making It Repeatable (The Real Win)
After stabilizing the setup, the final step was creating an AMI (Amazon Machine Image).
This captures:
- OS
- Docker
- Ollama
- Preloaded models
Now, instead of repeating hours of setup:
- Launch EC2 from AMI
- Start instance
- Open SSH tunnel
- Done
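Both halves of that workflow are single AWS CLI calls. A sketch with placeholder IDs, and an instance type chosen here purely as an example:

```shell
# Capture the debugged instance (OS + Docker + Ollama + models) as an AMI
aws ec2 create-image \
  --instance-id i-0123456789abcdef0 \
  --name "ollama-agent-zero-base" \
  --description "Docker + Ollama + preloaded models"

# Later: launch a fresh, fully configured instance from that AMI
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type g4dn.xlarge \
  --key-name my-ec2-key \
  --security-group-ids sg-0123456789abcdef0
```

Note that create-image reboots the instance by default; add --no-reboot if you need it to stay up, at the cost of filesystem consistency.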
What Used to Take Hours Now Takes Minutes
Before:
- Debug SSH issues
- Fix disk space
- Install Docker correctly
- Resolve compose version conflicts
- Fix container networking
- Re-download models (10GB+)
- Troubleshoot tunnels
After:
- Launch instance
- Run one autossh command
- Start using models immediately
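That one autossh command, for reference (same placeholders as before; autossh re-establishes the tunnel automatically if it drops):

```shell
# -M 0 disables autossh's monitor port; the ServerAlive options
# do the health-checking instead. -f backgrounds, -N opens no shell.
autossh -M 0 -f -N \
  -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" \
  -L 11434:localhost:11434 \
  -i ~/.ssh/my-ec2-key.pem ubuntu@<EC2_PUBLIC_IP>
```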
Final Architecture
Laptop
├── Agent Zero (UI + logic)
├── Local data (secure)
└── SSH tunnel (encrypted)
↓
EC2
└── Ollama (models only)
Why This Setup Matters
This pattern gives you:
- Full control over your data
- No reliance on external APIs
- Scalable compute when needed
- Reproducibility via AMI
- Clean separation of concerns
It’s a powerful middle ground between:
- Fully local (limited by hardware)
- Fully cloud (privacy tradeoffs)
Closing Thought
The setup process is undeniably more complex than expected.
But that complexity is front-loaded.
Once solved, you’re left with a system that is:
- Secure
- Portable
- Fast
- Repeatable
And most importantly:
You only have to solve it once.