INFRASTRUCTURE

Home Lab Personal Cloud

A four-node self-hosted platform spanning a cloud edge gateway, ARM home server, x86 compute offload node, and a dedicated ZFS NAS. It powers custom apps, agents, media services, and K3s behind a self-operated WireGuard edge.

Last updated: 19/04/2026

From Tiny Cluster to Full Rack

What started as a quiet Raspberry Pi cluster in my living room has grown into a four-node self-hosted platform: an ARM edge gateway in the cloud, an ARM home server, an x86_64 laptop for compute offload, and a dedicated ZFS NAS for storage. The on-site gear now lives in a compact 10U rack with monitored power and airflow, while public traffic still reaches the platform only through a self-operated WireGuard tunnel.

My Home Lab rack with the home server, compute node, storage node, and networking gear

Current physical layout: a compact 10U rack

Primary Node

arm64 / raspberry-pi-5
4 cores
8GB RAM

The control plane. Runs the main Docker Compose stack, hosts the backend and custom apps, and acts as the K3s control node.

[MASTER] [ARM64] [LOW-POWER]

Worker Node

amd64 / ryzen-5-5600u
6 cores
16GB RAM

The offload node. Handles Playwright, Sharp/libvips, private search, and other heavier workloads that are faster on x86_64.

[WORKER] [X86_64] [BATTERY-UPS]

Cloud Gateway

arm64 / oracle-ampere
4 cores
24GB RAM
ORACLE AMPERE A1 / FREE TIER

The internet-facing edge. Terminates TLS, filters traffic with CrowdSec, exposes a VPN-only gateway API, and carries ingress over WireGuard.

[GATEWAY] [CADDY] [WIREGUARD]

Storage Node

x86_64 / truenas-scale
2 x 16TB mirror
32GB RAM

The storage plane. Hosts ZFS pools, the media stack, photo backup, file sync, music services, and hot/cold S3-compatible object storage.

[STORAGE] [ZFS] [OBJECT]

Network Switch

TP-Link TL-SG608E / 8-Port Gigabit
8 ports / 1 Gbps
Easy Smart Managed

Connects the home server, compute node, and NAS on the local network and links them to the home router. It is a rack-mounted managed switch with QoS and IGMP snooping.

[MANAGED] [GIGABIT] [8-PORT]

All on-site hardware is organized around the same real-world constraints that shaped the first version of the lab: quiet operation, low idle power, minimal cable mess, and hardware that earns its spot in the rack by solving an actual problem.

What Runs On It

The platform is now split across four complementary layers: the home-server control plane, an agent and worker layer, a dedicated storage/media plane, and a hardened edge gateway.

Core Platform (Home Server)

  • Unified backend control plane: FastAPI + SQLAlchemy + Alembic + SQLite drive authentication, permissions, app registration, audit logs, AI orchestration, and cross-app state.
  • Custom product suite: Identity portal, app hub, admin console, content digest, vault, term-mastery app, task log, career workspace, and a QR generator microservice.
  • Compose-first runtime: The home server still runs the main Docker Compose stack for day-to-day services and hot-reload development, while K3s is reserved for isolated microservices.
  • Hard application kill switch: Every custom frontend is registered centrally, so disabling an app immediately blocks access even on direct URL visits.
  • Platform CLI: A Typer-based Python CLI wraps the backend API for terminal workflows over VPN. Supports auth, task management, security operations, and health checks via Personal Access Tokens. Distributed through a self-hosted private PyPI server.
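
The kill switch described above can be sketched as a fail-closed registry lookup. This is a minimal illustration, not the platform's actual code: `AppRegistry`, `authorize_request`, and the `"vault"` app id are hypothetical names, and an in-memory dict stands in for the real registration table.

```python
from dataclasses import dataclass, field


@dataclass
class AppRegistry:
    """In-memory stand-in for the central app registration table (hypothetical)."""

    enabled: dict[str, bool] = field(default_factory=dict)

    def register(self, app_id: str, enabled: bool = True) -> None:
        self.enabled[app_id] = enabled

    def disable(self, app_id: str) -> None:
        # Flipping the flag is the "kill switch"; unknown ids are ignored.
        if app_id in self.enabled:
            self.enabled[app_id] = False

    def is_allowed(self, app_id: str) -> bool:
        # Fail closed: an app that was never registered is treated as disabled.
        return self.enabled.get(app_id, False)


def authorize_request(registry: AppRegistry, app_id: str) -> int:
    """Return an HTTP-style status code for a request addressed to one app."""
    return 200 if registry.is_allowed(app_id) else 403


registry = AppRegistry()
registry.register("vault")
print(authorize_request(registry, "vault"))  # 200
registry.disable("vault")
print(authorize_request(registry, "vault"))  # 403
```

Because the check runs on every request rather than only at login or navigation time, a direct URL visit gets the same answer as a click from the app hub.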

Agent + Worker Layer

  • Retriever agent: Researches the web through a private meta-search backend and headless browsing, then returns structured briefs.
  • Term-mastery agent: Generates summaries, flashcards, and remediation workflows for learning content.
  • Security triage agent: Correlates edge alerts, access logs, sessions, and approval-gated write actions for admin review.
  • Compute offload pattern: The laptop handles Playwright scraping, Sharp image optimization, SearXNG, and audio feature extraction, with local fallbacks on the home server.
  • Shared guardrails: Every agent runs behind kill switches, token/cost/runtime budgets, tool allowlists, concurrency caps, and blocked-action logging.
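
As a rough sketch of the shared guardrails, the check below combines a kill switch, a tool allowlist, a token budget, and blocked-action logging in one gate. The `Guardrails` class, its field names, and the tool names are assumptions for illustration; cost, runtime, and concurrency limits would follow the same pattern.

```python
from dataclasses import dataclass, field


@dataclass
class Guardrails:
    """Per-agent limits checked before every tool call (hypothetical shape)."""

    allowed_tools: frozenset[str]
    max_tokens: int
    killed: bool = False
    tokens_used: int = 0
    blocked_log: list[str] = field(default_factory=list)

    def check(self, tool: str, tokens: int) -> bool:
        """Return True if the call may proceed; otherwise record why it was blocked."""
        if self.killed:
            reason = "kill switch active"
        elif tool not in self.allowed_tools:
            reason = f"tool not allowlisted: {tool}"
        elif self.tokens_used + tokens > self.max_tokens:
            reason = "token budget exceeded"
        else:
            # Only successful calls consume budget.
            self.tokens_used += tokens
            return True
        self.blocked_log.append(reason)
        return False


rails = Guardrails(allowed_tools=frozenset({"search", "browse"}), max_tokens=1000)
print(rails.check("search", 400))  # True
print(rails.check("shell", 10))    # False
print(rails.blocked_log)           # ['tool not allowlisted: shell']
```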

Storage + Media Plane (NAS)

  • TrueNAS SCALE + ZFS: Separate storage node with mirrored HDD bulk storage, SSD hot tier, and service-specific datasets.
  • Self-hosted media services: Jellyfin for streaming personal media libraries, Immich for photo backup with ML-powered organization, Nextcloud for file sync and collaboration, and Navidrome for music streaming.
  • S3-compatible object storage: Dual MinIO tiers keep hot objects on SSD and move colder data to HDD through lifecycle rules.
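
The hot/cold split comes down to an age rule. The sketch below shows only that decision logic in plain Python. The 30-day threshold and the tier names `hot-ssd` and `cold-hdd` are assumptions, and in the real setup the transition is carried out by MinIO's own lifecycle rules rather than by application code.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold: objects untouched for this long are considered cold.
HOT_RETENTION = timedelta(days=30)


def target_tier(last_modified: datetime, now: datetime) -> str:
    """Pick the tier an object should live on, based only on its age."""
    age = now - last_modified
    return "cold-hdd" if age >= HOT_RETENTION else "hot-ssd"


now = datetime(2026, 4, 19, tzinfo=timezone.utc)
print(target_tier(now - timedelta(days=5), now))   # hot-ssd
print(target_tier(now - timedelta(days=45), now))  # cold-hdd
```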

K3s Cluster

  • Traefik ingress: Host-based routing and a clean path to declarative microservice deployments.
  • Portainer: Cluster visibility and day-to-day Kubernetes management.
  • QR microservice: A React + Vite service running in K3s as the current reference workload for the cluster.
  • Private multi-arch image flow: Images are prepared for ARM64 and x86_64 and distributed through a private registry workflow.

Cloud Gateway (Oracle VPS)

  • Caddy at the edge: Automatic TLS, strict security headers, structured logs, compression, and route-level rate limits.
  • CrowdSec inline filtering: Suspicious traffic is blocked before it ever reaches the backend.
  • WireGuard backhaul: All public ingress crosses a self-operated VPN tunnel; the home server has zero direct internet exposure.
  • Tailscale admin mesh: A separate VPN for remote SSH and development access that never carries public request traffic.
  • Gateway API: A VPN-only internal API exposes threat and access-log data to backend tooling and the security agent.
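
One way to keep an internal API VPN-only is to reject any caller whose source address falls outside the tunnel subnet. The sketch below uses only the standard library. The `10.8.0.0/24` range and the `is_vpn_client` helper are assumptions for illustration, not the gateway's actual configuration.

```python
import ipaddress

# Assumed WireGuard tunnel subnet; the real range is deployment-specific.
WIREGUARD_SUBNET = ipaddress.ip_network("10.8.0.0/24")


def is_vpn_client(remote_addr: str) -> bool:
    """Return True only if the caller's address is inside the tunnel subnet."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Malformed addresses are rejected rather than trusted.
        return False
    return addr in WIREGUARD_SUBNET


print(is_vpn_client("10.8.0.2"))     # True
print(is_vpn_client("203.0.113.7"))  # False
```

A check like this works best as a second layer behind binding the API to the tunnel interface, since it relies on the source address being trustworthy.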

Why I Built It

I needed a sandbox to break things safely. This was my first real personal project that pushed me beyond tutorials and into actual problem-solving. What started with Jellyfin for media streaming grew into a full platform after I built a custom secure tunnel to replace third-party services.

Beyond learning, I kept running into the same frustration: existing tools were either limited, ad-ridden, or just didn't fit what I needed. The QR generator? Most online versions were locked behind paywalls or covered in ads. The Vault app? Nothing out there matched the workflow I had in mind. So instead of settling, I started building my own, and that grew into a full self-hosted ecosystem of custom applications replacing third-party tools on my own terms.

Along the way I learned how to design around real constraints: mixed architectures, separate storage and compute planes, zero-trust ingress, and applications that share auth without sharing security shortcuts. I built 8+ custom apps and an agent framework from scratch, which proved to me that serious systems design is possible long before you have enterprise hardware.

Lessons Learned

  • ARM has quirks: Not all Docker images support ARM64. Finding compatible alternatives and tweaking configs taught me to read docs carefully.
  • DNS is powerful: Managing records in Cloudflare and understanding how traffic flows made the whole system click.
  • Start small, iterate fast: This setup grew organically. Each problem solved unlocked the next improvement.
  • Cost-conscious infrastructure: Running on low-power devices and free-tier cloud taught me to optimize before scaling.
  • Security is a journey: Implementing SSO, token blacklisting, and audit logging taught me that authentication is more than just passwords.
  • Hybrid architecture complexity: Coordinating Docker Compose and K3s on the same node, plus a remote VPS gateway, required careful port planning and network design.
  • Environment matters: With only one LAN outlet in the home, the cluster had to live in the living room. This constraint forced smart hardware choices: silent components, low-power ARM processors, and efficient cooling. Adapting to real-world limitations made me a better engineer.

What's Next

Next steps are deeper K3s adoption for selected microservices, more agent-driven workflows, continued hardening of the edge and auth stack, and expanding the platform with new internal tools only when they solve a real gap in my day-to-day workflows.