Jed's Mac Studio M4 Max — What Runs Where

Hardware confirmed: M4 Max · 14-core CPU (10P + 4E) · 32-core GPU · 16-core Neural Engine · 36GB unified memory @ 410 GB/s · 512GB SSD · 4× Thunderbolt 5 · 10Gb Ethernet

TL;DR: the 36GB memory and 512GB SSD change the local model lineup and push more of the heavy work to our server cluster. That's actually a feature: your Mac runs cooler and lasts longer, and token-heavy jobs use GPU compute we've already paid for.

What runs on YOUR Mac (local, zero token cost, zero internet)

Layer	Choice for 36GB	RAM used	Why
CEO reasoning	Qwen 2.5 32B (4-bit via MLX)	~19 GB	Apple-silicon-tuned, strong reasoning, 128K context, fits with headroom.
Worker / draft	Qwen 2.5 7B (Q4 MLX)	~5 GB	Fast passes — product blurb drafts, summaries, tagging.
Embeddings	Nomic embed-text-v1.5	~1 GB	Vector index for Plaud, lessons.md, venture docs.
Speech-to-text	Whisper medium (MLX)	~2 GB	Transcribes Plaud overflow, meeting recordings.
Vector DB	LanceDB	<100 MB	Stores embeddings, searchable by every CEO.
Mission Control	Next.js dev server	~500 MB	Your dashboard.

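Total working set: ~28 GB, leaving ~8 GB headroom for OS + browser + Claude Code. Comfortable for everyday use; very long contexts on the 32B add KV-cache memory on top, so watch memory pressure on those runs.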
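The working-set total can be checked by summing the RAM column. This is a minimal sketch using the approximate figures from the table above; the dictionary keys are just labels for illustration.

```python
# Approximate RAM figures (GB) copied from the table above.
ram_gb = {
    "Qwen 2.5 32B": 19.0,
    "Qwen 2.5 7B": 5.0,
    "Nomic embed-text-v1.5": 1.0,
    "Whisper medium": 2.0,
    "LanceDB": 0.1,  # table says "<100 MB"; 0.1 GB is used as an upper bound
    "Mission Control": 0.5,
}
TOTAL_MEMORY_GB = 36.0  # unified memory from the hardware line

working_set = sum(ram_gb.values())
headroom = TOTAL_MEMORY_GB - working_set
print(f"working set ~{working_set:.1f} GB, headroom ~{headroom:.1f} GB")
```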

Inference speed you can expect on M4 Max (32-core GPU):

Qwen 2.5 32B: ~18–22 tokens/sec (faster than you can read)
Qwen 2.5 7B: ~60–80 tokens/sec
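As a sanity check on these numbers, a common rule of thumb is that token generation is memory-bandwidth bound: each new token reads all the weights once, so bandwidth divided by model size gives a ceiling. The sketch below applies that rule with the figures from this document. It is an assumption-laden estimate (it ignores KV-cache reads and compute overhead), not a benchmark.

```python
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Bandwidth-bound ceiling: each generated token reads all weights once."""
    return bandwidth_gb_s / model_size_gb


BANDWIDTH_GB_S = 410.0  # unified memory bandwidth from the hardware line

for name, size_gb in [("Qwen 2.5 32B", 19.0), ("Qwen 2.5 7B", 5.0)]:
    ceiling = max_tokens_per_sec(BANDWIDTH_GB_S, size_gb)
    print(f"{name}: ceiling ~{ceiling:.0f} tokens/sec")
```

The quoted ranges sit at or just under these ceilings, so treat them as best-case figures for short prompts.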

What runs on OUR servers (via Tailscale, access is included)

You get authenticated access to the CertiHomes GPU cluster. Your Mac never does this work, and you never get a cloud bill: it's bundled in your retainer.

Task	Lives on	Why it's not local
Frontier reasoning (Claude Opus, GPT-5, Gemini 2.5)	Our pooled OpenRouter key	Use sparingly — only when local 32B can't crack it. ~10-15% of workload.
70B+ class local models (Hermes, Llama 3.3 70B)	geo2.tlcengine.com (RTX 5090)	Won't fit in 36 GB of unified memory. We run it for you.
Image generation (ComfyUI / Stable Diffusion XL)	geo.tlcengine.com:8188	Needs a proper GPU.
Virtual staging (empty room → furnished)	geo.tlcengine.com:8002	AI model too heavy for your machine.
Voice cloning / avatars	geo2.tlcengine.com (F5-TTS, LivePortrait, Kokoro)	Multi-model pipeline; 32GB VRAM.
HeyGen avatar videos	HeyGen API (our account)	Third-party; we bundle the quota.
Voicebox high-fidelity TTS	Voicebox API (our account)	Third-party; overflow from local F5-TTS.
Video rendering at scale (Remotion batch)	geo.tlcengine.com	CPU/GPU intensive, fits better on our box.
Nominatim / Pelias geocoders	geo.tlcengine.com	Replaces Google Maps API.
ClaudeTube (YouTube deep-analysis)	geo.tlcengine.com	Downloads + whisper + frame extraction.
Firecrawl / Brave Search	Our pooled API keys	Quota bundled.

How you access these: your CEO agent issues a job, it goes over Tailscale to our cluster, and the result streams back to your Mac. It works the same as running locally, except the GPU work happens on our side.
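The local-versus-cluster split can be pictured as a small routing table. The sketch below is illustrative only: the hosts and ports are copied from the table above, while the task names and the route() helper are hypothetical and not part of the actual agent code.

```python
# Hypothetical routing table; hosts come from the table above,
# task keys and the route() helper are illustrative names only.
REMOTE_HOSTS = {
    "image_generation": "geo.tlcengine.com:8188",
    "virtual_staging": "geo.tlcengine.com:8002",
    "voice_cloning": "geo2.tlcengine.com",
    "large_local_model": "geo2.tlcengine.com",
}
LOCAL_TASKS = {"draft", "summary", "tagging", "embedding", "transcription"}


def route(task: str) -> str:
    """Return where a task should run: 'local' or a cluster host."""
    if task in LOCAL_TASKS:
        return "local"
    if task in REMOTE_HOSTS:
        return REMOTE_HOSTS[task]
    raise ValueError(f"no route defined for task: {task}")


print(route("summary"))
print(route("image_generation"))
```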

The storage conversation (512GB is tight)

Budget once everything lands on your Mac:

macOS + apps: ~80 GB
Two local models (32B + 7B): ~30 GB
Whisper + Nomic + MCP runtimes: ~5 GB
LanceDB indexes (3 ventures, growing): ~20 GB year-1
Plaud transcript archive + embeddings: ~15 GB year-1
Daily logs (3 ventures × 365 files): ~3 GB/year
Product/brand asset raw files (photos, reels, video b-roll): this is the wildcard — easily 200+ GB
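Adding up the budget shows why the asset files decide whether the internal drive is enough. This is a rough sketch using the estimates listed above, with the wildcard set to its low end of 200 GB.

```python
INTERNAL_SSD_GB = 512

# Estimates (GB) copied from the budget list above.
budget_gb = {
    "macOS + apps": 80,
    "local models (32B + 7B)": 30,
    "Whisper + Nomic + MCP runtimes": 5,
    "LanceDB indexes (year 1)": 20,
    "Plaud archive + embeddings (year 1)": 15,
    "daily logs (year 1)": 3,
}
ASSET_ESTIMATE_GB = 200  # the wildcard; 200 GB is the low end quoted above

fixed = sum(budget_gb.values())
remaining = INTERNAL_SSD_GB - fixed - ASSET_ESTIMATE_GB
print(f"fixed: {fixed} GB, assets: {ASSET_ESTIMATE_GB}+ GB, remaining: {remaining} GB")
```

Under these estimates the predictable items take about 153 GB, and the assets alone can consume most of what is left within the first year.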

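Recommendation: one external Thunderbolt 5 NVMe, 2TB. TB5 carries up to 80 Gb/s of data (the 120 Gb/s figure is the display-oriented Bandwidth Boost mode), which puts a good TB5 drive in the same ballpark as the internal SSD for loading models. Models and LanceDB go on it. macOS + apps stay on internal.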

Good picks: OWC Envoy Ultra TB5 2TB, Samsung T9 (USB 3.2, not Thunderbolt; a cheaper but noticeably slower fallback).
Cost: $200–350.
Setup: APFS formatted, symlinks from ~/openclaw/ into the external drive. No meaningful speed penalty on a TB5 drive, and plenty of headroom.
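The symlink step could look something like the sketch below. It is an assumption about how the installer might do it, not the installer's actual code: the volume name and subfolder are placeholders, and the helper only creates the link when the external drive is mounted.

```python
from pathlib import Path


def link_to_external(link_path: Path, external_dir: Path) -> bool:
    """Point link_path at external_dir if the external drive is mounted.

    Returns True if the symlink was created, False if the external
    directory is missing (drive unplugged) and nothing was changed.
    """
    if not external_dir.is_dir():
        return False
    if link_path.is_symlink():
        link_path.unlink()  # replace a stale link
    elif link_path.exists():
        raise FileExistsError(
            f"{link_path} exists and is not a symlink; move its contents first"
        )
    link_path.parent.mkdir(parents=True, exist_ok=True)
    link_path.symlink_to(external_dir, target_is_directory=True)
    return True


if __name__ == "__main__":
    # Hypothetical paths: the volume name and subfolder are placeholders.
    linked = link_to_external(
        Path.home() / "openclaw" / "models",
        Path("/Volumes/JedExternal/openclaw/models"),
    )
    print("linked" if linked else "external drive not mounted; left as-is")
```

Refusing to overwrite a real directory is deliberate: existing models should be moved to the external drive first rather than silently shadowed by a link.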

We can live without it for Phase 1 and add during Phase 2 if storage gets tight — but buy it before month 4.

What you gain from the Thunderbolt 5 + 10Gb Ethernet

These aren't just specs — they change the architecture:

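10 Gb Ethernet + Tailscale: your Mac's network port will never be the bottleneck when talking to our cluster. Real-world speed is capped by your internet uplink, so the gain shows up on large transfers (rendered video, image batches, audio), while token-sized model calls are dominated by the model's own response time.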
Thunderbolt 5 (up to 80 Gb/s for data): the external NVMe above. Note that Apple silicon Macs do not support eGPUs, so an RTX card in an enclosure is not an option; if Shred explodes and you want local image generation, it would run on the M4 Max's built-in GPU or stay on our cluster.

Token-cost math with this setup

Typical month for 3 ventures at Jed's described cadence:

Task	Volume	Model	Cost
Product descriptions (Shred + FFF + Floor)	200 drafts × 3K tokens	Qwen 32B local	$0
Daily social posts + captions	90 × 2K tokens	Qwen 7B local	$0
Plaud transcript ingestion + tagging	30 hours audio, ~50K tokens each	Qwen 32B local + Whisper local	$0
Influencer scrape → summary (20 per vertical × 3)	60 analyses × 10K tokens	Qwen 32B local	$0
Image generation (product shots, graphics)	100 images	ComfyUI on our geo	$0 (bundled)
Voice samples for reels	30 samples	F5-TTS on our geo2	$0 (bundled)
Hard reasoning (architectural decisions, legal)	~20 calls × 20K tokens	Claude Opus via OpenRouter	~$6
HeyGen avatar videos	10 videos	HeyGen API (our account)	Bundled

Net token cost to Jed: ~$6/month. Everything else is either local (free) or pre-paid in the retainer.
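The ~$6 figure follows from the hard-reasoning row. The sketch below reproduces it; the per-million-token rate is an assumption chosen because it matches the figure in the table, not a number taken from a current price sheet, and real pricing differs between input and output tokens.

```python
calls_per_month = 20
tokens_per_call = 20_000

# Assumption: a flat $15 per million tokens; actual rates vary by
# model version and are higher for output tokens than for input.
price_per_million_usd = 15.0

monthly_tokens = calls_per_month * tokens_per_call
monthly_cost = monthly_tokens / 1_000_000 * price_per_million_usd
print(f"{monthly_tokens:,} tokens/month -> ~${monthly_cost:.2f}")
```

If most of those tokens turn out to be model output rather than input, the real bill would be higher, so treat ~$6 as a floor.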

Compared to doing this without CertiHomes infra:

Claude Sonnet for 500K tokens/mo = ~$7.50
Midjourney = $30
HeyGen Creator = $29
Voicebox = $60
Hosting your own GPU box to run 70B models = $2000 upfront + $40/mo power
Savings: ~$160–200/mo + $2K upfront, all folded into the retainer.
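The savings range can be checked by summing the line items above. This is a quick sketch with the listed prices taken at face value.

```python
# Monthly costs (USD) copied from the comparison list above.
monthly_usd = {
    "Claude Sonnet (500K tokens)": 7.50,
    "Midjourney": 30.00,
    "HeyGen Creator": 29.00,
    "Voicebox": 60.00,
    "GPU box power": 40.00,
}
UPFRONT_USD = 2000  # one-time cost of a self-hosted GPU box

monthly_total = sum(monthly_usd.values())
print(f"~${monthly_total:.2f}/mo plus ${UPFRONT_USD} upfront")
```

The listed items come to about $166/mo, which is the low end of the quoted range.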

What we change in the install

Updating install-jed-stack.sh:

- Hermes 4 70B                # won't fit in 36GB
+ Qwen 2.5 32B (4-bit via MLX)     # 19GB, fits with headroom
  Qwen 2.5 14B                # keep as fallback
+ Qwen 2.5 7B (Q4 MLX)        # worker tier
+ Whisper medium (MLX)        # local STT
  Nomic embeddings            # unchanged
+ Symlink ~/openclaw/ targets to external TB5 drive when present
+ Tailscale with pre-issued auth key for geo/geo2 access

Every other step (MCP servers, security baseline, Mission Control, Discord, backups) stays the same.

Next steps before Phase 1 kickoff

You: order the 2TB Thunderbolt 5 NVMe (or tell me if you'd rather skip it; we adapt).
Me: update the installer to the M4-Max-sized model list above.
Me: provision your Tailscale auth key so you land on our cluster at install-time.
You + Me: 15-minute call to confirm the split (what's local vs ours) feels right.

That 36 GB number isn't a blocker — it just routes the heavy work to infrastructure you're already paying for.

— Krish