Jed's Mac Studio M4 Max — What Runs Where
Hardware confirmed: M4 Max · 14-core CPU (10P + 4E) · 32-core GPU · 16-core Neural Engine · 36GB unified memory @ 410 GB/s · 512GB SSD · 4× Thunderbolt 5 · 10Gb Ethernet
TL;DR — the 36GB memory and 512GB SSD change the local model lineup and push more of the heavy work to our server cluster. That's actually a feature: less running hot on your Mac, longer machine life, and you use our already-paid-for GPU compute for token-heavy jobs.
What runs on YOUR Mac (local, zero token cost, zero internet)
| Layer | Choice for 36GB | RAM used | Why |
|---|---|---|---|
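| CEO reasoning | Qwen 2.5 32B (4-bit MLX quant) | ~19 GB | Strong reasoning, runs natively on Apple silicon via MLX, fits with headroom. The model supports up to 128K context, but the KV cache grows with context, so plan on roughly 8–16K tokens while the rest of the stack is loaded. |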
| Worker / draft | Qwen 2.5 7B (Q4 MLX) | ~5 GB | Fast passes — product blurb drafts, summaries, tagging. |
| Embeddings | Nomic embed-text-v1.5 | ~1 GB | Vector index for Plaud, lessons.md, venture docs. |
| Speech-to-text | Whisper medium (MLX) | ~2 GB | Transcribes Plaud overflow, meeting recordings. |
| Vector DB | LanceDB | <100 MB | Stores embeddings, searchable by every CEO. |
| Mission Control | Next.js dev server | ~500 MB | Your dashboard. |
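Total working set: ~28 GB if everything is loaded at once. That leaves ~8 GB for macOS, the browser, Claude Code, and the 32B model's KV cache, which grows with context length. Comfortable at moderate context lengths; very long prompts will push the Mac into swap.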
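To sanity-check the memory budget, it helps to split a model's footprint into weights (roughly parameters × bits per weight) and the KV cache, which grows linearly with context length. The sketch below assumes ~4.5 effective bits per weight for a 4-bit quant (the group scales add overhead), an fp16 cache, and the published Qwen2.5-32B shape of 64 layers, 8 KV heads, and head dimension 128. Treat the outputs as estimates, not measurements.

```python
# Rough memory estimate for running a quantized LLM in 36 GB of unified memory.
# Assumptions (not from the install script): ~4.5 effective bits per weight for a
# 4-bit quant, fp16 KV cache, Qwen2.5-32B shape of 64 layers / 8 KV heads / head_dim 128.

GB = 1e9  # decimal gigabytes, matching how the table quotes sizes


def weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of the quantized weights in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GB


def kv_cache_gb(tokens: int, layers: int = 64, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_value: int = 2) -> float:
    """KV cache size: one key and one value vector per layer, KV head, and token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens / GB


if __name__ == "__main__":
    print(f"Qwen 2.5 32B weights: {weight_gb(32.5):.1f} GB")
    print(f"Llama 3.3 70B weights: {weight_gb(70.6):.1f} GB")
    for ctx in (8_192, 32_768, 131_072):
        print(f"32B KV cache at {ctx:>7} tokens: {kv_cache_gb(ctx):.1f} GB")
```

Under these assumptions a 128K-token context would need about 34 GB of KV cache on its own, which is why very long contexts and a 36 GB budget don't mix, and why a 4-bit 70B model (~40 GB of weights) doesn't fit locally.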
Inference speed you can expect on M4 Max (32-core GPU):
- Qwen 2.5 32B: ~15–20 tokens/sec (still faster than you can read)
- Qwen 2.5 7B: ~60–80 tokens/sec
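These speeds can be bounded from memory bandwidth: generating each token reads essentially all of a dense model's weights once, so bandwidth divided by weight size caps tokens per second. This is a back-of-envelope sketch; the weight sizes assume ~4.5 effective bits per parameter for the 4-bit quants.

```python
# Back-of-envelope ceiling for token generation on Apple silicon.
# Each decoded token reads (roughly) all model weights from memory once, so
# memory bandwidth / weight size bounds tokens per second. Real runs land below
# this because of KV-cache reads, compute, and framework overhead.

BANDWIDTH_GBPS = 410.0  # M4 Max (14-core CPU / 32-core GPU) memory bandwidth, GB/s


def max_tokens_per_sec(weight_gb: float, bandwidth_gbps: float = BANDWIDTH_GBPS) -> float:
    """Upper bound on decode speed for a dense model whose weights total weight_gb."""
    return bandwidth_gbps / weight_gb


if __name__ == "__main__":
    # Weight sizes assume ~4.5 effective bits per parameter (4-bit quant + scales).
    for name, gb in [("Qwen 2.5 32B", 18.3), ("Qwen 2.5 7B", 4.3)]:
        print(f"{name}: <= {max_tokens_per_sec(gb):.0f} tokens/sec")
```

At 410 GB/s the ceiling is about 22 tokens/sec for the 32B model and about 95 for the 7B, so sustained 32B speeds will sit somewhat below 22 once KV-cache reads and overhead are counted.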
What runs on OUR servers (via Tailscale, access is included)
You get authenticated access to the CertiHomes GPU cluster, and the third-party APIs below run on our accounts. Your Mac never does this work, and you never get a cloud bill: it's bundled in your retainer.
| Task | Lives on | Why it's not local |
|---|---|---|
| Frontier reasoning (Claude Opus, GPT-5, Gemini 2.5) | Our pooled OpenRouter key | Use sparingly — only when local 32B can't crack it. ~10-15% of workload. |
| 70B+ class open-weight models (Hermes, Llama 3.3 70B) | geo2.tlcengine.com (RTX 5090) | A 4-bit 70B needs ~40 GB for the weights alone, more than your 36 GB. We run it for you. |
| Image generation (ComfyUI / Stable Diffusion XL) | geo.tlcengine.com:8188 | Much faster on a dedicated server GPU, and keeps your Mac's GPU free for the local LLMs. |
| Virtual staging (empty room → furnished) | geo.tlcengine.com:8002 | AI model too heavy for your machine. |
| Voice cloning / avatars | geo2.tlcengine.com (F5-TTS, LivePortrait, Kokoro) | Multi-model pipeline; 32GB VRAM. |
| HeyGen avatar videos | HeyGen API (our account) | Third-party; we bundle the quota. |
| Voicebox high-fidelity TTS | Voicebox API (our account) | Third-party; overflow from F5-TTS on geo2. |
| Video rendering at scale (Remotion batch) | geo.tlcengine.com | CPU/GPU intensive, fits better on our box. |
| Nominatim / Pelias geocoders | geo.tlcengine.com | Replaces Google Maps API. |
| ClaudeTube (YouTube deep-analysis) | geo.tlcengine.com | Downloads, Whisper transcription, and frame extraction. |
| Firecrawl / Brave Search | Our pooled API keys | Quota bundled. |
How you access these: Your CEO agent issues a job, it goes over Tailscale to our cluster, result streams back to your Mac. Same as it being local, except the GPU work happens on our side.
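The local-first, escalate-when-needed split in the tables above can be expressed as a small routing function. This is a hypothetical sketch: the task names, model identifiers, and the `local_failed` escalation flag are illustrative, and the real CEO agent may decide differently.

```python
# Hypothetical routing sketch: which side handles a job.
# Task names, model identifiers, and the escalation flag are illustrative
# assumptions based on the tables above, not the actual agent code.
from dataclasses import dataclass

LOCAL = {  # task -> local model (all run on the Mac)
    "draft": "qwen2.5-7b",
    "summary": "qwen2.5-7b",
    "tagging": "qwen2.5-7b",
    "reasoning": "qwen2.5-32b",
    "embedding": "nomic-embed-text-v1.5",
    "transcription": "whisper-medium",
}
CLUSTER = {  # task -> cluster endpoint reached over Tailscale
    "image_generation": "geo.tlcengine.com:8188",  # ComfyUI
    "virtual_staging": "geo.tlcengine.com:8002",
    "voice_clone": "geo2.tlcengine.com",
    "llm_70b": "geo2.tlcengine.com",
}


@dataclass
class Route:
    where: str   # "local", "cluster", or "frontier"
    target: str  # model name or endpoint


def route(task: str, local_failed: bool = False) -> Route:
    """Send GPU-heavy jobs to the cluster, prefer local models, escalate last."""
    if task in CLUSTER:
        return Route("cluster", CLUSTER[task])
    if task not in LOCAL:
        raise ValueError(f"unknown task: {task}")
    if local_failed:
        # e.g. the 32B model couldn't crack a hard reasoning task
        return Route("frontier", "openrouter")
    return Route("local", LOCAL[task])


if __name__ == "__main__":
    for t in ("draft", "reasoning", "image_generation"):
        print(t, "->", route(t))
    print("reasoning (escalated) ->", route("reasoning", local_failed=True))
```

Keeping escalation as an explicit flag means the frontier tier is only reached after a local attempt, which is what keeps it to a small share of the workload.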
The storage conversation (512GB is tight)
Budget once everything lands on your Mac:
- macOS + apps: ~80 GB
- Two local models (32B + 7B): ~30 GB
- Whisper + Nomic + MCP runtimes: ~5 GB
- LanceDB indexes (3 ventures, growing): ~20 GB year-1
- Plaud transcript archive + embeddings: ~15 GB year-1
- Daily logs (3 ventures × 365 files): ~3 GB/year
- Product/brand asset raw files (photos, reels, video b-roll): this is the wildcard — easily 200+ GB
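Adding up the list shows how tight the internal drive is. The sketch uses 200 GB (the low end of the asset estimate) and the advertised 512 GB capacity; macOS reports somewhat less usable space.

```python
# Year-1 storage budget from the list above, in GB. The asset figure is the
# low end of the "200+ GB" estimate; real usage may run higher.
BUDGET_GB = {
    "macOS + apps": 80,
    "local models (32B + 7B)": 30,
    "Whisper + Nomic + MCP runtimes": 5,
    "LanceDB indexes": 20,
    "Plaud archive + embeddings": 15,
    "daily logs": 3,
    "brand/product assets": 200,
}
INTERNAL_SSD_GB = 512  # advertised capacity; usable space is somewhat lower


def summarize(budget: dict[str, int], capacity: int) -> tuple[int, int]:
    """Return (total planned GB, GB remaining on the drive; negative means over)."""
    total = sum(budget.values())
    return total, capacity - total


if __name__ == "__main__":
    without_assets = {k: v for k, v in BUDGET_GB.items() if k != "brand/product assets"}
    for label, b in [("without assets", without_assets), ("with assets", BUDGET_GB)]:
        total, left = summarize(b, INTERNAL_SSD_GB)
        print(f"{label}: {total} GB planned, {left} GB left")
```

Even at the low end this comes to 353 GB, about 70% of the drive in year 1, before leaving room for swap, macOS updates, and the asset library's growth.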
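Recommendation: one external Thunderbolt 5 NVMe, 2TB. TB5 links run at 80 Gb/s (120 Gb/s one-way with Bandwidth Boost, which mainly benefits displays), and TB5 NVMe enclosures reach up to about 6 GB/s, close enough to the internal SSD that models load without a noticeable penalty. Models and LanceDB go on it; macOS and apps stay on the internal drive.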
- Good picks: OWC Envoy Ultra TB5 2TB, or Samsung T9 as a cheaper fallback (the T9 is USB 3.2 Gen 2x2, not Thunderbolt, and runs at 10 Gb/s on a Mac, roughly 1 GB/s).
- Cost: $200–350.
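- Setup: format it APFS, move the model and LanceDB directories from ~/openclaw/ onto the external drive, and leave symlinks at the old paths. Symlinks add no measurable overhead, and 2TB leaves plenty of headroom.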
We can live without it for Phase 1 and add during Phase 2 if storage gets tight — but buy it before month 4.
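The external-drive setup amounts to moving a directory and leaving a symlink at the old path, which is what the installer's "symlink when present" step needs to do. Below is a Python sketch of that logic (the installer itself is a shell script); the volume name `/Volumes/OpenClawTB5` and the `models`/`lancedb` subdirectory names are hypothetical placeholders.

```python
# Hypothetical sketch of the "move to external drive, leave a symlink behind"
# step. The volume name and subdirectory names are assumptions; adjust them to
# match the real layout under ~/openclaw/.
import shutil
from pathlib import Path


def relocate(src: Path, dest_root: Path) -> Path:
    """Move src into dest_root and replace it with a symlink pointing there."""
    if src.is_symlink():
        return src.resolve()  # already relocated; nothing to do
    if not dest_root.is_dir():
        raise FileNotFoundError(f"external drive not mounted: {dest_root}")
    target = dest_root / src.name
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")
    shutil.move(str(src), str(target))  # copies across volumes, then deletes src
    src.symlink_to(target, target_is_directory=True)
    return target


if __name__ == "__main__":
    external = Path("/Volumes/OpenClawTB5/openclaw")  # hypothetical volume name
    home = Path.home() / "openclaw"
    for name in ("models", "lancedb"):  # hypothetical subdirectory names
        if (home / name).exists() and external.is_dir():
            print(name, "->", relocate(home / name, external))
```

It skips paths that are already symlinks and refuses to overwrite an existing target, so re-running the step is safe.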
What you gain from the Thunderbolt 5 + 10Gb Ethernet
These aren't just specs — they change the architecture:
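- 10 Gb Ethernet + Tailscale: your Mac's own link stops being the bottleneck, so transfers to our cluster run as fast as your internet connection allows. That matters most for bulky jobs (audio uploads, image batches, video renders). It won't make frontier-model calls dramatically faster, since their latency is dominated by the model generating tokens rather than by network speed.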
- Thunderbolt 5 (80 Gb/s, up to 120 Gb/s with Bandwidth Boost): the external NVMe above. Apple silicon Macs don't support external GPUs, so an eGPU isn't an upgrade path. If Shred explodes and you want more image generation, the Mac's 32-core GPU can run it locally at modest speed, or we scale it on our cluster.
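Whether a faster port helps depends on the slowest hop between your Mac and our cluster. A rough estimate divides the data size by the bottleneck link speed; the link speeds and the 80% efficiency factor for protocol and VPN overhead below are assumptions, not measurements.

```python
# Rough transfer-time estimate across a chain of network hops. The slowest hop
# sets the pace, so a 10 Gb/s port on the Mac only helps when the rest of the
# path (home router, ISP uplink, the cluster's link) is also fast.
# Link speeds and the efficiency factor are illustrative assumptions.


def transfer_seconds(size_gb: float, hop_gbps: list[float], efficiency: float = 0.8) -> float:
    """Seconds to move size_gb gigabytes through hops with the given Gb/s rates."""
    bottleneck_gbps = min(hop_gbps) * efficiency  # protocol + VPN overhead
    return size_gb * 8 / bottleneck_gbps  # gigabytes -> gigabits


if __name__ == "__main__":
    batch_gb = 5.0  # e.g. a batch of raw b-roll
    print("10 GbE Mac, 1 Gb/s ISP:", round(transfer_seconds(batch_gb, [10, 1, 10])), "s")
    print("10 GbE Mac, 5 Gb/s fiber:", round(transfer_seconds(batch_gb, [10, 5, 10])), "s")
    print("1 GbE Mac, 5 Gb/s fiber:", round(transfer_seconds(batch_gb, [1, 5, 10])), "s")
```

Under these assumptions, 5 GB takes about 50 seconds over a 1 Gb/s internet connection regardless of the Mac's port, and about 10 seconds on multi-gig fiber, where the 10 Gb port is what keeps the Mac from becoming the bottleneck.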
Token-cost math with this setup
Typical month for 3 ventures at Jed's described cadence:
| Task | Volume | Model | Cost |
|---|---|---|---|
| Product descriptions (Shred + FFF + Floor) | 200 drafts × 3K tokens | Qwen 32B local | $0 |
| Daily social posts + captions | 90 × 2K tokens | Qwen 7B local | $0 |
| Plaud transcript ingestion + tagging | 30 hours audio, ~50K tokens each | Qwen 32B local + Whisper local | $0 |
| Influencer scrape → summary (20 per vertical × 3) | 60 analyses × 10K tokens | Qwen 32B local | $0 |
| Image generation (product shots, graphics) | 100 images | ComfyUI on our geo | $0 (bundled) |
| Voice samples for reels | 30 samples | F5-TTS on our geo2 | $0 (bundled) |
| Hard reasoning (architectural decisions, legal) | ~20 calls × 20K tokens | Claude Opus via OpenRouter | ~$6 |
| HeyGen avatar videos | 10 videos | HeyGen API (our account) | Bundled |
Net token cost to Jed: ~$6/month. Everything else is either local (free) or pre-paid in the retainer.
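The ~$6 figure for hard-reasoning calls is easy to reproduce. This sketch assumes Claude Opus list pricing of $15 per million input tokens and $75 per million output tokens; rates differ between model versions, so check current pricing before relying on it.

```python
# Cost arithmetic for the "hard reasoning" row. Prices are assumptions
# (USD per million tokens): $15 input / $75 output, Claude Opus list pricing.


def cost_usd(input_tokens: int, output_tokens: int,
             in_price_per_m: float = 15.0, out_price_per_m: float = 75.0) -> float:
    """Cost of a batch of calls given total input and output tokens."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000


if __name__ == "__main__":
    calls = 20
    # 20 calls x 20K tokens, all billed as input:
    print(f"input only: ${cost_usd(calls * 20_000, 0):.2f}")
    # Same calls, each also returning ~2K output tokens (assumption):
    print(f"with output: ${cost_usd(calls * 20_000, calls * 2_000):.2f}")
```

The ~$6 matches 400K tokens billed entirely as input; if each call also produces around 2K output tokens (an assumption), the row comes to about $9.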
Compared to doing this without CertiHomes infra:
- Claude Sonnet for 500K tokens/mo = ~$7.50
- Midjourney = $30
- HeyGen Creator = $29
- Voicebox = $60
- Hosting your own GPU box to run 70B models = $2000 upfront + $40/mo power
- Savings: ~$160–200/mo plus $2K upfront, all absorbed by the retainer.
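The recurring comparison costs sum as follows; this simply re-adds the line items above and keeps the $2,000 GPU box as a one-time cost.

```python
# Stand-alone monthly costs from the comparison list above (USD).
# The GPU box's upfront price is tracked separately from the recurring total.
MONTHLY = {
    "Claude Sonnet, 500K tokens": 7.50,
    "Midjourney": 30.0,
    "HeyGen Creator": 29.0,
    "Voicebox": 60.0,
    "GPU box power": 40.0,
}
UPFRONT = {"GPU box for 70B models": 2000.0}

monthly_total = sum(MONTHLY.values())
upfront_total = sum(UPFRONT.values())
print(f"recurring: ${monthly_total:.2f}/mo, upfront: ${upfront_total:,.0f}")
```

The recurring items total $166.50/month, the low end of the ~$160–200 range.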
What we change in the install
Updating install-jed-stack.sh:
- Hermes 4 70B # won't fit in 36GB
+ Qwen 2.5 32B (4-bit MLX) # ~19GB, fits with headroom
Qwen 2.5 14B # keep as fallback
+ Qwen 2.5 7B (Q4 MLX) # worker tier
+ Whisper medium (MLX) # local STT
Nomic embeddings # unchanged
+ Symlink ~/openclaw/ targets to external TB5 drive when present
+ Tailscale with pre-issued auth key for geo/geo2 access
Every other step (MCP servers, security baseline, Mission Control, Discord, backups) stays the same.
Next steps before Phase 1 kickoff
- You: order the 2TB Thunderbolt 5 NVMe (or tell me if you'd rather skip it; we adapt).
- Me: update the installer to the M4-Max-sized model list above.
- Me: provision your Tailscale auth key so you land on our cluster at install-time.
- You + Me: 15-minute call to confirm the split (what's local vs ours) feels right.
That 36 GB number isn't a blocker — it just routes the heavy work to infrastructure you're already paying for.
— Krish