What running local AI on a Mac Mini actually taught me: 7 things the tutorials, YouTube and ChatGPT all skipped
I admit it: I got caught up in the FOMO around OpenClaw. The last straw was watching a Mac Mini diligently execute build tasks at a startup house while its owner monitored it passively during a fireside chat we were both attending, delivering an evangelical take on how good the experience was despite a few gotchas. So I went home convinced I was seeing the future of AI, if not today then tomorrow, and did a little research (courtesy of ChatGPT and YouTube). I put a plan together and ordered a Mac Mini - a VPS would have worked too, but I wanted the physical device experience - then used the 48 hours until pickup at the Apple Store as further planning time, again with ChatGPT and YouTube. There's a lot out there about OpenClaw beyond its multiple name changes, and the horror stories in particular made me want a setup that was as secure as possible, without blowing my budget on Claude and ChatGPT API tokens, while still enjoying the awe of what a more autonomous AI experience could lead to.
I've now been running a local AI shadow-testing system on a Mac Mini M4 for two days. I'll be honest: I initially ignored the setup wizard's advice to prefer Sonnet for most tasks, it got expensive quickly, and, as many others have, I capitulated and dialled down to Sonnet 4.6. But I came into this with a plan for that too, because even that model will build up an API bill worth noticing over time.
The idea: route tasks to local Ollama models ($0/run) in parallel with Claude, evaluate the outputs automatically, and promote the models that match Claude's quality, so that over time my daily costs drop as low as reasonably possible.
The execution involved hitting seven infrastructure walls I didn't expect.
Most of these aren't in any tutorial because they only show up when you're running something 24/7, not when you're demoing it. Here's what I actually ran into.
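The promotion loop behind that idea can be sketched roughly like this - the names (`ModelStats`, `should_promote`, the threshold and run-count values) are my own placeholders, not the actual system:

```python
from dataclasses import dataclass, field

@dataclass
class ModelStats:
    # Rolling similarity scores (0.0 - 1.0) between a local model's
    # outputs and Claude's outputs on the same shadow-run tasks.
    scores: list = field(default_factory=list)

def should_promote(stats: ModelStats, threshold: float = 0.9,
                   min_runs: int = 20) -> bool:
    # Promote a $0/run local model only after enough shadow runs
    # have consistently matched the paid model's quality.
    if len(stats.scores) < min_runs:
        return False
    return sum(stats.scores) / len(stats.scores) >= threshold

stats = ModelStats(scores=[0.95] * 25)
print(should_promote(stats))  # True: 25 runs averaging 0.95
```

The point of the `min_runs` guard is that a handful of lucky matches shouldn't flip routing away from the paid model.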
1. Your Mac is probably sleeping every hour
macOS's default power settings are designed for laptops, not servers. On a fresh Mac Mini, displaysleep defaults to 60 minutes and sleep to 1: the display goes dark after an hour of inactivity, and one minute later the whole machine follows. That's the machine going offline roughly once an hour whenever nothing's happening.
When the system sleeps, if networkoversleep is off (which it is by default), the network goes down. Any request that arrives while the machine is asleep gets dropped. No error, no log, no notification - nothing. I discovered this when someone sent a Telegram message and didn't get a response until they physically walked over and moved the mouse. The Mac Mini had done this 29 times since the last reboot.
The fix is a single LaunchAgent running caffeinate -s -i:
<key>ProgramArguments</key>
<array>
<string>/usr/bin/caffeinate</string>
<string>-s</string>
<string>-i</string>
</array>
<key>KeepAlive</key>
<true/>
-s prevents system sleep while on AC power. -i prevents idle sleep. No admin rights needed. KeepAlive: true means launchd restarts it if it ever exits. Display sleep still works - the screen goes off, the machine stays on.
Check your current settings with pmset -g | grep "^\ *sleep". If the output shows neither 0 nor sleep prevented by caffeinate, your daemon has been silently dropping requests.
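For reference, the full LaunchAgent around that fragment is small - this is a minimal complete version (the label com.local.caffeinate is my own placeholder; pick your own):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.local.caffeinate</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/caffeinate</string>
        <string>-s</string>
        <string>-i</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
```

Save it under ~/Library/LaunchAgents/ and load it with launchctl load.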
2. launchctl setenv doesn't affect your LaunchAgent
This one cost me a day.
If your LaunchAgent plist has its own <EnvironmentVariables> block, that block takes precedence over anything you set with launchctl setenv. The two environments don't merge. The process only sees what's in its own plist block.
I was trying to inject a 1Password service account token so the gateway could read secrets. I ran launchctl setenv OP_SERVICE_ACCOUNT_TOKEN $TOKEN, confirmed it was set in the session, restarted the gateway, and the gateway had no idea the variable existed. The plist's own EnvironmentVariables block overrides everything.
Bake variables directly into the plist instead:
/usr/libexec/PlistBuddy -c \
"Add :EnvironmentVariables:MY_SECRET string $VALUE" \
~/Library/LaunchAgents/my.service.plist
Run launchctl unload and launchctl load after any plist change. launchctl setenv is fine for testing in a shell session. For persistent daemons it does nothing useful.
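If you'd rather script the injection than chain PlistBuddy one-liners, the stdlib plistlib does the same thing - a minimal sketch, with the path and variable names as placeholders (and you still need the unload/load cycle afterwards):

```python
import plistlib
from pathlib import Path

def bake_env_var(plist_path: Path, name: str, value: str) -> None:
    # Write the variable into the plist's own EnvironmentVariables
    # block - the only environment the LaunchAgent process will see.
    with open(plist_path, "rb") as f:
        plist = plistlib.load(f)
    plist.setdefault("EnvironmentVariables", {})[name] = value
    with open(plist_path, "wb") as f:
        plistlib.dump(plist, f)

# e.g. bake_env_var(Path.home() / "Library/LaunchAgents/my.service.plist",
#                   "OP_SERVICE_ACCOUNT_TOKEN", token)
```

setdefault keeps any variables already baked into the block instead of clobbering them.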
3. Non-admin users and Homebrew don't mix
I run the AI workload under a standard (non-admin) macOS user account - good for security, bad for Homebrew. /opt/homebrew/Cellar is owned by the admin account, and brew install fails with permission errors.
For most CLI tools this is solvable: GitHub releases provide static binaries you can drop in ~/.local/bin with no package manager involved. I installed jq, rg, ffmpeg, himalaya, and a few others this way. The GitHub releases API at /repos/{owner}/{repo}/releases/latest makes it scriptable.
The real problem is tools that genuinely need Homebrew and have no standalone binary. For those you need to either use the admin account or find an alternative. I replaced Chroma (a Python vector database) with an in-process vector store partly for this reason. Fewer system-level dependencies means fewer permission problems.
If you're setting up a Mac for this kind of work, figure out up front which account owns Homebrew and which one runs the daemons. I assumed they were the same. They weren't.
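A sketch of how scripting against that releases endpoint might look. The JSON shape (assets, name, browser_download_url) is GitHub's documented format; the glob-matching policy and the example data are my own:

```python
import fnmatch

def pick_asset(release: dict, pattern: str):
    # `release` is the parsed JSON from
    # /repos/{owner}/{repo}/releases/latest; return the download URL
    # of the first asset whose filename matches the glob pattern.
    for asset in release.get("assets", []):
        if fnmatch.fnmatch(asset["name"], pattern):
            return asset["browser_download_url"]
    return None

release = {"assets": [
    {"name": "rg-14.1.0-x86_64-linux.tar.gz",
     "browser_download_url": "https://example.com/linux.tar.gz"},
    {"name": "rg-14.1.0-aarch64-apple-darwin.tar.gz",
     "browser_download_url": "https://example.com/macos.tar.gz"},
]}
print(pick_asset(release, "*apple-darwin*"))  # https://example.com/macos.tar.gz
```

From there it's a fetch, an untar, and a copy into ~/.local/bin - no package manager, no admin account.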
4. Changing your Node.js binary path silently revokes macOS permissions
I have a gateway process running on Node.js. The plist specifies the full binary path. I have two Node installations - Homebrew Cellar and NVM - and switched between them.
macOS TCC (Transparency, Consent, and Control) ties disk access permissions to the binary path, not the process name. Changing from /opt/homebrew/Cellar/node@24/24.14.0/bin/node to ~/.nvm/versions/node/v24.14.0/bin/node looked identical to me and completely different to macOS. The gateway lost access to the workspace directory and other protected locations it had previously been granted.
Getting those permissions back required interactive approval at the machine. Not remotely. Not headlessly.
Once a daemon's binary path is set and permissions are granted, don't change the path without understanding TCC. If you need to, do it while sitting at the machine.
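Since TCC keys grants off the exact binary path, a cheap guard is to record the resolved path that was current when permissions were granted and fail loudly at startup if it has drifted. A sketch - the record file and its location are my own convention, not anything macOS provides:

```python
import os
from pathlib import Path

def binary_path_unchanged(current_binary: str, record: Path) -> bool:
    # TCC permissions are tied to the binary path, not the process
    # name. Remember the path we were granted under; a mismatch later
    # means permissions have probably been silently revoked.
    resolved = os.path.realpath(current_binary)
    if not record.exists():
        record.write_text(resolved)  # first run: remember this path
        return True
    return record.read_text() == resolved

# e.g. at daemon startup:
# if not binary_path_unchanged(sys.executable,
#                              Path.home() / ".config/granted_binary_path"):
#     raise SystemExit("binary path changed - re-grant TCC permissions")
```

It won't get the permissions back for you, but it turns a silent revocation into an immediate, explained failure.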
5. Python 3.14 breaks anything still using Pydantic V1
I chose Python 3.14.3 because I wanted to run on current stable Python. The cost: anything that depends on Pydantic V1's BaseSettings fails at import:
AttributeError: module 'pydantic' has no attribute 'BaseSettings'
Chroma 1.5.1 hit this immediately. I replaced it with an in-process vector store - Voyage embeddings and numpy - which actually worked out better, since it used the same embedding model as my semantic similarity scorer. Consistent vector space across retrieval and evaluation. (More on Voyage in section 7.)
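An in-process store like that is little more than a matrix of embeddings and a cosine-similarity search. A minimal sketch with numpy - the class, shapes, and names here are my own, and the embedding call (Voyage, in my setup) is assumed to happen elsewhere:

```python
import numpy as np

class VectorStore:
    # Keep all embeddings in one matrix; search is a single matmul.
    def __init__(self, dim: int):
        self.vectors = np.empty((0, dim))
        self.texts: list[str] = []

    def add(self, text: str, embedding: np.ndarray) -> None:
        v = embedding / np.linalg.norm(embedding)  # normalise once, at insert
        self.vectors = np.vstack([self.vectors, v])
        self.texts.append(text)

    def search(self, query: np.ndarray, k: int = 3) -> list[str]:
        q = query / np.linalg.norm(query)
        scores = self.vectors @ q                  # cosine similarity
        top = np.argsort(scores)[::-1][:k]
        return [self.texts[i] for i in top]
```

Not a replacement for a real vector database at scale, but for a single-machine evaluation loop it removes an entire service and its dependency tree.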
qdrant-client 1.17.0 had a separate breaking change in the same window: client.search() removed, replaced with client.query_points(). The old method just didn't exist anymore.
Check your key dependencies against your Python version before committing to either. I learned that one mid-sprint.
6. A syntax error in your LaunchAgent script fails silently for hours
If a Python script used by a LaunchAgent has a syntax error, the process exits with code 256 (the raw wait status, which decodes to exit code 1). The LaunchAgent logs show the exit code. If the crash happens before your own logging initialises, there's no other trace.
I put an IndentationError into the shadow accumulator - a dict entry accidentally placed outside the closing }. The LaunchAgent kept firing every 20 minutes. Every run exited with 256. I didn't notice for six hours. Lost about 18 accumulated runs.
Two fixes. First, verify syntax before committing any script a LaunchAgent will run:
python3 -c "import run_accumulate"
Second, log process startup as the absolute first line:
import logging, os
logging.basicConfig(level=logging.INFO)
logging.info("starting (pid=%d)", os.getpid())
If you don't see that line in the logs, the crash happened before your code ran. Import error, syntax error, missing dependency - you know immediately where to look.
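If you'd rather catch syntax errors without executing the module's top-level code (which `import` does), the stdlib py_compile module does a compile-only check. A sketch - wiring it into a pre-commit hook or deploy script is up to you:

```python
import py_compile
import tempfile

def syntax_ok(path: str) -> bool:
    # Compile without importing - unlike `python3 -c "import mod"`,
    # this never runs the script, so side effects are impossible.
    try:
        py_compile.compile(path, doraise=True)
        return True
    except py_compile.PyCompileError:
        return False

# Demo: a file with exactly the kind of IndentationError that bit me.
bad = tempfile.NamedTemporaryFile(suffix=".py", delete=False, mode="w")
bad.write("def f():\nreturn 1\n")
bad.close()
print(syntax_ok(bad.name))  # False
```

Gate the LaunchAgent's script path behind a check like this and the six-hour silent failure becomes a one-second local one.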
7. Free-tier embedding APIs will quietly corrupt your evaluation data
Voyage AI's free tier was capped at 3 RPM and 10K TPM, at least when I was on it. If you're using an embedding API for RAG and running tasks faster than the free tier allows, calls start failing silently.
If you've implemented a fallback (I use TF-IDF cosine similarity as backup), those runs complete. But their similarity scores come from a different, lower-quality metric. If you're not tagging which backend produced each score, you'll have a mixed dataset with no way to separate it later. I tagged every trace, so I can filter. I added that tagging after I noticed the rate limit errors - not before.
Any external API with rate limits is a data quality risk in an evaluation system, not just a performance one. Either add a payment method before you start collecting data you care about, or build your fallback to produce comparable scores and tag every run.
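The tagging pattern itself is simple: every score travels with the name of the backend that produced it, so mixed data can be filtered later. A sketch - the scorer functions and backend tags are stand-ins, not the actual Voyage/TF-IDF code:

```python
def score_with_fallback(a: str, b: str, primary, fallback):
    # Return (score, backend_tag). The tag travels with every score,
    # so numbers from the lower-quality fallback metric can be
    # separated out later instead of silently polluting the dataset.
    try:
        return primary(a, b), "voyage"
    except Exception:
        return fallback(a, b), "tfidf"

def rate_limited(a, b):
    # Stand-in for an embedding API call hitting its rate limit.
    raise RuntimeError("429 Too Many Requests")

score, backend = score_with_fallback("run A", "run B",
                                     rate_limited, lambda a, b: 0.42)
print(backend)  # tfidf
```

Store the tag next to the score in every trace; filtering becomes a one-line query instead of a forensic exercise.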
None of this is exotic. Every problem here has a documented fix. The issue is none of them announce themselves - they show up as "why isn't this working" or, worse, "why does the data look slightly off."
The Mac Mini M4 with 24GB unified memory is good hardware for this. I run 10 Ollama models (44GB total) and the ones that fit in memory are fast enough. The hardware isn't the problem. The macOS daemon environment has assumptions baked in that don't match what you need for a persistent AI service.
Check your sleep settings. Bake your env vars directly into the plist. Pin your dependencies before you touch Python versions. And log the first line of every daemon - if you can't see that line in the logs, you don't know what killed it.
If you'd rather not start from scratch, the actual working files from this setup are in a small GitHub repo - the caffeinate plist, the env injection script, the daemon template with startup logging, and a few others. MIT licensed, take what's useful: github.com/reddinft/macos-ai-daemon-toolkit