Hacker NewsSaturday · June 13, 2026FREE

How to setup a local coding agent on macOS

llama.cppgemma-4local-aimacoscoding-agent

Kyle Howells created a guide for running a local coding agent on macOS after experiencing internet outages that left him without remote coding agents. The final setup uses llama.cpp built with Metal, the Gemma 4 26B-A4B model in GGUF format (Q4_K_XL, ~16 GB), a Q8 MTP draft model for speculative decoding, the Gemma 4 multimodal projector, and Pi as the terminal coding agent. Testing was done on an Apple M1 Max with 64 GB unified memory running macOS 15.7.7. Baseline generation speed with the main model alone was 58.2 tokens/second. Adding the MTP draft model and tuning the draft token count (--spec-draft-n-max) yielded a best result of 72.2 tokens/second with 3 draft tokens, a 24% speedup. Prompt processing speed remained nearly unchanged at around 295-299 tokens/second. The author also tested MLX-LM for comparison but did not provide a complete result in the excerpt. The setup supports an OpenAI-compatible API and can handle screenshots/images, making it suitable for coding agent workflows.

// why it matters

Local coding agents can operate without internet, improving reliability for developers.

Sources

Primary · Hacker News
▸ Read original at ikyle.me

Like this? Get the next digest.