Running an LLM inference server from the menu bar — the idea itself is very "Mac."
omlx gained 1,362 new stars this week, totaling 14.3k. 1,204 forks means not just watching, but actually trying.
Three Core Technical Points
SSD Caching: Apple Silicon's unified memory architecture has an inherent bottleneck — when large model weights don't fit in RAM, performance drops off a cliff. omlx uses SSD as a second-level cache, dramatically reducing model load/unload costs. Simply put: a 16GB MacBook can now run 30B parameter models, with switch speeds in an acceptable range.
Continuous Batching: Not a new concept — vLLM did this first. But bringing continuous batching to menu-bar tool level on Mac — omlx might be the first. Multiple concurrent requests are efficiently scheduled instead of queuing serially.
Menu Bar Management: Seems minor but is actually the key UX differentiator. Not opening a terminal to run commands, but controlling model loading, switching, and monitoring from the menu bar. Whether local inference tools get adopted by non-technical users depends more on this interaction detail than benchmark numbers in papers.
Comparison with Ollama / LM Studio
Three mainstream choices for Mac users running LLMs locally:
| Tool | Core Positioning | Interface | SSD Cache | Continuous Batching |
|---|---|---|---|---|
| Ollama | General inference server | CLI + API | ❌ | ✅ (partial) |
| LM Studio | Desktop GUI | GUI | ❌ | ❌ |
| omlx | Menu bar server | Menu bar + API | ✅ | ✅ |
omlx's differentiation is clear: the only tool combining SSD caching and continuous batching on Mac.
But Ollama's ecosystem advantage is massive — model support, community docs, tool integration. What omlx needs to catch up on isn't technology, it's ecosystem.
Practical Experience Inference
Without running omlx on an M-series chip, here's what the architecture suggests:
- 16GB M2/M3: SSD caching is critical. Without it, models above 7B barely run. With it, 13B-30B are possible, speed depending on SSD read/write.
- 32GB+ M2/M3 Max: RAM is sufficient, SSD cache's marginal benefit drops, but continuous batching still valuable for concurrent API requests.
- M4 Ultra level: RAM is plentiful, omlx's value is more in menu bar interaction and API service.
Why It's Worth Following
omlx doesn't solve "can it run" — Ollama already proved Mac running LLMs is feasible. It solves "can it be used comfortably."
Menu bar interaction lowers the usage barrier, SSD caching extends the upper limit of runnable models, continuous batching improves throughput in service scenarios. These three points together point to a clear product direction: making Mac a first-class citizen for local AI inference.
14.3k stars shows people are buying into this direction. But how many of the 1,204 forks are actually using it in production remains to be seen.
If Apple announces more about Neural Engine for LLM inference at the next WWDC, tools like omlx will see their value amplified further.