AtomGradient ⚡

Bringing AI to the Edge — We build open research and products for on-device AI inference on Apple Silicon and consumer hardware.

We believe powerful AI should be private, fast, and free from cloud dependency. All our research is open-source.

Project	Description	Highlights
speculative-moe-research	Does speculative decoding help Mixture-of-Experts? Empirical study on Qwen3.5-35B-A3B	1.30x MoE speedup · <4% acceptance · batch verification amortization
apple-silicon-llm-inference	Efficient on-device LLM inference: from quantization to speculative decoding	Q6_K Pareto-optimal · +25.7% SD throughput · 7 quant levels benchmarked
Prism	Cross-domain personal data integration on consumer hardware	1.48x IIR · 125.5x federation compression · 49.9 TPS (35B on M2 Ultra)
hybird-batch-prefill-on-ane	ANE batch prefill for on-device parallel LLM inference	11.3x prefill speedup · 79% power reduction · <30ms state transfer
hybrid-ane-mlx-bench	Disaggregated LLM inference on Apple Silicon	ANE matches GPU at ~410 tokens · 282x power reduction
swift-qwen3-tts	On-device text-to-speech (Qwen3 TTS 0.6B, native Swift)	67% compression · RTF 0.68x · 12 languages
swift-gemma-cli	On-device vision language model (Gemma 3 4B)	25% compression · 110 tok/s · 3.4x image speedup
OptMLX	MLX memory optimization research	20x mmap speedup · zero-copy model loading

Edge Inference — Running large models on consumer Apple Silicon (ANE + GPU hybrid pipelines)
Model Compression — Quantization, pruning, and distillation for on-device deployment
Privacy-First AI — All computation local, zero data leakage
Open Research — Reproducible benchmarks and open-source implementations