gguf
Here are 775 public repositories matching this topic...
Distribute and run LLMs with a single file.
Updated Jun 9, 2026 - C++
⚡ Pure-Rust WebGPU inference engine — OpenAI-API compatible, GGUF native, runs on any GPU. No Python. No llama.cpp. Single binary.
Updated Jun 10, 2026 - Rust
Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.
Updated Jun 10, 2026 - Python
Maid is a free and open source application for interfacing with llama.cpp models locally, and with Anthropic, DeepSeek, Ollama, Mistral and OpenAI models remotely.
Updated Apr 7, 2026 - TypeScript
Hands-on Ollama: deploy and run large models on a CPU. Read online at: https://datawhalechina.github.io/handy-ollama/
Updated Jan 15, 2026 - Jupyter Notebook
The Swiss Army Knife of Offline AI. Chat, Speak, and Generate Images - Privacy First, Zero Internet. Download an LLM and use it on your mobile device. No data ever leaves your phone. Supports text-to-text, vision, text-to-image
Updated Jun 9, 2026 - TypeScript
LLM Agent Framework in ComfyUI. Includes MCP server, Omost, GPT-sovits, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes; provides access to Feishu and Discord; and adapts to all LLMs with OpenAI- or aisuite-style interfaces, such as o1, ollama, gemini, grok, qwen, GLM, deepseek, kimi, and doubao. Also adapted to local LLMs, VLMs, and GGUF models such as llama-3.3 and Janus-Pro, with graphRAG linkage.
Updated Mar 8, 2026 - Python
Run AI models locally on your machine with Node.js bindings for llama.cpp. Enforce a JSON schema on the model output at the generation level.
Updated Jun 7, 2026 - TypeScript
A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.
Updated Jun 10, 2026 - Python
Interface for OuteTTS models.
Updated Mar 23, 2026 - Python
An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifact.
Updated Jun 9, 2026 - Go
A CLI to estimate inference memory requirements for Hugging Face models, written in Python.
Updated May 18, 2026 - Python
Local AI app and inference engine for agents. Run open-weight LLMs locally — private, 100% offline on your computer.
Updated Jun 10, 2026 - TypeScript
Llama 3+ inference in pure Java
Updated Apr 24, 2026 - Java
Private on-device AI suite for Android. Fork of Google AI Edge Gallery with llama.cpp, whisper.cpp, stable-diffusion.cpp, GGUF import, voice chat, vision AI, on-device image generation, biometric lock, encrypted history, and CPU/NPU/GPU acceleration.
Updated Jun 10, 2026 - Kotlin