gguf

Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.

python cli ai gpu inference benchmarks command-line-tool vram huggingface apple-silicon llm local-llm ollama gguf

Updated Jun 10, 2026
Python

janhq / cortex.cpp

Local AI API Platform

onnx onnxruntime llamacpp gguf

Updated Jul 4, 2025
C++

Mobile-Artificial-Intelligence / maid

Maid is a free and open source application for interfacing with llama.cpp models locally, and with Anthropic, DeepSeek, Ollama, Mistral and OpenAI models remotely.

android facebook chatbot openai llama mistral claude chatgpt anthropic llama-cpp ollama gguf mobile-artificial-intelligence deepseek

Updated Apr 7, 2026
TypeScript

datawhalechina / handy-ollama

Learn Ollama hands-on and deploy large models on a CPU. Read online: https://datawhalechina.github.io/handy-ollama/

agent tutorial rag large-language-models llm langchain llamaindex ollama gguf

Updated Jan 15, 2026
Jupyter Notebook

alichherawalla / off-grid-mobile-ai

The Swiss Army Knife of Offline AI. Chat, Speak, and Generate Images - Privacy First, Zero Internet. Download an LLM and use it on your mobile device. No data ever leaves your phone. Supports text-to-text, vision, and text-to-image.

privacy-first edge-ai ondevice mobile-ai llama-cpp local-ai offline-llm gguf stable-diffusion-android offline-ai whisper-android tool-calling ondevice-ai

Updated Jun 9, 2026
TypeScript

heshengtao / comfyui_LLM_party

An LLM agent framework in ComfyUI that includes an MCP server and Omost, GPT-SoVITS, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes; provides access to Feishu and Discord; and adapts to all LLMs with OpenAI- or aisuite-style interfaces, such as o1, Ollama, Gemini, Grok, Qwen, GLM, DeepSeek, Kimi, and Doubao. Also adapted to local LLMs, VLMs, and GGUF models such as Llama-3.3 and Janus-Pro, with GraphRAG linkage.

linux agent flux workflow ocr mcp gemini openai llama vlm dify o1 comfyui ollama gguf gpt-sovits graphrag omost janus-pro

Updated Mar 8, 2026
Python

withcatai / node-llama-cpp

Run AI models locally on your machine with Node.js bindings for llama.cpp. Enforce a JSON schema on the model output at the generation level.

Updated Jun 7, 2026
TypeScript

sammcj / gollama

Go manage your Ollama models

macos linux ai models tui llm ggml ollama gguf

Updated Dec 30, 2025
Go

intel / auto-round

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

transformers rounding quantization omni int4 diffusers llms vllm gguf vlms sglang mxfp4 nvfp4

Updated Jun 10, 2026
Python

edwko / OuteTTS

Interface for OuteTTS models.

text-to-speech transformers tts llama gguf

Updated Mar 23, 2026
Python

kitops-ml / kitops

An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifact.

Updated Jun 9, 2026
Go

alvarobartt / hf-mem

A CLI to estimate inference memory requirements for Hugging Face models, written in Python.

huggingface safetensors gguf hf-extension

Updated May 18, 2026
Python

AtomicBot-ai / Atomic-Chat

Local AI app and inference engine for agents. Run open-weight LLMs locally — private, 100% offline on your computer.

Updated Jun 10, 2026
TypeScript

eastriverlee / LLM.swift

LLM.swift is a simple and readable library that allows you to interact with large language models locally with ease for macOS, iOS, watchOS, tvOS, and visionOS.

macos swift ios tvos watchos llm llm-inference visionos gguf

Updated Dec 6, 2025
C++

mukel / llama3.java

Llama 3+ inference in pure Java

java transformers simd openai llama huggingface llm llms chatgpt llamacpp genai llm-inference gguf llama3

Updated Apr 24, 2026
Java

jegly / Box

Private on-device AI suite for Android. Fork of Google AI Edge Gallery with llama.cpp, whisper.cpp, stable-diffusion.cpp, GGUF import, voice chat, vision AI, on-device image generation, biometric lock, encrypted history, and CPU/NPU/GPU acceleration.

Updated Jun 10, 2026
Kotlin

Here are 775 public repositories matching this topic...

AlexsJones / llmfit

mozilla-ai / llamafile

LostRuins / koboldcpp

Michael-A-Kuykendall / shimmy

Andyyyy64 / whichllm
