Learning Efficient Convolutional Networks through Network Slimming, In ICCV 2017.
Flux diffusion model implementation using quantized fp8 matmul; the remaining layers use faster half-precision accumulation, which is ~2x faster on consumer devices.
d3LLM: Ultra-Fast Diffusion LLM 🚀
Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan et al. 2023.
[NeurIPS'23] Speculative Decoding with Big Little Decoder
🔥 Blazingly fast ML inference server powered by Rust and Burn framework
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
Demo code for CVPR2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers"
An implementation of the encoder-decoder transformer for SMILES-to-SMILES translation tasks with inference accelerated by speculative decoding
Fast Forward-Only Deep Neural Network Library for the Nao Robots
AI-powered legal assistant for Brazilian lawyers, built with Groq to deliver fast, accurate insights and document support.
AudioMuse-AI-DCLAP is a lightweight, high-speed distilled version of LAION CLAP, designed for fast and efficient text-to-music search
Verification of the effect of speculative decoding on Japanese text.
Reproducibility Project for [NeurIPS'23] Speculative Decoding with Big Little Decoder
Multi-label fast-inference classifiers (Ridge Regression and MLP) for NLP tasks, with a sentence embedder, K-fold cross-validation, bootstrap, and boosting. NOTE: since the MLP (fully connected NN) classifier was too heavy to upload, you can compile it yourself with the provided script.
Image captioning model using a DETR-inspired architecture.
Fast MLX port of ZeroEntropy zerank-2 cross-encoder reranker. 10x faster than PyTorch MPS on Apple Silicon. bf16, validated.
A simple toxicity detector.
High-performance TUI dashboard to benchmark LLM latencies across free-tier providers and instantly hot-swap models for OpenCode agents.
Repository for Discord bot using Cerebras API inference
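Several of the repositories above implement speculative decoding (Leviathan et al. 2023): a cheap draft model proposes several tokens, and the expensive target model verifies them in one pass, accepting the agreeing prefix. A minimal greedy-variant sketch, with toy next-token functions standing in for the draft and target models (all names here are illustrative, not from any repo above):

```python
# Minimal sketch of greedy speculative decoding (Leviathan et al. 2023).
# The "models" are toy next-token functions standing in for a small draft
# model and a large target model, so the control flow is easy to follow.

def draft_model(context):
    # Toy draft: predicts the next integer token (cheap, sometimes wrong).
    return (context[-1] + 1) % 10

def target_model(context):
    # Toy target: the ground truth we want the output to match exactly.
    nxt = (context[-1] + 1) % 10
    return 7 if nxt == 5 else nxt  # disagrees with the draft at one point

def speculative_decode(context, n_tokens, k=3):
    out = list(context)
    while len(out) - len(context) < n_tokens:
        # 1) Draft proposes k tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_model(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Target verifies each position (one batched pass in practice);
        #    keep the longest prefix where draft and target agree.
        accepted, ctx = [], list(out)
        for t in proposal:
            if target_model(ctx) != t:
                break
            accepted.append(t)
            ctx.append(t)
        # 3) Commit the accepted prefix, then take one token from the
        #    target so progress is guaranteed even on total rejection.
        out.extend(accepted)
        out.append(target_model(out))
    return out[len(context):][:n_tokens]
```

In the greedy variant the output is token-for-token identical to decoding with the target model alone; the speedup comes from the target verifying k draft tokens per pass instead of generating one token at a time.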