Inference Recipes

A collection of practical recipes for deploying and accelerating LLM and multimodal inference.

Repo: https://gitcode.com/cann/cann-recipes-infer

Featured Recipes

| Card | Level | Description | Link |
| --- | --- | --- | --- |
| GPT-OSS | Beginner | 20B single-card / 120B 8-card inference | https://gitcode.com/cann/cann-recipes-infer/-/blob/master/models/gpt-oss/README.md |
| LongCat-Flash | Intermediate | Low-latency inference with core-count control and weight prefetching | https://gitcode.com/cann/cann-recipes-infer/-/blob/master/models/longcat-flash/README.md |
| HunyuanImage-3.0 | Intermediate | CFG / VAE parallelism + operator fusion | https://gitcode.com/cann/cann-recipes-infer/-/blob/master/models/hunyuan-image-3.0/README.md |
| Qwen3-MoE | Intermediate | Inference adaptation for Atlas A3, with TP / EP deployment | https://gitcode.com/cann/cann-recipes-infer/-/blob/master/models/qwen3-moe/README.md |
| HunyuanVideo | Intermediate | xDiT inference with Ulysses / RingAttention / TeaCache | https://gitcode.com/cann/cann-recipes-infer/-/blob/master/models/hunyuan-video/README.md |
| Wan2.2-I2V | Intermediate | Transformers inference adaptation | https://gitcode.com/cann/cann-recipes-infer/-/blob/master/models/wan2.2-i2v/README.md |
| DeepSeek-R1 / Kimi-K2 | Intermediate | Low-latency and high-throughput deployment with DP / TP+SP / EP | https://gitcode.com/cann/cann-recipes-infer/-/blob/master/models/deepseek-r1/README.md |
| Kimi-K2-Thinking | Advanced | 256K long-sequence inference with native W4A16 quantization | https://gitcode.com/cann/cann-recipes-infer/-/blob/master/models/kimi-k2-thinking/README.md |
| DeepSeek-V3.2-Exp | Advanced | CP parallelism + large-scale EP parallelism, with operator fusion and multi-stream optimization | https://gitcode.com/cann/cann-recipes-infer/-/blob/master/models/deepseek-v3.2-exp/README.md |