Linux Kernel Architecture USB

OpenBMB/infllmv2_cuda_impl

This repository contains the optimized CUDA kernel implementation for InfLLM V2's Two-Stage Sparse Attention Mechanism. Our implementation provides high-performance kernels for both Stage 1 (Top-K ...
