Ceva-NeuPro-M is a scalable neural processing unit (NPU) IP for edge AI applications, targeting SoC integration. It is optimized for workloads involving transformers, vision transformers (ViT), and generative AI models. A single Ceva-NeuPro-M core scales from 4 to 200 TOPS (tera operations per second), and multi-core clusters can exceed 2,000 TOPS. The NPU delivers up to 3,500 tokens per second per watt on models such as Llama 2. It comprises multi-MAC parallel neural computing engines, activation and sparsity control units, programmable vector-processing units, and local shared memory. Target edge AI applications include automotive, infrastructure, mobile, and PC. An accompanying tool suite supports hardware implementation, model optimization, and runtime module composition.
Ceva / www.ceva-ip.com