Origin E2
Balanced Performance for AI Inference
On-device AI is a must-have for many new designs. Silicon architects look for solutions that support the latest AI technologies, such as transformers and Stable Diffusion, while balancing performance and low power consumption with minimal latency.
Ideal for Edge AI
The Origin™ E2 is a family of power- and area-optimized NPU IP cores designed for devices like smartphones and edge nodes. It supports video (at resolutions up to 4K and beyond), audio, and text-based neural networks, including public, custom, and proprietary networks.
Innovative Architecture
The Origin E2 neural engine uses Expedera’s unique packet-based architecture, which is far more efficient than common layer-based architectures. The architecture enables parallel execution across multiple layers, achieving better resource utilization and deterministic performance. It also eliminates the need for hardware-specific optimizations, allowing customers to run their trained neural networks unchanged, without reducing model accuracy. This innovative approach greatly increases performance while lowering power, area, and latency.
Choose the Features You Need
Customization brings many advantages, including increased performance, lower latency, reduced power consumption, and the elimination of dark silicon waste. Expedera works with customers during the design stage to understand their use case(s), PPA goals, and deployment needs. Using this information, we configure Origin IP to create a customized solution that precisely fits the application.
Market-Leading 18 TOPS/W
Sustained power efficiency is key to successful AI deployments. Continually cited as one of the most power-efficient architectures in the market, Origin NPU IP achieves a market-leading, sustained 18 TOPS/W.
Efficient Resource Utilization
Origin IP scales from GOPS to 128 TOPS in a single core. The architecture eliminates the memory sharing, security, and area penalty issues faced by lower-performing, tiled AI accelerator engines. Origin NPUs achieve sustained utilization averaging 80%—compared to the 20-40% industry norm—avoiding dark silicon waste.
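To make the utilization claim concrete, a short sketch of the arithmetic (the 16 TOPS peak figure below is an illustrative assumption, not an Expedera spec; the utilization values come from the text above):

```python
def effective_tops(peak_tops: float, utilization: float) -> float:
    """Effective throughput is peak compute capacity scaled by sustained utilization."""
    return peak_tops * utilization

# Hypothetical 16 TOPS peak engine (illustrative value only):
sustained_80 = effective_tops(16.0, 0.80)   # 80% sustained utilization (Origin claim)
sustained_30 = effective_tops(16.0, 0.30)   # midpoint of the 20-40% industry norm

# At the same peak TOPS, an 80%-utilized engine does >2.5x the useful work.
assert sustained_80 / sustained_30 > 2.5
```

The same ratio holds at any peak capacity, which is why sustained utilization, rather than headline peak TOPS, is the better predictor of delivered performance.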
Full TVM-Based Software Stack
Origin uses a full TVM-based software stack. TVM is widely trusted and used by OEMs worldwide. The easy-to-use software allows importing trained networks and provides various quantization options along with automatic completion, compilation, estimation, and profiling tools. It also supports multi-job APIs.
Successfully Deployed in 10M Devices
Quality is key to any successful product. Origin IP has been successfully deployed in over 10 million consumer devices, with designs in multiple leading-edge nodes.
Use Case
A Better Smartphone User Experience
One of the world's leading smartphone manufacturers wanted to deploy a 4K video low-light denoising AI algorithm on its next-generation platform. Its current-generation NPU could process only a few frames per second (FPS) and wasn’t up to the task. The manufacturer selected Expedera’s Origin NPU IP because it exceeded all expectations and outperformed every other NPU they evaluated. It increased FPS by 20X while using less than half the power of the former NPU, improving PPA by 40X and enabling the manufacturer to deliver a competitively differentiated smartphone. Origin’s impressive performance gains and power efficiencies resulted from its efficient architecture and use-case customizations. The manufacturer now includes Origin IP in a series of successful products.
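The 40X figure follows directly from the two gains stated in the case study; a quick check of the arithmetic (my calculation, not Expedera's published methodology):

```python
fps_gain = 20.0    # 20X more frames per second (from the case study)
power_ratio = 0.5  # less than half the power of the former NPU

# Performance-per-watt improvement: throughput gain divided by relative power draw.
ppa_improvement = fps_gain / power_ratio
assert ppa_improvement == 40.0  # matches the 40X PPA figure cited above
```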
Compute Capacity | 0.5K to 10K INT8 MACs |
Multi-tasking | Run Multiple Simultaneous Jobs |
Power Efficiency | 18 TOPS/W effective; no pruning, sparsity or compression required (though supported) |
Example Networks Supported | ResNet, MobileNet, MobileNet SSD, Inception V3, RNN-T, BERT, EfficientNet, FSRCNN, CPN, CenterNet, U-Net, YOLO V3, YOLO V5, ShuffleNet V2, others |
Example Performance | MobileNet V1 (224 x 224): 8750 IPS, 13,482 IPS/W (N7 process, 1 GHz, no sparsity/pruning/compression applied) |
Layer Support | Standard NN functions, including Conv, Deconv, FC, Activations, Reshape, Concat, Elementwise, Pooling, Softmax, and others. Programmable general FP functions, including Sigmoid, Tanh, Sine, Cosine, Exp, and others; custom operators supported. |
Data Types | INT4/INT8/INT10/INT12/INT16 activations/weights; FP16/BFloat16 activations/weights |
Quantization | Channel-wise quantization (TFLite specification); software toolchain supports Expedera, customer-supplied, or third-party quantization |
Latency | Deterministic performance guarantees, no back pressure |
Frameworks | TensorFlow, TFLite, ONNX, others supported |
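For context, the MobileNet V1 figures in the table imply a sub-watt power draw. A back-of-the-envelope check (my arithmetic, assuming IPS/W is simply IPS divided by watts):

```python
ips = 8750           # inferences per second (from the table)
ips_per_watt = 13482 # power efficiency (from the table)

# Implied power draw for this workload: throughput / efficiency.
implied_watts = ips / ips_per_watt
assert 0.6 < implied_watts < 0.7  # roughly 0.65 W at 1 GHz on N7
```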