Origin E8
AI Inference for the Highest Performing Systems
From data centers to autonomous cars, the most demanding AI applications need high-performance NPUs with the lowest possible latency. With its highly customizable architecture, the Origin™ E8 delivers performance that scales to 128 TOPS in a single core and PetaOps with multiple cores.
A Top Performer
The Origin E8 is a family of NPU IP inference cores designed for the most performance-intensive applications, including automotive and data centers. With its ability to run multiple networks concurrently with zero penalty context switching, the E8 excels when high performance, low latency, and efficient processor utilization are required. Unlike other IPs that rely on tiling to scale performance—introducing associated power, memory sharing, and area penalties—the E8 offers single-core performance of up to 128 TOPS, delivering the computational capability required by the most advanced LLM and ADAS implementations.
Innovative Architecture
The Origin E8 neural engine uses Expedera’s unique packet-based architecture, which is far more efficient than common layer-based architectures. The architecture enables parallel execution across multiple layers, achieving better resource utilization and deterministic performance. It also eliminates the need for hardware-specific optimizations, allowing customers to run their trained neural networks unchanged without reducing model accuracy. This innovative approach greatly increases performance while lowering power, area, and latency.
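To make the utilization argument concrete, the toy Python model below contrasts running layers strictly one at a time against packing work from several layers onto the same pool of compute units. The unit count and per-layer work figures are invented for illustration, and the model is a deliberate simplification, not a description of Expedera's actual packet-based scheduler.

```python
# Toy illustration (not Expedera's scheduler): why executing work from several
# layers concurrently can raise utilization versus running one layer at a time
# on a fixed pool of compute units. All numbers here are invented.

UNITS = 64  # hypothetical number of parallel compute units

# Hypothetical per-layer work: (units the layer can keep busy, cycles per unit)
layers = [(64, 100), (16, 100), (48, 100), (8, 100)]

def layer_serial_utilization(layers, units):
    """Run layers back to back; units idle during narrow layers are wasted."""
    busy_cycles = sum(min(width, units) * cycles for width, cycles in layers)
    total_cycles = sum(cycles for _, cycles in layers) * units
    return busy_cycles / total_cycles

def cross_layer_utilization(layers, units):
    """Idealized packing: work from several layers shares the units, so the
    same busy cycles complete in fewer wall-clock cycles."""
    busy_cycles = sum(min(width, units) * cycles for width, cycles in layers)
    wall_clock = -(-busy_cycles // units)  # ceiling division
    return busy_cycles / (wall_clock * units)

print(f"layer-serial utilization: {layer_serial_utilization(layers, UNITS):.0%}")
print(f"cross-layer utilization:  {cross_layer_utilization(layers, UNITS):.0%}")
```

In this toy setup the serial schedule wastes the units left idle by narrow layers (roughly 53% utilization), while the packed schedule keeps nearly every unit busy for the same total work.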
Choose the Features You Need
Customization brings many advantages, including increased performance, lower latency, reduced power consumption, and the elimination of dark silicon waste. Expedera works with customers during the design stage to understand their use case(s), PPA goals, and deployment needs. Using this information, we configure Origin IP to create a customized solution that perfectly fits the application.
Market-Leading 18 TOPS/W
Sustained power efficiency is key to successful AI deployments. Continually cited as one of the most power-efficient architectures in the market, Origin NPU IP achieves a market-leading, sustained 18 TOPS/W.
Efficient Resource Utilization
Origin IP scales from GOPS to 128 TOPS in a single core. The architecture eliminates the memory sharing, security, and area penalty issues faced by lower-performing, tiled AI accelerator engines. Origin NPUs achieve sustained utilization averaging 80%—compared to the 20-40% industry norm—avoiding dark silicon waste.
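As a rough illustration of how these figures combine, the arithmetic below derives effective throughput from peak TOPS and utilization, and the power draw implied at the quoted efficiency. Treating the 18 TOPS/W figure as applying at this exact operating point is an assumption made only for this back-of-the-envelope example.

```python
# Back-of-the-envelope arithmetic using the figures quoted above. Assuming the
# quoted 18 TOPS/W holds at this operating point is an illustration only.

peak_tops = 128         # single-core peak performance
utilization = 0.80      # sustained utilization cited above
tops_per_watt = 18      # sustained efficiency cited above

effective_tops = peak_tops * utilization        # throughput actually delivered
implied_watts = effective_tops / tops_per_watt  # power implied at that throughput

print(f"effective throughput: {effective_tops:.1f} TOPS")  # 102.4 TOPS
print(f"implied power draw:   {implied_watts:.1f} W")      # ~5.7 W
```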
Full TVM-Based Software Stack
Origin uses a full TVM-based software stack; TVM is widely trusted and used by OEMs worldwide. This easy-to-use software allows the importing of trained networks and provides various quantization options, automatic completion, compilation, and estimator and profiling tools. It also supports multi-job APIs.
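As a minimal sketch of what a generic TVM-based flow looks like, the snippet below imports an ONNX model into Relay, compiles it, and runs it with the graph executor. The model path, input name and shape, and the llvm target are placeholders; Expedera's actual toolchain, targets, and APIs are not shown here.

```python
# Minimal sketch of a generic TVM import/compile/run flow (not Expedera's
# toolchain; the model file, input name/shape, and target are placeholders).
import onnx
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Load a trained network exported to ONNX (path is a placeholder).
onnx_model = onnx.load("resnet50.onnx")
shape_dict = {"input": (1, 3, 224, 224)}  # input name and shape assumed

# Import the model into TVM's Relay IR without retraining or modification.
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# Compile for a target; "llvm" (CPU) is used here purely as a stand-in.
target = "llvm"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# Run inference with the graph executor.
dev = tvm.device(target, 0)
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input("input", np.random.rand(1, 3, 224, 224).astype("float32"))
module.run()
out = module.get_output(0).numpy()
print("output shape:", out.shape)
```

A vendor NPU backend would typically plug into a flow like this through its own TVM target or bring-your-own-codegen integration rather than the llvm target used above.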
Successfully Deployed in 10M Devices
Quality is key to any successful product. Origin IP has been successfully deployed in over 10 million consumer devices, with designs in multiple leading-edge nodes.
Use Case
Custom Network Automotive Deployments
An electric vehicle OEM aimed to develop its own ADAS processor, optimized to run both standard and internally developed neural networks, with the flexibility to deploy new networks post-silicon. After testing more than ten platforms from multiple vendors, the OEM rated Expedera’s ASIL-B readiness-certified Origin IP as the “best-in-market.” The evaluation covered input resolutions up to 8K and compute capacities ranging from 100 to 500 TOPS, with a focus on the lowest deterministic latency and DDR memory bandwidth.
Specifications
Compute Capacity | 16K to 64K INT8 MACs
Multi-tasking | Run >10 simultaneous jobs
Power Efficiency | 18 TOPS/W effective; no pruning, sparsity, or compression required (though supported)
Example Networks Supported | Llama2-7B, YOLO v3, YOLO v5, RetinaNet, Panoptic DeepLab, PlainLite, ResNeXt, ResNet 50, Inception V3, RNN-T, MobileNet V1, MobileNet SSD, BERT, EfficientNet, FSRCNN, CPN, CenterNet, U-Net, ShuffleNet v2, Swin, SSD-ResNet34, DETR, others
Example Performance | YOLO v3 (608 x 608): 626 IPS, 115.6 IPS/W (N7 process, 1 GHz, no sparsity/pruning/compression applied)
Layer Support | Standard NN functions, including Conv, Deconv, FC, Activations, Reshape, Concat, Elementwise, Pooling, Softmax, and others; programmable general FP functions, including Sigmoid, Tanh, Sine, Cosine, Exp, and others; custom operators supported
Data Types | INT4/INT8/INT10/INT12/INT16 activations/weights; FP16/BFloat16 activations/weights
Quantization | Channel-wise quantization (TFLite specification); software toolchain supports Expedera, customer-supplied, or third-party quantization (see the sketch following this table)
Latency | Deterministic performance guarantees, no back pressure
Frameworks | TensorFlow, TFLite, ONNX, others supported
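The channel-wise quantization named above follows the TFLite per-axis convention: symmetric INT8 weights with one scale per output channel and a zero point of 0. The NumPy sketch below illustrates that convention on an arbitrary weight tensor; it is an illustration of the scheme, not Expedera's toolchain code, and the tensor shape and helper names are invented.

```python
# Minimal sketch of channel-wise (per-axis) INT8 quantization in the style of
# the TFLite specification: one scale per output channel, zero_point = 0.
# Illustrative only; not Expedera's toolchain code.
import numpy as np

def quantize_per_channel(weights: np.ndarray, axis: int = 0):
    """Symmetric per-channel INT8 quantization: one scale per output channel."""
    # The maximum absolute value along each channel sets that channel's scale.
    reduce_axes = tuple(i for i in range(weights.ndim) if i != axis)
    max_abs = np.max(np.abs(weights), axis=reduce_axes, keepdims=True)
    scales = max_abs / 127.0
    q = np.clip(np.round(weights / scales), -127, 127).astype(np.int8)
    return q, np.squeeze(scales)

def dequantize_per_channel(q: np.ndarray, scales: np.ndarray, axis: int = 0):
    """Recover approximate real values: real = int8 * scale (zero_point is 0)."""
    shape = [1] * q.ndim
    shape[axis] = -1
    return q.astype(np.float32) * scales.reshape(shape)

# Example: a conv weight tensor with shape (out_channels, in_channels, kh, kw).
w = np.random.randn(8, 3, 3, 3).astype(np.float32)
q, scales = quantize_per_channel(w)
w_hat = dequantize_per_channel(q, scales)
print("max reconstruction error:", np.max(np.abs(w - w_hat)))
```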