Wednesday, 19 November 2025
Deploying YOLOv8 on Rockchip NPU Using RKNN
Rockchip's NPU platform introduces a complete toolchain built around the RKNN model format.
RKNN files, typically ending in .rknn, are optimized models designed specifically for
Rockchip NPU hardware. To support developers, Rockchip provides a full suite of model conversion
tools, including Python APIs for RKNN-Toolkit2 along with C/C++ and Python runtime interfaces on
device. Together, these components streamline the deployment of deep-learning algorithms from desktop
development environments to embedded edge devices powered by Rockchip SoCs.

Overview of the Rockchip NPU Software Stack
The Rockchip NPU SDK is divided into two major parts: the desktop-side tools and the device-side runtime.
On a PC, developers use RKNN-Toolkit2 to convert mainstream deep-learning models, including Caffe,
TensorFlow, TensorFlow Lite, ONNX, DarkNet, and PyTorch models, into the RKNN format.
The toolkit also enables simulation-based inference, performance analysis, and memory usage evaluation
directly on the desktop, making it easy to estimate deployment results before transferring models to hardware.
On the target board, Rockchip provides a runtime environment consisting of a set of C APIs, Python
bindings, communication drivers, and essential executables. This runtime environment ensures that RKNN
models generated by RKNN-Toolkit2 can execute efficiently on the NPU. The entire RKNN software stack
is designed to help developers rapidly deploy AI workloads onto Rockchip-based systems.
The following components form the core of the ecosystem:
- RKNN-Toolkit2: A software development kit for converting, simulating, and evaluating AI models on PC or Rockchip NPU platforms.
- RKNN-Toolkit-Lite2: A lightweight Python deployment interface intended for direct use on Rockchip NPU devices, simplifying inference workflows (a minimal on-device sketch follows this list).
- RKNN Runtime: A C/C++ runtime for executing RKNN models directly on Rockchip NPU hardware.
- RKNPU Kernel Driver: The low-level driver responsible for interacting with the NPU hardware. It is open-source and available within Rockchip's kernel repositories.
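
To give a feel for how thin the device-side layer is, here is a minimal sketch of on-device inference with the RKNN-Toolkit-Lite2 Python API. The model and image file names are placeholders, and the preprocessing assumes a 640x640 RGB input:

```python
# Minimal on-device inference sketch using RKNN-Toolkit-Lite2.
# "yolov8n.rknn" and "test.jpg" are placeholder file names.
import cv2
from rknnlite.api import RKNNLite

rknn_lite = RKNNLite()
rknn_lite.load_rknn('./yolov8n.rknn')   # load a model already converted on the PC
rknn_lite.init_runtime()                # initialize the NPU runtime on the board
                                        # (on RK3588, core_mask can pin NPU cores)

img = cv2.imread('./test.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (640, 640))       # match the model's expected input size

outputs = rknn_lite.inference(inputs=[img])  # raw output tensors from the NPU

rknn_lite.release()
```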
Capabilities of RKNN-Toolkit2
RKNN-Toolkit2 provides a convenient Python interface for model conversion and inference on a PC.
With this toolkit, developers can:
- Convert Deep-Learning Models: Convert Caffe, TensorFlow, TensorFlow Lite, ONNX, DarkNet, and PyTorch models into RKNN format. The converted RKNN files can be exported, imported, and deployed on Rockchip NPU devices.
- Quantize Models: Convert floating-point models into fixed-point models using asymmetric quantization (asymmetric_quantized-8). Hybrid quantization is also supported.
- Simulate Inference: Run inference using the RKNN model on a PC to simulate NPU execution, enabling developers to validate outputs before deployment (see the sketch after this list).
- Evaluate Performance: Send the model to an NPU-equipped device to measure execution speed and memory consumption.
- Analyze Quantization Accuracy: Compare layer-by-layer outputs between floating-point and quantized models using cosine similarity, helping identify quantization-sensitive operations.
- Encrypt Models: Encrypt RKNN models using specified security levels. Encryption occurs within the driver, meaning encrypted models load and run identically to normal models.
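
To make the simulation, performance, and accuracy capabilities concrete, the following sketch shows roughly how they look in RKNN-Toolkit2's Python API. It assumes an `rknn` object has already been configured, loaded, and built (a full pipeline example appears later in this article); the paths and the 'rk3588' target name are placeholders:

```python
# Sketch of simulation, on-target evaluation, and accuracy analysis with
# RKNN-Toolkit2. Assumes `rknn` was already configured, loaded, and built.
import cv2

img = cv2.cvtColor(cv2.imread('./calib/img_001.jpg'), cv2.COLOR_BGR2RGB)

# a) Simulate NPU execution on the PC: no target means the software simulator.
rknn.init_runtime()
outputs = rknn.inference(inputs=[img])

# b) Alternatively, initialize against a connected board to measure real speed
#    (run this as a separate session rather than re-initializing in place):
# rknn.init_runtime(target='rk3588', perf_debug=True)
# rknn.eval_perf()

# c) Layer-by-layer cosine similarity between float and quantized outputs,
#    which helps locate quantization-sensitive operations.
rknn.accuracy_analysis(inputs=['./calib/img_001.jpg'], output_dir='./snapshot')
```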
RKNN SDK supports mainstream Rockchip chips such as RK3566, RK3568, RK3576, and RK3588.
A typical development environment requires Ubuntu 20.04 (64-bit), Python 3.8, and at least 16 GB of RAM.
YOLOv8: A SOTA Vision Model Meets Edge Deployment
In January 2023, Ultralytics released YOLOv8, the newest generation of the popular YOLO model
family. Presented as a state-of-the-art, cutting-edge vision framework, YOLOv8 is designed to support
an extensive range of visual AI tasks, including image classification, object detection, instance segmentation,
pose estimation, and even multi-object tracking.
One of YOLOv8's strengths lies in its diversified model lineup. Ultralytics provides five pretrained
variants: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x. These range from lightweight models suitable
for edge devices to high-accuracy variants suited to demanding applications. Compared with YOLOv5 models of
similar size, YOLOv8 achieves significantly higher accuracy on the COCO dataset, thanks to improvements
in architecture, training methodology, loss functions, and its anchor-free design.
Where YOLOv8 Shines
- It offers strong accuracy improvements while still running efficiently on lightweight platforms.
- The command-line interface allows easy training, inference, and model export.
- Its built-in integration with tracking algorithms such as BoT-SORT and ByteTrack enables robust tracking applications (sketched below).
- The FastSAM segmentation approach uses YOLOv8's backbone for generalizable mask-generation tasks.
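
As a quick illustration of the detection and tracking workflows mentioned above, here is a minimal sketch using the Ultralytics Python API; the model checkpoint and media paths are placeholders:

```python
# Minimal YOLOv8 detection and tracking sketch via the Ultralytics Python API.
# "yolov8n.pt", "bus.jpg", and "video.mp4" are placeholder names.
from ultralytics import YOLO

model = YOLO('yolov8n.pt')                # pretrained nano detection model

# Plain single-image detection.
results = model.predict('bus.jpg', conf=0.5)

# Multi-object tracking with ByteTrack over a video, frame by frame.
for result in model.track(source='video.mp4', tracker='bytetrack.yaml', stream=True):
    boxes = result.boxes                  # per-frame boxes carrying persistent track IDs
```

The same operations are available through the `yolo` command-line interface, which is what makes the framework convenient for quick experiments before committing to a deployment pipeline.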
Because of these advantages, YOLOv8 has become a strong candidate for edge-based vision systems such as
robots, surveillance devices, autonomous machines, and embedded smart cameras.
Understanding the Model Deployment Process
Deploying a machine-learning model means placing it in a real operating environment where it can perform
tasks for an actual product or service. During deployment, models often undergo optimization steps such as
operator fusion, graph restructuring, post-training quantization, and knowledge distillation. This ensures
that the model runs efficiently on specialized hardware while meeting power and latency constraints.
From Training Framework to Inference Engine
Most models are initially developed using flexible frameworks like PyTorch or TensorFlow. However,
deployment platforms frequently rely on specialized inference engines such as TensorRT, ONNX Runtime,
or Rockchip's RKNN Runtime to achieve real-time performance.
To bridge these environments, Facebook and Microsoft introduced ONNX in 2017: a universal model
representation format that allows conversion between training and inference frameworks. As a result,
the deployment process typically follows the workflow:
Training Framework → ONNX Representation → Hardware-Specific Inference Engine
Instead of manually rewriting models layer by layer, today's conversion tools rely on graph tracing.
Given a sample input, the tool traces intermediate operations and reconstructs the computational graph,
which becomes the converted model. In RKNN conversion, sample inputs also help determine quantization
parameters.
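
For YOLOv8, the first hop in this workflow is straightforward because Ultralytics ships an export utility that traces the network with a sample input under the hood. A minimal sketch; the opset and image size shown are common choices rather than requirements:

```python
# Sketch: export a YOLOv8 checkpoint to ONNX as the intermediate representation.
# The exporter runs a sample tensor through the network and records the traced
# graph. The opset and image size here are illustrative defaults.
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
model.export(format='onnx', opset=12, imgsz=640)  # writes yolov8n.onnx alongside the .pt
```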
Deploying Models on Rockchip NPU Using RKNN
RKNN is the official inference framework for Rockchip NPUs. It enables optimized execution of
deep-learning models on NPU hardware, ensuring high throughput and low power consumption.
The RKNN-Toolkit2 ecosystem provides Python APIs for:
- model conversion
- post-training quantization
- simulation inference
- validity checking and debugging
A typical RKNN deployment pipeline consists of four steps, sketched in code below:
- RKNN Model Configuration: Specify preprocessing parameters, mean values, quantization type, target platform, and other settings.
- Model Loading: Import the source model (ONNX, PyTorch, TensorFlow, Caffe, etc.). Developers can also designate output nodes, which affects how the model is sliced during conversion.
- RKNN Model Building: Perform quantization if needed and supply calibration datasets.
- Model Export: Save the final .rknn file for deployment on Rockchip hardware.
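
Put together, the four steps map almost one-to-one onto RKNN-Toolkit2 calls. A minimal sketch, assuming a YOLOv8 ONNX file and a calibration list (dataset.txt, one image path per line) are available; the file names, normalization values, and target platform are placeholders:

```python
# Sketch: ONNX -> RKNN conversion pipeline with RKNN-Toolkit2 on the PC.
# File names, normalization values, and 'rk3588' are placeholders.
from rknn.api import RKNN

rknn = RKNN(verbose=True)

# 1) Configuration: preprocessing parameters and target platform.
rknn.config(mean_values=[[0, 0, 0]],
            std_values=[[255, 255, 255]],
            target_platform='rk3588')

# 2) Loading: import the ONNX source model.
rknn.load_onnx(model='./yolov8n.onnx')

# 3) Building: quantize using the calibration image list.
rknn.build(do_quantization=True, dataset='./dataset.txt')

# 4) Export: write the deployable .rknn file.
rknn.export_rknn('./yolov8n.rknn')

rknn.release()
```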
Rockchip RK3576/RK3588 NPU Highlights
Rockchip's high-performance AIoT platforms, particularly the RK3576 and RK3588, feature powerful NPUs
built on advanced 8nm processes. Delivering up to 6 TOPS, these NPUs handle compute-intensive
tasks such as:
- image classification
- object detection and recognition
- face recognition
- speech processing
- natural language understanding
Rockchip NPUs support mainstream deep-learning frameworks including TensorFlow, PyTorch, Caffe, and MXNet,
allowing developers to train models in familiar environments while benefiting from accelerated inference
on embedded devices. Their strong ecosystem makes them well suited for large-scale computer vision
applications such as surveillance, autonomous driving, industrial robotics, and medical imaging.
Conclusion
By combining the flexibility of YOLOv8 with the high-performance capabilities of Rockchip's NPU ecosystem,
developers can build powerful edge-AI applications that operate efficiently even under strict resource
constraints. The RKNN toolchain, spanning RKNN-Toolkit2, RKNN-Toolkit-Lite2, and the RKNN Runtime, provides a
comprehensive workflow from model conversion to deployment. As edge AI continues to evolve, Rockchip's
platform offers a scalable and practical solution for embedding advanced vision algorithms directly
into next-generation devices.
