Wednesday, 19 November 2025
Deploying YOLOv8 on Rockchip NPU Using RKNN
Rockchip's NPU platform introduces a complete toolchain built around the RKNN model format.
RKNN files, typically ending in .rknn, are optimized models designed specifically for
Rockchip NPU hardware. To support developers, Rockchip provides a full suite of model conversion
tools, including Python APIs for RKNN-Toolkit2 along with C/C++ and Python runtime interfaces on
device. Together, these components streamline the deployment of deep-learning algorithms from desktop
development environments to embedded edge devices powered by Rockchip SoCs.

Overview of the Rockchip NPU Software Stack
The Rockchip NPU SDK is divided into two major parts: the desktop-side tools and the device-side runtime.
On a PC, developers use RKNN-Toolkit2 to convert mainstream deep-learning models, including Caffe,
TensorFlow, TensorFlow Lite, ONNX, DarkNet, and PyTorch models, into the RKNN format.
The toolkit also enables simulation-based inference, performance analysis, and memory usage evaluation
directly on the desktop, making it easy to estimate deployment results before transferring models to hardware.
On the target board, Rockchip provides a runtime environment consisting of a set of C APIs, Python
bindings, communication drivers, and essential executables. This runtime environment ensures that RKNN
models generated by RKNN-Toolkit2 can execute efficiently on the NPU. The entire RKNN software stack
is designed to help developers rapidly deploy AI workloads onto Rockchip-based systems.
The following components form the core of the ecosystem:
- RKNN-Toolkit2: A software development kit for converting, simulating, and evaluating AI models on PC or Rockchip NPU platforms.
- RKNN-Toolkit-Lite2: A lightweight Python deployment interface intended for direct use on Rockchip NPU devices, simplifying inference workflows (a minimal on-device sketch follows this list).
- RKNN Runtime: A C/C++ runtime for executing RKNN models directly on Rockchip NPU hardware.
- RKNPU Kernel Driver: The low-level driver responsible for interacting with the NPU hardware. It is open-source and available within Rockchip's kernel repositories.
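
To give a feel for how thin the device-side layer is, here is a minimal sketch of on-device inference with the RKNN-Toolkit-Lite2 Python API. The model and image file names are placeholders, and the preprocessing assumes a 640x640 RGB input:

```python
# Minimal on-device inference sketch using RKNN-Toolkit-Lite2.
# "yolov8n.rknn" and "test.jpg" are placeholder file names.
import cv2
from rknnlite.api import RKNNLite

rknn_lite = RKNNLite()
rknn_lite.load_rknn('./yolov8n.rknn')   # load a model already converted on the PC
rknn_lite.init_runtime()                # initialize the NPU runtime on the board
                                        # (on RK3588, core_mask can pin NPU cores)

img = cv2.imread('./test.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (640, 640))       # match the model's expected input size

outputs = rknn_lite.inference(inputs=[img])  # raw output tensors from the NPU

rknn_lite.release()
```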
Capabilities of RKNN-Toolkit2
RKNN-Toolkit2 provides a convenient Python interface for model conversion and inference on a PC.
With this toolkit, developers can:
- Convert Deep-Learning Models: Convert Caffe, TensorFlow, TensorFlow Lite, ONNX, DarkNet, and PyTorch models into RKNN format. The converted RKNN files can be exported, imported, and deployed on Rockchip NPU devices.
- Quantize Models: Convert floating-point models into fixed-point models using asymmetric quantization (asymmetric_quantized-8). Hybrid quantization is also supported.
- Simulate Inference: Run inference using the RKNN model on a PC to simulate NPU execution, enabling developers to validate outputs before deployment (see the sketch after this list).
- Evaluate Performance: Send the model to an NPU-equipped device to measure execution speed and memory consumption.
- Analyze Quantization Accuracy: Compare layer-by-layer outputs between floating-point and quantized models using cosine similarity, helping identify quantization-sensitive operations.
- Encrypt Models: Encrypt RKNN models using specified security levels. Encryption occurs within the driver, meaning encrypted models load and run identically to normal models.
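
To make the simulation, performance, and accuracy capabilities concrete, the following sketch shows roughly how they look in RKNN-Toolkit2's Python API. It assumes an `rknn` object has already been configured, loaded, and built (a full pipeline example appears later in this article); the paths and the 'rk3588' target name are placeholders:

```python
# Sketch of simulation, on-target evaluation, and accuracy analysis with
# RKNN-Toolkit2. Assumes `rknn` was already configured, loaded, and built.
import cv2

img = cv2.cvtColor(cv2.imread('./calib/img_001.jpg'), cv2.COLOR_BGR2RGB)

# a) Simulate NPU execution on the PC: no target means the software simulator.
rknn.init_runtime()
outputs = rknn.inference(inputs=[img])

# b) Alternatively, initialize against a connected board to measure real speed
#    (run this as a separate session rather than re-initializing in place):
# rknn.init_runtime(target='rk3588', perf_debug=True)
# rknn.eval_perf()

# c) Layer-by-layer cosine similarity between float and quantized outputs,
#    which helps locate quantization-sensitive operations.
rknn.accuracy_analysis(inputs=['./calib/img_001.jpg'], output_dir='./snapshot')
```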
RKNN SDK supports mainstream Rockchip chips such as RK3566, RK3568, RK3576, and RK3588.
A typical development environment requires Ubuntu 20.04 (64-bit), Python 3.8, and at least 16 GB of RAM.
YOLOv8: A SOTA Vision Model Meets Edge Deployment
In January 2023, Ultralytics released YOLOv8, the newest generation of the popular YOLO model
family. Presented as a state-of-the-art, cutting-edge vision framework, YOLOv8 is designed to support
an extensive range of visual AI tasks, including image classification, object detection, instance segmentation,
pose estimation, and even multi-object tracking.
One of YOLOv8's strengths lies in its diversified model lineup. Ultralytics provides five pretrained
variants: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x. These range from lightweight models suitable
for edge devices to high-accuracy variants suited to demanding applications. Compared with YOLOv5 models of
similar size, YOLOv8 achieves significantly higher accuracy on the COCO dataset, thanks to improvements
in architecture, training methodology, loss functions, and its anchor-free design.
Where YOLOv8 Shines
- It offers strong accuracy improvements while still running efficiently on lightweight platforms.
- The command-line interface allows easy training, inference, and model export.
- Its built-in integration with tracking algorithms such as BoT-SORT and ByteTrack enables robust tracking applications (sketched below).
- The FastSAM segmentation approach uses YOLOv8's backbone for generalizable mask-generation tasks.
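
As a quick illustration of the detection and tracking workflows mentioned above, here is a minimal sketch using the Ultralytics Python API; the model checkpoint and media paths are placeholders:

```python
# Minimal YOLOv8 detection and tracking sketch via the Ultralytics Python API.
# "yolov8n.pt", "bus.jpg", and "video.mp4" are placeholder names.
from ultralytics import YOLO

model = YOLO('yolov8n.pt')                # pretrained nano detection model

# Plain single-image detection.
results = model.predict('bus.jpg', conf=0.5)

# Multi-object tracking with ByteTrack over a video, frame by frame.
for result in model.track(source='video.mp4', tracker='bytetrack.yaml', stream=True):
    boxes = result.boxes                  # per-frame boxes carrying persistent track IDs
```

The same operations are available through the `yolo` command-line interface, which is what makes the framework convenient for quick experiments before committing to a deployment pipeline.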
Because of these advantages, YOLOv8 has become a strong candidate for edge-based vision systems such as
robots, surveillance devices, autonomous machines, and embedded smart cameras.
Understanding the Model Deployment Process
Deploying a machine-learning model means placing it in a real operating environment where it can perform
tasks for an actual product or service. During deployment, models often undergo optimization steps such as
operator fusion, graph restructuring, post-training quantization, and knowledge distillation. This ensures
that the model runs efficiently on specialized hardware while meeting power and latency constraints.
From Training Framework to Inference Engine
Most models are initially developed using flexible frameworks like PyTorch or TensorFlow. However,
deployment platforms frequently rely on specialized inference engines such as TensorRT, ONNX Runtime,
or Rockchip's RKNN Runtime to achieve real-time performance.
To bridge these environments, Facebook and Microsoft introduced ONNX in 2017: a universal model
representation format that allows conversion between training and inference frameworks. As a result,
the deployment process typically follows the workflow:
Training Framework → ONNX Representation → Hardware-Specific Inference Engine
Instead of manually rewriting models layer by layer, today's conversion tools rely on graph tracing.
Given a sample input, the tool traces intermediate operations and reconstructs the computational graph,
which becomes the converted model. In RKNN conversion, sample inputs also help determine quantization
parameters.
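
For YOLOv8, the first hop in this workflow is straightforward because Ultralytics ships an export utility that traces the network with a sample input under the hood. A minimal sketch; the opset and image size shown are common choices rather than requirements:

```python
# Sketch: export a YOLOv8 checkpoint to ONNX as the intermediate representation.
# The exporter runs a sample tensor through the network and records the traced
# graph. The opset and image size here are illustrative defaults.
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
model.export(format='onnx', opset=12, imgsz=640)  # writes yolov8n.onnx alongside the .pt
```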
Deploying Models on Rockchip NPU Using RKNN
RKNN is the official inference framework for Rockchip NPUs. It enables optimized execution of
deep-learning models on NPU hardware, ensuring high throughput and low power consumption.
The RKNN-Toolkit2 ecosystem provides Python APIs for:
- model conversion
- post-training quantization
- simulation inference
- validity checking and debugging
A typical RKNN deployment pipeline consists of four steps, sketched in code below:
- RKNN Model Configuration: Specify preprocessing parameters, mean values, quantization type, target platform, and other settings.
- Model Loading: Import the source model (ONNX, PyTorch, TensorFlow, Caffe, etc.). Developers can also designate output nodes, which affects how the model is sliced during conversion.
- RKNN Model Building: Perform quantization if needed and supply calibration datasets.
- Model Export: Save the final .rknn file for deployment on Rockchip hardware.
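
Put together, the four steps map almost one-to-one onto RKNN-Toolkit2 calls. A minimal sketch, assuming a YOLOv8 ONNX file and a calibration list (dataset.txt, one image path per line) are available; the file names, normalization values, and target platform are placeholders:

```python
# Sketch: ONNX -> RKNN conversion pipeline with RKNN-Toolkit2 on the PC.
# File names, normalization values, and 'rk3588' are placeholders.
from rknn.api import RKNN

rknn = RKNN(verbose=True)

# 1) Configuration: preprocessing parameters and target platform.
rknn.config(mean_values=[[0, 0, 0]],
            std_values=[[255, 255, 255]],
            target_platform='rk3588')

# 2) Loading: import the ONNX source model.
rknn.load_onnx(model='./yolov8n.onnx')

# 3) Building: quantize using the calibration image list.
rknn.build(do_quantization=True, dataset='./dataset.txt')

# 4) Export: write the deployable .rknn file.
rknn.export_rknn('./yolov8n.rknn')

rknn.release()
```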
Rockchip RK3576/RK3588 NPU Highlights
Rockchip's high-performance AIoT platforms, particularly the RK3576 and RK3588, feature powerful NPUs
built on advanced 8nm processes. Delivering up to 6 TOPS, these NPUs handle compute-intensive
tasks such as:
- image classification
- object detection and recognition
- face recognition
- speech processing
- natural language understanding
Rockchip NPUs support mainstream deep-learning frameworks including TensorFlow, PyTorch, Caffe, and MXNet,
allowing developers to train models in familiar environments while benefiting from accelerated inference
on embedded devices. Their strong ecosystem makes them well suited for large-scale computer vision
applications such as surveillance, autonomous driving, industrial robotics, and medical imaging.
Conclusion
By combining the flexibility of YOLOv8 with the high-performance capabilities of Rockchip's NPU ecosystem,
developers can build powerful edge-AI applications that operate efficiently even under strict resource
constraints. The RKNN toolchain, spanning RKNN-Toolkit2, RKNN-Toolkit-Lite2, and the RKNN Runtime, provides a
comprehensive workflow from model conversion to deployment. As edge AI continues to evolve, Rockchip's
platform offers a scalable and practical solution for embedding advanced vision algorithms directly
into next-generation devices.
