RK3588 MNN CPU/Vulkan/OpenCL ResNet50 Inference Test
1. Background
1.1 RK3588 Chip Features
1.2 Why MNN?
1.3 Test Goals
2. References
3. Procedure
3.1 Setting up the Vulkan Environment
3.2 Installing the OpenCL Environment
3.3 Running the `relu` Operator with Vulkan
3.3.1 Installing `glslang-tools`
3.3.2 Writing the Compute Shader (`relu.comp`)
3.3.3 Writing the C++ Code (`main.cpp`)
3.3.4 Compiling the Compute Shader
3.3.5 Compiling the C++ Program
3.3.6 Running the Program
3.4 Running `resnet50` Inference with MNN
3.4.1 Building MNN
3.4.2 Generating the ONNX Model, Quantization Images, and Quantization Config
3.4.3 Model Conversion
3.4.4 Generating, Building, and Running the Test Program
3.4.5 Inference Performance Analysis
1. Background
1.1 RK3588 Chip Features
The Rockchip RK3588 is a high-performance SoC aimed at the AIoT market. Built on an 8 nm process, it features:
4x Cortex-A76 + 4x Cortex-A55 big.LITTLE CPU cores
Mali-G610 MP4 GPU (supports Vulkan 1.2 / OpenCL 2.2)
6 TOPS NPU (not covered in this test)
1.2 Why MNN?
MNN (Mobile Neural Network), Alibaba's open-source inference engine, offers:
Multi-platform support: iOS/Android/Linux/Windows
Heterogeneous compute: CPU/GPU/NPU backends
Lightweight: the base library is only about 500 KB
Quantization: FP16/INT8 compression and acceleration
1.3 Test Goals
Benchmark ResNet50 inference on each compute backend:
CPU: general-purpose compute, establishes the baseline
Vulkan: modern cross-platform graphics/compute API with low-overhead parallelism
OpenCL: general heterogeneous-compute standard supporting many accelerator types
Quantization comparison: evaluate the accuracy/speed trade-off
2. References
Mali610Vulkan
3. Procedure
3.1 Setting up the Vulkan Environment
# Install the official Mali GPU driver (includes Vulkan support)
wget https://repo.rock-chips.com/edge/debian-release-v2.0.0/pool/main/r/rockchip-mali/rockchip-mali_1.9-12_arm64.deb
sudo dpkg -i rockchip-mali_1.9-12_arm64.deb
# Create a symlink so the dynamic library is visible
sudo ln -s /usr/lib/aarch64-linux-gnu/libmali-valhall-g610-g6p0-wayland-gbm-vulkan.so /usr/lib/aarch64-linux-gnu/libmali.so
# Write the Vulkan ICD (driver manifest) file
sudo mkdir -p /etc/vulkan/icd.d/
echo '{
"file_format_version": "1.0.0",
"ICD": {
"library_path": "/usr/lib/aarch64-linux-gnu/libmali-valhall-g610-g6p0-wayland-gbm-vulkan.so",
"api_version": "1.0.0"
}
}' | sudo tee /etc/vulkan/icd.d/mali.json
sudo apt install -y vulkan-tools vulkan-utils
vulkaninfo
Key configuration notes:
libmali.so is the unified driver entry point for the Mali GPU
The ICD (Installable Client Driver) file tells the Vulkan loader where to find the driver library
vulkaninfo verifies that the driver is installed correctly
If everything works, the console prints:
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '10'.
'DISPLAY' environment variable not set... skipping surface info
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '10'.
==========
VULKANINFO
==========
Vulkan Instance Version: 1.2.131
Instance Extensions: count = 10
====================
VK_EXT_debug_report : extension revision 9
VK_EXT_debug_utils : extension revision 1
VK_EXT_headless_surface : extension revision 1
VK_KHR_device_group_creation : extension revision 1
VK_KHR_display : extension revision 23
VK_KHR_external_fence_capabilities : extension revision 1
VK_KHR_external_memory_capabilities : extension revision 1
VK_KHR_external_semaphore_capabilities : extension revision 1
VK_KHR_get_physical_device_properties2 : extension revision 2
VK_KHR_surface : extension revision 25
Layers: count = 0
=======
Presentable Surfaces:
=====================
Groups:
=======
Device Group Properties (Group 0):
physicalDeviceCount: count = 1
Mali-LODX (ID: 0)
subsetAllocation = 0
Device Group Present Capabilities (Group 0):
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '10'.
Mali-LODX (ID: 0)
Can present images from the following devices:
Mali-LODX (ID: 0)
Present modes:
DEVICE_GROUP_PRESENT_MODE_LOCAL_BIT_KHR
Device Properties and Extensions:
=================================
GPU0:
VkPhysicalDeviceProperties:
---------------------------
apiVersion = 4202661 (1.2.165)
driverVersion = 25165824 (0x1800000)
vendorID = 0x13b5
deviceID = 0xa8670000
deviceType = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
deviceName = Mali-LODX
3.2 Installing the OpenCL Environment
# Replace the system default OpenCL driver with the Mali one
sudo mv /lib/aarch64-linux-gnu/libOpenCL.so.1 /lib/aarch64-linux-gnu/libOpenCL.so.1.bk
sudo ln -s /usr/lib/aarch64-linux-gnu/libmali.so /lib/aarch64-linux-gnu/libOpenCL.so.1
# Install the development toolchain
sudo apt install -y opencl-headers ocl-icd-libopencl1 ocl-icd-opencl-dev clinfo
clinfo
3.3 Running the relu Operator with Vulkan
3.3.1 Installing glslang-tools
sudo apt install -y glslang-tools
3.3.2 Writing the Compute Shader (relu.comp)
How the compute shader works
ReLU (Rectified Linear Unit) is a widely used activation function in deep learning:
f(x) = max(0, x)
Generating the GLSL shader code
cat > relu.comp <<-'EOF'
#version 450
layout(local_size_x = 256) in; // 256 threads per workgroup
layout(binding = 0) buffer InputBuffer {
float inputData[];
};
layout(binding = 1) buffer OutputBuffer {
float outputData[];
};
void main() {
uint idx = gl_GlobalInvocationID.x; // global thread index
outputData[idx] = max(inputData[idx], 0.0);
}
EOF
Notes:
layout(local_size_x = 256) in; sets the compute shader's workgroup size.
binding = 0 and binding = 1 bind the input and output buffers, respectively.
gl_GlobalInvocationID.x is the global thread ID used to index the data elements.
max(inputData[idx], 0.0) implements ReLU: each element is replaced by the larger of itself and zero.
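As a sanity check, the shader's expected behavior can be reproduced on the CPU in a few lines. This sketch (plain Python, illustrative only) mirrors the input pattern the C++ test program uses later: 1024 values running from -512 to 511.

```python
# CPU reference for the ReLU compute shader, using the same input pattern
# as the C++ test program (values -512.0 .. 511.0).
def relu(values):
    return [max(v, 0.0) for v in values]

inputs = [float(i) - 512.0 for i in range(1024)]  # negatives first, then positives
outputs = relu(inputs)

assert outputs[0] == 0.0       # -512.0 clamps to 0
assert outputs[511] == 0.0     # -1.0 clamps to 0
assert outputs[512] == 0.0     # 0.0 stays 0
assert outputs[1023] == 511.0  # positive values pass through unchanged
```

Comparing the GPU output against a reference like this is exactly what the verification loop at the end of the C++ program does.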
Vulkan execution flow
Key object lifetimes
Instance: application-level context
Device: logical representation of a physical device
Buffer: holds the input and output data
Pipeline: wraps the compute shader and its execution state
CommandBuffer: records the commands to execute
Performance tips
Memory alignment: Vulkan requires buffers to respect the device's minimum alignment (often 256 bytes)
Batching: submit multiple compute tasks per submission to reduce API overhead
Pipeline reuse: avoid repeatedly creating and destroying pipeline objects
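The workgroup count and alignment rules above reduce to simple ceiling arithmetic. A quick sketch (Python, with illustrative helper names):

```python
def workgroup_count(n_elements, local_size=256):
    """Number of workgroups a vkCmdDispatch needs to cover n_elements."""
    return (n_elements + local_size - 1) // local_size

def align_up(size, alignment=256):
    """Round a buffer size up to the device's minimum alignment."""
    return (size + alignment - 1) // alignment * alignment

assert workgroup_count(1024) == 4  # matches ceil(dataSize / 256.0) in the C++ code
assert workgroup_count(1025) == 5  # a partial group still costs a full workgroup
assert align_up(1000) == 1024
assert align_up(4096) == 4096      # already-aligned sizes are unchanged
```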
3.3.3 Writing the C++ Code (main.cpp)
cat > main.cpp <<-'EOF'
#include <vulkan/vulkan.h>
#include <iostream>
#include <vector>
#include <fstream>
#include <memory>
#include <string.h>
#include <string>
#include <math.h>
#include <algorithm> // std::max
#include <cassert>
// Helper: read SPIR-V bytecode from a file
std::vector<char> readFile(const std::string& filename) {
std::ifstream file(filename, std::ios::ate | std::ios::binary);
assert(file.is_open() && "failed to open shader file!");
size_t fileSize = (size_t)file.tellg();
std::vector<char> buffer(fileSize);
file.seekg(0);
file.read(buffer.data(), fileSize);
file.close();
return buffer;
}
// Entry point
int main() {
// Initialize input data
const size_t dataSize = 1024;
std::vector<float> inputData(dataSize);
for (size_t i = 0; i < dataSize; ++i) {
inputData[i] = static_cast<float>(i) - 512.0f; // a mix of negative and positive values
}
std::vector<float> outputData(dataSize, 0.0f);
// ------------------------
// Vulkan initialization
// ------------------------
// Create the Vulkan instance
VkInstance instance;
VkInstanceCreateInfo instanceCreateInfo{};
instanceCreateInfo.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
vkCreateInstance(&instanceCreateInfo, nullptr, &instance);
// Pick a physical device
uint32_t deviceCount = 0;
vkEnumeratePhysicalDevices(instance, &deviceCount, nullptr);
assert(deviceCount > 0 && "no Vulkan-capable physical device found!");
std::vector<VkPhysicalDevice> physicalDevices(deviceCount);
vkEnumeratePhysicalDevices(instance, &deviceCount, physicalDevices.data());
VkPhysicalDevice physicalDevice = physicalDevices[0];
// Create the logical device and queue
uint32_t queueFamilyIndex = 0; // simplified; real code should select a compute-capable queue family
float queuePriority = 1.0f;
VkDeviceQueueCreateInfo queueCreateInfo{};
queueCreateInfo.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
queueCreateInfo.queueFamilyIndex = queueFamilyIndex;
queueCreateInfo.queueCount = 1;
queueCreateInfo.pQueuePriorities = &queuePriority;
VkDevice device;
VkDeviceCreateInfo deviceCreateInfo{};
deviceCreateInfo.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
deviceCreateInfo.queueCreateInfoCount = 1;
deviceCreateInfo.pQueueCreateInfos = &queueCreateInfo;
vkCreateDevice(physicalDevice, &deviceCreateInfo, nullptr, &device);
VkQueue queue;
vkGetDeviceQueue(device, queueFamilyIndex, 0, &queue);
// Create the shader module
auto shaderCode = readFile("relu.comp.spv");
VkShaderModule shaderModule;
VkShaderModuleCreateInfo shaderModuleCreateInfo{};
shaderModuleCreateInfo.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
shaderModuleCreateInfo.codeSize = shaderCode.size();
shaderModuleCreateInfo.pCode = reinterpret_cast<const uint32_t*>(shaderCode.data());
vkCreateShaderModule(device, &shaderModuleCreateInfo, nullptr, &shaderModule);
// Create buffers and memory
VkDeviceSize bufferSize = dataSize * sizeof(float);
// Helper: create a storage buffer with backing memory
auto createBuffer = [&](VkBuffer& buffer, VkDeviceMemory& bufferMemory) {
VkBufferCreateInfo bufferCreateInfo{};
bufferCreateInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
bufferCreateInfo.size = bufferSize;
bufferCreateInfo.usage = VK_BUFFER_USAGE_STORAGE_BUFFER_BIT;
bufferCreateInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
vkCreateBuffer(device, &bufferCreateInfo, nullptr, &buffer);
VkMemoryRequirements memRequirements;
vkGetBufferMemoryRequirements(device, buffer, &memRequirements);
VkMemoryAllocateInfo allocInfo{};
allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
allocInfo.allocationSize = memRequirements.size;
// Simplified; real code should query for a HOST_VISIBLE memory type
allocInfo.memoryTypeIndex = 0;
vkAllocateMemory(device, &allocInfo, nullptr, &bufferMemory);
vkBindBufferMemory(device, buffer, bufferMemory, 0);
};
VkBuffer inputBuffer, outputBuffer;
VkDeviceMemory inputBufferMemory, outputBufferMemory;
createBuffer(inputBuffer, inputBufferMemory);
createBuffer(outputBuffer, outputBufferMemory);
// Copy the input data into the input buffer
void* data;
vkMapMemory(device, inputBufferMemory, 0, bufferSize, 0, &data);
memcpy(data, inputData.data(), (size_t)bufferSize);
vkUnmapMemory(device, inputBufferMemory);
// Create the descriptor set layout
VkDescriptorSetLayoutBinding inputBinding{};
inputBinding.binding = 0;
inputBinding.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
inputBinding.descriptorCount = 1;
inputBinding.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
VkDescriptorSetLayoutBinding outputBinding{};
outputBinding.binding = 1;
outputBinding.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
outputBinding.descriptorCount = 1;
outputBinding.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
VkDescriptorSetLayoutBinding bindings[] = { inputBinding, outputBinding };
VkDescriptorSetLayoutCreateInfo descriptorSetLayoutCreateInfo{};
descriptorSetLayoutCreateInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
descriptorSetLayoutCreateInfo.bindingCount = 2;
descriptorSetLayoutCreateInfo.pBindings = bindings;
VkDescriptorSetLayout descriptorSetLayout;
vkCreateDescriptorSetLayout(device, &descriptorSetLayoutCreateInfo, nullptr, &descriptorSetLayout);
// Create the pipeline layout
VkPipelineLayout pipelineLayout;
VkPipelineLayoutCreateInfo pipelineLayoutCreateInfo{};
pipelineLayoutCreateInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
pipelineLayoutCreateInfo.setLayoutCount = 1;
pipelineLayoutCreateInfo.pSetLayouts = &descriptorSetLayout;
vkCreatePipelineLayout(device, &pipelineLayoutCreateInfo, nullptr, &pipelineLayout);
// Create the compute pipeline
VkPipeline pipeline;
VkComputePipelineCreateInfo computePipelineCreateInfo{};
computePipelineCreateInfo.sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO;
computePipelineCreateInfo.stage.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
computePipelineCreateInfo.stage.stage = VK_SHADER_STAGE_COMPUTE_BIT;
computePipelineCreateInfo.stage.module = shaderModule;
computePipelineCreateInfo.stage.pName = "main";
computePipelineCreateInfo.layout = pipelineLayout;
vkCreateComputePipelines(device, VK_NULL_HANDLE, 1, &computePipelineCreateInfo, nullptr, &pipeline);
// Create the descriptor pool and descriptor set
VkDescriptorPoolSize poolSize{};
poolSize.type = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
poolSize.descriptorCount = 2;
VkDescriptorPoolCreateInfo descriptorPoolCreateInfo{};
descriptorPoolCreateInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
descriptorPoolCreateInfo.poolSizeCount = 1;
descriptorPoolCreateInfo.pPoolSizes = &poolSize;
descriptorPoolCreateInfo.maxSets = 1;
VkDescriptorPool descriptorPool;
vkCreateDescriptorPool(device, &descriptorPoolCreateInfo, nullptr, &descriptorPool);
VkDescriptorSet descriptorSet;
VkDescriptorSetAllocateInfo descriptorSetAllocateInfo{};
descriptorSetAllocateInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
descriptorSetAllocateInfo.descriptorPool = descriptorPool;
descriptorSetAllocateInfo.descriptorSetCount = 1;
descriptorSetAllocateInfo.pSetLayouts = &descriptorSetLayout;
vkAllocateDescriptorSets(device, &descriptorSetAllocateInfo, &descriptorSet);
VkDescriptorBufferInfo inputBufferInfo{};
inputBufferInfo.buffer = inputBuffer;
inputBufferInfo.offset = 0;
inputBufferInfo.range = bufferSize;
VkDescriptorBufferInfo outputBufferInfo{};
outputBufferInfo.buffer = outputBuffer;
outputBufferInfo.offset = 0;
outputBufferInfo.range = bufferSize;
VkWriteDescriptorSet writeDescriptorSets[2]{};
writeDescriptorSets[0].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
writeDescriptorSets[0].dstSet = descriptorSet;
writeDescriptorSets[0].dstBinding = 0;
writeDescriptorSets[0].descriptorCount = 1;
writeDescriptorSets[0].descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
writeDescriptorSets[0].pBufferInfo = &inputBufferInfo;
writeDescriptorSets[1].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
writeDescriptorSets[1].dstSet = descriptorSet;
writeDescriptorSets[1].dstBinding = 1;
writeDescriptorSets[1].descriptorCount = 1;
writeDescriptorSets[1].descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
writeDescriptorSets[1].pBufferInfo = &outputBufferInfo;
vkUpdateDescriptorSets(device, 2, writeDescriptorSets, 0, nullptr);
// Create the command pool and command buffer
VkCommandPool commandPool;
VkCommandPoolCreateInfo commandPoolCreateInfo{};
commandPoolCreateInfo.sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO;
commandPoolCreateInfo.queueFamilyIndex = queueFamilyIndex;
vkCreateCommandPool(device, &commandPoolCreateInfo, nullptr, &commandPool);
VkCommandBuffer commandBuffer;
VkCommandBufferAllocateInfo commandBufferAllocateInfo{};
commandBufferAllocateInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
commandBufferAllocateInfo.commandPool = commandPool;
commandBufferAllocateInfo.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
commandBufferAllocateInfo.commandBufferCount = 1;
vkAllocateCommandBuffers(device, &commandBufferAllocateInfo, &commandBuffer);
// Record the command buffer
VkCommandBufferBeginInfo commandBufferBeginInfo{};
commandBufferBeginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
vkBeginCommandBuffer(commandBuffer, &commandBufferBeginInfo);
vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);
vkCmdBindDescriptorSets(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, pipelineLayout, 0, 1, &descriptorSet, 0, nullptr);
vkCmdDispatch(commandBuffer, (uint32_t)ceil(dataSize / 256.0), 1, 1);
vkEndCommandBuffer(commandBuffer);
// Submit the command buffer
VkSubmitInfo submitInfo{};
submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &commandBuffer;
vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE);
vkQueueWaitIdle(queue);
// Read the results back from the output buffer
vkMapMemory(device, outputBufferMemory, 0, bufferSize, 0, &data);
memcpy(outputData.data(), data, (size_t)bufferSize);
vkUnmapMemory(device, outputBufferMemory);
// Verify the results
bool success = true;
for (size_t i = 0; i < dataSize; ++i) {
float expected = std::max(inputData[i], 0.0f);
if (outputData[i] != expected) {
success = false;
std::cout << "Mismatch at index " << i << ": expected " << expected << ", got " << outputData[i] << std::endl;
break;
}
}
if (success) {
std::cout << "ReLU computed correctly!" << std::endl;
}
// Clean up resources
vkDestroyBuffer(device, inputBuffer, nullptr);
vkFreeMemory(device, inputBufferMemory, nullptr);
vkDestroyBuffer(device, outputBuffer, nullptr);
vkFreeMemory(device, outputBufferMemory, nullptr);
vkDestroyPipeline(device, pipeline, nullptr);
vkDestroyPipelineLayout(device, pipelineLayout, nullptr);
vkDestroyShaderModule(device, shaderModule, nullptr);
vkDestroyDescriptorPool(device, descriptorPool, nullptr);
vkDestroyDescriptorSetLayout(device, descriptorSetLayout, nullptr);
vkDestroyCommandPool(device, commandPool, nullptr);
vkDestroyDevice(device, nullptr);
vkDestroyInstance(instance, nullptr);
return 0;
}
EOF
Notes:
The input data mixes positive and negative values to exercise the ReLU operation.
The program creates the Vulkan instance, device, queue, buffers, and shader module it needs.
The compute shader performs ReLU and writes the results to the output buffer.
The output buffer is read back and verified against a CPU reference.
For brevity, most VkResult return codes go unchecked, and the queue family and memory type indices are hard-coded to 0; production code should select them properly.
3.3.4 Compiling the Compute Shader
Compile the GLSL shader to SPIR-V bytecode with glslangValidator:
glslangValidator -V relu.comp -o relu.comp.spv
3.3.5 Compiling the C++ Program
Make sure the Vulkan development packages are installed, then build main.cpp:
g++ main.cpp -o relu -lvulkan
3.3.6 Running the Program
Run the compiled binary:
./relu
If everything works, the console prints:
ReLU computed correctly!
3.4 Running resnet50 Inference with MNN
3.4.1 Building MNN
git clone https://github.com/alibaba/MNN.git
cd MNN
./schema/generate.sh
rm build -rf
mkdir build && cd build
cmake -DMNN_BUILD_CONVERTER=ON -DMNN_BUILD_QUANTOOLS=ON -DCMAKE_BUILD_TYPE=Release \
    -DMNN_BUILD_SHARED_LIBS=ON -DMNN_SEP_BUILD=OFF -DMNN_USE_SYSTEM_LIB=ON \
    -DMNN_USE_THREAD_POOL=OFF -DMNN_OPENMP=ON -DMNN_BUILD_OPENCV=ON \
    -DMNN_OPENCL=ON -DMNN_BUILD_TOOLS=OFF -DMNN_BUILD_HARD=OFF \
    -DMNN_IMGCODECS=ON -DMNN_VULKAN=ON ..
make -j4
3.4.2 Generating the ONNX Model, Quantization Images, and Quantization Config
cat> resnet50.py<<-'EOF'
import torchvision.transforms as transforms
import torch
import numpy as np
import cv2
import os
import torchvision.models as models
input_tensor = torch.ones((1,3,224,224),dtype=torch.float32)
model = models.resnet50(pretrained=False)
model.eval()
with torch.no_grad():
output = model(input_tensor)
input_names = ["input"]
output_names = ["output"]
torch.onnx.export(model, input_tensor, "resnet50.onnx",
verbose=False, input_names=input_names,
output_names=output_names,opset_version=17,export_params=True)
img=np.ones((224,224,3),dtype=np.uint8)  # cv2.imwrite expects uint8 pixels
os.makedirs("quant_images",exist_ok=True)
cv2.imwrite("quant_images/img.jpg",img)
EOF
python3 resnet50.py
cat> preprocessConfig.json<<-'EOF'
{
"format":"RGB",
"mean":[
127.5,
127.5,
127.5
],
"normal":[
0.00784314,
0.00784314,
0.00784314
],
"width":224,
"height":224,
"path":"quant_images/",
"used_image_num":1,
"feature_quantize_method":"KL",
"weight_quantize_method":"MAX_ABS"
}
EOF
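The `mean`/`normal` pair above implements a linear rescale per channel: pixel → (pixel - 127.5) × 0.00784314, which maps the [0, 255] pixel range to roughly [-1, 1] (0.00784314 ≈ 2/255). A quick check of the arithmetic:

```python
# Values taken from preprocessConfig.json above
mean, normal = 127.5, 0.00784314

def preprocess(pixel):
    # The quantizer applies (pixel - mean) * normal to each channel
    return (pixel - mean) * normal

assert abs(preprocess(0) + 1.0) < 1e-3    # darkest pixel maps to ~-1
assert abs(preprocess(255) - 1.0) < 1e-3  # brightest pixel maps to ~+1
assert preprocess(127.5) == 0.0           # mid-gray maps to 0
```

These values must match the normalization the model was trained with; the symmetric [-1, 1] scheme here is what this test uses, not a universal ResNet50 convention.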
3.4.3 Model Conversion
./MNNConvert -f ONNX --modelFile resnet50.onnx --MNNModel resnet50.mnn
./quantized.out resnet50.mnn resnet50_quant.mnn preprocessConfig.json
3.4.4 Generating, Building, and Running the Test Program
cat> resnet50_demo.cpp <<-'EOF'
#include <iostream>
#include <cstdio>   // printf
#include <cstdlib>  // atoi
#include <MNN/Interpreter.hpp>
#include <MNN/Tensor.hpp>
#include <chrono>
int main(int argc, char *argv[]) {
if (argc != 3) {
printf("Usage: %s model_path device_type\n", argv[0]);
return -1;
}
char *model_path = argv[1];
int type = atoi(argv[2]);
// Load the model
auto net = MNN::Interpreter::createFromFile(model_path);
if (net == nullptr) {
std::cerr << "Failed to load model file!" << std::endl;
return 1;
}
// Configure the backend and session
MNN::ScheduleConfig config;
config.type = (MNNForwardType)type;
config.numThread = 4; // set the thread count as needed
switch (config.type) {
case MNN_FORWARD_CPU:
std::cout << "Using the CPU backend" << std::endl;
break;
case MNN_FORWARD_OPENCL:
std::cout << "Using the OpenCL backend" << std::endl;
break;
case MNN_FORWARD_METAL:
std::cout << "Using the Metal backend" << std::endl;
break;
case MNN_FORWARD_VULKAN:
std::cout << "Using the Vulkan backend" << std::endl;
break;
// add other backend types here
default:
std::cout << "Unknown backend" << std::endl;
break;
}
MNN::BackendConfig backendConfig;
backendConfig.precision = MNN::BackendConfig::Precision_Low;
config.backendConfig = &backendConfig;
auto session = net->createSession(config);
// Get the input tensor
auto input_tensor = net->getSessionInput(session, nullptr);
// Prepare input data (224x224 RGB, filled with a constant)
int input_width = 224;
int input_height = 224;
int input_channel = 3;
std::vector<float> input_data(input_width * input_height * input_channel);
// Fill the input with a constant value (1/255)
for (auto& v : input_data) {
v = 1 / 255.0;
}
// Copy the data into the input tensor; CAFFE dimension type means NCHW layout
auto nchw_tensor = MNN::Tensor::create<float>(
{1, input_channel, input_height, input_width},
input_data.data(), MNN::Tensor::CAFFE);
input_tensor->copyFromHostTensor(nchw_tensor);
delete nchw_tensor;
// Warm up
int warmup_runs = 5;
for (int i = 0; i < warmup_runs; ++i) {
net->runSession(session);
}
// Run inference and measure performance
int infer_runs = 10;
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < infer_runs; ++i) {
net->runSession(session);
}
auto end = std::chrono::high_resolution_clock::now();
// Report throughput (frames per second)
std::chrono::duration<double, std::milli> elapsed = end - start;
std::cout << "FPS: " << infer_runs / (elapsed.count() / 1000) << std::endl;
// Get the output tensor
auto output_tensor = net->getSessionOutput(session, nullptr);
// Copy the output back to the host
MNN::Tensor output_host(output_tensor, output_tensor->getDimensionType());
output_tensor->copyToHostTensor(&output_host);
// Parse the output (here: raw classification scores)
auto output_data = output_host.host<float>();
int output_size = output_host.elementSize();
// Print the first 5 output values
std::cout << "First 5 output values:" << std::endl;
for (int i = 0; i < 5 && i < output_size; ++i) {
std::cout << output_data[i] << " ";
}
std::cout << std::endl;
// Release resources
net->releaseModel();
delete net;
return 0;
}
EOF
g++ -o resnet50_demo resnet50_demo.cpp -I ../include/ -L . -lMNN -std=c++11 -lgomp -lpthread -lOpenCL -Wl,-rpath=./
# CPU
./resnet50_demo resnet50.mnn 0
./resnet50_demo resnet50_quant.mnn 0
# OpenCL
./resnet50_demo resnet50.mnn 3
./resnet50_demo resnet50_quant.mnn 3
# Vulkan
./resnet50_demo resnet50.mnn 7
./resnet50_demo resnet50_quant.mnn 7
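The numeric arguments 0/3/7 passed to resnet50_demo are MNNForwardType enum values. The mapping assumed here should be checked against `MNN/MNNForwardType.h` in your MNN checkout, since the enum has grown over time:

```python
# MNNForwardType values used by resnet50_demo above
# (from MNN's MNNForwardType.h; verify against your MNN version)
MNN_FORWARD = {
    "CPU": 0,
    "METAL": 1,
    "OPENCL": 3,
    "VULKAN": 7,
}

assert MNN_FORWARD["CPU"] == 0     # ./resnet50_demo ... 0
assert MNN_FORWARD["OPENCL"] == 3  # ./resnet50_demo ... 3
assert MNN_FORWARD["VULKAN"] == 7  # ./resnet50_demo ... 7
```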
3.4.5 Inference Performance Analysis
Test setup:
Input size: 224×224 RGB
Batch size: 1
Warm-up runs: 5
Timed runs: 10
| Backend | FP16 (FPS) | INT8 (FPS) | Quantization gain |
|---|---|---|---|
| CPU | 11.48 | 21.72 | +89% |
| OpenCL | 5.40 | 9.42 | +74% |
| Vulkan | 6.80 | 6.78 | -0.3% |
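The gain column follows directly from the FPS numbers, and recomputing it is a one-liner:

```python
# Quantization gain = INT8 throughput relative to FP16 throughput (table above)
def gain_pct(fp16_fps, int8_fps):
    return (int8_fps / fp16_fps - 1.0) * 100.0

assert round(gain_pct(11.48, 21.72)) == 89     # CPU: INT8 nearly doubles throughput
assert round(gain_pct(5.40, 9.42)) == 74       # OpenCL
assert round(gain_pct(6.80, 6.78), 1) == -0.3  # Vulkan: no benefit from INT8
```

One plausible reading of the Vulkan row is that MNN's Vulkan backend executes the quantized model through the same FP16 path, so quantization saves model size but not compute time on this GPU.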