RK3588 MNN CPU/Vulkan/OpenCL ResNet50 Inference Benchmark

1. Background

1.1 RK3588 chip features
1.2 Why MNN?
1.3 Test goals

2. References
3. Procedure

3.1 Setting up the Vulkan environment
3.2 Installing the OpenCL environment
3.3 Running a `relu` operator with Vulkan

3.3.1 Install `glslang-tools`
3.3.2 Write the compute shader (`relu.comp`)
3.3.3 Write the C++ host code (`main.cpp`)
3.3.4 Compile the compute shader
3.3.5 Compile the C++ program
3.3.6 Run the program

3.4 Running `resnet50` inference with MNN

3.4.1 Build MNN
3.4.2 Generate the ONNX model, calibration images, and quantization config
3.4.3 Convert the model
3.4.4 Generate, build, and run the test program
3.4.5 Inference performance analysis

1. Background

1.1 RK3588 chip features

The Rockchip RK3588 is a high-performance SoC for the AIoT market, built on an 8 nm process. It features:

4x Cortex-A76 + 4x Cortex-A55 in a big.LITTLE configuration
Mali-G610 MP4 GPU (supports Vulkan 1.2 / OpenCL 2.2)
6 TOPS NPU (not covered in this test)

1.2 Why MNN?

MNN (Mobile Neural Network), Alibaba's open-source inference engine, offers:

Multi-platform support: covers iOS/Android/Linux/Windows
Heterogeneous compute: CPU/GPU/NPU backends
Lightweight: the core library is only about 500 KB
Quantization: FP16/INT8 compression and acceleration

1.3 Test goals

Benchmark the ResNet50 model on different compute backends:

CPU: general-purpose compute, as a performance baseline
Vulkan: a modern cross-platform graphics/compute API with low-overhead parallelism
OpenCL: a general heterogeneous-compute standard supporting many accelerator types
Quantization comparison: find the balance between accuracy and speed

2. References

Mali610Vulkan

3. Procedure

3.1 Setting up the Vulkan environment

# Install the official Mali GPU driver (includes Vulkan support)
wget https://repo.rock-chips.com/edge/debian-release-v2.0.0/pool/main/r/rockchip-mali/rockchip-mali_1.9-12_arm64.deb
sudo dpkg -i rockchip-mali_1.9-12_arm64.deb

# Create a symlink so the dynamic library can be found
sudo ln -s /usr/lib/aarch64-linux-gnu/libmali-valhall-g610-g6p0-wayland-gbm-vulkan.so /usr/lib/aarch64-linux-gnu/libmali.so

# Write the Vulkan ICD driver manifest
sudo mkdir -p /etc/vulkan/icd.d/
echo '{
    "file_format_version": "1.0.0",
    "ICD": {
        "library_path": "/usr/lib/aarch64-linux-gnu/libmali-valhall-g610-g6p0-wayland-gbm-vulkan.so",
        "api_version": "1.0.0"
    }
}' | sudo tee /etc/vulkan/icd.d/mali.json

sudo apt install -y vulkan-tools vulkan-utils
vulkaninfo

Key configuration notes

libmali.so is the single entry point for the Mali GPU driver
The ICD (Installable Client Driver) manifest tells the Vulkan loader where to find the driver
The vulkaninfo tool verifies that the driver installed successfully

If everything works, the console prints something like:

arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '10'.
'DISPLAY' environment variable not set... skipping surface info
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '10'.
==========
VULKANINFO
==========

Vulkan Instance Version: 1.2.131


Instance Extensions: count = 10
====================
        VK_EXT_debug_report                    : extension revision 9
        VK_EXT_debug_utils                     : extension revision 1
        VK_EXT_headless_surface                : extension revision 1
        VK_KHR_device_group_creation           : extension revision 1
        VK_KHR_display                         : extension revision 23
        VK_KHR_external_fence_capabilities     : extension revision 1
        VK_KHR_external_memory_capabilities    : extension revision 1
        VK_KHR_external_semaphore_capabilities : extension revision 1
        VK_KHR_get_physical_device_properties2 : extension revision 2
        VK_KHR_surface                         : extension revision 25

Layers: count = 0
=======
Presentable Surfaces:
=====================

Groups:
=======
        Device Group Properties (Group 0):
                physicalDeviceCount: count = 1
                        Mali-LODX (ID: 0)
                subsetAllocation = 0

        Device Group Present Capabilities (Group 0):
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '10'.
                Mali-LODX (ID: 0)
                Can present images from the following devices:
                        Mali-LODX (ID: 0)
                Present modes:
                        DEVICE_GROUP_PRESENT_MODE_LOCAL_BIT_KHR


Device Properties and Extensions:
=================================
GPU0:
VkPhysicalDeviceProperties:
---------------------------
        apiVersion     = 4202661 (1.2.165)
        driverVersion  = 25165824 (0x1800000)
        vendorID       = 0x13b5
        deviceID       = 0xa8670000
        deviceType     = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
        deviceName     = Mali-LODX

3.2 Installing the OpenCL environment

# Replace the system's default OpenCL driver
sudo mv /lib/aarch64-linux-gnu/libOpenCL.so.1 /lib/aarch64-linux-gnu/libOpenCL.so.1.bk
sudo ln -s /usr/lib/aarch64-linux-gnu/libmali.so /lib/aarch64-linux-gnu/libOpenCL.so.1

# Install the development toolchain
sudo apt install -y opencl-headers
sudo apt install -y ocl-icd-libopencl1
sudo apt install -y ocl-icd-opencl-dev
sudo apt install -y clinfo
clinfo

3.3 Running a relu operator with Vulkan

3.3.1 Install glslang-tools

apt install glslang-tools -y

3.3.2 Write the compute shader (relu.comp)

How the compute shader works
ReLU (Rectified Linear Unit) is a common activation function in deep learning, defined as:

f(x) = max(0, x)

Generate the GLSL shader source:

cat > relu.comp <<-'EOF'
#version 450

layout(local_size_x = 256) in; // 256 threads per workgroup

layout(binding = 0) buffer InputBuffer {
    float inputData[];
};

layout(binding = 1) buffer OutputBuffer {
    float outputData[];
};

void main() {
    uint idx = gl_GlobalInvocationID.x; // global invocation index
    outputData[idx] = max(inputData[idx], 0.0);
}
EOF

Notes:

layout(local_size_x = 256) in; sets the compute shader's workgroup size.
binding = 0 and binding = 1 bind the input and output buffers, respectively.
gl_GlobalInvocationID.x gives the global thread ID used to index the data.
max(inputData[idx], 0.0) implements ReLU: each element is replaced by the maximum of itself and zero.
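As a host-side sanity check, the shader's element-wise behavior can be mirrored in a few lines; a minimal NumPy sketch (a reference for validating the GPU output, not part of the Vulkan pipeline):

```python
import numpy as np

def relu_reference(x: np.ndarray) -> np.ndarray:
    """CPU reference for relu.comp: element-wise max(x, 0)."""
    return np.maximum(x, 0.0)

# Same test pattern the host program uses: 1024 values from -512 to 511
data = np.arange(1024, dtype=np.float32) - 512.0
out = relu_reference(data)
print(out.min(), out.max())  # 0.0 511.0
```

Comparing the GPU output buffer against this reference catches both shader bugs and buffer-binding mistakes.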

Vulkan execution flow

Key object lifetimes to manage:

Instance: the application-level context
Device: a logical representation of the physical device
Buffer: holds the input and output data
Pipeline: bundles the compute shader and its execution state
CommandBuffer: records the commands to execute

Performance tips:

Memory alignment: Vulkan requires buffer allocations to honor a minimum alignment (often 256 bytes)
Batching: submit multiple compute jobs in a single submission to reduce API overhead
Pipeline reuse: avoid creating and destroying pipeline objects repeatedly
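The alignment rule above amounts to rounding a size or offset up to the next multiple of the reported alignment. A sketch of that arithmetic (256 is just an example value here; the real limit comes from the device's VkPhysicalDeviceLimits):

```python
def align_up(size: int, alignment: int) -> int:
    """Round size up to the next multiple of alignment (a power of two)."""
    assert alignment & (alignment - 1) == 0, "alignment must be a power of two"
    return (size + alignment - 1) & ~(alignment - 1)

# 1024 floats (4096 bytes) is already aligned; 1000 floats (4000 bytes) is not
print(align_up(4096, 256))  # 4096
print(align_up(4000, 256))  # 4096
```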


3.3.3 Write the C++ host code (main.cpp)

cat > main.cpp <<-'EOF'
#include <vulkan/vulkan.h>
#include <algorithm>
#include <iostream>
#include <vector>
#include <fstream>
#include <memory>
#include <string.h>
#include <string>
#include <math.h>
#include <cassert>

// Helper: read SPIR-V bytecode from disk
std::vector<char> readFile(const std::string& filename) {
    std::ifstream file(filename, std::ios::ate | std::ios::binary);
    assert(file.is_open() && "failed to open shader file!");
    size_t fileSize = (size_t)file.tellg();
    std::vector<char> buffer(fileSize);
    file.seekg(0);
    file.read(buffer.data(), fileSize);
    file.close();
    return buffer;
}

int main() {
    // Initialize input data
    const size_t dataSize = 1024;
    std::vector<float> inputData(dataSize);
    for (size_t i = 0; i < dataSize; ++i) {
        inputData[i] = static_cast<float>(i) - 512.0f; // mix of negative and positive values
    }
    std::vector<float> outputData(dataSize, 0.0f);

    // ------------------------
    // Vulkan initialization
    // ------------------------

    // Create the Vulkan instance
    VkInstance instance;
    VkInstanceCreateInfo instanceCreateInfo{};
    instanceCreateInfo.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    vkCreateInstance(&instanceCreateInfo, nullptr, &instance);

    // Pick a physical device
    uint32_t deviceCount = 0;
    vkEnumeratePhysicalDevices(instance, &deviceCount, nullptr);
    assert(deviceCount > 0 && "no Vulkan-capable physical device found!");
    std::vector<VkPhysicalDevice> physicalDevices(deviceCount);
    vkEnumeratePhysicalDevices(instance, &deviceCount, physicalDevices.data());
    VkPhysicalDevice physicalDevice = physicalDevices[0];

    // Create the logical device and queue
    uint32_t queueFamilyIndex = 0; // simplified; a real program must pick a compute-capable queue family
    float queuePriority = 1.0f;
    VkDeviceQueueCreateInfo queueCreateInfo{};
    queueCreateInfo.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
    queueCreateInfo.queueFamilyIndex = queueFamilyIndex;
    queueCreateInfo.queueCount = 1;
    queueCreateInfo.pQueuePriorities = &queuePriority;

    VkDevice device;
    VkDeviceCreateInfo deviceCreateInfo{};
    deviceCreateInfo.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
    deviceCreateInfo.queueCreateInfoCount = 1;
    deviceCreateInfo.pQueueCreateInfos = &queueCreateInfo;
    vkCreateDevice(physicalDevice, &deviceCreateInfo, nullptr, &device);

    VkQueue queue;
    vkGetDeviceQueue(device, queueFamilyIndex, 0, &queue);

    // Create the shader module
    auto shaderCode = readFile("relu.comp.spv");
    VkShaderModule shaderModule;
    VkShaderModuleCreateInfo shaderModuleCreateInfo{};
    shaderModuleCreateInfo.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
    shaderModuleCreateInfo.codeSize = shaderCode.size();
    shaderModuleCreateInfo.pCode = reinterpret_cast<const uint32_t*>(shaderCode.data());
    vkCreateShaderModule(device, &shaderModuleCreateInfo, nullptr, &shaderModule);

    // Create buffers and their memory
    VkDeviceSize bufferSize = dataSize * sizeof(float);

    // Helper: create a storage buffer and bind memory to it
    auto createBuffer = [&](VkBuffer& buffer, VkDeviceMemory& bufferMemory) {
        VkBufferCreateInfo bufferCreateInfo{};
        bufferCreateInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
        bufferCreateInfo.size = bufferSize;
        bufferCreateInfo.usage = VK_BUFFER_USAGE_STORAGE_BUFFER_BIT;
        bufferCreateInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
        vkCreateBuffer(device, &bufferCreateInfo, nullptr, &buffer);

        VkMemoryRequirements memRequirements;
        vkGetBufferMemoryRequirements(device, buffer, &memRequirements);

        VkMemoryAllocateInfo allocInfo{};
        allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
        allocInfo.allocationSize = memRequirements.size;
        // Simplified; a real program must find a suitable memory type
        allocInfo.memoryTypeIndex = 0;
        vkAllocateMemory(device, &allocInfo, nullptr, &bufferMemory);
        vkBindBufferMemory(device, buffer, bufferMemory, 0);
    };

    VkBuffer inputBuffer, outputBuffer;
    VkDeviceMemory inputBufferMemory, outputBufferMemory;

    createBuffer(inputBuffer, inputBufferMemory);
    createBuffer(outputBuffer, outputBufferMemory);

    // Copy the input data into the input buffer
    void* data;
    vkMapMemory(device, inputBufferMemory, 0, bufferSize, 0, &data);
    memcpy(data, inputData.data(), (size_t)bufferSize);
    vkUnmapMemory(device, inputBufferMemory);

    // Create the descriptor set layout
    VkDescriptorSetLayoutBinding inputBinding{};
    inputBinding.binding = 0;
    inputBinding.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
    inputBinding.descriptorCount = 1;
    inputBinding.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;

    VkDescriptorSetLayoutBinding outputBinding{};
    outputBinding.binding = 1;
    outputBinding.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
    outputBinding.descriptorCount = 1;
    outputBinding.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;

    VkDescriptorSetLayoutBinding bindings[] = { inputBinding, outputBinding };
    VkDescriptorSetLayoutCreateInfo descriptorSetLayoutCreateInfo{};
    descriptorSetLayoutCreateInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
    descriptorSetLayoutCreateInfo.bindingCount = 2;
    descriptorSetLayoutCreateInfo.pBindings = bindings;

    VkDescriptorSetLayout descriptorSetLayout;
    vkCreateDescriptorSetLayout(device, &descriptorSetLayoutCreateInfo, nullptr, &descriptorSetLayout);

    // Create the pipeline layout
    VkPipelineLayout pipelineLayout;
    VkPipelineLayoutCreateInfo pipelineLayoutCreateInfo{};
    pipelineLayoutCreateInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
    pipelineLayoutCreateInfo.setLayoutCount = 1;
    pipelineLayoutCreateInfo.pSetLayouts = &descriptorSetLayout;
    vkCreatePipelineLayout(device, &pipelineLayoutCreateInfo, nullptr, &pipelineLayout);

    // Create the compute pipeline
    VkPipeline pipeline;
    VkComputePipelineCreateInfo computePipelineCreateInfo{};
    computePipelineCreateInfo.sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO;
    computePipelineCreateInfo.stage.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
    computePipelineCreateInfo.stage.stage = VK_SHADER_STAGE_COMPUTE_BIT;
    computePipelineCreateInfo.stage.module = shaderModule;
    computePipelineCreateInfo.stage.pName = "main";
    computePipelineCreateInfo.layout = pipelineLayout;
    vkCreateComputePipelines(device, VK_NULL_HANDLE, 1, &computePipelineCreateInfo, nullptr, &pipeline);

    // Create the descriptor pool and descriptor set
    VkDescriptorPoolSize poolSize{};
    poolSize.type = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
    poolSize.descriptorCount = 2;

    VkDescriptorPoolCreateInfo descriptorPoolCreateInfo{};
    descriptorPoolCreateInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
    descriptorPoolCreateInfo.poolSizeCount = 1;
    descriptorPoolCreateInfo.pPoolSizes = &poolSize;
    descriptorPoolCreateInfo.maxSets = 1;

    VkDescriptorPool descriptorPool;
    vkCreateDescriptorPool(device, &descriptorPoolCreateInfo, nullptr, &descriptorPool);

    VkDescriptorSet descriptorSet;
    VkDescriptorSetAllocateInfo descriptorSetAllocateInfo{};
    descriptorSetAllocateInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
    descriptorSetAllocateInfo.descriptorPool = descriptorPool;
    descriptorSetAllocateInfo.descriptorSetCount = 1;
    descriptorSetAllocateInfo.pSetLayouts = &descriptorSetLayout;
    vkAllocateDescriptorSets(device, &descriptorSetAllocateInfo, &descriptorSet);

    VkDescriptorBufferInfo inputBufferInfo{};
    inputBufferInfo.buffer = inputBuffer;
    inputBufferInfo.offset = 0;
    inputBufferInfo.range = bufferSize;

    VkDescriptorBufferInfo outputBufferInfo{};
    outputBufferInfo.buffer = outputBuffer;
    outputBufferInfo.offset = 0;
    outputBufferInfo.range = bufferSize;

    VkWriteDescriptorSet writeDescriptorSets[2]{};

    writeDescriptorSets[0].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
    writeDescriptorSets[0].dstSet = descriptorSet;
    writeDescriptorSets[0].dstBinding = 0;
    writeDescriptorSets[0].descriptorCount = 1;
    writeDescriptorSets[0].descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
    writeDescriptorSets[0].pBufferInfo = &inputBufferInfo;

    writeDescriptorSets[1].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
    writeDescriptorSets[1].dstSet = descriptorSet;
    writeDescriptorSets[1].dstBinding = 1;
    writeDescriptorSets[1].descriptorCount = 1;
    writeDescriptorSets[1].descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
    writeDescriptorSets[1].pBufferInfo = &outputBufferInfo;

    vkUpdateDescriptorSets(device, 2, writeDescriptorSets, 0, nullptr);

    // Create the command pool and command buffer
    VkCommandPool commandPool;
    VkCommandPoolCreateInfo commandPoolCreateInfo{};
    commandPoolCreateInfo.sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO;
    commandPoolCreateInfo.queueFamilyIndex = queueFamilyIndex;
    vkCreateCommandPool(device, &commandPoolCreateInfo, nullptr, &commandPool);

    VkCommandBuffer commandBuffer;
    VkCommandBufferAllocateInfo commandBufferAllocateInfo{};
    commandBufferAllocateInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
    commandBufferAllocateInfo.commandPool = commandPool;
    commandBufferAllocateInfo.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
    commandBufferAllocateInfo.commandBufferCount = 1;
    vkAllocateCommandBuffers(device, &commandBufferAllocateInfo, &commandBuffer);

    // Record the command buffer
    VkCommandBufferBeginInfo commandBufferBeginInfo{};
    commandBufferBeginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
    vkBeginCommandBuffer(commandBuffer, &commandBufferBeginInfo);
    vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);
    vkCmdBindDescriptorSets(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, pipelineLayout, 0, 1, &descriptorSet, 0, nullptr);
    vkCmdDispatch(commandBuffer, (uint32_t)ceil(dataSize / 256.0), 1, 1);
    vkEndCommandBuffer(commandBuffer);

    // Submit the command buffer
    VkSubmitInfo submitInfo{};
    submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    submitInfo.commandBufferCount = 1;
    submitInfo.pCommandBuffers = &commandBuffer;
    vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE);
    vkQueueWaitIdle(queue);

    // Read the results back from the output buffer
    vkMapMemory(device, outputBufferMemory, 0, bufferSize, 0, &data);
    memcpy(outputData.data(), data, (size_t)bufferSize);
    vkUnmapMemory(device, outputBufferMemory);

    // Verify the results
    bool success = true;
    for (size_t i = 0; i < dataSize; ++i) {
        float expected = std::max(inputData[i], 0.0f);
        if (outputData[i] != expected) {
            success = false;
            std::cout << "Mismatch at index " << i << ": expected " << expected << ", got " << outputData[i] << std::endl;
            break;
        }
    }
    if (success) {
        std::cout << "ReLU computed correctly!" << std::endl;
    }

    // Clean up
    vkDestroyBuffer(device, inputBuffer, nullptr);
    vkFreeMemory(device, inputBufferMemory, nullptr);
    vkDestroyBuffer(device, outputBuffer, nullptr);
    vkFreeMemory(device, outputBufferMemory, nullptr);
    vkDestroyPipeline(device, pipeline, nullptr);
    vkDestroyPipelineLayout(device, pipelineLayout, nullptr);
    vkDestroyShaderModule(device, shaderModule, nullptr);
    vkDestroyDescriptorPool(device, descriptorPool, nullptr);
    vkDestroyDescriptorSetLayout(device, descriptorSetLayout, nullptr);
    vkDestroyCommandPool(device, commandPool, nullptr);
    vkDestroyDevice(device, nullptr);
    vkDestroyInstance(instance, nullptr);

    return 0;
}
EOF

Notes:

The input data contains both positive and negative values to exercise ReLU.
The program creates the Vulkan instance, device, queue, buffers, shader module, and other required resources.
The compute shader performs ReLU and writes the result into the output buffer.
The result is read back from the output buffer and verified on the host.
Basic error checks and resource cleanup are included to keep the program stable.
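One simplification in the code above deserves a closer look: memoryTypeIndex = 0 happens to work on this driver, but a portable program scans VkPhysicalDeviceMemoryProperties for a type that is both permitted by the buffer's memoryTypeBits and has the needed property flags (e.g. HOST_VISIBLE | HOST_COHERENT). The selection loop, sketched in Python with stand-in flag values:

```python
# Stand-in values matching the Vulkan flag bits
HOST_VISIBLE = 0x2   # VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
HOST_COHERENT = 0x4  # VK_MEMORY_PROPERTY_HOST_COHERENT_BIT

def find_memory_type(type_bits: int, required: int, type_flags: list) -> int:
    """Mirror of the canonical Vulkan findMemoryType loop: return the first
    memory type index permitted by type_bits whose flags contain `required`."""
    for i, flags in enumerate(type_flags):
        if (type_bits & (1 << i)) and (flags & required) == required:
            return i
    raise RuntimeError("no suitable memory type")

# Hypothetical device: type 0 is DEVICE_LOCAL only, type 1 is host-visible and coherent
types = [0x1, HOST_VISIBLE | HOST_COHERENT]
print(find_memory_type(0b11, HOST_VISIBLE | HOST_COHERENT, types))  # 1
```

In the C++ program, the same loop would run over memProperties.memoryTypes after vkGetPhysicalDeviceMemoryProperties, with memRequirements.memoryTypeBits as type_bits.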


3.3.4 Compile the compute shader

Use glslangValidator to compile the GLSL shader into SPIR-V bytecode:

glslangValidator -V relu.comp -o relu.comp.spv

3.3.5 Compile the C++ program

Make sure the Vulkan SDK is installed and the required environment variables are set, then compile main.cpp:

g++ main.cpp -o relu -lvulkan

3.3.6 Run the program

Run the compiled binary:

./relu

If everything works, the console prints:

ReLU computed correctly!

3.4 Running resnet50 inference with MNN

3.4.1 Build MNN

git clone https://github.com/alibaba/MNN.git
cd MNN
./schema/generate.sh
rm build -rf
mkdir build && cd build
cmake -DMNN_BUILD_CONVERTER=ON -DMNN_BUILD_QUANTOOLS=ON -DCMAKE_BUILD_TYPE=Release \
    -DMNN_BUILD_SHARED_LIBS=ON -DMNN_SEP_BUILD=OFF -DMNN_USE_SYSTEM_LIB=ON \
    -DMNN_USE_THREAD_POOL=OFF -DMNN_OPENMP=ON -DMNN_BUILD_OPENCV=ON \
    -DMNN_OPENCL=ON -DMNN_BUILD_TOOLS=OFF -DMNN_BUILD_HARD=OFF \
    -DMNN_IMGCODECS=ON -DMNN_VULKAN=ON ..
make -j4

3.4.2 Generate the ONNX model, calibration images, and quantization config

cat> resnet50.py<<-'EOF'    
import torchvision.transforms as transforms
import torch
import numpy as np
import cv2
import os
import torchvision.models as models
input_tensor = torch.ones((1,3,224,224),dtype=torch.float32)
model = models.resnet50(pretrained=False)
model.eval()
with torch.no_grad():
    output = model(input_tensor)
input_names = ["input"]
output_names = ["output"]
torch.onnx.export(model, input_tensor, "resnet50.onnx", 
                    verbose=False, input_names=input_names,
                    output_names=output_names,opset_version=17,export_params=True)
img=np.ones((224,224,3),dtype=np.uint8)  # uint8, as expected by cv2.imwrite
os.makedirs("quant_images",exist_ok=True)
cv2.imwrite("quant_images/img.jpg",img)
EOF
python3 resnet50.py

cat> preprocessConfig.json<<-'EOF'  
{
    "format":"RGB",
    "mean":[
        127.5,
        127.5,
        127.5
    ],
    "normal":[
        0.00784314,
        0.00784314,
        0.00784314
    ],
    "width":224,
    "height":224,
    "path":"quant_images/",
    "used_image_num":1,
    "feature_quantize_method":"KL",
    "weight_quantize_method":"MAX_ABS"
}
EOF
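The mean/normal pair in the config encodes the usual affine preprocessing y = (x - mean) * normal; with mean 127.5 and normal 0.00784314 (≈ 1/127.5), pixel values 0..255 map to roughly [-1, 1]. A quick check of that arithmetic:

```python
def preprocess(pixel: float, mean: float = 127.5, normal: float = 0.00784314) -> float:
    """Preprocessing as specified in preprocessConfig.json: (pixel - mean) * normal."""
    return (pixel - mean) * normal

print(round(preprocess(0), 4), round(preprocess(255), 4))  # -1.0 1.0
```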

3.4.3 Convert the model

./MNNConvert -f ONNX --modelFile resnet50.onnx --MNNModel resnet50.mnn
./quantized.out resnet50.mnn resnet50_quant.mnn preprocessConfig.json
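quantized.out applies the two methods named in the config: KL divergence for activations and MAX_ABS for weights. The MAX_ABS idea is simple enough to sketch; this is an illustration of symmetric INT8 quantization, not MNN's actual implementation:

```python
import numpy as np

def quantize_max_abs(w: np.ndarray):
    """Symmetric INT8 quantization: scale so max|w| maps to 127."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.array([-0.5, -0.1, 0.0, 0.3, 0.5], dtype=np.float32)
q, scale = quantize_max_abs(w)
print(q.tolist())  # [-127, -25, 0, 76, 127]
```

The round-trip error is bounded by half a quantization step, which is why INT8 can trade a small accuracy loss for the throughput gains measured below.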

3.4.4 Generate, build, and run the test program

cat> resnet50_demo.cpp <<-'EOF'
#include <iostream>
#include <cstdlib>
#include <MNN/Interpreter.hpp>
#include <MNN/Tensor.hpp>
#include <chrono>

int main(int argc, char *argv[]) {
    if (argc != 3) {
        printf("Usage: %s model_path device_type\n", argv[0]);
        return -1;
    }
    char *model_path = argv[1];
    int type = atoi(argv[2]);
    // Load the model
    auto net = MNN::Interpreter::createFromFile(model_path);
    if (net == nullptr) {
        std::cerr << "failed to load model file!" << std::endl;
        return 1;
    }

    // Configure the backend and session
    MNN::ScheduleConfig config;
    config.type = (MNNForwardType)type;
    config.numThread = 4;              // adjust thread count as needed
    switch (config.type) {
        case MNN_FORWARD_CPU:
            std::cout << "Using CPU backend" << std::endl;
            break;
        case MNN_FORWARD_OPENCL:
            std::cout << "Using OpenCL backend" << std::endl;
            break;
        case MNN_FORWARD_METAL:
            std::cout << "Using Metal backend" << std::endl;
            break;
        case MNN_FORWARD_VULKAN:
            std::cout << "Using Vulkan backend" << std::endl;
            break;
        // add other types as needed
        default:
            std::cout << "Unknown backend" << std::endl;
            break;
    }

    MNN::BackendConfig backendConfig;
    backendConfig.precision = MNN::BackendConfig::Precision_Low;
    config.backendConfig = &backendConfig;

    auto session = net->createSession(config);

    // Get the input tensor
    auto input_tensor = net->getSessionInput(session, nullptr);

    // Prepare input data (a constant-valued 224x224 image as an example)
    int input_width = 224;
    int input_height = 224;
    int input_channel = 3;

    std::vector<float> input_data(input_width * input_height * input_channel);
    // Fill the input with a constant value
    for (auto& v : input_data) {
        v = 1 / 255.0;
    }

    // Copy the data into the input tensor (NCHW layout to match Tensor::CAFFE)
    auto nchw_tensor = MNN::Tensor::create<float>({1, input_channel, input_height,
                                                   input_width}, input_data.data(),
                                                  MNN::Tensor::CAFFE);
    input_tensor->copyFromHostTensor(nchw_tensor);
    delete nchw_tensor;

    // Warm up
    int warmup_runs = 5;
    for (int i = 0; i < warmup_runs; ++i) {
        net->runSession(session);
    }

    // Run inference and measure performance
    int infer_runs = 10;
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < infer_runs; ++i) {
        net->runSession(session);
    }
    auto end = std::chrono::high_resolution_clock::now();

    // Compute average throughput
    std::chrono::duration<double, std::milli> elapsed = end - start;
    std::cout << "FPS: " << infer_runs / (elapsed.count() / 1000) << std::endl;

    // Get the output tensor
    auto output_tensor = net->getSessionOutput(session, nullptr);

    // Copy the output back to the host
    MNN::Tensor output_host(output_tensor, output_tensor->getDimensionType());
    output_tensor->copyToHostTensor(&output_host);

    // Parse the output (here: the raw class scores)
    auto output_data = output_host.host<float>();
    int output_size = output_host.elementSize();

    // Print the first 5 output values
    std::cout << "First 5 output values:" << std::endl;
    for (int i = 0; i < 5 && i < output_size; ++i) {
        std::cout << output_data[i] << " ";
    }
    std::cout << std::endl;

    // Release resources
    net->releaseModel();
    delete net;

    return 0;
}
EOF
g++ -o resnet50_demo resnet50_demo.cpp -I ../include/ -L . -lMNN -std=c++11 -lgomp -lpthread -lOpenCL -Wl,-rpath=./

# CPU
./resnet50_demo resnet50.mnn 0
./resnet50_demo resnet50_quant.mnn 0

# OpenCL
./resnet50_demo resnet50.mnn 3
./resnet50_demo resnet50_quant.mnn 3

# Vulkan
./resnet50_demo resnet50.mnn 7
./resnet50_demo resnet50_quant.mnn 7

3.4.5 Inference performance analysis

Test setup:

Input size: 224×224 RGB
Batch size: 1
Warmup runs: 5
Timed runs: 10

Backend   FP16 (FPS)   INT8 (FPS)   Quantization gain
CPU       11.48        21.72        +89%
OpenCL    5.40         9.42         +74%
Vulkan    6.80         6.78         -0.3%
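The quantization-gain column is just the INT8/FP16 FPS ratio minus one. Checking the table's arithmetic:

```python
def gain_percent(baseline_fps: float, quant_fps: float) -> float:
    """Relative INT8 speedup over the floating-point baseline, in percent."""
    return (quant_fps / baseline_fps - 1.0) * 100.0

for name, fp, q in [("CPU", 11.48, 21.72), ("OpenCL", 5.40, 9.42), ("Vulkan", 6.80, 6.78)]:
    print(f"{name}: {gain_percent(fp, q):+.1f}%")
# CPU: +89.2%  OpenCL: +74.4%  Vulkan: -0.3%
```

The pattern is notable: quantization nearly doubles CPU throughput and helps OpenCL substantially, while the Vulkan backend sees no benefit, suggesting its INT8 path falls back to floating-point kernels on this GPU.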