Deep Learning-Based Fruit and Vegetable Recognition and Localization Software

AIStudio774937, published 2022-08

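Motivation

Agriculture is the foundation of the nation, and its development bears directly on China's economic modernization. The government is vigorously promoting agricultural modernization and supports the integration of modern technology with agriculture, so the project has a solid policy basis. China has a very large base of growers, which gives the software a large potential audience. Used at the harvesting stage, the software can cut costs and raise efficiency, and thereby improve growers' returns. It is not limited to growers: anyone interested in fruit and vegetable recognition can use it to learn about agriculture and agricultural production, which helps popularize agricultural knowledge. The software is offered as both an app and a web application, covering most usage scenarios, so users can pick whichever smart device suits them. Because it is not tied to dedicated hardware in a fixed location, it can be used at any time and in any place.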

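Software Overview

In agricultural production, artificial intelligence can effectively cut costs and raise efficiency. During the harvest season, monitoring the number of fruits and where they grow helps in planning and coordinating picking work. Computer vision makes it possible to count fruits and determine their precise positions.

The software provides two visual interfaces, one on the web and one in the app. Both can count and locate fruits and vegetables from a camera or from uploaded images. The web version additionally supports real-time recognition as well as statistical analysis and visualization of the data.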

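Software Architecture

Development combined the traditional software development life cycle with agile practices and followed a top-down, stepwise-refinement structured design method. Two clients were developed: a web client and an app client. The overall flowcharts and the overall structure diagram of the software are shown below.

The proportions of the software's components are as follows:

The figure below shows the overall structure of the fruit and vegetable recognition system:

The figure below shows the overall flowchart of the web client:

The figure below shows the overall flowchart of the app client: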

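Model Selection

Convolutional neural networks (CNNs) are an important class of deep learning algorithms. In image processing tasks such as image classification, they extract local features better than traditional algorithms, and the features extracted by the convolution kernels capture the correlations between pixels well.

After evaluating several models, we chose the lightweight YOLOv5s model as the basis for development. YOLOv5s is small, responds quickly, and is accurate, which makes it suitable for recognizing many kinds of objects.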

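Model Improvements

This project uses YOLOv5s as its base model. As the lightweight member of the YOLOv5 family, YOLOv5s also has relatively few sampling layers. We added the CA attention mechanism to it and adopted the GhostNet network. The figure below shows the structure of the CA attention mechanism:

GhostNet was proposed by Huawei Noah's Ark Lab at CVPR 2020. It generates more feature maps with fewer parameters. Specifically, GhostNet splits an ordinary convolutional layer of a deep neural network into two parts.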
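
To make the parameter saving concrete, the sketch below counts weights for an ordinary convolution and for a Ghost module. It assumes the formulation in the GhostNet paper, which the text above does not spell out: a primary convolution produces `c_out / s` intrinsic feature maps, and cheap depthwise `d x d` operations generate the remaining ghost maps. The function names and the example layer sizes are illustrative only, and biases are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in an ordinary k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k, s=2, d=3):
    """Weights in a Ghost module under the assumptions stated above.

    s: ratio between total and intrinsic feature maps (assumed to divide c_out)
    d: kernel size of the cheap depthwise operation
    """
    intrinsic = c_out // s                  # maps produced by the primary convolution
    primary = c_in * intrinsic * k * k      # part 1: ordinary convolution
    cheap = (s - 1) * intrinsic * d * d     # part 2: one depthwise filter per ghost map
    return primary + cheap


if __name__ == "__main__":
    c_in, c_out, k = 128, 256, 3
    plain = conv_params(c_in, c_out, k)
    ghost = ghost_params(c_in, c_out, k)
    print(f"ordinary conv: {plain} weights")
    print(f"ghost module:  {ghost} weights")
    print(f"reduction:     {plain / ghost:.2f}x")
```

For this example layer the ordinary convolution has 294,912 weights and the Ghost module 148,608, a reduction of a little under 2x, which is close to the ratio `s`.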

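Experimental results show that the proposed Ghost module reduces the computational cost of ordinary convolutional layers while maintaining similar recognition performance, and that GhostNet can outperform advanced efficient deep models such as MobileNetV3 for fast inference on mobile devices. The figure below is a schematic of the GhostNet network structure:

After our modifications, the new model structure is as follows: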

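Model Comparison

After adding the attention mechanism to YOLOv5 and modifying the network, we compared the modified YOLOv5 with the original YOLOv5 network. Both models were trained on the same dataset under the same conditions. The comparison results are as follows:

The results show that the modified model performs better.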

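Dataset Preparation

Once the dataset had been built, we used data augmentation to enlarge it further. Data augmentation improves the model's robustness and helps prevent overfitting during training. In this project we expanded the data with (1) translation, (2) aspect-ratio distortion, (3) mirror flipping, (4) color-gamut distortion, and (5) mosaic augmentation. The results are shown in the figure below: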
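
For a detection dataset, geometric augmentations must transform the bounding-box labels together with the pixels. As an illustration only, the sketch below shows this for mirror flipping, assuming boxes are stored as `(x_min, y_min, x_max, y_max)` pixel coordinates. The function name and the box format are assumptions, not the project's actual augmentation code.

```python
def flip_boxes_horizontally(boxes, image_width):
    """Mirror (x_min, y_min, x_max, y_max) boxes around the vertical centre line.

    After the flip, the old right edge becomes the new left edge, so the two
    x coordinates swap roles; the y coordinates are unchanged.
    """
    flipped = []
    for x_min, y_min, x_max, y_max in boxes:
        flipped.append((image_width - x_max, y_min, image_width - x_min, y_max))
    return flipped


if __name__ == "__main__":
    boxes = [(10, 20, 110, 220), (300, 50, 400, 150)]
    print(flip_boxes_horizontally(boxes, image_width=640))
```

The image itself would be flipped with the matching pixel operation, for example `Image.transpose(Image.FLIP_LEFT_RIGHT)` in Pillow. Flipping the boxes twice returns the original labels, which makes a convenient sanity check.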

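Model Training

The model was trained on the PaddlePaddle platform with a fairly complex dataset of nearly 12,000 images. The resulting weights are as follows: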

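Packaging the Model with ONNX

Once the model has been built, it needs to be deployed on the Flask framework and wrapped in an interface that is easy to call. However, a .pth model depends on its training environment and carries a relatively bulky model structure, which makes it poorly suited to deployment. We therefore need an intermediate representation through which the trained model can be served and called.

ONNX Runtime solves these problems well. ONNX Runtime is a cross-platform machine learning inference accelerator maintained by Microsoft. Models from different deep learning frameworks (such as PyTorch and MXNet) can be stored in the common ONNX format, and ONNX Runtime then runs them efficiently on the target hardware platform.

To run inference with ONNX Runtime, we first convert the PTH model to an ONNX model. The code is as follows: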

import onnx
import torch

from nets.yolo_orgin import YoloBody

# Model configuration
anchors_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
num_classes = 80
phi = 's'
input_shape = [640, 640]
# model = YoloBody(anchors_mask, num_classes, phi, backbone='cspdarknet', pretrained=False, input_shape=input_shape)
model = YoloBody(anchors_mask, num_classes, phi, pretrained=False)

# Dummy NCHW input, used only to trace the graph during export
dummy_input1 = torch.randn(1, 3, 640, 640, requires_grad=True)
pthfile = "logs/yolov5_s_v6.1.pth"
model_path = "yolo5org.onnx"
simplify = True  # not used in this excerpt
input_names = ["input1"]
output_names = ["output1", "output2", "output3"]

# Load the trained weights and switch to inference mode
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.load_state_dict(torch.load(pthfile, map_location=device))
model = model.eval()

# Export with dynamic batch size, height and width.
# In NCHW layout, axis 2 is the height and axis 3 is the width.
torch.onnx.export(
    model, dummy_input1, f=model_path,
    verbose=False, opset_version=12, training=torch.onnx.TrainingMode.EVAL,
    do_constant_folding=True,
    input_names=input_names,
    output_names=output_names,
    dynamic_axes={
        "input1": {0: 'batch_size', 2: 'in_height', 3: 'in_width'},
        "output1": {0: 'batch_size', 2: 'out_height', 3: 'out_width'},
        "output2": {0: 'batch_size', 2: 'out_height', 3: 'out_width'},
        "output3": {0: 'batch_size', 2: 'out_height', 3: 'out_width'},
    })

model_onnx = onnx.load(model_path)  # load the exported ONNX model
onnx.checker.check_model(model_onnx)  # verify that the model is well formed

print('ONNX model saved as {}'.format(model_path))

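Detection Module Optimization

The standard YOLOv5s model performs poorly when detecting small and medium-sized objects in large images. The cause is that an image must be downscaled before it is fed to the model; a large image is downscaled by a large factor, which easily loses features present in the original.

Improving the model's ability to recognize small and medium-sized objects in large images comes down to the three stages in which the image is processed: (1) preprocessing, (2) the model itself, and (3) postprocessing.

In the preprocessing stage, we build a dynamic input shape so that each image is fed to the model at a size that suits it.

In the model stage, a smaller stride can indeed improve detection of small and medium-sized objects. However, a lightweight model has few sampling layers, and optimizing only for small objects at the expense of general-purpose use would be unwise, so we made no further changes at this stage.

In the postprocessing stage, small-object detection can be improved by adding a new detection algorithm to the model. Slicing is a good choice.

The code for slicing an image is as follows: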

import numpy as np

# resize_image and preprocess_input are project helper functions that are not shown here;
# auto_input is defined in a later listing.


class Mup_Pic:
    def __init__(self, image_shape, mulpicplus, image, letterbox_image, input_shape_auto):
        self.image_shape = image_shape            # (height, width) of the original image
        self.mode = mulpicplus                    # number of slices along each axis
        self.image = image                        # PIL image to slice
        self.letterbox_image = letterbox_image
        self.input_shape_auto = input_shape_auto  # choose the input shape automatically?
        self.input_shape = None

    def process_mun_pic(self):
        image = self.image
        starpoint = []
        dis_pics = []
        dis_pics_shape = []

        xsz = self.image_shape[1]
        ysz = self.image_shape[0]
        mulpicplus = int(self.mode)
        # Each slice is 1.2x the size of a grid cell, so neighbouring slices overlap
        x_smalloccur = int(xsz / mulpicplus * 1.2)
        y_smalloccur = int(ysz / mulpicplus * 1.2)

        for i in range(mulpicplus):
            x_startpoint = int(i * (xsz / mulpicplus))
            for j in range(mulpicplus):
                y_startpoint = int(j * (ysz / mulpicplus))
                # Clip the slice to the image border
                x_real = min(x_startpoint + x_smalloccur, xsz)
                y_real = min(y_startpoint + y_smalloccur, ysz)
                cropim = image.crop((x_startpoint, y_startpoint, x_real, y_real))
                dis_pics.append(cropim)
                # ---------------------------------------------------------#
                #   The start point is stored as [y, x] (swapped) so that it
                #   can be read in order when the boxes are repositioned
                # ---------------------------------------------------------#
                starpoint.append([y_startpoint, x_startpoint])
                dis_pics_shape.append([cropim.size[1], cropim.size[0]])

        dis_pics, input_shape = self.dis_pics_process(dis_pics)

        return starpoint, dis_pics, dis_pics_shape, input_shape

    # =====================================================#
    #   Convert the slices to NumPy arrays stored in contiguous
    #   memory to reduce inference time
    # =====================================================#
    def dis_pics_process(self, dis_pics):
        pics_np = []
        input_shape = [416, 416]
        # =====================================================#
        #   Use the size of the largest slice as the parameter
        #   for scaling the input shape
        # =====================================================#
        if self.input_shape_auto:
            # key=max selects the slice with the longest side; a plain max()
            # would compare the (width, height) tuples lexicographically
            max_size = max((pic.size for pic in dis_pics), key=max)
            input_shape = auto_input(max_size)
        print("mulinput_shape:", input_shape)
        for pic in dis_pics:
            image_data = resize_image(pic, (input_shape[1], input_shape[0]), self.letterbox_image)
            # HWC -> CHW, then add a leading batch dimension
            image_data = np.expand_dims(
                np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)
            pics_np.append(image_data)
        return np.ascontiguousarray(pics_np), input_shape

    @property
    def imageshow(self):
        print(type(self.image))
        return self.image.show()
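
The slicing geometry can be checked without an image or a model. The sketch below is a simplified, stand-alone version of the loop in `process_mun_pic`: it only computes the crop rectangles, using the same 1.2 enlargement factor, so the overlap between neighbouring slices is easy to inspect. The function name and the example image size are illustrative.

```python
def slice_rectangles(width, height, n, enlarge=1.2):
    """Return n * n crop rectangles (left, top, right, bottom) covering the image.

    Each slice is `enlarge` times the size of a grid cell and is clipped to
    the image border, so neighbouring slices overlap.
    """
    slice_w = int(width / n * enlarge)
    slice_h = int(height / n * enlarge)
    rects = []
    for i in range(n):
        left = int(i * (width / n))
        for j in range(n):
            top = int(j * (height / n))
            right = min(left + slice_w, width)
            bottom = min(top + slice_h, height)
            rects.append((left, top, right, bottom))
    return rects


if __name__ == "__main__":
    for rect in slice_rectangles(1000, 800, 2):
        print(rect)
```

With a 1000 x 800 image and `n = 2`, the horizontal slices span 0 to 600 and 500 to 1000, so they share a 100-pixel-wide band. An object cut by one slice boundary is therefore likely to appear whole in the neighbouring slice.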

The code that scales an image to an input size suited to it before passing it to the model is as follows:

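from math import ceil


def auto_input(imgsize):
    # Shrink the longest side by a factor of 1.2 and round up to a multiple of 32
    maxsize = max(imgsize)
    input_num = int(ceil((maxsize / 1.2) / 32)) * 32
    # Clamp the square input shape to the range 640 to 1280;
    # for example, a longest side of 1000 gives [864, 864]
    if input_num <= 640:
        input_shape = [640, 640]
    elif 640 < input_num <= 1280:
        input_shape = [input_num, input_num]
    else:
        input_shape = [1280, 1280]
    print(input_shape)
    return input_shape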

The code that runs sliced detection on an image is as follows:

        org_image = image
        mup_pic = Mup_Pic(image_shape, mulpicplus, org_image, letterbox_image, mulinput_shape_auto)
        starpoint, dis_pics, dis_pics_shape, muinput_shape = mup_pic.process_mun_pic()

        bbox_util = DecodeBox(anchors, num_classes, (muinput_shape[0], muinput_shape[1]), anchors_mask)

        # Accumulators for the detections gathered from all slices
        con_outputs, unique_labels = None, None
        for i in range(len(dis_pics)):
            # Run the ONNX Runtime session on one slice and decode its boxes
            outputs = session.run(outname, {inname: dis_pics[i]})
            outputs = outputs_process(outputs)
            outputs = bbox_util.decode_box(outputs)
            prediction, un_lab = bbox_util.confidence_screening(torch.cat(outputs, 1), num_classes, confidence)
            # Skip slices without detections
            if prediction is None:
                continue
            # Map the boxes back to original-image coordinates. position_adj edits
            # `prediction` in place (its NumPy view shares memory with a CPU tensor).
            bbox_util.position_adj(prediction, dis_pics_shape[i], starpoint[i], letterbox_image, muinput_shape)
            # ---------------------------------------------------------#
            #   Concatenate the outputs of all slices
            # ---------------------------------------------------------#
            if con_outputs is None and unique_labels is None:
                con_outputs = prediction
                unique_labels = un_lab
            else:
                con_outputs = torch.cat((con_outputs, prediction), 0)
                unique_labels = torch.cat((unique_labels, un_lab), 0).unique()

        results = bbox_util.nsm(con_outputs, unique_labels, nms_iou)

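After the slices have been processed, the positions of the fruits and vegetables detected in each slice must be adjusted to the coordinates of the original image. The relevant code is as follows: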

    def position_adj(self, prediction, dis_pics_shape, starpoint, letterbox_image, muinput_shape):
        prediction = prediction.cpu().numpy()
        # Convert corner boxes to centre (xy) and size (wh)
        box_xy, box_wh = (prediction[:, 0:2] + prediction[:, 2:4]) / 2, prediction[:, 2:4] - prediction[:, 0:2]
        # Correct the boxes from the network input shape to the shape of the slice
        prediction[:, :4] = self.yolo_correct_boxes(box_xy, box_wh, input_shape=muinput_shape,
                                                    image_shape=dis_pics_shape,
                                                    letterbox_image=letterbox_image)
        # starpoint is stored as [y, x]: columns 0 and 2 are shifted by the
        # slice's y offset, columns 1 and 3 by its x offset
        prediction[..., 0] = prediction[..., 0] + starpoint[0]
        prediction[..., 1] = prediction[..., 1] + starpoint[1]
        prediction[..., 2] = prediction[..., 2] + starpoint[0]
        prediction[..., 3] = prediction[..., 3] + starpoint[1]

        return prediction
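
The merging step can be illustrated independently of the model. The sketch below is a simplified stand-in, not the project's `DecodeBox` code: it assumes boxes in `(x_min, y_min, x_max, y_max, score)` form, shifts each slice's boxes by that slice's origin, and then applies a plain greedy non-maximum suppression so that an object detected twice in the overlap between slices is kept only once. Class labels are ignored for brevity, and all function names and the IoU threshold are illustrative assumptions.

```python
def shift_boxes(boxes, origin):
    """Move slice-local boxes into original-image coordinates."""
    ox, oy = origin
    return [(x1 + ox, y1 + oy, x2 + ox, y2 + oy, s) for x1, y1, x2, y2, s in boxes]


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, drop those overlapping a kept box."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept


def merge_slices(per_slice_boxes, origins, iou_threshold=0.5):
    """Shift every slice's detections, then suppress duplicates."""
    merged = []
    for boxes, origin in zip(per_slice_boxes, origins):
        merged.extend(shift_boxes(boxes, origin))
    return nms(merged, iou_threshold)


if __name__ == "__main__":
    # Two horizontally adjacent slices whose overlap contains the same object
    origins = [(0, 0), (500, 0)]
    per_slice = [
        [(520, 100, 580, 160, 0.90)],                             # seen in the left slice
        [(21, 101, 81, 161, 0.80), (300, 200, 360, 260, 0.70)],   # seen in the right slice
    ]
    for box in merge_slices(per_slice, origins):
        print(box)
```

In this example the right slice's first box lands at `(521, 101, 581, 161)` after shifting, almost on top of the left slice's detection, so only the higher-scoring copy survives alongside the unrelated third box.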

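A comparison of the original model, the improved model, and the model using the slicing algorithm on small-object recognition is shown below. As can be seen, performance improves substantially.

Deploying the Model with Flask

The model, once set up for inference, is wrapped with the Flask framework and exposed through an interface, so it can be called from programs built in different environments. Calling the model through an interface reduces coupling: the model and the backend each run in their own environment without conflicts. When the backend calls the interface, the model returns the detection data together with the processed image as a base64 stream, packaged with jsonify. The wrapper code for each module is shown below.

Sliced detection code: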

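import time

from flask import Flask, jsonify, request
from PIL import Image

# detect_image and image_to_base64 are project helper functions that are not shown here
app = Flask(__name__)


@app.route("/photo/mupload", methods=['POST'])
def muploads():
    # ---------------------------------------------------------#
    #   Read one file from the request (note: the file type is not
    #   validated, so make sure it is an image)
    #   input_shape is adjusted automatically; it can also be fixed
    #   The model returns the processed image and the onnx_data dict
    #   The processed image is base64-encoded and added to the dict
    #   The dict is returned after being packaged by jsonify
    # ---------------------------------------------------------#
    file = request.files['file']
    print(file.filename)
    image = Image.open(file)
    t1 = time.time()
    img_out, onnx_data = detect_image(image, mulpicplus="2")  # "2": 2 x 2 slices
    t2 = time.time()
    print("Prediction time:", t2 - t1)
    onnx_data["ZZZ_image"] = str(image_to_base64(img_out))

    return jsonify(onnx_data)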

Standard detection code:

@app.route("/photo/upload", methods=['POST'])
def siguploads():
    # ---------------------------------------------------------#
    #   Read one file from the request (note: the file type is not
    #   validated, so make sure it is an image)
    #   input_shape is adjusted automatically; it can also be fixed
    #   The model returns the processed image and the onnx_data dict
    #   The processed image is base64-encoded and added to the dict
    #   The dict is returned after being packaged by jsonify
    # ---------------------------------------------------------#
    file = request.files['file']
    print(file.filename)
    image = Image.open(file)
    t1 = time.time()
    img_out, onnx_data = detect_image(image, mulpicplus="1")  # "1": whole image, no slicing
    t2 = time.time()
    onnx_data["ZZZ_image"] = str(image_to_base64(img_out))
    print("Total time:", t2 - t1)
    return jsonify(onnx_data)
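
On the receiving side, the caller has to parse the JSON and decode the base64 image. The following is a minimal sketch of that round trip using only the standard library. The `ZZZ_image` key comes from the endpoints above; the `count` field, the helper names, and the use of raw bytes in place of a real encoded image are assumptions for illustration, since the project's `image_to_base64` helper is not shown.

```python
import base64
import json


def pack_response(detections, image_bytes):
    """Build a JSON body shaped like the endpoint's response (assumed layout)."""
    payload = dict(detections)
    payload["ZZZ_image"] = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps(payload)


def unpack_response(body):
    """Split a JSON body into the detection data and the decoded image bytes."""
    payload = json.loads(body)
    image_bytes = base64.b64decode(payload.pop("ZZZ_image"))
    return payload, image_bytes


if __name__ == "__main__":
    fake_jpeg = b"\xff\xd8\xff\xe0 placeholder bytes, not a real image"
    body = pack_response({"count": 3}, fake_jpeg)
    data, image = unpack_response(body)
    print(data)
    print(image == fake_jpeg)
```

One detail to watch: if a helper returns `bytes`, calling `str()` on it produces a string of the form `b'...'`, which the caller would have to strip before decoding. Decoding the base64 bytes to ASCII text, as in the sketch, avoids that.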

Real-time monitoring code:

import cv2
import numpy as np
from flask import Response
from PIL import Image


def gen(camera):
    # Wrap each JPEG frame as one part of a multipart/x-mixed-replace stream
    while True:
        frame = camera.get_frame()
        if frame is None:
            break  # stop streaming when the camera returns no frame
        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + frame + b'\r\n\r\n')


class VideoCamera(object):
    def __init__(self):
        # ==========================================================#
        #   Open the camera (device 0)
        # ==========================================================#
        self.video = cv2.VideoCapture(0)
        self.fps = 0.0

    def __del__(self):
        self.video.release()

    def get_frame(self):
        y0, dy = 50, 25
        success, image = self.video.read()
        if not success:
            return None  # the camera failed to deliver a frame
        frame = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # OpenCV delivers BGR; convert to RGB
        frame = cv2.flip(frame, 1)  # mirror horizontally
        frame = Image.fromarray(np.uint8(frame))
        data = detect_image(frame, "1", mod="video")

        frame = np.array(frame)
        frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)

        # Draw the detection summary on the frame, one line of text per row
        for i, txt in enumerate(str(data).split('\n')):
            y = y0 + i * dy
            cv2.putText(frame, txt, (470, y), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)
        ret, frame = cv2.imencode('.jpg', frame)
        return frame.tobytes()


# Camera stream endpoint
@app.route('/video_feed')
def video_feed():
    return Response(gen(VideoCamera()),
                    mimetype='multipart/x-mixed-replace; boundary=frame')

Calling the Model

The backend can call the model's interface over HTTP. The code is as follows:

/**
 * @Description Image upload (fast mode)
 * @param multipartFile the uploaded image
 * @return the record for the identify table
 */
@PostMapping("/fast")
public Result uploadPic(@RequestParam("file") MultipartFile multipartFile){
    String originalFileName=multipartFile.getOriginalFilename();
    String uuid= IdUtil.fastSimpleUUID();
    // Choose the storage path according to the deployment platform
    String os = System.getProperty("os.name");
    if(os.toLowerCase().startsWith("win")){
        rootFileName=winPath+"\\before\\"+uuid+originalFileName;
    }else{
        rootFileName=linuxPath+"/before/"+uuid+originalFileName;
    }
    System.out.println(rootFileName);
    // Save the uploaded file to disk
    try {
        FileUtil.writeBytes(multipartFile.getBytes(),rootFileName);
    } catch (IOException e) {
        e.printStackTrace();
    }
    String beforeUrl=ip+"image/"+uuid+originalFileName;
    // Call the model through its Flask endpoint
    HashMap<String, Object> paramMap = new HashMap<>();
    paramMap.put("file", FileUtil.file(rootFileName));
    String res = HttpUtil.post("http://localhost:5000/photo/upload", paramMap);

    // Parse the JSON response into an Identify object
    Identify dto= null;
    try {
        dto = JsonChange.Change(res);
    } catch (IOException e) {
        e.printStackTrace();
    }
    dto.setBeforeUrl(beforeUrl);
    Identify finalDto = dto;
    // Store the record on a separate thread so the response is not delayed
    Thread thread = new Thread(() -> {
        identifyService.insert(finalDto);
    });
    thread.start();
    Result result=new Result();
    result.setCode("200");
    result.setMessage("成功"); // "success"
    result.setData(finalDto);
    return result;
}

Project Showcase