Container Number Detection and Recognition with PaddleOCR

Published by HyperAI超神经, 2022-11

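Container Number Detection and Recognition Based on PaddleOCR
1. Project Overview
A container number identifies the container used to ship export cargo, and it is a required field on the shipping order. Standard container numbers follow ISO 6346 (1995).

A standard container number consists of 11 characters, for example CBHU 123456 7, in three parts:

Part one is four letters. The first three identify the container's owner or operator, and the fourth identifies the equipment category. For example, a standard container whose number starts with CBHU has COSCO Container Lines as its owner and operator.
Part two is six digits. This is the registration (serial) number that uniquely identifies the container body.
Part three is the check digit, the 11th character. It is computed from the four letters and six digits by a checksum rule and is used to detect errors when the number is verified.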
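
The checksum rule mentioned above can be sketched in a few lines. This is a minimal sketch of the ISO 6346 rule as commonly described: letters map to values starting at A=10 and skipping multiples of 11, each of the first ten characters is weighted by a power of two, and the sum is reduced modulo 11 (a remainder of 10 becomes 0). The function names `check_digit` and `is_valid_container_number` are illustrative, not part of PaddleOCR.

```python
import string

# ISO 6346 letter values: start at A=10 and skip every multiple of 11 (11, 22, 33).
LETTER_VALUES = {}
_value = 10
for _letter in string.ascii_uppercase:
    if _value % 11 == 0:
        _value += 1
    LETTER_VALUES[_letter] = _value
    _value += 1


def check_digit(first_ten: str) -> int:
    """Compute the check digit from the 4 letters and 6 digits (10 characters)."""
    total = 0
    for position, char in enumerate(first_ten):
        value = LETTER_VALUES[char] if char in LETTER_VALUES else int(char)
        total += value * (2 ** position)  # weights 1, 2, 4, ..., 512
    return total % 11 % 10  # a remainder of 10 is written as 0


def is_valid_container_number(number: str) -> bool:
    """Check the 4-letter + 7-digit layout and the check digit."""
    compact = number.replace(" ", "").upper()
    if len(compact) != 11:
        return False
    if not all(c in LETTER_VALUES for c in compact[:4]):
        return False
    if not all(c in string.digits for c in compact[4:]):
        return False
    return check_digit(compact[:10]) == int(compact[10])


print(is_valid_container_number("EITU1786393"))
print(is_valid_container_number("ITU1786393"))
```

EITU1786393 is one of the numbers recognized in the inference log at the end of this tutorial, and it passes this check. Under the same rule, the illustrative number CBHU 123456 7 does not check out (the sketch yields 1 for CBHU123456), so it is best read as a format example only.
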
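This tutorial uses PaddleOCR for container number detection and recognition. It trains a detection model and a recognition model separately on a small amount of data, then chains the two together to complete the task.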

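2. Environment Setup
1. On OpenBayes, start a "Model Training" container. Choose paddlepaddle-2.3 as the environment and a vGPU or other GPU container as the resource.

2. In Jupyter, run the following commands in order:
Enter the PaddleOCR-release-2.5 folder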

cd PaddleOCR-release-2.5

/output/PaddleOCR-release-2.5

Install PaddleOCR

!pip install -r requirements.txt  # install the dependencies PaddleOCR needs

After installation, return to the parent folder

cd ..

/output

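3. Dataset
The container number dataset used in this tutorial contains 3,003 container images at a resolution of 1920×1080.

1. The annotation format for training a PaddleOCR detection model is as follows; the two fields are separated by "\t":

" Image file name                    Image annotation encoded with json.dumps"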
ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]

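Before json.dumps encoding, the image annotation is a list of dictionaries. In each dictionary, points holds the (x, y) coordinates of the four corners of the text box, listed clockwise starting from the top-left corner, and transcription holds the text in that box. A transcription of "###" marks the box as invalid, and it is skipped during training.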
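
To make the format concrete, here is a minimal sketch that builds one detection label line and parses it back. The helper names `make_det_label` and `parse_det_label` are hypothetical and are not PaddleOCR functions; only the line layout comes from the description above.

```python
import json


def make_det_label(image_name, boxes):
    """Build one label line from (transcription, points) pairs."""
    annos = [{"transcription": text, "points": points} for text, points in boxes]
    # File name and JSON annotation are joined by a single tab.
    return image_name + "\t" + json.dumps(annos, ensure_ascii=False)


def parse_det_label(line):
    """Return the image name and the boxes that are not marked invalid ("###")."""
    image_name, encoded = line.rstrip("\n").split("\t", 1)
    annos = json.loads(encoded)
    valid = [a for a in annos if a["transcription"] != "###"]
    return image_name, valid


line = make_det_label(
    "ch4_test_images/img_61.jpg",
    [
        ("MASA", [[310, 104], [416, 141], [418, 216], [312, 179]]),
        ("###", [[0, 0], [10, 0], [10, 10], [0, 10]]),
    ],
)
name, boxes = parse_det_label(line)
print(name, len(boxes))
```
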

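2. The annotation format for training a PaddleOCR recognition model is as follows; the two fields are separated by "\t":

" Image file name                 Image annotation "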

train_data/rec/train/word_001.jpg   简单可依赖
train_data/rec/train/word_002.jpg   用科技让复杂的世界更简单
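
As a small illustration of this format, the sketch below parses recognition label lines and collects the set of characters that appear in the labels, which can be compared against the character dictionary the recognition model uses. `parse_rec_label` and `label_charset` are hypothetical helper names, and the sample lines are made up for the example.

```python
def parse_rec_label(line):
    """Split one label line into (image path, text) at the first tab only."""
    path, text = line.rstrip("\n").split("\t", 1)
    return path, text


def label_charset(lines):
    """Collect every character that occurs in the transcriptions."""
    chars = set()
    for line in lines:
        _, text = parse_rec_label(line)
        chars.update(text)
    return chars


sample = [
    "0_a.jpg\tEITU1786393\n",
    "1_a.jpg\t45G1\n",
]
print(sorted(label_charset(sample)))
```
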

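4. Data Preparation
4.1 Preparing data for the detection model
Split the 3,003 images in the dataset into a training set (the first 2,000) and a validation set (the remaining 1,003), roughly 2:1, by running the following code: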

from tqdm import tqdm

filename = "all_label.txt"
train_size = 2000  # the first 2000 lines go to training, the rest to validation

# Read all annotation lines
with open(filename) as f:
    lines = f.readlines()

# Write each line to the training or validation label file
with open('det_train_label.txt', 'w') as t, open('det_eval_label.txt', 'w') as v:
    for count, line in enumerate(tqdm(lines)):
        if count < train_size:
            t.write(line)
        else:
            v.write(line)

100%|██████████| 3003/3003 [00:00<00:00, 37908.32it/s]
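
The split above takes lines in file order. If neighbouring lines in all_label.txt come from similar scenes, a shuffled split may give a more representative validation set. The following is a sketch of that variation, not the method used in this tutorial; `split_lines` and its `seed` parameter are illustrative names.

```python
import random


def split_lines(lines, train_size, seed=0):
    """Shuffle a copy of the lines with a fixed seed, then cut at train_size."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]


# Stand-in label lines; in practice these would be read from all_label.txt.
lines = [f"img_{i}.jpg\t[]\n" for i in range(3003)]
train, val = split_lines(lines, train_size=2000)
print(len(train), len(val))
```

A fixed seed keeps the split reproducible, so the recognition data built from these files later stays consistent between runs.
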

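4.2 Preparing data for the recognition model
Using the detection annotations, we crop each image so that the crop contains, as far as possible, only the text region, and use these crops as recognition data. Run the following code: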

import json
import math
import os

from PIL import Image, ImageDraw
from tqdm import tqdm

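class Rotate(object):
    """Mask one quadrilateral text box, rotate the image by the angle between
    the box's top edge and the horizontal, and crop to the masked region."""
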
    def __init__(self, image: Image.Image, coordinate):
        self.image = image.convert('RGB')
        self.coordinate = coordinate
        self.xy = [tuple(self.coordinate[k]) for k in ['left_top', 'right_top', 'right_bottom', 'left_bottom']]
        self._mask = None
        self.image.putalpha(self.mask)

    @property
    def mask(self):
        if not self._mask:
            mask = Image.new('L', self.image.size, 0)
            draw = ImageDraw.Draw(mask, 'L')
            draw.polygon(self.xy, fill=255)
            self._mask = mask
        return self._mask

    def run(self):
        image = self.rotation_angle()
        box = image.getbbox()
        return image.crop(box)

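    def rotation_angle(self):
        # Angle between the box's top edge and the horizontal. angle() returns an
        # unsigned value, so the rotation below is always clockwise.
        x1, y1 = self.xy[0]
        x2, y2 = self.xy[1]
        angle = self.angle([x1, y1, x2, y2], [0, 0, 10, 0]) * -1
        return self.image.rotate(angle, expand=True)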

    def angle(self, v1, v2):
        dx1 = v1[2] - v1[0]
        dy1 = v1[3] - v1[1]
        dx2 = v2[2] - v2[0]
        dy2 = v2[3] - v2[1]
        angle1 = math.atan2(dy1, dx1)
        angle1 = int(angle1 * 180 / math.pi)
        angle2 = math.atan2(dy2, dx2)
        angle2 = int(angle2 * 180 / math.pi)
        if angle1 * angle2 >= 0:
            included_angle = abs(angle1 - angle2)
        else:
            included_angle = abs(angle1) + abs(angle2)
            if included_angle > 180:
                included_angle = 360 - included_angle
        return included_angle



def image_cut_save(path, bbox, save_path):
    """
    Crop one annotated text box from an image and save the crop.

    :param path: path of the source image
    :param bbox: four corner points [[x, y], ...], clockwise from the top-left corner
    :param save_path: path where the cropped image is saved
    """
    img = Image.open(path)
    coordinate = {'left_top': bbox[0], 'right_top': bbox[1], 'right_bottom': bbox[2], 'left_bottom': bbox[3]}
    rotate = Rotate(img, coordinate)

    left, upper = bbox[0]
    right, lower = bbox[2]
    # A box that is taller than it is wide is treated as vertical text and turned by 90 degrees
    if lower - upper > right - left:
        rotate.run().convert('RGB').transpose(Image.ROTATE_90).save(save_path)
    else:
        rotate.run().convert('RGB').save(save_path)
    return True

# Read the detection annotations and build the recognition dataset
files = ["det_train_label.txt", "det_eval_label.txt"]
filetypes = ["train", "eval"]
for index, filename in enumerate(files):
    data_dir = "RecTrainData" if index == 0 else "RecEvalData"
    if not os.path.exists(data_dir):
        os.mkdir(data_dir)
    with open(filename) as f, open('rec_' + filetypes[index] + '_label.txt', 'w') as label_file:
        lines = f.readlines()
        for line in tqdm(lines):
            image_name = line.split("\t")[0].split("/")[-1]
            annos = json.loads(line.split("\t")[-1])
            img_path = os.path.join("/input0/images", image_name)
            # Save one crop per text box and record its transcription
            for i, anno in enumerate(annos):
                crop_name = str(i) + "_" + image_name
                data_path = os.path.join(data_dir, crop_name)
                if image_cut_save(img_path, anno["points"], data_path):
                    label_file.write(crop_name + "\t" + anno["transcription"] + "\n")

100%|██████████| 2000/2000 [03:48<00:00,  8.76it/s]
100%|██████████| 1003/1003 [01:53<00:00,  8.86it/s]
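
The cropping code rotates each box by the angle of its top edge and uses a height-versus-width test to decide whether text is vertical. The sketch below isolates those two geometric steps so they can be checked on a few boxes before cropping the whole dataset. It computes a signed angle with `math.atan2`, which is a variation on the unsigned angle used above, and the names `top_edge_angle` and `is_vertical` are illustrative.

```python
import math


def top_edge_angle(points):
    """Signed angle, in degrees, of the edge from the top-left to the top-right corner.

    Image coordinates are assumed (x to the right, y downward), so a positive
    value means the right end of the edge is lower than the left end.
    """
    (x1, y1), (x2, y2) = points[0], points[1]
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def is_vertical(points):
    """Same test as the cropping code: taller than wide, measured corner to corner."""
    left, upper = points[0]
    right, lower = points[2]
    return lower - upper > right - left


box = [[310, 104], [416, 141], [418, 216], [312, 179]]
print(round(top_edge_angle(box), 2))
print(is_vertical([[0, 0], [30, 0], [30, 200], [0, 200]]))
```

Pillow's `Image.rotate` takes its angle in degrees counter-clockwise, so passing this signed angle directly should level the top edge whichever way the box tilts. Checking the sign on a handful of tilted boxes before regenerating the crops is a reasonable precaution.
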

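5. Experiments
Because the dataset is small, we use the PP-OCRv3 models in PaddleOCR for both detection and recognition so that training converges better and faster. Compared with PP-OCRv2, PP-OCRv3 improves the end-to-end Hmean in Chinese scenarios by 5% and the end-to-end result of the English and digit model by 11%. For the optimization details, see the PP-OCRv3 technical report.
5.1 Detection model
5.1.1 Detection model configuration
PaddleOCR provides many detection models; the models and their configuration files are under PaddleOCR-release-2.5/configs/det. Here we choose ch_PP-OCRv3_det_student, whose configuration file is at PaddleOCR-release-2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml. Before use, set the necessary options, such as the training parameters and dataset paths. Some of the key settings are shown below: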

# Key training parameters
use_gpu: true # whether to use the GPU
epoch_num: 1200 # number of training epochs
save_model_dir: ./output/ch_PP-OCR_V3_det/ # where models are saved
save_epoch_step: 200 # save the model every 200 epochs
eval_batch_step: [0, 100] # run validation every 100 training iterations
pretrained_model: ./PaddleOCR-release-2.5/pretrain_models/ch_PP-OCR_V3_det/best_accuracy.pdparams # path of the pretrained model
# Training set paths
Train:
  dataset:
    name: SimpleDataSet
    data_dir: /input0/images # image folder
    label_file_list:
      - ./det_train_label.txt # label file
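
In the actual YAML file these keys are nested: the training parameters sit under Global, and the dataset paths under Train.dataset. The inference commands later in this tutorial override such keys on the command line with -o Section.key=value. The sketch below shows how a nested settings dictionary maps to that dotted form. It assumes the training script accepts the same -o syntax; `to_overrides` is a hypothetical helper, not a PaddleOCR function.

```python
def to_overrides(config, prefix=""):
    """Flatten a nested dict into 'Section.key=value' strings."""
    overrides = []
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            overrides.extend(to_overrides(value, name))
        else:
            overrides.append(f"{name}={value}")
    return overrides


settings = {
    "Global": {
        "epoch_num": 1200,
        "save_model_dir": "./output/ch_PP-OCR_V3_det/",
    },
    "Train": {"dataset": {"data_dir": "/input0/images"}},
}

# The resulting strings would follow "-o" on the command line.
for item in to_overrides(settings):
    print(item)
```
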

5.1.2 Fine-tuning
Run the following command in the notebook to fine-tune the model; -c takes the path of the configured model file.

%run PaddleOCR-release-2.5/tools/train.py \
    -c PaddleOCR-release-2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml

With the default hyperparameters, ch_PP-OCRv3_det_student reached an hmean of 96.96% on the validation set after 385 epochs of training, with no significant improvement afterwards.

[2022/10/11 06:36:09] ppocr INFO: best metric, hmean: 0.969551282051282, precision: 0.9577836411609498,
recall: 0.981611681990265, fps: 20.347745459258228, best_epoch: 385
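
The hmean in this log is the harmonic mean of precision and recall. As a quick check, the sketch below recomputes it from the two values in the log line; the function name `hmean` is just for illustration.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the log line above.
score = hmean(0.9577836411609498, 0.981611681990265)
print(round(score, 4))
```
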

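5.2 Recognition model
5.2.1 Recognition model configuration
PaddleOCR also provides many recognition models; the models and their configuration files are under PaddleOCR-release-2.5/configs/rec. Here we choose ch_PP-OCRv3_rec_distillation, whose configuration file is at PaddleOCR-release-2.5/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml. Before use, set the necessary options, such as the training parameters and dataset paths. Some of the key settings are shown below: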

# Key training parameters
use_gpu: true # whether to use the GPU
epoch_num: 1200 # number of training epochs
save_model_dir: ./output/rec_ppocr_v3_distillation # where models are saved
save_epoch_step: 200 # save the model every 200 epochs
eval_batch_step: [0, 100] # run validation every 100 training iterations
pretrained_model: ./PaddleOCR-release-2.5/pretrain_models/PPOCRv3/best_accuracy.pdparams # path of the pretrained model
# Training set paths
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./RecTrainData/ # image folder
    label_file_list:
      - ./rec_train_label.txt # label file

5.2.2 Fine-tuning
Run the following command in the notebook to fine-tune the model; -c takes the path of the configured model file.

%run PaddleOCR-release-2.5/tools/train.py \
    -c PaddleOCR-release-2.5/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml

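With the default hyperparameters, ch_PP-OCRv3_rec_distillation reached an accuracy of 96.11% on the validation set after 136 epochs of training, with no significant improvement afterwards.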

[2022/10/11 20:04:28] ppocr INFO: best metric, acc: 0.9610600272522444, norm_edit_dis: 0.9927426548965615,
Teacher_acc: 0.9540291998159589, Teacher_norm_edit_dis: 0.9905629345025616, fps: 246.029195787707, best_epoch: 136
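
The log reports two metrics: acc, which counts only exact matches, and norm_edit_dis, which gives partial credit for nearly correct strings, which is why it is the higher of the two. The sketch below shows one common way to compute both, using a plain Levenshtein distance normalized by the longer string. This reflects my understanding of how such metrics are usually defined; PaddleOCR's own implementation may differ in details such as case or whitespace handling, and `rec_metrics` is an illustrative name.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def rec_metrics(pairs):
    """pairs: list of (prediction, ground truth) strings."""
    correct = 0
    distance_sum = 0.0
    for pred, target in pairs:
        correct += pred == target
        distance_sum += levenshtein(pred, target) / max(len(pred), len(target), 1)
    n = max(len(pairs), 1)
    return {"acc": correct / n, "norm_edit_dis": 1 - distance_sum / n}


metrics = rec_metrics([
    ("EITU1786393", "EITU1786393"),  # exact match
    ("ITU1786393", "EITU1786393"),   # one character missing
])
print(metrics)
```
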

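6. Results
6.1 Detection model inference
Run the following command in the notebook to detect text in test images with the fine-tuned model. Global.infer_img is the path of an image or an image folder, Global.pretrained_model is the fine-tuned model, and Global.save_res_path is where the inference results are saved.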

%run PaddleOCR-release-2.5/tools/infer_det.py \
    -c PaddleOCR-release-2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \
    -o Global.infer_img="/input0/images" Global.pretrained_model="./output/ch_PP-OCR_V3_det/best_accuracy" Global.save_res_path="./output/det_infer_res/predicts.txt"

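6.2 Recognition model inference
Run the following command in the notebook to recognize the text in test images with the fine-tuned model. Global.infer_img is the path of an image or an image folder, Global.pretrained_model is the fine-tuned model, and Global.save_res_path is where the inference results are saved.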

%run PaddleOCR-release-2.5/tools/infer_rec.py \
    -c PaddleOCR-release-2.5/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml \
    -o Global.infer_img="./RecEvalData/" Global.pretrained_model="./output/rec_ppocr_v3_distillation/best_accuracy" Global.save_res_path="./output/rec_infer_res/predicts.txt"

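6.3 Chained detection and recognition inference
6.3.1 Model conversion
Before chained inference, the models saved during training must be converted to inference models by running the two conversion commands below. -c takes the configuration file of the model to convert, -o Global.pretrained_model is the model file to convert, and Global.save_inference_dir is the directory where the converted inference model is stored.

# Detection model conversion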
%run PaddleOCR-release-2.5/tools/export_model.py \
-c PaddleOCR-release-2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \
-o Global.pretrained_model="./output/ch_PP-OCR_V3_det/best_accuracy" Global.save_inference_dir="./output/det_inference/"

W1011 07:10:20.363173   544 gpu_context.cc:278] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.4, Runtime API Version: 11.1
W1011 07:10:20.366801   544 gpu_context.cc:306] device: 0, cuDNN Version: 8.0.
W1011 07:10:22.629678   544 gpu_context.cc:506] WARNING: device: . The installed Paddle is compiled with CUDNN 8.1, but CUDNN version in your machine is 8.0, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.

[2022/10/11 07:10:24] ppocr INFO: load pretrain successful from ./output/ch_PP-OCR_V3_det/best_accuracy
[2022/10/11 07:10:27] ppocr INFO: inference model is saved to ./output/det_inference/inference

# Recognition model conversion
%run PaddleOCR-release-2.5/tools/export_model.py \
-c PaddleOCR-release-2.5/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml \
-o Global.pretrained_model="./output/rec_ppocr_v3_distillation/best_accuracy" Global.save_inference_dir="./output/rec_inference/"

W1108 13:21:27.524312   937 gpu_context.cc:278] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.4, Runtime API Version: 11.1
W1108 13:21:27.527870   937 gpu_context.cc:306] device: 0, cuDNN Version: 8.0.
W1108 13:21:29.481864   937 gpu_context.cc:506] WARNING: device: . The installed Paddle is compiled with CUDNN 8.1, but CUDNN version in your machine is 8.0, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.

[2022/11/08 13:21:31] ppocr INFO: load pretrain successful from ./output/rec_ppocr_v3_distillation/best_accuracy
[2022/11/08 13:21:32] ppocr INFO: inference model is saved to ./output/rec_inference/Teacher/inference
[2022/11/08 13:21:34] ppocr INFO: inference model is saved to ./output/rec_inference/Student/inference

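6.3.2 Chained inference
After conversion, PaddleOCR's chaining tool can combine any trained detection model with any trained recognition model into a two-stage text recognition system. An input image passes through four main stages (text detection, detection box rectification, text recognition, and score filtering) to produce text positions and recognition results. Run the code below, where image_dir is the path of a single image or a set of images, det_model_dir is the path of the detection inference model, and rec_model_dir is the path of the recognition inference model. Visualized results are saved to the ./inference_results folder by default.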

%run PaddleOCR-release-2.5/tools/infer/predict_system.py \
--image_dir="OCRTest" \
--det_model_dir="./output/det_inference/" \
--rec_model_dir="./output/rec_inference/Student/"

[2022/11/08 13:23:46] ppocr INFO: In PP-OCRv3, rec_image_shape parameter defaults to '3, 48, 320', if you are using recognition model with PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320
[2022/11/08 13:23:46] ppocr DEBUG: dt_boxes num : 2, elapse : 0.6137089729309082
[2022/11/08 13:23:46] ppocr DEBUG: rec_res num  : 2, elapse : 0.016110897064208984
[2022/11/08 13:23:46] ppocr DEBUG: 0  Predict time of OCRTest/1-122700001-OCR-LF-C01.jpg: 0.638s
[2022/11/08 13:23:46] ppocr DEBUG: TTEMU3108252, 0.879
[2022/11/08 13:23:46] ppocr DEBUG: 22G1, 0.957
[2022/11/08 13:23:46] ppocr DEBUG: The visualized image saved in ./inference_results/1-122700001-OCR-LF-C01.jpg
[2022/11/08 13:23:46] ppocr DEBUG: dt_boxes num : 1, elapse : 0.029441118240356445
[2022/11/08 13:23:46] ppocr DEBUG: rec_res num  : 1, elapse : 0.011059045791625977
[2022/11/08 13:23:46] ppocr DEBUG: 1  Predict time of OCRTest/1-122720001-OCR-AH-A01.jpg: 0.047s
[2022/11/08 13:23:46] ppocr DEBUG: ITU1786393, 0.996
[2022/11/08 13:23:47] ppocr DEBUG: The visualized image saved in ./inference_results/1-122720001-OCR-AH-A01.jpg
[2022/11/08 13:23:47] ppocr DEBUG: dt_boxes num : 2, elapse : 0.03130960464477539
[2022/11/08 13:23:47] ppocr DEBUG: rec_res num  : 2, elapse : 0.012927055358886719
[2022/11/08 13:23:47] ppocr DEBUG: 2  Predict time of OCRTest/1-122720001-OCR-AS-B01.jpg: 0.051s
[2022/11/08 13:23:47] ppocr DEBUG: EITU1786393, 0.991
[2022/11/08 13:23:47] ppocr DEBUG: 45G1, 0.982
[2022/11/08 13:23:47] ppocr DEBUG: The visualized image saved in ./inference_results/1-122720001-OCR-AS-B01.jpg
[2022/11/08 13:23:47] ppocr DEBUG: dt_boxes num : 2, elapse : 0.030177831649780273
[2022/11/08 13:23:47] ppocr DEBUG: rec_res num  : 2, elapse : 0.01282501220703125
[2022/11/08 13:23:47] ppocr DEBUG: 3  Predict time of OCRTest/1-122720001-OCR-LB-C02.jpg: 0.048s
[2022/11/08 13:23:47] ppocr DEBUG: TU1, 0.781
[2022/11/08 13:23:47] ppocr DEBUG: 45G1, 0.997
[2022/11/08 13:23:47] ppocr DEBUG: The visualized image saved in ./inference_results/1-122720001-OCR-LB-C02.jpg
[2022/11/08 13:23:47] ppocr DEBUG: dt_boxes num : 2, elapse : 0.032221078872680664
[2022/11/08 13:23:47] ppocr DEBUG: rec_res num  : 2, elapse : 0.012808084487915039
[2022/11/08 13:23:47] ppocr DEBUG: 4  Predict time of OCRTest/1-122720001-OCR-RF-D01.jpg: 0.052s
[2022/11/08 13:23:47] ppocr DEBUG: EITU1786393, 0.877
[2022/11/08 13:23:47] ppocr DEBUG: 45G1, 0.982
[2022/11/08 13:23:47] ppocr DEBUG: The visualized image saved in ./inference_results/1-122720001-OCR-RF-D01.jpg
[2022/11/08 13:23:47] ppocr DEBUG: dt_boxes num : 0, elapse : 0.033019065856933594
[2022/11/08 13:23:47] ppocr DEBUG: rec_res num  : 0, elapse : 1.1920928955078125e-06
[2022/11/08 13:23:47] ppocr DEBUG: 5  Predict time of OCRTest/1-122728001-OCR-AH-A01.jpg: 0.037s
[2022/11/08 13:23:47] ppocr DEBUG: The visualized image saved in ./inference_results/1-122728001-OCR-AH-A01.jpg
[2022/11/08 13:23:47] ppocr INFO: The predict total time is 1.8189082145690918
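
The log mixes full container numbers (EITU1786393), partial reads (ITU1786393, TU1) and other markings on the container (22G1, 45G1). A simple post-processing step is to keep only strings that match the four-letter, seven-digit layout described in the project overview. The sketch below does this for log text in the format shown above; the regular expressions and the names `extract_results` and `container_numbers` are assumptions made for this example, and a check-digit test could be added as a further filter.

```python
import re

# A result line looks like: "... ppocr DEBUG: EITU1786393, 0.991"
RESULT_LINE = re.compile(r"ppocr DEBUG: (\S+), (\d\.\d+)$")
# Four letters followed by seven digits (six-digit serial plus check digit).
CONTAINER_NUMBER = re.compile(r"^[A-Z]{4}\d{7}$")


def extract_results(log_text):
    """Return (text, score) pairs for every recognition result line."""
    results = []
    for line in log_text.splitlines():
        match = RESULT_LINE.search(line.strip())
        if match:
            results.append((match.group(1), float(match.group(2))))
    return results


def container_numbers(results):
    """Keep only texts with the 4-letter + 7-digit layout."""
    return [text for text, _ in results if CONTAINER_NUMBER.match(text)]


log_text = """\
[2022/11/08 13:23:46] ppocr DEBUG: dt_boxes num : 2, elapse : 0.6137089729309082
[2022/11/08 13:23:46] ppocr DEBUG: TTEMU3108252, 0.879
[2022/11/08 13:23:46] ppocr DEBUG: 22G1, 0.957
[2022/11/08 13:23:46] ppocr DEBUG: ITU1786393, 0.996
[2022/11/08 13:23:47] ppocr DEBUG: EITU1786393, 0.991
"""
results = extract_results(log_text)
print(container_numbers(results))
```
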
