Help request: corrupted image files cause errors during training, looking for a fix

琦琦的小老鼠 posted 2021-03

I am training on a dataset of 20,000 images and I get the messages below while training runs. How can I fix this?

The loss value printed in the log is the current step, and the metric is the average value of previous step.
Epoch 1/100

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:77: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
return (isinstance(seq, collections.Sequence) and
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/nn/layer/norm.py:648: UserWarning: When training, we now always track global mean and variance.
"When training, we now always track global mean and variance.")

step 60/229 [======>.......................] - loss: 3.1190 - acc_top1: 0.0799 - acc_top5: 0.2940 - ETA: 20:21 - 7s/ste
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:770: UserWarning: Possibly corrupt EXIF data. Expecting to read 23836229632 bytes but only got 0. Skipping tag 0
" Skipping tag %s" % (size, len(data), tag)

step 229/229 [==============================] - loss: 3.1398 - acc_top1: 0.0979 - acc_top5: 0.3521 - 7s/step
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 30/58 [==============>...............] - loss: 3.5679 - acc_top1: 0.0578 - acc_top5: 0.3641 - ETA: 3:18 - 7s/st
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:788: UserWarning: Corrupt EXIF data. Expecting to read 12 bytes but only got 6.
warnings.warn(str(msg))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/PIL/JpegImagePlugin.py:793: UserWarning: Image appears to be a malformed MPO file, it will be interpreted as a base JPEG file
"Image appears to be a malformed MPO file, it will be "

TowerNet

Resolved

#37 replied 2021-12

This can be solved with EasyData.

All comments (35)

琦琦的小老鼠

#2 replied 2021-03

Bump, don't let this thread sink.

AIStudio810258

#3 replied 2021-03

Clean the data.

琦琦的小老鼠

#4 replied 2021-03

AIStudio810258 #3

Clean the data.

At which layer should this data cleaning go?

乌拉__----

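#5 replied 2021-03

Try reading every image in a loop with cv2.imread().

The likely cause is that some images display fine in a viewer but have a problem with their RGB encoding when they are read.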

乌拉__----

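#6 replied 2021-03

乌拉__---- #5

Try reading every image in a loop with cv2.imread(). The likely cause is that some images display fine in a viewer but have a problem with their RGB encoding when they are read.

Images like this can be read normally by mpimg.imread(), but cv2.imread() returns None for them, so you can find the bad images from their paths.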

AIStudio810259

#7 replied 2021-03

Just handle it with try.
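
One way to read the "handle it with try" suggestion is to attempt a full decode of every file and keep a list of the ones that raise. The sketch below is only an illustration: `split_readable` and `pil_loader` are hypothetical names that do not come from this thread, and the loader part assumes Pillow is installed.

```python
def split_readable(paths, loader):
    """Return (good, bad): paths the loader could read, and (path, error) pairs for the rest."""
    good, bad = [], []
    for path in paths:
        try:
            loader(path)
        except Exception as exc:  # any failure marks the file as bad
            bad.append((path, repr(exc)))
        else:
            good.append(path)
    return good, bad


def pil_loader(path):
    """Fully decode an image with Pillow; raises if the file cannot be decoded."""
    from PIL import Image  # imported here so split_readable works without Pillow

    with Image.open(path) as img:
        img.load()  # force decoding of the pixel data, not just the header
```

Calling `split_readable(all_paths, pil_loader)` on a list of file paths would then give the readable files and the failures with their error messages. Any other loader that raises on failure could be passed in instead.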

琦琦的小老鼠

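#8 replied 2021-03

乌拉__---- #6

Images like this can be read normally by mpimg.imread(), but cv2.imread() returns None for them, so you can find the bad images from their paths.

I'm not quite sure how to write this bit of code: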

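import cv2

prompt = "Image path (or 'quit' to exit): "
while True:
    path = input(prompt)
    if path == "quit":
        break
    img = cv2.imread(path)
    if img is None:
        # cv2.imread returns None (it does not raise) when a file cannot be decoded
        print("Cannot read:", path)
        continue
    cv2.namedWindow('img', 0)
    cv2.imshow('img', img)
    cv2.waitKey()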

琦琦的小老鼠

#9 replied 2021-03

import os
import cv2
import shutil

dirName = r'E:\PaddleClas-release-2.0\dataset\cat_12\cat_12_train'
# Collect the paths of all .jpg files under dirName into all_path
all_path = []
for root, dirs, files in os.walk(dirName):
    for file in files:
        if file.lower().endswith('.jpg'):
            all_path.append(os.path.join(root, file))
all_path.sort()

bad = []
# Directory that bad images are moved into
badpath = r'E:\PaddleClas-release-2.0\dataset\bad'
os.makedirs(badpath, exist_ok=True)

for org in all_path:
    # cv2.imread returns None (it does not raise) when a file cannot be decoded
    img = cv2.imread(org)
    if img is None:
        bad.append(org)
        shutil.move(org, badpath)

print('Found %s bad images' % len(bad))
print(bad)

乌拉__----

#10 replied 2021-03

琦琦的小老鼠 #9

No wonder the code looked familiar, I wrote it a while back.

https://aistudio.baidu.com/aistudio/projectdetail/1133588

乌拉__----

#11 replied 2021-03

AIStudio810258 #3

Clean the data.

Hahaha

琦琦的小老鼠

#12 replied 2021-03

乌拉__---- #10

No wonder the code looked familiar, I wrote it a while back. https://aistudio.baidu.com/aistudio/projectdetail/1133588

So you were the one who wrote it, hahaha

AIStudio810258

#13 replied 2021-03

琦琦的小老鼠 #4

At which layer should this data cleaning go?

It is part of data preprocessing, and it can be done when the data is read.
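
As a rough illustration of cleaning at read time, the wrapper below falls back to a neighbouring sample whenever one fails to load. It is a sketch under assumptions: `SkipBadSamples`, `loader`, and `max_retries` are hypothetical names, and the class is plain Python with `__len__`/`__getitem__` rather than anything tied to a particular framework. In a Paddle project one would presumably put the same logic in a `paddle.io.Dataset` subclass.

```python
class SkipBadSamples:
    """Map-style dataset wrapper: if a sample fails to load, try the next index instead."""

    def __init__(self, samples, loader, max_retries=10):
        self.samples = list(samples)    # list of (path, label) pairs
        self.loader = loader            # callable: path -> decoded image, raises on failure
        self.max_retries = max_retries  # how many neighbouring samples to try

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        for offset in range(self.max_retries + 1):
            path, label = self.samples[(index + offset) % len(self.samples)]
            try:
                return self.loader(path), label
            except Exception:
                continue  # unreadable file: move on to the next sample
        raise RuntimeError("no readable sample found near index %d" % index)
```

The trade-off of this design is that a bad file silently duplicates a neighbour in that epoch. Removing bad files once before training avoids that, at the cost of a separate pass over the dataset.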

AIStudio810258

#14 replied 2021-03

琦琦的小老鼠 #12

So you were the one who wrote it, hahaha

The author takes responsibility~

七年期限

#15 replied 2021-03

AIStudio810258 #3

Clean the data.

I remember discussing this issue with 坤哥.

琦琦的小老鼠

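#16 replied 2021-03

I ran this code over every image in the dataset, but the program still reports the errors after I start it. Any advice is welcome. My dataset folder is laid out like this:

dataset root/
    class a/
    class b/
    class c/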

import os
import cv2
import shutil

dirName = '/home/aistudio/work/dataset'
# Collect the paths of all .jpeg files under dirName into all_path
all_path = []
for root, dirs, files in os.walk(dirName):
    for file in files:
        if file.lower().endswith('.jpeg'):
            all_path.append(os.path.join(root, file))
all_path.sort()

bad = []
# Directory that bad images are moved into
badpath = '/home/aistudio/bad'
os.makedirs(badpath, exist_ok=True)

for org in all_path:
    # cv2.imread returns None (it does not raise) when a file cannot be decoded
    img = cv2.imread(org)
    if img is None:
        bad.append(org)
        shutil.move(org, badpath)

print('Found %s bad images' % len(bad))
print(bad)
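
The messages in the original log are `UserWarning`s emitted from PIL (`TiffImagePlugin.py`, `JpegImagePlugin.py`), not exceptions, so a check based on `cv2.imread` may not flag the files that trigger them. One possible way to locate those files, sketched below under assumptions, is to record the warnings raised while Pillow loads each file. `collect_warnings` and `pil_decode` are hypothetical names. Whether `img.getexif()` triggers exactly the same warnings as the training pipeline depends on the Pillow version and on how the pipeline reads images.

```python
import warnings


def collect_warnings(paths, loader):
    """Map each path to the warning messages (or the exception) raised while loading it."""
    report = {}
    for path in paths:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")  # record every warning, even repeated ones
            try:
                loader(path)
            except Exception as exc:
                report[path] = ["error: %r" % (exc,)]
                continue
        if caught:
            report[path] = [str(w.message) for w in caught]
    return report


def pil_decode(path):
    """Decode pixel data and parse EXIF with Pillow (assumes Pillow is installed)."""
    from PIL import Image  # imported here so collect_warnings works without Pillow

    with Image.open(path) as img:
        img.load()     # decode the pixel data
        img.getexif()  # parse the EXIF block, where corrupt-EXIF warnings may surface
```

Running `collect_warnings(all_path, pil_decode)` on the list built by the script above would return only the files that produced a warning or an error, which can then be inspected, re-saved, or moved aside.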

AIStudio810258

#17 replied 2021-03

I just remembered a pitfall: check whether there are any black-and-white images.

AIStudio810258

#18 replied 2021-03

Some APIs read a black-and-white image as three channels, the same as RGB, while others return a single channel directly.

AIStudio810258

#19 replied 2021-03

As I recall, reading with matplotlib gave an error, and switching to cv2 fixed it.

TowerNet

#20 replied 2021-03

AIStudio810258 #17

I just remembered a pitfall: check whether there are any black-and-white images.

How would I write the code for that?
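
A rough sketch of one way to check for this, assuming Pillow is available: look at each image's `mode` and list everything that is not `RGB` (in Pillow, `L` is 8-bit grayscale). `find_non_rgb`, `pil_mode`, and `load_as_rgb` are hypothetical helper names, not something from this thread.

```python
def find_non_rgb(paths, get_mode):
    """Return {path: mode} for every image whose mode is not 'RGB'."""
    odd = {}
    for path in paths:
        mode = get_mode(path)
        if mode != "RGB":
            odd[path] = mode
    return odd


def pil_mode(path):
    """Read only the image mode with Pillow, e.g. 'RGB', 'L' (grayscale), 'RGBA', 'CMYK'."""
    from PIL import Image  # imported here so find_non_rgb works without Pillow

    with Image.open(path) as img:
        return img.mode


def load_as_rgb(path):
    """Load an image and convert it to three-channel RGB regardless of its stored mode."""
    from PIL import Image

    with Image.open(path) as img:
        return img.convert("RGB")
```

`find_non_rgb(all_path, pil_mode)` would list the grayscale and other non-RGB files. Loading through something like `load_as_rgb` is one way to make every sample three-channel before it reaches the model.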

七年期限

#21 replied 2021-03

AIStudio810258 #17

I just remembered a pitfall: check whether there are any black-and-white images.

Does that have an effect too?