I'm training on a dataset of 20,000 images and ran into errors while training was running. Any advice on how to fix this?
The loss value printed in the log is the current step, and the metric is the average value of previous step.
Epoch 1/100
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:77: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
return (isinstance(seq, collections.Sequence) and
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/nn/layer/norm.py:648: UserWarning: When training, we now always track global mean and variance.
"When training, we now always track global mean and variance.")
step 60/229 [======>.......................] - loss: 3.1190 - acc_top1: 0.0799 - acc_top5: 0.2940 - ETA: 20:21 - 7s/ste
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:770: UserWarning: Possibly corrupt EXIF data. Expecting to read 23836229632 bytes but only got 0. Skipping tag 0
" Skipping tag %s" % (size, len(data), tag)
step 229/229 [==============================] - loss: 3.1398 - acc_top1: 0.0979 - acc_top5: 0.3521 - 7s/step
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 30/58 [==============>...............] - loss: 3.5679 - acc_top1: 0.0578 - acc_top5: 0.3641 - ETA: 3:18 - 7s/st
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:788: UserWarning: Corrupt EXIF data. Expecting to read 12 bytes but only got 6.
warnings.warn(str(msg))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/PIL/JpegImagePlugin.py:793: UserWarning: Image appears to be a malformed MPO file, it will be interpreted as a base JPEG file
"Image appears to be a malformed MPO file, it will be "
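Bump, don't let this sink.
Clean the data.
Where exactly does this data cleaning go? Which stage does it belong to?
Try reading every image in a loop with cv2.imread().
The likely cause is that some images look fine in a viewer but have a problem with their RGB encoding when they are read.
With images like that, mpimg.imread() reads them without complaint, but cv2.imread() returns None, so you can track down the bad images by their paths.
Just wrap it in a try.
I'm not sure how to write even a small piece of code like that.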
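import cv2

prompt = "Image path (type quit to exit): "  # placeholder prompt text
while True:
    path = input(prompt)
    if path == "quit":
        break
    img = cv2.imread(path)
    if img is None:  # cv2.imread returns None instead of raising
        print("Cannot read:", path)
        continue
    cv2.namedWindow('img', 0)
    cv2.imshow('img', img)
    cv2.waitKey()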
import os
import cv2
import shutil

# Raw strings keep the backslashes in Windows paths from being read as escapes
dirName = r'E:\PaddleClas-release-2.0\dataset\cat_12\cat_12_train'
# Collect the paths of all files under dirName into all_path
all_path = []
for root, dirs, files in os.walk(dirName):
    for file in files:
        if "jpg" in file:
            all_path.append(os.path.join(root, file))
all_path.sort()
bad = []
# Folder the bad images are moved into
badpath = r'E:\PaddleClas-release-2.0\dataset\bad'
os.makedirs(badpath, exist_ok=True)
for org in all_path:
    # cv2.imread returns None (it does not raise) when a file cannot be decoded
    img = cv2.imread(org)
    if img is None:
        bad.append(org)
        shutil.move(org, badpath)
print('Found %s bad images' % len(bad))
print(bad)
No wonder the code looked familiar, I wrote it a while back:
https://aistudio.baidu.com/aistudio/projectdetail/1133588
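Hahaha
Oh, so you're the one who wrote it, hahaha.
It counts as data preprocessing; you can do it at the point where the images are read.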
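For the "do it at read time" option, a rough sketch (not the code from the thread) might look like the following. It assumes the samples are (path, label) pairs; `iter_readable`, `read_with_cv2`, and the `skipped` list are made-up names for illustration. The only library behavior it relies on is that cv2.imread returns None for a file it cannot decode.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


def read_with_cv2(path: str):
    """Default reader: cv2.imread returns None when it cannot decode a file."""
    import cv2  # imported lazily so iter_readable can be tried without OpenCV
    return cv2.imread(path)


def iter_readable(
    samples: Iterable[Tuple[str, int]],
    reader: Callable[[str], Optional[object]] = read_with_cv2,
    skipped: Optional[List[str]] = None,
) -> Iterator[Tuple[object, int]]:
    """Yield (image, label) pairs, skipping images the reader cannot decode."""
    for path, label in samples:
        img = reader(path)
        if img is None:
            # Remember the path so the bad file can be inspected later
            if skipped is not None:
                skipped.append(path)
            continue
        yield img, label
```

Passing a list as `skipped` gives you the paths that were dropped, so you can still go back and look at them instead of silently losing samples.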
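Then the author is responsible~~~
I remember discussing this problem with 坤哥.
I ran this code over every image in my dataset, but I still get the errors when I start the program. Any suggestions? My dataset folder is laid out like this:
root folder---class a
           -class b
           -class c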
import os
import cv2
import shutil

dirName = '/home/aistudio/work/dataset'
# Collect the paths of all files under dirName into all_path
all_path = []
for root, dirs, files in os.walk(dirName):
    for file in files:
        if "jpeg" in file:
            all_path.append(os.path.join(root, file))
all_path.sort()
bad = []
# Folder the bad images are moved into
badpath = '/home/aistudio/bad'
os.makedirs(badpath, exist_ok=True)
for org in all_path:
    # cv2.imread returns None (it does not raise) when a file cannot be decoded
    img = cv2.imread(org)
    if img is None:
        bad.append(org)
        shutil.move(org, badpath)
print('Found %s bad images' % len(bad))
print(bad)
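The messages in the log at the top come from PIL (TiffImagePlugin.py and JpegImagePlugin.py) and are UserWarnings, so a cv2.imread scan may well not flag those files. One possible way to find them, as a sketch only, is to open each file with Pillow while warnings are promoted to exceptions. `scan_for_warnings` and `load_with_pil` are hypothetical names, and whether the same warnings fire here depends on which Pillow calls the training pipeline actually makes.

```python
import warnings
from typing import Callable, Iterable, List, Tuple


def load_with_pil(path: str) -> None:
    """Open and fully decode one image with Pillow."""
    from PIL import Image  # imported lazily so the scan logic runs without Pillow
    with Image.open(path) as im:
        im.load()  # force the pixel data to be decoded


def scan_for_warnings(
    paths: Iterable[str],
    loader: Callable[[str], None] = load_with_pil,
) -> List[Tuple[str, str]]:
    """Return (path, message) for every file that warns or fails while loading."""
    suspicious = []
    for path in paths:
        with warnings.catch_warnings():
            # Inside this block any warning is raised as an exception
            warnings.simplefilter("error")
            try:
                loader(path)
            except Exception as exc:  # Warning is a subclass of Exception
                suspicious.append((path, repr(exc)))
    return suspicious
```

You would feed it the same `all_path` list as above and look at the returned paths before deciding whether to move or re-save those files.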
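I just remembered another pitfall: check whether there are any black-and-white images.
Some APIs read a black-and-white image as three channels, the same as RGB, while others return a single channel.
I remember it failed when I read the images with matplotlib and worked once I switched to cv2.
How would you write code for that?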
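A minimal sketch of a check for black-and-white images, assuming OpenCV is used for the scan: with the cv2.IMREAD_UNCHANGED flag, cv2.imread keeps the file's own channel layout, so a grayscale file comes back as a 2-D array. `find_non_rgb`, `channel_count`, and `shape_with_cv2` are names invented for this example.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Shape = Tuple[int, ...]


def shape_with_cv2(path: str) -> Optional[Shape]:
    """Array shape of the image as stored on disk, or None if unreadable."""
    import cv2  # imported lazily so the helpers below run without OpenCV
    img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
    return None if img is None else img.shape


def channel_count(shape: Shape) -> int:
    """(H, W) means one channel; (H, W, C) means C channels."""
    if len(shape) == 2:
        return 1
    if len(shape) == 3:
        return shape[2]
    raise ValueError("unexpected image shape: %r" % (shape,))


def find_non_rgb(
    paths: Iterable[str],
    shape_of: Callable[[str], Optional[Shape]] = shape_with_cv2,
) -> List[Tuple[str, int]]:
    """Return (path, channels) for readable images that are not 3-channel."""
    odd = []
    for path in paths:
        shape = shape_of(path)
        if shape is None:  # unreadable files are a separate problem
            continue
        n = channel_count(shape)
        if n != 3:
            odd.append((path, n))
    return odd
```

For training itself, reading with plain cv2.imread(path) (the default flag) or with Pillow's Image.open(path).convert("RGB") should give three channels even for grayscale files, so the check is mainly useful for finding out whether this pitfall applies to your dataset at all.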
Does that make a difference too?