Building a VOC-Format Dataset Operation Class, Part 2: Counting Labels per Class, and Cropping and Saving Annotated Boxes

Date: 2022-11-03 13:23:47

Overall goal: build a class for VOC-format datasets, together with built-in functions for operating on them

GitHub project (includes a usage manual):

/A-mockingbird/VOCtype-datasetOperation

Day 2. Count the labels of each class in the dataset, then crop the annotated boxes from the images and save them

1. Count the number of labels per class in the dataset

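Using the parsing code written earlier, iterate over every xml file and over every bounding box annotation inside each xml file.

Record how many times each class appears, and store the counts in a dictionary.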
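The parsing code from the earlier post is not reproduced here. As a rough sketch of the data the counting step consumes, the example below assumes each parsed xml file becomes a dict with a 'file' key and an 'info' list of [class, xmin, ymin, xmax, ymax] entries, which is the shape the functions in this post index into. The helper name parse_voc_xml and the sample annotation are hypothetical and are not taken from the project.

```python
import xml.etree.ElementTree as ET

# A tiny VOC-style annotation, inlined so the example needs no files on disk.
SAMPLE_XML = """
<annotation>
    <filename>000001.jpg</filename>
    <object>
        <name>dog</name>
        <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
    </object>
    <object>
        <name>person</name>
        <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
    </object>
</annotation>
"""


def parse_voc_xml(xml_text, file_name):
    """Hypothetical helper (not the project's parser).

    Returns {'file': file_name, 'info': [[cls, xmin, ymin, xmax, ymax], ...]}.
    """
    root = ET.fromstring(xml_text.strip())
    info = []
    for obj in root.iter('object'):
        box = obj.find('bndbox')
        # Coordinates are stored as text; go through float() in case of "48.0"
        coords = [int(float(box.find(tag).text))
                  for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        info.append([obj.find('name').text] + coords)
    return {'file': file_name, 'info': info}


parsed = parse_voc_xml(SAMPLE_XML, '000001.xml')
print(parsed)
```

A list of such dicts, one per xml file, is what the counting and cropping steps are assumed to iterate over.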

def _Countobject(self, annofile=None):
    """Count the number of labels of every class and print the result.

    Precondition: annofile - the path of the xml annotation files
    """
    if annofile is None:
        annofile = self.dataset_anno
    # Parse all xml files in the dataset
    annoparse = self._ParseAnnos(annofile)
    # Empty dict used to store the counts
    count = {}
    # Iterate over the dicts holding the annotations of each xml file
    for anno in annoparse:
        # Iterate over every bounding box in a single xml file
        for obj in anno['info']:
            # If the class has been seen before, increment its count;
            # otherwise this is its first occurrence, so start at 1
            if obj[0] in count:
                count[obj[0]] += 1
            else:
                count[obj[0]] = 1
    # Print the count of each class
    for c in count.items():
        print("{}: {}".format(c[0], c[1]))
    # Return a dict: {'class name': count, ...}
    return count
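The same tally can be written more compactly with collections.Counter from the standard library. The snippet below is a standalone sketch, not part of the project's class: the function name count_objects and the sample data are made up for illustration, and the data shape (a list of dicts whose 'info' entries start with the class name) is assumed from how the method above indexes its input.

```python
from collections import Counter

# Assumed shape of the parsed annotations: one dict per xml file,
# each 'info' entry is [class, xmin, ymin, xmax, ymax].
annos = [
    {'file': 'a.xml', 'info': [['dog', 1, 2, 30, 40], ['cat', 5, 5, 20, 20]]},
    {'file': 'b.xml', 'info': [['dog', 0, 0, 10, 10]]},
]


def count_objects(annos):
    """Hypothetical standalone version of the counting step."""
    # obj[0] is the class name of one bounding box
    return Counter(obj[0] for anno in annos for obj in anno['info'])


counts = count_objects(annos)
# most_common() lists classes from most to least frequent
for name, n in counts.most_common():
    print("{}: {}".format(name, n))
```

Counter is a dict subclass, so the result can be used wherever the plain dict returned by the method above is expected.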

2. Crop the annotated boxes and save the images

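First collect the bounding box annotations of the whole dataset, then iterate over every bounding box in each image, crop it, and save the result.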

Required libraries:

import os
import sys

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

def _Crop(self, imgdir, cropdir, annos=None):
    """Crop every annotated box region of the objects in the dataset."""
    if annos is None:
        annos = self._ParseAnnos()
    # Total number of xml files
    total = len(annos)
    # Iterate over the parsed dataset, one dict per xml file
    for num, annotation in enumerate(annos):
        # Name of the xml file; the image is expected to share its base name
        annofile = annotation['file']
        imgpath = os.path.join(imgdir, annofile[:-4] + '.jpg')
        if not os.path.exists(imgpath):
            raise FileNotFoundError(imgpath)
        # Open the image
        pil_im = Image.open(imgpath)
        # Iterate over the bounding boxes of this xml file
        for i, obj in enumerate(annotation['info']):
            # Class name
            obj_class = obj[0]
            # Box coordinates
            obj_box = tuple(obj[1:5])
            # Create one output folder per class
            classdir = os.path.join(cropdir, obj_class)
            if not os.path.exists(classdir):
                os.mkdir(classdir)
            # Crop the bounding box
            region = pil_im.crop(obj_box)
            # Round trip through a uint8 numpy array back to a PIL image
            # (crop() already returns a PIL image, so this step is optional)
            pil_region = Image.fromarray(np.uint8(region))
            # Save the cropped image
            pil_region.save(os.path.join(classdir, annofile[:-4] + '_' + str(i) + '.jpg'))
        # Track progress and print a progress bar
        process = int(num * 100 / total)
        s1 = "\r%d%%[%s%s]" % (process, "*" * process, " " * (100 - process))
        sys.stdout.write(s1)
        sys.stdout.flush()
    # All files processed: draw the full bar
    s2 = "\r%d%%[%s]" % (100, "*" * 100)
    sys.stdout.write(s2)
    sys.stdout.flush()
    print('')
    print("crop is completed!")
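To try the cropping step without a real dataset, the following sketch builds an image in memory and writes the crops to a temporary directory. It is only an illustration under stated assumptions: crop_boxes is a hypothetical helper, not a method of the project's class, and each box is assumed to be [class, xmin, ymin, xmax, ymax], which matches the (left, upper, right, lower) order that PIL's Image.crop expects.

```python
import os
import tempfile

from PIL import Image


def crop_boxes(image, boxes, out_dir, stem):
    """Hypothetical helper: save each box as out_dir/<class>/<stem>_<i>.jpg."""
    saved = []
    for i, (cls, xmin, ymin, xmax, ymax) in enumerate(boxes):
        class_dir = os.path.join(out_dir, cls)
        # exist_ok avoids a separate os.path.exists check
        os.makedirs(class_dir, exist_ok=True)
        # Image.crop takes (left, upper, right, lower); convert to RGB so
        # the region can always be written as JPEG
        region = image.crop((xmin, ymin, xmax, ymax)).convert('RGB')
        path = os.path.join(class_dir, "{}_{}.jpg".format(stem, i))
        region.save(path)
        saved.append(path)
    return saved


# Synthetic 100x80 image and two made-up boxes
image = Image.new('RGB', (100, 80), color=(200, 30, 30))
boxes = [['dog', 10, 10, 60, 50], ['cat', 0, 0, 20, 20]]
out_dir = tempfile.mkdtemp()

saved = crop_boxes(image, boxes, out_dir, 'demo')
sizes = [Image.open(p).size for p in saved]
print(saved)
print(sizes)
```

Each crop has width xmax - xmin and height ymax - ymin, which gives a quick sanity check that the coordinates were passed in the right order.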
