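Dataset introduction

Source: the China Agricultural University crop growth status recognition challenge.

The data consists of:
- a train folder holding 557 images of strawberries in the vegetative growth stage;
- a test folder holding 230 test images;
- a train.csv file with the following format:

    image_id     category_id
    a1009.jpg    0
    a1011.jpg    2
    a1025.jpg    3

Each row pairs a crop image with its growth status label. (The code below addresses the label column as label, matching the printed counts.) Strawberry growth can be roughly divided into four stages: growth, flowering, fruiting, and ripening.

1. Preparing the dataset

    # Import libraries
    import os
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib.image as mpimg

    base_dir = 'strawberry'
    train_csv = 'train.csv'
    class_names = ['growth', 'flowering', 'fruit', 'seed']

    # Read the dataset
    data = pd.read_csv(os.path.join(base_dir, train_csv))
    data.head()

    # Shuffle the rows
    data = data.sample(frac=1).reset_index(drop=True)

    # Split into training and validation sets (80/20)
    train_data = data.iloc[:int(0.8 * len(data))]
    val_data = data.iloc[int(0.8 * len(data)):]

    # Count the number of images in each class
    class_num = data.groupby('label').count()
    print(class_num)
    train_num = train_data.groupby('label').count()
    val_num = val_data.groupby('label').count()
    print(train_num, val_num)

Output, number of images per class for the full data, the training split, and the validation split:

           image
    label
    0        146
    1        145
    2        146
    3        120

           image
    label
    0        117
    1        114
    2        120
    3         94

           image
    label
    0         29
    1         31
    2         26
    3         26

Sort the images by class and copy each one into the folder for its class:

    import shutil

    for d in data.label.unique():
        class_dir = os.path.join(base_dir, 'crops', str(d))
        # makedirs also creates the parent crops folder if it is missing
        os.makedirs(class_dir, exist_ok=True)
        for i in data[data.label == d].index:
            # column 0 holds the image file name
            shutil.copy(os.path.join(base_dir, 'train', data.iloc[i, 0]), class_dir)

Pick one random sample and display it:

    idx = np.random.randint(0, len(train_data))
    row = train_data.iloc[idx]  # column 0: file name, column 1: label
    plt.figure(figsize=(10, 10))
    img_file = os.path.join(base_dir, 'train', row.iloc[0])
    img = mpimg.imread(img_file)
    label = class_names[row.iloc[1]]
    plt.imshow(img)
    plt.title(label)
    plt.axis('off')
    plt.show()

2. Model training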
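The training code in this section reads train_list.txt, val_list.txt, and labels.txt from strawberry/crops, but the post does not show how those files were produced. The following is a minimal sketch of one way to generate them, assuming the ImageNet-style layout described in the PaddleX documentation (one "relative/path label_id" pair per line in the list files, one class name per line in labels.txt). The helper write_file_lists, its parameters, and the fixed seed are hypothetical and not part of the original post.

```python
import os
import random


def write_file_lists(crops_dir, val_ratio=0.2, seed=0):
    """Write labels.txt, train_list.txt and val_list.txt inside crops_dir.

    Hypothetical helper: assumes crops_dir holds one sub-folder per class
    and that file names contain no spaces.
    """
    # Class folders sorted by name; the position in labels.txt is the label id.
    classes = sorted(
        d for d in os.listdir(crops_dir)
        if os.path.isdir(os.path.join(crops_dir, d))
    )

    samples = []
    for label_id, cls in enumerate(classes):
        for name in sorted(os.listdir(os.path.join(crops_dir, cls))):
            # Paths are written relative to crops_dir.
            samples.append(f"{cls}/{name} {label_id}")

    # Shuffle reproducibly, then cut off the validation share.
    random.Random(seed).shuffle(samples)
    n_val = int(len(samples) * val_ratio)
    val, train = samples[:n_val], samples[n_val:]

    def dump(filename, lines):
        path = os.path.join(crops_dir, filename)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("\n".join(lines) + "\n")

    dump("labels.txt", classes)
    dump("train_list.txt", train)
    dump("val_list.txt", val)
    return len(train), len(val)
```

Calling write_file_lists('strawberry/crops') after the copy step would leave the three files next to the class folders 0 to 3. Note that this split is drawn independently of the pandas split shown earlier, so the per-class counts will differ from the ones printed there.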
As before, PP-LCNet, a lightweight classification model, is used for the demonstration.

    import paddlex as pdx
    from paddlex import transforms as T

    # Transforms used for training and for validation
    train_transforms = T.Compose([
        T.RandomCrop(crop_size=224),
        T.RandomHorizontalFlip(),
        T.Normalize()])
    eval_transforms = T.Compose([
        T.ResizeByShort(short_size=256),
        T.CenterCrop(crop_size=224),
        T.Normalize()])

    # Datasets used for training and for validation
    train_dataset = pdx.datasets.ImageNet(
        data_dir='strawberry/crops',
        file_list='strawberry/crops/train_list.txt',
        label_list='strawberry/crops/labels.txt',
        transforms=train_transforms,
        shuffle=True)
    eval_dataset = pdx.datasets.ImageNet(
        data_dir='strawberry/crops',
        file_list='strawberry/crops/val_list.txt',
        label_list='strawberry/crops/labels.txt',
        transforms=eval_transforms)

    # Initialize the model and train it
    num_classes = len(train_dataset.labels)
    model = pdx.cls.PPLCNet_ssld(num_classes=num_classes)
    model.train(
        num_epochs=20,
        pretrain_weights='IMAGENET',
        train_dataset=train_dataset,
        train_batch_size=64,
        eval_dataset=eval_dataset,
        lr_decay_epochs=[5, 10, 15],
        learning_rate=0.01,
        save_dir='output/pplcnet_crops',
        log_interval_steps=10,
        label_smoothing=.1,
        use_vdl=False)

Log output at the end of training:

    [TRAIN] Epoch 20 finished, loss=0.41098955, acc1=0.9869792, acc4=1.0 .
    [INFO] Start to evaluate(total_samples=111, total_steps=2)...
    [INFO] [EVAL] Finished, Epoch=20, acc1=0.963098, acc4=1.000000 .
    [INFO] Current evaluated best model on eval_dataset is epoch_2, acc1=0.984375
    [INFO] Model saved in output/pplcnet_crops/epoch_20.

After 20 epochs the top-1 accuracy on the 111 validation images is about 0.963; the best checkpoint is the one from epoch 2, at 0.984.

3. Model prediction

Run the model on the test images and write the predictions to a CSV file:

    import os
    import paddlex as pdx
    import matplotlib.pyplot as plt
    import matplotlib.image as mpimg

    # Create the inference object
    predictor = pdx.deploy.Predictor('output/pplcnet_crops/inference_model', use_gpu=True)
    res = []  # holds the prediction results

    # Iterate over the test image folder
    for f in os.listdir(os.path.join(base_dir, 'test')):
        fname = os.path.join(base_dir, 'test', f)
        img = mpimg.imread(fname)
        result = predictor.predict(img)
        print(f, class_names[int(result[0]['category'])] + ':' + str(result[0]['score']))
        res.append([f, class_names[int(result[0]['category'])]])

    # Write the results to a CSV file with a header row
    import pandas as pd
    pd.DataFrame(res).to_csv(os.path.join(base_dir, 'submit.csv'), index=False, header=['image', 'label'])
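Before submitting, it can be worth checking that the CSV covers every test image exactly once and only uses known class names. The following is a rough sketch using only the standard library. It assumes the image and label header written by the prediction step; the helper check_submission and its messages are hypothetical and not part of the original post.

```python
import csv
import os


def check_submission(csv_path, test_dir, class_names):
    """Return a list of human-readable problems found in the submission file.

    Hypothetical helper: expects a CSV with 'image' and 'label' columns.
    """
    with open(csv_path, newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))

    problems = []
    seen = [row["image"] for row in rows]
    expected = set(os.listdir(test_dir))

    # Every file in the test folder should have a prediction.
    missing = expected - set(seen)
    if missing:
        problems.append(f"{len(missing)} test image(s) have no prediction")

    # No row should name a file that is not in the test folder.
    extra = set(seen) - expected
    if extra:
        problems.append(f"{len(extra)} row(s) name files not in the test folder")

    # Each image should appear only once.
    if len(seen) != len(set(seen)):
        problems.append("some images appear more than once")

    # Labels must come from the known class names.
    bad = {row["label"] for row in rows} - set(class_names)
    if bad:
        problems.append(f"unknown labels: {sorted(bad)}")

    return problems
```

With the names used in this post, the call would be check_submission(os.path.join(base_dir, 'submit.csv'), os.path.join(base_dir, 'test'), class_names); an empty list means none of these checks failed.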