TensorRT Pitfall Diary: Converting a PyTorch Model to ONNX with Python
Author: makcooo
Source: https://blog.csdn.net/qq_44756223/article/details/107727863
Editor: 极市平台 (Jishi Platform)
A quick word on why converting PyTorch to ONNX matters. After training a deep-learning model in PyTorch, you often need to deploy it with TensorRT or OpenVINO, and both require converting the PyTorch model to an ONNX model before any further conversion. The same applies when deploying with OpenCV's DNN module or ONNX Runtime. In short, whenever a trained PyTorch model is headed for TensorRT, OpenVINO, OpenCV, or ONNX Runtime, the PyTorch-to-ONNX step is unavoidable. This article walks through the pitfalls I hit while converting a PyTorch model to ONNX with Python.
Environment

Ubuntu 16.04
Python 3.6
onnx 1.6
PyTorch 1.5
pycuda 2019.1.2
torchvision 0.1.8
It is worth reading the official guide in full and setting up the environment first: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#import_onnx_python

Step 1. Convert the PyTorch model to an ONNX model
Here I am using a PyTorch model generated from Darknet (`x.model` below is the model object I built):

```python
import torch
from torch.autograd import Variable
import onnx

input_name = ['input']
output_name = ['output']
input = Variable(torch.randn(1, 3, 544, 544)).cuda()
model = x.model.cuda()  # x.model is the model I generated
model = torch.load('', map_location='cuda:0')  # checkpoint path left blank in the original
torch.onnx.export(model, input, 'model.onnx',
                  input_names=input_name, output_names=output_name, verbose=True)
```
Note the `.cuda()` in `model = x.model.cuda()`. Without it, i.e. `model = x.model`, export fails with:

```
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
```

Step 2. Check the model

```python
model = onnx.load('model.onnx')
onnx.checker.check_model(model)
print('Passed')
```

Step 3. Test the ONNX model: compare inference before and after TensorRT

```python
import pycuda.autoinit
import numpy as np
import pycuda.driver as cuda
import tensorrt as trt
import torch
import os
import time
from PIL import Image
import cv2
import torchvision

filename = '000000.jpg'
max_batch_size = 1
onnx_model_path = 'yolo.onnx'

TRT_LOGGER = trt.Logger()  # This logger is required to build an engine


def get_img_np_nchw(filename):
    image = cv2.imread(filename)
    image_cv = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image_cv = cv2.resize(image_cv, (1920, 1080))
    miu = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    img_np = np.array(image_cv, dtype=float) / 255.
    r = (img_np[:, :, 0] - miu[0]) / std[0]
    g = (img_np[:, :, 1] - miu[1]) / std[1]
    b = (img_np[:, :, 2] - miu[2]) / std[2]
    img_np_t = np.array([r, g, b])
    img_np_nchw = np.expand_dims(img_np_t, axis=0)
    return img_np_nchw


class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        # Within this context, host means the CPU memory and device means the GPU memory
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()


def allocate_buffers(engine):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream


def get_engine(max_batch_size=1, onnx_file_path="", engine_file_path="",
               fp16_mode=False, int8_mode=False, save_engine=False):
    """Attempts to load a serialized engine if available,
    otherwise builds a new TensorRT engine and saves it."""

    def build_engine(max_batch_size, save_engine):
        """Takes an ONNX file and creates a TensorRT engine to run inference with"""
        EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
        with trt.Builder(TRT_LOGGER) as builder, \
                builder.create_network(EXPLICIT_BATCH) as network, \
                trt.OnnxParser(network, TRT_LOGGER) as parser:
            builder.max_workspace_size = 1 << 30  # Your workspace size
            builder.max_batch_size = max_batch_size
            builder.fp16_mode = fp16_mode  # Default: False
            builder.int8_mode = int8_mode  # Default: False
            if int8_mode:
                # To be updated
                raise NotImplementedError
            # Parse model file
            if not os.path.exists(onnx_file_path):
                quit('ONNX file {} not found'.format(onnx_file_path))
            print('Loading ONNX file from path {}...'.format(onnx_file_path))
            with open(onnx_file_path, 'rb') as model:
                print('Beginning ONNX file parsing')
                if not parser.parse(model.read()):
                    for error in range(parser.num_errors):
                        print(parser.get_error(error))
                    print("Parsing fail!!!!")
                else:
                    print('Completed parsing of ONNX file')
            print('Building an engine from file {}; this may take a while...'.format(onnx_file_path))
            engine = builder.build_cuda_engine(network)
            print("Completed creating Engine")
            if save_engine:
                with open(engine_file_path, "wb") as f:
                    f.write(engine.serialize())
            return engine

    if os.path.exists(engine_file_path):
        # If a serialized engine exists, load it instead of building a new one.
        print("Reading engine from file {}".format(engine_file_path))
        with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(f.read())
    else:
        return build_engine(max_batch_size, save_engine)


def do_inference(context, bindings, inputs, outputs, stream, batch_size=1):
    # Transfer data from CPU to the GPU.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # Run inference.
    context.execute_async(batch_size=batch_size, bindings=bindings,
                          stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Synchronize the stream
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]


def postprocess_the_outputs(h_outputs, shape_of_output):
    h_outputs = h_outputs.reshape(shape_of_output)
    return h_outputs


img_np_nchw = get_img_np_nchw(filename)
img_np_nchw = img_np_nchw.astype(dtype=np.float32)

# These two modes are dependent on hardware
fp16_mode = False
int8_mode = False
trt_engine_path = './model_fp16_{}_int8_{}.trt'.format(fp16_mode, int8_mode)
# Build a TensorRT engine
engine = get_engine(max_batch_size, onnx_model_path, trt_engine_path, fp16_mode, int8_mode)
# Create the context for this engine
context = engine.create_execution_context()
# Allocate buffers for input and output
inputs, outputs, bindings, stream = allocate_buffers(engine)  # input, output: host bindings

# Do inference
shape_of_output = (max_batch_size, 1000)
# Load data to the buffer
inputs[0].host = img_np_nchw.reshape(-1)
# inputs[1].host = ...  for multiple inputs
t1 = time.time()
trt_outputs = do_inference(context, bindings=bindings, inputs=inputs,
                           outputs=outputs, stream=stream)  # numpy data
t2 = time.time()
feat = postprocess_the_outputs(trt_outputs[0], shape_of_output)
print('TensorRT ok')

# Replace `model` with your own model; here it is torchvision's resnet50,
# which is downloaded over the network on first use
model = torchvision.models.resnet50(pretrained=True).cuda()
resnet_model = model.eval()

input_for_torch = torch.from_numpy(img_np_nchw).cuda()
t3 = time.time()
feat_2 = resnet_model(input_for_torch)
t4 = time.time()
feat_2 = feat_2.cpu().data.numpy()
print('Pytorch ok!')

mse = np.mean((feat - feat_2) ** 2)
print("Inference time with the TensorRT engine: {}".format(t2 - t1))
print("Inference time with the PyTorch model: {}".format(t4 - t3))
print('MSE Error = {}'.format(mse))
print('All completed!')
```
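The final MSE check is just an elementwise comparison of the two outputs. Stripped of the two runtimes, the same computation can be illustrated with synthetic NumPy arrays (the array contents here are made up for illustration; they are not outputs of any real model):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for the TensorRT and PyTorch outputs of shape (1, 1000)
feat = rng.standard_normal((1, 1000)).astype(np.float32)
feat_2 = feat + 1e-4 * rng.standard_normal((1, 1000)).astype(np.float32)

# Same check as the script above: mean squared difference of the two outputs
mse = np.mean((feat - feat_2) ** 2)
print('MSE Error = {}'.format(mse))
```

A tiny MSE (on the order of float32 rounding noise) means the TensorRT engine reproduces the PyTorch model faithfully; a large MSE usually points to a preprocessing or shape mismatch of the kind described next.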
Error:

```
In node -1 (importModel): INVALID_VALUE: Assertion failed: !_importer_ctx.network()->hasImplicitBatchDimension() && "This version of the ONNX parser only supports TensorRT INetworkDefinitions with an explicit batch dimension. Please ensure the network was created using the EXPLICIT_BATCH NetworkDefinitionCreationFlag."
```
Solution: create the network with the EXPLICIT_BATCH flag:

```python
def build_engine(max_batch_size, save_engine):
    EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(TRT_LOGGER) as builder, \
            builder.create_network(EXPLICIT_BATCH) as network, \
            trt.OnnxParser(network, TRT_LOGGER) as parser:
        ...
```
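The somewhat cryptic `1 << (int)(...)` expression just builds a bitmask from the flag's enum value, which is the form `create_network()` expects. Assuming `EXPLICIT_BATCH` has enum value 0 (as in the TensorRT 7 Python API), it reduces to:

```python
# Hypothetical stand-in for trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH;
# its enum value is assumed to be 0 here, as in TensorRT 7.
EXPLICIT_BATCH_ENUM = 0

# Shift 1 left by the enum value to get the bitmask passed to create_network()
EXPLICIT_BATCH = 1 << EXPLICIT_BATCH_ENUM
print(EXPLICIT_BATCH)  # 1
```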
Error:

```
Traceback (most recent call last):
  ...
  line 126, in <listcomp>
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
pycuda._driver.LogicError: cuMemcpyHtoDAsync failed: invalid argument
```
Solution: the preprocessing was resizing the image to the wrong size:

```python
def get_img_np_nchw(filename):
    image = cv2.imread(filename)
    image_cv = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image_cv = cv2.resize(image_cv, (1920, 1080))
```

The input image must be resized to the model's input size. Change it to:

```python
def get_img_np_nchw(filename):
    image = cv2.imread(filename)
    image_cv = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image_cv = cv2.resize(image_cv, (544, 544))
```
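The invalid-argument error comes down to an element-count mismatch between the host buffer filled by preprocessing and the engine's input binding. The arithmetic can be checked with NumPy alone (shapes taken from this post's 544x544 model):

```python
import numpy as np

# The engine's input binding holds exactly 1*3*544*544 float32 values
expected = int(np.prod((1, 3, 544, 544)))

# Resizing to 1920x1080 during preprocessing yields a different element count
wrong = int(np.prod((1, 3, 1080, 1920)))
right = int(np.prod((1, 3, 544, 544)))

print(expected, wrong, right)  # the 1080p buffer is ~7x too large
```

When `inputs[0].host = img_np_nchw.reshape(-1)` copies a buffer of the wrong length into the page-locked host array, the subsequent `cuMemcpyHtoDAsync` fails.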
Error:

```
  line 139, in postprocess_the_outputs
    h_outputs = h_outputs.reshape(shape_of_output)
ValueError: cannot reshape array of size 5780 into shape (1,1000)
```
Solution: change `shape_of_output = (max_batch_size, 1000)` to the actual output size of your own model:

```python
shape_of_output = (1, 20, 17, 17)
```
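The fix works because the flattened TensorRT output contains exactly 1*20*17*17 = 5780 elements, so reshaping to (1, 1000) cannot succeed while the true output shape can, as a quick NumPy check shows:

```python
import numpy as np

flat = np.zeros(5780, dtype=np.float32)  # stand-in for the flattened TensorRT output

try:
    flat.reshape((1, 1000))  # the mismatched shape from the error message
except ValueError as e:
    print(e)  # cannot reshape array of size 5780 into shape (1,1000)

feat = flat.reshape((1, 20, 17, 17))  # 1*20*17*17 == 5780, so this succeeds
print(feat.shape)
```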