I: Background

1. The story

A while ago a friend reached out to me on WeChat. His program occasionally hangs, and he asked me to help figure out why. He had already captured a dump, so I asked him to send it over. Let's use WinDbg to find out what is going on.

II: WinDbg analysis

1. Is the program really hung?

Since this is a WinForms program, this is easy to verify: just look at what the main thread is doing at that moment.

    0:000:x86> kb
    CvRegToMachine(x86) conversion failure for 0x14f
    X86MachineInfo::SetVal: unknown register 0 requested
     # ChildEBP RetAddr  Args to Child
    00 018fe0a8 77413ff9 00000918 00000000 00000000 ntdll_77530000!NtWaitForSingleObject+0xc
    01 018fe0a8 77413f52 00000918 ffffffff 00000000 KERNELBASE!WaitForSingleObjectEx+0x99
    02 018fe0bc 1000fe9c 00000918 ffffffff 1000fec0 KERNELBASE!WaitForSingleObject+0x12
    WARNING: Stack unwind information not available. Following frames may be wrong.
    03 018fe338 03d7808a 00000000 00000000 00000000 USB3101A!USB3101AAUXgetch+0xdc
    04 018fe358 03d7803a 00000000 00000000 00000000 0x3d7808a
    05 018fe378 6ff87596 046e1928 03f02970 03f02db0 0x3d7803a
    ...

The main thread's stack shows that managed code called the unmanaged method USB3101A!USB3101AAUXgetch, which is now waiting inside NtWaitForSingleObject. Anyone familiar with NtWaitForSingleObject knows that its first parameter is a handle. Its signature is:

    NTSTATUS NtWaitForSingleObject(
      [in] HANDLE         Handle,
      [in] BOOLEAN        Alertable,
      [in] PLARGE_INTEGER Timeout
    );

With that in mind, we can take the first argument of ntdll_77530000!NtWaitForSingleObject, 00000918, and inspect it in WinDbg with !handle.

    0:000:x86> !handle 00000918 f
    Handle 00000918
      Type          Mutant
      Attributes    0
      GrantedAccess 0x1f0001:
             Delete,ReadControl,WriteDac,WriteOwner,Synch
             QueryState
      HandleCount   2
      PointerCount  59730
      Name          \Sessions\9\BaseNamedObjects\USB3101ALOCK0
      Object specific information
        Mutex is Owned
        Mutant Owner 1334.1ec0

Judging by Mutant Owner 1334.1ec0, this is a mutex, and it is currently held by thread 1ec0 in process 1334. A mutex can be shared across processes, so the obvious question is: has another process taken this lock and never released it? Is the owner really another process? We can use ~ to check the process ID of the current process.

    0:000:x86> ~
    .  0  Id: 1334.1e74 Suspend: 0 Teb: 016ee000 Unfrozen
       1  Id: 1334.1354 Suspend: 0 Teb: 016fa000 Unfrozen
       2  Id: 1334.2c30 Suspend: 0 Teb: 016fd000 Unfrozen
       3  Id: 1334.db4 Suspend: 0 Teb: 01706000 Unfrozen
       4  Id: 1334.2ac4 Suspend: 0 Teb: 0170f000 Unfrozen
       5  Id: 1334.d54 Suspend: 0 Teb: 01718000 Unfrozen
       6  Id: 1334.4fc Suspend: 0 Teb: 0171b000 Unfrozen
       7  Id: 1334.241c Suspend: 0 Teb: 01727000 Unfrozen
       8  Id: 1334.2464 Suspend: 0 Teb: 01733000 Unfrozen
       9  Id: 1334.1ec0 Suspend: 0 Teb: 0175d000 Unfrozen
      10  Id: 1334.3bc4 Suspend: 0 Teb: 01790000 Unfrozen
      11  Id: 1334.2844 Suspend: 0 Teb: 01799000 Unfrozen
      12  Id: 1334.2a88 Suspend: 0 Teb: 0179c000 Unfrozen
      13  Id: 1334.2190 Suspend: 0 Teb: 0179f000 Unfrozen

Looking for 1334.1ec0 in this output, we see that the mutex is held by thread 9 of this same process. Since the owner is in the same process, things get much easier.

2. Why doesn't thread 9 release the lock?

Out of curiosity, I immediately switched to thread 9 to look at its managed and unmanaged stacks.

    0:009:x86> !clrstack
    OS Thread Id: 0x1ec0 (9)
    Child SP       IP Call Site
    0395ec00 0000002b [InlinedCallFrame: 0395ec00]
    0395ebfc 0dbfc91d DomainBoundILStubClass.IL_STUB_PInvoke(IntPtr, Int16[], UInt32, UInt32 ByRef, UInt32 ByRef, Double)
    0395ec00 0dbfc3e0 [InlinedCallFrame: 0395ec00] xxxx.USB3101AAIReadBinary(IntPtr, Int16[], UInt32, UInt32 ByRef, UInt32 ByRef, Double)

    0:009:x86> kb
    CvRegToMachine(x86) conversion failure for 0x14f
    X86MachineInfo::SetVal: unknown register 0 requested
     # ChildEBP RetAddr  Args to Child
    00 0395e4c0 77447a94 00000910 00000000 00000000 ntdll_77530000!NtWaitForSingleObject+0xc
    01 0395e4c0 7665fc4b 00000910 0022004b 0395e524 KERNELBASE!DeviceIoControl+0x35404
    02 0395e4ec 1000c5bb 00000910 0022004b 0395e524 kernel32!DeviceIoControlImplementation+0x4b
    WARNING: Stack unwind information not available. Following frames may be wrong.
    03 00000910 1000f7ea 000107e7 00100001 00220009 USB3101A!USB3101ASetPassword+0x24b
    04 0403a7b4 7292cc68 0438b5d4 00000000 0000000c USB3101A!USB3101AE2PUpdateToFirmware+0x10a
    05 00000000 775a2b1c 77413ff9 00000918 00000000 clr!StringObject::NewString+0x4c
    06 00000000 77413ff9 00000918 00000000 77414016 ntdll_77530000!NtWaitForSingleObject+0xc
    07 00000000 10022e61 0395e7a0 00000000 e9c915c7 KERNELBASE!WaitForSingleObjectEx+0x99

The stack shows the thread inside DeviceIoControl, a very low-level Win32 API. According to the documentation, it sends a control code to a specified device driver. So what is it waiting for? Extract the 00000910 argument in the same way.

    0:009:x86> !handle 00000910 f
    Handle 00000910
      Type          File
      Attributes    0
      GrantedAccess 0x12019f:
             ReadControl,Synch
             Read/List,Write/Add,Append/SubDir/CreatePipe,ReadEA,WriteEA,ReadAttr,WriteAttr
      HandleCount   2
      PointerCount  59992
      No object specific information available

The output shows that this is a File handle. Since my friend reports a hang, thread 9 must be waiting on this handle indefinitely, or for some reason it cannot get out. So why can't it get out?

3. Why can't it get out cleanly?

Since thread 9 cannot return from the unmanaged call, something may have gone wrong inside it. To check whether the current thread has hit an error at the Win32 level, use WinDbg's !gle command.

    0:009:x86> !gle
    LastErrorValue: (Win32) 0xb7 (183) - <Unable to get error code text>
    LastStatusValue: (NTSTATUS) 0 - STATUS_SUCCESS
    Wow64 TEB status: 24506368
       LastErrorValue: (NTSTATUS) 0 (0) - STATUS_SUCCESS
       LastStatusValue: (NTSTATUS) 0 - STATUS_SUCCESS

The output shows that the thread's last error is 0xb7. Unfortunately, !error cannot display the error text properly at the moment, so we have to look it up on MSDN. Reference: https://learn.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-

On that page, 0xb7 (183) is ERROR_ALREADY_EXISTS: "Cannot create a file when that file already exists."

At this point the logic is more or less clear:

1. Thread 0 (the main thread) is waiting for thread 9 to release the mutex.
2. Thread 9 hit an unexpected error and cannot return, so the mutex is never released.

The next step is for my friend to take a close look at thread 9's stack and work out why there is logic that creates the same object twice. Since that touches business logic, this is as far as I can help.

III: Summary

Analyzing this kind of dump is good practice for basic analysis skills. The article shows a few tricks for using WinDbg commands, and I hope you find them useful.
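As a small companion to the owner lookup in section II.1, here is a minimal Python sketch of how the Mutant Owner value from !handle can be matched against the ~ thread list. It is not part of the original analysis. The helper names parse_threads and find_owner are hypothetical, and the thread list is abbreviated to three of the entries shown above.

```python
import re

# Abbreviated `~` output copied from the dump above (threads 0, 1 and 9 only).
THREAD_LIST = """\
.  0  Id: 1334.1e74 Suspend: 0 Teb: 016ee000 Unfrozen
   1  Id: 1334.1354 Suspend: 0 Teb: 016fa000 Unfrozen
   9  Id: 1334.1ec0 Suspend: 0 Teb: 0175d000 Unfrozen
"""

# Matches an optional current-thread marker, the thread number, then "Id: pid.tid" in hex.
THREAD_RE = re.compile(r"^\W*(\d+)\s+Id:\s+([0-9a-f]+)\.([0-9a-f]+)", re.IGNORECASE)


def parse_threads(text):
    """Return {(pid, tid): debugger thread number} parsed from `~` output."""
    threads = {}
    for line in text.splitlines():
        match = THREAD_RE.match(line)
        if match:
            number, pid, tid = match.groups()
            threads[(int(pid, 16), int(tid, 16))] = int(number)
    return threads


def find_owner(owner, threads):
    """Map a 'pid.tid' Mutant Owner string to a thread number, or None if absent."""
    pid_hex, tid_hex = owner.split(".")
    return threads.get((int(pid_hex, 16), int(tid_hex, 16)))


threads = parse_threads(THREAD_LIST)
owner_thread = find_owner("1334.1ec0", threads)
print(owner_thread)  # 9
```

A result of None would mean that the owner's pid.tid pair is not among the threads of the dumped process, which is the cross-process case considered earlier.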