Full-text search across GitLab with Python + Selenium
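Most software companies run their own internal source-code management, for example a privately deployed GitLab server. Once a company grows, it ends up with multiple product lines, and each line accumulates a large number of projects and solutions.
Take our company as an example: in the recruitment business unit alone, the middle-platform ESB, MRest, the various Consumer services, assorted tools and other solutions now number over a hundred. At that scale you run into scenarios like these: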
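1. You need to change a parameter of a shared interface, or a public method in a base library package, but you don't know which projects and call sites reference it, so the impact is hard to assess.
2. You know a Kafka topic from the business code, but whoever wrote that code never documented the consuming project, and after a long search you still can't find the Consumer project.
3. You want to find a piece of code by a few specific keywords, but can't remember which projects use it.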
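If any of these sound familiar, the GitLab global code search tool described below can help. It uses python + selenium + chromedriver to log in to the internal GitLab site automatically, then searches the code of every project in the selected groups (the default groups come from the configuration) for one or more keywords. The tool's workflow and UI are roughly as shown in the screenshots below: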
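step1: Read the configuration file and log in automatically: {"username": "yourname", "password": "yourpassword", "projectGroups": ["recrxxx", "platformuiframework", "platforminfrastructure", "uxshareplatform"]}
step2: After a successful login, the tool injects the search form into the home page: the selectable project groups, a keyword input box, and so on.
step3: Iterate over the project groups, obtain each project's id, and run the keyword search.
step4: Display the hits. With 10 or fewer results, every result is opened in its own browser tab automatically; with more than 10, you open them individually or all at once by hand.
step5: Because the tool is driven by chromedriver, the Chrome browser version must match the chromedriver version; if it does not, the mismatch is recorded in the log.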
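The version requirement in step 5 can be checked before launching the browser. The following is a minimal sketch, not part of the original tool: `major_version` and `versions_match` are hypothetical helper names, and it assumes the two version strings have already been obtained, for example from the output of `chromedriver --version` and from Chrome's "About" page. It only compares major versions, which is the usual compatibility rule of thumb.

```python
import re


def major_version(version_text: str) -> int:
    """Return the major version found in text such as
    'ChromeDriver 86.0.4240.22 (...)' or 'Google Chrome 86.0.4240.111'."""
    match = re.search(r"(\d+)\.\d+\.\d+(?:\.\d+)?", version_text)
    if match is None:
        raise ValueError("no version number found in {!r}".format(version_text))
    return int(match.group(1))


def versions_match(chrome_version: str, driver_version: str) -> bool:
    """True when Chrome and chromedriver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


if __name__ == "__main__":
    chrome = "Google Chrome 86.0.4240.111"
    driver = "ChromeDriver 86.0.4240.22 (abc)"
    if not versions_match(chrome, driver):
        print("Chrome / chromedriver major versions differ")
```

Running a check like this at startup turns a confusing session-creation failure into a clear message about which component needs updating.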
The core idea is to drive selenium from Python to control the GitLab project pages, and to inject specific HTML tags into the page so that searching GitLab project code can be automated. The main parts of the Python script are as follows:

import re
import threading
import time
from datetime import datetime
from pathlib import Path

import requests
from lxml import etree
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


class GitLabSearchTool(object):
    def __init__(self):
        self.username = ''
        self.password = ''
        self.projectGroups = []
        self.usedKeywords = []
        self.getConfigInfo()
        self.maxPageIndex = 50
        self.pId = 'spiderContainer'
        self.searchDivId = 'searchContainer'
        self.baseurl = 'http://gitlab.xxxcorp.com'
        self.baseLoginUrl = 'http://gitlab.xxxcorp.com/users/sign_in'
        self.startTime = datetime.now()
        self.isSearching = False
        self.stopSearch = False
        self.isClose = False
        self.successUrls = dict()   # search URL -> project name
        self.searchGroup = []
        self.keywords = []
        self.request = None
        self.driver = None

    def start(self):
        useragent = ('Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                     '(KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36')
        chrome_options = Options()
        chrome_options.add_argument('user-agent={}'.format(useragent))
        chrome_options.add_argument('--disable-plugins')    # disable plugins
        chrome_options.add_argument('--start-maximized')    # start Chrome maximized
        # hide the "Chrome is being controlled by automated test software" banner
        chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])
        pathItem = ['chromedriver.exe']
        driverPath = Path.cwd().joinpath(*pathItem)
        # Selenium 3 style call; Selenium 4 passes the driver path via a Service object
        self.driver = webdriver.Chrome(str(driverPath), options=chrome_options)
        self.driver.get(self.baseLoginUrl)
        if self.username and self.password:
            # wait for the LDAP login form, then fill it in and submit
            WebDriverWait(self.driver, 1000).until(
                EC.presence_of_element_located((By.XPATH, '//*[@id="new_ldap_user"]')))
            time.sleep(0.3)
            self.driver.find_element(By.XPATH, '//*[@id="username"]').send_keys(self.username)
            time.sleep(0.3)
            self.driver.find_element(By.XPATH, '//*[@id="password"]').send_keys(self.password)
            time.sleep(0.3)
            self.driver.find_element(By.XPATH, '//*[@id="remember_me"]').click()
            self.driver.find_element(By.XPATH, '//*[@id="new_ldap_user"]/input[3]').click()
        threading.Thread(target=self.checkBrowserIsClose).start()
        self.request = requests.session()
        try:
            while not self.isClose:
                # the injected page adds marker elements to signal user actions
                try:
                    homep = self.driver.find_element(By.ID, 'xxxyoucangohomenow')
                    if homep is not None:
                        self.driver.get(self.baseurl)
                except Exception:
                    pass
                try:
                    searchDiv = self.driver.find_element(By.ID, 'xxxyoucanstartsearchnow')
                    if searchDiv is None:
                        time.sleep(1)
                    else:
                        self.startTime = datetime.now()
                        self.successUrls.clear()
                        self.searchGroup.clear()
                        self.keywords.clear()
                        # collect the checked project groups
                        chkList = self.driver.find_elements(
                            By.XPATH, '//*[@id="searchGroup"]/descendant::input[@type="checkbox"]')
                        for chk in chkList:
                            if chk.get_attribute('checked') == 'true':
                                self.searchGroup.append(chk.get_attribute('attrvalue').strip())
                        if len(self.searchGroup) == 0:
                            return
                        # split the keyword box on ASCII or full-width commas
                        keywordInput = self.driver.find_element(By.ID, 'searchKeyword')
                        searchKeyword = keywordInput.get_attribute('value').strip()
                        keywords = re.split(',|,', searchKeyword)
                        if len(keywords) > 0:
                            for kw in keywords:
                                kw = kw.strip()
                                if len(kw) > 0:
                                    self.keywords.append(kw)
                        if len(self.keywords) == 0:
                            self.driver.execute_script('arguments[0].focus();', keywordInput)
                            return
                        self.search()
                except Exception:
                    time.sleep(1)
            print('webdriver is closed')
            return
        except Exception as ex:
            print('Error: {}'.format(ex))
            return

    def search(self):
        self.isSearching = True
        self.stopSearch = False
        for group in self.searchGroup:
            if self.stopSearch:
                break
            for page in range(1, self.maxPageIndex):
                if self.stopSearch:
                    break
                url = 'http://gitlab.xxxcorp.com/{}?page={}'.format(group, page)
                self.driver.get(url)
                WebDriverWait(self.driver, 5).until(EC.presence_of_element_located(
                    (By.XPATH, '//*[@id="content-body"]/p[2]/p[1]/ul/li[1]/a')))
                projects = self.driver.find_elements(
                    By.XPATH, '//*[@id="projects"]/p/ul/descendant::a[@class="project"]')
                if len(projects) == 0:
                    break   # no more pages in this group
                for proj in projects:
                    try:
                        # marker element injected when the user clicks "Stop search"
                        stopSearch = self.driver.find_element(By.ID, 'xxxyoucanstopsearchnow')
                        if stopSearch is not None:
                            self.stopSearch = True
                            break
                    except Exception:
                        pass
                    projUrl = proj.get_attribute('href')
                    self.searchProject(projUrl)
        endTime = datetime.now()
        delta = (endTime - self.startTime).seconds
        successCount = len(self.successUrls)
        searchKeyword = ','.join(self.keywords)
        if successCount > 0:
            searchedPojectUrl = self.getSearchedProject()
            html = ('<span>Searched: {}</span><br/>'
                    '<span>Took {} s, matched {} project(s)</span>'
                    '<button style="width:150px;margin-left:50px;color:red;font-size:16px;font-weight:normal;" '
                    'type="button" onclick="gotohome()">Back to search home</button>'
                    '<button style="width:150px;color:black;font-size:16px;font-weight:normal;" '
                    'type="button" onclick="openAllUrl()">Open all links</button><br/>'
                    '{}').format(searchKeyword, delta, successCount, searchedPojectUrl)
        else:
            html = ('<span>Searched: {}</span><br/>'
                    '<span>Took {} s, matched {} project(s)</span>'
                    '<button type="button" '
                    'style="width:150px;margin-left:50px;color:red;font-size:16px;font-weight:normal;" '
                    'onclick="gotohome()">Back to search home</button>'
                    ).format(searchKeyword, delta, successCount)
        self.createDom(html)
        self.isSearching = False
        # with 10 or fewer hits, open every result in its own tab
        if len(self.successUrls) <= 10:
            for url, name in self.successUrls.items():
                self.driver.execute_script('window.open("{}")'.format(url))

    def searchProject(self, projUrl):
        proj = self.getProjectId(projUrl)   # (project id, project name)
        if proj[0] == 0:
            return
        for keyword in self.keywords:
            if not (keyword and len(keyword.strip()) > 0):
                continue
            searchUrl = '{}/search?utf8=✓&snippets=&scope=&search={}&project_id={}'.format(
                self.baseurl, keyword, proj[0])
            data = self.request.get(searchUrl).text
            html = etree.HTML(data)
            # a result block on the page means the keyword was found in this project
            topResults = html.xpath('//*[@id="content-body"]/p[contains(@class,"prepend-top-10")]')
            if len(topResults) > 0:
                self.successUrls[searchUrl] = proj[1]
                # js = 'window.open("{}")'.format(searchUrl)
                # self.driver.execute_script(js)
                # self.driver.switch_to.window(self.driver.window_handles[0])
            successCount = len(self.successUrls)
            if successCount > 0:
                searchedPojectUrl = self.getSearchedProject()
                html = ('<button style="width:150px;color:red;font-size:16px;font-weight:normal;" '
                        'type="button" onclick="stopSearch()">Stop search</button><br/>'
                        '<span>Searching: {}</span><br/>'
                        '<span>{}</span><br/>'
                        '<span>Matched {} project(s) so far</span><br/>'
                        '{}').format(keyword, projUrl, successCount, searchedPojectUrl)
            else:
                html = ('<button style="width:150px;color:red;font-size:16px;font-weight:normal;" '
                        'type="button" onclick="stopSearch()">Stop search</button><br/>'
                        '<span>Searching: {}</span><br/>'
                        '<span>{}</span>').format(keyword, projUrl)
            self.createDom(html)

    def getProjectId(self, url):
        projid = 0
        projname = ''
        data = self.request.get(url).text
        html = etree.HTML(data)
        # the project page carries its numeric id in the search form's hidden input
        values = html.xpath('//*[@id="search_project_id"]/@value')
        if len(values) > 0:
            projid = int(values[0])
        names = html.xpath('//*[@id="search_project_id"]/@data-name')
        if len(names) > 0:
            projname = names[0]
        return (projid, projname)

    # .....
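The script calls `getConfigInfo()` but its body is not shown. As a rough illustration only, the sketch below reads a configuration with the three keys from step 1 (`username`, `password`, `projectGroups`). It assumes the configuration is stored as JSON; `ToolConfig`, `parse_config`, `load_config` and the file name `config.json` are hypothetical names, not taken from the original tool.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class ToolConfig:
    """Hypothetical container for the settings listed in step 1."""
    username: str = ""
    password: str = ""
    projectGroups: List[str] = field(default_factory=list)


def parse_config(text: str) -> ToolConfig:
    """Parse JSON text; missing keys fall back to empty defaults."""
    raw = json.loads(text)
    return ToolConfig(
        username=raw.get("username", ""),
        password=raw.get("password", ""),
        projectGroups=list(raw.get("projectGroups", [])),
    )


def load_config(path: Path) -> ToolConfig:
    """Read the config file if it exists, otherwise return defaults."""
    if not path.exists():
        return ToolConfig()
    return parse_config(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    # "config.json" is an assumed file name
    config = load_config(Path.cwd() / "config.json")
    print(config.projectGroups)
```

Falling back to empty defaults matches the script's behaviour of skipping the automatic login when no username or password is available.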
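Two small pieces of the search flow can be isolated and tested without a browser: splitting the keyword box contents, and building the per-project search URL. The sketch below is an assumption-laden variant rather than the script's own code: `split_keywords` and `build_search_url` are hypothetical helpers, and the URL builder keeps only the `search` and `project_id` parameters. Unlike plain string formatting, it URL-encodes the keyword, so keywords containing spaces, `&` or non-ASCII characters do not corrupt the query string.

```python
import re
from typing import List
from urllib.parse import urlencode


def split_keywords(raw: str) -> List[str]:
    """Split on ASCII or full-width commas and drop empty entries."""
    parts = re.split(r"[,,]", raw)
    return [part.strip() for part in parts if part.strip()]


def build_search_url(base_url: str, keyword: str, project_id: int) -> str:
    """Build a project-scoped search URL with an encoded keyword."""
    query = urlencode({"search": keyword, "project_id": project_id})
    return "{}/search?{}".format(base_url, query)


if __name__ == "__main__":
    for kw in split_keywords("kafka.topic, UserService ,,"):
        print(build_search_url("http://gitlab.xxxcorp.com", kw, 42))
```

Keeping these helpers free of selenium and requests means they can be unit-tested on their own, while the browser-driven parts stay thin.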