 ## proxyInfo
 Scrapes information from a proxy page and prints it to the console.

+## srtpInfo
+Scrapes SRTP lecture information from the university website.
+
 ## tieba
 Implemented following the relevant 极客学院 (Jikexueyuan) tutorial. Uses multiple threads to scrape post information from Baidu Tieba. See the code comments for the techniques involved.
+#!/usr/bin/env python
+# encoding: utf-8
+# Python 2 script: prints the title, link, and date of each SRTP lecture notice.
+import requests
+from lxml import etree
+import sys
+# Python 2 only: make UTF-8 the default codec for implicit str/unicode conversions.
+reload(sys)
+sys.setdefaultencoding('utf-8')
+
+
+def getSrtpInfo():
+    # Page template; not used anywhere in this function yet.
+    htmlTpl = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>%s</body></html>'
+    url = 'http://jwc.seu.edu.cn/10097/list.htm'
+
+    html = requests.get(url).content
+    tree = etree.HTML(html.decode('utf-8'))
+    # Keep only anchors whose text starts with "课外研学讲座" (extracurricular research lecture).
+    links = [a for a in tree.xpath("//a") if a.text and a.text.startswith("课外研学讲座")]
+    for link in links:
+        print link.text
+        print link.get('href')
+        # The date is the first <div> child of the element that follows the link's parent.
+        date = link.getparent().getnext().xpath("div")[0].text
+        print date
+
+
+if __name__ == "__main__":
+    getSrtpInfo()
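The committed script targets Python 2 and fetches and parses in one function. As a rough illustration only, the parsing step could be separated out and run under Python 3 roughly as follows. This is a sketch, not part of the commit: it assumes `lxml` is installed, the function name `parse_srtp_links` and the constants are hypothetical, and `SAMPLE_HTML` is invented markup that merely mimics the layout the script appears to rely on (the link in one cell, the date in a `<div>` inside the next sibling element). It also assumes the `href` values may be relative, so it resolves them against the list URL with `urljoin`.

```python
from urllib.parse import urljoin

from lxml import etree

BASE_URL = 'http://jwc.seu.edu.cn/10097/list.htm'  # URL taken from the script
TITLE_PREFIX = '课外研学讲座'  # prefix the script filters link text on

# Hypothetical markup, invented for this sketch; the real page may differ.
SAMPLE_HTML = '''
<html><body><table>
  <tr>
    <td><a href="/2015/0101/page.htm">课外研学讲座 (sample title)</a></td>
    <td><div>2015-01-01</div></td>
  </tr>
  <tr>
    <td><a href="/other.htm">Unrelated notice</a></td>
    <td><div>2015-01-02</div></td>
  </tr>
</table></body></html>
'''


def parse_srtp_links(html_text, base_url=BASE_URL):
    """Return (title, absolute_url, date) tuples for matching links."""
    tree = etree.HTML(html_text)
    results = []
    for a in tree.xpath('//a'):
        # Skip anchors with no text or with an unrelated title.
        if not (a.text and a.text.startswith(TITLE_PREFIX)):
            continue
        # Same navigation as the script, but tolerant of a missing sibling or div.
        sibling = a.getparent().getnext()
        divs = sibling.xpath('div') if sibling is not None else []
        date = divs[0].text if divs else None
        results.append((a.text, urljoin(base_url, a.get('href', '')), date))
    return results


if __name__ == '__main__':
    for title, url, date in parse_srtp_links(SAMPLE_HTML):
        print(title, url, date)
```

Returning tuples instead of printing inside the loop keeps the parsing testable without network access; a caller could pass `requests.get(url).text` (or the decoded `.content`, as the script does) in place of the sample string.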