Commit 90418c1

Scrape SRTP lecture information
1 parent 1b7cf67 commit 90418c1

2 files changed

Lines changed: 28 additions & 0 deletions

README.md

Lines changed: 3 additions & 0 deletions
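@@ -23,5 +23,8 @@
 ## proxyInfo
 Scrapes information from a proxy listing page and prints it to the console.
 
+## srtpInfo
+Scrapes SRTP lecture information from the university website.
+
 ## tieba
 Based on the Jikexueyuan (极客学院) tutorial. Uses multiple threads to scrape post information from Baidu Tieba. See the code comments for the techniques involved.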

srtpInfo/srtpSpider.py

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+#!/usr/bin/env python
+# encoding: utf-8
+import requests
+from lxml import etree
+import sys
+reload(sys)
+sys.setdefaultencoding('utf-8')
+
+
+def getSrtpInfo():
+    htmlTpl = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>%s</body></html>'
+    url = 'http://jwc.seu.edu.cn/10097/list.htm'
+
+    html = requests.get(url).content
+    tree = etree.HTML(html.decode('utf-8'))
+    links = [a for a in tree.xpath("//a") if a.text and a.text.startswith("课外研学讲座")]
+    for link in links:
+        print link.text
+        print link.get('href')
+        date = link.getparent().getnext().xpath("div")[0].text
+        print date
+
+
+if __name__ == "__main__":
+    getSrtpInfo()
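
The committed script targets Python 2 (`print` statements, `reload(sys)`) and prints each `href` as found in the page. As a rough illustration of the same link-filtering step, here is a minimal Python 3 sketch that uses only the standard library and works on an HTML string, so it can be tried without network access. The names `LinkCollector` and `extract_lecture_links` are hypothetical and not part of the commit. Resolving each `href` against the page URL with `urljoin` is an added assumption, since the list page may use relative links. Unlike the commit, the sketch joins all text inside each `<a>` element rather than reading only `a.text`, and it does not attempt the date lookup, which depends on the live page layout.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = 'http://jwc.seu.edu.cn/10097/list.htm'  # list page URL from the commit
PREFIX = '课外研学讲座'  # link-text prefix the commit filters on


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (text, href) for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_a = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._in_a = True
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        # Accumulate text only while inside an <a> element.
        if self._in_a:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._in_a:
            self.links.append((''.join(self._text).strip(), self._href))
            self._in_a = False


def extract_lecture_links(html, base_url=BASE_URL, prefix=PREFIX):
    """Return (title, absolute_url) pairs for links whose text starts with prefix."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [
        (text, urljoin(base_url, href))
        for text, href in parser.links
        if href and text.startswith(prefix)
    ]


if __name__ == '__main__':
    # Made-up sample markup, not taken from the real page.
    sample = '<ul><li><a href="/a/1.htm">课外研学讲座:示例</a></li></ul>'
    for title, link in extract_lecture_links(sample):
        print(title)
        print(link)
```

Keeping the parsing in a function that takes a string separates it from the `requests.get` call, which makes the filter easy to check against saved copies of the page.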
