Commit 8808f69

add project: print proxy data from a simple webpage
1 parent d1b0eda commit 8808f69

2 files changed

Lines changed: 42 additions & 0 deletions

README.md

Lines changed: 3 additions & 0 deletions
@@ -12,3 +12,6 @@
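 
 ## qiushibaike
 Scrapes content from Qiushibaike (糗事百科) and prints it to the command line. Based mainly on http://cuiqingcai.com/990.html, with minor modifications.
+
+## proxyInfo
+Scrapes the information from a proxy listing page and prints it to the console.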

proxyInfo/proxyInfo2.py

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+#!/usr/bin/env python
+# encoding: utf-8
+
+import requests
+from bs4 import BeautifulSoup
+
+def getInfo(url):
+    proxy_info = []
+    page_code = requests.get(url).text
+    soup = BeautifulSoup(page_code)
+    table_soup = soup.find('table')
+    proxy_list = table_soup.findAll('tr')[1:]
+    for tr in proxy_list:
+        td_list = tr.findAll('td')
+        ip = td_list[2].string
+        port = td_list[3].string
+        location = td_list[4].string
+        anonymity = td_list[5].string
+        proxy_type = td_list[6].string
+        speed = td_list[7].find('div', {'class': 'bar'})['title']
+        connect_time = td_list[8].find('div', {'class': 'bar'})['title']
+        validate_time = td_list[9].string
+
+        # strip
+        l = [ip, port, location, anonymity, proxy_type, speed, connect_time, validate_time]
+        for i in range( len(l) ):
+            if l[i]:
+                l[i] = l[i].strip()
+        proxy_info.append(l)
+
+    return proxy_info
+
+if __name__ == '__main__':
+    url = 'http://www.xici.net.co/nn/1'
+    proxy_info = getInfo(url)
+    for row in proxy_info:
+        for s in row:
+            print s,
+        print
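
The script above targets Python 2 (`print s,` and a bare `print`). For readers on Python 3, the strip and print steps could be sketched roughly as below. This is not part of the commit: `clean_row`, `format_row`, the `missing` placeholder, and the sample rows are hypothetical names and data introduced only for illustration. The `None` handling mirrors the `if l[i]:` guard in the original loop, since BeautifulSoup's `.string` returns `None` when a cell has more than one child node.

```python
def clean_row(fields):
    """Strip surrounding whitespace from each field.

    Falsy values (None or '') are passed through unchanged, which is
    what the `if l[i]:` guard in the committed script does.
    """
    return [f.strip() if f else f for f in fields]


def format_row(fields, missing="-"):
    """Join one row into a single space-separated line.

    `missing` is an assumption of this sketch: the committed script
    would print the literal text "None" for an absent field.
    """
    return " ".join(f if f else missing for f in fields)


if __name__ == "__main__":
    # Made-up rows standing in for what getInfo() collects per <tr>.
    sample = [
        [" 10.0.0.1 ", "8080\n", None, "HTTP"],
        ["10.0.0.2", " 3128", "  ", "HTTPS "],
    ]
    for row in sample:
        # Python 3: print is a function; one call emits one line.
        print(format_row(clean_row(row)))
```

Keeping the cleaning and formatting in small pure functions means they can be checked without fetching the page, which is useful because the target site's markup and availability may change.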
