I suddenly wanted to look into web scraping with Python, so here are some brief notes:
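Installing the dependency
The examples use urllib3, which can be installed with pip (pip install urllib3).
Library documentation: https://urllib3.readthedocs.io/en/latest/
Example code: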
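    import json
    import urllib3

    # A PoolManager handles connection pooling and reuse across requests.
    http = urllib3.PoolManager()
    r = http.request('GET', 'https://mod.3dmgame.com/mod/API/147160')

    # r.data is the raw response body as bytes.
    print(r.data)
    # Decode the bytes as UTF-8 and parse the result as JSON.
    print(json.loads(r.data.decode('utf-8')))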
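The decoding step above can also be looked at on its own, without a network call. The following is a minimal sketch, not part of the original notes: `decode_json_body` is a hypothetical helper name, and `sample_body` is a made-up payload, since the real response format of the API is not shown here. It returns `None` instead of raising when the body is not valid UTF-8 or not valid JSON, for example when the server answers with an HTML error page.

```python
import json


def decode_json_body(raw):
    """Decode UTF-8 bytes and parse them as JSON; return None on failure.

    Hypothetical helper for illustration only.
    """
    try:
        return json.loads(raw.decode('utf-8'))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # The body was not valid UTF-8 or not valid JSON.
        return None


# Made-up payload (assumption): the real API response is not shown in the notes.
sample_body = b'{"id": 147160, "title": "example"}'
parsed = decode_json_body(sample_body)
print(parsed)

# A non-JSON body, such as an HTML error page, yields None instead of an exception.
print(decode_json_body(b'<html>error page</html>'))
```

With a real response, the same helper would presumably be called as `decode_json_body(r.data)`.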
Example: getting the update status of Script Hook RDR2
    import re
    import urllib3

    http = urllib3.PoolManager()
    r = http.request('GET', 'http://www.dev-c.com/rdr2/scripthookrdr2')

    # Capture the <td> values of the "Released", "Version" and
    # "Supported patches" table rows; re.S lets "." match across newlines.
    pattern = re.compile(
        '<tr>.*?<th>Released</th>.*?<td>(.*?)</td>.*?'
        '</tr>.*?<tr>.*?<th>Version</th>.*?<td>(.*?)</td>.*?</tr>.*?'
        '<tr>.*?<th>Supported patches</th>.*?<td>(.*?)</td>.*?</tr>',
        re.S)
    items = re.findall(pattern, r.data.decode('utf-8'))
    print(items)

    # Each match is a (released, version, supported) tuple.
    for released, version, supported in items:
        print("Released: " + released + ",\nVersion: " + version + ",\nSupported patches: " + supported)

Output:

    [('15 Jan 2020', 'v1.0.1232.17', '1.0.1207.58/1232.17 and above')]
    Released: 15 Jan 2020,
    Version: v1.0.1232.17,
    Supported patches: 1.0.1207.58/1232.17 and above
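To experiment with the extraction step without hitting the site each time, the regular expression can be run against a local string. This is only a sketch under stated assumptions: `SAMPLE_HTML` is a made-up fragment shaped to fit the table rows the pattern expects (the real page markup may differ), `parse_release_info` is a hypothetical helper name, and the pattern is a simplified variant that uses named groups rather than positional ones.

```python
import re

# Made-up HTML fragment (assumption): the real page markup may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Released</th><td>15 Jan 2020</td></tr>
  <tr><th>Version</th><td>v1.0.1232.17</td></tr>
  <tr><th>Supported patches</th><td>1.0.1207.58/1232.17 and above</td></tr>
</table>
"""

# Named groups make the result self-describing; re.S lets "." span newlines.
ROW_PATTERN = re.compile(
    r"<th>Released</th>\s*<td>(?P<released>.*?)</td>.*?"
    r"<th>Version</th>\s*<td>(?P<version>.*?)</td>.*?"
    r"<th>Supported patches</th>\s*<td>(?P<supported>.*?)</td>",
    re.S,
)


def parse_release_info(html):
    """Return a dict of the three fields, or None if the rows are not found."""
    match = ROW_PATTERN.search(html)
    if match is None:
        return None
    # Strip surrounding whitespace from each captured cell.
    return {name: value.strip() for name, value in match.groupdict().items()}


info = parse_release_info(SAMPLE_HTML)
print(info)
```

Returning `None` when nothing matches makes it easier to notice when the page layout changes, whereas looping over an empty `findall` result would silently print nothing.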