
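1.1
Writing only "import urllib" fails with the error: module 'urllib' has no attribute 'request'. Import the submodule explicitly with "import urllib.request".

    import urllib.request

    url = 'http://www.baidu.com/'
    response = urllib.request.urlopen(url)
    content = response.read().decode('utf-8')
    print(content)

2.1
Methods of the response object. Each line is an alternative; the body can only be read once per response.

    # return the body as bytes
    content = response.read()
    # return one line / all lines
    content = response.readline()
    content = response.readlines()
    # return the status code
    content = response.getcode()
    # return the URL
    content = response.geturl()
    # return the response headers
    content = response.getheaders()

2.2
Download the content behind a link (url) to a local file (filename).

    urllib.request.urlretrieve(url, filename)

2.3
UA: customizing the request object.

    url = 'https://www.dianping.com/'
    # the User-Agent is copied from Inspect > Network in the browser
    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0',
        'Accept-Encoding': 'identity',
    }
    # wrap the url and the header into one Request object
    request = urllib.request.Request(url=url, headers=header)
    response = urllib.request.urlopen(request)
    content = response.read().decode('utf-8')

2.4
GET requests. urllib.parse.quote percent-encodes non-ASCII text (as UTF-8) so that it can be placed in a URL.

    import urllib.parse

    url1 = 'https://www.baidu.com/s?wd='
    name = urllib.parse.quote('搜索的内容')  # the search text
    url = url1 + name

    # urlencode joins several parameters with &
    base_url = 'https://www.baidu.com/s?'
    data = {
        'wd': '主花',
        'fandom': 'Persona4',
    }
    new_data = urllib.parse.urlencode(data)
    url = base_url + new_data

2.5
POST requests. The form data can be found under Inspect > Network > Payload. My requests to Baidu and Baidu Translate were blocked.

    base_url = 'https://fanyi.baidu.com/sug'
    data = {
        'kw': 'formular',
    }
    # a POST body has to be encoded to bytes
    new_data = urllib.parse.urlencode(data).encode('utf-8')
    # the User-Agent is copied from Inspect > Network in the browser
    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0',
        'Accept-Encoding': 'identity',
    }
    # wrap the url, the header, and the data (which does not appear in the url) into one Request object
    request = urllib.request.Request(url=base_url, data=new_data, headers=header)
    response = urllib.request.urlopen(request)
    content = response.read().decode('utf-8')
    print(content)

3.1
Parse "key: value" text with a regular expression so that it is easy to add to headers.

    import re

    text = 'name: 张三, age: 18, city: 北京'
    result = re.findall(r'\s*([^:,]+)\s*:\s*([^,]+)\s*', text)
    data = dict(result)
    print(data)

    # without a regular expression
    text = 'name:张三,age:18,city:北京'
    data = {}
    for item in text.split(','):
        key, value = item.split(':')
        data[key] = value
    print(data)

3.2
Request headers copied from the browser come as one line of key followed by one line of value. Adding the cookie made the request succeed, but GPT suggested a set of headers without a cookie, and that worked as well.

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/147.0.0.0 Safari/537.36 Edg/147.0.0.0',
        'Referer': 'https://fanyi.baidu.com/',
        'Origin': 'https://fanyi.baidu.com',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Accept': '*/*',
        'Accept-Encoding': 'identity',
    }

    # convert the copied key/value lines into a dict
    raw_headers = '''
    accept
    */*
    accept-encoding
    gzip, deflate, br, zstd
    accept-language
    zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
    connection
    keep-alive
    '''
    lines = [line.strip() for line in raw_headers.splitlines() if line.strip()]
    headers = {}
    for i in range(0, len(lines), 2):
        key = lines[i]
        value = lines[i + 1]
        headers[key] = value
    print(headers)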
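
The pieces from 2.4, 2.5 and 3.2 can be combined into a few small helpers. The following is only a sketch: the function names headers_from_pairs, build_get_url and build_post_request are hypothetical (they are not part of urllib), the header values are shortened placeholders, and nothing is sent over the network, so it can be run without being blocked by the site.

```python
import urllib.parse
import urllib.request


def headers_from_pairs(raw):
    """Hypothetical helper: turn alternating key/value lines into a dict."""
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    # Pair line 0 with line 1, line 2 with line 3, and so on.
    # A trailing key without a value is silently dropped by zip.
    return dict(zip(lines[0::2], lines[1::2]))


def build_get_url(base_url, params):
    """Hypothetical helper: append percent-encoded query parameters.

    Assumes base_url has no query string yet (no trailing '?').
    """
    return base_url + "?" + urllib.parse.urlencode(params)


def build_post_request(url, form, headers):
    """Hypothetical helper: build a POST Request without sending it."""
    # Passing data (as bytes) is what makes urllib treat this as a POST.
    body = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(url=url, data=body, headers=headers)


if __name__ == "__main__":
    raw = """
    accept
    */*
    accept-encoding
    identity
    """
    headers = headers_from_pairs(raw)
    print(headers)

    print(build_get_url("https://www.baidu.com/s", {"wd": "主花", "fandom": "Persona4"}))

    request = build_post_request("https://fanyi.baidu.com/sug", {"kw": "formular"}, headers)
    print(request.get_method(), request.full_url, request.data)
    # To actually send it: urllib.request.urlopen(request)
```

Keeping the building step separate from urlopen makes it possible to inspect the URL, method and body before any request is made, which helps when working out why a site rejects a request.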