
Python 3 Web Scraping: Simulating Login with POST and GET Requests

 暗夜精靈fdznnm 2021-04-16

I. Simulating login on a site that requires an account and password

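I have already tried scraping pages that do not require login. This time I use Python on a site that does require login, and simulate the login with cookies.

Our academic affairs system uses a CAPTCHA, which makes it somewhat harder, so I picked an easier target: 賽氪, https://www.

I used the F12 developer tools built into Firefox: open the site, enter the account and password, and log in.

The tools capture many POST and GET requests. The first POST request is the one that submits the account and password.

Opening the parameters tab of that POST request shows that the submitted parameters are in the form data: name is the account name, pass is the encrypted password, and remember indicates whether to remember the password (0 means do not remember).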
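To make the form data concrete, here is a minimal sketch of how those three fields could be serialized into a POST body. The values are placeholders rather than real credentials, and it assumes the site accepts a standard `application/x-www-form-urlencoded` body.

```python
from urllib.parse import parse_qs, urlencode

# Placeholder values; the real "pass" is whatever encrypted string the browser sent.
form = {"name": "user@example.com", "pass": "encrypted-password", "remember": 0}

# urlencode() percent-encodes each value and joins the pairs with "&".
body = urlencode(form)
print(body)  # name=user%40example.com&pass=encrypted-password&remember=0

# urllib needs the body as bytes; parse_qs() shows the fields survive the round trip.
body_bytes = body.encode("utf8")
fields = parse_qs(body_bytes.decode("utf8"))
print(fields["remember"])  # ['0']
```

Note that every value comes back as a string, so the integer 0 for remember is sent as the text "0".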

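Next, look at the headers, i.e. the request headers.

Adding these request headers to the headers of the POST request lets us simulate the login.

The Cookie header is required. Without it the server returns an error whose message reads "Access timed out, please retry; if this message keeps appearing, contact QQ: 1409765583":

{"code":403,"message":"訪問超時,請重試,多次出現此提示請聯系QQ:1409765583","data":[]}

With that in place, we can create an opener that carries cookies. The first request to the login URL stores the cookies set on login, and the same opener can then visit other sections of the site to view information that is only visible after logging in.

For example, after logging in at https://www./login, I accessed the "My Competitions" section at https://www./u/5598522

The code is as follows: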

    import urllib.parse
    from urllib import request
    from http import cookiejar

    login_url = "https://www./login"
    postdata = {
        "name": "your account",
        "pass": "your password (encrypted)",
    }
    header = {
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
        "Connection": "keep-alive",
        "Host": "www.",
        "Referer": "https://www./login",
        "Cookie": "your cookie",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "TE": "Trailers",
        "X-Requested-With": "XMLHttpRequest",
    }
    # urllib needs the POST body as bytes: URL-encode the form, then encode it
    postdata = urllib.parse.urlencode(postdata).encode('utf8')
    # Create a CookieJar instance to store cookies
    cookie = cookiejar.CookieJar()
    # Build a cookie handler with urllib.request's HTTPCookieProcessor
    cookie_support = request.HTTPCookieProcessor(cookie)
    # Build an opener from the cookie handler
    opener = request.build_opener(cookie_support)
    # Create the Request objects
    my_url = "https://www./u/5598522"
    req1 = request.Request(url=login_url, data=postdata, headers=header)  # POST request
    # GET request: the opener sends the login cookies, so no Cookie header is needed
    req2 = request.Request(url=my_url)
    response1 = opener.open(req1)
    response2 = opener.open(req2)
    print(response1.read().decode('utf8'))
    print(response2.read().decode('utf8'))

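That wraps up this part.

PS: there was one small hiccup. When the header

Accept-Encoding

gzip, deflate, br

is added to headers, the final print(response1.read().decode('utf8')) raises

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Cause: with Accept-Encoding set in the request headers, the server may return a compressed body, and urllib does not decompress it automatically. A gzip stream begins with the bytes 0x1f 0x8b, so the byte 0x8b at position 1 shows that read() returned gzip data rather than UTF-8 text.

Reference: https://www.cnblogs.com/chyu/p/4558782.html

Fix: remove Accept-Encoding and the response decodes normally.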
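If the Accept-Encoding header has to stay, an alternative is to decompress the body before decoding it. The following is a minimal sketch, not the method used in this post: decode_body is a hypothetical helper, it handles only gzip and zlib-wrapped deflate with the standard library, and it assumes the payload is UTF-8. Brotli ("br") has no standard-library decoder, so it is better not to advertise it.

```python
import gzip
import zlib


def decode_body(raw: bytes, content_encoding: str = "") -> str:
    """Decompress a response body according to Content-Encoding, then decode as UTF-8."""
    encoding = content_encoding.lower().strip()
    if encoding == "gzip":
        raw = gzip.decompress(raw)
    elif encoding == "deflate":
        # Assumes the zlib-wrapped form of deflate; some servers send raw deflate.
        raw = zlib.decompress(raw)
    return raw.decode("utf-8")


# Simulate a gzip-compressed JSON body like the one a server might send.
compressed = gzip.compress('{"code": 200}'.encode("utf-8"))
print(compressed[:2])                     # b'\x1f\x8b'
print(decode_body(compressed, "gzip"))    # {"code": 200}
```

With urllib, this could be called as decode_body(response1.read(), response1.headers.get("Content-Encoding", "")).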

II. Summary of common methods for simulating login

1. Making requests with the functions of urllib's request module

    import urllib.parse
    from urllib import request
    from http import cookiejar

    # Placeholders: substitute the real URL, request headers, and form fields
    url = "https://www./login"
    headers = {"Cookie": "your cookie"}
    data = {"name": "your account", "pass": "your password"}

    # ---------- GET requests ----------
    # Without headers
    response = request.urlopen(url)
    page_source = response.read().decode('utf-8')

    # With headers: urllib.request.urlopen() does not accept a headers argument,
    # so build a urllib.request.Request object to set the request headers
    req = request.Request(url=url, headers=headers)
    response = request.urlopen(req)
    page_source = response.read().decode('utf-8')

    # ---------- POST request ----------
    postdata = urllib.parse.urlencode(data).encode('utf-8')  # the body must be encoded to bytes
    req1 = request.Request(url=url, data=postdata, headers=headers)
    response = request.urlopen(req1)
    page_source = response.read().decode('utf-8')

    # ---------- Using cookies to visit other sections ----------
    # Create a CookieJar instance to store cookies
    cookie = cookiejar.CookieJar()
    # Build a cookie handler with urllib.request's HTTPCookieProcessor
    cookie_support = request.HTTPCookieProcessor(cookie)
    # Build an opener from the cookie handler
    opener = request.build_opener(cookie_support)
    # To install the opener globally and replace urlopen(), uncomment the next line;
    # otherwise call opener.open() directly
    # request.install_opener(opener)
    # Create the Request object
    my_url = "https://www./u/5598522"
    req2 = request.Request(url=my_url)
    response1 = opener.open(req1)  # log in through the opener so it captures the cookies
    response2 = opener.open(req2)
    # or simply: response2 = opener.open(my_url)
    print(response1.read().decode('utf8'))
    print(response2.read().decode('utf8'))
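In the urllib listing above, the only thing that distinguishes a GET from a POST is whether data is passed to Request. A small sketch that checks this without sending anything over the network; the URL and form values are hypothetical placeholders:

```python
from urllib import parse, request

# Hypothetical URL and form fields, used only to build Request objects; nothing is sent.
url = "https://example.com/login"
form = {"name": "your account", "pass": "your password"}

get_req = request.Request(url=url)
post_req = request.Request(url=url, data=parse.urlencode(form).encode("utf-8"))

print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
print(post_req.data)          # b'name=your+account&pass=your+password'
```

Inspecting a Request like this before calling opener.open() is a cheap way to confirm that the body really is bytes and that the method is the one intended.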

 

2. Using the get and post functions of the requests library

    import json
    import urllib.parse

    import requests

    # ---------- GET requests ----------
    # Method 1: build the query string by hand
    url = "https://www./"
    params = {'key1': 'value1', 'key2': 'value2'}
    real_url = url + "?" + urllib.parse.urlencode(params)
    # real_url == "https://www./?key1=value1&key2=value2"
    response = requests.get(real_url)
    # Method 2: let requests encode the parameters
    response = requests.get(url, params=params)
    print(response.text)     # <class 'str'>
    print(response.content)  # <class 'bytes'>

    # ---------- POST request ----------
    login_url = "https://www./login"
    postdata = {
        "name": "your account",
        "pass": "my password",
    }
    header = {
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
        "Connection": "keep-alive",
        "Host": "www.",
        "Referer": "https://www./login",
        "Cookie": "mycookie",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "TE": "Trailers",
        "X-Requested-With": "XMLHttpRequest",
    }
    # requests encodes a dict passed as data itself, so no urlencode step is needed
    response = requests.post(url=login_url, data=postdata, headers=header)  # <class 'requests.models.Response'>
    # Any of the following three parses the result
    json1 = response.json()                      # <class 'dict'>
    json2 = json.loads(response.text)            # <class 'dict'>
    json_str = response.content.decode('utf-8')  # <class 'str'>

    # ---------- Keeping the session alive to visit other sections ----------
    login_url = "https://www./login"
    postdata = {
        "name": "your account",
        "pass": "my password",
    }
    header = {
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "Connection": "keep-alive",
        "Referer": "https://www./login",
        "Cookie": "mycookie",
    }
    session = requests.session()
    # The session stores the cookies returned by the login and reuses them afterwards
    response = session.post(url=login_url, data=postdata, headers=header)
    my_url = "https://www./u/5598522"
    response1 = session.get(url=my_url, headers=header)
    print(response1.json())
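To see what requests does with params and data before anything is sent, a request can be prepared and inspected offline. This is a sketch with hypothetical URLs and placeholder values; it only illustrates the encoding step mentioned in the comments above and does not contact any server.

```python
import requests

# Hypothetical URL and values; prepare() builds the request without sending it.
get_req = requests.Request(
    "GET", "https://example.com/", params={"key1": "value1", "key2": "value2"}
).prepare()
print(get_req.url)  # https://example.com/?key1=value1&key2=value2

post_req = requests.Request(
    "POST",
    "https://example.com/login",
    data={"name": "your account", "pass": "your password"},
).prepare()
print(post_req.body)                     # name=your+account&pass=your+password
print(post_req.headers["Content-Type"])  # application/x-www-form-urlencoded
```

This shows why the manual urlencode step needed with urllib can be skipped here: requests adds the "?" separator for params and form-encodes a dict passed as data.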

 
