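[Original] How to batch-fetch the titles, links, and other details of your own WeChat Official Account articles in Python, using only the requests module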

Python集中營 · published 2023-03-15 from Gansu


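Before I knew it, more than two years had passed since I started writing Official Account articles. There was the frustration of posts that barely anyone read, and the excitement of suddenly gaining over a hundred followers.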

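Today I happened to have some free time and wanted to organize every article I have published over these two-plus years. Copying each link, title, and publish time by hand, one article at a time, is far too tedious, and doing it manually is also error-prone.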

As a software developer and Python enthusiast, doing it by hand was out of the question.

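Since all we want is information about our own account's articles, why not simply send HTTP requests with requests and authenticate with the login cookie?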

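So I implemented the whole process, and in testing it successfully retrieved the article information for my own account. It is about the simplest scraper there is!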

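Let's get straight to the point. First, import all the Python modules we need. If requests (or loguru, which the script uses for logging) is not installed yet, install it with pip:

pip install requests loguru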

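# time: converts timestamps and pauses between requests
import time

# random: picks a random delay so requests are not sent at a fixed rhythm
import random

# requests: sends the HTTP requests to the Official Accounts backend
import requests

# loguru: third-party logging library (installed above with pip)
from loguru import logger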

Because the Official Accounts API returns all times as Unix timestamps, we first write a timestamp conversion function, transDateTime().

def transDateTime(time_stamp=None):
    """
    Convert a Unix timestamp (in seconds) into a "YYYY-MM-DD HH:MM:SS" string in the machine's local timezone.

    :param time_stamp: the timestamp to convert; if it is None, an error is logged and None is returned
    :return: the formatted date-time string, or None
    """
    if time_stamp is None:
        logger.error('The timestamp must not be empty!')
        return None
    time_arr = time.localtime(time_stamp)
    return time.strftime("%Y-%m-%d %H:%M:%S", time_arr)
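Note that time.localtime() converts using the timezone of whatever machine runs the script, so the same timestamp prints differently on a server set to UTC. If you want the times to match Beijing time regardless of where the script runs (an assumption about which timezone you want), a minimal sketch pins the offset explicitly; the name to_beijing_time is hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Beijing time is a fixed UTC+8 offset with no daylight saving, so a fixed offset is enough.
BEIJING_TZ = timezone(timedelta(hours=8))


def to_beijing_time(time_stamp):
    """Convert a Unix timestamp (seconds) to 'YYYY-MM-DD HH:MM:SS' in UTC+8,
    independent of the local timezone of the machine running the script."""
    if time_stamp is None:
        return None
    return datetime.fromtimestamp(int(time_stamp), tz=BEIJING_TZ).strftime("%Y-%m-%d %H:%M:%S")


print(to_beijing_time(1678809600))  # 2023-03-15 00:00:00
```

It could be used in place of transDateTime() for the create_time and update_time fields.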

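Before fetching any article information, we need three values from a logged-in session: the cookie, the token, and the account's fakeid (which also shows up when you share an article). All of them can be found in your own article links and in the browser's developer tools (F12), so I won't go into detail here.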
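For the fakeid, the full-length link of one of your articles usually carries it as the __biz query parameter (an assumption worth double-checking against what F12 shows; short links of the form /s/xxxx do not include it). A standard-library sketch that pulls it out of such a link; the helper name extract_fakeid and the example link are made up for illustration:

```python
from urllib.parse import parse_qs, urlparse


def extract_fakeid(article_url):
    """Return the __biz query parameter of a full-length article link, or None if absent.
    parse_qs also decodes percent-escapes, so '%3D%3D' comes back as '=='."""
    query = parse_qs(urlparse(article_url).query)
    values = query.get("__biz")
    return values[0] if values else None


# Hypothetical example link; paste one of your own article links instead.
link = "https://mp.weixin.qq.com/s?__biz=MzAwMDAwMDAwMA==&mid=1&idx=1&sn=abc"
print(extract_fakeid(link))  # MzAwMDAwMDAwMA==
```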

If you are not sure how to find them, leave a comment or send a message to the account, and I will answer when I see it.
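Since the cookie gives access to your account backend, you may prefer not to paste it into the script at all. A minimal sketch that reads the three values from environment variables instead; the variable names WX_COOKIE, WX_TOKEN and WX_FAKEID are hypothetical choices:

```python
import os
from typing import Dict


def load_credentials() -> Dict[str, str]:
    """Read the login cookie, token and fakeid from environment variables
    (hypothetical names) and fail loudly if any of them is missing."""
    names = {"cookie": "WX_COOKIE", "token": "WX_TOKEN", "fake_id": "WX_FAKEID"}
    creds = {key: os.environ.get(env_name, "").strip() for key, env_name in names.items()}
    missing = [names[key] for key, value in creds.items() if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return creds
```

The dictionary keys match the keyword arguments of getAllActicleList(), so a call could look like getAllActicleList(total_num=100, **load_credentials()).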

Next, we create a function getAllActicleList() that keeps sending requests to fetch the article list page by page.
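The request loop relies on offset pagination: each call returns up to count = 5 articles starting at offset begin, so page n starts at begin = n * 5. Assuming a page with fewer than five articles means the list is exhausted, a sketch of that idea with an early stop looks like this; paginate and fake_fetch are hypothetical names, and the stub replaces the real requests.get call so the example runs offline:

```python
from typing import Callable, Dict, List

PAGE_SIZE = 5  # same value as the "count" request parameter


def paginate(fetch_page: Callable[[int, int], List[Dict]], max_pages: int) -> List[Dict]:
    """Collect items page by page using begin = page * PAGE_SIZE,
    stopping early when a page returns fewer than PAGE_SIZE items."""
    collected: List[Dict] = []
    for page in range(max_pages):
        items = fetch_page(page * PAGE_SIZE, PAGE_SIZE)
        collected.extend(items)
        if len(items) < PAGE_SIZE:
            # A short (or empty) page means there is nothing left to fetch.
            break
    return collected


# Offline stub standing in for the real API: 12 fake articles in total.
FAKE_ARTICLES = [{"title": f"article {i}"} for i in range(12)]
calls = []


def fake_fetch(begin: int, count: int) -> List[Dict]:
    calls.append(begin)
    return FAKE_ARTICLES[begin:begin + count]


result = paginate(fake_fetch, max_pages=100)
print(len(result), calls)  # 12 [0, 5, 10]
```

With the real API, a rejected request (for example when requests come too fast) may also come back without an app_msg_list, so it is safer to stop and retry later than to keep looping.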

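def getAllActicleList(fake_id=None, token=None, total_num=None, cookie=None):
    """
    Page through the article list of an Official Account and collect one row per article.

    :param fake_id: the account's fakeid
    :param token: the token of the current logged-in session
    :param total_num: the maximum number of pages to request; each page holds 5 articles
    :param cookie: the Cookie header of the logged-in session
    :return: a list of [title, link, cover, create_time, digest, item_show_type, update_time, fake_id] rows
    """
    if fake_id is None or token is None or total_num is None or cookie is None:
        logger.error('None of the request parameters may be empty!')
        return []
    index_url = "https://mp.weixin.qq.com/cgi-bin/appmsg"
    headers = {
        "Cookie": cookie,
        # An ordinary browser User-Agent; copy your own from F12 if requests are rejected.
        "User-Agent": "Mozilla/5.0",
    }
    data = {
        "token": token,
        "lang": "zh_CN",
        "f": "json",
        "ajax": "1",
        "action": "list_ex",
        "begin": "0",
        "count": "5",  # articles per page
        "query": "",
        "fakeid": fake_id,
        "type": "9",
    }
    list_all = []
    for n_ in range(total_num):
        data["begin"] = n_ * 5  # offset of the first article on this page
        # Random pause between requests to reduce the chance of being rate limited.
        time.sleep(random.randint(5, 12))
        content_json = requests.get(index_url, headers=headers, params=data, timeout=10).json()
        # Past the last page, or when the request is rejected, app_msg_list is missing or empty.
        app_msg_list = content_json.get("app_msg_list")
        if not app_msg_list:
            logger.warning('No articles on page {}, stopping. Response: {}'.format(n_ + 1, content_json))
            break
        for item in app_msg_list:
            items = [item["title"], item["link"], item["cover"], transDateTime(item["create_time"]), item["digest"],
                     item["item_show_type"], transDateTime(item["update_time"]), fake_id]
            logger.debug(items)
            list_all.append(items)
        logger.info('Finished extracting article info for page {}.'.format(n_ + 1))
    logger.info('All article information has been fetched!')
    return list_all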


if __name__ == '__main__':
    # Placeholders: paste the values from your own logged-in session (F12 > Network).
    # Keep the real cookie out of published code, since it grants access to your account backend.
    cookie = "YOUR_COOKIE"
    # total_num is the number of pages to request; each page holds 5 articles.
    getAllActicleList(fake_id='YOUR_FAKEID', token='YOUR_TOKEN', total_num=100, cookie=cookie)
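Since the goal was to organize the articles, you will probably want the collected rows in a file. Assuming each row holds the eight fields built in the loop (title, link, cover, create time, digest, item_show_type, update time, fakeid), a standard-library sketch that writes them to a CSV file; save_articles_csv and the column names are hypothetical choices:

```python
import csv
from typing import List, Sequence

# Column names mirror the order of the fields collected for each article.
HEADER = ["title", "link", "cover", "create_time", "digest",
          "item_show_type", "update_time", "fakeid"]


def save_articles_csv(rows: List[Sequence], path: str) -> int:
    """Write article rows to a CSV file and return how many rows were written.
    utf-8-sig adds a BOM so that Excel recognizes the Chinese titles as UTF-8."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(rows)
    return len(rows)
```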