Writing a Python crawler to fetch all the girl pictures from www.tmd86.com

 昵稱65365553 2019-07-17

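Anyone here who knows a bit of programming knows that Python's mature networking libraries make it well suited to writing crawlers and AI programs.

Today I'm sharing code that automatically crawls the girl pictures. At under 100 lines, it is really simple and quick.

Code starts: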

import os

import requests
from lxml import etree


def a():
    # Fetch the category index page and read the number of listing pages.
    url = 'http://www./xinggan/'
    response = requests.get(url)
    html_ele = etree.HTML(response.text)
    # Assumes the fourth pagination link holds the last page number.
    max_list = html_ele.xpath('//nav[@class="navigation pagination"]/div/a/text()')[3]

    # Walk every listing page and hand each album on it to b().
    for i in range(1, int(max_list) + 1):
        z_url = 'http://www./xinggan/list_{}.html/'.format(i)
        response = requests.get(z_url)
        html_ele = etree.HTML(response.text)
        li_ele_list = html_ele.xpath('//ul[@id="pins"]/li')
        for href_ele in li_ele_list:
            href_url = href_ele.xpath('./a/@href')[0]
            print(href_url)
            name = href_ele.xpath('./span/a/text()')[0]
            print(name)
            b(href_url, name)


def b(href_url, name):
    # One directory per album, named after the album title.
    save_dir = '/' + name
    os.makedirs(save_dir, exist_ok=True)

    # Send the album page as the Referer along with a browser User-Agent.
    headers = {
        'Referer': str(href_url),
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36',
    }
    response = requests.get(href_url, headers=headers)
    html_ele = etree.HTML(response.text)

    # The second-to-last link in the page navigation holds the last page number.
    xq_max_list = html_ele.xpath('//div[@class="pagenavi"]/a')[-2]
    max_list = xq_max_list.xpath('./span/text()')[0]

    # The original range(1, int(max_list)) stopped one page short of the last.
    for i in range(1, int(max_list) + 1):
        xq_url = str(href_url) + '/' + str(i)
        print(xq_url)
        response = requests.get(xq_url, headers=headers)
        html_ele = etree.HTML(response.text)

        # Each detail page carries one main image.
        src_page = html_ele.xpath('//div[@class="main-image"]/p/a/img/@src')
        src_page = src_page[0]
        print(src_page)

        # Use the last URL segment as the file name.
        tname = src_page.split('/')[-1]
        print(tname)

        response = requests.get(src_page, headers=headers)
        with open(save_dir + '/' + tname, 'wb') as f:
            f.write(response.content)


if __name__ == '__main__':
    a()


Code ends. It is very efficient and so easy.
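
The script above leans on a few fragile steps: it picks the page count by a fixed list index, splits the image URL by hand, and writes to a directory built by string concatenation. The following is only a sketch of how those steps might be made more robust using the standard library. The helper names `last_page_number`, `image_filename` and `album_dir`, and the idea of a base download directory, are assumptions made for illustration and are not part of the original code.

```python
import os
import posixpath
from urllib.parse import urlsplit


# All three helpers are hypothetical; they do not appear in the original script.

def last_page_number(link_texts):
    """Return the largest number found among pagination link texts."""
    numbers = [int(t) for t in link_texts if t.strip().isdecimal()]
    if not numbers:
        raise ValueError('no numeric page links found')
    return max(numbers)


def image_filename(src):
    """Return the last path segment of an image URL, ignoring any query string."""
    return posixpath.basename(urlsplit(src).path)


def album_dir(base_dir, name):
    """Build a directory path for an album under base_dir."""
    # Replace characters that are awkward or invalid in directory names.
    safe = ''.join('_' if c in '\\/:*?"<>|' else c for c in name).strip()
    return os.path.join(base_dir, safe)
```

Under these assumptions, the page count could be taken as `last_page_number(...)` over the list of pagination link texts instead of indexing with `[3]`, and each image could be saved to `os.path.join(album_dir(base_dir, name), image_filename(src_page))`.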
