Spiderbuf
S04 - Analyzing Pagination Parameters and Scraping Across Pages
Published: 1718093716 (Unix timestamp)
Views: 725
# coding=utf-8
import requests
from lxml import etree
import re

base_url = 'https://spiderbuf.cn/web-scraping-practice/web-pagination-scraper?pageno=%d'
myheaders = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91...
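The excerpt above is cut off, but it shows the page URL being built from a `pageno=%d` template. A minimal sketch of that step alone might look like the following. The helper name `build_page_urls` and the page count of 5 are assumptions for illustration, not taken from the article, and no request is actually sent.

```python
from typing import List

# URL template as it appears in the excerpt; %d is replaced by the page number.
BASE_URL = "https://spiderbuf.cn/web-scraping-practice/web-pagination-scraper?pageno=%d"


def build_page_urls(base_url: str, last_page: int) -> List[str]:
    """Return one URL per page, numbered from 1 to last_page inclusive.

    Hypothetical helper: the real page count would have to be read
    from the site's pagination links.
    """
    if last_page < 1:
        return []
    return [base_url % page for page in range(1, last_page + 1)]


if __name__ == "__main__":
    # Assumed page count of 5, purely for demonstration.
    for page_url in build_page_urls(BASE_URL, 5):
        print(page_url)
```

Each generated URL could then be passed to `requests.get` together with the headers dictionary from the excerpt.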
S03 - Advanced lxml Syntax and Parsing Practice
Published: 1718093665 (Unix timestamp)
Views: 711
# coding=utf-8
import requests
from lxml import etree

url = 'https://spiderbuf.cn/web-scraping-practice/lxml-xpath-advanced'
myheaders = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36'...
S02 - Analyzing HTTP Requests and Constructing Headers
Published: 1718093606 (Unix timestamp)
Views: 753
# coding=utf-8
import requests
from lxml import etree

url = 'https://spiderbuf.cn/web-scraping-practice/scraper-http-header'
myheaders = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36'...
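The excerpt above attaches a browser-like `User-Agent` header before requesting the page. As a rough sketch of the same idea, the block below builds such a request without sending it. Note that it uses the standard library's `urllib.request` in place of `requests`, only so that it runs with no third-party install; the function name `build_request` is hypothetical.

```python
import urllib.request

URL = "https://spiderbuf.cn/web-scraping-practice/scraper-http-header"
# User-Agent string as it appears in the excerpt.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36"
)


def build_request(url, user_agent):
    """Build (but do not send) a GET request carrying the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = build_request(URL, USER_AGENT)
    print(req.get_method(), req.full_url)
    # urllib stores header names capitalized, so the key is "User-agent".
    print(req.get_header("User-agent"))
```

With `requests`, the equivalent call would pass the same dictionary through the `headers=` argument of `requests.get`.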
S01 - Getting Started with requests and lxml
Published: 1718093235 (Unix timestamp)
Views: 1226
# coding=utf-8
import requests
from lxml import etree

url = 'https://spiderbuf.cn/web-scraping-practice/requests-lxml-for-scraping-beginner'
html = requests.get(url).text
f = open('01.html', 'w', encoding='utf-8')
f.write(html)
f.close()
root = etree.HTM...
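The excerpt above saves the downloaded page with an explicit `open` / `write` / `close` sequence. One possible variation, shown here as a sketch, is a `with` block, which closes the file even if the write raises. The helper name `save_html` and the demo HTML string are assumptions; the excerpt's `01.html` filename is reused inside a temporary directory.

```python
import tempfile
from pathlib import Path


def save_html(html, path):
    """Write html to path as UTF-8 and return the path as a Path object."""
    # The with-block closes the file automatically, even on error.
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return Path(path)


if __name__ == "__main__":
    # Demo content stands in for the text returned by requests.get(url).text.
    with tempfile.TemporaryDirectory() as tmp:
        target = save_html("<html><body>demo</body></html>", Path(tmp) / "01.html")
        print(target.read_text(encoding="utf-8"))
```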
Scraping Web Pages with Selenium from Python
Published: 1718092674 (Unix timestamp)
Views: 1038
# coding=utf-8
from selenium import webdriver

if __name__ == '__main__':
    url = 'http://www.example.com'
    client = webdriver.Chrome()
    client.get(url)
    html = client.page_source
    print(html)
    client.quit()
...