Spiderbuf
N02 - Scraping and Decoding Base64-Encoded Images
Published: 2024-06-11 08:39:21 UTC (Unix timestamp 1718095161)
Views: 580
# coding=utf-8
import requests
from lxml import etree
import base64

url = 'https://spiderbuf.cn/web-scraping-practice/scraping-images-base64'
myheaders = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164...
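The N02 excerpt ends before the decoding step. As a minimal sketch, assuming the images are embedded as data:<mime>;base64,<payload> URIs in img src attributes (the excerpt does not show the page markup), decoding needs only the standard library. decode_data_uri is a hypothetical helper name, not part of the original script.

```python
import base64
import re

# Matches a data URI such as "data:image/png;base64,iVBORw0KGgo..."
DATA_URI_RE = re.compile(r"^data:(?P<mime>[\w/+.-]+);base64,(?P<payload>.+)$", re.S)


def decode_data_uri(src: str) -> tuple[str, bytes]:
    """Split a Base64 data URI into its MIME type and the decoded image bytes."""
    match = DATA_URI_RE.match(src.strip())
    if match is None:
        raise ValueError("not a Base64 data URI: " + src[:40])
    # Drop any whitespace or line breaks inside the payload before decoding
    payload = "".join(match.group("payload").split())
    return match.group("mime"), base64.b64decode(payload)


# Usage: src would normally come from an <img> tag's src attribute.
# "iVBORw0KGgo=" is the Base64 form of the 8-byte PNG file signature.
mime, data = decode_data_uri("data:image/png;base64,iVBORw0KGgo=")
print(mime, data[:4])
```

The decoded bytes can then be written to a file opened in binary mode ("wb") to restore the original image.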
H01 - Parsing and Scraping Text Obfuscated with CSS Style Offsets
Published: 2024-06-11 08:38:22 UTC (Unix timestamp 1718095102)
Views: 585
# coding=utf-8
import requests
from lxml import etree

url = 'https://spiderbuf.cn/web-scraping-practice/scraping-css-confuse-offset'
myheaders = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537...
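The H01 excerpt does not show how the offsets are applied. One way this kind of obfuscation can work is to put each character in its own element and move it with a CSS offset, so the order in the HTML source differs from the order on screen. The sketch below assumes inline styles of the form left:Npx on span elements (an assumption, not taken from the exercise page) and restores the visual order by sorting on that offset. OffsetTextParser and visual_text are hypothetical names.

```python
import re
from html.parser import HTMLParser

# Assumed inline style: each character carries its horizontal offset, e.g. style="left:32px"
LEFT_RE = re.compile(r"left\s*:\s*(-?\d+(?:\.\d+)?)px")


class OffsetTextParser(HTMLParser):
    """Collect (offset, text) pairs from <span style="left:Npx">x</span> elements."""

    def __init__(self):
        super().__init__()
        self.items = []      # list of (offset, text) in source order
        self._offset = None  # offset of the span currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            m = LEFT_RE.search(dict(attrs).get("style") or "")
            self._offset = float(m.group(1)) if m else None

    def handle_endtag(self, tag):
        if tag == "span":
            self._offset = None

    def handle_data(self, data):
        if self._offset is not None and data.strip():
            self.items.append((self._offset, data.strip()))


def visual_text(html: str) -> str:
    """Rebuild the on-screen reading order by sorting characters by their left offset."""
    parser = OffsetTextParser()
    parser.feed(html)
    return "".join(text for _, text in sorted(parser.items, key=lambda item: item[0]))


sample = ('<div><span style="left:32px">c</span>'
          '<span style="left:0px">a</span><span style="left:16px">b</span></div>')
print(visual_text(sample))
```

If the real page uses classes defined in a stylesheet instead of inline styles, the offset lookup would have to go through that stylesheet; the sorting step stays the same.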
N01 - User-Agent and Referer Checks as an Anti-Scraping Measure
Published: 2024-06-11 08:37:42 UTC (Unix timestamp 1718095062)
Views: 644
# coding=utf-8
import requests
from lxml import etree

url = 'https://spiderbuf.cn/web-scraping-practice/user-agent-referrer'
myheaders = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36'...
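Checks like this compare the User-Agent and Referer request headers with what a browser would send, so the script has to send both. The sketch below prepares such a request with the standard library's urllib.request instead of requests (with requests, the same dict is passed as headers=). The User-Agent string is the one in the excerpt; the Referer value is an assumption, so copy the real one from the browser's developer tools.

```python
from urllib.request import Request

# Browser User-Agent string taken from the N01 excerpt
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36"
)


def build_request(url: str, referer: str) -> Request:
    """Prepare (without sending) a GET request that carries User-Agent and Referer headers."""
    return Request(url, headers={"User-Agent": USER_AGENT, "Referer": referer})


# Assumed Referer: replace with the value the browser actually sends for this page
req = build_request(
    "https://spiderbuf.cn/web-scraping-practice/user-agent-referrer",
    referer="https://spiderbuf.cn/",
)
# urllib stores header names via str.capitalize(), hence "User-agent"
print(req.get_header("User-agent"))
print(req.get_header("Referer"))
# To send it: urllib.request.urlopen(req).read().decode("utf-8")
```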
E03 - Pagination Without Sequential Page Numbers
Published: 2024-06-11 08:37:05 UTC (Unix timestamp 1718095025)
Views: 567
# coding=utf-8
import requests
from lxml import etree
import re

base_url = 'https://spiderbuf.cn/web-scraping-practice/scraping-random-pagination'
# https://spiderbuf.cn/e03/5f685274073b
myheaders = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537....
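Because the page URLs carry a random ID instead of a sequential number (the excerpt's example is https://spiderbuf.cn/e03/5f685274073b), the next page cannot be computed by incrementing a counter; the links have to be read from the pages themselves. A minimal sketch, assuming pagination links look like href="/e03/<hex id>" (inferred from that example URL, not confirmed), collects them with re and walks them breadth-first. page_urls and crawl are hypothetical helper names, and fetch stands in for a requests call so the sketch runs offline.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Assumed link shape, inferred from the example page URL https://spiderbuf.cn/e03/5f685274073b
PAGE_LINK_RE = re.compile(r'href="(/e03/[0-9a-f]+)"')


def page_urls(html: str, base: str = "https://spiderbuf.cn/") -> list[str]:
    """Return absolute pagination URLs found in html, in order and without duplicates."""
    urls = []
    for path in PAGE_LINK_RE.findall(html):
        url = urljoin(base, path)
        if url not in urls:
            urls.append(url)
    return urls


def crawl(start_url: str, fetch) -> dict[str, str]:
    """Breadth-first walk over pagination links; fetch(url) must return the page HTML."""
    queue, seen, pages = deque([start_url]), {start_url}, {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        for link in page_urls(html):
            if link not in seen:  # each random-ID page is fetched only once
                seen.add(link)
                queue.append(link)
    return pages


# With requests, fetch could be: lambda u: requests.get(u, headers=myheaders).text
```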
E02 - Scraping Behind a Login Protected by a CAPTCHA
Published: 2024-06-11 08:36:21 UTC (Unix timestamp 1718094981)
Views: 657
# coding=utf-8
import requests
from lxml import etree

url = 'https://spiderbuf.cn/web-scraping-practice/web-scraping-with-captcha/list'
# Note: replace the Cookie with your own
myheaders = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0...
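The note to replace the Cookie suggests the excerpt sidesteps the CAPTCHA rather than solving it: log in once in a browser, entering the CAPTCHA by hand, then reuse that session's Cookie in the script. Putting the raw string into the Cookie header, as the excerpt does, is enough; alternatively, a small helper like the hypothetical parse_cookie_header below turns the copied header into a dict for the cookies= argument of requests.get. The cookie names in the example are placeholders.

```python
def parse_cookie_header(raw: str) -> dict[str, str]:
    """Turn a Cookie header copied from the browser ("a=1; b=2") into a dict."""
    cookies = {}
    for pair in raw.split(";"):
        # partition splits at the first "=", so values containing "=" survive intact
        name, sep, value = pair.strip().partition("=")
        if sep:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


# Placeholder value: paste the Cookie header from your own logged-in browser session
raw_cookie = "sessionid=abc123; csrftoken=x=y; "
print(parse_cookie_header(raw_cookie))
# Network call, not run here:
# requests.get(url, headers=myheaders, cookies=parse_cookie_header(raw_cookie))
```

The session cookie eventually expires, so it has to be copied again after the browser session ends.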