20 Most Popular Python Web Scraping Libraries - Python Learning Navigation



Category: Q&A | Date: March 16, 2023

Library Name	Introduction	Official Website URL
BeautifulSoup	A Python library that allows you to parse and navigate HTML and XML documents and extract data from them.	https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Scrapy	A Python web crawling and scraping framework designed for large-scale data extraction. Provides features such as automatic request throttling (the AutoThrottle extension) and built-in handling of cookies and sessions.	https://scrapy.org/
Requests	A Python library that is used for making HTTP requests. Often used in web scraping to send requests to web servers and fetch data.	https://requests.readthedocs.io/en/master/
Selenium	A popular web testing framework that allows you to automate browser interactions and scrape data from websites that use JavaScript.	https://www.selenium.dev/
PyQuery	A Python library that provides a jQuery-like syntax for parsing HTML and XML documents. Allows you to navigate and search parsed documents and extract data from them.	https://pythonhosted.org/pyquery/
lxml	A Python library for processing XML and HTML documents, built on the C libraries libxml2 and libxslt. Provides a fast and efficient way to parse, query (including XPath), and manipulate XML and HTML files.	https://lxml.de/
Feedparser	A Python library that is used for parsing RSS and Atom feeds. Allows you to extract data from these types of feeds and process them.	https://pythonhosted.org/feedparser/
MechanicalSoup	A Python library that automates interaction with websites without a real browser, built on Requests and BeautifulSoup. Allows you to fill out and submit forms, follow links, and keep cookies across requests. It does not execute JavaScript.	https://mechanicalsoup.readthedocs.io/en/stable/
Requests-HTML	A Python library that combines Requests with HTML parsing. Provides CSS-selector and XPath methods for navigating and searching parsed documents, and can optionally render JavaScript in a headless Chromium.	https://html.python-requests.org/
Scrapy-Redis	Redis-based components for Scrapy: a scheduler, a duplicate filter, and an item pipeline. Allows several spider processes to share one request queue for distributed crawling and to push scraped items into Redis.	https://github.com/rmax/scrapy-redis
Scrapy-Splash	A Scrapy plugin that integrates with Splash, a JavaScript rendering service. Allows you to scrape data from websites that use JavaScript.	https://github.com/scrapy-plugins/scrapy-splash
Pyppeteer	A Python library that provides a high-level API for controlling headless Chrome or Chromium. Allows you to scrape data from websites that use JavaScript.	https://miyakogi.github.io/pyppeteer/
Grab	A Python library that is used for web scraping. Provides features like automatic request retries and built-in support for handling cookies and sessions.	https://docs.grablib.org/en/latest/
RoboBrowser	A Python library that browses websites without a real browser, built on Requests and BeautifulSoup. Allows you to fill out and submit forms and follow links. It does not execute JavaScript.	https://robobrowser.readthedocs.io/en/latest/
Pandas	A Python library that is widely used for data analysis. In web scraping it is mainly used to clean, process, and analyze scraped data; its read_html function can also load HTML tables directly into DataFrames.	https://pandas.pydata.org/
html5lib	A pure-Python library for parsing HTML documents. Follows the WHATWG HTML specification, so it handles malformed markup the way browsers do, at the cost of being slower than lxml.	https://html5lib.readthedocs.io/en/latest/
Peewee	A small Python ORM (object-relational mapper) for interacting with SQL databases such as SQLite, MySQL, and PostgreSQL. Can be used in web scraping to store and retrieve scraped data.
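
Most of the libraries in the table cover one of two steps: fetching a page, or parsing it and extracting data. The sketch below illustrates only the parse-and-extract step, and it deliberately uses the standard library's html.parser rather than any library from the table, so that it runs without installing anything. The names LinkExtractor, extract_links, and SAMPLE_HTML, as well as the example.com URLs, are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect (text, absolute URL) pairs for every <a href=...> element."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (text, url) pairs
        self._href = None    # URL of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


# Hypothetical page content; a real scraper would fetch this over HTTP.
SAMPLE_HTML = """
<ul>
  <li><a href="/docs/intro.html">Intro</a></li>
  <li><a href="https://example.com/faq">FAQ</a></li>
  <li><a>no href, skipped</a></li>
</ul>
"""


def extract_links(html, base_url):
    """Parse an HTML string and return its links as (text, url) tuples."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


links = extract_links(SAMPLE_HTML, "https://example.com/")
for text, url in links:
    print(text, url)
```

Parsing libraries such as BeautifulSoup wrap this kind of event handling in a friendlier API. With BeautifulSoup, for example, the anchors could be found with soup.find_all("a", href=True) after soup = BeautifulSoup(html, "html.parser"). Fetching the page (Requests, Scrapy) and rendering JavaScript (Selenium, Pyppeteer) are separate steps that this sketch leaves out.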

