20 Most Popular Python Web Scraping Libraries

Category: Q&A Tags: March 16, 2023

| Library Name | Introduction | Official Website URL |
| --- | --- | --- |
| BeautifulSoup | A Python library for parsing and navigating HTML and XML documents and extracting data from them. | https://www.crummy.com/software/BeautifulSoup/bs4/doc/ |
| Scrapy | A Python web scraping framework for crawling websites at scale. Provides features such as automatic request throttling and built-in handling of cookies and sessions. | https://scrapy.org/ |
| Requests | A Python library for making HTTP requests. Often used in web scraping to send requests to web servers and fetch pages. | https://requests.readthedocs.io/en/master/ |
| Selenium | A browser automation framework, popular for web testing, that can also scrape data from websites that rely on JavaScript. | https://www.selenium.dev/ |
| PyQuery | A Python library that provides a jQuery-like syntax for querying HTML and XML documents. Lets you navigate and search parsed documents and extract data from them. | https://pythonhosted.org/pyquery/ |
| lxml | A Python library for processing XML and HTML documents. Provides a fast and efficient way to parse and manipulate them. | https://lxml.de/ |
| feedparser | A Python library for parsing RSS and Atom feeds and extracting their entries for further processing. | https://pythonhosted.org/feedparser/ |
| MechanicalSoup | A Python library for automating interaction with websites without a real browser. Lets you fill out and submit forms and follow links. It does not execute JavaScript. | https://mechanicalsoup.readthedocs.io/en/stable/ |
| Requests-HTML | A Python library that combines Requests with HTML parsing. Provides a number of methods for navigating and searching parsed documents. | https://html.python-requests.org/ |
| Scrapy-Redis | Redis-based components for Scrapy. Lets multiple crawlers share a request queue and store scraped items in Redis. | https://github.com/rmax/scrapy-redis |
| Scrapy-Splash | A Python library that integrates Scrapy with Splash, a JavaScript rendering service. Lets you scrape data from websites that use JavaScript. | https://github.com/scrapy-plugins/scrapy-splash |
| Pyppeteer | A Python library that provides a high-level API for controlling headless Chrome or Chromium. Lets you scrape data from websites that use JavaScript. | https://miyakogi.github.io/pyppeteer/ |
| Grab | A Python library for web scraping. Provides features such as automatic request retries and built-in handling of cookies and sessions. | https://docs.grablib.org/en/latest/ |
| RoboBrowser | A Python library for browsing websites programmatically without a real browser. Lets you fill out and submit forms and follow links. It does not execute JavaScript. | https://robobrowser.readthedocs.io/en/latest/ |
| pandas | A Python library widely used for data analysis. In web scraping it is typically used to process and analyze the scraped data. | https://pandas.pydata.org/ |
| html5lib | A Python library for parsing HTML documents. Parses pages the same way web browsers do, prioritizing compliance with the HTML standard over speed. | https://html5lib.readthedocs.io/en/latest/ |
| Peewee | A Python library for interacting with SQL databases. Can be used in web scraping to store and retrieve | |
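
As a rough illustration of the "parse and extract" step that the BeautifulSoup entry describes, here is a minimal sketch. The sample HTML, the `extract_links` function, and the `_LinkCollector` helper are all made up for this example and do not come from any library. The sketch uses BeautifulSoup when it is installed and otherwise falls back to the standard library's `html.parser`, so it should run either way.

```python
from html.parser import HTMLParser

# Hypothetical sample page; in a real scraper this text would come from an
# HTTP response (for example, one fetched with Requests).
SAMPLE_HTML = """
<html><body>
  <h1>Library list</h1>
  <ul>
    <li><a href="https://scrapy.org/">Scrapy</a></li>
    <li><a href="https://lxml.de/">lxml</a></li>
    <li><a>No link here</a></li>
  </ul>
</body></html>
"""


class _LinkCollector(HTMLParser):
    """Standard-library fallback: collects (text, href) for each <a href=...>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Return a list of (link text, URL) pairs found in the HTML string."""
    try:
        from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4
    except ImportError:
        collector = _LinkCollector()
        collector.feed(html)
        collector.close()
        return collector.links
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only <a> tags that actually carry an href attribute.
    return [(a.get_text(strip=True), a["href"])
            for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    for text, url in extract_links(SAMPLE_HTML):
        print(f"{text}: {url}")
```

The BeautifulSoup branch is only three lines, which is much of the reason such libraries are popular. The fallback class shows roughly what has to be written by hand without one.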

