Library Name | Introduction | Official Website URL |
---|---|---|
BeautifulSoup | A Python library that allows you to parse and navigate HTML and XML documents and extract data from them. | https://www.crummy.com/software/BeautifulSoup/bs4/doc/ |
Scrapy | A Python web scraping framework that is used for scraping large-scale data from websites. Provides features like automatic request throttling and built-in support for handling cookies and sessions. | https://scrapy.org/ |
Requests | A Python library that is used for making HTTP requests. Often used in web scraping to send requests to web servers and fetch data. | https://requests.readthedocs.io/en/master/ |
Selenium | A popular browser automation framework, widely used for web testing, that drives a real browser and can therefore scrape data from websites that render content with JavaScript. | https://www.selenium.dev/ |
PyQuery | A Python library that provides a jQuery-like syntax for parsing HTML and XML documents. Allows you to navigate and search parsed documents and extract data from them. | https://pythonhosted.org/pyquery/ |
lxml | A Python library for processing XML and HTML documents. Built on the C libraries libxml2 and libxslt, it provides a fast and efficient way to parse and manipulate XML and HTML files. | https://lxml.de/ |
Feedparser | A Python library that is used for parsing RSS and Atom feeds. Allows you to extract data from these types of feeds and process them. | https://pythonhosted.org/feedparser/ |
MechanicalSoup | A Python library, built on Requests and BeautifulSoup, that provides a simple way to automate interaction with websites and scrape data from them. Allows you to fill out and submit forms and follow links while keeping cookies across requests. It does not execute JavaScript. | https://mechanicalsoup.readthedocs.io/en/stable/ |
Requests-HTML | A Python library that combines Requests with HTML parsing. Provides CSS-selector and XPath methods for navigating and searching parsed documents, and can optionally render JavaScript. | https://html.python-requests.org/ |
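Scrapy-Redis | A Python library that provides Redis-based components for Scrapy. Its scheduler and duplicate filter keep the request queue in Redis so that multiple spiders can share one crawl, and its item pipeline pushes scraped items into Redis. | https://github.com/rmax/scrapy-redis |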
Scrapy-Splash | A Python library that integrates Scrapy with Splash, a JavaScript rendering service. Allows you to scrape data from websites that use JavaScript. | https://github.com/scrapy-plugins/scrapy-splash |
Pyppeteer | An unofficial Python port of Puppeteer that provides a high-level API for controlling headless Chrome or Chromium. Allows you to scrape data from websites that use JavaScript. | https://miyakogi.github.io/pyppeteer/ |
Grab | A Python library that is used for web scraping. Provides features like automatic request retries and built-in support for handling cookies and sessions. | https://docs.grablib.org/en/latest/ |
RoboBrowser | A Python library, built on Requests and BeautifulSoup, that provides a simple way to browse websites without a standalone browser and scrape data from them. Allows you to fill out and submit forms and follow links. It does not execute JavaScript. | https://robobrowser.readthedocs.io/en/latest/ |
Pandas | A Python library that is widely used for data analysis. In web scraping it is mainly used to process and analyze scraped data; its `read_html` function can also extract HTML tables directly into DataFrames. | https://pandas.pydata.org/ |
html5lib | A pure-Python library for parsing HTML documents. Follows the WHATWG HTML specification, so it parses markup the way major browsers do, at the cost of being slower than lxml. | https://html5lib.readthedocs.io/en/latest/ |
Peewee | A Python library that is used for interacting with SQL databases. Can be used in web scraping to store and retrieve |
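
As a rough illustration of the parsing step that several of the libraries above cover, the following is a minimal sketch using BeautifulSoup with Python's built-in `html.parser` backend. The `HTML` string and the `extract_links` helper are hypothetical and exist only for this example; in a real scraper the markup would typically come from an HTTP response, for instance one fetched with Requests.

```python
from bs4 import BeautifulSoup

# Hypothetical page content used only for this sketch. In practice the
# markup would come from an HTTP response body.
HTML = """
<html><body>
  <h1>Libraries</h1>
  <ul>
    <li><a href="https://scrapy.org/">Scrapy</a></li>
    <li><a href="https://lxml.de/">lxml</a></li>
    <li><a>No target</a></li>
  </ul>
</body></html>
"""


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return (link text, href) pairs for every anchor that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips anchors that carry no href attribute at all.
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]


if __name__ == "__main__":
    for text, href in extract_links(HTML):
        print(f"{text}: {href}")
```

Passing `"lxml"` or `"html5lib"` instead of `"html.parser"` switches BeautifulSoup to one of the other parsers listed in the table, provided that package is installed.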