Examples
Version: 3.3

Accept user input
This example accepts and logs user input.

Add data to dataset
This example saves data to the default dataset. If the dataset doesn't exist, it will be created.

Basic crawler
This is the most bare-bones example of using Crawlee, which demonstrates some of its building blocks such as the BasicCrawler. You probably don't need to go this deep, though, and it would be better to start with one of the full-featured crawlers.

Cheerio crawler
This example demonstrates how to use CheerioCrawler to crawl a list of URLs from an external file, load each URL using a plain HTTP request, parse the HTML using the Cheerio library, and extract some data from it: the page title and all h1 tags.

Crawl all links on a website
This example uses the enqueueLinks() method to add new links to the RequestQueue.

Crawl multiple URLs
This example crawls the specified list of URLs.

Crawl a website with relative links
When crawling a website, you may encounter different types of links that you may want to crawl.

Crawl a sitemap
This example downloads and crawls the URLs from a sitemap, by using the downloadListOfUrls utility method provided by the module.

Crawl some links on a website
This CheerioCrawler example uses the globs property in the enqueueLinks() method to only add links to the RequestQueue if they match the specified pattern.

(untitled example)
This example uses the got-scraping npm package …

Export entire dataset to one file
This Dataset example uses the exportToValue function to export the entire default dataset to a single CSV file into a key-value store named "my-data".

HTTP crawler
This example demonstrates how to use HttpCrawler to crawl a list of URLs from an external file, load each URL using a plain HTTP request, and save HTML.

Puppeteer crawler
This example demonstrates how to use PuppeteerCrawler to …

Using puppeteer-extra and playwright-extra
Puppeteer-extra and playwright-extra are community-built …