Best Scraping Tools in 2018 - updated

Here at Kurzor we have completed numerous projects using various web scraping techniques, and we can implement a custom solution. A more flexible but less powerful approach is to use a service that does not require you to be an IT guru. So if you need a script to grab all products from an e-shop, collect articles from a blog or gather some images, it is easy to try one of the following tools. Updated for 2018.

This is an updated version of an article from 2017. If you have experience with any other scraping tools that you think should be on the list, do let us know.

Data scraping is a computer technique to extract data from human-readable output coming from another program. Extracting data from websites is called web scraping. Sometimes it is referred to as web harvesting or web data extraction.

Here at Kurzor we have completed numerous projects using various web scraping techniques. We have tried a DOM parsing approach using the Selenium driver. This approach needs a programmer to define a whole sequence of steps and actions to extract data from a web page. It takes expert skills in a programming language, HTML, DOM structure and various selector types (XPath, CSS, jQuery).
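As a rough illustration of what such a hand-written sequence of steps looks like, here is a minimal Python sketch. It is not the Selenium code from our projects: it uses the standard library's xml.etree.ElementTree, which supports only a limited XPath subset, on a small well-formed snippet. The markup, the class names and the extract_products helper are all made up for this example. Real pages are rarely valid XML, so in practice you would drive a browser or use a proper HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product listing standing in for a downloaded page.
PAGE = """
<html>
  <body>
    <div class="product">
      <h2 class="name">Red mug</h2>
      <span class="price">4.90</span>
    </div>
    <div class="product">
      <h2 class="name">Blue mug</h2>
      <span class="price">5.50</span>
    </div>
  </body>
</html>
"""


def extract_products(markup):
    """Return one dict per product block found in the markup."""
    root = ET.fromstring(markup)
    products = []
    # Step 1: select every product container with an XPath-style expression.
    for node in root.findall(".//div[@class='product']"):
        # Step 2: read the fields inside each container.
        name = node.findtext("h2[@class='name']")
        price = node.findtext("span[@class='price']")
        products.append({"name": name.strip(), "price": float(price)})
    return products


if __name__ == "__main__":
    for product in extract_products(PAGE):
        print(product)
```

Even in this tiny case the programmer has to know the page structure and the selector syntax, which is the skill barrier described above.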

A more flexible but less powerful approach is to use a service that does not require you to be an IT guru. So if you need a script to grab all products from an e-shop, collect articles from a blog or gather some images, it is easy to try one of the following tools.

Apify

Apify is a web scraping and automation platform that extracts structured data from pages and turns any website into an API.

Apify doesn’t have a user interface where you select the data you want to extract by clicking with your mouse. Instead, you tell your crawler what to extract using JavaScript, so it’s perfect for scraping websites that don’t have a regular structure.

Pricing:

  • Developer: Free, 5k monthly pages, 1 parallel request (maximum number of web pages that can be requested at a time by all your crawlers) and 7 days of data retention.

  • Freelancer: $49 per month, 50k monthly pages, 5 parallel requests and 14 days of data retention.

  • Startup: $149 per month, 200k monthly pages, 10 parallel requests and 21 days of data retention.

  • Business: $499 per month, 1M monthly pages, 30 parallel requests and 30 days of data retention.

  • Enterprise: Contact Apify for pricing; unlimited monthly pages, unlimited parallel requests and unlimited data retention.
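To compare the paid plans, it can help to convert them to a price per 1,000 pages. The short Python sketch below does that arithmetic for the three fixed-price plans listed above. It assumes you use the whole monthly quota, and the price_per_thousand_pages helper is just an illustrative name.

```python
# Paid Apify plans as listed above: (monthly price in USD, monthly pages).
PLANS = {
    "Freelancer": (49, 50_000),
    "Startup": (149, 200_000),
    "Business": (499, 1_000_000),
}


def price_per_thousand_pages(price_usd, pages):
    """Cost in USD of 1,000 pages when the whole quota is used."""
    return price_usd / pages * 1000


if __name__ == "__main__":
    for plan, (price, pages) in PLANS.items():
        cost = price_per_thousand_pages(price, pages)
        print(f"{plan}: ${cost:.3f} per 1,000 pages")
```

Worked by hand, this gives $0.98 for Freelancer, $0.745 for Startup and $0.499 for Business per 1,000 pages, so the larger plans are cheaper per page if you actually need the volume.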

Pros:

  • Lots of documentation and tips.

  • Can scrape pages with irregular structure.

  • Works on dynamic websites.

  • Supports any website.

  • jQuery integration.

  • You can schedule scripts.

  • Library of existing, free-to-use crawlers.

Cons:

  • You need programming skills to use the free version.

  • Limited data retention period.

Webscraper.io

Web Scraper is a company specializing in data extraction from web pages. It offers two options: the free Web Scraper extension for Google Chrome and the cloud-based Web Scraper.

Web Scraper Extension (Free!)

Using the extension, you create a plan (sitemap) of how a website should be traversed and what should be extracted. Web Scraper then navigates the site according to the sitemap and extracts the data. Scraped data can later be exported as CSV.
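The idea of a sitemap, a plan that says where to go and what to collect, can be sketched in a few lines of Python. This is only an illustration of the concept: the SITEMAP dictionary below is not Web Scraper's actual sitemap format, and the SITE data, run_sitemap and to_csv are hypothetical stand-ins for real pages and the real tool.

```python
import csv
import io

# Hypothetical site, already parsed: each URL maps to its links and fields.
SITE = {
    "/shop": {"links": ["/shop/mug", "/shop/cup"], "fields": {}},
    "/shop/mug": {"links": [], "fields": {"name": "Mug", "price": "4.90"}},
    "/shop/cup": {"links": [], "fields": {"name": "Cup", "price": "3.20"}},
}

# Illustrative plan only; this is not Web Scraper's actual sitemap format.
SITEMAP = {"start_url": "/shop", "extract": ["name", "price"]}


def run_sitemap(sitemap, site):
    """Visit pages breadth-first from the start URL and collect the wanted fields."""
    rows, queue, seen = [], [sitemap["start_url"]], set()
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        page = site[url]
        # Keep a row only when the page carries every requested field.
        if all(field in page["fields"] for field in sitemap["extract"]):
            rows.append({field: page["fields"][field] for field in sitemap["extract"]})
        queue.extend(page["links"])
    return rows


def to_csv(rows, columns):
    """Serialize the scraped rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = run_sitemap(SITEMAP, SITE)
    print(to_csv(rows, SITEMAP["extract"]))
```

The point of the separation is that the plan is plain data, so it can be built by clicking in a browser instead of by writing code.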

Cloud Web Scraper

Cloud Web Scraper lets you extract large amounts of data, run multiple scraping jobs at once, and even run them on a set schedule.

Image: Webscraper.io Chrome extension in action

Pricing: The Chrome extension is free of charge. Prices for the cloud service start at $50 for 100,000 pages and go up to $250 for a credit of 2,000,000 pages.

Pros:

  • You develop scripts quickly for pages with a regular structure.
  • You can play the script and see the behavior directly in Chrome browser.
  • You can define most elements on a page by just clicking on them.
  • An API that calls your webhooks when a scraping job finishes is coming soon.
  • You don't need programming knowledge to prepare the scripts.
  • The data never expire.

Cons:

  • Sometimes a script that works in the Chrome extension produces different output when run in the cloud service.
  • Some advanced selectors need to be defined by the user as XPath or jQuery selectors.
  • You can't scrape pages that require a login.

Import.io

Import.io is a web-based platform for extracting data from websites without writing any code.

You enter a URL and the app extracts the data that it thinks you need. If the data obtained is not what you needed, you have an interface to click and select the specific data you want to extract. The collected data is stored on Import.io's cloud servers and can be downloaded as CSV, Excel, Google Sheets or JSON, or accessed via the API.

Image: Import.io interface

Pricing: 

  • Essential: $299 for 5k queries (expires after one month).

  • Professional: $1,999 for 100k queries (expires after one year).

  • Enterprise: $4,999 for 500k queries (expires after one year).

Pros:

  • No coding.

  • Automatic data & image extraction.

  • Get data from behind logins.

  • Public APIs

  • Desktop application works on Windows, Mac and Linux.

Cons:

  • A bit too expensive for what it offers.

  • Purchased queries are time-limited: they expire after one month or one year, depending on the plan.

  • Doesn't work on dynamic pages.

ParseHub

ParseHub is web scraping software that supports complicated data extraction from sites that use AJAX, JavaScript, redirects and cookies. It is equipped with machine learning technology that can read and analyse documents on the web to deliver relevant data. ParseHub is available as a desktop client for Windows, macOS and Linux, and there is also a web app that you can use within the browser. The free plan allows up to 5 crawl projects.

Image: ParseHub interface for scraping

Pricing:

  • Everyone: Free, 200 pages, data retention for 14 days.

  • Standard: $149 per month, 10k pages, 20 private projects and data retention for 14 days.

  • Professional: $499 per month, unlimited pages, 120 private projects and data retention for 30 days.

  • Enterprise: contact the company for pricing; unlimited pages, unlimited projects and data retention for 30 days.

Pros:

  • Desktop client for Windows, Mac and Linux.

  • REST API and webhooks.

  • Get data from behind logins.

  • No coding.

Cons:

  • Short data retention period.

  • Takes a while to learn how to use properly.

  • XPath selectors.

Agenty

Agenty is a hosted web scraping tool. It offers three options: a hosted application, a desktop application and a Chrome extension.

Hosted Application

A cloud-hosted web scraping app, billed by the number of pages, for crawling the web at large scale and extracting data from static and dynamic websites automatically. It is API-ready, requires no programming and has a free plan.

Desktop Application

Fast, self-service data extraction software for Windows, designed to extract data from websites using CSS selectors or regular expressions in a few minutes.
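To show what regular-expression extraction means in practice, here is a minimal Python sketch using the standard re module. It does not reproduce how Agenty works internally; the HTML fragment, the pattern and the extract_items helper are assumptions made for the example. Patterns like this break easily when the markup changes, which is why selectors are usually preferred for anything but simple cases.

```python
import re

# Hypothetical HTML fragment standing in for a downloaded page.
HTML = """
<li class="item"><a href="/p/1">Red mug</a> <b>$4.90</b></li>
<li class="item"><a href="/p/2">Blue mug</a> <b>$5.50</b></li>
"""

# One match per list item: link target, link text and price.
ITEM_PATTERN = re.compile(
    r'<a href="(?P<url>[^"]+)">(?P<name>[^<]+)</a>\s*<b>\$(?P<price>\d+\.\d{2})</b>'
)


def extract_items(html):
    """Return a dict with url, name and price for every item the pattern matches."""
    return [
        {"url": m["url"], "name": m["name"], "price": float(m["price"])}
        for m in ITEM_PATTERN.finditer(html)
    ]


if __name__ == "__main__":
    for item in extract_items(HTML):
        print(item)
```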

Advanced Web Scraper (Chrome extension)

A simple yet advanced data scraping extension by Agenty that extracts data from websites using point-and-click CSS selectors, with a real-time preview of the extracted data and quick export to JSON, CSV or TSV.

Image: Agenty agents overview

Pricing:

  • Starter: $29 per month or $296 per year, 5k monthly pages, up to 3 scraping agents (an agent is a container that holds the configuration, such as fields, selectors and URLs, for scraping a particular website), 30 days data history, 1 user.

  • Basic: $49 per month or $500 per year, 25k monthly pages, up to 10 scraping agents, 30 days data history, 3 users.

  • Professional: $99 per month or $1,010 per year, 100k monthly pages, up to 25 scraping agents, 60 days data history, 5 users.

  • Enterprise: contact the company for pricing; pages customizable to your needs, unlimited scraping agents, 180 days data history, unlimited users.

Pros:

  • Get data behind logins.

  • Get data from form submission pages.

  • Scheduling.

  • Write your own script to modify the scraped data into your choice of format.

Cons:

  • Short data retention period.

  • Too few scraping agents on the lower plans; you need to pay for the Professional plan to get a decent number of them.

  • Desktop app only works on Windows.

Octoparse

Octoparse is a cloud-based web crawler that helps you extract web data in real time without coding. It simulates human operation to interact with web pages. You can use the point-and-click UI to bulk extract data from web pages (including those using Ajax, JavaScript, etc.), and you can export it in various formats such as CSV, Excel, HTML, TXT and databases (MySQL, SQL Server and Oracle).

Octoparse’s cloud service (available in paid editions) can extract and store large amounts of data to meet large-scale extraction needs.

Pricing:

  • Basic: Free to use. You can run 10 scripts and extract an unlimited number of web pages.

  • Standard Plan: $75 per month when billed annually or $89 when billed monthly. You can run 100 scripts, 6 cloud servers and extract unlimited web pages.

  • Professional Plan: $158 per month when billed annually or $189 when billed monthly. You can run 200 scripts, 14 cloud servers and extract unlimited web pages.

  • Professional Data Service: starting from $299. Contact the company and they will do the work for you.

Pros:

  • Select the data to be scraped with mouse clicks. No coding.

  • It has an API.

  • Works on dynamic pages.

  • Automatically generates XPath.

Cons:

  • Only works on Windows.

  • Requires some tutorials to learn how to use it.

  • Program hangs on some pages.

Dexi.io

Dexi.io is a web scraping tool for IT professionals that bills itself as the most powerful web extraction (web scraping) tool available. With its web data extraction and robotic process automation (RPA) tool, you can extract and transform data from any source.

Image: Dexi interface in action

Pricing:

  • Free trial.

  • Standard: $119 per month (or $105 per month if you pay annually), you can only run one script at a time.

  • Professional: $399 per month (or $355 per month if you pay annually), you can run three scripts at a time.

  • Corporate: $699 per month (or $625 per month if you pay annually), you can run six scripts at a time.

Pros:

  • Easy to use GUI.

  • Easy to fix broken robots.

  • Run executions on schedules.

  • PIPES 'master robot' feature, where one robot can control multiple others in an overall task.

  • Supports any website.

  • Integration with Amazon S3, Box, Dropbox, Google Drive and webhooks.

Cons:

  • Pricing is competitive, but costs go up with capacity if you need to run lots of robots simultaneously.

Conclusion

Each of the services offers a slightly different approach and pricing. Some of them will suit your project more, some less. The main goal is to select the best scraping service for your project. 2017 was definitely the year in which data extraction from web pages gained its place in Kurzor’s company portfolio.

Disclaimer: Prices in the article are from October 2017. We are not paid or otherwise compensated by any of the services mentioned for promoting them.
