How to get started

Requirements

The ScrapeAll solution does not require technical or programming skills, but before starting you must meet three requirements:

  • Create a ScrapeAll account (demo or paid)
  • Use a Chrome browser
  • Install our Chrome browser extension

Full tutorial available on YouTube!

Install Google Chrome browser extension

Open your Google Chrome browser and install the ScrapeAll extension.
After the installation, open the browser extension popup menu and log in using your ScrapeAll account credentials.


After installing the Chrome browser extension and authenticating with your ScrapeAll credentials, a small button will appear on every website you visit.
To create a new scraping project for a target website, click the “SCRAPE” floating button on the left, fill in the form fields and click the “Create Project” button.


Create a new scraping project while navigating

Create Data Source

Datasources are pages that contain links to the target pages (where the useful data lives). There are 3 types of datasources:

  • Auto Discovery Datasource – Useful for navigating through all the links found on a page and discovering data automatically when no structured listing is available, for example to collect related articles directly from an article page.
  • Listing Datasource – Can be used to scrape a list of target pages such as a paginated store category or blog category listing. The advantage of this datasource is automatic pagination exploration.
  • Single Page Datasource – Can be used to recover data from a single page when no navigation is required, for example a currency web service that updates its content every 6 hours.

Scraping for every datasource can be automated and executed at custom time intervals.
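As a rough illustration (all URLs below are hypothetical and not part of ScrapeAll), these are the kinds of pages each datasource type is suited for:

    # Hypothetical examples of which datasource type suits which kind of page.
    datasource_examples = {
        "Listing":        "https://example-store.com/category/shoes?page=1",  # paginated category
        "Auto Discovery": "https://example-news.com/article/12345",           # follow links found on the page
        "Single Page":    "https://example-rates.com/currency/latest",        # one page, refreshed periodically
    }
    for ds_type, url in datasource_examples.items():
        print(f"{ds_type}: {url}")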

Now, we will create a listing datasource in order to collect each new product from multiple pages:

There are a few things to note here:

– after a datasource is created, it is impossible to change its type without deleting it
– if you have multiple types of data, each data type must have its own datasource
– scraped data results are owned by the datasource, not by the project


Target Links and Pagination mapping 

To map products from a page, click the ‘+ add mapping’ button under the TARGET LINKS section. Then hover over a product title and you should see the found element highlighted. When the correct element is identified, click it and the mapping will be added to the list (you can have multiple mappings).
Pagination also needs to be mapped if there are multiple pages of products to scrape. To map the pagination element, use the same procedure as above; the sketch below gives a rough idea of what these mappings correspond to.
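ScrapeAll handles this mapping entirely through point-and-click, so no code is needed. Purely for intuition, the sketch below shows what target-link and pagination mappings might correspond to if expressed by hand as CSS selectors; the URL, the selectors and the use of the requests/BeautifulSoup libraries are assumptions for illustration, not how the extension works internally.

    # Conceptual illustration only: point-and-click mapping roughly corresponds
    # to choosing selectors like these. URL and selectors are hypothetical.
    import requests
    from bs4 import BeautifulSoup

    html = requests.get("https://example-store.com/category/shoes").text
    soup = BeautifulSoup(html, "html.parser")

    # "Target links" mapping: the repeated element linking to each product page
    product_links = [a["href"] for a in soup.select("ul.products li a.product-title")]

    # "Pagination" mapping: the element leading to the next page of the listing
    next_page = soup.select_one("a.next-page")
    print(len(product_links), next_page["href"] if next_page else None)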

A correct mapping will look like this:

Click on the “Next” button and follow the instructions in the next section.

The data model is the last required mapping step. It helps you identify which text sections will be recovered on each scraping execution.
Each mapped text position on the page becomes a field and will be available after the execution, as illustrated below.
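For example, if your data model maps a product’s title and price, each scraped page could produce a record similar to the one below (the field names and values are purely illustrative; the actual fields depend on your own data model):

    # Hypothetical example of one scraped record produced by a data model
    # with "title" and "price" fields (names and values are illustrative).
    record = {
        "title": "Running Shoe X200",
        "price": "59.99 EUR",
        "source_url": "https://example-store.com/product/x200",
    }
    print(record["title"], record["price"])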

To start data model mapping click on “Next” button and follow the instructions:


Create data model

Then map the important page content:

Click the “Create data model” button and inspect the mapped data.

Now we can configure the scraping automation by following the instructions in the next step.

Go to the SCRAPING CONSOLE and log in with the credentials from the first steps.
Click on the Projects button in the left menu and all created projects will appear. For this tutorial, we have the “Scrapa” project created in the previous steps.

Configure and start scraping automation

Click the “Details” button to inspect the datasource and scraping profile of this project:


As we can see, the datasource execution profile has not been configured yet.
An execution profile defines the automated scraping process: how many pages will be scraped, how many paginations are parsed, the number of credits available for this execution profile, and the scheduling options (it can be configured to run every n minutes, hours or days).

The execution profile configuration looks like this:

** Each type of datasource (listing / auto discovery / single page) has a different set of options available in the execution profile configuration.
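When filling in these options, it can help to estimate credit consumption up front. The short calculation below assumes, purely for illustration, that one credit corresponds to one scraped page; check your plan for the actual credit model:

    # Rough planning helper (assumption: 1 credit per scraped page).
    products_per_listing_page = 20   # target links mapped on each listing page
    listing_pages_per_run = 5        # paginations followed per execution
    runs_per_day = 4                 # e.g. scheduled to run every 6 hours

    pages_per_run = products_per_listing_page * listing_pages_per_run
    credits_per_day = pages_per_run * runs_per_day
    print(f"{pages_per_run} pages per run, about {credits_per_day} credits per day")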

After datasource creation and configuration we can start our automated scraping process.

In the screens presented below, for any available datasource you can launch scraping using the ‘Start’ button and then view the scraped data using the ‘Data’ button:

Scrape and export data

Now the data can be exported as JSON or CSV.
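An exported file can then be loaded into any script or tool for further processing. The sketch below assumes hypothetical file names (data.json / data.csv) and that the JSON export contains a list of records; adapt it to the file you actually download:

    # Load a ScrapeAll export for further processing.
    # File names and structure are assumptions; adjust to your actual export.
    import csv
    import json

    with open("data.json", encoding="utf-8") as f:
        records = json.load(f)          # assumed: a list of scraped records
    print(f"{len(records)} records in the JSON export")

    with open("data.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))  # one dict per scraped row
    print(f"{len(rows)} rows in the CSV export")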