How to Extract Product Data from Alibaba.com Using the Open-Source Framework Scrapy

Introduction

Collecting product data from large e-commerce websites like Alibaba gives you a great opportunity for comprehensive competitive research, market analysis, and price comparison. Alibaba is one of the leading e-commerce portals, with an enormous product catalog. However, extracting the Alibaba data you need is a real challenge if you are not familiar with web scraping. If you do have some coding skills, read on to learn how to extract Alibaba product data with Scrapy, one of the most widely used open-source frameworks for web scraping.


3 Reasons to Scrape Alibaba.com

Data extracted from e-commerce websites can help businesses both in e-commerce and beyond it. Keep reading to learn the three main reasons to scrape data from Alibaba.


Cataloging and listing

For any e-commerce business, listing and cataloging competitors’ products is essential. Without an up-to-date and comprehensive product list, it is very hard to compete in the e-commerce market. With an Alibaba scraper, you can collect Alibaba product information and build your own product list based on your target audience’s demands and preferences, or even create a new product category.

Analyzing data

To do thorough market research, companies try to get insights from buyer feedback such as ratings and reviews. This user-generated content gives you a clear picture of how customers perceive a particular product or brand. You can use it to improve your current products, launch new ones, and build a positive brand reputation.

Comparing prices

Alibaba is well known for its affordable prices, which makes its price data valuable for price comparison and optimization. Many e-commerce businesses track product prices, and Alibaba is often one of the first sources worth tracking. So, if you want to know market prices and optimize your pricing strategy, start by scraping Alibaba!

How to Create an Alibaba Crawler

Scrapy, written in Python, is one of the most efficient free frameworks for web scraping. It lets you extract, process, and store information in a structured format, and it is well suited to crawlers that collect details from many pages. Let’s see how to use it to scrape data from the leading marketplace.

Getting started

To create an Alibaba crawler, you need Python 3 and pip (the Python package installer).

To install the necessary packages, Scrapy and Selectorlib, run the following command:

pip3 install scrapy selectorlib

Creating Alibaba Scrapy project

The next step is to create a Scrapy project. The following command creates a folder named scrapy_alibaba containing all the necessary project files:

scrapy startproject scrapy_alibaba

Creating the crawler

Scrapy has a built-in command, genspider, that generates a basic spider template. To see its options, run:

scrapy genspider -h

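To generate our crawler, run the command below from inside the project folder. It creates the file spiders/alibaba_crawler.py (Scrapy refuses to create a spider with the same name as the project, so the spider cannot be called scrapy_alibaba):

scrapy genspider alibaba_crawler alibaba.com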

The generated file contains a minimal spider: a class with a name, the allowed domains, the start URLs, and an empty parse() callback.

Extracting Product Data from Alibaba

In this example, we extract the following fields from the Alibaba search results for earphones (https://www.alibaba.com/trade/search?fsb=y&IndexArea=product_en&CatId=&SearchText=earphones&viewtype=G):

  • Name of the product
  • Price
  • Image
  • Link to the product
  • Minimum number of orders
  • Name of the seller
  • The response rate of the seller
  • Number of years as a seller on Alibaba
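Only the SearchText parameter of the search URL above depends on the keyword, so the URL can be built for any search term. The sketch below does this with Python’s standard library and adds a hypothetical record type for the eight fields; the function name, class name, and field names are our own illustrative choices, not part of Scrapy, Selectorlib, or Alibaba.

```python
from dataclasses import dataclass, fields
from urllib.parse import urlencode


def build_search_url(keyword):
    """Rebuild the Alibaba search URL shown above for any keyword."""
    params = {
        "fsb": "y",
        "IndexArea": "product_en",
        "CatId": "",
        "SearchText": keyword,  # urlencode escapes it, e.g. spaces become "+"
        "viewtype": "G",
    }
    return "https://www.alibaba.com/trade/search?" + urlencode(params)


@dataclass
class AlibabaProduct:
    """One search result; the field names are illustrative."""
    name: str
    price: str  # kept as the raw text shown on the page
    image: str
    link: str
    min_order: str
    seller_name: str
    seller_response_rate: str
    seller_years: str


# Handy as a CSV header or column order when exporting
FIELD_NAMES = [f.name for f in fields(AlibabaProduct)]

print(build_search_url("earphones"))
```

Keeping every field as a string mirrors what a scraper actually receives; converting prices or rates to numbers is a separate cleaning step.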

To extract the required data from Alibaba, we go through the following three steps:

  1. Create a Selectorlib template
  2. Create a keywords file
  3. Export the data in the required format

Creating a Selectorlib template for Alibaba

Selectorlib is a Chrome extension that lets you highlight the data you need on a page and generates the CSS selectors or XPaths to extract it. The companion selectorlib Python package then applies the exported template to HTML. To learn more about Selectorlib, see https://selectorlib.com/getting-started.html

Using the Selectorlib extension, we marked each of the fields listed above on the Alibaba search results page.


When you have marked all the required fields, click the Export button to download the YAML file, and save it as search_results.yml in a folder named resources.

Reading keywords

Next, we set up the Alibaba crawler to read search keywords from a file in the same resources folder. Create a CSV file named keywords.csv there; the crawler reads it with Python’s built-in csv module.
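A minimal sketch of the keyword reader, assuming keywords.csv has a header row with a single column named keyword (the column name is our assumption). Separating the parsing from the file handling keeps the logic easy to test without touching the disk.

```python
import csv


def parse_keywords(lines):
    """Return keywords from CSV lines that have a header column named "keyword".

    `lines` may be an open file or any iterable of strings.
    """
    keywords = []
    for row in csv.DictReader(lines):
        keyword = (row.get("keyword") or "").strip()
        if keyword:  # skip empty cells
            keywords.append(keyword)
    return keywords


def read_keywords(path):
    """Read keywords from a CSV file such as resources/keywords.csv."""
    with open(path, newline="", encoding="utf-8") as csv_file:
        return parse_keywords(csv_file)


# Example file contents (hypothetical):
#   keyword
#   earphones
#   wireless headphones
print(parse_keywords(["keyword\n", "earphones\n", "wireless headphones\n"]))
```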

Exporting data into CSV or JSON

Scrapy has built-in exporters for JSON and CSV. To save the extracted data, pass an output file to the crawl command; Scrapy picks the format from the file extension:

scrapy crawl alibaba_crawler -o alibaba.csv
scrapy crawl alibaba_crawler -o alibaba.json

The output file is written relative to the directory from which you run the command.
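As an alternative to passing -o on every run, Scrapy 2.1 and later can read export targets from the FEEDS setting. Below is a sketch for the project’s settings.py, assuming you want both formats on every crawl; the file names and the FEED_EXPORT_FIELDS names are illustrative and should match the keys your Selectorlib template produces.

```python
# settings.py (excerpt): write every crawl to both CSV and JSON.
# FEEDS requires Scrapy 2.1 or later.
FEEDS = {
    "alibaba.csv": {"format": "csv", "encoding": "utf8"},
    # utf8 keeps non-ASCII text readable instead of \uXXXX escapes
    "alibaba.json": {"format": "json", "encoding": "utf8"},
}

# Optional: export only these fields, in this order (hypothetical names)
FEED_EXPORT_FIELDS = ["name", "price", "link"]
```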


The full code for the Alibaba crawler is available on GitHub (code source: https://github.com/scrapehero/alibaba-scraper/blob/master/scrapy_alibaba/spiders/alibaba_crawler.py).

Final Thoughts

To sum up, building an Alibaba crawler is not an easy task. If you decide to outsource Alibaba product data extraction to a dedicated web scraping service, a provider like DataOx can take the complexities of web crawling off your hands.

Schedule a free consultation with our expert to see the full list of our web scraping services and learn how DataOx can help you scrape Alibaba data at scale.
