OpenRefine: Data Analysis Without Mess

Introduction

Messy data is useless data: that is the first thought that comes to mind when you encounter chaotic, sloppy, and irregular content. Extra spaces, duplicate records, misspellings, and unstructured formats need to be cleaned up before a proper data analysis can start. Cleaning “dirty” material is not what most people picture when they think of a data analyst’s job, yet it often takes up the largest share of analysis time. That’s why preparing the data first is critical. In this article, we’ll learn about a powerful tool called OpenRefine and how it can help with data cleaning.

About OpenRefine

OpenRefine, formerly Google Refine, is a robust desktop application that helps fix inconsistencies in your extracted content and transform it into a clean, near-perfect source for further analysis.

OpenRefine has the following features:

  • Standardizing datasets
  • Reorganizing data columns
  • Faceting and Clustering
  • Operations Tracking
  • Data Exporting

Today there are many OpenRefine alternatives for content transformation, but as a rule, they are pretty expensive. OpenRefine, on the other hand, is a free and open-source tool that can do more than clean up your content. It also enables you to:

  • Have a complete overview of the material
  • Work with large datasets of up to 100,000 rows
  • Eliminate inconsistencies
  • Split data into parts
  • Enrich your content using other sources

While using OpenRefine, keep in mind the following key points:

  • You do not need an internet connection, even though OpenRefine runs in your default web browser
  • None of the content you enter in OpenRefine is sent to a remote server
  • Your original content is not modified
  • All your actions can be undone
  • All your actions are automatically tracked
  • Your project is automatically saved

But that’s not all: additional services become available once you install OpenRefine extensions. OpenRefine also has plenty of tutorials and a large community ready to help with any question, from the simplest to the toughest.

How to use OpenRefine

Installing OpenRefine

To install OpenRefine on your computer, follow these steps:

  1. Go to http://openrefine.org/download.html, then download and install OpenRefine.
  2. Run it by clicking on openrefine.exe (on Windows).
  3. Wait until it opens in Firefox or Chrome (make sure that one of these browsers is set as your default).
  4. If OpenRefine does not open automatically, point your web browser at http://127.0.0.1:3333/ or http://localhost:3333/. If the page still does not load, run OpenRefine again.
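
If the browser does not open on its own, it can help to check whether anything is answering at the local address at all. The following is a minimal Python sketch, assuming the default address from step 4; the function name is_openrefine_running is our own illustration and not part of OpenRefine.

```python
import urllib.error
import urllib.request

# Default local address from step 4 above.
OPENREFINE_URL = "http://127.0.0.1:3333/"


def is_openrefine_running(url: str = OPENREFINE_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url`, False otherwise.

    Hypothetical helper: it only checks that something responds,
    not that the responder is actually OpenRefine.
    """
    # An empty ProxyHandler makes sure a local address is never sent via a proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    if is_openrefine_running():
        print("Something is answering at", OPENREFINE_URL)
    else:
        print("Nothing answered at", OPENREFINE_URL)
```

If the check returns False, OpenRefine is most likely not started yet or is listening on a different port.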

Creating a project

Let’s start with creating a project with OpenRefine.

  1. Prepare a .csv file with content to be cleaned
  2. Run OpenRefine
  3. Select the Create Project tab
  4. Click on the Choose Files button and select the file.
  5. Click on Next to begin the uploading.
  6. When the file is uploaded, a preview screen appears where you can see the content to be cleaned.
  7. Change the settings available at the bottom of the screen according to your needs.
  8. Name the project and click on Create Project to finish importing.
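
If you do not have a file at hand for step 1, a small practice file is enough to try the workflow. The sketch below builds one with Python's csv module; the rows, the column names, and the file name messy_sample.csv are invented for illustration and contain the typical problems mentioned earlier (stray spaces, inconsistent case, duplicates).

```python
import csv
import io

# Hypothetical sample rows: stray spaces, mixed case, and duplicate records.
SAMPLE_ROWS = [
    ["name", "city"],
    ["  john smith", "New York"],
    ["John Smith ", "new york"],
    ["JANE DOE", "Boston"],
    ["Jane Doe", "Boston "],
]


def sample_csv_text() -> str:
    """Return the sample rows serialized as CSV text."""
    # newline="" keeps the line endings exactly as the csv module writes them.
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(SAMPLE_ROWS)
    return buffer.getvalue()


def write_sample_csv(path: str) -> None:
    """Write the sample CSV text to `path` as a UTF-8 file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write(sample_csv_text())
```

Calling write_sample_csv("messy_sample.csv") produces a file you can select with the Choose Files button.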

Parsing data

As you continue working with the data, you can use the following options by clicking the arrow on the required column:

  • Select Facet to group the values in a column for sorting, inspecting, and editing.
  • Select Facet -> Custom Text Facet to split column data without creating new columns.
  • Select Edit cells -> Common transforms -> Trim leading and trailing white space to remove white spaces.
  • Select Edit cells -> Common transforms -> To Titlecase to change the case in names.
  • Select Edit cells -> Cluster and edit… to find values that might be variations of the same thing.
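
To get a feel for what these menu options do, here is a rough Python sketch of the same ideas. It is not OpenRefine's code: all function names are our own, and fingerprint is only a simplified approximation, loosely modeled on the key-collision approach that the OpenRefine documentation describes for clustering.

```python
import string
import unicodedata
from collections import Counter, defaultdict


def trim(value: str) -> str:
    """Remove leading and trailing whitespace."""
    return value.strip()


def to_titlecase(value: str) -> str:
    """Capitalize the first letter of each space-separated word."""
    return " ".join(word.capitalize() for word in value.split(" "))


def text_facet(values):
    """Count how often each distinct value occurs, most common first."""
    return Counter(values).most_common()


def fingerprint(value: str) -> str:
    """Simplified key: values sharing a key are candidate duplicates."""
    # Fold accented characters to their closest ASCII form.
    ascii_value = (
        unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    )
    # Trim, lowercase, and drop punctuation.
    cleaned = ascii_value.strip().lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    # Sort and de-duplicate the words so that word order does not matter.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group distinct values by fingerprint; keep groups with 2 or more members."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    names = ["  john smith", "John Smith ", "Smith, John", "JANE DOE", "Jane Doe"]
    print(text_facet([to_titlecase(trim(name)) for name in names]))
    print(cluster(names))
```

In this sketch, trimming and title-casing fix the simple cases, while the cluster step also catches "Smith, John" as a likely variant of "John Smith" because both reduce to the same key.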

Exporting data

Once parsing is finished, export the cleaned dataset:

  1. Click on the Export button.
  2. Select the required format.
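
After exporting, a quick sanity check can confirm that the cleanup worked. The sketch below assumes you exported to a comma-separated file with a header row; the function name find_untrimmed_cells and the file name cleaned.csv are hypothetical.

```python
import csv
import io


def find_untrimmed_cells(csv_text: str):
    """Return (row_number, column_name, value) for cells with stray outer whitespace."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        for column, value in row.items():
            # Skip missing cells and surplus fields, which are not plain strings.
            if isinstance(value, str) and value != value.strip():
                problems.append((row_number, column, value))
    return problems
```

To run it on an exported file, read the file first, for example with open("cleaned.csv", newline="", encoding="utf-8").read(), and pass the text to the function. An empty list means no cell has leading or trailing spaces.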

Conclusion

Whether you’re dealing with empty, unstructured, or incorrect content, you need extra tools to clean up and enhance data for accurate analysis. At DataOx, we’ll help you get rid of “junk” data and build a culture of making business development decisions based only on high-quality insights. Schedule a free consultation with our expert to learn more about our services and technologies.
