What Are Data Extraction, Data Parsing, and Data Mining?
Data extraction is the process of retrieving or copying information from a source such as a database, web page, or other document. The source can be digital, such as an online document, or physical, such as a paper document that has to be digitized.
For instance, say an electric company mails you an electricity bill. If you want to work with it on your computer, you must first scan it and save it in a usable digital format, such as PDF or .doc. In this example, scanning is the data extraction step.
Similarly, if you want to work with information from a website, you must copy it into a usable format, such as Excel, Word, or JSON. In that case, copying is the data extraction step.
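As a rough illustration of the website case, the sketch below pulls the visible text out of a small HTML page and saves it as JSON, using only the Python standard library. The sample page, the `TextExtractor` class, and the `extract_text` function are made up for this example; a real page would usually be downloaded first and would need more careful handling.

```python
import json
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the non-empty text fragments found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by HTMLParser for every run of text between tags.
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(html: str) -> list[str]:
    """Return the text fragments of `html` in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical page content, standing in for a downloaded web page.
SAMPLE_HTML = (
    "<html><body><h1>Electricity bill</h1>"
    "<p>Amount due: $84.20</p></body></html>"
)

extracted = extract_text(SAMPLE_HTML)
print(json.dumps({"text": extracted}, indent=2))
```

At this stage the data has only been copied out of the page. Nothing has been interpreted yet.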
Data parsing is the process of analyzing extracted data to turn it into a form suited to a particular purpose. Typical goals include structuring unstructured text from a picture or PDF so it can be used in Word or Excel, or analyzing text to understand its meaning (data mining).
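The structuring goal can be sketched with a few regular expressions. The example below assumes the text of a scanned bill has already been extracted. The bill layout, the field names, and the `parse_bill` function are invented for illustration, and real documents usually need more tolerant patterns.

```python
import re

# Hypothetical text, as it might come out of a scanned electricity bill.
RAW_TEXT = """Account: 12345678
Billing period: 2024-01-01 to 2024-01-31
Amount due: $84.20"""


def parse_bill(text: str) -> dict:
    """Turn free-form bill text into a dictionary of named fields."""
    fields = {}

    account = re.search(r"Account:\s*(\d+)", text)
    if account:
        fields["account"] = account.group(1)

    period = re.search(r"Billing period:\s*(\S+) to (\S+)", text)
    if period:
        fields["period_start"], fields["period_end"] = period.groups()

    amount = re.search(r"Amount due:\s*\$([\d.]+)", text)
    if amount:
        fields["amount_due"] = float(amount.group(1))

    return fields


print(parse_bill(RAW_TEXT))
```

The resulting dictionary maps directly onto a spreadsheet row, which is the kind of structured form a tool like Excel can work with.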
Data mining is one kind of data parsing. Its goal is to analyze text and pictures and uncover meaning, including patterns that are not obvious at first glance. Data mining often relies on machine learning, including deep learning.
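A toy sketch of the idea: counting which terms recur across a handful of texts can surface a pattern that no single text states outright. This is plain word counting, not machine learning, and the sample texts, stop-word list, and `top_terms` function are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical customer messages, standing in for a larger text collection.
REVIEWS = [
    "The bill was late and the bill was wrong",
    "Late delivery of the bill again",
    "Support fixed the wrong amount quickly",
]

# A tiny, hand-picked list of words to ignore.
STOP_WORDS = {"the", "and", "was", "of", "a"}


def top_terms(docs, n=3):
    """Return the n most frequent non-stop-words across all documents."""
    counts = Counter()
    for doc in docs:
        words = re.findall(r"[a-z]+", doc.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


print(top_terms(REVIEWS))
```

Here the recurring terms point to billing problems as a common theme across the messages. Real data mining applies statistical or machine learning methods to far larger data sets.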