Data Analysis Methods

As promised, this post covers each of the data analysis methods introduced in the last post. I hope it shows how important they are in today’s world.

Data Mining

Image source: ECMapping

Data mining is a method of data analysis for discovering patterns in large data sets using methods from statistics, artificial intelligence, machine learning and database systems. The goal is to transform raw data into understandable business information. Typical tasks include identifying groups of data records (also known as cluster analysis) and identifying anomalies and dependencies between data groups.

Applications of data mining:

  • Anomaly detection can process huge amounts of data (“big data”) and automatically identify outlier cases, possibly for exclusion from decision making or detection of fraud (e.g. bank fraud).
  • Learning customer purchase habits. Machine learning techniques can be used to model customer purchase habits and determine frequently bought items.
  • Clustering can identify previously unknown groups within the data.
  • Classification is used to automatically classify data entries into pre-specified bins. A common example is classifying email messages as “spam” or “not-spam” and having the system learn from the user.
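To make the anomaly detection idea concrete, here is a minimal sketch in Python. It assumes a simple rule that is not from this post: the modified z-score, which measures how far each value sits from the median relative to the median absolute deviation (MAD). The function name `find_outliers`, the threshold of 3.5 and the transaction amounts are all illustrative assumptions, not a production fraud detector.

```python
from statistics import median

def find_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    center = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores.
    return [v for v in values if abs(0.6745 * (v - center) / mad) > threshold]

# Hypothetical card transactions for one customer (made-up amounts).
transactions = [12.5, 14.0, 13.2, 11.8, 950.0, 12.9, 13.7]
suspicious = find_outliers(transactions)
print(suspicious)
```

The median is used instead of the mean because a single extreme value can drag the mean (and the standard deviation) far enough to hide itself in a small sample.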

Text Analytics

Image source: Datanami

Text analytics is the process of deriving useful information from text. It is accomplished by processing unstructured textual information, extracting meaningful numerical indices from it, and making the results available to statistical and machine learning algorithms for further processing.

The text mining process includes one or more of the following steps:

  1. Collecting information from various sources including web, file system, database, etc.
  2. Linguistic analysis including natural language processing.
  3. Pattern recognition (e.g. recognizing phone numbers, email addresses, etc.)
  4. Extracting summary information from the text, such as relative frequencies of the words, determining similarities between documents, etc.
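Steps 3 and 4 above can be sketched with Python's standard library. This is only an illustration: the email pattern is deliberately simplified (real addresses allow more variety), and the function names and sample strings are my own assumptions.

```python
import re
from collections import Counter

# Simplified pattern for illustration; it does not cover every valid address.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def extract_emails(text):
    """Pattern recognition: return every email-like string found in the text."""
    return EMAIL_PATTERN.findall(text)

def relative_frequencies(text):
    """Summary information: map each word to its share of all words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

emails = extract_emails("Call us or write to help@example.com and billing@example.org.")
print(emails)

freqs = relative_frequencies("The cat sat on the mat")
print(freqs["the"])  # "the" is 2 of the 6 words
```

Relative frequencies like these are one example of the numerical indices that can be handed to statistical or machine learning algorithms.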

Examples of text analytics applications:

  • Analyzing open-ended survey responses. These surveys are of an exploratory nature and include open-ended questions related to the topic in question. The respondents can then express their views without being constrained to a particular response format.
  • Analyzing emails, documents, etc. to filter out “junk”. This also includes automatic classification of messages into pre-defined bins for routing to different departments.
  • Investigating competitors by crawling their websites. This could be used to derive information about competitors’ activities.
  • Security applications which can process log files for intrusion detection.
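As a rough sketch of the log file example, the snippet below counts failed logins per IP address and reports the addresses that reach a limit. The log line format, the `FAILED_LOGIN` marker, the function name and the limit of three are all assumptions made for illustration; real log formats differ from system to system, and real intrusion detection looks at far more than one signal.

```python
import re
from collections import Counter

# Assumed line format: "<date> <time> FAILED_LOGIN user=<name> ip=<IPv4>"
FAILED_LOGIN = re.compile(r"FAILED_LOGIN .*ip=(\d{1,3}(?:\.\d{1,3}){3})")

def suspicious_ips(log_lines, min_failures=3):
    """Return, sorted, the IPs with at least min_failures failed logins."""
    failures = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group(1)] += 1
    return sorted(ip for ip, count in failures.items() if count >= min_failures)

# Hypothetical log excerpt using documentation-only IP ranges.
log = [
    "2024-01-05 10:00:01 FAILED_LOGIN user=admin ip=203.0.113.7",
    "2024-01-05 10:00:03 FAILED_LOGIN user=admin ip=203.0.113.7",
    "2024-01-05 10:00:04 FAILED_LOGIN user=alice ip=198.51.100.4",
    "2024-01-05 10:00:06 FAILED_LOGIN user=root ip=203.0.113.7",
    "2024-01-05 10:00:09 LOGIN_OK user=alice ip=198.51.100.4",
]
flagged = suspicious_ips(log)
print(flagged)
```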

Business Intelligence

Image source: NMind

Business intelligence transforms data into actionable intelligence for business purposes and may be used in an organization’s strategic and tactical business decision making. It offers a way for people to examine trends in collected data and derive insights from them.

Some examples of business intelligence in use today:

  • An organization’s operating decisions such as product placement and pricing.
  • Identifying new markets, assessing the demand and suitability of products for different market segments.
  • Budgeting and rolling forecasts.
  • Using visual tools such as heat maps, pivot tables and geographical mapping.
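A pivot table is, at its core, a grouped sum over two fields. Here is a minimal sketch using only the standard library; the function name `pivot_sum`, the field names and the sales figures are made up for illustration, and a spreadsheet or BI tool would do the same thing with far more options.

```python
from collections import defaultdict

def pivot_sum(records, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) combination."""
    table = defaultdict(lambda: defaultdict(float))
    for record in records:
        table[record[row_key]][record[col_key]] += record[value_key]
    # Convert to plain dicts so the result prints and compares cleanly.
    return {row: dict(cols) for row, cols in table.items()}

# Hypothetical sales records.
sales = [
    {"region": "North", "quarter": "Q1", "revenue": 120.0},
    {"region": "North", "quarter": "Q2", "revenue": 135.0},
    {"region": "South", "quarter": "Q1", "revenue": 90.0},
    {"region": "North", "quarter": "Q1", "revenue": 30.0},
]
pivot = pivot_sum(sales, "region", "quarter", "revenue")
for region, by_quarter in pivot.items():
    print(region, by_quarter)
```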

Data Visualization

Image source: BoostLabs

Data visualization is simply the visual representation of data. In the context of data analysis, it means using the tools of statistics, probability, pivot tables and other artifacts to present data visually. It makes complex data easier to understand and use.

Increasing amounts of data are being generated by sensors in the environment (referred to as the “Internet of Things” or “IoT”). Data at this scale (often referred to as “big data”) is hard to interpret, and data visualization tools can ease that.

Data visualization is used in the following applications:

  • Extracting summary data from the raw data of IoT.
  • Using a bar chart to represent sales performance over several quarters.
  • Using a histogram to show the distribution of a variable such as income by dividing its range into bins.
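The binning behind a histogram is easy to sketch in plain Python. The version below assumes equal-width bins and prints a text-only chart; the function name, the number of bins and the income figures are illustrative assumptions, and a plotting library would normally draw the bars for you.

```python
def histogram(values, bin_count):
    """Split the range of values into equal-width bins and count each bin."""
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("all values are equal, so there is no range to split")
    width = (high - low) / bin_count
    counts = [0] * bin_count
    for v in values:
        # The maximum value would land one past the end, so clamp it
        # into the last bin.
        index = min(int((v - low) / width), bin_count - 1)
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, counts[i])
            for i in range(bin_count)]

# Hypothetical yearly incomes, in thousands.
incomes = [20, 25, 31, 38, 42, 47, 55, 63, 78, 100]
bins = histogram(incomes, 4)
for start, end, count in bins:
    print(f"{start:>5.0f} to {end:<5.0f} {'#' * count}")
```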

The visualization of Google datasets is a great example of how big data can visually guide decision-making.

Data Analysis in Review

Data analysis is used to evaluate data with statistical tools to discover useful information. A variety of methods are used for this purpose, including data mining, text analytics, business intelligence, combining data sets, and data visualization.

The Power Query tool in Microsoft Excel is especially helpful for data analysis. If you want to familiarize yourself with it, read the MakeUseOf guide to creating your first Microsoft Power Query script.

In the next posts I’ll try to introduce some practical data analysis. Until the next post.