Data mining has been in the news a lot lately, usually in connection with something bad. This has people wondering: what is data mining?

Data Mining Definition

Data mining is the process of finding anomalies, patterns, and correlations within large amounts of data to predict outcomes. This type of information can help businesses increase revenue, cut costs, improve customer satisfaction, and more.

Data mining draws on three scientific disciplines. There’s statistics, the study of numbers and data. Then there is artificial intelligence, which is human-like intelligence shown by software or machines. Finally, there’s machine learning, which uses algorithms that learn from data to make predictions.

The information it uncovers can include your age, where you live, your gender, what you like to read or watch, what you like to buy, what websites you’ve been to, and other personal details.

This is why Facebook has been in trouble recently. Consultants working on Donald Trump’s campaign exploited the personal data of millions of Facebook users.

Data Mining Techniques

A few basic techniques are used to mine data. Which technique a company uses depends on what it wants to learn. After all, not every company’s goal is the same.

Association

This is probably the best-known and most straightforward technique. You make a connection between two or more items and look for patterns.

An example would be knowing that a customer always buys milk and cookies together. This would suggest that they will want cookies the next time they buy milk.
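To make the milk and cookies example concrete, here is a minimal sketch in Python. It measures how strong the link between two items is using two common numbers: support (how often the items show up together) and confidence (how often the second item shows up when the first one does). The baskets and the rule_stats function are made up for illustration.

```python
def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    # Baskets that contain the first item, e.g. milk.
    with_antecedent = [b for b in baskets if antecedent in b]
    # Of those, the baskets that also contain the second item, e.g. cookies.
    with_both = [b for b in with_antecedent if consequent in b]
    support = len(with_both) / len(baskets)
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


# Hypothetical shopping baskets, one set of items per store visit.
baskets = [
    {"milk", "cookies"},
    {"milk", "cookies", "bread"},
    {"milk", "eggs"},
    {"bread", "eggs"},
]

support, confidence = rule_stats(baskets, "milk", "cookies")
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data, milk and cookies appear together in half of all baskets, and two out of every three baskets with milk also have cookies.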

Classification

You use this technique to build up a profile of a type of customer, item, or object by using various attributes to define a specific class.

For example, you might gather information to define a group of women in their 20s and 30s who like to buy a lot of clothes.
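One simple way to assign a new shopper to a class is to find the most similar shopper you already know and borrow their label. This is called a nearest-neighbor classifier. The sketch below assumes two made-up attributes (age and clothing purchases per month) and made-up labels. It is only one of many ways to do classification.

```python
from math import dist

# Hypothetical labeled shoppers: (age, clothing purchases per month) -> class.
training = [
    ((24, 9), "frequent clothes buyer"),
    ((31, 8), "frequent clothes buyer"),
    ((27, 1), "occasional buyer"),
    ((52, 2), "occasional buyer"),
]


def classify(shopper):
    """Return the class of the closest known shopper (1-nearest-neighbor)."""
    _, label = min(training, key=lambda row: dist(row[0], shopper))
    return label


print(classify((29, 7)))
print(classify((45, 1)))
```

A real system would use far more data and would scale the attributes so that one of them, such as age, does not dominate the distance.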

Clustering

By looking at one or more attributes or classes, you can group similar pieces of data together to reveal structure in the data. It’s used to identify a cluster of similar results.

The goal is to see where similarities and ranges overlap.

A recent study of four-digit PINs found that the first and second pairs of digits clustered in the ranges of one through twelve and one through thirty-one. This let the people doing the study see how the numbers can relate to dates like birthdays and anniversaries.
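Clustering can be sketched with k-means, a common algorithm that repeatedly assigns each value to its nearest group center and then moves each center to the middle of its group. The version below works on plain numbers. The purchase amounts and starting centers are made up, and this is not the method used in the PIN study.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on a list of numbers, starting from the given centroids."""
    centroids = list(centroids)
    groups = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Update step: move each centroid to the mean of its group.
        new_centroids = [
            sum(g) / len(g) if g else c
            for g, c in zip(groups, centroids)
        ]
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, groups


# Hypothetical purchase amounts in dollars.
amounts = [12, 15, 14, 90, 95, 88, 300, 310]
centers, groups = kmeans_1d(amounts, centroids=[0, 100, 400])
print(groups)
```

With this data the amounts fall into three groups: small, medium, and large purchases.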

Prediction

This is a broad topic that covers predicting the failure of machinery or its components, identifying fraud, and forecasting company profits. It can be used with other data mining techniques.

Prediction is a mix of analyzing trends, classification, pattern matching, and relation. By using these tools, you can make a prediction about an event.

This is how credit card companies can look for fraud. If you have an established purchasing history and then all of a sudden you are spending a lot of money in a way you never did before, the credit card company knows to call you. Then, depending on whether you made the purchase or not, they can cancel your card.
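A very rough sketch of the credit card idea: compare a new purchase with the customer’s past spending and flag it if it is far outside the usual range. The function name, the sample history, and the cutoff of three standard deviations are all assumptions for illustration. Real fraud systems look at many more signals than the amount.

```python
from statistics import mean, stdev


def is_suspicious(history, amount, threshold=3.0):
    """Flag a purchase whose amount is far from the customer's usual spending."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every past purchase was identical, so any other amount stands out.
        return amount != mu
    # How many standard deviations the new amount is from the average.
    return abs(amount - mu) / sigma > threshold


# Hypothetical purchase history in dollars.
history = [20, 25, 22, 30, 27, 24]
print(is_suspicious(history, 26))
print(is_suspicious(history, 900))
```

A $26 purchase fits the pattern and passes, while a $900 purchase is flagged for a follow-up call.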

Sequential Patterns

This technique is usually used with long-term data. It’s a method for identifying trends. Online shops sometimes use this method to suggest other items to purchase. If they notice that you’ve been buying a specific brand for a while, they will use that to try to make more money by suggesting other pieces from that brand.
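As a small illustration, the sketch below looks through several customers’ purchase histories, in order, and counts how often one item is bought at some point after another. Pairs that show up for enough customers are candidates for a "you might also like" suggestion. The histories and the frequent_followups function are made up.

```python
from collections import Counter


def frequent_followups(histories, min_count=2):
    """Count how often item B is bought at some point after item A."""
    counts = Counter()
    for history in histories:
        # Collect each ordered pair once per customer.
        seen_pairs = set()
        for i, first in enumerate(history):
            for later in history[i + 1:]:
                if first != later:
                    seen_pairs.add((first, later))
        counts.update(seen_pairs)
    # Keep only the pairs seen for at least min_count customers.
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical purchase histories, oldest purchase first.
histories = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["laptop", "mouse"],
    ["phone", "case"],
]
patterns = frequent_followups(histories)
print(patterns)
```

Here, two customers bought a case after a phone and two bought a charger after a phone, so those are the patterns a shop might act on.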

Decision Trees

This is closely related to classification and prediction. Decision trees can be used as part of the selection criteria or to support the use and selection of specific data within the overall structure.

You start with a question and then follow all the paths that the answers create. This goes on and on until a prediction can be made.
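Here is a minimal sketch of following a decision tree in Python. The tree is written by hand as nested dictionaries, and the questions and outcomes are made up. In practice, a tree like this is usually learned from data instead of typed in.

```python
# A hypothetical hand-written tree: each node asks a yes/no question,
# and each leaf is a final decision.
tree = {
    "question": "amount_over_500",
    "yes": {
        "question": "new_location",
        "yes": "call customer",
        "no": "approve",
    },
    "no": "approve",
}


def decide(node, answers):
    """Follow the answers down the tree until a leaf (a decision) is reached."""
    while isinstance(node, dict):
        branch = "yes" if answers[node["question"]] else "no"
        node = node[branch]
    return node


print(decide(tree, {"amount_over_500": True, "new_location": True}))
print(decide(tree, {"amount_over_500": False}))
```

Each answer picks a branch, and the walk stops as soon as it lands on a decision.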

Who is Using Data Mining?

Retailers, banks, manufacturers, telecommunication providers, insurers, and more are using data mining for their own individual gain.

Telecommunication providers and retailers are using it to see what customers have bought before so they can make custom ads, run targeted campaigns, and predict the future behavior of their customers.

Insurers use it to offer products more effectively and to set more competitive prices.

Manufacturers use it as an early detector of problems. They can predict wear on the products they make and estimate when those products will need to be fixed.

Banks use it to get a better view of market risks, detect fraud faster, and get the best return on their investments.

It’s Complicated, but Understandable

Data mining can seem overwhelming to the general public, and that’s because it is. But the techniques, and how companies get our data, are things we can easily understand. They get it from our social media, our reward cards, and our browsing history.

Everywhere we go we leave a digital footprint.