
What it is and why it matters

Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.

Evolution of machine learning

Because of new computing technologies, machine learning today is not like machine learning of the past. It was born from pattern recognition and the theory that computers can learn without being programmed to perform specific tasks; researchers interested in artificial intelligence wanted to see if computers could learn from data. The iterative aspect of machine learning is important because as models are exposed to new data, they are able to independently adapt. They learn from previous computations to produce reliable, repeatable decisions and results. It’s a science that’s not new – but one that has gained fresh momentum.

While many machine learning algorithms have been around for a long time, the ability to automatically apply complex mathematical calculations to big data – over and over, faster and faster – is a recent development. Here are a few widely publicized examples of machine learning applications you may be familiar with:

  • The heavily hyped, self-driving Google car? The essence of machine learning.
  • Online recommendation offers such as those from Amazon and Netflix? Machine learning applications for everyday life.
  • Knowing what customers are saying about you on Twitter? Machine learning combined with linguistic rule creation.
  • Fraud detection? One of the more obvious, important uses in our world today.

Machine Learning and Artificial Intelligence

While artificial intelligence (AI) is the broad science of mimicking human abilities, machine learning is a specific subset of AI that trains a machine how to learn. Watch this video to better understand the relationship between AI and machine learning. You'll see how these two technologies work, with useful examples and a few funny asides.


Two of the most widely adopted machine learning methods are supervised learning and unsupervised learning – but other methods exist as well. Here's an overview of the most popular types.

Supervised learning algorithms are trained using labeled examples, such as an input where the desired output is known. For example, a piece of equipment could have data points labeled either “F” (failed) or “R” (runs). The learning algorithm receives a set of inputs along with the corresponding correct outputs, and the algorithm learns by comparing its actual output with correct outputs to find errors. It then modifies the model accordingly. Through methods like classification, regression, prediction and gradient boosting, supervised learning uses patterns to predict the values of the label on additional unlabeled data. Supervised learning is commonly used in applications where historical data predicts likely future events. For example, it can anticipate when credit card transactions are likely to be fraudulent or which insurance customer is likely to file a claim.
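To make the supervised setup concrete, here is a minimal sketch in Python. It assumes scikit-learn, and the sensor readings and "F"/"R" labels are invented to mirror the equipment example above; gradient boosting stands in for any of the supervised methods named.

```python
# Minimal supervised-learning sketch. The features and labels are made up
# to mirror the equipment example above ("F" = failed, "R" = runs).
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Hypothetical sensor readings (temperature, vibration) with known outcomes.
X = [[85, 0.9], [90, 1.1], [40, 0.2], [42, 0.3], [88, 1.0],
     [38, 0.25], [95, 1.3], [45, 0.35], [92, 1.2], [41, 0.28]]
y = ["F", "F", "R", "R", "F", "R", "F", "R", "F", "R"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Gradient boosting learns by comparing its predictions with the known
# labels and correcting its errors, as described above.
model = GradientBoostingClassifier().fit(X_train, y_train)
print(model.predict(X_test))        # predicted "F"/"R" for unseen equipment
print(model.score(X_test, y_test))  # accuracy against the held-out labels
```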

Unsupervised learning is used against data that has no historical labels. The system is not told the "right answer." The algorithm must figure out what is being shown. The goal is to explore the data and find some structure within. Unsupervised learning works well on transactional data. For example, it can identify segments of customers with similar attributes who can then be treated similarly in marketing campaigns. Or it can find the main attributes that separate customer segments from each other. Popular techniques include self-organizing maps, nearest-neighbor mapping, k-means clustering and singular value decomposition. These algorithms are also used to segment text topics, recommend items and identify data outliers.
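A short illustration of the unsupervised case, again assuming scikit-learn: k-means clustering groups customers by similarity without ever being told the "right answer." The spend and visit figures are invented for the sketch.

```python
# Minimal unsupervised-learning sketch: k-means finds customer segments
# in unlabeled data. The numbers are invented for illustration.
import numpy as np
from sklearn.cluster import KMeans

# Each row is a customer: [annual spend, visits per month] -- no labels given.
customers = np.array([[200, 1], [220, 2], [150, 1],
                      [5000, 12], [4800, 10], [5200, 11],
                      [900, 4], [1000, 5], [950, 4]])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)           # segment assigned to each customer
print(kmeans.cluster_centers_)  # a "typical" customer for each segment
```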

Semisupervised learning is used for the same applications as supervised learning. But it uses both labeled and unlabeled data for training – typically a small amount of labeled data with a large amount of unlabeled data (because unlabeled data is less expensive and takes less effort to acquire). This type of learning can be used with methods such as classification, regression and prediction. Semisupervised learning is useful when the cost associated with labeling is too high to allow for a fully labeled training process. Early examples of this include identifying a person's face on a web cam.
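A compact sketch of the semisupervised idea, using scikit-learn's LabelSpreading as one possible learner: most training labels are marked unknown (-1), and the algorithm propagates the few known labels to the rest. The data is invented.

```python
# Minimal semisupervised sketch: a few labeled points plus many unlabeled
# ones (-1 means "label unknown"), as described above.
import numpy as np
from sklearn.semi_supervised import LabelSpreading

X = np.array([[1.0], [1.2], [0.9], [1.1], [5.0], [5.2], [4.8], [5.1]])
y = np.array([0, -1, -1, -1, 1, -1, -1, -1])  # only two examples are labeled

model = LabelSpreading().fit(X, y)
print(model.transduction_)              # labels inferred for every training point
print(model.predict([[1.05], [4.95]]))  # predictions for new, unseen inputs
```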

Reinforcement learning is often used for robotics, gaming and navigation. With reinforcement learning, the algorithm discovers through trial and error which actions yield the greatest rewards. This type of learning has three primary components: the agent (the learner or decision maker), the environment (everything the agent interacts with) and actions (what the agent can do). The objective is for the agent to choose actions that maximize the expected reward over a given amount of time. The agent will reach the goal much faster by following a good policy. So the goal in reinforcement learning is to learn the best policy.
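The sketch below shows those three components in miniature with tabular Q-learning on a made-up five-state corridor: the agent tries actions, the environment returns rewards, and a policy emerges from trial and error. It illustrates the general idea only, not any particular production system.

```python
# Minimal reinforcement-learning sketch: tabular Q-learning on a tiny,
# invented corridor environment (reach the rightmost state to get a reward).
import random

n_states, actions = 5, [0, 1]               # 0 = move left, 1 = move right
q = [[0.0, 0.0] for _ in range(n_states)]   # Q-table: value of each action per state
alpha, gamma, epsilon = 0.5, 0.9, 0.2       # learning rate, discount, exploration

def step(state, action):
    """Environment: reaching the rightmost state pays +1, everything else 0."""
    nxt = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward, nxt == n_states - 1

for _ in range(500):                        # episodes of trial and error
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly follow the current policy, sometimes explore.
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = q[state].index(max(q[state]))
        nxt, reward, done = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        q[state][action] += alpha * (reward + gamma * max(q[nxt]) - q[state][action])
        state = nxt

print([row.index(max(row)) for row in q])   # learned policy: best action per state
```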

Humans can typically create one or two good models a week; machine learning can create thousands of models a week.

Thomas H. Davenport, Analytics thought leader
excerpt from The Wall Street Journal

What are data mining patterns?

Pattern mining concentrates on identifying rules that describe specific patterns within the data. Market-basket analysis, which identifies items that typically occur together in purchase transactions, was one of the first applications of data mining.
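As a concrete illustration of market-basket analysis, the sketch below counts how often item pairs occur together in a handful of invented transactions and reports two standard rule measures, support and confidence.

```python
# Minimal market-basket sketch: count item-pair co-occurrences and report
# support and confidence. The transactions are invented for illustration.
from collections import Counter
from itertools import combinations

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk"},
    {"bread", "butter", "eggs"},
]

item_counts = Counter(item for t in transactions for item in t)
pair_counts = Counter(frozenset(p) for t in transactions
                      for p in combinations(sorted(t), 2))

n = len(transactions)
for pair, count in pair_counts.most_common(3):
    a, b = sorted(pair)
    support = count / n                  # how often the pair occurs at all
    confidence = count / item_counts[a]  # P(b in basket | a in basket)
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```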

What is data mining in research?

Data mining is the process of sorting through large data sets to identify patterns and relationships that can help solve business problems through data analysis. Data mining techniques and tools enable enterprises to predict future trends and make more-informed business decisions.

What is data mining marketing?

Data mining is used to explore increasingly large databases and to improve market segmentation. By analyzing the relationships between parameters such as customer age, gender and tastes, it is possible to anticipate customer behavior and direct personalized loyalty campaigns.

What is data mining and data warehousing?

Data warehousing is a method of organizing and compiling data into one database, whereas data mining deals with fetching important data from databases. Data mining surfaces meaningful patterns from the data compiled in the data warehouse.