Machine Learning

Machine Learning (ML) is a subset of Artificial Intelligence (AI) that enables systems to learn and improve from experience without being explicitly programmed. Its algorithms are usually categorized as supervised or unsupervised. Supervised machine learning makes predictions or classifications based on known, labeled examples, while unsupervised machine learning relies solely on raw, unlabeled data.
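
The difference between the two categories can be illustrated with a minimal sketch in plain Python. The data values are invented for illustration, the function names (train_centroids, predict, two_means) are hypothetical, and nearest-centroid classification and two-cluster k-means are only simple stand-ins for supervised and unsupervised learning in general.

```python
from statistics import mean

def train_centroids(examples):
    """Supervised: learn one centroid per label from (value, label) pairs."""
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def predict(centroids, value):
    """Classify a new value by the nearest learned centroid."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two clusters (1-D k-means, k=2)."""
    low, high = min(values), max(values)  # initial cluster centers
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not near_high:  # all values identical: nothing to split
            break
        low, high = mean(near_low), mean(near_high)
    return sorted(near_low), sorted(near_high)

# Supervised: made-up transaction amounts with known labels.
labeled = [(12.0, "normal"), (15.5, "normal"), (9.9, "normal"),
           (950.0, "suspicious"), (1200.0, "suspicious")]
centroids = train_centroids(labeled)
label = predict(centroids, 1010.0)

# Unsupervised: the same kind of data, but with no labels at all.
unlabeled = [11.0, 14.0, 980.0, 10.5, 1100.0]
clusters = two_means(unlabeled)

print(label)     # suspicious
print(clusters)  # ([10.5, 11.0, 14.0], [980.0, 1100.0])
```

In the supervised half, the labels tell the model what "suspicious" means. In the unsupervised half, the algorithm only discovers that the values fall into two groups, and a person still has to interpret what those groups represent.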

Why is Machine Learning important?

Machine learning can uncover complex and hidden patterns in data, allowing it to identify insights that traditional analytics may miss. It excels at predictive modeling, enabling the forecasting of future outcomes based on historical data. Additionally, it is well-suited for tasks like natural language processing, enabling the understanding and generation of human language, which is beyond the scope of traditional analytics.

Examples of Machine Learning

Below are some common uses:

  • Personal assistants like Amazon Alexa and Apple Siri use ML to understand spoken instructions, apply historical learning, and perform actions.
  • Fraud detection uses machine learning to detect potentially fraudulent transactions.
  • Natural Language Processing (NLP) uses it to translate speech to text.
  • Social media applications include following feeds about a subject and inferring the sentiment of the conversations.
  • Platforms like LinkedIn use it to recommend authors of posts a user might be interested in or potential groups to join.
  • ML can monitor network traffic behavior to detect and intercept potential network intrusions.
  • Shopping sites use machine learning for recommendations based on past purchases and browsing history.
  • In healthcare, providers can gain insights from test results that point to potential issues and use machine learning to develop recommended treatments.
  • Editors can get image recommendations based on the content of their articles.

Machine Learning projects

There are multiple steps involved in an ML project, including the following:

  • Data selection and collection supply the core ingredient of a machine learning model. In general, the more relevant data points a model assesses, the more accurate its predictions tend to be. Traditional data analytics tends to require more data preparation. In contrast, machine learning models rely on large volumes of less refined raw data to search for insights and improve predictions.
  • Data preparation improves the datasets that machine learning models use. Practical preparation includes filtering out irrelevant content and outlying values and filling gaps.
  • The model selection step involves choosing the best algorithm for training the model.
  • Model training applies the selected algorithms to data sets using an iterative approach to tune prediction accuracy.
  • The model evaluation step tests output predictions against validation datasets or values to better understand the model’s accuracy.
  • The parameter tuning step adjusts the model to improve its efficacy.
  • The output from the project is a set of predictions.
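
The steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not a production workflow: the data is invented, the helper names (fill_gaps, drop_outliers, train, mse) are hypothetical, the selected model is assumed to be a one-variable linear regression trained by gradient descent, and the only parameter tuned is the learning rate.

```python
from statistics import median

def fill_gaps(rows):
    """Data preparation: fill a missing y with the average of its two
    neighbors (assumes gaps are isolated and not at either end)."""
    filled = []
    for i, (x, y) in enumerate(rows):
        if y is None:
            y = (rows[i - 1][1] + rows[i + 1][1]) / 2
        filled.append((x, y))
    return filled

def drop_outliers(rows, cutoff=5.0):
    """Data preparation: drop rows whose y lies more than `cutoff` median
    absolute deviations from the median (the cutoff is an assumption)."""
    ys = [y for _, y in rows]
    center = median(ys)
    mad = median([abs(y - center) for y in ys])
    return [(x, y) for x, y in rows if abs(y - center) <= cutoff * mad]

def train(rows, learning_rate, epochs=2000):
    """Model training: fit y = slope * x + intercept iteratively by batch
    gradient descent on the mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(rows)
    for _ in range(epochs):
        errors = [(slope * x + intercept - y, x) for x, y in rows]
        grad_slope = 2 * sum(e * x for e, x in errors) / n
        grad_intercept = 2 * sum(e for e, _ in errors) / n
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept

def mse(model, rows):
    """Model evaluation: mean squared error of the model on the given rows."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in rows) / len(rows)

# Data collection: made-up (x, y) pairs with one gap and one outlying value.
raw = [(0, 1.0), (1, 3.0), (2, None), (3, 7.0), (4, 9.0),
       (5, 500.0), (6, 13.0), (7, 15.0), (8, 17.0), (9, 19.0)]

clean = drop_outliers(fill_gaps(raw))
train_rows, validation_rows = clean[::2], clean[1::2]

# Parameter tuning: keep the learning rate with the lowest validation error.
best_rate, best_model, best_error = None, None, float("inf")
for rate in (0.001, 0.005, 0.02):
    model = train(train_rows, rate)
    error = mse(model, validation_rows)
    if error < best_error:
        best_rate, best_model, best_error = rate, model, error

# Output: a prediction for an unseen input.
prediction = best_model[0] * 10 + best_model[1]
print(best_rate, round(prediction, 3))
```

The validation rows are held out of training so that the evaluation and tuning steps measure how well the model generalizes rather than how well it memorizes the training data.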

Machine Learning tools

Accord.NET

Accord.NET provides ML libraries for audio and image processing. Algorithms supplied include numerical linear algebra, numerical optimization, statistics, artificial neural networks, and signal processing.

Amazon SageMaker

Amazon SageMaker is designed for AWS users to build and train ML models. It includes tools for ML operations, with a choice of tools to use in ML workflows.

Apache Spark MLlib

Apache Spark MLlib is an open-source distributed framework for machine learning, built on top of Spark Core. MLlib includes algorithms for regression, clustering, collaborative filtering, and decision trees.

Apache Mahout

Apache Mahout helps data scientists by providing algorithms for preprocessing, regression, clustering, recommenders, and distributed linear algebra. Java libraries are included for common math operations.

Azure Machine Learning Studio

Azure Machine Learning Studio is Microsoft’s competitor to Google Cloud AutoML. It includes a graphical UI to connect data with ML modules.

Caffe

Caffe (Convolutional Architecture for Fast Feature Embedding) is a tool that supports deep learning applications and includes C++ and Python APIs. Caffe is released under a BSD license.

Google Cloud AutoML

The Cloud AutoML platform provides pre-trained models to help users create services for text and speech recognition.

IBM Watson

IBM provides a web interface to Watson, which is especially strong in natural language processing.

Jupyter Notebook

Jupyter Notebook is very popular with data engineers and supports Julia, Python, and R.

Keras

Keras is used for creating deep learning models and distributing their training.

OpenNN

OpenNN implements neural networks, focusing on deep learning and predictive analysis.

Qwak

Qwak is a set of tools for ML model development with strengths in versioning and production testing.

Scikit-Learn

Scikit-Learn is a toolset for predictive data analysis and model selection. The library is available under a BSD software license.

RapidMiner

RapidMiner is focused on data science, with a suite of data mining, deployment, and model operations capabilities.

TensorFlow

TensorFlow is a free, open-source framework for building ML and neural network models. TensorFlow is used for natural language processing and image processing. Its JavaScript and Python libraries can execute code on both CPUs and GPUs.

The Actian Data Platform

The Actian Data Platform is a highly scalable data analytics platform with a rich feature set for ingesting, organizing, analyzing, and publishing data. Machine learning engineers and data scientists can easily automate data pipelines, connecting to data sources using predefined connectors and transforming data for their machine learning models.