
What is Dimension in Machine Learning (ML)?

 

This is the complete guide to understanding dimensions in machine learning.

(New to ML? Read our Machine Learning Introduction.)

 Dimension in Machine Learning

The number of input variables (feature columns) in a dataset is called its dimension in machine learning.

Example: Salary of employees based on designation and years of experience.

 

Emp_num | Designation | Years_of_experience | Salary in 1000$
51 | Software Engineer | 2 | 15
108 | Software Developer | 5 | 45
67 | Software Tester | 4 | 28
89 | Data Analyst | 5 | 50


Here, there are 3 features (input variables), so the dimension is 3 in this case. In the above example, Emp_num, Designation, and Years_of_experience are the feature columns, and Salary in 1000$ is the label (output) column.
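As a minimal sketch of the counting rule above, the snippet below stores the example table as a list of Python dictionaries and counts every column except the label. The variable names (`rows`, `label_column`, `feature_columns`) are assumptions made for illustration, not part of any library.

```python
# Rows from the example table; keys follow the table header.
rows = [
    {"Emp_num": 51, "Designation": "Software Engineer", "Years_of_experience": 2, "Salary in 1000$": 15},
    {"Emp_num": 108, "Designation": "Software Developer", "Years_of_experience": 5, "Salary in 1000$": 45},
    {"Emp_num": 67, "Designation": "Software Tester", "Years_of_experience": 4, "Salary in 1000$": 28},
    {"Emp_num": 89, "Designation": "Data Analyst", "Years_of_experience": 5, "Salary in 1000$": 50},
]

# The label (output) column is not counted as a dimension.
label_column = "Salary in 1000$"

# Every remaining column is a feature column.
feature_columns = [name for name in rows[0] if name != label_column]
dimension = len(feature_columns)

print(feature_columns)
print(dimension)  # prints 3
```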

 

What happens if you have high dimensions in the given dataset?

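Just as a person finds it harder to spot patterns as the number of columns grows, a machine learning model also finds it harder to identify the patterns or relationships in the data.

Example: Salary of employees based on designation and years of experience, with extra columns added.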

 

Emp_num | Designation | Emp_Age | Gender | Years_of_experience | Salary in 1000$
51 | Software Engineer | 25 | Male | 2 | 15
108 | Software Developer | 28 | Female | 5 | 45
67 | Software Tester | 27 | Male | 4 | 28
89 | Data Analyst | 27 | Male | 5 | 50

In the above example, the model needs to predict the salary of an employee based on designation and years of experience.

Here, gender and employee age are not needed for that prediction and add no value for the model. Training the model with such non-decision-making feature columns wastes resources and time.

Just as we humans struggle as the number of features increases, the machine takes much longer to train the model, which is neither time-efficient nor cost-efficient. Having too many dimensions can also result in an overfitted model.
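Dropping the columns that this example treats as non-decision-making could look like the sketch below. It uses plain Python dictionaries; the name `columns_to_drop` and the choice of which columns to drop are assumptions taken from the example, not a general rule.

```python
# Rows from the second example table; keys follow the table header.
rows = [
    {"Emp_num": 51, "Designation": "Software Engineer", "Emp_Age": 25, "Gender": "Male",
     "Years_of_experience": 2, "Salary in 1000$": 15},
    {"Emp_num": 108, "Designation": "Software Developer", "Emp_Age": 28, "Gender": "Female",
     "Years_of_experience": 5, "Salary in 1000$": 45},
    {"Emp_num": 67, "Designation": "Software Tester", "Emp_Age": 27, "Gender": "Male",
     "Years_of_experience": 4, "Salary in 1000$": 28},
    {"Emp_num": 89, "Designation": "Data Analyst", "Emp_Age": 27, "Gender": "Male",
     "Years_of_experience": 5, "Salary in 1000$": 50},
]

# Columns assumed to be uninformative in this particular example.
columns_to_drop = {"Emp_Age", "Gender"}

# Build new rows that keep every column except the dropped ones.
reduced_rows = [
    {name: value for name, value in row.items() if name not in columns_to_drop}
    for row in rows
]

print(list(reduced_rows[0]))
```

The original `rows` list is left untouched, so the full dataset is still available if a dropped column turns out to matter later.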

 

How does having lower dimensions help machine learning models?

  • The fewer the feature columns, the less computation the model needs. This helps the model (machine learning algorithm) train faster.
  • A smaller feature set also requires less storage space.
  • With fewer irrelevant columns in the way, the ML model can more easily find hidden and complex relationships among the data points.
  • The main advantage of reducing the feature columns is removing redundant features and noise.
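One simple way to spot a redundant feature is to check how strongly two numeric columns are correlated. The sketch below computes the Pearson correlation between Emp_Age and Years_of_experience from the example table by hand. The helper `pearson` and the cutoff `THRESHOLD = 0.9` are assumptions chosen for illustration, and four rows are far too few to draw a real conclusion.

```python
import math

# Two numeric feature columns from the example table.
ages = [25, 28, 27, 27]   # Emp_Age
years = [2, 5, 4, 5]      # Years_of_experience


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


THRESHOLD = 0.9  # hypothetical cutoff for calling two features redundant

r = pearson(ages, years)
is_redundant = abs(r) > THRESHOLD

print(round(r, 2), is_redundant)  # prints 0.94 True
```

In this tiny sample, age moves almost in step with years of experience, so keeping both adds little new information.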

 

Should you always do dimensional reduction?

The answer is a definite yes, irrespective of the number of feature columns in the given dataset. Checking for dimensions that can be reduced is a good approach to follow for any dataset.

We always try to reduce the number of feature columns by removing or combining feature columns.

There is no hard and fast rule that you need to include all the feature columns as they are in the dataset you get. When feature columns do not contribute to any decision-making, there is no point in including them.

 

At what step of the machine learning model development do we check for dimensions?

This is usually carried out in the data pre-processing stage. We look for feature reduction before giving the dataset to the machine learning model to learn from. We never send a raw dataset to the model, as we may not get the desired outputs.

It’s similar to how we humans learn: we were not put directly into 1st or 2nd standard at school without first doing our pre-KG classes (LKG, UKG). There might be exceptional cases, but in general, only after completing school do we go to college.

What does reducing dimension mean in machine learning?

In simple terms, it means transforming a dataset with a higher number of dimensions into one with fewer dimensions. There are many machine learning algorithms that do this. One well-known algorithm is PCA (Principal Component Analysis).
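To make the idea of PCA concrete, here is a minimal sketch that reduces two numeric features to one. It is pure Python and works only for exactly two features, where the direction of largest variance has a closed form. In practice a library implementation would normally be used instead. The function name `pca_2d_to_1d` and the choice of Emp_Age and Years_of_experience as inputs are assumptions for illustration.

```python
import math

# (Emp_Age, Years_of_experience) pairs from the example table.
points = [(25, 2), (28, 5), (27, 4), (27, 5)]


def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # PCA works on mean-centered data.
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # For a symmetric 2x2 matrix, the direction of largest variance
    # makes this angle with the first axis.
    theta = 0.5 * math.atan2(2 * b, a - c)
    direction = (math.cos(theta), math.sin(theta))

    # One new value per row: the coordinate along that direction.
    projected = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, projected


direction, projected = pca_2d_to_1d(points)
print(direction)
print(projected)
```

Each employee is now described by a single number instead of two, and that number keeps as much of the original spread as any single direction can.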


I hope you got a lot from this guide on dimensions in machine learning. Be sure to also read about dimensionality reduction.
