This is the complete guide to understanding dimensions in machine learning.
Dimension in Machine Learning
The number of input variables (feature columns) in a given dataset is called its dimension in machine learning.
Example: Predicting the salary of employees based on designation and years of experience.
| Emp_num | Designation | Years_of_experience | Salary in 1000$ |
|---------|-------------------|---------------------|-----------------|
| 51 | Software Engineer | 2 | 15 |
| 108 | Software Developer | 5 | 45 |
| 67 | Software Tester | 4 | 28 |
| 89 | Data Analyst | 5 | 50 |
Here there are 3 feature (input) variables, so the dimension in this case is 3. Emp_num, Designation, and Years_of_experience are the feature columns, and Salary in 1000$ is the label or output column.
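The idea above can be sketched in code. This is a minimal illustration using pandas (an assumption; any tabular library would do): the dimension is simply the number of columns left after setting aside the label.

```python
import pandas as pd

# The employee dataset from the table above
data = pd.DataFrame({
    "Emp_num": [51, 108, 67, 89],
    "Designation": ["Software Engineer", "Software Developer",
                    "Software Tester", "Data Analyst"],
    "Years_of_experience": [2, 5, 4, 5],
    "Salary_in_1000_dollars": [15, 45, 28, 50],
})

# Split into input variables (features) and the output column (label)
features = data.drop(columns=["Salary_in_1000_dollars"])
label = data["Salary_in_1000_dollars"]

# The dimension is the number of feature columns
print(features.shape[1])  # → 3
```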
What happens if you have high dimensions in the given dataset?
As the number of dimensions grows, it becomes harder for the machine learning model to identify the patterns or relationships in the data.
Example: Salary of employees based on designation and years of experience, with extra columns included.

(Table: the same employee data as above, with additional Gender and Age columns.)
In this example, the model needs to predict the salary of an employee based on designation and years of experience.
Here gender and employee age do not give any relationship or any value to the model. It would be a waste of resources and time to train the model by including non-decision-making feature columns.
Just as we humans struggle when the number of variables grows, the machine takes much longer to train the model, which is neither time- nor cost-efficient. Having higher dimensions also tends to result in an overfitted model.
How does having lower dimensions help machine learning models?
- The fewer the feature columns, the less computation the model needs. This helps the model (the machine learning algorithm) train faster.
- Fewer features also mean less storage space is required.
- It helps the ML model find hidden and complex relationships among the data points.
- The main advantage of reducing the feature columns is removing redundant features and noise.
Should you always do dimensional reduction?
The answer is a definite yes, irrespective of the number of feature columns in the given dataset. It is the better approach to follow for any dataset.
We always try to reduce the number of feature columns by removing or combining multiple feature columns.
There is no hard and fast rule that you must include all the feature columns exactly as you receive them. When a feature column does not contribute to any decision-making, there is no point in including it.
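As a small sketch of this pruning step (using pandas, and a hypothetical dataset with the Gender and Age columns from the earlier example), dropping non-decision-making columns is a one-line operation:

```python
import pandas as pd

# Hypothetical employee data including columns that don't
# help predict salary (Gender and Age, per the example above)
df = pd.DataFrame({
    "Designation": ["Software Engineer", "Data Analyst"],
    "Years_of_experience": [2, 5],
    "Gender": ["M", "F"],   # not decision-making for salary
    "Age": [24, 29],        # not decision-making for salary
    "Salary_in_1000_dollars": [15, 50],
})

# Remove the non-contributing feature columns before training
reduced = df.drop(columns=["Gender", "Age"])
print(list(reduced.columns))
```

Which columns count as "non-contributing" is a judgment made during exploratory analysis; here they are dropped by name purely for illustration.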
At what step of the machine learning model development do we check for dimensions?
This is usually carried out in the data pre-processing stage. We look for feature reduction before giving the dataset to the machine learning model to learn from. We never feed a raw dataset to the model, as we may not get the desired outputs.
It’s similar to how we humans were not put into school directly at 1st or 2nd standard without first doing Pre-KG (LKG, UKG). There might be exceptional cases, but the point is this: only after completing schooling do we go to college.
What does reducing dimension mean in machine learning?
In simple terms, it means reducing higher dimensions to lower dimensions. There are many machine learning algorithms that do this. One famous algorithm is PCA (Principal Component Analysis).
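Here is a minimal sketch of PCA in action, assuming scikit-learn is available (the data is random and purely illustrative): a dataset with 5 dimensions is reduced to 2.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data: 100 samples with 5 feature columns (dimension = 5)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Reduce the 5 dimensions down to 2 principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)  # → (100, 2)
```

The two components PCA keeps are the directions of greatest variance in the data, so most of the information is retained even though three columns' worth of dimensions are gone.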
I hope you got a lot from this guide on dimensions in machine learning. Be sure to refer to dimensionality reduction.