Feature Engineering for Machine Learning

Authors
  • Amit Shekhar

I am Amit Shekhar. I have taught and mentored many developers whose efforts landed them high-paying tech jobs, helped many tech companies solve their unique problems, and created many open-source libraries that are used by top companies. I am passionate about sharing knowledge through open source, blogs, and videos.


In this blog, we will learn about feature engineering for machine learning.

Feature Engineering is an art

Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. If feature engineering is done correctly, it increases the predictive power of machine learning algorithms by creating features from raw data that help facilitate the machine learning process. Feature Engineering is an art.

The steps involved in solving any machine learning problem are as follows:

  • Gathering data.
  • Cleaning data.
  • Feature engineering.
  • Defining the model.
  • Training and testing the model, and predicting the output.

Feature engineering is the most important art in machine learning, and it often makes the difference between a good model and a bad model. Let's see what feature engineering covers.

Suppose we are given a dataset of flight date-time versus status. Given the date-time, we have to predict the status of the flight.

   Date_Time_Combined  Status
0  2018-02-14 20:40    Delayed
1  2018-02-15 10:30    On Time
2  2018-02-14 07:40    On Time
3  2018-02-15 18:10    Delayed
4  2018-02-14 10:20    On Time

The status of the flight depends on the hour of the day rather than on the full date-time, so we will create a new feature, Hour_Of_Day. Using the Hour_Of_Day feature, the machine will learn better because this feature is directly related to the status of the flight.

   Hour_Of_Day  Status
0  20           Delayed
1  10           On Time
2  7            On Time
3  18           Delayed
4  10           On Time

Here, creating the new feature Hour_Of_Day is feature engineering.
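
The Hour_Of_Day extraction can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the rows are copied from the table above, while the names `flights`, `hour_of_day`, and `engineered` are made up for this example.

```python
from datetime import datetime

# Rows copied from the flight table above: (Date_Time_Combined, Status).
flights = [
    ("2018-02-14 20:40", "Delayed"),
    ("2018-02-15 10:30", "On Time"),
    ("2018-02-14 07:40", "On Time"),
    ("2018-02-15 18:10", "Delayed"),
    ("2018-02-14 10:20", "On Time"),
]


def hour_of_day(date_time: str) -> int:
    """Parse a 'YYYY-MM-DD HH:MM' string and return its hour (0-23)."""
    return datetime.strptime(date_time, "%Y-%m-%d %H:%M").hour


# Replace the raw date-time column with the engineered Hour_Of_Day feature.
engineered = [(hour_of_day(dt), status) for dt, status in flights]

for hour, status in engineered:
    print(hour, status)
```

The printed pairs match the Hour_Of_Day table above; the date part is simply discarded because, in this example, it carries no signal about the status.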

Let's see another example. Suppose we are given the latitude, the longitude, and other data, with the label Price_Of_House, and we need to predict the price of a house in that area. Taken separately, the latitude and the longitude are of little use, because a location is identified only by the pair. So here we will use crossed-column feature engineering: we combine the latitude and the longitude into one feature, which helps the model learn better.

Here, combining two features to create one useful feature is feature engineering.
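
One possible way to cross the two columns is to bucket each coordinate into a grid and join the two bucket ids into a single categorical value. The sketch below assumes this approach; the function name `lat_lon_cross`, the 0.01-degree cell size, and the sample coordinates are all hypothetical choices for illustration, not something prescribed by the example above.

```python
import math


def lat_lon_cross(latitude: float, longitude: float, cell_size: float = 0.01) -> str:
    """Bucket each coordinate into a grid cell and join the two bucket ids.

    cell_size is an assumed grid resolution in degrees; houses that fall in
    the same cell share the same crossed feature value.
    """
    lat_bucket = math.floor(latitude / cell_size)
    lon_bucket = math.floor(longitude / cell_size)
    return f"{lat_bucket}_{lon_bucket}"


# Hypothetical coordinates: the first two houses are close together,
# the third one is in a different part of the city.
houses = [(28.6139, 77.2090), (28.6141, 77.2095), (28.7041, 77.1025)]

for lat, lon in houses:
    print(lat, lon, lat_lon_cross(lat, lon))
```

Nearby houses end up with the same crossed value, so the model can learn a price level per area instead of trying to relate the price to the latitude or the longitude alone.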

Sometimes we use bucketized-column feature engineering. Suppose we are given data in which one column is the age and the output is a classification (X, Y, or Z). Looking at the data, we realize that the output depends on the age range: ages 11-20 map to X, 21-40 to Y, and 41-70 to Z. Here, we will create 3 buckets for the age ranges 11-20, 21-40, and 41-70, and a new bucketized feature, Age_Range, with the numerical values 1, 2, and 3, where each value identifies the corresponding bucket.

Here, creating the Age_Range bucket is feature engineering.
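
The Age_Range bucketing can be sketched as a small lookup. The bucket boundaries come from the example above; the names `AGE_BUCKETS` and `age_range` are made up here, and returning `None` for ages outside 11-70 is an assumption, since the example does not say how such ages should be handled.

```python
from typing import Optional

# (lowest age, highest age, bucket id), both bounds inclusive, as in the text.
AGE_BUCKETS = [(11, 20, 1), (21, 40, 2), (41, 70, 3)]


def age_range(age: int) -> Optional[int]:
    """Return the Age_Range bucket id (1, 2, or 3) for the given age."""
    for low, high, bucket in AGE_BUCKETS:
        if low <= age <= high:
            return bucket
    # Assumption: ages outside every bucket get no Age_Range value.
    return None


for age in (15, 33, 64):
    print(age, age_range(age))
```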

Sometimes, removing an unwanted feature is also feature engineering, because a feature that is unrelated to the output degrades the performance of the model.
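
As a rough sketch of removing unrelated features, one simple heuristic is to keep only the features whose Pearson correlation with the target reaches some threshold. This is just one of many possible selection methods, and it only detects linear relationships. The toy data, the feature names, the 0.1 threshold, and the helper names `pearson` and `select_related` are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column carries no information; treat it as unrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_related(features, target, threshold=0.1):
    """Keep the features whose absolute correlation with the target is >= threshold."""
    return {
        name: values
        for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    }


# Hypothetical toy data: only "area" moves together with the price.
price = [10, 20, 30, 40, 50]
features = {
    "area": [50, 60, 70, 80, 90],
    "listing_id_last_digit": [3, 1, 4, 1, 3],
    "country_code": [1, 1, 1, 1, 1],
}

kept = select_related(features, price)
print(list(kept))
```

With this toy data only the `area` column survives, because the other two columns show no linear relationship with the price.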

Now, the steps to do feature engineering are as follows:

  • Brainstorm features.
  • Create features.
  • Check how the features work with the model.
  • Start again from the first step until the features work well with the model.

This is what we do in feature engineering.

Some words on feature engineering by the experts

Feature engineering is another topic which doesn’t seem to merit any review papers or books, or even chapters in books, but it is absolutely vital to ML success. Much of the success of machine learning is actually success in engineering features that a learner can understand.

Actually the success of all Machine Learning algorithms depends on how you present the data.

The algorithms we used are very standard for Kagglers. We spent most of our efforts in feature engineering.

Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.

Feature engineering turns your inputs into things the algorithm can understand.

Last but not least, automated feature engineering is a current hot topic, but it requires a lot of resources. A few companies have already started working on it.

That's it for now.

Thanks

Amit Shekhar
