Dummy Variables & One Hot Encoding


Code in tutorial: https://github.com/codebasics/py/tree/master/ML/5_one_hot_encoding
Exercise csv file: https://github.com/codebasics/py/blob/master/ML/5_one_hot_encoding/Exercise/carprices.csv
Most machine learning models expect purely numeric input. So how do we handle text information in a dataset? A simple approach is to use integer (label) encoding. However, when categorical variables are nominal, meaning their values have no natural order (for example, city names), label encoding can be problematic because a model may treat the integer codes as a ranking. One hot encoding is a technique that helps in this situation. In this tutorial, we will use the pandas get_dummies method to create dummy variables, which lets us perform one hot encoding on a given dataset. Alternatively, we can use OneHotEncoder from sklearn.preprocessing to create the dummy variables.
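The sketch below compares the approaches on a hypothetical toy DataFrame. The column names `town` and `area` and their values are made up for illustration and are not taken from the tutorial's CSV files. It shows label encoding (via a pandas categorical) imposing an arbitrary order, `pd.get_dummies` producing one 0/1 column per category, and scikit-learn's `OneHotEncoder` producing the equivalent matrix.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: "town" is nominal (no natural order), "area" is numeric.
df = pd.DataFrame({
    "town": ["north", "south", "east", "north"],
    "area": [2600, 3000, 3200, 2800],
})

# Label encoding: each category becomes an integer, assigned alphabetically
# here (east=0, north=1, south=2). A model may wrongly read this as an order.
label_codes = df["town"].astype("category").cat.codes

# One hot encoding with pandas: one 0/1 dummy column per category.
# drop_first=True drops one dummy column, since the remaining columns already
# determine it (this avoids the "dummy variable trap" in linear models).
dummies = pd.get_dummies(df["town"], prefix="town", drop_first=True, dtype=int)
encoded_df = pd.concat([df.drop(columns="town"), dummies], axis=1)

# The same idea with scikit-learn's OneHotEncoder (all categories kept here).
encoder = OneHotEncoder()
onehot = encoder.fit_transform(df[["town"]]).toarray()  # sparse result -> dense array

print(label_codes.tolist())
print(encoded_df)
print(encoder.categories_[0].tolist())
print(onehot)
```

With `drop_first=True`, the `east` rows are represented implicitly by zeros in both remaining dummy columns. `OneHotEncoder` returns a sparse matrix by default, which is why `.toarray()` is called before printing.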