Passenger Satisfaction
Passenger Satisfaction is my first full-stack data science project. I built a web application on top of a machine learning model for this problem and, in the process, learned how to deploy a machine learning model to production.
Problem:
The goal of this project is to help an airline company determine the important factors that influence passenger satisfaction. I built a binary classifier to predict whether a passenger is satisfied or not. The project follows the CRISP-DM methodology to derive an appropriate solution to the business problem. It is carried out in six phases: business understanding, data understanding, data preparation, data modelling, evaluation, and deployment.
Data:
The Passenger Satisfaction dataset contains over 130,000 passenger records, distributed across 2 satisfaction categories. Each record contains 24 attributes, such as age, gender, and type of travel.
Highlights about the data:
- The 'Arrival Delay in Minutes' column has 310 missing values. These are imputed with the mean of the non-missing values in the same column.
- 99.2% of 'Loyal' customers whose travel type is 'Personal' are satisfied. (Customer type and travel type are attributes; satisfaction is the class variable.)
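
The two highlights above can be sketched with pandas. This is a minimal illustration, not the project's actual code: the toy values are made up, and every column name other than 'Arrival Delay in Minutes' (including the 0/1 `satisfied` column) is an assumption.

```python
import pandas as pd

# Toy frame standing in for the real dataset; all values are made up for illustration.
# Column names other than "Arrival Delay in Minutes" are assumptions.
df = pd.DataFrame({
    "Customer Type": ["Loyal", "Loyal", "Disloyal", "Loyal"],
    "Type of Travel": ["Personal", "Personal", "Business", "Business"],
    "Arrival Delay in Minutes": [10.0, None, 30.0, 20.0],
    "satisfied": [1, 1, 0, 1],
})

delay_col = "Arrival Delay in Minutes"

# Series.mean() skips missing values by default, so this is the mean
# of the non-missing entries only.
mean_delay = df[delay_col].mean()
df[delay_col] = df[delay_col].fillna(mean_delay)

# Share of satisfied passengers within each (customer type, travel type) group.
satisfaction_rate = (
    df.groupby(["Customer Type", "Type of Travel"])["satisfied"].mean()
)
print(satisfaction_rate)
```

On the real data, a group rate like the 99.2% figure would come from looking up the ('Loyal', 'Personal') entry of `satisfaction_rate`.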
Models used:
- Naive Bayes
- Logistic Regression
- Random Forest
- XGBoost
- Ensemble Vote Classifier
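
As a rough sketch of how several of these models can be trained and compared, the example below uses scikit-learn on synthetic data. It rests on several assumptions: the data is generated rather than taken from the passenger dataset, scikit-learn's `VotingClassifier` stands in for whichever ensemble implementation the project used, and XGBoost is left out to keep the dependencies small (an `xgboost.XGBClassifier` could be added to the list of base models in the same way).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary-classification data standing in for the passenger records.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Base models; hyperparameters are illustrative, not the project's settings.
base_models = [
    ("naive_bayes", GaussianNB()),
    ("logistic_regression", LogisticRegression(max_iter=1000)),
    ("random_forest", RandomForestClassifier(n_estimators=100, random_state=0)),
]

# Soft voting averages the predicted class probabilities of the base models.
ensemble = VotingClassifier(estimators=base_models, voting="soft")

# Fit every model on the same split and record its test accuracy.
accuracies = {}
for name, model in base_models + [("vote", ensemble)]:
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: {acc:.4f}")
```

The accuracies printed here reflect only the synthetic data and say nothing about the results reported for this project.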
Conclusions:
- Although the Random Forest classifier took less time, the XGBoost model gave the best performance, with 96.04% accuracy.
- Feature engineering was one of the most important steps in solving this problem: it increased accuracy from 94% to 96%.
Technologies Used: Machine Learning, Python
Framework: Django
Frontend: HTML, CSS, Bootstrap and JavaScript
Detailed explanation of Machine Learning aspects of this project: Medium blog