Passenger Satisfaction

Passenger Satisfaction is my first full-stack data science project. I built a web application on top of a machine learning model and, in the process, learned how to deploy a model to production.

Problem:

The goal of this project is to help an airline identify the factors that most influence passenger satisfaction. I built a binary classifier that predicts whether a passenger is satisfied or not. The project follows the CRISP-DM methodology to derive an appropriate solution to the business problem. It is carried out in six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation and Deployment.

Data:

The Passenger Satisfaction dataset contains over 130,000 passenger records, split across 2 satisfaction categories. Each record has 24 attributes, such as age, gender and type of travel.

Highlights about the data:
  • The 'Arrival Delay in Minutes' column has 310 missing values. These are imputed with the mean of the non-missing values in the same column.
  • 99.2% of 'Loyal' customers whose travel type is 'Personal' are satisfied. (Customer type and travel type are attributes; satisfied is the class variable.)
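
The mean imputation described above can be sketched without any libraries. This is only an illustration, not the project's actual code: the helper name `impute_mean` and the three sample rows are hypothetical, and only the column name comes from the dataset. With pandas, the same idea is usually written as `fillna` with the column's `mean()`.

```python
from statistics import fmean


def impute_mean(rows, column):
    """Return a copy of `rows` where None in `column` is replaced by the column mean.

    Hypothetical helper for illustration; the mean is computed from the
    non-missing values only, and the input rows are left unchanged.
    """
    observed = [row[column] for row in rows if row[column] is not None]
    mean = fmean(observed)
    return [
        {**row, column: mean if row[column] is None else row[column]}
        for row in rows
    ]


# Made-up sample rows; the real dataset has 24 attributes per record.
rows = [
    {"Arrival Delay in Minutes": 10.0},
    {"Arrival Delay in Minutes": None},
    {"Arrival Delay in Minutes": 20.0},
]
imputed = impute_mean(rows, "Arrival Delay in Minutes")
```

Computing the mean from the observed values only keeps the column's average unchanged after imputation.
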
Models used:
  • Naive Bayes
  • Logistic Regression
  • Random Forest
  • XGBoost
  • Ensemble Vote Classifier
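
To illustrate the idea behind the last entry, here is a minimal sketch of hard (majority) voting across several classifiers. It assumes hard voting, which this README does not confirm (a soft vote would average predicted probabilities instead), and the function name `majority_vote` and the prediction lists are made up for the example.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one list by majority vote.

    `predictions` holds one list of labels per model, all of the same length.
    Hypothetical helper; with an odd number of models and two classes
    there are no ties to break.
    """
    return [
        Counter(labels).most_common(1)[0][0]
        for labels in zip(*predictions)
    ]


# Made-up predictions from three models for four passengers.
naive_bayes = ["satisfied", "not satisfied", "satisfied", "not satisfied"]
log_reg = ["satisfied", "satisfied", "not satisfied", "not satisfied"]
random_forest = ["not satisfied", "satisfied", "satisfied", "not satisfied"]

ensemble = majority_vote([naive_bayes, log_reg, random_forest])
```

Each passenger gets the label that at least two of the three models agree on.
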
Conclusions:
  • Although the Random Forest classifier took less time, the XGBoost model performed best, with 96.04% accuracy.
  • Feature engineering was one of the most important steps in solving this problem: it raised accuracy from 94% to 96%.
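
For reference, the accuracy figures above are the share of records whose predicted class matches the true class. A minimal sketch, with a hypothetical helper name and made-up labels (1 = satisfied, 0 = not satisfied):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for five passengers; four of the five predictions are correct.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
score = accuracy(y_true, y_pred)
```
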

Technologies Used: Machine Learning, Python

Framework: Django

Frontend: HTML, CSS, Bootstrap and JavaScript

Detailed explanation of Machine Learning aspects of this project: Medium blog