The internship started on May 13th and will end on July 12th. As of now, I have spent close to 7 weeks at Ford. On the first day, we were assigned our laptops and projects and introduced to our supervisors and the other members of our teams.
To understand my project's objective, some background is needed. Ford runs a shuttle service called Office Ride for corporate employees in many Indian cities. Employees book rides through the Ford Office Ride App and are required to review each ride in the app. As of now, Ford uses a simple rule-based algorithm to perform Aspect Based Sentiment Analysis on these reviews every month, and the management uses the reports from this analysis to improve the service. The problems with the rule-based algorithm are the following:
1. It is often inaccurate, since no one can write all the possible rules (as if-else conditions) needed to perform accurate aspect/sentiment analysis on real-life text data.
2. It is inefficient, inflexible and difficult to update when required.
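To make these problems concrete, here is a minimal sketch of what a keyword-driven, if-else style analyser might look like. It is not Ford's actual algorithm: the aspect names, keywords and the function name `rule_based_absa` are all invented for illustration.

```python
import re

# Hypothetical keyword rules: the aspect names and keywords are invented
# for illustration and are not taken from the real system.
ASPECT_KEYWORDS = {
    "punctuality": ["late", "on time", "delay"],
    "driver": ["driver", "driving"],
    "app": ["app", "booking"],
}
POSITIVE_WORDS = {"good", "great", "polite", "smooth"}
NEGATIVE_WORDS = {"bad", "late", "rude", "delay", "crash"}


def rule_based_absa(review):
    """Return (aspects, sentiment) for one review using hand-written rules."""
    text = review.lower()
    words = re.findall(r"[a-z']+", text)

    # An aspect is reported if any of its keywords occurs in the text.
    aspects = [
        aspect
        for aspect, keywords in ASPECT_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]

    # Sentiment is decided by counting positive and negative words.
    positive_hits = sum(word in POSITIVE_WORDS for word in words)
    negative_hits = sum(word in NEGATIVE_WORDS for word in words)
    if positive_hits > negative_hits:
        sentiment = "Positive"
    elif negative_hits > positive_hits:
        sentiment = "Negative"
    else:
        sentiment = "Neutral"
    return aspects, sentiment


mixed = rule_based_absa("The driver was polite but the shuttle was late.")
negated = rule_based_absa("The ride was not good")
```

Even on these two made-up reviews the weaknesses show: the mixed review is flattened to a single Neutral label because one positive and one negative word cancel out, and "not good" is labelled Positive because the rules do not handle negation. Every such case needs yet another hand-written rule.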
So the objective of my project was as follows:
"Replace the existing rule-based algorithm for Aspect Based Sentiment Analysis by a Machine Learning algorithm"
The dataset available to me contained close to 16,000 reviews from March 2019 and was manually annotated for sentiments and aspects. There were 10 aspects, each related to a different dimension of the Office Ride service, and three sentiments (Positive, Negative and Neutral).
Part 3: Implementation
Since I was new to NLP, I spent 2 days researching and learning its basics and then settled on the workflow I would follow to complete the project. I divided the project into 5 parts:
1. Use traditional models (Logistic Regression, XGBoost, Random Forest, MLP Classifier and SVC) with a simple vectorizer (CountVectorizer), which converts words to numbers and sentences to arrays, for both the Aspect Analysis and the Sentiment Analysis.
2. Use traditional models with a slightly more sophisticated vectorizer, the TF-IDF vectorizer, for both the Aspect Analysis and the Sentiment Analysis.
3. Use traditional models along with Transfer Learning through Google's pre-trained Word2Vec model for vectorization.
4. Use Deep Learning models along with Transfer Learning through Google's pre-trained Word2Vec model for vectorization.
5. Use Deep Learning models along with the Universal Sentence Encoder for vectorization of sentences.
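The first two parts can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the reviews are made up (the real data is not public), Logistic Regression stands in for the whole family of traditional models, and the helper name `build_model` is hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy reviews invented for illustration; the real dataset is not public.
reviews = [
    "driver was polite and the ride was smooth",
    "great service and comfortable seats",
    "shuttle arrived on time, very happy",
    "shuttle was late again",
    "driver was rude and drove rashly",
    "app crashed while booking",
    "ride was okay",
    "nothing special to report",
    "average experience overall",
]
sentiments = ["Positive"] * 3 + ["Negative"] * 3 + ["Neutral"] * 3


def build_model(vectorizer):
    """Chain a text vectorizer with a traditional classifier (hypothetical helper)."""
    return make_pipeline(vectorizer, LogisticRegression(max_iter=1000))


# Part 1 uses raw word counts, part 2 uses TF-IDF weighted counts.
models = {
    "count": build_model(CountVectorizer()),
    "tfidf": build_model(TfidfVectorizer()),
}
for model in models.values():
    model.fit(reviews, sentiments)

# Predict the sentiment of an unseen review with each pipeline.
predictions = {
    name: model.predict(["driver was late"])[0] for name, model in models.items()
}
```

Swapping the classifier or the vectorizer only changes one argument, which is what makes it practical to compare many combinations. The same pattern would apply to Aspect Analysis by replacing the sentiment labels with aspect labels.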
As of 28th June, I have completed the first 4 parts. The best model I have obtained for Sentiment Analysis (which I cannot disclose) gave me an F1 score of 98.5% on the 10% of the data held out at the start for testing, and the best model for Aspect Analysis, evaluated the same way, gave me an F1 score of 91%. (Since there are 10 aspects, it is harder for models to learn the differences between them with the limited data available.) The fifth part is in progress as of now.
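For readers unfamiliar with this kind of evaluation, here is a rough sketch of holding out 10% of the data and scoring a model with F1 in scikit-learn. The reviews are synthetic, the model is a plain TF-IDF plus Logistic Regression pipeline rather than the undisclosed one, and the macro averaging, the stratified split and the random seed are assumptions of this sketch.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic reviews built from templates, purely for illustration.
templates = {
    "Positive": ["the {} was great", "very happy with the {}"],
    "Negative": ["the {} was terrible", "very unhappy with the {}"],
    "Neutral": ["the {} was okay", "no comments on the {}"],
}
topics = ["driver", "route", "app", "seat", "timing",
          "pickup", "drop", "booking", "shuttle", "support"]

reviews, labels = [], []
for label, patterns in templates.items():
    for pattern in patterns:
        for topic in topics:
            reviews.append(pattern.format(topic))
            labels.append(label)

# Set aside 10% of the data before any training; stratifying keeps the
# class proportions the same in both parts (an assumption of this sketch).
X_train, X_test, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.1, stratify=labels, random_state=42
)

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Macro averaging weights every class equally; the averaging mode behind
# the scores reported in this post is not stated, so this is an assumption.
test_f1 = f1_score(y_test, model.predict(X_test), average="macro")
```

Because the test reviews are never seen during training or tuning, the resulting score is an estimate of how the model would behave on new reviews.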
Part 4: What will I be presenting to management?
The management expects me to give them the best models for Aspect and Sentiment Analysis, which they will then deploy in place of the existing rule-based system. They also expect me to generate visualizations showing the distribution of aspects and sentiments and the critical areas where the Office Ride service should be improved. I am using Python's popular seaborn library to generate these visualizations.
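As an indication of the kind of chart involved, here is a small seaborn sketch that plots how sentiments are distributed across aspects. The data, the aspect names and the output file name are invented for illustration and do not come from the actual reports.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script also runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented model outputs for illustration; the aspect names are hypothetical.
df = pd.DataFrame({
    "aspect": ["Punctuality", "Punctuality", "Driver", "Driver",
               "App", "Punctuality", "App", "Driver"],
    "sentiment": ["Negative", "Negative", "Positive", "Positive",
                  "Negative", "Positive", "Neutral", "Neutral"],
})

# Table of review counts per (aspect, sentiment) pair.
counts = pd.crosstab(df["aspect"], df["sentiment"])

# One group of bars per aspect, one bar per sentiment.
ax = sns.countplot(data=df, x="aspect", hue="sentiment")
ax.set_title("Sentiment distribution by aspect")
ax.set_xlabel("Aspect")
ax.set_ylabel("Number of reviews")
plt.tight_layout()
plt.savefig("aspect_sentiment_distribution.png")
```

Aspects with tall Negative bars are the natural candidates for the "critical areas" that need attention.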
Part 5: Additional Learnings
Since the models were fairly sophisticated and the dataset was fairly large, the code took a long time to run, especially with hyperparameter tuning. This gave me ample time to learn many other things related to NLP and Machine Learning in general. I learned the basics of Computer Vision, Seq2Seq models, Attention models, Time Series Forecasting, etc.
Part 6: Ford GDIA as a company
Given the short duration and the size of the project, I had to learn many things quickly and execute them even more quickly. All of this was possible only because of the culture at Ford GDIA. The "Smart, Nice, Curious" value system of Ford GDIA is reflected in every aspect of the organization and in every employee. There is complete flexibility in work hours, and complete freedom and a supportive environment to seek help and resolve issues. The effectiveness of peer-to-peer learning at Ford GDIA is amazing.
My one and only concern about Ford GDIA as a full-time career option for an MBA graduate is the strong programming skills required to succeed and progress here. But the impression that management studies go to waste when you work in the Data Science domain is completely unjustified. The intricacies of the operations, campaigns and programs to which one applies Data Science have to be learned, and an MBA gives you that.
For me, the last 7 weeks have been the most productive, enriching, value adding, and at the same time challenging part of my life.