Customer Satisfaction Review
Brands face multiple challenges in keeping modern consumers engaged. It is therefore key for brands to listen to what their customers think, feel, and say about their purchasing experience. To do so, brands first need to collect customer feedback, which is one of the key pillars of digital and social commerce. Sentiment analysis adds a new layer to a brand’s performance insights: it helps the brand better understand emotions and measure customer satisfaction. By turning customer emotion into brand action, marketers can deliver a better customer experience.
The solution consists of a data pipeline that uses a combination of descriptive analysis, statistics, and machine learning. Analysts can input their own data and surface the outputs in a dashboard to gauge the past and future success of customer engagement strategies. Data Scientists should use this sample project as an initial building block to develop advanced analytics or support decision making.
To leverage this solution, you must meet the following requirements:
Have access to a DSS 10.0+ instance
Dataiku’s Text Preparation Plugin
A Python 3.6 code environment named solution_customer-reviews with the following required packages:
matplotlib
plotly==5.6.0
nbformat>=4.2.0
dash==2.3.1
dash-bootstrap-components==1.0.3
sentence-transformers==2.2.0
ipywidgets==7.6.5
tensorflow==2.6.2
tensorflow-hub==0.12.0
tensorflow-text==2.6.0
dash-daq==0.5.0
transformers==4.17.0
When creating a new code environment, please be sure to use the name solution_customer-reviews; otherwise, remapping will be required.
Once your instance has been prepared, you can install this solution in one of two ways:
On your Dataiku instance click + New Project > Industry solutions > Retail > Customer Satisfaction Review.
Download the .zip project file and upload it directly to your Dataiku instance as a new project.
The Dataiku flow was initially built using publicly available data. When adapting the project to your own data and needs, an input dataset of reviews is required to run the project. Each row of the dataset should contain:
Review [Text]: Contains the product review.
Product Id [String]: Identifier for the product.
Product Category [String]: Category of the product (higher level than Product Id).
Date [Date]: Date of the customer review.
Rating [Integer]: Rating given in the review (no restriction on the range).
Country [String]: Customer’s country.
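Before loading your own reviews, it can help to check that each row matches the fields listed above. The following is a minimal sketch in plain Python, not part of the solution itself: the column names come from the list above, while the mapping to Python types and the helper name validate_review_row are assumptions made for illustration.

```python
from datetime import date

# Column names follow the list above; the Python types are an assumption
# about how each column would look once loaded into memory.
REQUIRED_COLUMNS = {
    "Review": str,
    "Product Id": str,
    "Product Category": str,
    "Date": date,
    "Rating": int,
    "Country": str,
}


def validate_review_row(row):
    """Return a list of problems found in one review row (empty if valid)."""
    problems = []
    for column, expected_type in REQUIRED_COLUMNS.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected_type):
            problems.append(
                f"{column}: expected {expected_type.__name__}, "
                f"got {type(row[column]).__name__}"
            )
    return problems


# Made-up sample row, for illustration only.
sample_row = {
    "Review": "Great shoes, very comfortable.",
    "Product Id": "P-001",
    "Product Category": "Shoes",
    "Date": date(2022, 3, 14),
    "Rating": 5,
    "Country": "France",
}

print(validate_review_row(sample_row))
```

A row that passes returns an empty list, so the check can be run over a sample of the dataset before the flow is rebuilt.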
You can follow along with the sample project in the Dataiku gallery.
The project has the following high-level steps:
Import reviews and clean the text data
Relate reviews to countries and further prepare the text
Visualize product ratings with pre-built dashboards and metrics
Drill down into specific reviews per product
Search for reviews with a pre-built semantic search webapp
This project is meant to be used as a template to guide the development of your own analysis in DSS. The results of the model should not be used as actionable insights and the data provided with the project may not be representative of actual data in a real-life project.
In addition to reading this document, it is recommended to read the wiki of the project before beginning in order to get a deeper technical understanding of how this solution was created, the different types of data enrichment available, longer explanations of solution-specific vocabulary, and suggested future direction for the solution.
Import and Prepare Reviews for Analysis
Once we have the correct input data (see Data Requirements) uploaded to our project, the first step in the Import & cleaning of data flow zone applies language detection to identify the language used by the customer when writing their product review. In this flow, we’ve told the recipe to use all 114 available languages, but if you know which languages are present, you can narrow the scope to speed up processing and improve the accuracy of results. With the correct language identified, we are then able to apply the spell check recipe of the Text Preparation Plugin to identify and fix misspelled words.
Now that we have higher confidence in the readability of our Product Reviews, we can relate them to other features that will be relevant to our desired analysis in the Geo Engineering & Lemmatization flow zone. Using the Country column of our reviews dataset with the OpenStreetMap API, we are able to extract the related geopoints for each country. In addition to geoengineering our dataset, we apply a few more transformations to the Product Reviews’ text: removing punctuation, converting words to their lemma form, and converting everything to lower case. This will make it much easier for us to investigate the text.
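The text transformations described above can be sketched in plain Python. This is an illustration only and not the recipe the solution uses: the lemma lookup TOY_LEMMAS is a tiny hypothetical dictionary, whereas real lemmatization relies on a language-specific model or dictionary, and string.punctuation only covers ASCII punctuation.

```python
import string

# Hypothetical, tiny lemma lookup used only for illustration; a real
# lemmatizer relies on a full language-specific dictionary or model.
TOY_LEMMAS = {
    "shoes": "shoe",
    "arrived": "arrive",
    "loved": "love",
}

# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def normalize_review(text):
    """Lowercase the text, strip punctuation, and map words to their lemma."""
    lowered = text.lower()
    without_punctuation = lowered.translate(_PUNCTUATION_TABLE)
    tokens = without_punctuation.split()
    return " ".join(TOY_LEMMAS.get(token, token) for token in tokens)


print(normalize_review("Loved it! The shoes arrived quickly."))
# prints: love it the shoe arrive quickly
```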
Top Choice Products: Visualizing our Product Ratings
Our cleaned reviews are passed to 3 different flow zones for further analysis and visualization. The first we will explore is the Dashboard flow zone. Using a window recipe we sort the product reviews by their product category so we can get an overall view of our product categories and compare products against the other products that belong to the same category. The two datasets in this flow zone are used to generate all of the charts in the first two tabs of the Reviews Overview dashboard.
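To make the idea of comparing a product against its category concrete, here is a small plain-Python sketch of a window-style aggregate. The exact aggregations configured in the solution’s window recipe are not described here, so the per-category average rating, the field names, and the helper add_category_average are assumptions for illustration.

```python
from collections import defaultdict


def add_category_average(reviews):
    """Sort reviews by category and attach the category's average rating.

    Unlike a group-by, every input row is kept, which mirrors how a
    window aggregate adds a column instead of collapsing rows.
    """
    totals = defaultdict(lambda: [0.0, 0])  # category -> [rating sum, count]
    for review in reviews:
        entry = totals[review["product_category"]]
        entry[0] += review["rating"]
        entry[1] += 1

    ordered = sorted(reviews, key=lambda review: review["product_category"])
    result = []
    for review in ordered:
        rating_sum, count = totals[review["product_category"]]
        result.append({**review, "category_avg_rating": rating_sum / count})
    return result


# Made-up sample data, for illustration only.
reviews = [
    {"product_id": "A1", "product_category": "shoes", "rating": 4},
    {"product_id": "A2", "product_category": "shoes", "rating": 2},
    {"product_id": "B1", "product_category": "hats", "rating": 5},
]

for row in add_category_average(reviews):
    print(row["product_id"], row["rating"], row["category_avg_rating"])
```

Each product row now carries both its own rating and its category’s average, which is the kind of pairing a chart comparing a product with its full category needs.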
The first tab, Reviews Analysis, contains 3 charts showing the evolution of a product’s rating compared to the full category, the evolution of ratings for one or more products, and a map showing the average review rating by country. This map can reveal interesting trends about how products are performing in different countries and the impact of culture on reviews. This tab also has a dashboard filter that affects the data used to generate all 3 charts.
The second tab, Data Overview, contains a metric computing the total number of reviews, the distribution of reviews by Product Category, and the distribution of reviews by Country.
Product Review Deep-Dive
In addition to the Dashboard charts, this solution allows us to deep dive into the full corpus of our Product Reviews using an interactive Webapp. There are two features available in the webapp: Keyword Analysis and Semantic Search. Each feature has its own Flow Zone that transforms the data to be able to support the analysis type.
We’ll begin with the Keyword Analysis part of the Webapp. Within the Keyword Analysis - Webapp flow zone, we take our reviews and extract the n-grams (here, 2 consecutive words in a review) to be used as filters on all our reviews. With the n-grams extracted, we can analyze which ones are frequently used across our product reviews to get a high-level view of the keywords most associated with each product or product category. These keywords can be explored in the Webapp: selecting a keyword displays all associated reviews.
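The keyword extraction described above can be sketched with the standard library. This is not the solution’s own recipe: the helper names are hypothetical, and tokenization is a simple whitespace split that assumes the text has already been cleaned and lowercased.

```python
from collections import Counter


def extract_bigrams(text):
    """Return every pair of consecutive words in a review."""
    tokens = text.split()
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def top_bigrams(reviews, n=3):
    """Count bigrams across all reviews and return the n most frequent."""
    counts = Counter()
    for review in reviews:
        counts.update(extract_bigrams(review))
    return counts.most_common(n)


def reviews_with_bigram(reviews, bigram):
    """Return the reviews that contain the selected bigram."""
    return [review for review in reviews if bigram in extract_bigrams(review)]


# Made-up sample reviews, for illustration only.
reviews = [
    "great battery life",
    "battery life is short",
    "great screen",
]

print(top_bigrams(reviews, n=1))
# prints: [('battery life', 2)]
print(reviews_with_bigram(reviews, "battery life"))
# prints: ['great battery life', 'battery life is short']
```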
Switching to the Semantic Search - Webapp flow zone, we take our reviews dataset and apply a pre-trained Google TensorFlow encoder model, which supports 16 languages, to encode the review text into vectors so that similarities between words can be computed. These vectors capture the semantic sense of each word. With meaning embedded in this way, we can use the Semantic Search portion of the Webapp to input any word and retrieve reviews that have been found to contain related words based on semantic similarity.
Please bear in mind that this is a template solution that leverages a pre-trained model. Other models may give better results on your own data.
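The retrieval step behind semantic search can be illustrated by ranking reviews by cosine similarity between vectors. The sketch below is a simplification under stated assumptions: the 3-dimensional vectors are made up by hand (a real encoder produces much longer vectors), cosine similarity is a common choice but is not confirmed as the measure used by the webapp, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat an all-zero vector as having no similarity.
    return dot / (norm_a * norm_b)


def search(query_vector, review_vectors, top_k=2):
    """Return the top_k review texts most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in review_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


# Hand-made toy embeddings, for illustration only; in practice these
# would come from the encoder model applied to each review.
review_vectors = {
    "fast delivery": [0.9, 0.1, 0.0],
    "arrived quickly": [0.8, 0.2, 0.1],
    "poor quality fabric": [0.0, 0.1, 0.9],
}
query_vector = [1.0, 0.0, 0.0]  # Stand-in for the encoded word "shipping".

print(search(query_vector, review_vectors))
# prints: ['fast delivery', 'arrived quickly']
```

Neither returned review contains the query word itself; they rank first only because their vectors point in a similar direction, which is the property that distinguishes semantic search from keyword matching.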
Reproducing these Processes With Minimal Effort For Your Own Data
The intent of this project is to enable marketing and customer success teams to understand how Dataiku DSS can be used to get a full view of customer sentiment towards their products. By creating a single solution that can benefit and influence the decisions of a variety of teams in one organization, smarter and more holistic strategies can be designed to optimize customer retention, improve products, make smarter inventory decisions, and adapt marketing strategies.
We’ve provided several suggestions on how to analyze your customer reviews, but ultimately the “best” approach will depend on your specific needs and your data. If you’re interested in adapting this project to the specific goals and needs of your organization, roll-out and customization services can be offered on demand.