Solution | Customer Satisfaction Reviews#


Business Case#

Brands face multiple challenges in keeping modern consumers engaged. It is therefore key for brands to listen to what their customers think, feel, and say about their purchasing experience. To do so, brands first need to collect customer feedback, which is one of the key pillars of digital and social commerce. Sentiment analysis adds a new layer to a brand’s performance insights: it allows brands to better understand emotions and measure customer satisfaction. By turning customer emotion into brand action, marketers can deliver a better customer experience.

The solution consists of a data pipeline that combines descriptive statistics and machine learning. Analysts can input their own data and surface the outputs in a dashboard to gauge the past and future success of customer engagement strategies. Data scientists should use this sample project as an initial building block to develop advanced analytics and support decision making.


The process to install this solution differs depending on whether you are using Dataiku Cloud or a self-managed instance.

This solution is not available on Dataiku Cloud. Although you may try to import the zip file found in the self-managed instructions onto a Cloud instance, Dataiku offers no support in this case.

After meeting the technical requirements below, self-managed users can install the Solution in one of two ways:

  1. On your Dataiku instance connected to the internet, click + New Project > Dataiku Solutions > Search for Customer Satisfaction Reviews.

  2. Alternatively, download the Solution’s .zip project file, and import it to your Dataiku instance as a new project.

Technical Requirements#

To leverage this solution, you must meet the following requirements:

  • Have access to a Dataiku 10.0+ instance.

  • Dataiku’s Text Preparation Plugin.

  • A Python 3.6 code environment named solution_customer-reviews with the following required packages:


Data Requirements#

The Dataiku Flow was initially built using publicly available data. When adapting the project to your own data and needs, an input dataset of reviews is mandatory to run the project. Each row of the dataset should contain:






  • A column containing the text of the product review.

  • Product ID: the identifier for the product.

  • Product Category: the category of the product (higher level than Product ID).

  • A column containing the date of the customer review.

  • A column containing the rating given in the review (no restriction on the range).

  • Country: the customer’s country.
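Before loading your own reviews, it can help to check that every row carries the six expected fields. The following is a minimal sketch of such a check in plain Python, not part of the Solution itself. The column names (`review`, `product_id`, `product_category`, `date`, `rating`, `country`) and the ISO date format are assumptions made for illustration; rename them to match your dataset.

```python
from datetime import datetime

# Hypothetical column names: rename these to match your own dataset.
REQUIRED_COLUMNS = [
    "review", "product_id", "product_category", "date", "rating", "country",
]


def find_schema_problems(rows):
    """Return a list of human-readable problems found in the input rows."""
    problems = []
    for index, row in enumerate(rows):
        # A field counts as missing when it is absent, None, or an empty string.
        missing = [col for col in REQUIRED_COLUMNS if row.get(col) in (None, "")]
        if missing:
            problems.append(f"row {index}: missing {', '.join(missing)}")
            continue
        try:
            float(row["rating"])  # any numeric range is accepted
        except (TypeError, ValueError):
            problems.append(f"row {index}: rating is not numeric")
        try:
            # Assumed date format (YYYY-MM-DD); adjust to your data.
            datetime.strptime(row["date"], "%Y-%m-%d")
        except (TypeError, ValueError):
            problems.append(f"row {index}: date is not in YYYY-MM-DD format")
    return problems


sample_rows = [
    {"review": "Great blender", "product_id": "P1", "product_category": "Kitchen",
     "date": "2021-03-04", "rating": 5, "country": "France"},
    {"review": "", "product_id": "P2", "product_category": "Kitchen",
     "date": "2021-03-05", "rating": 2, "country": "Spain"},
]
schema_problems = find_schema_problems(sample_rows)
print(schema_problems)
```

In this sample, the second row is flagged because its review text is empty.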

Workflow Overview#

You can follow along with the sample project in the Dataiku gallery.

Dataiku screenshot of the final project Flow showing all Flow zones.

The project has the following high-level steps:

  1. Import reviews and clean the text data.

  2. Relate reviews to countries and further prepare the text.

  3. Visualize product ratings with pre-built dashboards and metrics.

  4. Drill down into specific reviews per product.

  5. Search for reviews with a pre-built semantic search webapp.


This project is meant to be used as a template to guide the development of your own analysis in Dataiku. The results of the model should not be used as actionable insights and the data provided with the project may not be representative of actual data in a real-life project.



In addition to reading this document, we recommend reading the project wiki before you begin. It provides a deeper technical understanding of how this Solution was created and more detailed explanations of Solution-specific vocabulary.

Import and Prepare Reviews for Analysis#

Once the correct input data (see Data Requirements) is uploaded to the project, the first step in the Import & cleaning of data Flow zone applies language detection to identify the language each customer used when writing their product review. In this Flow, we’ve told the recipe to use all 114 available languages, but if you know which languages are present, you can narrow the scope to speed up processing and improve the accuracy of results. With the correct language identified, we can then apply the spell check recipe of the Text Preparation Plugin to identify and fix misspelled words.

Dataiku screenshot of the recipe used to clean the reviews text.

Now that we have higher confidence in the readability of our product reviews, we can relate them to other features that are relevant to our analysis in the Geo Engineering & Lemmatization Flow zone. Using the Country column of our reviews dataset with the OpenStreetMap API, we extract the related geopoints for each country. In addition to geoengineering our dataset, we apply a few more transformations to the review text: removing punctuation, converting words to their lemma form, and converting everything to lowercase. This makes it much easier for us to investigate the text.
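The three text transformations described above (lowercasing, punctuation removal, lemmatization) can be sketched in plain Python. This is an illustration only, not the recipe the Solution uses: the `TOY_LEMMAS` dictionary is a hypothetical stand-in for a real lemmatizer, and `string.punctuation` covers ASCII punctuation only.

```python
import string

# Toy lemma lookup standing in for a real lemmatizer (illustrative only).
TOY_LEMMAS = {
    "blenders": "blender",
    "was": "be",
    "broken": "break",
    "arrived": "arrive",
}

# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def normalize_review(text):
    """Lowercase the text, strip punctuation, and map tokens to their lemma."""
    lowered = text.lower()
    without_punctuation = lowered.translate(_PUNCTUATION_TABLE)
    tokens = without_punctuation.split()
    # Tokens without an entry in the toy dictionary are kept unchanged.
    return [TOY_LEMMAS.get(token, token) for token in tokens]


normalized = normalize_review("The blenders arrived BROKEN!")
print(normalized)
```

After normalization, reviews that differ only in casing, punctuation, or inflection map to the same tokens, which is what makes later keyword counts comparable.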

Top Choice Products: Visualizing our Product Ratings#

Our cleaned reviews are passed to three different Flow zones for further analysis and visualization. The first we will explore is the Dashboard Flow zone. Using a window recipe, we sort the product reviews by their product category so that we can get an overall view of our product categories and compare each product against the other products in the same category. The two datasets in this Flow zone are used to generate all of the charts in the first two tabs of the Reviews Overview dashboard.
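To make the product-versus-category comparison concrete, here is a minimal sketch of a window-style aggregation in plain Python: each review row is enriched with the average rating of its category. The column names and the `category_avg_rating` output field are assumptions for illustration; the actual settings of the project’s window recipe are not reproduced here.

```python
from collections import defaultdict


def add_category_average(rows):
    """Attach the average rating of each row's category to that row."""
    # First pass: accumulate [sum of ratings, number of reviews] per category.
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        bucket = totals[row["product_category"]]
        bucket[0] += row["rating"]
        bucket[1] += 1

    # Second pass: copy each row and add the category-level average.
    enriched = []
    for row in rows:
        total, count = totals[row["product_category"]]
        enriched.append({**row, "category_avg_rating": total / count})
    return enriched


reviews = [
    {"product_id": "P1", "product_category": "Kitchen", "rating": 5},
    {"product_id": "P2", "product_category": "Kitchen", "rating": 3},
    {"product_id": "P3", "product_category": "Garden", "rating": 2},
]
enriched_reviews = add_category_average(reviews)
for row in enriched_reviews:
    print(row["product_id"], row["rating"], row["category_avg_rating"])
```

Keeping the row-level rating next to the category average is what lets a chart plot one product against its whole category.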

The first tab, Reviews Analysis, contains three charts: the evolution of a product’s rating compared to the full category, the evolution of ratings for one or more products, and a map showing the average review rating by country. The map can reveal interesting trends about how products are performing in different countries and the impact of culture on reviews. This tab also has a dashboard filter that affects the data used to generate all three charts.

Dataiku screenshot of the dashboard tab containing an overall analysis of product ratings

The second tab, Data Overview, contains a metric computing the total number of reviews, the distribution of reviews by Product Category, and the distribution of reviews by Country.

Dataiku screenshot of the dashboard tab showing the high-level overview of our product reviews

Product Review Deep-Dive#

In addition to the Dashboard charts, this solution allows us to deep dive into the full corpus of our Product Reviews using an interactive Webapp. There are two features available in the webapp: Keyword Analysis and Semantic Search. Each feature has its own Flow zone that transforms the data to be able to support the analysis type.

We’ll begin with the Keyword Analysis part of the Webapp. Within the Keyword Analysis - Webapp Flow zone, we take our reviews and extract the ngrams (two consecutive words in a review) to be used as filters on all our reviews. With the ngrams extracted, we can analyze which ngrams appear most frequently across our product reviews to get a high-level view of the keywords most associated with each product or product category. These keywords are explorable in the Webapp, and selecting a keyword displays all associated reviews.
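The ngram extraction described above can be sketched as follows. This is a minimal illustration in plain Python that assumes the reviews have already been tokenized; the function names are hypothetical and do not correspond to anything in the project.

```python
from collections import Counter


def extract_bigrams(tokens):
    """Return every pair of consecutive tokens, joined by a space."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def count_bigrams(tokenized_reviews):
    """Count how often each bigram occurs across all reviews."""
    counts = Counter()
    for tokens in tokenized_reviews:
        counts.update(extract_bigrams(tokens))
    return counts


def reviews_containing(bigram, tokenized_reviews):
    """Return the indices of the reviews in which the bigram appears."""
    return [
        index
        for index, tokens in enumerate(tokenized_reviews)
        if bigram in extract_bigrams(tokens)
    ]


tokenized_reviews = [
    ["battery", "life", "be", "great"],
    ["poor", "battery", "life"],
    ["great", "sound", "quality"],
]
bigram_counts = count_bigrams(tokenized_reviews)
matching_reviews = reviews_containing("battery life", tokenized_reviews)
print(bigram_counts.most_common(1), matching_reviews)
```

The counts give the high-level keyword ranking, while `reviews_containing` mirrors the filtering step: picking a keyword returns the reviews that use it.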

Dataiku screenshot of the webapp offering keyword analysis and semantic search.

Switching to the Semantic Search - Webapp Flow zone, we take our reviews dataset and apply a Google TensorFlow encoder model, which supports 16 languages, to encode the review text into vectors so that similarities between words can be computed. These vectors capture the semantic sense of each word. With meaning embedded for each word of the reviews, we can then use the Semantic Search portion of the Webapp to input any word and retrieve reviews that contain related words based on semantic similarity.
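The retrieval step can be illustrated with a small cosine-similarity ranking. This is a sketch under stated assumptions: the three-dimensional vectors below are made up for the example and do not come from the encoder model, and the source does not state which similarity measure the webapp uses, so cosine similarity is an assumption here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_vector, review_vectors, top_k=2):
    """Return the ids of the top_k reviews most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), review_id)
        for review_id, vector in review_vectors.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [review_id for _, review_id in scored[:top_k]]


# Made-up embeddings; a real pipeline would obtain these from the encoder.
review_vectors = {
    "r1": [0.9, 0.1, 0.0],
    "r2": [0.8, 0.3, 0.1],
    "r3": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.0, 0.0]
top_reviews = semantic_search(query_vector, review_vectors)
print(top_reviews)
```

Because the ranking relies on vector direction rather than exact word matches, a query can surface reviews that never contain the query word itself.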


Please bear in mind that this is a template solution that leverages a pre-trained model. Other models should be used for better results.

Reproducing these Processes With Minimal Effort For Your Own Data#

The intent of this project is to enable marketing and customer success teams to understand how Dataiku can be used to get a full view of customer sentiment towards their products. By creating a single solution that can benefit and influence the decisions of a variety of teams in an organization, smarter and more holistic strategies can be designed to optimize customer retention, improve products, make smarter inventory decisions, and adapt marketing strategies.

We’ve provided several suggestions on how to analyze your customer reviews, but ultimately the “best” approach will depend on your specific needs and your data. If you’re interested in adapting this project to the specific goals and needs of your organization, roll-out and customization services can be offered on demand.