Solution | Customer Satisfaction Reviews#
Overview#
Business Case#
Brands face multiple challenges in keeping modern consumers engaged. It is key for brands to listen to what their customers think, feel, and say about their purchasing experience. To do so, brands first need to collect customer feedback, which is one of the key pillars of digital and social commerce.
Sentiment analysis adds a new layer to a brand's performance insights. It helps brands understand customer emotions and measure customer satisfaction. By turning those signals into concrete actions, marketers can deliver a better customer experience.
The solution consists of a data pipeline that combines descriptive statistics and machine learning. Analysts can input their own data and surface the outputs in a dashboard to gauge the past and expected success of customer engagement strategies. Data scientists can use this sample project as an initial building block to develop advanced analytics and support decision-making.
Installation#
The process to install this solution differs depending on whether you are using Dataiku Cloud or a self-managed instance.
Dataiku Cloud users should follow the instructions for installing solutions on cloud.
The Cloud Launchpad will automatically meet the technical requirements listed below, and add the Solution to your Dataiku instance.
Once the Solution has been added to your space, move ahead to Data Requirements.
After meeting the technical requirements below, self-managed users can install the Solution with the following instructions:
From the Design homepage of a Dataiku instance connected to the internet, click + Dataiku Solutions.
Search for and select Customer Satisfaction Reviews.
Click Install, changing the project folder into which the solution will be installed if needed.
From the Design homepage of a Dataiku instance connected to the internet, click + New Project.
Select Dataiku Solutions.
Search for and select Customer Satisfaction Reviews.
Note
Alternatively, download the Solution’s .zip project file, and import it to your Dataiku instance as a new project.
Technical Requirements#
To leverage this solution, you must meet the following requirements:
Have access to a Dataiku 12.6+ instance.
Install Dataiku’s Text Visualization Plugin.
Create a Python 3.9 code environment named solution-customer_satisfaction_reviews with the following required packages:
MarkupSafe<2.1.0
Jinja2>=2.11,<2.12
cloudpickle==3.0.0
flask>=1.0,<1.1
itsdangerous<2.1.0
lightgbm>=3.2,<3.3
scikit-learn>=1.0,<1.1
scikit-optimize>=0.7,<0.10
scipy==1.13.0
statsmodels==0.12.2
xgboost==0.82
gluonts>=0.8.1,<=0.10.4
pmdarima>=1.2.1,<1.8.5
mxnet==1.8.0.post0
prophet==1.1.1
holidays>=0.14.2,<0.25
transformers==4.39.3
sentencepiece==0.2.0
torch==2.2.2
Werkzeug==2.1.2
The code environment also requires an initialization script. Paste the following script into the Resources tab of the code environment.
## Base imports
from dataiku.code_env_resources import clear_all_env_vars
from dataiku.code_env_resources import set_env_path
from dataiku.code_env_resources import set_env_var
# Clears all environment variables defined by previously run script
clear_all_env_vars()
## Hugging Face
# Set HuggingFace cache directory
set_env_path("HF_HOME", "huggingface")
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("lxyuan/distilbert-base-multilingual-cased-sentiments-student")
model = AutoModelForSequenceClassification.from_pretrained("lxyuan/distilbert-base-multilingual-cased-sentiments-student")
LLM Mesh connection: This solution uses Dataiku’s LLM Mesh to interact with local or remote models. One connection needs to be configured to run the project.
Data Requirements#
The Dataiku Flow was initially built using publicly available data. To adapt the project to your own data and needs, you must provide an input dataset of reviews. Each row of the dataset should contain the following columns:
Column | Type | Description |
---|---|---|
product_id | [String] | Unique identifier for a product |
review_text | [Text] | Body of the review written by the customer |
product_category | [String] | Category of the reviewed product |
review_date | [Date] | Date of the customer review |
review_score_provided | [Integer] | Score of the product provided by the user on a scale of 5 |
customer_id | [String] | Unique identifier of the customer |
customer_country | [String] | Country of residence of the customer |
customer_latitude | [Double] | Latitude of a geopoint centered on the customer’s country |
customer_longitude | [Double] | Longitude of a geopoint centered on the customer’s country |
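Before plugging your own reviews into the Flow, it can help to check them against the columns listed above. The following is a minimal sketch, assuming the data is available as a pandas DataFrame; the function name validate_reviews and the specific checks are illustrative and are not part of the Solution.

```python
import pandas as pd

# Column names taken from the Data Requirements table.
REQUIRED_COLUMNS = [
    "product_id", "review_text", "product_category", "review_date",
    "review_score_provided", "customer_id", "customer_country",
    "customer_latitude", "customer_longitude",
]

# Valid ranges for the geopoint columns.
GEO_BOUNDS = {"customer_latitude": (-90, 90), "customer_longitude": (-180, 180)}


def validate_reviews(df: pd.DataFrame) -> list:
    """Return a list of problems found; an empty list means the frame looks usable."""
    problems = []
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))

    if "review_score_provided" in df.columns:
        scores = pd.to_numeric(df["review_score_provided"], errors="coerce")
        if scores.isna().any() or (scores % 1 != 0).any():
            problems.append("review_score_provided must contain integers")

    if "review_date" in df.columns:
        dates = pd.to_datetime(df["review_date"], errors="coerce")
        if dates.isna().any():
            problems.append("review_date contains unparseable dates")

    for col, (low, high) in GEO_BOUNDS.items():
        if col in df.columns:
            values = pd.to_numeric(df[col], errors="coerce")
            if not values.between(low, high).all():
                problems.append(f"{col} must be numeric and within [{low}, {high}]")
    return problems
```

Running this check on a small extract first keeps schema problems from surfacing only after the LLM recipes have already been run.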
Workflow Overview#
You can follow along with the sample project in the Dataiku gallery.
The project has the following high-level steps:
Evaluate the review text’s overall sentiment with an NLP model.
Sample reviews and identify relevant topics using an LLM.
Extract a rating for each topic from every review.
Aggregate results for visualization.
Note
This project is meant to be used as a template to guide the development of your analysis in Dataiku. The model’s results should not be used as actionable insights, and the data provided by the project may not represent actual data in a real-life project.
Walkthrough#
Note
In addition to reading this document, we recommend reading the project wiki before you begin. It provides a deeper technical understanding of how this Solution was built and more detailed explanations of Solution-specific vocabulary.
Sentiment Analysis#
Once the correct input data (see Data Requirements) is uploaded to the project, the reviews are processed in the Sentiment Analysis Flow zone. Using an NLP model, we predict the rating of a review from its text. This prediction can be compared to the provided rating to detect inconsistencies between text and rating (e.g., a negative text with a positive rating).
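The comparison between predicted sentiment and provided rating can be sketched as follows. This is an illustration only: it assumes the NLP model returns one of the labels positive, neutral, or negative, and it assumes a mapping of scores 1-2 to negative, 3 to neutral, and 4-5 to positive. The function names are hypothetical and the project may use different thresholds.

```python
def score_to_sentiment(score: int) -> str:
    """Map a 1-5 score to a coarse sentiment (assumed thresholds)."""
    if not 1 <= score <= 5:
        raise ValueError(f"score out of range: {score}")
    if score <= 2:
        return "negative"
    if score == 3:
        return "neutral"
    return "positive"


def is_inconsistent(predicted_label: str, provided_score: int) -> bool:
    """True when the text sentiment and the provided score point in opposite directions."""
    expected = score_to_sentiment(provided_score)
    # Only a positive/negative clash is flagged; neutral never conflicts.
    return {predicted_label.lower(), expected} == {"positive", "negative"}
```

For example, is_inconsistent("negative", 5) flags a review whose text reads negative despite a five-star score, while a neutral prediction is never flagged.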
Topic Modeling with an LLM#
To extract topics from the data, we select a sample of the reviews, rebalanced by product, and feed it into the LLM recipe. We can then review the topics in the editable dataset to validate them.
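One simple way to rebalance a sample by product is to cap the number of reviews taken from each product. The sketch below assumes a pandas DataFrame with a product_id column; the function name, the cap of 20, and the seed are illustrative choices, not the settings used in the project.

```python
import pandas as pd


def sample_reviews_by_product(
    df: pd.DataFrame, max_per_product: int = 20, seed: int = 0
) -> pd.DataFrame:
    """Keep at most `max_per_product` randomly chosen reviews per product."""
    # Shuffle once so that head() picks a random subset within each product.
    shuffled = df.sample(frac=1.0, random_state=seed)
    capped = shuffled.groupby("product_id", sort=False).head(max_per_product)
    return capped.reset_index(drop=True)
```

Capping per product prevents a few heavily reviewed products from dominating the topics proposed by the LLM, and it also bounds the number of rows sent to the model.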
Review analysis with an LLM#
Leveraging the topics, we can ask the LLM to provide a rating per topic for each review. This gives a multi-faceted perspective on customer satisfaction and helps identify specific strengths or pain points regarding products.
When imported to new instances, LLM Mesh recipes default to “No Connection”. However, the prompt, inputs, and examples configuration are still stored and will appear pre-filled for review and further editing when an LLM connection is selected.
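LLM output is free text, so per-topic ratings are worth validating before they reach a dataset. The sketch below assumes the prompt asks the model to answer with a JSON object mapping each topic to an integer from 1 to 5; that response format, the scale, and the function name are assumptions made for illustration and are not taken from the project's prompt.

```python
import json
from typing import Dict, List, Optional


def parse_topic_ratings(raw_response: str, topics: List[str]) -> Dict[str, Optional[int]]:
    """Return one rating per validated topic, or None when missing or invalid."""
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}

    ratings: Dict[str, Optional[int]] = {}
    for topic in topics:
        value = payload.get(topic)
        # bool is a subclass of int in Python, so it is excluded explicitly.
        is_valid = isinstance(value, int) and not isinstance(value, bool) and 1 <= value <= 5
        ratings[topic] = value if is_valid else None
    return ratings
```

Topics the model invents are dropped because only the validated topic list is read, and out-of-range or malformed values become None rather than silently skewing averages.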
Processing results for Dashboards#
Reviews enriched with the information from the LLMs are processed in two Flow zones to prepare them for display in the dashboards.
Both Flow zones use Python recipes to pivot and aggregate the datasets to identify the best-performing products and keywords for each topic.
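The pivot-and-aggregate step can be pictured with a short pandas sketch. It assumes a long-format DataFrame with columns product_id, topic, and rating; these column names and both functions are hypothetical and do not reproduce the project's Python recipes.

```python
import pandas as pd


def topic_rating_matrix(ratings: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-format ratings into a product x topic matrix of mean ratings."""
    return ratings.pivot_table(
        index="product_id", columns="topic", values="rating", aggfunc="mean"
    )


def best_product_per_topic(ratings: pd.DataFrame) -> pd.Series:
    """For each topic, return the product with the highest mean rating."""
    return topic_rating_matrix(ratings).idxmax()
```

The matrix is a natural input for a heatmap-style chart, while the per-topic winners give a quick summary of the best-performing products.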
Dashboard#
In the dashboard, we can follow the overall rating from the sentiment analysis and the sentiments associated with the identified topics. We can monitor how satisfaction evolves over time with the Sentiment Trend and look for an increase in dissatisfaction.
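As an illustration of what a sentiment trend measures, the sketch below computes the monthly share of negative reviews and lists the months where that share rose sharply. The sentiment column name, the label negative, and the 10-point threshold are assumptions; the dashboard itself may aggregate differently.

```python
import pandas as pd


def monthly_negative_share(df: pd.DataFrame) -> pd.Series:
    """Share of reviews labeled negative, per calendar month."""
    months = pd.to_datetime(df["review_date"]).dt.to_period("M")
    is_negative = df["sentiment"].eq("negative")
    return is_negative.groupby(months).mean()


def months_with_rising_dissatisfaction(share: pd.Series, min_increase: float = 0.10) -> list:
    """Months whose negative share grew by more than `min_increase` versus the previous month."""
    increase = share.diff()
    return [str(period) for period in increase[increase > min_increase].index]
```

The first month never appears in the result because it has no previous month to compare against.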
These dashboards also allow a deeper exploration of the topics. We can see how often topics are mentioned in reviews and the associated sentiment with Topic & Occurrences.
We can explore which products are best/worst rated across the different issues and identify the common keywords associated with the positive and negative sentiment in reviews. This will help highlight clear issues with specific products. With this information, you can identify which product to focus on and how to improve customer satisfaction.
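To give a sense of how common keywords can be surfaced from a set of reviews, here is a minimal word-frequency sketch using only the standard library. The stopword list and tokenization are deliberately simplistic and are assumptions for illustration; they do not represent how the dashboard or the Text Visualization plugin computes keywords.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "and", "was", "this", "but", "for", "very", "with", "not"}


def top_keywords(texts, n=3):
    """Return the `n` most frequent non-stopword tokens across the given texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        # Skip stopwords and very short tokens such as "is" or "a".
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(n)]
```

Calling top_keywords separately on the positive and the negative reviews of a product gives two lists that can be compared side by side.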
Working with LLMs#
Working with LLMs is an opportunity but requires specific attention. Please be mindful of cost, privacy, and regulatory concerns. Use a small sample of data for testing to catch unexpected behavior early and limit the cost of iteration. Prompts might need to be adapted to your data or to the model used. Lastly, a human-in-the-loop process is recommended before taking any action based on results that rely on LLMs (directly or indirectly).
Reproducing these Processes With Minimal Effort For Your Data#
This project is intended to show marketing and customer success teams how Dataiku can be used to get a complete view of customer sentiment toward their products. By creating a single solution that can benefit and influence the decisions of various teams in an organization, smarter and more holistic strategies can be designed to optimize customer retention, improve products, make smarter inventory decisions, and adapt marketing strategies.
We’ve provided several suggestions on how to use review data to analyze customer satisfaction, but ultimately, the “best” approach will depend on your specific needs and data. If you want to adapt this project to your organization’s specific goals and needs, roll-out and customization services can be offered on demand.