Solution | Customer Satisfaction Reviews#

Overview#

Business Case#

Brands face multiple challenges in keeping modern consumers engaged. They need to listen to what their customers think, feel, and say about their purchasing experience. To do so, brands must first collect customer feedback, which is one of the key pillars of digital and social commerce.

Sentiment analysis adds a new layer to a brand’s performance insights. It helps brands understand customer emotions and measure customer satisfaction. By turning customer emotion into concrete brand actions, marketers can deliver a better customer experience.

The solution consists of a data pipeline that combines descriptive statistics and machine learning. Analysts can input their own data and surface the outputs in a dashboard to gauge the past and expected future success of customer engagement strategies. Data scientists can use this sample project as an initial building block to develop advanced analytics and support decision-making.

Installation#

The process to install this solution differs depending on whether you are using Dataiku Cloud or a self-managed instance.

Dataiku Cloud users should follow the instructions for installing solutions on cloud.

  1. The Cloud Launchpad will automatically meet the technical requirements listed below, and add the Solution to your Dataiku instance.

  2. Once the Solution has been added to your space, move ahead to Data Requirements.

Technical Requirements#

To leverage this solution, you must meet the following requirements:

  • Have access to a Dataiku 12.6+ instance.

  • Have Dataiku’s Text Visualization plugin installed.

  • A Python 3.9 code environment named solution-customer_satisfaction_reviews with the following required packages:

MarkupSafe<2.1.0
Jinja2>=2.11,<2.12
cloudpickle==3.0.0
flask>=1.0,<1.1
itsdangerous<2.1.0
lightgbm>=3.2,<3.3
scikit-learn>=1.0,<1.1
scikit-optimize>=0.7,<0.10
scipy==1.13.0
statsmodels==0.12.2
xgboost==0.82
gluonts>=0.8.1,<=0.10.4
pmdarima>=1.2.1,<1.8.5
mxnet==1.8.0.post0
prophet==1.1.1
holidays>=0.14.2,<0.25
transformers==4.39.3
sentencepiece==0.2.0
torch==2.2.2
Werkzeug==2.1.2
  • The code environment also requires an initiation script. Users should put the following script in the tab Resources.

## Base imports
from dataiku.code_env_resources import clear_all_env_vars
from dataiku.code_env_resources import set_env_path
from dataiku.code_env_resources import set_env_var

# Clears all environment variables defined by previously run script
clear_all_env_vars()

## Hugging Face
# Set HuggingFace cache directory
set_env_path("HF_HOME", "huggingface")


# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("lxyuan/distilbert-base-multilingual-cased-sentiments-student")
model = AutoModelForSequenceClassification.from_pretrained("lxyuan/distilbert-base-multilingual-cased-sentiments-student")
  • LLM Mesh connection: This solution uses Dataiku’s LLM Mesh to interact with local or remote models. One connection needs to be configured to run the project.

Data Requirements#

The Dataiku Flow was initially built using publicly available data. When adapting the project to your data and needs, an input dataset of reviews is mandatory to run the project. Each row of the dataset should contain the following columns:

| Column                | Type      | Description                                                |
|-----------------------|-----------|------------------------------------------------------------|
| product_id            | [String]  | Unique identifier for a product                            |
| review_text           | [Text]    | Body of the review written by the customer                 |
| product_category      | [String]  | Category of the reviewed product                           |
| review_date           | [Date]    | Date of the customer review                                |
| review_score_provided | [Integer] | Score of the product provided by the user on a scale of 5  |
| customer_id           | [String]  | Unique identifier of the customer                          |
| customer_country      | [String]  | Country of residence of the customer                       |
| customer_latitude     | [Double]  | Latitude of a geopoint centered on the customer’s country  |
| customer_longitude    | [Double]  | Longitude of a geopoint centered on the customer’s country |
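
Before plugging your own reviews into the Flow, it can help to check that each row matches the columns listed above. The following is a minimal sketch, not part of the Solution: the `validate_review` helper, the mapping from the listed types to Python types, and the sample row are all assumptions made for illustration.

```python
from datetime import date

# Expected columns, taken from the Data Requirements list.
# The mapping to Python types is an assumption for this sketch.
EXPECTED_COLUMNS = {
    "product_id": str,
    "review_text": str,
    "product_category": str,
    "review_date": date,
    "review_score_provided": int,
    "customer_id": str,
    "customer_country": str,
    "customer_latitude": float,
    "customer_longitude": float,
}


def validate_review(row):
    """Return a list of human-readable problems found in one review row."""
    problems = []
    for column, expected_type in EXPECTED_COLUMNS.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected_type):
            problems.append(
                f"{column}: expected {expected_type.__name__}, "
                f"got {type(row[column]).__name__}"
            )
    return problems


# Made-up row used only to exercise the check.
sample_row = {
    "product_id": "P-001",
    "review_text": "Arrived late but works well.",
    "product_category": "electronics",
    "review_date": date(2024, 1, 15),
    "review_score_provided": 4,
    "customer_id": "C-042",
    "customer_country": "France",
    "customer_latitude": 46.2,
    "customer_longitude": 2.2,
}

print(validate_review(sample_row))
```

An empty list means the row has every expected column with the expected type. In a Dataiku project the same check could be applied to rows read from the input dataset.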

Workflow Overview#

You can follow along with the sample project in the Dataiku gallery.

Dataiku screenshot of the final project Flow showing all Flow zones.

The project has the following high-level steps:

  1. Evaluate the review text’s overall sentiment with an NLP model.

  2. Sample reviews and identify relevant topics using an LLM.

  3. Extract the rating for each topic in every review.

  4. Aggregate results for visualization.

Note

This project is meant to be used as a template to guide the development of your analysis in Dataiku. The model’s results should not be used as actionable insights, and the data provided by the project may not represent actual data in a real-life project.

Walkthrough#

Note

In addition to reading this document, we recommend reading the project wiki before you begin. It provides a deeper technical understanding of how this Solution was created and more detailed explanations of Solution-specific vocabulary.

Sentiment Analysis#

Once the correct input data (see Data Requirements) is uploaded to the project, the reviews are processed in the Sentiment Analysis Flow zone. Using an NLP model, we estimate the rating of a review from its text. This estimate can be compared to the rating the customer provided to detect inconsistencies between text and rating (e.g. a negative text with a positive rating).

Flow zone where the sentiment analysis model is used.
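
The comparison between predicted sentiment and provided rating can be sketched as follows. This is an illustration only, not the Solution's recipe: it assumes a three-class label (`negative`, `neutral`, `positive`) stored in a hypothetical `predicted_label` column, a score from 1 to 5, and a hand-picked mapping between the two that you would tune to your own data.

```python
# Hypothetical mapping from a sentiment label to the scores considered
# consistent with it. The ranges overlap on purpose and are an assumption.
CONSISTENT_SCORES = {
    "negative": {1, 2},
    "neutral": {2, 3, 4},
    "positive": {4, 5},
}


def is_inconsistent(predicted_label, provided_score):
    """True when the provided score does not match the predicted sentiment."""
    return provided_score not in CONSISTENT_SCORES[predicted_label]


# Made-up reviews; `predicted_label` is a hypothetical column name.
reviews = [
    {"review_text": "Broke after two days.",
     "predicted_label": "negative", "review_score_provided": 5},
    {"review_text": "Does the job.",
     "predicted_label": "neutral", "review_score_provided": 3},
    {"review_text": "Love it!",
     "predicted_label": "positive", "review_score_provided": 5},
]

flagged = [
    r for r in reviews
    if is_inconsistent(r["predicted_label"], r["review_score_provided"])
]
for review in flagged:
    print(review["review_text"], review["review_score_provided"])
```

Flagged rows are candidates for manual review rather than automatic correction, since either the text or the score may be the part that is off.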

Topic Modeling with an LLM#

To extract topics from the data, we select a sample of the reviews (rebalanced by product) and feed it into the LLM recipe. We can then review the topics in the editable dataset to validate them.

Flow zone where we identify the topics with an LLM.
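
One way to read "rebalancing by product" is to cap the number of sampled reviews per product, so that heavily reviewed products do not dominate the topics the LLM proposes. The sketch below shows that idea under this assumption. The `sample_per_product` helper and its parameters are hypothetical, and the sampling used in the actual Flow may differ.

```python
import random
from collections import defaultdict


def sample_per_product(reviews, per_product, seed=0):
    """Draw up to `per_product` reviews for each product_id."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    by_product = defaultdict(list)
    for review in reviews:
        by_product[review["product_id"]].append(review)

    sample = []
    for product_id in sorted(by_product):
        group = by_product[product_id]
        k = min(per_product, len(group))  # small groups are kept whole
        sample.extend(rng.sample(group, k))
    return sample


# Made-up data: product A has ten reviews, product B only two.
reviews = (
    [{"product_id": "A", "review_text": f"review {i}"} for i in range(10)]
    + [{"product_id": "B", "review_text": f"review {i}"} for i in range(2)]
)

balanced = sample_per_product(reviews, per_product=3)
print(len(balanced))
```

Keeping the sample small also limits the number of LLM calls while the prompt is still being tuned.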

Review analysis with an LLM#

Using these topics, we can ask the LLM to provide a rating for each review on every topic. This gives a multi-faceted perspective on customer satisfaction and helps identify specific strengths or pain points regarding products.

Flow zone where reviews are evaluated by the LLM.

When imported to new instances, LLM Mesh recipes default to “No Connection”. However, the prompt, inputs, and examples configuration are still stored and will appear pre-filled for review and further editing when an LLM connection is selected.
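
To make the per-topic rating step more concrete, here is a hedged sketch of how a prompt could be assembled and how a model's answer could be checked before it is stored. The prompt wording, the 1 to 5 scale, the JSON answer format, and the `build_prompt` and `parse_ratings` helpers are all assumptions for illustration. The prompt configured in the Solution's LLM recipe may look quite different, and no LLM is called here.

```python
import json


def build_prompt(review_text, topics):
    """Assemble a rating prompt for one review (hypothetical wording)."""
    topic_list = ", ".join(topics)
    return (
        "Rate the following customer review on each topic from 1 (very "
        "negative) to 5 (very positive). Use null when the topic is not "
        f"mentioned. Topics: {topic_list}.\n"
        "Answer with a single JSON object mapping topic to rating.\n\n"
        f"Review: {review_text}"
    )


def parse_ratings(raw_answer, topics):
    """Keep only known topics with an integer rating from 1 to 5."""
    try:
        data = json.loads(raw_answer)
    except json.JSONDecodeError:
        return {}  # unusable answer: store nothing rather than guess
    if not isinstance(data, dict):
        return {}

    ratings = {}
    for topic in topics:
        value = data.get(topic)
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if is_int and 1 <= value <= 5:
            ratings[topic] = value
    return ratings


topics = ["delivery", "price", "quality"]
prompt = build_prompt("Arrived late but works well.", topics)

# Made-up answer standing in for what a model might return.
answer = '{"delivery": 2, "price": null, "quality": 9, "other": 3}'
print(parse_ratings(answer, topics))
```

Validating the answer this way drops out-of-range ratings and topics the model invented, which keeps downstream aggregation from silently absorbing malformed output.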

Processing results for Dashboards#

Reviews enriched with the information from the LLMs are processed in two Flow zones to prepare the datasets displayed in the dashboards.

Main Flow zone to prepare the visualization datasets.

Both Flow zones use Python recipes to pivot and aggregate the datasets to identify the best-performing products and keywords for each topic.

Flow zone for visual results with a keyword perspective.
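
The pivot-and-aggregate idea can be sketched in plain Python as below. This is not the code of the Solution's recipes: the long-format input with `product_id`, `topic`, and `rating` fields and both helper functions are assumptions used to show the shape of the computation.

```python
from collections import defaultdict


def pivot_mean_rating(rows):
    """Return {product_id: {topic: mean rating}} from long-format rows."""
    # Each cell accumulates [sum of ratings, number of ratings].
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for row in rows:
        cell = totals[row["product_id"]][row["topic"]]
        cell[0] += row["rating"]
        cell[1] += 1
    return {
        product: {topic: s / n for topic, (s, n) in by_topic.items()}
        for product, by_topic in totals.items()
    }


def best_product_per_topic(pivot):
    """Return {topic: (product_id, mean rating)} for the top product."""
    best = {}
    for product, by_topic in pivot.items():
        for topic, mean in by_topic.items():
            if topic not in best or mean > best[topic][1]:
                best[topic] = (product, mean)
    return best


# Made-up per-topic ratings, one row per (review, topic) pair.
rows = [
    {"product_id": "P1", "topic": "delivery", "rating": 4},
    {"product_id": "P1", "topic": "delivery", "rating": 2},
    {"product_id": "P2", "topic": "delivery", "rating": 5},
    {"product_id": "P1", "topic": "price", "rating": 3},
    {"product_id": "P2", "topic": "price", "rating": 1},
]

pivot = pivot_mean_rating(rows)
print(best_product_per_topic(pivot))
```

Swapping the comparison in `best_product_per_topic` gives the worst-rated product per topic, which is often the more actionable view.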

Dashboard#

We can follow the overall rating from the sentiment analysis and the sentiments associated with the identified topics. We can monitor how satisfaction evolves over time with the Sentiment Trend and look for an increase in dissatisfaction.

Aggregated sentiment from the customers.
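
As an illustration of what a sentiment trend can measure, the sketch below computes the share of negative reviews per month, so that a rise in dissatisfaction shows up as an increasing value. The monthly granularity, the hypothetical `predicted_label` column, and the `negative_share_by_month` helper are assumptions. The dashboard's own chart may aggregate differently.

```python
from collections import defaultdict
from datetime import date


def negative_share_by_month(reviews):
    """Return {"YYYY-MM": share of negative reviews}, ordered by month."""
    counts = defaultdict(lambda: [0, 0])  # [negative reviews, all reviews]
    for review in reviews:
        month = review["review_date"].strftime("%Y-%m")
        counts[month][1] += 1
        if review["predicted_label"] == "negative":
            counts[month][0] += 1
    return {
        month: negative / total
        for month, (negative, total) in sorted(counts.items())
    }


# Made-up reviews: 1 of 4 negative in January, 2 of 4 in February.
reviews = [
    {"review_date": date(2024, 1, 3), "predicted_label": "positive"},
    {"review_date": date(2024, 1, 9), "predicted_label": "negative"},
    {"review_date": date(2024, 1, 17), "predicted_label": "neutral"},
    {"review_date": date(2024, 1, 28), "predicted_label": "positive"},
    {"review_date": date(2024, 2, 2), "predicted_label": "negative"},
    {"review_date": date(2024, 2, 11), "predicted_label": "negative"},
    {"review_date": date(2024, 2, 19), "predicted_label": "positive"},
    {"review_date": date(2024, 2, 25), "predicted_label": "neutral"},
]

trend = negative_share_by_month(reviews)
print(trend)
```

Using a share rather than a raw count keeps months with very different review volumes comparable.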

These dashboards also allow a deeper exploration of the topics. With Topic & Occurrences, we can see how often topics are mentioned in reviews and which sentiment is associated with them.

Overall satisfaction regarding topics.

We can explore which products are best and worst rated across the different topics and identify the common keywords associated with positive and negative sentiment in reviews. This helps highlight clear issues with specific products. With this information, you can identify which product to focus on and how to improve customer satisfaction.

Table of products and keywords rated by topics

Working with LLMs#

Working with LLMs is an opportunity but requires care. Be mindful of cost, privacy, and regulatory concerns. Test on a small sample of data first to catch unexpected behavior and limit the cost of iteration. Prompts may need to be adapted to your data or to the model used. Lastly, a human-in-the-loop process is recommended before taking any action based on results that rely on LLMs (directly or indirectly).

Reproducing these Processes With Minimal Effort For Your Data#

This project aims to show marketing and customer success teams how Dataiku can be used to get a full view of customer sentiment toward their products. A single solution that informs the decisions of several teams in an organization makes it possible to design smarter, more holistic strategies that optimize customer retention, improve products, guide inventory decisions, and adapt marketing.

We’ve provided several suggestions on how to analyze your customer reviews, but ultimately, the “best” approach will depend on your specific needs and data. If you want to adapt this project to your organization’s specific goals and needs, roll-out and customization services are available on demand.