Solution | Product Recommendation
Overview
Business case
Companies that successfully implement personalization drive 40% more revenue than the average company. This should come as no surprise: 71% of consumers expect companies to deliver personalization, and they are more likely to shop with brands they recognize, that understand them, and that provide relevant offers and recommendations that grow lasting relationships.
Recommending the right products to customers is now essential to securing market share and building loyalty. One way to achieve this is a recommendation engine based on collaborative filtering, which aims to answer a simple question: which items will appeal to customers who share similar preferences?
By answering this question, brands can recommend products that a customer hasn’t purchased yet. The result: product discovery, increased customer engagement, and higher revenue.
With this Solution, companies can optimize their customer engagement activities, starting with online experiences such as:
A website landing page specifically tailored to logged-in users.
A digital app connecting customers to personalized offers.
Promotional emails personalized based on purchase history.
And much more!
Installation
From the Design homepage of a Dataiku instance connected to the internet, click + Dataiku Solutions.
Search for and select Product Recommendation.
If needed, change the folder into which the Solution will be installed, and click Install.
Follow the modal to either install the technical prerequisites below or request an admin to do it for you.
Alternatively, depending on your Dataiku version: from the Design homepage of a Dataiku instance connected to the internet, click + New Project.
Select Dataiku Solutions.
Search for and select Product Recommendation.
Follow the modal to either install the technical prerequisites below or request an admin to do it for you.
Note
Alternatively, download the Solution’s .zip project file and import it into your Dataiku instance as a new project.
Technical requirements
To leverage this Solution, you must meet the following requirements:
Have access to a Dataiku 13+ instance.
A Python 3.8 code environment named solution_product-recommendations with the following required packages:
Flask==2.0.2
scikit-learn>=1.0,<1.1
Pillow==8.4.0
scikit-image==0.17.2
opencv-python-headless==4.5.5.64
imageio==2.15.0
Werkzeug==2.3.7
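If you want to confirm that a code environment matches the list above, a minimal sketch like the following could be run inside it. It relies only on the standard library (`importlib.metadata`, available from Python 3.8); the helper names `find_problems` and `version_tuple` are hypothetical and not part of the Solution.

```python
import re
import sys
from importlib import metadata

# Exact pins copied from the requirements list above.
EXACT_PINS = {
    "Flask": "2.0.2",
    "Pillow": "8.4.0",
    "scikit-image": "0.17.2",
    "opencv-python-headless": "4.5.5.64",
    "imageio": "2.15.0",
    "Werkzeug": "2.3.7",
}
# scikit-learn is constrained to a range: >=1.0,<1.1
SKLEARN_MIN, SKLEARN_MAX = (1, 0), (1, 1)


def version_tuple(version):
    """'1.0.2' -> (1, 0, 2); trailing non-numeric parts such as 'rc1' are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(p) for p in match.group().split(".")) if match else ()


def find_problems(lookup=metadata.version, python_version=sys.version_info[:2]):
    """Return human-readable mismatches between installed and required versions."""
    problems = []
    if tuple(python_version) != (3, 8):
        problems.append(f"Python {python_version[0]}.{python_version[1]} found, 3.8 expected")
    for name, pinned in EXACT_PINS.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (expected {pinned})")
            continue
        if found != pinned:
            problems.append(f"{name} {found} found, {pinned} expected")
    try:
        found = lookup("scikit-learn")
    except metadata.PackageNotFoundError:
        problems.append("scikit-learn is not installed (expected >=1.0,<1.1)")
    else:
        if not SKLEARN_MIN <= version_tuple(found) < SKLEARN_MAX:
            problems.append(f"scikit-learn {found} found, >=1.0,<1.1 expected")
    return problems


if __name__ == "__main__":
    for line in find_problems() or ["Code environment matches the requirements."]:
        print(line)
```

Passing `lookup` as a parameter keeps the check testable without installing anything.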
The Solution comes with empty filesystem-managed datasets. However, it needs an SQL database to use the Recommendation System Plugin, because the plugin’s recipes rely on an SQL engine to run memory-intensive processes.
This Solution is compatible with the following connections:
Snowflake
Google Cloud Platform: BigQuery + GCS (both are required to use BigQuery)
PostgreSQL
Microsoft SQL Server and Azure Synapse are compatible with the plugin, but using them with this Solution requires roll-out services.
Data requirements
The Solution comes without any data. When you connect your own data, it should follow the data model below.
Data | Description |
---|---|
interactions_history | Mandatory dataset containing records of all the historical interactions between users and items, with the following columns: |
user_metadata | Optional dataset containing information about our customers, with the following columns: |
item_metadata | Optional dataset providing information about our products, with the following columns: |
item_pictures | Optional managed folder containing all our product pictures. All images must be in .jpg format. |
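Because the item_pictures folder only accepts .jpg files, it can help to check a folder’s contents before running the Flow. A minimal sketch, assuming the check is on the file extension (so `.jpeg` is flagged too, to be safe) plus the standard JPEG signature bytes; the helper names are hypothetical. Inside Dataiku, the paths could come from the managed folder through the `dataiku` package, assuming the folder can be looked up by that name.

```python
from pathlib import PurePosixPath

JPEG_MAGIC = b"\xff\xd8\xff"  # every JPEG file starts with these three bytes


def non_jpg_files(paths):
    """Return the paths whose extension is not .jpg (case-insensitive)."""
    return [p for p in paths if PurePosixPath(p).suffix.lower() != ".jpg"]


def looks_like_jpeg(first_bytes):
    """Check the file signature, which catches e.g. PNGs merely renamed to .jpg."""
    return first_bytes[:3] == JPEG_MAGIC


# Inside Dataiku (not run here), the paths could come from the managed folder:
#   import dataiku
#   folder = dataiku.Folder("item_pictures")
#   print(non_jpg_files(folder.list_paths_in_partition()))
```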
Workflow overview

The project has the following high-level steps:
Connect your data as input and select your analysis parameters via the Dataiku application.
Join and prepare our input data.
Apply collaborative filtering on established customers to create negative samples for downstream modeling.
Train a recommendation model with visual ML.
Calculate affinity scores and apply collaborative filtering on growth customers.
Predict product recommendations for growth customers.
Interactively explore the outputs of our analytics Flow and automate reports.
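The negative-sampling step in the list above can be pictured with a small sketch. It assumes, as our own simplification rather than the plugin’s exact logic, that collaborative filtering produces candidate (user, item, score) rows: candidates the user actually interacted with become positives, the remaining candidates become negatives, and negatives are capped at a hypothetical `negatives_per_positive` ratio to keep the classes balanced.

```python
from collections import defaultdict


def label_candidates(scored, interactions, negatives_per_positive=2):
    """Turn CF-scored candidates into labeled training rows.

    scored: list of (user, item, score) produced by collaborative filtering.
    interactions: set of (user, item) pairs observed for established customers.
    Returns (user, item, score, label) rows: 1 = interacted, 0 = negative sample.
    """
    by_user = defaultdict(list)
    for user, item, score in scored:
        by_user[user].append((item, score))

    rows = []
    for user, candidates in by_user.items():
        # Highest-scored candidates first, so the kept negatives are "hard" ones.
        candidates.sort(key=lambda pair: pair[1], reverse=True)
        positives = [(i, s) for i, s in candidates if (user, i) in interactions]
        negatives = [(i, s) for i, s in candidates if (user, i) not in interactions]
        if not positives:
            continue  # without positives the model has nothing to contrast against
        rows += [(user, i, s, 1) for i, s in positives]
        limit = negatives_per_positive * len(positives)
        rows += [(user, i, s, 0) for i, s in negatives[:limit]]
    return rows
```

Keeping the highest-scored non-interacted items as negatives is one common design choice: they are the items the model most needs to learn to rank below real interactions.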
Walkthrough
Note
In addition to reading this document, we recommend reading the project wiki before you begin. It gives a deeper technical understanding of how this Solution was built and more detailed explanations of Solution-specific vocabulary.
Plug and play with your data and parameter choices
You will need to create a new Product Recommendation Dataiku application instance. You can do this by selecting the Dataiku application from your instance home and clicking Create App Instance.
Once the new instance has been created, you can walk through the application steps to add your data and select the analysis parameters to run. Users of any skill level can input their desired parameters for analysis and view the results directly in a user-friendly webapp. You could also instantiate multiple Product Recommendation projects to compare your approaches.

The first step of the Dataiku application, Connection settings, specifies the connection from which our data will be ingested into the Product Recommendation Solution. We recommend using the Dataiku App to reconfigure the Flow to your connection and data, since it is more straightforward and seamless than manually updating the Flow to your desired connection.
In the next two sections of the Dataiku App (Data specifications and Columns identification), we specify which datasets from our connection to use and which optional datasets to include, then identify the main columns used by each included dataset. Once these details are entered and the Flow is reconfigured, the Solution automatically maps our data to its data schema.
Prepare our data and identify customer types
Moving on to the Recommendation framing sections of the Dataiku App, we can set the main parameters of the data batch used to create our recommendation system. Here we can:
Define the data batch parameters, including the reference date and the window of historical data to use for feature engineering and machine learning training.
Set the minimum number of interactions a user must have to be included in the recommendation system.
Define our item selection strategy.
Specify the maximum number of recommendations to propose to each user.
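To make the batch parameters concrete, here is a minimal sketch of how a reference date, a history window, and a minimum interaction count could filter an interactions table. The `(user_id, item_id, date)` tuple layout, the function name, and the choice to exclude the reference date itself are assumptions for illustration; the Solution applies these settings through its data prep Flow zones, not through this code.

```python
from collections import Counter
from datetime import date, timedelta


def select_batch(interactions, reference_date, window_days, min_interactions):
    """Keep rows dated in [reference_date - window_days, reference_date)
    for users with at least `min_interactions` interactions in that window."""
    start = reference_date - timedelta(days=window_days)
    in_window = [row for row in interactions if start <= row[2] < reference_date]
    counts = Counter(user for user, _, _ in in_window)
    return [row for row in in_window if counts[row[0]] >= min_interactions]


if __name__ == "__main__":
    rows = [
        ("u1", "a", date(2024, 1, 10)),
        ("u1", "b", date(2024, 1, 20)),
        ("u2", "a", date(2024, 1, 15)),  # u2 has only one interaction in the window
        ("u1", "c", date(2023, 1, 1)),   # too old for a 60-day window
    ]
    print(select_batch(rows, date(2024, 2, 1), window_days=60, min_interactions=2))
```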

The values that we input in this section, together with our selections in Data specifications and Columns identification, are applied to our data through the data prep Flow zones. The ages_clustering Flow zone runs only if our data contains a user age column. The resulting data is filtered to our specifications, cleaned, and split: part of it is set aside for computing affinity scores, and the rest is used to train a machine learning model on those affinity scores.
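The split described above can be sketched as a time-based cut. This is an assumption for illustration, since the exact split rule isn’t stated here: older interactions feed the affinity-score computation, while the most recent `holdout_days` (a hypothetical parameter) provide the interactions the model learns to predict.

```python
from datetime import date, timedelta


def time_split(interactions, reference_date, holdout_days):
    """Split (user, item, day) rows at reference_date - holdout_days.

    Older rows  -> used to compute collaborative filtering affinity scores.
    Recent rows -> used as targets when training the recommendation model.
    """
    split_date = reference_date - timedelta(days=holdout_days)
    scoring = [row for row in interactions if row[2] < split_date]
    training = [row for row in interactions if split_date <= row[2] < reference_date]
    return scoring, training


if __name__ == "__main__":
    rows = [("u1", "a", date(2024, 1, 2)), ("u1", "b", date(2024, 1, 25))]
    print(time_split(rows, date(2024, 2, 1), holdout_days=14))
```

A time-based split avoids leaking future interactions into the features used to predict them.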
Leveraging item metadata in product recommendations
In the Recommendation feature engineering section of the Dataiku App, we can configure how the ingested item metadata is used for feature engineering. Three options are available:
Item characteristics feature engineering applies collaborative filtering to the user-item-characteristics interactions recorded in our specified batch.
If chosen, the optional Flow zones that apply collaborative filtering to item characteristics are run.
If this option is selected, we can also select the specific columns to use for computing collaborative filtering features.
Model features enrichment joins item characteristics to all other features so that the recommendation model can take them into account.
Item characteristics feature engineering and model features enrichment uses both methods above.
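The first two options above can be contrasted in a short sketch. The `category` and `brand` columns and both helper names are hypothetical. `user_characteristic_counts` moves interactions from the user-item level to the user-characteristic level, the kind of matrix collaborative filtering can then run on; `enrich_rows` simply joins the chosen item characteristics onto each training row.

```python
from collections import Counter


def user_characteristic_counts(interactions, item_metadata, column):
    """Count interactions per (user, characteristic value), e.g. per (user, category).

    interactions: iterable of (user, item) pairs.
    item_metadata: dict mapping item -> dict of characteristics.
    Items without metadata for `column` are skipped.
    """
    counts = Counter()
    for user, item in interactions:
        value = item_metadata.get(item, {}).get(column)
        if value is not None:
            counts[(user, value)] += 1
    return counts


def enrich_rows(rows, item_metadata, columns):
    """Left-join the chosen item characteristics onto (user, item, features) rows."""
    enriched = []
    for user, item, features in rows:
        extra = {c: item_metadata.get(item, {}).get(c) for c in columns}
        enriched.append((user, item, {**features, **extra}))
    return enriched
```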
Note
Collaborative filtering is a feature engineering approach that uses the number of interactions and the ratings users give to products to generate affinity scores between users and products. These scores are based on correlations between pairs of items and between pairs of users. Using matrix factorization, the affinity score for a given user/item pair is calculated from these pairwise correlations.
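The ideas in this note can be illustrated on a toy interaction matrix. This is a generic NumPy sketch, not the Recommendation System Plugin’s implementation (which runs in SQL): item-item cosine similarity yields an item-based affinity score, and a truncated SVD yields a matrix factorization score for every user/item pair. The rank `k` is an arbitrary choice.

```python
import numpy as np


def item_based_scores(r):
    """Affinity = each user's interactions weighted by item-item cosine similarity."""
    norms = np.linalg.norm(r, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for items nobody touched
    normalized = r / norms  # scale each item column to unit length
    similarity = normalized.T @ normalized  # (items x items) cosine similarity
    np.fill_diagonal(similarity, 0.0)  # an item should not recommend itself
    return r @ similarity  # (users x items) affinity scores


def svd_scores(r, k):
    """Rank-k reconstruction of the interaction matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(r, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]


if __name__ == "__main__":
    # Rows are users, columns are items, values are interaction counts.
    r = np.array([
        [3, 1, 0, 0],
        [2, 0, 1, 0],
        [0, 0, 2, 3],
    ], dtype=float)
    print(np.round(item_based_scores(r), 2))
    print(np.round(svd_scores(r, k=2), 2))
```

A low rank `k` forces the reconstruction to generalize, so items a user never touched can still receive non-zero scores.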

Once we’ve selected how we want to use the item metadata, we can also choose to retrain the ages_clustering_model during the Flow build.
Defining and training our recommendation model
The final section of the Dataiku App that requires input is Product recommendations modeling, where we configure the machine learning model parameters used for training. To begin, we choose one of the following optimization strategies.
Optimization strategy | Description |
---|---|
Optimize model ‘Precision’ | Favors a precise model, at the risk of missing some positive predictions. |
Optimize model ‘Recall’ | Favors an opportunistic model that catches as many positives as possible, at the risk of more false positives. |
Optimize model ‘F1 Score’ | A trade-off between precision and recall (F1 is their harmonic mean). |
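The three strategies differ only in which metric the model is optimized for. For reference, a minimal sketch computing precision, recall, and F1 from binary labels (1 = the user interacted with the item); it follows the standard definitions rather than Dataiku’s internal code.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, f1) for binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Precision: share of predicted positives that were real interactions.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of real interactions that the model predicted.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Raising the decision threshold on the predicted probability typically raises precision and lowers recall, which is the trade-off the table describes.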
Lastly, we can create model evaluation visualizations (Subpopulation Analysis and partial dependence plots, or PDP) for the dashboard to better understand the model we create. This can be valuable for model transparency and understanding, but it significantly increases computation time.
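Both visualizations can be approximated in plain Python for intuition. In this sketch, `model` is a hypothetical callable mapping a feature dict to a probability: subpopulation analysis computes a metric (here recall) per group, and partial dependence averages predictions while forcing one feature to each grid value. Dataiku computes these for you; the code only shows what they measure and why they add computation (the model is re-scored once per grid value).

```python
from collections import defaultdict


def recall_by_subpopulation(rows, group_key, threshold=0.5):
    """rows: dicts with 'label' (0/1), 'proba', and the grouping column."""
    caught, total = defaultdict(int), defaultdict(int)
    for row in rows:
        if row["label"] == 1:
            total[row[group_key]] += 1
            caught[row[group_key]] += int(row["proba"] >= threshold)
    return {group: caught[group] / total[group] for group in total}


def partial_dependence(model, rows, feature, grid):
    """Average model output when `feature` is set to each value in `grid`."""
    curve = []
    for value in grid:
        # Re-score every row with the feature overridden: one full pass per value.
        predictions = [model({**row, feature: value}) for row in rows]
        curve.append(sum(predictions) / len(predictions))
    return curve
```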
Once we’re satisfied with our chosen parameters, we can press the Run button and wait for the full Flow to build. When complete, we can directly go to the pre-built dashboard by clicking the provided link in the Dataiku App.
Recommending products to grow customer engagement
The Product Recommendation solution comes with a prebuilt dashboard containing the following tabs.
Tab | Description |
---|---|
Recommendations Explorer | Provides a prebuilt webapp to audit and validate the behavior of your Product Recommendations model in an interactive, visually rich way. It lets us select users from our operating system and see their historical purchases side by side with their recommended products, either in a tabular format or, if the item_images managed folder contains images, as a picture grid. |
Recommendation model evaluation | Provides metrics by which we can evaluate our model. If we choose to evaluate our model via Subpopulation Analysis, the resulting graph is available in this tab. |
Recommendation model interpretation | Provides a feature importance graph and, optionally, a partial dependence plot so that we can understand our most important variables and assess the relationship between input features and model predictions. |
User Analysis and Item Analysis | Provide several charts explaining how our users behave and how items are solicited, so that we can iteratively tune the selected parameters of the Dataiku application to create a more accurate Product Recommendation model. |
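The Recommendations Explorer places each user’s purchase history next to their recommendations. A minimal sketch of the underlying selection, assuming predicted scores per (user, item) and the maximum number of recommendations set in Recommendation framing (`max_recommendations` is a hypothetical name): already-purchased items are excluded and the highest-scored remaining items are kept.

```python
import heapq
from collections import defaultdict


def top_recommendations(predictions, history, max_recommendations):
    """predictions: (user, item, score) rows; history: dict user -> set of items."""
    candidates = defaultdict(list)
    for user, item, score in predictions:
        if item not in history.get(user, set()):  # skip items already purchased
            candidates[user].append((score, item))
    return {
        user: [item for _, item in heapq.nlargest(max_recommendations, scored)]
        for user, scored in candidates.items()
    }


def side_by_side(user, history, recommendations):
    """Pair a user's past items with their recommendations, as in a table view."""
    past = sorted(history.get(user, set()))
    recommended = list(recommendations.get(user, []))
    width = max(len(past), len(recommended))
    past += [""] * (width - len(past))  # pad the shorter column
    recommended += [""] * (width - len(recommended))
    return list(zip(past, recommended))
```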
Lastly, the Sankey charts in the Recommendation pipelines tab give a global overview of the volume of users and items involved in our batch at each key collaborative filtering milestone.
A short note on automation
It’s possible to automate the Flow of this Solution based on new data, a specific time, etc. via the Project Setup. You can tune all trigger parameters in the Scenarios menu of the project.
Additionally, you can create reporters to send messages to Teams, Slack, email, etc. to keep your full organization informed. You can also run these scenarios ad-hoc as needed. You can find full details on the scenarios and project automation in the wiki.
Reproducing these processes with minimal effort for your data
The intent of this project is to enable customer management and marketing teams to understand how they can use Dataiku to generate personalized product recommendations for customers.
By creating a single Solution that can benefit and influence the decisions of a variety of teams in one organization, you can design smarter and more holistic strategies to engage with customers, build marketing strategies, and lay the groundwork for better product pricing.
This documentation has provided several suggestions on how to derive value from this Solution. Ultimately, however, the “best” approach will depend on your specific needs and data. If you’re interested in adapting this project to the specific goals and needs of your organization, Dataiku offers roll-out and customization services on demand.