Solution | Product Recommendation#


Business Case#

Companies that successfully implement personalization drive 40% more revenue than the average company. And this should come as no surprise: indeed, 71% of consumers expect companies to deliver personalization and are more likely to shop with brands that they recognize, that understand them, and provide relevant offers and recommendations that grow lasting relationships.

Recommending the right product to customers is now a must-do to secure market share and build loyalty. This can notably be done by implementing a recommendation engine based on a collaborative filtering approach which aims at answering a simple question: what items will appeal to customers who share similar preferences?

By answering this important question, brands can in turn recommend products that have not yet been purchased by a customer. The resulting outcome: product discovery, increased customer engagement, and improved revenue.

With this solution, companies open an opportunity to optimize their customer engagement activities, starting with online experiences: offer a website landing page specifically tailored to logged-in users; a digital app connecting customers to personalized offers; promotional emails personalized based on purchase history; and much more…


The process to install this solution differs depending on whether you are using Dataiku Cloud or a self-managed instance.

Dataiku Cloud users should follow the instructions for installing solutions on cloud.

  1. The Cloud Launchpad will automatically meet the technical requirements listed below, and add the Solution to your Dataiku instance.

  2. Once the Solution has been added to your space, move ahead to Data Requirements.

After meeting the technical requirements below, self-managed users can install the Solution in one of two ways:

  1. On your Dataiku instance connected to the internet, click + New Project > Dataiku Solutions > Search for Product Recommendation.

  2. Alternatively, download the Solution’s .zip project file, and import it to your Dataiku instance as a new project.

Additional note for 12.1+ users

If you are using a Dataiku 12.1+ instance and are missing the technical requirements for this Solution, the popup below will appear. It allows admin users to install the requirements easily, and non-admin users to request installation of code environments and/or plugins on their instance for the Solution.

Admins can process these requests in the admin request center, after which non-admin users can re-trigger the installation of the Solution.

Screenshot of the Request Install of Requirements menu available for Solutions.

Technical Requirements#

To leverage this solution, you must meet the following requirements:


The solution is delivered with empty filesystem-managed datasets. Still, it needs SQL databases to use the Recommendation System Plugin, as its recipes depend on a SQL engine to run memory-intensive processes.

This solution is compatible with the following connections:

  • Snowflake

  • Google Cloud Platform: BigQuery + GCS (Both are required if you want to leverage BigQuery)

  • PostgreSQL

  • Microsoft SQL Server and Azure Synapse are compatible with the plugin but require roll-out services for this solution

Data Requirements#

The solution comes without any data, but the following data model should be respected when connecting your data.




Mandatory dataset containing records of all the historical interactions between users and items with the following columns:

  • User ID (mandatory)

  • Item ID (mandatory)

  • Date (mandatory)

  • Revenue Identifier (optional)


Optional dataset containing information about our customers with the following columns:

  • Customer ID (mandatory)

  • Customer Age (optional)

  • Additional columns describing our customers (optional)


Optional dataset providing information about our products with the following columns:

  • Item ID (mandatory)

  • Categorical Product Description Columns (optional)

  • Datatype Product Description Columns (optional)

  • Name of the .jpg picture file of the Product (optional)


Optional managed folder containing all our product pictures. All images must be in .jpg format.
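As a rough illustration of the data model above, the following Python sketch checks that an interactions extract carries the mandatory columns before it is connected. The column names (`user_id`, `item_id`, `date`, `revenue`) and the helper `missing_mandatory_columns` are hypothetical and only serve the example; in the Solution itself, your own column names are mapped through the Dataiku Application.

```python
import csv
import io

# Hypothetical column names for the interactions dataset; adapt to your schema.
MANDATORY_INTERACTION_COLUMNS = {"user_id", "item_id", "date"}


def missing_mandatory_columns(csv_text, mandatory=MANDATORY_INTERACTION_COLUMNS):
    """Return the sorted list of mandatory columns absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return sorted(set(mandatory) - present)


# Illustrative extract: the optional revenue column is present, nothing is missing.
sample = "user_id,item_id,date,revenue\nu1,i1,2023-01-05,19.9\n"
print(missing_mandatory_columns(sample))  # []
```

An empty list means the extract has every mandatory column; any name returned would need to be added or mapped before running the Flow.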

Workflow Overview#

Dataiku screenshot of the final project Flow showing all Flow zones.

The project has the following high-level steps:

  1. Connect your data as input and select your analysis parameters via the Dataiku Application.

  2. Join and prepare our input data.

  3. Apply collaborative filtering on established customers to create negative samples for downstream modeling.

  4. Train a recommendation model with VisualML.

  5. Calculate affinity scores and apply collaborative filtering on growth customers.

  6. Predict product recommendations for growth customers.

  7. Interactively explore the outputs of our analytics Flow and automate reports.
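To make the idea of negative samples in step 3 more concrete, here is a minimal Python sketch. It is not the Solution's own logic, which relies on collaborative filtering and is described in the project wiki. As a simplification, it draws random items that a user has never interacted with and labels them 0. The function name `sample_negatives` and its parameters are assumptions made for this example.

```python
import random


def sample_negatives(interactions, catalog, per_user, seed=0):
    """Pair each user with up to `per_user` items they never interacted with.

    Simplified stand-in: items are drawn at random, not via collaborative filtering.
    """
    rng = random.Random(seed)  # seeded for reproducible draws
    seen = {}
    for user, item in interactions:
        seen.setdefault(user, set()).add(item)
    negatives = []
    for user in sorted(seen):
        unseen = sorted(set(catalog) - seen[user])
        for item in rng.sample(unseen, min(per_user, len(unseen))):
            negatives.append((user, item, 0))  # label 0 = no interaction observed
    return negatives


# Illustrative data only.
interactions = [("u1", "a"), ("u2", "b")]
negatives = sample_negatives(interactions, catalog=["a", "b", "c"], per_user=1)
print(len(negatives))  # 2
```

Combined with the observed interactions (label 1), rows like these give a classifier both positive and negative examples to learn from.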



In addition to reading this document, it is recommended to read the wiki of the project before beginning to get a deeper technical understanding of how this Solution was created and more detailed explanations of Solution-specific vocabulary.

Plug and Play with Your Data and Parameter Choices#

You will need to create a new Product Recommendation Dataiku Application instance. This can be done by selecting the Dataiku Application from your instance home and clicking Create App Instance.

Once the new instance has been created, you can walk through the application steps to add your data and select the analysis parameters to run. Users of any skill level can input their desired parameters for analysis and view the results directly in a user-friendly webapp. You could also instantiate multiple Product Recommendation projects to compare your approaches.

Dataiku screenshot of part of the Dataiku Application for Product Recommendation

The first step of the Dataiku Application, Connection settings, is to specify the connection where our data is available to be ingested into the Product Recommendation solution. It is recommended to use the Dataiku App to reconfigure the Flow to your connection and data since it provides a more straightforward and seamless experience than manually updating the Flow to your desired connection.

In the following two sections of the Dataiku App (Data specifications and Columns identification), we specify the datasets from our connection that we want to use and which optional datasets we want to include, then identify the main columns used by each included dataset. Inputting these details and reconfiguring the Flow allows the solution to map our data to its data schema automatically.

Prepare our Data and Identify Customer Types#

Moving along to the Recommendation framing sections of the Dataiku App, we can set the main parameters associated with the data batch used to create our Recommendation System. Here we can:

  • Define the data batch parameters, including the reference data and the window of historical data to use for feature engineering and machine learning training.

  • Set the minimum number of interactions a user must have to be included in the recommendation system.

  • Define our items selection strategy.

  • Specify the maximum number of recommendations to propose for each user.
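Two of the framing parameters above can be pictured with a short Python sketch: a minimum number of interactions per user, and a cap on recommendations per user. The function names `eligible_users` and `top_n`, and the sample data, are hypothetical. In the Solution, these parameters are applied by the data prep Flow zones, not by code like this.

```python
from collections import Counter


def eligible_users(interactions, min_interactions):
    """Keep users with at least `min_interactions` recorded interactions."""
    counts = Counter(user for user, _item in interactions)
    return {user for user, n in counts.items() if n >= min_interactions}


def top_n(scored_items, max_recommendations):
    """Return the highest-scoring item IDs, capped at `max_recommendations`."""
    ranked = sorted(scored_items.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _score in ranked[:max_recommendations]]


# Illustrative data only.
interactions = [("u1", "a"), ("u1", "b"), ("u2", "a")]
print(eligible_users(interactions, min_interactions=2))  # {'u1'}
print(top_n({"a": 0.2, "b": 0.9, "c": 0.5}, max_recommendations=2))  # ['b', 'c']
```

Raising the minimum shrinks the user base but gives the model more history per user. The cap only limits how many ranked items are kept at the end.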

Dataiku screenshot of the Recommendation framing section of the Product Recommendation App

The values that we input in this section and our selections within the Data specifications and Columns identification are applied to our data via the data prep Flow zones. The Flow zone, ages_clustering, is optionally run only if our data contains a user’s age column for age clustering. The resulting prepared data has been filtered to our specifications, cleaned and split so that some data is set aside for computing affinity scores, and the rest is ready for training a machine learning model based on the computed affinity scores.

Leveraging Item Metadata in Product Recommendations#

In the Recommendation feature engineering section of the Dataiku App, we can configure how to use the ingested item metadata for feature engineering. Three options are provided to us:

  • Item characteristics feature engineering will apply collaborative filtering on the user-item-characteristics interactions recorded in our specified batch.

    • If chosen, the optional Flow zones that apply collaborative filtering on item characteristics will be run.

    • If the above is selected, then we can also select the specific columns to use for computing collaborative filtering features.

  • Model features enrichment will join item characteristics to all other features so that the recommendation model can take them into account.

  • Item characteristics feature engineering and model features enrichment will use both methods listed above.


Collaborative filtering is an approach to feature engineering that uses information about the number of interactions and user ratings for products to generate affinity scores between users and products. These scores are based on correlations between pairs of items and pairs of users. Using matrix factorization, these affinity scores for a given user/item pair are calculated from the pairwise correlations.
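The intuition behind affinity scores can be sketched in a few lines of Python. This is a deliberately simplified, item-based variant that uses cosine similarity between items instead of matrix factorization. It does not reproduce the Recommendation System Plugin's implementation. The function names and the sample interactions are assumptions made for this example.

```python
import math
from collections import defaultdict


def item_similarities(interactions):
    """Cosine similarity between items, based on the users they share."""
    users_by_item = defaultdict(set)
    for user, item in interactions:
        users_by_item[item].add(user)
    sims = {}
    for a, users_a in users_by_item.items():
        for b, users_b in users_by_item.items():
            if a != b:
                shared = len(users_a & users_b)
                sims[(a, b)] = shared / math.sqrt(len(users_a) * len(users_b))
    return sims


def affinity(user, item, interactions, sims):
    """Sum of similarities between `item` and the items `user` already interacted with."""
    owned = {i for u, i in interactions if u == user}
    return sum(sims.get((item, other), 0.0) for other in owned if other != item)


# Illustrative data only: u1 has not interacted with item "c" yet.
interactions = [
    ("u1", "a"), ("u1", "b"),
    ("u2", "a"), ("u2", "b"), ("u2", "c"),
    ("u3", "c"),
]
sims = item_similarities(interactions)
print(round(affinity("u1", "c", interactions, sims), 3))  # 1.0
```

Here item "c" shares one of its two users with each of "a" and "b", so each similarity is 0.5 and the affinity of u1 for "c" is 1.0. A higher score suggests that the item is a stronger candidate for recommendation.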

Dataiku screenshot of the feature engineering parameters for Product Recommendation

Once we’ve selected how we want to use the item metadata, we can also choose to retrain the ages_clustering_model during the Flow build.

Defining and Training our Recommendation Model#

The final section of the Dataiku App where we need to input parameters is the Product recommendations modeling section. Here we can configure the machine learning model parameters used for training. To begin, we'll be asked to choose one of the following optimization strategies.

Optimization strategy


Optimize model ‘Precision’

Means that you will give preference to having a precise model at the risk of missing positive predictions.

Optimize model ‘Recall’

Means that you will give preference to having an opportunistic model over a precise model.

Optimize model ‘F1 Score’

Is a trade-off between being precise and opportunistic.
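For reference, the three metrics behind these strategies follow the standard definitions and can be computed from the counts of true positives, false positives, and false negatives. The short Python sketch below uses made-up counts purely for illustration, and the function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall, and F1 score from prediction counts."""
    predicted = true_positives + false_positives  # everything the model recommended
    actual = true_positives + false_negatives     # everything that was truly relevant
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up counts: 40 recommendations, 30 of them relevant, 30 relevant items missed.
p, r, f1 = precision_recall_f1(true_positives=30, false_positives=10, false_negatives=30)
print(p, r, round(f1, 3))  # 0.75 0.5 0.6
```

In this example, the model is fairly precise (three out of four recommendations are relevant) but misses half of the relevant items. The F1 score sits between the two.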

Lastly, we can create model evaluation visualizations (Subpopulation Analysis and PDP) for the dashboard in order to better understand the model we create. Doing so can prove valuable in model transparency and understanding but will significantly increase computation time.

Once we’re satisfied with our chosen parameters, we can press the Run button and wait for the full Flow to build. When complete, we can directly go to the pre-built dashboard by clicking the provided link in the Dataiku App.

Recommending Products to Grow Customer Engagement#

The Product Recommendation solution comes with a prebuilt dashboard containing the following tabs.



Recommendations Explorer

Provides a prebuilt webapp to audit and validate the behavior of your Product Recommendations model in an interactive and visually rich way. It allows us to select users from our operating system and see their historical purchases side by side with their recommended products. This can be viewed in a tabular format or, if the item_images managed folder contains images, as a picture grid.

Dataiku screenshot showing the interactive webapp for recommendation exploration.

Recommendation model evaluation

Provides metrics by which we can evaluate our model. If we choose to evaluate our model via Subpopulation Analysis, the resulting graph will be available in this tab.

Recommendation model interpretation

Provides a feature importance graph and, optionally, a partial dependence plot so that we can understand our most important variables and assess the relationship between input features and model predictions.

User Analysis and Item Analysis

Provide several charts that explain our users' behavior and how items are solicited, so that we can iteratively tune the selected parameters of the Dataiku Application to create a more accurate Product Recommendation model.

Lastly, we can get a global overview of the volume of users and items involved in our batch, across all key collaborative filtering milestones, via the Sankey charts provided in the Recommendation pipelines tab.

Dataiku screenshot showing Sankey charts that are available in the dashboard for understanding the full recommendation pipeline.

A Short Note on Automation#

It is possible to automate the Flow of this solution so that it is triggered by new data, at a specific time, etc., via the Dataiku Application. All of these trigger parameters can be tuned in the Scenarios menu of the project. Additionally, reporters can be created to send messages to Teams, Slack, email, etc., to keep our full organization informed. These scenarios can also be run ad hoc as needed. Full details on the scenarios and project automation can be found in the wiki.

Reproducing these Processes With Minimal Effort For Your Data#

The intent of this project is to enable customer management and marketing teams to understand how Dataiku can be used to generate personalized product recommendations for customers. By creating a single solution that can benefit and influence the decisions of a variety of teams in one organization, smarter and more holistic strategies can be designed for customer engagement, marketing, and product pricing.

We’ve provided several suggestions on how to use transaction data to recommend products, but ultimately the “best” approach will depend on your specific needs and your data. If you’re interested in adapting this project to the specific goals and needs of your organization, roll-out and customization services can be offered on demand.