Reuse Functions from Code Libraries
As data scientists and coders, you will be familiar with the notion of code libraries for storing code that can be reused in different parts of your project. This feature is available in Dataiku DSS in the form of Project libraries.
This section explores how coders can create code libraries, or import existing ones, for reuse in code-based objects in Dataiku DSS.
Create a Project Library
As we saw in the last section, the Python recipe contains a create_target function that computes a target column by comparing the revenue values to a cut-off value.
Let’s create a similar function in a project library so that the function is available to be reused in code-based objects in Dataiku DSS.
Note
A Project Library is the place to store code that you plan to reuse in code-based objects (e.g., code recipes and notebooks) in your project. You can define objects, functions, etc., in a project library.
Project libraries should be used for code that is project-specific. However, libraries can also draw on shared Git repositories (for example, on GitHub), allowing you to retrieve classes and functions that are maintained outside the project.
You can import libraries from other Dataiku DSS projects to use in your project. See the product documentation to learn about reusing Python code and reusing R code.
To access the project library,
Go to the “code” icon in the top navigation bar and click Libraries from the dropdown menu.
In the project library,
Click the dropdown arrow next to the “python” folder to see an existing Python module, myfunctions.py, containing a function named bin_values.
You can create additional Python or R modules in the library. For example, if you’d like to add another Python module:
Click the +Add button and select Create file.
Provide a file name that ends in the .py extension and click Create.
Right-click the new file and select Move.
Select the folder location for the file and click Move.
Type your code into the editor window and click Save All.
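As an illustration of the kind of code you might type into a module, here is a minimal sketch of what a file like myfunctions.py could contain. The signature of bin_values, its default cut-off of 100.0, and its handling of missing values are assumptions made for this example; the function that ships with the project may be defined differently.

```python
# Hypothetical contents of a project-library module such as python/myfunctions.py.
# The signature and the default cut-off are assumptions made for illustration.


def bin_values(value, cut_off=100.0):
    """Return 1 if value is strictly greater than cut_off, otherwise 0."""
    # Treat missing values (None or NaN) as not exceeding the cut-off.
    # NaN is the only float that is not equal to itself.
    if value is None or value != value:
        return 0
    return 1 if value > cut_off else 0


if __name__ == "__main__":
    # Quick manual check when the file is run outside of Dataiku DSS.
    print([bin_values(v) for v in (50, 100, 250, None)])  # [0, 0, 1, 0]
```

Keeping the cut-off as a parameter, rather than hard-coding it, lets different recipes and notebooks reuse the same function with their own thresholds.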
Note
For code that has been developed outside of Dataiku DSS and is available in a Git repository, see the Cloning a Library from a Remote Git Repository article to learn how to import into a Dataiku Project library.
Use the Module From the Project Library
Now we’ll go back to the existing Python recipe, where we’ll use the bin_values function from the myfunctions.py module.
Click the Flow icon to return to the Flow.
Double click the Python recipe to open it.
Click Edit in Notebook and make the following modifications:
Delete the cell where the create_target function is defined.
Uncomment the line from myfunctions import bin_values to import the function from the module in your project library.
In the next cell, apply the bin_values function to the revenue column.
Click Save Back to Recipe.
Run the recipe.
After the job completes, you can open the customers_revenue dataset to see that the high_value column contains the values that were previously there.
Return to the Flow.
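The notebook modifications described above can be sketched as follows. This is a stand-alone approximation rather than the recipe's actual code: the bin_values stand-in, its cut-off, and the toy revenue values are assumptions, and inside Dataiku DSS the DataFrame would come from the recipe's input dataset instead of being built by hand.

```python
import pandas as pd


# Inside Dataiku DSS, this stand-in is replaced by the uncommented import:
#     from myfunctions import bin_values
def bin_values(value, cut_off=100.0):
    """Stand-in with an assumed signature: 1 above the cut-off, else 0."""
    return 1 if value > cut_off else 0


# Toy data in place of the recipe's input dataset (values are made up).
df = pd.DataFrame({"revenue": [20.0, 150.0, 100.0]})

# Apply the library function to the revenue column to build high_value.
df["high_value"] = df["revenue"].apply(bin_values)

print(df["high_value"].tolist())  # [0, 1, 0]
```

Because the function now lives in the project library, the same import line works unchanged in any other Python recipe or notebook in the project.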