
Cleaning

As explained earlier, cleaning is a complex and tedious step in any DSML workflow, but it is essential for building a dataset that answers your business use case. papAI therefore offers a set of cleaning operations that simplify this process while keeping it efficient. The Cleaning operation lets you clean and normalise a dataset visually and interactively through a series of transformation steps that you define.

To access this feature, select the dataset in your project's flow, then select the Cleaning operation in the left sidebar. The Cleaning module interface then appears.

cleaning left sidebar

Cleaning operation on the left sidebar

cleaning pop-up

Cleaning operation pop-up

The cleaning interface is composed of two panels:

  • The left panel groups a catalog of transformation steps to apply to the input dataset. You can add as many steps as you want and remove them at any time.

right panel

Left panel with table preview

  • The right side of the screen displays a preview of a sample of the input dataset after the series of steps is applied. Any change to the list of transformation steps is immediately visible in the preview.

left panel

Right panel with list of cleaning operations

Info

Each step added to the list is reflected in the table preview and in the parameters offered for the following steps.

Here is the list of available transformation steps:

categories of operations

List of available cleaning operations

These steps include formula editing, column renaming or dropping, filling or dropping null values, row filtering, and more.
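To make these step types concrete, here is a minimal sketch in plain Python of what such a chain of transformations does to a small sample. It is not papAI's implementation or API: the helper names (`rename_column`, `drop_column`, `fill_nulls`, `filter_rows`, `add_formula`, `apply_steps`) and the sample data are all hypothetical and only illustrate the idea of applying steps in order.

```python
from typing import Any, Callable

Row = dict[str, Any]
Step = Callable[[list[Row]], list[Row]]


def rename_column(old: str, new: str) -> Step:
    """Rename one column, keeping the column order."""
    return lambda rows: [
        {(new if key == old else key): value for key, value in row.items()}
        for row in rows
    ]


def drop_column(name: str) -> Step:
    """Remove one column from every row."""
    return lambda rows: [
        {key: value for key, value in row.items() if key != name} for row in rows
    ]


def fill_nulls(column: str, value: Any) -> Step:
    """Replace missing (None) values in a column with a default."""
    return lambda rows: [
        {**row, column: value if row.get(column) is None else row[column]}
        for row in rows
    ]


def filter_rows(predicate: Callable[[Row], bool]) -> Step:
    """Keep only the rows for which the predicate is true."""
    return lambda rows: [row for row in rows if predicate(row)]


def add_formula(column: str, formula: Callable[[Row], Any]) -> Step:
    """Add a column computed from the other columns of each row."""
    return lambda rows: [{**row, column: formula(row)} for row in rows]


def apply_steps(rows: list[Row], steps: list[Step]) -> list[Row]:
    """Apply the steps one after another, in list order."""
    for step in steps:
        rows = step(rows)
    return rows


# Hypothetical sample data, standing in for a dataset preview.
sample = [
    {"id": 1, "price": 10.0, "qty": 2, "note": "a"},
    {"id": 2, "price": None, "qty": 1, "note": "b"},
    {"id": 3, "price": 4.0, "qty": 0, "note": "c"},
]

steps = [
    rename_column("qty", "quantity"),
    drop_column("note"),
    fill_nulls("price", 0.0),
    filter_rows(lambda row: row["quantity"] > 0),
    add_formula("total", lambda row: row["price"] * row["quantity"]),
]

cleaned = apply_steps(sample, steps)
print(cleaned)
```

Order matters in this sketch: the formula step would fail on the missing price if it ran before the null-filling step, which is one reason a preview after each change is useful.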

When you select a step and rearrange the order of the settings to your liking, the preview updates automatically to show the changes applied to the sample. A green mark at the top right of the table preview notifies you when it is ready.

Tip

To disable the preview of one of your cleaning steps, click the icon at the top right of that step. The icon changes and the preview updates automatically.

preview toggle icon

Preview toggle button disabled for a specific step
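One way to picture the preview toggle from the tip above is a per-step flag that the preview consults before applying each step. The sketch below is only an illustration in plain Python under that assumption: the `Step` class, its `preview_enabled` field, and the `preview` function are hypothetical names, and papAI's actual behaviour may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass
class Step:
    """A named transformation with a hypothetical preview toggle."""
    name: str
    apply: Callable[[list[Row]], list[Row]]
    preview_enabled: bool = True


def preview(sample: list[Row], steps: list[Step]) -> list[Row]:
    """Apply, in order, only the steps whose preview is enabled."""
    rows = sample
    for step in steps:
        if step.preview_enabled:
            rows = step.apply(rows)
    return rows


# Hypothetical sample data.
sample = [{"city": "Paris", "temp": None}, {"city": "Lyon", "temp": 21}]

steps = [
    Step(
        "fill null temp",
        lambda rows: [
            {**row, "temp": 0 if row["temp"] is None else row["temp"]}
            for row in rows
        ],
    ),
    Step("keep warm rows", lambda rows: [row for row in rows if row["temp"] > 10]),
]

full = preview(sample, steps)        # both steps applied
steps[1].preview_enabled = False     # toggle the second step off
partial = preview(sample, steps)     # only the first step applied
print(full)
print(partial)
```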

When you are satisfied with the preview, click the Create recipe and run it now button. A green gear icon linked to the source and output datasets is then displayed in your project's flow.

submit button

Cleaning operation submission

cleaning step on flow

Cleaning operation appearance on the flow

Tip

To update the settings of any of these operations, double-click the green gear icon related to the operation in the flow. When your changes are done, click the green Update and Run button to save them and apply them to the output dataset, creating a new dataset with those changes.

cleaning update

Cleaning operation update button