As explained earlier, data cleaning is a complex and tedious step in any DSML workflow, but it is essential for building a dataset that answers your business use case. papAI therefore provides a set of cleaning operations that simplify this process while keeping it efficient. The Cleaning operation lets you clean and normalise a dataset visually and interactively through a series of transformation steps that you define.
To access this feature, select the dataset in your project's flow, then select the Cleaning operation in the left sidebar. The Cleaning module interface appears.
The cleaning interface is composed of two panels:
- The right panel groups a catalog of transformation steps that can be applied to the input dataset. You can add as many steps as you want and remove them at any time.
- The left panel displays a preview of a sample of the input dataset after the series of steps has been applied. Any change to the list of transformation steps is immediately visible there.
Each step added to the list is reflected in the table preview and in the parameters proposed for the following steps.
Here is the list of possible transformation steps:
These steps include formula editing, renaming or dropping columns, filling or dropping null values, filtering rows, and more.
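To make the idea of an ordered series of transformation steps concrete, here is a minimal sketch in plain Python. It is an illustration only, not papAI's implementation or API: every function name (`rename_column`, `drop_column`, `fill_nulls`, `drop_nulls`, `filter_rows`, `formula`, `run_steps`) and the sample data are assumptions made for this example.

```python
# Hypothetical illustration: none of these helpers belong to papAI.
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Step = Callable[[List[Row]], List[Row]]


def rename_column(old: str, new: str) -> Step:
    # Rename one column, keeping the column order.
    return lambda rows: [
        {(new if key == old else key): value for key, value in row.items()}
        for row in rows
    ]


def drop_column(name: str) -> Step:
    # Remove one column from every row.
    return lambda rows: [
        {key: value for key, value in row.items() if key != name} for row in rows
    ]


def fill_nulls(column: str, value: Any) -> Step:
    # Replace missing (None) values in a column with a default.
    return lambda rows: [
        {**row, column: value if row.get(column) is None else row[column]}
        for row in rows
    ]


def drop_nulls(column: str) -> Step:
    # Drop rows whose value in the column is missing.
    return lambda rows: [row for row in rows if row.get(column) is not None]


def filter_rows(predicate: Callable[[Row], bool]) -> Step:
    # Keep only the rows that satisfy the predicate.
    return lambda rows: [row for row in rows if predicate(row)]


def formula(new_column: str, fn: Callable[[Row], Any]) -> Step:
    # Add a column computed from the other columns of each row.
    return lambda rows: [{**row, new_column: fn(row)} for row in rows]


def run_steps(rows: List[Row], steps: List[Step]) -> List[Row]:
    # Apply the steps in order; each step builds new rows, so the input is untouched.
    for step in steps:
        rows = step(rows)
    return rows


sample = [
    {"name": "Ana", "qty": 2, "price": 5.0},
    {"name": "Bob", "qty": None, "price": 3.0},
    {"name": None, "qty": 1, "price": 4.0},
]

steps = [
    drop_nulls("name"),
    fill_nulls("qty", 0),
    formula("total", lambda row: row["qty"] * row["price"]),
    filter_rows(lambda row: row["total"] > 0),
    rename_column("name", "customer"),
    drop_column("price"),
]

cleaned = run_steps(sample, steps)
print(cleaned)  # [{'customer': 'Ana', 'qty': 2, 'total': 10.0}]
```

The order of the steps matters in this sketch: the formula step needs the null quantities to be filled first, and the filter step relies on the `total` column created just before it.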
When you select a step and rearrange the order of the steps to your liking, the preview automatically updates to show the changes applied to the sample. A green mark at the top right of the table preview indicates when the preview is ready.
To disable the preview of one of your cleaning steps, click the icon at the top right of that step. The icon changes and the preview updates automatically.
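One way to picture the effect of disabling a step is a preview that is recomputed from the sample using only the enabled steps. The sketch below assumes that behaviour for illustration; it does not describe how papAI works internally, and the `CleaningStep` class, the `preview` function, and the sample data are all hypothetical.

```python
# Hypothetical illustration: CleaningStep and preview are not papAI names.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class CleaningStep:
    name: str
    apply: Callable[[List[Row]], List[Row]]
    enabled: bool = True  # a disabled step stays in the list but is skipped


def preview(sample: List[Row], steps: List[CleaningStep]) -> List[Row]:
    # Recompute the preview from the original sample, skipping disabled steps.
    rows = sample
    for step in steps:
        if step.enabled:
            rows = step.apply(rows)
    return rows


sample = [
    {"city": "Paris", "temp": 21},
    {"city": "Lyon", "temp": None},
    {"city": "Lille", "temp": 15},
]

steps = [
    CleaningStep(
        "fill null temp",
        lambda rows: [
            {**row, "temp": 0 if row["temp"] is None else row["temp"]}
            for row in rows
        ],
    ),
    CleaningStep(
        "filter temp >= 18",
        lambda rows: [row for row in rows if row["temp"] >= 18],
    ),
]

full_preview = preview(sample, steps)
print(full_preview)  # [{'city': 'Paris', 'temp': 21}]

# Disable the filter step: it remains in the list, but the preview ignores it.
steps[1].enabled = False
partial_preview = preview(sample, steps)
print(partial_preview)
```

Because the preview is always rebuilt from the untouched sample, re-enabling the step in this sketch simply restores the earlier result.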
When you are satisfied with the preview, click the Save and Run button. A green gear icon linked to the source and output datasets is then displayed in your project's flow.
To update the settings of the operation later, double-click the green gear icon related to the operation in the flow. When your changes are done, click the green Update and Run button to save them and apply them to the output dataset, which creates a new output dataset with the changes.