Time-Series Cleaning

Cleaning raw data is essential, especially when a single table mixes different data types. The Cleaning module covers this with both simple operations and type-specific ones, such as those for Array or Map columns.

However, when the table contains datetime columns, these cleaning steps may not be enough to produce a dataset that is ready to use for Time-Series Forecasting. For this reason, papAI lets you apply specific signal processing operations on datetime columns to optimize your Time-Series dataset and help ensure a well-trained model.

Time-Series Cleaning in the flow

Time-Series cleaning interface

These signal processing operations include resampling, smoothing, windowing, lag generation, and more. The full list of available operations is shown here:

List of available Time-Series cleaning operations
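
To give a feel for what resampling, smoothing, and lag generation do to a series, here is a minimal pandas sketch. It is only an illustration, not papAI's implementation: the sample data, the daily frequency, the 3-point window, and the column names (`value`, `value_smooth`, `value_lag1`) are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical raw series with a missing day (2024-01-03)
raw = pd.Series(
    [10.0, 12.0, 16.0, 18.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05"]),
    name="value",
)

# Resampling: move to a regular daily grid, then fill the gap by linear interpolation
daily = raw.resample("D").mean().interpolate()

# Smoothing: 3-point trailing moving average (window size is an arbitrary choice)
smooth = daily.rolling(window=3, min_periods=1).mean()

# Lag generation: the previous day's value as an extra feature
lag1 = daily.shift(1)

cleaned = pd.DataFrame(
    {"value": daily, "value_smooth": smooth, "value_lag1": lag1}
)
print(cleaned)
```

In papAI these operations are configured as steps in the interface rather than written as code; the sketch only shows the kind of transformation each step applies.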

As with the regular cleaning steps, you configure each step in the left panel, and the preview in the right panel updates automatically. When you are satisfied with the displayed result, click the Submit and Run button. A red Time-Series Cleaning gear icon, along with the newly created dataset, then appears on your project's flow.

Time-Series cleaning operation submitting button

Time-Series Cleaning operation gear and output dataset on the flow

Info

In addition to the preview table, a graph plots the raw data over the selected datetime range, giving you a glimpse of the time series that will be used for model training.

Time-Series cleaning graph preview

Note

When you enter the AutoML module, you may encounter the Time-Series sanity test, which checks the frequency of the selected datetime column. If the frequency is not regular, the time series needs to be resampled, and you are taken to the same Time-Series Cleaning interface to apply the resampling and obtain a dataset better suited to your Time-Series Forecasting case.
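
As a rough illustration of what a frequency check like this looks for, the pandas sketch below tests whether consecutive timestamps are evenly spaced and resamples when they are not. How papAI performs its sanity test internally is not described here; the helper `has_regular_frequency`, the sample data, and the daily target frequency are assumptions for the example.

```python
import pandas as pd


def has_regular_frequency(index: pd.DatetimeIndex) -> bool:
    """Return True when all consecutive timestamps are the same distance apart."""
    steps = index.to_series().diff().dropna()
    return steps.nunique() <= 1


# Hypothetical dataset whose datetime column skips 2024-01-03
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05"]
        ),
        "value": [10.0, 12.0, 16.0, 18.0],
    }
).set_index("date")

was_regular = has_regular_frequency(df.index)

if was_regular:
    regular = df
else:
    # Resample to a daily grid and interpolate the values that were missing
    regular = df.resample("D").mean().interpolate()

print(was_regular, has_regular_frequency(regular.index))
```

Interpolation is only one way to fill the slots created by resampling; forward-filling or leaving them empty may suit other datasets better.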