Cleaning raw data is essential, especially when a single table mixes different data types. The Cleaning module handles this with both general operations and type-specific ones, such as operations for Array or Map columns.
However, when a table contains datetime columns, these cleaning steps may not be enough to produce a dataset that is ready for Time-Series Forecasting. That is why papAI offers dedicated signal processing operations on datetime columns, so you can optimize your time-series dataset and help ensure a well-trained model.
These signal processing operations include resampling, smoothing, windowing, lag generation, and more. For a clear overview, here is the list of all available operations:
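To make three of these operations concrete, here is a minimal sketch in plain Python of a trailing moving average (smoothing), lag generation, and sliding windows. It is only an illustration of the general techniques, not papAI's implementation; the function names `moving_average`, `lag`, and `sliding_windows` are hypothetical, and missing values are represented as `None` by assumption.

```python
from typing import List, Optional, Sequence


def moving_average(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Trailing moving average (smoothing).

    Each output point is the mean of the current value and the previous
    window - 1 values; the first window - 1 points are None because not
    enough history is available yet.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[Optional[float]] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


def lag(values: Sequence[float], k: int) -> List[Optional[float]]:
    """Lag feature: the value observed k steps earlier (None when unavailable)."""
    if k < 0:
        raise ValueError("k must be >= 0")
    n = len(values)
    if k >= n:
        return [None] * n
    return [None] * k + list(values[: n - k])


def sliding_windows(values: Sequence[float], size: int) -> List[List[float]]:
    """Windowing: every contiguous sub-sequence of length `size`."""
    return [list(values[i : i + size]) for i in range(len(values) - size + 1)]


series = [1, 2, 3, 4, 5]
print(moving_average(series, 3))  # [None, None, 2.0, 3.0, 4.0]
print(lag(series, 2))             # [None, None, 1, 2, 3]
print(sliding_windows(series, 3)) # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

A trailing window is used for smoothing here so that each smoothed point depends only on past values, which avoids leaking future information into a forecasting dataset.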
As with the cleaning steps, you configure each step in the left panel, and the preview in the right panel updates automatically. When you are satisfied with the displayed result, click Submit and Run: a red Time-Series Cleaning gear icon, along with the newly created dataset, then appears in your project's flow.
In addition to the preview table, a plot of the raw data over the datetime column gives you a glimpse of the time series that will be used for model training.
When you enter the AutoML module, you may encounter the Time-Series sanity test, which checks the frequency of the selected datetime column. If the frequency is irregular, the time series must be resampled: you are then taken to the same Time-Series Cleaning interface, where you can apply a resampling and obtain a dataset better suited to your Time-Series Forecasting use case.
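The idea behind a frequency check and a resampling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual papAI sanity test: it assumes timestamps are sorted, treats a series as regular only when every gap between consecutive timestamps is identical, and resamples by averaging values into fixed-width bins anchored at the first timestamp. The helpers `is_regular` and `resample_mean` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Sequence, Tuple


def is_regular(timestamps: Sequence[datetime]) -> bool:
    """True when there are at least two points and all consecutive gaps are equal."""
    gaps = {b - a for a, b in zip(timestamps, timestamps[1:])}
    return len(gaps) == 1


def resample_mean(
    timestamps: Sequence[datetime], values: Sequence[float], step: timedelta
) -> List[Tuple[datetime, Optional[float]]]:
    """Average values into fixed-width bins of length `step`.

    Bins start at the first (sorted) timestamp; a bin with no observations
    gets None, which a later step (e.g. interpolation) would need to fill.
    """
    start = timestamps[0]
    n_bins = (timestamps[-1] - start) // step + 1  # timedelta // timedelta -> int
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, v in zip(timestamps, values):
        idx = (t - start) // step
        sums[idx] += v
        counts[idx] += 1
    return [
        (start + i * step, sums[i] / counts[i] if counts[i] else None)
        for i in range(n_bins)
    ]


ts = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 0, 10),
    datetime(2024, 1, 1, 0, 25),
    datetime(2024, 1, 1, 1, 5),
]
print(is_regular(ts))  # False: gaps of 10, 15 and 40 minutes
resampled = resample_mean(ts, [1, 3, 5, 7], timedelta(minutes=30))
print(resampled)       # bins at 00:00 -> 3.0, 00:30 -> None, 01:00 -> 7.0
print(is_regular([t for t, _ in resampled]))  # True
```

Averaging is only one possible aggregation; depending on the signal, summing, taking the last value, or interpolating empty bins may be more appropriate.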