With the papAI tool, users have access to a wide range of options for data preprocessing and analysis. These options allow for common operations to be performed quickly and efficiently.
However, for users who require a more tailored solution, papAI offers a powerful tool called Extensions.
This tool enables users to create custom tools from scratch that can be applied to any dataset or bucket of their choosing. With Extensions, users can tune the parameters their custom tool needs to run correctly, so that it meets their specific needs. Whether the output is a single file or several, Extensions provides a highly customizable solution for data analysis, preprocessing, or data visualization.
By exploring the full capabilities of this tool, users can unlock new targeted insights and gain a deeper understanding of their data.
Access to the extension list
If you're looking to expand your data analysis capabilities, papAI's Extensions feature is an excellent place to start. To access the list of existing extensions, simply click on the extensions tab in the left-hand side menu.
Here, you'll find a vast catalog of pre-existing extensions that are ready to use on any dataset within your projects. Extensions can be identified by their name, author and tags.
If you have been granted access, you can also modify the content of an extension to your liking and update it once you are satisfied.
Create your own extension
papAI offers a vast selection of extensions to choose from, but sometimes users require a more personalized solution. papAI allows users to create their own extensions with custom Python code, a dedicated virtual environment, and a form builder to tailor the extension to their specific needs.
To access the creation interface, users can simply click on the New Extension button located in the top-right corner of the screen.
When you create a new extension, you have access to three tabs:
- The source code tab provides a file explorer on the left side, where you can create new files by clicking on a folder, then on the Create python file button.
- The form builder tab, where you can customize inputs, outputs, and parameter fields. Note that you must have at least one input and one output (use the Text input element for both of them). You can drag and drop components from the list to add parameters. They will be shown when you click on your extension in your flow.
- The metadata tab, where you can edit the name of your extension, the tags associated with it, its description, and its icon.
For the Python recipe, several instructions are available to correctly import and export the data or parameters that the code needs to run.
The Python recipe also includes commonly used libraries such as pandas and NumPy, but you can also use a custom Python virtual environment with the specific list of libraries that you need.
Finally, users can click on the Create button to save their unique new extension.
Developing the extension
You have access to a few variables that contain your inputs, outputs, and parameters:
- `parquet_inputs: list[dict]` contains the list of all your datasets defined as inputs in the form builder.
- `parquet_outputs: list[dict]` contains the list of all your datasets defined as outputs in the form builder.
- `bucket_inputs: list[dict]` contains the list of all your buckets defined as inputs in the form builder.
- `bucket_outputs: list[dict]` contains the list of all your buckets defined as outputs in the form builder.
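For instance, you can get the name of the i-th input parquet in your form builder with `parquet_inputs[i]["step_name"]`, the same key that the `export_dataset` documentation gives for `parquet_outputs`.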
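As a minimal sketch of how these variables might be read, the snippet below collects the `step_name` of each entry. Inside an extension the platform defines `parquet_inputs` and `parquet_outputs`; the stand-in lists and dataset names here are purely illustrative, and `step_names` is a hypothetical helper. This page only documents the `step_name` key for outputs, so its presence on input entries is an assumption.

```python
# Stand-in values: inside an extension the platform defines these variables.
# The dataset names are hypothetical.
parquet_inputs = [{"step_name": "raw_sales"}, {"step_name": "raw_customers"}]
parquet_outputs = [{"step_name": "clean_sales"}]


def step_names(entries):
    """Return the "step_name" of every entry, in form-builder order."""
    return [entry["step_name"] for entry in entries]


input_names = step_names(parquet_inputs)
output_names = step_names(parquet_outputs)

# Name of the i-th input dataset (here i = 0, the first one).
first_input_name = parquet_inputs[0]["step_name"]
print(input_names, output_names, first_input_name)
```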
The platform provides a few functions to read and write files in buckets (see more on buckets), and a few variables that contain the parameter values and input datasets.
```python
def read_from_bucket_to_file(bucket_name: str, object_name: str) -> str:
    """
    Write the content of an object from a bucket into a file.

    Parameters
    ----------
    bucket_name : str
        name of the bucket containing the data you want to get.
    object_name : str
        filepath to the object you want to read.

    Returns
    -------
    filepath : str
        the filepath to the file extracted from the bucket.
    """
```
```python
import io
import os


def write_file_in_bucket(bucket_name: str, object_name: str, *, data: io.BytesIO = None, file_path: str | bytes | os.PathLike = None) -> None:
    """
    Write in-memory data or the content of a file to an object in a bucket.

    Parameters
    ----------
    bucket_name : str
        name of the bucket you want to write to.
    object_name : str
        filepath to the object you want to write in the bucket.
    data : io.BytesIO, optional
        data to write to the file in the bucket. Do not use with `file_path`.
    file_path : str | bytes | os.PathLike, optional
        the path to your file in the virtual environment. Do not use with `data`.
    """
```
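The two bucket helpers can be combined in a read, transform, write sequence. The sketch below is only an illustration under stated assumptions: `uppercase_text_object` is a hypothetical helper, the object is assumed to be a UTF-8 text file, and the platform functions are passed in as `read_fn` and `write_fn` so the snippet can also be run outside papAI with stand-ins.

```python
import io


def uppercase_text_object(bucket_in, bucket_out, object_name, read_fn, write_fn):
    """Read a text object from one bucket, upper-case it, write it to another."""
    # read_fn returns the path of a file holding the object's content.
    local_path = read_fn(bucket_in, object_name)
    with open(local_path, "r", encoding="utf-8") as handle:
        text = handle.read()

    # Send the result as in-memory bytes through the keyword-only `data`
    # argument (and therefore without `file_path`).
    payload = io.BytesIO(text.upper().encode("utf-8"))
    write_fn(bucket_out, object_name, data=payload)
    return len(text)
```

Inside an extension, the call might look like `uppercase_text_object("my-input-bucket", "my-output-bucket", "notes.txt", read_from_bucket_to_file, write_file_in_bucket)`, where both bucket names and the object name are placeholders.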
```python
import pandas


def import_dataset(dataset_name: str) -> pandas.DataFrame:
    """
    Query a dataset from your project by its name.

    Parameters
    ----------
    dataset_name : str
        name of the dataset you want to get.

    Returns
    -------
    dataset : pandas.DataFrame
        the requested dataset.
    """
```
```python
import pandas


def export_dataset(dataset: pandas.DataFrame, dataset_name: str) -> None:
    """
    Write a dataset to your project under the given name.

    Parameters
    ----------
    dataset : pandas.DataFrame
        dataset to output.
    dataset_name : str
        name of the dataset to write to. You can get the name from
        parquet_outputs[i]["step_name"] (replace i with the index of your
        dataset in the form builder output section).
    """
```
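Put together, a dataset extension typically imports an input, transforms it, and exports the result. The following is a minimal sketch rather than a reference implementation: `deduplicate_first_input` is a hypothetical helper, the platform functions are passed in as `import_fn` and `export_fn` so the snippet stays runnable outside papAI, and reading the input name from `parquet_inputs[0]["step_name"]` assumes inputs use the same key that is documented for outputs.

```python
def deduplicate_first_input(parquet_inputs, parquet_outputs, import_fn, export_fn):
    """Load the first input dataset, drop duplicate rows, export to the first output."""
    input_name = parquet_inputs[0]["step_name"]    # assumed key for inputs
    output_name = parquet_outputs[0]["step_name"]  # key documented for outputs

    dataset = import_fn(input_name)      # a pandas.DataFrame on the platform
    cleaned = dataset.drop_duplicates()  # standard pandas DataFrame method
    export_fn(cleaned, output_name)
    return output_name
```

In an extension's source file, this could be invoked as `deduplicate_first_input(parquet_inputs, parquet_outputs, import_dataset, export_dataset)`. Passing the functions explicitly is a design choice made here for testability, not something the platform requires.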
Apply an extension to a dataset/bucket
To apply the Extensions operation, simply select a dataset or bucket from your project and navigate to the left sidebar. From there, select the Extensions operation and choose the extension that you want to use. A new interface will appear, prompting you to input the necessary parameters and specify the dataset (or bucket) input and output.
Once all the required fields have been filled, simply click the Submit and Run button to initiate the recipe. The extension will then be applied to your data, and the output will be displayed on your project's flow.
If you need to modify the extension, you can still inspect its recipe and configuration by clicking the extension icon next to the Submit button, which opens the same page used for extension creation. Once the modification is done, submit it by clicking the Update button.
Here is a demo of the Extension module on papAI