Extensions

With the papAI tool, users have access to a wide range of options for data preprocessing and analysis. These options allow for common operations to be performed quickly and efficiently.

However, for users who require a more tailored solution, papAI offers a powerful tool called Extensions.
Extensions let users create custom tools from scratch and apply them to any dataset or bucket of their choosing. Users can expose and adjust the parameters their custom tool needs to run correctly, so that it meets their specific needs. Whether the result is exported into a single file or several, Extensions provides a highly customizable solution for data analysis, preprocessing, data visualization, and more.
By exploring the full capabilities of this tool, users can unlock new targeted insights and gain a deeper understanding of their data.

Access to the extension list

If you're looking to expand your data analysis capabilities, papAI's Extensions feature is an excellent place to start. To access the list of existing extensions, simply click on the extensions tab in the left-hand side menu.

Extension tab

Here, you'll find a vast catalog of pre-existing extensions that are ready to use on any dataset within your projects. Extensions can be identified by their name, author and tags.

Info

If you have been granted access, you can also modify the content of an extension to your liking and update it once you are satisfied.

Create your own extension

papAI offers a vast selection of extensions to choose from, but sometimes users require a more personalized solution. papAI allows users to create their own extensions with custom Python code, a dedicated virtual environment, and a form builder to tailor the extension to their specific needs.

To access the creation interface, users can simply click on the New Extension button located on the top right corner of the screen.

New extension interface with Python recipe editor

When you create a new extension, you have access to three tabs:

The three tabs and the action buttons shown when a folder is selected

  • The source code tab provides a file explorer on the left side, where you can create new files by clicking on a folder, then on the Create python file button.
  • The form builder tab lets you customize inputs, outputs, and parameter fields. Note that you must have at least one input and one output (use the Text input element for both of them). You can drag and drop components from the list to add parameters. They will be shown when you click on your extension in your flow.
  • The metadata tab lets you edit the name of your extension, the tags associated with it, its description and its icon.

component drag-and-drop

Tip

In the Python recipe, several functions and variables are available to correctly import and export the data and parameters your code needs to run.

Info

The Python recipe also includes commonly used libraries such as Pandas and NumPy, but you can also use a custom Python virtual environment with the specific list of libraries that your code needs.

Finally, users can click on the Create button to save their unique new extension.

Developing the extension

You have access to a few variables that contain your inputs, outputs and parameters:

  • parquet_inputs: list[dict] contains the list of all your datasets defined as inputs in the form builder.
  • parquet_outputs: list[dict] contains the list of all your datasets defined as outputs in the form builder.
  • bucket_inputs: list[dict] contains the list of all your buckets defined as inputs in the form builder.
  • bucket_outputs: list[dict] contains the list of all your buckets defined as outputs in your form builder.

For instance, you can get the name of the ith input parquet in your form builder with parquet_inputs[i]["step_name"].
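
As a minimal sketch of this access pattern, the snippet below collects the step names of all inputs and outputs. The lists are defined locally as stand-ins so the example runs outside papAI; the dataset names are hypothetical, and only the "step_name" key is taken from the documentation above.

```python
# Stand-in values (assumption): on papAI these lists are provided by the
# platform. Only the "step_name" key is documented; the dataset names
# used here are hypothetical.
parquet_inputs = [{"step_name": "raw_sales"}, {"step_name": "customers"}]
parquet_outputs = [{"step_name": "clean_sales"}]


def step_names(entries: list[dict]) -> list[str]:
    """Return the step name of every entry, in form builder order."""
    return [entry["step_name"] for entry in entries]


input_names = step_names(parquet_inputs)
output_names = step_names(parquet_outputs)

# Fail early with a readable message if the form builder is misconfigured.
if not input_names or not output_names:
    raise ValueError("The extension needs at least one input and one output.")

first_input = input_names[0]
print(f"Reading from {first_input!r}, writing to {output_names[0]!r}")
```

Inside a real extension, the two stand-in assignments at the top would be dropped, since the platform supplies these variables.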

The platform provides a few functions to read and write files in buckets (see more on buckets), and a few variables that contain the parameter values and input datasets.

def read_from_bucket_to_file(bucket_name: str, object_name: str) -> str:
    """
    write the content of a bucket object inside a local file.

    Parameters
    ----------
    bucket_name : str
        name of the bucket containing the data you want to get.
    object_name : str
        filepath to the object you want to read.

    Returns
    -------
    filepath : str
        return the filepath to the file extracted from the bucket.
    """
def write_file_in_bucket(bucket_name: str, object_name: str, *, data: io.BytesIO = None, file_path: str | bytes | os.PathLike = None) -> None:
    """
    write data or a local file to an object in a bucket.

    Parameters
    ----------
    bucket_name : str
        name of the bucket you want to write to.
    object_name : str
        filepath of the object to write inside the bucket.
    data : io.BytesIO, optional
        data to write to the file in the bucket. Do not use with `file_path`.
    file_path : str | bytes | os.PathLike, optional
        the path to your file in the virtual environment. Do not use with `data`.
    """
def import_dataset(dataset_name: str) -> pandas.DataFrame:
    """
    query a dataset from your project with its name

    Parameters
    ----------
    dataset_name : str
        name of the dataset you want to get.

    Returns
    -------
    dataset : pandas.DataFrame
        the requested dataset as a pandas DataFrame.
    """
def export_dataset(dataset: pandas.DataFrame, dataset_name: str) -> None:
    """
    write a dataset to your project under the given name

    Parameters
    ----------
    dataset : pandas.DataFrame
        dataset to output
    dataset_name : str
        name of the dataset to write to. You can get the name from
        parquet_outputs[i]["step_name"] (replace i with the index of your
        dataset in the form builder output section).
    """

Apply an extension to a dataset/bucket

To apply the Extensions operation, simply select a dataset or bucket from your project and navigate to the left sidebar. From there, select the Extensions operation and choose the extension that you want to use. A new interface will appear, prompting you to input the necessary parameters and specify the dataset (or bucket) input and output.

Extension icon on papAI project

Once all the required fields have been filled, simply click the Submit and Run button to initiate the recipe. The extension will then be applied to your data, and the output will be displayed on your project's flow.

Example of required fields to fill in to run an extension

Info

You can still review the extension recipe and configuration if you need to modify them: click the extension icon next to the Submit button to open the same page used for extension creation. When you are done, click the Update button to submit your changes.

Extension source code icon

Here is a demo of the Extension module on papAI