pandas

Ibis's pandas backend is available in core Ibis.

ibis.memtable Support

The pandas backend supports memtables by natively executing queries against the underlying storage (e.g., pyarrow Tables or pandas DataFrames).
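
A minimal sketch of this in action, assuming ibis and pandas are installed:

>>> import ibis
>>> import pandas as pd
>>> con = ibis.pandas.connect()
>>> t = ibis.memtable(pd.DataFrame({"a": [1, 2, 3]}))
>>> con.execute(t.a.sum())  # executed natively against the underlying DataFrame
6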

Install

Install ibis and dependencies for the pandas backend:

pip install 'ibis-framework'
conda install -c conda-forge ibis-framework
mamba install -c conda-forge ibis-framework

Connect

API

Create a client by passing a dictionary of pandas DataFrames to ibis.pandas.connect.

See ibis.backends.pandas.Backend.do_connect for connection parameter information.

ibis.pandas.connect is a thin wrapper around ibis.backends.pandas.Backend.do_connect.

Connection Parameters

do_connect(dictionary=None)

Construct a client from a dictionary of pandas DataFrames.

Parameters:

dictionary : MutableMapping[str, pd.DataFrame] | None
    An optional mapping of string table names to pandas DataFrames. Defaults to None.

Examples:

>>> import ibis
>>> import pandas as pd
>>> ibis.pandas.connect({"t": pd.DataFrame({"a": [1, 2, 3]})})
<ibis.backends.pandas.Backend at 0x...>

Backend API

Backend

Bases: BasePandasBackend

Attributes

db_identity: str cached property

Return the identity of the database.

Multiple connections to the same database will return the same value for db_identity.

The default implementation assumes connection parameters uniquely specify the database.

Returns:

Hashable
    Database identity.

tables cached property

An accessor for tables in the database.

Tables may be accessed by name using either index or attribute access:

Examples:

>>> con = ibis.sqlite.connect("example.db")
>>> people = con.tables['people']  # access via index
>>> people = con.tables.people  # access via attribute

Functions

add_operation(operation)

Add a translation function to the backend for a specific operation.

Operations are defined in ibis.expr.operations, and a translation function receives the translator object and an expression as parameters, and returns a value depending on the backend.

connect(*args, **kwargs)

Connect to the database.

Parameters:

*args
    Mandatory connection parameters; see the docstring of do_connect for details.
**kwargs
    Extra connection parameters; see the docstring of do_connect for details.

Notes

This creates a new backend instance with saved args and kwargs, then calls reconnect and finally returns the newly created and connected backend instance.

Returns:

BaseBackend
    An instance of the backend.

create_table(name, obj=None, *, schema=None, database=None, temp=None, overwrite=False)

Create a table.
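
For example, a table can be created from an in-memory DataFrame; a minimal sketch, assuming ibis and pandas are imported as above:

>>> con = ibis.pandas.connect()
>>> t = con.create_table("t", pd.DataFrame({"a": [1, 2, 3]}))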

database(name=None)

Return a Database object for the name database.

Parameters:

name : str | None
    Name of the database to return the object for. Defaults to None.

Returns:

Database
    A database object for the specified database.

from_dataframe(df, name='df', client=None)

Construct an ibis table from a pandas DataFrame.

Parameters:

df : pd.DataFrame
    A pandas DataFrame. Required.
name : str
    The name of the pandas DataFrame. Defaults to 'df'.
client : BasePandasBackend | None
    The client whose dictionary will be mutated to include the DataFrame under the given name; if not provided, a new client is created. Defaults to None.

Returns:

Table
    A table expression.
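
A minimal usage sketch (the names here are illustrative):

>>> con = ibis.pandas.connect()
>>> t = con.from_dataframe(pd.DataFrame({"a": [1, 2, 3]}), name="df", client=con)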

read_csv(path, table_name=None, **kwargs)

Register a CSV file as a table in the current backend.

Parameters:

path : str | Path
    The data source. A string or Path to the CSV file. Required.
table_name : str | None
    An optional name to use for the created table. Defaults to a sequentially generated name.
**kwargs : Any
    Additional keyword arguments passed to the backend loading function.

Returns:

ir.Table
    The just-registered table.
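
A sketch of typical usage; "data.csv" is a hypothetical path, and extra keyword arguments are forwarded to the backend loading function:

>>> con = ibis.pandas.connect()
>>> t = con.read_csv("data.csv", table_name="data")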

read_parquet(path, table_name=None, **kwargs)

Register a parquet file as a table in the current backend.

Parameters:

path : str | Path
    The data source. Required.
table_name : str | None
    An optional name to use for the created table. Defaults to a sequentially generated name.
**kwargs : Any
    Additional keyword arguments passed to the backend loading function.

Returns:

ir.Table
    The just-registered table.
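
Usage mirrors read_csv; "data.parquet" is a hypothetical path:

>>> con = ibis.pandas.connect()
>>> t = con.read_parquet("data.parquet", table_name="data")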

register_options() classmethod

Register custom backend options.

to_csv(expr, path, *, params=None, **kwargs)

Write the results of executing the given expression to a CSV file.

This method is eager and will execute the associated expression immediately.

Parameters:

expr : ir.Table
    The ibis expression to execute and persist to CSV. Required.
path : str | Path
    The data source. A string or Path to the CSV file. Required.
params : Mapping[ir.Scalar, Any] | None
    Mapping of scalar parameter expressions to value. Defaults to None.
**kwargs : Any
    Additional keyword arguments passed to pyarrow.csv.CSVWriter.

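Because this method is eager, calling it runs the expression and writes the file immediately; a sketch with a hypothetical output path:

>>> con = ibis.pandas.connect({"t": pd.DataFrame({"a": [1, 2, 3]})})
>>> con.to_csv(con.tables.t, "out.csv")
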
to_delta(expr, path, *, params=None, **kwargs)

Write the results of executing the given expression to a Delta Lake table.

This method is eager and will execute the associated expression immediately.

Parameters:

expr : ir.Table
    The ibis expression to execute and persist to a Delta Lake table. Required.
path : str | Path
    The data source. A string or Path to the Delta Lake table. Required.
params : Mapping[ir.Scalar, Any] | None
    Mapping of scalar parameter expressions to value. Defaults to None.
**kwargs : Any
    Additional keyword arguments passed to the deltalake.writer.write_deltalake method.

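A sketch, assuming the optional deltalake dependency is installed and using a hypothetical output path:

>>> con = ibis.pandas.connect({"t": pd.DataFrame({"a": [1, 2, 3]})})
>>> con.to_delta(con.tables.t, "/tmp/t_delta")
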
to_parquet(expr, path, *, params=None, **kwargs)

Write the results of executing the given expression to a parquet file.

This method is eager and will execute the associated expression immediately.

Parameters:

expr : ir.Table
    The ibis expression to execute and persist to parquet. Required.
path : str | Path
    The data source. A string or Path to the parquet file. Required.
params : Mapping[ir.Scalar, Any] | None
    Mapping of scalar parameter expressions to value. Defaults to None.
**kwargs : Any
    Additional keyword arguments passed to pyarrow.parquet.ParquetWriter.

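A sketch with a hypothetical output path:

>>> con = ibis.pandas.connect({"t": pd.DataFrame({"a": [1, 2, 3]})})
>>> con.to_parquet(con.tables.t, "t.parquet")
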
to_torch(expr, *, params=None, limit=None, **kwargs)

Execute an expression and return results as a dictionary of torch tensors.

Parameters:

expr : ir.Expr
    Ibis expression to execute. Required.
params : Mapping[ir.Scalar, Any] | None
    Parameters to substitute into the expression. Defaults to None.
limit : int | str | None
    An integer to effect a specific row limit. A value of None means no limit. Defaults to None.
**kwargs : Any
    Keyword arguments passed into the backend's to_torch implementation.

Returns:

dict[str, torch.Tensor]
    A dictionary of torch tensors, keyed by column name.
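
A sketch, assuming torch is installed:

>>> con = ibis.pandas.connect({"t": pd.DataFrame({"a": [1.0, 2.0, 3.0]})})
>>> tensors = con.to_torch(con.tables.t)  # {"a": <torch.Tensor of the column values>}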

User Defined functions (UDF)

Ibis supports defining three kinds of user-defined functions for operations on expressions targeting the pandas backend: element-wise, reduction, and analytic.

Elementwise Functions

An element-wise function is a function that takes N rows as input and produces N rows of output. log, exp, and floor are examples of element-wise functions.

Here's how to define an element-wise function:

import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.elementwise(input_type=[dt.int64], output_type=dt.double)
def add_one(x):
    return x + 1.0

Reduction Functions

A reduction is a function that takes N rows as input and produces 1 row as output. sum, mean and count are examples of reductions. In the context of a GROUP BY, reductions produce 1 row of output per group.

Here's how to define a reduction function:

import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.reduction(input_type=[dt.double], output_type=dt.double)
def double_mean(series):
    return 2 * series.mean()
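
Once defined, double_mean can be applied like a built-in aggregation. A sketch, where t is a table with a double column a and a grouping column key (both names illustrative):

expr = double_mean(t.a)  # whole-column aggregation
grouped = t.group_by("key").aggregate(dm=double_mean(t.a))  # one row per group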

Analytic Functions

An analytic function is like an element-wise function in that it takes N rows as input and produces N rows of output. The key difference is that analytic functions can be applied per group using window functions. Z-score is an example of an analytic function.

Here's how to define an analytic function:

import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.analytic(input_type=[dt.double], output_type=dt.double)
def zscore(series):
    return (series - series.mean()) / series.std()
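
Analytic UDFs can be applied to a whole column or per group via a window. A sketch, with t and key as illustrative names:

expr = zscore(t.a)  # over the whole column
w = ibis.window(group_by="key")
per_group = zscore(t.a).over(w)  # within each group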

Details of pandas UDFs

  • Element-wise functions support applying your UDF to any combination of scalar values and columns.
  • Reductions support whole-column aggregations, grouped aggregations, and application of your function over a window.
  • Analytic functions work in both grouped and non-grouped settings.
  • The objects you receive as input arguments are either pandas.Series or Python/NumPy scalars.

Keyword arguments must be given a default

Any keyword arguments must be given a default value or the function will not work.

A common Python convention is to set the default value to None and handle setting it to something not None in the body of the function.

Using add_one from above as an example, the following call will receive a pandas.Series for the x argument:

import ibis
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3]})
con = ibis.pandas.connect({'df': df})
t = con.table('df')
expr = add_one(t.a)
expr

And this will receive the int 1:

expr = add_one(1)
expr

Since the pandas backend passes around **kwargs, you can accept **kwargs in your function:

import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.elementwise([dt.int64], dt.double)
def add_two(x, **kwargs):
    # do stuff with kwargs
    return x + 2.0

Or you can leave them out as we did in the example above. You can also optionally accept specific keyword arguments.

For example:

import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.elementwise([dt.int64], dt.double)
def add_two_with_none(x, y=None):
    if y is None:
        y = 2.0
    return x + y
