pandas¶
Ibis's pandas backend is available in core Ibis.
ibis.memtable Support¶
The pandas backend supports memtables by natively executing queries against the underlying storage (e.g., pyarrow Tables or pandas DataFrames).
Install¶
Install ibis
and dependencies for the pandas backend:
pip install 'ibis-framework'
conda install -c conda-forge ibis-framework
mamba install -c conda-forge ibis-framework
Connect¶
API¶
Create a client by passing a dictionary that maps table names to pandas DataFrames to ibis.pandas.connect.
See ibis.backends.pandas.Backend.do_connect for connection parameter information.
ibis.pandas.connect is a thin wrapper around ibis.backends.pandas.Backend.do_connect.
Connection Parameters¶
do_connect(dictionary=None)¶
Construct a client from a dictionary of pandas DataFrames.
Parameters:
Name | Type | Description | Default
---|---|---|---
dictionary | MutableMapping[str, pd.DataFrame] or None | An optional mapping of string table names to pandas DataFrames. | None
Examples:
>>> import ibis
>>> import pandas as pd
>>> ibis.pandas.connect({"t": pd.DataFrame({"a": [1, 2, 3]})})
<ibis.backends.pandas.Backend at 0x...>
Backend API¶
Backend
¶
Bases: BasePandasBackend
Attributes¶
db_identity: str (cached property)¶
Return the identity of the database.
Multiple connections to the same database will return the same value for db_identity.
The default implementation assumes connection parameters uniquely specify the database.
Returns:
Type | Description
---|---
Hashable | Database identity
tables (cached property)¶
An accessor for tables in the database.
Tables may be accessed by name using either index or attribute access:
Examples:
>>> con = ibis.sqlite.connect("example.db")
>>> people = con.tables['people'] # access via index
>>> people = con.tables.people # access via attribute
Functions¶
add_operation(operation)¶
Add a translation function to the backend for a specific operation.
Operations are defined in ibis.expr.operations. A translation function receives the translator object and an expression as parameters, and returns a value depending on the backend.
connect(*args, **kwargs)¶
Connect to the database.
Parameters:
Name | Type | Description | Default
---|---|---|---
*args | | Mandatory connection parameters, see the docstring of do_connect for information. | ()
**kwargs | | Extra connection parameters, see the docstring of do_connect for information. | {}
Notes¶
This creates a new backend instance with saved args and kwargs, then calls reconnect and finally returns the newly created and connected backend instance.
Returns:
Type | Description
---|---
BaseBackend | An instance of the backend
create_table(name, obj=None, *, schema=None, database=None, temp=None, overwrite=False)¶
Create a table.
database(name=None)¶
Return a Database object for the named database.
Parameters:
Name | Type | Description | Default
---|---|---|---
name | str or None | Name of the database to return the object for. | None
Returns:
Type | Description
---|---
Database | A database object for the specified database.
from_dataframe(df, name='df', client=None)¶
Construct an ibis table from a pandas DataFrame.
Parameters:
Name | Type | Description | Default
---|---|---|---
df | pd.DataFrame | A pandas DataFrame | required
name | str | The name of the pandas DataFrame | 'df'
client | BasePandasBackend or None | Client whose table dictionary will be mutated to include the DataFrame under name; if not provided, a new client is created. | None
Returns:
Type | Description
---|---
Table | A table expression
read_csv(path, table_name=None, **kwargs)¶
Register a CSV file as a table in the current backend.
Parameters:
Name | Type | Description | Default
---|---|---|---
path | str or Path | The data source. A string or Path to the CSV file. | required
table_name | str or None | An optional name to use for the created table. This defaults to a sequentially generated name. | None
**kwargs | Any | Additional keyword arguments passed to the backend loading function. | {}
Returns:
Type | Description
---|---
ir.Table | The just-registered table
read_parquet(path, table_name=None, **kwargs)¶
Register a parquet file as a table in the current backend.
Parameters:
Name | Type | Description | Default
---|---|---|---
path | str or Path | The data source. | required
table_name | str or None | An optional name to use for the created table. This defaults to a sequentially generated name. | None
**kwargs | Any | Additional keyword arguments passed to the backend loading function. | {}
Returns:
Type | Description
---|---
ir.Table | The just-registered table
register_options() (classmethod)¶
Register custom backend options.
to_csv(expr, path, *, params=None, **kwargs)¶
Write the results of executing the given expression to a CSV file.
This method is eager and will execute the associated expression immediately.
Parameters:
Name | Type | Description | Default
---|---|---|---
expr | ir.Table | The ibis expression to execute and persist to CSV. | required
path | str or Path | The data source. A string or Path to the CSV file. | required
params | Mapping[ir.Scalar, Any] or None | Mapping of scalar parameter expressions to value. | None
kwargs | Any | Additional keyword arguments passed to pyarrow.csv.CSVWriter | {}
to_delta(expr, path, *, params=None, **kwargs)¶
Write the results of executing the given expression to a Delta Lake table.
This method is eager and will execute the associated expression immediately.
Parameters:
Name | Type | Description | Default
---|---|---|---
expr | ir.Table | The ibis expression to execute and persist to a Delta Lake table. | required
path | str or Path | The data source. A string or Path to the Delta Lake table. | required
params | Mapping[ir.Scalar, Any] or None | Mapping of scalar parameter expressions to value. | None
kwargs | Any | Additional keyword arguments passed to the deltalake.writer.write_deltalake method | {}
to_parquet(expr, path, *, params=None, **kwargs)¶
Write the results of executing the given expression to a parquet file.
This method is eager and will execute the associated expression immediately.
Parameters:
Name | Type | Description | Default
---|---|---|---
expr | ir.Table | The ibis expression to execute and persist to parquet. | required
path | str or Path | The data source. A string or Path to the parquet file. | required
params | Mapping[ir.Scalar, Any] or None | Mapping of scalar parameter expressions to value. | None
**kwargs | Any | Additional keyword arguments passed to pyarrow.parquet.ParquetWriter | {}
to_torch(expr, *, params=None, limit=None, **kwargs)¶
Execute an expression and return results as a dictionary of torch tensors.
Parameters:
Name | Type | Description | Default
---|---|---|---
expr | ir.Expr | Ibis expression to execute. | required
params | Mapping[ir.Scalar, Any] or None | Parameters to substitute into the expression. | None
limit | int or str or None | An integer to effect a specific row limit. A value of None means no limit. | None
kwargs | Any | Keyword arguments passed into the backend's execution of the expression. | {}
Returns:
Type | Description
---|---
dict[str, torch.Tensor] | A dictionary of torch tensors, keyed by column name.
User Defined functions (UDF)¶
Ibis supports defining three kinds of user-defined functions for operations on expressions targeting the pandas backend: element-wise, reduction, and analytic.
Elementwise Functions¶
An element-wise function is a function that takes N rows as input and produces N rows of output. log, exp, and floor are examples of element-wise functions.
Here's how to define an element-wise function:
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf
@udf.elementwise(input_type=[dt.int64], output_type=dt.double)
def add_one(x):
    return x + 1.0
Reduction Functions¶
A reduction is a function that takes N rows as input and produces 1 row as output. sum, mean, and count are examples of reductions. In the context of a GROUP BY, reductions produce 1 row of output per group.
Here's how to define a reduction function:
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf
@udf.reduction(input_type=[dt.double], output_type=dt.double)
def double_mean(series):
    return 2 * series.mean()
Analytic Functions¶
An analytic function is like an element-wise function in that it takes N rows as input and produces N rows of output. The key difference is that analytic functions can be applied per group using window functions. Z-score is an example of an analytic function.
Here's how to define an analytic function:
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf
@udf.analytic(input_type=[dt.double], output_type=dt.double)
def zscore(series):
    return (series - series.mean()) / series.std()
Details of pandas UDFs¶
- Element-wise functions support applying your UDF to any combination of scalar values and columns.
- Reductions support whole-column aggregations, grouped aggregations, and application of your function over a window.
- Analytic functions work in both grouped and non-grouped settings.
- The objects you receive as input arguments are either pandas.Series or Python/NumPy scalars.
Keyword arguments must be given a default
Any keyword arguments must be given a default value or the function will not work.
A common Python convention is to set the default value to None and handle setting it to something not None in the body of the function.
Using add_one from above as an example, the following call will receive a pandas.Series for the x argument:
import ibis
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3]})
con = ibis.pandas.connect({'df': df})
t = con.table('df')
expr = add_one(t.a)
expr
And this will receive the int 1:
expr = add_one(1)
expr
Since the pandas backend passes around **kwargs, you can accept **kwargs in your function:
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf
@udf.elementwise([dt.int64], dt.double)
def add_two(x, **kwargs):  # do stuff with kwargs
    return x + 2.0
Or you can leave them out as we did in the example above. You can also optionally accept specific keyword arguments.
For example:
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf
@udf.elementwise([dt.int64], dt.double)
def add_two_with_none(x, y=None):
    if y is None:
        y = 2.0
    return x + y