pandas

https://pandas.pydata.org/

Install

Install Ibis and dependencies for the pandas backend:

Install with the pandas extra:

pip install 'ibis-framework[pandas]'

And connect:

import ibis

con = ibis.pandas.connect()

1: Adjust connection parameters as needed.

Install for pandas:

conda install -c conda-forge ibis-pandas

And connect:

import ibis

con = ibis.pandas.connect()

1: Adjust connection parameters as needed.

Install for pandas:

mamba install -c conda-forge ibis-pandas

And connect:

import ibis

con = ibis.pandas.connect()

1: Adjust connection parameters as needed.

User Defined functions (UDF)

Ibis supports defining three kinds of user-defined functions for operations on expressions targeting the pandas backend: element-wise, reduction, and analytic.

Elementwise Functions

An element-wise function is a function that takes N rows as input and produces N rows of output. log, exp, and floor are examples of element-wise functions.

Here’s how to define an element-wise function:

import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.elementwise(input_type=[dt.int64], output_type=dt.double)
def add_one(x):
    return x + 1.0

Reduction Functions

A reduction is a function that takes N rows as input and produces 1 row as output. sum, mean and count are examples of reductions. In the context of a GROUP BY, reductions produce 1 row of output per group.

Here’s how to define a reduction function:

import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.reduction(input_type=[dt.double], output_type=dt.double)
def double_mean(series):
    return 2 * series.mean()

Analytic Functions

An analytic function is like an element-wise function in that it takes N rows as input and produces N rows of output. The key difference is that analytic functions can be applied per group using window functions. Z-score is an example of an analytic function.

Here’s how to define an analytic function:

import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.analytic(input_type=[dt.double], output_type=dt.double)
def zscore(series):
    return (series - series.mean()) / series.std()

Details of pandas UDFs

Element-wise provide support for applying your UDF to any combination of scalar values and columns.
Reductions provide support for whole column aggregations, grouped aggregations, and application of your function over a window.
Analytic functions work in both grouped and non-grouped settings
The objects you receive as input arguments are either pandas.Series or Python/NumPy scalars.

Keyword arguments must be given a default

Any keyword arguments must be given a default value or the function will not work.

A common Python convention is to set the default value to None and handle setting it to something not None in the body of the function.

Using add_one from above as an example, the following call will receive a pandas.Series for the x argument:

import ibis
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3]})
con = ibis.pandas.connect({'df': df})
t = con.table('df')
expr = add_one(t.a)
expr

And this will receive the int 1:

expr = add_one(1)
expr

Since the pandas backend passes around **kwargs you can accept **kwargs in your function:

import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.elementwise([dt.int64], dt.double)
def add_two(x, **kwargs): # do stuff with kwargs
    return x + 2.0

Or you can leave them out as we did in the example above. You can also optionally accept specific keyword arguments.

For example:

import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.elementwise([dt.int64], dt.double)
def add_two_with_none(x, y=None):
    if y is None:
    y = 2.0
    return x + y

`pandas.Backend`

add_operation

add_operation(self, operation)

Add a translation function to the backend for a specific operation.

Operations are defined in ibis.expr.operations, and a translation function receives the translator object and an expression as parameters, and returns a value depending on the backend.

compile

compile(self, expr, *args, **kwargs)

Compile an expression.

connect

connect(self, *args, **kwargs)

Connect to the database.

Parameters

Name	Type	Description	Default
`*args`		Mandatory connection parameters, see the docstring of `do_connect` for details.	`()`
`**kwargs`		Extra connection parameters, see the docstring of `do_connect` for details.	`{}`

Notes

This creates a new backend instance with saved args and kwargs, then calls reconnect and finally returns the newly created and connected backend instance.

Returns

Type	Description
ibis.backends.base.BaseBackend	An instance of the backend

create_table

create_table(self, name, obj=None, *, schema=None, database=None, temp=None, overwrite=False)

Create a table.

create_view

create_view(self, name, obj, *, database=None, overwrite=False)

Create a new view from an expression.

Parameters

Name	Type	Description	Default
`name`	str	Name of the new view.	required
`obj`	ibis.ibis.Table	An Ibis table expression that will be used to create the view.	required
`database`	str \| None	Name of the database where the view will be created, if not provided the database’s default is used.	`None`
`overwrite`	bool	Whether to clobber an existing view with the same name	`False`

Returns

Type	Description
Table	The view that was created.

database

database(self, name=None)

Return a Database object for the name database.

Parameters

Name	Type	Description	Default
`name`	str \| None	Name of the database to return the object for.	`None`

Returns

Type	Description
ibis.backends.base.Database	A database object for the specified database.

drop_table

drop_table(self, name, *, force=False)

Drop a table.

Parameters

Name	Type	Description	Default
`name`	str	Name of the table to drop.	required
`database`	str \| None	Name of the database where the table exists, if not the default.	`None`
`force`	bool	If `False`, an exception is raised if the table does not exist.	`False`

drop_view

drop_view(self, name, *, force=False)

Drop a view.

Parameters

Name	Type	Description	Default
`name`	str	Name of the view to drop.	required
`database`	str \| None	Name of the database where the view exists, if not the default.	`None`
`force`	bool	If `False`, an exception is raised if the view does not exist.	`False`

execute

execute(self, expr)

Execute an expression.

from_dataframe

from_dataframe(self, df, name='df', client=None)

Construct an ibis table from a pandas DataFrame.

Parameters

Name	Type	Description	Default
`df`	pandas.pandas.DataFrame	A pandas DataFrame	required
`name`	str	The name of the pandas DataFrame	`'df'`
`client`	ibis.backends.pandas.BasePandasBackend \| None	Client dictionary will be mutated with the name of the DataFrame, if not provided a new client is created	`None`

Returns

Type	Description
Table	A table expression

get_schema

get_schema(self, table_name, database=None)

has_operation

has_operation(cls, operation)

Return whether the backend implements support for operation.

Parameters

Name	Type	Description	Default
`operation`	type[ibis.ibis.Value]	A class corresponding to an operation.	required

Returns

Type	Description
bool	Whether the backend implements the operation.

Examples

>>> import ibis
>>> import ibis.expr.operations as ops
>>> ibis.sqlite.has_operation(ops.ArrayIndex)
False
>>> ibis.postgres.has_operation(ops.ArrayIndex)
True

list_tables

list_tables(self, like=None, database=None)

Return the list of table names in the current database.

For some backends, the tables may be files in a directory, or other equivalent entities in a SQL database.

Parameters

Name	Type	Description	Default
`like`	str \| None	A pattern in Python’s regex format.	`None`
`database`	str \| None	The database from which to list tables. If not provided, the current database is used.	`None`

Returns

Type	Description
list[str]	The list of the table names that match the pattern `like`.

read_csv

read_csv(self, source, table_name=None, **kwargs)

Parameters

Name	Type	Description	Default
`source`	str \| pathlib.pathlib.Path	The data source. Can be a local or remote file, pathlike objects also accepted.	required
`table_name`	str \| None	An optional name to use for the created table. This defaults to a generated name.	`None`
`**kwargs`	typing.Any	Additional keyword arguments passed to Pandas loading function. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html for more information.	`{}`

Returns

Type	Description
ibis.ibis.Table	The just-registered table

read_delta

read_delta(self, source, table_name=None, **kwargs)

Parameters

Name	Type	Description	Default
`source`	str \| pathlib.Path	The data source. Must be a directory containing a Delta Lake table.	required
`table_name`	str \| None	An optional name to use for the created table. This defaults to a sequentially generated name.	`None`
`**kwargs`	typing.Any	Additional keyword arguments passed to the underlying backend or library.	`{}`

Returns

Type	Description
ibis.ibis.Table	The just-registered table.

read_json

read_json(self, path, table_name=None, **kwargs)

Parameters

Name	Type	Description	Default
`path`	str \| pathlib.Path	The data source. A string or Path to the JSON file.	required
`table_name`	str \| None	An optional name to use for the created table. This defaults to a sequentially generated name.	`None`
`**kwargs`	typing.Any	Additional keyword arguments passed to the backend loading function.	`{}`

Returns

Type	Description
ibis.ibis.Table	The just-registered table

read_parquet

read_parquet(self, source, table_name=None, **kwargs)

Parameters

Name	Type	Description	Default
`source`	str \| pathlib.pathlib.Path	The data source(s). May be a path to a file, an iterable of files, or directory of parquet files.	required
`table_name`	str \| None	An optional name to use for the created table. This defaults to a generated name.	`None`
`**kwargs`	typing.Any	Additional keyword arguments passed to Pandas loading function. See https://pandas.pydata.org/docs/reference/api/pandas.read_parquet.html for more information.	`{}`

Returns

Type	Description
ibis.ibis.Table	The just-registered table

reconnect

reconnect(self)

Reconnect to the database already configured with connect.

register_options

register_options(cls)

rename_table

rename_table(self, old_name, new_name)

Rename an existing table.

Parameters

Name	Type	Description	Default
`old_name`	str	The old name of the table.	required
`new_name`	str	The new name of the table.	required

table

table(self, name, schema=None)

Construct a table expression.

Parameters

Name	Type	Description	Default
`name`	str	Table name	required
`database`	str \| None	Database name	`None`

Returns

Type	Description
Table	Table expression

to_csv

to_csv(self, expr, path, *, params=None, **kwargs)

Write the results of executing the given expression to a CSV file.

This method is eager and will execute the associated expression immediately.

Parameters

Name	Type	Description	Default
`expr`	ibis.ibis.Table	The ibis expression to execute and persist to CSV.	required
`path`	str \| pathlib.Path	The data source. A string or Path to the CSV file.	required
`params`	collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] \| None	Mapping of scalar parameter expressions to value.	`None`
`kwargs`	typing.Any	Additional keyword arguments passed to pyarrow.csv.CSVWriter	`{}`
`https`			required

to_delta

to_delta(self, expr, path, *, params=None, **kwargs)

Write the results of executing the given expression to a Delta Lake table.

This method is eager and will execute the associated expression immediately.

Parameters

Name	Type	Description	Default
`expr`	ibis.ibis.Table	The ibis expression to execute and persist to Delta Lake table.	required
`path`	str \| pathlib.Path	The data source. A string or Path to the Delta Lake table.	required
`params`	collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] \| None	Mapping of scalar parameter expressions to value.	`None`
`kwargs`	typing.Any	Additional keyword arguments passed to deltalake.writer.write_deltalake method	`{}`

to_pandas

to_pandas(self, expr, *, params=None, limit=None, **kwargs)

Execute an Ibis expression and return a pandas DataFrame, Series, or scalar.

Note

This method is a wrapper around execute.

Parameters

Name	Type	Description	Default
`expr`	ibis.ibis.Expr	Ibis expression to execute.	required
`params`	collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] \| None	Mapping of scalar parameter expressions to value.	`None`
`limit`	int \| str \| None	An integer to effect a specific row limit. A value of `None` means “no limit”. The default is in `ibis/config.py`.	`None`
`kwargs`	typing.Any	Keyword arguments	`{}`

to_pandas_batches

to_pandas_batches(self, expr, *, params=None, limit=None, chunk_size=1000000, **kwargs)

Execute an Ibis expression and return an iterator of pandas DataFrames.

Parameters

Name	Type	Description	Default
`expr`	ibis.ibis.Expr	Ibis expression to execute.	required
`params`	collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] \| None	Mapping of scalar parameter expressions to value.	`None`
`limit`	int \| str \| None	An integer to effect a specific row limit. A value of `None` means “no limit”. The default is in `ibis/config.py`.	`None`
`chunk_size`	int	Maximum number of rows in each returned `DataFrame` batch. This may have no effect depending on the backend.	`1000000`
`kwargs`	typing.Any	Keyword arguments	`{}`

Returns

Type	Description
collections.abc.Iterator[pandas.pandas.DataFrame]	An iterator of pandas `DataFrame`s.

to_parquet

to_parquet(self, expr, path, *, params=None, **kwargs)

Write the results of executing the given expression to a parquet file.

This method is eager and will execute the associated expression immediately.

Parameters

Name	Type	Description	Default
`expr`	ibis.ibis.Table	The ibis expression to execute and persist to parquet.	required
`path`	str \| pathlib.Path	The data source. A string or Path to the parquet file.	required
`params`	collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] \| None	Mapping of scalar parameter expressions to value.	`None`
`**kwargs`	typing.Any	Additional keyword arguments passed to pyarrow.parquet.ParquetWriter	`{}`
`https`			required

to_pyarrow

to_pyarrow(self, expr, params=None, limit=None, **kwargs)

Execute expression and return results in as a pyarrow table.

This method is eager and will execute the associated expression immediately.

Parameters

Name	Type	Description	Default
`expr`	ibis.ibis.Expr	Ibis expression to export to pyarrow	required
`params`	collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] \| None	Mapping of scalar parameter expressions to value.	`None`
`limit`	int \| str \| None	An integer to effect a specific row limit. A value of `None` means “no limit”. The default is in `ibis/config.py`.	`None`
`kwargs`	typing.Any	Keyword arguments	`{}`

Returns

Type	Description
Table	A pyarrow table holding the results of the executed expression.

to_pyarrow_batches

to_pyarrow_batches(self, expr, *, params=None, limit=None, chunk_size=1000000, **kwargs)

Execute expression and return a RecordBatchReader.

This method is eager and will execute the associated expression immediately.

Parameters

Name	Type	Description	Default
`expr`	ibis.ibis.Expr	Ibis expression to export to pyarrow	required
`limit`	int \| str \| None	An integer to effect a specific row limit. A value of `None` means “no limit”. The default is in `ibis/config.py`.	`None`
`params`	collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] \| None	Mapping of scalar parameter expressions to value.	`None`
`chunk_size`	int	Maximum number of rows in each returned record batch.	`1000000`
`kwargs`	typing.Any	Keyword arguments	`{}`

Returns

Type	Description
results	RecordBatchReader

to_torch

to_torch(self, expr, *, params=None, limit=None, **kwargs)

Execute an expression and return results as a dictionary of torch tensors.

Parameters

Name	Type	Description	Default
`expr`	ibis.ibis.Expr	Ibis expression to execute.	required
`params`	collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] \| None	Parameters to substitute into the expression.	`None`
`limit`	int \| str \| None	An integer to effect a specific row limit. A value of `None` means no limit.	`None`
`kwargs`	typing.Any	Keyword arguments passed into the backend’s `to_torch` implementation.	`{}`

Returns

Type	Description
dict[str, torch.torch.Tensor]	A dictionary of torch tensors, keyed by column name.