pandas
Install
Install Ibis and dependencies for the pandas backend:
Install with the pandas
extra:
pip install 'ibis-framework[pandas]'
And connect:
import ibis
= ibis.pandas.connect() con
- 1
- Adjust connection parameters as needed.
Install for pandas:
conda install -c conda-forge ibis-pandas
And connect:
import ibis
= ibis.pandas.connect() con
- 1
- Adjust connection parameters as needed.
Install for pandas:
mamba install -c conda-forge ibis-pandas
And connect:
import ibis
= ibis.pandas.connect() con
- 1
- Adjust connection parameters as needed.
User Defined functions (UDF)
Ibis supports defining three kinds of user-defined functions for operations on expressions targeting the pandas backend: element-wise, reduction, and analytic.
Elementwise Functions
An element-wise function is a function that takes N rows as input and produces N rows of output. log
, exp
, and floor
are examples of element-wise functions.
Here’s how to define an element-wise function:
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf
@udf.elementwise(input_type=[dt.int64], output_type=dt.double)
def add_one(x):
return x + 1.0
Reduction Functions
A reduction is a function that takes N rows as input and produces 1 row as output. sum
, mean
and count
are examples of reductions. In the context of a GROUP BY
, reductions produce 1 row of output per group.
Here’s how to define a reduction function:
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf
@udf.reduction(input_type=[dt.double], output_type=dt.double)
def double_mean(series):
return 2 * series.mean()
Analytic Functions
An analytic function is like an element-wise function in that it takes N rows as input and produces N rows of output. The key difference is that analytic functions can be applied per group using window functions. Z-score is an example of an analytic function.
Here’s how to define an analytic function:
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf
@udf.analytic(input_type=[dt.double], output_type=dt.double)
def zscore(series):
return (series - series.mean()) / series.std()
Details of pandas UDFs
- Element-wise provide support for applying your UDF to any combination of scalar values and columns.
- Reductions provide support for whole column aggregations, grouped aggregations, and application of your function over a window.
- Analytic functions work in both grouped and non-grouped settings
- The objects you receive as input arguments are either
pandas.Series
or Python/NumPy scalars.
Any keyword arguments must be given a default value or the function will not work.
A common Python convention is to set the default value to None
and handle setting it to something not None
in the body of the function.
Using add_one
from above as an example, the following call will receive a pandas.Series
for the x
argument:
import ibis
import pandas as pd
= pd.DataFrame({'a': [1, 2, 3]})
df = ibis.pandas.connect({'df': df})
con = con.table('df')
t = add_one(t.a)
expr expr
And this will receive the int
1:
= add_one(1)
expr expr
Since the pandas backend passes around **kwargs
you can accept **kwargs
in your function:
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf
@udf.elementwise([dt.int64], dt.double)
def add_two(x, **kwargs): # do stuff with kwargs
return x + 2.0
Or you can leave them out as we did in the example above. You can also optionally accept specific keyword arguments.
For example:
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf
@udf.elementwise([dt.int64], dt.double)
def add_two_with_none(x, y=None):
if y is None:
= 2.0
y return x + y
pandas.Backend
add_operation
add_operation(self, operation)
Add a translation function to the backend for a specific operation.
Operations are defined in ibis.expr.operations
, and a translation function receives the translator object and an expression as parameters, and returns a value depending on the backend.
compile
compile(self, expr, *args, **kwargs)
Compile an expression.
connect
connect(self, *args, **kwargs)
Connect to the database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
*args |
Mandatory connection parameters, see the docstring of do_connect for details. |
() |
|
**kwargs |
Extra connection parameters, see the docstring of do_connect for details. |
{} |
Notes
This creates a new backend instance with saved args
and kwargs
, then calls reconnect
and finally returns the newly created and connected backend instance.
Returns
Type | Description |
---|---|
ibis.backends.base.BaseBackend | An instance of the backend |
create_table
create_table(self, name, obj=None, *, schema=None, database=None, temp=None, overwrite=False)
Create a table.
create_view
create_view(self, name, obj, *, database=None, overwrite=False)
Create a new view from an expression.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name |
str | Name of the new view. | required |
obj |
ibis.ibis.Table | An Ibis table expression that will be used to create the view. | required |
database |
str | None | Name of the database where the view will be created, if not provided the database’s default is used. | None |
overwrite |
bool | Whether to clobber an existing view with the same name | False |
Returns
Type | Description |
---|---|
Table | The view that was created. |
database
database(self, name=None)
Return a Database
object for the name
database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name |
str | None | Name of the database to return the object for. | None |
Returns
Type | Description |
---|---|
ibis.backends.base.Database | A database object for the specified database. |
drop_table
drop_table(self, name, *, force=False)
Drop a table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name |
str | Name of the table to drop. | required |
database |
str | None | Name of the database where the table exists, if not the default. | None |
force |
bool | If False , an exception is raised if the table does not exist. |
False |
drop_view
drop_view(self, name, *, force=False)
Drop a view.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name |
str | Name of the view to drop. | required |
database |
str | None | Name of the database where the view exists, if not the default. | None |
force |
bool | If False , an exception is raised if the view does not exist. |
False |
execute
execute(self, expr)
Execute an expression.
from_dataframe
from_dataframe(self, df, name='df', client=None)
Construct an ibis table from a pandas DataFrame.
Parameters
Name | Type | Description | Default |
---|---|---|---|
df |
pandas.pandas.DataFrame | A pandas DataFrame | required |
name |
str | The name of the pandas DataFrame | 'df' |
client |
ibis.backends.pandas.BasePandasBackend | None | Client dictionary will be mutated with the name of the DataFrame, if not provided a new client is created | None |
Returns
Type | Description |
---|---|
Table | A table expression |
get_schema
get_schema(self, table_name, database=None)
has_operation
has_operation(cls, operation)
Return whether the backend implements support for operation
.
Parameters
Name | Type | Description | Default |
---|---|---|---|
operation |
type[ibis.ibis.Value] | A class corresponding to an operation. | required |
Returns
Type | Description |
---|---|
bool | Whether the backend implements the operation. |
Examples
>>> import ibis
>>> import ibis.expr.operations as ops
>>> ibis.sqlite.has_operation(ops.ArrayIndex)
False
>>> ibis.postgres.has_operation(ops.ArrayIndex)
True
list_tables
list_tables(self, like=None, database=None)
Return the list of table names in the current database.
For some backends, the tables may be files in a directory, or other equivalent entities in a SQL database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
like |
str | None | A pattern in Python’s regex format. | None |
database |
str | None | The database from which to list tables. If not provided, the current database is used. | None |
Returns
Type | Description |
---|---|
list[str] | The list of the table names that match the pattern like . |
read_csv
read_csv(self, source, table_name=None, **kwargs)
Register a CSV file as a table in the current session.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source |
str | pathlib.pathlib.Path | The data source. Can be a local or remote file, pathlike objects also accepted. | required |
table_name |
str | None | An optional name to use for the created table. This defaults to a generated name. | None |
**kwargs |
typing.Any | Additional keyword arguments passed to Pandas loading function. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html for more information. | {} |
Returns
Type | Description |
---|---|
ibis.ibis.Table | The just-registered table |
read_delta
read_delta(self, source, table_name=None, **kwargs)
Register a Delta Lake table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source |
str | pathlib.Path | The data source. Must be a directory containing a Delta Lake table. | required |
table_name |
str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs |
typing.Any | Additional keyword arguments passed to the underlying backend or library. | {} |
Returns
Type | Description |
---|---|
ibis.ibis.Table | The just-registered table. |
read_json
read_json(self, path, table_name=None, **kwargs)
Register a JSON file as a table in the current backend.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path |
str | pathlib.Path | The data source. A string or Path to the JSON file. | required |
table_name |
str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs |
typing.Any | Additional keyword arguments passed to the backend loading function. | {} |
Returns
Type | Description |
---|---|
ibis.ibis.Table | The just-registered table |
read_parquet
read_parquet(self, source, table_name=None, **kwargs)
Register a parquet file as a table in the current session.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source |
str | pathlib.pathlib.Path | The data source(s). May be a path to a file, an iterable of files, or directory of parquet files. | required |
table_name |
str | None | An optional name to use for the created table. This defaults to a generated name. | None |
**kwargs |
typing.Any | Additional keyword arguments passed to Pandas loading function. See https://pandas.pydata.org/docs/reference/api/pandas.read_parquet.html for more information. | {} |
Returns
Type | Description |
---|---|
ibis.ibis.Table | The just-registered table |
reconnect
reconnect(self)
Reconnect to the database already configured with connect.
register_options
register_options(cls)
Register custom backend options.
rename_table
rename_table(self, old_name, new_name)
Rename an existing table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
old_name |
str | The old name of the table. | required |
new_name |
str | The new name of the table. | required |
table
table(self, name, schema=None)
Construct a table expression.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name |
str | Table name | required |
database |
str | None | Database name | None |
Returns
Type | Description |
---|---|
Table | Table expression |
to_csv
to_csv(self, expr, path, *, params=None, **kwargs)
Write the results of executing the given expression to a CSV file.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr |
ibis.ibis.Table | The ibis expression to execute and persist to CSV. | required |
path |
str | pathlib.Path | The data source. A string or Path to the CSV file. | required |
params |
collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] | None | Mapping of scalar parameter expressions to value. | None |
kwargs |
typing.Any | Additional keyword arguments passed to pyarrow.csv.CSVWriter | {} |
https |
required |
to_delta
to_delta(self, expr, path, *, params=None, **kwargs)
Write the results of executing the given expression to a Delta Lake table.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr |
ibis.ibis.Table | The ibis expression to execute and persist to Delta Lake table. | required |
path |
str | pathlib.Path | The data source. A string or Path to the Delta Lake table. | required |
params |
collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] | None | Mapping of scalar parameter expressions to value. | None |
kwargs |
typing.Any | Additional keyword arguments passed to deltalake.writer.write_deltalake method | {} |
to_pandas
to_pandas(self, expr, *, params=None, limit=None, **kwargs)
Execute an Ibis expression and return a pandas DataFrame
, Series
, or scalar.
This method is a wrapper around execute
.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr |
ibis.ibis.Expr | Ibis expression to execute. | required |
params |
collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] | None | Mapping of scalar parameter expressions to value. | None |
limit |
int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py . |
None |
kwargs |
typing.Any | Keyword arguments | {} |
to_pandas_batches
to_pandas_batches(self, expr, *, params=None, limit=None, chunk_size=1000000, **kwargs)
Execute an Ibis expression and return an iterator of pandas DataFrame
s.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr |
ibis.ibis.Expr | Ibis expression to execute. | required |
params |
collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] | None | Mapping of scalar parameter expressions to value. | None |
limit |
int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py . |
None |
chunk_size |
int | Maximum number of rows in each returned DataFrame batch. This may have no effect depending on the backend. |
1000000 |
kwargs |
typing.Any | Keyword arguments | {} |
Returns
Type | Description |
---|---|
collections.abc.Iterator[pandas.pandas.DataFrame] | An iterator of pandas DataFrame s. |
to_parquet
to_parquet(self, expr, path, *, params=None, **kwargs)
Write the results of executing the given expression to a parquet file.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr |
ibis.ibis.Table | The ibis expression to execute and persist to parquet. | required |
path |
str | pathlib.Path | The data source. A string or Path to the parquet file. | required |
params |
collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] | None | Mapping of scalar parameter expressions to value. | None |
**kwargs |
typing.Any | Additional keyword arguments passed to pyarrow.parquet.ParquetWriter | {} |
https |
required |
to_pyarrow
to_pyarrow(self, expr, params=None, limit=None, **kwargs)
Execute expression and return results in as a pyarrow table.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr |
ibis.ibis.Expr | Ibis expression to export to pyarrow | required |
params |
collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] | None | Mapping of scalar parameter expressions to value. | None |
limit |
int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py . |
None |
kwargs |
typing.Any | Keyword arguments | {} |
Returns
Type | Description |
---|---|
Table | A pyarrow table holding the results of the executed expression. |
to_pyarrow_batches
to_pyarrow_batches(self, expr, *, params=None, limit=None, chunk_size=1000000, **kwargs)
Execute expression and return a RecordBatchReader.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr |
ibis.ibis.Expr | Ibis expression to export to pyarrow | required |
limit |
int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py . |
None |
params |
collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] | None | Mapping of scalar parameter expressions to value. | None |
chunk_size |
int | Maximum number of rows in each returned record batch. | 1000000 |
kwargs |
typing.Any | Keyword arguments | {} |
Returns
Type | Description |
---|---|
results | RecordBatchReader |
to_torch
to_torch(self, expr, *, params=None, limit=None, **kwargs)
Execute an expression and return results as a dictionary of torch tensors.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr |
ibis.ibis.Expr | Ibis expression to execute. | required |
params |
collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] | None | Parameters to substitute into the expression. | None |
limit |
int | str | None | An integer to effect a specific row limit. A value of None means no limit. |
None |
kwargs |
typing.Any | Keyword arguments passed into the backend’s to_torch implementation. |
{} |
Returns
Type | Description |
---|---|
dict[str, torch.torch.Tensor] | A dictionary of torch tensors, keyed by column name. |