Dask

ibis.memtable support

The Dask backend supports memtables by natively executing queries against the underlying storage (e.g., pyarrow Tables or pandas DataFrames).
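
A minimal sketch of executing a memtable-backed expression (the data and column names here are illustrative):

>>> import ibis
>>> con = ibis.dask.connect()
>>> t = ibis.memtable({"a": [1, 2, 3], "b": ["x", "y", "x"]})
>>> con.execute(t.group_by("b").agg(total=t.a.sum()))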

Install

Install ibis and dependencies for the Dask backend using pip, conda, or mamba:

pip install 'ibis-framework[dask]'
conda install -c conda-forge ibis-dask
mamba install -c conda-forge ibis-dask

Connect

API

Create a client by passing a dictionary that maps table names to Dask DataFrames to ibis.dask.connect.

See ibis.backends.dask.Backend.do_connect for connection parameter information.

ibis.dask.connect is a thin wrapper around ibis.backends.dask.Backend.do_connect.

Connection Parameters

do_connect(dictionary=None)

Construct a Dask backend client from a dictionary of data sources.

Parameters:

dictionary : MutableMapping[str, dd.DataFrame] | None
    An optional mapping from str table names to Dask DataFrames. Defaults to None.

Examples:

>>> import ibis
>>> import dask.dataframe as dd
>>> data = {
...     "t": dd.read_parquet("path/to/file.parquet"),
...     "s": dd.read_csv("path/to/file.csv"),
... }
>>> ibis.dask.connect(data)

Backend API

Backend

Bases: BasePandasBackend

Attributes

db_identity: str cached property

Return the identity of the database.

Multiple connections to the same database will return the same value for db_identity.

The default implementation assumes connection parameters uniquely specify the database.

Returns:

Hashable
    Database identity.

tables cached property

An accessor for tables in the database.

Tables may be accessed by name using either index or attribute access:

Examples:

>>> con = ibis.sqlite.connect("example.db")
>>> people = con.tables['people']  # access via index
>>> people = con.tables.people  # access via attribute

Functions

add_operation(operation)

Add a translation function to the backend for a specific operation.

Operations are defined in ibis.expr.operations. A translation function receives the translator object and an expression as parameters, and returns a value depending on the backend.
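
As a hedged sketch only: the decorator-style registration below, the choice of ops.Ln, and the exact translation-function signature are assumptions based on the description above, not verified against this backend.

>>> import ibis
>>> import ibis.expr.operations as ops
>>> con = ibis.dask.connect()
>>> @con.add_operation(ops.Ln)  # hypothetical: override the natural log translation
... def _ln(translator, expr):
...     ...  # return a backend-specific value for the expression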

compile(query, params=None, **kwargs)

Compile an expression.

Returns:

dask.dataframe.core.DataFrame | dask.dataframe.core.Series | dask.dataframe.core.Scalar
    Dask graph.
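
A hedged sketch of compiling without executing; the compiled result is a lazy Dask object that can be materialized with Dask's own .compute():

>>> import pandas as pd
>>> import dask.dataframe as dd
>>> import ibis
>>> ddf = dd.from_pandas(pd.DataFrame({"a": [1, 2, 3]}), npartitions=1)
>>> con = ibis.dask.connect({"t": ddf})
>>> graph = con.compile(con.tables.t.head())  # lazy Dask object; nothing runs yet
>>> graph.compute()  # materialize with Dask itself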

connect(*args, **kwargs)

Connect to the database.

Parameters:

*args
    Mandatory connection parameters; see the docstring of do_connect for details.
**kwargs
    Extra connection parameters; see the docstring of do_connect for details.

Notes

This creates a new backend instance with saved args and kwargs, then calls reconnect and finally returns the newly created and connected backend instance.

Returns:

BaseBackend
    An instance of the backend.

create_table(name, obj=None, *, schema=None, database=None, temp=None, overwrite=False)

Create a table.
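
A minimal sketch, creating a table from an in-memory Dask DataFrame (the table and column names are illustrative):

>>> import pandas as pd
>>> import dask.dataframe as dd
>>> import ibis
>>> con = ibis.dask.connect()
>>> ddf = dd.from_pandas(pd.DataFrame({"a": [1, 2]}), npartitions=1)
>>> t = con.create_table("t", ddf)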

database(name=None)

Return a Database object for the named database.

Parameters:

name : str | None
    Name of the database to return the object for. Defaults to None.

Returns:

Database
    A database object for the specified database.

from_dataframe(df, name='df', client=None)

Construct an ibis table from a pandas DataFrame.

Parameters:

df : pd.DataFrame
    A pandas DataFrame. Required.
name : str
    The name of the pandas DataFrame. Defaults to 'df'.
client : BasePandasBackend | None
    The client whose dictionary will be mutated with the named DataFrame; if not provided, a new client is created. Defaults to None.

Returns:

Table
    A table expression.
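
A short usage sketch; the name "my_df" is illustrative:

>>> import pandas as pd
>>> import ibis
>>> con = ibis.dask.connect()
>>> t = con.from_dataframe(pd.DataFrame({"a": [1, 2, 3]}), name="my_df", client=con)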

read_csv(path, table_name=None, **kwargs)

Register a CSV file as a table in the current backend.

Parameters:

path : str | Path
    The data source: a string or Path to the CSV file. Required.
table_name : str | None
    An optional name to use for the created table. Defaults to a sequentially generated name.
**kwargs : Any
    Additional keyword arguments passed to the backend loading function.

Returns:

ir.Table
    The just-registered table.
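
For example, reusing the placeholder path style from the connection example:

>>> import ibis
>>> con = ibis.dask.connect()
>>> t = con.read_csv("path/to/file.csv", table_name="t")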

read_parquet(path, table_name=None, **kwargs)

Register a parquet file as a table in the current backend.

Parameters:

path : str | Path
    The data source: a string or Path to the parquet file. Required.
table_name : str | None
    An optional name to use for the created table. Defaults to a sequentially generated name.
**kwargs : Any
    Additional keyword arguments passed to the backend loading function.

Returns:

ir.Table
    The just-registered table.
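
For example (illustrative path):

>>> import ibis
>>> con = ibis.dask.connect()
>>> t = con.read_parquet("path/to/file.parquet", table_name="t")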

register_options() classmethod

Register custom backend options.

to_csv(expr, path, *, params=None, **kwargs)

Write the results of executing the given expression to a CSV file.

This method is eager and will execute the associated expression immediately.

Parameters:

expr : ir.Table
    The ibis expression to execute and persist to CSV. Required.
path : str | Path
    The data source: a string or Path to the CSV file. Required.
params : Mapping[ir.Scalar, Any] | None
    Mapping of scalar parameter expressions to values. Defaults to None.
**kwargs : Any
    Additional keyword arguments passed to pyarrow.csv.CSVWriter.
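
A minimal usage sketch with illustrative paths:

>>> import ibis
>>> con = ibis.dask.connect()
>>> t = con.read_parquet("path/to/file.parquet")
>>> con.to_csv(t, "path/to/output.csv")
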
to_delta(expr, path, *, params=None, **kwargs)

Write the results of executing the given expression to a Delta Lake table.

This method is eager and will execute the associated expression immediately.

Parameters:

expr : ir.Table
    The ibis expression to execute and persist to a Delta Lake table. Required.
path : str | Path
    The data source: a string or Path to the Delta Lake table. Required.
params : Mapping[ir.Scalar, Any] | None
    Mapping of scalar parameter expressions to values. Defaults to None.
**kwargs : Any
    Additional keyword arguments passed to the deltalake.writer.write_deltalake method.
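
A minimal usage sketch with illustrative paths; mode="overwrite" is a deltalake.writer.write_deltalake keyword shown here as an assumption about what gets forwarded:

>>> import ibis
>>> con = ibis.dask.connect()
>>> t = con.read_parquet("path/to/file.parquet")
>>> con.to_delta(t, "path/to/delta_table", mode="overwrite")
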
to_parquet(expr, path, *, params=None, **kwargs)

Write the results of executing the given expression to a parquet file.

This method is eager and will execute the associated expression immediately.

Parameters:

expr : ir.Table
    The ibis expression to execute and persist to parquet. Required.
path : str | Path
    The data source: a string or Path to the parquet file. Required.
params : Mapping[ir.Scalar, Any] | None
    Mapping of scalar parameter expressions to values. Defaults to None.
**kwargs : Any
    Additional keyword arguments passed to pyarrow.parquet.ParquetWriter.
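
A minimal usage sketch with illustrative paths:

>>> import ibis
>>> con = ibis.dask.connect()
>>> t = con.read_csv("path/to/file.csv")
>>> con.to_parquet(t, "path/to/output.parquet")
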
to_pyarrow(expr, *, params=None, limit=None, **kwargs)

Execute an expression and return the results as a pyarrow table.

This method is eager and will execute the associated expression immediately.

Parameters:

expr : ir.Expr
    Ibis expression to export to pyarrow. Required.
params : Mapping[ir.Scalar, Any] | None
    Mapping of scalar parameter expressions to values. Defaults to None.
limit : int | str | None
    An integer to effect a specific row limit. A value of None means "no limit". The default is in ibis/config.py.
**kwargs : Any
    Keyword arguments.

Returns:

Table
    A pyarrow table holding the results of the executed expression.
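
For example (illustrative path):

>>> import ibis
>>> con = ibis.dask.connect()
>>> t = con.read_parquet("path/to/file.parquet")
>>> tbl = con.to_pyarrow(t, limit=10)
>>> tbl.num_rows  # at most 10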

to_pyarrow_batches(expr, *, params=None, limit=None, chunk_size=1000000, **kwargs)

Execute expression and return a RecordBatchReader.

This method is eager and will execute the associated expression immediately.

Parameters:

expr : ir.Expr
    Ibis expression to export to pyarrow. Required.
params : Mapping[ir.Scalar, Any] | None
    Mapping of scalar parameter expressions to values. Defaults to None.
limit : int | str | None
    An integer to effect a specific row limit. A value of None means "no limit". The default is in ibis/config.py.
chunk_size : int
    Maximum number of rows in each returned record batch. Defaults to 1000000.
**kwargs : Any
    Keyword arguments.

Returns:

RecordBatchReader
    The results as a pyarrow RecordBatchReader.
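
For example, iterating over the reader one batch at a time (illustrative path):

>>> import ibis
>>> con = ibis.dask.connect()
>>> t = con.read_parquet("path/to/file.parquet")
>>> reader = con.to_pyarrow_batches(t, chunk_size=100_000)
>>> for batch in reader:  # each batch is a pyarrow.RecordBatch
...     print(batch.num_rows)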

to_torch(expr, *, params=None, limit=None, **kwargs)

Execute an expression and return results as a dictionary of torch tensors.

Parameters:

expr : ir.Expr
    Ibis expression to execute. Required.
params : Mapping[ir.Scalar, Any] | None
    Parameters to substitute into the expression. Defaults to None.
limit : int | str | None
    An integer to effect a specific row limit. A value of None means no limit. Defaults to None.
**kwargs : Any
    Keyword arguments passed into the backend's to_torch implementation.

Returns:

dict[str, torch.Tensor]
    A dictionary of torch tensors, keyed by column name.
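
A hedged sketch, assuming torch is installed and that the selected column "a" is numeric (string columns are unlikely to convert to tensors):

>>> import ibis
>>> con = ibis.dask.connect()
>>> t = con.read_parquet("path/to/file.parquet")
>>> tensors = con.to_torch(t.select("a"))
>>> tensors["a"].shape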

