Dask¶
ibis.memtable Support¶
The Dask backend supports memtables by natively executing queries against the underlying storage (e.g., pyarrow Tables or pandas DataFrames).
Install¶
Install ibis and dependencies for the Dask backend:
pip install 'ibis-framework[dask]'
conda install -c conda-forge ibis-dask
mamba install -c conda-forge ibis-dask
Connect¶
API¶
Create a client by passing a dictionary of Dask DataFrames to ibis.dask.connect.
See ibis.backends.dask.Backend.do_connect for connection parameter information.
ibis.dask.connect is a thin wrapper around ibis.backends.dask.Backend.do_connect.
Connection Parameters¶
do_connect(dictionary=None)¶
Construct a Dask backend client from a dictionary of data sources.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dictionary | MutableMapping[str, dd.DataFrame] | None | An optional mapping from table names to Dask DataFrames. | None |
Examples:
>>> import ibis
>>> import dask.dataframe as dd
>>> data = {
... "t": dd.read_parquet("path/to/file.parquet"),
... "s": dd.read_csv("path/to/file.csv"),
... }
>>> ibis.dask.connect(data)
Backend API¶
Backend¶
Bases: BasePandasBackend
Attributes¶
db_identity: str cached property¶
Return the identity of the database.
Multiple connections to the same database will return the same value for db_identity.
The default implementation assumes connection parameters uniquely specify the database.
Returns:
Type | Description |
---|---|
Hashable | Database identity |
tables cached property¶
An accessor for tables in the database.
Tables may be accessed by name using either index or attribute access:
Examples:
>>> con = ibis.sqlite.connect("example.db")
>>> people = con.tables['people'] # access via index
>>> people = con.tables.people # access via attribute
Functions¶
add_operation(operation)¶
Add a translation function to the backend for a specific operation.
Operations are defined in ibis.expr.operations, and a translation function receives the translator object and an expression as parameters, and returns a value depending on the backend.
compile(query, params=None, **kwargs)¶
Compile an expression.
connect(*args, **kwargs)¶
Connect to the database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*args | | Mandatory connection parameters, see the docstring of do_connect for details. | () |
**kwargs | | Extra connection parameters, see the docstring of do_connect for details. | {} |
Notes¶
This creates a new backend instance with saved args and kwargs, then calls reconnect and finally returns the newly created and connected backend instance.
Returns:
Type | Description |
---|---|
BaseBackend | An instance of the backend |
create_table(name, obj=None, *, schema=None, database=None, temp=None, overwrite=False)¶
Create a table.
database(name=None)¶
Return a Database object for the name database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | None | Name of the database to return the object for. | None |
Returns:
Type | Description |
---|---|
Database | A database object for the specified database. |
from_dataframe(df, name='df', client=None)¶
Construct an ibis table from a pandas DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | pd.DataFrame | A pandas DataFrame | required |
name | str | The name of the pandas DataFrame | 'df' |
client | BasePandasBackend | None | The client whose dictionary will be mutated with the named DataFrame; if not provided, a new client is created. | None |
Returns:
Type | Description |
---|---|
Table | A table expression |
read_csv(path, table_name=None, **kwargs)¶
Register a CSV file as a table in the current backend.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | Path | The data source. A string or Path to the CSV file. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to the backend loading function. | {} |
Returns:
Type | Description |
---|---|
ir.Table | The just-registered table |
read_parquet(path, table_name=None, **kwargs)¶
Register a parquet file as a table in the current backend.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | Path | The data source. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to the backend loading function. | {} |
Returns:
Type | Description |
---|---|
ir.Table | The just-registered table |
register_options() classmethod¶
Register custom backend options.
to_csv(expr, path, *, params=None, **kwargs)¶
Write the results of executing the given expression to a CSV file.
This method is eager and will execute the associated expression immediately.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to CSV. | required |
path | str | Path | The data source. A string or Path to the CSV file. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
kwargs | Any | Additional keyword arguments passed to pyarrow.csv.CSVWriter | {} |
to_delta(expr, path, *, params=None, **kwargs)¶
Write the results of executing the given expression to a Delta Lake table.
This method is eager and will execute the associated expression immediately.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to a Delta Lake table. | required |
path | str | Path | The data source. A string or Path to the Delta Lake table. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
kwargs | Any | Additional keyword arguments passed to the deltalake.writer.write_deltalake method | {} |
to_parquet(expr, path, *, params=None, **kwargs)¶
Write the results of executing the given expression to a parquet file.
This method is eager and will execute the associated expression immediately.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to parquet. | required |
path | str | Path | The data source. A string or Path to the parquet file. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
**kwargs | Any | Additional keyword arguments passed to pyarrow.parquet.ParquetWriter | {} |
to_pyarrow(expr, *, params=None, limit=None, **kwargs)¶
Execute an expression and return results as a pyarrow table.
This method is eager and will execute the associated expression immediately.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to export to pyarrow | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. | None |
kwargs | Any | Keyword arguments | {} |
Returns:
Type | Description |
---|---|
Table | A pyarrow table holding the results of the executed expression. |
to_pyarrow_batches(expr, *, params=None, limit=None, chunk_size=1000000, **kwargs)¶
Execute expression and return a RecordBatchReader.
This method is eager and will execute the associated expression immediately.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to export to pyarrow | required |
limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. | None |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
chunk_size | int | Maximum number of rows in each returned record batch. | 1000000 |
kwargs | Any | Keyword arguments | {} |
Returns:
Type | Description |
---|---|
RecordBatchReader | The results as a pyarrow RecordBatchReader |
to_torch(expr, *, params=None, limit=None, **kwargs)¶
Execute an expression and return results as a dictionary of torch tensors.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to execute. | required |
params | Mapping[ir.Scalar, Any] | None | Parameters to substitute into the expression. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. | None |
kwargs | Any | Keyword arguments passed into the backend's execution of the expression. | {} |
Returns:
Type | Description |
---|---|
dict[str, torch.Tensor] | A dictionary of torch tensors, keyed by column name. |