Dask¶
ibis.memtable Support¶
The Dask backend supports memtables by natively executing queries against the underlying storage (e.g., pyarrow Tables or pandas DataFrames).
Install¶
Install ibis and dependencies for the Dask backend:
pip install 'ibis-framework[dask]'
conda install -c conda-forge ibis-dask
mamba install -c conda-forge ibis-dask
Connect¶
API¶
Create a client by passing a dictionary of Dask DataFrames to ibis.dask.connect.
See ibis.backends.dask.Backend.do_connect for connection parameter information.
ibis.dask.connect is a thin wrapper around ibis.backends.dask.Backend.do_connect.
Connection Parameters¶
do_connect(dictionary=None)¶
Construct a Dask backend client from a dictionary of data sources.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dictionary | MutableMapping[str, dd.DataFrame] | None | An optional mapping from table names to Dask DataFrames. | None |
Examples:
>>> import ibis
>>> import dask.dataframe as dd
>>> data = {
... "t": dd.read_parquet("path/to/file.parquet"),
... "s": dd.read_csv("path/to/file.csv"),
... }
>>> ibis.dask.connect(data)
Backend API¶
Backend¶
Bases: BasePandasBackend
Attributes¶
db_identity: str cached property¶
Return the identity of the database.
Multiple connections to the same database will return the same value for db_identity.
The default implementation assumes connection parameters uniquely specify the database.
Returns:
Type | Description |
---|---|
Hashable | Database identity |
tables cached property¶
An accessor for tables in the database.
Tables may be accessed by name using either index or attribute access:
Examples:
>>> con = ibis.sqlite.connect("example.db")
>>> people = con.tables['people'] # access via index
>>> people = con.tables.people # access via attribute
Functions¶
add_operation(operation)¶
Add a translation function to the backend for a specific operation.
Operations are defined in ibis.expr.operations, and a translation function receives the translator object and an expression as parameters, and returns a value depending on the backend.
compile(query, params=None, **kwargs)¶
Compile an expression.
connect(*args, **kwargs)¶
Connect to the database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*args | | Mandatory connection parameters, see the docstring of do_connect for details. | () |
**kwargs | | Extra connection parameters, see the docstring of do_connect for details. | {} |
Notes¶
This creates a new backend instance with saved args and kwargs, then calls reconnect and finally returns the newly created and connected backend instance.
Returns:
Type | Description |
---|---|
BaseBackend | An instance of the backend |
create_table(name, obj=None, *, schema=None, database=None, temp=None, overwrite=False)¶
Create a table.
database(name=None)¶
Return a Database object for the name database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | None | Name of the database to return the object for. | None |
Returns:
Type | Description |
---|---|
Database | A database object for the specified database. |
from_dataframe(df, name='df', client=None)¶
Construct an ibis table from a pandas DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | pd.DataFrame | A pandas DataFrame | required |
name | str | The name of the pandas DataFrame | 'df' |
client | BasePandasBackend | None | The client whose dictionary will be mutated with the named DataFrame; if not provided, a new client is created. | None |
Returns:
Type | Description |
---|---|
Table | A table expression |
read_csv(path, table_name=None, **kwargs)¶
Register a CSV file as a table in the current backend.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | Path | The data source. A string or Path to the CSV file. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to the backend loading function. | {} |
Returns:
Type | Description |
---|---|
ir.Table | The just-registered table |
read_parquet(path, table_name=None, **kwargs)¶
Register a parquet file as a table in the current backend.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | Path | The data source. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to the backend loading function. | {} |
Returns:
Type | Description |
---|---|
ir.Table | The just-registered table |
register_options() classmethod¶
Register custom backend options.
to_csv(expr, path, *, params=None, **kwargs)¶
Write the results of executing the given expression to a CSV file.
This method is eager and will execute the associated expression immediately.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to CSV. | required |
path | str | Path | The data source. A string or Path to the CSV file. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
kwargs | Any | Additional keyword arguments passed to pyarrow.csv.CSVWriter | {} |
to_delta(expr, path, *, params=None, **kwargs)¶
Write the results of executing the given expression to a Delta Lake table.
This method is eager and will execute the associated expression immediately.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to a Delta Lake table. | required |
path | str | Path | The data source. A string or Path to the Delta Lake table. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
kwargs | Any | Additional keyword arguments passed to the deltalake.writer.write_deltalake method | {} |
to_parquet(expr, path, *, params=None, **kwargs)¶
Write the results of executing the given expression to a parquet file.
This method is eager and will execute the associated expression immediately.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to parquet. | required |
path | str | Path | The data source. A string or Path to the parquet file. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
**kwargs | Any | Additional keyword arguments passed to pyarrow.parquet.ParquetWriter | {} |
to_pyarrow(expr, *, params=None, limit=None, **kwargs)¶
Execute an expression and return results as a pyarrow table.
This method is eager and will execute the associated expression immediately.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to export to pyarrow | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. | None |
kwargs | Any | Keyword arguments | {} |
Returns:
Type | Description |
---|---|
Table | A pyarrow table holding the results of the executed expression. |
to_pyarrow_batches(expr, *, params=None, limit=None, chunk_size=1000000, **kwargs)¶
Execute expression and return a RecordBatchReader.
This method is eager and will execute the associated expression immediately.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to export to pyarrow | required |
limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. | None |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
chunk_size | int | Maximum number of rows in each returned record batch. | 1000000 |
kwargs | Any | Keyword arguments | {} |
Returns:
Type | Description |
---|---|
RecordBatchReader | The results as a pyarrow RecordBatchReader |
to_torch(expr, *, params=None, limit=None, **kwargs)¶
Execute an expression and return results as a dictionary of torch tensors.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to execute. | required |
params | Mapping[ir.Scalar, Any] | None | Parameters to substitute into the expression. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. | None |
kwargs | Any | Keyword arguments passed into the backend's execution of the expression. | {} |
Returns:
Type | Description |
---|---|
dict[str, torch.Tensor] | A dictionary of torch tensors, keyed by column name. |