DuckDB¶
duckdb
>= 0.5.0 requires duckdb-engine
>= 0.6.2
If you encounter problems when using duckdb
>= 0.5.0 you may need to
upgrade duckdb-engine
to at least version 0.6.2.
See this issue for more details.
ibis.memtable
Support ¶
The DuckDB backend supports memtable
s by natively executing queries against the underlying storage (e.g., pyarrow Tables or pandas DataFrames).
Install¶
Install ibis
and dependencies for the DuckDB backend:
pip install 'ibis-framework[duckdb]'
conda install -c conda-forge ibis-duckdb
mamba install -c conda-forge ibis-duckdb
Connect¶
ibis.duckdb.connect
¶
con = ibis.duckdb.connect() # (1)
- Use an ephemeral, in-memory database
con = ibis.duckdb.connect("mydb.duckdb") # (1)
- Connect to, or create, a local DuckDB file
ibis.duckdb.connect
is a thin wrapper around ibis.backends.duckdb.Backend.do_connect
.
Connection Parameters¶
do_connect(database=':memory:', read_only=False, temp_directory=None, **config)
¶
Create an Ibis client connected to a DuckDB database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
database |
str | Path
|
Path to a duckdb database. |
':memory:'
|
read_only |
bool
|
Whether the database is read-only. |
False
|
temp_directory |
str | Path | None
|
Directory to use for spilling to disk. Only set by default for in-memory connections. |
None
|
config |
Any
|
DuckDB configuration parameters. See the DuckDB configuration documentation for possible configuration values. |
{}
|
Examples:
>>> import ibis
>>> ibis.duckdb.connect("database.ddb", threads=4, memory_limit="1GB")
<ibis.backends.duckdb.Backend object at ...>
ibis.connect
URL format¶
In addition to ibis.duckdb.connect
, you can also connect to DuckDB by
passing a properly formatted DuckDB connection URL to ibis.connect
con = ibis.connect("duckdb:///path/to/local/file")
con = ibis.connect("duckdb://") # (1)
- ephemeral, in-memory database
MotherDuck¶
The DuckDB backend supports MotherDuck. If you have an
account, you can connect to MotherDuck by passing in the string md:
or
motherduck:
. ibis
will trigger the authentication prompt in-browser.
>>> import ibis
>>> con = ibis.duckdb.connect("md:")
Authentication to MotherDuck will trigger on the first call that requires retrieving information (in this case list_tables
)
>>> con.list_tables()
Attempting to automatically open the SSO authorization page in your default browser.
1. Please open this link to login into your account: https://auth.motherduck.com/activate
2. Enter the following code: ZSRQ-GJQS
Token successfully retrieved ✅
You can store it as an environment variable to avoid having to log in again:
$ export motherduck_token='****************'
['penguins']
File Support¶
read_csv(source_list, table_name=None, **kwargs)
¶
Register a CSV file as a table in the current database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source_list |
str | list[str] | tuple[str]
|
The data source(s). May be a path to a file or directory of CSV files, or an iterable of CSV files. |
required |
table_name |
str | None
|
An optional name to use for the created table. This defaults to a sequentially generated name. |
None
|
**kwargs |
Any
|
Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/data/csv for more information. |
{}
|
read_parquet(source_list, table_name=None, **kwargs)
¶
Register a parquet file as a table in the current database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source_list |
str | Iterable[str]
|
The data source(s). May be a path to a file, an iterable of files, or directory of parquet files. |
required |
table_name |
str | None
|
An optional name to use for the created table. This defaults to a sequentially generated name. |
None
|
**kwargs |
Any
|
Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/data/parquet for more information. |
{}
|
read_delta(source_table, table_name=None, **kwargs)
¶
Register a Delta Lake table as a table in the current database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source_table |
str
|
The data source. Must be a directory containing a Delta Lake table. |
required |
table_name |
str | None
|
An optional name to use for the created table. This defaults to a sequentially generated name. |
None
|
**kwargs |
Any
|
Additional keyword arguments passed to deltalake.DeltaTable. |
{}
|
read_json(source_list, table_name=None, **kwargs)
¶
Read newline-delimited JSON into an ibis table.
This feature requires duckdb>=0.7.0
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source_list |
str | list[str] | tuple[str]
|
File or list of files |
required |
table_name |
str | None
|
Optional table name |
None
|
**kwargs |
Additional keyword arguments passed to DuckDB's |
{}
|
read_in_memory(source, table_name=None)
¶
Register a Pandas DataFrame or pyarrow object as a table in the current database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source |
pd.DataFrame | pa.Table | pa.RecordBatchReader
|
The data source. |
required |
table_name |
str | None
|
An optional name to use for the created table. This defaults to a sequentially generated name. |
None
|