Skip to content

DuckDB

filebadge

exportbadge

duckdb >= 0.5.0 requires duckdb-engine >= 0.6.2

If you encounter problems when using duckdb >= 0.5.0 you may need to upgrade duckdb-engine to at least version 0.6.2.

See this issue for more details.

ibis.memtable Support memtable

The DuckDB backend supports memtables by natively executing queries against the underlying storage (e.g., pyarrow Tables or pandas DataFrames).

Install

Install ibis and dependencies for the DuckDB backend:

pip install 'ibis-framework[duckdb]'
conda install -c conda-forge ibis-duckdb
mamba install -c conda-forge ibis-duckdb

Connect

ibis.duckdb.connect

con = ibis.duckdb.connect()  # (1)
  1. Use an ephemeral, in-memory database
con = ibis.duckdb.connect("mydb.duckdb")  # (1)
  1. Connect to, or create, a local DuckDB file

ibis.duckdb.connect is a thin wrapper around ibis.backends.duckdb.Backend.do_connect.

Connection Parameters

do_connect(database=':memory:', read_only=False, temp_directory=None, **config)

Create an Ibis client connected to a DuckDB database.

Parameters:

Name Type Description Default
database str | Path

Path to a duckdb database.

':memory:'
read_only bool

Whether the database is read-only.

False
temp_directory str | Path | None

Directory to use for spilling to disk. Only set by default for in-memory connections.

None
config Any

DuckDB configuration parameters. See the DuckDB configuration documentation for possible configuration values.

{}

Examples:

>>> import ibis
>>> ibis.duckdb.connect("database.ddb", threads=4, memory_limit="1GB")
<ibis.backends.duckdb.Backend object at ...>

ibis.connect URL format

In addition to ibis.duckdb.connect, you can also connect to DuckDB by passing a properly formatted DuckDB connection URL to ibis.connect

con = ibis.connect("duckdb:///path/to/local/file")
con = ibis.connect("duckdb://") # (1)
  1. ephemeral, in-memory database

MotherDuck

The DuckDB backend supports MotherDuck. If you have an account, you can connect to MotherDuck by passing in the string md: or motherduck:. ibis will trigger the authentication prompt in-browser.

>>> import ibis

>>> con = ibis.duckdb.connect("md:")

Authentication to MotherDuck will trigger on the first call that requires retrieving information (in this case list_tables)

>>> con.list_tables()
Attempting to automatically open the SSO authorization page in your default browser.
1. Please open this link to login into your account: https://auth.motherduck.com/activate
2. Enter the following code: ZSRQ-GJQS


Token successfully retrieved 
You can store it as an environment variable to avoid having to log in again:
  $ export motherduck_token='****************'

['penguins']

File Support

read_csv(source_list, table_name=None, **kwargs)

Register a CSV file as a table in the current database.

Parameters:

Name Type Description Default
source_list str | list[str] | tuple[str]

The data source(s). May be a path to a file or directory of CSV files, or an iterable of CSV files.

required
table_name str | None

An optional name to use for the created table. This defaults to a sequentially generated name.

None
**kwargs Any

Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/data/csv for more information.

{}

read_parquet(source_list, table_name=None, **kwargs)

Register a parquet file as a table in the current database.

Parameters:

Name Type Description Default
source_list str | Iterable[str]

The data source(s). May be a path to a file, an iterable of files, or directory of parquet files.

required
table_name str | None

An optional name to use for the created table. This defaults to a sequentially generated name.

None
**kwargs Any

Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/data/parquet for more information.

{}

read_delta(source_table, table_name=None, **kwargs)

Register a Delta Lake table as a table in the current database.

Parameters:

Name Type Description Default
source_table str

The data source. Must be a directory containing a Delta Lake table.

required
table_name str | None

An optional name to use for the created table. This defaults to a sequentially generated name.

None
**kwargs Any

Additional keyword arguments passed to deltalake.DeltaTable.

{}

read_json(source_list, table_name=None, **kwargs)

Read newline-delimited JSON into an ibis table.

This feature requires duckdb>=0.7.0

Parameters:

Name Type Description Default
source_list str | list[str] | tuple[str]

File or list of files

required
table_name str | None

Optional table name

None
**kwargs

Additional keyword arguments passed to DuckDB's read_json_auto function

{}

read_in_memory(source, table_name=None)

Register a Pandas DataFrame or pyarrow object as a table in the current database.

Parameters:

Name Type Description Default
source pd.DataFrame | pa.Table | pa.RecordBatchReader

The data source.

required
table_name str | None

An optional name to use for the created table. This defaults to a sequentially generated name.

None

read_sqlite(path, table_name=None)

Register a table from a SQLite database into a DuckDB table.

Parameters:

Name Type Description Default
path str | Path

The path to the SQLite database

required
table_name str | None

The table to read

None

read_postgres(uri, table_name=None, schema='public')

Register a table from a postgres instance into a DuckDB table.

Parameters:

Name Type Description Default
uri

The postgres URI in form 'postgres://user:password@host:port'

required
table_name str | None

The table to read

None
schema str

PostgreSQL schema where table_name resides

'public'

Last update: August 1, 2023