Skip to content

Table Expressions

Table expressions form the basis for most Ibis expressions.

Table

Bases: Expr, JupyterMixin

Functions

aggregate(metrics=None, by=None, having=None, **kwargs)

Aggregate a table with a given set of reductions grouping by by.

Parameters:

Name Type Description Default
metrics Sequence[ir.Scalar] | None

Aggregate expressions

None
by Sequence[ir.Value] | None

Grouping expressions

None
having Sequence[ir.BooleanValue] | None

Post-aggregation filters

None
kwargs ir.Value

Named aggregate expressions

{}

Returns:

Type Description
Table

An aggregate table expression

alias(alias)

Create a table expression with a specific name alias.

This method is useful for exposing an ibis expression to the underlying backend for use in the Table.sql method.

.alias will create a temporary view

.alias creates a temporary view in the database.

This side effect will be removed in a future version of ibis and is not part of the public API.

Parameters:

Name Type Description Default
alias str

Name of the child expression

required

Returns:

Type Description
Table

An table expression

Examples:

>>> con = ibis.duckdb.connect("ci/ibis-testing-data/ibis_testing.ddb")
>>> t = con.table("functional_alltypes")
>>> expr = t.alias("my_t").sql("SELECT sum(double_col) FROM my_t")
>>> expr
r0 := AlchemyTable: functional_alltypes
  index           int64

  month           int32
r1 := View[r0]: my_t
  schema:
    index           int64

    month           int32
SQLStringView[r1]: _ibis_view_0
  query: 'SELECT sum(double_col) FROM my_t'
  schema:
    sum(double_col) float64

asof_join(left, right, predicates=(), by=(), tolerance=None, *, suffixes=('_x', '_y'))

Perform an "as-of" join between left and right.

Similar to a left join except that the match is done on nearest key rather than equal keys.

Optionally, match keys with by before joining with predicates.

Parameters:

Name Type Description Default
left Table

Table expression

required
right Table

Table expression

required
predicates str | ir.BooleanColumn | Sequence[str | ir.BooleanColumn]

Join expressions

()
by str | ir.Column | Sequence[str | ir.Column]

column to group by before joining

()
tolerance str | ir.IntervalScalar | None

Amount of time to look behind when joining

None
suffixes tuple[str, str]

Left and right suffixes that will be used to rename overlapping columns.

('_x', '_y')

Returns:

Type Description
Table

Table expression

count(where=None)

Compute the number of rows in the table.

Parameters:

Name Type Description Default
where ir.BooleanValue | None

Optional boolean expression to filter rows when counting.

None

Returns:

Type Description
IntegerScalar

Number of rows in the table

Examples:

>>> import ibis
>>> from ibis import _
>>> t = ibis.table(dict(a="int"), name="t")
>>> t.count()
r0 := UnboundTable: t
  a int64
count: CountStar(t)
>>> t.aggregate(n=_.count(_.a > 1), total=_.sum())
r0 := UnboundTable: t
  a int64
Aggregation[r0]
  metrics:
    n:     CountStar(t, where=r0.a > 1)
    total: Sum(r0.a)

cross_join(left, right, *rest, suffixes=('_x', '_y'))

Compute the cross join of a sequence of tables.

Parameters:

Name Type Description Default
left Table

Left table

required
right Table

Right table

required
rest Table

Additional tables to cross join

()
suffixes tuple[str, str]

Left and right suffixes that will be used to rename overlapping columns.

('_x', '_y')

Returns:

Type Description
Table

Cross join of left, right and rest

Examples:

>>> import ibis
>>> schemas = [(name, 'int64') for name in 'abcde']
>>> a, b, c, d, e = [
...     ibis.table([(name, type)], name=name) for name, type in schemas
... ]
>>> joined1 = ibis.cross_join(a, b, c, d, e)
>>> joined1
r0 := UnboundTable[e]
  e int64
r1 := UnboundTable[d]
  d int64
r2 := UnboundTable[c]
  c int64
r3 := UnboundTable[b]
  b int64
r4 := UnboundTable[a]
  a int64
r5 := CrossJoin[r3, r2]
r6 := CrossJoin[r5, r1]
r7 := CrossJoin[r6, r0]
CrossJoin[r4, r7]

difference(*tables, distinct=True)

Compute the set difference of multiple table expressions.

The input tables must have identical schemas.

Parameters:

Name Type Description Default
tables Table

One or more table expressions

()
distinct bool

Only diff distinct rows not occurring in the calling table

True

Returns:

Type Description
Table

The rows present in self that are not present in tables.

distinct()

Compute the set of unique rows in the table.

drop(*fields)

Remove fields from a table.

Parameters:

Name Type Description Default
fields str

Fields to drop

()

Returns:

Type Description
Table

A table with all columns in fields removed.

dropna(subset=None, how='any')

Remove rows with null values from the table.

Parameters:

Name Type Description Default
subset Sequence[str] | None

Columns names to consider when dropping nulls. By default all columns are considered.

None
how Literal['any', 'all']

Determine whether a row is removed if there is at least one null value in the row ('any'), or if all row values are null ('all'). Options are 'any' or 'all'. Default is 'any'.

'any'

Examples:

>>> import ibis
>>> t = ibis.table(dict(a='int64', b='string'), name='t')
>>> t = t.dropna()  # Drop all rows where any values are null
>>> t
r0 := UnboundTable: t
  a int64
  b string
DropNa[r0]
  how: 'any'
>>> t.dropna(how='all')  # Only drop rows where all values are null
r0 := UnboundTable: t
  a int64
  b string
r1 := DropNa[r0]
  how: 'all'
>>> t.dropna(subset=['a'], how='all')  # Only drop rows where all values in column 'a' are null  # noqa: E501
r0 := UnboundTable: t
  a int64
  b string
DropNa[r0]
  how: 'all'
  subset:
    r0.a

Returns:

Type Description
Table

Table expression

fillna(replacements)

Fill null values in a table expression.

Parameters:

Name Type Description Default
replacements ir.Scalar | Mapping[str, ir.Scalar]

Value with which to fill the nulls. If passed as a mapping, the keys are column names that map to their replacement value. If passed as a scalar, all columns are filled with that value.

required
Notes

There is potential lack of type stability with the fillna API. For example, different library versions may impact whether or not a given backend promotes integer replacement values to floats.

Examples:

>>> import ibis
>>> import ibis.expr.datatypes as dt
>>> t = ibis.table([('a', 'int64'), ('b', 'string')])
>>> res = t.fillna(0.0)  # Replace nulls in all columns with 0.0
>>> res = t.fillna({c: 0.0 for c, t in t.schema().items() if t == dt.float64})

Returns:

Type Description
Table

Table expression

filter(predicates)

Select rows from table based on predicates.

Parameters:

Name Type Description Default
predicates ir.BooleanValue | Sequence[ir.BooleanValue]

Boolean value expressions used to select rows in table.

required

Returns:

Type Description
Table

Filtered table expression

get_column(name)

Get a reference to a single column from the table.

Returns:

Type Description
Column

A column named name.

get_columns(iterable)

Get multiple columns from the table.

Examples:

>>> import ibis
>>> table = ibis.table(
...    [
...        ('a', 'int64'),
...        ('b', 'string'),
...        ('c', 'timestamp'),
...        ('d', 'float'),
...    ],
...    name='t'
... )
>>> a, b, c = table.get_columns(['a', 'b', 'c'])

Returns:

Type Description
list[ir.Column]

List of column expressions

group_by(by=None, **additional_grouping_expressions)

Create a grouped table expression.

Parameters:

Name Type Description Default
by

Grouping expressions

None
additional_grouping_expressions Any

Named grouping expressions

{}

Examples:

>>> import ibis
>>> from ibis import _
>>> t = ibis.table(dict(a='int32', b='timestamp', c='double'), name='t')
>>> t.group_by([_.a, _.b]).aggregate(sum_of_c=_.c.sum())
r0 := UnboundTable: t
  a int32
  b timestamp
  c float64
Aggregation[r0]
  metrics:
    sum_of_c: Sum(r0.c)
  by:
    a: r0.a
    b: r0.b

Returns:

Type Description
GroupedTable

A grouped table expression

head(n=5)

Select the first n rows of a table.

The result set is not deterministic without a sort.

Parameters:

Name Type Description Default
n int

Number of rows to include, defaults to 5

5

Returns:

Type Description
Table

table limited to n rows

info(buf=None)

Show summary information about a table.

Currently implemented as showing column names, types and null counts.

Parameters:

Name Type Description Default
buf IO[str] | None

A writable buffer, defaults to stdout

None

intersect(*tables, distinct=True)

Compute the set intersection of multiple table expressions.

The input tables must have identical schemas.

Parameters:

Name Type Description Default
*tables Table

One or more table expressions

()
distinct bool

Only return distinct rows

True

Returns:

Type Description
Table

A new table containing the intersection of all input tables.

join(left, right, predicates=(), how='inner', *, suffixes=('_x', '_y'))

Perform a join between two tables.

Parameters:

Name Type Description Default
left Table

Left table to join

required
right Table

Right table to join

required
predicates str | Sequence[str | tuple[str | ir.Column, str | ir.Column] | ir.BooleanColumn]

Boolean or column names to join on

()
how Literal['inner', 'left', 'outer', 'right', 'semi', 'anti', 'any_inner', 'any_left', 'left_semi']

Join method

'inner'
suffixes tuple[str, str]

Left and right suffixes that will be used to rename overlapping columns.

('_x', '_y')

limit(n, offset=0)

Select the first n rows starting at offset.

Parameters:

Name Type Description Default
n int

Number of rows to include

required
offset int

Number of rows to skip first

0

Returns:

Type Description
Table

The first n rows of table starting at offset

mutate(exprs=None, **mutations)

Add columns to a table expression.

Parameters:

Name Type Description Default
exprs Sequence[ir.Expr] | None

List of named expressions to add as columns

None
mutations ir.Value

Named expressions using keyword arguments

{}

Returns:

Type Description
Table

Table expression with additional columns

Examples:

Using keywords arguments to name the new columns

>>> import ibis
>>> table = ibis.table(
...     [('foo', 'double'), ('bar', 'double')],
...     name='t'
... )
>>> expr = table.mutate(qux=table.foo + table.bar, baz=5)
>>> expr
r0 := UnboundTable[t]
  foo float64
  bar float64
Selection[r0]
  selections:
    r0
    baz: 5
    qux: r0.foo + r0.bar

Use the name method to name the new columns.

>>> new_columns = [ibis.literal(5).name('baz',),
...                (table.foo + table.bar).name('qux')]
>>> expr2 = table.mutate(new_columns)
>>> expr.equals(expr2)
True

order_by(by)

Sort a table by one or more expressions.

Parameters:

Name Type Description Default
by str | ir.Column | tuple[str | ir.Column, bool] | Sequence[tuple[str | ir.Column, bool]]

An expression (or expressions) to sort the table by.

required

Examples:

>>> import ibis
>>> t = ibis.table(dict(a='int64', b='string'))
>>> t.order_by(['a', ibis.desc('b')])
r0 := UnboundTable: unbound_table_0
  a int64
  b string
Selection[r0]
  sort_keys:
     asc|r0.a
    desc|r0.b

Returns:

Type Description
Table

Sorted table

relabel(substitutions)

Rename columns in the table.

Parameters:

Name Type Description Default
substitutions Mapping[str, str] | Callable[[str], str | None]

A mapping or function from old to new column names. If a column isn't in the mapping (or if the callable returns None) it is left with its original name.

required

Returns:

Type Description
Table

A relabeled table expression

rowid()

A unique integer per row, only valid on physical tables.

Any further meaning behind this expression is backend dependent. Generally this corresponds to some index into the database storage (for example, sqlite or duckdb's rowid).

For a monotonically increasing row number, see ibis.row_number.

Returns:

Type Description
IntegerColumn

An integer column

schema()

Get the schema for this table (if one is known).

Returns:

Type Description
Schema

The table's schema.

select(*exprs, **named_exprs)

Compute a new table expression using exprs and named_exprs.

Passing an aggregate function to this method will broadcast the aggregate's value over the number of rows in the table and automatically constructs a window function expression. See the examples section for more details.

For backwards compatibility the keyword argument exprs is reserved and cannot be used to name an expression. This behavior will be removed in v4.

Parameters:

Name Type Description Default
exprs ir.Value | str | Iterable[ir.Value | str]

Column expression, string, or list of column expressions and strings.

()
named_exprs ir.Value | str

Column expressions

{}

Returns:

Type Description
Table

Table expression

Examples:

Simple projection

>>> import ibis
>>> t = ibis.table(dict(a="int64", b="double"), name='t')
>>> proj = t.select(t.a, b_plus_1=t.b + 1)
>>> proj
r0 := UnboundTable[t]
  a int64
  b float64
Selection[r0]
  selections:
    a:        r0.a
    b_plus_1: r0.b + 1
>>> proj2 = t.select("a", b_plus_1=t.b + 1)
>>> proj.equals(proj2)
True

Aggregate projection

>>> agg_proj = t.select(sum_a=t.a.sum(), mean_b=t.b.mean())
>>> agg_proj
r0 := UnboundTable[t]
  a int64
  b float64
Selection[r0]
  selections:
    sum_a:  Window(Sum(r0.a), window=Window(how='rows'))
    mean_b: Window(Mean(r0.b), window=Window(how='rows'))

Note the Window objects here.

Their existence means that the result of the aggregation will be broadcast across the number of rows in the input column. The purpose of this expression rewrite is to make it easy to write column/scalar-aggregate operations like

>>> t.select(demeaned_a=t.a - t.a.mean())
r0 := UnboundTable[t]
  a int64
  b float64
Selection[r0]
  selections:
    demeaned_a: r0.a - Window(Mean(r0.a), window=Window(how='rows'))

set_column(name, expr)

Replace an existing column with a new expression.

Parameters:

Name Type Description Default
name str

Column name to replace

required
expr ir.Value

New data for column

required

Returns:

Type Description
Table

Table expression with new columns

sql(query)

Run a SQL query against a table expression.

The SQL string is backend specific

query must be valid SQL for the execution backend the expression will run against

See Table.alias for details on using named table expressions in a SQL string.

Parameters:

Name Type Description Default
query str

Query string

required

Returns:

Type Description
Table

An opaque table expression

Examples:

>>> con = ibis.duckdb.connect("ci/ibis-testing-data/ibis_testing.ddb")
>>> t = con.table("functional_alltypes")
>>> expr = t.sql("SELECT sum(double_col) FROM functional_alltypes")
>>> expr
r0 := AlchemyTable: functional_alltypes
  index           int64

  month           int32
SQLStringView[r0]: _ibis_view_1
  query: 'SELECT sum(double_col) FROM functional_alltypes'
  schema:
    sum(double_col) float64

to_array()

View a single column table as an array.

Returns:

Type Description
Value

A single column view of a table

union(*tables, distinct=False)

Compute the set union of multiple table expressions.

The input tables must have identical schemas.

Parameters:

Name Type Description Default
*tables Table

One or more table expressions

()
distinct bool

Only return distinct rows

False

Returns:

Type Description
Table

A new table containing the union of all input tables.

unpack(*columns)

Project the struct fields of each of columns into self.

Existing fields are retained in the projection.

Parameters:

Name Type Description Default
columns str

String column names to project into self.

()

Returns:

Type Description
Table

The child table with struct fields of each of columns projected.

Examples:

>>> schema = dict(a="struct<b: float, c: string>", d="string")
>>> t = ibis.table(schema, name="t")
>>> t
UnboundTable: t
  a struct<b: float64, c: string>
  d string
>>> t.unpack("a")
r0 := UnboundTable: t
  a struct<b: float64, c: string>
  d string
Selection[r0]
  selections:
    b: StructField(r0.a, field='b')
    c: StructField(r0.a, field='c')
    d: r0.d
See Also

StructValue.lift

view()

Create a new table expression distinct from the current one.

Use this API for any self-referencing operations like a self-join.

Returns:

Type Description
Table

Table expression

GroupedTable

An intermediate table expression to hold grouping information.

Functions

count(metric_name='count')

Computing the number of rows per group.

Parameters:

Name Type Description Default
metric_name str

Name to use for the row count metric

'count'

Returns:

Type Description
Table

The aggregated table

having(expr)

Add a post-aggregation result filter expr.

Parameters:

Name Type Description Default
expr ir.BooleanScalar

An expression that filters based on an aggregate value.

required

Returns:

Type Description
GroupedTable

A grouped table expression

mutate(exprs=None, **kwds)

Return a table projection with window functions applied.

Any arguments can be functions.

Parameters:

Name Type Description Default
exprs ir.Value | Sequence[ir.Value] | None

List of expressions

None
kwds ir.Value

Expressions

{}

Examples:

>>> import ibis
>>> t = ibis.table([
...     ('foo', 'string'),
...     ('bar', 'string'),
...     ('baz', 'double'),
... ], name='t')
>>> t
UnboundTable[t]
  foo string
  bar string
  baz float64
>>> expr = (t.group_by('foo')
...          .order_by(ibis.desc('bar'))
...          .mutate(qux=lambda x: x.baz.lag(), qux2=t.baz.lead()))
>>> print(expr)
r0 := UnboundTable[t]
  foo string
  bar string
  baz float64
Selection[r0]
  selections:
    r0
    qux:  Window(Lag(r0.baz), window=Window(group_by=[r0.foo], order_by=[desc|r0.bar], how='rows'))
    qux2: Window(Lead(r0.baz), window=Window(group_by=[r0.foo], order_by=[desc|r0.bar], how='rows'))

Returns:

Type Description
Table

A table expression with window functions applied

order_by(expr)

Sort a grouped table expression by expr.

Notes

This API call is ignored in aggregations.

Parameters:

Name Type Description Default
expr ir.Value | Iterable[ir.Value]

Expressions to order the results by

required

Returns:

Type Description
GroupedTable

A sorted grouped GroupedTable

over(window)

Apply a window over the input expressions.

Parameters:

Name Type Description Default
window _window.Window

Window to add to the input

required

Returns:

Type Description
GroupedTable

A new grouped table expression

projection(exprs)

Project new columns out of the grouped table.

See Also

GroupedTable.mutate


Last update: August 5, 2022