Table Expressions¶

Table expressions form the basis for most Ibis expressions.

`Table` ¶

Bases: Expr, JupyterMixin

Functions¶

`aggregate(metrics=None, by=None, having=None, **kwargs)` ¶

Aggregate a table with a given set of reductions grouping by by.

Parameters:

Name	Type	Description	Default
`metrics`	`Sequence[ir.Scalar] \| None`	Aggregate expressions	`None`
`by`	`Sequence[ir.Value] \| None`	Grouping expressions	`None`
`having`	`Sequence[ir.BooleanValue] \| None`	Post-aggregation filters	`None`
`kwargs`	`ir.Value`	Named aggregate expressions	`{}`

Returns:

Type	Description
`Table`	An aggregate table expression

`alias(alias)` ¶

Create a table expression with a specific name alias.

This method is useful for exposing an ibis expression to the underlying backend for use in the Table.sql method.

.alias will create a temporary view

.alias creates a temporary view in the database.

This side effect will be removed in a future version of ibis and is not part of the public API.

Parameters:

Name	Type	Description	Default
`alias`	`str`	Name of the child expression	required

Returns:

Type	Description
`Table`	An table expression

Examples:

>>> con = ibis.duckdb.connect("ci/ibis-testing-data/ibis_testing.ddb")
>>> t = con.table("functional_alltypes")
>>> expr = t.alias("my_t").sql("SELECT sum(double_col) FROM my_t")
>>> expr
r0 := AlchemyTable: functional_alltypes
  index           int64
    ⋮
  month           int32
r1 := View[r0]: my_t
  schema:
    index           int64
      ⋮
    month           int32
SQLStringView[r1]: _ibis_view_0
  query: 'SELECT sum(double_col) FROM my_t'
  schema:
    sum(double_col) float64

`asof_join(left, right, predicates=(), by=(), tolerance=None, *, suffixes=('_x', '_y'))` ¶

Perform an "as-of" join between left and right.

Similar to a left join except that the match is done on nearest key rather than equal keys.

Optionally, match keys with by before joining with predicates.

Parameters:

Name	Type	Description	Default
`left`	`Table`	Table expression	required
`right`	`Table`	Table expression	required
`predicates`	`str \| ir.BooleanColumn \| Sequence[str \| ir.BooleanColumn]`	Join expressions	`()`
`by`	`str \| ir.Column \| Sequence[str \| ir.Column]`	column to group by before joining	`()`
`tolerance`	`str \| ir.IntervalScalar \| None`	Amount of time to look behind when joining	`None`
`suffixes`	`tuple[str, str]`	Left and right suffixes that will be used to rename overlapping columns.	`('_x', '_y')`

Returns:

Type	Description
`Table`	Table expression

`count(where=None)` ¶

Compute the number of rows in the table.

Parameters:

Name	Type	Description	Default
`where`	`ir.BooleanValue \| None`	Optional boolean expression to filter rows when counting.	`None`

Returns:

Type	Description
`IntegerScalar`	Number of rows in the table

Examples:

>>> import ibis
>>> from ibis import _
>>> t = ibis.table(dict(a="int"), name="t")
>>> t.count()
r0 := UnboundTable: t
  a int64
count: CountStar(t)
>>> t.aggregate(n=_.count(_.a > 1), total=_.sum())
r0 := UnboundTable: t
  a int64
Aggregation[r0]
  metrics:
    n:     CountStar(t, where=r0.a > 1)
    total: Sum(r0.a)

`cross_join(left, right, *rest, suffixes=('_x', '_y'))` ¶

Compute the cross join of a sequence of tables.

Parameters:

Name	Type	Description	Default
`left`	`Table`	Left table	required
`right`	`Table`	Right table	required
`rest`	`Table`	Additional tables to cross join	`()`
`suffixes`	`tuple[str, str]`	Left and right suffixes that will be used to rename overlapping columns.	`('_x', '_y')`

Returns:

Type	Description
`Table`	Cross join of `left`, `right` and `rest`

Examples:

>>> import ibis
>>> schemas = [(name, 'int64') for name in 'abcde']
>>> a, b, c, d, e = [
...     ibis.table([(name, type)], name=name) for name, type in schemas
... ]
>>> joined1 = ibis.cross_join(a, b, c, d, e)
>>> joined1
r0 := UnboundTable[e]
  e int64
r1 := UnboundTable[d]
  d int64
r2 := UnboundTable[c]
  c int64
r3 := UnboundTable[b]
  b int64
r4 := UnboundTable[a]
  a int64
r5 := CrossJoin[r3, r2]
r6 := CrossJoin[r5, r1]
r7 := CrossJoin[r6, r0]
CrossJoin[r4, r7]

`difference(*tables, distinct=True)` ¶

Compute the set difference of multiple table expressions.

The input tables must have identical schemas.

Parameters:

Name	Type	Description	Default
`tables`	`Table`	One or more table expressions	`()`
`distinct`	`bool`	Only diff distinct rows not occurring in the calling table	`True`

Returns:

Type	Description
`Table`	The rows present in `self` that are not present in `tables`.

`distinct()` ¶

Compute the set of unique rows in the table.

`drop(*fields)` ¶

Remove fields from a table.

Parameters:

Name	Type	Description	Default
`fields`	`str`	Fields to drop	`()`

Returns:

Type	Description
`Table`	A table with all columns in `fields` removed.

`dropna(subset=None, how='any')` ¶

Remove rows with null values from the table.

Parameters:

Name	Type	Description	Default
`subset`	`Sequence[str] \| None`	Columns names to consider when dropping nulls. By default all columns are considered.	`None`
`how`	`Literal['any', 'all']`	Determine whether a row is removed if there is at least one null value in the row ('any'), or if all row values are null ('all'). Options are 'any' or 'all'. Default is 'any'.	`'any'`

Examples:

>>> import ibis
>>> t = ibis.table(dict(a='int64', b='string'), name='t')
>>> t = t.dropna()  # Drop all rows where any values are null
>>> t
r0 := UnboundTable: t
  a int64
  b string
DropNa[r0]
  how: 'any'
>>> t.dropna(how='all')  # Only drop rows where all values are null
r0 := UnboundTable: t
  a int64
  b string
r1 := DropNa[r0]
  how: 'all'
>>> t.dropna(subset=['a'], how='all')  # Only drop rows where all values in column 'a' are null  # noqa: E501
r0 := UnboundTable: t
  a int64
  b string
DropNa[r0]
  how: 'all'
  subset:
    r0.a

Returns:

Type	Description
`Table`	Table expression

`fillna(replacements)` ¶

Fill null values in a table expression.

Parameters:

Name	Type	Description	Default
`replacements`	`ir.Scalar \| Mapping[str, ir.Scalar]`	Value with which to fill the nulls. If passed as a mapping, the keys are column names that map to their replacement value. If passed as a scalar, all columns are filled with that value.	required

Notes¶

There is potential lack of type stability with the fillna API. For example, different library versions may impact whether or not a given backend promotes integer replacement values to floats.

Examples:

>>> import ibis
>>> import ibis.expr.datatypes as dt
>>> t = ibis.table([('a', 'int64'), ('b', 'string')])
>>> res = t.fillna(0.0)  # Replace nulls in all columns with 0.0
>>> res = t.fillna({c: 0.0 for c, t in t.schema().items() if t == dt.float64})

Returns:

Type	Description
`Table`	Table expression

`filter(predicates)` ¶

Select rows from table based on predicates.

Parameters:

Name	Type	Description	Default
`predicates`	`ir.BooleanValue \| Sequence[ir.BooleanValue]`	Boolean value expressions used to select rows in `table`.	required

Returns:

Type	Description
`Table`	Filtered table expression

`get_column(name)` ¶

Get a reference to a single column from the table.

Returns:

Type	Description
`Column`	A column named `name`.

`get_columns(iterable)` ¶

Get multiple columns from the table.

Examples:

>>> import ibis
>>> table = ibis.table(
...    [
...        ('a', 'int64'),
...        ('b', 'string'),
...        ('c', 'timestamp'),
...        ('d', 'float'),
...    ],
...    name='t'
... )
>>> a, b, c = table.get_columns(['a', 'b', 'c'])

Returns:

Type	Description
`list[ir.Column]`	List of column expressions

`group_by(by=None, **additional_grouping_expressions)` ¶

Create a grouped table expression.

Parameters:

Name	Type	Description	Default
`by`		Grouping expressions	`None`
`additional_grouping_expressions`	`Any`	Named grouping expressions	`{}`

Examples:

>>> import ibis
>>> from ibis import _
>>> t = ibis.table(dict(a='int32', b='timestamp', c='double'), name='t')
>>> t.group_by([_.a, _.b]).aggregate(sum_of_c=_.c.sum())
r0 := UnboundTable: t
  a int32
  b timestamp
  c float64
Aggregation[r0]
  metrics:
    sum_of_c: Sum(r0.c)
  by:
    a: r0.a
    b: r0.b

Returns:

Type	Description
`GroupedTable`	A grouped table expression

`head(n=5)` ¶

Select the first n rows of a table.

The result set is not deterministic without a sort.

Parameters:

Name	Type	Description	Default
`n`	`int`	Number of rows to include, defaults to 5	`5`

Returns:

Type	Description
`Table`	`table` limited to `n` rows

`info(buf=None)` ¶

Show summary information about a table.

Currently implemented as showing column names, types and null counts.

Parameters:

Name	Type	Description	Default
`buf`	`IO[str] \| None`	A writable buffer, defaults to stdout	`None`

`intersect(*tables, distinct=True)` ¶

Compute the set intersection of multiple table expressions.

The input tables must have identical schemas.

Parameters:

Name	Type	Description	Default
`*tables`	`Table`	One or more table expressions	`()`
`distinct`	`bool`	Only return distinct rows	`True`

Returns:

Type	Description
`Table`	A new table containing the intersection of all input tables.

`join(left, right, predicates=(), how='inner', *, suffixes=('_x', '_y'))` ¶

Perform a join between two tables.

Parameters:

Name	Type	Description	Default
`left`	`Table`	Left table to join	required
`right`	`Table`	Right table to join	required
`predicates`	`str \| Sequence[str \| tuple[str \| ir.Column, str \| ir.Column] \| ir.BooleanColumn]`	Boolean or column names to join on	`()`
`how`	`Literal['inner', 'left', 'outer', 'right', 'semi', 'anti', 'any_inner', 'any_left', 'left_semi']`	Join method	`'inner'`
`suffixes`	`tuple[str, str]`	Left and right suffixes that will be used to rename overlapping columns.	`('_x', '_y')`

`limit(n, offset=0)` ¶

Select the first n rows starting at offset.

Parameters:

Name	Type	Description	Default
`n`	`int`	Number of rows to include	required
`offset`	`int`	Number of rows to skip first	`0`

Returns:

Type	Description
`Table`	The first `n` rows of `table` starting at `offset`

`mutate(exprs=None, **mutations)` ¶

Add columns to a table expression.

Parameters:

Name	Type	Description	Default
`exprs`	`Sequence[ir.Expr] \| None`	List of named expressions to add as columns	`None`
`mutations`	`ir.Value`	Named expressions using keyword arguments	`{}`

Returns:

Type	Description
`Table`	Table expression with additional columns

Examples:

Using keywords arguments to name the new columns

>>> import ibis
>>> table = ibis.table(
...     [('foo', 'double'), ('bar', 'double')],
...     name='t'
... )
>>> expr = table.mutate(qux=table.foo + table.bar, baz=5)
>>> expr
r0 := UnboundTable[t]
  foo float64
  bar float64
Selection[r0]
  selections:
    r0
    baz: 5
    qux: r0.foo + r0.bar

Use the name method to name the new columns.

>>> new_columns = [ibis.literal(5).name('baz',),
...                (table.foo + table.bar).name('qux')]
>>> expr2 = table.mutate(new_columns)
>>> expr.equals(expr2)
True

`order_by(by)` ¶

Sort a table by one or more expressions.

Parameters:

Name	Type	Description	Default
`by`	`str \| ir.Column \| tuple[str \| ir.Column, bool] \| Sequence[tuple[str \| ir.Column, bool]]`	An expression (or expressions) to sort the table by.	required

Examples:

>>> import ibis
>>> t = ibis.table(dict(a='int64', b='string'))
>>> t.order_by(['a', ibis.desc('b')])
r0 := UnboundTable: unbound_table_0
  a int64
  b string
Selection[r0]
  sort_keys:
     asc|r0.a
    desc|r0.b

Returns:

Type	Description
`Table`	Sorted table

`relabel(substitutions)` ¶

Rename columns in the table.

Parameters:

Name	Type	Description	Default
`substitutions`	`Mapping[str, str] \| Callable[[str], str \| None]`	A mapping or function from old to new column names. If a column isn't in the mapping (or if the callable returns None) it is left with its original name.	required

Returns:

Type	Description
`Table`	A relabeled table expression

`rowid()` ¶

A unique integer per row, only valid on physical tables.

Any further meaning behind this expression is backend dependent. Generally this corresponds to some index into the database storage (for example, sqlite or duckdb's rowid).

For a monotonically increasing row number, see ibis.row_number.

Returns:

Type	Description
`IntegerColumn`	An integer column

`schema()` ¶

Get the schema for this table (if one is known).

Returns:

Type	Description
`Schema`	The table's schema.

`select(*exprs, **named_exprs)` ¶

Compute a new table expression using exprs and named_exprs.

Passing an aggregate function to this method will broadcast the aggregate's value over the number of rows in the table and automatically constructs a window function expression. See the examples section for more details.

For backwards compatibility the keyword argument exprs is reserved and cannot be used to name an expression. This behavior will be removed in v4.

Parameters:

Name	Type	Description	Default
`exprs`	`ir.Value \| str \| Iterable[ir.Value \| str]`	Column expression, string, or list of column expressions and strings.	`()`
`named_exprs`	`ir.Value \| str`	Column expressions	`{}`

Returns:

Type	Description
`Table`	Table expression

Examples:

Simple projection

>>> import ibis
>>> t = ibis.table(dict(a="int64", b="double"), name='t')
>>> proj = t.select(t.a, b_plus_1=t.b + 1)
>>> proj
r0 := UnboundTable[t]
  a int64
  b float64
Selection[r0]
  selections:
    a:        r0.a
    b_plus_1: r0.b + 1
>>> proj2 = t.select("a", b_plus_1=t.b + 1)
>>> proj.equals(proj2)
True

Aggregate projection

>>> agg_proj = t.select(sum_a=t.a.sum(), mean_b=t.b.mean())
>>> agg_proj
r0 := UnboundTable[t]
  a int64
  b float64
Selection[r0]
  selections:
    sum_a:  Window(Sum(r0.a), window=Window(how='rows'))
    mean_b: Window(Mean(r0.b), window=Window(how='rows'))

Note the Window objects here.

Their existence means that the result of the aggregation will be broadcast across the number of rows in the input column. The purpose of this expression rewrite is to make it easy to write column/scalar-aggregate operations like

>>> t.select(demeaned_a=t.a - t.a.mean())
r0 := UnboundTable[t]
  a int64
  b float64
Selection[r0]
  selections:
    demeaned_a: r0.a - Window(Mean(r0.a), window=Window(how='rows'))

`set_column(name, expr)` ¶

Replace an existing column with a new expression.

Parameters:

Name	Type	Description	Default
`name`	`str`	Column name to replace	required
`expr`	`ir.Value`	New data for column	required

Returns:

Type	Description
`Table`	Table expression with new columns

`sql(query)` ¶

Run a SQL query against a table expression.

The SQL string is backend specific

query must be valid SQL for the execution backend the expression will run against

See Table.alias for details on using named table expressions in a SQL string.

Parameters:

Name	Type	Description	Default
`query`	`str`	Query string	required

Returns:

Type	Description
`Table`	An opaque table expression

Examples:

>>> con = ibis.duckdb.connect("ci/ibis-testing-data/ibis_testing.ddb")
>>> t = con.table("functional_alltypes")
>>> expr = t.sql("SELECT sum(double_col) FROM functional_alltypes")
>>> expr
r0 := AlchemyTable: functional_alltypes
  index           int64
    ⋮
  month           int32
SQLStringView[r0]: _ibis_view_1
  query: 'SELECT sum(double_col) FROM functional_alltypes'
  schema:
    sum(double_col) float64

`to_array()` ¶

View a single column table as an array.

Returns:

Type	Description
`Value`	A single column view of a table

`union(*tables, distinct=False)` ¶

Compute the set union of multiple table expressions.

The input tables must have identical schemas.

Parameters:

Name	Type	Description	Default
`*tables`	`Table`	One or more table expressions	`()`
`distinct`	`bool`	Only return distinct rows	`False`

Returns:

Type	Description
`Table`	A new table containing the union of all input tables.

`unpack(*columns)` ¶

Project the struct fields of each of columns into self.

Existing fields are retained in the projection.

Parameters:

Name	Type	Description	Default
`columns`	`str`	String column names to project into `self`.	`()`

Returns:

Type	Description
`Table`	The child table with struct fields of each of `columns` projected.

Examples:

>>> schema = dict(a="struct<b: float, c: string>", d="string")
>>> t = ibis.table(schema, name="t")
>>> t
UnboundTable: t
  a struct<b: float64, c: string>
  d string
>>> t.unpack("a")
r0 := UnboundTable: t
  a struct<b: float64, c: string>
  d string
Selection[r0]
  selections:
    b: StructField(r0.a, field='b')
    c: StructField(r0.a, field='c')
    d: r0.d

`view()` ¶

Create a new table expression distinct from the current one.

Use this API for any self-referencing operations like a self-join.

Returns:

Type	Description
`Table`	Table expression

`GroupedTable` ¶

An intermediate table expression to hold grouping information.

Functions¶

`count(metric_name='count')` ¶

Computing the number of rows per group.

Parameters:

Name	Type	Description	Default
`metric_name`	`str`	Name to use for the row count metric	`'count'`

Returns:

Type	Description
`Table`	The aggregated table

`having(expr)` ¶

Add a post-aggregation result filter expr.

Parameters:

Name	Type	Description	Default
`expr`	`ir.BooleanScalar`	An expression that filters based on an aggregate value.	required

Returns:

Type	Description
`GroupedTable`	A grouped table expression

`mutate(exprs=None, **kwds)` ¶

Return a table projection with window functions applied.

Any arguments can be functions.

Parameters:

Name	Type	Description	Default
`exprs`	`ir.Value \| Sequence[ir.Value] \| None`	List of expressions	`None`
`kwds`	`ir.Value`	Expressions	`{}`

Examples:

>>> import ibis
>>> t = ibis.table([
...     ('foo', 'string'),
...     ('bar', 'string'),
...     ('baz', 'double'),
... ], name='t')
>>> t
UnboundTable[t]
  foo string
  bar string
  baz float64
>>> expr = (t.group_by('foo')
...          .order_by(ibis.desc('bar'))
...          .mutate(qux=lambda x: x.baz.lag(), qux2=t.baz.lead()))
>>> print(expr)
r0 := UnboundTable[t]
  foo string
  bar string
  baz float64
Selection[r0]
  selections:
    r0
    qux:  Window(Lag(r0.baz), window=Window(group_by=[r0.foo], order_by=[desc|r0.bar], how='rows'))
    qux2: Window(Lead(r0.baz), window=Window(group_by=[r0.foo], order_by=[desc|r0.bar], how='rows'))

Returns:

Type	Description
`Table`	A table expression with window functions applied

`order_by(expr)` ¶

Sort a grouped table expression by expr.

Notes¶

This API call is ignored in aggregations.

Parameters:

Name	Type	Description	Default
`expr`	`ir.Value \| Iterable[ir.Value]`	Expressions to order the results by	required

Returns:

Type	Description
`GroupedTable`	A sorted grouped GroupedTable

`over(window)` ¶

Apply a window over the input expressions.

Parameters:

Name	Type	Description	Default
`window`	`_window.Window`	Window to add to the input	required

Returns:

Type	Description
`GroupedTable`	A new grouped table expression

`projection(exprs)` ¶

Project new columns out of the grouped table.

Table Expressions¶

Table ¶

Functions¶

aggregate(metrics=None, by=None, having=None, **kwargs) ¶

alias(alias) ¶

asof_join(left, right, predicates=(), by=(), tolerance=None, *, suffixes=('_x', '_y')) ¶

count(where=None) ¶

cross_join(left, right, *rest, suffixes=('_x', '_y')) ¶

difference(*tables, distinct=True) ¶

distinct() ¶

drop(*fields) ¶

dropna(subset=None, how='any') ¶

fillna(replacements) ¶

Notes¶

filter(predicates) ¶

get_column(name) ¶

get_columns(iterable) ¶

group_by(by=None, **additional_grouping_expressions) ¶

head(n=5) ¶

info(buf=None) ¶

intersect(*tables, distinct=True) ¶

join(left, right, predicates=(), how='inner', *, suffixes=('_x', '_y')) ¶

limit(n, offset=0) ¶

mutate(exprs=None, **mutations) ¶

order_by(by) ¶

relabel(substitutions) ¶

rowid() ¶

schema() ¶

select(*exprs, **named_exprs) ¶

set_column(name, expr) ¶

sql(query) ¶

to_array() ¶

union(*tables, distinct=False) ¶

unpack(*columns) ¶

See Also¶

view() ¶

GroupedTable ¶

Functions¶

count(metric_name='count') ¶

having(expr) ¶

mutate(exprs=None, **kwds) ¶

order_by(expr) ¶

Notes¶

over(window) ¶

projection(exprs) ¶

See Also¶

`Table` ¶

`aggregate(metrics=None, by=None, having=None, **kwargs)` ¶

`alias(alias)` ¶

`asof_join(left, right, predicates=(), by=(), tolerance=None, *, suffixes=('_x', '_y'))` ¶

`count(where=None)` ¶

`cross_join(left, right, *rest, suffixes=('_x', '_y'))` ¶

`difference(*tables, distinct=True)` ¶

`distinct()` ¶

`drop(*fields)` ¶

`dropna(subset=None, how='any')` ¶

`fillna(replacements)` ¶

`filter(predicates)` ¶

`get_column(name)` ¶

`get_columns(iterable)` ¶

`group_by(by=None, **additional_grouping_expressions)` ¶

`head(n=5)` ¶

`info(buf=None)` ¶

`intersect(*tables, distinct=True)` ¶

`join(left, right, predicates=(), how='inner', *, suffixes=('_x', '_y'))` ¶

`limit(n, offset=0)` ¶

`mutate(exprs=None, **mutations)` ¶

`order_by(by)` ¶

`relabel(substitutions)` ¶

`rowid()` ¶

`schema()` ¶

`select(*exprs, **named_exprs)` ¶

`set_column(name, expr)` ¶

`sql(query)` ¶

`to_array()` ¶

`union(*tables, distinct=False)` ¶

`unpack(*columns)` ¶

`view()` ¶

`GroupedTable` ¶

`count(metric_name='count')` ¶

`having(expr)` ¶

`mutate(exprs=None, **kwds)` ¶

`order_by(expr)` ¶

`over(window)` ¶

`projection(exprs)` ¶