Table Expressions¶
Table expressions form the basis for most Ibis expressions.
Table
¶
Bases: Expr
, JupyterMixin
Functions¶
aggregate(metrics=None, by=None, having=None, **kwargs)
¶
Aggregate a table with a given set of reductions grouping by by
.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
metrics |
Sequence[ir.Scalar] | None
|
Aggregate expressions |
None
|
by |
Sequence[ir.Value] | None
|
Grouping expressions |
None
|
having |
Sequence[ir.BooleanValue] | None
|
Post-aggregation filters |
None
|
kwargs |
ir.Value
|
Named aggregate expressions |
{}
|
Returns:
Type | Description |
---|---|
Table
|
An aggregate table expression |
alias(alias)
¶
Create a table expression with a specific name alias
.
This method is useful for exposing an ibis expression to the underlying
backend for use in the
Table.sql
method.
.alias
will create a temporary view
.alias
creates a temporary view in the database.
This side effect will be removed in a future version of ibis and is not part of the public API.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
alias |
str
|
Name of the child expression |
required |
Returns:
Type | Description |
---|---|
Table
|
An table expression |
Examples:
>>> con = ibis.duckdb.connect("ci/ibis-testing-data/ibis_testing.ddb")
>>> t = con.table("functional_alltypes")
>>> expr = t.alias("my_t").sql("SELECT sum(double_col) FROM my_t")
>>> expr
r0 := AlchemyTable: functional_alltypes
index int64
⋮
month int32
r1 := View[r0]: my_t
schema:
index int64
⋮
month int32
SQLStringView[r1]: _ibis_view_0
query: 'SELECT sum(double_col) FROM my_t'
schema:
sum(double_col) float64
asof_join(left, right, predicates=(), by=(), tolerance=None, *, suffixes=('_x', '_y'))
¶
Perform an "as-of" join between left
and right
.
Similar to a left join except that the match is done on nearest key rather than equal keys.
Optionally, match keys with by
before joining with predicates
.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
left |
Table
|
Table expression |
required |
right |
Table
|
Table expression |
required |
predicates |
str | ir.BooleanColumn | Sequence[str | ir.BooleanColumn]
|
Join expressions |
()
|
by |
str | ir.Column | Sequence[str | ir.Column]
|
column to group by before joining |
()
|
tolerance |
str | ir.IntervalScalar | None
|
Amount of time to look behind when joining |
None
|
suffixes |
tuple[str, str]
|
Left and right suffixes that will be used to rename overlapping columns. |
('_x', '_y')
|
Returns:
Type | Description |
---|---|
Table
|
Table expression |
count(where=None)
¶
Compute the number of rows in the table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
where |
ir.BooleanValue | None
|
Optional boolean expression to filter rows when counting. |
None
|
Returns:
Type | Description |
---|---|
IntegerScalar
|
Number of rows in the table |
Examples:
>>> import ibis
>>> from ibis import _
>>> t = ibis.table(dict(a="int"), name="t")
>>> t.count()
r0 := UnboundTable: t
a int64
count: CountStar(t)
>>> t.aggregate(n=_.count(_.a > 1), total=_.sum())
r0 := UnboundTable: t
a int64
Aggregation[r0]
metrics:
n: CountStar(t, where=r0.a > 1)
total: Sum(r0.a)
cross_join(left, right, *rest, suffixes=('_x', '_y'))
¶
Compute the cross join of a sequence of tables.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
left |
Table
|
Left table |
required |
right |
Table
|
Right table |
required |
rest |
Table
|
Additional tables to cross join |
()
|
suffixes |
tuple[str, str]
|
Left and right suffixes that will be used to rename overlapping columns. |
('_x', '_y')
|
Returns:
Type | Description |
---|---|
Table
|
Cross join of |
Examples:
>>> import ibis
>>> schemas = [(name, 'int64') for name in 'abcde']
>>> a, b, c, d, e = [
... ibis.table([(name, type)], name=name) for name, type in schemas
... ]
>>> joined1 = ibis.cross_join(a, b, c, d, e)
>>> joined1
r0 := UnboundTable[e]
e int64
r1 := UnboundTable[d]
d int64
r2 := UnboundTable[c]
c int64
r3 := UnboundTable[b]
b int64
r4 := UnboundTable[a]
a int64
r5 := CrossJoin[r3, r2]
r6 := CrossJoin[r5, r1]
r7 := CrossJoin[r6, r0]
CrossJoin[r4, r7]
difference(*tables, distinct=True)
¶
Compute the set difference of multiple table expressions.
The input tables must have identical schemas.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tables |
Table
|
One or more table expressions |
()
|
distinct |
bool
|
Only diff distinct rows not occurring in the calling table |
True
|
Returns:
Type | Description |
---|---|
Table
|
The rows present in |
distinct()
¶
Compute the set of unique rows in the table.
drop(*fields)
¶
dropna(subset=None, how='any')
¶
Remove rows with null values from the table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
subset |
Sequence[str] | None
|
Columns names to consider when dropping nulls. By default all columns are considered. |
None
|
how |
Literal['any', 'all']
|
Determine whether a row is removed if there is at least one null value in the row ('any'), or if all row values are null ('all'). Options are 'any' or 'all'. Default is 'any'. |
'any'
|
Examples:
>>> import ibis
>>> t = ibis.table(dict(a='int64', b='string'), name='t')
>>> t = t.dropna() # Drop all rows where any values are null
>>> t
r0 := UnboundTable: t
a int64
b string
DropNa[r0]
how: 'any'
>>> t.dropna(how='all') # Only drop rows where all values are null
r0 := UnboundTable: t
a int64
b string
r1 := DropNa[r0]
how: 'all'
>>> t.dropna(subset=['a'], how='all') # Only drop rows where all values in column 'a' are null # noqa: E501
r0 := UnboundTable: t
a int64
b string
DropNa[r0]
how: 'all'
subset:
r0.a
Returns:
Type | Description |
---|---|
Table
|
Table expression |
fillna(replacements)
¶
Fill null values in a table expression.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
replacements |
ir.Scalar | Mapping[str, ir.Scalar]
|
Value with which to fill the nulls. If passed as a mapping, the keys are column names that map to their replacement value. If passed as a scalar, all columns are filled with that value. |
required |
Notes¶
There is potential lack of type stability with the fillna
API. For
example, different library versions may impact whether or not a given
backend promotes integer replacement values to floats.
Examples:
>>> import ibis
>>> import ibis.expr.datatypes as dt
>>> t = ibis.table([('a', 'int64'), ('b', 'string')])
>>> res = t.fillna(0.0) # Replace nulls in all columns with 0.0
>>> res = t.fillna({c: 0.0 for c, t in t.schema().items() if t == dt.float64})
Returns:
Type | Description |
---|---|
Table
|
Table expression |
filter(predicates)
¶
Select rows from table
based on predicates
.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predicates |
ir.BooleanValue | Sequence[ir.BooleanValue]
|
Boolean value expressions used to select rows in |
required |
Returns:
Type | Description |
---|---|
Table
|
Filtered table expression |
get_column(name)
¶
Get a reference to a single column from the table.
Returns:
Type | Description |
---|---|
Column
|
A column named |
get_columns(iterable)
¶
Get multiple columns from the table.
Examples:
>>> import ibis
>>> table = ibis.table(
... [
... ('a', 'int64'),
... ('b', 'string'),
... ('c', 'timestamp'),
... ('d', 'float'),
... ],
... name='t'
... )
>>> a, b, c = table.get_columns(['a', 'b', 'c'])
Returns:
Type | Description |
---|---|
list[ir.Column]
|
List of column expressions |
group_by(by=None, **additional_grouping_expressions)
¶
Create a grouped table expression.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
by |
Grouping expressions |
None
|
|
additional_grouping_expressions |
Any
|
Named grouping expressions |
{}
|
Examples:
>>> import ibis
>>> from ibis import _
>>> t = ibis.table(dict(a='int32', b='timestamp', c='double'), name='t')
>>> t.group_by([_.a, _.b]).aggregate(sum_of_c=_.c.sum())
r0 := UnboundTable: t
a int32
b timestamp
c float64
Aggregation[r0]
metrics:
sum_of_c: Sum(r0.c)
by:
a: r0.a
b: r0.b
Returns:
Type | Description |
---|---|
GroupedTable
|
A grouped table expression |
head(n=5)
¶
info(buf=None)
¶
intersect(*tables, distinct=True)
¶
Compute the set intersection of multiple table expressions.
The input tables must have identical schemas.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*tables |
Table
|
One or more table expressions |
()
|
distinct |
bool
|
Only return distinct rows |
True
|
Returns:
Type | Description |
---|---|
Table
|
A new table containing the intersection of all input tables. |
join(left, right, predicates=(), how='inner', *, suffixes=('_x', '_y'))
¶
Perform a join between two tables.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
left |
Table
|
Left table to join |
required |
right |
Table
|
Right table to join |
required |
predicates |
str | Sequence[str | tuple[str | ir.Column, str | ir.Column] | ir.BooleanColumn]
|
Boolean or column names to join on |
()
|
how |
Literal['inner', 'left', 'outer', 'right', 'semi', 'anti', 'any_inner', 'any_left', 'left_semi']
|
Join method |
'inner'
|
suffixes |
tuple[str, str]
|
Left and right suffixes that will be used to rename overlapping columns. |
('_x', '_y')
|
limit(n, offset=0)
¶
mutate(exprs=None, **mutations)
¶
Add columns to a table expression.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
exprs |
Sequence[ir.Expr] | None
|
List of named expressions to add as columns |
None
|
mutations |
ir.Value
|
Named expressions using keyword arguments |
{}
|
Returns:
Type | Description |
---|---|
Table
|
Table expression with additional columns |
Examples:
Using keywords arguments to name the new columns
>>> import ibis
>>> table = ibis.table(
... [('foo', 'double'), ('bar', 'double')],
... name='t'
... )
>>> expr = table.mutate(qux=table.foo + table.bar, baz=5)
>>> expr
r0 := UnboundTable[t]
foo float64
bar float64
Selection[r0]
selections:
r0
baz: 5
qux: r0.foo + r0.bar
Use the name
method to name
the new columns.
>>> new_columns = [ibis.literal(5).name('baz',),
... (table.foo + table.bar).name('qux')]
>>> expr2 = table.mutate(new_columns)
>>> expr.equals(expr2)
True
order_by(by)
¶
Sort a table by one or more expressions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
by |
str | ir.Column | tuple[str | ir.Column, bool] | Sequence[tuple[str | ir.Column, bool]]
|
An expression (or expressions) to sort the table by. |
required |
Examples:
>>> import ibis
>>> t = ibis.table(dict(a='int64', b='string'))
>>> t.order_by(['a', ibis.desc('b')])
r0 := UnboundTable: unbound_table_0
a int64
b string
Selection[r0]
sort_keys:
asc|r0.a
desc|r0.b
Returns:
Type | Description |
---|---|
Table
|
Sorted table |
relabel(substitutions)
¶
Rename columns in the table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
substitutions |
Mapping[str, str] | Callable[[str], str | None]
|
A mapping or function from old to new column names. If a column isn't in the mapping (or if the callable returns None) it is left with its original name. |
required |
Returns:
Type | Description |
---|---|
Table
|
A relabeled table expression |
rowid()
¶
A unique integer per row, only valid on physical tables.
Any further meaning behind this expression is backend dependent.
Generally this corresponds to some index into the database storage
(for example, sqlite or duckdb's rowid
).
For a monotonically increasing row number, see ibis.row_number
.
Returns:
Type | Description |
---|---|
IntegerColumn
|
An integer column |
schema()
¶
Get the schema for this table (if one is known).
Returns:
Type | Description |
---|---|
Schema
|
The table's schema. |
select(*exprs, **named_exprs)
¶
Compute a new table expression using exprs
and named_exprs
.
Passing an aggregate function to this method will broadcast the aggregate's value over the number of rows in the table and automatically constructs a window function expression. See the examples section for more details.
For backwards compatibility the keyword argument exprs
is reserved
and cannot be used to name an expression. This behavior will be removed
in v4.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
exprs |
ir.Value | str | Iterable[ir.Value | str]
|
Column expression, string, or list of column expressions and strings. |
()
|
named_exprs |
ir.Value | str
|
Column expressions |
{}
|
Returns:
Type | Description |
---|---|
Table
|
Table expression |
Examples:
Simple projection
>>> import ibis
>>> t = ibis.table(dict(a="int64", b="double"), name='t')
>>> proj = t.select(t.a, b_plus_1=t.b + 1)
>>> proj
r0 := UnboundTable[t]
a int64
b float64
Selection[r0]
selections:
a: r0.a
b_plus_1: r0.b + 1
>>> proj2 = t.select("a", b_plus_1=t.b + 1)
>>> proj.equals(proj2)
True
Aggregate projection
>>> agg_proj = t.select(sum_a=t.a.sum(), mean_b=t.b.mean())
>>> agg_proj
r0 := UnboundTable[t]
a int64
b float64
Selection[r0]
selections:
sum_a: Window(Sum(r0.a), window=Window(how='rows'))
mean_b: Window(Mean(r0.b), window=Window(how='rows'))
Note the Window
objects here.
Their existence means that the result of the aggregation will be broadcast across the number of rows in the input column. The purpose of this expression rewrite is to make it easy to write column/scalar-aggregate operations like
>>> t.select(demeaned_a=t.a - t.a.mean())
r0 := UnboundTable[t]
a int64
b float64
Selection[r0]
selections:
demeaned_a: r0.a - Window(Mean(r0.a), window=Window(how='rows'))
set_column(name, expr)
¶
sql(query)
¶
Run a SQL query against a table expression.
The SQL string is backend specific
query
must be valid SQL for the execution backend the expression
will run against
See Table.alias
for
details on using named table expressions in a SQL string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query |
str
|
Query string |
required |
Returns:
Type | Description |
---|---|
Table
|
An opaque table expression |
Examples:
>>> con = ibis.duckdb.connect("ci/ibis-testing-data/ibis_testing.ddb")
>>> t = con.table("functional_alltypes")
>>> expr = t.sql("SELECT sum(double_col) FROM functional_alltypes")
>>> expr
r0 := AlchemyTable: functional_alltypes
index int64
⋮
month int32
SQLStringView[r0]: _ibis_view_1
query: 'SELECT sum(double_col) FROM functional_alltypes'
schema:
sum(double_col) float64
to_array()
¶
View a single column table as an array.
Returns:
Type | Description |
---|---|
Value
|
A single column view of a table |
union(*tables, distinct=False)
¶
Compute the set union of multiple table expressions.
The input tables must have identical schemas.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*tables |
Table
|
One or more table expressions |
()
|
distinct |
bool
|
Only return distinct rows |
False
|
Returns:
Type | Description |
---|---|
Table
|
A new table containing the union of all input tables. |
unpack(*columns)
¶
Project the struct fields of each of columns
into self
.
Existing fields are retained in the projection.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns |
str
|
String column names to project into |
()
|
Returns:
Type | Description |
---|---|
Table
|
The child table with struct fields of each of |
Examples:
>>> schema = dict(a="struct<b: float, c: string>", d="string")
>>> t = ibis.table(schema, name="t")
>>> t
UnboundTable: t
a struct<b: float64, c: string>
d string
>>> t.unpack("a")
r0 := UnboundTable: t
a struct<b: float64, c: string>
d string
Selection[r0]
selections:
b: StructField(r0.a, field='b')
c: StructField(r0.a, field='c')
d: r0.d
See Also¶
GroupedTable
¶
An intermediate table expression to hold grouping information.
Functions¶
count(metric_name='count')
¶
having(expr)
¶
Add a post-aggregation result filter expr
.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr |
ir.BooleanScalar
|
An expression that filters based on an aggregate value. |
required |
Returns:
Type | Description |
---|---|
GroupedTable
|
A grouped table expression |
mutate(exprs=None, **kwds)
¶
Return a table projection with window functions applied.
Any arguments can be functions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
exprs |
ir.Value | Sequence[ir.Value] | None
|
List of expressions |
None
|
kwds |
ir.Value
|
Expressions |
{}
|
Examples:
>>> import ibis
>>> t = ibis.table([
... ('foo', 'string'),
... ('bar', 'string'),
... ('baz', 'double'),
... ], name='t')
>>> t
UnboundTable[t]
foo string
bar string
baz float64
>>> expr = (t.group_by('foo')
... .order_by(ibis.desc('bar'))
... .mutate(qux=lambda x: x.baz.lag(), qux2=t.baz.lead()))
>>> print(expr)
r0 := UnboundTable[t]
foo string
bar string
baz float64
Selection[r0]
selections:
r0
qux: Window(Lag(r0.baz), window=Window(group_by=[r0.foo], order_by=[desc|r0.bar], how='rows'))
qux2: Window(Lead(r0.baz), window=Window(group_by=[r0.foo], order_by=[desc|r0.bar], how='rows'))
Returns:
Type | Description |
---|---|
Table
|
A table expression with window functions applied |
order_by(expr)
¶
Sort a grouped table expression by expr
.
Notes¶
This API call is ignored in aggregations.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr |
ir.Value | Iterable[ir.Value]
|
Expressions to order the results by |
required |
Returns:
Type | Description |
---|---|
GroupedTable
|
A sorted grouped GroupedTable |
over(window)
¶
Apply a window over the input expressions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
window |
_window.Window
|
Window to add to the input |
required |
Returns:
Type | Description |
---|---|
GroupedTable
|
A new grouped table expression |