Chain expressions with the underscore API

Expressions can easily be chained using the deferred expression API, also known as the Underscore (_) API.

In this guide, we use the _ API to concisely create column expressions and then chain table expressions.

Setup

To get started, import _ from ibis:

import ibis
from ibis import _

import pandas as pd

Let's create two in-memory tables using ibis.memtable, an API introduced in Ibis 3.2:

t1 = ibis.memtable(pd.DataFrame({'x': range(5), 'y': list('ab')*2 + list('e')}))
t2 = ibis.memtable(pd.DataFrame({'x': range(10), 'z': list(reversed(list('ab')*2 + list('e')))*2}))

Creating column expressions

We can use _ to create new column expressions without explicit reference to the previous table expression:

# We can pass a deferred expression into a function:
def modf(t):
    return t.x % 3

xmod = modf(_)

# We can create ColumnExprs like aggregate expressions:
ymax = _.y.max()
zmax = _.z.max()
zct = _.z.count()
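
To build intuition for what "deferred" means here, the following is a minimal toy sketch in plain Python. It is not how Ibis implements _; the Deferred and Column classes, the resolve helper, and the stand-in table are all hypothetical names made up for illustration. The idea it shows is that attribute access, operators, and method calls on _ are only recorded, and are evaluated later once a concrete table is supplied.

```python
from types import SimpleNamespace


class Deferred:
    """Toy deferred expression: wraps a function that takes a table."""

    def __init__(self, fn=lambda table: table):
        self._fn = fn

    def __getattr__(self, name):
        # Guard private names so helpers like copy() do not recurse.
        if name.startswith("_"):
            raise AttributeError(name)
        # _.x means "look up x on whatever table arrives later".
        return Deferred(lambda table: getattr(self._fn(table), name))

    def __mod__(self, other):
        return Deferred(lambda table: self._fn(table) % other)

    def __call__(self, *args, **kwargs):
        return Deferred(lambda table: self._fn(table)(*args, **kwargs))


def resolve(expr, table):
    """Evaluate a toy deferred expression against a concrete table."""
    return expr._fn(table)


class Column:
    """Toy column supporting only the operations used below."""

    def __init__(self, values):
        self.values = list(values)

    def __mod__(self, n):
        return Column(v % n for v in self.values)

    def max(self):
        return max(self.values)


_ = Deferred()  # hypothetical stand-in for ibis._


def modf(t):
    return t.x % 3


# A stand-in "table" with the same columns as t1 in this guide.
table = SimpleNamespace(x=Column(range(5)), y=Column("ababe"))

xmod = modf(_)   # nothing is computed yet
ymax = _.y.max()  # still nothing computed

print(resolve(xmod, table).values)
print(resolve(ymax, table))
```

Note that modf works unchanged on both the placeholder _ and a concrete table, which is the property the guide relies on when it later calls modf(t2) and modf(t1).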

Chaining Ibis expressions

We can also use _ to chain several Ibis expressions into a single Python expression:

join = (
    t1
    # _ is t1
    .join(t2, _.x == t2.x)
    # _ is the join result:
    .mutate(xmod=xmod)
    # _ is the TableExpression after mutate:
    .group_by(_.xmod)
    # `ymax` and `zmax` are ColumnExpressions derived from deferred expressions:
    .aggregate(ymax=ymax, zmax=zmax)
    # _ is the aggregation result:
    .filter(_.ymax == _.zmax)
    # _ is the filtered result; modf(t2) re-creates xmod from t2 for the join predicate:
    .join(t2, _.xmod == modf(t2))
    # _ is the second join result:
    .join(t1, _.xmod == modf(t1))
    # _ is the third join result:
    .select(_.x, _.y, _.z)
    # Finally, _ is the selection result:
    .order_by(_.x)
)
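
The comments in the chain above describe _ as standing for a different table at each step. A plain-Python analogy may make that concrete: here each "deferred expression" is modelled as an ordinary function of a table, and each chained method calls it with the table the method was invoked on. The Table class below is a hypothetical toy written for this sketch, not an Ibis type, and it makes no claim about Ibis internals.

```python
class Table:
    """Toy table: a dict of equal-length column lists (not an Ibis table)."""

    def __init__(self, **columns):
        self._columns = {name: list(values) for name, values in columns.items()}

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for column names.
        if name.startswith("_") or name not in self._columns:
            raise AttributeError(name)
        return self._columns[name]

    def mutate(self, **exprs):
        # Each expression is evaluated against *this* table.
        new_columns = dict(self._columns)
        for name, expr in exprs.items():
            new_columns[name] = expr(self)
        return Table(**new_columns)

    def filter(self, predicate):
        # The predicate is also evaluated against *this* table.
        mask = predicate(self)
        kept = {
            name: [v for v, keep in zip(values, mask) if keep]
            for name, values in self._columns.items()
        }
        return Table(**kept)


def xmod(t):
    return [v % 3 for v in t.x]


def xmod_positive(t):
    return [v > 0 for v in t.xmod]


result = (
    Table(x=range(5))
    # xmod is called with the original table:
    .mutate(xmod=xmod)
    # xmod_positive is called with the mutated table, which now has xmod:
    .filter(xmod_positive)
)

# Applying the predicate too early fails, because xmod does not exist yet.
try:
    Table(x=range(5)).filter(xmod_positive)
    failed_early = False
except AttributeError:
    failed_early = True

print(result.x, result.xmod, failed_early)
```

The same ordering constraint applies in the guide's chain: _.xmod can only be used after the mutate step that creates it.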

Last update: June 22, 2023