Column Selectors

selectors

Convenient column selectors.

Check out the blog post on selectors for examples!

Rationale

Column selectors are convenience functions for selecting columns that share some property.

Discussion

For example, a common task is selecting all numeric columns for a subsequent computation.

Without selectors, this is verbose and tedious to write:

>>> import ibis
>>> t = ibis.table(...)  # doctest: +SKIP
>>> t.select([t[c] for c in t.columns if t[c].type().is_numeric()])  # doctest: +SKIP

Compare that to the numeric selector:

>>> import ibis.selectors as s
>>> t.select(s.numeric())  # doctest: +SKIP

When there are multiple properties to check, it gets worse:

>>> t.select(  # doctest: +SKIP
...     [
...         t[c] for c in t.columns
...         if t[c].type().is_numeric()
...         if ("a" in c or "cd" in c)
...     ]
... )

Composing selectors makes this much less tiresome:

>>> t.select(s.numeric() & s.contains(("a", "cd")))  # doctest: +SKIP

Classes

Predicate

Bases: Selector

Functions
expand(table)

Evaluate self.predicate on every column of table.

Parameters:

Name Type Description Default
table ir.Table

An ibis table expression

required
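
The idea of evaluating a predicate on every column, and of combining two predicates with &, can be pictured with a small pure-Python model. This is only a sketch under stated assumptions: ToyPredicate, its two-argument predicate signature, and the dict-based schema are hypothetical stand-ins for illustration, not the ibis classes or their implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a table schema: column name -> type name.
Schema = Dict[str, str]


@dataclass(frozen=True)
class ToyPredicate:
    """Toy model of a predicate-based selector (not the ibis class)."""

    predicate: Callable[[str, str], bool]

    def expand(self, schema: Schema) -> List[str]:
        # Evaluate the predicate on every column, preserving column order.
        return [
            name for name, dtype in schema.items() if self.predicate(name, dtype)
        ]

    def __and__(self, other: "ToyPredicate") -> "ToyPredicate":
        # A column is kept only if both predicates accept it.
        return ToyPredicate(
            lambda name, dtype: self.predicate(name, dtype)
            and other.predicate(name, dtype)
        )


# Assumed, simplified notions of "numeric" and "contains".
numeric = ToyPredicate(lambda name, dtype: dtype in {"int64", "float64"})
contains_a_or_cd = ToyPredicate(lambda name, dtype: "a" in name or "cd" in name)

schema = {"a": "int64", "bcd": "float64", "e": "int64", "label": "string"}
selected = (numeric & contains_a_or_cd).expand(schema)
print(selected)  # ['a', 'bcd']
```

In this model, "label" is rejected because it is not numeric, and "e" is rejected because its name contains neither "a" nor "cd".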

Functions

across(selector, func, names=None)

Apply data transformations across multiple columns.

Parameters:

Name Type Description Default
selector Selector | Iterable[str] | str

An expression that selects columns on which the transformation function will be applied, an iterable of str column names or a single str column name.

required
func Deferred | Callable[[ir.Value], ir.Value] | Mapping[str | None, Deferred | Callable[[ir.Value], ir.Value]]

A function (or dictionary of functions) to use to transform the data.

required
names str | Callable[[str, str | None], str] | None

A callable or a format string used to name the columns created by the transformation function. A format string may use the {col} and {fn} placeholders, as in the example below.

None

Returns:

Type Description
Across

An Across selector object

Examples:

>>> import ibis
>>> ibis.options.interactive = True
>>> from ibis import _, selectors as s
>>> t = ibis.examples.penguins.fetch()
>>> t.select(s.startswith("bill")).mutate(
...     s.across(
...         s.numeric(),
...         dict(centered=_ - _.mean()),
...         names="{fn}_{col}"
...     )
... )
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━┓
┃ bill_length_mm ┃ bill_depth_mm ┃ centered_bill_length_mm ┃ … ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━┩
│ float64        │ float64       │ float64                 │ … │
├────────────────┼───────────────┼─────────────────────────┼───┤
│           39.1 │          18.7 │                -4.82193 │ … │
│           39.5 │          17.4 │                -4.42193 │ … │
│           40.3 │          18.0 │                -3.62193 │ … │
│            nan │           nan │                     nan │ … │
│           36.7 │          19.3 │                -7.22193 │ … │
│           39.3 │          20.6 │                -4.62193 │ … │
│           38.9 │          17.8 │                -5.02193 │ … │
│           39.2 │          19.6 │                -4.72193 │ … │
│           34.1 │          18.1 │                -9.82193 │ … │
│           42.0 │          20.2 │                -1.92193 │ … │
│              … │             … │                       … │ … │
└────────────────┴───────────────┴─────────────────────────┴───┘
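
The column names in the output above follow from the "{fn}_{col}" format string. The naming step alone can be sketched in plain Python; the column list and function name are copied from the example, while the iteration order over functions and columns is an assumption made for illustration rather than a statement about how ibis orders its output.

```python
# Column names selected in the example above, and the key of the function dict.
columns = ["bill_length_mm", "bill_depth_mm"]
func_names = ["centered"]

# The format string passed as `names` in the example.
names = "{fn}_{col}"

# Assumed order: for each function, name one new column per selected column.
new_names = [
    names.format(fn=fn, col=col) for fn in func_names for col in columns
]
print(new_names)  # ['centered_bill_length_mm', 'centered_bill_depth_mm']
```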

all()

Return every column from a table.

all_of(*predicates)

Include columns satisfying all of predicates.

any_of(*predicates)

Include columns satisfying any of predicates.
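
The difference between all_of and any_of can be illustrated with a pure-Python sketch that works on column names only. The helpers toy_all_of and toy_any_of and the name-based predicates are hypothetical and exist only for this illustration; they are not the ibis functions.

```python
def toy_all_of(*predicates):
    # Keep a column only if every predicate accepts it.
    return lambda name: all(p(name) for p in predicates)


def toy_any_of(*predicates):
    # Keep a column if at least one predicate accepts it.
    return lambda name: any(p(name) for p in predicates)


def is_bill(name):
    return name.startswith("bill")


def is_mm(name):
    return name.endswith("_mm")


columns = ["bill_length_mm", "flipper_length_mm", "body_mass_g"]

both = [c for c in columns if toy_all_of(is_bill, is_mm)(c)]
either = [c for c in columns if toy_any_of(is_bill, is_mm)(c)]
print(both)    # ['bill_length_mm']
print(either)  # ['bill_length_mm', 'flipper_length_mm']
```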

c(*names)

Select specific column names.

contains(needles, how=any)

Return columns whose name contains needles.

Parameters:

Name Type Description Default
needles str | tuple[str, ...]

One or more strings to search for in column names

required
how Callable[[Iterable[bool]], bool]

A boolean reduction, such as any or all, that determines how the matches for the individual needles are combined.

any

Examples:

Select columns that contain either "a" or "b":

>>> import ibis
>>> import ibis.selectors as s
>>> t = ibis.table(dict(a="int64", b="string", c="float", d="array<int16>", ab="struct<x: int>"))
>>> expr = t.select(s.contains(("a", "b")))
>>> expr.columns
['a', 'b', 'ab']

Select columns that contain both "a" and "b"; every needle must appear in a column's name for it to match:

>>> expr = t.select(s.contains(("a", "b"), how=all))
>>> expr.columns
['ab']
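
The role of how can be pictured as a reduction over one membership test per needle. The following is a minimal sketch of that idea on plain strings; toy_contains is a hypothetical helper, not the ibis implementation, and the column names are taken from the examples above.

```python
def toy_contains(needles, how=any):
    # Accept a single string as well as a tuple of strings.
    if isinstance(needles, str):
        needles = (needles,)
    # One membership test per needle, combined by the `how` reduction.
    return lambda name: how(needle in name for needle in needles)


columns = ["a", "b", "c", "d", "ab"]

any_match = [c for c in columns if toy_contains(("a", "b"))(c)]
all_match = [c for c in columns if toy_contains(("a", "b"), how=all)(c)]
print(any_match)  # ['a', 'b', 'ab']
print(all_match)  # ['ab']
```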

See Also

matches

endswith(suffixes)

Select columns whose name ends with one of suffixes.

Parameters:

Name Type Description Default
suffixes str | tuple[str, ...]

Suffixes to compare column names against

required
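
Because suffixes may be a single string or a tuple of strings, the matching rule resembles Python's own str.endswith, which accepts both forms. The sketch below models only that rule on plain column names; toy_endswith is a hypothetical helper and the column names are illustrative.

```python
def toy_endswith(suffixes):
    # str.endswith accepts either one suffix or a tuple of suffixes.
    return lambda name: name.endswith(suffixes)


columns = ["bill_length_mm", "bill_depth_mm", "body_mass_g", "year"]

one_suffix = [c for c in columns if toy_endswith("_mm")(c)]
two_suffixes = [c for c in columns if toy_endswith(("_mm", "_g"))(c)]
print(one_suffix)    # ['bill_length_mm', 'bill_depth_mm']
print(two_suffixes)  # ['bill_length_mm', 'bill_depth_mm', 'body_mass_g']
```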

See Also

startswith

first()

Return the first column of a table.

if_all(selector, predicate)

Return the conjunction of predicate applied on all selector columns.

Parameters:

Name Type Description Default
selector Selector

A column selector

required
predicate Deferred | Callable

A callable or deferred object defining a predicate to apply to each column from selector.

required

Examples:

>>> import ibis
>>> from ibis import selectors as s, _
>>> ibis.options.interactive = True
>>> penguins = ibis.examples.penguins.fetch()
>>> cols = s.across(s.endswith("_mm"), (_ - _.mean()) / _.std())
>>> expr = penguins.mutate(cols).filter(s.if_all(s.endswith("_mm"), _.abs() > 1))
>>> expr_by_hand = penguins.mutate(cols).filter(
...     (_.bill_length_mm.abs() > 1)
...     & (_.bill_depth_mm.abs() > 1)
...     & (_.flipper_length_mm.abs() > 1)
... )
>>> expr.equals(expr_by_hand)
True
>>> expr
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ … ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━┩
│ string  │ string    │ float64        │ float64       │ float64           │ … │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼───┤
│ Adelie  │ Dream     │      -1.157951 │      1.088129 │         -1.416272 │ … │
│ Adelie  │ Torgersen │      -1.231217 │      1.138768 │         -1.202926 │ … │
│ Gentoo  │ Biscoe    │       1.149917 │     -1.443781 │          1.214987 │ … │
│ Gentoo  │ Biscoe    │       1.040019 │     -1.089314 │          1.072757 │ … │
│ Gentoo  │ Biscoe    │       1.131601 │     -1.089314 │          1.712792 │ … │
│ Gentoo  │ Biscoe    │       1.241499 │     -1.089314 │          1.570562 │ … │
│ Gentoo  │ Biscoe    │       1.351398 │     -1.494420 │          1.214987 │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴───┘

if_any(selector, predicate)

Return the disjunction of predicate applied on all selector columns.

Parameters:

Name Type Description Default
selector Selector

A column selector

required
predicate Deferred | Callable

A callable or deferred object defining a predicate to apply to each column from selector.

required

Examples:

>>> import ibis
>>> from ibis import selectors as s, _
>>> ibis.options.interactive = True
>>> penguins = ibis.examples.penguins.fetch()
>>> cols = s.across(s.endswith("_mm"), (_ - _.mean()) / _.std())
>>> expr = penguins.mutate(cols).filter(s.if_any(s.endswith("_mm"), _.abs() > 2))
>>> expr_by_hand = penguins.mutate(cols).filter(
...     (_.bill_length_mm.abs() > 2)
...     | (_.bill_depth_mm.abs() > 2)
...     | (_.flipper_length_mm.abs() > 2)
... )
>>> expr.equals(expr_by_hand)
True
>>> expr
┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ … ┃
┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━┩
│ string  │ string │ float64        │ float64       │ float64           │ … │
├─────────┼────────┼────────────────┼───────────────┼───────────────────┼───┤
│ Adelie  │ Biscoe │      -1.103002 │      0.733662 │         -2.056307 │ … │
│ Gentoo  │ Biscoe │       1.113285 │     -0.431017 │          2.068368 │ … │
│ Gentoo  │ Biscoe │       2.871660 │     -0.076550 │          2.068368 │ … │
│ Gentoo  │ Biscoe │       1.900890 │     -0.734846 │          2.139483 │ … │
│ Gentoo  │ Biscoe │       1.076652 │     -0.177826 │          2.068368 │ … │
│ Gentoo  │ Biscoe │       0.856855 │     -0.582932 │          2.068368 │ … │
│ Gentoo  │ Biscoe │       1.497929 │     -0.076550 │          2.068368 │ … │
│ Gentoo  │ Biscoe │       1.388031 │     -0.431017 │          2.068368 │ … │
│ Gentoo  │ Biscoe │       2.047422 │     -0.582932 │          2.068368 │ … │
│ Adelie  │ Dream  │      -2.165354 │     -0.836123 │         -0.918466 │ … │
│ …       │ …      │              … │             … │                 … │ … │
└─────────┴────────┴────────────────┴───────────────┴───────────────────┴───┘
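
The conjunction built by if_all and the disjunction built by if_any can be pictured as folding per-column predicates with & and |, as in the hand-written filters above. The sketch below applies that idea to plain Python rows. The helpers toy_if_all and toy_if_any and the row values are made up for illustration; they are not ibis functions or penguins data.

```python
import operator
from functools import reduce

# Made-up standardized measurements, one dict per row.
rows = [
    {"bill_length_mm": -1.2, "bill_depth_mm": 1.1, "flipper_length_mm": -1.4},
    {"bill_length_mm": 0.3, "bill_depth_mm": -2.4, "flipper_length_mm": 0.1},
    {"bill_length_mm": 0.2, "bill_depth_mm": 0.5, "flipper_length_mm": -0.7},
]
selected = [name for name in rows[0] if name.endswith("_mm")]


def toy_if_all(columns, predicate):
    # Conjunction: fold the per-column results with `&`.
    return lambda row: reduce(operator.and_, (predicate(row[c]) for c in columns))


def toy_if_any(columns, predicate):
    # Disjunction: fold the per-column results with `|`.
    return lambda row: reduce(operator.or_, (predicate(row[c]) for c in columns))


kept_all = [r for r in rows if toy_if_all(selected, lambda v: abs(v) > 1)(r)]
kept_any = [r for r in rows if toy_if_any(selected, lambda v: abs(v) > 2)(r)]
print(len(kept_all), len(kept_any))  # 1 1
```

Only the first row has every measurement above 1 in absolute value, and only the second row has at least one measurement above 2.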

last()

Return the last column of a table.

matches(regex)

Return columns whose name matches the regular expression regex.

Parameters:

Name Type Description Default
regex str | re.Pattern

A string or re.Pattern object

required

Examples:

>>> import ibis
>>> import ibis.selectors as s
>>> t = ibis.table(dict(ab="string", abd="int", be="array<string>"))
>>> expr = t.select(s.matches(r"ab+"))
>>> expr.columns
['ab', 'abd']
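
The result above is consistent with the pattern being searched for anywhere in the column name rather than required to match the whole name. The following sketch assumes search semantics, which is an assumption based on that example and not a confirmed description of the ibis implementation; toy_matches is a hypothetical helper.

```python
import re


def toy_matches(regex):
    # re.compile accepts both a pattern string and a compiled pattern.
    pattern = re.compile(regex)
    # Assumed semantics: the pattern may match anywhere in the name.
    return lambda name: pattern.search(name) is not None


columns = ["ab", "abd", "be"]

searched = [c for c in columns if toy_matches(r"ab+")(c)]
anchored = [c for c in columns if toy_matches(r"^ab$")(c)]
print(searched)  # ['ab', 'abd']
print(anchored)  # ['ab']
```

Under this assumption, anchoring the pattern with ^ and $ is how a whole-name match would be expressed.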

See Also

contains

numeric()

Return numeric columns.

Examples:

>>> import ibis
>>> import ibis.selectors as s
>>> t = ibis.table(dict(a="int", b="string", c="array<string>"), name="t")
>>> t
UnboundTable: t
  a int64
  b string
  c array<string>
>>> expr = t.select(s.numeric())  # `a` has integer type, so it's numeric
>>> expr.columns
['a']

See Also

of_type

of_type(dtype)

Select columns of type dtype.

Parameters:

Name Type Description Default
dtype dt.DataType | str | type[dt.DataType]

DataType instance, str or DataType class

required

Examples:

Select according to a specific DataType instance:

>>> import ibis
>>> import ibis.expr.datatypes as dt
>>> import ibis.selectors as s
>>> t = ibis.table(dict(name="string", siblings="array<string>", parents="array<int64>"))
>>> expr = t.select(s.of_type(dt.Array(dt.string)))
>>> expr.columns
['siblings']

Strings are also accepted:

>>> expr = t.select(s.of_type("array<string>"))
>>> expr.columns
['siblings']

Abstract/unparametrized types may also be specified by their string name (e.g. "integer" for any integer type), or by passing in a DataType class instead. The following options are equivalent.

>>> expr1 = t.select(s.of_type("array"))
>>> expr2 = t.select(s.of_type(dt.Array))
>>> expr1.equals(expr2)
True
>>> expr2.columns
['siblings', 'parents']

See Also

numeric

startswith(prefixes)

Select columns whose name starts with one of prefixes.

Parameters:

Name Type Description Default
prefixes str | tuple[str, ...]

Prefixes to compare column names against

required

Examples:

>>> import ibis
>>> import ibis.selectors as s
>>> t = ibis.table(dict(apples="int", oranges="float", bananas="bool"), name="t")
>>> expr = t.select(s.startswith(("a", "b")))
>>> expr.columns
['apples', 'bananas']

See Also

endswith

where(predicate)

Select columns that satisfy predicate.

Use this selector when none of the other selectors meets your needs.

Parameters:

Name Type Description Default
predicate Callable[[ir.Value], bool]

A callable that accepts an ibis value expression and returns a bool

required

Examples:

>>> import ibis
>>> import ibis.selectors as s
>>> t = ibis.table(dict(a="float32"), name="t")
>>> expr = t.select(s.where(lambda col: col.get_name() == "a"))
>>> expr.columns
['a']

Last update: June 22, 2023