Column Selectors¶
selectors
¶
Convenient column selectors.
Check out the blog post on selectors for examples!
Rationale¶
Column selectors are convenience functions for selecting columns that share some property.
Discussion¶
For example, a common task is to be able to select all numeric columns for a subsequent computation.
Without selectors this becomes quite verbose and tedious to write:
>>> import ibis
>>> t = ibis.table(...) # doctest: +SKIP
>>> t.select([t[c] for c in t.columns if t[c].type().is_numeric()]) # doctest: +SKIP
Compare that to the numeric selector:
>>> import ibis.selectors as s
>>> t.select(s.numeric()) # doctest: +SKIP
When there are multiple properties to check it gets worse:
>>> t.select(  # doctest: +SKIP
...     [
...         t[c] for c in t.columns
...         if t[c].type().is_numeric()
...         if ("a" in c or "cd" in c)
...     ]
... )
Using a composition of selectors this is much less tiresome:
>>> t.select(s.numeric() & s.contains(("a", "cd"))) # doctest: +SKIP
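The idea behind composing selectors with & can be pictured with a small pure-Python model. This is only a sketch and not how ibis implements selectors: the Pred class, the schema dictionary, and the two predicates are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a table schema: column name -> type name.
schema = {"a": "int64", "abc": "string", "cd": "float64", "x": "float64"}


@dataclass(frozen=True)
class Pred:
    """Toy column predicate over (name, type name); not an ibis class."""

    fn: Callable[[str, str], bool]

    def __and__(self, other: "Pred") -> "Pred":
        # A column passes the combined predicate only if it passes both parts.
        return Pred(lambda name, typ: self.fn(name, typ) and other.fn(name, typ))

    def select(self, schema: dict) -> list:
        # Keep the names of the columns accepted by the predicate.
        return [name for name, typ in schema.items() if self.fn(name, typ)]


numeric = Pred(lambda name, typ: typ in {"int64", "float64"})
contains_a_or_cd = Pred(lambda name, typ: "a" in name or "cd" in name)

selected = (numeric & contains_a_or_cd).select(schema)
print(selected)
```

Each predicate stays small and reusable, and the combined condition reads the same way as the selector expression above.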
Classes¶
Predicate
¶
Functions¶
across(selector, func, names=None)
¶
Apply data transformations across multiple columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
selector | Selector \| Iterable[str] \| str | An expression that selects the columns to which the transformation function will be applied: a selector, an iterable of column names, or a single column name. | required |
func | Deferred \| Callable[[ir.Value], ir.Value] \| Mapping[str \| None, Deferred \| Callable[[ir.Value], ir.Value]] | A function (or dictionary of functions) to use to transform the data. | required |
names | str \| Callable[[str, str \| None], str] \| None | A lambda function or a format string to name the columns created by the transformation function. | None |
Returns:
Type | Description |
---|---|
Across | An Across selector object. |
Examples:
>>> import ibis
>>> ibis.options.interactive = True
>>> from ibis import _, selectors as s
>>> t = ibis.examples.penguins.fetch()
>>> t.select(s.startswith("bill")).mutate(
...     s.across(
...         s.numeric(),
...         dict(centered=_ - _.mean()),
...         names="{fn}_{col}",
...     )
... )
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━┓
┃ bill_length_mm ┃ bill_depth_mm ┃ centered_bill_length_mm ┃ … ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━┩
│ float64 │ float64 │ float64 │ … │
├────────────────┼───────────────┼─────────────────────────┼───┤
│ 39.1 │ 18.7 │ -4.82193 │ … │
│ 39.5 │ 17.4 │ -4.42193 │ … │
│ 40.3 │ 18.0 │ -3.62193 │ … │
│ nan │ nan │ nan │ … │
│ 36.7 │ 19.3 │ -7.22193 │ … │
│ 39.3 │ 20.6 │ -4.62193 │ … │
│ 38.9 │ 17.8 │ -5.02193 │ … │
│ 39.2 │ 19.6 │ -4.72193 │ … │
│ 34.1 │ 18.1 │ -9.82193 │ … │
│ 42.0 │ 20.2 │ -1.92193 │ … │
│ … │ … │ … │ … │
└────────────────┴───────────────┴─────────────────────────┴───┘
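The format string passed as names in the example above can be read as an ordinary Python format string with two fields, fn (the key of the function in the dictionary) and col (the original column name). The following sketch reproduces only the naming step with plain string formatting; the variable names are illustrative and it does not call ibis.

```python
# Column names and the function label are taken from the example above.
columns = ["bill_length_mm", "bill_depth_mm"]
fn_labels = ["centered"]

# Same format string as the `names` argument in the example.
names = "{fn}_{col}"

# One output name per (column, function) pair.
new_names = [names.format(fn=fn, col=col) for col in columns for fn in fn_labels]
print(new_names)
```

This matches the centered_bill_length_mm column shown in the output table.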
all()
¶
Return every column from a table.
all_of(*predicates)
¶
Include columns satisfying all of predicates.
any_of(*predicates)
¶
Include columns satisfying any of predicates.
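As a rough mental model, all_of and any_of behave like Python's built-in all and any applied across several predicates. The sketch below works on plain column-name strings rather than ibis selectors; toy_all_of, toy_any_of, and the sample names are hypothetical and only illustrate the semantics.

```python
from typing import Callable

NamePredicate = Callable[[str], bool]  # toy predicate over a column name


def toy_all_of(*predicates: NamePredicate) -> NamePredicate:
    # Accept a name only when every predicate accepts it.
    return lambda name: all(p(name) for p in predicates)


def toy_any_of(*predicates: NamePredicate) -> NamePredicate:
    # Accept a name when at least one predicate accepts it.
    return lambda name: any(p(name) for p in predicates)


names = ["bill_length_mm", "bill_depth_mm", "body_mass_g", "year"]


def starts_bill(name: str) -> bool:
    return name.startswith("bill")


def ends_mm(name: str) -> bool:
    return name.endswith("_mm")


def ends_g(name: str) -> bool:
    return name.endswith("_g")


both = [n for n in names if toy_all_of(starts_bill, ends_mm)(n)]
either = [n for n in names if toy_any_of(starts_bill, ends_g)(n)]
print(both, either)
```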
c(*names)
¶
Select specific column names.
contains(needles, how=any)
¶
Return columns whose name contains needles.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
needles | str \| tuple[str, ...] | One or more strings to search for in column names. | required |
how | Callable[[Iterable[bool]], bool] | A boolean reduction, such as any or all, that configures how the matches for the individual needles are combined. | any |
Examples:
Select columns that contain either "a" or "b":
>>> import ibis
>>> import ibis.selectors as s
>>> t = ibis.table(dict(a="int64", b="string", c="float", d="array<int16>", ab="struct<x: int>"))
>>> expr = t.select(s.contains(("a", "b")))
>>> expr.columns
['a', 'b', 'ab']
Select columns that contain all of "a" and "b"; that is, both "a" and "b" must be in each column's name to match.
>>> expr = t.select(s.contains(("a", "b"), how=all))
>>> expr.columns
['ab']
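The role of how can be sketched in plain Python: each needle yields one boolean per column name, and how reduces those booleans to a single decision. The helper name_contains below is a hypothetical illustration of that reduction, not the ibis implementation; the column names mirror the example above.

```python
from typing import Callable, Iterable, Tuple


def name_contains(
    name: str,
    needles: Tuple[str, ...],
    how: Callable[[Iterable[bool]], bool] = any,
) -> bool:
    # One boolean per needle, reduced by `how` (for example any or all).
    return how(needle in name for needle in needles)


names = ["a", "b", "c", "d", "ab"]
needles = ("a", "b")

any_match = [n for n in names if name_contains(n, needles)]
all_match = [n for n in names if name_contains(n, needles, how=all)]
print(any_match, all_match)
```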
endswith(suffixes)
¶
first()
¶
Return the first column of a table.
if_all(selector, predicate)
¶
Return the conjunction of predicate applied on all selector columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
selector | Selector | A column selector. | required |
predicate | Deferred \| Callable | A callable or deferred object defining a predicate to apply to each column from selector. | required |
Examples:
>>> import ibis
>>> from ibis import selectors as s, _
>>> ibis.options.interactive = True
>>> penguins = ibis.examples.penguins.fetch()
>>> cols = s.across(s.endswith("_mm"), (_ - _.mean()) / _.std())
>>> expr = penguins.mutate(cols).filter(s.if_all(s.endswith("_mm"), _.abs() > 1))
>>> expr_by_hand = penguins.mutate(cols).filter(
...     (_.bill_length_mm.abs() > 1)
...     & (_.bill_depth_mm.abs() > 1)
...     & (_.flipper_length_mm.abs() > 1)
... )
>>> expr.equals(expr_by_hand)
True
>>> expr
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ … ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━┩
│ string │ string │ float64 │ float64 │ float64 │ … │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼───┤
│ Adelie │ Dream │ -1.157951 │ 1.088129 │ -1.416272 │ … │
│ Adelie │ Torgersen │ -1.231217 │ 1.138768 │ -1.202926 │ … │
│ Gentoo │ Biscoe │ 1.149917 │ -1.443781 │ 1.214987 │ … │
│ Gentoo │ Biscoe │ 1.040019 │ -1.089314 │ 1.072757 │ … │
│ Gentoo │ Biscoe │ 1.131601 │ -1.089314 │ 1.712792 │ … │
│ Gentoo │ Biscoe │ 1.241499 │ -1.089314 │ 1.570562 │ … │
│ Gentoo │ Biscoe │ 1.351398 │ -1.494420 │ 1.214987 │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴───┘
if_any(selector, predicate)
¶
Return the disjunction of predicate applied on all selector columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
selector | Selector | A column selector. | required |
predicate | Deferred \| Callable | A callable or deferred object defining a predicate to apply to each column from selector. | required |
Examples:
>>> import ibis
>>> from ibis import selectors as s, _
>>> ibis.options.interactive = True
>>> penguins = ibis.examples.penguins.fetch()
>>> cols = s.across(s.endswith("_mm"), (_ - _.mean()) / _.std())
>>> expr = penguins.mutate(cols).filter(s.if_any(s.endswith("_mm"), _.abs() > 2))
>>> expr_by_hand = penguins.mutate(cols).filter(
...     (_.bill_length_mm.abs() > 2)
...     | (_.bill_depth_mm.abs() > 2)
...     | (_.flipper_length_mm.abs() > 2)
... )
>>> expr.equals(expr_by_hand)
True
>>> expr
┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ … ┃
┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━┩
│ string │ string │ float64 │ float64 │ float64 │ … │
├─────────┼────────┼────────────────┼───────────────┼───────────────────┼───┤
│ Adelie │ Biscoe │ -1.103002 │ 0.733662 │ -2.056307 │ … │
│ Gentoo │ Biscoe │ 1.113285 │ -0.431017 │ 2.068368 │ … │
│ Gentoo │ Biscoe │ 2.871660 │ -0.076550 │ 2.068368 │ … │
│ Gentoo │ Biscoe │ 1.900890 │ -0.734846 │ 2.139483 │ … │
│ Gentoo │ Biscoe │ 1.076652 │ -0.177826 │ 2.068368 │ … │
│ Gentoo │ Biscoe │ 0.856855 │ -0.582932 │ 2.068368 │ … │
│ Gentoo │ Biscoe │ 1.497929 │ -0.076550 │ 2.068368 │ … │
│ Gentoo │ Biscoe │ 1.388031 │ -0.431017 │ 2.068368 │ … │
│ Gentoo │ Biscoe │ 2.047422 │ -0.582932 │ 2.068368 │ … │
│ Adelie │ Dream │ -2.165354 │ -0.836123 │ -0.918466 │ … │
│ … │ … │ … │ … │ … │ … │
└─────────┴────────┴────────────────┴───────────────┴───────────────────┴───┘
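The difference between if_all and if_any comes down to whether the per-column results are folded with "and" or with "or". The sketch below shows that fold on plain Python dictionaries. The rows contain made-up standardized values, and toy_if_all and toy_if_any are hypothetical helpers, not ibis functions.

```python
import operator
from functools import reduce

# Made-up standardized measurements, one dictionary per row.
rows = [
    {"bill_length_mm": -1.16, "bill_depth_mm": 1.09, "flipper_length_mm": -1.42},
    {"bill_length_mm": 2.87, "bill_depth_mm": -0.08, "flipper_length_mm": 2.07},
    {"bill_length_mm": 0.30, "bill_depth_mm": -0.50, "flipper_length_mm": 0.10},
]

# Plays the role of the selector: every column whose name ends with "_mm".
cols = [name for name in rows[0] if name.endswith("_mm")]


def toy_if_all(row, cols, predicate):
    # Conjunction: fold the per-column results with "and".
    return reduce(operator.and_, (predicate(row[c]) for c in cols))


def toy_if_any(row, cols, predicate):
    # Disjunction: fold the per-column results with "or".
    return reduce(operator.or_, (predicate(row[c]) for c in cols))


kept_all = [i for i, r in enumerate(rows) if toy_if_all(r, cols, lambda v: abs(v) > 1)]
kept_any = [i for i, r in enumerate(rows) if toy_if_any(r, cols, lambda v: abs(v) > 2)]
print(kept_all, kept_any)
```

Only the first row has every value beyond 1 in absolute terms, and only the second row has at least one value beyond 2.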
last()
¶
Return the last column of a table.
matches(regex)
¶
Return columns whose name matches the regular expression regex.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
regex | str \| re.Pattern | A string or compiled re.Pattern to match against column names. | required |
Examples:
>>> import ibis
>>> import ibis.selectors as s
>>> t = ibis.table(dict(ab="string", abd="int", be="array<string>"))
>>> expr = t.select(s.matches(r"ab+"))
>>> expr.columns
['ab', 'abd']
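The example above selects abd for the pattern ab+, so the pattern evidently does not have to match the whole column name. The sketch below reproduces the result with the standard re module, assuming search-style matching (a match anywhere in the name); that assumption and the helper toy_matches are illustrative, not taken from the ibis source.

```python
import re

names = ["ab", "abd", "be"]


def toy_matches(names, regex):
    # re.compile accepts a pattern string or an already compiled pattern.
    pattern = re.compile(regex)
    # Assumption: a name is kept if the pattern matches anywhere in it.
    return [name for name in names if pattern.search(name)]


from_string = toy_matches(names, r"ab+")
from_pattern = toy_matches(names, re.compile(r"ab+"))
anchored = toy_matches(names, r"^ab$")  # anchors restrict to the whole name
print(from_string, from_pattern, anchored)
```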
numeric()
¶
Return numeric columns.
Examples:
>>> import ibis
>>> import ibis.selectors as s
>>> t = ibis.table(dict(a="int", b="string", c="array<string>"), name="t")
>>> t
UnboundTable: t
a int64
b string
c array<string>
>>> expr = t.select(s.numeric()) # `a` has integer type, so it's numeric
>>> expr.columns
['a']
of_type(dtype)
¶
Select columns of type dtype.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dtype | dt.DataType \| str \| type[dt.DataType] | The data type to select: a DataType instance, a type string, or a DataType class. | required |
Examples:
Select according to a specific DataType instance:
>>> import ibis
>>> import ibis.expr.datatypes as dt
>>> import ibis.selectors as s
>>> t = ibis.table(dict(name="string", siblings="array<string>", parents="array<int64>"))
>>> expr = t.select(s.of_type(dt.Array(dt.string)))
>>> expr.columns
['siblings']
Strings are also accepted:
>>> expr = t.select(s.of_type("array<string>"))
>>> expr.columns
['siblings']
Abstract/unparametrized types may also be specified by their string name (e.g. "integer" for any integer type), or by passing in a DataType class instead. The following options are equivalent.
>>> expr1 = t.select(s.of_type("array"))
>>> expr2 = t.select(s.of_type(dt.Array))
>>> expr1.equals(expr2)
True
>>> expr2.columns
['siblings', 'parents']
startswith(prefixes)
¶
Select columns whose name starts with one of prefixes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prefixes | str \| tuple[str, ...] | Prefixes to compare column names against. | required |
Examples:
>>> import ibis
>>> import ibis.selectors as s
>>> t = ibis.table(dict(apples="int", oranges="float", bananas="bool"), name="t")
>>> expr = t.select(s.startswith(("a", "b")))
>>> expr.columns
['apples', 'bananas']
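The str | tuple[str, ...] parameter type mirrors Python's own str.startswith, which accepts either a single string or a tuple of alternatives. The sketch below reproduces the result above with plain string methods. The endswith line assumes that endswith(suffixes), listed earlier without a description, behaves symmetrically; that is an assumption for illustration.

```python
names = ["apples", "oranges", "bananas"]

# A tuple means "starts with any of these prefixes".
by_prefix = [name for name in names if name.startswith(("a", "b"))]

# A single string is accepted as well.
by_single_prefix = [name for name in names if name.startswith("o")]

# Assumed mirror image for suffixes.
by_suffix = [name for name in names if name.endswith("es")]

print(by_prefix, by_single_prefix, by_suffix)
```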
where(predicate)
¶
Select columns that satisfy predicate.
Use this selector when one of the other selectors does not meet your needs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predicate | Callable[[ir.Value], bool] | A callable that accepts an ibis value expression and returns a bool. | required |
Examples:
>>> import ibis
>>> import ibis.selectors as s
>>> t = ibis.table(dict(a="float32"), name="t")
>>> expr = t.select(s.where(lambda col: col.get_name() == "a"))
>>> expr.columns
['a']