Column Selectors
selectors
  Convenient column selectors.
Check out the blog post on selectors for examples!
Rationale
Column selectors are convenience functions for selecting columns that share some property.
Discussion
For example, a common task is selecting all numeric columns for a subsequent computation.
Without selectors, this is verbose and tedious to write:
>>> import ibis
>>> t = ibis.table(...)  # doctest: +SKIP
>>> t.select([t[c] for c in t.columns if t[c].type().is_numeric()])  # doctest: +SKIP
Compare that to the numeric selector:
>>> import ibis.selectors as s
>>> t.select(s.numeric())  # doctest: +SKIP
When there are multiple properties to check it gets worse:
>>> t.select(  # doctest: +SKIP
...     [
...         t[c] for c in t.columns
...         if t[c].type().is_numeric()
...         if ("a" in c or "cd" in c)
...     ]
... )
Using a composition of selectors this is much less tiresome:
>>> t.select(s.numeric() & s.contains(("a", "cd")))  # doctest: +SKIP
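To make the composition concrete, here is a minimal plain-Python model of what `s.numeric() & s.contains(("a", "cd"))` expresses. It does not use ibis and is not how ibis implements selectors: the `Selector` class, the `schema` dictionary, `NUMERIC_TYPES`, and the `expand` method are all hypothetical stand-ins, and only the `&` operator shown above is modelled.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical schema: column name -> type name (a stand-in for a table).
schema = {"a": "int64", "b": "string", "acd": "float64", "cd_name": "string"}

# Assumption: just enough "numeric" type names for this sketch.
NUMERIC_TYPES = {"int64", "float64"}


@dataclass(frozen=True)
class Selector:
    """Wraps a predicate over (column name, type name)."""

    predicate: Callable[[str, str], bool]

    def __and__(self, other: Selector) -> Selector:
        # A column is kept only if both predicates accept it.
        return Selector(
            lambda name, dtype: self.predicate(name, dtype)
            and other.predicate(name, dtype)
        )

    def expand(self, schema: dict[str, str]) -> list[str]:
        # Resolve the selector to concrete column names, in schema order.
        return [n for n, d in schema.items() if self.predicate(n, d)]


def numeric() -> Selector:
    return Selector(lambda name, dtype: dtype in NUMERIC_TYPES)


def contains(needles: tuple[str, ...]) -> Selector:
    return Selector(lambda name, dtype: any(n in name for n in needles))


selected = (numeric() & contains(("a", "cd"))).expand(schema)
print(selected)
```

Each selector stays a small, reusable predicate, and the conjunction replaces the stacked `if` clauses of the list comprehension.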
Classes
Predicate
Functions
across(selector, func, names=None)
  Apply data transformations across multiple columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| selector | Selector \| Iterable[str] \| str | An expression that selects columns on which the transformation function will be applied, an iterable of column names, or a single column name. | required |
| func | Deferred \| Callable[[ir.Value], ir.Value] \| Mapping[str \| None, Deferred \| Callable[[ir.Value], ir.Value]] | A function (or dictionary of functions) to use to transform the data. | required |
| names | str \| Callable[[str, str \| None], str] \| None | A lambda function or a format string to name the columns created by the transformation function. | None |
Returns:
| Type | Description |
|---|---|
| Across | An Across selector object. |
Examples:
>>> import ibis
>>> ibis.options.interactive = True
>>> from ibis import _, selectors as s
>>> t = ibis.examples.penguins.fetch()
>>> t.select(s.startswith("bill")).mutate(
...     s.across(
...         s.numeric(),
...         dict(centered=_ - _.mean()),
...         names="{fn}_{col}",
...     )
... )
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━┓
┃ bill_length_mm ┃ bill_depth_mm ┃ centered_bill_length_mm ┃ … ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━┩
│ float64        │ float64       │ float64                 │ … │
├────────────────┼───────────────┼─────────────────────────┼───┤
│           39.1 │          18.7 │                -4.82193 │ … │
│           39.5 │          17.4 │                -4.42193 │ … │
│           40.3 │          18.0 │                -3.62193 │ … │
│            nan │           nan │                     nan │ … │
│           36.7 │          19.3 │                -7.22193 │ … │
│           39.3 │          20.6 │                -4.62193 │ … │
│           38.9 │          17.8 │                -5.02193 │ … │
│           39.2 │          19.6 │                -4.72193 │ … │
│           34.1 │          18.1 │                -9.82193 │ … │
│           42.0 │          20.2 │                -1.92193 │ … │
│              … │             … │                       … │ … │
└────────────────┴───────────────┴─────────────────────────┴───┘
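The `names="{fn}_{col}"` format string in the example above determines the output column names. The sketch below reproduces that naming step with plain `str.format`; it is an illustration of the pattern, not ibis code, and the `funcs` and `columns` lists are written out by hand here rather than derived from a table.

```python
# The format string from the example above; {fn} and {col} are its two fields.
names = "{fn}_{col}"

# Keys of the dict passed as `func`, and the columns picked by the selector.
funcs = ["centered"]
columns = ["bill_length_mm", "bill_depth_mm"]

# One output name per (column, function) pair.
new_names = [names.format(fn=fn, col=col) for col in columns for fn in funcs]
print(new_names)  # ['centered_bill_length_mm', 'centered_bill_depth_mm']
```

The first generated name matches the `centered_bill_length_mm` column visible in the output table.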
all()
  Return every column from a table.
all_of(*predicates)
  Include columns satisfying all of predicates.
any_of(*predicates)
  Include columns satisfying any of predicates.
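The difference between `all_of` and `any_of` is conjunction versus disjunction of the given predicates. A minimal sketch of that logic over plain column names follows; the functions here are hypothetical stand-ins that share only their names with the ibis selectors, and the predicates are ordinary Python callables rather than selector objects.

```python
from typing import Callable

# Hypothetical: a predicate over a column name.
ColumnPredicate = Callable[[str], bool]


def all_of(*predicates: ColumnPredicate) -> ColumnPredicate:
    # Keep a column only when every predicate accepts it.
    return lambda name: all(p(name) for p in predicates)


def any_of(*predicates: ColumnPredicate) -> ColumnPredicate:
    # Keep a column when at least one predicate accepts it.
    return lambda name: any(p(name) for p in predicates)


columns = ["bill_length_mm", "bill_depth_mm", "body_mass_g", "species"]


def starts_bill(name: str) -> bool:
    return name.startswith("bill")


def ends_mm(name: str) -> bool:
    return name.endswith("_mm")


def ends_g(name: str) -> bool:
    return name.endswith("_g")


both = [c for c in columns if all_of(starts_bill, ends_mm)(c)]
either = [c for c in columns if any_of(starts_bill, ends_g)(c)]
print(both)
print(either)
```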
c(*names)
  Select specific column names.
contains(needles, how=any)
  Return columns whose name contains needles.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| needles | str \| tuple[str, ...] | One or more strings to search for in column names. | required |
| how | Callable[[Iterable[bool]], bool] | A boolean reduction that configures how the per-needle matches are combined. | any |
Examples:
Select columns that contain either "a" or "b"
>>> import ibis
>>> import ibis.selectors as s
>>> t = ibis.table(dict(a="int64", b="string", c="float", d="array<int16>", ab="struct<x: int>"))
>>> expr = t.select(s.contains(("a", "b")))
>>> expr.columns
['a', 'b', 'ab']
Select columns that contain all of "a" and "b", that is, both "a" and
"b" must be in each column's name to match.
>>> expr = t.select(s.contains(("a", "b"), how=all))
>>> expr.columns
['ab']
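The role of `how` can be modelled in a few lines of plain Python: each needle yields one boolean per column name, and `how` reduces those booleans to a single decision. This is a sketch of the semantics only, not the ibis implementation; in particular, treating a bare string as a single needle is an assumption based on the `str | tuple[str, ...]` type above.

```python
from typing import Callable, Iterable


def contains(needles, how: Callable[[Iterable[bool]], bool] = any):
    # Assumption: a bare string is treated as one needle, not as characters.
    if isinstance(needles, str):
        needles = (needles,)
    # `how` reduces one boolean per needle to a single keep/drop decision.
    return lambda name: how(needle in name for needle in needles)


columns = ["a", "b", "c", "d", "ab"]

any_match = [c for c in columns if contains(("a", "b"))(c)]
all_match = [c for c in columns if contains(("a", "b"), how=all)(c)]
print(any_match)  # ['a', 'b', 'ab']
print(all_match)  # ['ab']
```

The two results agree with the `expr.columns` outputs shown in the examples above.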
endswith(suffixes)
first()
  Return the first column of a table.
if_all(selector, predicate)
  Return the conjunction of predicate applied on all selector columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| selector | Selector | A column selector. | required |
| predicate | Deferred \| Callable | A callable or deferred object defining a predicate to apply to each column from selector. | required |
Examples:
>>> import ibis
>>> from ibis import selectors as s, _
>>> ibis.options.interactive = True
>>> penguins = ibis.examples.penguins.fetch()
>>> cols = s.across(s.endswith("_mm"), (_ - _.mean()) / _.std())
>>> expr = penguins.mutate(cols).filter(s.if_all(s.endswith("_mm"), _.abs() > 1))
>>> expr_by_hand = penguins.mutate(cols).filter(
...     (_.bill_length_mm.abs() > 1)
...     & (_.bill_depth_mm.abs() > 1)
...     & (_.flipper_length_mm.abs() > 1)
... )
>>> expr.equals(expr_by_hand)
True
>>> expr
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ … ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━┩
│ string  │ string    │ float64        │ float64       │ float64           │ … │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼───┤
│ Adelie  │ Dream     │      -1.157951 │      1.088129 │         -1.416272 │ … │
│ Adelie  │ Torgersen │      -1.231217 │      1.138768 │         -1.202926 │ … │
│ Gentoo  │ Biscoe    │       1.149917 │     -1.443781 │          1.214987 │ … │
│ Gentoo  │ Biscoe    │       1.040019 │     -1.089314 │          1.072757 │ … │
│ Gentoo  │ Biscoe    │       1.131601 │     -1.089314 │          1.712792 │ … │
│ Gentoo  │ Biscoe    │       1.241499 │     -1.089314 │          1.570562 │ … │
│ Gentoo  │ Biscoe    │       1.351398 │     -1.494420 │          1.214987 │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴───┘
if_any(selector, predicate)
  Return the disjunction of predicate applied on all selector columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| selector | Selector | A column selector. | required |
| predicate | Deferred \| Callable | A callable or deferred object defining a predicate to apply to each column from selector. | required |
Examples:
>>> import ibis
>>> from ibis import selectors as s, _
>>> ibis.options.interactive = True
>>> penguins = ibis.examples.penguins.fetch()
>>> cols = s.across(s.endswith("_mm"), (_ - _.mean()) / _.std())
>>> expr = penguins.mutate(cols).filter(s.if_any(s.endswith("_mm"), _.abs() > 2))
>>> expr_by_hand = penguins.mutate(cols).filter(
...     (_.bill_length_mm.abs() > 2)
...     | (_.bill_depth_mm.abs() > 2)
...     | (_.flipper_length_mm.abs() > 2)
... )
>>> expr.equals(expr_by_hand)
True
>>> expr
┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ … ┃
┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━┩
│ string  │ string │ float64        │ float64       │ float64           │ … │
├─────────┼────────┼────────────────┼───────────────┼───────────────────┼───┤
│ Adelie  │ Biscoe │      -1.103002 │      0.733662 │         -2.056307 │ … │
│ Gentoo  │ Biscoe │       1.113285 │     -0.431017 │          2.068368 │ … │
│ Gentoo  │ Biscoe │       2.871660 │     -0.076550 │          2.068368 │ … │
│ Gentoo  │ Biscoe │       1.900890 │     -0.734846 │          2.139483 │ … │
│ Gentoo  │ Biscoe │       1.076652 │     -0.177826 │          2.068368 │ … │
│ Gentoo  │ Biscoe │       0.856855 │     -0.582932 │          2.068368 │ … │
│ Gentoo  │ Biscoe │       1.497929 │     -0.076550 │          2.068368 │ … │
│ Gentoo  │ Biscoe │       1.388031 │     -0.431017 │          2.068368 │ … │
│ Gentoo  │ Biscoe │       2.047422 │     -0.582932 │          2.068368 │ … │
│ Adelie  │ Dream  │      -2.165354 │     -0.836123 │         -0.918466 │ … │
│ …       │ …      │              … │             … │                 … │ … │
└─────────┴────────┴────────────────┴───────────────┴───────────────────┴───┘
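The row-level difference between `if_all` and `if_any` can be sketched in plain Python: the predicate is evaluated once per selected column, and the results are combined with `all` or `any` for each row. The functions below are hypothetical stand-ins operating on dictionaries, not ibis expressions. The first two rows reuse standardized values from the output tables above; the third row is made up for illustration.

```python
rows = [
    # Taken from the if_all output above.
    {"bill_length_mm": -1.157951, "bill_depth_mm": 1.088129, "flipper_length_mm": -1.416272},
    # Taken from the if_any output above.
    {"bill_length_mm": 1.113285, "bill_depth_mm": -0.431017, "flipper_length_mm": 2.068368},
    # Hypothetical row that passes neither filter.
    {"bill_length_mm": 0.25, "bill_depth_mm": -0.5, "flipper_length_mm": 0.75},
]

# Stand-in for s.endswith("_mm"): pick the matching column names.
columns = [name for name in rows[0] if name.endswith("_mm")]


def if_all(columns, predicate):
    # Conjunction: every selected column must satisfy the predicate.
    return lambda row: all(predicate(row[c]) for c in columns)


def if_any(columns, predicate):
    # Disjunction: at least one selected column must satisfy the predicate.
    return lambda row: any(predicate(row[c]) for c in columns)


keep_all = [i for i, row in enumerate(rows) if if_all(columns, lambda v: abs(v) > 1)(row)]
keep_any = [i for i, row in enumerate(rows) if if_any(columns, lambda v: abs(v) > 2)(row)]
print(keep_all)
print(keep_any)
```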
last()
  Return the last column of a table.
matches(regex)
  Return columns whose name matches the regular expression regex.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| regex | str \| re.Pattern | A string or re.Pattern object. | required |
Examples:
>>> import ibis
>>> import ibis.selectors as s
>>> t = ibis.table(dict(ab="string", abd="int", be="array<string>"))
>>> expr = t.select(s.matches(r"ab+"))
>>> expr.columns
['ab', 'abd']
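The name matching in this example can be reproduced with Python's `re` module. The sketch below is a hypothetical model, not ibis code, and it assumes search semantics (the pattern may match anywhere in the name); the example above is consistent with that assumption but does not prove it.

```python
import re


def matches(regex):
    # re.compile accepts a string or an already-compiled pattern.
    pattern = re.compile(regex)
    # Assumption: a match anywhere in the column name counts.
    return lambda name: pattern.search(name) is not None


columns = ["ab", "abd", "be"]

from_string = [c for c in columns if matches(r"ab+")(c)]
from_pattern = [c for c in columns if matches(re.compile(r"ab+"))(c)]
print(from_string)  # ['ab', 'abd']
```

Both the string and the compiled pattern select the same columns, mirroring the `str | re.Pattern` parameter type.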
numeric()
  Return numeric columns.
Examples:
>>> import ibis
>>> import ibis.selectors as s
>>> t = ibis.table(dict(a="int", b="string", c="array<string>"), name="t")
>>> t
UnboundTable: t
  a int64
  b string
  c array<string>
>>> expr = t.select(s.numeric())  # `a` has integer type, so it's numeric
>>> expr.columns
['a']
of_type(dtype)
  Select columns of type dtype.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dtype | dt.DataType \| str \| type[dt.DataType] | A DataType instance, a string, or a DataType class. | required |
Examples:
Select according to a specific DataType instance
>>> import ibis
>>> import ibis.expr.datatypes as dt
>>> import ibis.selectors as s
>>> t = ibis.table(dict(name="string", siblings="array<string>", parents="array<int64>"))
>>> expr = t.select(s.of_type(dt.Array(dt.string)))
>>> expr.columns
['siblings']
Strings are also accepted
>>> expr = t.select(s.of_type("array<string>"))
>>> expr.columns
['siblings']
Abstract/unparametrized types may also be specified by their string name
(e.g. "integer" for any integer type), or by passing in a DataType class
instead. The following options are equivalent.
>>> expr1 = t.select(s.of_type("array"))
>>> expr2 = t.select(s.of_type(dt.Array))
>>> expr1.equals(expr2)
True
>>> expr2.columns
['siblings', 'parents']
startswith(prefixes)
  Select columns whose name starts with one of prefixes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| prefixes | str \| tuple[str, ...] | Prefixes to compare column names against. | required |
Examples:
>>> import ibis
>>> import ibis.selectors as s
>>> t = ibis.table(dict(apples="int", oranges="float", bananas="bool"), name="t")
>>> expr = t.select(s.startswith(("a", "b")))
>>> expr.columns
['apples', 'bananas']
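The `str | tuple[str, ...]` parameter type mirrors Python's own `str.startswith`, which accepts either a single prefix or a tuple of prefixes. A minimal plain-Python model of the selection above follows; the `startswith` function here is a hypothetical stand-in, not the ibis selector.

```python
def startswith(prefixes):
    # str.startswith accepts one string or a tuple of strings.
    return lambda name: name.startswith(prefixes)


columns = ["apples", "oranges", "bananas"]

several = [c for c in columns if startswith(("a", "b"))(c)]
single = [c for c in columns if startswith("o")(c)]
print(several)  # ['apples', 'bananas']
print(single)   # ['oranges']
```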
where(predicate)
  Select columns that satisfy predicate.
Use this selector when one of the other selectors does not meet your needs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| predicate | Callable[[ir.Value], bool] | A callable that accepts an ibis value expression and returns a bool. | required |
Examples:
>>> import ibis
>>> import ibis.selectors as s
>>> t = ibis.table(dict(a="float32"), name="t")
>>> expr = t.select(s.where(lambda col: col.get_name() == "a"))
>>> expr.columns
['a']