Ibis for dplyr users¶
R users familiar with dplyr, tidyr, and other packages in the Tidyverse are likely to find Ibis familiar. In fact, some Ibis features were even inspired by similar features in the Tidyverse.
However, due to differences between Python and R and the design and goals of Ibis itself, you may notice some big differences right away:
- No pipe: The handy magrittr pipe (
%>%
) or R's newer native pipe (|>
) don't exist in Python so you instead have to chain sequences of operations together with a period (.
). The.
in Python is analogous to R's$
which lets you access attributes and methods on objects. - No unquoted column names: Non-standard evaluation is common in R but not present in Python. To reference a column in Ibis, you can pass a string, property on a table (e.g.,
tbl.some_column
), or you can make use of selectors. - Ibis is lazy by default: Similar to dbplyr and its
collect()
method, Ibis does not evaluate our queries until we call.to_pandas()
. For the purposes of this document, we setibis.options.interactive = True
which limits results to 10 rows, executes automatically, and prints a nicely-formatted table.
Using the same example data and similar operations as in Introduction to dplyr, below you will find some examples of the more common dplyr and tidyr operations and their Ibis equivalents.
Loading Ibis¶
To start using dplyr in R we would run:
library(dplyr)
To load Ibis:
import ibis
And then also load and alias some helpers to make our code more concise:
import ibis.selectors as s
from ibis import _
Last, as mentioned above, to get Ibis to automatically execute our queries and show the results in a nicely-formatted table, we run:
ibis.options.interactive = True
Loading example data¶
In R, datasets are typically lazily loaded with packages. For instance, the starwars
dataset is packaged with dplyr, but is not loaded in memory before you start using it. Ibis provides many datasets in the examples
module. So to be able to use the starwars
dataset, you can use:
starwars = ibis.examples.starwars.fetch()
Similar to dplyr, if we evaluate the name of a table, we get a nicely-formatted table:
starwars
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┓ ┃ name ┃ height ┃ mass ┃ hair_color ┃ skin_color ┃ eye_color ┃ birth_year ┃ sex ┃ gender ┃ homeworld ┃ species ┃ films ┃ vehicles ┃ starships ┃ ┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━┩ │ string │ int64 │ float64 │ string │ string │ string │ float64 │ string │ string │ string │ string │ string │ string │ string │ ├────────────────────┼────────┼─────────┼───────────────┼─────────────┼───────────┼────────────┼────────┼───────────┼───────────┼─────────┼────────┼──────────┼───────────┤ │ Luke Skywalker │ 172 │ 77.0 │ blond │ fair │ blue │ 19.0 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ │ C-3PO │ 167 │ 75.0 │ NULL │ gold │ yellow │ 112.0 │ none │ masculine │ Tatooine │ Droid │ NULL │ NULL │ NULL │ │ R2-D2 │ 96 │ 32.0 │ NULL │ white, blue │ red │ 33.0 │ none │ masculine │ Naboo │ Droid │ NULL │ NULL │ NULL │ │ Darth Vader │ 202 │ 136.0 │ none │ white │ yellow │ 41.9 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ │ Leia Organa │ 150 │ 49.0 │ brown │ light │ brown │ 19.0 │ female │ feminine │ Alderaan │ Human │ NULL │ NULL │ NULL │ │ Owen Lars │ 178 │ 120.0 │ brown, grey │ light │ blue │ 52.0 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ │ Beru Whitesun lars │ 165 │ 75.0 │ brown │ light │ blue │ 47.0 │ female │ feminine │ Tatooine │ Human │ NULL │ NULL │ NULL │ │ R5-D4 │ 97 │ 32.0 │ NULL │ white, red │ red │ nan │ none │ masculine │ Tatooine │ Droid │ NULL │ NULL │ NULL │ │ Biggs Darklighter │ 183 │ 84.0 │ black │ light │ brown │ 24.0 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ │ Obi-Wan Kenobi │ 182 │ 77.0 │ auburn, white │ fair │ blue-gray │ 57.0 │ male │ masculine │ Stewjon │ Human │ NULL │ NULL │ NULL │ │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ └────────────────────┴────────┴─────────┴───────────────┴─────────────┴───────────┴────────────┴────────┴───────────┴───────────┴─────────┴────────┴──────────┴───────────┘
In addition to printing a nicely-formatted table and automatically executing, setting ibis.options.interactive
to True
also causes our query to be limited to 10 rows. To get Ibis to give us all rows, we can directly call to_pandas
and save the result as a pandas DataFrame:
starwars_df = starwars.to_pandas()
Which then gives us all of the data as a pandas DataFrame:
starwars_df
name | height | mass | hair_color | skin_color | eye_color | birth_year | sex | gender | homeworld | species | films | vehicles | starships | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Luke Skywalker | 172.0 | 77.0 | blond | fair | blue | 19.0 | male | masculine | Tatooine | Human | None | None | None |
1 | C-3PO | 167.0 | 75.0 | None | gold | yellow | 112.0 | none | masculine | Tatooine | Droid | None | None | None |
2 | R2-D2 | 96.0 | 32.0 | None | white, blue | red | 33.0 | none | masculine | Naboo | Droid | None | None | None |
3 | Darth Vader | 202.0 | 136.0 | none | white | yellow | 41.9 | male | masculine | Tatooine | Human | None | None | None |
4 | Leia Organa | 150.0 | 49.0 | brown | light | brown | 19.0 | female | feminine | Alderaan | Human | None | None | None |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
82 | Rey | NaN | NaN | brown | light | hazel | NaN | female | feminine | None | Human | None | None | None |
83 | Poe Dameron | NaN | NaN | brown | light | brown | NaN | male | masculine | None | Human | None | None | None |
84 | BB8 | NaN | NaN | none | none | black | NaN | none | masculine | None | Droid | None | None | None |
85 | Captain Phasma | NaN | NaN | unknown | unknown | unknown | NaN | None | None | None | None | None | None | None |
86 | Padmé Amidala | 165.0 | 45.0 | brown | light | brown | 46.0 | female | feminine | Naboo | Human | None | None | None |
87 rows × 14 columns
Directly calling to_pandas
and saving the result to a variable is useful for passing the results of Ibis table expressions to other packages (e.g., matplotlib).
Inspecting the dataset with head()
¶
Just like in R, you can use head()
to inspect the beginning of a dataset. You can also specify the number of rows you want to get back by using the parameter n
(default n = 5
).
In R:
head(starwars) # or starwars |> head()
With Ibis:
starwars.head(6)
┏━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┓ ┃ name ┃ height ┃ mass ┃ hair_color ┃ skin_color ┃ eye_color ┃ birth_year ┃ sex ┃ gender ┃ homeworld ┃ species ┃ films ┃ vehicles ┃ starships ┃ ┡━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━┩ │ string │ int64 │ float64 │ string │ string │ string │ float64 │ string │ string │ string │ string │ string │ string │ string │ ├────────────────┼────────┼─────────┼─────────────┼─────────────┼───────────┼────────────┼────────┼───────────┼───────────┼─────────┼────────┼──────────┼───────────┤ │ Luke Skywalker │ 172 │ 77.0 │ blond │ fair │ blue │ 19.0 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ │ C-3PO │ 167 │ 75.0 │ NULL │ gold │ yellow │ 112.0 │ none │ masculine │ Tatooine │ Droid │ NULL │ NULL │ NULL │ │ R2-D2 │ 96 │ 32.0 │ NULL │ white, blue │ red │ 33.0 │ none │ masculine │ Naboo │ Droid │ NULL │ NULL │ NULL │ │ Darth Vader │ 202 │ 136.0 │ none │ white │ yellow │ 41.9 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ │ Leia Organa │ 150 │ 49.0 │ brown │ light │ brown │ 19.0 │ female │ feminine │ Alderaan │ Human │ NULL │ NULL │ NULL │ │ Owen Lars │ 178 │ 120.0 │ brown, grey │ light │ blue │ 52.0 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ └────────────────┴────────┴─────────┴─────────────┴─────────────┴───────────┴────────────┴────────┴───────────┴───────────┴─────────┴────────┴──────────┴───────────┘
There is no tail()
in Ibis because most databases do not support this operation.
Another method you can use to limit the number of rows returned by a query is limit()
which also takes the n
parameter.
starwars.limit(3)
┏━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┓ ┃ name ┃ height ┃ mass ┃ hair_color ┃ skin_color ┃ eye_color ┃ birth_year ┃ sex ┃ gender ┃ homeworld ┃ species ┃ films ┃ vehicles ┃ starships ┃ ┡━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━┩ │ string │ int64 │ float64 │ string │ string │ string │ float64 │ string │ string │ string │ string │ string │ string │ string │ ├────────────────┼────────┼─────────┼────────────┼─────────────┼───────────┼────────────┼────────┼───────────┼───────────┼─────────┼────────┼──────────┼───────────┤ │ Luke Skywalker │ 172 │ 77.0 │ blond │ fair │ blue │ 19.0 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ │ C-3PO │ 167 │ 75.0 │ NULL │ gold │ yellow │ 112.0 │ none │ masculine │ Tatooine │ Droid │ NULL │ NULL │ NULL │ │ R2-D2 │ 96 │ 32.0 │ NULL │ white, blue │ red │ 33.0 │ none │ masculine │ Naboo │ Droid │ NULL │ NULL │ NULL │ └────────────────┴────────┴─────────┴────────────┴─────────────┴───────────┴────────────┴────────┴───────────┴───────────┴─────────┴────────┴──────────┴───────────┘
Filtering rows with filter()¶
Ibis, like dplyr, has filter
to select rows based on conditions.
With dplyr:
starwars |>
filter(skin_color == "light")
In Ibis:
starwars.filter(_.skin_color == "light")
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┓ ┃ name ┃ height ┃ mass ┃ hair_color ┃ skin_color ┃ eye_color ┃ birth_year ┃ sex ┃ gender ┃ homeworld ┃ species ┃ films ┃ vehicles ┃ starships ┃ ┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━┩ │ string │ int64 │ float64 │ string │ string │ string │ float64 │ string │ string │ string │ string │ string │ string │ string │ ├────────────────────┼────────┼─────────┼─────────────┼────────────┼───────────┼────────────┼────────┼───────────┼───────────┼─────────┼────────┼──────────┼───────────┤ │ Leia Organa │ 150 │ 49.0 │ brown │ light │ brown │ 19.0 │ female │ feminine │ Alderaan │ Human │ NULL │ NULL │ NULL │ │ Owen Lars │ 178 │ 120.0 │ brown, grey │ light │ blue │ 52.0 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ │ Beru Whitesun lars │ 165 │ 75.0 │ brown │ light │ blue │ 47.0 │ female │ feminine │ Tatooine │ Human │ NULL │ NULL │ NULL │ │ Biggs Darklighter │ 183 │ 84.0 │ black │ light │ brown │ 24.0 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ │ Lobot │ 175 │ 79.0 │ none │ light │ blue │ 37.0 │ male │ masculine │ Bespin │ Human │ NULL │ NULL │ NULL │ │ Cordé │ 157 │ nan │ brown │ light │ brown │ nan │ female │ feminine │ Naboo │ Human │ NULL │ NULL │ NULL │ │ Dormé │ 165 │ nan │ brown │ light │ brown │ nan │ female │ feminine │ Naboo │ Human │ NULL │ NULL │ NULL │ │ Raymus Antilles │ 188 │ 79.0 │ brown │ light │ brown │ nan │ male │ masculine │ Alderaan │ Human │ NULL │ NULL │ NULL │ │ Rey │ NULL │ nan │ brown │ light │ hazel │ nan │ female │ feminine │ NULL │ Human │ NULL │ NULL │ NULL │ │ Poe Dameron │ NULL │ nan │ brown │ light │ brown │ nan │ male │ masculine │ NULL │ Human │ NULL │ NULL │ NULL │ │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ └────────────────────┴────────┴─────────┴─────────────┴────────────┴───────────┴────────────┴────────┴───────────┴───────────┴─────────┴────────┴──────────┴───────────┘
In dplyr, you can specify multiple conditions separated with ,
that are then combined with the &
operator:
starwars |>
filter(skin_color == "light", eye_color == "brown")
In Ibis, you can do the same by putting multiple conditions in a list:
starwars.filter([_.skin_color == "light", _.eye_color == "brown"])
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┓ ┃ name ┃ height ┃ mass ┃ hair_color ┃ skin_color ┃ eye_color ┃ birth_year ┃ sex ┃ gender ┃ homeworld ┃ species ┃ films ┃ vehicles ┃ starships ┃ ┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━┩ │ string │ int64 │ float64 │ string │ string │ string │ float64 │ string │ string │ string │ string │ string │ string │ string │ ├───────────────────┼────────┼─────────┼────────────┼────────────┼───────────┼────────────┼────────┼───────────┼───────────┼─────────┼────────┼──────────┼───────────┤ │ Leia Organa │ 150 │ 49.0 │ brown │ light │ brown │ 19.0 │ female │ feminine │ Alderaan │ Human │ NULL │ NULL │ NULL │ │ Biggs Darklighter │ 183 │ 84.0 │ black │ light │ brown │ 24.0 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ │ Cordé │ 157 │ nan │ brown │ light │ brown │ nan │ female │ feminine │ Naboo │ Human │ NULL │ NULL │ NULL │ │ Dormé │ 165 │ nan │ brown │ light │ brown │ nan │ female │ feminine │ Naboo │ Human │ NULL │ NULL │ NULL │ │ Raymus Antilles │ 188 │ 79.0 │ brown │ light │ brown │ nan │ male │ masculine │ Alderaan │ Human │ NULL │ NULL │ NULL │ │ Poe Dameron │ NULL │ nan │ brown │ light │ brown │ nan │ male │ masculine │ NULL │ Human │ NULL │ NULL │ NULL │ │ Padmé Amidala │ 165 │ 45.0 │ brown │ light │ brown │ 46.0 │ female │ feminine │ Naboo │ Human │ NULL │ NULL │ NULL │ └───────────────────┴────────┴─────────┴────────────┴────────────┴───────────┴────────────┴────────┴───────────┴───────────┴─────────┴────────┴──────────┴───────────┘
In previous code, we used the _
helper we imported earlier. The _
is shorthand for the table returned by the previous step in the chained sequence of operations (in this case, starwars
). We could have also written the more verbose form,
starwars.filter([starwars.skin_color == "light", starwars.eye_color == "brown"])
If you want to combine multiple conditions, in dplyr, you could do:
starwars |>
filter(
(skin_color == "light" & eye_color == "brown") |
species == "Droid"
)
In Ibis, this would be:
starwars.filter(
((_.skin_color == "light") & (_.eye_color == "brown")) |
(_.species == "Droid")
)
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┓ ┃ name ┃ height ┃ mass ┃ hair_color ┃ skin_color ┃ eye_color ┃ birth_year ┃ sex ┃ gender ┃ homeworld ┃ species ┃ films ┃ vehicles ┃ starships ┃ ┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━┩ │ string │ int64 │ float64 │ string │ string │ string │ float64 │ string │ string │ string │ string │ string │ string │ string │ ├───────────────────┼────────┼─────────┼────────────┼─────────────┼───────────┼────────────┼────────┼───────────┼───────────┼─────────┼────────┼──────────┼───────────┤ │ C-3PO │ 167 │ 75.0 │ NULL │ gold │ yellow │ 112.0 │ none │ masculine │ Tatooine │ Droid │ NULL │ NULL │ NULL │ │ R2-D2 │ 96 │ 32.0 │ NULL │ white, blue │ red │ 33.0 │ none │ masculine │ Naboo │ Droid │ NULL │ NULL │ NULL │ │ R5-D4 │ 97 │ 32.0 │ NULL │ white, red │ red │ nan │ none │ masculine │ Tatooine │ Droid │ NULL │ NULL │ NULL │ │ IG-88 │ 200 │ 140.0 │ none │ metal │ red │ 15.0 │ none │ masculine │ NULL │ Droid │ NULL │ NULL │ NULL │ │ Leia Organa │ 150 │ 49.0 │ brown │ light │ brown │ 19.0 │ female │ feminine │ Alderaan │ Human │ NULL │ NULL │ NULL │ │ Biggs Darklighter │ 183 │ 84.0 │ black │ light │ brown │ 24.0 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ │ Cordé │ 157 │ nan │ brown │ light │ brown │ nan │ female │ feminine │ Naboo │ Human │ NULL │ NULL │ NULL │ │ Dormé │ 165 │ nan │ brown │ light │ brown │ nan │ female │ feminine │ Naboo │ Human │ NULL │ NULL │ NULL │ │ R4-P17 │ 96 │ nan │ none │ silver, red │ red, blue │ nan │ none │ feminine │ NULL │ Droid │ NULL │ NULL │ NULL │ │ BB8 │ NULL │ nan │ none │ none │ black │ nan │ none │ masculine │ NULL │ Droid │ NULL │ NULL │ NULL │ │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ └───────────────────┴────────┴─────────┴────────────┴─────────────┴───────────┴────────────┴────────┴───────────┴───────────┴─────────┴────────┴──────────┴───────────┘
Sorting your data with order_by()¶
To sort a column, dplyr has the verb arrange
. For instance, to sort the column height
using dplyr:
starwars |>
arrange(height)
Ibis has the order_by
method, so to perform the same operation:
starwars.order_by(_.height)
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┓ ┃ name ┃ height ┃ mass ┃ hair_color ┃ skin_color ┃ eye_color ┃ birth_year ┃ sex ┃ gender ┃ homeworld ┃ species ┃ films ┃ vehicles ┃ starships ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━┩ │ string │ int64 │ float64 │ string │ string │ string │ float64 │ string │ string │ string │ string │ string │ string │ string │ ├───────────────────────┼────────┼─────────┼────────────┼─────────────┼───────────┼────────────┼────────┼───────────┼─────────────┼────────────────┼────────┼──────────┼───────────┤ │ Yoda │ 66 │ 17.0 │ white │ green │ brown │ 896.0 │ male │ masculine │ NULL │ Yoda's species │ NULL │ NULL │ NULL │ │ Ratts Tyerell │ 79 │ 15.0 │ none │ grey, blue │ unknown │ nan │ male │ masculine │ Aleen Minor │ Aleena │ NULL │ NULL │ NULL │ │ Wicket Systri Warrick │ 88 │ 20.0 │ brown │ brown │ brown │ 8.0 │ male │ masculine │ Endor │ Ewok │ NULL │ NULL │ NULL │ │ Dud Bolt │ 94 │ 45.0 │ none │ blue, grey │ yellow │ nan │ male │ masculine │ Vulpter │ Vulptereen │ NULL │ NULL │ NULL │ │ R2-D2 │ 96 │ 32.0 │ NULL │ white, blue │ red │ 33.0 │ none │ masculine │ Naboo │ Droid │ NULL │ NULL │ NULL │ │ R4-P17 │ 96 │ nan │ none │ silver, red │ red, blue │ nan │ none │ feminine │ NULL │ Droid │ NULL │ NULL │ NULL │ │ R5-D4 │ 97 │ 32.0 │ NULL │ white, red │ red │ nan │ none │ masculine │ Tatooine │ Droid │ NULL │ NULL │ NULL │ │ Sebulba │ 112 │ 40.0 │ none │ grey, red │ orange │ nan │ male │ masculine │ Malastare │ Dug │ NULL │ NULL │ NULL │ │ Gasgano │ 122 │ nan │ none │ white, blue │ black │ nan │ male │ masculine │ Troiken │ Xexto │ NULL │ NULL │ NULL │ │ Watto │ 137 │ nan │ black │ blue, grey │ yellow │ nan │ male │ masculine │ Toydaria │ Toydarian │ NULL │ NULL │ NULL │ │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ └───────────────────────┴────────┴─────────┴────────────┴─────────────┴───────────┴────────────┴────────┴───────────┴─────────────┴────────────────┴────────┴──────────┴───────────┘
You might notice that while dplyr puts missing values at the end, Ibis places them at the top. This behavior can actually vary from backend to backend and is something to be aware of when using Ibis.
If you want to order using multiple variables, you can pass them as a list:
starwars.order_by([_.height, _.mass]) # or starwars.order_by(["height", "mass"])
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┓ ┃ name ┃ height ┃ mass ┃ hair_color ┃ skin_color ┃ eye_color ┃ birth_year ┃ sex ┃ gender ┃ homeworld ┃ species ┃ films ┃ vehicles ┃ starships ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━┩ │ string │ int64 │ float64 │ string │ string │ string │ float64 │ string │ string │ string │ string │ string │ string │ string │ ├───────────────────────┼────────┼─────────┼────────────┼─────────────┼───────────┼────────────┼────────┼───────────┼─────────────┼────────────────┼────────┼──────────┼───────────┤ │ Yoda │ 66 │ 17.0 │ white │ green │ brown │ 896.0 │ male │ masculine │ NULL │ Yoda's species │ NULL │ NULL │ NULL │ │ Ratts Tyerell │ 79 │ 15.0 │ none │ grey, blue │ unknown │ nan │ male │ masculine │ Aleen Minor │ Aleena │ NULL │ NULL │ NULL │ │ Wicket Systri Warrick │ 88 │ 20.0 │ brown │ brown │ brown │ 8.0 │ male │ masculine │ Endor │ Ewok │ NULL │ NULL │ NULL │ │ Dud Bolt │ 94 │ 45.0 │ none │ blue, grey │ yellow │ nan │ male │ masculine │ Vulpter │ Vulptereen │ NULL │ NULL │ NULL │ │ R2-D2 │ 96 │ 32.0 │ NULL │ white, blue │ red │ 33.0 │ none │ masculine │ Naboo │ Droid │ NULL │ NULL │ NULL │ │ R4-P17 │ 96 │ nan │ none │ silver, red │ red, blue │ nan │ none │ feminine │ NULL │ Droid │ NULL │ NULL │ NULL │ │ R5-D4 │ 97 │ 32.0 │ NULL │ white, red │ red │ nan │ none │ masculine │ Tatooine │ Droid │ NULL │ NULL │ NULL │ │ Sebulba │ 112 │ 40.0 │ none │ grey, red │ orange │ nan │ male │ masculine │ Malastare │ Dug │ NULL │ NULL │ NULL │ │ Gasgano │ 122 │ nan │ none │ white, blue │ black │ nan │ male │ masculine │ Troiken │ Xexto │ NULL │ NULL │ NULL │ │ Watto │ 137 │ nan │ black │ blue, grey │ yellow │ nan │ male │ masculine │ Toydaria │ Toydarian │ NULL │ NULL │ NULL │ │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ └───────────────────────┴────────┴─────────┴────────────┴─────────────┴───────────┴────────────┴────────┴───────────┴─────────────┴────────────────┴────────┴──────────┴───────────┘
To order a column in descending order, there are two ways to do it. Note that missing values remain at the top.
starwars.order_by(_.height.desc()) # or: starwars.order_by(ibis.desc("height"))
┏━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┓ ┃ name ┃ height ┃ mass ┃ hair_color ┃ skin_color ┃ eye_color ┃ birth_year ┃ sex ┃ gender ┃ homeworld ┃ species ┃ films ┃ vehicles ┃ starships ┃ ┡━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━┩ │ string │ int64 │ float64 │ string │ string │ string │ float64 │ string │ string │ string │ string │ string │ string │ string │ ├──────────────┼────────┼─────────┼────────────┼──────────────┼───────────────┼────────────┼────────┼───────────┼───────────┼──────────┼────────┼──────────┼───────────┤ │ Yarael Poof │ 264 │ nan │ none │ white │ yellow │ nan │ male │ masculine │ Quermia │ Quermian │ NULL │ NULL │ NULL │ │ Tarfful │ 234 │ 136.0 │ brown │ brown │ blue │ nan │ male │ masculine │ Kashyyyk │ Wookiee │ NULL │ NULL │ NULL │ │ Lama Su │ 229 │ 88.0 │ none │ grey │ black │ nan │ male │ masculine │ Kamino │ Kaminoan │ NULL │ NULL │ NULL │ │ Chewbacca │ 228 │ 112.0 │ brown │ unknown │ blue │ 200.0 │ male │ masculine │ Kashyyyk │ Wookiee │ NULL │ NULL │ NULL │ │ Roos Tarpals │ 224 │ 82.0 │ none │ grey │ orange │ nan │ male │ masculine │ Naboo │ Gungan │ NULL │ NULL │ NULL │ │ Grievous │ 216 │ 159.0 │ none │ brown, white │ green, yellow │ nan │ male │ masculine │ Kalee │ Kaleesh │ NULL │ NULL │ NULL │ │ Taun We │ 213 │ nan │ none │ grey │ black │ nan │ female │ feminine │ Kamino │ Kaminoan │ NULL │ NULL │ NULL │ │ Rugor Nass │ 206 │ nan │ none │ green │ orange │ nan │ male │ masculine │ Naboo │ Gungan │ NULL │ NULL │ NULL │ │ Tion Medon │ 206 │ 80.0 │ none │ grey │ black │ nan │ male │ masculine │ Utapau │ Pau'an │ NULL │ NULL │ NULL │ │ Darth Vader │ 202 │ 136.0 │ none │ white │ yellow │ 41.9 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ └──────────────┴────────┴─────────┴────────────┴──────────────┴───────────────┴────────────┴────────┴───────────┴───────────┴──────────┴────────┴──────────┴───────────┘
Selecting columns with select()¶
Ibis, like dplyr, has a select
method to include or exclude columns:
With dplyr:
starwars |>
select(hair_color)
In Ibis:
starwars.select(_.hair_color)
┏━━━━━━━━━━━━━━━┓ ┃ hair_color ┃ ┡━━━━━━━━━━━━━━━┩ │ string │ ├───────────────┤ │ blond │ │ NULL │ │ NULL │ │ none │ │ brown │ │ brown, grey │ │ brown │ │ NULL │ │ black │ │ auburn, white │ │ … │ └───────────────┘
Note: A common pitfall to be aware of when referencing column names in Ibis is when column names collide with built-in methods on the Ibis Table object, such as count
. In this situation, you will have to reference count
like table["count"]
or _["count"]
.
dplyr also allows selecting more than one column at a time:
starwars |>
select(hair_color, skin_color, eye_color)
In Ibis, we can either quote the names:
starwars.select("hair_color", "skin_color", "eye_color")
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━┓ ┃ hair_color ┃ skin_color ┃ eye_color ┃ ┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━┩ │ string │ string │ string │ ├───────────────┼─────────────┼───────────┤ │ blond │ fair │ blue │ │ NULL │ gold │ yellow │ │ NULL │ white, blue │ red │ │ none │ white │ yellow │ │ brown │ light │ brown │ │ brown, grey │ light │ blue │ │ brown │ light │ blue │ │ NULL │ white, red │ red │ │ black │ light │ brown │ │ auburn, white │ fair │ blue-gray │ │ … │ … │ … │ └───────────────┴─────────────┴───────────┘
Or use the _
helper:
starwars.select(_.hair_color, _.skin_color, _.eye_color)
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━┓ ┃ hair_color ┃ skin_color ┃ eye_color ┃ ┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━┩ │ string │ string │ string │ ├───────────────┼─────────────┼───────────┤ │ blond │ fair │ blue │ │ NULL │ gold │ yellow │ │ NULL │ white, blue │ red │ │ none │ white │ yellow │ │ brown │ light │ brown │ │ brown, grey │ light │ blue │ │ brown │ light │ blue │ │ NULL │ white, red │ red │ │ black │ light │ brown │ │ auburn, white │ fair │ blue-gray │ │ … │ … │ … │ └───────────────┴─────────────┴───────────┘
To select columns by name based on a condition, dplyr has helpers such as:
- starts_with(): Starts with a prefix.
- ends_with(): Ends with a suffix.
- contains(): Contains a literal string.
These and many more selectors are available in Ibis as well, with slightly different names:
starwars.select(s.startswith("h"))
┏━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓ ┃ height ┃ hair_color ┃ homeworld ┃ ┡━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩ │ int64 │ string │ string │ ├────────┼───────────────┼───────────┤ │ 172 │ blond │ Tatooine │ │ 167 │ NULL │ Tatooine │ │ 96 │ NULL │ Naboo │ │ 202 │ none │ Tatooine │ │ 150 │ brown │ Alderaan │ │ 178 │ brown, grey │ Tatooine │ │ 165 │ brown │ Tatooine │ │ 97 │ NULL │ Tatooine │ │ 183 │ black │ Tatooine │ │ 182 │ auburn, white │ Stewjon │ │ … │ … │ … │ └────────┴───────────────┴───────────┘
starwars.select(s.endswith("color"))
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━┓ ┃ hair_color ┃ skin_color ┃ eye_color ┃ ┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━┩ │ string │ string │ string │ ├───────────────┼─────────────┼───────────┤ │ blond │ fair │ blue │ │ NULL │ gold │ yellow │ │ NULL │ white, blue │ red │ │ none │ white │ yellow │ │ brown │ light │ brown │ │ brown, grey │ light │ blue │ │ brown │ light │ blue │ │ NULL │ white, red │ red │ │ black │ light │ brown │ │ auburn, white │ fair │ blue-gray │ │ … │ … │ … │ └───────────────┴─────────────┴───────────┘
starwars.select(s.contains("world"))
┏━━━━━━━━━━━┓ ┃ homeworld ┃ ┡━━━━━━━━━━━┩ │ string │ ├───────────┤ │ Tatooine │ │ Tatooine │ │ Naboo │ │ Tatooine │ │ Alderaan │ │ Tatooine │ │ Tatooine │ │ Tatooine │ │ Tatooine │ │ Stewjon │ │ … │ └───────────┘
See the Ibis Column Selectors documentation for the full list of selectors in Ibis.
Renaming columns with relabel()¶
Ibis allows you to rename columns using relabel()
which provides similar functionality to rename()
in dplyr.
In dplyr:
starwars |>
rename("homeworld" = "home_world")
In Ibis, use relabel
and pass a dict
of name mappings:
starwars.relabel({"homeworld": "home_world"})
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┓ ┃ name ┃ height ┃ mass ┃ hair_color ┃ skin_color ┃ eye_color ┃ birth_year ┃ sex ┃ gender ┃ home_world ┃ species ┃ films ┃ vehicles ┃ starships ┃ ┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━┩ │ string │ int64 │ float64 │ string │ string │ string │ float64 │ string │ string │ string │ string │ string │ string │ string │ ├────────────────────┼────────┼─────────┼───────────────┼─────────────┼───────────┼────────────┼────────┼───────────┼────────────┼─────────┼────────┼──────────┼───────────┤ │ Luke Skywalker │ 172 │ 77.0 │ blond │ fair │ blue │ 19.0 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ │ C-3PO │ 167 │ 75.0 │ NULL │ gold │ yellow │ 112.0 │ none │ masculine │ Tatooine │ Droid │ NULL │ NULL │ NULL │ │ R2-D2 │ 96 │ 32.0 │ NULL │ white, blue │ red │ 33.0 │ none │ masculine │ Naboo │ Droid │ NULL │ NULL │ NULL │ │ Darth Vader │ 202 │ 136.0 │ none │ white │ yellow │ 41.9 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ │ Leia Organa │ 150 │ 49.0 │ brown │ light │ brown │ 19.0 │ female │ feminine │ Alderaan │ Human │ NULL │ NULL │ NULL │ │ Owen Lars │ 178 │ 120.0 │ brown, grey │ light │ blue │ 52.0 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ │ Beru Whitesun lars │ 165 │ 75.0 │ brown │ light │ blue │ 47.0 │ female │ feminine │ Tatooine │ Human │ NULL │ NULL │ NULL │ │ R5-D4 │ 97 │ 32.0 │ NULL │ white, red │ red │ nan │ none │ masculine │ Tatooine │ Droid │ NULL │ NULL │ NULL │ │ Biggs Darklighter │ 183 │ 84.0 │ black │ light │ brown │ 24.0 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ │ Obi-Wan Kenobi │ 182 │ 77.0 │ auburn, white │ fair │ blue-gray │ 57.0 │ male │ masculine │ Stewjon │ Human │ NULL │ NULL │ NULL │ │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ └────────────────────┴────────┴─────────┴───────────────┴─────────────┴───────────┴────────────┴────────┴───────────┴────────────┴─────────┴────────┴──────────┴───────────┘
Add new columns with mutate()¶
Ibis, like dplyr, uses the mutate
verb to add columns.
In dplyr,
starwars |>
mutate(height_m = height / 100) |>
select(name, height_m)
In Ibis:
(
starwars
.mutate(height_m = _.height / 100)
.select("name", "height_m")
)
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓ ┃ name ┃ height_m ┃ ┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩ │ string │ float64 │ ├────────────────────┼──────────┤ │ Luke Skywalker │ 1.72 │ │ C-3PO │ 1.67 │ │ R2-D2 │ 0.96 │ │ Darth Vader │ 2.02 │ │ Leia Organa │ 1.50 │ │ Owen Lars │ 1.78 │ │ Beru Whitesun lars │ 1.65 │ │ R5-D4 │ 0.97 │ │ Biggs Darklighter │ 1.83 │ │ Obi-Wan Kenobi │ 1.82 │ │ … │ … │ └────────────────────┴──────────┘
A big difference between dplyr's mutate
and Ibis' mutate
is that, in Ibis, you have to chain separate mutate
calls together when you reference newly-created columns in the same mutate
whereas in dplyr, you can put them all in the same call. This makes Ibis' mutate
more similar to transform
in base R.
In dplyr, we only need one mutate
call:
starwars %>%
mutate(
height_m = height / 100,
BMI = mass / (height_m^2)
) %>%
select(BMI, everything())
In Ibis, for BMI
to reference height_m
, it needs to be in a separate mutate
call:
(starwars
.mutate(
height_m = _.height / 100
)
.mutate(
BMI = _.mass / (_.height_m**2)
)
.select("BMI", ~s.matches("BMI"))
)
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓ ┃ BMI ┃ name ┃ height ┃ mass ┃ hair_color ┃ skin_color ┃ eye_color ┃ birth_year ┃ sex ┃ gender ┃ homeworld ┃ species ┃ films ┃ vehicles ┃ starships ┃ height_m ┃ ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩ │ float64 │ string │ int64 │ float64 │ string │ string │ string │ float64 │ string │ string │ string │ string │ string │ string │ string │ float64 │ ├───────────┼────────────────────┼────────┼─────────┼───────────────┼─────────────┼───────────┼────────────┼────────┼───────────┼───────────┼─────────┼────────┼──────────┼───────────┼──────────┤ │ 26.027582 │ Luke Skywalker │ 172 │ 77.0 │ blond │ fair │ blue │ 19.0 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ 1.72 │ │ 26.892323 │ C-3PO │ 167 │ 75.0 │ NULL │ gold │ yellow │ 112.0 │ none │ masculine │ Tatooine │ Droid │ NULL │ NULL │ NULL │ 1.67 │ │ 34.722222 │ R2-D2 │ 96 │ 32.0 │ NULL │ white, blue │ red │ 33.0 │ none │ masculine │ Naboo │ Droid │ NULL │ NULL │ NULL │ 0.96 │ │ 33.330066 │ Darth Vader │ 202 │ 136.0 │ none │ white │ yellow │ 41.9 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ 2.02 │ │ 21.777778 │ Leia Organa │ 150 │ 49.0 │ brown │ light │ brown │ 19.0 │ female │ feminine │ Alderaan │ Human │ NULL │ NULL │ NULL │ 1.50 │ │ 37.874006 │ Owen Lars │ 178 │ 120.0 │ brown, grey │ light │ blue │ 52.0 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ 1.78 │ │ 27.548209 │ Beru Whitesun lars │ 165 │ 75.0 │ brown │ light │ blue │ 47.0 │ female │ feminine │ Tatooine │ Human │ NULL │ NULL │ NULL │ 1.65 │ │ 34.009990 │ R5-D4 │ 97 │ 32.0 │ NULL │ white, red │ red │ nan │ none │ masculine │ Tatooine │ Droid │ NULL │ NULL │ NULL │ 0.97 │ │ 25.082863 │ Biggs Darklighter │ 183 │ 84.0 │ black │ light │ brown │ 24.0 │ male │ masculine │ Tatooine │ Human │ NULL │ NULL │ NULL │ 1.83 │ │ 23.245985 │ Obi-Wan Kenobi │ 182 │ 77.0 │ auburn, white │ fair │ blue-gray │ 57.0 │ male │ masculine │ Stewjon │ Human │ NULL │ NULL │ NULL │ 1.82 │ │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ └───────────┴────────────────────┴────────┴─────────┴───────────────┴─────────────┴───────────┴────────────┴────────┴───────────┴───────────┴─────────┴────────┴──────────┴───────────┴──────────┘
Summarize values with aggregate()¶
To summarize tables, dplyr has the verbs summarise
/summarize
:
In dplyr:
starwars %>%
summarise(height = mean(height, na.rm = TRUE))
In Ibis, the corresponding verb is aggregate
:
starwars.aggregate(height = _.height.mean())
┏━━━━━━━━━━━━┓ ┃ height ┃ ┡━━━━━━━━━━━━┩ │ float64 │ ├────────────┤ │ 174.358025 │ └────────────┘
Note: Throughout this guide, where dplyr uses R generics, Ibis uses Python methods. In the previous code cell, aggregate
is a method on a table and mean
is a method on a column. If you want to perform aggregations on multiple columns, you can call the method that you want on the column you want to apply it to.
Join tables with left_join()¶
To demonstrate how to do joins with Ibis, we'll load two more example datasets that also come from the example datasets included in dplyr:
band_members = ibis.examples.band_members.fetch()
band_instruments = ibis.examples.band_instruments.fetch()
In dplyr, we can perform a left join of these two tables like:
band_members |>
left_join(band_instruments)
In Ibis:
band_members.left_join(band_instruments, "name")
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┓ ┃ name ┃ band ┃ name_right ┃ plays ┃ ┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━┩ │ string │ string │ string │ string │ ├────────┼─────────┼────────────┼────────┤ │ John │ Beatles │ John │ guitar │ │ Paul │ Beatles │ Paul │ bass │ │ Mick │ Stones │ NULL │ NULL │ └────────┴─────────┴────────────┴────────┘
There are two main differences between Ibis and dplyr here:
- Ibis requires us to explicitly specify our join key ("name", in this example) whereas in dplyr, if the join key is missing, we get the natural join of the two tables which joins across all shared column names
- Ibis keeps columns for join keys from each table whereas dplyr does not by default
To replicate the result we'd get by default in dplyr but using Ibis, we need to incorporate two other verbs we've already seen in this tutorial:
(
band_members
.left_join(band_instruments, "name")
.select(~s.contains("_right"))
)
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┓ ┃ name ┃ band ┃ plays ┃ ┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━┩ │ string │ string │ string │ ├────────┼─────────┼────────┤ │ John │ Beatles │ guitar │ │ Paul │ Beatles │ bass │ │ Mick │ Stones │ NULL │ └────────┴─────────┴────────┘
Pivot data with pivot_wider() and pivot_longer()¶
dplyr users are likely to be familiar with the pivot_wider
and pivot_longer
functions from the tidyr package which convert tables between wide and long formats, respectively.
In dplyr+tidyr:
starwars |>
select(name, matches("color")) |>
pivot_longer(matches("color"), names_to = "attribute", values_to = "color")
In Ibis:
starwars_colors = (
starwars
.select("name", s.matches("color"))
.pivot_longer(s.matches("color"), names_to="attribute", values_to="color")
)
starwars_colors
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┓ ┃ name ┃ attribute ┃ color ┃ ┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━┩ │ string │ string │ string │ ├────────────────┼────────────┼─────────────┤ │ Luke Skywalker │ hair_color │ blond │ │ Luke Skywalker │ skin_color │ fair │ │ Luke Skywalker │ eye_color │ blue │ │ C-3PO │ hair_color │ NULL │ │ C-3PO │ skin_color │ gold │ │ C-3PO │ eye_color │ yellow │ │ R2-D2 │ hair_color │ NULL │ │ R2-D2 │ skin_color │ white, blue │ │ R2-D2 │ eye_color │ red │ │ Darth Vader │ hair_color │ none │ │ … │ … │ … │ └────────────────┴────────────┴─────────────┘
And the reverse, in dplyr:
starwars_colors |>
pivot_wider(names_from = "attribute", values_from = "value")
In Ibis:
(
starwars_colors.
pivot_wider(names_from="attribute", values_from="color")
)
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━┓ ┃ name ┃ hair_color ┃ skin_color ┃ eye_color ┃ ┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━┩ │ string │ string │ string │ string │ ├────────────────────┼───────────────┼─────────────┼───────────┤ │ Luke Skywalker │ blond │ fair │ blue │ │ C-3PO │ NULL │ gold │ yellow │ │ R2-D2 │ NULL │ white, blue │ red │ │ Darth Vader │ none │ white │ yellow │ │ Leia Organa │ brown │ light │ brown │ │ Owen Lars │ brown, grey │ light │ blue │ │ Beru Whitesun lars │ brown │ light │ blue │ │ R5-D4 │ NULL │ white, red │ red │ │ Biggs Darklighter │ black │ light │ brown │ │ Obi-Wan Kenobi │ auburn, white │ fair │ blue-gray │ │ … │ … │ … │ … │ └────────────────────┴───────────────┴─────────────┴───────────┘
Next Steps¶
Now that you've gotten an introudction to the common differences between dplyr and Ibis, head over to Getting Started with ibis for a full introduction. If you're familiar with SQL, check out Ibis for SQL Programmers. If you're familiar with pandas, take a look at Ibis for pandas Users