Tables are one of the core data structures in Ibis.
Table
Table(self, arg)
An immutable and lazy dataframe.
Analogous to a SQL table or a pandas DataFrame. A table expression contains an ordered set of named columns , each with a single known type. Unless explicitly ordered with an .order_by()
, the order of rows is undefined.
Table immutability means that the data underlying an Ibis Table
cannot be modified: every method on a Table returns a new Table with those changes. Laziness means that an Ibis Table
expression does not run your computation every time you call one of its methods. Instead, it is a symbolic expression that represents a set of operations to be performed, which typically is translated into a SQL query. That SQL query is then executed on a backend, where the data actually lives. The result (now small enough to be manageable) can then be materialized back into python as a pandas/pyarrow/python DataFrame/Column/scalar.
You will not create Table objects directly. Instead, you will create one
See the user guide for more info.
Attributes
columns
The list of column names in this table.
Methods
aggregate
Aggregate a table with a given set of reductions grouping by by
.
alias
Create a table expression with a specific name alias
.
as_table
Promote the expression to a table.
asof_join
Perform an “as-of” join between left
and right
.
cache
Cache the provided expression.
cast
Cast the columns of a table.
count
Compute the number of rows in the table.
cross_join
Compute the cross join of a sequence of tables.
difference
Compute the set difference of multiple table expressions.
distinct
Return a Table with duplicate rows removed.
drop
Remove fields from a table.
dropna
Remove rows with null values from the table.
fillna
Fill null values in a table expression.
filter
Select rows from table
based on predicates
.
get_name
Return the fully qualified name of the table.
group_by
Create a grouped table expression.
head
Select the first n
rows of a table.
info
Return summary information about a table.
intersect
Compute the set intersection of multiple table expressions.
join
Perform a join between two tables.
limit
Select n
rows from self
starting at offset
.
mutate
Add columns to a table expression.
nunique
Compute the number of unique rows in the table.
order_by
Sort a table by one or more expressions.
pivot_longer
Transform a table from wider to longer.
pivot_wider
Pivot a table to a wider format.
relabel
Deprecated in favor of Table.rename
.
relocate
Relocate columns
before or after other specified columns.
rename
Rename columns in the table.
rowid
A unique integer per row.
sample
Sample a fraction of rows from a table.
schema
Return the Schema for this table.
select
Compute a new table expression using exprs
and named_exprs
.
sql
Run a SQL query against a table expression.
to_array
View a single column table as an array.
to_pandas
Convert a table expression to a pandas DataFrame.
try_cast
Cast the columns of a table.
union
Compute the set union of multiple table expressions.
unpack
Project the struct fields of each of columns
into self
.
view
Create a new table expression distinct from the current one.
window_by
Create a windowing table-valued function (TVF) expression.
aggregate
aggregate(metrics=None, by=None, having=None, **kwargs)
Aggregate a table with a given set of reductions grouping by by
.
Parameters
metrics
Sequence [ir
.Scalar
] | None
Aggregate expressions. These can be any scalar-producing expression, including aggregation functions like sum
or literal values like ibis.literal(1)
.
None
by
Sequence [ir
.Value
] | None
Grouping expressions.
None
having
Sequence [ir
.BooleanValue
] | None
Post-aggregation filters. The shape requirements are the same metrics
, but the output type for having
is boolean
. ::: {.callout-warning} ## Expressions like x is None
return bool
and will not generate a SQL comparison to NULL
:::
None
kwargs
ir
.Value
Named aggregate expressions
{}
Returns
Table
An aggregate table expression
Examples
>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... {
... "fruit" : ["apple" , "apple" , "banana" , "orange" ],
... "price" : [0.5 , 0.5 , 0.25 , 0.33 ],
... }
... )
>>> t
┏━━━━━━━━┳━━━━━━━━━┓
┃ fruit ┃ price ┃
┡━━━━━━━━╇━━━━━━━━━┩
│ string │ float64 │
├────────┼─────────┤
│ apple │ 0.50 │
│ apple │ 0.50 │
│ banana │ 0.25 │
│ orange │ 0.33 │
└────────┴─────────┘
>>> t.aggregate(
... by= ["fruit" ],
... total_cost= _.price.sum (),
... avg_cost= _.price.mean(),
... having= _.price.sum () < 0.5 ,
... )
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ fruit ┃ total_cost ┃ avg_cost ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩
│ string │ float64 │ float64 │
├────────┼────────────┼──────────┤
│ banana │ 0.25 │ 0.25 │
│ orange │ 0.33 │ 0.33 │
└────────┴────────────┴──────────┘
alias
alias(alias)
Create a table expression with a specific name alias
.
This method is useful for exposing an ibis expression to the underlying backend for use in the Table.sql
method.
.alias
creates a temporary view in the database.
This side effect will be removed in a future version of ibis and is not part of the public API .
Parameters
alias
str
Name of the child expression
required
Returns
Table
An table expression
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> expr = t.alias("pingüinos" ).sql('SELECT * FROM "pingüinos" LIMIT 5' )
>>> expr
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
as_table
as_table()
Promote the expression to a table.
This method is a no-op for table expressions.
Examples
>>> t = ibis.table(dict (a= "int" ), name= "t" )
>>> s = t.as_table()
>>> t is s
asof_join
asof_join(left, right, predicates=(), by=(), tolerance=None, *, lname='', rname='{name}_right')
Perform an “as-of” join between left
and right
.
Similar to a left join except that the match is done on nearest key rather than equal keys.
Optionally, match keys with by
before joining with predicates
.
Parameters
left
Table
Table expression
required
right
Table
Table expression
required
predicates
str | ir
.BooleanColumn
| Sequence [str | ir
.BooleanColumn
]
Join expressions
()
by
str | ir
.Column
| Sequence [str | ir
.Column
]
column to group by before joining
()
tolerance
str | ir
.IntervalScalar
| None
Amount of time to look behind when joining
None
lname
str
A format string to use to rename overlapping columns in the left table (e.g. "left_{name}"
).
''
rname
str
A format string to use to rename overlapping columns in the right table (e.g. "right_{name}"
).
'{name}_right'
cache
cache()
Cache the provided expression.
All subsequent operations on the returned expression will be performed on the cached data. Use the with
statement to limit the lifetime of a cached table.
This method is idempotent: calling it multiple times in succession will return the same value as the first call.
Subsequent evaluations will not recompute the expression so method chaining will not incur the overhead of caching more than once.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> cached_penguins = t.mutate(computation= "Heavy Computation" ).cache()
>>> cached_penguins
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃ computation ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │ string │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼───────────────────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │ Heavy Computation │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │ Heavy Computation │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │ Heavy Computation │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │ Heavy Computation │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │ Heavy Computation │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190 │ 3650 │ male │ 2007 │ Heavy Computation │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181 │ 3625 │ female │ 2007 │ Heavy Computation │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195 │ 4675 │ male │ 2007 │ Heavy Computation │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193 │ 3475 │ NULL │ 2007 │ Heavy Computation │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190 │ 4250 │ NULL │ 2007 │ Heavy Computation │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴───────────────────┘
Explicit cache cleanup
>>> with t.mutate(computation= "Heavy Computation" ).cache() as cached_penguins:
... cached_penguins
cast
cast(schema)
Cast the columns of a table.
Similar to pandas.DataFrame.astype
.
Parameters
schema
SupportsSchema
Mapping, schema or iterable of pairs to use for casting
required
Examples
>>> import ibis
>>> import ibis.selectors as s
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.schema()
ibis.Schema {
species string
island string
bill_length_mm float64
bill_depth_mm float64
flipper_length_mm int64
body_mass_g int64
sex string
year int64
}
>>> cols = ["body_mass_g" , "bill_length_mm" ]
>>> t[cols].head()
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ body_mass_g ┃ bill_length_mm ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ int64 │ float64 │
├─────────────┼────────────────┤
│ 3750 │ 39.1 │
│ 3800 │ 39.5 │
│ 3250 │ 40.3 │
│ NULL │ NULL │
│ 3450 │ 36.7 │
└─────────────┴────────────────┘
Columns not present in the input schema will be passed through unchanged
['species',
'island',
'bill_length_mm',
'bill_depth_mm',
'flipper_length_mm',
'body_mass_g',
'sex',
'year']
>>> expr = t.cast({"body_mass_g" : "float64" , "bill_length_mm" : "int" })
>>> expr.select(* cols).head()
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ body_mass_g ┃ bill_length_mm ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ float64 │ int64 │
├─────────────┼────────────────┤
│ 3750.0 │ 39 │
│ 3800.0 │ 40 │
│ 3250.0 │ 40 │
│ NULL │ NULL │
│ 3450.0 │ 37 │
└─────────────┴────────────────┘
Columns that are in the input schema
but not in the table raise an error
>>> t.cast({"foo" : "string" })
IbisError: Cast schema has fields that are not in the table: ['foo']
count
count(where=None)
Compute the number of rows in the table.
Parameters
where
ir
.BooleanValue
| None
Optional boolean expression to filter rows when counting.
None
Returns
IntegerScalar
Number of rows in the table
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a" : ["foo" , "bar" , "baz" ]})
>>> t
┏━━━━━━━━┓
┃ a ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ foo │
│ bar │
│ baz │
└────────┘
>>> t.count(t.a != "foo" )
ibis.expr.types.numeric.IntegerScalar
cross_join
cross_join(left, right, *rest, lname='', rname='{name}_right')
Compute the cross join of a sequence of tables.
Parameters
left
Table
Left table
required
right
Table
Right table
required
rest
Table
Additional tables to cross join
()
lname
str
A format string to use to rename overlapping columns in the left table (e.g. "left_{name}"
).
''
rname
str
A format string to use to rename overlapping columns in the right table (e.g. "right_{name}"
).
'{name}_right'
Returns
Table
Cross join of left
, right
and rest
Examples
>>> import ibis
>>> import ibis.selectors as s
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.count()
>>> agg = t.drop("year" ).agg(s.across(s.numeric(), _.mean()))
>>> expr = t.cross_join(agg)
>>> expr
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃ bill_length_mm_right ┃ bill_depth_mm_right ┃ flipper_length_mm_right ┃ body_mass_g_right ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │ float64 │ float64 │ float64 │ float64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼──────────────────────┼─────────────────────┼─────────────────────────┼───────────────────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190 │ 3650 │ male │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181 │ 3625 │ female │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195 │ 4675 │ male │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193 │ 3475 │ NULL │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190 │ 4250 │ NULL │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴──────────────────────┴─────────────────────┴─────────────────────────┴───────────────────┘
['species',
'island',
'bill_length_mm',
'bill_depth_mm',
'flipper_length_mm',
'body_mass_g',
'sex',
'year',
'bill_length_mm_right',
'bill_depth_mm_right',
'flipper_length_mm_right',
'body_mass_g_right']
difference
difference(table, *rest, distinct=True)
Compute the set difference of multiple table expressions.
The input tables must have identical schemas.
Parameters
table
Table
A table expression
required
*rest
Table
Additional table expressions
()
distinct
bool
Only diff distinct rows not occurring in the calling table
True
Returns
Table
The rows present in self
that are not present in tables
.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a" : [1 , 2 ]})
>>> t1
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
└───────┘
>>> t2 = ibis.memtable({"a" : [2 , 3 ]})
>>> t2
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
│ 3 │
└───────┘
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
└───────┘
distinct
distinct(on=None, keep='first')
Return a Table with duplicate rows removed.
Similar to pandas.DataFrame.drop_duplicates()
.
Parameters
on
str | Iterable [str ] | s
.Selector
| None
Only consider certain columns for identifying duplicates. By default deduplicate all of the columns.
None
keep
Literal [‘first’, ‘last’] | None
Determines which duplicates to keep. - "first"
: Drop duplicates except for the first occurrence. - "last"
: Drop duplicates except for the last occurrence. - None
: Drop all duplicates
'first'
Examples
>>> import ibis
>>> import ibis.examples as ex
>>> import ibis.selectors as s
>>> ibis.options.interactive = True
>>> t = ex.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190 │ 3650 │ male │ 2007 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181 │ 3625 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195 │ 4675 │ male │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193 │ 3475 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190 │ 4250 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Compute the distinct rows of a subset of columns
>>> t[["species" , "island" ]].distinct().order_by(s.all ())
┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ species ┃ island ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string │ string │
├───────────┼───────────┤
│ Adelie │ Biscoe │
│ Adelie │ Dream │
│ Adelie │ Torgersen │
│ Chinstrap │ Dream │
│ Gentoo │ Biscoe │
└───────────┴───────────┘
Drop all duplicate rows except the first
>>> t.distinct(on= ["species" , "island" ], keep= "first" ).order_by(s.all ())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Biscoe │ 37.8 │ 18.3 │ 174 │ 3400 │ female │ 2007 │
│ Adelie │ Dream │ 39.5 │ 16.7 │ 178 │ 3250 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Chinstrap │ Dream │ 46.5 │ 17.9 │ 192 │ 3500 │ female │ 2007 │
│ Gentoo │ Biscoe │ 46.1 │ 13.2 │ 211 │ 4500 │ female │ 2007 │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Drop all duplicate rows except the last
>>> t.distinct(on= ["species" , "island" ], keep= "last" ).order_by(s.all ())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Biscoe │ 42.7 │ 18.3 │ 196 │ 4075 │ male │ 2009 │
│ Adelie │ Dream │ 41.5 │ 18.5 │ 201 │ 4000 │ male │ 2009 │
│ Adelie │ Torgersen │ 43.1 │ 19.2 │ 197 │ 3500 │ male │ 2009 │
│ Chinstrap │ Dream │ 50.2 │ 18.7 │ 198 │ 3775 │ female │ 2009 │
│ Gentoo │ Biscoe │ 49.9 │ 16.1 │ 213 │ 5400 │ male │ 2009 │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Drop all duplicated rows
>>> expr = t.distinct(on= ["species" , "island" , "year" , "bill_length_mm" ], keep= None )
>>> expr.count()
You can pass selectors
to on
>>> t.distinct(on=~ s.numeric())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Biscoe │ 37.7 │ 18.7 │ 180 │ 3600 │ male │ 2007 │
│ Gentoo │ Biscoe │ 50.0 │ 16.3 │ 230 │ 5700 │ male │ 2007 │
│ Gentoo │ Biscoe │ 44.5 │ 14.3 │ 216 │ 4100 │ NULL │ 2007 │
│ Adelie │ Dream │ 39.5 │ 16.7 │ 178 │ 3250 │ female │ 2007 │
│ Chinstrap │ Dream │ 50.0 │ 19.5 │ 196 │ 3900 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Dream │ 37.2 │ 18.1 │ 178 │ 3900 │ male │ 2007 │
│ Chinstrap │ Dream │ 46.5 │ 17.9 │ 192 │ 3500 │ female │ 2007 │
│ Adelie │ Biscoe │ 37.8 │ 18.3 │ 174 │ 3400 │ female │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
The only valid values of keep
are "first"
, "last"
and [`None][None]
>>> t.distinct(on= "species" , keep= "second" )
IbisError: Invalid value for `keep`: 'second', must be 'first', 'last' or None
drop
drop(*fields)
Remove fields from a table.
Parameters
fields
str | Selector
Fields to drop. Strings and selectors are accepted.
()
Returns
Table
A table with all columns matching fields
removed.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190 │ 3650 │ male │ 2007 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181 │ 3625 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195 │ 4675 │ male │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193 │ 3475 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190 │ 4250 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Drop one or more columns
>>> t.drop("species" ).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │
│ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │
└───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> t.drop("species" , "bill_length_mm" ).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ island ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ float64 │ int64 │ int64 │ string │ int64 │
├───────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Torgersen │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Torgersen │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Torgersen │ 18.0 │ 195 │ 3250 │ female │ 2007 │
│ Torgersen │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Torgersen │ 19.3 │ 193 │ 3450 │ female │ 2007 │
└───────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Drop with selectors, mix and match
>>> import ibis.selectors as s
>>> t.drop("species" , s.startswith("bill_" )).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ island ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ int64 │ int64 │ string │ int64 │
├───────────┼───────────────────┼─────────────┼────────┼───────┤
│ Torgersen │ 181 │ 3750 │ male │ 2007 │
│ Torgersen │ 186 │ 3800 │ female │ 2007 │
│ Torgersen │ 195 │ 3250 │ female │ 2007 │
│ Torgersen │ NULL │ NULL │ NULL │ 2007 │
│ Torgersen │ 193 │ 3450 │ female │ 2007 │
└───────────┴───────────────────┴─────────────┴────────┴───────┘
dropna
dropna(subset=None, how='any')
Remove rows with null values from the table.
Parameters
subset
Sequence [str ] | str | None
Columns names to consider when dropping nulls. By default all columns are considered.
None
how
Literal [‘any’, ‘all’]
Determine whether a row is removed if there is at least one null value in the row ('any'
), or if all row values are null ('all'
).
'any'
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190 │ 3650 │ male │ 2007 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181 │ 3625 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195 │ 4675 │ male │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193 │ 3475 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190 │ 4250 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> t.dropna(["bill_length_mm" , "body_mass_g" ]).count()
>>> t.dropna(how= "all" ).count() # no rows where all columns are null
fillna
fillna(replacements)
Fill null values in a table expression.
For example, different library versions may impact whether a given backend promotes integer replacement values to floats.
Parameters
replacements
ir
.Scalar
| Mapping [str , ir
.Scalar
]
Value with which to fill nulls. If replacements
is a mapping, the keys are column names that map to their replacement value. If passed as a scalar all columns are filled with that value.
required
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.sex
┏━━━━━━━━┓
┃ sex ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ male │
│ female │
│ female │
│ NULL │
│ female │
│ male │
│ female │
│ male │
│ NULL │
│ NULL │
│ … │
└────────┘
>>> t.fillna({"sex" : "unrecorded" }).sex
┏━━━━━━━━━━━━┓
┃ sex ┃
┡━━━━━━━━━━━━┩
│ string │
├────────────┤
│ male │
│ female │
│ female │
│ unrecorded │
│ female │
│ male │
│ female │
│ male │
│ unrecorded │
│ unrecorded │
│ … │
└────────────┘
filter
filter(predicates)
Select rows from table
based on predicates
.
Parameters
predicates
ir
.BooleanValue
| Sequence [ir
.BooleanValue
] | IfAnyAll
Boolean value expressions used to select rows in table
.
required
Returns
Table
Filtered table expression
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190 │ 3650 │ male │ 2007 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181 │ 3625 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195 │ 4675 │ male │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193 │ 3475 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190 │ 4250 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> t.filter ([t.species == "Adelie" , t.body_mass_g > 3500 ]).sex.value_counts().dropna(
... "sex"
... ).order_by("sex" )
┏━━━━━━━━┳━━━━━━━━━━━┓
┃ sex ┃ sex_count ┃
┡━━━━━━━━╇━━━━━━━━━━━┩
│ string │ int64 │
├────────┼───────────┤
│ female │ 22 │
│ male │ 68 │
└────────┴───────────┘
get_name
get_name()
Return the fully qualified name of the table.
group_by
group_by(by=None, **key_exprs)
Create a grouped table expression.
Similar to SQL’s GROUP BY statement, or pandas .groupby() method.
Examples
>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... {
... "fruit" : ["apple" , "apple" , "banana" , "orange" ],
... "price" : [0.5 , 0.5 , 0.25 , 0.33 ],
... }
... )
>>> t
┏━━━━━━━━┳━━━━━━━━━┓
┃ fruit ┃ price ┃
┡━━━━━━━━╇━━━━━━━━━┩
│ string │ float64 │
├────────┼─────────┤
│ apple │ 0.50 │
│ apple │ 0.50 │
│ banana │ 0.25 │
│ orange │ 0.33 │
└────────┴─────────┘
>>> t.group_by("fruit" ).agg(total_cost= _.price.sum (), avg_cost= _.price.mean()).order_by(
... "fruit"
... )
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ fruit ┃ total_cost ┃ avg_cost ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩
│ string │ float64 │ float64 │
├────────┼────────────┼──────────┤
│ apple │ 1.00 │ 0.50 │
│ banana │ 0.25 │ 0.25 │
│ orange │ 0.33 │ 0.33 │
└────────┴────────────┴──────────┘
head
head(n=5)
Select the first n
rows of a table.
Parameters
n
int
Number of rows to include
5
Returns
Table
self
limited to n
rows
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a" : [1 , 1 , 2 ], "b" : ["c" , "a" , "a" ]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ c │
│ 1 │ a │
│ 2 │ a │
└───────┴────────┘
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ c │
│ 1 │ a │
└───────┴────────┘
info
info()
Return summary information about a table.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.info()
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━┓
┃ name ┃ type ┃ nullable ┃ nulls ┃ non_nulls ┃ null_frac ┃ pos ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━┩
│ string │ string │ boolean │ int64 │ int64 │ float64 │ int8 │
├───────────────────┼─────────┼──────────┼───────┼───────────┼───────────┼──────┤
│ species │ string │ True │ 0 │ 344 │ 0.000000 │ 0 │
│ island │ string │ True │ 0 │ 344 │ 0.000000 │ 1 │
│ bill_length_mm │ float64 │ True │ 2 │ 342 │ 0.005814 │ 2 │
│ bill_depth_mm │ float64 │ True │ 2 │ 342 │ 0.005814 │ 3 │
│ flipper_length_mm │ int64 │ True │ 2 │ 342 │ 0.005814 │ 4 │
│ body_mass_g │ int64 │ True │ 2 │ 342 │ 0.005814 │ 5 │
│ sex │ string │ True │ 11 │ 333 │ 0.031977 │ 6 │
│ year │ int64 │ True │ 0 │ 344 │ 0.000000 │ 7 │
└───────────────────┴─────────┴──────────┴───────┴───────────┴───────────┴──────┘
intersect
intersect(table, *rest, distinct=True)
Compute the set intersection of multiple table expressions.
The input tables must have identical schemas.
Parameters
table
Table
A table expression
required
*rest
Table
Additional table expressions
()
distinct
bool
Only return distinct rows
True
Returns
Table
A new table containing the intersection of all input tables.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a" : [1 , 2 ]})
>>> t1
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
└───────┘
>>> t2 = ibis.memtable({"a" : [2 , 3 ]})
>>> t2
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
│ 3 │
└───────┘
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
└───────┘
join
join(left, right, predicates=(), how='inner', *, lname='', rname='{name}_right')
Perform a join between two tables.
Parameters
left
Table
Left table to join
required
right
Table
Right table to join
required
predicates
str | Sequence [str | ir
.BooleanColumn
| Literal [True] | Literal [False] | tuple [str | ir
.Column
| ir
.Deferred
, str | ir
.Column
| ir
.Deferred
]]
Condition(s) to join on. See examples for details.
()
how
Literal [‘inner’, ‘left’, ‘outer’, ‘right’, ‘semi’, ‘anti’, ‘any_inner’, ‘any_left’, ‘left_semi’]
Join method, e.g. "inner"
or "left"
.
'inner'
lname
str
A format string to use to rename overlapping columns in the left table (e.g. "left_{name}"
).
''
rname
str
A format string to use to rename overlapping columns in the right table (e.g. "right_{name}"
).
'{name}_right'
Examples
>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> movies = ibis.examples.ml_latest_small_movies.fetch()
>>> movies.head()
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ movieId ┃ title ┃ genres ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string │ string │
├─────────┼────────────────────────────────────┼─────────────────────────────────────────────┤
│ 1 │ Toy Story (1995) │ Adventure|Animation|Children|Comedy|Fantasy │
│ 2 │ Jumanji (1995) │ Adventure|Children|Fantasy │
│ 3 │ Grumpier Old Men (1995) │ Comedy|Romance │
│ 4 │ Waiting to Exhale (1995) │ Comedy|Drama|Romance │
│ 5 │ Father of the Bride Part II (1995) │ Comedy │
└─────────┴────────────────────────────────────┴─────────────────────────────────────────────┘
>>> ratings = ibis.examples.ml_latest_small_ratings.fetch().drop("timestamp" )
>>> ratings.head()
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ userId ┃ movieId ┃ rating ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ int64 │ int64 │ float64 │
├────────┼─────────┼─────────┤
│ 1 │ 1 │ 4.0 │
│ 1 │ 3 │ 4.0 │
│ 1 │ 6 │ 4.0 │
│ 1 │ 47 │ 5.0 │
│ 1 │ 50 │ 5.0 │
└────────┴─────────┴─────────┘
Equality left join on the shared movieId
column. Note the _right
suffix added to all overlapping columns from the right table (in this case only the “movieId” column).
>>> ratings.join(movies, "movieId" , how= "left" ).head(5 )
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ userId ┃ movieId ┃ rating ┃ movieId_right ┃ title ┃ genres ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64 │ int64 │ float64 │ int64 │ string │ string │
├────────┼─────────┼─────────┼───────────────┼─────────────────────────────┼─────────────────────────────────────────────┤
│ 1 │ 1 │ 4.0 │ 1 │ Toy Story (1995) │ Adventure|Animation|Children|Comedy|Fantasy │
│ 1 │ 3 │ 4.0 │ 3 │ Grumpier Old Men (1995) │ Comedy|Romance │
│ 1 │ 6 │ 4.0 │ 6 │ Heat (1995) │ Action|Crime|Thriller │
│ 1 │ 47 │ 5.0 │ 47 │ Seven (a.k.a. Se7en) (1995) │ Mystery|Thriller │
│ 1 │ 50 │ 5.0 │ 50 │ Usual Suspects, The (1995) │ Crime|Mystery|Thriller │
└────────┴─────────┴─────────┴───────────────┴─────────────────────────────┴─────────────────────────────────────────────┘
Explicit equality join using the default how
value of "inner"
. Note how there is no _right
suffix added to the movieId
column since this is an inner join and the movieId
column is part of the join condition.
>>> ratings.join(movies, ratings.movieId == movies.movieId).head(5 )
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ userId ┃ movieId ┃ rating ┃ title ┃ genres ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64 │ int64 │ float64 │ string │ string │
├────────┼─────────┼─────────┼─────────────────────────────┼─────────────────────────────────────────────┤
│ 1 │ 1 │ 4.0 │ Toy Story (1995) │ Adventure|Animation|Children|Comedy|Fantasy │
│ 1 │ 3 │ 4.0 │ Grumpier Old Men (1995) │ Comedy|Romance │
│ 1 │ 6 │ 4.0 │ Heat (1995) │ Action|Crime|Thriller │
│ 1 │ 47 │ 5.0 │ Seven (a.k.a. Se7en) (1995) │ Mystery|Thriller │
│ 1 │ 50 │ 5.0 │ Usual Suspects, The (1995) │ Crime|Mystery|Thriller │
└────────┴─────────┴─────────┴─────────────────────────────┴─────────────────────────────────────────────┘
>>> tags = ibis.examples.ml_latest_small_tags.fetch()
>>> tags.head()
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ userId ┃ movieId ┃ tag ┃ timestamp ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ int64 │ int64 │ string │ int64 │
├────────┼─────────┼─────────────────┼────────────┤
│ 2 │ 60756 │ funny │ 1445714994 │
│ 2 │ 60756 │ Highly quotable │ 1445714996 │
│ 2 │ 60756 │ will ferrell │ 1445714992 │
│ 2 │ 89774 │ Boxing story │ 1445715207 │
│ 2 │ 89774 │ MMA │ 1445715200 │
└────────┴─────────┴─────────────────┴────────────┘
You can join on multiple columns/conditions by passing in a sequence. Find all instances where a user both tagged and rated a movie:
>>> tags.join(ratings, ["userId" , "movieId" ]).head(5 )
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┓
┃ userId ┃ movieId ┃ tag ┃ timestamp ┃ rating ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━┩
│ int64 │ int64 │ string │ int64 │ float64 │
├────────┼─────────┼────────────────┼────────────┼─────────┤
│ 62 │ 2 │ Robin Williams │ 1528843907 │ 4.0 │
│ 62 │ 110 │ sword fight │ 1528152535 │ 4.5 │
│ 62 │ 410 │ gothic │ 1525636609 │ 4.5 │
│ 62 │ 2023 │ mafia │ 1525636733 │ 5.0 │
│ 62 │ 2124 │ quirky │ 1525636846 │ 5.0 │
└────────┴─────────┴────────────────┴────────────┴─────────┘
To self-join a table with itself, you need to call .view()
on one of the arguments so the two tables are distinct from each other.
For crafting more complex join conditions, a valid form of a join condition is a 2-tuple like ({left_key}, {right_key})
, where each key can be
a Column
Deferred expression
lambda of the form (Table) -> Column
For example, to find all movies pairings that received the same (ignoring case) tags:
>>> movie_tags = tags["movieId" , "tag" ]
>>> view = movie_tags.view()
>>> movie_tags.join(
... view,
... [
... movie_tags.movieId != view.movieId,
... (_.tag.lower(), lambda t: t.tag.lower()),
... ],
... ).head()
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ movieId ┃ tag ┃ movieId_right ┃ tag_right ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string │ int64 │ string │
├─────────┼───────────────────┼───────────────┼───────────────────┤
│ 60756 │ funny │ 1732 │ funny │
│ 60756 │ Highly quotable │ 1732 │ Highly quotable │
│ 89774 │ Tom Hardy │ 139385 │ tom hardy │
│ 106782 │ drugs │ 1732 │ drugs │
│ 106782 │ Leonardo DiCaprio │ 5989 │ Leonardo DiCaprio │
└─────────┴───────────────────┴───────────────┴───────────────────┘
limit
limit(n, offset=0)
Select n
rows from self
starting at offset
.
Parameters
n
int | None
Number of rows to include. If None
, the entire table is selected starting from offset
.
required
offset
int
Number of rows to skip first
0
Returns
Table
The first n
rows of self
starting at offset
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a" : [1 , 1 , 2 ], "b" : ["c" , "a" , "a" ]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ c │
│ 1 │ a │
│ 2 │ a │
└───────┴────────┘
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ c │
│ 1 │ a │
└───────┴────────┘
You can use None
with offset
to slice starting from a particular row
>>> t.limit(None , offset= 1 )
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ a │
│ 2 │ a │
└───────┴────────┘
mutate
mutate(exprs=None, **mutations)
Add columns to a table expression.
Parameters
exprs
Sequence [ir
.Expr
] | None
List of named expressions to add as columns
None
mutations
ir
.Value
Named expressions using keyword arguments
{}
Returns
Table
Table expression with additional columns
Examples
>>> import ibis
>>> import ibis.selectors as s
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch().select("species" , "year" , "bill_length_mm" )
>>> t
┏━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species ┃ year ┃ bill_length_mm ┃
┡━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string │ int64 │ float64 │
├─────────┼───────┼────────────────┤
│ Adelie │ 2007 │ 39.1 │
│ Adelie │ 2007 │ 39.5 │
│ Adelie │ 2007 │ 40.3 │
│ Adelie │ 2007 │ NULL │
│ Adelie │ 2007 │ 36.7 │
│ Adelie │ 2007 │ 39.3 │
│ Adelie │ 2007 │ 38.9 │
│ Adelie │ 2007 │ 39.2 │
│ Adelie │ 2007 │ 34.1 │
│ Adelie │ 2007 │ 42.0 │
│ … │ … │ … │
└─────────┴───────┴────────────────┘
Add a new column from a per-element expression
>>> t.mutate(next_year= _.year + 1 ).head()
┏━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ species ┃ year ┃ bill_length_mm ┃ next_year ┃
┡━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string │ int64 │ float64 │ int64 │
├─────────┼───────┼────────────────┼───────────┤
│ Adelie │ 2007 │ 39.1 │ 2008 │
│ Adelie │ 2007 │ 39.5 │ 2008 │
│ Adelie │ 2007 │ 40.3 │ 2008 │
│ Adelie │ 2007 │ NULL │ 2008 │
│ Adelie │ 2007 │ 36.7 │ 2008 │
└─────────┴───────┴────────────────┴───────────┘
Add a new column based on an aggregation. Note the automatic broadcasting.
>>> t.select("species" , bill_demean= _.bill_length_mm - _.bill_length_mm.mean()).head()
┏━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ species ┃ bill_demean ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━┩
│ string │ float64 │
├─────────┼─────────────┤
│ Adelie │ -4.82193 │
│ Adelie │ -4.42193 │
│ Adelie │ -3.62193 │
│ Adelie │ NULL │
│ Adelie │ -7.22193 │
└─────────┴─────────────┘
Mutate across multiple columns
>>> t.mutate(s.across(s.numeric() & ~ s.c("year" ), _ - _.mean())).head()
┏━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species ┃ year ┃ bill_length_mm ┃
┡━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string │ int64 │ float64 │
├─────────┼───────┼────────────────┤
│ Adelie │ 2007 │ -4.82193 │
│ Adelie │ 2007 │ -4.42193 │
│ Adelie │ 2007 │ -3.62193 │
│ Adelie │ 2007 │ NULL │
│ Adelie │ 2007 │ -7.22193 │
└─────────┴───────┴────────────────┘
nunique
nunique(where=None)
Compute the number of unique rows in the table.
Parameters
where
ir
.BooleanValue
| None
Optional boolean expression to filter rows when counting.
None
Returns
IntegerScalar
Number of unique rows in the table
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a" : ["foo" , "bar" , "bar" ]})
>>> t
┏━━━━━━━━┓
┃ a ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ foo │
│ bar │
│ bar │
└────────┘
>>> t.nunique(t.a != "foo" )
order_by
order_by(by)
Sort a table by one or more expressions.
Similar to pandas.DataFrame.sort_values()
.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... {
... "a" : [3 , 2 , 1 , 3 ],
... "b" : ["a" , "B" , "c" , "D" ],
... "c" : [4 , 6 , 5 , 7 ],
... }
... )
>>> t
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 3 │ a │ 4 │
│ 2 │ B │ 6 │
│ 1 │ c │ 5 │
│ 3 │ D │ 7 │
└───────┴────────┴───────┘
Sort by b. Default is ascending. Note how capital letters come before lowercase
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 2 │ B │ 6 │
│ 3 │ D │ 7 │
│ 3 │ a │ 4 │
│ 1 │ c │ 5 │
└───────┴────────┴───────┘
Sort in descending order
>>> t.order_by(ibis.desc("b" ))
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 1 │ c │ 5 │
│ 3 │ a │ 4 │
│ 3 │ D │ 7 │
│ 2 │ B │ 6 │
└───────┴────────┴───────┘
You can also use the deferred API to get the same result
>>> from ibis import _
>>> t.order_by(_.b.desc())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 1 │ c │ 5 │
│ 3 │ a │ 4 │
│ 3 │ D │ 7 │
│ 2 │ B │ 6 │
└───────┴────────┴───────┘
Sort by multiple columns/expressions
>>> t.order_by(["a" , _.c.desc()])
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 1 │ c │ 5 │
│ 2 │ B │ 6 │
│ 3 │ D │ 7 │
│ 3 │ a │ 4 │
└───────┴────────┴───────┘
You can actually pass arbitrary expressions to use as sort keys. For example, to ignore the case of the strings in column b
>>> t.order_by(_.b.lower())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 3 │ a │ 4 │
│ 2 │ B │ 6 │
│ 1 │ c │ 5 │
│ 3 │ D │ 7 │
└───────┴────────┴───────┘
This means than shuffling a Table is super simple
>>> t.order_by(ibis.random())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 1 │ c │ 5 │
│ 2 │ B │ 6 │
│ 3 │ a │ 4 │
│ 3 │ D │ 7 │
└───────┴────────┴───────┘
pivot_longer
pivot_longer(col, *, names_to='name', names_pattern='(.+)', names_transform=None, values_to='value', values_transform=None)
Transform a table from wider to longer.
Parameters
col
str | s
.Selector
String column name or selector.
required
names_to
str | Iterable [str ]
A string or iterable of strings indicating how to name the new pivoted columns.
'name'
names_pattern
str | re .Pattern
Pattern to use to extract column names from the input. By default the entire column name is extracted.
'(.+)'
names_transform
Callable [[str ], ir
.Value
] | Mapping [str , Callable [[str ], ir
.Value
]] | None
Function or mapping of a name in names_to
to a function to transform a column name to a value.
None
values_to
str
Name of the pivoted value column.
'value'
values_transform
Callable [[ir
.Value
], ir
.Value
] | Deferred
| None
Apply a function to the value column. This can be a lambda or deferred expression.
None
Examples
Basic usage
>>> import ibis
>>> import ibis.selectors as s
>>> from ibis import _
>>> ibis.options.interactive = True
>>> relig_income = ibis.examples.relig_income_raw.fetch()
>>> relig_income
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ religion ┃ <$10k ┃ $10-20k ┃ $20-30k ┃ $30-40k ┃ $40-50k ┃ $50-75k ┃ $75-100k ┃ $100-150k ┃ >150k ┃ Don't know/refused ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ string │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │
├─────────────────────────┼───────┼─────────┼─────────┼─────────┼─────────┼─────────┼──────────┼───────────┼───────┼────────────────────┤
│ Agnostic │ 27 │ 34 │ 60 │ 81 │ 76 │ 137 │ 122 │ 109 │ 84 │ 96 │
│ Atheist │ 12 │ 27 │ 37 │ 52 │ 35 │ 70 │ 73 │ 59 │ 74 │ 76 │
│ Buddhist │ 27 │ 21 │ 30 │ 34 │ 33 │ 58 │ 62 │ 39 │ 53 │ 54 │
│ Catholic │ 418 │ 617 │ 732 │ 670 │ 638 │ 1116 │ 949 │ 792 │ 633 │ 1489 │
│ Don’t know/refused │ 15 │ 14 │ 15 │ 11 │ 10 │ 35 │ 21 │ 17 │ 18 │ 116 │
│ Evangelical Prot │ 575 │ 869 │ 1064 │ 982 │ 881 │ 1486 │ 949 │ 723 │ 414 │ 1529 │
│ Hindu │ 1 │ 9 │ 7 │ 9 │ 11 │ 34 │ 47 │ 48 │ 54 │ 37 │
│ Historically Black Prot │ 228 │ 244 │ 236 │ 238 │ 197 │ 223 │ 131 │ 81 │ 78 │ 339 │
│ Jehovah's Witness │ 20 │ 27 │ 24 │ 24 │ 21 │ 30 │ 15 │ 11 │ 6 │ 37 │
│ Jewish │ 19 │ 19 │ 25 │ 25 │ 30 │ 95 │ 69 │ 87 │ 151 │ 162 │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │
└─────────────────────────┴───────┴─────────┴─────────┴─────────┴─────────┴─────────┴──────────┴───────────┴───────┴────────────────────┘
Here we convert column names not matching the selector for the religion
column and convert those names into values
>>> relig_income.pivot_longer(~ s.c("religion" ), names_to= "income" , values_to= "count" )
┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ religion ┃ income ┃ count ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ string │ string │ int64 │
├──────────┼────────────────────┼───────┤
│ Agnostic │ <$10k │ 27 │
│ Agnostic │ $10-20k │ 34 │
│ Agnostic │ $20-30k │ 60 │
│ Agnostic │ $30-40k │ 81 │
│ Agnostic │ $40-50k │ 76 │
│ Agnostic │ $50-75k │ 137 │
│ Agnostic │ $75-100k │ 122 │
│ Agnostic │ $100-150k │ 109 │
│ Agnostic │ >150k │ 84 │
│ Agnostic │ Don't know/refused │ 96 │
│ … │ … │ … │
└──────────┴────────────────────┴───────┘
Similarly for a different example dataset, we convert names to values but using a different selector and the default values_to
value.
>>> world_bank_pop = ibis.examples.world_bank_pop_raw.fetch()
>>> world_bank_pop.head()
┏━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ country ┃ indicator ┃ 2000 ┃ 2001 ┃ 2002 ┃ 2003 ┃ 2004 ┃ 2005 ┃ 2006 ┃ 2007 ┃ 2008 ┃ 2009 ┃ 2010 ┃ 2011 ┃ 2012 ┃ 2013 ┃ 2014 ┃ 2015 ┃ 2016 ┃ 2017 ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │
├─────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ ABW │ SP.URB.TOTL │ 4.162500e+04 │ 4.202500e+04 │ 4.219400e+04 │ 4.227700e+04 │ 4.231700e+04 │ 4.239900e+04 │ 4.255500e+04 │ 4.272900e+04 │ 4.290600e+04 │ 4.307900e+04 │ 4.320600e+04 │ 4.349300e+04 │ 4.386400e+04 │ 4.422800e+04 │ 4.458800e+04 │ 4.494300e+04 │ 4.529700e+04 │ 4.564800e+04 │
│ ABW │ SP.URB.GROW │ 1.664222e+00 │ 9.563731e-01 │ 4.013352e-01 │ 1.965172e-01 │ 9.456936e-02 │ 1.935880e-01 │ 3.672580e-01 │ 4.080490e-01 │ 4.133830e-01 │ 4.023963e-01 │ 2.943735e-01 │ 6.620631e-01 │ 8.493932e-01 │ 8.264135e-01 │ 8.106692e-01 │ 7.930256e-01 │ 7.845785e-01 │ 7.718989e-01 │
│ ABW │ SP.POP.TOTL │ 8.910100e+04 │ 9.069100e+04 │ 9.178100e+04 │ 9.270100e+04 │ 9.354000e+04 │ 9.448300e+04 │ 9.560600e+04 │ 9.678700e+04 │ 9.799600e+04 │ 9.921200e+04 │ 1.003410e+05 │ 1.012880e+05 │ 1.021120e+05 │ 1.028800e+05 │ 1.035940e+05 │ 1.042570e+05 │ 1.048740e+05 │ 1.054390e+05 │
│ ABW │ SP.POP.GROW │ 2.539234e+00 │ 1.768757e+00 │ 1.194718e+00 │ 9.973955e-01 │ 9.009892e-01 │ 1.003077e+00 │ 1.181566e+00 │ 1.227711e+00 │ 1.241397e+00 │ 1.233231e+00 │ 1.131541e+00 │ 9.393559e-01 │ 8.102306e-01 │ 7.493010e-01 │ 6.916153e-01 │ 6.379592e-01 │ 5.900625e-01 │ 5.372957e-01 │
│ AFE │ SP.URB.TOTL │ 1.155517e+08 │ 1.197755e+08 │ 1.242275e+08 │ 1.288340e+08 │ 1.336475e+08 │ 1.387456e+08 │ 1.440267e+08 │ 1.492313e+08 │ 1.553838e+08 │ 1.617762e+08 │ 1.684561e+08 │ 1.754157e+08 │ 1.825587e+08 │ 1.901087e+08 │ 1.980733e+08 │ 2.065563e+08 │ 2.150833e+08 │ 2.237321e+08 │
└─────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┘
>>> world_bank_pop.pivot_longer(s.matches(r"\d {4} " ), names_to= "year" ).head()
┏━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┓
┃ country ┃ indicator ┃ year ┃ value ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━┩
│ string │ string │ string │ float64 │
├─────────┼─────────────┼────────┼─────────┤
│ ABW │ SP.URB.TOTL │ 2000 │ 41625.0 │
│ ABW │ SP.URB.TOTL │ 2001 │ 42025.0 │
│ ABW │ SP.URB.TOTL │ 2002 │ 42194.0 │
│ ABW │ SP.URB.TOTL │ 2003 │ 42277.0 │
│ ABW │ SP.URB.TOTL │ 2004 │ 42317.0 │
└─────────┴─────────────┴────────┴─────────┘
pivot_longer
has some preprocessing capabiltiies like stripping a prefix and applying a function to column names
>>> billboard = ibis.examples.billboard.fetch()
>>> billboard
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ artist ┃ track ┃ date_entered ┃ wk1 ┃ wk2 ┃ wk3 ┃ wk4 ┃ wk5 ┃ wk6 ┃ wk7 ┃ wk8 ┃ wk9 ┃ wk10 ┃ wk11 ┃ wk12 ┃ wk13 ┃ wk14 ┃ wk15 ┃ wk16 ┃ wk17 ┃ wk18 ┃ wk19 ┃ wk20 ┃ wk21 ┃ wk22 ┃ wk23 ┃ wk24 ┃ wk25 ┃ wk26 ┃ wk27 ┃ wk28 ┃ wk29 ┃ wk30 ┃ wk31 ┃ wk32 ┃ wk33 ┃ wk34 ┃ wk35 ┃ wk36 ┃ wk37 ┃ wk38 ┃ wk39 ┃ wk40 ┃ wk41 ┃ wk42 ┃ wk43 ┃ wk44 ┃ wk45 ┃ wk46 ┃ wk47 ┃ wk48 ┃ wk49 ┃ wk50 ┃ wk51 ┃ wk52 ┃ wk53 ┃ wk54 ┃ wk55 ┃ wk56 ┃ wk57 ┃ wk58 ┃ wk59 ┃ wk60 ┃ wk61 ┃ wk62 ┃ wk63 ┃ wk64 ┃ wk65 ┃ wk66 ┃ wk67 ┃ wk68 ┃ wk69 ┃ wk70 ┃ wk71 ┃ wk72 ┃ wk73 ┃ wk74 ┃ wk75 ┃ wk76 ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ string │ string │ date │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ string │ string │ string │ string │ string │ string │ string │ string │ string │ string │ string │
├────────────────┼─────────────────────────┼──────────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┤
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 87 │ 82 │ 72 │ 77 │ 87 │ 94 │ 99 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ 2Ge+her │ The Hardest Part Of ... │ 2000-09-02 │ 91 │ 87 │ 92 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ 3 Doors Down │ Kryptonite │ 2000-04-08 │ 81 │ 70 │ 68 │ 67 │ 66 │ 57 │ 54 │ 53 │ 51 │ 51 │ 51 │ 51 │ 47 │ 44 │ 38 │ 28 │ 22 │ 18 │ 18 │ 14 │ 12 │ 7 │ 6 │ 6 │ 6 │ 5 │ 5 │ 4 │ 4 │ 4 │ 4 │ 3 │ 3 │ 3 │ 4 │ 5 │ 5 │ 9 │ 9 │ 15 │ 14 │ 13 │ 14 │ 16 │ 17 │ 21 │ 22 │ 24 │ 28 │ 33 │ 42 │ 42 │ 49 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ 3 Doors Down │ Loser │ 2000-10-21 │ 76 │ 76 │ 72 │ 69 │ 67 │ 65 │ 55 │ 59 │ 62 │ 61 │ 61 │ 59 │ 61 │ 66 │ 72 │ 76 │ 75 │ 67 │ 73 │ 70 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ 504 Boyz │ Wobble Wobble │ 2000-04-15 │ 57 │ 34 │ 25 │ 17 │ 17 │ 31 │ 36 │ 49 │ 53 │ 57 │ 64 │ 70 │ 75 │ 76 │ 78 │ 85 │ 92 │ 96 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ 98^0 │ Give Me Just One Nig... │ 2000-08-19 │ 51 │ 39 │ 34 │ 26 │ 26 │ 19 │ 2 │ 2 │ 3 │ 6 │ 7 │ 22 │ 29 │ 36 │ 47 │ 67 │ 66 │ 84 │ 93 │ 94 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ A*Teens │ Dancing Queen │ 2000-07-08 │ 97 │ 97 │ 96 │ 95 │ 100 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Aaliyah │ I Don't Wanna │ 2000-01-29 │ 84 │ 62 │ 51 │ 41 │ 38 │ 35 │ 35 │ 38 │ 38 │ 36 │ 37 │ 37 │ 38 │ 49 │ 61 │ 63 │ 62 │ 67 │ 83 │ 86 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Aaliyah │ Try Again │ 2000-03-18 │ 59 │ 53 │ 38 │ 28 │ 21 │ 18 │ 16 │ 14 │ 12 │ 10 │ 9 │ 8 │ 6 │ 1 │ 2 │ 2 │ 2 │ 2 │ 3 │ 4 │ 5 │ 5 │ 6 │ 9 │ 13 │ 14 │ 16 │ 23 │ 22 │ 33 │ 36 │ 43 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Adams, Yolanda │ Open My Heart │ 2000-08-26 │ 76 │ 76 │ 74 │ 69 │ 68 │ 67 │ 61 │ 58 │ 57 │ 59 │ 66 │ 68 │ 61 │ 67 │ 59 │ 63 │ 67 │ 71 │ 79 │ 89 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │
└────────────────┴─────────────────────────┴──────────────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┘
>>> billboard.pivot_longer(
... s.startswith("wk" ),
... names_to= "week" ,
... names_pattern= r"wk(.+)" ,
... names_transform= int ,
... values_to= "rank" ,
... values_transform= _.cast("int" ),
... ).dropna("rank" )
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━┓
┃ artist ┃ track ┃ date_entered ┃ week ┃ rank ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━┩
│ string │ string │ date │ int8 │ int64 │
├─────────┼─────────────────────────┼──────────────┼──────┼───────┤
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 1 │ 87 │
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 2 │ 82 │
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 3 │ 72 │
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 4 │ 77 │
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 5 │ 87 │
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 6 │ 94 │
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 7 │ 99 │
│ 2Ge+her │ The Hardest Part Of ... │ 2000-09-02 │ 1 │ 91 │
│ 2Ge+her │ The Hardest Part Of ... │ 2000-09-02 │ 2 │ 87 │
│ 2Ge+her │ The Hardest Part Of ... │ 2000-09-02 │ 3 │ 92 │
│ … │ … │ … │ … │ … │
└─────────┴─────────────────────────┴──────────────┴──────┴───────┘
You can use regular expression capture groups to extract multiple variables stored in column names
>>> who = ibis.examples.who.fetch()
>>> who
┏━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ country ┃ iso2 ┃ iso3 ┃ year ┃ new_sp_m014 ┃ new_sp_m1524 ┃ new_sp_m2534 ┃ new_sp_m3544 ┃ new_sp_m4554 ┃ new_sp_m5564 ┃ new_sp_m65 ┃ new_sp_f014 ┃ new_sp_f1524 ┃ new_sp_f2534 ┃ new_sp_f3544 ┃ new_sp_f4554 ┃ new_sp_f5564 ┃ new_sp_f65 ┃ new_sn_m014 ┃ new_sn_m1524 ┃ new_sn_m2534 ┃ new_sn_m3544 ┃ new_sn_m4554 ┃ new_sn_m5564 ┃ new_sn_m65 ┃ new_sn_f014 ┃ new_sn_f1524 ┃ new_sn_f2534 ┃ new_sn_f3544 ┃ new_sn_f4554 ┃ new_sn_f5564 ┃ new_sn_f65 ┃ new_ep_m014 ┃ new_ep_m1524 ┃ new_ep_m2534 ┃ new_ep_m3544 ┃ new_ep_m4554 ┃ new_ep_m5564 ┃ new_ep_m65 ┃ new_ep_f014 ┃ new_ep_f1524 ┃ new_ep_f2534 ┃ new_ep_f3544 ┃ new_ep_f4554 ┃ new_ep_f5564 ┃ new_ep_f65 ┃ newrel_m014 ┃ newrel_m1524 ┃ newrel_m2534 ┃ newrel_m3544 ┃ newrel_m4554 ┃ newrel_m5564 ┃ newrel_m65 ┃ newrel_f014 ┃ newrel_f1524 ┃ newrel_f2534 ┃ newrel_f3544 ┃ newrel_f4554 ┃ newrel_f5564 ┃ newrel_f65 ┃
┡━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ string │ string │ string │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │
├─────────────┼────────┼────────┼───────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┤
│ Afghanistan │ AF │ AFG │ 1980 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1981 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1982 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1983 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1984 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1985 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1986 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1987 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1988 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1989 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │
└─────────────┴────────┴────────┴───────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┘
>>> who.pivot_longer(
... s.r["new_sp_m014" :"newrel_f65" ],
... names_to= ["diagnosis" , "gender" , "age" ],
... names_pattern= "new_?(.*)_(.)(.*)" ,
... values_to= "count" ,
... )
┏━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ country ┃ iso2 ┃ iso3 ┃ year ┃ diagnosis ┃ gender ┃ age ┃ count ┃
┡━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ string │ int64 │ string │ string │ string │ int64 │
├─────────────┼────────┼────────┼───────┼───────────┼────────┼────────┼───────┤
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 014 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 1524 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 2534 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 3544 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 4554 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 5564 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 65 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ f │ 014 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ f │ 1524 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ f │ 2534 │ NULL │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────────┴────────┴────────┴───────┴───────────┴────────┴────────┴───────┘
names_transform
is flexible, and can be:
1. A mapping of one or more names in `names_to` to callable
2. A callable that will be applied to every name
Let’s recode gender and age to numeric values using a mapping
>>> who.pivot_longer(
... s.r["new_sp_m014" :"newrel_f65" ],
... names_to= ["diagnosis" , "gender" , "age" ],
... names_pattern= "new_?(.*)_(.)(.*)" ,
... names_transform= dict (
... gender= {"m" : 1 , "f" : 2 }.get,
... age= dict (
... zip (
... ["014" , "1524" , "2534" , "3544" , "4554" , "5564" , "65" ],
... range (7 ),
... )
... ).get,
... ),
... values_to= "count" ,
... )
┏━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━┳━━━━━━━┓
┃ country ┃ iso2 ┃ iso3 ┃ year ┃ diagnosis ┃ gender ┃ age ┃ count ┃
┡━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━╇━━━━━━━┩
│ string │ string │ string │ int64 │ string │ int8 │ int8 │ int64 │
├─────────────┼────────┼────────┼───────┼───────────┼────────┼──────┼───────┤
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 0 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 1 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 2 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 3 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 4 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 5 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 6 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 2 │ 0 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 2 │ 1 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 2 │ 2 │ NULL │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────────┴────────┴────────┴───────┴───────────┴────────┴──────┴───────┘
The number of match groups in names_pattern
must match the length of names_to
>>> who.pivot_longer(
... s.r["new_sp_m014" :"newrel_f65" ],
... names_to= ["diagnosis" , "gender" , "age" ],
... names_pattern= "new_?(.*)_.(.*)" ,
... )
IbisInputError: Number of match groups in `names_pattern`'new_?(.*)_.(.*)' (2 groups) doesn't match the length of `names_to` ['diagnosis', 'gender', 'age'] (length 3)
names_transform
must be a mapping or callable
>>> who.pivot_longer(
... s.r["new_sp_m014" :"newrel_f65" ], names_transform= "upper"
... ) # quartodoc: +EXPECTED_FAILURE
IbisTypeError: `names_transform` must be a mapping or callable. Got <class 'str'>
pivot_wider
pivot_wider(id_cols=None, names_from='name', names_prefix='', names_sep='_', names_sort=False, names=None, values_from='value', values_fill=None, values_agg='arbitrary')
Pivot a table to a wider format.
Parameters
id_cols
s
.Selector
| None
A set of columns that uniquely identify each observation.
None
names_from
str | Iterable [str ] | s
.Selector
An argument describing which column or columns to use to get the name of the output columns.
'name'
names_prefix
str
String added to the start of every column name.
''
names_sep
str
If names_from
or values_from
contains multiple columns, this argument will be used to join their values together into a single string to use as a column name.
'_'
names_sort
bool
If True
columns are sorted. If False
column names are ordered by appearance.
False
names
Iterable [str ] | None
An explicit sequence of values to look for in columns matching names_from
. * When this value is None
, the values will be computed from names_from
. * When this value is not None
, each element’s length must match the length of names_from
. See examples below for more detail.
None
values_from
str | Iterable [str ] | s
.Selector
An argument describing which column or columns to get the cell values from.
'value'
values_fill
int | float | str | ir
.Scalar
| None
A scalar value that specifies what each value should be filled with when missing.
None
values_agg
str | Callable [[ir
.Value
], ir
.Scalar
] | Deferred
A function applied to the value in each cell in the output.
'arbitrary'
Returns
Table
Wider pivoted table
Examples
>>> import ibis
>>> import ibis.selectors as s
>>> from ibis import _
>>> ibis.options.interactive = True
Basic usage
>>> fish_encounters = ibis.examples.fish_encounters.fetch()
>>> fish_encounters
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━┓
┃ fish ┃ station ┃ seen ┃
┡━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼─────────┼───────┤
│ 4842 │ Release │ 1 │
│ 4842 │ I80_1 │ 1 │
│ 4842 │ Lisbon │ 1 │
│ 4842 │ Rstr │ 1 │
│ 4842 │ Base_TD │ 1 │
│ 4842 │ BCE │ 1 │
│ 4842 │ BCW │ 1 │
│ 4842 │ BCE2 │ 1 │
│ 4842 │ BCW2 │ 1 │
│ 4842 │ MAE │ 1 │
│ … │ … │ … │
└───────┴─────────┴───────┘
>>> fish_encounters.pivot_wider(names_from= "station" , values_from= "seen" )
┏━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ fish ┃ Lisbon ┃ Base_TD ┃ MAE ┃ MAW ┃ Release ┃ BCE ┃ BCE2 ┃ I80_1 ┃ Rstr ┃ BCW ┃ BCW2 ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │
├───────┼────────┼─────────┼───────┼───────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 4848 │ 1 │ NULL │ NULL │ NULL │ 1 │ NULL │ NULL │ 1 │ 1 │ NULL │ NULL │
│ 4865 │ 1 │ NULL │ NULL │ NULL │ 1 │ NULL │ NULL │ 1 │ NULL │ NULL │ NULL │
│ 4843 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │
│ 4844 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │
│ 4845 │ 1 │ 1 │ NULL │ NULL │ 1 │ NULL │ NULL │ 1 │ 1 │ NULL │ NULL │
│ 4849 │ NULL │ NULL │ NULL │ NULL │ 1 │ NULL │ NULL │ 1 │ NULL │ NULL │ NULL │
│ 4859 │ 1 │ 1 │ NULL │ NULL │ 1 │ NULL │ NULL │ 1 │ 1 │ NULL │ NULL │
│ 4861 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │
│ 4847 │ 1 │ NULL │ NULL │ NULL │ 1 │ NULL │ NULL │ 1 │ NULL │ NULL │ NULL │
│ 4850 │ NULL │ 1 │ NULL │ NULL │ 1 │ 1 │ NULL │ 1 │ 1 │ 1 │ NULL │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │
└───────┴────────┴─────────┴───────┴───────┴─────────┴───────┴───────┴───────┴───────┴───────┴───────┘
Fill missing pivoted values using values_fill
>>> fish_encounters.pivot_wider(
... names_from= "station" , values_from= "seen" , values_fill= 0
... )
┏━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ fish ┃ Lisbon ┃ Base_TD ┃ MAE ┃ MAW ┃ Release ┃ BCE ┃ BCE2 ┃ I80_1 ┃ Rstr ┃ BCW ┃ BCW2 ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │
├───────┼────────┼─────────┼───────┼───────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 4848 │ 1 │ 0 │ 0 │ 0 │ 1 │ 0 │ 0 │ 1 │ 1 │ 0 │ 0 │
│ 4865 │ 1 │ 0 │ 0 │ 0 │ 1 │ 0 │ 0 │ 1 │ 0 │ 0 │ 0 │
│ 4847 │ 1 │ 0 │ 0 │ 0 │ 1 │ 0 │ 0 │ 1 │ 0 │ 0 │ 0 │
│ 4850 │ 0 │ 1 │ 0 │ 0 │ 1 │ 1 │ 0 │ 1 │ 1 │ 1 │ 0 │
│ 4864 │ 0 │ 0 │ 0 │ 0 │ 1 │ 0 │ 0 │ 1 │ 0 │ 0 │ 0 │
│ 4843 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │
│ 4844 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │
│ 4845 │ 1 │ 1 │ 0 │ 0 │ 1 │ 0 │ 0 │ 1 │ 1 │ 0 │ 0 │
│ 4849 │ 0 │ 0 │ 0 │ 0 │ 1 │ 0 │ 0 │ 1 │ 0 │ 0 │ 0 │
│ 4859 │ 1 │ 1 │ 0 │ 0 │ 1 │ 0 │ 0 │ 1 │ 1 │ 0 │ 0 │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │
└───────┴────────┴─────────┴───────┴───────┴─────────┴───────┴───────┴───────┴───────┴───────┴───────┘
Compute multiple values columns
>>> us_rent_income = ibis.examples.us_rent_income.fetch()
>>> us_rent_income
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃ geoid ┃ name ┃ variable ┃ estimate ┃ moe ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ string │ string │ string │ int64 │ int64 │
├────────┼────────────┼──────────┼──────────┼───────┤
│ 01 │ Alabama │ income │ 24476 │ 136 │
│ 01 │ Alabama │ rent │ 747 │ 3 │
│ 02 │ Alaska │ income │ 32940 │ 508 │
│ 02 │ Alaska │ rent │ 1200 │ 13 │
│ 04 │ Arizona │ income │ 27517 │ 148 │
│ 04 │ Arizona │ rent │ 972 │ 4 │
│ 05 │ Arkansas │ income │ 23789 │ 165 │
│ 05 │ Arkansas │ rent │ 709 │ 5 │
│ 06 │ California │ income │ 29454 │ 109 │
│ 06 │ California │ rent │ 1358 │ 3 │
│ … │ … │ … │ … │ … │
└────────┴────────────┴──────────┴──────────┴───────┘
>>> us_rent_income.pivot_wider(
... names_from= "variable" , values_from= ["estimate" , "moe" ]
... )
┏━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ geoid ┃ name ┃ estimate_income ┃ moe_income ┃ estimate_rent ┃ moe_rent ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ string │ string │ int64 │ int64 │ int64 │ int64 │
├────────┼──────────────┼─────────────────┼────────────┼───────────────┼──────────┤
│ 05 │ Arkansas │ 23789 │ 165 │ 709 │ 5 │
│ 16 │ Idaho │ 25298 │ 208 │ 792 │ 7 │
│ 30 │ Montana │ 26249 │ 206 │ 751 │ 9 │
│ 39 │ Ohio │ 27435 │ 94 │ 764 │ 2 │
│ 47 │ Tennessee │ 25453 │ 102 │ 808 │ 4 │
│ 06 │ California │ 29454 │ 109 │ 1358 │ 3 │
│ 13 │ Georgia │ 27024 │ 106 │ 927 │ 3 │
│ 15 │ Hawaii │ 32453 │ 218 │ 1507 │ 18 │
│ 38 │ North Dakota │ 32336 │ 245 │ 775 │ 9 │
│ 40 │ Oklahoma │ 26207 │ 101 │ 766 │ 3 │
│ … │ … │ … │ … │ … │ … │
└────────┴──────────────┴─────────────────┴────────────┴───────────────┴──────────┘
The column name separator can be changed using the names_sep
parameter
>>> us_rent_income.pivot_wider(
... names_from= "variable" ,
... names_sep= "." ,
... values_from= ("estimate" , "moe" ),
... )
┏━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ geoid ┃ name ┃ estimate.income ┃ moe.income ┃ estimate.rent ┃ moe.rent ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ string │ string │ int64 │ int64 │ int64 │ int64 │
├────────┼────────────────┼─────────────────┼────────────┼───────────────┼──────────┤
│ 05 │ Arkansas │ 23789 │ 165 │ 709 │ 5 │
│ 16 │ Idaho │ 25298 │ 208 │ 792 │ 7 │
│ 30 │ Montana │ 26249 │ 206 │ 751 │ 9 │
│ 39 │ Ohio │ 27435 │ 94 │ 764 │ 2 │
│ 47 │ Tennessee │ 25453 │ 102 │ 808 │ 4 │
│ 04 │ Arizona │ 27517 │ 148 │ 972 │ 4 │
│ 28 │ Mississippi │ 22766 │ 194 │ 740 │ 5 │
│ 32 │ Nevada │ 29019 │ 213 │ 1017 │ 6 │
│ 33 │ New Hampshire │ 33172 │ 387 │ 1052 │ 9 │
│ 37 │ North Carolina │ 26482 │ 111 │ 844 │ 3 │
│ … │ … │ … │ … │ … │ … │
└────────┴────────────────┴─────────────────┴────────────┴───────────────┴──────────┘
Supply an alternative function to summarize values
>>> warpbreaks = ibis.examples.warpbreaks.fetch().select("wool" , "tension" , "breaks" )
>>> warpbreaks
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┓
┃ wool ┃ tension ┃ breaks ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━┩
│ string │ string │ int64 │
├────────┼─────────┼────────┤
│ A │ L │ 26 │
│ A │ L │ 30 │
│ A │ L │ 54 │
│ A │ L │ 25 │
│ A │ L │ 70 │
│ A │ L │ 52 │
│ A │ L │ 51 │
│ A │ L │ 26 │
│ A │ L │ 67 │
│ A │ M │ 18 │
│ … │ … │ … │
└────────┴─────────┴────────┘
>>> warpbreaks.pivot_wider(names_from= "wool" , values_from= "breaks" , values_agg= "mean" )
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ tension ┃ A ┃ B ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string │ float64 │ float64 │
├─────────┼───────────┼───────────┤
│ L │ 44.555556 │ 28.222222 │
│ M │ 24.000000 │ 28.777778 │
│ H │ 24.555556 │ 18.777778 │
└─────────┴───────────┴───────────┘
Passing Deferred
objects to values_agg
is supported
>>> warpbreaks.pivot_wider(
... names_from= "tension" ,
... values_from= "breaks" ,
... values_agg= _.sum (),
... ).order_by("wool" )
┏━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ wool ┃ L ┃ M ┃ H ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ string │ int64 │ int64 │ int64 │
├────────┼───────┼───────┼───────┤
│ A │ 401 │ 216 │ 221 │
│ B │ 254 │ 259 │ 169 │
└────────┴───────┴───────┴───────┘
Use a custom aggregate function
>>> warpbreaks.pivot_wider(
... names_from= "wool" ,
... values_from= "breaks" ,
... values_agg= lambda col: col.std() / col.mean(),
... )
┏━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ tension ┃ A ┃ B ┃
┡━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ string │ float64 │ float64 │
├─────────┼──────────┼──────────┤
│ L │ 0.406183 │ 0.349325 │
│ M │ 0.360844 │ 0.327719 │
│ H │ 0.418344 │ 0.260590 │
└─────────┴──────────┴──────────┘
Generate some random data, setting the random seed for reproducibility
>>> import random
>>> random.seed(0 )
>>> raw = ibis.memtable(
... [
... dict (
... product= product,
... country= country,
... year= year,
... production= random.random(),
... )
... for product in "AB"
... for country in ["AI" , "EI" ]
... for year in range (2000 , 2015 )
... ]
... )
>>> production = raw.filter (((_.product == "A" ) & (_.country == "AI" )) | (_.product == "B" ))
>>> production
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┓
┃ product ┃ country ┃ year ┃ production ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━┩
│ string │ string │ int64 │ float64 │
├─────────┼─────────┼───────┼────────────┤
│ B │ AI │ 2000 │ 0.477010 │
│ B │ AI │ 2001 │ 0.865310 │
│ B │ AI │ 2002 │ 0.260492 │
│ B │ AI │ 2003 │ 0.805028 │
│ B │ AI │ 2004 │ 0.548699 │
│ B │ AI │ 2005 │ 0.014042 │
│ B │ AI │ 2006 │ 0.719705 │
│ B │ AI │ 2007 │ 0.398824 │
│ B │ AI │ 2008 │ 0.824845 │
│ B │ AI │ 2009 │ 0.668153 │
│ … │ … │ … │ … │
└─────────┴─────────┴───────┴────────────┘
Pivoting with multiple name columns
>>> production.pivot_wider(
... names_from= ["product" , "country" ],
... values_from= "production" ,
... )
┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ year ┃ B_AI ┃ B_EI ┃ A_AI ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ int64 │ float64 │ float64 │ float64 │
├───────┼──────────┼──────────┼──────────┤
│ 2004 │ 0.548699 │ 0.967540 │ 0.511275 │
│ 2007 │ 0.398824 │ 0.080446 │ 0.303313 │
│ 2011 │ 0.493578 │ 0.109058 │ 0.504687 │
│ 2001 │ 0.865310 │ 0.191067 │ 0.757954 │
│ 2003 │ 0.805028 │ 0.238616 │ 0.258917 │
│ 2009 │ 0.668153 │ 0.507941 │ 0.583382 │
│ 2006 │ 0.719705 │ 0.447970 │ 0.783799 │
│ 2002 │ 0.260492 │ 0.567511 │ 0.420572 │
│ 2000 │ 0.477010 │ 0.870471 │ 0.844422 │
│ 2010 │ 0.001143 │ 0.932834 │ 0.908113 │
│ … │ … │ … │ … │
└───────┴──────────┴──────────┴──────────┘
Select a subset of names. This call incurs no computation when constructing the expression.
>>> production.pivot_wider(
... names_from= ["product" , "country" ],
... names= [("A" , "AI" ), ("B" , "AI" )],
... values_from= "production" ,
... )
┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ year ┃ A_AI ┃ B_AI ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ int64 │ float64 │ float64 │
├───────┼──────────┼──────────┤
│ 2012 │ 0.281838 │ 0.867603 │
│ 2000 │ 0.844422 │ 0.477010 │
│ 2010 │ 0.908113 │ 0.001143 │
│ 2014 │ 0.618369 │ 0.325204 │
│ 2006 │ 0.783799 │ 0.719705 │
│ 2008 │ 0.476597 │ 0.824845 │
│ 2001 │ 0.757954 │ 0.865310 │
│ 2003 │ 0.258917 │ 0.805028 │
│ 2009 │ 0.583382 │ 0.668153 │
│ 2002 │ 0.420572 │ 0.260492 │
│ … │ … │ … │
└───────┴──────────┴──────────┘
Sort the new columns’ names
>>> production.pivot_wider(
... names_from= ["product" , "country" ],
... values_from= "production" ,
... names_sort= True ,
... )
┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ year ┃ A_AI ┃ B_AI ┃ B_EI ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ int64 │ float64 │ float64 │ float64 │
├───────┼──────────┼──────────┼──────────┤
│ 2012 │ 0.281838 │ 0.867603 │ 0.551267 │
│ 2006 │ 0.783799 │ 0.719705 │ 0.447970 │
│ 2008 │ 0.476597 │ 0.824845 │ 0.320055 │
│ 2000 │ 0.844422 │ 0.477010 │ 0.870471 │
│ 2010 │ 0.908113 │ 0.001143 │ 0.932834 │
│ 2014 │ 0.618369 │ 0.325204 │ 0.547441 │
│ 2002 │ 0.420572 │ 0.260492 │ 0.567511 │
│ 2001 │ 0.757954 │ 0.865310 │ 0.191067 │
│ 2003 │ 0.258917 │ 0.805028 │ 0.238616 │
│ 2009 │ 0.583382 │ 0.668153 │ 0.507941 │
│ … │ … │ … │ … │
└───────┴──────────┴──────────┴──────────┘
relabel
relabel(substitutions)
Deprecated in favor of Table.rename
.
relocate
relocate(*columns, before=None, after=None, **kwargs)
Relocate columns
before or after other specified columns.
Parameters
columns
str | s
.Selector
Columns to relocate. Selectors are accepted.
()
before
str | s
.Selector
| None
A column name or selector to insert the new columns before.
None
after
str | s
.Selector
| None
A column name or selector. Columns in columns
are relocated after the last column selected in after
.
None
kwargs
str
Additional column names to relocate, renaming argument values to keyword argument names.
{}
Returns
Table
A table with the columns relocated.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> import ibis.selectors as s
>>> t = ibis.memtable(dict (a= [1 ], b= [1 ], c= [1 ], d= ["a" ], e= ["a" ], f= ["a" ]))
>>> t
┏━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃ c ┃ d ┃ e ┃ f ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ int64 │ int64 │ string │ string │ string │
├───────┼───────┼───────┼────────┼────────┼────────┤
│ 1 │ 1 │ 1 │ a │ a │ a │
└───────┴───────┴───────┴────────┴────────┴────────┘
┏━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ f ┃ a ┃ b ┃ c ┃ d ┃ e ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ string │ int64 │ int64 │ int64 │ string │ string │
├────────┼───────┼───────┼───────┼────────┼────────┤
│ a │ 1 │ 1 │ 1 │ a │ a │
└────────┴───────┴───────┴───────┴────────┴────────┘
>>> t.relocate("a" , after= "c" )
┏━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ b ┃ c ┃ a ┃ d ┃ e ┃ f ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ int64 │ int64 │ string │ string │ string │
├───────┼───────┼───────┼────────┼────────┼────────┤
│ 1 │ 1 │ 1 │ a │ a │ a │
└───────┴───────┴───────┴────────┴────────┴────────┘
>>> t.relocate("f" , before= "b" )
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ a ┃ f ┃ b ┃ c ┃ d ┃ e ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ string │ int64 │ int64 │ string │ string │
├───────┼────────┼───────┼───────┼────────┼────────┤
│ 1 │ a │ 1 │ 1 │ a │ a │
└───────┴────────┴───────┴───────┴────────┴────────┘
>>> t.relocate("a" , after= s.last())
┏━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ b ┃ c ┃ d ┃ e ┃ f ┃ a ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │ string │ string │ string │ int64 │
├───────┼───────┼────────┼────────┼────────┼───────┤
│ 1 │ 1 │ a │ a │ a │ 1 │
└───────┴───────┴────────┴────────┴────────┴───────┘
Relocate allows renaming
┏━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ ff ┃ a ┃ b ┃ c ┃ d ┃ e ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ string │ int64 │ int64 │ int64 │ string │ string │
├────────┼───────┼───────┼───────┼────────┼────────┤
│ a │ 1 │ 1 │ 1 │ a │ a │
└────────┴───────┴───────┴───────┴────────┴────────┘
You can relocate based on any predicate selector, such as of_type
>>> t.relocate(s.of_type("string" ))
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ d ┃ e ┃ f ┃ a ┃ b ┃ c ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ string │ string │ string │ int64 │ int64 │ int64 │
├────────┼────────┼────────┼───────┼───────┼───────┤
│ a │ a │ a │ 1 │ 1 │ 1 │
└────────┴────────┴────────┴───────┴───────┴───────┘
>>> t.relocate(s.numeric(), after= s.last())
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ d ┃ e ┃ f ┃ a ┃ b ┃ c ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ string │ string │ string │ int64 │ int64 │ int64 │
├────────┼────────┼────────┼───────┼───────┼───────┤
│ a │ a │ a │ 1 │ 1 │ 1 │
└────────┴────────┴────────┴───────┴───────┴───────┘
>>> t.relocate(s.any_of(s.c(* "ae" )))
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ a ┃ e ┃ b ┃ c ┃ d ┃ f ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ string │ int64 │ int64 │ string │ string │
├───────┼────────┼───────┼───────┼────────┼────────┤
│ 1 │ a │ 1 │ 1 │ a │ a │
└───────┴────────┴───────┴───────┴────────┴────────┘
When multiple columns are selected with before
or after
, those selected columns are moved before and after the selectors
input
>>> t = ibis.memtable(dict (a= [1 ], b= ["a" ], c= [1 ], d= ["a" ]))
>>> t.relocate(s.numeric(), after= s.of_type("string" ))
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ b ┃ d ┃ a ┃ c ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ string │ string │ int64 │ int64 │
├────────┼────────┼───────┼───────┤
│ a │ a │ 1 │ 1 │
└────────┴────────┴───────┴───────┘
>>> t.relocate(s.numeric(), before= s.of_type("string" ))
┏━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ a ┃ c ┃ b ┃ d ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ int64 │ string │ string │
├───────┼───────┼────────┼────────┤
│ 1 │ 1 │ a │ a │
└───────┴───────┴────────┴────────┘
When there are duplicate renames in a call to relocate, the last one is preserved
>>> t.relocate(e= "d" , f= "d" )
┏━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ f ┃ a ┃ b ┃ c ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ int64 │ string │ int64 │
├────────┼───────┼────────┼───────┤
│ a │ 1 │ a │ 1 │
└────────┴───────┴────────┴───────┘
However, if there are duplicates that are not part of a rename, the order specified in the relocate call is preserved
>>> t.relocate(
... "b" ,
... s.of_type("string" ), # "b" is a string column, so the selector matches
... )
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ b ┃ d ┃ a ┃ c ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ string │ string │ int64 │ int64 │
├────────┼────────┼───────┼───────┤
│ a │ a │ 1 │ 1 │
└────────┴────────┴───────┴───────┘
rename
rename(method=None, /, **substitutions)
Rename columns in the table.
Parameters
method
str | Callable [[str ], str | None] | Literal [‘snake_case’, ‘ALL_CAPS’] | Mapping [str , str ] | None
An optional method for renaming columns. May be one of: - A format string to use to rename all columns, like "prefix_{name}"
. - A function from old name to new name. If the function returns None
the old name is used. - The literal strings "snake_case"
or "ALL_CAPS"
to rename all columns using a snake_case
or "ALL_CAPS"
naming convention respectively. - A mapping from new name to old name. Existing columns not present in the mapping will passthrough with their original name.
None
substitutions
str
Columns to be explicitly renamed, expressed as new_name=old_name
keyword arguments.
{}
Returns
Table
A renamed table expression
Examples
>>> import ibis
>>> import ibis.selectors as s
>>> ibis.options.interactive = True
>>> first3 = s.r[:3 ] # first 3 columns
>>> t = ibis.examples.penguins_raw_raw.fetch().select(first3)
>>> t
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ studyName ┃ Sample Number ┃ Species ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ int64 │ string │
├───────────┼───────────────┼─────────────────────────────────────┤
│ PAL0708 │ 1 │ Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708 │ 2 │ Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708 │ 3 │ Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708 │ 4 │ Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708 │ 5 │ Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708 │ 6 │ Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708 │ 7 │ Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708 │ 8 │ Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708 │ 9 │ Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708 │ 10 │ Adelie Penguin (Pygoscelis adeliae) │
│ … │ … │ … │
└───────────┴───────────────┴─────────────────────────────────────┘
Rename specific columns by passing keyword arguments like new_name="old_name"
>>> t.rename(study_name= "studyName" ).head(1 )
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ study_name ┃ Sample Number ┃ Species ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ int64 │ string │
├────────────┼───────────────┼─────────────────────────────────────┤
│ PAL0708 │ 1 │ Adelie Penguin (Pygoscelis adeliae) │
└────────────┴───────────────┴─────────────────────────────────────┘
Rename all columns using a format string
>>> t.rename("p_ {name} " ).head(1 )
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ p_studyName ┃ p_Sample Number ┃ p_Species ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ int64 │ string │
├─────────────┼─────────────────┼─────────────────────────────────────┤
│ PAL0708 │ 1 │ Adelie Penguin (Pygoscelis adeliae) │
└─────────────┴─────────────────┴─────────────────────────────────────┘
Rename all columns using a snake_case convention
>>> t.rename("snake_case" ).head(1 )
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ study_name ┃ sample_number ┃ species ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ int64 │ string │
├────────────┼───────────────┼─────────────────────────────────────┤
│ PAL0708 │ 1 │ Adelie Penguin (Pygoscelis adeliae) │
└────────────┴───────────────┴─────────────────────────────────────┘
Rename all columns using an ALL_CAPS convention
>>> t.rename("ALL_CAPS" ).head(1 )
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ STUDY_NAME ┃ SAMPLE_NUMBER ┃ SPECIES ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ int64 │ string │
├────────────┼───────────────┼─────────────────────────────────────┤
│ PAL0708 │ 1 │ Adelie Penguin (Pygoscelis adeliae) │
└────────────┴───────────────┴─────────────────────────────────────┘
Rename all columns using a callable
>>> t.rename(str .upper).head(1 )
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ STUDYNAME ┃ SAMPLE NUMBER ┃ SPECIES ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ int64 │ string │
├───────────┼───────────────┼─────────────────────────────────────┤
│ PAL0708 │ 1 │ Adelie Penguin (Pygoscelis adeliae) │
└───────────┴───────────────┴─────────────────────────────────────┘
rowid
rowid()
A unique integer per row.
Any further meaning behind this expression is backend dependent. Generally this corresponds to some index into the database storage (for example, SQLite and DuckDB’s rowid
).
For a monotonically increasing row number, see ibis.row_number
.
Returns
IntegerColumn
An integer column
sample
sample(fraction, *, method='row', seed=None)
Sample a fraction of rows from a table.
Sampling is by definition a random operation. Some backends support specifying a seed
for repeatable results, but not all backends support that option. And some backends (duckdb, for example) do support specifying a seed but may still not have repeatable results in all cases.
In all cases, results are backend-specific. An execution against one backend is unlikely to sample the same rows when executed against a different backend, even with the same seed
set.
Parameters
fraction
float
The percentage of rows to include in the sample, expressed as a float between 0 and 1.
required
method
Literal [‘row’, ‘block’]
The sampling method to use. The default is “row”, which includes each row with a probability of fraction
. If method is “block”, some backends may instead perform sampling a fraction of blocks of rows (where “block” is a backend dependent definition). This is identical to “row” for backends lacking a blockwise sampling implementation. For those coming from SQL, “row” and “block” correspond to “bernoulli” and “system” respectively in a TABLESAMPLE clause.
'row'
seed
int | None
An optional random seed to use, for repeatable sampling. The range of possible seed values is backend specific (most support at least [0, 2**31 - 1]
). Backends that never support specifying a seed for repeatable sampling will error appropriately. Note that some backends (like DuckDB) do support specifying a seed, but may still not have repeatable results in all cases.
None
Returns
Table
The input table, with fraction
of rows selected.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"x" : [1 , 2 , 3 , 4 ], "y" : ["a" , "b" , "c" , "d" ]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ x ┃ y ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ a │
│ 2 │ b │
│ 3 │ c │
│ 4 │ d │
└───────┴────────┘
Sample approximately half the rows, with a seed specified for reproducibility.
>>> t.sample(0.5 , seed= 1234 )
┏━━━━━━━┳━━━━━━━━┓
┃ x ┃ y ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 2 │ b │
│ 3 │ c │
└───────┴────────┘
schema
schema()
Return the Schema for this table.
Returns
Schema
The table’s schema.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.schema()
ibis.Schema {
species string
island string
bill_length_mm float64
bill_depth_mm float64
flipper_length_mm int64
body_mass_g int64
sex string
year int64
}
select
select(*exprs, **named_exprs)
Compute a new table expression using exprs
and named_exprs
.
Passing an aggregate function to this method will broadcast the aggregate’s value over the number of rows in the table and automatically constructs a window function expression. See the examples section for more details.
For backwards compatibility the keyword argument exprs
is reserved and cannot be used to name an expression. This behavior will be removed in v4.
Parameters
exprs
ir
.Value
| str | Iterable [ir
.Value
| str ]
Column expression, string, or list of column expressions and strings.
()
named_exprs
ir
.Value
| str
Column expressions
{}
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190 │ 3650 │ male │ 2007 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181 │ 3625 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195 │ 4675 │ male │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193 │ 3475 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190 │ 4250 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Simple projection
>>> t.select("island" , "bill_length_mm" ).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island ┃ bill_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼────────────────┤
│ Torgersen │ 39.1 │
│ Torgersen │ 39.5 │
│ Torgersen │ 40.3 │
│ Torgersen │ NULL │
│ Torgersen │ 36.7 │
└───────────┴────────────────┘
In that simple case, you could also just use python’s indexing syntax
>>> t[["island" , "bill_length_mm" ]].head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island ┃ bill_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼────────────────┤
│ Torgersen │ 39.1 │
│ Torgersen │ 39.5 │
│ Torgersen │ 40.3 │
│ Torgersen │ NULL │
│ Torgersen │ 36.7 │
└───────────┴────────────────┘
Projection by zero-indexed column position
>>> t.select(0 , 4 ).head()
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ flipper_length_mm ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string │ int64 │
├─────────┼───────────────────┤
│ Adelie │ 181 │
│ Adelie │ 186 │
│ Adelie │ 195 │
│ Adelie │ NULL │
│ Adelie │ 193 │
└─────────┴───────────────────┘
Projection with renaming and compute in one call
>>> t.select(next_year= t.year + 1 ).head()
┏━━━━━━━━━━━┓
┃ next_year ┃
┡━━━━━━━━━━━┩
│ int64 │
├───────────┤
│ 2008 │
│ 2008 │
│ 2008 │
│ 2008 │
│ 2008 │
└───────────┘
You can do the same thing with a named expression, and using the deferred API
>>> from ibis import _
>>> t.select((_.year + 1 ).name("next_year" )).head()
┏━━━━━━━━━━━┓
┃ next_year ┃
┡━━━━━━━━━━━┩
│ int64 │
├───────────┤
│ 2008 │
│ 2008 │
│ 2008 │
│ 2008 │
│ 2008 │
└───────────┘
Projection with aggregation expressions
>>> t.select("island" , bill_mean= t.bill_length_mm.mean()).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ island ┃ bill_mean ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼───────────┤
│ Torgersen │ 43.92193 │
│ Torgersen │ 43.92193 │
│ Torgersen │ 43.92193 │
│ Torgersen │ 43.92193 │
│ Torgersen │ 43.92193 │
└───────────┴───────────┘
Projection with a selector
>>> import ibis.selectors as s
>>> t.select(s.numeric() & ~ s.c("year" )).head()
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64 │ float64 │ int64 │ int64 │
├────────────────┼───────────────┼───────────────────┼─────────────┤
│ 39.1 │ 18.7 │ 181 │ 3750 │
│ 39.5 │ 17.4 │ 186 │ 3800 │
│ 40.3 │ 18.0 │ 195 │ 3250 │
│ NULL │ NULL │ NULL │ NULL │
│ 36.7 │ 19.3 │ 193 │ 3450 │
└────────────────┴───────────────┴───────────────────┴─────────────┘
Projection + aggregation across multiple columns
>>> from ibis import _
>>> t.select(s.across(s.numeric() & ~ s.c("year" ), _.mean())).head()
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64 │ float64 │ float64 │ float64 │
├────────────────┼───────────────┼───────────────────┼─────────────┤
│ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
└────────────────┴───────────────┴───────────────────┴─────────────┘
sql
sql(query, dialect=None)
Run a SQL query against a table expression.
Parameters
query
str
Query string
required
dialect
str | None
Optional string indicating the dialect of query
. Defaults to the backend’s native dialect.
None
Returns
Table
An opaque table expression
Examples
>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch(table_name= "penguins" )
>>> expr = t.sql(
... """
... SELECT island, mean(bill_length_mm) AS avg_bill_length
... FROM penguins
... GROUP BY 1
... ORDER BY 2 DESC
... """
... )
>>> expr
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island ┃ avg_bill_length ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼─────────────────┤
│ Biscoe │ 45.257485 │
│ Dream │ 44.167742 │
│ Torgersen │ 38.950980 │
└───────────┴─────────────────┘
Mix and match ibis expressions with SQL queries
>>> t = ibis.examples.penguins.fetch(table_name= "penguins" )
>>> expr = t.sql(
... """
... SELECT island, mean(bill_length_mm) AS avg_bill_length
... FROM penguins
... GROUP BY 1
... ORDER BY 2 DESC
... """
... )
>>> expr = expr.mutate(
... island= _.island.lower(),
... avg_bill_length= _.avg_bill_length.round (1 ),
... )
>>> expr
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island ┃ avg_bill_length ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼─────────────────┤
│ biscoe │ 45.3 │
│ dream │ 44.2 │
│ torgersen │ 39.0 │
└───────────┴─────────────────┘
Because ibis expressions aren’t named, they aren’t visible to subsequent .sql
calls. Use the alias
method to assign a name to an expression.
>>> expr.alias("b" ).sql("SELECT * FROM b WHERE avg_bill_length > 40" )
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island ┃ avg_bill_length ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├────────┼─────────────────┤
│ biscoe │ 45.3 │
│ dream │ 44.2 │
└────────┴─────────────────┘
to_array
to_array()
View a single column table as an array.
Returns
Value
A single column view of a table
to_pandas
to_pandas(**kwargs)
Convert a table expression to a pandas DataFrame.
Parameters
kwargs
Same as keyword arguments to execute
{}
try_cast
try_cast(schema)
Cast the columns of a table.
If the cast fails for a row, the value is returned as NULL
or NaN
depending on backend behavior.
Parameters
schema
SupportsSchema
Mapping, schema or iterable of pairs to use for casting
required
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a" : ["1" , "2" , "3" ], "b" : ["2.2" , "3.3" , "book" ]})
>>> t.try_cast({"a" : "int" , "b" : "float" })
┏━━━━━━━┳━━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━━┩
│ int64 │ float64 │
├───────┼─────────┤
│ 1 │ 2.2 │
│ 2 │ 3.3 │
│ 3 │ NULL │
└───────┴─────────┘
union
union(table, *rest, distinct=False)
Compute the set union of multiple table expressions.
The input tables must have identical schemas.
Parameters
table
Table
A table expression
required
*rest
Table
Additional table expressions
()
distinct
bool
Only return distinct rows
False
Returns
Table
A new table containing the union of all input tables.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a" : [1 , 2 ]})
>>> t1
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
└───────┘
>>> t2 = ibis.memtable({"a" : [2 , 3 ]})
>>> t2
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
│ 3 │
└───────┘
>>> t1.union(t2) # union all by default
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
│ 2 │
│ 3 │
└───────┘
>>> t1.union(t2, distinct= True ).order_by("a" )
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
│ 3 │
└───────┘
unpack
unpack(*columns)
Project the struct fields of each of columns
into self
.
Existing fields are retained in the projection.
Parameters
columns
str
String column names to project into self
.
()
Returns
Table
The child table with struct fields of each of columns
projected.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> lines = '''
... {"name": "a", "pos": {"lat": 10.1, "lon": 30.3 }}
... {"name": "b", "pos": {"lat": 10.2, "lon": 30.2 }}
... {"name": "c", "pos": {"lat": 10.3, "lon": 30.1 }}
... '''
>>> with open ("/tmp/lines.json" , "w" ) as f:
... nbytes = f.write(lines) # nbytes is unused
>>> t = ibis.read_json("/tmp/lines.json" )
>>> t
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ name ┃ pos ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ struct<lat: float64, lon: float64> │
├────────┼────────────────────────────────────┤
│ a │ { 'lat' : 10.1 , 'lon' : 30.3 } │
│ b │ { 'lat' : 10.2 , 'lon' : 30.2 } │
│ c │ { 'lat' : 10.3 , 'lon' : 30.1 } │
└────────┴────────────────────────────────────┘
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ name ┃ lat ┃ lon ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ string │ float64 │ float64 │
├────────┼─────────┼─────────┤
│ a │ 10.1 │ 30.3 │
│ b │ 10.2 │ 30.2 │
│ c │ 10.3 │ 30.1 │
└────────┴─────────┴─────────┘
view
view()
Create a new table expression distinct from the current one.
Use this API for any self-referencing operations like a self-join.
window_by
window_by(time_col)
Create a windowing table-valued function (TVF) expression.
Windowing table-valued functions (TVF) assign rows of a table to windows based on a time attribute column in the table.
Parameters
time_col
ir
.Value
Column of the table that will be mapped to windows.
required
Returns
WindowedTable
WindowedTable expression.
GroupedTable
GroupedTable(self, table, by, having=None, order_by=None, **expressions)
An intermediate table expression to hold grouping information.
Methods
aggregate
Compute aggregates over a group by.
count
Computing the number of rows per group.
having
Add a post-aggregation result filter expr
.
mutate
Return a table projection with window functions applied.
order_by
Sort a grouped table expression by expr
.
over
Apply a window over the input expressions.
select
Project new columns out of the grouped table.
aggregate
aggregate(metrics=None, **kwds)
Compute aggregates over a group by.
count
count()
Computing the number of rows per group.
Returns
Table
The aggregated table
having
having(expr)
Add a post-aggregation result filter expr
.
Parameters
expr
ir
.BooleanScalar
An expression that filters based on an aggregate value.
required
mutate
mutate(*exprs, **kwexprs)
Return a table projection with window functions applied.
Any arguments can be functions.
Parameters
exprs
ir
.Value
| Sequence [ir
.Value
]
List of expressions
()
kwexprs
ir
.Value
Expressions
{}
Examples
>>> import ibis
>>> import ibis.selectors as s
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190 │ 3650 │ male │ 2007 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181 │ 3625 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195 │ 4675 │ male │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193 │ 3475 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190 │ 4250 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> (
... t.select("species" , "bill_length_mm" )
... .group_by("species" )
... .mutate(centered_bill_len= ibis._.bill_length_mm - ibis._.bill_length_mm.mean())
... .order_by(s.all ())
... )
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ bill_length_mm ┃ centered_bill_len ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string │ float64 │ float64 │
├─────────┼────────────────┼───────────────────┤
│ Adelie │ 32.1 │ -6.691391 │
│ Adelie │ 33.1 │ -5.691391 │
│ Adelie │ 33.5 │ -5.291391 │
│ Adelie │ 34.0 │ -4.791391 │
│ Adelie │ 34.1 │ -4.691391 │
│ Adelie │ 34.4 │ -4.391391 │
│ Adelie │ 34.5 │ -4.291391 │
│ Adelie │ 34.6 │ -4.191391 │
│ Adelie │ 34.6 │ -4.191391 │
│ Adelie │ 35.0 │ -3.791391 │
│ … │ … │ … │
└─────────┴────────────────┴───────────────────┘
Returns
Table
A table expression with window functions applied
order_by
order_by(expr)
Sort a grouped table expression by expr
.
Notes
This API call is ignored in aggregations.
Parameters
expr
ir
.Value
| Iterable [ir
.Value
]
Expressions to order the results by
required
over
over(window=None, *, rows=None, range=None, group_by=None, order_by=None)
Apply a window over the input expressions.
Parameters
window
Window to add to the input
None
rows
Whether to use the ROWS
window clause
None
range
Whether to use the RANGE
window clause
None
group_by
Grouping key
None
order_by
Ordering key
None
select
select(*exprs, **kwexprs)
Project new columns out of the grouped table.
read_csv
ibis.read_csv(sources, table_name=None, **kwargs)
Lazily load a CSV or set of CSVs.
This function delegates to the read_csv
method on the current default backend (DuckDB or ibis.config.default_backend
).
Parameters
sources
str | Path | Sequence [str | Path ]
A filesystem path or URL or list of same. Supports CSV and TSV files.
required
table_name
str | None
A name to refer to the table. If not provided, a name will be generated.
None
kwargs
Any
Backend-specific keyword arguments for the file type. For the DuckDB backend used by default, please refer to: * CSV/TSV: https://duckdb.org/docs/data/csv#parameters.
{}
Returns
ir
.Table
Table expression representing a file
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> lines = '''a,b
... 1,d
... 2,
... ,f
... '''
>>> with open ("/tmp/lines.csv" , mode= "w" ) as f:
... nbytes = f.write(lines) # nbytes is unused
>>> t = ibis.read_csv("/tmp/lines.csv" )
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ d │
│ 2 │ NULL │
│ NULL │ f │
└───────┴────────┘
read_delta
ibis.read_delta(source, table_name=None, **kwargs)
Lazily load a Delta Lake table.
Parameters
source
str | Path
A filesystem path or URL.
required
table_name
str | None
A name to refer to the table. If not provided, a name will be generated.
None
kwargs
Any
Backend-specific keyword arguments for the file type.
{}
Returns
ir
.Table
Table expression representing a file
Examples
>>> import ibis
>>> import pandas as pd
>>> ibis.options.interactive = True
>>> df = pd.DataFrame({"a" : [1 , 2 , 3 ], "b" : list ("ghi" )})
>>> df
>>> import deltalake as dl
>>> dl.write_deltalake("/tmp/data.delta" , df, mode= "overwrite" )
>>> t = ibis.read_delta("/tmp/data.delta" )
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ g │
│ 2 │ h │
│ 3 │ i │
└───────┴────────┘
read_json
ibis.read_json(sources, table_name=None, **kwargs)
Lazily load newline-delimited JSON data.
This function delegates to the read_json
method on the current default backend (DuckDB or ibis.config.default_backend
).
Parameters
sources
str | Path | Sequence [str | Path ]
A filesystem path or URL or list of same.
required
table_name
str | None
A name to refer to the table. If not provided, a name will be generated.
None
kwargs
Any
Backend-specific keyword arguments for the file type. See https://duckdb.org/docs/extensions/json.html for details.
{}
Returns
ir
.Table
Table expression representing a file
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> lines = '''
... {"a": 1, "b": "d"}
... {"a": 2, "b": null}
... {"a": null, "b": "f"}
... '''
>>> with open ("/tmp/lines.json" , mode= "w" ) as f:
... nbytes = f.write(lines) # nbytes is unused
>>> t = ibis.read_json("/tmp/lines.json" )
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ d │
│ 2 │ NULL │
│ NULL │ f │
└───────┴────────┘
read_parquet
ibis.read_parquet(sources, table_name=None, **kwargs)
Lazily load a parquet file or set of parquet files.
This function delegates to the read_parquet
method on the current default backend (DuckDB or ibis.config.default_backend
).
Parameters
sources
str | Path | Sequence [str | Path ]
A filesystem path or URL or list of same.
required
table_name
str | None
A name to refer to the table. If not provided, a name will be generated.
None
kwargs
Any
Backend-specific keyword arguments for the file type. For the DuckDB backend used by default, please refer to: * Parquet: https://duckdb.org/docs/data/parquet
{}
Returns
ir
.Table
Table expression representing a file
Examples
>>> import ibis
>>> import pandas as pd
>>> ibis.options.interactive = True
>>> df = pd.DataFrame({"a" : [1 , 2 , 3 ], "b" : list ("ghi" )})
>>> df
>>> df.to_parquet("/tmp/data.parquet" )
>>> t = ibis.read_parquet("/tmp/data.parquet" )
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ g │
│ 2 │ h │
│ 3 │ i │
└───────┴────────┘
memtable
ibis.memtable(data, *, columns=None, schema=None, name=None)
Construct an ibis table expression from in-memory data.
Parameters
data
Any data accepted by the pandas.DataFrame
constructor or a pyarrow.Table
. Examples of acceptable objects are a pandas.DataFrame
, a pyarrow.Table
, a list of dicts of non-ibis Python objects, etc. ibis
objects, like MapValue
, will result in an error. Do not depend on the underlying storage type (e.g., pyarrow.Table), it’s subject to change across non-major releases.
required
columns
Iterable [str ] | None
Optional typing.Iterable
of str
column names. If provided, must match the number of columns in data
.
None
schema
SupportsSchema
| None
Optional Schema
. The functions use data
to infer a schema if not passed.
None
name
str | None
Optional name of the table.
None
Returns
Table
A table expression backed by in-memory data.
Examples
>>> import ibis
>>> t = ibis.memtable([{"a" : 1 }, {"a" : 2 }])
>>> t
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
└───────┘
>>> t = ibis.memtable([{"a" : 1 , "b" : "foo" }, {"a" : 2 , "b" : "baz" }])
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ foo │
│ 2 │ baz │
└───────┴────────┘
Create a table literal without column names embedded in the data and pass columns
>>> t = ibis.memtable([(1 , "foo" ), (2 , "baz" )], columns= ["a" , "b" ])
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ foo │
│ 2 │ baz │
└───────┴────────┘
Create a table literal without column names embedded in the data. Ibis generates column names if none are provided.
>>> t = ibis.memtable([(1 , "foo" ), (2 , "baz" )])
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ col0 ┃ col1 ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ foo │
│ 2 │ baz │
└───────┴────────┘
table
ibis.table(schema=None, name=None)
Create a table literal or an abstract table without data.
Parameters
schema
SupportsSchema
| None
A schema for the table
None
name
str | None
Name for the table. One is generated if this value is None
.
None
Examples
Create a table with no data backing it
>>> import ibis
>>> ibis.options.interactive = False
>>> t = ibis.table(schema= dict (a= "int" , b= "string" ), name= "t" )
>>> t
UnboundTable: t
a int64
b string
difference
ibis.difference(table, *rest, distinct=True)
Compute the set difference of multiple table expressions.
The input tables must have identical schemas.
Parameters
table
ir
.Table
A table expression
required
*rest
ir
.Table
Additional table expressions
()
distinct
bool
Only diff distinct rows not occurring in the calling table
True
Returns
Table
The rows present in self
that are not present in tables
.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a" : [1 , 2 ]})
>>> t1
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
└───────┘
>>> t2 = ibis.memtable({"a" : [2 , 3 ]})
>>> t2
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
│ 3 │
└───────┘
>>> ibis.difference(t1, t2)
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
└───────┘
intersect
ibis.intersect(table, *rest, distinct=True)
Compute the set intersection of multiple table expressions.
The input tables must have identical schemas.
Parameters
table
ir
.Table
A table expression
required
*rest
ir
.Table
Additional table expressions
()
distinct
bool
Only return distinct rows
True
Returns
Table
A new table containing the intersection of all input tables.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a" : [1 , 2 ]})
>>> t1
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
└───────┘
>>> t2 = ibis.memtable({"a" : [2 , 3 ]})
>>> t2
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
│ 3 │
└───────┘
>>> ibis.intersect(t1, t2)
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
└───────┘
union
ibis.union(table, *rest, distinct=False)
Compute the set union of multiple table expressions.
The input tables must have identical schemas.
Parameters
table
ir
.Table
A table expression
required
*rest
ir
.Table
Additional table expressions
()
distinct
bool
Only return distinct rows
False
Returns
Table
A new table containing the union of all input tables.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a" : [1 , 2 ]})
>>> t1
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
└───────┘
>>> t2 = ibis.memtable({"a" : [2 , 3 ]})
>>> t2
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
│ 3 │
└───────┘
>>> ibis.union(t1, t2) # union all by default
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
│ 2 │
│ 3 │
└───────┘
>>> ibis.union(t1, t2, distinct= True ).order_by("a" )
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
│ 3 │
└───────┘
join
ibis.join(left, right, predicates=(), how='inner', *, lname='', rname='{name}_right')
Perform a join between two tables.
Parameters
left
Table
Left table to join
required
right
Table
Right table to join
required
predicates
str | Sequence [str | ir
.BooleanColumn
| Literal [True] | Literal [False] | tuple [str | ir
.Column
| ir
.Deferred
, str | ir
.Column
| ir
.Deferred
]]
Condition(s) to join on. See examples for details.
()
how
Literal [‘inner’, ‘left’, ‘outer’, ‘right’, ‘semi’, ‘anti’, ‘any_inner’, ‘any_left’, ‘left_semi’]
Join method, e.g. "inner"
or "left"
.
'inner'
lname
str
A format string to use to rename overlapping columns in the left table (e.g. "left_{name}"
).
''
rname
str
A format string to use to rename overlapping columns in the right table (e.g. "right_{name}"
).
'{name}_right'
Examples
>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> movies = ibis.examples.ml_latest_small_movies.fetch()
>>> movies.head()
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ movieId ┃ title ┃ genres ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string │ string │
├─────────┼────────────────────────────────────┼─────────────────────────────────────────────┤
│ 1 │ Toy Story (1995) │ Adventure|Animation|Children|Comedy|Fantasy │
│ 2 │ Jumanji (1995) │ Adventure|Children|Fantasy │
│ 3 │ Grumpier Old Men (1995) │ Comedy|Romance │
│ 4 │ Waiting to Exhale (1995) │ Comedy|Drama|Romance │
│ 5 │ Father of the Bride Part II (1995) │ Comedy │
└─────────┴────────────────────────────────────┴─────────────────────────────────────────────┘
>>> ratings = ibis.examples.ml_latest_small_ratings.fetch().drop("timestamp" )
>>> ratings.head()
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ userId ┃ movieId ┃ rating ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ int64 │ int64 │ float64 │
├────────┼─────────┼─────────┤
│ 1 │ 1 │ 4.0 │
│ 1 │ 3 │ 4.0 │
│ 1 │ 6 │ 4.0 │
│ 1 │ 47 │ 5.0 │
│ 1 │ 50 │ 5.0 │
└────────┴─────────┴─────────┘
Equality left join on the shared movieId
column. Note the _right
suffix added to all overlapping columns from the right table (in this case only the “movieId” column).
>>> ratings.join(movies, "movieId" , how= "left" ).head(5 )
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ userId ┃ movieId ┃ rating ┃ movieId_right ┃ title ┃ genres ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64 │ int64 │ float64 │ int64 │ string │ string │
├────────┼─────────┼─────────┼───────────────┼─────────────────────────────┼─────────────────────────────────────────────┤
│ 1 │ 1 │ 4.0 │ 1 │ Toy Story (1995) │ Adventure|Animation|Children|Comedy|Fantasy │
│ 1 │ 3 │ 4.0 │ 3 │ Grumpier Old Men (1995) │ Comedy|Romance │
│ 1 │ 6 │ 4.0 │ 6 │ Heat (1995) │ Action|Crime|Thriller │
│ 1 │ 47 │ 5.0 │ 47 │ Seven (a.k.a. Se7en) (1995) │ Mystery|Thriller │
│ 1 │ 50 │ 5.0 │ 50 │ Usual Suspects, The (1995) │ Crime|Mystery|Thriller │
└────────┴─────────┴─────────┴───────────────┴─────────────────────────────┴─────────────────────────────────────────────┘
Explicit equality join using the default how
value of "inner"
. Note how there is no _right
suffix added to the movieId
column since this is an inner join and the movieId
column is part of the join condition.
>>> ratings.join(movies, ratings.movieId == movies.movieId).head(5 )
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ userId ┃ movieId ┃ rating ┃ title ┃ genres ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64 │ int64 │ float64 │ string │ string │
├────────┼─────────┼─────────┼─────────────────────────────┼─────────────────────────────────────────────┤
│ 1 │ 1 │ 4.0 │ Toy Story (1995) │ Adventure|Animation|Children|Comedy|Fantasy │
│ 1 │ 3 │ 4.0 │ Grumpier Old Men (1995) │ Comedy|Romance │
│ 1 │ 6 │ 4.0 │ Heat (1995) │ Action|Crime|Thriller │
│ 1 │ 47 │ 5.0 │ Seven (a.k.a. Se7en) (1995) │ Mystery|Thriller │
│ 1 │ 50 │ 5.0 │ Usual Suspects, The (1995) │ Crime|Mystery|Thriller │
└────────┴─────────┴─────────┴─────────────────────────────┴─────────────────────────────────────────────┘
>>> tags = ibis.examples.ml_latest_small_tags.fetch()
>>> tags.head()
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ userId ┃ movieId ┃ tag ┃ timestamp ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ int64 │ int64 │ string │ int64 │
├────────┼─────────┼─────────────────┼────────────┤
│ 2 │ 60756 │ funny │ 1445714994 │
│ 2 │ 60756 │ Highly quotable │ 1445714996 │
│ 2 │ 60756 │ will ferrell │ 1445714992 │
│ 2 │ 89774 │ Boxing story │ 1445715207 │
│ 2 │ 89774 │ MMA │ 1445715200 │
└────────┴─────────┴─────────────────┴────────────┘
You can join on multiple columns/conditions by passing in a sequence. Find all instances where a user both tagged and rated a movie:
>>> tags.join(ratings, ["userId" , "movieId" ]).head(5 )
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┓
┃ userId ┃ movieId ┃ tag ┃ timestamp ┃ rating ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━┩
│ int64 │ int64 │ string │ int64 │ float64 │
├────────┼─────────┼────────────────┼────────────┼─────────┤
│ 62 │ 2 │ Robin Williams │ 1528843907 │ 4.0 │
│ 62 │ 110 │ sword fight │ 1528152535 │ 4.5 │
│ 62 │ 410 │ gothic │ 1525636609 │ 4.5 │
│ 62 │ 2023 │ mafia │ 1525636733 │ 5.0 │
│ 62 │ 2124 │ quirky │ 1525636846 │ 5.0 │
└────────┴─────────┴────────────────┴────────────┴─────────┘
To self-join a table with itself, you need to call .view()
on one of the arguments so the two tables are distinct from each other.
For crafting more complex join conditions, a valid form of a join condition is a 2-tuple like ({left_key}, {right_key})
, where each key can be
a Column
Deferred expression
lambda of the form (Table) -> Column
For example, to find all movies pairings that received the same (ignoring case) tags:
>>> movie_tags = tags["movieId" , "tag" ]
>>> view = movie_tags.view()
>>> movie_tags.join(
... view,
... [
... movie_tags.movieId != view.movieId,
... (_.tag.lower(), lambda t: t.tag.lower()),
... ],
... ).head()
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ movieId ┃ tag ┃ movieId_right ┃ tag_right ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string │ int64 │ string │
├─────────┼───────────────────┼───────────────┼───────────────────┤
│ 60756 │ funny │ 1732 │ funny │
│ 60756 │ Highly quotable │ 1732 │ Highly quotable │
│ 89774 │ Tom Hardy │ 139385 │ tom hardy │
│ 106782 │ drugs │ 1732 │ drugs │
│ 106782 │ Leonardo DiCaprio │ 5989 │ Leonardo DiCaprio │
└─────────┴───────────────────┴───────────────┴───────────────────┘
row_number
ibis.row_number()
Return an analytic function expression for the current row number.
row_number
is normalized across backends to start at 0
Returns
IntegerColumn
A column expression enumerating rows
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values" : [1 , 2 , 1 , 2 , 3 , 2 ]})
>>> t.mutate(rownum= ibis.row_number())
┏━━━━━━━━┳━━━━━━━━┓
┃ values ┃ rownum ┃
┡━━━━━━━━╇━━━━━━━━┩
│ int64 │ int64 │
├────────┼────────┤
│ 1 │ 0 │
│ 2 │ 1 │
│ 1 │ 2 │
│ 2 │ 3 │
│ 3 │ 4 │
│ 2 │ 5 │
└────────┴────────┘
rank
ibis.rank()
Compute position of first element within each equal-value group in sorted order.
Equivalent to SQL’s RANK()
window function.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values" : [1 , 2 , 1 , 2 , 3 , 2 ]})
>>> t.mutate(rank= ibis.rank().over(order_by= t.values))
┏━━━━━━━━┳━━━━━━━┓
┃ values ┃ rank ┃
┡━━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │
├────────┼───────┤
│ 1 │ 0 │
│ 1 │ 0 │
│ 2 │ 2 │
│ 2 │ 2 │
│ 2 │ 2 │
│ 3 │ 5 │
└────────┴───────┘
dense_rank
ibis.dense_rank()
Position of first element within each group of equal values.
Values are returned in sorted order and duplicate values are ignored.
Equivalent to SQL’s DENSE_RANK()
.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values" : [1 , 2 , 1 , 2 , 3 , 2 ]})
>>> t.mutate(rank= ibis.dense_rank().over(order_by= t.values))
┏━━━━━━━━┳━━━━━━━┓
┃ values ┃ rank ┃
┡━━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │
├────────┼───────┤
│ 1 │ 0 │
│ 1 │ 0 │
│ 2 │ 1 │
│ 2 │ 1 │
│ 2 │ 1 │
│ 3 │ 2 │
└────────┴───────┘
percent_rank
ibis.percent_rank()
Return the relative rank of the values in the column.
Returns
FloatingColumn
The percent rank
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values" : [1 , 2 , 1 , 2 , 3 , 2 ]})
>>> t.mutate(pct_rank= ibis.percent_rank().over(order_by= t.values))
┏━━━━━━━━┳━━━━━━━━━━┓
┃ values ┃ pct_rank ┃
┡━━━━━━━━╇━━━━━━━━━━┩
│ int64 │ float64 │
├────────┼──────────┤
│ 1 │ 0.0 │
│ 1 │ 0.0 │
│ 2 │ 0.4 │
│ 2 │ 0.4 │
│ 2 │ 0.4 │
│ 3 │ 1.0 │
└────────┴──────────┘
cume_dist
ibis.cume_dist()
Return the cumulative distribution over a window.
Returns
FloatingColumn
The cumulative distribution
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values" : [1 , 2 , 1 , 2 , 3 , 2 ]})
>>> t.mutate(dist= ibis.cume_dist().over(order_by= t.values))
┏━━━━━━━━┳━━━━━━━━━━┓
┃ values ┃ dist ┃
┡━━━━━━━━╇━━━━━━━━━━┩
│ int64 │ float64 │
├────────┼──────────┤
│ 1 │ 0.333333 │
│ 1 │ 0.333333 │
│ 2 │ 0.833333 │
│ 2 │ 0.833333 │
│ 2 │ 0.833333 │
│ 3 │ 1.000000 │
└────────┴──────────┘
ntile
ibis.ntile(buckets)
Return the integer number of a partitioning of the column values.
Parameters
buckets
int | ir
.IntegerValue
Number of buckets to partition into
required
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values" : [1 , 2 , 1 , 2 , 3 , 2 ]})
>>> t.mutate(ntile= ibis.ntile(2 ).over(order_by= t.values))
┏━━━━━━━━┳━━━━━━━┓
┃ values ┃ ntile ┃
┡━━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │
├────────┼───────┤
│ 1 │ 0 │
│ 1 │ 0 │
│ 2 │ 0 │
│ 2 │ 1 │
│ 2 │ 1 │
│ 3 │ 1 │
└────────┴───────┘
window
ibis.window(preceding=None, following=None, order_by=None, group_by=None, *, rows=None, range=None, between=None)
Create a window clause for use with window functions.
The ROWS
window clause includes peer rows based on differences in row number whereas RANGE
includes rows based on the differences in row value of a single order_by
expression.
All window frame bounds are inclusive.
Parameters
preceding
Number of preceding rows in the window
None
following
Number of following rows in the window
None
group_by
Grouping key
None
order_by
Ordering key
None
rows
Whether to use the ROWS
window clause
None
range
Whether to use the RANGE
window clause
None
between
Automatically infer the window kind based on the boundaries
None
cumulative_window
ibis.cumulative_window(group_by=None, order_by=None)
Create a cumulative window for use with window functions.
All window frames / ranges are inclusive.
Parameters
group_by
Grouping key
None
order_by
Ordering key
None
range_window
ibis.range_window(preceding=None, following=None, group_by=None, order_by=None)
Create a range-based window clause for use with window functions.
This RANGE window clause aggregates rows based upon differences in the value of the order-by expression.
All window frames / ranges are inclusive.
Parameters
preceding
Number of preceding rows in the window
None
following
Number of following rows in the window
None
group_by
Grouping key
None
order_by
Ordering key
None
trailing_range_window
ibis.trailing_range_window(preceding, order_by, group_by=None)
Create a trailing range window for use with window functions.
Parameters
preceding
A value expression
required
order_by
Ordering key
required
group_by
Grouping key
None
trailing_window
ibis.trailing_window(preceding, group_by=None, order_by=None)
Create a trailing window for use with window functions.
Parameters
preceding
The number of preceding rows
required
group_by
Grouping key
None
order_by
Ordering key
None
Back to top