PySpark¶
Install¶
Install ibis and dependencies for the PySpark backend with pip, conda, or mamba:
pip install 'ibis-framework[pyspark]'
conda install -c conda-forge ibis-pyspark
mamba install -c conda-forge ibis-pyspark
Connect¶
API¶
Create a client by passing a SparkSession to ibis.pyspark.connect. See ibis.backends.pyspark.Backend.do_connect for connection parameter information.
ibis.pyspark.connect is a thin wrapper around ibis.backends.pyspark.Backend.do_connect.
Connection Parameters¶
do_connect(session)¶
Create a PySpark Backend for use with Ibis.
Parameters:
Name | Type | Description | Default
---|---|---|---
session | SparkSession | A SparkSession instance | required
Examples:
>>> import ibis
>>> from pyspark.sql import SparkSession
>>> session = SparkSession.builder.getOrCreate()
>>> ibis.pyspark.connect(session)
<ibis.backends.pyspark.Backend at 0x...>
Backend API¶
Backend¶
Bases: BaseSQLBackend
Classes¶
Options¶
Bases: ibis.config.Config
PySpark options.
Attributes:
Name | Type | Description
---|---|---
treat_nan_as_null | bool | Treat NaNs in floating point expressions as NULL.
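These options are set through Ibis's global configuration; a minimal sketch, assuming the standard ibis.options.pyspark namespace:
>>> import ibis
>>> ibis.options.pyspark.treat_nan_as_null = True  # floating-point NaNs now behave as NULL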
Functions¶
close()¶
Close Spark connection and drop any temporary objects.
compile(expr, timecontext=None, params=None, *args, **kwargs)¶
Compile an ibis expression to a PySpark DataFrame object.
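A minimal usage sketch (con and table_expr are hypothetical, as in the connect example above):
>>> spark_df = con.compile(table_expr)  # a pyspark.sql.DataFrame; no computation runs yet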
compute_stats(name, database=None, noscan=False)¶
Compute statistics on a table via Spark's ANALYZE TABLE command.
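For example, a sketch with a hypothetical table name:
>>> con.compute_stats('my_table', noscan=True)  # NOSCAN collects size statistics without reading rows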
create_database(name, path=None, force=False)¶
Create a new Spark database.
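For example, a sketch with a hypothetical database name:
>>> con.create_database('analytics', force=True)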
create_table(table_name, obj=None, schema=None, database=None, force=False, format='parquet')¶
Create a new table in Spark.
Parameters:
Name | Type | Description | Default
---|---|---|---
table_name | str | Table name | required
obj | ir.Table \| pd.DataFrame \| None | If passed, creates table from select statement results | None
schema | sch.Schema \| None | Mutually exclusive with obj, creates an empty table with a schema | None
database | str \| None | Database name | None
force | bool | If true, create the table even if a table with the same name already exists | False
format | str | Table format | 'parquet'
Examples:
>>> con.create_table('new_table_name', table_expr)
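An empty table can also be created from a schema (a sketch; the schema and table name are hypothetical):
>>> import ibis
>>> schema = ibis.schema([('foo', 'string'), ('bar', 'int64')])
>>> con.create_table('empty_table', schema=schema)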
create_view(name, expr, database=None, can_exist=False, temporary=False)¶
Create a Spark view from a table expression.
Parameters:
Name | Type | Description | Default
---|---|---|---
name | str | View name | required
expr | ir.Table | Expression to use for the view | required
database | str \| None | Database name | None
can_exist | bool | Replace an existing view of the same name if it exists | False
temporary | bool | Whether the view is temporary | False
drop_database(name, force=False)¶
Drop a database.
drop_table(name, database=None, force=False)¶
Drop a table.
drop_table_or_view(name, database=None, force=False)¶
Drop a Spark table or view.
Parameters:
Name | Type | Description | Default
---|---|---|---
name | str | Table or view name | required
database | str \| None | Database name | None
force | bool | If false, the database may raise an exception if the table does not exist | False
Examples:
>>> table = 'my_table'
>>> db = 'operations'
>>> con.drop_table_or_view(table, db, force=True)
drop_view(name, database=None, force=False)¶
Drop a view.
execute(expr, **kwargs)¶
Execute an expression.
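A minimal usage sketch (table_expr is hypothetical; Ibis returns the result as a pandas object):
>>> df = con.execute(table_expr.limit(10))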
insert(table_name, obj=None, database=None, overwrite=False, values=None, validate=True)¶
Insert data into an existing table.
Examples:
>>> table = 'my_table'
>>> con.insert(table, table_expr)
Completely overwrite contents¶
>>> con.insert(table, table_expr, overwrite=True)