Release Notes

6.2.0 (2023-08-31)

Features

  • trino: add source application to trino backend (cf5fdb9)

Bug Fixes

  • bigquery,impala: escape all ASCII escape sequences in string literals (402f5ca)
  • bigquery: correctly escape ASCII escape sequences in regex patterns (a455203)
  • release: pin conventional-changelog-conventionalcommits to 6.1.0 (d6526b8)
  • trino: ensure that list_databases looks at all catalogs, not just the current one (cfbdbf1)
  • trino: override incorrect base sqlalchemy list_schemas implementation (84d38a1)

Documentation

  • trino: add connection docstring (507a00e)

6.1.0 (2023-08-03)

Features

  • api: add ibis.dtype top-level API (867e5f1)
  • api: add table.nunique() for counting unique table rows (adcd762)
  • api: allow mixing literals and columns in ibis.array (3355dd8)
  • api: improve efficiency of __dataframe__ protocol (15e27da)
  • api: support boolean literals in join API (c56376f)
  • arrays: add concat method equivalent to __add__/__radd__ (0ed0ab1)
  • arrays: add repeat method equivalent to __mul__/__rmul__ (b457c7b)
  • backends: add current_schema API (955a9d0)
  • bigquery: fill out CREATE TABLE DDL options including support for overwrite (5dac7ec)
  • datafusion: add count_distinct, median, approx_median, stddev and var aggregations (45089c4)
  • datafusion: add extract url fields functions (4f5ea98)
  • datafusion: add functions sign, power, nullifzero, log (ef72e40)
  • datafusion: add RegexSearch, StringContains and StringJoin (4edaab5)
  • datafusion: implement in-memory table (d4ec5c2)
  • flink: add tests and translation rules for additional operators (fc2aa5d)
  • flink: implement translation rules and tests for over aggregation in Flink backend (e173cd7)
  • flink: implement translation rules for literal expressions in flink compiler (a8f4880)
  • improved error messages when missing backend dependencies (2fe851b)
  • make output of to_sql a proper str subclass (084bdb9)
  • pandas: add ExtractURLField functions (e369333)
  • polars: implement ops.SelfReference (983e393)
  • pyspark: read/write delta tables (d403187)
  • refactor ddl for create_database and add create_schema where relevant (d7a857c)
  • sqlite: add scalar python udf support to sqlite (92f29e6)
  • sqlite: implement extract url field functions (cb1956f)
  • trino: implement support for .sql table expression method (479bc60)
  • trino: support table properties when creating a table (b9d65ef)

Bug Fixes

  • api: allow scalar window order keys (3d3f4f3)
  • backends: make current_database implementation and API consistent across all backends (eeeeee0)
  • bigquery: respect the fully qualified table name at the init (a25f460)
  • clickhouse: check dispatching instead of membership in the registry for has_operation (acb7f3f)
  • datafusion: always quote column names to prevent datafusion from normalizing case (310db2b)
  • deps: update dependency datafusion to v27 (3a311cd)
  • druid: handle conversion issues from string, binary, and timestamp (b632063)
  • duckdb: avoid double escaping backslashes for bind parameters (8436f57)
  • duckdb: cast read_only to string for connection (27e17d6)
  • duckdb: deduplicate results from list_schemas() (172520e)
  • duckdb: ensure that current_database returns the correct value (2039b1e)
  • duckdb: handle conversion from duckdb_engine unsigned int aliases (e6fd0cc)
  • duckdb: map hugeint to decimal to avoid information loss (4fe91d4)
  • duckdb: run pre-execute-hooks in duckdb before file export (5bdaa1d)
  • duckdb: use regexp_matches to ensure that matching checks containment instead of a full match (0a0cda6)
  • examples: remove example datasets that are incompatible with case-insensitive file systems (4048826)
  • exprs: ensure that left_semi and semi are equivalent (bbc1eb7)
  • forward arguments through __dataframe__ protocol (50f3be9)
  • ir: change "it not a" to "is not a" in errors (d0d463f)
  • memtable: implement support for translation of empty memtable (05b02da)
  • mysql: fix UUID type reflection for sqlalchemy 2.0.18 (12d4039)
  • mysql: pass-through kwargs to connect_args (e3f3e2d)
  • ops: ensure that name attribute is always valid for ops.SelfReference (9068aca)
  • polars: ensure that pivot_longer works with more than one column (822c912)
  • polars: fix collect implementation (c1182be)
  • postgres: by default use domain socket (e44fdfb)
  • pyspark: make has_operation method a @classmethod (c1b7dbc)
  • release: use @google/semantic-release-replace-plugin@1.2.0 to avoid module loading bug (673aab3)
  • snowflake: fix broken unnest functionality (207587c)
  • snowflake: reset the schema and database to the original schema after creating them (54ce26a)
  • snowflake: reset to original schema when resetting the database (32ff832)
  • snowflake: use regexp_instr != 0 instead of REGEXP keyword (06e2be4)
  • sqlalchemy: add support for sqlalchemy string subclassed types (8b33b35)
  • sql: handle parsing aliases (3645cf4)
  • trino: handle all remaining common datatype parsing (b3778c7)
  • trino: remove filter index warning in Trino dialect (a2ae7ae)

Documentation

  • add conda/mamba install instructions for specific backends (c643fca)
  • add docstrings to DataType.is_* methods (ed40fdb)
  • backend-matrix: add ability to select a specific subset of backends (f663066)
  • backends: document memtable support and performance for each backend (b321733)
  • blog: v6.0.0 release blog (21fc5da)
  • document versioning policy (242ea15)
  • dot-sql: add examples of mixing ibis expressions and SQL strings (5abd30e)
  • dplyr: small fixes to the dplyr getting started guide (4b57f7f)
  • expand docstring for dtype function (39b7a24)
  • fix functions names in examples of extract url fields (872445e)
  • fix heading in 6.0.0 blog (0ad3ce2)
  • oracle: add note about old password checks in oracle (470b90b)
  • postgres: fix postgres memtable docs (7423eb9)
  • release-notes: fix typo (a319e3a)
  • social: add social media preview cards (e98a0a6)
  • update imports/exports for pyspark backend (16d73c4)

Refactors

  • pyarrow: remove unnecessary calls to combine_chunks (c026d2d)
  • pyarrow: use schema.empty_table() instead of manually constructing empty tables (c099302)
  • result-handling: remove result_handler in favor of expression specific methods (3dc7143)
  • snowflake: enable multiple statements and clean up duplicated parameter setting code (75824a6)
  • tests: clean up backend test setup to make non-data-loading steps atomic (16b4632)

6.0.0 (2023-07-05)

⚠ BREAKING CHANGES

  • imports: Use of ibis.udf as a module is removed. Use ibis.legacy.udf instead.
  • The minimum supported Python version is now Python 3.9
  • api: group_by().count() no longer automatically names the count aggregation count. Use relabel to rename columns.
  • backends: Backend.ast_schema is removed. Use expr.as_table().schema() instead.
  • snowflake/postgres: Postgres UDFs now use the new @udf.scalar.python API. This should be a low-effort replacement for the existing API.
  • ir: ops.NullLiteral is removed
  • datatypes: dt.Interval no longer has a default unit; dt.interval is removed
  • deps: snowflake-connector-python's lower bound was increased to 3.0.2, the minimum version needed to avoid a high-severity vulnerability. Please upgrade snowflake-connector-python to at least version 3.0.2.
  • api: Table.difference(), Table.intersection(), and Table.union() now require at least one argument.
  • postgres: Ibis no longer automatically defines first/last reductions on connection to the postgres backend. Use DDL shown in https://wiki.postgresql.org/wiki/First/last_(aggregate) or one of the pgxn implementations instead.
  • api: ibis.examples.<example-name>.fetch no longer forwards arbitrary keyword arguments to read_csv/read_parquet.
  • datatypes: dt.Interval.value_type attribute is removed
  • api: Table.count() is no longer automatically named "count". Use Table.count().name("count") to achieve the previous behavior.
  • trino: The trino backend now requires at least version 0.321 of the trino Python package.
  • backends: removed AlchemyTable, AlchemyDatabase, DaskTable, DaskDatabase, PandasTable, PandasDatabase, PySparkDatabaseTable, use ops.DatabaseTable instead
  • dtypes: temporal unit enums are now available under ibis.common.temporal instead of ibis.common.enums.
  • clickhouse: external_tables can no longer be passed in ibis.clickhouse.connect. Pass external_tables directly in raw_sql/execute/to_pyarrow/to_pyarrow_batches().
  • datatypes: dt.Set is now an alias for dt.Array
  • bigquery: Previously, the ibis timestamp type mapped to the BigQuery TIMESTAMP type with no timezone support. That was incorrect: the BigQuery TIMESTAMP type carries a UTC timezone, while DATETIME is the timezone-free variant. This change therefore alters the mapping: an ibis timestamp with a UTC timezone now maps to the BigQuery TIMESTAMP type, and an ibis timestamp without a timezone maps to the BigQuery DATETIME type.
  • impala: Cursors are no longer returned from DDL operations to prevent resource leakage. Use raw_sql if you need specialized operations that return a cursor. Additionally, table-based DDL operations now return the table they're operating on.
  • api: Column.first()/Column.last() are now reductions by default. Code running these expressions in isolation will no longer be windowed over the entire table. Code using these functions in select-based APIs should function unchanged.
  • bigquery: when using the bigquery backend, casting float to int will no longer round floats to the nearest integer
  • ops.Hash: The hash method on table columns no longer accepts the how argument. The hashing functions available are highly backend-dependent, and the intention of the hash operation is to provide a fast, consistent (on the same backend only) integer value. If you have been passing in a value for how, you can remove it and you will get the same results as before, as no backend had multiple working hash functions.
  • duckdb: header=True is now the default, so some CSV files may now have headers that they previously did not. Set header=False to get the previous behavior.
  • deps: New environments will have a different default setting for compression in the ClickHouse backend due to removal of optional dependencies. Ibis is still capable of using the optional dependencies but doesn't include them by default. Install clickhouse-cityhash and lz4 to preserve the previous behavior.
  • api: Table.set_column() is removed; use Table.mutate(name=expr) instead (see the migration sketch after this list)
  • api: the suffixes argument in all join methods has been removed in favor of lname/rname args. The default renaming scheme for duplicate columns has also changed. To get the exact same behavior as before, pass in lname="{name}_x", rname="{name}_y".
  • ir: IntervalType.unit is now an enum instead of a string
  • type-system: Inferred types of Python objects may be slightly different. Ibis now uses pyarrow to infer the column types of pandas DataFrames and other objects.
  • backends: path argument of Backend.connect() is removed, use the database argument instead
  • api: removed Table.sort_by() and Table.groupby(), use .order_by() and .group_by() respectively
  • datatypes: The DataType.scalar and DataType.column class attributes are now strings.
  • backends: Backend.load_data(), Backend.exists_database() and Backend.exists_table() are removed
  • ir: Value.summary() and NumericValue.summary() are removed
  • schema: Schema.merge() is removed, use the union operator schema1 | schema2 instead
  • api: ibis.sequence() is removed

  • drop support for Python 3.8 (747f4ca)
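
A minimal migration sketch for a few of the removals above, assuming Ibis 6.0 is installed; the table, column, and schema names used here are hypothetical and only illustrate the renamed APIs:

    import ibis

    # hypothetical in-memory table, used only to demonstrate the renamed APIs
    t = ibis.memtable({"a": [1, 2, 2], "b": ["x", "y", "y"]})

    # Table.set_column() is removed: use Table.mutate(name=expr)
    t = t.mutate(doubled=t.a * 2)

    # Table.sort_by()/Table.groupby() are removed: use order_by()/group_by()
    ordered = t.order_by("a")
    grouped = t.group_by("b").aggregate(n=t.count())

    # Table.count() is no longer automatically named "count": name it explicitly
    n = t.count().name("count")

    # the join suffixes argument is replaced by lname/rname; these values
    # reproduce the old renaming scheme for duplicate columns
    joined = t.join(t, "b", lname="{name}_x", rname="{name}_y")

    # Schema.merge() is removed: use the union operator instead
    s = ibis.schema({"a": "int64"}) | ibis.schema({"b": "string"})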

Features

  • add dask windowing (9cb920a)
  • add easy type hints to GroupBy (da330b1)
  • add microsecond method to TimestampValue and TimeValue (e9df2da)
  • api: add __dataframe__ implementation (b3d9619)
  • api: add ALL_CAPS option to Table.relabel (c0b30e2)
  • api: add first/last reduction APIs (8c01980)
  • api: add zip operation and api (fecf695)
  • api: allow passing multiple keyword arguments to ibis.interval (22ee854)
  • api: better repr and pickle support for deferred expressions (2b1ec9c)
  • api: exact median (c53031c)
  • api: raise better error on column name collision in joins (e04c38c)
  • api: replace suffixes in join with lname/rname (3caf3a1)
  • api: support abstract type names in selectors.of_type (f6d2d56)
  • api: support list of strings and single strings in the across selector (a6b60e7)
  • api: use create_table to load example data (42e09a4)
  • bigquery: add client and storage_client params to connect (4cf1354)
  • bigquery: enable group_concat over windows (d6a1117)
  • cast: add table-level try_cast (5e4d16b)
  • clickhouse: add array zip impl (efba835)
  • clickhouse: move to clickhouse supported Python client (012557a)
  • clickhouse: set default engine to native file (29815fa)
  • clickhouse: support pyarrow decimal types (7472dd5)
  • common: add a pure python egraph implementation (aed2ed0)
  • common: add pattern matchers (b515d5c)
  • common: add support for start parameter in StringFind (31ce741)
  • common: add Topmost and Innermost pattern matchers (90b48fc)
  • common: implement copy protocol for Immutable base class (e61c66b)
  • create_table: support pyarrow Table in table creation (9dbb25c)
  • datafusion: add string functions (66c0afb)
  • datafusion: add support for scalar pyarrow UDFs (45935b7)
  • datafusion: minimal decimal support (c550780)
  • datafusion: register tables and datasets in datafusion (cb2cc58)
  • datatypes: add support for decimal values with arrow-based APIs (b4ba6b9)
  • datatypes: support creating Timestamp from units (66f2ff0)
  • deps: load examples lazily (4ea0ddb)
  • duckdb: add attach_sqlite method (bd32649)
  • duckdb: add support for native and pyarrow UDFs (7e56fc4)
  • duckdb: expand map support to .values() and map concatenation (ad49a09)
  • duckdb: set header=True by default (e4b515d)
  • duckdb: support 0.8.0 (ae9ae7d)
  • duckdb: support array zip operation (2d14ccc)
  • duckdb: support motherduck (053dc7e)
  • duckdb: warn when querying an already consumed RecordBatchReader (5a013ff)
  • flink: add initial flink SQL compiler (053a6d2)
  • formats: support timestamps in delta output; default to micros for pyarrow conversion (d8d5710)
  • implement read_delta and to_delta for some backends (74fc863)
  • implement read_delta for datafusion (eb4602f)
  • implement try_cast for a few backends (f488f0e)
  • io: add to_torch API (685c8fc)
  • io: add az/gs prefixes to normalize_filename in utils (e9eebba)
  • mysql: add re_extract (5ed40e1)
  • oracle: add oracle backend (c9b038b)
  • oracle: support temporary tables (6e64cd0)
  • pandas: add approx_median (6714b9f)
  • pandas: support passing memtables to create_table (3ea9a21)
  • polars: add any and all reductions (0bd3c01)
  • polars: add argmin and argmax (78562d3)
  • polars: add correlation operation (05ff488)
  • polars: add polars support for identical_to (aab3bae)
  • polars: add support for offset, binary literals, and dropna(how='all') (d2298e9)
  • polars: allow seamless connection for DataFrame as well as LazyFrame (a2a3e45)
  • polars: implement .sql methods (86f2a34)
  • polars: lower-latency column return for non-temporal results (b009563)
  • polars: support pyarrow decimal types (7e6c365)
  • polars: support SQL dialect translation (c87f695)
  • polars: support table registration from multiple parquet files (9c0a8be)
  • postgres: add ApproxMedian aggregation (887f572)
  • pyspark: add zip array impl (6c00cbc)
  • snowflake/postgres: scalar UDFs (dbf5b62)
  • snowflake: implement array zip (839e1f0)
  • snowflake: implement proper approx median (b15a6fe)
  • snowflake: support SSO and other forms of passwordless authentication (23ac53d)
  • snowflake: use the client python version as the UDF runtime where possible (69a9101)
  • sql: allow any SQL dialect accepted by sqlglot in Table.sql and Backend.sql (f38c447)
  • sqlite: add argmin and argmax functions (c8af9d4)
  • sqlite: add arithmetic mode aggregation (6fcac44)
  • sqlite: add ops.DateSub, ops.DateAdd, ops.DateDiff (cfd65a0)
  • streamlit: add support for streamlit connection interface (05c9449)
  • trino: implement zip (cd11daa)

Bug Fixes

  • add issue write permission to assign.yml (9445cee)
  • alchemy: close the cursor on error during dataframe construction (cc7dffb)
  • backends: fix capitalize to lowercase subsequent characters (49978f9)
  • backends: fix notall/notany translation (56b56b3)
  • bigquery: add srid=4326 to the geography dtype mapping (57a825b)
  • bigquery: allow passing both schema and obj in create_table (49cc2c4)
  • bigquery: bigquery timestamp and datetime dtypes (067e8a5)
  • bigquery: ensure that bigquery temporal ops work with the new timeunit/dateunit/intervalunit enums (0e00d86)
  • bigquery: ensure that generated names are used when compiling columns and allow flexible column names (c7044fe)
  • bigquery: fix table naming from count rename removal refactor (5b009d2)
  • bigquery: raise OperationNotDefinedError for IntervalAdd and IntervalSubtract (501aaf7)
  • bigquery: support capture group functionality (3f4f05b)
  • bigquery: truncate when casting float to int (267d8e1)
  • ci: use mariadb-admin instead of mysqladmin in mariadb 11.x (d4ccd3d)
  • clickhouse: avoid generating names for structs (5d11f48)
  • clickhouse: clean up external tables per query to avoid leaking them across queries (6d32edd)
  • clickhouse: close cursors more aggressively (478a40f)
  • clickhouse: use correct functions for milli and micro extraction (49b3136)
  • clickhouse: use named rather than positional group by (1f7e309)
  • clickhouse: use the correct dialect to generate subquery string for Contains operation (f656bd5)
  • common: fix bug in re_extract (6ebaeab), closes #6167
  • core: interval resolution should upcast to smallest unit (f7f844d), closes #6139
  • datafusion: fix incorrect order of predicate -> select compilation (0092304)
  • deps: make pyarrow a required dependency (b217cde)
  • deps: prevent vulnerable snowflake-connector-python versions (6dedb45)
  • deps: support multipledispatch version 1 (805a7d7)
  • deps: update dependency atpublic to v4 (3a44755)
  • deps: update dependency datafusion to v22 (15d8d11)
  • deps: update dependency datafusion to v23 (e4d666d)
  • deps: update dependency datafusion to v24 (c158b78)
  • deps: update dependency datafusion to v25 (c3a6264)
  • deps: update dependency datafusion to v26 (7e84ffe)
  • deps: update dependency deltalake to >=0.9.0,<0.11.0 (9817a83)
  • deps: update dependency pyarrow to v12 (3cbc239)
  • deps: update dependency sqlglot to v12 (5504bd4)
  • deps: update dependency sqlglot to v13 (1485dd0)
  • deps: update dependency sqlglot to v14 (9c40c06)
  • deps: update dependency sqlglot to v15 (f149729)
  • deps: update dependency sqlglot to v16 (46601ef)
  • deps: update dependency sqlglot to v17 (9b50fb4)
  • docs: fix failing doctests (04b9f19)
  • docs: typo in code without selectors (b236893)
  • docs: typo in docstrings and comments (0d3ed86)
  • docs: typo in snowflake do_connect kwargs (671bc31)
  • duckdb: better types for null literals (7b9d85e)
  • duckdb: disable map values and map merge for columns (b5472b3)
  • duckdb: ensure to_timestamp returns a UTC timestamp (0ce0b9f)
  • duckdb: ensure connection lifetime is greater than or equal to record batch reader lifetime (6ed353e)
  • duckdb: ensure that quoted struct field names work (47de1c3)
  • duckdb: ensure that types are inferred correctly across duckdb_engine versions (9c3d173)
  • duckdb: fix check for literal maps (b2b229b)
  • duckdb: fix exporting pyarrow record batches by bumping duckdb to 0.8.1 (aca52ab)
  • duckdb: fix read_csv problem with kwargs (6f71735), closes #6190
  • examples: move lockfile creation to data directory (b8f6e6b)
  • examples: use filelock to prevent pooch from clobbering files when fetching concurrently (e14662e)
  • expr: fix graphviz rendering (6d4a34f)
  • impala: do not cast ca_cert None value to string (bfdfb0e)
  • impala: expose hdfs_connect function as ibis.impala.hdfs_connect (27a0d12)
  • impala: more aggressively clean up cursors internally (bf5687e)
  • impala: replace time_mapping with TIME_MAPPING and backwards compatible check (4c3ca20)
  • ir: force an alias if projecting or aggregating columns (9fb1e88)
  • ir: raise Exception for group by with no keys (845f7ab), closes #6237
  • mssql: don't yield from inside a cursor (4af0731)
  • mysql: do not fail when we cannot set the session timezone (930f8ab)
  • mysql: ensure enum string functions are coerced to the correct type (e499c7f)
  • mysql: ensure that floats and double do not come back as Python Decimal objects (a3c329f)
  • mysql: fix binary literals (e081252)
  • mysql: handle the zero timestamp value (9ac86fd)
  • operations: ensure that self refs have a distinct name from the table they are referencing (bd8eb88)
  • oracle: disable autoload when cleaning up temp tables (b824142)
  • oracle: disable statement cache (41d3857)
  • oracle: disable temp tables to get inserts working (f9985fe)
  • pandas, dask: allow overlapping non-predicate columns in asof join (09e26a0)
  • pandas: fix first and last over windows (9079bc4), closes #5417
  • pandas: fix string translate function (12b9569), closes #6157
  • pandas: grouped aggregation using a case statement (d4ac345)
  • pandas: preserve RHS values in asof join when column names collide (4514668)
  • pandas: solve problem with first and last window function (dfdede5), closes #4918
  • polars: avoid implode deprecation warning (ce3bdad)
  • polars: ensure that to_pyarrow is called from the backend (41bacf2)
  • polars: make list column operations backwards compatible (35fc5f7)
  • postgres: ensure that alias method overwrites view even if types are different (7d5845b)
  • postgres: ensure that backend still works when create/drop first/last aggregates fails (eb5d534)
  • pyspark: enable joining on columns with different names as well as complex predicates (dcee821)
  • snowflake: always use pyarrow for memtables (da34d6f)
  • snowflake: ensure connection lifetime is greater than or equal to record batch reader lifetime (34a0c59)
  • snowflake: ensure that _pandas_converter attribute is resolved correctly (9058bbe)
  • snowflake: ensure that temp tables are only created once (43b8152)
  • snowflake: ensure unnest works for nested struct/object types (fc6ffc2)
  • snowflake: ensure use of the right timezone value (40426bf)
  • snowflake: fix tmpdir construction for python <3.10 (a507ae2)
  • snowflake: fix incorrect arguments to snowflake regexp_substr (9261f70)
  • snowflake: fix invalid attribute access when using pyarrow (bfd90a8)
  • snowflake: handle broken upstream behavior when a table can't be found (31a8366)
  • snowflake: resolve import error from interval datatype refactor (3092012)
  • snowflake: use convert_timezone for timezone conversion instead of invalid postgres AT TIME ZONE syntax (1595e7b)
  • sqlalchemy: ensure that backends don't clobber tables needed by inputs (76e38a3)
  • sqlalchemy: ensure that union_all-generated memtables use the correct column names (a4f546b)
  • sqlalchemy: prepend the table's schema when querying metadata (d8818e2)
  • sqlalchemy: quote struct field names (f5c91fc)
  • tests: ensure that record batch readers are cleaned up (d230a8d)
  • trino: bump lower bound to avoid having to handle experimental_python_types (bf6eeab)
  • trino: ensure that nested array types are inferred correctly (030f76d)
  • trino: fix incorrect version computation (04d3a89)
  • trino: support trino 0.323 special tuple type for struct results (ea1529d)
  • type-system: infer in-memory object types using pyarrow (f7018ee)
  • typehint: update type hint for class instance (2e1e14f)

Documentation

  • across: add documentation for across (b8941d3)
  • add allowed input for memtable constructor (69cdee5)
  • add disclaimer on no row order guarantees (75dd8b0)
  • add examples to if_any and if_all (5015677)
  • add platform comment in conda env creation (e38eacb)
  • add read_delta and related to backends docs (90eaed2)
  • api: ensure all top-level items have a description (c83d783)
  • api: hide dunder methods in API docs (6724b7b)
  • api: manually add inherited mixin methods to timey classes (7dbc96d)
  • api: show source for classes to allow dunder method inspection (4cef0f8)
  • backends: fix typo in pip install command (6a7207c)
  • bigquery: add connection explainer to bigquery backend docs (84caa5b)
  • blog: add Ibis + PyTorch + DuckDB blog post (1ad946c)
  • change plural variable name cols to col (c33a3ed), closes #6115
  • clarify map refers to Python Mapping container (f050a61)
  • css: enable code block copy button, don't select prompt (3510abe)
  • de-template remaining backends (except pandas, dask, impala) (82b7408)
  • describe NULL differences with pandas (688b293)
  • dev-env: remove python 3.8 from environment support matrix (4f89565)
  • drop docker-compose install for conda dev env setup (e19924d)
  • duckdb: add quick explainer on connecting to motherduck (4ef710e)
  • file support: add badge and docstrings for read_* methods (0767b7c)
  • fill out more docstrings (dc0289c)
  • fix errors and add 'table' before 'expression' (096b568)
  • fix some redirects (3a23c1f)
  • fix typo in Table.relabel return description (05cc51e)
  • generic: add docstring examples in types/generic (1d87292)
  • guides: add brief installation instructions at top of notebooks (dc3e694)
  • guides: update ibis-for-dplyr-users.ipynb with latest (1aa172e), closes #6125
  • improve docstrings for BooleanValue and BooleanColumn (30c1009)
  • improve docstrings to map types (72a49b0)
  • install: add quotes to all bracketed installs for shell compatibility (bb5c075)
  • intersphinx: add mapping to autolink pyarrow and pandas refs (cd92019)
  • intro: create Ibis for dplyr users document (e02a6f2)
  • introguides: use DuckDB for intro pandas notebook, remove iris (a7e845a)
  • link to Ibis for dplyr users (6e7c6a2)
  • make pandas.md filename lowercase (4937d45)
  • more group_by() and NULL in pandas guide (486b696)
  • more spelling fixes (564abbe)
  • move API docs to top-level (dcc409f)
  • numeric: add examples to numeric methods (39b470f)
  • oracle: add basic backend documentation (c871790)
  • oracle: add oracle to matrix (89aecf2)
  • python-versions: document how we decide to drop support for Python versions (3474dbc)
  • redirect Pandas to pandas (4074284)
  • remove trailing whitespace (63db643)
  • reorder sections in pandas guide (3b66093)
  • restructure and consistency (351d424)
  • snowflake: add connection explainer to snowflake backend docs (a62bbcd)
  • streamlit: fix ibis-framework install (a8cf773)
  • update copyright and some minor edits (b9aed44)
  • update notany/notall docstrings with arg (a5ec986), closes #5993
  • update structs and fix constructor docstrings (493437a)
  • use lowercase pandas (19b5d10)
  • use to_pandas instead of execute (882949e)

Refactors

  • alchemy: abstract out custom type mapping and fix sqlite (d712e2e)
  • api: consolidate ibis.date(), ibis.time() and ibis.timestamp() functions (20f71bf)
  • api: enforce at least one argument for Table set operations (57e948f)
  • api: remove automatic count name from relations (2cb19ec)
  • api: remove automatic group by count naming (15d9e50)
  • api: remove deprecated ibis.sequence() function (de0bf69)
  • api: remove deprecated Table.set_column() method (aa5ed94)
  • api: remove deprecated Table.sort_by() and Table.groupby() methods (1316635)
  • backends: remove ast_schema method (51b5ef8)
  • backends: remove backend specific DatabaseTable operations (d1bab97)
  • backends: remove deprecated Backend.load_data(), .exists_database() and .exists_table() methods (755555f)
  • backends: remove deprecated path argument of Backend.connect() (6737ea8)
  • bigquery: align datatype conversions with the new convention (70b8232)
  • bigquery: support a broader range of interval units in temporal binary operations (f78ce73)
  • common: add sanity checks for creating ENodes and Patterns (fc89cc3)
  • common: cleanup unit conversions (73de24e)
  • common: disallow unit conversions between days and hours (5619ce0)
  • common: move ibis.collections.DisjointSet to ibis.common.egraph (07dde21)
  • common: move tests for re_extract to general suite (acd1774)
  • common: use an enum as a sentinel value instead of NoMatch class (6674353), closes #6049
  • dask/pandas: align datatype conversions with the new convention (cecc24c)
  • datatypes: make pandas conversion backend specific if needed (544d27c)
  • datatypes: normalize interval values to integers (80a40ab)
  • datatypes: remove Set() in favor of Array() datatype (30a4f7e)
  • datatypes: remove value_type parametrization of the Interval datatype (463cdc3)
  • datatypes: remove direct ir dependency from datatypes (d7f0be0)
  • datatypes: use typehints instead of rules (704542e)
  • deps: remove optional dependency on clickhouse-cityhash and lz4 (736fe26)
  • dtypes: add normalize_datetime() and normalize_timezone() common utilities (c00ab38)
  • dtypes: turn dt.dtype() into lazily dispatched factory function (5261003)
  • formats: consolidate the dataframe conversion logic (53ed88e)
  • formats: encapsulate conversions to TypeMapper, SchemaMapper and DataMapper subclasses (ab35311)
  • formats: introduce a standalone subpackage to deal with common in-memory formats (e8f45f5)
  • impala: rely on impyla cursor for _wait_synchronous (a1b8736)
  • imports: move old UDF implementation to ibis.legacy module (cf93d5d)
  • ir: encapsulate temporal unit handling in enums (1b8fa7b)
  • ir: remove rlz.column_from, rlz.base_table_of and rlz.function_of rules (ed71d51)
  • ir: remove deprecated Value.summary() and NumericValue.summary() expression methods (6cd8050)
  • ir: remove redundant ops.NullLiteral() operation (a881703)
  • ir: simplify Expr._find_backends() implementation by using the ibis.common.graph utilities (91ff8d4)
  • ir: use dt.normalize() to construct literals (bf72f16)
  • ops.Hash: remove how from backend-specific hash operation (46a55fc)
  • pandas: solve and remove stale TODOs (92d979e)
  • polars: align datatype conversion functions with the new convention (5d61159)
  • postgres: fail at execute time for UDFs to avoid db connections in .compile() (e3a4d4d)
  • pyspark: align datatype conversion functions with the new convention (3437bb6)
  • pyspark: remove useless window branching in compiler (ad08da4)
  • replace custom _merge using pd.merge (fe74f76)
  • schema: remove deprecated Schema.merge() method (d307722)
  • schema: use type annotations instead of rules (98cd539)
  • snowflake: add flags to supplemental JavaScript UDFs (054add4)
  • sql: align datatype conversions with the new convention (0ef145b)
  • sqlite: remove roundtripping for DayOfWeekIndex and DayOfWeekName (b5a2bc5)
  • test: cleanup test data (7ae2b24)
  • to-pyarrow-batches: ensure that batch readers are always closed and exhausted (35a391f)
  • trino: always clean up prepared statements created when accessing query metadata (4f3a4cd)
  • util: use base32 to compress uuid table names (ba039a3)

Performance

  • imports: speed up checking for geospatial support (aa601af)
  • snowflake: use pyarrow for all transport (1fb89a1)
  • sqlalchemy: lazily construct the inspector object (8db5624)

Deprecations

  • api: deprecate tuple syntax for order by keys (5ed5110)

5.1.0 (2023-04-11)

Features

  • api: expand distinct API for dropping duplicates based on column subsets (3720ea5)
  • api: implement pyarrow memtables (9d4fbbd)
  • api: support passing a format string to Table.relabel (0583959)
  • api: thread kwargs around properly to support more complex connection arguments (7e0e15b)
  • backends: add more array functions (5208801)
  • bigquery: make to_pyarrow_batches() smarter (42f5987)
  • bigquery: support bignumeric type (d7c0f49)
  • default repr to showing all columns in Jupyter notebooks (91a0811)
  • druid: add re_search support (946202b)
  • duckdb: add map operations (a4c4e77)
  • duckdb: support sqlalchemy 2 (679bb52)
  • mssql: implement ops.StandardDev, ops.Variance (e322f1d)
  • pandas: support memtable in pandas backend (6e4d621), closes #5467
  • polars: implement count distinct (aea4ccd)
  • postgres: implement ops.Arbitrary (ee8dbab)
  • pyspark: pivot_longer (f600c90)
  • pyspark: add ArrayFilter operation (2b1301e)
  • pyspark: add ArrayMap operation (e2c159c)
  • pyspark: add DateDiff operation (bfd6109)
  • pyspark: add partial support for interval types (067120d)
  • pyspark: add read_csv, read_parquet, and register (7bd22af)
  • pyspark: implement count distinct (db29e10)
  • pyspark: support basic caching (ab0df7a)
  • snowflake: add optional 'connect_args' param (8bf2043)
  • snowflake: native pyarrow support (ce3d6a4)
  • sqlalchemy: support unknown types (fde79fa)
  • sqlite: implement ops.Arbitrary (9bcdf77)
  • sql: use temp views where possible (5b9d8c0)
  • table: implement pivot_wider API (60e7731)
  • ux: move ibis.expr.selectors to ibis.selectors and deprecate for removal in 6.0 (0ae639d)

Bug Fixes

  • api: disambiguate attribute errors from a missing resolve method (e12c4df)
  • api: support filter on literal followed by aggregate (68d65c8)
  • clickhouse: do not render aliases when compiling aggregate expression components (46caf3b)
  • clickhouse: ensure that clickhouse depends on sqlalchemy for make_url usage (ea10a27)
  • clickhouse: ensure that truncate works (1639914)
  • clickhouse: fix create_table implementation (5a54489)
  • clickhouse: workaround sqlglot issue with calling match (762f4d6)
  • deps: support pandas 2.0 (4f1d9fe)
  • duckdb: branch to avoid unnecessary dataframe construction (9d5d943)
  • duckdb: disable the progress bar by default (1a1892c)
  • duckdb: drop use of experimental parallel csv reader (47d8b92)
  • duckdb: generate SIMILAR TO instead of tilde to workaround sqlglot issue (434da27)
  • improve typing signature of .dropna() (e11de3f)
  • mssql: improve aggregation on expressions (58aa78d)
  • mssql: remove invalid aggregations (1ce3ef9)
  • polars: backwards compatibility for the time_zone and time_unit properties (3a2c4df)
  • postgres: allow inference of unknown types (343fb37)
  • pyspark: fail when aggregation contains a having filter (bd81a9f)
  • pyspark: raise proper error when trying to generate sql (51afc13)
  • snowflake: fix new array operations; remove ArrayRemove operation (772668b)
  • snowflake: make sure ephemeral tables follow backend quoting rules (9a845df)
  • snowflake: make sure pyarrow is used when possible (01f5154)
  • sql: ensure that set operations resolve to a single relation (3a02965)
  • sql: generate consistent pivot_longer semantics in the presence of multiple unnests (6bc301a)
  • sqlglot: work with newer versions (6f7302d)
  • trino,duckdb,postgres: make cumulative notany/notall aggregations work (c2e985f)
  • trino: only support how='first' with arbitrary reduction (315b5e7)
  • ux: use guaranteed length-1 characters for NULL values (8618789)

Refactors

  • api: remove explicit use of .projection in favor of the shorter .select (73df8df)
  • cache: factor out ref counted cache (c816f00)
  • duckdb: simplify to_pyarrow_batches implementation (d6235ee)
  • duckdb: source loaded and installed extensions from duckdb (fb06262)
  • duckdb: use native duckdb parquet reader unless auth required (e9f57eb)
  • generate uuid-based names for temp tables (a1164df)
  • memtable: clean up dispatch code (9a19302)
  • memtable: dedup table proxy code (3bccec0)
  • sqlalchemy: remove unused _meta instance attributes (523e198)

Deprecations

  • api: deprecate Table.set_column in favor of Table.mutate (954a6b7)

Documentation

  • add a getting started guide (8fd03ce)
  • add warning about comparisons to None (5cf186a)
  • blog: add campaign finance blog post (383c708)
  • blog: add campaign finance to SUMMARY.md (0bdd093)
  • clean up agg argument descriptions and add join examples (93d3059)
  • comparison: add a "why ibis" page (011cc19)
  • move conda before nix in dev setup instructions (6b2cbaa)
  • nth: improve docstring for nth() (fb7b34b)
  • patch docs build to fix anchor links (51be459)
  • penguins: add citation for palmer penguins data (679848d)
  • penguins: change to flipper (eec3706)
  • refresh environment setup pages (b609571)
  • selectors: make doctests more complete and actually run them (c8f2964)
  • style and review fixes in getting started guide (3b0f8db)

5.0.0 (2023-03-15)

⚠ BREAKING CHANGES

  • api: Snowflake identifiers are now kept as is from the database. Many table names and column names may now be in SHOUTING CASE. Adjust code accordingly.
  • backend: Backends now raise ibis.common.exceptions.UnsupportedOperationError in more places during compilation. You may need to catch this error type instead of the previous type, which differed between backends.
  • ux: Table.info now returns an expression
  • ux: Passing a sequence of column names to Table.drop is removed. Replace drop(cols) with drop(*cols) (see the sketch after this list).
  • The spark plugin alias is removed. Use pyspark instead
  • ir: removed ibis.expr.scope and ibis.expr.timecontext modules, access them under ibis.backends.base.df.<module>
  • some methods have been removed from the top-level ibis.<backend> namespaces, access them on a connected backend instance instead.
  • common: removed ibis.common.geospatial, import the functions from ibis.backends.base.sql.registry.geospatial
  • datatypes: JSON is no longer a subtype of String
  • datatype: Category, CategoryValue/Column/Scalar are removed. Use string types instead.
  • ux: The metric_name argument to value_counts is removed. Use Table.relabel to change the metric column's name.
  • deps: the minimum version of parsy is now 2.0
  • ir/backends: removed the following symbols:
      • ibis.backends.duckdb.parse_type() function
      • ibis.backends.impala.Backend.set_database() method
      • ibis.backends.pyspark.Backend.set_database() method
      • ibis.backends.impala.ImpalaConnection.ping() method
      • ibis.expr.operations.DatabaseTable.change_name() method
      • ibis.expr.operations.ParseURL class
      • ibis.expr.operations.Value.to_projection() method
      • ibis.expr.types.Table.get_column() method
      • ibis.expr.types.Table.get_columns() method
      • ibis.expr.types.StringValue.parse_url() method
  • schema: Schema.from_dict(), .delete() and .append() methods are removed
  • datatype: struct_type.pairs is removed, use struct_type.fields instead
  • datatype: Struct(names, types) is no longer supported; pass a dictionary to the Struct constructor instead
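
A minimal sketch of a few of the removals above, assuming Ibis 5.0 is installed; the table name and field names are hypothetical:

    import ibis
    import ibis.expr.datatypes as dt

    # hypothetical unbound table, used only to demonstrate the changed APIs
    t = ibis.table({"a": "int64", "b": "string", "c": "float64"}, name="t")

    # Table.drop no longer accepts a sequence of names: unpack it instead
    cols = ["b", "c"]
    dropped = t.drop(*cols)  # previously t.drop(cols)

    # Struct(names, types) is gone: pass a mapping to the constructor, and
    # read the fields attribute instead of the removed pairs property
    s = dt.Struct({"a": dt.int64, "b": dt.string})
    fields = s.fields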

Features

  • add max_columns option for table repr (a3aa236)
  • add examples API (b62356e)
  • api: add map/array accessors for easy conversion of JSON to stronger-typed values (d1e9d11)
  • api: add array to string join operation (74de349)
  • api: add builtin support for relabeling columns to snake case (1157273)
  • api: add support for passing a mapping to ibis.map (d365fd4)
  • api: allow single argument set operations (bb0a6f0)
  • api: implement to_pandas() API for ecosystem compatibility (cad316c)
  • api: implement isin (ac31db2)
  • api: make cache evaluate only once per session per expression (5a8ffe9)
  • api: make create_table uniform (833c698)
  • api: more selectors (5844304)
  • api: upcast pandas DataFrames to memtables in rlz.table rule (8dcfb8d)
  • backends: implement ops.Time for sqlalchemy backends (713cd33)
  • bigquery: add BIGNUMERIC type support (5c98ea4)
  • bigquery: add UUID literal support (ac47c62)
  • bigquery: enable subqueries in select statements (ef4dc86)
  • bigquery: implement create and drop table method (5f3c22c)
  • bigquery: implement create_view and drop_view method (a586473)
  • bigquery: support creating tables from in-memory tables (c3a25f1)
  • bigquery: support in-memory tables (37e3279)
  • change Rich repr of dtypes from blue to dim (008311f)
  • clickhouse: implement ArrayFilter translation (f2144b6)
  • clickhouse: implement ops.ArrayMap (45000e7)
  • clickhouse: implement ops.MapLength (fc82eaa)
  • clickhouse: implement ops.Capitalize (914c64c)
  • clickhouse: implement ops.ExtractMillisecond (ee74e3a)
  • clickhouse: implement ops.RandomScalar (104aeed)
  • clickhouse: implement ops.StringAscii (a507d17)
  • clickhouse: implement ops.TimestampFromYMDHMS, ops.DateFromYMD (05f5ae5)
  • clickhouse: improve error message for invalid types in literal (e4d7799)
  • clickhouse: support asof_join (7ed5143)
  • common: add abstract mapping collection with support for set operations (7d4aa0f)
  • common: add support for variadic positional and variadic keyword annotations (baea1fa)
  • common: hold typehint in the annotation objects (b3601c6)
  • common: support Callable arguments and return types in Validator.from_annotable() (ae57c36)
  • common: support positional only and keyword only arguments in annotations (340dca1)
  • dask/pandas: raise OperationNotDefinedError exc for not defined operations (2833685)
  • datafusion: implement ops.Degrees, ops.Radians (7e61391)
  • datafusion: implement ops.Exp (7cb3ade)
  • datafusion: implement ops.Pi, ops.E (5a74cb4)
  • datafusion: implement ops.RandomScalar (5d1cd0f)
  • datafusion: implement ops.StartsWith (8099014)
  • datafusion: implement ops.StringAscii (b1d7672)
  • datafusion: implement ops.StrRight (016a082)
  • datafusion: implement ops.Translate (2fe3fc4)
  • datafusion: support substr without end (a19fd87)
  • datatype/schema: support datatype and schema declaration using type annotated classes (6722c31)
  • datatype: enable inference of Decimal type (8761732)
  • datatype: implement Mapping abstract base class for StructType (5df2022)
  • deps: add Python 3.11 support and tests (6f3f759)
  • druid: add Apache Druid backend (c4cc2a6)
  • druid: implement bitwise operations (3ac7447)
  • druid: implement ops.Pi, ops.Modulus, ops.Power, ops.Log10 (090ff03)
  • druid: implement ops.Sign (35f52cc)
  • druid: implement ops.StringJoin (42cd9a3)
  • duckdb: add support for reading tables from sqlite databases (9ba2211)
  • duckdb: add UUID type support (5cd6d76)
  • duckdb: implement ArrayFilter translation (5f35d5c)
  • duckdb: implement ops.ArrayMap (063602d)
  • duckdb: implement create_view and drop_view method (4f73953)
  • duckdb: implement ops.Capitalize (b17116e)
  • duckdb: implement ops.TimestampDiff, ops.IntervalAdd, ops.IntervalSubtract (a7fd8fb)
  • duckdb: implement uuid result type (3150333)
  • duckdb: support dt.MACADDR, dt.INET as string (c4739c7)
  • duckdb: use read_json_auto when reading json (4193867)
  • examples: add imdb dataset examples (3d63203)
  • examples: add movielens small dataset (5f7c15c)
  • examples: add wowah_data data to examples (bf9a7cc)
  • examples: enable progressbar and faster hashing (4adfe29)
  • impala: implement ops.Clip (279fd78)
  • impala: implement ops.Radians, ops.Degrees (a794ace)
  • impala: implement ops.RandomScalar (874f2ff)
  • io: add to_parquet, to_csv to backends (fecca42)
  • ir: add ArrayFilter operation (e719d60)
  • ir: add ArrayMap operation (49e5f7a)
  • mysql: support in-memory tables (4dfabbd)
  • pandas/dask: implement bitwise operations (4994add)
  • pandas/dask: implement ops.Pi, ops.E (091be3c)
  • pandas: add basic unnest support (dd36b9d)
  • pandas: implement ops.StartsWith, ops.EndsWith (2725423)
  • pandas: support more pandas extension dtypes (54818ef)
  • polars: implement ops.Union (17c6011)
  • polars: implement ops.Pi, ops.E (6d8fc4a)
  • postgres: allow connecting with an explicit schema (39c9ea8)
  • postgres: fix interval literal (c0fa933)
  • postgres: implement argmin/argmax (82668ec)
  • postgres: parse tsvector columns as strings (fac8c47), closes #5402
  • pyspark: add support for ops.ArgMin and ops.ArgMax (a3fa57c)
  • pyspark: implement ops.Between (ed83465)
  • return Table from create_table(), create_view() (e4ea597)
  • schema: implement Mapping abstract base class for Schema (167d85a)
  • selectors: support ranges (e10caf4)
  • snowflake: add support for alias in snowflake (b1b947a)
  • snowflake: add support for bulk upload for temp tables in snowflake (6cc174f)
  • snowflake: add UUID literal support (436c781)
  • snowflake: implement argmin/argmax (8b998a5)
  • snowflake: implement ops.BitwiseAnd, ops.BitwiseNot, ops.BitwiseOr, ops.BitwiseXor (1acd4b7)
  • snowflake: implement ops.GroupConcat (2219866)
  • snowflake: implement remaining map functions (c48c9a6)
  • snowflake: support binary variance reduction with filters (eeabdee)
  • snowflake: support cross-database table access (79cb445)
  • sqlalchemy: generalize unnest to work on backends that don't support it (5943ce7)
  • sqlite: add sqlite type support (addd6a9)
  • sqlite: support in-memory tables (1b24848)
  • sql: support for creating temporary tables in sql based backends (466cf35)
  • tables: cast table using schema (96ce109)
  • tables: implement pivot_longer API (11c5736)
  • trino: enable MapLength operation (a7ad1db)
  • trino: implement ArrayFilter translation (50f6fcc)
  • trino: implement ops.ArrayMap (657bf61)
  • trino: implement ops.Between (d70b9c0)
  • trino: support sqlalchemy 2 (0d078c1)
  • ux: accept selectors in Table.drop (325140f)
  • ux: allow creating unbound tables using annotated class definitions (d7bf6a2)
  • ux: easy interactive setup (6850146)
  • ux: expose between, rows and range keyword arguments in value.over() (5763063)

Bug Fixes

  • analysis: extract Limit subqueries (62f6e14)
  • api: add a name attribute to backend proxy modules (d6d8e7e)
  • api: fix broken __radd__ array concat operation (121d9a0)
  • api: only include valid python identifiers in struct tab completion (8f33775)
  • api: only include valid python identifiers in table tab completion (031a48c)
  • backend: provide useful error if default backend is unavailable (1dbc682)
  • backends: fix capitalize implementations across all backends (d4f0275)
  • backends: fix null literal handling (7f46342)
  • bigquery: ensure that memtables are translated correctly (d6e56c5)
  • bigquery: fix decimal literals (4a04c9b)
  • bigquery: regenerate negative string index sql snapshots (3f02c73)
  • bigquery: regenerate sql for predicate pushdown fix (509806f)
  • cache: remove bogus schema argument and validate database argument type (c4254f6)
  • ci: fix invalid test id (f70de1d)
  • clickhouse: fix decimal literal (4dcd2cb)
  • clickhouse: fix set ops with table operands (86bcf32)
  • clickhouse: raise OperationNotDefinedError if operation is not supported (71e2570)
  • clickhouse: register in-memory tables in pyarrow-related calls (09a045c)
  • clickhouse: use a bool type supported by clickhouse_driver (ab8f064)
  • clickhouse: workaround sqlglot's insistence on uppercasing (6151f37)
  • compiler: generate aliases in a less clever way (04a4aa5)
  • datafusion: support sum aggregation on bool column (9421400)
  • deps: bump duckdb to 0.7.0 (38d2276)
  • deps: bump snowflake-connector-python upper bound (b368b04)
  • deps: ensure that pyspark depends on sqlalchemy (60c7382)
  • deps: update dependency pyarrow to v11 (2af5d8d)
  • deps: update dependency sqlglot to v11 (e581e2f)
  • don't expose backend methods on ibis.<backend> directly (5a16431)
  • druid: remove invalid operations (19f214c)
  • duckdb: add null to duckdb datatype parser (07d2a86)
  • duckdb: ensure that temp_directory exists (00ba6cb)
  • duckdb: explicitly set timezone to UTC on connection (6ae4a06)
  • duckdb: fix blob type in literal (f66e8a1)
  • duckdb: fix memtable to_pyarrow/to_pyarrow_batches (0e8b066)
  • duckdb: in-memory objects registered with duckdb show up in list_tables (7772f79)
  • duckdb: quote identifiers if necessary in struct_pack (6e598cc)
  • duckdb: support casting to unsigned integer types (066c158)
  • duckdb: treat g re_replace flag as literal text (aa3c31c)
  • duckdb: workaround an ownership bug at the interaction of duckdb, pandas and pyarrow (2819cff)
  • duckdb: workaround duckdb bug that prevents multiple substitutions (0e09220)
  • imports: remove top-level import of sqlalchemy from base backend (b13cf25)
  • io: add read_parquet and read_csv to base backend mixin (ce80d36), closes #5420
  • ir: incorrect predicate pushdown (9a9204f)
  • ir: make find_subqueries return in topological order (3587910)
  • ir: properly raise error if literal cannot be coerced to a datatype (e16b91f)
  • ir: reorder the right schema of set operations to align with the left schema (58e60ae)
  • ir: use rlz.map_to() rule instead of isin to normalize temporal units (a1c46a2)
  • ir: use static connection pooling to prevent dropping temporary state (6d2ae26)
  • mssql: set sqlglot to tsql (1044573)
  • mysql: remove invalid operations (8f34a2b)
  • pandas/dask: handle non numpy scalar results in wrap_case_result (a3b82f7)
  • pandas: don't try to dispatch on arrow dtype if not available (d22ae7b)
  • pandas: handle casting to arrays with None elements (382b90f)
  • pandas: handle NAs in array conversion (06bd15d)
  • polars: back compat for concat_str separator argument (ced5a61)
  • polars: back compat for the reverse/descending argument (f067d81)
  • polars: polars execute respect limit kwargs (d962faf)
  • polars: properly infer polars categorical dtype (5a4707a)
  • polars: use metric name in aggregate output to dedupe columns (234d8c1)
  • pyspark: fix incorrect ops.EndsWith translation rule (4c0a5a2)
  • pyspark: fix isnan and isinf to work on bool (8dc623a)
  • snowflake: allow loose casting of objects and arrays (1cf8df0)
  • snowflake: ensure that memtables are translated correctly (b361e07)
  • snowflake: ensure that null comparisons are correct (9b83699)
  • snowflake: ensure that quoting matches snowflake behavior, not sqlalchemy (b6b67f9)
  • snowflake: ensure that we do not try to use a None schema or database (03e0265)
  • snowflake: handle the case where pyarrow isn't installed (b624fa3)
  • snowflake: make array_agg preserve nulls (24b95bf)
  • snowflake: quote column names on construction of sa.Column (af4db5c)
  • snowflake: remove broken pyarrow fetch support (c440adb)
  • snowflake: return NULL when trying to call map functions on non-object JSON (d85fb28)
  • snowflake: use _flatten to avoid overriding unrelated function in other backends (8c31594)
  • sqlalchemy: ensure that isin contains full column expression (9018eb6)
  • sqlalchemy: get builtin dialects working; mysql/mssql/postgres/sqlite (d2356bc)
  • sqlalchemy: make strip family of functions behave like Python (dd0a04c)
  • sqlalchemy: reflect most recent schema when view is replaced (62c8dea)
  • sqlalchemy: use sa.true instead of Python literal (8423eba)
  • sqlalchemy: use indexed group by key references everywhere possible (9f1ddd8)
  • sql: ensure that set operations generate valid sql in the presence of additional constructs such as sort keys (3e2c364)
  • sqlite: explicitly disallow array in literal (de73b37)
  • sqlite: fix random scalar range (26d0dde)
  • support negative string indices (f84a54d)
  • trino: workaround broken dialect (b502faf)
  • types: fix argument types of Table.order_by() (6ed3a97)
  • util: make convert_unit work with python types (cb3a90c)
  • ux: give the value_counts aggregate column a better name (abab1d7)
  • ux: make string range selectors inclusive (7071669)
  • ux: make top level set operations work (f5976b2)

Performance

  • duckdb: faster to_parquet/to_csv implementations (6071bb5)
  • fix duckdb insert-from-dataframe performance (cd27b99)

  • deps: bump minimum required version of parsy (22020cb)

  • remove spark alias to pyspark and associated cruft (4b286bd)

Refactors

  • analysis: slightly simplify find_subqueries() (ab3712f)
  • backend: normalize exceptions (065b66d)
  • clickhouse: clean up parsing rules (6731772)
  • common: move frozendict and DotDict to ibis.common.collections (4451375)
  • common: move the geospatial module to the base SQL backend (3e7bfa3)
  • dask: remove unneeded create_table() (86885a6)
  • datatype: clean up parsing rules (c15fb5f)
  • datatype: remove Category type and related APIs (bb0ee78)
  • datatype: remove StructType.pairs property in favor of identical fields attribute (6668122)
  • datatypes: move sqlalchemy datatypes to specific backend (d7b49eb)
  • datatypes: remove String parent type from JSON type (34f3898)
  • datatype: use a dictionary to store StructType fields rather than names and types tuples (84455ac)
  • datatype: use lazy dispatch when inferring pandas Timedelta objects (e5280ea)
  • drop limit kwarg from to_parquet/to_csv (a54460c)
  • duckdb: clean up parsing rules (30da8f9)
  • duckdb: handle parsing timestamp scale (16c1443)
  • duckdb: remove unused list<...> parsing rule (f040b86)
  • duckdb: use a proper sqlalchemy construct for structs and reduce casting (8daa4a1)
  • ir/api: introduce window frame operation and revamp the window API (2bc5e5e)
  • ir/backends: remove various deprecated functions and methods (a8d3007)
  • ir: reorganize the scope and timecontext utilities (80bd494)
  • ir: update ArrayMap to use the new callable_with validation rule (560474e)
  • move pretty repr tests back to their own file (4a75988)
  • nix: clean up marker argument construction (12eb916)
  • postgres: clean up datatype parsing (1f61661)
  • postgres: clean up literal arrays (21b122d)
  • pyspark: remove another private function (c5081cf)
  • remove unnecessary top-level rich console (8083a6b)
  • rules: remove unused non_negative_integer and pair rules (e00920a)
  • schema: remove deprecated Schema.from_dict(), .delete() and .append() methods (8912b24)
  • snowflake: remove the need for parsy (c53403a)
  • sqlalchemy: set session parameters once per connection (ed4b476)
  • sqlalchemy: use backend-specific startswith/endswith implementations (6101de2)
  • test_sqlalchemy.py: move to snapshot testing (96998f0)
  • tests: reorganize rules test file to the ibis.expr subpackage (47f0909)
  • tests: reorganize schema test file to the ibis.expr subpackage (40033e1)
  • tests: reorganize datatype test files to the datatypes subpackage (16199c6)
  • trino: clean up datatype parsing (84c0e35)
  • ux: return expression from Table.info (71cc0e0)

Deprecations

  • api: deprecate summary API (e449c07)
  • api: mark ibis.sequence() for removal (3589f80)

Documentation

  • add a bunch of string expression examples (18d3112)
  • add Apache Druid to backend matrix (764d9c3)
  • add CNAME file to mkdocs source (6d19111)
  • add druid to the backends index docs page (ad0b6a3)
  • add missing DataFusion entry to the backends in the README (8ce025a)
  • add redirects for common old pages (c9087f2)
  • api: document deferred API and its pitfalls (8493604)
  • api: improve collect method API documentation (b4fcef1)
  • array expression examples (6812c17)
  • backends: document default backend configuration (6d917d3)
  • backends: link to configuration from the backends list (144044d)
  • blog: blog on ibis + substrait + duckdb (5dc7a0a)
  • blog: adds examples sneak peek blog + assets folder (fcbb3d5)
  • blog: adds to file sneak peek blog (128194f)
  • blog: specify parsy 2.0 in substrait blog article (c264477)
  • bump query engine count in README and use project-preferred names (11169f7)
  • don't sort backends by coverage percentage by default (68f73b1)
  • drop docs versioning (d7140e7)
  • duckdb: fix broken docstring examples (51084ad)
  • enable light/dark mode toggle in docs (b9e812a)
  • fill out table API with working examples (16fc8be)
  • fix notebook logging example (04b75ef)
  • how-to: fix sessionize.md to use ibis.read_parquet (ff9cbf7)
  • improve Expr.substitute() docstring (b954edd)
  • improve/update pandas walkthrough (80b05d8)
  • io: doc/ux improvements for read_parquet and friends (2541556), closes #5420
  • io: update README.md to recommend installing duckdb as default backend (0a72ec0), closes #5423 #5420
  • move tutorial from docs to external ibis-examples repo (11b0237)
  • parquet: add docstring examples for to_parquet incl. partitioning (8040164)
  • point to ibis-examples repo in the README (1205636)
  • README.md: clean up readme, fix typos, alter the example (383a3d3)
  • remove duplicate "or" (b6ef3cc)
  • remove duplicate spark backend in install docs (5954618)
  • render __dunder__ method API documentation (b532c63)
  • rerender ci-analysis notebook with new table header colors (50507b6)
  • streamlit: fix url for support matrix (594199b)
  • tutorial: remove impala from sql tutorial (7627c13)
  • use teal for primary & accent colors (24be961)

4.1.0 (2023-01-25)

Features

  • add ibis.get_backend function (2d27df8)
  • add py.typed to allow mypy to type check packages that use ibis (765d42e)
  • api: add ibis.set_backend function (e7fabaf)
  • api: add selectors for easier selection of columns (306bc88)
  • bigquery: add JS UDF support (e74328b)
  • bigquery: add SQL UDF support (db24173)
  • bigquery: add to_pyarrow method (30157c5)
  • bigquery: implement bitwise operations (55b69b1)
  • bigquery: implement ops.Typeof (b219919)
  • bigquery: implement ops.ZeroIfNull (f4c5607)
  • bigquery: implement struct literal (c5f2a1d)
  • clickhouse: properly support native boolean types (31cc7ba)
  • common: add support for annotating with coercible types (ae4a415)
  • common: make frozendict truly immutable (1c25213)
  • common: support annotations with typing.Literal (6f89f0b)
  • common: support generic mapping and sequence type annotations (ddc6603)
  • dask: support connect() with no arguments (67eed42)
  • datatype: add optional timestamp scale parameter (a38115a)
  • datatypes: add as_struct method to convert schemas to structs (64be7b1)
  • duckdb: add read_json function for consuming newline-delimited JSON files (65e65c1)
  • mssql: add a bunch of missing types (c698d35)
  • mssql: implement inference for DATETIME2 and DATETIMEOFFSET (aa9f151)
  • nicer repr for Backend.tables (0d319ca)
  • pandas: support connect() with no arguments (78cbbdd)
  • polars: allow ibis.polars.connect() to function without any arguments (d653a07)
  • polars: handle casting to scaled timestamps (099d1ec)
  • postgres: add Map(string, string) support via the built-in HSTORE extension (f968f8f)
  • pyarrow: support conversion to pyarrow map and struct types (54a4557)
  • snowflake: add more array operations (8d8bb70)
  • snowflake: add more map operations (7ae6e25)
  • snowflake: any/all/notany/notall reductions (ba1af5e)
  • snowflake: bitwise reductions (5aba997)
  • snowflake: date from ymd (035f856)
  • snowflake: fix array slicing (bd7af2a)
  • snowflake: implement ArrayCollect (c425f68)
  • snowflake: implement NthValue (0dca57c)
  • snowflake: implement ops.Arbitrary (45f4f05)
  • snowflake: implement ops.StructColumn (41698ed)
  • snowflake: implement StringSplit (e6acc09)
  • snowflake: implement StructField and struct literals (286a5c3)
  • snowflake: implement TimestampFromUNIX (314637d)
  • snowflake: implement TimestampFromYMDHMS (1eba8be)
  • snowflake: implement typeof operation (029499c)
  • snowflake: implement exists/not exists (7c8363b)
  • snowflake: implement extract millisecond (3292e91)
  • snowflake: make literal maps and params work (dd759d3)
  • snowflake: regex extract, search and replace (9c82179)
  • snowflake: string to timestamp (095ded6)
  • sqlite: implement _get_schema_using_query in SQLite backend (7ff84c8)
  • trino: compile timestamp types with scale (67683d3)
  • trino: enable ops.ExistsSubquery and ops.NotExistsSubquery (9b9b315)
  • trino: map parameters (53bd910)
  • ux: improve error message when column is not found (b527506)

Bug Fixes

  • backend: read the default backend setting in _default_backend (11252af)
  • bigquery: move connection logic to do_connect (42f2106)
  • bigquery: remove invalid operations from registry (911a080)
  • bigquery: resolve deprecation warnings for StructType and Schema (c9e7078)
  • clickhouse: fix position call (702de5d)
  • correctly visualize array type (26b0b3f)
  • deps: make sure pyarrow is not an implicit dependency (10373f4)
  • duckdb: make read_csv on URLs work (9e61816)
  • duckdb: only try to load extensions when necessary for csv (c77bde7)
  • duckdb: remove invalid operations from registry (ba2ec59)
  • fallback to default backend with to_pyarrow/to_pyarrow_batches (a1a6902)
  • impala: remove broken alias elision (32b120f)
  • ir: error for order_by on nonexistent column (57b1dd8)
  • ir: ops.Where output shape should consider all arguments (6f87064)
  • mssql: infer bit as boolean everywhere (24f9d7c)
  • mssql: pull nullability from column information (490f8b4)
  • mysql: fix mysql query schema inference (12f6438)
  • polars: remove non-working Binary and Decimal literal inference (0482d15)
  • postgres: use permanent views to avoid connection pool defeat (49a4991)
  • pyspark: fix substring constant translation (40d2072)
  • set ops: raise if no tables passed to set operations (bf4bdde)
  • snowflake: bring back bitwise operations (260facd)
  • snowflake: don't always insert a cast (ee8817b)
  • snowflake: implement working TimestampNow (42d95b0)
  • snowflake: make sqlalchemy 2.0 compatible (8071255)
  • snowflake: re-enable ops.TableArrayView (a1ad2b7)
  • snowflake: remove invalid operations from registry (2831559)
  • sql: add typeof test and bring back implementations (7dc5356)
  • sqlalchemy: 2.0 compatibility (837a736)
  • sqlalchemy: fix view creation with select stmts that have bind parameters (d760e69)
  • sqlalchemy: handle correlated exists sanely (efa42bd)
  • sqlalchemy: handle generic geography/geometry by name instead of geotype (23c35e1)
  • sqlalchemy: use exec_driver_sql in view teardown (2599c9b)
  • sqlalchemy: use the backend's compiler instead of AlchemyCompiler (9f4ff54)
  • sql: fix broken call to ibis.map (045edc7)
  • sqlite: interpolate pathlib.Path correctly in attach (0415bd3)
  • trino: ensure connecting works with trino 0.321 (07cee38)
  • trino: remove invalid operations from registry (665265c)
  • ux: remove extra trailing newline in expression repr (ee6d58a)

Documentation

  • add BigQuery backend docs (09d8995)
  • add streamlit app for showing the backend operation matrix (3228f64)
  • allow deselecting geospatial ops in backend support matrix (012da8c)
  • api: document more public expression APIs (337018f)
  • backend-info: prevent app from trying to install duckdb extensions (3d94082)
  • clean up gen_matrix.py after adding streamlit app (deb80f2)
  • duckdb: add to_pyarrow_batches documentation (ec1ffce)
  • embed streamlit operation matrix app to docs (469a50d)
  • make firefox render the proper iframe height (ff1d4dc)
  • publish raw data for operation matrix (62e68da)
  • re-order when to download test data (8ce8c16)
  • release: update breaking changes in the release notes for 4.0.0 (4e91401)
  • remove trailing parenthesis (4294397)
  • update ibis-version-4.0.0-release.md (f6701df)
  • update links to contributing guides (da615e4)

Refactors

  • bigquery: explicitly disallow INT64 in JS UDF (fb33bf9)
  • datatype: add custom sqlalchemy nested types for backend differentiation (dec70f5)
  • datatype: introduce to_sqla_type dispatching on dialect (a8bbc00)
  • datatypes: remove Geography and Geometry types in favor of GeoSpatial (d44978c)
  • datatype: use a mapping to store StructType fields rather than names and types tuples (ff34c7b)
  • dtypes: expose nbytes property for integer and floating point datatypes (ccf80fd)
  • duckdb: remove .raw_sql call (abc939e)
  • duckdb: use sqlalchemy-views to reduce string hacking (c162750)
  • ir: remove UnnamedMarker (dd352b1)
  • postgres: use a bindparam for metadata queries (b6b4669)
  • remove empty unused file (9d63fd6)
  • schema: use a mapping to store Schema fields rather than names and types tuples (318179a)
  • simplify _find_backend implementation (60f1a1b)
  • snowflake: remove unnecessary parse_json call in ops.StructField impl (9e80231)
  • snowflake: remove unnecessary casting (271554c)
  • snowflake: use unary instead of fixed_arity(..., 1) (4a1c7c9)
  • sqlalchemy: clean up quoting implementation (506ce01)
  • sqlalchemy: generalize handling of failed type inference (b0f4e4c)
  • sqlalchemy: move _get_schema_using_query to base class (296cd7d)
  • sqlalchemy: remove the need for deferred columns (e4011aa)
  • sqlalchemy: remove use of deprecated isnot (4ec53a4)
  • sqlalchemy: use exec_driver_sql everywhere (e8f96b6)
  • sql: finally remove _CorrelatedRefCheck (f49e429)

Deprecations

  • api: deprecate .to_projection in favor of .as_table (7706a86) (see the sketch below)
  • api: deprecate get_column/s in favor of __getitem__/__getattr__ syntax (e6372e2)
  • ir: schedule DatabaseTable.change_name for removal (e4bae26)
  • schema: schedule Schema.delete() and Schema.append() for removal (45ac9a9)
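
The first two deprecations above have direct replacements. A minimal sketch, assuming a hypothetical unbound table t (the table and column names here are illustrative only):

    import ibis

    t = ibis.table(ibis.schema({"a": "int64"}), name="t")  # hypothetical table

    # get_column / get_columns are deprecated; use item or attribute access
    col = t["a"]          # or t.a

    # Column.to_projection() is deprecated; .as_table() is the new spelling
    tbl = t.a.as_table()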

4.0.0 (2023-01-09)

⚠ BREAKING CHANGES

  • functions, methods and classes marked as deprecated are now removed
  • ir: replace HLLCardinality with ApproxCountDistinct and CMSMedian with ApproxMedian operations.
  • backends: the datatype of returned execution results now more closely matches that of the ibis expression's type. Downstream code may need to be adjusted.
  • ir: the JSONB type is replaced by the JSON type.
  • dev-deps: expression types have been removed from ibis.expr.api. Use import ibis.expr.types as ir to access these types.
  • common: removed @immutable_property decorator, use @attribute.default instead
  • timestamps: the timezone argument to to_timestamp is gone. This was only supported in the BigQuery backend. Append %Z to the format string and the desired time zone to the input column if necessary.
  • deps: ibis now supports at minimum duckdb 0.3.3. Please upgrade your duckdb install as needed.
  • api: previously ibis.connect would return a Table object when calling connect on a parquet/csv file. This now returns a backend containing a single table created from that file. Where possible, use ibis.read instead to read files into ibis tables.
  • api: histogram()'s closed argument no longer exists because it never had any effect. Remove it from your histogram method calls.
  • pandas/dask: the pandas and Dask backends now interpret casting ints to/from timestamps as seconds since the unix epoch, matching other backends.
  • datafusion: register_csv and register_parquet are removed. Pass filename to register method instead.
  • ir: ops.NodeList and ir.List are removed. Use tuples to represent a sequence of expressions instead.
  • api: re_extract now follows re.match behavior. In particular, group 0 now refers to the entire match, and capture groups are numbered starting at 1.
  • datatypes: enums are now strings. Likely no action needed since no functionality existed.
  • ir: Replace t[t.x.topk(...)] with t.semi_join(t.x.topk(...), "x") (see the migration sketch after this list).
  • ir: ir.Analytic.type() and ir.TopK.type() methods are removed.
  • api: the default limit for table/column expressions is now None (meaning no limit).
  • ir: join changes: previously all column names that collided between left and right tables were renamed with an appended suffix. Now for the case of inner joins with only equality predicates, colliding columns that are known to be equal due to the join predicates aren't renamed.
  • impala: kerberos support is no longer installed by default for the impala backend. To add support you'll need to install the kerberos package separately.
  • ir: ops.DeferredSortKey is removed. Use ops.SortKey directly instead.
  • ir: ibis.common.grounds.Annotable is now mutable by default
  • ir: node.has_resolved_name() is removed, use isinstance(node, ops.Named) instead; node.resolve_name() is removed, use node.name instead
  • ir: removed ops.Node.flat_args(), directly use node.args property instead
  • ir: removed ops.Node.inputs property, use the multiple-dispatched get_node_arguments() function in the pandas backend
  • ir: Node.blocks() method has been removed.
  • ir: HasSchema mixin class is no longer available, directly subclass ops.TableNode and implement schema property instead
  • ir: Removed Node.output_type property in favor of abstractmethod Node.to_expr() which now must be explicitly implemented
  • ir: Expr(Op(Expr(Op(Expr(Op))))) is now represented as Expr(Op(Op(Op))), so code using ibis internals must be migrated
  • pandas: Use timezone conversion functions to compute the original machine localized value
  • common: use ibis.common.validators.{Parameter, Signature} instead
  • ir: ibis.expr.lineage.lineage() is now removed
  • ir: removed ir.DestructValue, ir.DestructScalar and ir.DestructColumn, use table.unpack() instead
  • ir: removed Node.root_tables() method, use ibis.expr.analysis.find_immediate_parent_tables() instead
  • impala: use other methods for pinging the database
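
A minimal migration sketch for a few of the changes above. The table and column names are hypothetical; only the replacement calls named in the notes are used:

    import ibis

    t = ibis.table(ibis.schema({"x": "string"}), name="t")  # hypothetical table

    # topk as a filter is gone; semi join against the topk result instead
    top = t.semi_join(t.x.topk(3), "x")

    # re_extract follows re.match semantics: group 0 is the whole match,
    # capture groups are numbered from 1
    first = t.x.re_extract(r"(\w+)\s+(\w+)", 1)

    # the default limit is now None, so opt in to a row limit explicitly
    preview = top.limit(10)

    # ibis.connect("data.parquet") now returns a backend with a single table;
    # ibis.read("data.parquet") reads a file directly into a table expression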

Features

  • add experimental decorator (791335f)
  • add to_pyarrow and to_pyarrow_batches (a059cf9)
  • add unbind method to expressions (4b91b0b), closes #4536
  • add way to specify sqlglot dialect on backend (f1c0608)
  • alchemy: implement json getitem for sqlalchemy backends (7384087)
  • api: add agg alias for aggregate (907583f)
  • api: add agg alias to group_by (6b6367c)
  • api: add ibis.read top level API function (e67132c)
  • api: add JSON __getitem__ operation (3e2efb4)
  • api: implement __array__ (1402347)
  • api: make drop variadic (1d69702)
  • api: return object from to_sql to support notebook syntax highlighting (87c9833)
  • api: use rich for interactive __repr__ (04758b8)
  • backend: make ArrayCollect filterable (1e1a5cf)
  • backends/mssql: add backend support for Microsoft Sql Server (fc39323)
  • bigquery: add ops.DateFromYMD, ops.TimeFromHMS, ops.TimestampFromYMDHMS (a4a7936)
  • bigquery: add ops.ExtractDayOfYear (30c547a)
  • bigquery: add support for correlation (4df9f8b)
  • bigquery: implement argmin and argmax (40c5f0d)
  • bigquery: implement pi and e (b91370a)
  • bigquery: implement array repeat (09d1e2f)
  • bigquery: implement JSON getitem functionality (9c0e775)
  • bigquery: implement ops.ArraySlice (49414ef)
  • bigquery: implement ops.Capitalize (5757bb0)
  • bigquery: implement ops.Clip (5495d6d)
  • bigquery: implement ops.Degrees, ops.Radians (5119b93)
  • bigquery: implement ops.ExtractWeekOfYear (477d287)
  • bigquery: implement ops.RandomScalar (5dc8482)
  • bigquery: implement ops.StructColumn, ops.ArrayColumn (2bbf73c)
  • bigquery: implement ops.Translate (77a4b3e)
  • bigquery: implement ops.NthValue (b43ba28)
  • bigquery: move bigquery backend back into the main repo (cd5e881)
  • clickhouse: handle more options in parse_url implementation (874c5c0)
  • clickhouse: implement INTERSECT ALL/EXCEPT ALL (f65fbc3)
  • clickhouse: implement quantile/multiquantile (96d7d1b)
  • common: support function annotations with both typehints and rules (7e23f3e)
  • dask: implement mode aggregation (017f07a)
  • dask: implement json getitem (381d805)
  • datafusion: convert column expressions to pyarrow (0a888de)
  • datafusion: enable topk (d44903f)
  • datafusion: implement Limit (1ddc876)
  • datafusion: implement ops.StringConcat (6bb5b4f)
  • decompile: support rendering ibis expression as python code (7eebc67)
  • deps: support shapely 2.0 (68dff10)
  • display qualified names in deprecation warnings (a6e2a49)
  • docs: first draft of Ibis for pandas users (7f7c9b5)
  • duckdb: enable registration of parquet files from s3 (fced465)
  • duckdb: implement mode aggregation (36fd152)
  • duckdb: implement to_timestamp (26ca1e4)
  • duckdb: implement quantile/multiquantile (fac9705)
  • duckdb: overwrite views when calling register (ae07438)
  • duckdb: pass through kwargs to file loaders (14fa2aa)
  • duckdb: support out of core execution for in-memory connections (a4d4ba2)
  • duckdb: support registering external postgres tables with duckdb (8633e6b)
  • expr: split ParseURL operation into multiple URL extract operations (1f0fcea)
  • impala: implement strftime (d3ede8d)
  • impala: support date literals (cd334c4)
  • insert: add support for list+dict to sqlalchemy backends (15d399e)
  • ir/pandas/dask/clickhouse: revamp Map type support (62b6f2d)
  • ir: add is_* methods to DataTypes (79f5c2b)
  • ir: prototype for parsing SQL into an ibis expression (1301183)
  • ir: support python 3.10 pattern matching on Annotable nodes (eca93eb)
  • mssql: add window function support (ef1be45)
  • mssql: detect schema from SQL (ff79928)
  • mssql: extract quarter (7d04266)
  • mssql: implement ops.DayOfWeekIndex (4125593)
  • mssql: implement ops.ExtractDayOfYear (ae026d5)
  • mssql: implement ops.ExtractEpochSeconds (4f49b5b)
  • mssql: implement ops.ExtractWeekOfYear (f1394bc)
  • mssql: implement ops.Ln, ops.Log, ops.Log2, ops.Log10 (f8ee1d8)
  • mssql: implement ops.RandomScalar (4149450)
  • mssql: implement ops.TimestampTruncate, ops.DateTruncate (738e496)
  • mssql: implement ops.DateFromYMD, ops.TimestampFromYMDHMS, ops.TimeFromHMS (e84f2ce)
  • open *.db files with sqlite in ibis.connect (37baf05)
  • pandas: implement mode aggregation (fc023b5)
  • pandas: implement RegexReplace for str (23713cc)
  • pandas: implement json getitem (8fa1190)
  • pandas: implement quantile/multiquantile (cd4dcaa)
  • pandas: support histogram API (5bfc0fe)
  • polars: enable topk (8bfb16a)
  • polars: implement mode aggregation (7982ba2)
  • polars: initial support for polars backend (afecb0a)
  • postgres: implement mode aggregation (b2f1c2d)
  • postgres: implement quantile and multiquantile (82ed4f5)
  • postgres: prettify array literals (cdc60d5)
  • pyspark: add support for struct operations (ce05987)
  • pyspark: enable topk (0f748e0)
  • pyspark: implement pi and e (fea81c6)
  • pyspark: implement json getitem (9bfb748)
  • pyspark: implement quantile and multiquantile (743f411)
  • pyspark: support histogram API (8f4808c)
  • snowflake: enable day-of-week column expression (6fd9c33)
  • snowflake: handle date and timestamp literals (ec2392d)
  • snowflake: implement mode aggregation (f35915e)
  • snowflake: implement parse_url (a9746e3)
  • snowflake: implement rowid scalar (7e1425a)
  • snowflake: implement time literal (068fc50)
  • snowflake: implement scalar (cc07d91)
  • snowflake: initial commit for snowflake backend (a8687dd)
  • snowflake: support reductions in window functions via automatic ordering (0234e5c)
  • sql: add ops.StringSQLILike (7dc4924)
  • sqlalchemy: implement ops.Where using IF/IFF functions (4cc9c15)
  • sqlalchemy: in-memory tables have name in generated SQL (01b4c60)
  • sql: improve error message in fixed_arity helper (891a1ad)
  • sqlite: add type_map arg to override type inference (1961bad)
  • sqlite: fix impl for missing pi and e functions (24b6d2f)
  • sqlite: support con.sql with explicit schema specified (7ca82f3)
  • sqlite: support wider range of datetime formats (f65093a)
  • support both postgresql:// and postgres:// in ibis.connect (2f7a7b4)
  • support deferred predicates in join (b51a64b)
  • support more operations with unsigned integers (9992953)
  • support passing callable to relabel (0bceefd)
  • support tab completion for getitem access of table columns (732dba4)
  • support Table.fillna for SQL backends (26d4cac)
  • trino: add bit_xor aggregation (830acf4)
  • trino: add EXTRACT-based functionality (6549657)
  • trino: add millisecond scale to *_trunc function (3065248)
  • trino: add some basic aggregation ops (7ecf7ab)
  • trino: extract milliseconds (09517a5)
  • trino: implement approx_median (1cba8bd)
  • trino: implement parse_url (2bc87fc)
  • trino: implement round, cot, pi, and e (c0e8736)
  • trino: implement arbitrary first support (0c7d3b3)
  • trino: implement array collect support (dfeb600)
  • trino: implement array column support (dadf9a8)
  • trino: implement array concat (240c55d)
  • trino: implement array index (c5f3a96)
  • trino: implement array length support (2d7cc65)
  • trino: implement array literal support (2182177)
  • trino: implement array repeat (2ee3d10)
  • trino: implement array slicing (643792e)
  • trino: implement basic struct operations (cc3c937)
  • trino: implement bitwise agg support (5288b35)
  • trino: implement bitwise scalar/column ops (ac4876c)
  • trino: implement default precision and scale (37f8a47)
  • trino: implement group concat support (5c41439)
  • trino: implement json getitem support (7c41566)
  • trino: implement map operations (4efc5ce)
  • trino: implement more generic and numeric ops (63b45c8)
  • trino: implement ops.Capitalize (dff14fc)
  • trino: implement ops.DateFromYMD (edd2994)
  • trino: implement ops.DateTruncate, ops.TimestampTruncate (32f4862)
  • trino: implement ops.DayOfWeekIndex, ops.DayOfWeekName (a316d6d)
  • trino: implement ops.ExtractDayOfYear (b0a3465)
  • trino: implement ops.ExtractEpochSeconds (10b82f1)
  • trino: implement ops.ExtractWeekOfYear (cf719b8)
  • trino: implement ops.Repeat (e9f6851)
  • trino: implement ops.Strftime (a436823)
  • trino: implement ops.StringAscii (93fd32d)
  • trino: implement ops.StringContains (d5cb2ec)
  • trino: implement ops.StringSplit (62d79a6)
  • trino: implement ops.StringToTimestamp (b766f62)
  • trino: implement ops.StrRight (691b39c)
  • trino: implement ops.TimeFromHMS (e5cacc2)
  • trino: implement ops.TimestampFromUNIX (ce5d726)
  • trino: implement ops.TimestampFromYMDHMS (9fa7304)
  • trino: implement ops.TimestampNow (c832e4c)
  • trino: implement ops.Translate (410ae1e)
  • trino: implement quantile/multiquantile (bc7fdab)
  • trino: implement regex functions (9e493c5)
  • trino: implement window function support (5b6cc45)
  • trino: initial trino backend (c367865)
  • trino: support string date scalar parameter (9092530)
  • trino: use proper approx_distinct function (3766fff)

Bug Fixes

  • ibis.connect always returns a backend (2d5b155)
  • allow inserting memtable with alchemy backends (c02fcc3)
  • always display at least one column in the table repr (5ea9e5a)
  • analysis: only lower sort keys that are in an agg's output (6bb4f66)
  • api: allow arbitrary sort keys (a980b34)
  • api: allow boolean scalars in predicate APIs (2a2636b)
  • api: allow deferred instances as input to ibis.desc and ibis.asc (6861347)
  • api: ensure that window functions are propagated (4fb1106)
  • api: make re_extract conform to semantics of Python's re.match (5981227)
  • auto-register csv and parquet with duckdb using ibis.connect (67c4f87)
  • avoid renaming known equal columns for inner joins with equality predicates (5d4b0ed)
  • backends: fix casting and execution result types in many backends (46c21dc)
  • bigquery: don't try to parse database when name is already fully qualified (ae3c113)
  • bigquery: fix integer to timestamp casting (f5bacad)
  • bigquery: normalize W frequency in *_trunc (893cd49)
  • catch TypeError instead of more specific error (6db19d8)
  • change default limit to None (8d1526a)
  • clarify and normalize behavior of Table.rowid (92b03d6)
  • clickhouse: ensure that correlated subqueries' columns can be referenced (708d682)
  • clickhouse: fix list_tables to use database name (edc3511)
  • clickhouse: make any/all filterable and reduce code size (99b10e2)
  • clickhouse: use clickhouse's dbapi (bd0da12)
  • common: support copying variadic annotable instances (ee0d9ad)
  • dask: make filterable reductions work (0f759fc)
  • dask: raise TypeError with informative message in ibis.dask.connect (4e67f7a)
  • define to_pandas/to_pyarrow on DataType/Schema classes directly (22f3b4d)
  • deps: bound shapely to a version that doesn't segfault (be5a779)
  • deps: update dependency datafusion to >=0.6,<0.8 (4c73870)
  • deps: update dependency geopandas to >=0.6,<0.13 (58a32dc)
  • deps: update dependency packaging to v22 (e0b6177)
  • deps: update dependency rich to v13 (4f313dd)
  • deps: update dependency sqlglot to v10 (db19d43)
  • deps: update dependency sqlglot to v9 (cf330ac)
  • docs: make sure data can be downloaded when building notebooks (fa7da17)
  • don't fuse filters & selections that contain window functions (d757069)
  • drop snowflake support for RowID (dd378f1)
  • duckdb: drop incorrect translate implementation (8690151)
  • duckdb: fix bug in json getitem for duckdb (49ce739)
  • duckdb: keep ibis.now() type semantics (eca4a2c)
  • duckdb: make array repeat actually work (021f4de)
  • duckdb: replace all in re_replace (c138f0f)
  • duckdb: rereflect sqla table on re-registration (613b311), closes #4729
  • duckdb: s3 priority (a2d03d1)
  • duckdb: silence duckdb-engine warnings (359adc3)
  • ensure numpy ops don't accidentally cast ibis types (a7ca6c8)
  • exclude geospatial ops from pandas/dask/polars has_operation (6f1d265)
  • fix table.mutate with deferred named expressions (5877d0b)
  • fix bug when disabling show_types in interactive repr (2402506)
  • fix expression repr for table -> value operations (dbf92f5)
  • handle dimensionality of empty outputs (3a88170)
  • improve rich repr support (522db9c)
  • ir: normalize date types (39056b5)
  • ir: normalize timestamps to datetime.datetime values (157efde)
  • make col.day_of_week not an expr (96e1580)
  • mssql: fix integer to timestamp casting (9122eef)
  • mssql: fix ops.TimeFromHMS (d2188e1)
  • mssql: fix ops.TimestampFromUNIX (ec28add)
  • mssql: fix round without argument (52a60ce)
  • mssql: use double-dollar sign to prevent interpolating a value (b82da5d)
  • mysql: fix mysql startswith/endswith to be case sensitive (d7469cc)
  • mysql: handle out of bounds timestamps and fix milliseconds calculation (1f7649a)
  • mysql: upcast bool agg args (8c5f9a5)
  • pandas/dask now cast int<->timestamp as seconds since epoch (bbfe998)
  • pandas: drop RowID implementation (05f5016)
  • pandas: make quantile/multiquantile with filter work (6b5abd6)
  • pandas: support substr with no length (b2c2922)
  • pandas: use localized UTC time for now operation (f6d7327)
  • pandas: use the correct context when aggregating over a window (e7fa5c0)
  • polars: fix polars startswith to call the right method (9e6f397)
  • polars: workaround passing pl.Null to the null type (fd9633b)
  • postgres/duckdb: fix negative slicing by copying the trino impl (39e3962)
  • postgres: fix array repeat to work with literals (3c46eb1)
  • postgres: fix array_index operation (63ef892)
  • postgres: make any/all translation rules use reduction helper (78bfd1d)
  • pyspark: handle datetime.datetime literals (4f94abe)
  • remove kerberos extra for impala dialect (6ed3e5f)
  • repr: don't repeat value in repr for literals (974eeb6)
  • repr: fix off by one in repr (322c8dc)
  • s3: fix quoting and autonaming for s3 (ce09266)
  • select: raise error on attempt to select no columns in projection (94ac10e)
  • snowflake: fix extracting query parameter by (75af240)
  • snowflake: fix failing snowflake url extraction functions (2eee50b)
  • snowflake: fix snowflake list_databases (680cd24)
  • snowflake: handle schema when getting table (f6fff5b)
  • snowflake: snowflake now likes Tuesdays (1bf9d7c)
  • sqlalchemy: allow passing pd.DataFrame to create (1a083f6)
  • sqlalchemy: ensure that arbitrary expressions are valid sort keys (cb1a013)
  • sql: avoid generating cartesian products yet again (fdc52a2)
  • sqlite: fix sqlite startswith/endswith to be case sensitive (fd4a88d)
  • standardize list_tables signature everywhere (abafe1b), closes #2877
  • support arbitrary with no arguments (45156f5)
  • support dtype in __array__ methods (1294b76)
  • test: ensure that file-based url tests don't expect data to exist (c2b635a)
  • trino: fix integer to timestamp casting (49321a6)
  • trino: make filterable any/all reductions work (992bd18)
  • truncate columns in repr for wide tables (aadcba1)
  • typo: in StringValue helpstr (b2e2093)
  • ux: improve error messages for rlz.comparable failures (5ca41d2)
  • ux: prevent infinite looping when formatting a floating column of all nans (b6afe98)
  • visualize(label_edges=True) works for NodeList ops (a91ceae)
  • visualize: dedup nodes and edges and add verbose argument for debugging (521e188)
  • visualize: handle join predicates in visualize (d63cb57)
  • window: allow window range tuples in preceding or following (77172b3)

Deprecations

  • deprecate Table.groupby alias in favor of Table.group_by (39cea3b) (see the sketch below)
  • deprecate Table.sort_by in favor of Table.order_by (7ac7103)
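
Both renames above are drop-in replacements. A minimal sketch with a hypothetical table t:

    import ibis

    t = ibis.table(ibis.schema({"x": "string"}), name="t")  # hypothetical table

    expr = (
        t.group_by("x")          # was t.groupby("x")
        .aggregate(n=t.count())
        .order_by("n")           # was .sort_by("n")
    )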

Performance

  • add benchmark for known-slow table expression (e9617f0)
  • expr: traverse nodes only once during compilation (69019ed)
  • fix join performance by avoiding Projection construction (ed532bf)
  • node: give Nodes the default Python repr (eb26b11)
  • ux: remove pandas import overhead from import ibis (ea452fc)

  • deps: bump duckdb lower bound (4539683)
  • dev-deps: replace flake8 et al with ruff and fix lints (9c1b282)

Refactors

  • add lazy_singledispatch utility (180ecff)
  • add rlz.lazy_instance_of (4e30480)
  • add Temporal base class for temporal data types (694eec4)
  • api: add deprecated Node.op() #4519 (2b0826b)
  • avoid roundtripping to expression for IFF (3068ae2)
  • clean up cot implementations to have one less function call (0f304e5)
  • clean up timezone support in ops.TimestampFromYMDHMS (2e183a9)
  • cleanup str method docstrings (36bd36c)
  • clickhouse: implement sqlglot-based compiler (5cc5d4b)
  • clickhouse: simplify Quantile and MultiQuantile implementation (9e16e9e)
  • common: allow traversal and substitution of tuple and dictionary arguments (60f4806)
  • common: enforce slots definitions for Base subclasses (6c3df91)
  • common: move Parameter and Signature to validators.py (da20537)
  • common: reduce implementation complexity of annotations (27cee71)
  • datafusion: align register API across backends (08046aa)
  • datafusion: get name from expr (fea3e5b)
  • datatypes: remove Enum (145e706)
  • dev-deps: remove unnecessary poetry2nix overrides (5ed95bc)
  • don't sort new columns in mutate (72ec96a)
  • duckdb: use lambda to define backend operations (5d14de6)
  • impala: move impala SQL tests to snapshots (927bf65)
  • impala: replace custom pooling with sqlalchemy QueuePool (626cdca)
  • ir: ops.List -> ops.NodeList (6765bd2)
  • ir: better encapsulate graph traversal logic, schema and datatype objects are not traversable anymore (1a07725)
  • ir: generalize handling and traversal of node sequences (e8bcd0f)
  • ir: make all value operations 'Named' for more consistent naming semantics (f1eb4d2)
  • ir: move random() to api.py (e136f1b)
  • ir: remove ops.DeferredSortKey (e629633)
  • ir: remove ops.TopKNode and ir.TopK (d4dc544)
  • ir: remove Analytic expression's unused type() method (1864bc1)
  • ir: remove DecimalValue.precision(), DecimalValue.scale() method (be975bc)
  • ir: remove DestructValue expressions (762d384)
  • ir: remove duplicated literal creation code (7dfb56f)
  • ir: remove intermediate expressions (c6fb0c0)
  • ir: remove lin.lineage() since it's not used anywhere (120b1d7)
  • ir: remove node.blocks() in favor of more explicit type handling (37d8ce4)
  • ir: remove Node.inputs since it is an implementation detail of the pandas backend (6d2c49c)
  • ir: remove node.root_tables() and unify parent table handling (fbb07c1)
  • ir: remove ops.AggregateSelection in favor of an.simplify_aggregation (ecf6ed3)
  • ir: remove ops.NodeList and ir.List in favor of builtin tuples (a90ce35)
  • ir: remove pydantic dependency and make grounds more composable (9da0f41)
  • ir: remove sch.HasSchema and introduce ops.Projection base class for ops.Selection (c3b0139)
  • ir: remove unnecessary complexity introduced by variadic annotation (698314b)
  • ir: resolve circular imports so operations can be globally imported for types (d2a3919)
  • ir: simplify analysis.substitute_unbound() (a6c7406)
  • ir: simplify SortKey construction using rules (4d63280)
  • ir: simplify switch-case builders (9acf717)
  • ir: split datatypes package into multiple submodules (cce6535)
  • ir: split out table count into CountStar operation (e812e6e)
  • ir: support replacing nodes in the tree (6a0df5a)
  • ir: support variadic annotable arguments and add generic graph traversal routines (5d6a289)
  • ir: unify aggregation construction to use AggregateSelection (c7d6a6f)
  • make quantile, any, and all reductions filterable (1bafc9e)
  • make sure value_counts always has a projection (a70a302)
  • mssql: use lambda to define backend operations (1437cfb)
  • mysql: dedup extract code (d551944)
  • mysql: use lambda to define backend operations (d10bff8)
  • polars: match duckdb registration api (ac59dac)
  • postgres: use lambda to define backend operations (4c85d7b)
  • remove dead compat.py module (eda0fdb)
  • remove deprecated approximate aggregation classes (53fc6cb)
  • remove deprecated functions and classes (be1cdda)
  • remove duplicate _random_identifier calls (26e7942)
  • remove setup.py and related infrastructure (adfcce1)
  • remove the JSONB type (c4fc0ec)
  • rename some infer methods for consistency (a8f5579)
  • replace isinstance dtype checking with is_* methods (386adc2)
  • rework registration / file loading (c60e30d)
  • rules: generalize field referencing using rlz.ref() (0afb8b9)
  • simplify ops.ArrayColumn in postgres backend (f9677cc)
  • simplify histogram implementation by using window functions (41cbc29)
  • simplify ops.ArrayColumn in alchemy backend (28ff4a8)
  • snowflake: use lambda to define backend operations (cb33fce)
  • split up custom nix code; remove unused derivations (57dff10)
  • sqlite: use lambda to define backend operations (b937391)
  • test: make clickhouse tests use pytest-snapshot (413dbd2)
  • tests: move sql output to golden dir (6a6a453)
  • test: sort regex test cases by name instead of posix-ness (0dfb0e7)
  • tests: replace sqlgolden with pytest-snapshot (5700eb0)
  • timestamps: remove timezone argument to to_timestamp API (eb4762e)
  • trino: use lambda to define backend operations (dbd61a5)
  • uncouple MultiQuantile class from Quantile (9c48f8c)
  • use rlz.lazy_instance_of to delay shapely import (d14badc)
  • use lazy dispatch for dt.infer (2e56540)

Documentation

  • add backend_sensitive decorator (836f237)
  • add pip install poetry dev env setup step (69940b1)
  • add bigquery ci data analysis notebook (2b1d4e5)
  • add how to sessionize guide (18989dd)
  • add issue templates (4480c18)
  • add missing argument descriptions (ea757fa)
  • add mssql backend page (63c0f19)
  • added 4.0 release blog post (bcc0eca)
  • added memtable howto guide (5dde9bd)
  • backends: add duckdb and mssql to the backend index page (7b13218)
  • bring back git revision localized date plugin (e4fc2c9)
  • created how to guide for deferred expressions (2a9f6ab)
  • dev: python-duckdb now available for windows with conda (7f76b09)
  • document how to create a table from a pandas dataframe using ibis.memtable (c6521ec)
  • fix backends label in feature request issue form (cf852d3)
  • fix broken docstrings; reduce docstring noise; workaround griffe (bd1c637)
  • fix docs for building docs (23af567)
  • fix feature-request issue template (6fb62f5)
  • fix installation section for conda (7af6ac1)
  • fix landing page links (1879362)
  • fix links to make docs work locally and remotely (13c7810)
  • fix pyarrow batches docstring (dba9594)
  • fix single line docstring summaries (8028201)
  • fix snowflake doc link in readme.md (9aff68e)
  • fix the inline example for ibis.dask.do_connect (6a533f0)
  • fix tutorial link on install page (b34811a)
  • fix typo in first example of the homepage (9a8a25a)
  • formatting and syntax highlighting fixes (50864da)
  • front page rework (24b795a)
  • how-to: use parquet data source for sessionization, fix typos, more deferred usage (974be37)
  • improve the docstring of the generic connect method (ee87802)
  • issue template cleanups (fed37da)
  • list (e331247)
  • polars: add backend docs page (e303b68)
  • remove hrs (4c30de4)
  • renamed how to guides to be more consistent (1bdc5bd)
  • sentence structure in the Notes section (ac20232)
  • show interactive prompt for python (5d7d913)
  • split out geospatial operations in the support matrix docs (0075c28)
  • trino: add backend docs (2f262cd)
  • typo (6bac645)
  • typos, headers and formatting (9566cbb)
  • udf: examples in pandas have the incorrect import path (49028b8)
  • update filename (658a296)
  • update line (4edfce0)
  • update readme (19a3f3c)
  • use bug/feat prefix only (2561a29)
  • use components instead of pieces (179ca1e)
  • use heading instead of bulleted bold (99b044e)
  • use library instead of project (fd2d915)
  • use present tense for use cases and "why" section (6cc7416)
  • www: fix frontpage example (7db39e8)

3.2.0 (2022-09-15)

Features

  • add api to get backend entry points (0152f5e)
  • api: add and_ and or_ helpers (94bd4df)
  • api: add argmax and argmin column methods (b52216a)
  • api: add distinct to Intersection and Difference operations (cd9a34c)
  • api: add ibis.memtable API for constructing in-memory table expressions (0cc6948)
  • api: add ibis.sql to easily get a formatted SQL string (d971cc3)
  • api: add Table.unpack() and StructValue.lift() APIs for projecting struct fields (ced5f53)
  • api: allow transmute-style select method (d5fc364)
  • api: implement all bitwise operators (7fc5073)
  • api: promote psql to a show_sql public API (877a05d)
  • clickhouse: add dataframe external table support for memtables (bc86aa7)
  • clickhouse: add enum, ipaddr, json, lowcardinality to type parser (8f0287f)
  • clickhouse: enable support for working window functions (310a5a8)
  • clickhouse: implement argmin and argmax (ee7c878)
  • clickhouse: implement bitwise operations (348cd08)
  • clickhouse: implement struct scalars (1f3efe9)
  • dask: implement StringReplace execution (1389f4b)
  • dask: implement ungrouped argmin and argmax (854aea7)
  • deps: support duckdb 0.5.0 (47165b2)
  • duckdb: handle query parameters in ibis.connect (fbde95d)
  • duckdb: implement argmin and argmax (abf03f1)
  • duckdb: implement bitwise xor (ca3abed)
  • duckdb: register tables from pandas/pyarrow objects (36e48cc)
  • duckdb: support unsigned integer types (2e67918)
  • impala: implement bitwise operations (c5302ab)
  • implement dropna for SQL backends (8a747fb)
  • log: make BaseSQLBackend._log print by default (12de5bb)
  • mysql: register BLOB types (1e4fb92)
  • pandas: implement argmin and argmax (bf9b948)
  • pandas: implement NotContains on grouped data (976dce7)
  • pandas: implement StringReplace execution (578795f)
  • pandas: implement Contains with a group by (c534848)
  • postgres: implement bitwise xor (9b1ebf5)
  • pyspark: add option to treat nan as null in aggregations (bf47250)
  • pyspark: implement ibis.connect for pyspark (a191744)
  • pyspark: implement Intersection and Difference (9845a3c)
  • pyspark: implement bitwise operators (33cadb1)
  • sqlalchemy: implement bitwise operator translation (bd9f64c)
  • sqlalchemy: make ibis.connect work with sqlalchemy backends (b6cefb9)
  • sqlalchemy: properly implement Intersection and Difference (2bc0b69)
  • sql: implement StringReplace translation (29daa32)
  • sqlite: implement bitwise xor and bitwise not (58c42f9)
  • support table.sort_by(ibis.random()) (693005d)
  • type-system: infer pandas' string dtype (5f0eb5d)
  • ux: add duckdb as the default backend (8ccb81d)
  • ux: use rich to format Table.info() output (67234c3)
  • ux: use sqlglot for pretty printing SQL (a3c81c5)
  • variadic union, intersect, & difference functions (05aca5a)

Bug Fixes

  • api: make sure column names that are already inferred are not overwritten (6f1cb16)
  • api: support deferred objects in existing API functions (241ce6a)
  • backend: ensure that chained limits respect prior limits (02a04f5)
  • backends: ensure select after filter works (e58ca73)
  • backends: only recommend installing ibis-foo when foo is a known backend (ac6974a)
  • base-sql: fix String-generating backend string concat implementation (3cf78c1)
  • clickhouse: add IPv4/IPv6 literal inference (0a2f315)
  • clickhouse: cast repeat times argument to UInt64 (b643544)
  • clickhouse: fix listing tables from databases with no tables (08900c3)
  • compilers: make sure memtable rows have names in the SQL string compilers (18e7f95)
  • compiler: use repr for SQL string VALUES data (75af658)
  • dask: ensure predicates are computed before projections (5cd70e1)
  • dask: implement timestamp-date binary comparisons (48d5058)
  • dask: set dask upper bound due to large scale test breakage (796c645), closes #9221
  • decimal: add decimal type inference (3fe3fd8)
  • deps: update dependency duckdb-engine to >=0.1.8,<0.4.0 (113dc8f)
  • deps: update dependency duckdb-engine to >=0.1.8,<0.5.0 (ef97c9d)
  • deps: update dependency parsy to v2 (9a06131)
  • deps: update dependency shapely to >=1.6,<1.8.4 (0c787d2)
  • deps: update dependency shapely to >=1.6,<1.8.5 (d08c737)
  • deps: update dependency sqlglot to v5 (f210bb8)
  • deps: update dependency sqlglot to v6 (5ca4533)
  • duckdb: add missing types (59bad07)
  • duckdb: ensure that in-memory connections remain in their creating thread (39bc537)
  • duckdb: use fetch_arrow_table() to be able to handle big timestamps (85a76eb)
  • fix bug in pandas & dask difference implementation (88a78fa)
  • fix dask where implementation (49f8845)
  • impala: add date column dtype to impala to ibis type dict (c59e94e), closes #4449
  • pandas where supports scalar for left (48f6c1e)
  • pandas: fix anti-joins (10a659d)
  • pandas: implement timestamp-date binary comparisons (4fc666d)
  • pandas: properly handle empty groups when aggregating with GroupConcat (6545f4d)
  • pyspark: fix broken StringReplace implementation (22cb297)
  • pyspark: make sure ibis.connect works with pyspark (a7ab107)
  • pyspark: translate predicates before projections (b3d1c80)
  • sqlalchemy: fix float64 type mapping (8782773)
  • sqlalchemy: handle reductions with multiple arguments (5b2039b)
  • sqlalchemy: implement SQLQueryResult translation (786a50f)
  • sql: fix sql compilation after making InMemoryTable a subclass of PhysicalTable (aac9524)
  • squash several bugs in sort_by asc/desc handling (222b2ba)
  • support chained set operations in SQL backends (227aed3)
  • support filters on InMemoryTable exprs (abfaf1f)
  • typo: in BaseSQLBackend.compile docstring (0561b13)

Deprecations

  • right kwarg in union/intersect/difference (719a5a1) (see the sketch below)
  • duckdb: deprecate path argument in favor of database (fcacc20)
  • sqlite: deprecate path argument in favor of database (0f85919)
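
A minimal sketch of the replacement spellings above, assuming the DuckDB backend and two hypothetical tables:

    import ibis

    # `database` replaces the deprecated `path` argument
    con = ibis.duckdb.connect(database=":memory:")

    a = ibis.table(ibis.schema({"x": "int64"}), name="a")
    b = ibis.table(ibis.schema({"x": "int64"}), name="b")

    # pass the other table positionally; the `right` keyword is deprecated
    u = a.union(b)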

Performance

  • pandas: remove reexecution of alias children (64efa53)
  • pyspark: ensure that pyspark DDL doesn't use VALUES (422c98d)
  • sqlalchemy: register DataFrames cheaply where possible (ee9f1be)

Documentation

  • add to_sql (e2821a5)
  • add back constraints for transitive doc dependencies and fix docs (350fd43)
  • add coc reporting information (c2355ba)
  • add community guidelines documentation (fd0893f)
  • add HeavyAI to the readme (4c5ca80)
  • add how-to bfill and ffill (ff84027)
  • add how-to for ibis+duckdb register (73a726e)
  • add how-to section to docs (33c4b93)
  • duckdb: add installation note for duckdb >= 0.5.0 (608b1fb)
  • fix memtable docstrings (72bc0f5)
  • fix flake8 line length issues (fb7af75)
  • fix markdown (4ab6b95)
  • fix relative links in tutorial (2bd075f), closes #4064 #4201
  • make attribution style uniform across the blog (05561e0)
  • move the blog out to the top level sidebar for visibility (417ba64)
  • remove underspecified UDF doc page (0eb0ac0)

3.1.0 (2022-07-26)

Features

  • add __getattr__ support to StructValue (75bded1)
  • allow selection subclasses to define new node args (2a7dc41)
  • api: accept Schema objects in public ibis.schema (0daac6c)
  • api: add .tables accessor to BaseBackend (7ad27f0)
  • api: add e function to public API (3a07e70)
  • api: add ops.StructColumn operation (020bfdb)
  • api: add cume_dist operation (6b6b185)
  • api: add toplevel ibis.connect() (e13946b)
  • api: handle literal timestamps with timezone embedded in string (1ae976b)
  • api: ibis.connect() default to duckdb for parquet/csv extensions (ff2f088)
  • api: make struct metadata more convenient to access (3fd9bd8)
  • api: support tab completion for backends (eb75fc5)
  • api: underscore convenience api (81716da)
  • api: unnest (98ecb09)
  • backends: allow column expressions from non-foreign tables on the right side of isin/notin (e1374a4)
  • base-sql: implement trig and math functions (addb2c1)
  • clickhouse: add ability to pass arbitrary kwargs to Clickhouse do_connect (583f599)
  • clickhouse: implement ops.StructColumn operation (0063007)
  • clickhouse: implement array collect (8b2577d)
  • clickhouse: implement ArrayColumn (1301f18)
  • clickhouse: implement bit aggs (f94a5d2)
  • clickhouse: implement clip (12dfe50)
  • clickhouse: implement covariance and correlation (a37c155)
  • clickhouse: implement degrees (7946c0f)
  • clickhouse: implement proper type serialization (80f4ab9)
  • clickhouse: implement radians (c7b7f08)
  • clickhouse: implement strftime (222f2b5)
  • clickhouse: implement struct field access (fff69f3)
  • clickhouse: implement trig and math functions (c56440a)
  • clickhouse: support subsecond timestamp literals (e8698a6)
  • compiler: restore intersect_class and difference_class overrides in base SQL backend (2c46a15)
  • dask: implement trig functions (e4086bb)
  • dask: implement zeroifnull (38487db)
  • datafusion: implement negate (69dd64d)
  • datafusion: implement trig functions (16803e1)
  • duckdb: add register method to duckdb backend to load parquet and csv files (4ccc6fc)
  • duckdb: enable find_in_set test (377023d)
  • duckdb: enable group_concat test (4b9ad6c)
  • duckdb: implement ops.StructColumn operation (211bfab)
  • duckdb: implement approx_count_distinct (03c89ad)
  • duckdb: implement approx_median (894ce90)
  • duckdb: implement arbitrary first and last aggregation (8a500bc)
  • duckdb: implement NthValue (1bf2842)
  • duckdb: implement strftime (aebc252)
  • duckdb: return the ir.Table instance from DuckDB's register API (0d05d41)
  • mysql: implement FindInSet (e55bbbf)
  • mysql: implement StringToTimestamp (169250f)
  • pandas: implement bitwise aggregations (37ff328)
  • pandas: implement degrees (25b4f69)
  • pandas: implement radians (6816b75)
  • pandas: implement trig functions (1fd52d2)
  • pandas: implement zeroifnull (48e8ed1)
  • postgres/duckdb: implement covariance and correlation (464d3ef)
  • postgres: implement ArrayColumn (7b0a506)
  • pyspark: implement approx_count_distinct (1fe1d75)
  • pyspark: implement approx_median (07571a9)
  • pyspark: implement covariance and correlation (ae818fb)
  • pyspark: implement degrees (f478c7c)
  • pyspark: implement nth_value (abb559d)
  • pyspark: implement nullifzero (640234b)
  • pyspark: implement radians (18843c0)
  • pyspark: implement trig functions (fd7621a)
  • pyspark: implement Where (32b9abb)
  • pyspark: implement xor (550b35b)
  • pyspark: implement zeroifnull (db13241)
  • pyspark: topk support (9344591)
  • sqlalchemy: add degrees and radians (8b7415f)
  • sqlalchemy: add xor translation rule (2921664)
  • sqlalchemy: allow non-primitive arrays (4e02918)
  • sqlalchemy: implement approx_count_distinct as count distinct (4e8bcab)
  • sqlalchemy: implement clip (8c02639)
  • sqlalchemy: implement trig functions (34c1514)
  • sqlalchemy: implement Where (7424704)
  • sqlalchemy: implement zeroifnull (4735e9a)
  • sqlite: implement BitAnd, BitOr and BitXor (e478479)
  • sqlite: implement cotangent (01e7ce7)
  • sqlite: implement degrees and radians (2cf9c5e)

Bug Fixes

  • api: bring back null datatype parsing (fc131a1)
  • api: compute the type from both branches of Where expressions (b8f4120)
  • api: ensure that Deferred objects work in aggregations (bbb376c)
  • api: ensure that nulls can be cast to any type to allow caller promotion (fab4393)
  • api: make ExistSubquery and NotExistsSubquery pure boolean operations (dd70024)
  • backends: make execution transactional where possible (d1ea269)
  • clickhouse: cast empty result dataframe (27ae68a)
  • clickhouse: handle empty IN and NOT IN expressions (2c892eb)
  • clickhouse: return null instead of empty string for group_concat when values are filtered out (b826b40)
  • compiler: fix bool bool comparisons (1ac9a9e)
  • dask/pandas: allow limit to be None (9f91d6b)
  • dask: aggregation with multi-key groupby fails on dask backend (4f8bc70)
  • datafusion: handle predicates in aggregates (4725571)
  • deps: update dependency datafusion to >=0.4,<0.7 (f5b244e)
  • deps: update dependency duckdb to >=0.3.2,<0.5.0 (57ee818)
  • deps: update dependency duckdb-engine to >=0.1.8,<0.3.0 (3e379a0)
  • deps: update dependency geoalchemy2 to >=0.6.3,<0.13 (c04a533)
  • deps: update dependency geopandas to >=0.6,<0.12 (b899c37)
  • deps: update dependency Shapely to >=1.6,<1.8.3 (87a49ad)
  • deps: update dependency toolz to >=0.11,<0.13 (258a641)
  • don't mask udf module in init.py (3e567ba)
  • duckdb: ensure that paths with non-extension . chars are parsed correctly (9448fd3)
  • duckdb: fix struct datatype parsing (5124763)
  • duckdb: force string_agg separator to be a constant (21cdf2f)
  • duckdb: handle multiple dotted extensions; quote names; consolidate implementations (1494246)
  • duckdb: remove timezone function invocation (33d38fc)
  • geospatial: ensure that later versions of numpy are compatible with geospatial code (33f0afb)
  • impala: explicitly declare delimited tables as stored as textfile (04086a4), closes #4260
  • impala: remove broken nth_value implementation (dbc9cc2)
  • ir: don't attempt fusion when projections aren't exactly equivalent (3482ba2)
  • mysql: cast mysql timestamp literals to ensure correct return type (8116e04)
  • mysql: implement integer to timestamp using from_unixtime (1b43004)
  • pandas/dask: look at pre_execute for has_operation reporting (cb44efc)
  • pandas: execute negate on bool as not (330ab4f)
  • pandas: fix struct inference from dict in the pandas backend (5886a9a)
  • pandas: force backend options registration on trace.enable() calls (8818fe6)
  • pandas: handle empty boolean column casting in Series conversion (f697e3e)
  • pandas: handle struct columns with NA elements (9a7c510)
  • pandas: handle the case of selection from a join when remapping overlapping column names (031c4c6)
  • pandas: perform correct equality comparison (d62e7b9)
  • postgres/duckdb: cast after milliseconds computation instead of after extraction (bdd1d65)
  • pyspark: handle predicates in Aggregation (842c307)
  • pyspark: prevent spark from trying to convert timezone of naive timestamps (dfb4127)
  • pyspark: remove xpassing test for #2453 (c051e28)
  • pyspark: specialize implementation of has_operation (5082346)
  • pyspark: use empty check for collect_list in GroupConcat rule (df66acb)
  • repr: allow DestructValue selections to be formatted by fmt (4b45d87)
  • repr: when formatting DestructValue selections, use struct field names as column names (d01fe42)
  • sqlalchemy: fix parsing and construction of nested array types (e20bcc0)
  • sqlalchemy: remove unused second argument when creating temporary views (8766b40)
  • sqlite: register conversion to isoformat for pandas.Timestamp (fe95dca)
  • sqlite: test case with whitespace at the end of the line (7623ae9)
  • sql: use isoformat for timestamp literals (70d0ba6)
  • type-system: infer null datatype for empty sequence of expressions (f67d5f9)
  • use bounded precision for decimal aggregations (596acfb)

Performance Improvements

  • analysis: add _projection as cached_property to avoid reconstruction of projections (98510c8)
  • lineage: ensure that expressions are not traversed multiple times in most cases (ff9708c)

Reverts

  • ci: install sqlite3 on ubuntu (1f2705f)

3.0.2 (2022-04-28)

Bug Fixes

  • docs: fix tempdir location for docs build (dcd1b22)

3.0.1 (2022-04-28)

Bug Fixes

  • build: replace version before exec plugin runs (573139c)

3.0.0 (2022-04-25)

⚠ BREAKING CHANGES

  • ir: The following are breaking changes due to simplifying expression internals
  • ibis.expr.datatypes.DataType.scalar_type and DataType.column_type factory methods have been removed; the DataType.scalar and DataType.column class fields can be used to directly construct a corresponding expression instance (though prefer to use operation.to_expr())
  • ibis.expr.types.ValueExpr._name and ValueExpr._dtype fields are no longer accessible. While these were never intended for direct use, the ValueExpr.has_name(), ValueExpr.get_name() and ValueExpr.type() methods are now the only way to retrieve the expression's name and datatype.
  • ibis.expr.operations.Node.output_type is now a property, not a method; decorate those methods with @property
  • ibis.expr.operations.Value subclasses must define output_shape and output_dtype properties from now on (note the datatype abbreviation dtype in the property name)
  • ibis.expr.rules.cast(), scalar_like() and array_like() rules have been removed
  • api: Replace t["a"].distinct() with t[["a"]].distinct() (see the migration sketch after this list).
  • deps: The sqlalchemy lower bound is now 1.4
  • ir: Schema.names and Schema.types attributes now have tuple type rather than list
  • expr: Columns that were added or used in an aggregation or mutation would be alphabetically sorted in compiled SQL outputs. This was a vestige from when Python dicts didn't preserve insertion order. Now columns will appear in the order in which they were passed to aggregate or mutate
  • api: dt.float is now dt.float64; use dt.float32 for the previous behavior.
  • ir: Relation-based execute_node dispatch rules must now accept tuples of expressions.
  • ir: removed ibis.expr.lineage.{roots,find_nodes} functions
  • config: Use ibis.options.graphviz_repr = True to enable
  • hdfs: Use fsspec instead of HDFS from ibis
  • udf: Vectorized UDF coercion functions are no longer a public API.
  • The minimum supported Python version is now Python 3.8
  • config: register_option is no longer supported, please submit option requests upstream
  • backends: Read tables with pandas.read_hdf and use the pandas backend
  • The CSV backend is removed. Use Datafusion for CSV execution.
  • backends: Use the datafusion backend to read parquet files
  • Expr() -> Expr.pipe()
  • coercion functions previously in expr/schema.py are now in udf/vectorized.py
  • api: materialize is removed. Joins with overlapping columns now have suffixes.
  • kudu: use impala instead: https://kudu.apache.org/docs/kudu_impala_integration.html
  • Any code that was relying implicitly on string-y behavior from UUID datatypes will need to add an explicit cast first.
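
A minimal sketch of a few of the API replacements spelled out above (table, column, and function names are hypothetical):

    import ibis
    import ibis.expr.datatypes as dt

    t = ibis.table(ibis.schema({"a": "string", "b": "int64"}), name="t")

    # column-level distinct is gone; project the column, then de-duplicate the table
    distinct_a = t[["a"]].distinct()      # was t["a"].distinct()

    # calling an expression no longer pipes; use .pipe() explicitly
    def add_flag(table):
        return table.mutate(flag=ibis.literal(True))

    flagged = t.pipe(add_flag)            # was t(add_flag)

    # dt.float is now an alias for dt.float64; use dt.float32 for the old behavior
    dtype = dt.float64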

Features

  • add repr_html for expressions to print as tables in ipython (cd6fa4e)
  • add duckdb backend (667f2d5)
  • allow construction of decimal literals (3d9e865)
  • api: add ibis.asc expression (efe177e), closes #1454
  • api: add has_operation API to the backend (4fab014)
  • api: implement type for SortExpr (ab19bd6)
  • clickhouse: implement string concat for clickhouse (1767205)
  • clickhouse: implement StrRight operation (67749a0)
  • clickhouse: implement table union (e0008d7)
  • clickhouse: implement trim, pad and string predicates (a5b7293)
  • datafusion: implement Count operation (4797a86)
  • datatypes: unbounded decimal type (f7e6f65)
  • date: add ibis.date(y,m,d) functionality (26892b6), closes #386
  • duckdb/postgres/mysql/pyspark: implement .sql on tables for mixing sql and expressions (00e8087)
  • duckdb: add functionality needed to pass integer to interval test (e2119e8)
  • duckdb: implement _get_schema_using_query (93cd730)
  • duckdb: implement now() function (6924f50)
  • duckdb: implement regexp replace and extract (18d16a7)
  • implement force argument in sqlalchemy backend base class (9df7f1b)
  • implement coalesce for the pyspark backend (8183efe)
  • implement semi/anti join for the pandas backend (cb36fc5)
  • implement semi/anti join for the pyspark backend (3e1ba9c)
  • implement the remaining clickhouse joins (b3aa1f0)
  • ir: rewrite and speed up expression repr (45ce9b2)
  • mysql: implement _get_schema_from_query (456cd44)
  • mysql: move string join impl up to alchemy for mysql (77a8eb9)
  • postgres: implement _get_schema_using_query (f2459eb)
  • pyspark: implement Distinct for pyspark (4306ad9)
  • pyspark: implement log base b for pyspark (527af3c)
  • pyspark: implement percent_rank and enable testing (c051617)
  • repr: add interval info to interval repr (df26231)
  • sqlalchemy: implement ilike (43996c0)
  • sqlite: implement date_truncate (3ce4f2a)
  • sqlite: implement ISO week of year (714ff7b)
  • sqlite: implement string join and concat (6f5f353)
  • support of arrays and tuples for clickhouse (db512a8)
  • ver: dynamic version identifiers (408f862)

Bug Fixes

  • added wheel to pyproject.toml for venv users (b0b8e5c)
  • allow major version changes in CalVer dependencies (9c3fbe5)
  • annotable: allow optional arguments at any position (778995f), closes #3730
  • api: add ibis.map and .struct (327b342), closes #3118
  • api: map string multiplication with integer to repeat method (b205922)
  • api: thread suffixes parameter to individual join methods (31a9aff)
  • change TimestampType to Timestamp (e0750be)
  • clickhouse: disconnect from clickhouse when computing version (11cbf08)
  • clickhouse: use a context manager for execution (a471225)
  • combine windows during windowization (7fdd851)
  • conform epoch_seconds impls to expression return type (18a70f1)
  • context-adjustment: pass scope when calling adjust_context in pyspark backend (33aad7b), closes #3108
  • dask: fix asof joins for newer version of dask (50711cc)
  • dask: workaround dask bug (a0f3bd9)
  • deps: update dependency atpublic to v3 (3fe8f0d)
  • deps: update dependency datafusion to >=0.4,<0.6 (3fb2194)
  • deps: update dependency geoalchemy2 to >=0.6.3,<0.12 (dc3c361)
  • deps: update dependency graphviz to >=0.16,<0.21 (3014445)
  • duckdb: add casts to literals to fix binding errors (1977a55), closes #3629
  • duckdb: fix array column type discovery on leaf tables and add tests (15e5412)
  • duckdb: fix log with base b impl (4920097)
  • duckdb: support both 0.3.2 and 0.3.3 (a73ccce)
  • enforce the schema's column names in apply_to (b0f334d)
  • expose ops.IfNull for mysql backend (156c2bd)
  • expr: add more binary operators to char list and implement fallback (b88184c)
  • expr: fix formatting of table info using tabulate (b110636)
  • fix float vs real data type detection in sqlalchemy (24e6774)
  • fix list_schemas argument (69c1abf)
  • fix postgres udfs and re-enable ci tests (7d480d2)
  • fix tablecolumn execution for filter following join (064595b)
  • format: remove some newlines from formatted expr repr (ed4fa78)
  • histogram: cross_join needs onclause=True (5d36a58), closes #622
  • ibis.expr.signature.Parameter is not pickleable (828fd54)
  • implement coalesce properly in the pandas backend (aca5312)
  • implement count on tables for pyspark (7fe5573), closes #2879
  • infer coalesce types when a non-null expression occurs after the first argument (c5f2906)
  • mutate: do not lift table column that results from mutate (ba4e5e5)
  • pandas: disable range windows with order by (e016664)
  • pandas: don't reassign the same column to silence SettingWithCopyWarning (75dc616)
  • pandas: implement percent_rank correctly (d8b83e7)
  • prevent unintentional cross joins in mutate + filter (83eef99)
  • pyspark: fix range windows (a6f2aa8)
  • regression in Selection.sort_by with resolved_keys (c7a69cd)
  • regression in sort_by with resolved_keys (63f1382), closes #3619
  • remove broken csv pre_execute (93b662a)
  • remove importorskip call for backend tests (2f0bcd8)
  • remove incorrect fix for pandas regression (339f544)
  • remove passing schema into register_parquet (bdcbb08)
  • repr: add ops.TimeAdd to repr binop lookup table (fd94275)
  • repr: allow ops.TableNode in fmt_value (6f57003)
  • reverse the predicate pushdown substitution (f3cd358)
  • sort_index to satisfy pandas 1.4.x (6bac0fc)
  • sqlalchemy: ensure correlated subqueries FROM clauses are rendered (3175321)
  • sqlalchemy: use corresponding_column to prevent spurious cross joins (fdada21)
  • sqlalchemy: use replace selectables to prevent semi/anti join cross join (e8a1a71)
  • sql: retain column names for named ColumnExprs (f1b4b6e), closes #3754
  • sql: walk right join trees and substitute joins with right-side joins with views (0231592)
  • store schema on the pandas backend to allow correct inference (35070be)

Performance Improvements

  • datatypes: speed up str and hash (262d3d7)
  • fast path for simple column selection (d178498)
  • ir: global equality cache (13c2bb2)
  • ir: introduce CachedEqMixin to speed up equality checks (b633925)
  • repr: remove full tree repr from rule validator error message (65885ab)
  • speed up attribute access (89d1c05)
  • use assign instead of concat in projections when possible (985c242)

Miscellaneous Chores

  • deps: increase sqlalchemy lower bound to 1.4 (560854a)
  • drop support for Python 3.7 (0afd138)

Code Refactoring

  • api: make primitive types more cohesive (71da8f7)
  • api: remove distinct ColumnExpr API (3f48cb8)
  • api: remove materialize (24285c1)
  • backends: remove the hdf5 backend (ff34f3e)
  • backends: remove the parquet backend (b510473)
  • config: disable graphviz-repr-in-notebook by default (214ad4e)
  • config: remove old config code and port to pydantic (4bb96d1)
  • dt.UUID inherits from DataType, not String (2ba540d)
  • expr: preserve column ordering in aggregations/mutations (668be0f)
  • hdfs: replace HDFS with fsspec (cc6eddb)
  • ir: make Annotable immutable (1f2b3fa)
  • ir: make schema annotable (b980903)
  • ir: remove unused lineage roots and find_nodes functions (d630a77)
  • ir: simplify expressions by not storing dtype and name (e929f85)
  • kudu: remove support for use of kudu through kudu-python (36bd97f)
  • move coercion functions from schema.py to udf (58eea56), closes #3033
  • remove blanket call for Expr (3a71116), closes #2258
  • remove the csv backend (0e3e02e)
  • udf: make coerce functions in ibis.udf.vectorized private (9ba4392)

2.1.1 (2022-01-12)

Bug Fixes

  • setup.py: set the correct version number for 2.1.0 (f3d267b)

2.1.0 (2022-01-12)

Bug Fixes

  • consider all packages' entry points (b495cf6)
  • datatypes: infer bytes literal as binary #2915 (#3124) (887efbd)
  • deps: bump minimum dask version to 2021.10.0 (e6b5c09)
  • deps: constrain numpy to ensure wheels are used on windows (70c308b)
  • deps: update dependency clickhouse-driver to ^0.1 || ^0.2.0 (#3061) (a839d54)
  • deps: update dependency geoalchemy2 to >=0.6,<0.11 (4cede9d)
  • deps: update dependency pyarrow to v6 (#3092) (61e52b5)
  • don't force backends to override do_connect until 3.0.0 (4b46973)
  • execute materialized joins in the pandas and dask backends (#3086) (9ed937a)
  • literal: allow creating ibis literal with uuid (#3131) (b0f4f44)
  • restore the ability to have more than two option levels (#3151) (fb4a944)
  • sqlalchemy: fix correlated subquery compilation (43b9010)
  • sqlite: defer db connection until needed (#3127) (5467afa), closes #64

Features

  • allow column_of to take a column expression (dbc34bb)
  • ci: More readable workflow job titles (#3111) (d8fd7d9)
  • datafusion: initial implementation for Arrow Datafusion backend (3a67840), closes #2627
  • datafusion: initial implementation for Arrow Datafusion backend (75876d9), closes #2627
  • make dayofweek impls conform to pandas semantics (#3161) (9297828)

Reverts

  • "ci: install gdal for fiona" (8503361)

2.0.0 (2021-10-06)

Features

  • Serialization-deserialization of Node via pickle is now byte compatible between different processes (#2938)
  • Support joining on different columns in ClickHouse backend (#2916)
  • Support summarization of empty data in pandas backend (#2908)
  • Unify implementation of fillna and isna in Pyspark backend (#2882)
  • Support binary operation with Timedelta in Pyspark backend (#2873)
  • Add group_concat operation for Clickhouse backend (#2839)
  • Support comparison of ColumnExpr to timestamp literal (#2808)
  • Make op schema a cached property (#2805)
  • Implement .insert() for SQLAlchemy backends (#2613)
  • Infer categorical and decimal Series to more specific Ibis types in pandas backend (#2792)
  • Add startswith and endswith operations (#2790)
  • Allow more flexible return type for UDFs (#2776, #2797)
  • Implement Clip in the Pyspark backend (#2779)
  • Use ndarray as array representation in pandas backend (#2753)
  • Support Spark filter with window operation (#2687)
  • Support context adjustment for udfs for pandas backend (#2646)
  • Add auth_local_webserver, auth_external_data, and auth_cache parameters to the BigQuery connect method. Set auth_local_webserver to use a local server instead of copy-pasting an authorization code. Set auth_external_data to true to request the additional scopes required to query Google Drive and Sheets. Set auth_cache to reauth or none to force reauthentication (see the sketch after this list). (#2655)
  • Add bit_and, bit_or, and bit_xor integer column aggregates (BigQuery and MySQL backends) (#2641)
  • Backends are defined as entry points (#2379)
  • Add ibis.array for creating array expressions (#2615)
  • Implement Not operation in PySpark backend (#2607)
  • Added support for case/when in PySpark backend (#2610)
  • Add support for np.array as literals for backends that already support lists as literals (#2603)
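
A minimal sketch of the new BigQuery auth options described above; the project and dataset names are hypothetical, and the exact connect signature may vary by release:

import ibis

# Hypothetical project/dataset names. auth_local_webserver avoids copy-pasting
# an authorization code, auth_external_data requests the Drive/Sheets scopes,
# and auth_cache="reauth" forces reauthentication.
con = ibis.bigquery.connect(
    project_id="my-project",
    dataset_id="my_dataset",
    auth_local_webserver=True,
    auth_external_data=True,
    auth_cache="reauth",
)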

Bugs

  • Fix data races in impala connection pool accounting (#2991)
  • Fix null literal compilation in the Clickhouse backend (#2985)
  • Fix order of limit and offset parameters in the Clickhouse backend (#2984)
  • Replace equals operation for geospatial datatype to geo_equals (#2956)
  • Fix .drop(fields). The argument can now be either a list of strings or a string. (#2829)
  • Fix projection on differences and intersections for SQL backends (#2845)
  • Backends are loaded in a lazy way, so third-party backends can import Ibis without circular imports (#2827)
  • Disable aggregation optimization due to N squared performance (#2830)
  • Fix .cast() to array outputting list instead of np.array in pandas backend (#2821)
  • Fix aggregation with mixed reduction datatypes (array + scalar) on Dask backend (#2820)
  • Fix error when using reduction UDF that returns np.array in a grouped aggregation (#2770)
  • Fix time context trimming error for multi column udfs in pandas backend (#2712)
  • Fix error during compilation of range_window in base_sql backends (#2608) (#2710)
  • Fix wrong row indexing in the result for 'window after filter' for timecontext adjustment (#2696)
  • Fix aggregate exploding the output of Reduction ops that return a list/ndarray (#2702)
  • Fix issues with context adjustment for filter with PySpark backend (#2693)
  • Add temporary struct col in pyspark backend to ensure that UDFs are executed only once (#2657)
  • Fix BigQuery connect bug that ignored project ID parameter (#2588)
  • Fix overwrite logic to account for DestructColumn inside mutate API (#2636)
  • Fix fusion optimization bug that incorrectly changes operation order (#2635)
  • Fixes an NPE issue with substr in PySpark backend (#2610)
  • Fixes binary data type translation into BigQuery bytes data type (#2354)
  • Make StructValue picklable (#2577)

Support

  • Improvement of the backend API. The former Client subclasses have been replaced by a Backend class that must subclass ibis.backends.base.BaseBackend. The BaseBackend class contains abstract methods for the minimum subset of methods that backends must implement, and their signatures have been standardized across backends. The Ibis compiler has been refactored, and backends don't need to implement all compiler classes anymore if the default works for them. Only a subclass of ibis.backends.base.sql.compiler.Compiler is now required. Backends now need to register themselves as entry points (see the sketch after this list). (#2678)
  • Deprecate exists_table(table) in favor of table in list_tables() (#2905)
  • Remove handwritten type parser; parsing errors that were previously IbisTypeError are now parsy.ParseError. parsy is now a hard requirement. (#2977)
  • Methods current_database and list_databases raise an exception for backends that do not support databases (#2962)
  • Method set_database has been deprecated, in favor of creating a new connection to a different database (#2913)
  • Removed log method of clients, in favor of verbose_log option (#2914)
  • Output of Client.version returned as a string, instead of a setuptools Version (#2883)
  • Deprecated list_schemas in SQLAlchemy backends in favor of list_databases (#2862)
  • Deprecated ibis.<backend>.verify() in favor of capturing exception in ibis.<backend>.compile() (#2865)
  • Simplification of data fetching. Backends don't need to implement Query anymore (#2789)
  • Move BigQuery backend to a separate repository <https://github.com/ibis-project/ibis-bigquery>_. The backend will be released separately, use pip install ibis-bigquery or conda install ibis-bigquery to install it, and then use as before. (#2665)
  • Supporting SQLAlchemy 1.4, and requiring minimum 1.3 (#2689)
  • Namespace time_col config, fix type check for trim_with_timecontext for pandas window execution (#2680)
  • Remove deprecated ibis.HDFS, ibis.WebHDFS and ibis.hdfs_connect (#2505)
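
For third-party backend authors, entry-point registration might look roughly like the following setup.py fragment; the package, module, and class names are hypothetical, and the "ibis.backends" group name is an assumption based on this release's entry-point mechanism:

from setuptools import setup

setup(
    name="ibis-mybackend",  # hypothetical third-party package
    entry_points={
        "ibis.backends": [
            # hypothetical module path; the class should subclass
            # ibis.backends.base.BaseBackend
            "mybackend = ibis_mybackend:Backend",
        ],
    },
)

With a connection in hand, the deprecated exists_table("t") check becomes "t" in con.list_tables().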

1.4.0 (2020-11-07)

Features

  • Add Struct.from_dict (#2514)
  • Add hash and hashbytes support for BigQuery backend (#2310)
  • Support reduction UDF without groupby to return multiple columns for pandas backend (#2511)
  • Support analytic and reduction UDF to return multiple columns for pandas backend (#2487)
  • Support elementwise UDF to return multiple columns for pandas and PySpark backend (#2473)
  • Support Ibis interval for window in pyspark backend (#2409)
  • Use Scope class for scope in pyspark backend (#2402)
  • Add PySpark support for ReductionVectorizedUDF (#2366)
  • Add time context in scope in execution for pandas backend (#2306)
  • Add start_point and end_point to PostGIS backend. (#2081)
  • Add set difference to general ibis api (#2347)
  • Add rowid expression, supported by SQLite and OmniSciDB (#2251)
  • Add intersection to general ibis api (#2230)
  • Add application_name argument to ibis.bigquery.connect to allow attributing Google API requests to projects that use Ibis. (#2303)
  • Add support for casting category dtype in pandas backend (#2285)
  • Add support for Union in the PySpark backend (#2270)
  • Add support for implementing a custom window object for pandas backend (#2260)
  • Implement two level dispatcher for execute_node (#2246)
  • Add ibis.pandas.trace module to log time and call stack information. (#2233)
  • Validate that the output type of a UDF is a single element (#2198)
  • ZeroIfNull and NullIfZero implementation for OmniSciDB (#2186)
  • IsNan implementation for OmniSciDB (#2093)
  • [OmniSciDB] Support add_columns and drop_columns for OmniSciDB tables (#2094)
  • Create ExtractQuarter operation and add its support to Clickhouse, CSV, Impala, MySQL, OmniSciDB, pandas, Parquet, PostgreSQL, PySpark, SQLite and Spark (#2175)
  • Add translation rules for isnull() and notnull() for pyspark backend (#2126)
  • Add window operations support to SQLite (#2232)
  • Implement read_csv for omniscidb backend (#2062)
  • [OmniSciDB] Add support to week extraction (#2171)
  • Date, DateDiff and TimestampDiff implementations for OmniSciDB (#2097)
  • Create ExtractWeekOfYear operation and add its support to Clickhouse, CSV, MySQL, pandas, Parquet, PostgreSQL, PySpark and Spark (#2177)
  • Add initial support for ibis.random function (#2060)
  • Added epoch_seconds extraction operation to Clickhouse, CSV, Impala, MySQL, OmniSciDB, pandas, Parquet, PostgreSQL, PySpark, SQLite, Spark and BigQuery (#2273) (#2178)
  • [OmniSciDB] Add "method" parameter to load_data (#2165)
  • Add non-nullable info to schema output (#2117)
  • fillna and nullif implementations for OmniSciDB (#2083)
  • Add load_data to sqlalchemy's backends and fix database parameter for load/create/drop when the database parameter is the same as the current database (#1981)
  • [OmniSciDB] Add support for within, d_fully_within and point (#2125)
  • OmniSciDB - Refactor DDL and Client; Add temporary parameter to create_table and "force" parameter to drop_view (#2086)
  • Create ExtractDayOfYear operation and add its support to Clickhouse, CSV, MySQL, OmniSciDB, pandas, Parquet, PostgreSQL, PySpark, SQLite and Spark (#2173)
  • Implementations of Log Log2 Log10 for OmniSciDB backend (#2095)

Bugs

  • Table expressions do not recognize inet datatype (Postgres backend) (#2462)
  • Table expressions do not recognize macaddr datatype (Postgres backend) (#2461)
  • Fix aggcontext.Summarize not always producing scalar (pandas backend) (#2410)
  • Fix same window op with different window size on a table leading to incorrect results for pyspark backend (#2414)
  • Fix same column with multiple aliases not showing properly in repr (#2229)
  • Fix reduction UDFs over ungrouped, bounded windows on pandas backend (#2395)
  • Support rolling window UDF with non-numeric inputs for pandas backend (#2386)
  • Fix scope get to use hashmap lookup instead of list lookup (#2386)
  • Fix equality behavior for Literal ops (#2387)
  • Fix analytic ops over ungrouped and unordered windows on pandas backend (#2376)
  • Fix the covariance operator in the BigQuery backend. (#2367)
  • Update impala kerberos dependencies (#2342)
  • Added verbose logging to SQL backends (#1320)
  • Fix issue with sql_validate call to OmniSciDB (#2256)
  • Add missing float types to pandas backend (#2237)
  • Allow group_by and order_by as window operation input in pandas backend (#2252)
  • Fix PySpark compiler error when elementwise UDF output_type is Decimal or Timestamp (#2223)
  • Fix interactive mode returning an expression instead of the value when used in Jupyter (#2157)
  • Fix PySpark error when doing alias after selection (#2127)
  • Fix millisecond issue for OmniSciDB (#2167), MySQL (#2169), PostgreSQL (#2166), pandas (#2168), and BigQuery (#2273) backends (#2170)
  • [OmniSciDB] Fix TopK when used as filter (#2134)

Support

  • Move ibis.HDFS, ibis.WebHDFS and ibis.hdfs_connect to ibis.impala.* (#2497)
  • Drop support for Python 3.6 (#2288)
  • Simplifying tests directories structure (#2351)
  • Update google-cloud-bigquery dependency minimum version to 1.12.0 (#2304)
  • Remove "experimental" mentions for OmniSciDB and pandas backends (#2234)
  • Use an OmniSciDB image stable on CI (#2244)
  • Added fragment_size to table creation for OmniSciDB (#2107)
  • Added round() support for OmniSciDB (#2096)
  • Enabled cumulative ops support for OmniSciDB (#2113)

1.3.0 (2020-02-27)

Features

  • Improve many arguments UDF performance in pandas backend. (#2071)
  • Add DenseRank, RowNumber, MinRank, Count, PercentRank/CumeDist window operations to OmniSciDB (#1976)
  • Introduce a top level vectorized UDF module (experimental). Implement element-wise UDF for pandas and PySpark backend. (#2047)
  • Add support for multi arguments window UDAF for the pandas backend (#2035)
  • Clean up window translation logic in pyspark backend (#2004)
  • Add docstring check to CI for an initial subset files (#1996)
  • Pyspark backend bounded windows (#2001)
  • Add more POSTGIS operations (#1987)
  • SQLAlchemy Default precision and scale to decimal types for PostgreSQL and MySQL (#1969)
  • Add support for array operations in PySpark backend (#1983)
  • Implement sort, if_null, null_if and notin for PySpark backend (#1978)
  • Add support for date/time operations in PySpark backend (#1974)
  • Add support for params, query_schema, and sql in PySpark backend (#1973)
  • Implement join for PySpark backend (#1967)
  • Validate AsOfJoin tolerance and attempt interval unit conversion (#1952)
  • filter for PySpark backend (#1943)
  • window operations for pyspark backend (#1945)
  • Implement IntervalSub for pandas backend (#1951)
  • PySpark backend string and column ops (#1942)
  • PySpark backend (#1913)
  • DDL support for Spark backend (#1908)
  • Support timezone aware arrow timestamps (#1923)
  • Add shapely geometries as input for literals (#1860)
  • Add geopandas as output for omniscidb (#1858)
  • Spark UDFs (#1885)
  • Add support for Postgres UDFs (#1871)
  • Spark tests (#1830)
  • Spark client (#1807)
  • Use pandas rolling apply to implement rows_with_max_lookback (#1868)

Bugs

  • Pin "clickhouse-driver" to ">=0.1.3" (#2089)
  • Fix load data stage for Linux CI (#2069)
  • Fix datamgr.py fail if IBIS_TEST_OMNISCIDB_DATABASE=omnisci (#2057)
  • Change pymapd connection parameter from "session_id" to "sessionid" (#2041)
  • Fix pandas backend to treat trailing_window preceding arg as window bound rather than window size (e.g. preceding=0 now indicates current row rather than window size 0) (#2009)
  • Fix handling of Array types in Postgres UDF (#2015)
  • Fix pydocstyle config (#2010)
  • Pinning clickhouse-driver<0.1.2 (#2006)
  • Fix CI log for database (#1984)
  • Fixes explain operation (#1933)
  • Fix incorrect assumptions about attached SQLite databases (#1937)
  • Upgrade to JDK11 (#1938)
  • sql method doesn't work when the query uses a LIMIT clause (#1903)
  • Fix union implementation (#1910)
  • Fix failing com imports on master (#1912)
  • OmniSci/MapD - Fix reduction for bool (#1901)
  • Pass scope to grouping execution in the pandas backend (#1899)
  • Fix various Spark backend issues (#1888)
  • Make Nodes enforce the proper signature (#1891)
  • Work around a bug in pd.to_datetime when passing the unit flag (#1893)
  • Fix small formatting buglet in PR merge tool (#1883)
  • Fix the case where we do not have an index when using preceding with intervals (#1876)
  • Fixed issues with geo data (#1872)
  • Remove -x from pytest call in linux CI (#1869)
  • Fix return type of Struct.from_tuples (#1867)

Support

  • Add support for Python 3.8 (#2066)
  • Pin back version of isort (#2079)
  • Use user-defined port variables for Omnisci and PostgreSQL tests (#2082)
  • Change omniscidb image tag from v5.0.0 to v5.1.0 on docker-compose recipe (#2077)
  • [OmniSci] Use the same SRIDs for test_geo_spatial_binops (#2051)
  • Unpin rtree version (#2078)
  • Link pandas issues with xfail tests in pandas/tests/test_udf.py (#2074)
  • Disable Postgres tests on Windows CI. (#2075)
  • Use conda for installing the black and isort tools (#2068)
  • CI: Fix CI builds related to new pandas 1.0 compatibility (#2061)
  • Fix data map for int8 on OmniSciDB backend (#2056)
  • Add possibility to run tests for separate backend via make test BACKENDS=[YOUR BACKEND] (#2052)
  • Fix "cudf" import on OmniSciDB backend (#2055)
  • CI: Drop table only if it exists (OmniSciDB) (#2050)
  • Add initial documentation for OmniSciDB, MySQL, PySpark and SparkSQL backends, add initial documentation for geospatial methods and add links to Ibis wiki page (#2034)
  • Implement covariance for bigquery backend (#2044)
  • Add Spark to supported backends list (#2046)
  • Pin dependency of rtree to fix CI failure (#2043)
  • Drop support for Python 3.5 (#2037)
  • HTML escape column names and types in png repr. (#2023)
  • Add geospatial tutorial notebook (#1991)
  • Change omniscidb image tag from v4.7.0 to v5.0.0 on docker-compose recipe (#2031)
  • Pin "semantic_version" to "<2.7" in the docs build CI, fix the "builddoc" and "doc" sections inside "Makefile", and skip mysql tzinfo on CI to allow running MySQL in a docker container on a hard disk drive. (#2030)
  • Fixed impala start up issues (#2012)
  • cache all ops in translate() (#1999)
  • Add black step to CI (#1988)
  • Json UUID any (#1962)
  • Add log for database services (#1982)
  • Fix BigQuery backend fixture so batting and awards_players fixture re… (#1972)
  • Disable BigQuery explicitly in all/test_join.py (#1971)
  • Re-formatting all files using pre-commit hook (#1963)
  • Disable codecov report upload during CI builds (#1961)
  • Developer doc enhancements (#1960)
  • Missing geospatial ops for OmniSciDB (#1958)
  • Remove pandas deprecation warnings (#1950)
  • Add developer docs to get docker setup (#1948)
  • More informative IntegrityError on duplicate columns (#1949)
  • Improve geospatial literals and smoke tests (#1928)
  • PostGIS enhancements (#1925)
  • Rename mapd to omniscidb backend (#1866)
  • Fix failing BigQuery tests (#1926)
  • Added missing null literal op (#1917)
  • Update link to Presto website (#1895)
  • Removing linting from windows (#1896)
  • Fix link to NUMFOCUS CoC (#1884)
  • Added CoC section (#1882)
  • Remove pandas exception for rows_with_max_lookback (#1859)
  • Move CI pipelines to Azure (#1856)

1.2.0 (2019-06-24)

Features

  • Add new geospatial functions to OmniSciDB backend (#1836)
  • allow pandas timedelta in rows_with_max_lookback (#1838)
  • Accept rows-with-max-lookback as preceding parameter (#1825)
  • PostGIS support (#1787)

Bugs

  • Fix call to psql causing failing CI (#1855)
  • Fix nested array literal repr (#1851)
  • Fix repr of empty schema (#1850)
  • Add max_lookback to window replace and combine functions (#1843)
  • Partially revert #1758 (#1837)

Support

  • Skip SQLAlchemy backend tests in connect method in backends.py (#1847)
  • Validate order_by when using rows_with_max_lookback window (#1848)
  • Generate release notes from commits (#1845)
  • Raise exception on backends where rows_with_max_lookback can't be implemented (#1844)
  • Tighter version spec for pytest (#1840)
  • Allow passing a branch to ci/feedstock.py (#1826)

1.1.0 (2019-06-09)

Features

  • Consolidate trailing window functions (#1809)
  • Call to_interval when casting integers to intervals (#1766)
  • Add session feature to mapd client API (#1796)
  • Add min periods parameter to Window (#1792)
  • Allow strings for types in pandas UDFs (#1785)
  • Add missing date operations and struct field operation for the pandas backend (#1790)
  • Add window operations to the OmniSci backend (#1771)
  • Reimplement the pandas backend using topological sort (#1758)
  • Add marker for xfailing specific backends (#1778)
  • Enable window function tests where possible (#1777)
  • is_computable_arg dispatcher (#1743)
  • Added float32 and geospatial types for create table from schema (#1753)

Bugs

  • Fix group_concat test and implementations (#1819)
  • Fix failing strftime tests on Python 3.7 (#1818)
  • Remove unnecessary (and erroneous in some cases) frame clauses (#1757)
  • Chained mutate operations are buggy (#1799)
  • Allow projections from joins to attempt fusion (#1783)
  • Fix Python 3.5 dependency versions (#1798)
  • Fix compatibility and bugs associated with pandas toposort reimplementation (#1789)
  • Fix outer_join generating LEFT join instead of FULL OUTER (#1772)
  • NullIf should enforce that its arguments are castable to a common type (#1782)
  • Fix conda create command in documentation (#1775)
  • Fix preceding and following with None (#1765)
  • PostgreSQL interval type not recognized (#1661)

Support

  • Remove decorator hacks and add custom markers (#1820)
  • Add development deps to setup.py (#1814)
  • Fix design and developer docs (#1805)
  • Pin sphinx version to 2.0.1 (#1810)
  • Add pep8speaks integration (#1793)
  • Fix typo in UDF signature specification (#1821)
  • Clean up most xpassing tests (#1779)
  • Update omnisci container version (#1781)
  • Constrain PyMapD version to get passing builds (#1776)
  • Remove warnings and clean up some docstrings (#1763)
  • Add StringToTimestamp as unsupported (#1638)
  • Add isort pre-commit hooks (#1759)
  • Add Python 3.5 testing back to CI (#1750)
  • Re-enable CI for building step (#1700)
  • Update README reference to MapD to say OmniSci (#1749)

1.0.0 (2019-03-26)

Features

  • Add black as a pre-commit hook (#1735)
  • Add support for the arbitrary aggregate in the mapd backend (#1680)
  • Add SQL method for the MapD backend (#1731)
  • Clean up merge PR script and use the actual merge feature of GitHub (#1744)
  • Add cross join to the pandas backend (#1723)
  • Implement default handler for multiple client pre_execute (#1727)
  • Implement BigQuery auth using pydata_google_auth (#1728)
  • Timestamp literal accepts a timezone parameter (#1712)
  • Remove support for passing integers to ibis.timestamp (#1725)
  • Add find_nodes to lineage (#1704)
  • Remove a bunch of deprecated APIs and clean up warnings (#1714)
  • Implement table distinct for the pandas backend (#1716)
  • Implement geospatial functions for MapD (#1678)
  • Implement geospatial types for MapD (#1666)
  • Add pre commit hook (#1685)
  • Getting started with mapd, mysql and pandas (#1686)
  • Support column names with special characters in mapd (#1675)
  • Allow operations to hide arguments from display (#1669)
  • Remove implicit ordering requirements in the PostgreSQL backend (#1636)
  • Add cross join operator to MapD (#1655)
  • Fix UDF bugs and add support for non-aggregate analytic functions (#1637)
  • Support string slicing with other expressions (#1627)
  • Publish the ibis roadmap (#1618)
  • Implement approx_median in BigQuery (#1604)
  • Make ibis node instances hashable (#1611)
  • Add range_window and trailing_range_window to docs (#1608)

Bugs

  • Make dev/merge-pr.py script handle PR branches (#1745)
  • Fix NULLIF implementation for the pandas backend (#1742)
  • Fix casting to float in the MapD backend (#1737)
  • Fix testing for BigQuery after auth flow update (#1741)
  • Fix skipping for new BigQuery auth flow (#1738)
  • Fix bug in TableExpr.drop (#1732)
  • Filter the raw warning from newer pandas to support older pandas (#1729)
  • Fix BigQuery credentials link (#1706)
  • Add Union as an unsupported operation for MapD (#1639)
  • Fix visualizing an ibis expression when showing a selection after a table join (#1705)
  • Fix MapD exception for toDateTime (#1659)
  • Use == to compare strings (#1701)
  • Resolves joining with different column names (#1647)
  • Fix map get with compatible types (#1643)
  • Fixed where operator for MapD (#1653)
  • Remove parameters from mapd (#1648)
  • Make sure we cast when NULL is else in CASE expressions (#1651)
  • Fix equality (#1600)

Support

  • Do not build universal wheels (#1748)
  • Remove tag prefix from versioneer (#1747)
  • Use releases to manage documentation (#1746)
  • Use cudf instead of pygdf (#1694)
  • Fix multiple CI issues (#1696)
  • Update mapd ci to v4.4.1 (#1681)
  • Enabled mysql CI on azure pipelines (#1672)
  • Remove support for Python 2 (#1670)
  • Fix flake8 and many other warnings (#1667)
  • Update README.md for impala and kudu (#1664)
  • Remove defaults as a channel from azure pipelines (#1660)
  • Fixes a typo in the pandas/core.py docstring (#1658)
  • Unpin clickhouse-driver version (#1657)
  • Add test for reduction returning lists (#1650)
  • Fix Azure VM image name (#1646)
  • Updated MapD server-CI (#1641)
  • Add TableExpr.drop to API documentation (#1645)
  • Fix Azure deployment step (#1642)
  • Set up CI with Azure Pipelines (#1640)
  • Fix conda builds (#1609)

v0.14.0 (2018-08-23)

This release brings refactored, more composable core components and a rewritten rule system to ibis. We also focused quite heavily on the BigQuery backend this release.

New Features

  • Allow keyword arguments in Node subclasses (#968)
  • Splat args into Node subclasses instead of requiring a list (#969)
  • Add support for UNION in the BigQuery backend (#1408, #1409)
  • Support for writing UDFs in BigQuery (#1377). See the BigQuery UDF docs for more details.
  • Support for cross-project expressions in the BigQuery backend. (#1427, #1428)
  • Add strftime and to_timestamp support for BigQuery (#1422, #1410)
  • Require google-cloud-bigquery >=1.0 (#1424)
  • Limited support for interval arithmetic in the pandas backend (#1407)
  • Support for subclassing TableExpr (#1439)
  • Fill out pandas backend operations (#1423)
  • Add common DDL APIs to the pandas backend (#1464)
  • Implement the sql method for BigQuery (#1463)
  • Add to_timestamp for BigQuery (#1455)
  • Add the mapd backend (#1419)
  • Implement range windows (#1349)
  • Support for map types in the pandas backend (#1498)
  • Add mean and sum for boolean types in BigQuery (#1516)
  • All recent versions of SQLAlchemy are now supported (#1384)
  • Add support for NUMERIC types in the BigQuery backend (#1534)
  • Speed up grouped and rolling operations in the pandas backend (#1549)
  • Implement TimestampNow for BigQuery and pandas (#1575)

Bug Fixes

  • Nullable property is now propagated through value types (#1289)
  • Implicit casting between signed and unsigned integers checks boundaries
  • Fix precedence of case statement (#1412)
  • Fix handling of large timestamps (#1440)
  • Fix identical_to precedence (#1458)
  • pandas 0.23 compatibility (#1458)
  • Preserve timezones in timestamp-typed literals (#1459)
  • Fix incorrect topological ordering of UNION expressions (#1501)
  • Fix projection fusion bug when attempting to fuse columns of the same name (#1496)
  • Fix output type for some decimal operations (#1541)

API Changes

  • The previous, private rules API has been rewritten (#1366)
  • Defining input arguments for operations happens in a more readable fashion instead of the previous input_type list.
  • Removed support for async query execution (only Impala supported it)
  • Remove support for Python 3.4 (#1326)
  • BigQuery division defaults to using IEEE_DIVIDE (#1390)
  • Add tolerance parameter to asof_join (#1443)

v0.13.0 (2018-03-30)

This release brings new backends, including support for executing against files and MySQL, plus pandas user-defined scalar and aggregation functions, along with a number of bug fixes and reliability enhancements. We recommend that all users upgrade from earlier versions of Ibis.

New Backends

  • File Support for CSV & HDF5 (#1165, #1194)
  • File Support for Parquet Format (#1175, #1194)
  • Experimental support for MySQL thanks to @kszucs (#1224)

New Features

  • Support for Unsigned Integer Types (#1194)
  • Support for Interval types and expressions with support for execution on the Impala and Clickhouse backends (#1243)
  • Isnan, isinf operations for float and double values (#1261)
  • Support for an interval with a quarter period (#1259)
  • ibis.pandas.from_dataframe convenience function (#1155)
  • Remove the restriction on ROW_NUMBER() requiring it to have an ORDER BY clause (#1371)
  • Add .get() operation on a Map type (#1376)
  • Allow visualization of custom defined expressions
  • Add experimental support for pandas UDFs/UDAFs (#1277)
  • Functions can be used as groupby keys (#1214, #1215)
  • Generalize the use of the where parameter to reduction operations (see the sketch after this list) (#1220)
  • Support for interval operations thanks to @kszucs (#1243, #1260, #1249)
  • Support for the PARTITIONTIME column in the BigQuery backend (#1322)
  • Add arbitrary() method for selecting the first non null value in a column (#1230, #1309)
  • Windowed MultiQuantile operation in the pandas backend thanks to @DiegoAlbertoTorres (#1343)
  • Rules for validating table expressions thanks to @DiegoAlbertoTorres (#1298)
  • Complete end-to-end testing framework for all supported backends (#1256)
  • contains/not contains now supported in the pandas backend (#1210, #1211)
  • CI builds are now reproducible locally thanks to @kszucs (#1121, #1237, #1255, #1311)
  • isnan/isinf operations thanks to @kszucs (#1261)
  • Framework for generalized dtype and schema inference, and implicit casting thanks to @kszucs (#1221, #1269)
  • Generic utilities for expression traversal thanks to @kszucs (#1336)
  • day_of_week API (#306, #1047)
  • Design documentation for ibis (#1351)
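
As an illustration of the generalized where parameter on reductions (see the item above), a minimal sketch with a hypothetical schema:

import ibis

# Hypothetical table; where= restricts which rows feed the reduction.
t = ibis.table([('flag', 'string'), ('value', 'double')], 'tbl')
expr = t.value.sum(where=t.flag == '1')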

Bug Fixes

  • Unbound parameters were failing in the simple case of an ibis.expr.types.TableExpr.mutate call with no operation (#1378)
  • Fix parameterized subqueries (#1300, #1331, #1303, #1378)
  • Fix subquery extraction, which wasn't happening in topological order (#1342)
  • Fix parenthesization of isnull (#1307)
  • Calling drop after mutate did not work (#1296, #1299)
  • SQLAlchemy backends were missing an implementation of ibis.expr.operations.NotContains.
  • Support REGEX_EXTRACT in PostgreSQL 10 (#1276, #1278)

API Changes

  • Fixing #1378 required the removal of the name parameter to the ibis.param function. Use the ibis.expr.types.Expr.name method instead (see the sketch below).
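
A minimal sketch of the new naming pattern; the parameter type and name are hypothetical:

import ibis

# Previously something like ibis.param('double', name='cutoff');
# the name argument is gone, so call .name(...) on the parameter instead.
cutoff = ibis.param('double').name('cutoff')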

v0.12.0 (2017-10-28)

This release brings Clickhouse and BigQuery SQL support along with a number of bug fixes and reliability enhancements. We recommend that all users upgrade from earlier versions of Ibis.

New Backends

  • BigQuery backend (#1170), thanks to @tsdlovell.
  • Clickhouse backend (#1127), thanks to @kszucs.

New Features

  • Add support for Binary data type (#1183)
  • Allow users of the BigQuery client to define their own API proxy classes (#1188)
  • Add support for HAVING in the pandas backend (#1182)
  • Add struct field tab completion (#1178)
  • Add expressions for Map/Struct types and columns (#1166)
  • Support Table.asof_join (#1162)
  • Allow right side of arithmetic operations to take over (#1150)
  • Add a data_preload step in pandas backend (#1142)
  • expressions in join predicates in the pandas backend (#1138)
  • Scalar parameters (#1075)
  • Limited window function support for pandas (#1083)
  • Implement Time datatype (#1105)
  • Implement array ops for pandas (#1100)
  • support for passing multiple quantiles in .quantile() (#1094)
  • support for clip and quantile ops on DoubleColumns (#1090)
  • Enable unary math operations for pandas, sqlite (#1071)
  • Enable casting from strings to temporal types (#1076)
  • Allow selection of whole tables in pandas joins (#1072)
  • Implement comparison for string vs date and timestamp types (#1065)
  • Implement isnull and notnull for pandas (#1066)
  • Allow like operation to accept a list of conditions to match (#1061)
  • Add a pre_execute step in pandas backend (#1189)

Bug Fixes

  • Remove global expression caching to ensure repeatable code generation (#1179, #1181)
  • Fix ORDER BY generation without a GROUP BY (#1180, #1181)
  • Ensure that ibis.expr.datatypes.DataType and subclasses hash properly (#1172)
  • Ensure that the pandas backend can deal with unary operations in groupby (#1182)
  • Incorrect impala code generated for NOT with complex argument (#1176)
  • BUG/CLN: Fix predicates on Selections on Joins (#1149)
  • Don't use SET LOCAL to allow redshift to work (#1163)
  • Allow empty arrays as arguments (#1154)
  • Fix column renaming in groupby keys (#1151)
  • Ensure that we only cast if timezone is not None (#1147)
  • Fix location of conftest.py (#1107)
  • TST/Make sure we drop tables during postgres testing (#1101)
  • Fix misleading join error message (#1086)
  • BUG/TST: Make hdfs an optional dependency (#1082)
  • Memoization should include expression name where available (#1080)

Performance Enhancements

  • Speed up imports (#1074)
  • Fix execution perf of groupby and selection (#1073)
  • Use normalize for casting to dates in pandas (#1070)
  • Speed up pandas groupby (#1067)

Contributors

The following people contributed to the 0.12.0 release:

$ git shortlog -sn --no-merges v0.11.2..v0.12.0
63  Phillip Cloud
 8  Jeff Reback
 2  Krisztián Szűcs
 2  Tory Haavik
 1  Anirudh
 1  Szucs Krisztian
 1  dlovell
 1  kwangin

0.11.0 (2017-06-28)

This release brings initial pandas backend support along with a number of bug fixes and reliability enhancements. We recommend that all users upgrade from earlier versions of Ibis.

New Features

  • Experimental pandas backend to allow execution of ibis expressions against pandas DataFrames
  • Graphviz visualization of ibis expressions. Implements _repr_png_ for Jupyter Notebook functionality
  • Ability to create a partitioned table from an ibis expression
  • Support for missing operations in the SQLite backend: sqrt, power, variance, standard deviation, and regular expression functions; also add missing power support for PostgreSQL
  • Support for schemas inside databases with the PostgreSQL backend
  • Appveyor testing on core ibis across all supported Python versions
  • Add year/month/day methods to date types
  • Ability to sort, group by and project columns according to positional index rather than only by name
  • Added a type parameter to ibis.literal to allow user specification of literal types

Bug Fixes

  • Fix broken conda recipe
  • Fix incorrectly typed fillna operation
  • Fix postgres boolean summary operations
  • Fix kudu support to reflect client API changes
  • Fix equality of nested types and construction of nested types when the value type is specified as a string

API Changes

  • Deprecate passing integer values to the ibis.timestamp literal constructor, this will be removed in 0.12.0
  • Added the admin_timeout parameter to the kudu client connect function

Contributors

$ git shortlog --summary --numbered v0.10.0..v0.11.0

  58 Phillip Cloud
   1 Greg Rahn
   1 Marius van Niekerk
   1 Tarun Gogineni
   1 Wes McKinney

0.8 (2016-05-19)

This release brings initial PostgreSQL backend support along with a number of critical bug fixes and usability improvements. As several correctness bugs with the SQL compiler were fixed, we recommend that all users upgrade from earlier versions of Ibis.

New Features

  • Initial PostgreSQL backend contributed by Phillip Cloud.
  • Add groupby as an alias for group_by to table expressions

Bug Fixes

  • Fix an expression error when filtering based on a new field
  • Fix Impala's SQL compilation when using OR with compound filters
  • Various fixes with the having(...) function in grouped table expressions
  • Fix CTE (WITH) extraction inside UNION ALL expressions.
  • Fix ImportError on Python 2 when the mock library is not installed

API Changes

  • The deprecated ibis.impala_connect and ibis.make_client APIs have been removed

0.7 (2016-03-16)

This release brings initial Kudu-Impala integration and improved Impala and SQLite support, along with several critical bug fixes.

New Features

  • Apache Kudu (incubating) integration for Impala users. Will add some documentation here when possible.
  • Add use_https option to ibis.hdfs_connect for WebHDFS connections in secure (Kerberized) clusters without SSL enabled.
  • Correctly compile aggregate expressions involving multiple subqueries.

To explain this last point in more detail, suppose you had:

import ibis

table = ibis.table([('flag', 'string'),
                    ('value', 'double')],
                   'tbl')

flagged = table[table.flag == '1']
unflagged = table[table.flag == '0']

fv = flagged.value
uv = unflagged.value

expr = (fv.mean() / fv.sum()) - (uv.mean() / uv.sum())

The last expression now generates the correct Impala or SQLite SQL:

SELECT t0.`tmp` - t1.`tmp` AS `tmp`
FROM (
  SELECT avg(`value`) / sum(`value`) AS `tmp`
  FROM tbl
  WHERE `flag` = '1'
) t0
  CROSS JOIN (
    SELECT avg(`value`) / sum(`value`) AS `tmp`
    FROM tbl
    WHERE `flag` = '0'
  ) t1

Bug Fixes

  • CHAR(n) and VARCHAR(n) Impala types now correctly map to Ibis string expressions
  • Fix inappropriate projection-join-filter expression rewrites resulting in incorrect generated SQL.
  • ImpalaClient.create_table correctly passes STORED AS PARQUET for format='parquet'.
  • Fixed several issues with Ibis dependencies (impyla, thriftpy, sasl, thrift_sasl), especially for secure clusters. Upgrading will pull in these new dependencies.
  • Do not fail in ibis.impala.connect when trying to create the temporary Ibis database if no HDFS connection passed.
  • Fix join predicate evaluation bug when column names overlap with table attributes.
  • Fix handling of fully-materialized joins (aka select * joins) in SQLAlchemy / SQLite.

Contributors

Thank you to all who contributed patches to this release.

$ git log v0.6.0..v0.7.0 --pretty=format:%aN | sort | uniq -c | sort -rn
    21 Wes McKinney
     1 Uri Laserson
     1 Kristopher Overholt

0.6 (2015-12-01)

This release brings expanded pandas and Impala integration, including support for managing partitioned tables in Impala. See the new Ibis for Impala Users guide for more on using Ibis with Impala.

The Ibis for SQL Programmers guide was also written since the 0.5 release.

This release also includes bug fixes affecting generated SQL correctness. All users should upgrade as soon as possible.

New Features

  • New integrated Impala functionality. See Ibis for Impala Users for more details on these things.
    • Improved Impala-pandas integration. Create tables or insert into existing tables from pandas DataFrame objects.
    • Partitioned table metadata management API. Add, drop, alter, and insert into table partitions.
    • Add is_partitioned property to ImpalaTable.
    • Added support for LOAD DATA DDL using the load_data function, also supporting partitioned tables.
    • Modify table metadata (location, format, SerDe properties etc.) using ImpalaTable.alter
    • Interrupting Impala expression execution with Control-C will attempt to cancel the running query with the server.
    • Set the compression codec (e.g. snappy) used with ImpalaClient.set_compression_codec.
    • Get and set query options for a client session with ImpalaClient.get_options and ImpalaClient.set_options.
    • Add ImpalaTable.metadata method that parses the output of the DESCRIBE FORMATTED DDL to simplify table metadata inspection.
    • Add ImpalaTable.stats and ImpalaTable.column_stats to see computed table and partition statistics.
    • Add CHAR and VARCHAR handling
    • Add refresh, invalidate_metadata DDL options and add incremental option to compute_stats for COMPUTE INCREMENTAL STATS.
  • Add substitute method for performing multiple value substitutions in an array or scalar expression.
  • Division is by default true division like Python 3 for all numeric data. This means for SQL systems that use C-style division semantics, the appropriate CAST will be automatically inserted in the generated SQL (see the sketch after this list).
  • Easier joins on tables with overlapping column names. See Ibis for SQL Programmers.
  • Expressions like string_expr[:3] now work as expected.
  • Add coalesce instance method to all value expressions.
  • Passing limit=None to the execute method on expressions disables any default row limits.
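
A minimal sketch of the true-division behavior described above, with a hypothetical schema; for backends with C-style integer division the CAST mentioned above is added automatically during compilation:

import ibis

# Dividing two integer columns yields a floating-point expression,
# mirroring Python 3 semantics regardless of the backend's native division.
t = ibis.table([('numerator', 'int64'), ('denominator', 'int64')], 'tbl')
ratio = t.numerator / t.denominator

# Also new in this release: expr.execute(limit=None) disables the default row limit.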

API Changes

  • ImpalaTable.rename no longer mutates the calling table expression.

Contributors

$ git log v0.5.0..v0.6.0 --pretty=format:%aN | sort | uniq -c | sort -rn
46 Wes McKinney
 3 Uri Laserson
 1 Phillip Cloud
 1 mariusvniekerk
 1 Kristopher Overholt

0.5 (2015-09-10)

Highlights in this release are the SQLite, Python 3, Impala UDA support, and an asynchronous execution API. There are also many usability improvements, bug fixes, and other new features.

New Features

  • SQLite client and built-in function support
  • Ibis now supports Python 3.4 as well as 2.6 and 2.7
  • Ibis can utilize Impala user-defined aggregate (UDA) functions
  • SQLAlchemy-based translation toolchain to enable more SQL engines having SQLAlchemy dialects to be supported
  • Many window function usability improvements (nested analytic functions and deferred binding conveniences)
  • More convenient aggregation with keyword arguments in aggregate functions
  • Built preliminary wrapper API for MADLib-on-Impala
  • Add var and std aggregation methods and support in Impala
  • Add nullifzero numeric method for all SQL engines
  • Add rename method to Impala tables (for renaming tables in the Hive metastore)
  • Add close method to ImpalaClient for session cleanup (#533)
  • Add relabel method to table expressions
  • Add insert method to Impala tables
  • Add compile and verify methods to all expressions to test compilation and ability to compile (since many operations are unavailable in SQLite, for example)

API Changes

  • Impala Ibis client creation now uses only ibis.impala.connect, and ibis.make_client has been deprecated

Contributors

$ git log v0.4.0..v0.5.0 --pretty=format:%aN | sort | uniq -c | sort -rn
      55 Wes McKinney
      9 Uri Laserson
      1 Kristopher Overholt

0.4 (2015-08-14)

New Features

  • Add tooling to use Impala C++ scalar UDFs within Ibis (#262, #195)
  • Support and testing for Kerberos-enabled secure HDFS clusters
  • Many table functions can now accept functions as parameters (invoked on the calling table) to enhance composability and emulate late-binding semantics of languages (like R) that have non-standard evaluation (#460)
  • Add any, all, notany, and notall reductions on boolean arrays, as well as cumany and cumall
  • Using topk now produces an analytic expression that is executable (as an aggregation) but can also be used as a filter as before (see the sketch after this list) (#392, #91)
  • Added experimental database object "usability layer", see ImpalaClient.database.
  • Add TableExpr.info
  • Add compute_stats API to table expressions referencing physical Impala tables
  • Add explain method to ImpalaClient to show query plan for an expression
  • Add chmod and chown APIs to HDFS interface for superusers
  • Add convert_base method to strings and integer types
  • Add option to ImpalaClient.create_table to create empty partitioned tables
  • ibis.cross_join can now join more than 2 tables at once
  • Add ImpalaClient.raw_sql method for running naked SQL queries
  • ImpalaClient.insert now validates schemas locally prior to sending query to cluster, for better usability.
  • Add conda installation recipes
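
A minimal sketch of the dual use of topk described above, with a hypothetical schema:

import ibis

t = ibis.table([('flag', 'string'), ('value', 'double')], 'tbl')

# As an aggregation: an executable expression for the ten most frequent flags.
top_flags = t.flag.topk(10)

# As a filter, as before: keep only rows whose flag is among the top ten.
filtered = t[t.flag.topk(10)]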

Contributors

$ git log v0.3.0..v0.4.0 --pretty=format:%aN | sort | uniq -c | sort -rn
     38 Wes McKinney
      9 Uri Laserson
      2 Meghana Vuyyuru
      2 Kristopher Overholt
      1 Marius van Niekerk

0.3 (2015-07-20)

First public release. See https://ibis-project.org for more.

New Features

  • Implement window / analytic function support
  • Enable non-equijoins (join clauses with operations other than ==).
  • Add remaining string functions supported by Impala.
  • Add pipe method to tables (hat-tip to the pandas dev team).
  • Add mutate convenience method to tables (see the sketch after this list).
  • Fleshed out WebHDFS implementations: get/put directories, move files, etc. See the full HDFS API.
  • Add truncate method for timestamp values
  • ImpalaClient can execute scalar expressions not involving any table.
  • Can also create internal Impala tables with a specific HDFS path.
  • Make Ibis's temporary Impala database and HDFS paths configurable (see ibis.options).
  • Add truncate_table function to client (if the user's Impala cluster supports it).
  • Python 2.6 compatibility
  • Enable Ibis to execute concurrent queries in multithreaded applications (earlier versions were not thread-safe).
  • Test data load script in scripts/load_test_data.py
  • Add an internal operation type signature API to enhance developer productivity.
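
A minimal sketch of the new mutate and pipe conveniences, with a hypothetical schema and helper function:

import ibis

t = ibis.table([('a', 'int64'), ('b', 'int64')], 'tbl')

# mutate adds derived columns without spelling out a full projection.
t2 = t.mutate(total=t.a + t.b)

# pipe threads a table through a plain function, pandas-style.
def add_ratio(table):
    return table.mutate(ratio=table.a / table.b)

t3 = t2.pipe(add_ratio)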

Contributors

$ git log v0.2.0..v0.3.0 --pretty=format:%aN | sort | uniq -c | sort -rn
     59 Wes McKinney
     29 Uri Laserson
      4 Isaac Hodes
      2 Meghana Vuyyuru

0.2 (2015-06-16)

New Features

  • insert method on Ibis client for inserting data into existing tables.
  • parquet_file, delimited_file, and avro_file client methods for querying datasets not yet available in Impala
  • New ibis.hdfs_connect method and HDFS client API for WebHDFS for writing files and directories to HDFS
  • New timedelta API and improved timestamp data support
  • New bucket and histogram methods on numeric expressions
  • New category logical datatype for handling bucketed data, among other things
  • Add summary API to numeric expressions
  • Add value_counts convenience API to array expressions
  • New string methods like, rlike, and contains for fuzzy and regex searching
  • Add options.verbose option and configurable options.verbose_log callback function for improved query logging and visibility
  • Support for new SQL built-in functions
    • ibis.coalesce
    • ibis.greatest and ibis.least
    • ibis.where for conditional logic (see also ibis.case and ibis.cases)
    • nullif method on value expressions
    • ibis.now
  • New aggregate functions: approx_median, approx_nunique, and group_concat
  • where argument in aggregate functions
  • Add having method to group_by intermediate object
  • Added group-by convenience table.group_by(exprs).COLUMN_NAME.agg_function() (see the sketch after this list)
  • Add default expression names to most aggregate functions
  • New Impala database client helper methods
    • create_database
    • drop_database
    • exists_database
    • list_databases
    • set_database
  • Client list_tables searching / listing method
  • Add add, sub, and other explicit arithmetic methods to value expressions
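
A minimal sketch of the group-by convenience and the new having method described above, with a hypothetical schema:

import ibis

t = ibis.table([('flag', 'string'), ('value', 'double')], 'tbl')

# Column-access convenience: group, pick a column, then apply an aggregate.
means = t.group_by('flag').value.mean()

# having filters groups after aggregation.
big_groups = (
    t.group_by('flag')
     .having(t.value.count() > 100)
     .aggregate(total=t.value.sum())
)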

API Changes

  • New Ibis client and Impala connection workflow. Client now combined from an Impala connection and an optional HDFS connection

Bug Fixes

  • Numerous expression API bug fixes and rough edges fixed

Contributors

$ git log v0.1.0..v0.2.0 --pretty=format:%aN | sort | uniq -c | sort -rn
     71 Wes McKinney
      1 Juliet Hougland
      1 Isaac Hodes

0.1 (2015-03-26)

First Ibis release.

  • Expression DSL design and type system
  • Expression to ImpalaSQL compiler toolchain
  • Impala built-in function wrappers

    $ git log 84d0435..v0.1.0 --pretty=format:%aN | sort | uniq -c | sort -rn
         78 Wes McKinney
          1 srus
          1 Henry Robinson

