Lazy Mode and Logging¶

So far, we have seen Ibis in interactive mode. Interactive mode (also known as eager mode) makes Ibis return the results of an operation immediately.

In most cases, instead of using interactive mode, it makes more sense to use the default lazy mode. In lazy mode, Ibis won't be executing the operations automatically, but instead, will generate an expression to be executed at a later time.

Let's see this in practice, starting with the same example as in previous tutorials - the geography database.

In [1]:

            
                Copied!
                
!curl -LsS -o geography.db 'https://storage.googleapis.com/ibis-tutorial-data/geography.db'
!curl -LsS -o geography.db 'https://storage.googleapis.com/ibis-tutorial-data/geography.db'

In [2]:

            
                Copied!
                
import os
import tempfile

import ibis

connection = ibis.sqlite.connect(
    'geography.db'
)
countries = connection.table('countries')
import os
import tempfile

import ibis

connection = ibis.sqlite.connect(
    'geography.db'
)
countries = connection.table('countries')

In previous tutorials, we set interactive mode to True, and we obtained the result of every operation.

In [3]:

            
                Copied!
                
ibis.options.interactive = True

countries['name', 'continent', 'population'].limit(3)
ibis.options.interactive = True

countries['name', 'continent', 'population'].limit(3)

Out[3]:

┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ name                 ┃ continent ┃ population ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ string               │ string    │ int32      │
├──────────────────────┼───────────┼────────────┤
│ Andorra              │ EU        │      84000 │
│ United Arab Emirates │ AS        │    4975593 │
│ Afghanistan          │ AS        │   29121286 │
└──────────────────────┴───────────┴────────────┘

But now let's see what happens if we leave the interactive option to False (the default), and we operate in lazy mode.

In [4]:

            
                Copied!
                
ibis.options.interactive = False

countries['name', 'continent', 'population'].limit(3)
ibis.options.interactive = False

countries['name', 'continent', 'population'].limit(3)

Out[4]:

r0 := AlchemyTable: countries
  iso_alpha2  string
  iso_alpha3  string
  iso_numeric int32
  fips        string
  name        string
  capital     string
  area_km2    float64
  population  int32
  continent   string

r1 := Selection[r0]
  selections:
    name:       r0.name
    continent:  r0.continent
    population: r0.population

Limit[r1, n=3]

What we find is the graph of the expressions that would return the desired result instead of the result itself.

Let's analyze the expressions in the graph:

We query the countries table (all rows and all columns)
We select the name, continent and population columns
We limit the results to only the first 3 rows

Now consider that the data is in a database, possibly in a different host than the one executing Ibis. Also consider that the results returned to the user need to be moved to the memory of the host executing Ibis.

When using interactive (or eager) mode, if we perform one operation at a time, we would do:

We would move all the rows and columns from the backend (database, big data system, etc.) into memory
Once in memory, we would discard all the columns but name, continent and population
After that, we would discard all the rows, except the first 3

This is not very efficient. If you consider that the table can have millions of rows, backed by a big data system like Spark or Impala, this may not even be possible (not enough memory to load all the data).

The solution is to use lazy mode. In lazy mode, instead of obtaining the results after each operation, we build an expression (a graph) of all the operations that need to be done. After all the operations are recorded, the graph is sent to the backend which will perform the operation in an efficient way - only moving to memory the required data.

You can think of this as writing a shopping list and requesting someone to go to the supermarket and buy everything you need once the list is complete. As opposed as getting someone to bring all the products of the supermarket to your home and then return everything you don't want.

Let's continue with our example, save the expression in a variable countries_expression, and check its type.

In [5]:

            
                Copied!
                
countries_expression = countries['name', 'continent', 'population'].limit(3)
type(countries_expression)
countries_expression = countries['name', 'continent', 'population'].limit(3)
type(countries_expression)

Out[5]:

ibis.expr.types.relations.Table

The type is an Ibis TableExpr, since the result is a table (in a broad way, you can consider it a dataframe).

Now we have our query instructions (our expression, fetching only 3 columns and 3 rows) in the variable countries_expression.

At this point, nothing has been requested from the database. We have defined what we want to extract, but we didn't request it from the database yet. We can continue building our expression if we haven't finished yet. Or once we are done, we can simply request it from the database using the method .execute().

In [6]:

            
                Copied!
                
countries_expression.execute()
countries_expression.execute()

Out[6]:

	name	continent	population
0	Andorra	EU	84000
1	United Arab Emirates	AS	4975593
2	Afghanistan	AS	29121286

We can build other types of expressions, for example, one that instead of returning a table, returns a columns.

In [7]:

            
                Copied!
                
population_in_millions = (countries['population'] / 1_000_000).name(
    'population_in_millions'
)
population_in_millions
population_in_millions = (countries['population'] / 1_000_000).name(
    'population_in_millions'
)
population_in_millions

Out[7]:

r0 := AlchemyTable: countries
  iso_alpha2  string
  iso_alpha3  string
  iso_numeric int32
  fips        string
  name        string
  capital     string
  area_km2    float64
  population  int32
  continent   string

Selection[r0]
  selections:
    population_in_millions: r0.population / 1000000

If we check its type, we can see how it is a FloatingColumn expression.

In [8]:

            
                Copied!
                
type(population_in_millions)
type(population_in_millions)

Out[8]:

ibis.expr.types.numeric.FloatingColumn

We can combine the previous expression to be a column of a table expression.

In [9]:

            
                Copied!
                
countries['name', 'continent', population_in_millions].limit(3)
countries['name', 'continent', population_in_millions].limit(3)

Out[9]:

r0 := AlchemyTable: countries
  iso_alpha2  string
  iso_alpha3  string
  iso_numeric int32
  fips        string
  name        string
  capital     string
  area_km2    float64
  population  int32
  continent   string

r1 := Selection[r0]
  selections:
    name:                   r0.name
    continent:              r0.continent
    population_in_millions: r0.population / 1000000

Limit[r1, n=3]

Since we are in lazy mode (not interactive), those expressions don't request any data from the database unless explicitly requested with .execute().

Logging queries¶

For SQL backends (and for others when it makes sense), the query sent to the database can be logged. This can be done by setting the verbose option to True.

In [10]:

            
                Copied!
                
ibis.options.verbose = True

countries['name', 'continent', population_in_millions].limit(3).execute()
ibis.options.verbose = True

countries['name', 'continent', population_in_millions].limit(3).execute()

SELECT t0.name, t0.continent, t0.population / CAST(? AS FLOAT) AS population_in_millions 
FROM main.countries AS t0
 LIMIT ? OFFSET ?

Out[10]:

	name	continent	population_in_millions
0	Andorra	EU	0.084000
1	United Arab Emirates	AS	4.975593
2	Afghanistan	AS	29.121286

By default, the logging is done to the terminal, but we can process the query with a custom function. This allows us to save executed queries to a file, save to a database, send them to a web service, etc.

For example, to save queries to a file, we can write a custom function that given a query, saves it to a log file.

In [11]:

            
                Copied!
                
                    
                    
                
                

        
import datetime
import os
import tempfile
from pathlib import Path


def log_query_to_file(query: str) -> None:
    """
    Log queries to `data/tutorial_queries.log`.

    Each file is a query. Line breaks in the query are
    represented with the string '\n'.

    A timestamp of when the query is executed is added.
    """
    dirname = Path(tempfile.gettempdir())
    fname = dirname / 'tutorial_queries.log'
    query_in_a_single_line = query.replace('\n', r'\n')
    with fname.open(mode='a') as f:
        f.write(f'{query_in_a_single_line}\n')
import datetime
import os
import tempfile
from pathlib import Path


def log_query_to_file(query: str) -> None:
    """
    Log queries to `data/tutorial_queries.log`.

    Each file is a query. Line breaks in the query are
    represented with the string '\n'.

    A timestamp of when the query is executed is added.
    """
    dirname = Path(tempfile.gettempdir())
    fname = dirname / 'tutorial_queries.log'
    query_in_a_single_line = query.replace('\n', r'\n')
    with fname.open(mode='a') as f:
        f.write(f'{query_in_a_single_line}\n')

Then we can set the verbose_log option to the custom function, execute one query, wait one second, and execute another query.

In [12]:

            
                Copied!
                
import time

ibis.options.verbose_log = log_query_to_file

countries.execute()
time.sleep(1.0)
countries['name', 'continent', population_in_millions].limit(3).execute()
import time

ibis.options.verbose_log = log_query_to_file

countries.execute()
time.sleep(1.0)
countries['name', 'continent', population_in_millions].limit(3).execute()

Out[12]:

	name	continent	population_in_millions
0	Andorra	EU	0.084000
1	United Arab Emirates	AS	4.975593
2	Afghanistan	AS	29.121286

This has created a log file in data/tutorial_queries.log where the executed queries have been logged.

In [13]:

            
                Copied!
                
!cat -n data/tutorial_queries.log
!cat -n data/tutorial_queries.log

[bat error]: 'data/tutorial_queries.log': No such file or directory (os error 2)

Last update: January 5, 2023