
ibis-bigquery's Introduction

Change of Address

The BigQuery backend is now maintained as a first-party backend in Ibis.

You can find updated installation and usage instructions in the Ibis documentation.

Ibis BigQuery backend

This package provides a BigQuery backend for Ibis.

Installation

Supported Python Versions

Python >= 3.7, < 3.11

Unsupported Python Versions

Python < 3.7

Install with conda:

conda install -c conda-forge ibis-bigquery

Install with pip:

pip install ibis-bigquery

Usage

Connecting to BigQuery

Recommended usage (Ibis 2.x only):

import ibis

conn = ibis.bigquery.connect(
    project_id=YOUR_PROJECT_ID,
    dataset_id='bigquery-public-data.stackoverflow'
)

Using this library directly:

import ibis
import ibis_bigquery

conn = ibis_bigquery.connect(
    project_id=YOUR_PROJECT_ID,
    dataset_id='bigquery-public-data.stackoverflow'
)

Running a query

edu_table = conn.table(
    'international_education',
    database='bigquery-public-data.world_bank_intl_education')
edu_table = edu_table['value', 'year', 'country_code', 'indicator_code']

country_table = conn.table(
    'country_code_iso',
    database='bigquery-public-data.utility_us')
country_table = country_table['country_name', 'alpha_3_code']

expression = edu_table.join(
    country_table,
    [edu_table.country_code == country_table.alpha_3_code])

print(conn.execute(
    expression[edu_table.year == 2016]
        # Adult literacy rate.
        [edu_table.indicator_code == 'SE.ADT.LITR.ZS']
        .sort_by([ibis.desc(edu_table.value)])
        .limit(20)
))

ibis-bigquery's People

Contributors

cpcloud, datapythonista, dbeatty10, gerrymanoim, gforsyth, hussainsultan, hwhitney456, loudinb, nehanene15, release-please[bot], renato2099, santind, saschahofmann, seibs, timothydijamco, tswast


ibis-bigquery's Issues

incompatible with ibis 1.4.0

      1 import ibis
----> 2 import ibis_bigquery
      3
      4 conn = ibis_bigquery.Backend().connect(
      5     project_id=YOUR_PROJECT_ID,

/usr/local/Caskroom/miniconda/base/envs/scratch/lib/python3.9/site-packages/ibis_bigquery/__init__.py in <module>
      7 import ibis.config
      8 import pydata_google_auth
----> 9 from ibis.backends.base import BaseBackend
     10 from pydata_google_auth import cache
     11

ModuleNotFoundError: No module named 'ibis.backends.base'

As we go through this transitional state, it might help people upgrade if the backend were compatible across the 1.x / 2.x divide. (I'm thinking of pandas-gbq, which can be used directly with older versions of pandas, even if we expect most people to use it via pandas.read_gbq.)

I'm also motivated to get this package out for 1.x users, as I was able to fix the UDFs implementation on Python 3.8+ in #6.

CC @datapythonista, in case you have thoughts about 1.x compatibility during the 2.x refactor.
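One common way to stay importable on both sides of such a divide is a guarded import. The sketch below is only an illustration of that pattern, not the approach the project took: the fallback to `object` and the `HAS_BACKEND_API` flag are assumptions made for this example.

```python
# Guarded import: prefer the newer backend base class, fall back when the
# installed Ibis does not provide it (the ModuleNotFoundError shown above is a
# subclass of ImportError, so it is caught here too).
try:
    from ibis.backends.base import BaseBackend
except ImportError:
    # Hypothetical fallback for this sketch: a plain base class, so that
    # class definitions depending on BaseBackend can still be imported.
    BaseBackend = object

# Hypothetical flag that the rest of the package could branch on.
HAS_BACKEND_API = BaseBackend is not object

print("Backend API available:", HAS_BACKEND_API)
```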

mypy is failing

Run mypy --ignore-missing-imports .
  mypy --ignore-missing-imports .
  shell: /usr/bin/bash -e {0}
  env:
    pythonLocation: /opt/hostedtoolcache/Python/3.7.10/x64
    LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.7.10/x64/lib
tests/udf/test_find.py:15: error: Incompatible types in assignment (expression has type "MarkDecorator", variable has type "List[MarkDecorator]")
ibis_bigquery/udf/__init__.py:18: error: Need type annotation for '_udf_name_cache'
tests/udf/test_core.py:16: error: Incompatible types in assignment (expression has type "MarkDecorator", variable has type "List[MarkDecorator]")
ibis_bigquery/client.py:365: error: Item "None" of "Optional[str]" has no attribute "split"
tests/udf/test_udf_execute.py:20: error: Incompatible types in assignment (expression has type "MarkDecorator", variable has type "List[MarkDecorator]")
Found 5 errors in 5 files (checked 19 source files)
Error: Process completed with exit code 1.
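Two of the reported error kinds ("Need type annotation" and `Item "None" of "Optional[str]"`) usually go away with an explicit annotation and a `None` guard. The snippet below is a minimal sketch of those two patterns; `_name_cache` and `split_table_id` are hypothetical names for this example and are not taken from the ibis_bigquery source.

```python
from typing import Dict, Optional, Tuple

# An empty container gives mypy nothing to infer from, so annotate it explicitly.
# (Hypothetical cache; the real element types in ibis_bigquery may differ.)
_name_cache: Dict[str, int] = {}


def split_table_id(table_id: Optional[str]) -> Tuple[str, ...]:
    """Split a dotted table ID, narrowing Optional[str] to str first."""
    if table_id is None:
        # After this guard mypy knows table_id is a str, so .split is allowed.
        raise ValueError("table_id must not be None")
    return tuple(table_id.split("."))


print(split_table_id("my_project.my_dataset.my_table"))
```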

Wrong SQL when adding a time interval to a date


import ibis
import ibis_bigquery
from google.cloud import bigquery
import pandas as pd
import datetime as dt

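my_project = 'my-project'  # placeholder: your GCP project ID
my_dataset = 'my_dataset'  # placeholder: an existing dataset
my_table = 'my_table'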

pdf = pd.DataFrame({'date': [dt.date(2020, 1, 1), dt.date(2021, 5, 15), dt.date(2021, 7, 9)]})

# Load client
client = bigquery.Client(project=my_project)

# Load data to BQ
client.delete_table(my_dataset + "." + my_table, not_found_ok=True)
job = client.load_table_from_dataframe(pdf, my_dataset + "." + my_table)
job.result()  # wait for the load job to finish before querying

ibis.options.interactive = False

conn = ibis_bigquery.connect(
    project_id=my_project,
    dataset_id=my_dataset)

t = conn.table(my_table)

delta = dt.timedelta(days=6)

e1 = t.mutate(z=t.date + ibis.literal(delta))

print(ibis.bigquery.compile(e1))

"""
SELECT *, DATE_ADD(`date`, INTERVAL 6 days, 0:00:00 DAY) AS `z` FROM ...
"""

e1.execute()

And the exception is:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/V3/windev/carrefour/gcp/venv/lib/python3.9/site-packages/ibis/expr/types.py", line 223, in execute
    return execute(
  File "/Users/V3/windev/carrefour/gcp/venv/lib/python3.9/site-packages/ibis/client.py", line 382, in execute
    return backend.execute(
  File "/Users/V3/windev/carrefour/gcp/venv/lib/python3.9/site-packages/ibis/client.py", line 222, in execute
    result = self._execute_query(query, **kwargs)
  File "/Users/V3/windev/carrefour/gcp/venv/lib/python3.9/site-packages/ibis/client.py", line 229, in _execute_query
    return query.execute()
  File "/Users/V3/windev/carrefour/gcp/venv/lib/python3.9/site-packages/ibis_bigquery/client.py", line 196, in execute
    with self.client._execute(
  File "/Users/V3/windev/carrefour/gcp/venv/lib/python3.9/site-packages/ibis_bigquery/client.py", line 482, in _execute
    query.result()  # blocks until finished
  File "/Users/V3/windev/carrefour/gcp/venv/lib/python3.9/site-packages/google/cloud/bigquery/job/query.py", line 1266, in result
    super(QueryJob, self).result(retry=retry, timeout=timeout)
  File "/Users/V3/windev/carrefour/gcp/venv/lib/python3.9/site-packages/google/cloud/bigquery/job/base.py", line 679, in result
    return super(_AsyncJob, self).result(timeout=timeout, **kwargs)
  File "/Users/V3/windev/carrefour/gcp/venv/lib/python3.9/site-packages/google/api_core/future/polling.py", line 134, in result
    raise self._exception
google.api_core.exceptions.BadRequest: 400 Syntax error: Expected ")" but got ":" at [1:46]
(job ID: adf6f6be-fbf2-485f-ad6c-e651ab84d978)
                  -----Query Job SQL Follows-----                   
    |    .    |    .    |    .    |    .    |    .    |    .    |
   1:SELECT *, DATE_ADD(`date`, INTERVAL 6 days, 0:00:00 DAY) AS `z`
   2:FROM ....
   3:LIMIT 10000
    |    .    |    .    |    .    |    .    |    .    |    .    |
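The fragment `6 days, 0:00:00` in the generated SQL matches what Python's `str()` returns for the `timedelta`, which suggests the literal was formatted with its string representation rather than as a number of days. The sketch below shows that representation next to one way a whole-day interval could be rendered; `date_add_sql` is a hypothetical helper for illustration, not the compiler's actual code.

```python
import datetime as dt


def date_add_sql(column: str, delta: dt.timedelta) -> str:
    """Render a BigQuery DATE_ADD call for a whole-day timedelta (illustrative only)."""
    if delta.seconds or delta.microseconds:
        raise ValueError("DATE_ADD with a DAY part needs a whole number of days")
    return "DATE_ADD(`{}`, INTERVAL {} DAY)".format(column, delta.days)


delta = dt.timedelta(days=6)
print(str(delta))                   # the text that leaked into the generated SQL
print(date_add_sql("date", delta))  # an interval BigQuery can parse
```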

test_connect failing

https://github.com/ibis-project/ibis-bigquery/runs/2327600625

=================================== FAILURES ===================================
______________________________ test_auth_default _______________________________

project_id = 'ibis-gbq'
credentials = <google.oauth2.service_account.Credentials object at 0x7f5737dab280>
monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f57357ad580>

    def test_auth_default(project_id, credentials, monkeypatch):
        mock_calls = []
    
        def mock_default(*args, **kwargs):
            mock_calls.append((args, kwargs))
            return credentials, project_id
    
        monkeypatch.setattr(pydata_google_auth, "default", mock_default)
    
        bq_backend.connect(
            project_id=project_id, dataset_id='bigquery-public-data.stackoverflow',
        )
    
        assert len(mock_calls) == 1
        args, kwargs = mock_calls[0]
        assert len(args) == 1
        scopes = args[0]
>       assert scopes == bq_backend.SCOPES
E       AttributeError: 'Backend' object has no attribute 'SCOPES'

tests/system/test_connect.py:78: AttributeError
___________________________ test_auth_external_data ____________________________

project_id = 'ibis-gbq'
credentials = <google.oauth2.service_account.Credentials object at 0x7f5737dab280>
monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f57357b6970>

    def test_auth_external_data(project_id, credentials, monkeypatch):
        mock_calls = []
    
        def mock_default(*args, **kwargs):
            mock_calls.append((args, kwargs))
            return credentials, project_id
    
        monkeypatch.setattr(pydata_google_auth, "default", mock_default)
    
        bq_backend.connect(
            project_id=project_id,
            dataset_id='bigquery-public-data.stackoverflow',
            auth_external_data=True,
        )
    
        assert len(mock_calls) == 1
        args, _ = mock_calls[0]
        assert len(args) == 1
        scopes = args[0]
>       assert scopes == bq_backend.EXTERNAL_DATA_SCOPES
E       AttributeError: 'Backend' object has no attribute 'EXTERNAL_DATA_SCOPES'

tests/system/test_connect.py:127: AttributeError

Looks like this can be fixed by updating the test to refer to the module instead of backend.
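The failure pattern can be reproduced without BigQuery: a constant defined at module level is not an attribute of an object created from a class in that module. The snippet below is a stand-in sketch; `fake_module`, the scope string, and the empty `Backend` class are all placeholders for this example.

```python
import types

# Stand-in for the package module: the constant lives at module level.
fake_module = types.ModuleType("fake_ibis_bigquery")
fake_module.SCOPES = ["placeholder-scope"]


class Backend:
    """Stand-in backend class that does not define SCOPES itself."""


bq_backend = Backend()

# Looking the constant up on the backend instance fails, as in the test output.
print(hasattr(bq_backend, "SCOPES"))
# Looking it up on the module works, which is the fix suggested above.
print(fake_module.SCOPES)
```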

test_projection_fusion_only_peeks_at_immediate_parent failing

____________ test_projection_fusion_only_peeks_at_immediate_parent _____________

    def test_projection_fusion_only_peeks_at_immediate_parent():
        schema = [
            ('file_date', 'timestamp'),
            ('PARTITIONTIME', 'date'),
            ('val', 'int64'),
        ]
        table = ibis.table(schema, name='unbound_table')
        table = table[table.PARTITIONTIME < ibis.date('2017-01-01')]
        table = table.mutate(file_date=table.file_date.cast('date'))
        table = table[table.file_date < ibis.date('2017-01-01')]
        table = table.mutate(XYZ=table.val * 2)
        expr = table.join(table.view())[table]
        result = ibis.bigquery.compile(expr)
        expected = """\
    WITH t0 AS (
      SELECT *
      FROM unbound_table
      WHERE `PARTITIONTIME` < DATE '2017-01-01'
    ),
    t1 AS (
      SELECT CAST(`file_date` AS DATE) AS `file_date`, `PARTITIONTIME`, `val`
      FROM t0
    ),
    t2 AS (
      SELECT t1.*
      FROM t1
      WHERE t1.`file_date` < DATE '2017-01-01'
    ),
    t3 AS (
      SELECT *, `val` * 2 AS `XYZ`
      FROM t2
    )
    SELECT t3.*
    FROM t3
      CROSS JOIN t3 t4"""
>       assert result == expected
E       AssertionError: assert 'WITH t0 AS (...ER JOIN t3 t4' == 'WITH t0 AS (...SS JOIN t3 t4'
E         Skipping 328 identical leading characters in diff, use -v to show
E           FROM t3
E         -   CROSS JOIN t3 t4
E         ?   ^ ---
E         +   INNER JOIN t3 t4
E         ?   ^^^^

tests/test_compiler.py:402: AssertionError

Add STARTS_WITH and ENDS_WITH

We registered the two functions in our project and could make a quick PR to enable them in this repo. Let me know if that's something you're interested in!
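For reference, BigQuery's `STARTS_WITH(value, prefix)` and `ENDS_WITH(value, suffix)` each take two string arguments and return a boolean. A translation rule essentially needs to emit that call; the sketch below does so with plain string formatting. The helper names are hypothetical and the snippet does not use Ibis's real operation-registry API.

```python
def starts_with_sql(value_sql: str, prefix_sql: str) -> str:
    """Format a BigQuery STARTS_WITH call from two already-compiled arguments."""
    return "STARTS_WITH({}, {})".format(value_sql, prefix_sql)


def ends_with_sql(value_sql: str, suffix_sql: str) -> str:
    """Format a BigQuery ENDS_WITH call from two already-compiled arguments."""
    return "ENDS_WITH({}, {})".format(value_sql, suffix_sql)


print(starts_with_sql("`string_col`", "'abc'"))
print(ends_with_sql("`string_col`", "'xyz'"))
```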

BigQuery UDFs: Builtins

Carryover from ibis-project/ibis#1470.

This issue is to track the list of Python builtins that should ship out of the box with the JavaScript UDF translator in ibis. Ideally we can implement all of them, but that might not be possible. We should track the various builtins and see what's within scope.

As of Python 3.6.5 here are all the builtin functions:

[BigQuery] test_string[split] is failing

Failing test

https://github.com/ibis-project/ibis/blob/master/ibis/tests/all/test_string.py#L230-L234

Test output

$ pytest ibis/tests/all/test_string.py::test_string[BigQuery-split]
======================================= test session starts ========================================
platform darwin -- Python 3.7.8, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /Users/swast/src/ibis, inifile: setup.cfg
plugins: forked-1.2.0, mock-3.1.1, cov-2.10.0, xdist-1.34.0
collected 1 item                                                                                   

ibis/tests/all/test_string.py F                                                              [100%]

============================================= FAILURES =============================================
___________________________________ test_string[BigQuery-split] ____________________________________

backend = <ibis.tests.backends.BigQuery object at 0x7fb9598ee950>
alltypes = BigQueryTable[table]
  name: swast-scratch.testing.functional_alltypes
  schema:
    index : int64
    Unnamed_0 : int...4
    date_string_col : string
    string_col : string
    timestamp_col : timestamp
    year : int64
    month : int64
df =       index  Unnamed_0    id  bool_col  ...  string_col           timestamp_col  year  month
0       300        300  1...
7299   7296       7296  3956      True  ...           6 2010-01-31 05:06:13.650  2010      1

[7300 rows x 15 columns]
result_func = <function <lambda> at 0x7fb9598f2f80>
expected_func = <function <lambda> at 0x7fb9598f3050>

    @pytest.mark.parametrize(
        ('result_func', 'expected_func'),
        [
...
          param(
                lambda t: t.date_string_col.split('/'),
                lambda t: t.date_string_col.str.split('/'),
                id='split',
            ),
            param(
                lambda t: ibis.literal('-').join(['a', t.string_col, 'c']),
                lambda t: 'a-' + t.string_col + '-c',
                id='join',
            ),
        ],
    )
    @pytest.mark.xfail_unsupported
    def test_string(backend, alltypes, df, result_func, expected_func):
        expr = result_func(alltypes)
        result = expr.execute()
    
        expected = backend.default_series_rename(expected_func(df))
>       backend.assert_series_equal(result, expected)

ibis/tests/all/test_string.py:248: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
ibis/tests/backends.py:146: in assert_series_equal
    left = left.sort_values().reset_index(drop=True)
../../miniconda3/envs/ibis-dev/lib/python3.7/site-packages/pandas/core/series.py:3167: in sort_values
    argsorted = _try_kind_sort(arr[good])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

arr = array([array(['06', '01', '09'], dtype=object),
       array(['06', '02', '09'], dtype=object),
       array(['06', '0...object),
       array(['01', '30', '10'], dtype=object),
       array(['01', '31', '10'], dtype=object)], dtype=object)

    def _try_kind_sort(arr):
        # easier to ask forgiveness than permission
        try:
            # if kind==mergesort, it can fail for object dtype
>           return arr.argsort(kind=kind)
E           ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

../../miniconda3/envs/ibis-dev/lib/python3.7/site-packages/pandas/core/series.py:3153: ValueError
========================================= warnings summary =========================================
ibis/tests/all/test_string.py::test_string[BigQuery-split]
  /Users/swast/src/ibis/ibis/bigquery/client.py:545: PendingDeprecationWarning: Client.dataset is deprecated and will be removed in a future version. Use a string like 'my_project.my_dataset' or a cloud.google.bigquery.DatasetReference object, instead.
    table_ref = self.client.dataset(dataset, project=project).table(name)

ibis/tests/all/test_string.py::test_string[BigQuery-split]
  /Users/swast/src/ibis/ibis/bigquery/client.py:432: PendingDeprecationWarning: Client.dataset is deprecated and will be removed in a future version. Use a string like 'my_project.my_dataset' or a cloud.google.bigquery.DatasetReference object, instead.
    dataset_ref = self.client.dataset(dataset, project=project)

-- Docs: https://docs.pytest.org/en/latest/warnings.html
===================================== short test summary info ======================================
FAILED ibis/tests/all/test_string.py::test_string[BigQuery-split] - ValueError: The truth value o...
================================== 1 failed, 2 warnings in 4.55s ===================================

Operating on arrays of struct<int64, int64> in BigQuery

As a substitute for a map type, which is missing in BigQuery, we've implemented maps as arrays of <key, value> structs. Some of these keys and values are int64-typed.

Using Ibis, we can't flatten these arrays (#1146) and we can't write UDFs that accept these arrays as inputs (#1478; discussion in ibis-project/ibis#1469) to access members, because Ibis rejects int64s appearing anywhere in a BigQuery UDF signature. (We wrote a SQL UDF for this use case outside Ibis.)

Trying to cast the struct members to another type (floats or strings) so that we can pass them to a UDF that Ibis accepts will fail on execution; BigQuery complains like:

BadRequest: 400 Casting between arrays with incompatible element types is not supported: Invalid cast from ARRAY<STRUCT<key INT64, value INT64>> to ARRAY<STRUCT<key STRING, value STRING>> at [17:33]

I think we're also prevented from doing anything baroque like calling TO_JSON_STRING on the column and having the UDF accept the JSON blob as a string because we can't call arbitrary BigQuery functions, though I see that's contemplated in the roadmap.

I would like to be able to define UDFs that operate on integer types. If you create a JavaScript UDF declared to accept an int64 in BigQuery, BigQuery will actually pass in a string-encoded number. Maybe Ibis could accept a type named integer_string for the UDF definition, allowing UDFs to work on these types while ensuring that the user has opted into this behavior?
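To make the request concrete, here is a Python emulation of what such a UDF body would have to do if the keys and values arrive as string-encoded integers, as described above. The entry layout and the `lookup` helper are assumptions for illustration; a real BigQuery UDF would be written in JavaScript or SQL.

```python
from typing import Dict, List, Optional


def lookup(entries: List[Dict[str, str]], key: int) -> Optional[int]:
    """Find `key` in an array of {key, value} structs whose members are
    string-encoded integers, and decode the matching value."""
    wanted = str(key)  # compare in the encoded (string) domain
    for entry in entries:
        if entry["key"] == wanted:
            return int(entry["value"])  # decode only the value that is returned
    return None


# A "map" stored as an array of key/value structs, with int64s passed as strings.
entries = [{"key": "1", "value": "10"}, {"key": "2", "value": "20"}]
print(lookup(entries, 2))
```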

Add `ibis_bigquery.__version__` property

Make version strings more specific

This is so that, for example, tarballs generated from ibis-bigquery are of the form ibis-bigquery-0.1.0+2.g1076c97.tar.gz. Currently the version format is just the version number (e.g. ibis-bigquery-0.1.0.tar.gz), regardless of which specific commit the tarball is built from.

The main Ibis repo's setup.py uses versioneer, and we may be able to do something similar here.
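The example version string above has three parts: the release, the number of commits since that release, and an abbreviated commit hash prefixed with `g`. The sketch below parses only that form (plus a bare release number). The regular expression and `parse_version` are written for this illustration and are not part of versioneer.

```python
import re

# Matches "0.1.0" and "0.1.0+2.g1076c97"; other suffixes are not handled here.
_VERSION_RE = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)"
    r"(?:\+(?P<distance>\d+)\.g(?P<commit>[0-9a-f]+))?$"
)


def parse_version(version: str):
    """Return (release, commits_since_release, short_commit_or_None)."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError("unrecognized version string: {!r}".format(version))
    distance = int(match.group("distance")) if match.group("distance") else 0
    return match.group("release"), distance, match.group("commit")


print(parse_version("0.1.0+2.g1076c97"))
print(parse_version("0.1.0"))
```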

TST: doctest in ibis.bigquery.udf.api is failing for python 3.8

A doctest in ibis.bigquery.udf.api is failing on Python 3.8:


============================================= FAILURES =============================================
_______________________________ [doctest] ibis.bigquery.udf.api.udf ________________________________
070     `the BigQuery documentation
071     <https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions#sql-type-encodings-in-javascript>`_.
072 
073     Examples
074     --------
075     >>> from ibis.bigquery import udf
076     >>> import ibis.expr.datatypes as dt
077     >>> @udf(input_type=[dt.double], output_type=dt.double)
UNEXPECTED EXCEPTION: NotImplementedError("'visit_Constant' nodes not yet implemented")
Traceback (most recent call last):

  File "/home/xmn/miniconda3/envs/ibis-py38/lib/python3.8/doctest.py", line 1329, in __run
    exec(compile(example.source, filename, "single",

  File "<doctest ibis.bigquery.udf.api.udf[3]>", line 2, in <module>

  File "/home/xmn/dev/quansight/ibis-project/ibis/ibis/bigquery/udf/api.py", line 215, in wrapper
    source = PythonToJavaScriptTranslator(f).compile()

  File "/home/xmn/dev/quansight/ibis-project/ibis/ibis/bigquery/udf/core.py", line 133, in compile
    return self.visit(self.ast)

  File "/home/xmn/dev/quansight/ibis-project/ibis/ibis/bigquery/udf/core.py", line 146, in visit
    result = method(node)

  File "/home/xmn/dev/quansight/ibis-project/ibis/ibis/bigquery/udf/core.py", line 413, in visit_Module
    return '\n\n'.join(map(self.visit, node.body))

  File "/home/xmn/dev/quansight/ibis-project/ibis/ibis/bigquery/udf/core.py", line 146, in visit
    result = method(node)

  File "/home/xmn/dev/quansight/ibis-project/ibis/ibis/bigquery/udf/core.py", line 211, in visit_FunctionDef
    body = indent(map(self.visit, node.body))

  File "/home/xmn/dev/quansight/ibis-project/ibis/ibis/bigquery/udf/core.py", line 49, in indent
    text = '\n'.join(lines)

  File "/home/xmn/dev/quansight/ibis-project/ibis/ibis/bigquery/udf/core.py", line 146, in visit
    result = method(node)

  File "/home/xmn/dev/quansight/ibis-project/ibis/ibis/bigquery/udf/core.py", line 63, in wrapper
    return f(*args, **kwargs) + ';'

  File "/home/xmn/dev/quansight/ibis-project/ibis/ibis/bigquery/udf/core.py", line 232, in visit_Return
    return 'return {}'.format(self.visit(node.value))

  File "/home/xmn/dev/quansight/ibis-project/ibis/ibis/bigquery/udf/core.py", line 146, in visit
    result = method(node)

  File "/home/xmn/dev/quansight/ibis-project/ibis/ibis/bigquery/udf/core.py", line 273, in visit_BinOp
    self.visit(left), self.visit(op), self.visit(right)

  File "/home/xmn/dev/quansight/ibis-project/ibis/ibis/bigquery/udf/core.py", line 141, in visit
    raise NotImplementedError(

NotImplementedError: 'visit_Constant' nodes not yet implemented
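Since Python 3.8, the parser represents literals such as `2` or `'a'` as `ast.Constant` nodes instead of the older `ast.Num` and `ast.Str`, so a visitor-based translator needs a `visit_Constant` method. The sketch below is a minimal stand-alone translator showing that idea; `TinyTranslator` is a hypothetical class for this example and is much smaller than the real `PythonToJavaScriptTranslator`.

```python
import ast
import json


class TinyTranslator(ast.NodeVisitor):
    """Translate a tiny subset of Python expressions to JavaScript source."""

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_Constant(self, node):
        # On Python 3.8+ all literals arrive here.
        value = node.value
        if value is None:
            return "null"
        if isinstance(value, bool):  # check bool before int: bool is an int subclass
            return "true" if value else "false"
        if isinstance(value, (int, float)):
            return repr(value)
        if isinstance(value, str):
            return json.dumps(value)
        raise NotImplementedError("unsupported constant: {!r}".format(value))

    def visit_Name(self, node):
        return node.id

    def visit_Add(self, node):
        return "+"

    def visit_Mult(self, node):
        return "*"

    def visit_BinOp(self, node):
        return "({} {} {})".format(
            self.visit(node.left), self.visit(node.op), self.visit(node.right)
        )


def translate(source: str) -> str:
    return TinyTranslator().visit(ast.parse(source, mode="eval"))


print(translate("x * 2 + 1.5"))
```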

No translation rule for date/datetime diff.

Subtracting two dates or two timestamps fails:

import ibis
import ibis_bigquery
from google.cloud import bigquery
import pandas as pd
import datetime as dt

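my_project = 'my-project'  # placeholder: your GCP project ID
my_dataset = 'my_dataset'  # placeholder: an existing dataset
my_table = 'my_table'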

# pdf = pd.DataFrame({'date': [dt.date(2020, 1, 1), dt.date(2021, 5, 15), dt.date(2021, 7, 9)]})

pdf = pd.DataFrame({'date': [dt.datetime(2020, 1, 1), dt.datetime(2021, 5, 15), dt.datetime(2021, 7, 9)]})

# Load client
client = bigquery.Client(project=my_project)

# Load data to BQ
client.delete_table(my_dataset + "." + my_table, not_found_ok=True)
job = client.load_table_from_dataframe(pdf, my_dataset + "." + my_table)
job.result()  # wait for the load job to finish before querying

ibis.options.interactive = False

conn = ibis_bigquery.connect(
    project_id=my_project,
    dataset_id=my_dataset)

t = conn.table(my_table)

e1 = t.mutate(z=t.date.sub(t.date))

print(ibis.bigquery.compile(e1))

And here is the exception:

Traceback (most recent call last):
  File "<input>", line 3, in <module>
  File "/Users/V3/windev/carrefour/gcp/venv/lib/python3.9/site-packages/ibis/backends/bigquery/__init__.py", line 38, in compile
    return to_sql(expr, dialect.make_context(params=params))
  File "/Users/V3/windev/carrefour/gcp/venv/lib/python3.9/site-packages/ibis/backends/bigquery/compiler.py", line 86, in to_sql
    compiled = query_ast.compile()
  File "/Users/V3/windev/carrefour/gcp/venv/lib/python3.9/site-packages/ibis/backends/base_sqlalchemy/compiler.py", line 55, in compile
    compiled_queries = [q.compile() for q in self.queries]
  File "/Users/V3/windev/carrefour/gcp/venv/lib/python3.9/site-packages/ibis/backends/base_sqlalchemy/compiler.py", line 55, in <listcomp>
    compiled_queries = [q.compile() for q in self.queries]
  File "/Users/V3/windev/carrefour/gcp/venv/lib/python3.9/site-packages/ibis/backends/base_sqlalchemy/compiler.py", line 1650, in compile
    select_frag = self.format_select_set()
  File "/Users/V3/windev/carrefour/gcp/venv/lib/python3.9/site-packages/ibis/backends/base_sqlalchemy/compiler.py", line 1705, in format_select_set
    expr_str = self._translate(expr, named=True)
  File "/Users/V3/windev/carrefour/gcp/venv/lib/python3.9/site-packages/ibis/backends/base_sqlalchemy/compiler.py", line 1598, in _translate
    return translator.get_result()
  File "/Users/V3/windev/carrefour/gcp/venv/lib/python3.9/site-packages/ibis/backends/base_sqlalchemy/compiler.py", line 1362, in get_result
    translated = self.translate(self.expr)
  File "/Users/V3/windev/carrefour/gcp/venv/lib/python3.9/site-packages/ibis/backends/base_sqlalchemy/compiler.py", line 1403, in translate
    raise com.OperationNotDefinedError(
ibis.common.exceptions.OperationNotDefinedError: No translation rule for <class 'ibis.expr.operations.TimestampDiff'>
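On the BigQuery side, the natural target for such a rule is `TIMESTAMP_DIFF(a, b, part)` (or `DATE_DIFF` for dates). The sketch below formats that call from already-compiled arguments. `timestamp_diff_sql` and its default unit are assumptions for illustration; this is not the rule that was eventually added to the backend.

```python
# Date parts that BigQuery's TIMESTAMP_DIFF accepts.
_TIMESTAMP_PARTS = {"MICROSECOND", "MILLISECOND", "SECOND", "MINUTE", "HOUR", "DAY"}


def timestamp_diff_sql(left_sql: str, right_sql: str, part: str = "SECOND") -> str:
    """Format a BigQuery TIMESTAMP_DIFF call (hypothetical helper)."""
    if part not in _TIMESTAMP_PARTS:
        raise ValueError("unsupported date part: {!r}".format(part))
    return "TIMESTAMP_DIFF({}, {}, {})".format(left_sql, right_sql, part)


print(timestamp_diff_sql("`date`", "`date`"))
```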

isort is failing

Run isort --check-only .
  isort --check-only .
  shell: /usr/bin/bash -e {0}
  env:
    pythonLocation: /opt/hostedtoolcache/Python/3.7.10/x64
    LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.7.10/x64/lib
ERROR: /home/runner/work/ibis-bigquery/ibis-bigquery/setup.py Imports are incorrectly sorted and/or formatted.
ERROR: /home/runner/work/ibis-bigquery/ibis-bigquery/ibis_bigquery/datatypes.py Imports are incorrectly sorted and/or formatted.
ERROR: /home/runner/work/ibis-bigquery/ibis-bigquery/ibis_bigquery/compiler.py Imports are incorrectly sorted and/or formatted.
ERROR: /home/runner/work/ibis-bigquery/ibis-bigquery/ibis_bigquery/client.py Imports are incorrectly sorted and/or formatted.
ERROR: /home/runner/work/ibis-bigquery/ibis-bigquery/ibis_bigquery/__init__.py Imports are incorrectly sorted and/or formatted.
ERROR: /home/runner/work/ibis-bigquery/ibis-bigquery/tests/test_datatypes.py Imports are incorrectly sorted and/or formatted.
ERROR: /home/runner/work/ibis-bigquery/ibis-bigquery/tests/test_connect.py Imports are incorrectly sorted and/or formatted.
ERROR: /home/runner/work/ibis-bigquery/ibis-bigquery/tests/conftest.py Imports are incorrectly sorted and/or formatted.
Skipped 2 files
ERROR: /home/runner/work/ibis-bigquery/ibis-bigquery/tests/test_compiler.py Imports are incorrectly sorted and/or formatted.
ERROR: /home/runner/work/ibis-bigquery/ibis-bigquery/tests/test_client.py Imports are incorrectly sorted and/or formatted.
ERROR: /home/runner/work/ibis-bigquery/ibis-bigquery/tests/udf/test_core.py Imports are incorrectly sorted and/or formatted.
ERROR: /home/runner/work/ibis-bigquery/ibis-bigquery/tests/udf/test_udf_execute.py Imports are incorrectly sorted and/or formatted.
ERROR: /home/runner/work/ibis-bigquery/ibis-bigquery/tests/udf/test_find.py Imports are incorrectly sorted and/or formatted.
Error: Process completed with exit code 1.

CI: use conda to create Python environments instead of GitHub Actions Python matrix

By using conda, we should be able to use an editable install without all the permissions issues we were encountering in #38. This should allow us to remove this hack:

# See https://github.com/pypa/pip/issues/7953
echo "import site
import sys
site.ENABLE_USER_SITE = '--user' in sys.argv[1:]
$(cat ./ibis/setup.py)" > ./ibis/setup.py

See: ibis-project/ibis#2738

document how to use ibis-bigquery package directly

The recommended way should be ibis.bigquery.connect() (or is it ibis.backends.bigquery.connect()?), but it should also be possible to use ibis_bigquery.client.BigQueryClient directly.

This should hopefully allow us to be compatible with more versions of Ibis and be able to release before Ibis 2.0.

Support for BIGNUMERIC type

Hey--I've been really enjoying using ibis with bigquery! I've noticed that for bigquery BIGNUMERIC columns, it throws this error when connecting...

SyntaxError: Type cannot be parsed: BIGNUMERIC

It looks like google-cloud-bigquery has configuration for BIGNUMERIC--does this mean that supporting it in this library requires these two changes?:

BigQuery: support for the BigQuery Storage API for faster reads of large query results

See: https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas

There is now an optional parameter on to_dataframe to pass in a BigQuery Storage client for faster reads of large query results. In pandas-gbq, we pass this client in when use_bqstorage_api=True (added in googleapis/python-bigquery-pandas#270). I imagine in Ibis, we'd want to set this option at the initial connection time, as another parameter to ibis.bigquery.connect.

If there's anything I can do on the google-cloud-bigquery package to make this easier, I'd be glad to hear it.

test with multiple versions of ibis

We may explicitly decide to only support ibis-framework 2.0+, but even if so we'll want to test against more than just the version from GitHub on our CI.

ENH: Persistent UDFs for BigQuery

It's very nice to have a Python interface for defining UDFs that are compiled to JavaScript as temporary functions.
It would be even more useful if these UDFs could be registered as persistent UDFs (which can be used in logical views).
This would allow data scientists without JavaScript skills to inject basic Python logic for common tasks like string parsing into logical views.
CC: @tswast

BigQuery: groupby can generate LEFT SEMI JOIN, which is not supported syntax

Failing test

https://github.com/ibis-project/ibis/blob/a70d443c7931cb8bb47c52f97999589566e03cb2/ibis/tests/all/test_aggregation.py#L266-L293

Test output

$ pytest ibis/tests/all/test_aggregation.py::test_topk_filter_op[BigQuery-string_col_filter_top3]

Output:

======================================================= test session starts =======================================================
platform darwin -- Python 3.7.8, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /Users/swast/src/ibis, inifile: setup.cfg
plugins: forked-1.2.0, mock-3.1.1, cov-2.10.0, xdist-1.34.0
collected 1 item                                                                                                                  

ibis/tests/all/test_aggregation.py F                                                                                        [100%]

============================================================ FAILURES =============================================================
______________________________________ test_topk_filter_op[BigQuery-string_col_filter_top3] _______________________________________

backend = <ibis.tests.backends.BigQuery object at 0x7f818a7bb710>
alltypes = BigQueryTable[table]
  name: swast-scratch.testing.functional_alltypes
  schema:
    index : int64
    Unnamed_0 : int...4
    date_string_col : string
    string_col : string
    timestamp_col : timestamp
    year : int64
    month : int64
df =       index  Unnamed_0    id  bool_col  tinyint_col  ...  date_string_col  string_col           timestamp_col  year  m...    False            9  ...         01/31/10           9 2010-01-31 05:09:13.860  2010      1

[7300 rows x 15 columns]
result_fn = <function <lambda> at 0x7f818a803560>, expected_fn = <function <lambda> at 0x7f818a8035f0>

    @pytest.mark.parametrize(
        ('result_fn', 'expected_fn'),
        [
            param(
                lambda t: t[t.string_col.topk(3)],
                lambda t: t[
                    t.string_col.isin(
                        t.groupby('string_col')['string_col'].count().head(3).index
                    )
                ],
                id='string_col_filter_top3',
            )
        ],
    )
    @pytest.mark.xfail_unsupported
    # Issues ibis-project/ibis#2133 ibis-project/ibis#2132# ibis-project/ibis#2133
    @pytest.mark.xfail_backends([Clickhouse, MySQL, Postgres])
    @pytest.mark.skip_backends([SQLite], reason='Issue ibis-project/ibis#2128')
    def test_topk_filter_op(backend, alltypes, df, result_fn, expected_fn):
        # TopK expression will order rows by "count" but each backend
        # can have different result for that.
        # Note: Maybe would be good if TopK could order by "count"
        # and the field used by TopK
        t = alltypes.sort_by(alltypes.string_col)
        df = df.sort_values('string_col')
>       result = result_fn(t).execute()

ibis/tests/all/test_aggregation.py:291: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
ibis/expr/types.py:219: in execute
    self, limit=limit, timecontext=timecontext, params=params, **kwargs
ibis/client.py:368: in execute
    return backend.execute(expr, limit=limit, params=params, **kwargs)
ibis/client.py:221: in execute
    result = self._execute_query(query, **kwargs)
ibis/client.py:228: in _execute_query
    return query.execute()
ibis/bigquery/client.py:194: in execute
    query_parameters=self.query_parameters,
ibis/bigquery/client.py:475: in _execute
    query.result()  # blocks until finished
../../miniconda3/envs/ibis-dev/lib/python3.7/site-packages/google/cloud/bigquery/job.py:3207: in result
    super(QueryJob, self).result(retry=retry, timeout=timeout)
../../miniconda3/envs/ibis-dev/lib/python3.7/site-packages/google/cloud/bigquery/job.py:812: in result
    return super(_AsyncJob, self).result(timeout=timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <google.cloud.bigquery.job.QueryJob object at 0x7f818aa0f110>, timeout = None

    def result(self, timeout=None):
        """Get the result of the operation, blocking if necessary.
    
        Args:
            timeout (int):
                How long (in seconds) to wait for the operation to complete.
                If None, wait indefinitely.
    
        Returns:
            google.protobuf.Message: The Operation's result.
    
        Raises:
            google.api_core.GoogleAPICallError: If the operation errors or if
                the timeout is reached before the operation completes.
        """
        self._blocking_poll(timeout=timeout)
    
        if self._exception is not None:
            # pylint: disable=raising-bad-type
            # Pylint doesn't recognize that this is valid in this case.
>           raise self._exception
E           google.api_core.exceptions.BadRequest: 400 Syntax error: Expected keyword JOIN but got identifier "SEMI" at [7:8]
E           
E           (job ID: fa67b0d0-eae0-4c80-9fff-3a9ece611d55)
E           
E                          -----Query Job SQL Follows-----                
E           
E               |    .    |    .    |    .    |    .    |    .    |
E              1:SELECT t0.*
E              2:FROM (
E              3:  SELECT *
E              4:  FROM `swast-scratch.testing.functional_alltypes`
E              5:  ORDER BY `string_col`
E              6:) t0
E              7:  LEFT SEMI JOIN (
E              8:    SELECT *
E              9:    FROM (
E             10:      SELECT `string_col`, count(`string_col`) AS `count`
E             11:      FROM `swast-scratch.testing.functional_alltypes`
E             12:      GROUP BY 1
E             13:      ORDER BY `string_col`
E             14:    ) t2
E             15:    ORDER BY `count` DESC
E             16:    LIMIT 3
E             17:  ) t1
E             18:    ON t0.`string_col` = t1.`string_col`
E             19:LIMIT 10000
E               |    .    |    .    |    .    |    .    |    .    |

../../miniconda3/envs/ibis-dev/lib/python3.7/site-packages/google/api_core/future/polling.py:130: BadRequest
======================================================== warnings summary =========================================================
ibis/tests/all/test_aggregation.py::test_topk_filter_op[BigQuery-string_col_filter_top3]
  /Users/swast/src/ibis/ibis/bigquery/client.py:545: PendingDeprecationWarning: Client.dataset is deprecated and will be removed in a future version. Use a string like 'my_project.my_dataset' or a cloud.google.bigquery.DatasetReference object, instead.
    table_ref = self.client.dataset(dataset, project=project).table(name)

ibis/tests/all/test_aggregation.py::test_topk_filter_op[BigQuery-string_col_filter_top3]
  /Users/swast/src/ibis/ibis/bigquery/client.py:432: PendingDeprecationWarning: Client.dataset is deprecated and will be removed in a future version. Use a string like 'my_project.my_dataset' or a cloud.google.bigquery.DatasetReference object, instead.
    dataset_ref = self.client.dataset(dataset, project=project)

-- Docs: https://docs.pytest.org/en/latest/warnings.html
===================================================== short test summary info =====================================================
FAILED ibis/tests/all/test_aggregation.py::test_topk_filter_op[BigQuery-string_col_filter_top3] - google.api_core.exceptions.Bad...
================================================== 1 failed, 2 warnings in 4.02s ==================================================

BigQuery tests failing

The BigQuery build finally seems to be working correctly, but some of the tests are failing:

FAILED ibis/tests/all/test_aggregation.py::test_reduction_ops[BigQuery-no_cond-covar]
FAILED ibis/tests/all/test_aggregation.py::test_reduction_ops[BigQuery-is_in-covar]
FAILED ibis/tests/all/test_aggregation.py::test_topk_filter_op[BigQuery-string_col_filter_top3]
FAILED ibis/tests/all/test_array.py::test_array_concat[BigQuery] - ValueError...
FAILED ibis/tests/all/test_generic.py::test_fillna_nullif[BigQuery-expr2-None]
FAILED ibis/tests/all/test_numeric.py::test_complex_math_functions_columns[BigQuery-ln]
FAILED ibis/tests/all/test_numeric.py::test_complex_math_functions_columns[BigQuery-log10]
FAILED ibis/tests/all/test_string.py::test_string[BigQuery-split] - ValueErro...
FAILED ibis/tests/all/test_temporal.py::test_timestamp_extract[BigQuery-epoch_seconds]
FAILED ibis/bigquery/tests/test_client.py::test_scalar_param_array - ValueErr...
FAILED ibis/bigquery/tests/test_client.py::test_scalar_param_nested - Asserti...
FAILED ibis/bigquery/tests/test_client.py::test_cross_project_query - google....
FAILED ibis/bigquery/tests/test_client.py::test_approx_median - assert 6 == 7.0
FAILED ibis/bigquery/tests/test_compiler.py::test_cov - AssertionError: asser...

See https://github.com/ibis-project/ibis/runs/1067434574?check_suite_focus=true#step:5:1981

@tswast do you want to have a look?

document CONTRIBUTING steps

We're using "conventional commits" and "release-please" to manage the CHANGELOG here, so commit subjects and PR titles need to be user-facing.

Some notes I've written from another project that may be useful to include:

Conventional Commits

This project uses Conventional Commits to manage the CHANGELOG and releases.

Allowed commit prefixes are defined in the release-please source code:

User-facing commits

  • feat: section: 'Features'
  • fix: section: 'Bug Fixes'
  • perf: section: 'Performance Improvements'
  • deps: section: 'Dependencies'
  • revert: section: 'Reverts'
  • docs: section: 'Documentation'

Hidden commits (not shown in CHANGELOG)

  • style: section: 'Styles', hidden: true
  • chore: section: 'Miscellaneous Chores', hidden: true
  • refactor: section: 'Code Refactoring', hidden: true
  • test: section: 'Tests', hidden: true
  • build: section: 'Build System', hidden: true
  • ci: section: 'Continuous Integration', hidden: true
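As a rough illustration of how the prefixes above map to CHANGELOG sections, here is a minimal Python sketch. The table is copied from the two lists; the function name `changelog_section` and the regular expression are assumptions made for this example and are not taken from release-please itself. The optional `(scope)` and `!` parts follow the general Conventional Commits subject format.

```python
import re

# Prefix -> (CHANGELOG section, hidden?), copied from the lists above.
COMMIT_TYPES = {
    "feat": ("Features", False),
    "fix": ("Bug Fixes", False),
    "perf": ("Performance Improvements", False),
    "deps": ("Dependencies", False),
    "revert": ("Reverts", False),
    "docs": ("Documentation", False),
    "style": ("Styles", True),
    "chore": ("Miscellaneous Chores", True),
    "refactor": ("Code Refactoring", True),
    "test": ("Tests", True),
    "build": ("Build System", True),
    "ci": ("Continuous Integration", True),
}

# Hypothetical pattern for "type(scope)!: summary"; scope and "!" are optional.
_SUBJECT_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<summary>.+)$"
)


def changelog_section(subject):
    """Return the CHANGELOG section for a commit subject, or None if hidden."""
    match = _SUBJECT_RE.match(subject)
    if match is None or match.group("type") not in COMMIT_TYPES:
        raise ValueError(f"not a recognized conventional commit subject: {subject!r}")
    section, hidden = COMMIT_TYPES[match.group("type")]
    return None if hidden else section


print(changelog_section("feat: add PIVOT support"))
print(changelog_section("chore: bump version"))
```

A check like this could run on PR titles before merge, so that a subject without a known prefix is caught early instead of silently dropping out of the CHANGELOG.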

Register pivot

Could someone point me to an example of registering a new BigQuery operator with Ibis?

More specifically, I would like to add the new pivot operator.

I imagine it being called like this:

table = con.table('table')
pivot = table.pivot(index='family', column={'type': ['fire', 'water']}, values=['sum', 'strength'])

And I would expect it to produce something like:

SELECT * FROM
  (SELECT * FROM `table`)
  PIVOT(SUM(strength) FOR type IN ('fire', 'water'))

I had a quick look at compiler.py but it isn't clear to me how to achieve this.
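To make the target output concrete, here is a small standalone Python sketch that builds the PIVOT statement as a string. It does not use or extend any Ibis API: `build_pivot_sql` and `quote_string` are hypothetical helpers invented for this illustration, and a real implementation would have to be wired into the compiler's operation registry instead. The `index` argument from the proposed call is not modeled here.

```python
def quote_string(value):
    """Render a Python string as a single-quoted SQL string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_pivot_sql(table, aggregate, value_column, pivot_column, pivot_values):
    """Build a PIVOT query string (hypothetical helper, not part of Ibis).

    Identifiers are interpolated as-is, so this is only safe for trusted input.
    """
    in_list = ", ".join(quote_string(v) for v in pivot_values)
    return (
        "SELECT * FROM\n"
        f"  (SELECT * FROM `{table}`)\n"
        f"  PIVOT({aggregate.upper()}({value_column}) "
        f"FOR {pivot_column} IN ({in_list}))"
    )


sql = build_pivot_sql("table", "sum", "strength", "type", ["fire", "water"])
print(sql)
```

Joining the values with `", ".join(...)` avoids emitting a trailing comma inside the `IN (...)` list.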

flake8 is failing

Run flake8 .
./ibis_bigquery/udf/__init__.py:19:16: E203 whitespace before ':'
./ibis_bigquery/udf/__init__.py:19:80: E501 line too long (85 > 79 characters)
./tests/unit/test_client.py:10:80: E501 line too long (81 > 79 characters)
./tests/udf/test_core.py:6:1: F401 'ibis.compat.PY38' imported but unused
./tests/udf/test_udf_execute.py:8:1: F401 'ibis.compat.PY38' imported but unused
./tests/udf/test_find.py:3:1: F401 'pytest' imported but unused
./tests/udf/test_find.py:4:1: F401 'ibis.compat.PY38' imported but unused
Error: Process completed with exit code 1.

pydocstyle is failing

Run pydocstyle .
  pydocstyle .
  shell: /usr/bin/bash -e {0}
  env:
    pythonLocation: /opt/hostedtoolcache/Python/3.7.10/x64
    LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.7.10/x64/lib
./ibis_bigquery/datatypes.py:1 at module level:
        D100: Missing docstring in public module
./ibis_bigquery/datatypes.py:7 in public class `TypeTranslationContext`:
        D205: 1 blank line required between summary line and description (found 0)
./ibis_bigquery/datatypes.py:7 in public class `TypeTranslationContext`:
        D400: First line should end with a period (not 's')
./ibis_bigquery/datatypes.py:20 in public class `UDFContext`:
        D101: Missing docstring in public class
./ibis_bigquery/datatypes.py:28 in public function `trans_string_default`:
        D103: Missing docstring in public function
./ibis_bigquery/datatypes.py:33 in public function `trans_default`:
        D103: Missing docstring in public function
./ibis_bigquery/datatypes.py:38 in public function `trans_string_context`:
        D103: Missing docstring in public function
./ibis_bigquery/datatypes.py:43 in public function `trans_float64`:
        D103: Missing docstring in public function
./ibis_bigquery/datatypes.py:48 in public function `trans_integer`:
        D103: Missing docstring in public function
./ibis_bigquery/datatypes.py:53 in public function `trans_binary`:
        D103: Missing docstring in public function
./ibis_bigquery/datatypes.py:60 in public function `trans_lossy_integer`:
        D103: Missing docstring in public function
./ibis_bigquery/datatypes.py:67 in public function `trans_array`:
        D103: Missing docstring in public function
./ibis_bigquery/datatypes.py:74 in public function `trans_struct`:
        D103: Missing docstring in public function
./ibis_bigquery/datatypes.py:86 in public function `trans_date`:
        D103: Missing docstring in public function
./ibis_bigquery/datatypes.py:91 in public function `trans_timestamp`:
        D103: Missing docstring in public function
./ibis_bigquery/datatypes.py:98 in public function `trans_type`:
        D103: Missing docstring in public function
./ibis_bigquery/datatypes.py:103 in public function `trans_integer_udf`:
        D103: Missing docstring in public function
./ibis_bigquery/datatypes.py:114 in public function `trans_numeric`:
        D103: Missing docstring in public function
./ibis_bigquery/datatypes.py:124 in public function `trans_numeric_udf`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:1 at module level:
        D100: Missing docstring in public module
./ibis_bigquery/compiler.py:33 in public function `build_ast`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:38 in public class `BigQueryUDFNode`:
        D101: Missing docstring in public class
./ibis_bigquery/compiler.py:42 in public class `BigQuerySelectBuilder`:
        D101: Missing docstring in public class
./ibis_bigquery/compiler.py:48 in public class `BigQueryUDFDefinition`:
        D101: Missing docstring in public class
./ibis_bigquery/compiler.py:53 in public method `compile`:
        D102: Missing docstring in public method
./ibis_bigquery/compiler.py:57 in public class `BigQueryUnion`:
        D101: Missing docstring in public class
./ibis_bigquery/compiler.py:59 in public method `keyword`:
        D102: Missing docstring in public method
./ibis_bigquery/compiler.py:63 in public function `find_bigquery_udf`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:71 in public class `BigQueryQueryBuilder`:
        D101: Missing docstring in public class
./ibis_bigquery/compiler.py:76 in public method `generate_setup_queries`:
        D102: Missing docstring in public method
./ibis_bigquery/compiler.py:89 in public class `BigQueryContext`:
        D101: Missing docstring in public class
./ibis_bigquery/compiler.py:113 in public function `bigquery_cast_timestamp_to_integer`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:118 in public function `bigquery_cast_generate`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:436 in public class `BigQueryExprTranslator`:
        D101: Missing docstring in public class
./ibis_bigquery/compiler.py:454 in public function `bigquery_day_of_week_index`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:461 in public function `bigquery_day_of_week_name`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:467 in public function `bigquery_compiles_divide`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:472 in public function `compiles_strftime`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:491 in public function `compiles_string_to_timestamp`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:503 in public class `BigQueryTableSetFormatter`:
        D101: Missing docstring in public class
./ibis_bigquery/compiler.py:510 in public class `BigQuerySelect`:
        D101: Missing docstring in public class
./ibis_bigquery/compiler.py:515 in public method `table_set_formatter`:
        D102: Missing docstring in public method
./ibis_bigquery/compiler.py:520 in public function `identical_to`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:526 in public function `log2`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:532 in public function `bq_sum`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:542 in public function `bq_mean`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:555 in public function `compiles_timestamp_from_unix`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:561 in public function `compiles_floor`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:568 in public function `compiles_approx`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:582 in public function `compiles_covar`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:610 in public function `bigquery_any_all_no_op`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:615 in public function `bigquery_compile_any`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:620 in public function `bigquery_compile_notany`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:627 in public function `bigquery_compile_all`:
        D103: Missing docstring in public function
./ibis_bigquery/compiler.py:632 in public function `bigquery_compile_notall`:
        D103: Missing docstring in public function
./ibis_bigquery/client.py:170 in public class `BigQueryQuery`:
        D101: Missing docstring in public class
./ibis_bigquery/client.py:190 in public method `execute`:
        D102: Missing docstring in public method
./ibis_bigquery/client.py:210 in public function `bq_param_struct`:
        D103: Missing docstring in public function
./ibis_bigquery/client.py:217 in public function `bq_param_array`:
        D103: Missing docstring in public function
./ibis_bigquery/client.py:245 in public function `bq_param_timestamp`:
        D103: Missing docstring in public function
./ibis_bigquery/client.py:256 in public function `bq_param_string`:
        D103: Missing docstring in public function
./ibis_bigquery/client.py:261 in public function `bq_param_integer`:
        D103: Missing docstring in public function
./ibis_bigquery/client.py:266 in public function `bq_param_double`:
        D103: Missing docstring in public function
./ibis_bigquery/client.py:271 in public function `bq_param_boolean`:
        D103: Missing docstring in public function
./ibis_bigquery/client.py:276 in public function `bq_param_date_string`:
        D103: Missing docstring in public function
./ibis_bigquery/client.py:281 in public function `bq_param_date_datetime`:
        D103: Missing docstring in public function
./ibis_bigquery/client.py:286 in public function `bq_param_date`:
        D103: Missing docstring in public function
./ibis_bigquery/client.py:290 in public class `BigQueryTable`:
        D101: Missing docstring in public class
./ibis_bigquery/client.py:294 in public function `rename_partitioned_column`:
        D103: Missing docstring in public function
./ibis_bigquery/client.py:423 in public method `project_id`:
        D102: Missing docstring in public method
./ibis_bigquery/client.py:427 in public method `dataset_id`:
        D102: Missing docstring in public method
./ibis_bigquery/client.py:430 in public method `table`:
        D102: Missing docstring in public method
./ibis_bigquery/client.py:479 in public method `database`:
        D102: Missing docstring in public method
./ibis_bigquery/client.py:489 in public method `current_database`:
        D102: Missing docstring in public method
./ibis_bigquery/client.py:492 in public method `set_database`:
        D102: Missing docstring in public method
./ibis_bigquery/client.py:495 in public method `exists_database`:
        D102: Missing docstring in public method
./ibis_bigquery/client.py:506 in public method `list_databases`:
        D102: Missing docstring in public method
./ibis_bigquery/client.py:520 in public method `exists_table`:
        D102: Missing docstring in public method
./ibis_bigquery/client.py:532 in public method `list_tables`:
        D102: Missing docstring in public method
./ibis_bigquery/client.py:546 in public method `get_schema`:
        D102: Missing docstring in public method
./ibis_bigquery/client.py:553 in public method `version`:
        D102: Missing docstring in public method
./ibis_bigquery/__init__.py:39 in public class `Backend`:
        D101: Missing docstring in public class
./ibis_bigquery/__init__.py:143 in public method `register_options`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/rewrite.py:1 at module level:
        D100: Missing docstring in public module
./ibis_bigquery/udf/rewrite.py:46 in public method `register`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/rewrite.py:53 in public method `__call__`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:1 at module level:
        D200: One-line docstring should fit on one line with quotes (found 2)
./ibis_bigquery/udf/core.py:1 at module level:
        D400: First line should end with a period (not 't')
./ibis_bigquery/udf/core.py:27 in public method `__getitem__`:
        D105: Missing docstring in magic method
./ibis_bigquery/udf/core.py:70 in public function `rewrite_print`:
        D103: Missing docstring in public function
./ibis_bigquery/udf/core.py:83 in public function `rewrite_len`:
        D103: Missing docstring in public function
./ibis_bigquery/udf/core.py:89 in public function `rewrite_append`:
        D103: Missing docstring in public function
./ibis_bigquery/udf/core.py:100 in public function `rewrite_array_from`:
        D103: Missing docstring in public function
./ibis_bigquery/udf/core.py:108 in public class `PythonToJavaScriptTranslator`:
        D101: Missing docstring in public class
./ibis_bigquery/udf/core.py:133 in public method `compile`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:136 in public method `visit`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:150 in public method `visit_Name`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:155 in public method `visit_Yield`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:159 in public method `visit_YieldFrom`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:164 in public method `visit_Assign`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:189 in public method `translate_special_method`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:192 in public method `visit_FunctionDef`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:232 in public method `visit_Return`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:235 in public method `visit_Add`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:238 in public method `visit_Sub`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:241 in public method `visit_Mult`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:244 in public method `visit_Div`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:247 in public method `visit_FloorDiv`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:250 in public method `visit_Pow`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:253 in public method `visit_UnaryOp`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:256 in public method `visit_USub`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:259 in public method `visit_UAdd`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:262 in public method `visit_BinOp`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:277 in public method `visit_NameConstant`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:290 in public method `visit_Str`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:293 in public method `visit_Num`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:296 in public method `visit_List`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:299 in public method `visit_Tuple`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:303 in public method `visit_Dict`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:312 in public method `visit_Expr`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:315 in public method `visit_Starred`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:318 in public method `visit_Call`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:330 in public method `visit_Attribute`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:333 in public method `visit_For`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:344 in public method `visit_While`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:352 in public method `visit_Break`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:356 in public method `visit_Continue`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:359 in public method `visit_Eq`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:362 in public method `visit_NotEq`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:365 in public method `visit_Or`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:368 in public method `visit_And`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:371 in public method `visit_BoolOp`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:378 in public method `visit_Lt`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:381 in public method `visit_LtE`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:384 in public method `visit_Gt`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:387 in public method `visit_GtE`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:390 in public method `visit_Compare`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:406 in public method `visit_AugAssign`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:413 in public method `visit_Module`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:416 in public method `visit_arg`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:421 in public method `visit_arguments`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:428 in public method `visit_Lambda`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:435 in public method `local_scope`:
        D200: One-line docstring should fit on one line with quotes (found 2)
./ibis_bigquery/udf/core.py:443 in public method `visit_If`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:457 in public method `visit_IfExp`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:464 in public method `visit_Index`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:467 in public method `visit_Subscript`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:470 in public method `visit_ClassDef`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:486 in public method `visit_Not`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:490 in public method `visit_ListComp`:
        D400: First line should end with a period (not 'n')
./ibis_bigquery/udf/core.py:539 in public method `visit_Delete`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/core.py:553 in public function `my_func`:
        D103: Missing docstring in public function
./ibis_bigquery/udf/find.py:1 at module level:
        D100: Missing docstring in public module
./ibis_bigquery/udf/find.py:7 in public class `NameFinder`:
        D200: One-line docstring should fit on one line with quotes (found 2)
./ibis_bigquery/udf/find.py:12 in public method `find`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/find.py:27 in public method `find_Name`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/find.py:31 in public method `find_list`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/find.py:34 in public method `find_Call`:
        D102: Missing docstring in public method
./ibis_bigquery/udf/__init__.py:1 at module level:
        D104: Missing docstring in public package
./ibis_bigquery/udf/__init__.py:42 in public function `udf`:
        D400: First line should end with a period (not 'y')
./tests/conftest.py:1 at module level:
        D100: Missing docstring in public module
./tests/conftest.py:48 in public class `TestConf`:
        D101: Missing docstring in public class
./tests/conftest.py:54 in public method `connect`:
        D102: Missing docstring in public method
./tests/conftest.py:72 in public method `batting`:
        D102: Missing docstring in public method
./tests/conftest.py:76 in public method `awards_players`:
        D102: Missing docstring in public method
./tests/conftest.py:81 in public function `project_id`:
        D103: Missing docstring in public function
./tests/conftest.py:86 in public function `credentials`:
        D103: Missing docstring in public function
./tests/conftest.py:91 in public function `client`:
        D103: Missing docstring in public function
./tests/conftest.py:98 in public function `client2`:
        D103: Missing docstring in public function
./tests/conftest.py:105 in public function `alltypes`:
        D103: Missing docstring in public function
./tests/conftest.py:110 in public function `df`:
        D103: Missing docstring in public function
./tests/conftest.py:115 in public function `parted_alltypes`:
        D103: Missing docstring in public function
./tests/conftest.py:120 in public function `parted_df`:
        D103: Missing docstring in public function
./tests/conftest.py:125 in public function `struct_table`:
        D103: Missing docstring in public function
./tests/conftest.py:130 in public function `numeric_table`:
        D103: Missing docstring in public function
./tests/conftest.py:135 in public function `public`:
        D103: Missing docstring in public function
./tests/__init__.py:1 at module level:
        D104: Missing docstring in public package
./tests/udf/__init__.py:1 at module level:
        D104: Missing docstring in public package
Error: Process completed with exit code 1.

[BigQuery] `test_array_concat` is failing

Failing test

https://github.com/ibis-project/ibis/blob/a70d443c7931cb8bb47c52f97999589566e03cb2/ibis/tests/all/test_array.py#L11-L15

Test output

$ pytest ibis/tests/all/test_array.py::test_array_concat[BigQuery]
=============================== test session starts ===============================
platform darwin -- Python 3.7.8, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /Users/swast/src/ibis, inifile: setup.cfg
plugins: forked-1.2.0, mock-3.1.1, cov-2.10.0, xdist-1.34.0
collected 1 item                                                                  

ibis/tests/all/test_array.py F                                              [100%]

==================================== FAILURES =====================================
___________________________ test_array_concat[BigQuery] ___________________________

backend = <ibis.tests.backends.BigQuery object at 0x7fc3cbea21d0>
con = <ibis.bigquery.client.BigQueryClient object at 0x7fc3cbea2850>

    @pytest.mark.xfail_unsupported
    @pytest.mark.skip_missing_feature(
        ['supports_arrays', 'supports_arrays_outside_of_select']
    )
    # Issues #
    #@pytest.mark.xfail_backends([BigQuery])
    def test_array_concat(backend, con):
        left = ibis.literal([1, 2, 3])
        right = ibis.literal([2, 1])
        expr = left + right
        result = con.execute(expr)
>       assert result == [1, 2, 3, 2, 1]
E       ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

ibis/tests/all/test_array.py:18: ValueError
============================= short test summary info =============================
FAILED ibis/tests/all/test_array.py::test_array_concat[BigQuery] - ValueError: T...
================================ 1 failed in 1.33s ================================

Thoughts on fix

I suspect the query is returning correct results, but arrays are now coming back as numpy arrays. This is probably because google-cloud-bigquery is now using Arrow as an intermediate format before converting to a DataFrame.

Possible fixes:

  • If a list object is required, add some additional conversion logic to convert arrays to lists.
  • If a numpy object is okay, update the test to use a more general elementwise comparison.
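The first option could look roughly like the sketch below. The helper name `to_plain_list` is an assumption made for this example, not existing Ibis code. It relies on the `tolist()` method that numpy arrays provide; the standard-library `array.array` type, which has the same method, is used as a stand-in so the snippet runs without numpy installed.

```python
from array import array


def to_plain_list(value):
    """Return ``value`` as a plain list, using ``tolist()`` when it exists."""
    tolist = getattr(value, "tolist", None)
    return tolist() if callable(tolist) else list(value)


# Stand-in for the array-like object the backend might return.
result = array("q", [1, 2, 3, 2, 1])

# Comparing two plain lists yields a single bool, so the assert is unambiguous.
assert to_plain_list(result) == [1, 2, 3, 2, 1]
```

For the second option, the test would instead compare elementwise and reduce the result to a single boolean before asserting.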
