Comments (10)

nicor88 commented on May 14, 2024

@rumbin this should be fixed in v1.4.0. Give it a shot; if it still doesn't work, let us know.

from dbt-athena.

nicor88 commented on May 14, 2024

@rumbin could you attach the name of the file that is raising this exception? It should be available in the exception raised by the adapter.

Possible change affecting this could be -> https://github.com/dbt-athena/dbt-athena/pull/88/files#diff-629c67ee6aeee24555537b786b7560cbf4d17496c1388aa932de34081c76f668R178-R188

rumbin commented on May 14, 2024

@nicor88

16:28:46.345016 [debug] [Thread-1  ]: Began running node model.bumblebee.health_errors
16:28:46.345295 [info ] [Thread-1  ]: 1 of 1 START sql table model dbt_dev_philipp.health_errors ..................... [RUN]
16:28:46.345760 [debug] [Thread-1  ]: Acquiring new athena connection "model.bumblebee.health_errors"
16:28:46.346020 [debug] [Thread-1  ]: Began compiling node model.bumblebee.health_errors
16:28:46.346193 [debug] [Thread-1  ]: Compiling model.bumblebee.health_errors
16:28:46.350230 [debug] [Thread-1  ]: Writing injected SQL for node "model.bumblebee.health_errors"
16:28:46.350664 [debug] [Thread-1  ]: finished collecting timing info
16:28:46.350833 [debug] [Thread-1  ]: Began executing node model.bumblebee.health_errors
16:28:46.361999 [debug] [Thread-1  ]: Opening a new connection, currently in state closed
16:28:46.603291 [debug] [Thread-1  ]: finished collecting timing info
16:28:46.603554 [debug] [Thread-1  ]: On model.bumblebee.health_errors: Close
16:28:46.604141 [error] [Thread-1  ]: Unhandled error while executing model.bumblebee.health_errors
Parameter validation failed:
Invalid bucket name "": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]*:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
16:28:46.604332 [debug] [Thread-1  ]: 
Traceback (most recent call last):
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/task/base.py", line 385, in safe_run
    result = self.compile_and_execute(manifest, ctx)
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/task/base.py", line 338, in compile_and_execute
    result = self.run(ctx.node, manifest)
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/task/base.py", line 429, in run
    return self.execute(compiled_node, manifest)
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/task/run.py", line 281, in execute
    result = MacroGenerator(
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/clients/jinja.py", line 326, in __call__
    return self.call_macro(*args, **kwargs)
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/clients/jinja.py", line 253, in call_macro
    return macro(*args, **kwargs)
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jinja2/runtime.py", line 763, in __call__
    return self._invoke(arguments, autoescape)
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jinja2/runtime.py", line 777, in _invoke
    rv = self._func(*arguments)
  File "<template>", line 55, in macro
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jinja2/sandbox.py", line 393, in call
    return __context.call(__obj, *args, **kwargs)
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jinja2/runtime.py", line 298, in call
    return __obj(*args, **kwargs)
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/clients/jinja.py", line 326, in __call__
    return self.call_macro(*args, **kwargs)
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/clients/jinja.py", line 253, in call_macro
    return macro(*args, **kwargs)
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jinja2/runtime.py", line 763, in __call__
    return self._invoke(arguments, autoescape)
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jinja2/runtime.py", line 777, in _invoke
    rv = self._func(*arguments)
  File "<template>", line 22, in macro
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jinja2/sandbox.py", line 393, in call
    return __context.call(__obj, *args, **kwargs)
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jinja2/runtime.py", line 298, in call
    return __obj(*args, **kwargs)
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/adapters/athena/impl.py", line 131, in clean_up_table
    self._delete_from_s3(client, s3_location)
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/adapters/athena/impl.py", line 156, in _delete_from_s3
    if self._s3_path_exists(client, bucket_name, prefix):
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/adapters/athena/impl.py", line 193, in _s3_path_exists
    response = client.session.client(
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/botocore/client.py", line 919, in _make_api_call
    request_dict = self._convert_to_request_dict(
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/botocore/client.py", line 987, in _convert_to_request_dict
    api_params = self._emit_api_params(
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/botocore/client.py", line 1026, in _emit_api_params
    self.meta.events.emit(
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/botocore/hooks.py", line 412, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/botocore/hooks.py", line 256, in emit
    return self._emit(event_name, kwargs)
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/botocore/hooks.py", line 239, in _emit
    response = handler(**kwargs)
  File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/botocore/handlers.py", line 285, in validate_bucket_name
    raise ParamValidationError(report=error_msg)
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid bucket name "": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]*:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
16:28:46.610316 [debug] [Thread-1  ]: Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '4abb105a-6d2c-454d-986f-1a01c4621093', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11d192eb0>]}
16:28:46.610728 [error] [Thread-1  ]: 1 of 1 ERROR creating sql table model dbt_dev_philipp.health_errors ............ [ERROR in 0.26s]
16:28:46.611110 [debug] [Thread-1  ]: Finished running node model.bumblebee.health_errors
16:28:46.612337 [debug] [MainThread]: Acquiring new athena connection "master"
16:28:46.612839 [info ] [MainThread]: 
16:28:46.612998 [info ] [MainThread]: Finished running 1 table model in 0 hours 0 minutes and 2.48 seconds (2.48s).
16:28:46.613157 [debug] [MainThread]: Connection 'master' was properly closed.
16:28:46.613311 [debug] [MainThread]: Connection 'model.bumblebee.health_errors' was properly closed.
16:28:46.706722 [info ] [MainThread]: 
16:28:46.706891 [info ] [MainThread]: Completed with 1 error and 0 warnings:
16:28:46.707026 [info ] [MainThread]: 
16:28:46.707158 [error] [MainThread]: Parameter validation failed:
16:28:46.707277 [error] [MainThread]: Invalid bucket name "": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]*:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"

nicor88 commented on May 14, 2024

@rumbin the issue is inside _delete_from_s3, which was added in 1.3.4. That function calls _parse_s3_path and passes the result to _s3_path_exists. Judging from your full error trace, the bucket name that comes back is empty.
This means that the location property for your model is somehow not configured correctly.
I'm currently using 1.3.4 in production without issues.

I suspect a misconfiguration in your model; your profiles.yml looks right.

Could you share the config that you use in your model?
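
To illustrate how an empty bucket name can come out of the parsing step, here is a minimal sketch. It is not the adapter's actual _parse_s3_path; parse_s3_path below is a hypothetical stand-in built on the standard library, and the bucket and prefix names are made up. It only shows that a well-formed S3 URI yields a bucket, while an empty location string yields the empty bucket name that botocore rejects.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_s3_path(s3_path: str) -> Tuple[str, str]:
    """Hypothetical stand-in for the adapter's helper: split an S3 URI
    into (bucket_name, prefix). Not the real dbt-athena implementation."""
    parsed = urlparse(s3_path)
    bucket_name = parsed.netloc          # "" when the input has no s3://bucket part
    prefix = parsed.path.lstrip("/")     # key prefix without the leading slash
    return bucket_name, prefix


# A well-formed location gives a usable bucket name (names are illustrative).
print(parse_s3_path("s3://my-bucket/tables/health_errors/"))

# An empty location gives an empty bucket name, which would fail
# botocore's bucket-name validation as in the traceback above.
print(parse_s3_path(""))
```

If the location handed to the delete step is empty for any reason, the bucket name is empty too, regardless of how s3_data_dir is configured.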

rumbin commented on May 14, 2024

@nicor88

The model config is very standard:

{{
    config(
        materialized='table'
        , tags=['daily']
    )
}}

The folder/schema config in dbt_project.yml is also nothing special:

models:
  +incremental_strategy: "insert_overwrite"
  bumblebee:
    level_1:
      schema: data_warehouse_l1
      +tags: l1
    level_2:
      schema: data_warehouse_l2
      +tags: l2
      +meta:
        # BI integration settings (Superset): override in model YAMLs, if needed
        model_maturity: high
        certification:
          certified_by: Business Intelligence Team
          details: dbt-managed Level 2 (L2) model
        owners:
          # User IDs of Superset's internal database:
          - 5 # Philipp

The only thing I can think of that might interfere is this override macro:

{% macro generate_schema_name(schema_name, node) -%}

    {%- set default_schema = target.schema -%}
    {%- if target.name == 'prod' and schema_name is not none -%}

        {{ schema_name | trim }}

    {%- elif var('ci_schema', 'dummy') != 'dummy' -%}

        {{ var('ci_schema') | trim }}

    {%- else -%}

        {{ default_schema }}

    {%- endif -%}

{%- endmacro %}

nicor88 commented on May 14, 2024

@rumbin I still haven't spotted the bug. I have a similar profile to yours (it is very standard).

As you are not specifying an external location per model, s3_data_dir will be used...

I ran the _parse_s3_path function on the path that you use as data_dir, and it returned a correct bucket name...

nihakue commented on May 14, 2024

I was facing this issue as well when changing the materialization of a model from 'view' to 'table'. You may be doing the same thing.

The issue is in clean_up_table in impl.py. On line 124 you call table = glue_client.get_table, and then later:

if table is not None:
    s3_location = table["Table"]["StorageDescriptor"]["Location"]
    self._delete_from_s3(client, s3_location)

The problem is that get_table also returns a response for a view, but with an empty 'Location'.

See for example:

(Pdb) p table["Table"]["StorageDescriptor"]
{'Columns': [{'Name': 'impressionid', 'Type': 'string'}, {'Name': 'servertimestamp', 'Type': 'timestamp'}, {'Name': 'devicetype', 'Type': 'string'}, {'Name': 'uid', 'Type': 'string'}, {'Name': 'metadata', 'Type': 'string'}, {'Name': 'os', 'Type': 'string'}, {'Name': 'persistedat', 'Type': 'timestamp'}, {'Name': 'browser', 'Type': 'string'}, {'Name': 'name', 'Type': 'string'}, {'Name': 'pageurl', 'Type': 'string'}, {'Name': 'useragent', 'Type': 'string'}, {'Name': 'id', 'Type': 'string'}, {'Name': 'snapshot_timestamp', 'Type': 'string'}], 'Location': '', 'Compressed': False, 'NumberOfBuckets': 0, 'SerdeInfo': {}, 'SortColumns': [], 'StoredAsSubDirectories': False}
(Pdb) p table["Table"]["StorageDescriptor"]["Location"]
''

Should be easy enough to delete the view manually and then run again, but it would be good if clean_up_table was view aware!
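
A rough sketch of what a view-aware check could look like, assuming the Glue get_table response marks views with TableType "VIRTUAL_VIEW". The helper name s3_location_to_clean is hypothetical and this is not the fix that was actually merged; it only decides whether there is anything on S3 to delete.

```python
from typing import Optional


def s3_location_to_clean(table_response: Optional[dict]) -> Optional[str]:
    """Hypothetical helper: return the S3 location to delete, or None
    when there is nothing to clean up (no table, a view, or no location)."""
    if table_response is None:
        return None
    table = table_response.get("Table", {})
    # Assumption: Glue reports Athena views with TableType "VIRTUAL_VIEW".
    if table.get("TableType") == "VIRTUAL_VIEW":
        return None
    # Views (and some malformed entries) carry an empty Location string,
    # so treat "" the same as "no location".
    location = table.get("StorageDescriptor", {}).get("Location", "")
    return location or None


view = {"Table": {"TableType": "VIRTUAL_VIEW", "StorageDescriptor": {"Location": ""}}}
table = {"Table": {"TableType": "EXTERNAL_TABLE",
                   "StorageDescriptor": {"Location": "s3://my-bucket/tables/t/"}}}

print(s3_location_to_clean(view))   # nothing to delete for a view
print(s3_location_to_clean(table))  # location that would be passed to the delete step
```

With a guard like this, the delete step would only run when a non-empty location is returned, so a leftover view of the same name would no longer trigger the bucket-name validation error.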

nicor88 commented on May 14, 2024

@nihakue Nice hint. We could easily add an exception for views, as we recently added a get_relation_type method that lets us determine the relation type and act accordingly.

rumbin commented on May 14, 2024

Wow, @nihakue, this explanation fits perfectly.
The model where we observe this flaw already has a view of the same name existing in the target schema.

In fact, we use views to populate the schemas used for development runs, so we don't need to run everything upstream for each developer environment. This is handier than using dbt --defer --state.

rumbin commented on May 14, 2024

Looks like this issue is fixed now.
Thanks a lot @nicor88 for all your efforts!
