Comments (10)
@rumbin this should be fixed in v1.4.0. Give it a shot; if it still doesn't work, let us know.
from dbt-athena.
@rumbin could you attach the filename that is raising this exception? It should be visible in the exception raised by the adapter.
A possible change affecting this could be -> https://github.com/dbt-athena/dbt-athena/pull/88/files#diff-629c67ee6aeee24555537b786b7560cbf4d17496c1388aa932de34081c76f668R178-R188
16:28:46.345016 [debug] [Thread-1 ]: Began running node model.bumblebee.health_errors
16:28:46.345295 [info ] [Thread-1 ]: 1 of 1 START sql table model dbt_dev_philipp.health_errors ..................... [RUN]
16:28:46.345760 [debug] [Thread-1 ]: Acquiring new athena connection "model.bumblebee.health_errors"
16:28:46.346020 [debug] [Thread-1 ]: Began compiling node model.bumblebee.health_errors
16:28:46.346193 [debug] [Thread-1 ]: Compiling model.bumblebee.health_errors
16:28:46.350230 [debug] [Thread-1 ]: Writing injected SQL for node "model.bumblebee.health_errors"
16:28:46.350664 [debug] [Thread-1 ]: finished collecting timing info
16:28:46.350833 [debug] [Thread-1 ]: Began executing node model.bumblebee.health_errors
16:28:46.361999 [debug] [Thread-1 ]: Opening a new connection, currently in state closed
16:28:46.603291 [debug] [Thread-1 ]: finished collecting timing info
16:28:46.603554 [debug] [Thread-1 ]: On model.bumblebee.health_errors: Close
16:28:46.604141 [error] [Thread-1 ]: Unhandled error while executing model.bumblebee.health_errors
Parameter validation failed:
Invalid bucket name "": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]*:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
16:28:46.604332 [debug] [Thread-1 ]:
Traceback (most recent call last):
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/task/base.py", line 385, in safe_run
result = self.compile_and_execute(manifest, ctx)
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/task/base.py", line 338, in compile_and_execute
result = self.run(ctx.node, manifest)
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/task/base.py", line 429, in run
return self.execute(compiled_node, manifest)
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/task/run.py", line 281, in execute
result = MacroGenerator(
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/clients/jinja.py", line 326, in __call__
return self.call_macro(*args, **kwargs)
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/clients/jinja.py", line 253, in call_macro
return macro(*args, **kwargs)
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jinja2/runtime.py", line 763, in __call__
return self._invoke(arguments, autoescape)
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jinja2/runtime.py", line 777, in _invoke
rv = self._func(*arguments)
File "<template>", line 55, in macro
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jinja2/sandbox.py", line 393, in call
return __context.call(__obj, *args, **kwargs)
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jinja2/runtime.py", line 298, in call
return __obj(*args, **kwargs)
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/clients/jinja.py", line 326, in __call__
return self.call_macro(*args, **kwargs)
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/clients/jinja.py", line 253, in call_macro
return macro(*args, **kwargs)
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jinja2/runtime.py", line 763, in __call__
return self._invoke(arguments, autoescape)
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jinja2/runtime.py", line 777, in _invoke
rv = self._func(*arguments)
File "<template>", line 22, in macro
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jinja2/sandbox.py", line 393, in call
return __context.call(__obj, *args, **kwargs)
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jinja2/runtime.py", line 298, in call
return __obj(*args, **kwargs)
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/adapters/athena/impl.py", line 131, in clean_up_table
self._delete_from_s3(client, s3_location)
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/adapters/athena/impl.py", line 156, in _delete_from_s3
if self._s3_path_exists(client, bucket_name, prefix):
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/dbt/adapters/athena/impl.py", line 193, in _s3_path_exists
response = client.session.client(
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/botocore/client.py", line 530, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/botocore/client.py", line 919, in _make_api_call
request_dict = self._convert_to_request_dict(
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/botocore/client.py", line 987, in _convert_to_request_dict
api_params = self._emit_api_params(
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/botocore/client.py", line 1026, in _emit_api_params
self.meta.events.emit(
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/botocore/hooks.py", line 412, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/botocore/hooks.py", line 256, in emit
return self._emit(event_name, kwargs)
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/botocore/hooks.py", line 239, in _emit
response = handler(**kwargs)
File "/Users/philippleufke/.pyenv/versions/3.9.12/lib/python3.9/site-packages/botocore/handlers.py", line 285, in validate_bucket_name
raise ParamValidationError(report=error_msg)
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid bucket name "": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]*:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
16:28:46.610316 [debug] [Thread-1 ]: Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '4abb105a-6d2c-454d-986f-1a01c4621093', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11d192eb0>]}
16:28:46.610728 [error] [Thread-1 ]: 1 of 1 ERROR creating sql table model dbt_dev_philipp.health_errors ............ [ERROR in 0.26s]
16:28:46.611110 [debug] [Thread-1 ]: Finished running node model.bumblebee.health_errors
16:28:46.612337 [debug] [MainThread]: Acquiring new athena connection "master"
16:28:46.612839 [info ] [MainThread]:
16:28:46.612998 [info ] [MainThread]: Finished running 1 table model in 0 hours 0 minutes and 2.48 seconds (2.48s).
16:28:46.613157 [debug] [MainThread]: Connection 'master' was properly closed.
16:28:46.613311 [debug] [MainThread]: Connection 'model.bumblebee.health_errors' was properly closed.
16:28:46.706722 [info ] [MainThread]:
16:28:46.706891 [info ] [MainThread]: Completed with 1 error and 0 warnings:
16:28:46.707026 [info ] [MainThread]:
16:28:46.707158 [error] [MainThread]: Parameter validation failed:
16:28:46.707277 [error] [MainThread]: Invalid bucket name "": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]*:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
@rumbin the issue is inside _delete_from_s3; this function was added in 1.3.4. It calls _parse_s3_path and then passes the result to _s3_path_exists. Judging by your full error trace, the bucket name that is returned is empty.
This means that somehow the location property for your model is not configured correctly.
I'm currently using 1.3.4 in production without issues.
I suspect a potential misconfiguration in your model; your profiles.yml looks right.
Could you share the config that you use in your model?
The model config is very standard:
{{
    config(
        materialized='table',
        tags=['daily']
    )
}}
The folder/schema config in dbt_project.yml is also nothing special:
models:
  +incremental_strategy: "insert_overwrite"
  bumblebee:
    level_1:
      schema: data_warehouse_l1
      +tags: l1
    level_2:
      schema: data_warehouse_l2
      +tags: l2
      +meta:
        # BI integration settings (Superset): override in model YAMLs, if needed
        model_maturity: high
        certification:
          certified_by: Business Intelligence Team
          details: dbt-managed Level 2 (L2) model
        owners:
          # User IDs of Superset's internal database:
          - 5  # Philipp
The only thing that comes to my mind, which might interfere is this override macro:
{% macro generate_schema_name(schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if target.name == 'prod' and schema_name is not none -%}
        {{ schema_name | trim }}
    {%- elif var('ci_schema', 'dummy') != 'dummy' -%}
        {{ var('ci_schema') | trim }}
    {%- else -%}
        {{ default_schema }}
    {%- endif -%}
{%- endmacro %}
@rumbin I still didn't spot the bug; I have a profile similar to yours (it is very standard).
As you are not specifying the external location per model, the s3_data_dir will be used...
I ran the _parse_s3_path function on the path that you use as s3_data_dir, and it returned a correct bucket name to me.
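For reference, here is a minimal sketch of what a helper like _parse_s3_path presumably does (the actual implementation in dbt-athena may differ): it splits an s3://bucket/prefix URL into a bucket and a prefix, and it shows how an empty location yields an empty bucket name, which is exactly what boto3 later rejects with ParamValidationError.

```python
from urllib.parse import urlparse


def parse_s3_path(s3_path: str) -> tuple:
    """Split an s3://bucket/key/prefix URL into (bucket, prefix)."""
    parsed = urlparse(s3_path)
    # netloc holds the bucket name; path carries a leading "/" we strip off
    return parsed.netloc, parsed.path.lstrip("/")


# A well-formed location parses cleanly:
print(parse_s3_path("s3://my-data-bucket/dbt/health_errors/"))
# → ('my-data-bucket', 'dbt/health_errors/')

# An empty location yields an empty bucket name, which boto3's
# client-side parameter validation rejects:
print(parse_s3_path(""))
# → ('', '')
```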
I was facing this issue as well when changing the materialization of a model from 'view' to 'table'. You may also be doing the same thing.
The issue is in clean_up_table in impl.py. On line 124 you call table = glue_client.get_table, and then later:
if table is not None:
    s3_location = table["Table"]["StorageDescriptor"]["Location"]
    self._delete_from_s3(client, s3_location)
The problem is that get_table will return a response for get_table on a view, but it has an empty 'Location'.
See for example:
(Pdb) p table["Table"]["StorageDescriptor"]
{'Columns': [{'Name': 'impressionid', 'Type': 'string'}, {'Name': 'servertimestamp', 'Type': 'timestamp'}, {'Name': 'devicetype', 'Type': 'string'}, {'Name': 'uid', 'Type': 'string'}, {'Name': 'metadata', 'Type': 'string'}, {'Name': 'os', 'Type': 'string'}, {'Name': 'persistedat', 'Type': 'timestamp'}, {'Name': 'browser', 'Type': 'string'}, {'Name': 'name', 'Type': 'string'}, {'Name': 'pageurl', 'Type': 'string'}, {'Name': 'useragent', 'Type': 'string'}, {'Name': 'id', 'Type': 'string'}, {'Name': 'snapshot_timestamp', 'Type': 'string'}], 'Location': '', 'Compressed': False, 'NumberOfBuckets': 0, 'SerdeInfo': {}, 'SortColumns': [], 'StoredAsSubDirectories': False}
(Pdb) p table["Table"]["StorageDescriptor"]["Location"]
''
It should be easy enough to delete the view manually and then run again, but it would be good if clean_up_table were view-aware!
@nihakue Nice hint. We could easily add an exception for views, as we recently added a get_relation_type method that allows us to determine the relation type and act accordingly.
Wow, @nihakue, this explanation fits perfectly.
The model where we observe this flaw already has a view of the same name existing in the target schema.
In fact, we use views to populate the schemas used for development runs, so we don't need to run everything upstream for each developer environment. This is more handy than using dbt --defer --state.
Looks like this issue is fixed now.
Thanks a lot @nicor88 for all your efforts!