Comments (9)
@octiva the `__ha` suffix is used only for "table" materialization when `table_type=hive` and `ha=true`. I expect that there is something off with your table config that we don't catch/handle as we should in the adapter.
Could you attach your table config please?
from dbt-athena.
Regarding this, you are right: `tmp_relation` is created via `api.Relation.create`, so it's a Python object and will always be truthy. Therefore this
{%- if tmp_relation is not none -%}
{%- do adapter.delete_from_glue_catalog(tmp_relation) -%}
{%- endif -%}
doesn't fully work.
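To illustrate the point with a stand-in class (not the adapter's actual implementation): `api.Relation.create` only constructs a Python object and never consults the catalog, so a none-check on its result can never fail.

```python
from dataclasses import dataclass

@dataclass
class Relation:
    """Stand-in for dbt's api.Relation: a plain Python object."""
    database: str
    schema: str
    identifier: str

    @classmethod
    def create(cls, database, schema, identifier):
        # Only constructs an object; it never consults the Glue catalog,
        # so the result cannot be None even when the table does not exist.
        return cls(database, schema, identifier)

tmp_relation = Relation.create("awsdatacatalog", "dbt_nicola", "iceberg_example_1__ha")
assert tmp_relation is not None  # always passes, whether or not the table exists
```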
After further debugging in my setup I see this:
Athena adapter: Table "awsdatacatalog"."dbt_nicola"."iceberg_example_1__ha" does not exists - Ignoring
In my case it's caught because of this, but in your case I suspect that, due to the fine-grained permissions, it's not caught and a permissions error is raised instead?
Regarding solutions, I'm wondering why option 3 would even work for you. Just out of curiosity, could you articulate a bit more why it's an option? Because IMHO it's not - I'm surely missing something.
So my proposal is the following:
{%- if adapter.get_relation(database=tmp_relation.database, schema=tmp_relation.schema, identifier=tmp_relation.identifier) is not none -%}
{%- do adapter.delete_from_glue_catalog(tmp_relation) -%}
{%- endif -%}
`adapter.get_relation` (see impl here) uses `list_relations`, which avoids using `get_table` (which, as you mentioned, doesn't work in your scenario).
Will that work for you? I believe that is a mix between 1 and 2.
We avoid deleting something that doesn't exist, and to decide whether to delete we search for the tmp relation.
Bear in mind that the check is there because there are scenarios where the tmp table actually exists (for whatever reason - broken models, for example), and we want to drop it before we proceed. So it's a mandatory step to have - it's just half-baked.
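A rough Python sketch of why the list-based check behaves better under fine-grained permissions (the catalog dict and function are illustrative, not the adapter's actual API):

```python
# What list_relations effectively gives us: the set of tables the caller
# is allowed to see in each schema (illustrative data, not a real catalog).
listed_tables = {"dbt_nicola": {"orders", "iceberg_example_1"}}

def get_relation(schema, identifier):
    # Listing never raises for a missing table: absence is simply
    # "not in the list", which also holds when fine-grained permissions
    # only grant access to entities that exist.
    if identifier in listed_tables.get(schema, set()):
        return (schema, identifier)
    return None

# The __ha tmp table is absent, so we skip the drop instead of calling
# get_table and hitting an access/not-found error.
assert get_relation("dbt_nicola", "iceberg_example_1__ha") is None
assert get_relation("dbt_nicola", "orders") is not None
```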
Note
Also, I would like to know a bit more about your "fine-grained permissions" setup. There will be some future improvements in our CI, where we run functional tests, and we want to be sure that we cover, for example:
- an AWS account/region without Lake Formation - the current setup
- an AWS account/region with Lake Formation set up - missing
- an AWS account/region with Lake Formation and a specific role where we have fine-grained permissions
PS: happy to chat with you about that in another forum (e.g. dbt Slack or whatever you prefer).
from dbt-athena.
Please submit a PR :)
Regarding the "Notes" section, many thanks - I will skim through it after the holidays.
from dbt-athena.
Could you provide some more details?
- which version of the adapter do you use?
- do you experience this error on the first run (where the target table doesn't exist) or on incremental runs?
from dbt-athena.
Of course @nicor88. And thanks for your input.
1.7.0
I do not experience this on the first run, as the table is just created in the target location before any tmp-swapping logic.
from dbt-athena.
@nicor88 That is not true. https://github.com/dbt-athena/dbt-athena/blob/main/dbt/include/athena/macros/materializations/models/table/table.sql#L84-L86 - the `tmp_relation` is used for all iceberg tables, which are the tables with the `__ha` suffix. This table is then used to re-create the table before swapping it into the target.
The adapter throws the Lake Formation access-denied error as I've shown above.
Here is my config:
{{ config(materialized="table", table_type="iceberg") }}
from dbt-athena.
You are right. With iceberg we first "create" an `__ha` table, then swap it in place of the old one - I forgot about that refactor :D
Can you confirm the following:
- you are not switching a current hive table to an iceberg table?
- are you sure that you have all the Lake Formation permissions right? If you see here, we try to drop ONLY if the table exists. Therefore the only reason I can think of is that the table exists and somehow you have no permission to drop it (see the error: missing `glue:DeleteTable` IAM permission).
A few tips:
- my IAM principals have all Glue permissions on tables (`glue:*Table*`), but I use Lake Formation permissions to control which type of access they have on specific tags, e.g. super, read-only, write
- does your principal have the `glue:DeleteTable` IAM permission? It is required for both iceberg and hive tables.
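For reference, a minimal IAM statement of the kind described above (illustrative only - in practice you would scope `Resource` to your own catalog, database, and table ARNs rather than `*`):

```json
{
  "Effect": "Allow",
  "Action": [
    "glue:GetTable",
    "glue:GetTables",
    "glue:CreateTable",
    "glue:UpdateTable",
    "glue:DeleteTable"
  ],
  "Resource": "*"
}
```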
from dbt-athena.
- Yes, these tables have only ever been iceberg, not hive.
- Yes, the Lake Formation permissions are being provisioned correctly (on creation).
On the line you linked, `tmp_relation` will never be none, as it is just a dbt relation that we created above and will always be truthy.
I propose some ideas to resolve this:
1. We don't delete tables before we have created them.
2. We search the list of tables for the `__ha` one, and only delete it if we can see it in the list (using a `get_table` would be futile, as the fine-grained permissions only allow access to entities that exist).
3. We fail quietly on 403 AccessDenied when we delete, knowing that if the user doesn't have access for future operations, they will get stopped there (no harm, no foul).
I'm happy to implement any of these over the break, so let me know what you think would be the best resolution.
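A minimal sketch of the fail-quietly-on-AccessDenied idea. The helper names here are hypothetical; in the real adapter the delete goes through boto3's Glue client, where the failure surfaces as a botocore ClientError:

```python
class AccessDeniedError(Exception):
    """Stand-in for botocore's ClientError with an access-denied error code."""

def delete_from_glue_catalog(relation, glue_delete):
    """Drop the tmp table, swallowing only permission errors."""
    try:
        glue_delete(relation)
    except AccessDeniedError:
        # No permission to drop: if the principal also lacks access for
        # later operations, those will fail loudly anyway (no harm, no foul).
        pass

def deny(relation):
    raise AccessDeniedError("glue:DeleteTable not permitted")

# The access-denied path is swallowed quietly...
delete_from_glue_catalog("dbt_nicola.iceberg_example_1__ha", deny)

# ...but any other failure still propagates.
def boom(relation):
    raise RuntimeError("unexpected")

try:
    delete_from_glue_catalog("dbt_nicola.iceberg_example_1__ha", boom)
    raised = False
except RuntimeError:
    raised = True
assert raised
```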
from dbt-athena.
@nicor88 Yes, sorry - option three was a bad morning thought.
I think that solution should work nicely! Do you want me to open a PR? I do not currently have the full dev environment set up, but I can do so if needed.
Notes
- The current setup without LF should be considered equivalent to the AWS managed policy `AmazonAthenaFullAccess`. This should cover all fine-grained access without LF (more info on managed policies; more info on fine-grained Glue access), as these calls will fail as expected (Entity not found).
- Lake Formation (LF) has two main methods (found here):
  - Open, which aligns with existing AWS Glue functionality and utilises IAM roles. We don't use this, so I can't say I know much, but I imagine it does not suffer the same issues, given it is stated to be backwards compatible.
  - Fine-grained, the AWS-recommended method of securing the data lake: "granting limited Lake Formation permissions to individual principals on Data Catalog resources, Amazon S3 locations, and the underlying data in those locations". This is done by Lake Formation using coarse-grained IAM permissions to allow users to create things, and then creating subsequent policies that allow for actions (delete, get, update, etc.). While this is sometimes a little overkill, it reduces the blast radius and the risk of malicious actors being able to do too much.
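To make the fine-grained method concrete, here is the shape of the request one would pass to boto3's Lake Formation `grant_permissions` call. The ARN, names, and permission set are illustrative assumptions, not from the thread, and this sketch only builds the payload rather than calling AWS:

```python
def lf_grant_request(principal_arn, database, table, permissions):
    # Payload shape for lakeformation.grant_permissions(**request):
    # grant the principal specific actions on a single catalog table.
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }

request = lf_grant_request(
    "arn:aws:iam::123456789012:role/dbt-runner",  # hypothetical role
    "dbt_nicola",
    "iceberg_example_1",
    ["SELECT", "INSERT", "DROP"],  # DROP is what the __ha cleanup needs
)
```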
In terms of cross-account, I recently upgraded astronomer-cosmos to allow for role assumption from an Airflow connection, meaning we only use temporary credentials in the profiles.yml, so we have not seen any issues in that regard (except for our own incorrect setup at times, haha).
Happy to discuss further in Slack. I will join the dbt channel.
from dbt-athena.