
Comments (9)

nicor88 commented on May 24, 2024

@octiva The '__ha' suffix is used only for the "table" materialization when table_type=hive and ha=true. I expect that there is something off with your table config that we don't catch/handle as we should in the adapter.

Could you attach your table "config" please?

from dbt-athena.

nicor88 commented on May 24, 2024

Regarding this, you are right: tmp_relation is created via api.Relation.create, so it's a Python object and will always be truthy.

Therefore this

{%- if tmp_relation is not none -%}
   {%- do adapter.delete_from_glue_catalog(tmp_relation) -%}
{%- endif -%}

Doesn't fully work.
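A minimal illustration of why that guard is a no-op (plain Python; the `Relation` class here is a hypothetical stand-in for `api.Relation`, not the real dbt object):

```python
# Hypothetical stand-in for api.Relation -- illustrative only.
class Relation:
    def __init__(self, identifier):
        self.identifier = identifier

# The macro builds tmp_relation unconditionally, so the object always exists.
tmp_relation = Relation("iceberg_example_1__ha")

# The Jinja guard `{%- if tmp_relation is not none -%}` reduces to this,
# which is True regardless of whether the table exists in Glue:
print(tmp_relation is not None)  # True
```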
After further debugging in my setup I see this:
Athena adapter: Table "awsdatacatalog"."dbt_nicola"."iceberg_example_1__ha" does not exists - Ignoring

In my case that error is caught, because of this, but in your case I suspect that due to the fine-grained permissions it is not caught, as it raises a permission error instead??
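A sketch of that difference in plain Python (the exception and function names are illustrative stand-ins, not the adapter's actual code): the delete helper swallows a "not found" error but lets a permission error propagate to the caller.

```python
# Illustrative exceptions standing in for the Glue service errors.
class EntityNotFoundException(Exception):
    pass

class AccessDeniedException(Exception):
    pass

def delete_from_glue_catalog(delete_fn, relation):
    """Delete a relation, ignoring only the 'table does not exist' case."""
    try:
        delete_fn(relation)
    except EntityNotFoundException:
        # The friendly "does not exists - Ignoring" path from the log above.
        print(f"Table {relation} does not exist - ignoring")

def delete_missing(relation):
    raise EntityNotFoundException(relation)

def delete_forbidden(relation):
    raise AccessDeniedException(relation)

delete_from_glue_catalog(delete_missing, "iceberg_example_1__ha")  # ignored
try:
    delete_from_glue_catalog(delete_forbidden, "iceberg_example_1__ha")
except AccessDeniedException:
    # Under fine-grained permissions, the raw error reaches the caller.
    print("AccessDenied propagated")
```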

If we talk about solutions, I'm wondering why option 3 would even work for you. Just out of curiosity, maybe articulate a bit more why it's an option, because IMHO it's not - I'm surely missing something.

So my proposal is the following:

{%- if adapter.get_relation(database=tmp_relation.database, schema=tmp_relation.schema, identifier=tmp_relation.identifier)  is not none -%}
   {%- do adapter.delete_from_glue_catalog(tmp_relation) -%}
{%- endif -%}

adapter.get_relation (see impl here) uses list_relations, which avoids calling get_table (which, as you mentioned, doesn't work in your scenario).
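A minimal sketch of the difference (plain Python; `FakeCatalog` and its methods are hypothetical stand-ins, not the Glue or adapter API): listing only returns tables the caller can see, so the existence check never has to probe a table directly.

```python
class FakeCatalog:
    """Hypothetical in-memory stand-in for the Glue catalog."""

    def __init__(self, tables):
        self._tables = set(tables)

    def list_tables(self, schema):
        # Listing is safe under fine-grained permissions: it simply
        # omits tables the caller cannot see.
        return [t for t in self._tables if t.startswith(schema + ".")]

    def get_table(self, name):
        # Probing directly raises AccessDenied for anything not visible.
        if name not in self._tables:
            raise PermissionError("AccessDenied")
        return name

    def delete_table(self, name):
        self._tables.discard(name)

def drop_tmp_if_exists(catalog, schema, identifier):
    """Delete the tmp relation only if it shows up in the listing."""
    qualified = f"{schema}.{identifier}"
    if qualified in catalog.list_tables(schema):
        catalog.delete_table(qualified)
        return True
    return False

# Tmp table absent: the list-based check skips the delete, no probe needed.
print(drop_tmp_if_exists(FakeCatalog({"dbt_nicola.t"}), "dbt_nicola", "t__ha"))   # False
# Tmp table present (e.g. left over from a broken run): it gets dropped.
print(drop_tmp_if_exists(FakeCatalog({"dbt_nicola.t__ha"}), "dbt_nicola", "t__ha"))  # True
```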

Will that work for you? I believe that is a mix between 1 and 2.
We avoid deleting something that doesn't exist, and to decide whether to delete we search for the tmp relation in the list of relations.
Bear in mind that the check is there because there are scenarios where the tmp table actually exists (for whatever reason - broken models, for example), and we want to drop it before we proceed. So it's a mandatory step to have - it's just half-baked.

Note

Also, I would like to know a bit more about your "fine-grained permissions" setup - there will be some future improvements in our CI, where we run functional tests, and we want to be sure that we cover, for example:

  • AWS account/region without lakeformation - the current setup
  • AWS account/region with lakeformation setup - missing
  • AWS account/region with lakeformation and a specific role where we have fine-grained permissions
    PS: happy to chat with you about that in another channel (e.g. dbt Slack or whatever you prefer).


nicor88 commented on May 24, 2024

Please submit a PR :)

Regarding the "Notes" section, many thanks, I will skim through it after the holidays.


nicor88 commented on May 24, 2024

Could you provide some more details?

  • version of the adapter that you use
  • do you experience this error on the first run (where the target table doesn't exist) or on the incremental runs?


octiva commented on May 24, 2024

Of course @nicor88. And thanks for your input.

1.7.0

I do not experience this on the first run, as the table is just created in the target location before any tmp swapping logic.


octiva commented on May 24, 2024

@nicor88 That is not true. https://github.com/dbt-athena/dbt-athena/blob/main/dbt/include/athena/macros/materializations/models/table/table.sql#L84-L86 - the tmp_relation is used for all iceberg tables; it is the table with the '__ha' suffix. This table is then used to re-create the table before swapping it into the target.

The adapter throws the lakeformation access denied error as I've shown above.

Here is my config:

{{ config(materialized="table", table_type="iceberg") }}


nicor88 commented on May 24, 2024

You are right. With iceberg we first "create" an __ha table, then swap it with the old one - I forgot about that refactor :D

Can you confirm the following:

  • you are not switching an existing hive table to an iceberg table?
  • are you sure that you have all the lake-formation permissions right? If you look here, we try to drop ONLY if the table exists. Therefore the only reason I can think of is that the table exists and somehow you have no permission to drop it (see the error: missing glue:DeleteTable IAM permission)

Few tips:

  • my IAM principals have all glue permissions on tables (glue:*Table*), but I use lakeformation permissions to control which type of access they have via specific tags, e.g. super, read-only, write - does your principal have the glue:DeleteTable IAM permission? That is required for both iceberg and hive tables.


octiva commented on May 24, 2024

@nicor88

  • Yes, these tables have only been iceberg. Not hive.
  • Yes, the lake-formation permissions are being provisioned correctly. (on creation)

On that line you linked, the tmp_relation will never be none, as it is actually just a dbt relation that we created above and will always be truthy.

I propose some ideas to resolve:

  1. We don't delete tables before we have created them
  2. We search the list of tables for the __ha one, and only delete it if we can see it in the list (using get_table would be futile, as the fine-grained permissions only allow access to entities that exist).
  3. We fail quietly on 403:AccessDenied when we delete, knowing that if the user doesn't have access for future operations, they will get stopped there (no harm, no foul)

I'm happy to implement any of these over the break, so let me know what you think would be the best resolution.
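For completeness, option 3 would amount to something like this sketch (a hypothetical helper, not the adapter API): swallow a 403 on delete and rely on later operations to stop a genuinely unauthorised principal.

```python
class AccessDeniedException(Exception):
    """Stand-in for a Glue 403:AccessDenied error."""

def delete_quietly(delete_fn, relation):
    """Attempt the delete; on AccessDenied, continue instead of failing."""
    try:
        delete_fn(relation)
        return True
    except AccessDeniedException:
        # 403 on delete: log-and-continue rather than abort the run.
        return False

def delete_forbidden(relation):
    raise AccessDeniedException(relation)

# The run continues even though the delete was denied.
print(delete_quietly(delete_forbidden, "iceberg_example_1__ha"))  # False
```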


octiva commented on May 24, 2024

@nicor88 Yes, sorry - option three was a bad morning thought.

I think that solution should work nicely! Do you want me to open a PR? I do not currently have the full dev environment set up, but I can do so if needed.

Notes

Lake Formation (LF) has two main permission methods (found here):

  1. Open, which aligns with existing AWS Glue functionality and utilises IAM roles. We don't use this, so I can't say I know much about it, but I imagine it does not suffer the same issues, given it is stated to be backwards compatible.

  2. Fine-grained, the AWS-recommended method of securing the data lake:
    granting limited Lake Formation permissions to individual principals on Data Catalog resources, Amazon S3 locations, and the underlying data in those locations

  • This is done by Lake Formation using coarse-grained IAM permissions to allow users to create things, and then creating subsequent policies that allow for specific actions (delete, get, update, etc.).

While this is sometimes a little overkill, it reduces the blast radius and the risk of malicious actors being able to do too much.

In terms of cross-account, I recently upgraded astronomer-cosmos to allow role assumption from an Airflow connection, meaning we only use temporary credentials in profiles.yml, so we have not seen any issues in that regard (except for our own incorrect setup at times, haha).

Happy to discuss further in Slack. I will join the dbt channel.

