
Comments (9)

nicor88 commented on May 24, 2024

@octiva The '__ha' suffix is used only for the "table" materialization when table_type=hive and ha=true. I expect that there is something off with your table config that we don't catch/handle as we should in the adapter.

Could you attach your table "config" please?

from dbt-athena.

nicor88 commented on May 24, 2024

Regarding this, you are right: tmp_relation is created via api.Relation.create, so it's a Python object and will always be truthy.

Therefore this

{%- if tmp_relation is not none -%}
   {%- do adapter.delete_from_glue_catalog(tmp_relation) -%}
{%- endif -%}

Doesn't fully work.
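A minimal illustration of why that guard is a no-op (plain Python; the `Relation` class here is a hypothetical stand-in for `api.Relation`, not the real dbt object):

```python
# Hypothetical stand-in for api.Relation -- illustrative only.
class Relation:
    def __init__(self, identifier):
        self.identifier = identifier

# The macro builds tmp_relation unconditionally, so the object always exists.
tmp_relation = Relation("iceberg_example_1__ha")

# The Jinja guard `{%- if tmp_relation is not none -%}` reduces to this,
# which is True regardless of whether the table exists in Glue:
print(tmp_relation is not None)  # True
```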
After further debugging in my setup I see this:
Athena adapter: Table "awsdatacatalog"."dbt_nicola"."iceberg_example_1__ha" does not exists - Ignoring

In my case that error is caught, because of this, but in your case I suspect that due to the fine-grained permissions it is not caught, as it raises a permission error instead??
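A sketch of that difference in plain Python (the exception and function names are illustrative stand-ins, not the adapter's actual code): the delete helper swallows a "not found" error but lets a permission error propagate to the caller.

```python
# Illustrative exceptions standing in for the Glue service errors.
class EntityNotFoundException(Exception):
    pass

class AccessDeniedException(Exception):
    pass

def delete_from_glue_catalog(delete_fn, relation):
    """Delete a relation, ignoring only the 'table does not exist' case."""
    try:
        delete_fn(relation)
    except EntityNotFoundException:
        # The friendly "does not exists - Ignoring" path from the log above.
        print(f"Table {relation} does not exist - ignoring")

def delete_missing(relation):
    raise EntityNotFoundException(relation)

def delete_forbidden(relation):
    raise AccessDeniedException(relation)

delete_from_glue_catalog(delete_missing, "iceberg_example_1__ha")  # ignored
try:
    delete_from_glue_catalog(delete_forbidden, "iceberg_example_1__ha")
except AccessDeniedException:
    # Under fine-grained permissions, the raw error reaches the caller.
    print("AccessDenied propagated")
```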

If we talk about solutions, I'm wondering why option 3 would even work for you. Just out of curiosity, maybe articulate a bit more why it's an option, because IMHO it's not - I'm surely missing something.

So my proposal is the following:

{%- if adapter.get_relation(database=tmp_relation.database, schema=tmp_relation.schema, identifier=tmp_relation.identifier)  is not none -%}
   {%- do adapter.delete_from_glue_catalog(tmp_relation) -%}
{%- endif -%}

adapter.get_relation (see impl here) uses list_relations, which avoids calling get_table (which, as you mentioned, doesn't work in your scenario).
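A minimal sketch of the difference (plain Python; `FakeCatalog` and its methods are hypothetical stand-ins, not the Glue or adapter API): listing only returns tables the caller can see, so the existence check never has to probe a table directly.

```python
class FakeCatalog:
    """Hypothetical in-memory stand-in for the Glue catalog."""

    def __init__(self, tables):
        self._tables = set(tables)

    def list_tables(self, schema):
        # Listing is safe under fine-grained permissions: it simply
        # omits tables the caller cannot see.
        return [t for t in self._tables if t.startswith(schema + ".")]

    def get_table(self, name):
        # Probing directly raises AccessDenied for anything not visible.
        if name not in self._tables:
            raise PermissionError("AccessDenied")
        return name

    def delete_table(self, name):
        self._tables.discard(name)

def drop_tmp_if_exists(catalog, schema, identifier):
    """Delete the tmp relation only if it shows up in the listing."""
    qualified = f"{schema}.{identifier}"
    if qualified in catalog.list_tables(schema):
        catalog.delete_table(qualified)
        return True
    return False

# Tmp table absent: the list-based check skips the delete, no probe needed.
print(drop_tmp_if_exists(FakeCatalog({"dbt_nicola.t"}), "dbt_nicola", "t__ha"))   # False
# Tmp table present (e.g. left over from a broken run): it gets dropped.
print(drop_tmp_if_exists(FakeCatalog({"dbt_nicola.t__ha"}), "dbt_nicola", "t__ha"))  # True
```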

Will that work for you? I believe that is a mix between 1 and 2.
We avoid deleting something that doesn't exist, and to decide whether to delete we search for the tmp relation in the list of relations.
Bear in mind that the check is there because there are scenarios where the tmp table actually exists (for whatever reason - broken models, for example), and we want to drop it before we proceed. So it's a mandatory step to have - it's just half-baked.

Note

Also, I would like to know a bit more about your "fine-grained permissions" setup - there will be some future improvements in our CI, where we run functional tests, and we want to be sure that we cover, for example:

  • AWS account/region without lakeformation - the current setup
  • AWS account/region with lakeformation setup - missing
  • AWS account/region with lakeformation and a specific role where we have fine-grained permissions
    PS: happy to chat with you about that in another channel (e.g. dbt Slack or whatever you prefer).


nicor88 commented on May 24, 2024

Please submit a PR :)

Regarding the "Notes" section, many thanks, I will skim through it after the holidays.


nicor88 commented on May 24, 2024

Could you provide some more details?

  • version of the adapter that you use
  • do you experience this error on the first run (where the target table doesn't exist) or on the incremental runs?


octiva commented on May 24, 2024

Of course @nicor88. And thanks for your input.

1.7.0

I do not experience this on the first run, as the table is just created in the target location before any tmp swapping logic.


octiva commented on May 24, 2024

@nicor88 That is not true. https://github.com/dbt-athena/dbt-athena/blob/main/dbt/include/athena/macros/materializations/models/table/table.sql#L84-L86 - the tmp_relation is used for all iceberg tables; it is the table with the '__ha' suffix. This table is then used to re-create the table before swapping it into the target.

The adapter throws the lakeformation access denied error as I've shown above.

Here is my config:

{{ config(materialized="table", table_type="iceberg") }}


nicor88 commented on May 24, 2024

You are right. With iceberg we first "create" an __ha table, then swap it with the old one - I forgot about that refactor :D

Can you confirm the following:

  • you are not switching an existing hive table to an iceberg table?
  • are you sure that you have all the lake-formation permissions right? If you look here, we try to drop ONLY if the table exists. Therefore the only reason I can think of is that the table exists and somehow you have no permission to drop it (see the error: missing glue:DeleteTable IAM permission)

Few tips:

  • my IAM principals have all glue permissions on tables (glue:*Table*), but I use lakeformation permissions to control which type of access they have via specific tags, e.g. super, read-only, write - does your principal have the glue:DeleteTable IAM permission? That is required for both iceberg and hive tables.


octiva commented on May 24, 2024

@nicor88

  • Yes, these tables have only been iceberg. Not hive.
  • Yes, the lake-formation permissions are being provisioned correctly. (on creation)

On that line you linked, the tmp_relation will never be none, as it is actually just a dbt relation that we created above and will always be truthy.

I propose some ideas to resolve:

  1. We don't delete tables before we have created them
  2. We search the list of tables for the __ha one, and only delete it if we can see it in the list (using get_table would be futile, as the fine-grained permissions only allow access to entities that exist).
  3. We fail quietly on 403:AccessDenied when we delete, knowing that if the user doesn't have access for future operations, they will get stopped there (no harm, no foul)

I'm happy to implement any of these over the break, so let me know what you think would be the best resolution.
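For completeness, option 3 would amount to something like this sketch (a hypothetical helper, not the adapter API): swallow a 403 on delete and rely on later operations to stop a genuinely unauthorised principal.

```python
class AccessDeniedException(Exception):
    """Stand-in for a Glue 403:AccessDenied error."""

def delete_quietly(delete_fn, relation):
    """Attempt the delete; on AccessDenied, continue instead of failing."""
    try:
        delete_fn(relation)
        return True
    except AccessDeniedException:
        # 403 on delete: log-and-continue rather than abort the run.
        return False

def delete_forbidden(relation):
    raise AccessDeniedException(relation)

# The run continues even though the delete was denied.
print(delete_quietly(delete_forbidden, "iceberg_example_1__ha"))  # False
```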


octiva commented on May 24, 2024

@nicor88 Yes, sorry - option three was a bad morning thought.

I think that solution should work nicely! Do you want me to open a PR? I do not currently have the full dev environment set up, but I can do so if needed.

Notes

Lake Formation (LF) has two main permission methods (found here):

  1. Open, which aligns with existing AWS Glue functionality and utilises IAM roles. We don't use this, so I can't say I know much about it, but I imagine it does not suffer the same issues, given it is stated to be backwards compatible.

  2. Fine-grained, the AWS-recommended method of securing the data lake:
    granting limited Lake Formation permissions to individual principals on Data Catalog resources, Amazon S3 locations, and the underlying data in those locations

  • This is done by Lake Formation using coarse-grained IAM permissions to allow users to create things, and then creating subsequent policies that allow for specific actions (delete, get, update, etc.).

While this is sometimes a little overkill, it reduces the blast radius and the risk of malicious actors being able to do too much.

In terms of cross-account, I recently upgraded astronomer-cosmos to allow role assumption from an Airflow connection, meaning we only use temporary credentials in profiles.yml, so we have not seen any issues in that regard (except for our own incorrect setup at times, haha).

Happy to discuss further in Slack. I will join the dbt channel.

