
Comments (13)

casparwylie commented on June 2, 2024

My team aren't going to be looking at this anymore, so that's probably not necessary. The first issue mentioned is probably the main one, though, in case you're keen to look into it further! Thanks anyway.

from dbt_apple_store.

fivetran-joemarkiewicz commented on June 2, 2024

Hi @casparwylie thank you for raising this issue. Did Fivetran support say why these duplicates are not a mistake?

I was also looking at a past issue, #12, and saw that another customer experienced a similar error because the territory_name was slightly different. Would you be able to share a few of these duplicate records, so we can see whether they have differences that we could correct within the package?

If they do not differ, I would encourage reaching out to Apple to understand why there are duplicates in the source data. I do not believe there should be duplicate subscription report entries in the raw data. That seems like a data integrity issue that this failed test is appropriately flagging.


casparwylie commented on June 2, 2024

To clarify, the rows are still unique by the fivetran/meta fields, just often not by date_day, account_id, app_id, subscription_name, territory_long, state.

We are seeing plenty of duplicate rows in sales_subscription_event_summary where every column except the meta columns (e.g. _index) is the same.

@fivetran-markgaughran am I right in saying that, from your end, the duplicates are expected?


fivetran-markgaughran commented on June 2, 2024

Hi @fivetran-joemarkiewicz @casparwylie,

Duplicates do not exist in the SALES_SUBSCRIPTION_EVENT_SUMMARY table for the Fivetran-assigned primary keys, but they do appear to exist for the transformation output unique keys (date_day, account_id, app_id, subscription_name, territory_long, state), which causes the transformation to fail.
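
One way to confirm that source rows differ only in their metadata columns is to group by every non-metadata column and look for groups with more than one row. The following is a minimal sketch, not the connector's or the package's logic. It uses an in-memory SQLite table as a stand-in for the warehouse, and every column except `_index` (`event_date`, `app_name`, `country`, `quantity`) is a hypothetical placeholder, as is the sample data.

```python
import sqlite3

# Hypothetical stand-in for the source table; only `_index` is named in the thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_subscription_event_summary ("
    "_index INTEGER, event_date TEXT, app_name TEXT, country TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO sales_subscription_event_summary VALUES (?, ?, ?, ?, ?)",
    [
        (0, "2022-07-21", "Example App", "AM", 1),
        (1, "2022-07-21", "Example App", "AM", 1),  # same as above except _index
        (2, "2022-07-21", "Example App", "US", 3),
    ],
)

META_COLUMNS = {"_index"}  # assumption: extend with any other metadata columns

# Read the column names from the table, then keep everything that is not metadata.
all_columns = [
    row[1]
    for row in conn.execute("PRAGMA table_info(sales_subscription_event_summary)")
]
data_columns = [c for c in all_columns if c not in META_COLUMNS]
column_list = ", ".join(data_columns)

# Groups with qty > 1 are rows that are identical apart from the metadata columns.
duplicate_groups = conn.execute(
    f"SELECT {column_list}, COUNT(*) AS qty "
    f"FROM sales_subscription_event_summary "
    f"GROUP BY {column_list} HAVING COUNT(*) > 1"
).fetchall()

print(duplicate_groups)
```

On a real warehouse the column list would come from that warehouse's own catalog rather than SQLite's `PRAGMA table_info`.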


fivetran-joemarkiewicz commented on June 2, 2024

Thanks for adding context @casparwylie and @fivetran-markgaughran.

@casparwylie would you be able to share an example of a duplicate in the apple_store__subscription_report? Mainly I would be curious to take a look at the territory_long field as this has caused some issues in the past with Apple not being consistent with territory naming. An example of the duplicate record will help us understand what next steps may be needed to resolve the error.


casparwylie commented on June 2, 2024

I'm not sure why, but the tests are now passing, likely due to a new historic sync. I now can't find examples other than what I described above! I'm closing the issue. Thank you both.


casparwylie commented on June 2, 2024

Apologies, but the issue has resurfaced. Here are 2 fresh examples in JSON result format, given the query:

SELECT date_day, account_id, app_id, subscription_name, territory_long, state, count(*) AS qty
FROM `project.apple_store.apple_store__subscription_report`
GROUP BY date_day, account_id, app_id, subscription_name, territory_long, state
HAVING count(*) > 1

(in total there are 211183 results)

[{
    "date_day": "2022-07-21",
    "account_id": "<our account id>",
    "app_id": null,
    "subscription_name": "Offer name",
    "territory_long": "Armenia",
    "state": null,
    "qty": "2"
}, {
    "date_day": "2022-11-10",
    "account_id": "<our account id>",
    "app_id": null,
    "subscription_name": "Offer name",
    "territory_long": "Armenia",
    "state": null,
    "qty": "2"
}]

Let me know your thoughts. Thank you.


casparwylie commented on June 2, 2024

@fivetran-joemarkiewicz Hey, just wondering if there are any updates on this. Thanks!


fivetran-joemarkiewicz commented on June 2, 2024

Hi @casparwylie I am sorry to see that the issue has resurfaced. Would you be able to share the select * of one of those duplicates you came across? I am wondering if this is in fact a duplicate issue that needs to be traced back to the source or code logic in the package, or if this is a scenario where we simply need to update our tests to factor in more than the specified fields for uniqueness.


casparwylie commented on June 2, 2024

I've included the query that fetched all the rows (and hidden some of the more sensitive properties) in the previous comment. Are there any columns in particular you'd be keen to see?


fivetran-joemarkiewicz commented on June 2, 2024

Yeah, I am wondering whether there were any columns where the rows were not identical. If they are sensitive there is no need to share them, but I am curious whether the rows were duplicates across every single field.

Additionally, it would be worthwhile to check the source again and make sure these duplicates don't exist there.
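
To see which columns, if any, differ within a duplicate group, one option is to bucket the full rows by the uniqueness key and compare the remaining columns inside each bucket. This is a minimal sketch under stated assumptions: an in-memory SQLite table stands in for apple_store__subscription_report, the key columns are the ones listed in this thread, and `col_a`, `col_b`, and all sample values are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Uniqueness key as listed in the thread.
KEY = ["date_day", "account_id", "app_id",
       "subscription_name", "territory_long", "state"]

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.execute(
    "CREATE TABLE apple_store__subscription_report ("
    "date_day TEXT, account_id TEXT, app_id TEXT, subscription_name TEXT, "
    "territory_long TEXT, state TEXT, col_a TEXT, col_b INTEGER)"  # col_a/col_b are hypothetical
)
conn.executemany(
    "INSERT INTO apple_store__subscription_report VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("2022-07-21", "acct", None, "Offer name", "Armenia", None, "x", 5),
        ("2022-07-21", "acct", None, "Offer name", "Armenia", None, "y", 5),
        ("2022-07-21", "acct", None, "Offer name", "Georgia", None, "x", 2),
    ],
)

# Bucket the full rows by their key values (NULL keys compare equal here, as in GROUP BY).
groups = defaultdict(list)
for row in conn.execute("SELECT * FROM apple_store__subscription_report"):
    groups[tuple(row[k] for k in KEY)].append(row)

# For each duplicated key, record the non-key columns that hold more than one value.
differing = set()
for members in groups.values():
    if len(members) < 2:
        continue
    for column in members[0].keys():
        if column not in KEY and len({m[column] for m in members}) > 1:
            differing.add(column)

print(sorted(differing))
```

An empty result would mean the duplicated rows are identical in every column, which points back at the source data rather than at the choice of key.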


casparwylie commented on June 2, 2024

Yeah, so we are seeing plenty of duplicate rows in sales_subscription_event_summary where every column except the meta columns (e.g. _index) is the same. Separately, it looks like app_name is the only difference between the duplicates in apple_store__subscription_report. I suppose that if the app_name changes in the App Store, the report ends up with duplicates.


fivetran-joemarkiewicz commented on June 2, 2024

@casparwylie thank you for sharing! The insight into the app_name duplicates makes sense, and we should probably update our test to account for the app name, since it may change.
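
As a rough illustration of what widening the uniqueness check would do, the sketch below counts duplicated key combinations with and without app_name in the key. It is not the package's actual test: the table is an in-memory SQLite stand-in, `count_duplicate_groups` is a hypothetical helper, and the sample rows are invented to mimic an app rename.

```python
import sqlite3

# Uniqueness key as listed in the thread.
BASE_KEY = ["date_day", "account_id", "app_id",
            "subscription_name", "territory_long", "state"]


def count_duplicate_groups(conn, table, key_columns):
    """Return how many key combinations appear on more than one row."""
    cols = ", ".join(key_columns)
    query = (
        f"SELECT COUNT(*) FROM ("
        f"SELECT 1 FROM {table} GROUP BY {cols} HAVING COUNT(*) > 1)"
    )
    return conn.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE apple_store__subscription_report ("
    + ", ".join(f"{c} TEXT" for c in BASE_KEY + ["app_name"])
    + ")"
)
conn.executemany(
    "INSERT INTO apple_store__subscription_report VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        # Invented rows: same key, but the app was renamed between reports.
        ("2022-07-21", "acct", None, "Offer name", "Armenia", None, "Old App Name"),
        ("2022-07-21", "acct", None, "Offer name", "Armenia", None, "New App Name"),
    ],
)

table = "apple_store__subscription_report"
without_name = count_duplicate_groups(conn, table, BASE_KEY)
with_name = count_duplicate_groups(conn, table, BASE_KEY + ["app_name"])
print(without_name, with_name)
```

Adding app_name to the key only makes the check pass. Whether renamed apps should instead be collapsed into one row is a separate modelling decision.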

However, I am still struggling with the duplicates in the source that are distinguished only by the Fivetran metadata columns. Would you be interested in meeting sometime this week so that my team and I can review these live with you and determine the best approach forward?
