Input dataset format (efficient-apriori issue, closed)

tommyod avatar tommyod commented on June 19, 2024
Input dataset format

from efficient-apriori.

Comments (4)

tommyod avatar tommyod commented on June 19, 2024 3

No problem at all.

import pandas as pd

# Original data
transactions = [('eggs', 'bacon', 'soup'),
                ('eggs', 'bacon', 'apple'),
                ('soup', 'bacon', 'banana')]

# Convert to pandas.DataFrame
df = pd.DataFrame(transactions)

# Convert back to a list of tuples
transactions_from_df = [tuple(row) for row in df.values.tolist()]

# The round trip preserves the data, so this assertion passes
assert transactions == transactions_from_df

A list of lists will also work; it doesn't have to be a list of tuples.

tommyod avatar tommyod commented on June 19, 2024 2

NaN likely represents a missing item, so convert ('bread', nan, 'milk', nan) to ('bread', 'milk'). How to handle it really depends on your problem at hand. Each tuple should represent a transaction, and having "none-tokens" (placeholders for missing items) in a transaction is a no-no. The values in the tuples should be strings.
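As a minimal sketch of the conversion described above, the snippet below drops missing values from each row and turns the remaining items into strings. The helper name `clean_transaction` and the sample rows are hypothetical, made up for illustration; treating `None` as missing alongside NaN is also an assumption.

```python
from math import isnan


def clean_transaction(row):
    """Drop NaN/None placeholders from one row and return a tuple of strings."""
    kept = []
    for item in row:
        # Assumption: None and float NaN are the only "missing" markers in the data
        if item is None or (isinstance(item, float) and isnan(item)):
            continue
        kept.append(str(item))
    return tuple(kept)


# Hypothetical rows, padded with NaN as they would be after a DataFrame round trip
rows = [('bread', float('nan'), 'milk', float('nan')),
        ('eggs', 'bacon', float('nan'), float('nan'))]

transactions = [clean_transaction(row) for row in rows]
print(transactions)  # [('bread', 'milk'), ('eggs', 'bacon')]
```

Whether dropping is the right choice still depends on the problem; if a missing value carries meaning in your data, it may need different handling.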

benpowis avatar benpowis commented on June 19, 2024

Thank you @tommyod this looks great - how would you suggest dealing with NaN values? When feeding my df directly to apriori() I get the error:
TypeError: object of type 'int' has no len()

I can use your code above to transform the data into a list, but my data has a couple of huge baskets, which leads to many 'nan' values in the lists. Will these have an adverse effect on the results?

benpowis avatar benpowis commented on June 19, 2024

Cool, thank you - should this help anyone else in the future, here is the method I used to remove nans from lists of varying sizes:

from math import isnan

# Rebuild each transaction in place without its NaN entries
for i, row in enumerate(transactions_from_df):
    transactions_from_df[i] = [
        x for x in row
        # drop a value only if it is a float and that float is NaN
        if not (isinstance(x, float) and isnan(x))
    ]
