
cookbook's Introduction

Cookbook

A repository of reusable data science code, written to reduce the time from idea to execution when developing a data science solution. It contains helper modules and sample notebooks showing how to use them to build a solution.

Codes -

Generic preprocessing - Helper functions for EDA, missing-value analysis and treatment, and generic preprocessing such as scaling and encoding.

Machine learning - Classification - Helper functions for classification problems: holdout/cross-validation, model explanation (LIME and variable-importance plots), and wrappers for common classification algorithms (XGBoost, LightGBM, extra trees, random forest, logistic regression, decision trees, K-nearest neighbours and SVM).

Machine learning - Regression - Helper functions for regression problems: holdout/cross-validation, model explanation (LIME and variable-importance plots), and wrappers for common regression algorithms (XGBoost, LightGBM, extra trees, random forest, linear regression, regression trees, K-nearest neighbours and SVM).

Recommender systems - Helper functions to build recommender systems via matrix factorization with the LightFM package.

Natural language processing - Helper functions for NLP text processing and analysis: n-grams, word clouds, tokenization, lowercasing, punctuation/stop-word removal, stemming/lemmatization, and TF-IDF and count vectorizers.
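To illustrate the kind of helper these modules provide, here is a minimal sketch of an NLP feature-extraction helper; the function and parameter names are illustrative, not the repository's actual API:

```python
# Sketch of a TF-IDF helper in the spirit of this repo's NLP module.
# Names here (tfidf_features) are illustrative, not the repository's API.
from sklearn.feature_extraction.text import TfidfVectorizer

def tfidf_features(texts, max_features=1000, ngram_range=(1, 2)):
    '''Fit a TF-IDF vectorizer on texts; return the matrix and the vectorizer.'''
    vectorizer = TfidfVectorizer(max_features=max_features,
                                 ngram_range=ngram_range,
                                 stop_words='english')
    matrix = vectorizer.fit_transform(texts)
    return matrix, vectorizer

docs = ["data science cookbook", "reusable data science code"]
matrix, vec = tfidf_features(docs)
print(matrix.shape[0])  # 2 documents, one row each
```

Returning the fitted vectorizer alongside the matrix lets the same vocabulary be reapplied to new text with `vec.transform(...)`.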

Notebooks -

Classification examples - Example notebooks to solve a classification problem

Regression examples - Example notebooks to solve a regression problem

Recommender systems - Example notebooks to build recommender systems

cookbook's People

Contributors

aayushmnit, connectedblue


cookbook's Issues

Import error when running the Recommender notebook

Hello, I was trying your notebook after reading your wonderful blog post. I got the following error and am unable to debug it; any help please?

Steps taken:
1. Cloned the repo.
2. Opened the Recommender notebook.
3. Cleared the previous outputs in the notebook.
4. Executed the first cell (the imports cell) and got the following error:
    ModuleNotFoundError                       Traceback (most recent call last)
    in ()
          3 sys.path.append(os.path.abspath("../"))
          4 from recsys import *
    ----> 5 from generic_preprocessing import *
          6 from IPython.display import HTML

    ModuleNotFoundError: No module named 'generic_preprocessing'

I am on Windows 10.
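A likely cause (a guess, since the thread gives no resolution): the notebook's working directory is not the repo's notebook folder, so `"../"` does not resolve to the directory that contains `generic_preprocessing.py`. One way to make the path explicit:

```python
# Possible workaround (assumes generic_preprocessing.py lives one level
# above the directory the notebook kernel was started in).
import os
import sys

repo_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
if repo_root not in sys.path:
    sys.path.insert(0, repo_root)
```

Printing `os.getcwd()` in the first cell is a quick way to confirm which directory `"../"` is actually relative to.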

Remove users/items?

Hey,
First, thanks for providing these amazing files; they have helped me a lot.
I'm building a recommendation system and was wondering how to change or remove users/items/ratings.
I've written these functions to add things (they work):

# add a user
# (assumes `interactions` is a pandas DataFrame with rows = users and
#  columns = items)
from scipy import sparse

def addUsers(interactions, model, users=[], user_dict={}, epoch=5, n_jobs=4):
    '''
    Adds new (empty) users to the interaction matrix
    Required Input -
        - interactions: DataFrame containing the ratings for each user/item
        - model: the fitted LightFM model
        - users: user ids of the new users
        - user_dict: dictionary mapping user_id to its row index in interactions
        - epoch: number of epochs to run
        - n_jobs: number of cores used for execution

    Expected Output -
        - interactions: the matrix extended with the new users
        - user_dict: the updated user-to-row-index dictionary
        - model: the model after a partial fit on the resized matrix
    '''
    for user_id in users:
        interactions.loc[user_id] = 0                  # new all-zero row
        user_dict[user_id] = interactions.shape[0] - 1
    # Partial-fit on an all-zero matrix of the new shape so the model
    # sees the added rows.
    x = sparse.csr_matrix(interactions.shape)
    model.fit_partial(interactions=x, epochs=epoch, num_threads=n_jobs)
    return interactions, user_dict, model

# add an item
def addItems(interactions, model, items=[], items_dict={}, epoch=5, n_jobs=4):
    '''
    Adds new (empty) items; items is a list of (item_id, title) pairs,
    and items_dict maps item_id to item title.
    '''
    for item_id, title in items:
        interactions[item_id] = 0                      # new all-zero column
        items_dict[item_id] = title
    # Partial-fit on an all-zero matrix of the new shape so the model
    # sees the added columns.
    x = sparse.csr_matrix(interactions.shape)
    model.fit_partial(interactions=x, epochs=epoch, num_threads=n_jobs)
    return interactions, items_dict, model

# add a rating
def addRatings(interactions,
               model,
               user_dict={},
               user=0,
               ratings=[],
               epochs=30,
               n_jobs=4):
    '''
    Adds ratings for a user
    Required Input -
        - interactions: DataFrame containing the ratings for each user/item
        - model: the fitted LightFM model
        - user_dict: dictionary mapping user_id to its row index in interactions
        - user: the user id
        - ratings: list of [item, rating] pairs for that user
        - epochs: number of epochs to run
        - n_jobs: number of cores used for execution

    Expected Output -
        - interactions: the matrix containing the new ratings
        - model: the model after a partial fit on the new ratings
    '''
    for item, rating in ratings:
        # .loc avoids chained indexing, which can silently assign to a copy
        interactions.loc[user, item] = rating
    # Build a matrix that is zero everywhere except the updated user's row,
    # then partial-fit once (not once per rating).
    uindex = user_dict[user]
    val = [interactions.loc[user].values if user_index == uindex
           else [0] * interactions.shape[1]
           for user_index in range(interactions.shape[0])]
    x = sparse.csr_matrix(val)
    model.fit_partial(interactions=x, epochs=epochs, num_threads=n_jobs)
    return interactions, model

All of them work!
But how can I remove a rating or a user? I don't know how to do that, because the fit_partial method is made to add data, not to remove it.
And finally, I was wondering whether setting a rating to 0 would be the same as just removing it.
Thanks in advance!
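One possible answer, as a sketch rather than part of the original thread: since `fit_partial` only adds training passes, removals are usually handled by editing the interaction DataFrame and re-fitting a fresh model on the result. In this setup a zero cell is exactly what "no interaction" looks like to the model, so zeroing a rating before the re-fit is effectively a removal. The pandas side might look like:

```python
# Sketch: removing data from the interactions DataFrame (pandas/scipy only).
# After editing, re-train a fresh LightFM model on the result; fit_partial
# can only add epochs, never unlearn interactions.
import pandas as pd
from scipy import sparse

# Toy interaction matrix: rows = users, columns = items.
interactions = pd.DataFrame(
    [[5, 0, 3], [0, 4, 0]],
    index=["user_a", "user_b"],
    columns=["item_1", "item_2", "item_3"],
)

# Remove a single rating: set the cell back to 0 (the "no interaction" value).
interactions.loc["user_a", "item_3"] = 0

# Remove a user entirely: drop the row.
interactions = interactions.drop(index="user_b")

x = sparse.csr_matrix(interactions.values)
# model = LightFM(...); model.fit(x, ...)  # re-fit from scratch on x
print(interactions.shape)  # (1, 3)
```

The trade-off is that a full re-fit is slower than `fit_partial`, but it is the only way to make the model forget data it has already trained on.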

How to work with a big dataset?

Hello.
I tried to work with the ml-latest dataset and got a "MemoryError".
How do I work with a large dataset? Should I split the data? But then how can I recommend something based on only half of the data?
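A common fix for this, offered as a sketch rather than the thread's actual resolution: the dense user × item pivot table is usually what exhausts memory, and building a scipy sparse matrix directly from the raw rating triples avoids materializing it (the `userId`/`movieId`/`rating` column names below follow the MovieLens files):

```python
# Sketch: build a sparse interactions matrix directly from rating triples,
# avoiding a dense user x item pivot table that can exhaust memory.
import numpy as np
import pandas as pd
from scipy import sparse

# Toy stand-in for a large ratings file with userId/movieId/rating columns.
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "movieId": [10, 20, 10, 30],
    "rating": [4.0, 5.0, 3.0, 2.0],
})

# Map raw ids to contiguous row/column indices.
user_codes, user_ids = pd.factorize(ratings["userId"])
item_codes, item_ids = pd.factorize(ratings["movieId"])

# COO format stores only the nonzero entries, then convert to CSR for training.
interactions = sparse.coo_matrix(
    (ratings["rating"].astype(np.float32), (user_codes, item_codes)),
    shape=(len(user_ids), len(item_ids)),
).tocsr()

print(interactions.shape, interactions.nnz)  # (3, 3) 4
```

A matrix built this way can be passed to `LightFM.fit` directly, so there is no need to split the data just to fit it in memory.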
