
cookbook's Introduction

Cookbook

A repository of reusable data science code, written to reduce the time from idea to execution when developing a data science solution. It contains helper modules and sample notebooks showing how to use them to build a solution.

Codes -

Generic preprocessing - Helper functions for EDA, missing-value analysis and treatment, and generic preprocessing such as scaling and encoding.

Machine learning - Classification - Helper functions for classification problems: holdout/cross-validation, model explanation (LIME and variable-importance plots), and wrappers for common classification algorithms (XGBoost, LightGBM, extra trees, random forest, logistic regression, decision trees, K-nearest neighbours and SVM).

Machine learning - Regression - Helper functions for regression problems: holdout/cross-validation, model explanation (LIME and variable-importance plots), and wrappers for common regression algorithms (XGBoost, LightGBM, extra trees, random forest, linear regression, regression trees, K-nearest neighbours and SVM).

Recommender systems - Helper functions to build recommender systems via matrix factorization with the LightFM package.

Natural language processing - Helper functions for NLP text processing and analysis: n-grams, word clouds, tokenization, lowercasing, punctuation/stop-word removal, stemming/lemmatization, and TF-IDF and count vectorizers.
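To illustrate the kind of helper these modules provide, here is a minimal sketch of an NLP feature-extraction helper; the function and parameter names are illustrative, not the repository's actual API:

```python
# Sketch of a TF-IDF helper in the spirit of this repo's NLP module.
# Names here (tfidf_features) are illustrative, not the repository's API.
from sklearn.feature_extraction.text import TfidfVectorizer

def tfidf_features(texts, max_features=1000, ngram_range=(1, 2)):
    '''Fit a TF-IDF vectorizer on texts; return the matrix and the vectorizer.'''
    vectorizer = TfidfVectorizer(max_features=max_features,
                                 ngram_range=ngram_range,
                                 stop_words='english')
    matrix = vectorizer.fit_transform(texts)
    return matrix, vectorizer

docs = ["data science cookbook", "reusable data science code"]
matrix, vec = tfidf_features(docs)
print(matrix.shape[0])  # 2 documents, one row each
```

Returning the fitted vectorizer alongside the matrix lets the same vocabulary be reapplied to new text with `vec.transform(...)`.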

Notebooks -

Classification examples - Example notebooks to solve a classification problem

Regression examples - Example notebooks to solve a regression problem

Recommender systems - Example notebooks to build recommender systems

cookbook's People

Contributors

aayushmnit, connectedblue


cookbook's Issues

Import error when running the Recommender notebook

Hello, I was trying your notebook after reading your wonderful blog post. I got the following error and am unable to debug it; any help please?

Steps taken:
1. Cloned the repo.
2. Opened the Recommender notebook.
3. Cleared the previous outputs in the notebook.
4. Executed the first cell (the imports cell) and got the following error:
    ModuleNotFoundError                       Traceback (most recent call last)
    in ()
          3 sys.path.append(os.path.abspath("../"))
          4 from recsys import *
    ----> 5 from generic_preprocessing import *
          6 from IPython.display import HTML

    ModuleNotFoundError: No module named 'generic_preprocessing'

I am on Windows 10.
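A likely cause (a guess, since the thread gives no resolution): the notebook's working directory is not the repo's notebook folder, so `"../"` does not resolve to the directory that contains `generic_preprocessing.py`. One way to make the path explicit:

```python
# Possible workaround (assumes generic_preprocessing.py lives one level
# above the directory the notebook kernel was started in).
import os
import sys

repo_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
if repo_root not in sys.path:
    sys.path.insert(0, repo_root)
```

Printing `os.getcwd()` in the first cell is a quick way to confirm which directory `"../"` is actually relative to.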

Remove users/items?

Hey,
First, thanks for providing these amazing files; they have helped me a lot.
I'm building a recommendation system and was wondering how to change or remove users/items/ratings.
I've written these functions to add things (they work):

# add a user
# (assumes `interactions` is a pandas DataFrame with rows = users and
#  columns = items)
from scipy import sparse

def addUsers(interactions, model, users=[], user_dict={}, epoch=5, n_jobs=4):
    '''
    Adds new (empty) users to the interaction matrix
    Required Input -
        - interactions: DataFrame containing the ratings for each user/item
        - model: the fitted LightFM model
        - users: user ids of the new users
        - user_dict: dictionary mapping user_id to its row index in interactions
        - epoch: number of epochs to run
        - n_jobs: number of cores used for execution

    Expected Output -
        - interactions: the matrix extended with the new users
        - user_dict: the updated user-to-row-index dictionary
        - model: the model after a partial fit on the resized matrix
    '''
    for user_id in users:
        interactions.loc[user_id] = 0                  # new all-zero row
        user_dict[user_id] = interactions.shape[0] - 1
    # Partial-fit on an all-zero matrix of the new shape so the model
    # sees the added rows.
    x = sparse.csr_matrix(interactions.shape)
    model.fit_partial(interactions=x, epochs=epoch, num_threads=n_jobs)
    return interactions, user_dict, model

# add an item
def addItems(interactions, model, items=[], items_dict={}, epoch=5, n_jobs=4):
    '''
    Adds new (empty) items; items is a list of (item_id, title) pairs,
    and items_dict maps item_id to item title.
    '''
    for item_id, title in items:
        interactions[item_id] = 0                      # new all-zero column
        items_dict[item_id] = title
    # Partial-fit on an all-zero matrix of the new shape so the model
    # sees the added columns.
    x = sparse.csr_matrix(interactions.shape)
    model.fit_partial(interactions=x, epochs=epoch, num_threads=n_jobs)
    return interactions, items_dict, model

# add a rating
def addRatings(interactions,
               model,
               user_dict={},
               user=0,
               ratings=[],
               epochs=30,
               n_jobs=4):
    '''
    Adds ratings for a user
    Required Input -
        - interactions: DataFrame containing the ratings for each user/item
        - model: the fitted LightFM model
        - user_dict: dictionary mapping user_id to its row index in interactions
        - user: the user id
        - ratings: list of [item, rating] pairs for that user
        - epochs: number of epochs to run
        - n_jobs: number of cores used for execution

    Expected Output -
        - interactions: the matrix containing the new ratings
        - model: the model after a partial fit on the new ratings
    '''
    for item, rating in ratings:
        # .loc avoids chained indexing, which can silently assign to a copy
        interactions.loc[user, item] = rating
    # Build a matrix that is zero everywhere except the updated user's row,
    # then partial-fit once (not once per rating).
    uindex = user_dict[user]
    val = [interactions.loc[user].values if user_index == uindex
           else [0] * interactions.shape[1]
           for user_index in range(interactions.shape[0])]
    x = sparse.csr_matrix(val)
    model.fit_partial(interactions=x, epochs=epochs, num_threads=n_jobs)
    return interactions, model

All of them work!
But how can I remove a rating or a user? I don't know how to do that, because the fit_partial method is made to add data, not to remove it.
And finally, I was wondering whether setting a rating to 0 would be the same as just removing it.
Thanks in advance!
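One possible answer, as a sketch rather than part of the original thread: since `fit_partial` only adds training passes, removals are usually handled by editing the interaction DataFrame and re-fitting a fresh model on the result. In this setup a zero cell is exactly what "no interaction" looks like to the model, so zeroing a rating before the re-fit is effectively a removal. The pandas side might look like:

```python
# Sketch: removing data from the interactions DataFrame (pandas/scipy only).
# After editing, re-train a fresh LightFM model on the result; fit_partial
# can only add epochs, never unlearn interactions.
import pandas as pd
from scipy import sparse

# Toy interaction matrix: rows = users, columns = items.
interactions = pd.DataFrame(
    [[5, 0, 3], [0, 4, 0]],
    index=["user_a", "user_b"],
    columns=["item_1", "item_2", "item_3"],
)

# Remove a single rating: set the cell back to 0 (the "no interaction" value).
interactions.loc["user_a", "item_3"] = 0

# Remove a user entirely: drop the row.
interactions = interactions.drop(index="user_b")

x = sparse.csr_matrix(interactions.values)
# model = LightFM(...); model.fit(x, ...)  # re-fit from scratch on x
print(interactions.shape)  # (1, 3)
```

The trade-off is that a full re-fit is slower than `fit_partial`, but it is the only way to make the model forget data it has already trained on.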

How to work with a big dataset?

Hello.
I tried to work with the ml-latest dataset and got a "MemoryError".
How do I work with a large dataset? Should I split the data? But then how can I recommend something based on only half of the data?
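A common fix for this, offered as a sketch rather than the thread's actual resolution: the dense user × item pivot table is usually what exhausts memory, and building a scipy sparse matrix directly from the raw rating triples avoids materializing it (the `userId`/`movieId`/`rating` column names below follow the MovieLens files):

```python
# Sketch: build a sparse interactions matrix directly from rating triples,
# avoiding a dense user x item pivot table that can exhaust memory.
import numpy as np
import pandas as pd
from scipy import sparse

# Toy stand-in for a large ratings file with userId/movieId/rating columns.
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "movieId": [10, 20, 10, 30],
    "rating": [4.0, 5.0, 3.0, 2.0],
})

# Map raw ids to contiguous row/column indices.
user_codes, user_ids = pd.factorize(ratings["userId"])
item_codes, item_ids = pd.factorize(ratings["movieId"])

# COO format stores only the nonzero entries, then convert to CSR for training.
interactions = sparse.coo_matrix(
    (ratings["rating"].astype(np.float32), (user_codes, item_codes)),
    shape=(len(user_ids), len(item_ids)),
).tocsr()

print(interactions.shape, interactions.nnz)  # (3, 3) 4
```

A matrix built this way can be passed to `LightFM.fit` directly, so there is no need to split the data just to fit it in memory.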
