Comments (4)
Sure. The data was provided by my colleague. I will ask him when he comes tomorrow.
I don't think it is complicated: just a few operations, such as reading the original MovieLens data with pandas and then writing it to a pkl file.
from openlearning4deeprecsys.
@Leavingseason Thanks! That would be a great help.
```python
import pickle

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer


def get_100k_data():
    # Load the ratings and store them as float32.
    df = pd.read_csv(r"\e$\Users\v-fuz\Dataset\FlatFile\Recommendation_Dataset\MovieLens\ml-latest-100k\ratings.csv",
                     sep=',', engine='python')
    df["rating"] = df["rating"].astype(np.float32)

    # Map the raw user and movie ids to contiguous 0-based indices.
    user_mapping = {}
    movie_mapping = {}
    index = 0
    for x in list(df["userId"].unique()):
        user_mapping[x] = index
        index += 1
    index = 0
    for x in list(df["movieId"].unique()):
        movie_mapping[x] = index
        index += 1
    df["userId"] = df["userId"].map(user_mapping)
    df["movieId"] = df["movieId"].map(movie_mapping)
    #for col in ("userId", "movieId"):
    #    df[col] = df[col].astype(np.int32)

    # Movie content: a binary bag-of-words over each movie's genres.
    movies = pd.read_csv(r"\e$\Users\v-fuz\Dataset\FlatFile\Recommendation_Dataset\MovieLens\ml-latest-100k\movies.csv",
                         sep=',', engine='python')
    movies["movieId"] = movies["movieId"].map(movie_mapping)
    movies = movies.set_index('movieId')
    movies["genres"] = movies["genres"].map(lambda x: x.replace('|', ' ').lower())
    #vectorizer = CountVectorizer(binary = True)
    #vectorizer = vectorizer.fit(list(movies["genres"]))
    #movies["genres"]= movies["genres"].map(lambda x: vectorizer.transform([x]))
    movie_content = []
    index_set = set(movies.index)
    for i in range(len(movie_mapping)):
        if i in index_set:
            movie_content.append(movies.loc[[i]].iloc[0]["genres"])
        else:
            # No metadata for this movie: use an empty document.
            movie_content.append('')
    vectorizer = CountVectorizer(binary=True)
    movie_content = vectorizer.fit_transform(movie_content)
    movie_content = movie_content.astype(np.float32)

    # User content: a binary bag-of-words over all tags a user has assigned.
    users = pd.read_csv(r"\\e$\Users\v-fuz\Dataset\FlatFile\Recommendation_Dataset\MovieLens\ml-latest-100k\tags.csv",
                        sep=',', engine='python')
    users["userId"] = users["userId"].map(user_mapping)
    users = users.set_index('userId')
    user_content = []
    index_set = set(users.index)
    for i in range(len(user_mapping)):
        if i in index_set:
            user_content.append(' '.join(list(users.loc[[i]]["tag"])))
        else:
            # This user has no tags: use an empty document.
            user_content.append('')
    # fit_transform refits the vectorizer, so the tag vocabulary replaces the genre one.
    user_content = vectorizer.fit_transform(user_content)
    user_content = user_content.astype(np.float32)
    #users = pd.DataFrame(users.groupby('userId')['tag'].agg(lambda x: ' '.join(x)).reset_index(name = "tags"))
    #vectorizer = CountVectorizer(binary = True)
    #vectorizer = vectorizer.fit(list(users["tags"]))
    #users["tags"]= users["tags"].map(lambda x: vectorizer.transform([x]))

    # Shuffle the ratings and split them 90% / 10% into train and test.
    df = df.rename(columns={"userId": "user", "movieId": "item", "rating": "rate"})
    rows = len(df)
    df = df.iloc[np.random.permutation(rows)].reset_index(drop=True)
    split_index = int(rows * 0.9)
    df_train = df[0:split_index]
    df_test = df[split_index:].reset_index(drop=True)

    # Save everything as a single tuple.
    with open('movielens_100k.pkl', 'wb') as outfile:
        pickle.dump((df_train, df_test, user_content, movie_content), outfile, pickle.HIGHEST_PROTOCOL)


if __name__ == '__main__':
    get_100k_data()
    print("Done!")
```
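For anyone who wants to read the file back, here is a minimal sketch. It assumes the pickle holds the 4-tuple `(df_train, df_test, user_content, movie_content)` that the script above dumps. The helper name `load_movielens_pickle` is hypothetical and not part of the repository. Unpickling the real file also needs pandas and scipy installed, and `pickle.load` should only be used on files you trust.

```python
import pickle
from pathlib import Path


def load_movielens_pickle(path):
    """Load the 4-tuple written by the script above (hypothetical helper)."""
    with Path(path).open("rb") as infile:
        payload = pickle.load(infile)
    # Assumption: the file holds (df_train, df_test, user_content, movie_content).
    if not isinstance(payload, tuple) or len(payload) != 4:
        raise ValueError("expected a 4-tuple of train, test, user and movie data")
    df_train, df_test, user_content, movie_content = payload
    return df_train, df_test, user_content, movie_content
```

Calling `load_movielens_pickle('movielens_100k.pkl')` should then give back the two rating DataFrames and the two sparse content matrices in the order they were saved.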
Wow! Thanks a lot! @Leavingseason