haowei01 / pytorch-examples

Train models in PyTorch: Learning to Rank, Collaborative Filtering, Heterogeneous Treatment Effects, Uplift Modeling, etc.

Python 99.40% Shell 0.60%

learning-to-rank lambdarank pytorch-implementation pytorch-ranking ranknet ndcg inverse-propensity-score positional-bias uplift-modeling heterogeneous-treatment-effects

pytorch-examples's People

rswezey jmitnik disenwang zhoujintao990131 axeltchaikovsky linktsang shiyongde avanigoel mieg0mak skyisnotwarm kyrie-zhao iannliu rovedream chen4519902 tonellotto vinayakpathak fangguo34 haif-liu

pytorch-examples's Issues

Relevance sort direction

Hi! Thanks for this excellent repo, it's very informative.

I've noticed that in your implementation of LambdaRank (specifically, on line 178) you sort the rank data frame by relevance in ascending order (which is the default for pandas' sort_values function).
Every other implementation of LambdaRank I've looked at (allRank, tensorflow-LTR) seems to sort relevance levels in descending order, and I'd have to agree with them, since you want higher-relevance items at the front of the array.

Sorry if I've misunderstood. Thanks.
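To show why the sort direction matters for NDCG, here is a minimal sketch in plain Python. It is not the repo's code. It assumes the common exponential gain 2^rel - 1 and a 1/log2(rank + 1) discount. The ideal DCG that normalizes NDCG comes from relevance labels sorted in descending order. In pandas, that corresponds to `sort_values(..., ascending=False)`. An ascending sort gives the worst possible ordering instead. If that order were used as the normalizer, NDCG could exceed 1.

```python
import math


def dcg(relevances):
    """DCG of labels listed in ranked order (rank 1 first).

    Assumes gain 2^rel - 1 and discount 1 / log2(rank + 1).
    """
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ideal_dcg(relevances):
    # The ideal ranking puts the most relevant documents first,
    # i.e. relevance labels sorted in DESCENDING order.
    return dcg(sorted(relevances, reverse=True))


def ndcg(relevances_in_ranked_order):
    idcg = ideal_dcg(relevances_in_ranked_order)
    return dcg(relevances_in_ranked_order) / idcg if idcg > 0 else 0.0


labels = [0, 2, 1, 0, 3]  # toy relevance labels for one query
print(dcg(sorted(labels, reverse=True)) > dcg(sorted(labels)))  # descending beats ascending
print(ndcg(sorted(labels, reverse=True)))  # the ideal order has NDCG 1.0
```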

Data Preprocessing might add more flavour!

Thank you for publishing this great repo; I'm definitely learning a lot from it!
While investigating the dataset (Personalize Expedia Hotel Searches - ICDM 2013), I noticed that some columns contain a large number of null values. Removing those columns or imputing the missing values might improve the results!

Anyway, I will try myself as well!

Code

pytorch-examples/ranking/data_loaders/load_expedia.py

Lines 17 to 38 in 6c217bb

    def __init__(self):
        cur_file = os.path.abspath(__file__)
        self.data_dir = os.path.join(os.path.dirname(cur_file), DATA_DIR)
        print('{} loading data from DATA dir {}'.format(get_time(), self.data_dir))
        pkl_file = os.path.join(self.data_dir, 'train.pkl')
        if os.path.isfile(pkl_file):
            train_df = pd.read_pickle(pkl_file)
        else:
            train_df = pd.read_csv(os.path.join(self.data_dir, 'train.zip'))
            train_df.to_pickle(pkl_file)
        self.random_df = train_df[train_df[RANDOM_COL] == 1]
        self.biased_click_df = train_df[train_df[RANDOM_COL] == 0]
        print('{} unbiased training data rows {}'.format(get_time(), self.random_df.shape[0]))
        print('{} biased training data rows {}'.format(get_time(), self.biased_click_df.shape[0]))

        test_pkl_file = os.path.join(self.data_dir, 'test.pkl')
        if os.path.isfile(test_pkl_file):
            self.test_df = pd.read_pickle(test_pkl_file)
        else:
            self.test_df = pd.read_csv(os.path.join(self.data_dir, 'test.zip'))
            self.test_df.to_pickle(test_pkl_file)
        print('{} test file size {}'.format(get_time(), self.test_df.shape[0]))
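Here is a minimal sketch of the preprocessing suggested in this issue. It could run right after a loader like the one above has produced `train_df` and `self.test_df`. The details are illustrative assumptions, not part of the repo:

- the helper names `fit_missing_value_plan` and `apply_missing_value_plan`
- the 50% null threshold
- median imputation
- the toy column names

Statistics are computed on the training frame only and then reused on the test frame, so nothing from the test data leaks into training.

```python
import numpy as np
import pandas as pd


def fit_missing_value_plan(train_df, max_null_frac=0.5, keep=()):
    """Pick columns to drop and medians to impute, using TRAINING data only.

    max_null_frac is an assumed threshold; `keep` protects columns
    (e.g. IDs or labels) from being dropped.
    """
    null_frac = train_df.isna().mean()  # fraction of nulls per column
    drop_cols = [c for c in train_df.columns
                 if null_frac[c] > max_null_frac and c not in keep]
    remaining = train_df.drop(columns=drop_cols)
    medians = remaining.select_dtypes(include='number').median()
    return drop_cols, medians


def apply_missing_value_plan(df, drop_cols, medians):
    """Drop the same columns, then fill numeric nulls with the training medians."""
    out = df.drop(columns=[c for c in drop_cols if c in df.columns])
    # fillna with a Series fills column-by-column; labels absent from `out` are skipped.
    return out.fillna(medians)


# Toy frames with hypothetical column names, for illustration only.
train = pd.DataFrame({
    'srch_id': [1, 1, 1, 2, 2],
    'mostly_null': [np.nan, np.nan, np.nan, np.nan, 5.0],
    'some_null': [1.0, np.nan, 3.0, np.nan, 5.0],
})
test = pd.DataFrame({
    'srch_id': [3, 3],
    'mostly_null': [1.0, np.nan],
    'some_null': [np.nan, 7.0],
})

drop_cols, medians = fit_missing_value_plan(train)
train_clean = apply_missing_value_plan(train, drop_cols, medians)
test_clean = apply_missing_value_plan(test, drop_cols, medians)
print(drop_cols)                         # columns judged too sparse on train
print(test_clean['some_null'].tolist())  # test nulls filled with the TRAIN median
```

Whether dropping or imputing actually helps ranking quality on this dataset would need to be checked empirically. Tree models, for example, can often use missingness as a signal on its own.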

Data Analysis on the dataset

Something Wrong with decay_diff?

Hi! Thank you for providing such excellent work! I do have some questions about the calculation of the "decay_diff" term (LambdaRank.py, line 197).

I noticed that you are using "sort_order" as the discount factor.
sort_order holds the indices that would sort Y, which means sort_order[0] is the index of the document most relevant to the query. But this is not consistent with the gain_diff calculation!
gain_diff(i, j) is the gain difference between doc i and doc j, but decay_diff is not the decay difference between doc i and doc j.
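For reference, here is a sketch of the standard LambdaRank pairwise |ΔNDCG|, assuming gain 2^rel - 1 and discount 1/log2(1 + rank). It is not the repo's implementation, and it does not confirm which variable line 197 actually uses. Swapping documents i and j changes DCG by (G_i - G_j)(D_i - D_j). Here G is each document's gain and D is the discount at that document's current position. The distinction the issue raises is between two arrays:

- `order = argsort(...)` maps position to document.
- Its inverse, `rank`, maps document to position.

For the discount to line up with the per-document gains, it has to be indexed by document through `rank`.

```python
import numpy as np


def delta_ndcg(relevance, scores):
    """|ΔNDCG| matrix for swapping docs i and j in the ranking induced by `scores`.

    Assumes gain 2^rel - 1 and discount 1 / log2(1 + rank), ranks 1-based.
    """
    relevance = np.asarray(relevance, dtype=float)
    scores = np.asarray(scores, dtype=float)
    n = len(relevance)

    # order[k] = index of the document placed at position k (highest score first)
    order = np.argsort(-scores)
    # rank[i] = 1-based position of document i: the inverse permutation of `order`
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(1, n + 1)

    gain = 2.0 ** relevance - 1.0            # indexed by document
    discount = 1.0 / np.log2(1.0 + rank)     # also indexed by document, via `rank`

    ideal_gain = np.sort(gain)[::-1]         # descending relevance = ideal ranking
    idcg = np.sum(ideal_gain / np.log2(np.arange(2, n + 2)))
    if idcg == 0:
        return np.zeros((n, n))

    gain_diff = gain[:, None] - gain[None, :]
    decay_diff = discount[:, None] - discount[None, :]
    return np.abs(gain_diff * decay_diff) / idcg


rel = [3, 0, 1, 2]
sc = [0.1, 0.9, 0.5, 0.3]
print(delta_ndcg(rel, sc).round(3))
```

For example, with scores [0.1, 0.9, 0.5], `order` is [1, 2, 0] while `rank` is [3, 1, 2]. Using one where the other is meant pairs a document's gain with the discount of a different position.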
