2018-tencent-ad-competition-baseline's People

Contributors

youchounobb

2018-tencent-ad-competition-baseline's Issues

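Hello, I don't quite understand the tocsv program

Why does it both write the data to CSV in separate pieces and also write the data to a single CSV?
Why can userFeature_data still be written to a file after the loop body has already run del userFeature_data and reset it to empty?
I don't understand these two points. Any guidance would be appreciated.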

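Data download link is dead

I wanted to download the data from the link you shared, but the link is dead and I can't download it. Could you share it again? Thanks.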

Getting this error: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.

I have a question. When calling clf.predict I get:
DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.

What could be the cause? I'm new to this and don't understand it well. Could you give me some pointers?
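
NumPy emits this warning when an empty array is used directly as a condition (for example `if arr:`). The issue does not show where inside clf.predict that happens, so the following is only a minimal sketch of the check the warning itself recommends; the helper name `has_elements` is hypothetical.

```python
import numpy as np

def has_elements(arr):
    """Return True if the array holds at least one element.

    Uses arr.size, as the warning suggests, instead of relying on the
    truth value of the array itself.
    """
    arr = np.asarray(arr)
    return arr.size > 0

empty_result = has_elements(np.array([]))      # empty array
nonempty_result = has_elements(np.array([0]))  # one element, even though it is 0
```

Note that the size check treats an array containing a single 0 as non-empty, which is usually what code of the form `if arr:` was meant to test.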

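What does creativeSize mean?

Hello, in the two statements on lines 52 and 53 of the code, what does 'creativeSize' mean, where does it come from, and what do these two statements do?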
train_x=train[['creativeSize']]
test_x=test[['creativeSize']]

Dataset

I forgot to register. Could you share the dataset?

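Why can't sparse data be sliced?

I don't quite understand this part. After all the data is converted to sparse form, only the number of columns increases; the number of rows stays the same. Why can't slicing be used? Is it because the data contains multi-valued features? If all the features were single-valued, would slicing work after sparsification?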
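
One possible explanation, offered as an assumption since the baseline code is not quoted here: whether a SciPy sparse matrix can be sliced depends on its storage format rather than on the kind of features in it. The result of sparse.hstack may come back in COO format, which has traditionally not supported indexing, while CSR does support row slicing. A minimal sketch with made-up data:

```python
import numpy as np
from scipy import sparse

# Two small blocks standing in for encoded feature groups (made-up data).
left = sparse.csr_matrix(np.eye(4))
right = sparse.coo_matrix(np.arange(8).reshape(4, 2))

# Convert the stacked result to CSR explicitly so that row slicing works
# regardless of the format hstack happens to return.
train_x = sparse.hstack((left, right)).tocsr()

first_rows = train_x[0:2]  # row slice of the CSR matrix
```

Passing `format='csr'` to sparse.hstack should have the same effect as calling `.tocsr()` afterwards.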

Error when using user_feature_tocsv.py to convert user features to a CSV file

I got an error when running this step. What should I do? Thanks.
-- You can first use user_feature_tocsv.py to convert the user features to a CSV file, so that they can later be read directly with pd.read_csv

Error message:
... ...
11300000
11400000
Traceback (most recent call last):
File "user_feature_tocsv.py", line 28, in <module>
user_feature=pd.concat([pd.DataFrame('../data/userFeature_' + str(i) + '.csv') for i in range(cnt+1)])
File "user_feature_tocsv.py", line 28, in <listcomp>
user_feature=pd.concat([pd.DataFrame('../data/userFeature_' + str(i) + '.csv') for i in range(cnt+1)])
File "/home/wc/.local/lib/python3.5/site-packages/pandas/core/frame.py", line 404, in __init__
raise ValueError('DataFrame constructor not properly called!')
ValueError: DataFrame constructor not properly called!
(tfpy35) wc@ubuntu:~/Desktop/Tencent/code$ ^C
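
The traceback shows a file path string being passed to pd.DataFrame, which does not read files; pandas raises this ValueError when the constructor is given a plain string. A likely fix, assuming the intent was to load the per-chunk CSV files, is to call pd.read_csv on each path instead. The sketch below uses a hypothetical helper `load_user_feature_parts` and made-up columns, and writes its own temporary files so it can run on its own:

```python
import os
import tempfile

import pandas as pd

def load_user_feature_parts(data_dir, cnt):
    """Read userFeature_0.csv .. userFeature_<cnt>.csv and concatenate them."""
    paths = [os.path.join(data_dir, 'userFeature_' + str(i) + '.csv')
             for i in range(cnt + 1)]
    # pd.read_csv loads each file; pd.DataFrame(path) would raise ValueError.
    return pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)

# Demo with two tiny chunk files (the columns here are invented).
with tempfile.TemporaryDirectory() as tmp:
    for i in range(2):
        part = pd.DataFrame({'uid': [2 * i, 2 * i + 1], 'age': [1, 2]})
        part.to_csv(os.path.join(tmp, 'userFeature_%d.csv' % i), index=False)
    user_feature = load_user_feature_parts(tmp, cnt=1)
```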

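v3 still runs out of memory: killed after reading 4000000

Hello, I have just installed a new 8 GB memory module in my computer and I'm running Ubuntu 16.04 with ulimit -c unlimited enabled, but the v3 version is still killed after reading about 4,000,000 lines of user_feature.data. Is there any possible solution? Thank you very much!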
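
For context, `ulimit -c` only sets the maximum core-dump size, so it would not be expected to change how much memory the process can use. One way to lower peak memory is to convert the file row by row instead of holding every parsed record in a list. The sketch below assumes each line looks like `uid 1|age 4|interest1 10 20` (fields separated by `|`, field name first); that format and the function name `stream_user_features` are assumptions, not something this issue confirms.

```python
import csv
import io

def stream_user_features(lines, out, fieldnames):
    """Write one CSV row per input line without keeping rows in memory.

    Assumed line format: 'name value [value ...]' fields joined by '|'.
    Missing fields are filled with '-1'.
    """
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            restval='-1', extrasaction='ignore')
    writer.writeheader()
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = {}
        for field in line.split('|'):
            name, _, value = field.partition(' ')
            row[name] = value
        writer.writerow(row)
        count += 1
    return count

# Demo on an in-memory sample; a real run would pass open file objects.
sample = io.StringIO('uid 1|age 4|interest1 10 20\nuid 2|age 3\n')
out = io.StringIO()
n = stream_user_features(sample, out, ['uid', 'age', 'interest1'])
```

The resulting CSV can then be read back in pieces with the `chunksize` argument of pd.read_csv if it is still too large to load at once.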

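How do I save an LGBMClassifier model?

How do I save an LGBMClassifier after training so that it can be loaded directly the next time it is used?
Also, can the preprocessed matrix be saved too? Preprocessing it again every time is too slow.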
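
A minimal sketch of one approach: pickle the fitted estimator and store the sparse matrix with scipy.sparse.save_npz. LGBMClassifier follows the scikit-learn estimator interface and, as far as I know, can be pickled the same way; LogisticRegression is used below only as a stand-in so the example runs without LightGBM. The helper names and file names are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression

def save_artifacts(model, matrix, model_path, matrix_path):
    """Pickle the fitted model and save the sparse matrix as .npz."""
    with open(model_path, 'wb') as f:
        pickle.dump(model, f)
    sparse.save_npz(matrix_path, matrix.tocsr())

def load_artifacts(model_path, matrix_path):
    """Load the model and the sparse matrix saved by save_artifacts."""
    with open(model_path, 'rb') as f:
        model = pickle.load(f)
    return model, sparse.load_npz(matrix_path)

# Tiny made-up training set; the classifier is a stand-in for LGBMClassifier.
X = sparse.csr_matrix(np.array([[0., 1.], [1., 0.], [0., 2.], [2., 0.]]))
y = np.array([0, 1, 0, 1])
clf = LogisticRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, 'model.pkl')
    matrix_path = os.path.join(tmp, 'train_x.npz')
    save_artifacts(clf, X, model_path, matrix_path)
    clf2, X2 = load_artifacts(model_path, matrix_path)
    same_predictions = bool((clf.predict(X) == clf2.predict(X2)).all())
```

Pickle files should only be loaded from trusted sources, and the same library versions should be used when loading. LightGBM's own `booster_.save_model` is another option for the model itself.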

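bryan_baseline_v3.py#98: why does train_x = sparse.hstack((train_x, train_a)) use sparse.hstack instead of df.concat?

At bryan_baseline_v3.py#98, in
train_x = sparse.hstack((train_x, train_a))
why is sparse.hstack used instead of df.concat?

2. #66-#68
one_hot_feature=['LBS','age','carrier','cons......
vector_feature =['appIdAction','a','....
one_hot_feature needs standardization ((x - u) / standard deviation).
vector_feature needs text feature extraction. So the two groups of features are iterated over separately and then appended column by column.
I'm not sure whether this understanding is correct.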
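
As a hedged illustration (the baseline source is not quoted in this issue, so the encoders are assumptions): the list name one_hot_feature suggests one-hot encoding rather than standardization, and vector_feature columns look like space-separated ID lists suited to CountVectorizer. Both encoders return SciPy sparse matrices, which pd.concat cannot combine without converting them to dense pandas objects; sparse.hstack appends the blocks column-wise while keeping them sparse. The column names and data below are made up.

```python
import pandas as pd
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({
    'age': [1, 2, 1],                      # single-valued feature
    'interest1': ['10 20', '20', '10 30']  # multi-valued feature
})

blocks = []

# One-hot encode the single-valued column: one output column per category.
blocks.append(OneHotEncoder().fit_transform(train[['age']]))

# Count-encode the multi-valued column: one output column per distinct ID.
# The token_pattern keeps single-character tokens as well.
cv = CountVectorizer(token_pattern=r'(?u)\b\w+\b')
blocks.append(cv.fit_transform(train['interest1']))

# Append the sparse blocks side by side without densifying them.
train_x = sparse.hstack(blocks).tocsr()
```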

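CountVectorizer() needs the token_pattern parameter set

The default token_pattern is '(?u)\b\w\w+\b', which appears to ignore tokens of length 1, such as '1' and '2', causing features to be lost. If NaN is filled with '-1', consider setting token_pattern to '(?u)\b(?<!-)\d+\b'.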
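
The difference can be checked on a small made-up corpus. This sketch fits one vectorizer with the default pattern and one with the pattern proposed above, then compares the learned vocabularies:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up documents: space-separated IDs, with '-1' standing for a filled NaN.
docs = ['1 2 13', '-1', '2 13']

# Default pattern requires at least two word characters per token.
default_cv = CountVectorizer()
default_cv.fit(docs)
default_vocab = sorted(default_cv.vocabulary_)   # ['13']

# Proposed pattern: any run of digits not preceded by a minus sign.
custom_cv = CountVectorizer(token_pattern=r'(?u)\b(?<!-)\d+\b')
custom_cv.fit(docs)
custom_vocab = sorted(custom_cv.vocabulary_)     # ['1', '13', '2']
```

With the default pattern the single-character IDs '1' and '2' are dropped, while the proposed pattern keeps them and still excludes the '-1' placeholder.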

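I ran into a problem when running baseline3

The problem occurs at
slice = train[start:end]
On this line, the message is: can not do slice indexing on with these indexers [0.0] <type 'float'>
I printed start and end and both are of type float. Is it because my pandas version is too old to do this?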
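
The message points at the float bounds rather than at any particular pandas version: positional slicing expects integers. Where the floats come from is not shown in the issue, so the following is only a sketch of casting the bounds before slicing; the helper name `slice_rows` is hypothetical.

```python
import pandas as pd

def slice_rows(df, start, end):
    """Return rows [start, end) by position, casting the bounds to int."""
    return df.iloc[int(start):int(end)]

train = pd.DataFrame({'x': range(10)})

# Float bounds such as 0.0 and 5.0 are converted before slicing.
part = slice_rows(train, 0.0, 5.0)
```

If the bounds are computed with a division, integer division (`//`) avoids producing floats in the first place.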

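Measured memory usage

With the version the author marked as usable with 8 GB, on 8 GB RAM + 4 GB virtual memory, a MemoryError occurs after reading 7.7 million userdata records; when using only part of the dataset, memory is fully used at 2.5 million records and a MemoryError occurs at 5 million.
With the version marked as usable with 16 GB, on a machine with 32 GB of memory, memory is fully occupied while reading userdata; during the final iterations, memory usage is stable at about 28 GB.
These figures come from runs on my own computer and on a lab computer respectively, and are for reference only. Thanks to the author, by the way.
(I've fallen into the data science pit now too.)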

Beginner asking for help: these errors appeared after running for two hours. What went wrong?

Traceback (most recent call last):
File "E:/2018-tencent-ad-competition-baseline-master/bryan_baseline_v2.py", line 76, in batch_predict
data[feature] = LabelEncoder().fit_transform(data[feature].apply(int))
File "E:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py", line 2551, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/_libs/src/inference.pyx", line 1521, in pandas._libs.lib.map_infer
ValueError: invalid literal for int() with base 10: '1 2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "E:/2018-tencent-ad-competition-baseline-master/bryan_baseline_v2.py", line 115, in <module>
result.append(batch_predict(slice,i))
File "E:/2018-tencent-ad-competition-baseline-master/bryan_baseline_v2.py", line 78, in batch_predict
data[feature] = LabelEncoder().fit_transform(data[feature])
File "E:\ProgramData\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py", line 112, in fit_transform
self.classes_, y = np.unique(y, return_inverse=True)
File "E:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\arraysetops.py", line 210, in unique
return _unique1d(ar, return_index, return_inverse, return_counts)
File "E:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\arraysetops.py", line 274, in _unique1d
perm = ar.argsort(kind='mergesort' if return_index else 'quicksort')
TypeError: '<' not supported between instances of 'int' and 'str'

Process finished with exit code 1
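
Reading the two tracebacks together: int() fails on the value '1 2', which suggests a multi-valued field in a column being treated as single-valued, and the fallback then hands LabelEncoder a column mixing int and str, which cannot be sorted. A minimal sketch of one workaround, assuming mixed types are the cause, is to convert the whole column to str in the fallback so the types are uniform. The helper name `encode_column` and the sample data are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def encode_column(series):
    """Label-encode a column, falling back to string labels if int() fails."""
    try:
        values = series.apply(int)
    except ValueError:
        # Make every value a str so LabelEncoder can sort them consistently.
        values = series.astype(str)
    return LabelEncoder().fit_transform(values)

# Made-up column mixing ints with a multi-valued string such as '1 2'.
column = pd.Series([3, '1 2', 3, '7'], dtype=object)
codes = encode_column(column)
```

If the column really is multi-valued, encoding it with the vector features instead may be the better fix; this sketch only removes the type error.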

Beginner asking for help: KeyError 'aid'

Traceback (most recent call last):
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\indexes\base.py", line 2525, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'aid'

During handling of the above exception, another exception occurred:
