2018-tencent-ad-competition-baseline's Issues

How do I save an LGBMClassifier model?

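After training an LGBMClassifier, how can I save it so that I can load it directly the next time I need it?
Can the preprocessed matrix be saved as well? Redoing the preprocessing on every run is too slow.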
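One possible approach, offered as a minimal sketch rather than the repository's own method: the scikit-learn style LightGBM estimators can be pickled, and a SciPy sparse feature matrix can be written with `scipy.sparse.save_npz`. The helper names and file names below are hypothetical, and a plain dict stands in for the fitted model so the sketch runs without LightGBM installed.

```python
import os
import pickle
import tempfile

import numpy as np
from scipy import sparse


def save_model(model, path):
    """Serialize a fitted model (any picklable object) to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a model previously written by save_model."""
    with open(path, "rb") as f:
        return pickle.load(f)


def save_features(matrix, path):
    """Store a sparse feature matrix in compressed .npz form."""
    sparse.save_npz(path, sparse.csr_matrix(matrix))


def load_features(path):
    """Load a sparse feature matrix written by save_features."""
    return sparse.load_npz(path)


# Demo with stand-ins: a dict plays the role of the fitted classifier.
work_dir = tempfile.mkdtemp()
model_path = os.path.join(work_dir, "model.pkl")        # hypothetical file name
features_path = os.path.join(work_dir, "train_x.npz")   # hypothetical file name

stand_in_model = {"n_estimators": 100}
features = sparse.csr_matrix(np.array([[0, 1, 0], [2, 0, 0]]))

save_model(stand_in_model, model_path)
save_features(features, features_path)

restored_model = load_model(model_path)
restored_features = load_features(features_path)
```

With a real classifier, `save_model(clf, model_path)` would be called after `clf.fit(...)`. Pickle files should be loaded with the same library versions that wrote them.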

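CountVectorizer() needs the token_pattern parameter set

The default token_pattern is '(?u)\b\w\w+\b', which appears to ignore tokens of length 1, such as '1' and '2', so those features are lost. If NaN is filled with '-1', consider setting token_pattern to '(?u)\b(?<!-)\d+\b'.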
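The difference between the two patterns can be checked with the standard `re` module, since CountVectorizer tokenizes by applying `findall` with the compiled token_pattern. This is a small sketch; the sample string is made up for illustration.

```python
import re

DEFAULT_PATTERN = r"(?u)\b\w\w+\b"       # scikit-learn's default token_pattern
PROPOSED_PATTERN = r"(?u)\b(?<!-)\d+\b"  # pattern suggested in this issue


def tokenize(text, pattern):
    """Return the tokens that the given pattern extracts from text."""
    return re.compile(pattern).findall(text)


sample = "1 2 13 -1"  # made-up multi-valued field; '-1' marks a filled NaN
default_tokens = tokenize(sample, DEFAULT_PATTERN)
proposed_tokens = tokenize(sample, PROPOSED_PATTERN)

print(default_tokens)   # ['13']
print(proposed_tokens)  # ['1', '2', '13']
```

Note that the proposed pattern matches digit runs only, so it fits fields made of numeric IDs; any non-numeric token would be dropped.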

Getting this warning: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.

I have a question. Calling clf.predict reports:
DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.

What could be the cause? I'm new to this and don't really understand it. Could you give me some pointers?
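The issue does not say which line triggers the warning, so the following is only a sketch of the check that the message itself recommends. NumPy emits this warning when an empty array is used directly as a condition (for example `if arr:`); testing `arr.size > 0` avoids the ambiguity. The helper name is hypothetical.

```python
import numpy as np


def has_items(arr):
    """Return True if the array holds at least one element."""
    return np.asarray(arr).size > 0


empty_result = has_items(np.array([]))       # no elements at all
zero_result = has_items(np.array([0]))       # one element, even though it is 0
```

If the warning comes from inside a library rather than your own code, upgrading that library is usually the way to make it go away.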

Beginner question: after running for two hours I got the errors below. What went wrong?

Traceback (most recent call last):
File "E:/2018-tencent-ad-competition-baseline-master/bryan_baseline_v2.py", line 76, in batch_predict
data[feature] = LabelEncoder().fit_transform(data[feature].apply(int))
File "E:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py", line 2551, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/_libs/src/inference.pyx", line 1521, in pandas._libs.lib.map_infer
ValueError: invalid literal for int() with base 10: '1 2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "E:/2018-tencent-ad-competition-baseline-master/bryan_baseline_v2.py", line 115, in
result.append(batch_predict(slice,i))
File "E:/2018-tencent-ad-competition-baseline-master/bryan_baseline_v2.py", line 78, in batch_predict
data[feature] = LabelEncoder().fit_transform(data[feature])
File "E:\ProgramData\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py", line 112, in fit_transform
self.classes_, y = np.unique(y, return_inverse=True)
File "E:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\arraysetops.py", line 210, in unique
return _unique1d(ar, return_index, return_inverse, return_counts)
File "E:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\arraysetops.py", line 274, in _unique1d
perm = ar.argsort(kind='mergesort' if return_index else 'quicksort')
TypeError: '<' not supported between instances of 'int' and 'str'

Process finished with exit code 1
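Reading the traceback: `int('1 2')` fails because the column contains a multi-valued entry, and the fallback then fails because `np.unique` cannot sort a column that mixes int and str values. One possible workaround, sketched here under the assumption that treating every value as a string is acceptable for this feature, is to cast the whole column to str before encoding. The sketch uses `np.unique(..., return_inverse=True)`, the same call the traceback shows inside `LabelEncoder.fit_transform`; the function name and sample values are hypothetical.

```python
import numpy as np


def encode_as_strings(values):
    """Cast every value to str, then map each distinct string to an integer code."""
    as_text = np.asarray([str(v) for v in values])
    classes, codes = np.unique(as_text, return_inverse=True)
    return classes, codes


mixed_column = [1, "1 2", 3, "-1"]  # made-up column mixing int and str
classes, codes = encode_as_strings(mixed_column)

print(classes.tolist())  # ['-1', '1', '1 2', '3']
print(codes.tolist())    # [1, 2, 3, 0]
```

With pandas and scikit-learn, the equivalent would be passing `data[feature].astype(str)` to `LabelEncoder().fit_transform`.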

Error when using user_feature_tocsv.py to convert the user features to a CSV file

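Running the step below fails. What should I do? Thanks.
-- You can first use user_feature_tocsv.py to convert the user features to a CSV file, so that it can later be read directly with pd.read_csv

Error message:
... ...
11300000
11400000
Traceback (most recent call last):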
File "user_feature_tocsv.py", line 28, in
user_feature=pd.concat([pd.DataFrame('../data/userFeature_' + str(i) + '.csv') for i in range(cnt+1)])
File "user_feature_tocsv.py", line 28, in
user_feature=pd.concat([pd.DataFrame('../data/userFeature_' + str(i) + '.csv') for i in range(cnt+1)])
File "/home/wc/.local/lib/python3.5/site-packages/pandas/core/frame.py", line 404, in init
raise ValueError('DataFrame constructor not properly called!')
ValueError: DataFrame constructor not properly called!
(tfpy35) wc@ubuntu:~/Desktop/Tencent/code$ ^C
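The ValueError comes from passing a file path string to `pd.DataFrame`, which expects data rather than a path. Reading each chunk file with `pd.read_csv` before concatenating is the likely intent. A minimal sketch, using temporary files and hypothetical column names (`uid`, `age`) in place of the real data:

```python
import os
import tempfile

import pandas as pd


def load_chunks(paths):
    """Read each chunk file with pd.read_csv and concatenate the results."""
    frames = [pd.read_csv(path) for path in paths]
    return pd.concat(frames, ignore_index=True)


# Demo: write two tiny chunk files, then load them back as one frame.
tmp_dir = tempfile.mkdtemp()
chunk_paths = []
for i, rows in enumerate([[{"uid": 1, "age": 3}], [{"uid": 2, "age": 5}]]):
    path = os.path.join(tmp_dir, "userFeature_" + str(i) + ".csv")
    pd.DataFrame(rows).to_csv(path, index=False)
    chunk_paths.append(path)

user_feature = load_chunks(chunk_paths)
```

Applied to line 28 of the script, this would mean replacing `pd.DataFrame(...)` with `pd.read_csv(...)` inside the list comprehension.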

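Data download link is dead

I wanted to download the data from the link you shared, but the link has expired and the download fails. Could you share it again? Thanks.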

bryan_baseline_v3.py line 98: why does train_x = sparse.hstack((train_x, train_a)) use sparse.hstack rather than df.concat?

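bryan_baseline_v3.py, line 98:
In train_x = sparse.hstack((train_x, train_a)),
why use sparse.hstack rather than df.concat?

2. Lines 66-68:
one_hot_feature=['LBS','age','carrier','cons......
vector_feature =['appIdAction','a','....
My understanding is that the one_hot_feature columns need to be standardized, (x - u) / standard deviation,
while the vector_feature columns need text feature extraction. That is why the two groups of features are looped over separately and appended one column at a time.
I'm not sure whether this understanding is correct.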
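One likely reason, not confirmed in this thread: if train_x and train_a are SciPy sparse matrices (as encoders such as OneHotEncoder and CountVectorizer typically return), `sparse.hstack` appends the new columns while keeping the result sparse, whereas a pandas concatenation would need dense DataFrames. The sketch below uses small made-up matrices to show the column-wise stacking.

```python
import numpy as np
from scipy import sparse

# Made-up stand-ins for an encoded one-hot block and a bag-of-words block.
one_hot_part = sparse.csr_matrix(np.array([[1, 0, 0],
                                           [0, 1, 0],
                                           [0, 0, 1]]))
bag_of_words_part = sparse.csr_matrix(np.array([[0, 2],
                                                [1, 0],
                                                [0, 0]]))

# Append the second block's columns to the first; rows stay aligned.
combined = sparse.hstack((one_hot_part, bag_of_words_part)).tocsr()

print(combined.shape)  # (3, 5)
print(combined.nnz)    # 5 stored values instead of 15 dense cells
```

Only the non-zero entries are stored, which matters when the encoded feature matrix has a very large number of columns.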

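I ran into a problem when running baseline v3

The problem is on this line:
slice = train[start:end]
The message is: can not do slice indexing on with these indexers [0.0] <type 'float'>
I printed start and end, and both are floats. Is this because my pandas version is too old to support this?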
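The message points at the float bounds rather than at the pandas version: slice positions need to be integers. A possible fix is to compute the bounds with integer arithmetic (or wrap them in `int(...)`). The generator below is a hypothetical helper, not code from the baseline.

```python
def batch_bounds(n_rows, batch_size):
    """Yield (start, end) integer pairs that together cover range(n_rows)."""
    for start in range(0, n_rows, batch_size):
        yield start, min(start + batch_size, n_rows)


bounds = list(batch_bounds(10, 4))
print(bounds)  # [(0, 4), (4, 8), (8, 10)]
```

Each pair can then be used as `train[start:end]`. If the existing bounds come from a division or a float multiplier, `start, end = int(start), int(end)` before slicing has the same effect.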

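v3 still runs out of memory: killed after reading 4,000,000 rows

Hello. I have just added an 8 GB memory module to my machine, which runs Ubuntu 16.04, and I enabled ulimit -c unlimited, but the v3 script is still killed after reading about 4,000,000 rows of user_feature.data. Is there any possible fix? Many thanks!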

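Hello, there is something I don't understand about the tocsv script

Why does it write the data to CSV in separate parts and then also write it to a single CSV?
Why can the data still be written to a file even though userFeature_data has already been deleted with del inside the loop body and reset to empty?
I don't understand these two points. Any guidance would be appreciated.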
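The script itself is not shown in this issue, so the following is only a sketch of the common buffer-and-flush pattern it appears to follow. Under that assumption, each batch of rows is written to its own file before the buffer is cleared, so `del` only frees memory and does not lose data; the part files keep memory bounded, and a final step merges them into one file for convenient loading. All names here are hypothetical, and the standard `csv` module stands in for pandas.

```python
import csv
import os
import tempfile


def _write_chunk(rows, out_dir, index):
    """Write one buffer of rows to its own CSV file and return the path."""
    path = os.path.join(out_dir, "chunk_" + str(index) + ".csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path


def write_in_chunks(rows, out_dir, chunk_size):
    """Buffer rows and flush to disk every chunk_size rows."""
    paths, buffer = [], []
    for row in rows:
        buffer.append(row)
        if len(buffer) == chunk_size:
            paths.append(_write_chunk(buffer, out_dir, len(paths)))
            buffer = []  # rows are already on disk; start an empty buffer
    if buffer:  # leftover rows after the loop still need one last flush
        paths.append(_write_chunk(buffer, out_dir, len(paths)))
    return paths


def merge_chunks(paths, out_path):
    """Concatenate the chunk files into a single CSV file."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as f:
                writer.writerows(csv.reader(f))


out_dir = tempfile.mkdtemp()
chunk_paths = write_in_chunks(([i, i * i] for i in range(5)), out_dir, 2)
merged_path = os.path.join(out_dir, "merged.csv")
merge_chunks(chunk_paths, merged_path)

with open(merged_path, newline="") as f:
    merged_rows = list(csv.reader(f))
```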

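Measured memory usage

The version marked as usable with 8 GB: with 8 GB of RAM plus 4 GB of virtual memory, a MemoryError occurs after reading 7.7 million user data rows. Using a subset of the data, memory is full at 2.5 million rows and a MemoryError occurs at 5 million.
The version marked as usable with 16 GB: on a machine with 32 GB of RAM, memory was fully occupied while reading the user data; during the final iterations, usage was stable at about 28 GB.
These figures come from my own computer and a lab computer respectively, and are for reference only. Thanks to the author, by the way.
(I have now gotten into data science too.)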

Dataset

I forgot to register for the competition. Could you share the dataset?

Beginner question: KeyError 'aid'

Traceback (most recent call last):
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\indexes\base.py", line 2525, in get_loc
return self._engine.get_loc(key)
File "pandas_libs\index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
File "pandas_libs\index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
File "pandas_libs\hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas_libs\hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'aid'

During handling of the above exception, another exception occurred:

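What does creativeSize mean?

Hello. In the two statements on lines 52 and 53 of the code, what does 'creativeSize' mean, where does it come from, and what do these two statements do?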
train_x=train[['creativeSize']]
test_x=test[['creativeSize']]

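Why can't the sparse data be sliced?

I don't quite understand this part. After all the data is converted to sparse form, only the number of columns grows; the number of rows is unchanged. So why can't slicing be used? Is it because the data contains multi-valued features? If all features were single-valued, could the sparse result be sliced?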
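One possible explanation, assuming the matrix in question is built with `sparse.hstack`: whether slicing works depends on the sparse storage format, not on whether the features are single-valued or multi-valued. `sparse.hstack` can return a COO matrix, and COO has not supported indexing in many SciPy versions, while CSR supports row slicing. The sketch below converts to CSR first; the matrices are made up.

```python
import numpy as np
from scipy import sparse

left = sparse.csr_matrix(np.arange(12).reshape(4, 3))
right = sparse.csr_matrix(np.eye(4))

# Convert explicitly to CSR so that row slicing is available.
stacked = sparse.hstack((left, right)).tocsr()

batch = stacked[1:3]  # rows 1 and 2, all columns
print(batch.shape)    # (2, 7)
```

The same conversion should make batch-wise prediction over row slices possible without densifying the matrix.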
