Comments (5)
XGBoost does not read CSV files directly so far. For text input, we
support the libsvm format. Check the example to see how to read data using numpy.
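For readers who want to turn a CSV file into libsvm text before loading it, the following is a minimal sketch using only the Python standard library. The function name `csv_to_libsvm`, the label column position, the header row, and the sample data are all assumptions made for illustration; they are not part of XGBoost. Values equal to the missing marker (here -999.0, as in the question below) are simply omitted, since libsvm is a sparse format.

```python
import csv
import io


def csv_to_libsvm(csv_text, label_col=0, missing=-999.0, has_header=True):
    """Convert CSV text into libsvm lines: '<label> <index>:<value> ...'.

    Hypothetical helper for illustration; not an XGBoost function.
    """
    reader = csv.reader(io.StringIO(csv_text))
    if has_header:
        next(reader, None)  # skip the header row
    lines = []
    for row in reader:
        if not row:
            continue  # ignore blank rows
        label = row[label_col]
        features = [v for i, v in enumerate(row) if i != label_col]
        # Indices start at 0 here, matching the record shown in this thread.
        # Empty cells and cells equal to the missing marker are left out.
        pairs = [
            f"{i}:{v}"
            for i, v in enumerate(features)
            if v != "" and float(v) != missing
        ]
        lines.append(" ".join([label] + pairs))
    return lines


# Made-up sample data: a label column followed by three features.
sample = "label,f0,f1,f2\n1,0.5,-999.0,3\n0,2,4,-999.0\n"
libsvm_lines = csv_to_libsvm(sample)
print("\n".join(libsvm_lines))
```

Each output line can then be written to a text file and loaded in place of the CSV.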
On Tuesday, September 16, 2014, explorerr [email protected] wrote:
I have a large file with 300+ features in each record.
While trying to load the data with DMatrix in python, I got the following message:
dtest = xgb.DMatrix(tsDir+'xgbTest.csv', missing=-999.0)
86x397 matrix with 328730778 entries is loaded from ../data/xgbTest.csv
I know I have 1834123 lines of record.
I looked into the file at line 86, which is no different than any other line.
What could be a possible reason for this?
Thanks very much!
Rui
—
Reply to this email directly or view it on GitHub
https://github.com/tqchen/xgboost/issues/78.
from xgboost.
Thanks very much for your response.
I named the file as csv, but I have actually converted it into libsvm format.
I have successfully loaded the training dataset, but I am having a problem with the testing dataset.
One record of the training file looks like this:
0 0:1 1:0.0397 2:0 3:318.77 4:1 21:0 6:0.00 7:0.00 8:0 9:1 10:0.0281 11:0.01 12:0.0397 137:0.0397 76:0.00 68:0.03 16:0.00 18:74500602 19:0.00 100:0.00 22:0 23:1 24:1 25:0.00 136:0.47 140:1 27:9999999.99 28:0 29:1 30:4 32:42 33:0.0281 105:0.0419 35:4 99:0.00 36:-1.23 37:59 38:0 39:281.21 103:1 41:0 42:1 43:0 44:1 45:3 46:0 221:0 47:9999999.99 48:0 49:0 50:0.00 51:1 52:1.00 53:0.0006 54:0.0281 55:0 56:0 57:0.0000 58:0.0281 59:295.52 193:1.00 61:1.00 62:0.0397 145:1 65:0.00 66:0.0281 121:0.0281 208:0 69:0.000 71:0 146:0 74:0 75:999 14:699 77:0.00 78:0 79:0 80:0.0397 198:1 82:7 83:2.127 84:0 239:0.00 86:0 87:1 88:0 107:0 91:0 93:1 264:0 95:2608 96:1 5:0.00 92:0.05 67:0 20:1.00 101:4 102:0.0005 151:0.0281 104:0.00 141:711 106:0 175:0 186:-3 108:38 109:0 110:0.00 111:0.01 203:1 113:0 114:0 115:0.0397 116:262 72:9999 118:1 157:5 120:0 31:4 122:0 123:0.0397 124:9999999.99 125:41 126:0 127:14 128:9999 129:0.0871 130:5 248:80.19 132:0 160:115.62 134:2 260:0.00 135:5.00 94:0 15:1 138:0 139:0 26:0 117:0.00 267:1 142:1 143:-3 252:1 13:68 73:-0.46 147:1 148:-3 149:1.00 150:9999999.99 97:0 152:0 164:79 154:1 155:12 156:0 119:2 158:0.0397 159:2.10 133:1392 161:0.05 162:0 163:0 153:0.00 166:-0.68 167:679 90:0.00 169:0 265:0 171:0.00 173:0 174:0.00 177:0 170:0 179:9999 176:999.99 181:0 182:-1.59 183:0.00 184:12 34:0 89:1 40:96 188:1459 189:0 165:0 191:0 192:1 60:0 63:0 195:0.0397 196:0 197:19 81:0.00 199:0.0281 200:0.0011 201:0.00 168:1 112:1.00 204:1.00 216:47 206:0 207:0.05 70:1 209:33 210:0 211:0 212:0.00 190:1 213:2 214:0.0397 215:0.0281 205:1 217:-3 266:1.00 218:0.00 219:1 220:0 172:1.000 222:0.0281 223:0 224:0.25 225:0 226:76 187:1 227:2777.65 228:0.00 229:1 230:18 231:9999 232:0 233:1 234:1 235:9999 236:0.00 237:0 238:263 257:1 85:0 240:0.0397 241:0.0000 202:1 242:0 243:0.00 244:0.00 245:-1.53 246:5 247:0 131:0.00 249:-1.90 250:1.00 251:-711 144:999 253:0 98:0 254:1 255:-0.66 256:0.0000 178:999.99 258:0 259:0 185:1.00 261:1 270:1 263:0 194:23 180:1 17:0.00 64:1.00 268:538.23 
269:1 262:0 271:3073.17 337:1
The testing dataset looks exactly the same, just without the first column (the label)...
Thanks!
Rui
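A quick way to confirm that a libsvm-style file is missing its label column is to check whether the first token of a line already looks like an `index:value` pair. The sketch below is only an illustration of that check; the function name `first_unlabeled_line` and the sample records are hypothetical.

```python
def first_unlabeled_line(lines):
    """Return the 1-based number of the first line whose first token looks
    like 'index:value' rather than a label, or None if no such line exists.

    Hypothetical helper for illustration only.
    """
    for number, line in enumerate(lines, start=1):
        tokens = line.split()
        if tokens and ":" in tokens[0]:
            return number
    return None


# Shortened, made-up records in the style of the one shown above.
train_like = ["0 0:1 1:0.0397 2:0", "1 0:0 3:318.77"]
test_like = ["0:1 1:0.0397 2:0", "0:0 3:318.77"]

train_check = first_unlabeled_line(train_like)
test_check = first_unlabeled_line(test_like)
print(train_check, test_check)
```

For a file on disk, the open file object can be passed directly, since it yields one line at a time.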
Oh, you need a dummy label for the test set as well.
Sincerely,
Tianqi Chen
Computer Science & Engineering, University of Washington
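As a rough illustration of the suggestion above, a dummy label can be added by prefixing every record of the unlabeled test file with a placeholder value. This is a minimal sketch, not code from XGBoost: the function name `add_dummy_label`, the choice of `0` as the placeholder, and the sample records are assumptions. It streams line by line, so it should also work for files with millions of records.

```python
import io


def add_dummy_label(src, dst, dummy="0"):
    """Copy libsvm records from src to dst, prefixing each with a dummy label.

    Hypothetical helper for illustration; src and dst are text file objects.
    Returns the number of records written.
    """
    count = 0
    for line in src:
        record = line.strip()
        if not record:
            continue  # skip blank lines
        dst.write(f"{dummy} {record}\n")
        count += 1
    return count


# In-memory stand-ins for the unlabeled test file and the fixed output file.
unlabeled = io.StringIO("0:1 1:0.0397 2:0 3:318.77\n0:0 1:0.0281 4:1\n")
labeled = io.StringIO()
written = add_dummy_label(unlabeled, labeled)
print(written)
print(labeled.getvalue(), end="")
```

With real files, `src` and `dst` would be the results of `open(...)` in read and write mode. The placeholder label only fills the label column so that each line has the same layout as the training file.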
I see, thanks very much :)
You did a great job with xgboost, bravo :)