Comments (6)
@fleapapa
A1: Yes, you need to split the data set into non-overlapping blocks before using the distributed version of implicit ALS.
A2: With Intel DAAL you can verify the trained model by computing, for example, the RMSE on the same training data set, that is, the RMSE between the training ratings and the model's predictions for them.
I have attached an example that shows the flow of the computations. See the testModelQuality() function in the attached example.
To align the computations with MLlib, you may also provide a test data set for the RMSE computation. To do this, please replace transposedDataTable[0], …, transposedDataTable[nBlocks - 1] with the numeric tables that contain the test ratings in CSR format.
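For readers without the attachment, the RMSE check described above can be sketched in plain C++. This is only an illustration, not the attached testModelQuality() code: `CsrRatings` and `rmseOnObserved` are hypothetical names, the predictions are assumed to be a dense row-major array, the CSR indices are assumed to be zero-based, and averaging over the observed ratings only is an assumption as well (the thread does not say how the attached example averages).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for a block of ratings in CSR format with
// zero-based indices. This is not a DAAL type.
struct CsrRatings {
    std::vector<float> values;            // observed ratings
    std::vector<std::size_t> colIndices;  // column index of each rating
    std::vector<std::size_t> rowOffsets;  // nRows + 1 offsets into values
};

// RMSE between the observed ratings and the matching entries of a dense,
// row-major prediction matrix that has nCols columns.
double rmseOnObserved(const CsrRatings& truth,
                      const std::vector<float>& predictions,
                      std::size_t nCols) {
    assert(truth.colIndices.size() == truth.values.size());
    const std::size_t nRows =
        truth.rowOffsets.empty() ? 0 : truth.rowOffsets.size() - 1;
    double sumSq = 0.0;
    std::size_t count = 0;
    for (std::size_t r = 0; r < nRows; ++r) {
        for (std::size_t k = truth.rowOffsets[r]; k < truth.rowOffsets[r + 1]; ++k) {
            const double diff =
                static_cast<double>(predictions[r * nCols + truth.colIndices[k]]) -
                static_cast<double>(truth.values[k]);
            sumSq += diff * diff;
            ++count;
        }
    }
    // With no observed ratings there is nothing to compare; return 0.
    return count == 0 ? 0.0 : std::sqrt(sumSq / static_cast<double>(count));
}
```

Passing the training ratings as `truth` gives the training RMSE; passing held-out ratings gives a figure closer to what an MLlib-style evaluation would report. If the CSR arrays come from a table that stores one-based indices, the offsets and column indices would need to be shifted before calling this function.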
Best regards,
Victoriya
from onedal.
Thanks for the example code! It's very helpful.
Regarding
To align the computations with MLLib you may also provide test data set for RMSE computation. To do this, please replace transposedDataTable[0], …, transposedDataTable[nBlocks - 1] with the numeric tables that contain test ratings in CSR format.
Why are the test ratings placed in transposedData only, instead of in both data and transposedData?
from onedal.
Hi Victoriya,
In your example code, I found two undefined member functions:
size_t *colIndices = sparseBlock.getBlockColumnIndicesSharedPtr().get();
size_t *rowOffsets = sparseBlock.getBlockRowIndicesSharedPtr().get();
The two functions, getBlockColumnIndicesSharedPtr and getBlockRowIndicesSharedPtr, seem to be available only in the 2018 Beta. I am using the 2017 release. May I just copy the latest header file from https://github.com/01org/daal/blob/92f4dde5a1e2d7f132111588f4513cc7c4578052/include/data_management/data/csr_numeric_table.h without any negative impact on my application?
from onedal.
why are test ratings placed in transposedData instead of both data and transposedData?
Both the data and transposedData arrays define the same distributed numeric table. In the data array the table is split by rows (users), and in the transposedData array the table is split by columns (items). The code I provided uses transposedData as the ground truth in the testModelQuality() function, so only transposedData is needed to test the quality of the trained model.
May i just copy the latest header file
It would be better not to copy a header file, but to modify the example to make it work with DAAL 2017. Please replace those two lines of code with the following code:
size_t *colIndices = sparseBlock.getBlockColumnIndicesPtr();
size_t *rowOffsets = sparseBlock.getBlockRowIndicesPtr();
Best regards,
Victoriya
from onedal.
Victoriya,
Thanks for the replacement code. It works :)
However, my app then crashed with an error in a call to free(). (I do not call free() myself.) If I comment out testModelQuality (and thus mergePredictions too), there is no crash.
I'm investigating the crash, and it is most likely caused by an incorrect shape of the matrix 'predictions'. I added some logging, which prints the following:
predictedRatings[0][0]: 1360, 2500
predictedRatings[0][1]: 1360, 2500
predictedRatings[0][2]: 1360, 2500
predictedRatings[0][3]: 1358, 2500
predictedRatings[1][0]: 1360, 2500
predictedRatings[1][1]: 1360, 2500
predictedRatings[1][2]: 1360, 2500
predictedRatings[1][3]: 1358, 2500
predictedRatings[2][0]: 1360, 2500
predictedRatings[2][1]: 1360, 2500
predictedRatings[2][2]: 1360, 2500
predictedRatings[2][3]: 1358, 2500
predictedRatings[3][0]: 1360, 2499
predictedRatings[3][1]: 1360, 2499
predictedRatings[3][2]: 1360, 2499
predictedRatings[3][3]: 1358, 2499
while 'predictions' is allocated with a shape of (5438, 9998). I don't know why it is not 9999, because my input matrix has a shape of (5438, 9999).
However, even if I manually change the statement
HomogenNumericTable<float> predictions(nItems, nUsers, NumericTable::doAllocate);
to
HomogenNumericTable<float> predictions(9999, nUsers, NumericTable::doAllocate);
The code still crashes.
By the way, the final RMSE is 0.79, which is unreasonably high (with Spark ML it is only 0.11 :).
Most likely the incorrect shape of the matrices is the cause of these issues. I'm hunting it...
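As a sanity check on the logged shapes, the block sizes can be summed along each dimension. The sketch below uses a ceiling-division partition, which is an assumption: it happens to reproduce the logged sizes (1360, 1360, 1360, 1358 and 2500, 2500, 2500, 2499), but it is not taken from the DAAL example. The function names `blockSizes` and `totalSize` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Split `total` items into `nBlocks` consecutive blocks of size
// ceil(total / nBlocks); the last block takes whatever remains.
// (Assumed partition rule, chosen because it matches the logged sizes.)
std::vector<std::size_t> blockSizes(std::size_t total, std::size_t nBlocks) {
    assert(nBlocks > 0);
    const std::size_t step = (total + nBlocks - 1) / nBlocks;
    std::vector<std::size_t> sizes;
    std::size_t remaining = total;
    for (std::size_t b = 0; b < nBlocks; ++b) {
        const std::size_t size = std::min(step, remaining);
        sizes.push_back(size);
        remaining -= size;
    }
    return sizes;
}

// Sum of all block sizes along one dimension.
std::size_t totalSize(const std::vector<std::size_t>& sizes) {
    return std::accumulate(sizes.begin(), sizes.end(), std::size_t{0});
}
```

The logged blocks add up to 3 * 1360 + 1358 = 5438 along one dimension and 3 * 2500 + 2499 = 9999 along the other, which matches the (5438, 9999) input. A merged table with 9998 along the longer dimension is therefore one element short, so copying the blocks into it would write past the end of the buffer.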
from onedal.
Victoriya,
After making the following two changes to your example code, I finally got it working:
- Use dataTables[] instead of transposedDataTables[] in testModelQuality()
- Modify mergePredictions() according to the shapes of predictedRatingsMaster[][]
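The second change can be illustrated with a generic merge that derives every offset from the actual block shapes rather than from a fixed block size. This is a sketch in plain C++ and not the code from the attachment: `Block` and `mergeBlocks` are hypothetical, the blocks are assumed to be dense and row-major, and grid[bi][bj] is assumed to hold block row bi and block column bj (in the logged output the row count varies with the second index, so the two indices would have to be swapped there).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense, row-major block. This is not a DAAL type.
struct Block {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;  // rows * cols values
};

// Merge a grid of blocks into one row-major matrix. Blocks in the same
// block row must share a row count; blocks in the same block column must
// share a column count. Offsets come from the block shapes themselves.
Block mergeBlocks(const std::vector<std::vector<Block>>& grid) {
    Block merged{0, 0, {}};
    if (grid.empty() || grid[0].empty()) return merged;
    for (const auto& blockRow : grid) {
        assert(blockRow.size() == grid[0].size());
        merged.rows += blockRow[0].rows;
    }
    for (const auto& b : grid[0]) merged.cols += b.cols;
    merged.data.assign(merged.rows * merged.cols, 0.0f);

    std::size_t rowOffset = 0;
    for (const auto& blockRow : grid) {
        std::size_t colOffset = 0;
        for (std::size_t bj = 0; bj < blockRow.size(); ++bj) {
            const Block& b = blockRow[bj];
            assert(b.rows == blockRow[0].rows && b.cols == grid[0][bj].cols);
            assert(b.data.size() == b.rows * b.cols);
            for (std::size_t r = 0; r < b.rows; ++r) {
                for (std::size_t c = 0; c < b.cols; ++c) {
                    merged.data[(rowOffset + r) * merged.cols + colOffset + c] =
                        b.data[r * b.cols + c];
                }
            }
            colOffset += b.cols;
        }
        rowOffset += blockRow[0].rows;
    }
    return merged;
}
```

Because the destination size is the sum of the block sizes, a last block that is smaller than the others (1358 instead of 1360, or 2499 instead of 2500) cannot cause a write outside the merged buffer.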
The RMSE is only reduced to 0.32 (from the previous 0.79) and is still higher than the one obtained with Spark ML, but I am very happy because my app works and no longer crashes. And it's much faster than pyspark!
I will port the 'model selection' Python code I used with Spark to my DAAL ALS C++ app and see whether tuning some hyper-parameters gets the RMSE close to 0.11 :)
I have attached my changed code, FYI.
my-daal-als-changes.txt
I'm closing this issue.
Many thanks again!
from onedal.