ferortega / cf4j
CF4J: Collaborative Filtering for Java
License: Apache License 2.0
Dear @ferortega,
During execution of UBCF on the 20M dataset, the console shows some % values. Can you please explain what these values are?
Example: attached file (sample_console.docx)
In the MovieLens 1M dataset, the number of movies is approximately 3,900 (3,883 exactly), as specified in the dataset.
But when the number of items is retrieved through the CF4J library's kernel with the method "getNumberOfItems()", it prints only 3706 items.
Why is that? Please explain, @ferortega.
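One plausible explanation, not confirmed in this thread, is that the library counts only items that appear in at least one rating, while the dataset's movie list also contains titles nobody rated. A minimal sketch for checking this independently of CF4J is shown below. It assumes the `user::item::rating::timestamp` line format used by the MovieLens 1M ratings file; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of CF4J.
final class DistinctItemCount {

    // Counts the distinct item IDs found in "user::item::rating::timestamp" lines.
    static int countRatedItems(List<String> ratingLines) {
        Set<String> itemIds = new HashSet<>();
        for (String line : ratingLines) {
            String[] fields = line.split("::");
            if (fields.length >= 3) {
                itemIds.add(fields[1]); // second field is the item (movie) ID
            }
        }
        return itemIds.size();
    }

    public static void main(String[] args) {
        // Tiny in-memory sample; in practice the lines would be read from the ratings file.
        List<String> sample = Arrays.asList(
                "1::1193::5::978300760",
                "1::661::3::978302109",
                "2::1193::4::978298413");
        System.out.println(countRatedItems(sample)); // prints 2
    }
}
```

Running the same count over the full ratings file and comparing it with the value returned by "getNumberOfItems()" would show whether unrated movies account for the gap.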
Hi @ferortega, @jesusmayor, I'd like to report a vulnerability issue in es.upm.etsisi:cf4j:2.2.7.
es.upm.etsisi:cf4j:2.2.7 directly or transitively depends on 639 C libraries (.so) across many platforms (such as x86-64, x86, arm64, armhf). However, I noticed that some of these C libraries are vulnerable and contain the following CVEs:
liblept.so
from the C project libjpeg-turbo (version 1.5.3) exposes 1 vulnerability:
CVE-2018-14498
libjpeg-turbo has fixed this vulnerability in versions >= 2.1.0.
Java build tools cannot report vulnerable C libraries, which may introduce potential security issues into many downstream Java projects.
Could you please upgrade the above shared libraries to their patched versions?
Thanks for your help~
Best regards,
Helen Parr
Dear @ferortega,
The adjusted cosine similarity measure (ACOS) was introduced to measure the similarity between items (see the article below). Why does the cf4j framework include ACOS to calculate the similarity between users, via the class cf4j.knn.userToUser.similarities.MetricAjustedCosine?
Article:
Sarwar, Badrul, George Karypis, Joseph Konstan, and John Riedl. 2001. “Item-Based Collaborative Filtering Recommendation Algorithms.” Proceedings of the Tenth International Conference on World Wide Web - WWW ’01, 285–95
Hi,
I noticed that the release jar file is about 500 MB in size, and it bundles all the dependencies inside the jar. This caused trouble for me: a version conflict, with duplicated classes from different versions on the classpath.
It would be great if you could also provide a normal release without the bundled dependencies.
For now I have made a fork (https://github.com/atexp/cf4j/commits/master) that can be built as a release with jitpack, and I depend on that.
Thank you for this library, it helped me a lot to get started quickly with a recommender!
Hi,
Is there any nDCG method implemented, or the possibility of implementing it?
Best Regards,
Márcia Barros
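The thread does not say whether CF4J ships an nDCG metric. As a library-independent sketch of what such a metric computes, the code below uses the common definition DCG@k = Σ (2^rel_i − 1) / log2(i + 1) over ranked positions i = 1..k, normalised by the DCG of the ideal ordering. The class name, method names, and the choice of this gain function are assumptions for illustration, not CF4J API.

```java
import java.util.Arrays;

// Hypothetical helper, not part of CF4J.
final class Ndcg {

    // DCG@k over relevance values listed in ranked order (best-ranked item first).
    static double dcg(double[] relevances, int k) {
        double sum = 0.0;
        int limit = Math.min(k, relevances.length);
        for (int i = 0; i < limit; i++) {
            // i is zero-based, so the rank is i + 1 and the discount is log2(rank + 1).
            double discount = Math.log(i + 2) / Math.log(2);
            sum += (Math.pow(2, relevances[i]) - 1) / discount;
        }
        return sum;
    }

    // nDCG@k: DCG of the recommended ranking divided by the DCG of the ideal ranking,
    // where the ideal ranking sorts all of the user's test relevances in descending order.
    static double ndcg(double[] rankedRelevances, double[] allTestRelevances, int k) {
        double[] ascending = allTestRelevances.clone();
        Arrays.sort(ascending);
        double[] ideal = new double[ascending.length];
        for (int i = 0; i < ascending.length; i++) {
            ideal[i] = ascending[ascending.length - 1 - i];
        }
        double idealDcg = dcg(ideal, k);
        return idealDcg == 0.0 ? 0.0 : dcg(rankedRelevances, k) / idealDcg;
    }

    public static void main(String[] args) {
        double[] testRatings = {3, 2, 1};
        System.out.println(ndcg(new double[] {3, 2, 1}, testRatings, 3)); // ideal order
        System.out.println(ndcg(new double[] {1, 2, 3}, testRatings, 3)); // reversed order
    }
}
```

The first call scores a ranking that already matches the ideal order, the second a reversed ranking, which receives a lower score.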
How can I specify a test set stored in a file instead of giving the full ratings file?
Hi,
would it be possible for you to provide a small example of how to use the explicit DatasetSplitters?
Best Regards,
Márcia Barros
Even after increasing the heap size to 2.5 GB (-Xms2048M -Xmx2560M), the library is not able to process the MovieLens 10M dataset.
Please consider this issue.
Hi,
When I run the project, the following error appears:
run:
Exception in thread "main" java.io.StreamCorruptedException: invalid stream header: 4D6F7665
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:806)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
at es.upm.etsisi.cf4j.data.BenchmarkDataModels.loadRemoteDataModel(BenchmarkDataModels.java:245)
at es.upm.etsisi.cf4j.data.BenchmarkDataModels.MovieLens100K(BenchmarkDataModels.java:35)
at MSE_movieLens100k.main(MSE_movieLens100k.java:31)
Java Result: 1
BUILD SUCCESSFUL (total time: 0 seconds)
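A useful first diagnostic for this kind of error is to decode the reported header bytes. A Java serialization stream starts with the magic bytes AC ED followed by the version 00 05, so any other header means the input was not a serialized object. The sketch below (class and method names are hypothetical) decodes the hex string from the exception message into ASCII.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper, not part of CF4J.
final class StreamHeaderDecoder {

    // Converts a hex string such as "4D6F7665" into the ASCII text it encodes.
    static String hexToAscii(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(hexToAscii("4D6F7665")); // prints "Move"
    }
}
```

The header decodes to the text "Move", which suggests that the downloaded resource was plain text or HTML rather than a serialized data model. Whether that text was a redirect notice or some other server response is a guess that the stack trace alone cannot confirm.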
Hey there,
I've been comparing my own results with the ones cf4j provides, and I've found huge differences in some cases. For example, using the userToUser knn model on the ML100k dataset with a 20% test ratio and k=10, a top-10 evaluation gives me a recall of approximately 0.7, while cf4j obtains approximately 0.23.
Interested in finding out why the difference was so big, I dived into the source code and found the one part that differs from what I have: the cosine similarity implementation.
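For reference, the cosine similarity is computed as follows:

$$\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$

And assuming that $R_A$ denotes the items rated by user A, $R_B$ the items rated by user B, and $r_{A,i}$ the rating of user A for item $i$, then the cosine similarity is equivalent to:

$$\cos(A, B) = \frac{\sum_{i \in R_A \cap R_B} r_{A,i} \, r_{B,i}}{\sqrt{\sum_{i \in R_A} r_{A,i}^2} \; \sqrt{\sum_{i \in R_B} r_{B,i}^2}}$$

From what I can see in https://github.com/ferortega/cf4j/blob/master/src/cf4j/knn/userToUser/similarities/MetricCosine.java#L19-L33, it seems like what is being computed is:

$$\frac{\sum_{i \in R_A \cap R_B} r_{A,i} \, r_{B,i}}{\sqrt{\sum_{i \in R_A \cap R_B} r_{A,i}^2} \; \sqrt{\sum_{i \in R_A \cap R_B} r_{B,i}^2}}$$

because both denActive and denTarget are only incremented when an item in common is found.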
This changes the similarity outcome: the denominator factors do not include ratings for items rated by only one of the users, which affects the computation of the k nearest neighbours and gives worse results.
I've locally changed the implementation to follow the original cosine similarity formula, and the results I obtain are exactly the ones I had using another framework.
Is this indeed an implementation bug? If so, I would be happy to provide a PR with the required changes.
Or is this some variant I'm not aware of?
Thanks,
Fábio Colaço
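The difference described in this report can be reproduced with a small, library-independent sketch. The two methods below are illustrations written for this comparison, not the CF4J source: `fullCosine` takes each norm over all of a user's ratings, while `commonOnlyCosine` accumulates the norms only over co-rated items, as the report says the linked code does. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical comparison of the two variants; not the CF4J implementation.
final class CosineVariants {

    // Standard cosine: norms are computed over every item each user has rated.
    static double fullCosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double num = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double rb = b.get(e.getKey());
            if (rb != null) {
                num += e.getValue() * rb;
            }
        }
        double den = Math.sqrt(sumSquares(a)) * Math.sqrt(sumSquares(b));
        return den == 0.0 ? 0.0 : num / den;
    }

    // Variant from the report: norms are accumulated only over co-rated items.
    static double commonOnlyCosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double num = 0.0, denA = 0.0, denB = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double rb = b.get(e.getKey());
            if (rb != null) {
                double ra = e.getValue();
                num += ra * rb;
                denA += ra * ra;
                denB += rb * rb;
            }
        }
        double den = Math.sqrt(denA) * Math.sqrt(denB);
        return den == 0.0 ? 0.0 : num / den;
    }

    private static double sumSquares(Map<Integer, Double> ratings) {
        double sum = 0.0;
        for (double r : ratings.values()) {
            sum += r * r;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<Integer, Double> a = new HashMap<>();
        a.put(1, 5.0);
        a.put(2, 3.0);
        Map<Integer, Double> b = new HashMap<>();
        b.put(1, 5.0);
        b.put(3, 4.0);
        System.out.println(fullCosine(a, b));       // below 1: unshared ratings enlarge the norms
        System.out.println(commonOnlyCosine(a, b)); // 1.0: only the single shared item counts
    }
}
```

With a single co-rated item, the common-only variant always returns 1.0 for positive ratings, which illustrates how it can promote neighbours that share very few items.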
Is it a rating matrix or what?
Regards,