cf4j's People

Contributors

dan-agilecoder, dionisioc, ferortega, jesusmayor, jesusutad, laracabrera, lazargugleta, uwemaurer

cf4j's Issues

Mismatch in number of items

In the MovieLens 1M dataset, the number of movies is about 3900 (3883, to be exact), as specified in the dataset.

But when we ask the CF4J library's kernel for the number of items in the dataset with the method "getNumberOfItems()", it prints only 3706 items.

Why is this so? Please explain, @ferortega.
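
One possible explanation, offered as an assumption rather than something confirmed from the cf4j source: a data model built only from the ratings file can only know about items that have at least one rating, and the MovieLens 1M ratings are commonly reported to cover 3706 distinct movies out of the 3883 listed in movies.dat. A minimal sketch for checking this on your own copy of the data, assuming the "UserID::MovieID::Rating::Timestamp" line layout (the class and method names are hypothetical, not part of cf4j):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DistinctItemCount {

    // Counts the distinct item IDs that appear in at least one rating line.
    // Assumes each line looks like "UserID::MovieID::Rating::Timestamp".
    static int countRatedItems(List<String> ratingLines) {
        Set<String> itemIds = new HashSet<>();
        for (String line : ratingLines) {
            String[] fields = line.split("::");
            if (fields.length >= 2) {
                itemIds.add(fields[1]);
            }
        }
        return itemIds.size();
    }

    public static void main(String[] args) {
        // Tiny in-memory sample; replace with the lines of ratings.dat.
        List<String> sample = Arrays.asList(
                "1::1193::5::978300760",
                "1::661::3::978302109",
                "2::1193::5::978298413");
        System.out.println(countRatedItems(sample));
    }
}
```

Running the same count over the full ratings file and comparing it with the number of lines in movies.dat would show whether the gap comes from movies that nobody rated.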

Potential security vulnerability in the shared library which cf4j depends on. Can you help upgrade to patch versions?

Hi, @ferortega , @jesusmayor , I'd like to report a vulnerability issue in es.upm.etsisi:cf4j:2.2.7.

Issue Description

es.upm.etsisi:cf4j:2.2.7 directly or transitively depends on 639 C libraries (.so) across many platforms (such as x86-64, x86, arm64, armhf). However, I noticed that some C libraries are vulnerable, containing the following CVEs:

liblept.so from the C project libjpeg-turbo (version: 1.5.3) exposes 1 vulnerability:
CVE-2018-14498

Suggested Vulnerability Patch Versions

libjpeg-turbo has fixed the vulnerabilities in versions >=2.1.0

Java build tools cannot report vulnerable C libraries, which may introduce potential security issues into many downstream Java projects.
Could you please upgrade the above shared libraries to their patch versions?

Thanks for your help~
Best regards,
Helen Parr

Adjusted cosine similarity

Dear Sir @ferortega,
We know that the adjusted cosine similarity measure (ACOS) was introduced to measure the similarity between items [see the article below]. Why does the cf4j framework include this measure to calculate the similarity between users, via the class cf4j.knn.userToUser.similarities.MetricAjustedCosine?

Article:
Sarwar, Badrul, George Karypis, Joseph Konstan, and John Riedl. 2001. “Item-Based Collaborative Filtering Recommendation Algorithms.” Proceedings of the Tenth International Conference on World Wide Web - WWW ’01, 285–95.

Release is a fat jar with bundled dependencies

Hi,

I noticed that the release jar file is about 500 MB in size, and that it bundles all the dependencies inside the jar. This gave me trouble because of a version conflict: different versions of the same classes ended up duplicated on the classpath.

It would be great if you could provide a normal release as well, without the bundled dependencies.

For now I made a fork (https://github.com/atexp/cf4j/commits/master) which allows building a release with JitPack, and I depend on that.

Thank you for this library, it helped me a lot to get started quickly with a recommender!

nDCG

Hi,

Is there any nDCG method implemented, or the possibility of implementing it?

Best Regards,
Márcia Barros
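
For readers who want to see what such a measure would compute, here is a minimal, framework-independent sketch of nDCG@k. It is not cf4j's API; the class and method names are hypothetical. It assumes the common exponential-gain form, DCG@k = sum over positions p = 1..k of (2^rel_p - 1) / log2(p + 1), normalised by the DCG of the best possible ordering of the user's test ratings.

```java
import java.util.Arrays;

public class NdcgSketch {

    // DCG@k for gains given in ranked order (position 1 first).
    static double dcg(double[] gains, int k) {
        double sum = 0.0;
        int limit = Math.min(k, gains.length);
        for (int i = 0; i < limit; i++) {
            double log2OfPositionPlusOne = Math.log(i + 2.0) / Math.log(2.0);
            sum += (Math.pow(2.0, gains[i]) - 1.0) / log2OfPositionPlusOne;
        }
        return sum;
    }

    // nDCG@k: DCG of the recommended ranking divided by the DCG of the
    // ideal ranking, i.e. all of the user's test ratings sorted descending.
    static double ndcg(double[] rankedGains, double[] allTestGains, int k) {
        double[] ideal = allTestGains.clone();
        Arrays.sort(ideal); // ascending order
        for (int i = 0, j = ideal.length - 1; i < j; i++, j--) {
            double tmp = ideal[i]; // reverse in place to get descending order
            ideal[i] = ideal[j];
            ideal[j] = tmp;
        }
        double idealDcg = dcg(ideal, k);
        return idealDcg == 0.0 ? 0.0 : dcg(rankedGains, k) / idealDcg;
    }

    public static void main(String[] args) {
        double[] testRatings = {3.0, 2.0, 1.0};
        System.out.println(ndcg(new double[] {3.0, 2.0, 1.0}, testRatings, 3));
        System.out.println(ndcg(new double[] {1.0, 2.0, 3.0}, testRatings, 3));
    }
}
```

A ranking that already lists the test items from best to worst scores 1; any other ordering scores lower, and a user with no relevant test items is given 0 by convention here.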

DatasetSplitters

Hi,

would it be possible for you to provide a small example of how to use the explicit DatasetSplitters?

Best Regards,
Márcia Barros

Out Of memory

Even after increasing the heap size to 2.5 GB (-Xms2048M -Xmx2560M), the library is not able to process the MovieLens 10M dataset.
Please consider this issue.

Exception in thread "main" java.io.StreamCorruptedException: invalid stream header: 4D6F7665

Hi Sir,
When I run the project, the following error appears:
run:
Exception in thread "main" java.io.StreamCorruptedException: invalid stream header: 4D6F7665
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:806)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
at es.upm.etsisi.cf4j.data.BenchmarkDataModels.loadRemoteDataModel(BenchmarkDataModels.java:245)
at es.upm.etsisi.cf4j.data.BenchmarkDataModels.MovieLens100K(BenchmarkDataModels.java:35)
at MSE_movieLens100k.main(MSE_movieLens100k.java:31)
Java Result: 1
BUILD SUCCESSFUL (total time: 0 seconds)
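
A small diagnostic sketch may help interpret the trace above. The hex value in the message, 4D6F7665, is the first four bytes that ObjectInputStream read; decoded as ASCII they spell "Move", whereas a Java serialization stream starts with the bytes AC ED 00 05. So the download apparently returned text rather than a serialized data model. Why that happened is not shown in the trace; a redirect or error page from the remote server is one guess, stated here as an assumption. The class and method names below are hypothetical.

```java
import java.io.ObjectStreamConstants;
import java.nio.charset.StandardCharsets;

public class StreamHeaderCheck {

    // Decodes a hex string such as "4D6F7665" into its ASCII text.
    static String hexToAscii(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // True if the first four bytes match the Java serialization header
    // (STREAM_MAGIC followed by STREAM_VERSION, i.e. AC ED 00 05).
    static boolean looksSerialized(byte[] header) {
        if (header.length < 4) {
            return false;
        }
        int magic = ((header[0] & 0xFF) << 8) | (header[1] & 0xFF);
        int version = ((header[2] & 0xFF) << 8) | (header[3] & 0xFF);
        return magic == (ObjectStreamConstants.STREAM_MAGIC & 0xFFFF)
                && version == ObjectStreamConstants.STREAM_VERSION;
    }

    public static void main(String[] args) {
        System.out.println(hexToAscii("4D6F7665")); // prints "Move"
        byte[] seen = {0x4D, 0x6F, 0x76, 0x65};
        System.out.println(looksSerialized(seen));  // prints "false"
    }
}
```

Saving the response of the remote URL to a file and inspecting its first bytes in the same way would show what the server actually sent.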

Cosine Similarity Implementation

Hey there,

I've been comparing the results I got with the ones cf4j provides, and I've found huge differences in some cases. For example, using the userToUser knn model on the ML100k dataset, with a 20% test ratio and k=10, a top-10 evaluation gives me a recall value of approximately 0.7, while cf4j obtains a value of approximately 0.23.

Interested in finding out why the difference was so big, I dived into the source code, and I finally found the one bit that is not similar to what I have: the cosine similarity implementation.

For reference, the cosine similarity is computed as follows:
image

And assuming that R_A denotes the items rated by user A and R_B the items rated by user B, then the cosine similarity is equivalent to:
image

And from what I can see in https://github.com/ferortega/cf4j/blob/master/src/cf4j/knn/userToUser/similarities/MetricCosine.java#L19-L33, it seems like what is being computed is:
image
because both denActive and denTarget are only incremented when an item in common is found.
This changes the similarity outcome, since the denominator factors will not include ratings for items that were rated by only one of the users. That in turn affects the computation of the k nearest neighbours and leads to worse results.
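
The three formulas above were embedded as images that did not survive on this page. The following is a reconstruction from the surrounding description, not a copy of the original images; the rating symbol $r_{A,i}$ (the rating of user A for item i) is an assumed notation.

```latex
% Cosine similarity between the rating vectors of users A and B
\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}

% Equivalent form over rated items: only co-rated items contribute to the
% numerator, but each norm runs over all items rated by that user
\cos(A, B) = \frac{\sum_{i \in R_A \cap R_B} r_{A,i} \, r_{B,i}}
                  {\sqrt{\sum_{i \in R_A} r_{A,i}^2} \; \sqrt{\sum_{i \in R_B} r_{B,i}^2}}

% What the linked code appears to compute: both norms restricted to co-rated items
\frac{\sum_{i \in R_A \cap R_B} r_{A,i} \, r_{B,i}}
     {\sqrt{\sum_{i \in R_A \cap R_B} r_{A,i}^2} \; \sqrt{\sum_{i \in R_A \cap R_B} r_{B,i}^2}}
```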

I've locally changed the cosine similarity to follow the original cosine similarity formula, and the results I obtain are exactly the ones I had using another framework.
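
To make the difference concrete, here is a small self-contained sketch of the two variants described in this post. It is an illustration only, not the cf4j source and not the local patch mentioned above; the class and method names are hypothetical, and each user is represented as a map from item ID to rating.

```java
import java.util.HashMap;
import java.util.Map;

public class CosineVariants {

    // Cosine with each norm taken over ALL items rated by that user.
    static double fullCosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double num = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                num += e.getValue() * other; // only co-rated items contribute
            }
        }
        for (double r : b.values()) {
            normB += r * r;
        }
        double den = Math.sqrt(normA) * Math.sqrt(normB);
        return den == 0.0 ? 0.0 : num / den;
    }

    // Variant with both norms restricted to co-rated items only.
    static double coRatedCosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double num = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                num += e.getValue() * other;
                normA += e.getValue() * e.getValue();
                normB += other * other;
            }
        }
        double den = Math.sqrt(normA) * Math.sqrt(normB);
        return den == 0.0 ? 0.0 : num / den;
    }

    public static void main(String[] args) {
        Map<Integer, Double> userA = new HashMap<>();
        userA.put(1, 5.0);
        userA.put(2, 3.0);
        Map<Integer, Double> userB = new HashMap<>();
        userB.put(1, 5.0);
        userB.put(3, 4.0);
        System.out.println(fullCosine(userA, userB));    // below 1: unshared items count
        System.out.println(coRatedCosine(userA, userB)); // a single shared item always gives 1
    }
}
```

With only one item in common, the co-rated variant reports maximal similarity regardless of how many other items each user rated, which is the kind of neighbour-ranking distortion the post describes.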

Is this indeed an implementation bug? If so, I would be happy to provide a PR with the required changes.
Or is this some variant I'm not aware of?

Thanks,
Fábio Colaço
