ferortega / cf4j


CF4J: Collaborative Filtering for Java

License: Apache License 2.0

Java 100.00%

cf4j's Issues

% value in the console

Dear @ferortega Sir,

While running UBCF on the 20M dataset, the console prints some % values. Could you please explain what these values mean?

Example: see the attached file sample_console.docx.

Mismatch in number of items

In the MovieLens 1M dataset, the number of movies is about 3,900 (3,883 exactly), as specified in the dataset.

But with the CF4J library's kernel, the method getNumberOfItems() reports only 3,706 items.

Why is that? Please explain, @ferortega.
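One plausible explanation, not confirmed in this thread, is that the item count is derived from the ratings rather than from the movie catalogue, so movies that nobody rated never appear. The sketch below checks this kind of discrepancy by counting the distinct item IDs in rating lines. It assumes the MovieLens 1M line layout `UserID::MovieID::Rating::Timestamp`; the class and method names are hypothetical and are not part of CF4J.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DistinctItemCounter {

    // Counts distinct item IDs in lines shaped like "UserID::MovieID::Rating::Timestamp".
    // The "::" separator and the field order are assumptions taken from MovieLens 1M.
    static int countDistinctItems(List<String> ratingLines) {
        Set<String> itemIds = new HashSet<>();
        for (String line : ratingLines) {
            String[] fields = line.trim().split("::");
            if (fields.length < 2) {
                continue; // skip blank or malformed lines
            }
            itemIds.add(fields[1]);
        }
        return itemIds.size();
    }

    public static void main(String[] args) {
        // Tiny made-up sample: three ratings that touch only two different items.
        List<String> sample = Arrays.asList(
                "1::10::5::978300760",
                "2::10::3::978302109",
                "2::20::4::978301968");
        System.out.println(countDistinctItems(sample));
    }
}
```

Running the same count over the real ratings file (for example by passing the result of `Files.readAllLines`) and comparing it with the number of rows in the movie catalogue would show whether unrated movies account for the gap.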

Potential security vulnerability in a shared library that cf4j depends on. Can you help upgrade to patched versions?

Hi, @ferortega , @jesusmayor , I'd like to report a vulnerability issue in es.upm.etsisi:cf4j:2.2.7.

Issue Description

es.upm.etsisi:cf4j:2.2.7 directly or transitively depends on 639 C libraries (.so) across many platforms (such as x86-64, x86, arm64, armhf). However, I noticed that some of these C libraries are vulnerable and contain the following CVEs:

liblept.so from the C project libjpeg-turbo (version 1.5.3) exposes 1 vulnerability:
CVE-2018-14498

Suggested Vulnerability Patch Versions

libjpeg-turbo has fixed the vulnerabilities in versions >=2.1.0

Java build tools cannot report vulnerable C libraries, which may introduce security issues into many downstream Java projects.
Could you please upgrade the shared libraries above to their patched versions?

Thanks for your help~
Best regards,
Helen Parr

Adjusted cosine similarity

Dear Sir @ferortega,
We know that the adjusted cosine similarity measure (ACOS) was introduced to measure the similarity between items (see the article below). Why does the cf4j framework use this measure to calculate the similarity between users, via the class cf4j.knn.userToUser.similarities.MetricAjustedCosine?

Article:
Sarwar, Badrul, George Karypis, Joseph Konstan, and John Riedl. 2001. “Item-Based Collaborative Filtering Recommendation Algorithms.” Proceedings of the Tenth International Conference on World Wide Web - WWW ’01, 285–95.

Release is a fat jar with bundled dependencies

Hi,

I noticed that the release jar file is about 500 MB in size, and it bundles all the dependencies inside the jar. This caused me trouble: a version conflict, with duplicate classes from different versions on the classpath.

It would be great if you could also provide a normal release without the bundled dependencies.

For now I have made a fork (https://github.com/atexp/cf4j/commits/master) that can be built as a release with JitPack, and I depend on that.

Thank you for this library, it helped me a lot to get started quickly with a recommender!

nDCG

Hi,

Is there any nDCG method implemented, or the possibility of implementing it?

Best Regards,
Márcia Barros
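For readers unfamiliar with the metric, here is a minimal, framework-independent sketch of nDCG@k. It is not CF4J code; the class and method names are hypothetical. It assumes the exponential-gain formulation, (2^rel - 1) / log2(position + 1), and, as a simplification, builds the ideal ranking by re-sorting the same list of relevances rather than all of a user's test ratings.

```java
import java.util.Arrays;

class NdcgSketch {

    // DCG over the first k positions: sum of (2^rel - 1) / log2(position + 1),
    // where position is 1-based.
    static double dcg(double[] relevances, int k) {
        double sum = 0.0;
        int limit = Math.min(k, relevances.length);
        for (int i = 0; i < limit; i++) {
            double gain = Math.pow(2.0, relevances[i]) - 1.0;
            double discount = Math.log(i + 2) / Math.log(2); // log2(position + 1)
            sum += gain / discount;
        }
        return sum;
    }

    // nDCG@k: DCG of the recommended order divided by the DCG of the best possible order.
    // relevances[i] is the true relevance (e.g. the test rating) of the i-th recommended item.
    static double ndcg(double[] relevances, int k) {
        double[] ideal = relevances.clone();
        Arrays.sort(ideal); // ascending
        for (int i = 0, j = ideal.length - 1; i < j; i++, j--) {
            double tmp = ideal[i]; // reverse in place to get descending order
            ideal[i] = ideal[j];
            ideal[j] = tmp;
        }
        double idealDcg = dcg(ideal, k);
        return idealDcg == 0.0 ? 0.0 : dcg(relevances, k) / idealDcg;
    }

    public static void main(String[] args) {
        System.out.println(ndcg(new double[] {3, 2, 1}, 3)); // already in the ideal order
        System.out.println(ndcg(new double[] {1, 2, 3}, 3)); // worst order of the same items
    }
}
```

A list that is already sorted by relevance scores 1, and any other ordering of the same items scores lower.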

Is there a way to specify the test set from a file?

How can I specify a test set stored in a file instead of giving the full ratings file?

DatasetSplitters

Hi,

Would it be possible for you to provide a small example of how to use the explicit DatasetSplitters?

Best Regards,
Márcia Barros

Out Of memory

Even after increasing the heap size to about 2.5 GB (-Xms2048M -Xmx2560M), the library cannot process the MovieLens 10M dataset.
Please look into this issue.

Exception in thread "main" java.io.StreamCorruptedException: invalid stream header: 4D6F7665

Hi Sir,
When I run the project, the following error appears:
run:
Exception in thread "main" java.io.StreamCorruptedException: invalid stream header: 4D6F7665
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:806)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
at es.upm.etsisi.cf4j.data.BenchmarkDataModels.loadRemoteDataModel(BenchmarkDataModels.java:245)
at es.upm.etsisi.cf4j.data.BenchmarkDataModels.MovieLens100K(BenchmarkDataModels.java:35)
at MSE_movieLens100k.main(MSE_movieLens100k.java:31)
Java Result: 1
BUILD SUCCESSFUL (total time: 0 seconds)
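The hex digits in the message are the first four bytes that ObjectInputStream read. A Java serialization stream starts with the magic bytes AC ED followed by the version 00 05, so anything else means the data is not a serialized object. The short sketch below decodes the reported header as ASCII to see what was received instead; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

class StreamHeaderDecoder {

    // Converts a hex string such as "4D6F7665" into the ASCII text those bytes spell.
    static String decodeHex(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Header reported in the exception above.
        System.out.println(decodeHex("4D6F7665")); // prints "Move"
    }
}
```

The bytes spell "Move", which suggests the remote download returned text rather than the serialized data model. One possibility, which is an assumption and not something the trace confirms, is an HTTP "Moved" redirect page returned in place of the file.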

Cosine Similarity Implementation

Hey there,

I've been comparing the results I got with the ones cf4j provides, and I've found huge differences in some cases. For example, using the userToUser knn model on the ML100k dataset with a 20% test ratio and k=10, a top-10 evaluation gives me a recall of approximately 0.7, while cf4j obtains approximately 0.23.

Interested in finding out why the difference was so big, I dived into the source code, and I finally found the one bit that is not similar to what I have: the cosine similarity implementation.

For reference, the cosine similarity is computed as follows:

And assuming that R_A denotes the items rated by user A and R_B the items rated by user B, then the cosine similarity is equivalent to:

From what I can see in https://github.com/ferortega/cf4j/blob/master/src/cf4j/knn/userToUser/similarities/MetricCosine.java#L19-L33, it seems that what is being computed is:

because both denActive as well as denTarget are only incremented when an item in common is found.
This changes the similarity outcome, since the denominator factors do not include ratings for items that only one of the users rated. That in turn affects the computation of the k nearest neighbours and leads to worse results.
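The three expressions referred to above can be reconstructed from the description. This is a reconstruction, not the original rendering: it assumes r_{A,i} denotes user A's rating of item i, with R_A and R_B as defined in the text.

```latex
% Cosine similarity between the rating vectors of users A and B
\cos(A, B) = \frac{\vec{r}_A \cdot \vec{r}_B}{\lVert \vec{r}_A \rVert \, \lVert \vec{r}_B \rVert}

% Equivalent form over rated items: only co-rated items contribute to the
% numerator, but each norm runs over everything that user rated
\cos(A, B) = \frac{\sum_{i \in R_A \cap R_B} r_{A,i} \, r_{B,i}}
                  {\sqrt{\sum_{i \in R_A} r_{A,i}^{2}} \; \sqrt{\sum_{i \in R_B} r_{B,i}^{2}}}

% What the linked code appears to compute: both norms restricted to co-rated items
\frac{\sum_{i \in R_A \cap R_B} r_{A,i} \, r_{B,i}}
     {\sqrt{\sum_{i \in R_A \cap R_B} r_{A,i}^{2}} \; \sqrt{\sum_{i \in R_A \cap R_B} r_{B,i}^{2}}}
```

The only difference between the last two expressions is the index set of the sums in the denominator.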

I've locally changed the cosine similarity to represent the original cosine similarity formula, and the obtained results are exactly the ones I had using another framework.
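The difference between the two variants can be sketched in plain Java. This is an illustration only, not the cf4j source and not the patch mentioned above: the class and method names are hypothetical, and each user is represented as a map from item ID to rating.

```java
import java.util.HashMap;
import java.util.Map;

class CosineVariants {

    // Cosine over the full rating vectors: each norm uses every item that user rated.
    static double fullCosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double den = Math.sqrt(sumOfSquares(a)) * Math.sqrt(sumOfSquares(b));
        return den == 0.0 ? 0.0 : dot / den;
    }

    // Variant described in the issue: the norms are accumulated only over co-rated items.
    static double commonItemsCosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double dot = 0.0, denA = 0.0, denB = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
                denA += e.getValue() * e.getValue();
                denB += other * other;
            }
        }
        double den = Math.sqrt(denA) * Math.sqrt(denB);
        return den == 0.0 ? 0.0 : dot / den;
    }

    private static double sumOfSquares(Map<Integer, Double> ratings) {
        double sum = 0.0;
        for (double r : ratings.values()) {
            sum += r * r;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two made-up users who share one item (1) and each rated one item the other did not.
        Map<Integer, Double> userA = new HashMap<>();
        userA.put(1, 5.0);
        userA.put(2, 3.0);
        Map<Integer, Double> userB = new HashMap<>();
        userB.put(1, 5.0);
        userB.put(3, 4.0);
        System.out.println(fullCosine(userA, userB));
        System.out.println(commonItemsCosine(userA, userB));
    }
}
```

With a single co-rated item the common-items variant always returns 1 for positive ratings, however different the rest of the two profiles are, while the full-vector cosine is pulled down by the items only one user rated. That is the kind of distortion in neighbour selection the issue describes.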

Is this indeed an implementation bug? If so, I would be happy to provide a PR with the required changes.
Or is this some variant I'm not aware of?

Thanks,
Fábio Colaço

What is the format of the text file?

Is it a rating matrix, or something else?

Regards,
