
cdkhashfingerprint's Introduction

This is an attempt to improve the CDK hashed fingerprint (the Fingerprinter class).
The idea behind the improved version is borrowed from my blog post on improvised hashing functions and their impact on fingerprints:

http://chembioinfo.com/2011/10/30/revisiting-molecular-hashed-fingerprints/
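
To make the idea of a hashed fingerprint concrete, here is a minimal sketch. It is not the CDK Fingerprinter algorithm or the improved version from this repository: the class name, the feature strings and the use of String.hashCode() are all assumptions chosen for illustration. Each feature is hashed to one position in a fixed-size bit set, and a query passes the screen when all of its bits are also set in the target.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Illustrative only: a toy hashed fingerprint, not the CDK implementation. */
class HashedFingerprintSketch {

    /** Folds each feature's hash code into one of `size` bit positions. */
    static BitSet fingerprint(List<String> features, int size) {
        BitSet bits = new BitSet(size);
        for (String feature : features) {
            // floorMod keeps the index non-negative for negative hash codes.
            bits.set(Math.floorMod(feature.hashCode(), size));
        }
        return bits;
    }

    /** Screening test: every bit set in the query must also be set in the target. */
    static boolean mayContain(BitSet target, BitSet query) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target); // bits the query has but the target lacks
        return missing.isEmpty();
    }

    public static void main(String[] args) {
        // Hypothetical path-like features; a real fingerprinter enumerates them from the molecule.
        BitSet target = fingerprint(Arrays.asList("C-C", "C-O", "C=O"), 1024);
        BitSet subQuery = fingerprint(Arrays.asList("C-C", "C-O"), 1024);
        BitSet otherQuery = fingerprint(Arrays.asList("C-N"), 1024);

        System.out.println(mayContain(target, subQuery));   // true
        System.out.println(mayContain(target, otherQuery)); // false
    }
}
```

In a sketch like this, two unrelated features that hash to the same bit can let a non-matching pair pass the screen. If the benchmark below is a screening test of this kind, such collisions are what the FP column counts, which is why the choice of hashing function matters.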

Command line interface

/*  Test improved CDK FP */

java -jar BenchmarkHashedFingerprinter.jar test/data/mol hash  2  2000
 
/* Test CDK default FP */
 
java -jar BenchmarkHashedFingerprinter.jar test/data/mol cdk  2  2000
   
***************************************
Improved CDK HashedFingerprinter class with 1024 size FP
***************************************
CASES:      TP:     FP:     TN:       FN:   ACCURACY:   TPR:    FPR:    Time (mins):
200*200     629     189     39182     0     0.995       1.000   0.005   0.11
400*400     2428    972     156600    0     0.994       1.000   0.006   0.37
600*600     4940    2449    352611    0     0.993       1.000   0.007   0.75
800*800     8562    5083    626355    0     0.992       1.000   0.008   1.27
1000*1000   12802   9011    978187    0     0.991       1.000   0.009   2.04
1200*1200   17178   12727   1410095   0     0.991       1.000   0.009   2.94
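
The ACCURACY, TPR and FPR columns can be reproduced from the raw counts with the standard confusion-matrix formulas. The sketch below assumes those standard definitions (the benchmark's own code is not shown here), and the class and method names are hypothetical. It recomputes the 200*200 row of the table above.

```java
import java.util.Locale;

/** Derives the summary columns from raw confusion-matrix counts. */
class ScreeningMetrics {

    /** Fraction of all pairs classified correctly: (TP + TN) / total. */
    static double accuracy(long tp, long fp, long tn, long fn) {
        return (double) (tp + tn) / (tp + fp + tn + fn);
    }

    /** True positive rate: TP / (TP + FN). */
    static double tpr(long tp, long fn) {
        return (double) tp / (tp + fn);
    }

    /** False positive rate: FP / (FP + TN). */
    static double fpr(long fp, long tn) {
        return (double) fp / (fp + tn);
    }

    public static void main(String[] args) {
        // Counts copied from the 200*200 row of the improved 1024-bit table.
        long tp = 629, fp = 189, tn = 39182, fn = 0;
        System.out.println(String.format(Locale.ROOT,
                "ACCURACY=%.3f TPR=%.3f FPR=%.3f",
                accuracy(tp, fp, tn, fn), tpr(tp, fn), fpr(fp, tn)));
        // prints: ACCURACY=0.995 TPR=1.000 FPR=0.005
    }
}
```

Because FN is 0 in every row, TPR is always 1.000, so the fingerprinters differ only in their FP counts and therefore in FPR and accuracy.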

***************************************
Improved New HashedFingerprinter class with 2048 size FP
***************************************

------------------------------------------------------------------------------
CASES:      TP:     FP:     TN:       FN:   ACCURACY:   TPR:    FPR:    Time (mins):
------------------------------------------------------------------------------
200*200     629     189     39182     0     0.995       1.000   0.005   0.1
400*400     2381    974     156645    0     0.994       1.000   0.006   0.35
600*600     4882    2452    352666    0     0.993       1.000   0.007   0.71
800*800     8484    5085    626431    0     0.992       1.000   0.008   1.19
1000*1000   12710   9014    978276    0     0.991       1.000   0.009   1.93
1200*1200   17070   12730   1410200   0     0.991       1.000   0.009   2.77

***************************************
CDK Default Fingerprinter class with 1024 size FP
***************************************
CASES:      TP:     FP:     TN:       FN:   ACCURACY:   TPR:    FPR:    Time (mins):
200*200     629     298     39073     0     0.993       1.000   0.008   0.11
400*400     2428    1691    155881    0     0.989       1.000   0.011   0.37
600*600     4940    3765    351295    0     0.990       1.000   0.011   0.74
800*800     8562    7522    623916    0     0.988       1.000   0.012   1.26
1000*1000   12802   13922   973276    0     0.986       1.000   0.014   2.05
1200*1200   17178   19262   1403560   0     0.987       1.000   0.014   2.92



Results:

The improved hashed fingerprinter has better accuracy
and ~30-40% fewer false positives (FPs) than the original version!
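
As a quick check of that figure, the sketch below computes the relative drop in FP count per case, using the FP columns of the CDK default and improved 1024-bit tables above. The class and method names are hypothetical.

```java
import java.util.Locale;

/** Compares false-positive counts of two fingerprinters case by case. */
class FalsePositiveReduction {

    /** Percentage by which `improved` is lower than `baseline`. */
    static double percentFewer(long baseline, long improved) {
        return 100.0 * (baseline - improved) / baseline;
    }

    public static void main(String[] args) {
        String[] cases = {"200*200", "400*400", "600*600", "800*800", "1000*1000", "1200*1200"};
        // FP columns copied from the 1024-bit tables above.
        long[] cdkDefault = {298, 1691, 3765, 7522, 13922, 19262};
        long[] improved = {189, 972, 2449, 5083, 9011, 12727};

        for (int i = 0; i < cases.length; i++) {
            System.out.println(String.format(Locale.ROOT, "%-10s %.1f%% fewer FPs",
                    cases[i], percentFewer(cdkDefault[i], improved[i])));
        }
    }
}
```

With these counts the reduction falls between roughly 32% and 43% across the six cases.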

/* Test new FP with ring matcher */

java -jar BenchmarkHashedFingerprinter.jar test/data/mol hash  1  2000

------------------------------------------------------------------------------
CASES:      TP:     FP:     TN:       FN:   ACCURACY:   TPR:    FPR:    Time (mins):
------------------------------------------------------------------------------
200*200     629     144     39227     0     0.996       1.000   0.004   0.1
400*400     2381    842     156777    0     0.995       1.000   0.005   0.34
600*600     4882    2161    352957    0     0.994       1.000   0.006   0.71
800*800     8484    4477    627039    0     0.993       1.000   0.007   1.2
1000*1000   12710   7977    979313    0     0.992       1.000   0.008   1.97
1200*1200   17070   11429   1411501   0     0.992       1.000   0.008   2.82

The improved hashed fingerprinter with ring matcher has better accuracy
and roughly 40-50% fewer false positives (FPs) than the original version!

/* Test new FP with bloom filter and ring matcher */

java -jar BenchmarkHashedFingerprinter.jar test/data/mol hashbloom  1  2000
