pasmod / paradox
An Automatic Paraphrase Detection System
License: MIT License
I added a verbose mode to the similarity transformer using tqdm. Let me know if you want a PR for this. It looks like this when running benchmark.py.
The benchmark script takes a rather long time on my computer, so I wanted to figure out what was going on, and also save the fitted model for reuse.
diff --git a/paradox/benchmark.py b/paradox/benchmark.py
index b0beba2..5922ac7 100644
--- a/paradox/benchmark.py
+++ b/paradox/benchmark.py
@@ -1,3 +1,4 @@
+import logging
 from metrics import pearson, mse
 from pipeline import pipeline
 import k_neighbors_regressor
@@ -5,6 +6,7 @@ import numpy as np
 import similarity
 import parser
+logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
 def report(correlations, errors, y_pred_fold):
     print("PC:\t\t\t%0.2f\t(+/- %0.2f)" % (np.mean(correlations),
@@ -28,11 +30,14 @@ def test(model=None, categories=[]):
 pairs = parser.parse(mode="train")
 X = [pair[0] for pair in pairs]
 y = [pair[1] for pair in pairs]
-transformer = similarity.build()
+transformer = similarity.build(verbose=True)
 estimator = k_neighbors_regressor.build(n_neighbors=4)
 p = pipeline(transformers=[transformer], estimator=estimator)
 p.fit(X, y)
+import pickle
+with open('model.pickle', 'wb') as f:
+    pickle.dump(p, f)
 test(p, categories=["answer-answer"])
 test(p, categories=["question-question"])
diff --git a/paradox/similarity.py b/paradox/similarity.py
index d5b20b5..6496787 100644
--- a/paradox/similarity.py
+++ b/paradox/similarity.py
@@ -41,8 +41,8 @@ def similarity(text1, text2, levels=['surface', 'context']):
     return sims
-def build(levels=['surface', 'context']):
-    pipeline = Pipeline([('transformer', Similarity(levels=levels))])
+def build(levels=['surface', 'context'], verbose=False):
+    pipeline = Pipeline([('transformer', Similarity(levels=levels, verbose=verbose))])
     return ('similarity', pipeline)
@@ -52,15 +52,24 @@ def param_grid():
 class Similarity(BaseEstimator):
-    def __init__(self, levels=['surface']):
+    def __init__(self, levels=['surface'], verbose=False):
         self.levels = levels
+        self.verbose = verbose
     def fit(self, X, y):
         return self
     def transform(self, X):
         a = []
-        for x in X:
+
+        tqdm = lambda x: x
+        if self.verbose:
+            try:
+                from tqdm import tqdm
+            except ImportError:
+                pass
+
+        for x in tqdm(X):
             a.append(self._transform(x))
         return a
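As a sketch only (not part of the proposed patch), the two changes above could be factored into small helpers, so that benchmark.py can reload the fitted pipeline instead of refitting it on every run. The helper names `progress`, `save_model` and `load_model` are hypothetical; the code assumes only the standard `pickle` module and tqdm's `tqdm(iterable)` wrapper, and treats tqdm as an optional dependency just as the patch does.

```python
import pickle
from typing import Any, Iterable


def progress(iterable: Iterable, verbose: bool = False) -> Iterable:
    """Hypothetical helper: wrap `iterable` in a tqdm progress bar when
    verbose is set and tqdm is installed; otherwise return it unchanged."""
    if verbose:
        try:
            from tqdm import tqdm
        except ImportError:
            return iterable  # tqdm missing: fall back to a plain loop
        return tqdm(iterable)
    return iterable


def save_model(model: Any, path: str) -> None:
    """Hypothetical helper: pickle a fitted model to `path`."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path: str) -> Any:
    """Hypothetical helper: load a pickled model from `path`.
    Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    import os
    import tempfile

    # Stand-in for the fitted pipeline; any picklable object works here.
    model = {"n_neighbors": 4, "levels": ["surface", "context"]}
    path = os.path.join(tempfile.mkdtemp(), "model.pickle")
    save_model(model, path)
    assert load_model(path) == model

    # Iterates the same way whether or not tqdm is installed.
    squares = [x * x for x in progress(range(5), verbose=True)]
    print(squares)
```

Because pickle stores classes by reference, the modules that define the pipeline's classes (for example `Similarity` in paradox/similarity.py) must be importable when `model.pickle` is loaded again.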
estimate_svm_baseline(...) and estimate_svm_baseline(..., True) should be wrapped.
Separate functions for:
Each model should specify the following parameters: