zoj613 / pyloras

Experimental implementations of several (over/under)-sampling techniques not yet available in the imbalanced-learn library.

License: BSD 3-Clause "New" or "Revised" License

Python 99.18% Makefile 0.82%
loras prowras imbalanced-learn

pyloras's People

Contributors

vortex-17, zoj613

pyloras's Issues

ENH: Allow passing in a pre-trained t-SNE embedding

Since t-SNE is the bottleneck, we can speed things up by allowing a pre-trained t-SNE embedding object (one that implements the transform method) to be passed into either the constructor or the transform method. Either

loras = LORAS(..., pretrained_tsne=None)

or

loras.transform(X, y, pretrained_tsne=None)
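
A minimal sketch of how such an option might be resolved internally, assuming duck typing on a `transform` method. The helper `embed` and both stand-in classes are hypothetical and are not part of pyloras; they only illustrate skipping the expensive fit when a pre-trained object is supplied.

```python
class PretrainedEmbedding:
    """Hypothetical stand-in for an already fitted embedding with `transform`."""

    def transform(self, X):
        # Keep the first two coordinates; a real embedding would project
        # new points into the 2D space it learned during fitting.
        return [row[:2] for row in X]


class FreshEmbedding:
    """Hypothetical stand-in for an embedding that has to be fitted first."""

    def fit_transform(self, X):
        # Placeholder for the slow fitting step.
        return [[sum(row), 0.0] for row in X]


def embed(X, pretrained=None, fallback=None):
    """Reuse `pretrained` when given, otherwise fit `fallback` from scratch."""
    if pretrained is not None:
        if not callable(getattr(pretrained, "transform", None)):
            raise TypeError("pretrained embedding must implement transform(X)")
        return pretrained.transform(X)
    if fallback is None:
        raise ValueError("either pretrained or fallback is required")
    return fallback.fit_transform(X)


X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
reused = embed(X, pretrained=PretrainedEmbedding())  # no fitting happens
refit = embed(X, fallback=FreshEmbedding())          # fits from scratch
```

Checking for `transform` up front gives a clear error early instead of a failure deep inside the re-sampling step.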

MAINT/DOC: Add type information to arguments

Add type information to make the arguments a little easier to read (maybe). I tend to prefer stub files (e.g. see here) over in-place typing since there are fewer distracting lines of code. However, it means the file count doubles, because every Python source file is accompanied by a .pyi file.
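
For illustration, a stub might look roughly like the sketch below. The parameter names are taken from attributes that appear in the code snippets quoted in the other issues (`manifold_learner`, `manifold_learner_params`, `n_jobs`); the types, the keyword-only layout, and the file name `_loras.pyi` are assumptions, not the project's actual signature.

```python
# Hypothetical contents of a stub file such as pyloras/_loras.pyi.
# Stub syntax is valid Python: bodies and defaults are written as `...`.
from typing import Any, Dict, Optional


class LORAS:
    def __init__(
        self,
        *,
        manifold_learner: Optional[Any] = ...,
        manifold_learner_params: Optional[Dict[str, Any]] = ...,
        n_jobs: Optional[int] = ...,
    ) -> None: ...
```

The implementation file stays free of annotations, and type checkers read the signature from the stub instead.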

ENH: Use a faster t-SNE implementation

Currently, scikit-learn's implementation is used. One of the following would likely perform better in terms of runtime:

The current lines that would need to be changed are:

pyloras/pyloras/_loras.py

Lines 112 to 115 in b896367

self.tsne_ = TSNE(n_components=2, n_jobs=self.n_jobs, random_state=rng)
if self.embedding_params is not None:
    self.tsne_.set_params(**self.embedding_params)

X_embedded = self.tsne_.fit_transform(X_res)

ENH: Improve runtime of the re-sampling

Currently, the runtime is less than ideal. I think it's possible to improve the speed by cythonizing the loop in

pyloras/pyloras/_loras.py

Lines 163 to 177 in b896367

for class_sample, n_samples in self.sampling_strategy_.items():
    data_indices = np.flatnonzero(y == class_sample)
    # number of synthetic samples per neighborhood group
    n_gen = n_samples // data_indices.shape[0]
    neighborhood_groups = self.nn_.kneighbors(
        X_embedded[data_indices],
        return_distance=False
    )
    with parallel_backend('loky', n_jobs=self.n_jobs):
        samples = Parallel()(
            delayed(func)(X[i], class_sample, n_gen)
            for i in neighborhood_groups
        )
    X_res.extend(samples)
    y_res.extend([class_sample] * n_gen * neighborhood_groups.shape[0])

and maybe also rewriting

def _make_samples(
in Cython.

I will need to first profile and determine where the most time is spent.
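
One way to do that profiling is with the standard library's `cProfile` and `pstats`. The sketch below is generic: `slow_resample` is a hypothetical placeholder for the real re-sampling call, and `profile_call` is an illustrative helper, not something pyloras provides.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def slow_resample(n):
    # Hypothetical stand-in for the call being investigated.
    return sum(i * i for i in range(n))


result, report = profile_call(slow_resample, 10_000)
print(report)
```

If the report shows most of the time in the embedding step rather than in the loop, cythonizing the loop would bring little benefit.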

MAINT: Make copy of `manifold_learner` before using it, if passed to the constructor.

Currently no copy is made, and the passed learner is modified in-place.

pyloras/pyloras/_loras.py

Lines 129 to 131 in d39c5b5

if self.manifold_learner:
    self._check_2d_manifold_learner()
    self.manifold_learner_ = self.manifold_learner

pyloras/pyloras/_loras.py

Lines 134 to 139 in d39c5b5

if self.manifold_learner_params is not None:
    self.manifold_learner_.set_params(**self.manifold_learner_params)
try:
    self.manifold_learner_.set_params(random_state=rng)
except ValueError:
    pass

A copy should be made using sklearn's clone function so as to preserve the passed object.
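
A minimal sketch of the proposed fix, using `sklearn.base.clone` on a standalone helper. The function `prepare_learner` is hypothetical and only mirrors the logic of the snippets above; it is not the project's code.

```python
from sklearn.base import clone
from sklearn.manifold import TSNE


def prepare_learner(learner, params=None, random_state=None):
    """Return a configured copy of `learner`, leaving the original untouched."""
    # clone() builds a new, unfitted estimator with the same parameters.
    learner_ = clone(learner)
    if params is not None:
        learner_.set_params(**params)
    try:
        learner_.set_params(random_state=random_state)
    except ValueError:
        # The learner does not accept a random_state parameter.
        pass
    return learner_


user_learner = TSNE(n_components=2, perplexity=30.0)
internal = prepare_learner(user_learner, {"perplexity": 5.0}, random_state=0)
```

After this call, `user_learner` still has the perplexity and `random_state` the caller set, while `internal` carries the overrides.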
