
ddaskan / trendypy


Python package for sequence (e.g. trend line, sentence, image) clustering

Home Page: https://trendypy.readthedocs.io/

License: MIT License

Python 100.00%
trendy trend-line clustering clustering-algorithm trend

trendypy's Introduction

TrendyPy


TrendyPy is a small Python package for sequence clustering. It was initially developed to cluster time series by calculating a trend similarity distance with Dynamic Time Warping (DTW).
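
To make the idea of a Dynamic Time Warping distance concrete, here is a minimal sketch of the classic exact algorithm in pure Python. It is not TrendyPy's implementation: the package depends on fastdtw, which computes an approximation, and the function name `dtw_distance` and the absolute-difference step cost are assumptions chosen for illustration.

```python
def dtw_distance(x, y):
    """Exact DTW between two numeric sequences (illustrative sketch only).

    Uses the absolute difference as the step cost and runs in
    O(len(x) * len(y)) time and memory.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(
                cost[i - 1][j],      # x advances, y repeats its current point
                cost[i][j - 1],      # y advances, x repeats its current point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


increasing = [1, 2, 3, 4, 5]
similar = [1, 2.1, 2.9, 4.4, 5.1]
decreasing = [6.2, 5, 4, 3, 2]

# A smaller distance means the two sequences are more alike.
print(dtw_distance(increasing, similar) < dtw_distance(increasing, decreasing))
```

Because the alignment may repeat points, sequences of different lengths can be compared directly, which is why DTW suits trend lines sampled over different periods.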

Installation

You can install TrendyPy with pip.

pip install trendypy

TrendyPy depends on pandas, NumPy, and fastdtw, and works with Python 3.7+.

Quickstart

Trendy has a scikit-learn-like API to allow easy integration into existing programs. Below is a quick example showing how it clusters increasing and decreasing trends.

>>> from trendypy.trendy import Trendy
>>> a = [1, 2, 3, 4, 5] # increasing trend
>>> b = [1, 2.1, 2.9, 4.4, 5.1] # increasing trend
>>> c = [6.2, 5, 4, 3, 2] # decreasing trend
>>> d = [7, 6, 5, 4, 3, 2, 1] # decreasing trend
>>> trendy = Trendy(n_clusters=2)
>>> trendy.fit([a, b, c, d])
>>> print(trendy.labels_)
[0, 0, 1, 1]
>>> trendy.predict([[0.9, 2, 3.1, 4]]) # another increasing trend
[0]

It can also be utilized to cluster strings by using string similarity metrics.

>>> from trendypy.trendy import Trendy
>>> from trendypy.algos import levenshtein_distance
>>> company_names = [
...     'apple inc',
...     'Apple Inc.',
...     'Microsoft Corporation',
...     'Microsft Corp.']
>>> trendy = Trendy(n_clusters=2, algorithm=levenshtein_distance)
>>> trendy.fit(company_names)
>>> print(trendy.labels_)
[0, 0, 1, 1]
>>> trendy.predict(['Apple'])
[0]
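
For readers unfamiliar with string similarity metrics, the sketch below shows one common way to compute a Levenshtein (edit) distance. It is a stand-alone illustration, not the code behind `trendypy.algos.levenshtein_distance`; the function name `levenshtein` is hypothetical.

```python
def levenshtein(s, t):
    """Number of single-character insertions, deletions, or substitutions
    needed to turn s into t (illustrative sketch only)."""
    # prev[j] is the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute, free if characters match
            ))
        prev = curr
    return prev[-1]


print(levenshtein('apple inc', 'Apple Inc.'))
print(levenshtein('apple inc', 'Microsoft Corporation'))
```

The comparison is case-sensitive, so 'apple inc' and 'Apple Inc.' are a few edits apart rather than identical, yet still far closer to each other than to either Microsoft spelling.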

Refer to the extensive demo to see it cluster stock trends and images and to learn how to define your own metric, or check the API Reference for details.
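
As a rough idea of what a user-defined metric might look like, the sketch below compares two sequences by their average step size. The function `slope_distance` is hypothetical. The string example above passes a function through the `algorithm` argument, but whether any two-argument callable that returns a number is accepted there is an assumption; check the demo and the API Reference before relying on it.

```python
def mean_step(seq):
    """Average difference between consecutive points of a numeric sequence."""
    steps = [b - a for a, b in zip(seq, seq[1:])]
    return sum(steps) / len(steps)


def slope_distance(x, y):
    """Hypothetical metric: absolute gap between the average slopes of x and y.

    Works for sequences of different lengths, as long as each has at
    least two points.
    """
    return abs(mean_step(x) - mean_step(y))


a = [1, 2, 3, 4, 5]           # increasing trend
d = [7, 6, 5, 4, 3, 2, 1]     # decreasing trend
print(slope_distance(a, d))

# Assumed usage, not verified against the library:
# trendy = Trendy(n_clusters=2, algorithm=slope_distance)
```

A metric like this ignores the level of a series and looks only at its direction, which may or may not suit a given data set.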

Post

The idea originated from the post Trend Clustering.

trendypy's People

Contributors

ddaskan, dependabot[bot]

trendypy's Issues

Large memory cost

Hey, I am currently using Trendy to cluster 300 sequences, each of length 24, into 10 clusters. During processing, memory usage keeps growing until it reaches 80 GB, even on a server, and then a MemoryError is raised. Is this normal? Thanks a lot.

DTW between open_TWTR and open_FB seems largest??

Hello Dogan,

It's great that you wrote such a complete introduction to trend/time series clustering.
However, when I tried to re-run the script in your GitHub repo, I found some odd figures for the DTW distance.

My DTW result between open_TWTR and open_FB is the largest, but it should be the smallest, right? You can see the result in the image below. It would be great if you could help with my question. Thanks a lot.
(I don't understand the logic behind DTW, but I assume a smaller DTW value means the two series are more similar.)

image
