kuhaha / seq2pat

This project forked from fidelity/seq2pat

[AAAI 2022] Seq2Pat: Sequence-to-Pattern Generation Library

Home Page: https://fidelity.github.io/seq2pat/

License: GNU General Public License v2.0

C++ 23.02% Python 54.96% Makefile 0.23% Batchfile 0.27% Jupyter Notebook 19.01% Cython 2.51%

Seq2Pat: Sequence-to-Pattern Generation Library

Seq2Pat (AAAI'22) is a research library for sequence-to-pattern generation to discover sequential patterns that occur frequently in large sequence databases. The library supports constraint-based reasoning to specify desired properties over patterns.

Dichotomic Pattern Mining (AAAI'22, Frontiers'22) embeds Seq2Pat to exploit the dichotomy of positive vs. negative outcomes in populations. This allows constraint-based sequence analysis to generate patterns that uniquely distinguish cohorts.

From an algorithmic perspective, the library takes advantage of multi-valued decision diagrams (AAAI'19).

From an implementation perspective, the library is written in Cython, which brings together the efficiency of a low-level C++ backend and the expressiveness of a high-level Python public interface.

Seq2Pat is developed as a joint collaboration between Fidelity Investments and the Tepper School of Business at CMU. Documentation is available at fidelity.github.io/seq2pat.

Quick Start

Constraint-based Sequential Pattern Mining

# Example to show how to find frequent sequential patterns
# from a given sequence database subject to constraints
from sequential.seq2pat import Seq2Pat, Attribute

# Seq2Pat over 3 sequences
seq2pat = Seq2Pat(sequences=[["A", "A", "B", "A", "D"],
                             ["C", "B", "A"],
                             ["C", "A", "C", "D"]])

# Price attribute corresponding to each item
price = Attribute(values=[[5, 5, 3, 8, 2],
                          [1, 3, 3],
                          [4, 5, 2, 1]])

# Average price constraint
seq2pat.add_constraint(3 <= price.average() <= 4)

# Patterns that occur at least twice (A-D)
patterns = seq2pat.get_patterns(min_frequency=2)
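
To see why A-D is the pattern reported in the comment above, the following is a minimal brute-force sketch in plain Python. It does not use Seq2Pat and does not reflect how the library works internally. It assumes that a pattern counts toward the frequency of a sequence when at least one of its occurrences satisfies the constraint; the helper name has_valid_occurrence is hypothetical.

```python
from itertools import combinations


def has_valid_occurrence(sequence, values, pattern, lo, hi):
    """Return True if `pattern` occurs in `sequence` as a subsequence with at
    least one occurrence whose average attribute value lies in [lo, hi].

    Hypothetical helper for illustration only; not part of the Seq2Pat API.
    """
    # Enumerate every increasing choice of positions of the pattern's length
    for idx in combinations(range(len(sequence)), len(pattern)):
        if [sequence[i] for i in idx] == list(pattern):
            avg = sum(values[i] for i in idx) / len(idx)
            if lo <= avg <= hi:
                return True
    return False


sequences = [["A", "A", "B", "A", "D"],
             ["C", "B", "A"],
             ["C", "A", "C", "D"]]
prices = [[5, 5, 3, 8, 2],
          [1, 3, 3],
          [4, 5, 2, 1]]

# Number of sequences in which A-D has an occurrence with 3 <= average <= 4
support = sum(has_valid_occurrence(s, v, ["A", "D"], 3, 4)
              for s, v in zip(sequences, prices))
print(support)
```

Under this reading, A-D is supported by the first sequence (prices 5 and 2, average 3.5) and the third sequence (prices 5 and 1, average 3), which meets min_frequency=2.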

Dichotomic Pattern Mining

# Example to show how to run Dichotomic Pattern Mining 
# on sequences with positive and negative outcomes
from sequential.seq2pat import Seq2Pat
from sequential.pat2feat import Pat2Feat
from sequential.dpm import dichotomic_pattern_mining, DichotomicAggregation

# Create seq2pat model for positive sequences
sequences_pos = [["A", "A", "B", "A", "D"]]
seq2pat_pos = Seq2Pat(sequences=sequences_pos)

# Create seq2pat model for negative sequences
sequences_neg = [["C", "B", "A"], ["C", "A", "C", "D"]]
seq2pat_neg = Seq2Pat(sequences=sequences_neg)

# Run DPM to get mined patterns
aggregation_to_patterns = dichotomic_pattern_mining(seq2pat_pos, seq2pat_neg, 
                                                    min_frequency_pos=1, 
                                                    min_frequency_neg=2)

# DPM patterns with Union aggregation
dpm_patterns = aggregation_to_patterns[DichotomicAggregation.union]

# Encodings of all sequences
sequences = sequences_pos + sequences_neg
pat2feat = Pat2Feat()
encodings = pat2feat.get_features(sequences, dpm_patterns, drop_pattern_frequency=False)
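
As a rough illustration of what pattern-based encodings can look like, the sketch below builds one binary feature per pattern in plain Python, set to 1 when the pattern occurs in a sequence as a subsequence. It is an assumption about the general idea and not the Pat2Feat implementation; the function names and the two example patterns are hypothetical and are not the output of the DPM run above.

```python
def is_subsequence(pattern, sequence):
    """Return True if the items of `pattern` appear in `sequence` in order."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected
    return all(item in it for item in pattern)


def encode(sequences, patterns):
    """One row per sequence, one 0/1 column per pattern (hypothetical helper)."""
    return [[int(is_subsequence(p, s)) for p in patterns] for s in sequences]


sequences = [["A", "A", "B", "A", "D"],
             ["C", "B", "A"],
             ["C", "A", "C", "D"]]

# Hypothetical patterns chosen only for illustration
example_patterns = [["A", "D"], ["C", "A"]]

features = encode(sequences, example_patterns)
print(features)
```

Each row can then be passed to a downstream model as a feature vector for the corresponding sequence.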

Available Constraints

The library offers various constraint types, including a number of non-monotone constraints.

  • Average: This constraint specifies the average value of an attribute across all events in a pattern.
  • Gap: This constraint specifies the difference between the attribute values of every two consecutive events in a pattern.
  • Median: This constraint specifies the median value of an attribute across all events in a pattern.
  • Span: This constraint specifies the difference between the maximum and the minimum value of an attribute across all events in a pattern.
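
To make the four definitions concrete, here is a small library-independent sketch that computes each quantity for the attribute values of one pattern occurrence. The function names and the example values are hypothetical and are not part of the Seq2Pat API; in the library, constraints are expressed on Attribute objects as in the Quick Start.

```python
from statistics import mean, median


def average(values):
    """Average attribute value across all events in the pattern."""
    return mean(values)


def gaps(values):
    """Differences between the values of every two consecutive events."""
    return [b - a for a, b in zip(values, values[1:])]


def span(values):
    """Difference between the maximum and the minimum value."""
    return max(values) - min(values)


# Hypothetical attribute values for the events of one pattern occurrence
occurrence_values = [5, 3, 8, 2]

print(average(occurrence_values))  # average of the four values
print(gaps(occurrence_values))     # one gap per consecutive pair
print(median(occurrence_values))   # median of the four values
print(span(occurrence_values))     # max minus min
```

A constraint then bounds one of these quantities from below, from above, or both, as the average price constraint does in the Quick Start.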

Usage Examples

Examples of how to use the available constraints can be found in the Usage Example Notebook.

Building on Seq2Pat, we proposed Dichotomic Pattern Mining (X. Wang and S. Kadioglu, 2022) to analyze the correlations between mined patterns and different outcomes of sequences. DPM acts as an integrator between sequential pattern mining and downstream modeling tasks by generating embeddings of sequences based on the mined frequent patterns. An example of how to run DPM and generate pattern embeddings can be found in the Dichotomic Pattern Mining Notebook.

Installation

Seq2Pat can be installed from PyPI using pip install seq2pat. It can also be installed from source by following the instructions in our documentation.

Requirements

The library requires Python 3.6+, the Cython package, and a C++ compiler. See requirements.txt for dependencies.

Support

Please submit bug reports, questions and feature requests as Issues.

Citation

If you use Seq2Pat in a publication, please cite it as:

  @article{seq2pat2022,
    title={Seq2Pat: Sequence-to-Pattern Generation for Constraint-based Sequential Pattern Mining},
    author={Wang, Xin and Hosseininasab, Amin and Colunga, Pablo and Kadioglu, Serdar and van Hoeve, Willem-Jan},
    url={https://github.com/fidelity/seq2pat},
    journal={Proceedings of the AAAI Conference on Artificial Intelligence},
    volume={TBD},
    number={TBD},
    pages={TBD},
    year={2022}
  }

To cite the Dichotomic Pattern Mining framework, please cite it as:

  @article{Frontiers2022,
    title={Dichotomic Pattern Mining Integrated with Constraint Reasoning for Digital Behaviour Analyses},
    author={Sohom Ghosh and Shefali Yadav and Xin Wang and Bibhash Chakrabarty and Serdar Kadioglu},
    journal={Frontiers Journal on Knowledge Discovery from Unstructured Data in Finance},
    volume={TBD},
    number={TBD},
    pages={TBD},
    year={2022}
  }

  @inproceedings{DPM2022,
    title={Dichotomic Pattern Mining with Applications to Intent Prediction from Semi-Structured Clickstream Datasets},
    author={Xin Wang and Serdar Kadioglu},
    booktitle={The AAAI-22 Workshop on Knowledge Discovery from Unstructured Data in Financial Services},
    year={2022},
    eprint={2201.09178},
    archivePrefix={arXiv}
  }

License

Seq2Pat is licensed under the GNU General Public License v2.0.

