

CoSeNet: An Excellent Approach for Optimal Segmentation of Correlation Matrices.

CoSeNet - 1.3.0.

The proposed approach, CoSeNet (Correlation Segmentation Network), is based on a four-layer architecture: an input layer, a formatting layer, a re-scaling layer and a final segmentation layer. The model effectively identifies correlated segments in correlation matrices, outperforming previous approaches to similar problems. Internally, it uses an overlapping technique and pre-trained Machine Learning (ML) algorithms, which make it robust and generalizable. CoSeNet also includes a method that optimizes the parameters of the re-scaling layer using a heuristic algorithm and a fitness metric based on the Window Difference metric. The model outputs binary matrices with the noise removed, which can be used in a variety of applications; the deployed model was configured as a compromise between efficiency, memory usage and speed.
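
The Window Difference (WindowDiff) metric mentioned above compares a predicted segmentation with a reference by sliding a window of size k over both and counting the windows in which they disagree on the number of boundaries. The sketch below is a minimal, generic implementation of the standard WindowDiff definition (Pevzner and Hearst) on 0/1 boundary vectors; it is not CoSeNet's internal fitness function, whose exact form is not given here. The function name `window_diff` and the default window size (half the mean reference segment length, a common choice) are assumptions.

```python
import numpy as np


def window_diff(reference, hypothesis, k=None):
    """Standard WindowDiff between two 0/1 boundary vectors (lower is better).

    A 1 marks a segment boundary. If k is None it defaults to half the mean
    reference segment length (an assumption, not a CoSeNet setting).
    Returns the fraction of windows whose boundary counts differ, in [0, 1].
    """
    ref = np.asarray(reference, dtype=int)
    hyp = np.asarray(hypothesis, dtype=int)
    if ref.ndim != 1 or ref.shape != hyp.shape:
        raise ValueError("reference and hypothesis must be 1-D and equally long")
    n = ref.size
    if k is None:
        n_segments = ref.sum() + 1  # assumes boundaries separate segments
        k = max(1, int(round(n / n_segments / 2)))
    if not 0 < k < n:
        raise ValueError("window size must satisfy 0 < k < len(reference)")
    errors = 0
    for i in range(n - k):
        # Compare how many boundaries each segmentation places in the window.
        if ref[i:i + k].sum() != hyp[i:i + k].sum():
            errors += 1
    return errors / (n - k)
```

For example, `window_diff([0, 0, 0, 1, 0, 0, 0, 1, 0, 0], [0, 0, 1, 0, 0, 0, 0, 1, 0, 0], k=2)` returns 0.25, because two of the eight windows disagree.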

About

Author: A. Palomo-Alonso
Universidad de Alcalá.
Escuela Politécnica Superior.
Departamento de Teoría De la Señal y Comunicaciones (TDSC).
ISDEFE Chair of Research.

What's new?

< 1.0.0

  1. All included:
    • CoSeNet: Main class.
    • Fitness method: Genetic and PSO algorithms.
    • Solve: Pipeline solver.
    • Pre-trained models: Ridge and MLP for 16 throughput.

1.3.0

  1. Major bug fixes.
  2. Paper experiments included.

Install

Before installing the package, you need the following software:

  • CUDA: To use GPU training (in case you use your GPU for training a model).
  • Ray: For distributed computing, used by the fit method.

You will need to install all the Python packages shown in the requirements.txt file. Once done, you can install the package via pip.

  pip install cosenet

Architecture

We propose a novel approach for optimal segmentation of correlation matrices, based on a complete sequential architecture with several processing layers, each implementing one or more algorithms. Specifically, the architecture consists of four layers. The first is an input/output layer, responsible for reading the correlation matrices to be processed and exporting the results. The second layer applies the procedures needed to optimally prepare the input data. The third layer, Metaheuristic, normalizes (re-scales) the input and output data using classical algorithms. Finally, the fourth layer contains several Machine Learning (ML) algorithms that accurately identify the segment boundaries in the provided correlation matrix.

As a result, the architecture can process square correlation matrices of any size: each matrix is adapted to the ML model, which identifies segments with high performance even on highly noisy data. The architecture also runs efficiently on general-purpose processors, making it a practical solution for real-world applications.

The performance of the algorithm has been evaluated on a highly non-linear and noisy database built from a topic-based text segmentation problem. Random Wikipedia articles are concatenated and split into sentences; a language model (BERT) then computes a similarity coefficient for each pair of sentences, which is used as the correlation value to build the correlation matrices sentence by sentence. On this benchmark, CoSeNet identified correlated segments more effectively than several state-of-the-art approaches, including unsupervised, Community Detection and Deep Clustering algorithms, with performance improvements of 6% to 22%.
The pipeline aims to provide a unified solution to the problem, with the option of fine-tuning the model on a few samples from the target database.
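
The architecture accepts square matrices of any size even though the underlying ML models work on a fixed input size. One generic way to achieve this, sketched below purely as an illustration and not as CoSeNet's actual overlapping technique, is to cut overlapping fixed-size blocks along the main diagonal, score each block, and average the overlapping per-position scores back into one vector. The window size of 16, the stride of 8, zero padding and score averaging are all assumptions, and `diagonal_windows` and `merge_window_scores` are hypothetical helpers, not part of the cosenet API.

```python
import numpy as np


def diagonal_windows(matrix, window=16, stride=8):
    """Cut overlapping window x window blocks along the main diagonal.

    Returns (blocks, starts): blocks has shape (n_blocks, window, window) and
    starts holds each block's offset. Matrices smaller than the window are
    zero-padded (an assumption made for this sketch).
    """
    matrix = np.asarray(matrix, dtype=float)
    n = matrix.shape[0]
    if matrix.shape != (n, n):
        raise ValueError("expected a square matrix")
    if not 0 < stride <= window:
        raise ValueError("need 0 < stride <= window")
    size = max(n, window)
    padded = np.zeros((size, size))
    padded[:n, :n] = matrix
    starts = list(range(0, size - window + 1, stride))
    if starts[-1] != size - window:
        starts.append(size - window)  # make sure the last rows/columns are covered
    blocks = np.stack([padded[s:s + window, s:s + window] for s in starts])
    return blocks, starts


def merge_window_scores(scores, starts, n):
    """Average per-block boundary scores (n_blocks x window) into one length-n vector."""
    scores = np.asarray(scores, dtype=float)
    window = scores.shape[1]
    size = max(n, window)
    total = np.zeros(size)
    count = np.zeros(size)
    for s, row in zip(starts, scores):
        total[s:s + window] += row
        count[s:s + window] += 1
    return (total / count)[:n]
```

In a full pipeline, each block would be passed to a fixed-size model (hypothetical here) that returns one score per position, and `merge_window_scores(scores, starts, n)` would combine those scores before thresholding. Overlapping blocks mean every position is scored several times, which tends to smooth out errors near block edges.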

Fast Usage:

You can use the package easily by importing the main class:

from cosenet import CoSeNet

DISCLAIMER: CoSeNet uses BaseNetAPI to work with efficient and easy-to-use databases and models. It also includes optimization packages that use Ray for distributed computing. Installation may take quite a while, but the final model does not depend on these packages.

MAKE SURE YOUR PYTHON ENVIRONMENT HAS RAY INSTALLED: PIP HAS SOME PROBLEMS INSTALLING IT, SO YOU MAY NEED TO INSTALL IT MANUALLY. RAY HAS NO DISTRIBUTION FOR PYTHON 3.11, SO YOU NEED PYTHON 3.9 OR 3.10.

MAKE SURE EVERYTHING WORKS BY RUNNING ./test/test.py.
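
Because Ray only supports some Python versions, it can help to check the interpreter before installing or running the package. The script below is a small, hypothetical helper (not shipped with cosenet) that uses only the standard library to compare the Python version with the 3.9/3.10 requirement stated above and to look for the `ray` and `cosenet` packages without importing them.

```python
import importlib.util
import sys


def check_environment():
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if sys.version_info[:2] not in ((3, 9), (3, 10)):
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} detected; "
            "the README asks for Python 3.9 or 3.10"
        )
    # find_spec locates a top-level package without importing it.
    if importlib.util.find_spec("ray") is None:
        problems.append("ray is not installed; it is required by the fit method")
    if importlib.util.find_spec("cosenet") is None:
        problems.append("cosenet is not installed; run: pip install cosenet")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment looks OK for CoSeNet.")
```

This only checks prerequisites; running ./test/test.py remains the way to confirm that the package itself works.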

Then, you can create an instance of the class and use the methods:

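import numpy as np

number_of_matrix_to_solve = 600
matrix_size = 100
model = CoSeNet()
# Random input for illustration: shape (number_of_matrix_to_solve, matrix_size, matrix_size).
x = np.random.rand(number_of_matrix_to_solve, matrix_size, matrix_size)
solved_matrix, predicted_segmentation = model.solve(x)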
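
`np.random.rand` produces unstructured noise with no segments to find, and the result is not even symmetric. To try `solve` (or `fit`) on data that actually contains segments, a synthetic generator such as the hypothetical `synthetic_batch` below can be used. It builds noisy, symmetric block-diagonal matrices with values in [0, 1], plus one boundary vector per matrix, in the shapes used by the fit example. Marking the first element of each new segment with a 1 is an assumed labelling convention; check it against the package documentation before training.

```python
import numpy as np


def synthetic_batch(n_matrices, size, max_segments=5, noise=0.2, seed=None):
    """Generate noisy block-diagonal 'correlation' matrices plus boundary labels.

    Returns x with shape (n_matrices, size, size) and y with shape
    (n_matrices, size). y[i, j] == 1 means position j starts a new segment
    (an assumed labelling convention).
    """
    rng = np.random.default_rng(seed)
    x = np.empty((n_matrices, size, size))
    y = np.zeros((n_matrices, size), dtype=int)
    for i in range(n_matrices):
        n_segments = min(int(rng.integers(1, max_segments + 1)), size)
        # Pick n_segments - 1 distinct cut points in 1..size-1.
        cuts = rng.choice(np.arange(1, size), size=n_segments - 1, replace=False)
        y[i, cuts] = 1
        labels = np.cumsum(y[i])  # segment id of every position
        clean = (labels[:, None] == labels[None, :]).astype(float)
        noisy = clean + noise * rng.standard_normal((size, size))
        noisy = (noisy + noisy.T) / 2  # keep the matrix symmetric
        np.fill_diagonal(noisy, 1.0)   # every item is fully correlated with itself
        x[i] = np.clip(noisy, 0.0, 1.0)
    return x, y
```

With `x, y = synthetic_batch(600, 100, seed=0)`, `x` can be passed to `model.solve(x)` in place of the random array, and `y` provides a ground truth to compare the predicted segmentation against.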

To fit the model to your own (possibly highly non-linear) database, use the fit method:

# x has a shape of (number_of_matrix_to_solve, matrix_size, matrix_size).
x = load_my_numpy_data()  # placeholder: load your own correlation matrices
# y has a shape of (number_of_matrix_to_solve, matrix_size) and is a binary matrix.
y = load_my_numpy_segmentation_boundaries()  # placeholder: load your own labels
model.fit(50, x, y, 'genetic')

Note that the fit method is distributed, so you need to have Ray installed. Also note that y is a binary matrix, where 1s mark the segment boundaries and 0s everything else.
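
Since `y` encodes segments only through their boundaries, it can be useful to convert between a boundary vector and the equivalent binary block-diagonal matrix, for example to inspect labels or to compare them with the binary matrices the model outputs. The helpers below are a minimal sketch assuming that a 1 at position j means j starts a new segment; their names are hypothetical and they are not part of cosenet.

```python
import numpy as np


def boundaries_to_matrix(boundaries):
    """Build a binary block-diagonal matrix from a 0/1 boundary vector.

    Assumes boundaries[j] == 1 means position j starts a new segment.
    """
    b = np.asarray(boundaries, dtype=int)
    if b.ndim != 1 or not np.isin(b, (0, 1)).all():
        raise ValueError("boundaries must be a 1-D vector of 0s and 1s")
    labels = np.cumsum(b)  # segment id of every position
    return (labels[:, None] == labels[None, :]).astype(int)


def matrix_to_boundaries(block_matrix):
    """Recover the boundary vector from a binary block-diagonal matrix."""
    m = np.asarray(block_matrix, dtype=int)
    b = np.zeros(m.shape[0], dtype=int)
    # Position j + 1 starts a new segment when it is not linked to position j.
    b[1:] = (np.diagonal(m, offset=1) == 0).astype(int)
    return b
```

The round trip is exact for boundary vectors whose first element is 0; a leading 1 does not create an extra segment, so it is not recovered.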

Documentation:

You can find the package documentation in the doc folder of the repository:

https://github.com/iTzAlver/[...]

You can also find the research article in the following link:

[j.dsp.2023.104270](https://doi.org/10.1016/j.dsp.2023.104270)

Cite as

@article{PALOMOALONSO2024104270,
title = {CoSeNet: A novel approach for optimal segmentation of correlation matrices},
journal = {Digital Signal Processing},
volume = {144},
pages = {104270},
year = {2024},
issn = {1051-2004},
doi = {https://doi.org/10.1016/j.dsp.2023.104270},
author = {A. Palomo-Alonso and D. Casillas-Pérez and S. Jiménez-Fernández and A. Portilla-Figueras and S. Salcedo-Sanz}
}
