PoAC

Package information: Python 3.10; License: LGPL v3

Overview

Problem-oriented AutoML in Clustering (PoAC) is a framework for automating clustering tasks within the AutoML landscape. PoAC uses meta-learning and surrogate modeling to optimize clustering pipelines, and it lets users customize both the meta-features and the Clustering Validation Indices (CVIs) that drive the optimization.

Features

  • Problem Space Generation: Synthesize labeled clustering datasets through combinatorial analysis of dataset archetype parameters.
  • Clustering Simulations: Create partitionings at multiple noise levels, then compute CVIs and similarity metrics to simulate clustering performance.
  • Feature Space Construction: Extract meta-features from the problem space datasets and combine them with the CVIs and similarity metrics to build a comprehensive meta-database.
  • Surrogate Modeling: Train a regression model as a surrogate to predict the quality of clustering pipelines, enabling task-agnostic optimization across various clustering scenarios.
  • Clustering Pipeline Synthesis: Integrate the trained surrogate model with popular AutoML frameworks such as TPOT to evaluate candidate clustering pipelines.
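The first two features can be illustrated with a minimal, standard-library sketch. It is not PoAC's implementation: the parameter names in `ARCHETYPE_GRID`, the helper functions, and the label-flipping noise model are all assumptions made for illustration. It only shows what "combinatorial analysis of archetype parameters" and "partitionings with multiple noise levels" can look like in practice.

```python
import itertools
import random

# Hypothetical archetype parameters; PoAC's actual grid may differ.
ARCHETYPE_GRID = {
    "n_samples": [100, 500],
    "n_features": [2, 10],
    "n_clusters": [2, 5],
}


def enumerate_problem_space(grid):
    """Return one configuration dict per combination of parameter values."""
    names = list(grid)
    return [dict(zip(names, values)) for values in itertools.product(*grid.values())]


def perturb_labels(labels, noise_level, n_clusters, rng):
    """Reassign a fraction `noise_level` of the points to a different cluster."""
    noisy = list(labels)
    n_flip = round(noise_level * len(noisy))
    for i in rng.sample(range(len(noisy)), n_flip):
        # Pick any cluster other than the current one (requires n_clusters >= 2).
        noisy[i] = rng.choice([c for c in range(n_clusters) if c != noisy[i]])
    return noisy


# 2 * 2 * 2 = 8 dataset configurations.
configs = enumerate_problem_space(ARCHETYPE_GRID)

# Simulated partitionings of one 30-point, 3-cluster ground truth.
rng = random.Random(0)
truth = [i % 3 for i in range(30)]
partitions = {level: perturb_labels(truth, level, 3, rng) for level in (0.0, 0.1, 0.5)}
```

Each noisy partitioning can then be scored with CVIs and compared against the ground truth, which yields the (meta-features, CVIs, similarity) rows that the meta-database is built from.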

Installation

To get started with PoAC, follow these steps:

  1. Clone the repository:

    git clone https://github.com/your-repo/PoAC.git
    cd PoAC

It’s recommended to use a virtual environment to manage dependencies.

  2. Create a virtual environment:

    python3 -m venv poac-env
    source poac-env/bin/activate  # On Windows, use `poac-env\Scripts\activate`

  3. Install the required packages:

    pip install -r requirements.txt

Usage

PoAC is organized into two main stages: surrogate model training and pipeline synthesis. The stages are designed to run sequentially, but individual modules can also be executed on their own. PoAC ships with a pre-trained default surrogate model, so users can start synthesizing and optimizing clustering pipelines without training a new model.

1. Surrogate Model

import poac
import joblib

surrogate = poac.Surrogate()

# Start by defining the problem space, where you synthesize clustering datasets:
surrogate.populate_problem_space(sample_size=5, keep=False)
# Simulate clustering partitionings with varying levels of noise:
surrogate.simulate_solutions()
# Extract meta-features and combine with CVIs and similarity metrics
surrogate.extract_metafeatures()
# Train the surrogate model
surrogate_model = surrogate.build_model()

# Optionally, save the surrogate model (the target directory must already exist)
joblib.dump(surrogate_model, 'optimization/tpot/models/random_forest_model.joblib')

2. Pipeline Synthesis

import poac
from sklearn.datasets import load_breast_cancer

# Example of using PoAC with TPOT
data = load_breast_cancer().data
optimizer = poac.Optimizer(data)

sv6light_meta_features = [
    'attr_ent.sd', 'sparsity.sd', 'cov.mean', 'var.mean', 'eigenvalues.mean',
    'sparsity.mean', 'wg_dist.sd', 'iq_range.mean', 'sil', 'dbs',
]
code, pipeline, labels = optimizer.synthesize(
    generations=3,
    population_size=5,
    meta_features=sv6light_meta_features,
)
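The meta-feature names in the list above appear to follow a `<measure>.<summary>` pattern, where a measure is computed per attribute and then summarized (for example by its mean or standard deviation). That reading is an assumption, not something this README states. Under that assumption, the sketch below shows how two such values, `var.mean` and `iq_range.mean`, could be computed with the standard library alone; the helper `column_summaries` is hypothetical and is not part of PoAC.

```python
import statistics


def column_summaries(rows):
    """Compute two illustrative meta-features for a numeric table given as rows."""
    columns = list(zip(*rows))

    # Per-attribute sample variance, summarized by the mean.
    variances = [statistics.variance(col) for col in columns]

    # Per-attribute interquartile range (Q3 - Q1), summarized by the mean.
    iq_ranges = []
    for col in columns:
        q1, _, q3 = statistics.quantiles(col, n=4, method="inclusive")
        iq_ranges.append(q3 - q1)

    return {
        "var.mean": statistics.mean(variances),
        "iq_range.mean": statistics.mean(iq_ranges),
    }


toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
features = column_summaries(toy)
# Variances are 2.5 and 250.0 (mean 126.25); IQRs are 2.0 and 20.0 (mean 11.0).
```

Exact values from a dedicated meta-feature library may differ slightly, since quantile and variance conventions vary between implementations.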

Results

In our experiments, integrating the PoAC surrogate model into TPOT achieved a mean Adjusted Rand Index (ARI) of 0.70 (70%) across 100 synthetic datasets. Because the meta-features and CVIs are configurable, the surrogate can be adapted to a range of clustering tasks and AutoML applications.
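For readers unfamiliar with the metric, the ARI compares two labelings by counting pairs of points that are grouped together in both, then corrects for chance agreement: 1.0 means identical partitions (up to label renaming) and values near 0 mean chance-level agreement. The function below is a minimal standard-library sketch of the standard pair-counting formula; it is not taken from the PoAC code base, which may rely on a library implementation instead.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same points."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: treat as trivially identical

    # Pairs co-clustered in both labelings, in the first only, in the second only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (sum_cells - expected) / (max_index - expected)


same_up_to_renaming = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
partial_agreement = adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1])
```

Here `same_up_to_renaming` is 1.0 because only the cluster names differ, while `partial_agreement` is about 0.57.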

Contributing

We welcome contributions to PoAC! Please fork the repository, create a new branch, and submit a pull request. For major changes, please open an issue to discuss your proposed changes.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use PoAC in your research, please cite our paper:

@inproceedings{
silva2024benchmarking,
title={Benchmarking Auto{ML} Clustering Frameworks},
author={Matheus Camilo da Silva and Biagio Licari and Gabriel Marques Tavares and Sylvio Barbon Junior},
booktitle={AutoML Conference 2024 (ABCD Track)},
year={2024},
url={https://openreview.net/forum?id=RzUKJnph1g}
}
