Address Clustering Optimization

Project Overview

This project tackles address clustering in densely populated regions such as India, where non-standardized and incomplete address data make delivery and logistics operations inefficient. We use Machine Learning (ML) and Deep Learning (DL) techniques to group addresses into cohesive clusters based on proximity, with the goal of improving delivery route optimization, reducing transit times, and strengthening address verification.

Team Members

Aditya Mehta
Andi Lian
Manav Parmar
Soumya Gupta
Vanshaj Gupta
Xinyuan Wang

Dataset

The data comes from a GitHub repository hosted by the Machine Learning Research Group at Université Laval (GRAAL/GRAIL) and consists of addresses that have been cleaned and structured for analysis. Because the dataset includes multilingual entries, these were standardized with the GoogleTrans API so that equivalent addresses can be clustered consistently.
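As an illustration of the kind of standardization step described above, the sketch below normalizes a raw address string before clustering. It is a minimal sketch, not the project's actual pipeline: the `normalize_address` function, the `ABBREVIATIONS` table, and the optional `translate` hook (a stand-in for a call into a translation service such as GoogleTrans) are all hypothetical.

```python
import re
import unicodedata
from typing import Callable, Optional

# Hypothetical abbreviation table; a real one would be tuned to the region.
ABBREVIATIONS = {
    "rd": "road",
    "st": "street",
    "nr": "near",
    "opp": "opposite",
    "apt": "apartment",
}


def normalize_address(raw: str, translate: Optional[Callable[[str], str]] = None) -> str:
    """Lowercase, strip punctuation, and expand common abbreviations.

    `translate` is a hypothetical hook (e.g. a wrapper around a translation
    service) applied first; when omitted, the text is left in its language.
    """
    text = translate(raw) if translate is not None else raw
    # NFKC folds compatibility characters (e.g. full-width digits) together.
    text = unicodedata.normalize("NFKC", text).lower()
    # Replace punctuation with spaces; \w keeps letters and digits in any script.
    text = re.sub(r"[^\w\s]", " ", text)
    # Split on whitespace (collapsing runs) and expand known abbreviations.
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in text.split())
```

For example, `normalize_address("Flat 4, Opp. City Mall, M.G. Rd")` returns `"flat 4 opposite city mall m g road"`. Keeping translation as an injected callable lets the normalization logic be tested without network access.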

Methods and Algorithms

We apply a range of clustering algorithms and neural-network models, including:

K-Means clustering
DBSCAN
Self-organizing maps (SOMs)
Hierarchical clustering
Neural networks, including CNNs, RNNs (LSTM), and Transformer models (BERT)
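Because the goal is to group addresses by proximity, a density-based method such as DBSCAN can be applied directly to geocoded coordinates. The following is a minimal pure-Python sketch that assumes each address has already been geocoded to a (latitude, longitude) pair; the function names and the `eps_m` (metres) and `min_samples` parameters are illustrative and not taken from the project code.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance between two (lat, lon) points in metres."""
    r = 6_371_000.0  # mean Earth radius in metres
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(h))


def dbscan(points: List[Point], eps_m: float, min_samples: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    NOISE = -1
    labels: List[Optional[int]] = [None] * len(points)  # None = unvisited

    def neighbours(i: int) -> List[int]:
        # Brute-force O(n^2) region query; the result includes i itself.
        return [j for j in range(len(points))
                if haversine_m(points[i], points[j]) <= eps_m]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neigh = neighbours(j)
            if len(j_neigh) >= min_samples:
                queue.extend(j_neigh)  # j is a core point: keep expanding
        cluster += 1
    return labels  # type: ignore[return-value]
```

With `eps_m=50` and `min_samples=2`, three points roughly 10 to 15 metres apart form one cluster, while an isolated point is labelled `-1` (noise). For larger datasets, scikit-learn's `sklearn.cluster.DBSCAN` accepts `metric="haversine"` on coordinates given in radians, with `eps` also in radians (metres divided by the Earth's radius).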

Project Structure

Introduction and Reference Collection: Establish a foundational understanding and collect the necessary literature.
Dataset Preparation: Prepare and preprocess the dataset.
Methodology Development: Develop and test various ML and DL models.
Implementation and Testing: Implement the models and evaluate their performance.
Results Analysis and Optimization: Analyze the outcomes and optimize the models.
Conclusion and Future Work: Summarize findings and propose future research directions.

Evaluation Plan

Model performance is evaluated using:

Accuracy
Calinski-Harabasz (CH) Index
Silhouette Coefficient
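The CH index and silhouette coefficient are internal metrics that need no ground-truth labels, whereas accuracy needs reference labels to compare against. The dependency-free sketch below spells out both formulas for points in a Euclidean feature space (assuming inputs such as projected coordinates or address embeddings); in practice `sklearn.metrics.silhouette_score` and `sklearn.metrics.calinski_harabasz_score` compute the same quantities. The function names here are chosen for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _group(labels: Sequence[int]) -> Dict[int, List[int]]:
    """Map each cluster label to the indices of its members."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for idx, lab in enumerate(labels):
        groups[lab].append(idx)
    return groups


def silhouette(X: Sequence[Vector], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b) over all points."""
    groups = _group(labels)
    if len(groups) < 2:
        raise ValueError("silhouette requires at least 2 clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = groups[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(X[i], X[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(X[i], X[j]) for j in members) / len(members)
            for other, members in groups.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def calinski_harabasz(X: Sequence[Vector], labels: Sequence[int]) -> float:
    """CH = [B / (k - 1)] / [W / (n - k)], between- vs within-cluster dispersion."""
    n, groups = len(X), _group(labels)
    k = len(groups)
    if not 1 < k < n:
        raise ValueError("requires 1 < number of clusters < number of points")
    dim = len(X[0])
    overall = [sum(x[d] for x in X) / n for d in range(dim)]
    between = within = 0.0
    for members in groups.values():
        centroid = [sum(X[j][d] for j in members) / len(members) for d in range(dim)]
        between += len(members) * math.dist(centroid, overall) ** 2
        within += sum(math.dist(X[j], centroid) ** 2 for j in members)
    if within == 0.0:
        return math.inf  # every point sits exactly on its cluster centroid
    return (between / (k - 1)) / (within / (n - k))
```

On the toy input `X = [(0, 0), (0, 1), (10, 0), (10, 1)]` with labels `[0, 0, 1, 1]`, the CH index is 200.0 and the silhouette coefficient is about 0.90; higher values of both indicate tighter, better-separated clusters.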

Installation and Usage

Clone the repository
- git clone https://github.com/Vanshaj5101/Address-Clustering-Optimization.git
- cd Address-Clustering-Optimization
Install dependencies
- pip install -r requirements.txt
Run the application
- python app.py

Project Milestones

Feb 2024: Problem definition and data collection.
Mar 2024: Data preprocessing and model selection.
Apr 2024: Model implementation and performance evaluation.
May 2024: Finalization of project and future work proposals.

References

A list of academic and practical references used throughout the project is included in the references section of the repository.
