This project addresses address clustering in densely populated countries such as India, where non-standardized and incomplete address data complicate delivery and logistics. Using Machine Learning (ML) and Deep Learning (DL) techniques, we aim to group addresses into cohesive clusters based on proximity, in order to improve delivery route optimization, reduce transit times, and support address verification.
- Aditya Mehta
- Andi Lian
- Manav Parmar
- Soumya Gupta
- Vanshaj Gupta
- Xinyuan Wang
The data, collected from a GitHub repository hosted by the Machine Learning Research Group at Université Laval (GRAAL/GRAIL), consists of addresses that have been cleaned and structured for analysis. This dataset includes addresses with multilingual entries, which were standardized using the GoogleTrans API for accurate clustering.
We apply a range of clustering algorithms and neural models, including:
- K-Means clustering
- DBSCAN
- Self-organizing maps (SOMs)
- Hierarchical clustering
- Neural networks, including CNNs, RNNs (LSTM), and Transformer models (BERT)
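As an illustration of the proximity-based grouping listed above, here is a minimal K-Means sketch in plain Python. It is not the project's implementation: the `kmeans` helper, the deterministic initialisation, and the sample coordinates are all hypothetical, and it assumes each address has already been geocoded to a (latitude, longitude) pair.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain K-Means on 2-D points; returns (labels, centroids)."""
    # Deterministic initialisation for this sketch: pick k evenly spaced inputs.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Hypothetical geocoded addresses as (latitude, longitude) pairs.
# Euclidean distance on raw degrees is only a rough proxy for ground distance.
coords = [
    (19.07, 72.87), (19.08, 72.88), (19.06, 72.86),
    (28.61, 77.20), (28.62, 77.21), (28.60, 77.22),
]
labels, centroids = kmeans(coords, k=2)
print(labels)
```

For real data, a library implementation such as scikit-learn's `KMeans` or `DBSCAN` would likely be the better choice; the hand-written version here only makes the assignment and update steps explicit.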
- Introduction and Reference Collecting: Establish a foundational understanding and collect necessary literature.
- Dataset Preparation: Prepare and preprocess the dataset.
- Methodology Development: Develop and test various ML and DL models.
- Implementation and Testing: Implement the models and evaluate their performance.
- Results Analysis and Optimization: Analyze the outcomes and optimize the models.
- Conclusion and Future Work: Summarize findings and propose future research directions.
Model performance is evaluated using:
- Accuracy
- Calinski-Harabasz (CH) Index
- Silhouette Coefficient
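The two internal clustering metrics above can be computed directly from points and cluster labels. The following is a small sketch using only the standard library; the function names and the toy data are hypothetical, and Euclidean distance is assumed. scikit-learn offers equivalent functions (`silhouette_score` and `calinski_harabasz_score` in `sklearn.metrics`).

```python
import math

def _group(points, labels):
    """Map each label to the list of points assigned to it."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters

def _mean(pts):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(axis) / len(pts) for axis in zip(*pts))

def silhouette_coefficient(points, labels):
    """Mean of s = (b - a) / max(a, b) over all points."""
    clusters = _group(points, labels)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion.

    Assumes more points than clusters and non-zero within-cluster spread.
    """
    clusters = _group(points, labels)
    n, k = len(points), len(clusters)
    overall = _mean(points)
    between = sum(
        len(m) * math.dist(_mean(m), overall) ** 2 for m in clusters.values()
    )
    within = sum(
        math.dist(p, _mean(m)) ** 2 for m in clusters.values() for p in m
    )
    return (between / (k - 1)) / (within / (n - k))

# Toy example: two tight, well-separated groups (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
sil = silhouette_coefficient(points, labels)
ch = calinski_harabasz(points, labels)
print(sil, ch)
```

Higher values of both scores indicate tighter, better-separated clusters. Neither metric needs ground-truth labels, which is useful when no reference clustering of the addresses is available.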
- Clone the repository
  - `git clone https://github.com/Vanshaj5101/Address-Clustering-Optimization`
  - `cd Address-Clustering-Optimization`
- Install dependencies
  - `pip install -r requirements.txt`
- Run the application
  - `python app.py`
- Feb 2024: Problem definition and data collection.
- Mar 2024: Data preprocessing and model selection.
- Apr 2024: Model implementation and performance evaluation.
- May 2024: Finalization of project and future work proposals.
A list of academic and practical references used throughout the project is included in the references section of the repository.