Giter Site home page Giter Site logo

data_cleansing's Introduction


Data Cleansing for Indoor Positioning Wi-Fi Fingerprinting Datasets

pub package

Abstract

Wearable and IoT devices requiring positioning and localisation services grow in number exponentially every year. This rapid growth also produces millions of data entries that need to be pre-processed prior to being used in any indoor positioning system to ensure the data quality and provide a high Quality of Service (QoS) to the end-user. In this paper, we offer a novel and straightforward data cleansing algorithm for WLAN fingerprinting radio maps. This algorithm is based on the correlation among fingerprints using the Received Signal Strength (RSS) values and the Access Points (APs)'s identifier. We use those to compute the correlation among all samples in the dataset and remove fingerprints with low level of correlation from the dataset. We evaluated the proposed method on 14 independent publicly-available datasets. As a result, an average of 14% of fingerprints were removed from the datasets. The 2D positioning error was reduced by 2.7% and 3D positioning error by 5.3% with a slight increase in the floor hit rate by 1.2% on average. Consequently, the average speed of position prediction was also increased by 14%.

@INPROCEEDINGS{9861169,
  author={Quezada-Gaibor, Darwin and Klus, Lucie and Torres-Sospedra, Joaquín and Lohan, Elena Simona and Nurmi, Jari and Granell, Carlos and Huerta, Joaquín},
  booktitle={2022 23rd IEEE International Conference on Mobile Data Management (MDM)}, 
  title={Data Cleansing for Indoor Positioning Wi-Fi Fingerprinting Datasets}, 
  year={2022},
  volume={},
  number={},
  pages={349-354},
  doi={10.1109/MDM55031.2022.00079}}

Built With

This algorithm has been developed using:

Getting Started

.
├── datasets                      # WiFi fingerprinting datasets
├── cleaned_db                    
│   ├── csv                       # Cleaned datasets .csv format
│   ├── mat                       # Cleaned datasets .mat format
├── positioning
│   ├── position.py               # KNN (Classifier and regressor)
├── preprocessing
│   ├── data_cleaning.py          # Data cleaning
│   ├── data_processing.py        # Normalization, Standardization, ...
│   ├── data_representation.py    # Positive, Powerd, etc.
├── results                       # Positioning results by dataset
├── report                        # Generate plots
│   ├── cdf.py                    # Individual CDF plot
│   ├── data_size_time.py         # Barplot data size and time
│   ├── kde_cdf.py                # KDE and CDF plots
│   ├── removed_fps.py            # Scater plot
│   ├── PLOTS  
├── config.json                   # Configuration file
├── main.py                       
├── run_cleandb.py                # Run the data cleansing algorithm
├── requirements.txt              # Python libraries - requirements
└── README.md                     # The most important file :)

Libraries

  • pandas, numpy, seaborn, matplotlib, sklearn, colorama

Datasets

The datasets can be downloaded either from authors' repository (see README file in dataset folder) or from the following repository:

  "Joaquín Torres-Sospedra, Darwin Quezada-Gaibor, Germán Mendoza-Silva,
  Jari Nurmi, Yevgeny Koucheryavy, & Joaquín Huerta. (2020). Supplementary
  Materials for 'New Cluster Selection and Fine-grained Search for k-Means
  Clustering and Wi-Fi Fingerprinting' (1.0).
  Zenodo. https://doi.org/10.5281/zenodo.3751042"

Converting datasets from .mat to .csv

1.- Copy the original datasets (.mat) into dataset folder. 2.- Modify the list of datasets in the /miscellaneous/datasets_mat_to_csv.py (line 23) with the dataset or datasets to be converted to csv.

list_datasets = [ 'DSI1', 'DSI2', 'LIB1', 'LIB2', 'MAN1', 'MAN2', 'TUT1', 'TUT2', 'TUT3', 'TUT4', 'TUT5', 'TUT6', 'TUT7','UJI1','UTS1', 'UJIB1', 'UJIB2']

3.- Run the /miscellaneous/datasets_mat_to_csv.py.

  $ python /miscellaneous/datasets_mat_to_csv.py

Usage

General parameters:

  • --config-file : Datasets and model's configuration (see config.json)

  • -- dataset : Dataset or datasets to be tested (i.e., UJI1 or UJI1,UJI2,TUT1)

  • -- algorithm : KNN or CLEAN

  • Data cleansing

    • Execute the data cleansing algorithm + KNN (To generate the cleansed dataset the "clean_db" has to be changed to ** TRUE ** in the config file.)
  $ python main.py --config-file config.json --dataset LIB1 --algorithm CLEAN

License

CC By 4.0

Acknowledgements

The authors gratefully acknowledge funding from the European Union’s Horizon 2020 Research and Innovation programme under the Marie Sk\l{}odowska Curie grant agreement No. $813278$, A-WEAR.

data_cleansing's People

Contributors

darwinquezada avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.