To cite this work, please refer to
or use the following: R. Abani, "Glitch Classification for Gravitational Wave Interferometry Using Machine Learning," 2022 IEEE 3rd Global Conference for Advancement in Technology (GCAT), Bangalore, India, 2022, pp. 1-3, doi: 10.1109/GCAT55367.2022.9971811.
This project is part of the coursework for ECS 308 Data Science and Machine Learning, taught by Dr. Tanmay Basu and mentored by Mr. Vishisht Sharma, in the 6th semester at IISER Bhopal.
Gravitational waves are disturbances in the curvature of spacetime, generated by accelerated masses, that propagate as waves outward from their source at the speed of light. Detecting them demands a thorough understanding of the instrumental response within an ecosystem of environmental noise. Of pertinent interest, therefore, is the study of anomalous non-Gaussian noise transients called 'glitches'. Classifying glitches is essential because they occur at high rates in LIGO data and often obscure or mimic true gravitational wave signals.

The data used in this project was extracted from LIGO's Gravity Spy portal and contains metadata about these glitches. The training data describes the characteristics of each glitch, such as bandwidth and signal-to-noise ratio (7 features in total). The test data contains the glitch labels, i.e. the 22 glitch classes, along with unique identification labels.

In this project, several machine learning models from Python's scikit-learn library, namely K-Nearest Neighbours, Support Vector Machines, Random Forests and Decision Trees, were trained on the data to develop an accurate glitch classifier, which was then run on the test data. In a sub-instance of this project, one-hot encoding was used to handle the categorical variable ifo (detector location) in order to address the research question: 'Does the location of the interferometer have any impact on the classification of glitches?' The scatter plot below shows the correlation between the variables in the training data set.
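As a rough illustration of the one-hot encoding step mentioned above, the sketch below encodes a hypothetical `ifo` column with pandas. The column names and values (`ifo`, `snr`, `bandwidth`, `H1`, `L1`) are assumptions for demonstration; the actual Gravity Spy metadata columns may differ.

```python
# Hypothetical sketch: one-hot encoding the `ifo` (detector location) column.
# Column names and values are illustrative, not the actual dataset schema.
import pandas as pd

df = pd.DataFrame({
    "ifo": ["H1", "L1", "H1"],        # Hanford / Livingston (assumed labels)
    "snr": [12.3, 8.7, 45.1],         # signal-to-noise ratio
    "bandwidth": [64.0, 128.0, 32.0],
})

# get_dummies replaces the categorical column with one binary column per level
encoded = pd.get_dummies(df, columns=["ifo"], prefix="ifo")
print(encoded.columns.tolist())
```

After encoding, the frame carries `ifo_H1` and `ifo_L1` indicator columns in place of `ifo`, so distance-based models like KNN and SVM can consume the location information numerically.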
This project explored the use of scikit-learn ML models, namely KNN, SVM, Decision Trees and Random Forests, via a pipeline based on a 'divide and conquer' strategy, in contrast to the commonly used pipeline that runs everything, from pre-processing through hyperparameter tuning to evaluation and prediction, in one go. The pseudo-code and other technical details can be found in my
This divide-and-conquer approach of splitting the pipeline into a training routine and a parameter-tuning routine reduced the overall running time, evident from the shorter execution times achieved on a local Intel i7 processor (without a GPU). For each of the ML models, i.e. KNN, SVM, Decision Tree and Random Forest, parameter tuning was performed first, followed by the training routine with the optimal parameters. Two scenarios were considered: the first ignored the location of the detector; in the second, one-hot encoding was used to convert the location (a categorical variable) into numerical form.
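The two-stage routine described above can be sketched as follows: stage 1 tunes hyperparameters with cross-validation, and stage 2 retrains a fresh model with the chosen parameters. The synthetic dataset and the KNN parameter grid here are illustrative stand-ins, not the project's actual data or grids.

```python
# Sketch of the two-stage "divide and conquer" pipeline: tune first, then train.
# Dataset and parameter grid are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for the glitch metadata: 7 features, multi-class labels
X, y = make_classification(n_samples=300, n_features=7, n_informative=5,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stage 1: parameter-tuning routine (cross-validated grid search)
search = GridSearchCV(KNeighborsClassifier(),
                      {"n_neighbors": [3, 5, 7, 9]}, cv=5)
search.fit(X_tr, y_tr)

# Stage 2: training routine using the optimal parameters found above
model = KNeighborsClassifier(**search.best_params_).fit(X_tr, y_tr)
acc = model.score(X_te, y_te)
```

The same tune-then-train pattern applies unchanged to the SVM, Decision Tree and Random Forest models by swapping the estimator and grid.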
The research question in Scenario 2, whether there is a correlation between the location of the interferometer and glitch classification, has not been answered conclusively by the results obtained. The listed colour maps show the variations in visualizing the data distribution after running the model with the highest F-score. Over-fitting and, most importantly, class imbalance may be contributing factors: as the glitch-distribution bar plot shows, the number of non-glitch events is extremely low while the percentage of Blip glitches is very high.
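The kind of imbalance described above can be quantified directly from the label column. The sketch below uses made-up labels and counts purely for demonstration; the real distribution is the one shown in the bar plot.

```python
# Illustrative check of label imbalance; the labels and counts here are
# invented for demonstration and do not reflect the actual dataset.
import pandas as pd

labels = pd.Series(["Blip"] * 50 + ["Koi_Fish"] * 10 + ["No_Glitch"] * 2)
dist = labels.value_counts(normalize=True)
print(dist)
```

When one class dominates to this degree, accuracy alone is misleading and a classifier can score well while rarely predicting minority classes, which is why the F-score was used and why the Scenario 2 results remain inconclusive.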
Listed Color Map for the Decision Tree based classifier model in Scenario 1
Listed Color Map for classification performed on data from Hanford after one-hot encoding
Listed Color Map for classification performed on data from Livingston after one-hot encoding
Distribution of Glitches