
Finalising results about rfi-nln (11 comments, closed)


mesarcik commented on June 6, 2024

Replication of HERA sim paper:

  • Each training sample is 6.4 MHz x 60 time samples (i.e. each training sample uses a bandwidth of 6.4 MHz in the range of 100-200 MHz).
  • The passband for the simulator according to the paper is 100-200 MHz (hence the notches in the plot above at 100 and 200 MHz).
  • The sources shown above are from a "pseudo-sky model" based on the GSM models. However, these models have no relation to real astronomical entities.
  • A discrete visibility equation is used to model them; this is taken from the discrete model in the paper (equation reconstructed below).

Here, we see a measured visibility expressed as a discrete sum over point sources, each entering at a different delay with a different inherent frequency spectrum. The delay transform maps flux from each celestial source to a Dirac delta function, δ_D, centered at the corresponding group delay, convolved by a kernel representing the Fourier transforms of the frequency-dependent interferometer gains, Ã(τ, ŝ_n), and the inherent spectrum of each source, S̃_n.
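In symbols, a hedged reconstruction from the description above (not copied from the paper verbatim):

```latex
\tilde{V}(\tau) \;\approx\; \sum_{n} \tilde{S}_n(\tau) \ast \tilde{A}(\tau, \hat{s}_n) \ast \delta_D(\tau - \tau_n)
```

where ∗ denotes convolution in delay and τ_n is the group delay of source n.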

  • Essentially, each telescope has a gain pattern that changes with frequency, given by A(τ, ŝ), and the source spectra are distributed by a power law with a lower bound of 0.3 Jy, as given by:
    [equation image: power-law source flux distribution]
  • In the "foreground" the number of sources is between 1000 and 10000 according to the model, and the positions are randomly sampled from a uniform distribution (see the sketch after this list).
  • The baseline-dependent effects such as fringes are then convolved with the input visibilities.
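To make the source model concrete, here is a minimal numpy sketch of drawing such a population. The power-law index and the 300 Jy upper bound are illustrative assumptions; the 0.3 Jy lower bound, the 1000-10000 source count, and the uniform positions are from the model description above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Number of foreground sources: uniform between 1000 and 10000 (per the model).
n_src = int(rng.integers(1000, 10001))

# Positions sampled uniformly, per the description.
ra = rng.uniform(0.0, 2.0 * np.pi, n_src)             # radians
dec = rng.uniform(-np.pi / 2, np.pi / 2, n_src)       # radians

# Fluxes from a power law p(S) ∝ S^-(alpha+1) with a 0.3 Jy lower bound,
# sampled by inverting the CDF. alpha and s_max are assumptions.
alpha, s_min, s_max = 1.5, 0.3, 300.0
u = rng.uniform(size=n_src)
fluxes = (s_min**-alpha + u * (s_max**-alpha - s_min**-alpha)) ** (-1.0 / alpha)
```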

Finalised training set:

  • We simulate HERA visibilities with the following parameters:
    • Duration 30 minutes, integration time 3.52 s (this is done to keep the visibilities square).
    • Bandwidth 90 MHz, from 105 MHz to 195 MHz.
    • Hexagonal array layout with a distance of 14.6 m between antennas.
    • We use the H1C "observation season" as specified by the simulator.
    • We use the default "diffuse foregrounds" with the default parameters specified above.
    • We add thermal noise using the default parameters (see the sketch after this list), such that

      power-law temperature with 180 K at 180 MHz and spectral index of -2.5

  • We add the RFI station models as defined in the simulator, generated by the ORBCOMM satellites.
  • We add DTV RFI with default parameters.
  • We add impulse and noise-like RFI, also with default parameters.
  • Cross-talk is then added, modelled by the convolution between noise and the simulated visibilities (as described in the paper).
  • Finally, we apply a bandpass model (with varying gains and group delays per station).
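For reference, the quoted thermal-noise default corresponds to a simple power-law temperature; a minimal sketch of that scaling (plain numpy, not the hera_sim API):

```python
import numpy as np

def noise_temperature(freq_mhz: np.ndarray) -> np.ndarray:
    """Power-law temperature: 180 K at 180 MHz with spectral index -2.5."""
    return 180.0 * (freq_mhz / 180.0) ** -2.5

freqs = np.linspace(105.0, 195.0, 512)  # the 90 MHz band used above
temps = noise_temperature(freqs)        # ~693 K at 105 MHz down to ~147 K at 195 MHz
```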

Diffuse foregrounds: [figure]

RFI: [figure]

Instrumentation noise: [figure]

Cross coupling: [figure]

Bandpass effects: [figure]

mesarcik commented on June 6, 2024

Experiments run for HERA

  • AOFlagger threshold vs metric sensitivity
    • Expand results to thresholds of 100 and 200
  • OOD RFI vs metric sensitivity
  • Table showing AUROC vs threshold for AOFlagger

OOD RFI

  • Currently running on a patch size of 8x8; I think the performance should increase for NLN-based models.
  • Note: for AOFlagger I find the average maximum threshold over all OOD RFI runs and use that (in this case it is 2).
    [figure: OOD RFI results]

Analysis:

  • On AUROC alone, NLN performs best for OOD RFI detection; however, AFAIK AUROC is not the best metric for class-imbalance problems such as we have here.
  • It is interesting to me that UNET's AUPRC and IOU are so much better than NLN's for impulse RFI.
    • One explanation is that the impulse-based RFI waveforms are similar to the station-based waveforms, so the UNET can detect the RFI even at low SNR.

Metric sensitivity to AOFlagger threshold:

  • I am currently running the experiments to add thresholds of 0.25, 100 and 500 to see what the extreme effects are.
  • It seems that at the extremes the NLN-based methods perform best; however, this is not the case for the AE at thresholds 20 and 50. I am not 100% sure why this is, and am looking into it.

[figure: metrics vs threshold]

  • Below we can see the effect of the threshold on the actual ground-truth metrics; it can be seen that the optimal threshold is at approximately 10.
  • Interestingly, this threshold is not the same as the optimal one for the OOD RFI situation.
  • Note that these "scores" are misleading, because AOFlagger outputs already-thresholded values, so we only really have 1 point in our "curves".

| Metric | 0.5   | 1.0   | 2.0   | 4.0   | 5.0   | 6.0   | 7.0   | 8.0   | 9.0   | 10.0  | 20.0  | 50.0  |
|--------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| AUROC  | 0.696 | 0.965 | 0.975 | 0.978 | 0.977 | 0.977 | 0.979 | 0.977 | 0.977 | 0.978 | 0.977 | 0.964 |
| AUPRC  | 0.493 | 0.620 | 0.663 | 0.692 | 0.712 | 0.738 | 0.774 | 0.777 | 0.779 | 0.784 | 0.788 | 0.752 |
| IOU    | 0.044 | 0.297 | 0.384 | 0.445 | 0.492 | 0.547 | 0.619 | 0.626 | 0.629 | 0.639 | 0.648 | 0.592 |

mesarcik commented on June 6, 2024

Boosting results

Analysis of metrics

  • Inspecting previous results, it always seemed strange to me that we can get good AUROC yet our AUPRC and IOU scores were always far behind UNET's.
  • As far as I understand, IOU is the most sensitive metric, as any subtle change in the alignment of the output masks decreases the performance (this is the same for both classes).
  • AUROC is the area under the curve of TPR vs FPR across thresholds.
  • AUPRC is the area under the curve of precision vs recall across thresholds, where precision = TP/(TP+FP) and recall = TP/(TP+FN); see the sketch after this list.
    • In effect, AUPRC is much more sensitive to false negatives (i.e. it really is RFI but we say it's not).
  • I've somewhat solved this problem by removing the logarithmic normalisation.
  • Changed BCE to MSE.
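For reference, a minimal sketch of how these three metrics can be computed from a ground-truth mask and a per-pixel error map (scikit-learn; the variable names are illustrative):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def segmentation_metrics(mask: np.ndarray, error: np.ndarray, thresh: float):
    """mask: boolean ground-truth RFI mask; error: per-pixel anomaly score."""
    y_true = mask.ravel().astype(int)
    y_score = error.ravel()

    auroc = roc_auc_score(y_true, y_score)            # area under TPR vs FPR
    auprc = average_precision_score(y_true, y_score)  # area under precision vs recall

    # IOU needs a hard decision, so threshold the error map first.
    y_pred = y_score > thresh
    intersection = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    return auroc, auprc, intersection / union
```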

[figure]

More work:

  • In bringing together the AUPRC and AUROC scores I introduced another problem: our models are hallucinating RFI even when it's not there.
    [figure]
  • In the figure you can see that when the "station RFI" is present, the AE reconstructs the patch containing the RFI with a slightly higher magnitude than when it is not. However, it also does this for patches near the RFI and patches that contain the edges of the RFI.
  • The effect of this is that the thresholded error we calculate has hallucinated RFI for the stations class (only).
  • I think it has to do with the way patches are constructed.
  • Another thing to investigate is whether it has to do with the magnitude of the RFI (i.e. whether stronger RFI leads to more hallucinated features).
    • This may mean a way to fix this is to clip the training data so that any RFI above some threshold is clipped (see the sketch after this list).
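A minimal sketch of the clipping fix, assuming we clip the visibility magnitudes before training:

```python
import numpy as np

def clip_training_data(vis: np.ndarray, clip_max: float = 100.0) -> np.ndarray:
    """Clip visibility magnitudes so very bright RFI cannot dominate the AE.

    clip_max = 100 is the first value tried below; 200 and the range
    (0.5, 50) are experimented with later in the thread.
    """
    return np.clip(np.abs(vis), None, clip_max)
```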

Clipping at 100

[figure]

  • This seems to make things worse, as other interference becomes equally strong.
  • I have increased the clip to 200 to see if that brings down the maximum without inflating other RFI.

Illustration of potential fixes

[figure]

  • There's a bug in the roll operation, but I think if we shift the patches and do the NLN algorithm twice we can maybe resolve the "shadows" that are created.

Adding a roll:

  • It fixes the problem, but introduces another one: AUPRC and AUROC decrease, but IOU increases by about 10 percent.

  • The way I've implemented it is by finding the minimum between the rolled and unrolled errors and then using that for computation, as sketched below.
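A minimal sketch of that combination, assuming the NLN error map is computed twice, once on the original input and once on an input shifted by half a patch (the shift amount is an assumption; the thread only says the patches are shifted):

```python
import numpy as np

def combine_rolled_errors(err: np.ndarray, err_rolled: np.ndarray,
                          shift: int = 4) -> np.ndarray:
    """err: NLN error map on the original input.
    err_rolled: NLN error map on the input rolled by `shift` pixels.

    Roll the second map back so both are pixel-aligned, then take the
    elementwise minimum: a patch-aligned "shadow" in one pass is unlikely
    to reappear at the same pixels once the patch grid is shifted.
    """
    aligned = np.roll(err_rolled, (-shift, -shift), axis=(0, 1))
    return np.minimum(err, aligned)
```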

Fixed

[figure]

More problems

  • It seems that doing the min operation in this setting increases incorrect predictions on the autocorrelations.
  • I think this is happening because the minimum operation "brings down" the higher values, such that we introduce some weird artifacts.
    [figure]

mesarcik commented on June 6, 2024

Some HERA results to discuss:

  • I have regenerated the HERA data; this is the distribution of magnitudes:
    [figure: magnitude scatter]
  • 61.5% of the RFI lies in the same magnitude range as the astronomical data; 38.5% is higher.
  • This means that, theoretically, a single threshold could detect all of that 38.5% of the RFI.
  • What I find is that with a single threshold we can achieve the following on the dataset (AUROC, AUPRC, IOU): 0.828, 0.470, 0.059.
  • The reason that AUROC is high is that the dataset is so imbalanced: we can easily detect non-RFI (and only 2.76% of the data contains RFI). Note I think the percentage contamination is lower than in the original paper because we are simulating much larger spectrograms.
  • Looking at the naive threshold, we get the following precision/recall breakdown:

| Class  | Precision | Recall | F1-score | Support   |
|--------|-----------|--------|----------|-----------|
| No RFI | 0.99      | 0.64   | 0.78     | 142745554 |
| RFI    | 0.06      | 0.81   | 0.11     | 4055086   |

  • So the naive threshold can almost perfectly detect non-RFI, i.e. the number of false positives for the non-RFI class is very low.
  • However, the recall shows that there are many false negatives (i.e. many non-RFI samples are missed and get flagged).
  • For detecting RFI there are many false positives (low precision), but few false negatives (high recall).
  • **To summarise, I don't think that AUROC is a very good metric for this class-imbalanced problem; here it is far too optimistic.**
  • Below it is clear that the threshold (argmax(tpr − fpr)) fails for autocorrelations; a sketch of how this threshold is derived follows below.
    [figures]
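A minimal sketch of how that argmax(tpr − fpr) threshold can be derived with scikit-learn (names illustrative):

```python
import numpy as np
from sklearn.metrics import roc_curve

def naive_threshold(mask: np.ndarray, magnitudes: np.ndarray) -> float:
    """Single magnitude threshold maximising TPR - FPR (Youden's J)."""
    fpr, tpr, thresholds = roc_curve(mask.ravel().astype(int),
                                     magnitudes.ravel())
    return float(thresholds[np.argmax(tpr - fpr)])

# Usage: flags = magnitudes > naive_threshold(mask, magnitudes)
```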

New fixed results:

  • Here we can see that I have clearly improved the performance on the AUPRC and IOU metrics.
  • This was done by using a discriminative loss, evaluating without absolute error, and clipping the training data between 0.5 and 50.
    [figures: OOD RFI and threshold results]

mesarcik commented on June 6, 2024

AOFlagger threshold results analysis

[figure: metrics vs threshold]

  • The plot above should be in relative performance.
  • AOFlagger produces the best results when the threshold is between 3 and 10.
  • In this range it is clear that AOFlagger obtains the best AUROC performance; however, this does not translate into improved AUPRC or IOU scores.
  • I think this is because the AUROC score is not sensitive enough in this class-imbalanced setting: the morphological operation of joining together RFI emissions doesn't really impact the AUROC result. However, the PRC metric is far more sensitive to false positives and shows a degradation in performance, similarly to IOU. Below I illustrate this with a particular baseline from HERA; the first number in the last plot is the AUROC and the second is the AUPRC.

[figure]

  • It is clear that UNET is more sensitive to under-flagging than the NLN algorithm. This makes sense: when we over-flag, we have less data to train on for NLN, but output more false positives for UNET.
  • However, when we increase the threshold above 50, it is clear that the performance of the NLN algorithm degrades quickly. This is because our NLN algorithm starts producing false negatives (i.e. flagging RFI as not RFI). This is shown in the plot below for a threshold of 200.
    [figure]

mesarcik commented on June 6, 2024

Breaking points:

  • It seems that the NLN predictions break only at certain positions, namely at the edges of the light and dark areas.

  • I think this is because the model is trained with MSE, so it produces low-amplitude patches for such areas (as the mean would be low). This means that when subtracting the mean from the "boundary" areas we produce a "high" output.

  • It could also be because of the "shadow" idea I mentioned before, but in conjunction with these edge effects.

  • I.e. why is this only happening at the edges with a particular type of RFI?

  • We detect RFI perfectly except for a few cases, such as the one shown below.
    [figure]

  • Things tried to improve the performance further:

    • Denoising AE
      • This seems to exacerbate the problem
    • Changing the clip (before and after the logarithm)
    • Increasing the patch size to 64x64
    • I can't seem to beat (AUROC, AUPRC, IOU) = (0.96, 0.94, 0.88)

Potential reason for low AUROC

  • The RFI stations model contains some very low amplitude RFI.

  • I never realised it, but we mostly do not detect it, or we need to set the threshold lower to accommodate it, which results in more problems.
    [screenshot]

  • It seems that clipping at 0.5 removes it.

mesarcik commented on June 6, 2024

Fixes to clipping issues:

[figures: threshold and OOD RFI results]

mesarcik commented on June 6, 2024

LOFAR Results:

First experiment

  • Need to tune the alpha parameter properly.
  • Need to change the preprocessing, i.e. the clipping and normalisation (the AEs are much more sensitive to data processing).
  • I had this same problem with HERA, which was fixed through more accurate preprocessing.
  • In the plot below, alpha is 0.1, patch size is 32x32, n-neighbours = 20.
  • Clearly we get improved AUROC, but low AUPRC and IOU.
    • This is because we are more heavily weighting the latent distances, which typically decreases AUPRC and IOU due to the mismatch between patch size and RFI size; see the sketch after this list.
  • Here it is also interesting that UNET has good AUPRC scores but very low IOU (relative to AOFlagger).
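For reference, a minimal sketch of how an alpha parameter can weight the latent distances against the reconstruction error; this is a hedged sketch of a standard convex blend, and which term alpha multiplies in the actual code is an assumption:

```python
import numpy as np

def combined_score(recon_error: np.ndarray, latent_dist: np.ndarray,
                   alpha: float = 0.1) -> np.ndarray:
    """Blend the per-pixel reconstruction error with the latent
    nearest-neighbour distance (broadcast to pixel resolution).
    A larger alpha puts more weight on the latent distances; both
    inputs are assumed normalised to comparable ranges.
    """
    return alpha * latent_dist + (1.0 - alpha) * recon_error
```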

[figure: metrics vs threshold]

mesarcik commented on June 6, 2024

Distributions of labelled data

  • I'm trying to figure out how to preprocess the LOFAR data; the distributions are very skewed.
  • In the plot below you can see the log-scale plots of the labelled training set, with the RFI and non-RFI classes separated.
  • From this it seems sensible to threshold at 1e7, such that we have enough "headroom" for both the RFI and astronomical signals (see the sketch below).

[figure: histogram of labelled data]

  • For the training data (using the magnitude-based AOFlagger masks):
    [figure: histogram of training data]
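A minimal matplotlib sketch of the histogram used to pick that threshold (names illustrative; the 1e7 line marks the proposed clip):

```python
import numpy as np
import matplotlib.pyplot as plt

def magnitude_histograms(vis: np.ndarray, mask: np.ndarray) -> None:
    """Log-log histograms of visibility magnitudes, RFI vs non-RFI."""
    mags = np.abs(vis).ravel()
    rfi = mask.ravel().astype(bool)
    bins = np.logspace(np.log10(mags.min() + 1e-12), np.log10(mags.max()), 100)
    plt.hist(mags[~rfi], bins=bins, alpha=0.5, label="non-RFI")
    plt.hist(mags[rfi], bins=bins, alpha=0.5, label="RFI")
    plt.xscale("log")
    plt.yscale("log")
    plt.axvline(1e7, linestyle="--", label="proposed threshold 1e7")
    plt.legend()
    plt.show()
```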

mesarcik commented on June 6, 2024

LOFAR Results:

  • Here I evaluate NLN and UNET on 2 different datasets from the LTA.
  • They are both calibration sets with few time samples, but this is done to show that regardless of the training data we can still obtain good performance on the hand-labelled dataset.
  • I have also decided to exclude the IOU score from our evaluation; I will go back and calculate the F1 score based on the maximisation of precision and recall.
  • Note that the AOFlagger labels are those taken from the original datasets and not ones that I computed.
  • Here we use a patch size of 32x32 for both UNET and NLN.
  • The NLN backbone is a discriminative AE.

| Training set | Model     | AUROC  | AUPRC  |
|--------------|-----------|--------|--------|
| N/A          | AOFlagger | 0.7883 | 0.5716 |
| L631961      | UNET      | 0.7332 | 0.6070 |
| L631961      | NLN       | 0.8525 | 0.6000 |
| L629174      | UNET      | 0.7948 | 0.5220 |
| L629174      | NLN       | 0.8893 | 0.6142 |

(note results taken from gpu-01: outputs/results_LOFAR_04-21-2022-04-29_26c2a9.csv and gpu-02: outputs/results_LOFAR_04-21-2022-06-25_f4a439.csv)

NLN modifications

  • Here I have modified the NLN algorithm to work for the LOFAR data in the following ways (see the sketch below):
    1. I threshold the distance-based metrics such that all values above 2*median(dists) are True and those below are False.
    2. I clip the NLN reconstructions, i.e. clip(5*std(nln_recon), 1.0).
    3. I multiply the clipped reconstructions by the distance-based masks.
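A minimal sketch of those three steps (names illustrative; the order of the clip bounds in step 2 is an assumption, as the original note is ambiguous):

```python
import numpy as np

def modified_nln_output(dists: np.ndarray, nln_recon: np.ndarray) -> np.ndarray:
    """LOFAR-specific NLN post-processing, per the three steps above."""
    # 1. Threshold the distance metric: True above 2 * median, False below.
    dist_mask = dists > 2.0 * np.median(dists)

    # 2. Clip the NLN reconstructions (bounds order assumed: [1.0, 5 * std]).
    recon = np.clip(nln_recon, 1.0, 5.0 * np.std(nln_recon))

    # 3. Multiply the clipped reconstructions by the distance-based mask.
    return recon * dist_mask
```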

[figure]

mesarcik commented on June 6, 2024

Final results:

  • L629174 : outputs/results_LOFAR_05-28-2022-01-32_96c554.csv
  • L631961 : outputs/results_LOFAR_05-30-2022-01-57_660653.csv
  • All: outputs/results_LOFAR_06-14-2022-09-54_c3e64c.csv
  • HERA: ``
