
lotss-catalogue's People

Contributors

ggurkan, hughdickinson, jcroston, mhardcastle, rohitk-10, wndywllms

lotss-catalogue's Issues

Tests

We need a set of automated tests to check for catalogue inconsistencies so that we catch problems that are introduced or reintroduced by updates in the cataloguing code.

I have started these in tests/testsuite.py. Additional contributions are very welcome.
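As an illustration of the kind of check meant here (a minimal sketch, not the contents of tests/testsuite.py), a test could assert that no Source_Name occurs more than once. The function name and the sample names below are illustrative assumptions.

```python
from collections import Counter


def duplicated_values(values):
    """Return the sorted values that occur more than once.

    Surrounding whitespace is ignored so that padded names still compare equal.
    """
    counts = Counter(v.strip() for v in values)
    return sorted(v for v, n in counts.items() if n > 1)


# Illustrative input: the third entry repeats the first, with trailing padding.
names = [
    "ILTJ112101.32+490457.3",
    "ILTJ105143.62+513451.5",
    "ILTJ112101.32+490457.3 ",
]
dupes = duplicated_values(names)
```

A catalogue test would then simply require that the returned list is empty.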

ID flags in the current value added catalogue

The current ID_flag values aren't consistent with the documentation for the optical catalogue (e.g. what are 311, 312, etc.?). Please could the documentation be updated to explain what these are?

Whitespace in Source_Name v0.5 catalogue

Minor one, but the new v0.5 has unnecessary whitespace at the end of the Source_Name entries (don't think this was there in the previous version). Fixable with an rstrip() in processing code, so not urgent but would be useful to fix with the next release.
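A minimal sketch of the rstrip() fix, assuming the names arrive either as Python strings or as ASCII bytes (as can happen with fixed-width string columns). The helper name is hypothetical and is not taken from the processing code.

```python
def clean_source_name(name):
    """Return a Source_Name with trailing whitespace removed.

    Accepts str or ASCII-encoded bytes; bytes are decoded first.
    """
    if isinstance(name, bytes):
        name = name.decode("ascii")
    return name.rstrip()


padded = "ILTJ112101.32+490457.3   "
cleaned = clean_source_name(padded)
```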

Duplicate sources?

Hi,

I think there are some duplicate sources in the v1.2 optical catalogue.
These are just some examples I found, but I suspect there might be more in there:

ILTJ112101.32+490457.3
ILTJ112101.29+490456.5

ILTJ105143.62+513451.5
ILTJ105143.76+513454.5

ILTJ110953.46+482703.8
ILTJ110953.36+482708.0

ILTJ115041.76+473207.7
ILTJ115041.38+473211.1

I found these by checking the nearest neighbour distance in a small subset of the catalogue I am using. It might be PyBDSF splitting a source into multiple sources?
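For reference, a nearest-neighbour style check of this kind can be sketched with the standard library alone. This version assumes the positions are recovered from the Source_Name strings (so they are only as precise as the names) and uses a brute-force pair search, which is only sensible for a small subset. The 6 arcsecond threshold is an assumption chosen for illustration; for the full catalogue something like astropy's SkyCoord.match_to_catalog_sky with nthneighbor=2 would be the more practical route.

```python
import math


def name_to_deg(name):
    """Convert 'ILTJHHMMSS.ss+DDMMSS.s' to (RA, Dec) in degrees."""
    body = name.strip()[4:]  # drop the 'ILTJ' prefix
    sign_pos = max(body.find("+"), body.find("-"))
    ra_s, dec_s = body[:sign_pos], body[sign_pos + 1:]
    sign = -1.0 if body[sign_pos] == "-" else 1.0
    ra = 15.0 * (int(ra_s[0:2]) + int(ra_s[2:4]) / 60.0 + float(ra_s[4:]) / 3600.0)
    dec = sign * (int(dec_s[0:2]) + int(dec_s[2:4]) / 60.0 + float(dec_s[4:]) / 3600.0)
    return ra, dec


def separation_arcsec(a, b):
    """Great-circle separation (haversine formula) between two (RA, Dec) pairs."""
    ra1, dec1 = (math.radians(x) for x in a)
    ra2, dec2 = (math.radians(x) for x in b)
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0


def close_pairs(names, max_sep_arcsec):
    """Return all pairs of names closer than max_sep_arcsec (O(N^2) search)."""
    coords = [name_to_deg(n) for n in names]
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if separation_arcsec(coords[i], coords[j]) <= max_sep_arcsec:
                pairs.append((names[i], names[j]))
    return pairs


examples = [
    "ILTJ112101.32+490457.3", "ILTJ112101.29+490456.5",
    "ILTJ105143.62+513451.5", "ILTJ105143.76+513454.5",
    "ILTJ110953.46+482703.8", "ILTJ110953.36+482708.0",
    "ILTJ115041.76+473207.7", "ILTJ115041.38+473211.1",
]
pairs = close_pairs(examples, max_sep_arcsec=6.0)
```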

Isl_ID

Isl_ID was in the radio catalogue but is missing from the value added catalogue. I thought we were going to keep this one as perhaps useful for complex sources in the same island.

Missing artefacts?

We have found some PyBDSF sources (52; in LOFAR_HBA_T1_DR1_catalog_v0.9.srl.fixed.fits) that are in the PyBDSF catalogue but do not appear either in the artefact list (LOFAR_HBA_T1_DR1_merge_ID_v1.1.art.fits) or in the components table (they are neither components nor "deblended_from" sources in LOFAR_HBA_T1_DR1_merge_ID_v1.1b.comp.fits).

We think that these sources may be artefacts that are missing from the artefact table. The number is very low but, if they are true artefacts, they could be added to the artefact list for consistency.

The list of PyBDSF Source_Names is:
diff_sources.csv.txt

Sources with zero position error

There are a few sources with E_RA=0 and E_DEC=0 (and Min=0)

Fortunately, I think they're all artefacts so should just be flagged as such.

ML sources with no ID co-ords but valid LR

Some ML sources (ID_flag==1) don't have ID_ra and ID_dec values in the catalogue but do have an ML_LR value, so there must be a valid ID, but for some reason the coords are missing.
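A hedged sketch of how such rows could be picked out, assuming each catalogue row is available as a dict keyed by column name and that missing values are represented as None or NaN. The row contents and source names below are invented for illustration.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def ml_rows_missing_coords(rows):
    """Return Source_Names with ID_flag == 1, a valid ML_LR, but no ID position."""
    bad = []
    for row in rows:
        if row["ID_flag"] != 1:
            continue
        has_lr = not is_missing(row["ML_LR"])
        no_coords = is_missing(row["ID_ra"]) or is_missing(row["ID_dec"])
        if has_lr and no_coords:
            bad.append(row["Source_Name"])
    return bad


# Hypothetical rows, not real catalogue entries.
rows = [
    {"Source_Name": "SRC_A", "ID_flag": 1, "ML_LR": 12.3, "ID_ra": 170.1, "ID_dec": 49.0},
    {"Source_Name": "SRC_B", "ID_flag": 1, "ML_LR": 4.5, "ID_ra": math.nan, "ID_dec": math.nan},
    {"Source_Name": "SRC_C", "ID_flag": 2, "ML_LR": math.nan, "ID_ra": None, "ID_dec": None},
]
bad = ml_rows_missing_coords(rows)
```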

missing ID's for 3 2MASX sources

3 ID_flag==2 sources in v1.1 were missing the 2MASX name in their ID_name so ended up with no ID

ILTJ105900.19+500255.0 2MASXJ10590015+5002557
ILTJ123201.77+462057.2 2MASXJ12320182+4620528
ILTJ140921.74+490217.9 2MASXJ14092132+4902197

(fixed in 60ddc4a)

Deconvolved source sizes

The current catalogue has source sizes convolved with a 6 arcsecond beam. It would be great to get the deconvolved sizes to use for conversion to proper linear sizes.
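As a rough guide to what is being asked for: if both the source and the beam are treated as Gaussians and the beam is circular, the deconvolved FWHM along each axis follows from subtracting the beam in quadrature. The sketch below makes exactly those assumptions; it is not the PyBDSF deconvolution, and sources no larger than the beam are simply assigned zero size.

```python
import math

BEAM_FWHM_ARCSEC = 6.0  # circular beam size quoted above


def deconvolved_fwhm(fwhm_arcsec, beam_arcsec=BEAM_FWHM_ARCSEC):
    """Quadrature-subtract a circular Gaussian beam from a fitted Gaussian FWHM.

    Returns 0.0 when the fitted size does not exceed the beam.
    """
    return math.sqrt(max(fwhm_arcsec ** 2 - beam_arcsec ** 2, 0.0))


size = deconvolved_fwhm(10.0)  # sqrt(10^2 - 6^2)
```

For a non-circular beam the position angles matter and the simple per-axis subtraction no longer holds.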

unWISE w1/w2 mag errs

Hi @mhardcastle,
Currently working on the codes to transform the LoTSS DR2 combined catalogues into the required formats so I can generate our SV/trimester 1 catalogues. Are the uncertainties for the unWISE magnitudes present in the catalogue(s) pulled in from likelihood ratio analysis? If they are, can these be propagated into future versions of the merged catalog? - This would prevent the need for another expensive join back in to make those columns available in the WEAVE archive.

Broken sources

Here's a place to submit IDs of sources where the catalogued association seems wrong.

ILTJ131451.56+542820.5

host galaxy broken up

There are quite a few of these and we should decide what to do with them -- they are probably all identified with catalogued bright galaxies and so perhaps their route is similar to the 2MASS sources dealt with in Wendy's code.

Lack of w3Mag and w4Mag errors

Seems to be a lot of missing absolute w3Mag and w4Mag errors in LOFAR_HBA_T1_DR1_merge_ID_optical_v1.1b_restframe.fits.

This seems to occur only for sources with absolute mags > 17, and strangely errors are only missing where the corresponding w3Flux = 0, but the w3Flux errors do actually exist. This is the same case for w4 mags.

Truncated mosaic names

Same issue as #27 in the final (v1.2) catalogue.

In the final catalogue, the values of 'Mosaic_ID' are sometimes truncated to 8 characters. The width of the column is 11 but we can find the mosaics with long names truncated to 8 characters in some cases. For example, it is possible to find "P10Hetde" and "P10Hetdex" to refer to the same mosaic.

Fortunately, there is no degeneracy in the truncated values and the original mosaic name can always be recovered from the truncated version. For the public release I think we should at least add a warning about this problem.
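Since the truncated values are unambiguous, recovery can be done with a simple prefix lookup. This is a sketch under the assumption that a list of the full mosaic names is available; the two names used here are the ones quoted in the issues, and the function names are hypothetical.

```python
def build_recovery_map(full_names, width=8):
    """Map each name truncated to `width` characters back to its full name.

    Raises ValueError if two different full names share a truncated form.
    """
    recovery = {}
    for full in full_names:
        key = full[:width]
        if key in recovery and recovery[key] != full:
            raise ValueError(f"ambiguous truncated mosaic name: {key!r}")
        recovery[key] = full
    return recovery


def recover_mosaic_name(name, recovery, width=8):
    """Return the full mosaic name for a possibly truncated Mosaic_ID value."""
    name = name.strip()
    return recovery.get(name[:width], name)


recovery = build_recovery_map(["P10Hetdex", "P214+55"])
fixed = recover_mosaic_name("P10Hetde", recovery)
```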

Components with bad DEC values

There seem to be 93 entries in the v0.8 components catalogue with bad DEC values (DEC=-90.). Most of them seem to have been flagged as artefacts, but not all.

I have saved the relevant rows on a directory in the LOFAR server:
/data/lofar/bmingo/tests/v0.8_comp_bad_DEC.fits

EDIT: while some of the sources seem to have artefact_flag=FALSE, they all seem to belong to the Artefact_flag subset

Documentation

We should make sure we have complete documentation for all the columns in the final data products, including all the metadata that might be required for the VO.

sources with ID_names of 2MASS instead of 2MASX J

The referee noticed that there are a few sources with ID_names of 2MASS instead of 2MASX J

There are indeed 4 of these; all have ID_flag = [31,32]. I think they get the 2MASS name in processing the hostbrokenup sources, but my merging code is not dealing with them correctly; their components are not already associated with any/the same 2MASX source.

spectroscopic redshifts

When I download redshifts from SDSS DR14 I find a bunch of sources in the final catalogue (888 of them in total) that have a warning-free spectroscopic redshift in DR14 but no z_spec in our catalogue. These don't seem to be particularly different from the existing spec-zs (see image).

I understand that Joe may have some extra spec-zs too.

I guess what we want to do is insert these into Ken's catalogue -- there may be reasons to exclude these sources from the training set but we probably want them in the final output, and we want the rest-frame magnitudes to be correct for the spec-zs. Comments anyone?
redshifts

Bright star masking/flagging

In regions where there are very bright optical stars (G_gaia ~ 16), the statistics for LR matching will be potentially skewed and many of the optical sources will be spurious.
Ideally, we should mask the optical catalogs in advance of any x-ID efforts. But as a minimum it would be useful to have flagging of radio sources (matches) which are in regions around the very brightest stars.

HELP are developing scripts to do this automatically which may be applicable here, saving some efforts on our part.

Duplicate too zoomed in sources

Beatriz has spotted that there are some duplicate source names in the current main table. These all have ID_flag==312, i.e. too zoomed in sources from LGZv1. In some cases the duplicate entries have different coordinates and fluxes, etc. There is a table of these entries on lofar.herts here: /data/lofar/bmingo/morph/hetdex/Multi_rows.fits .

blends

We need visual inspection code to deal with the sources flagged as blends by LGZ (or otherwise).

Source_Name 's

The Source_Name has the form ILTJHHMMSS.ss+DDMMSS.s (except some earlier iteration of the catalogue that has ILTJHHMMSS.sss+DDMMSS.ss) but I think we may still have this a bit wrong.

Names are derived from the coordinates as follows:

ilt=[]
sc=SkyCoord(t['RA'],t['DEC'],frame='icrs')
strings=sc.to_string(style='hmsdms',sep='',precision=2)
for s in strings:
    ilt.append(str('ILTJ'+s).replace(' ','')[:-1])

(as in https://github.com/mhardcastle/lotss-catalogue/blob/master/utils/fix_sourceid.py)

So truncated to 1 d.p. in DEC (after rounding to 2 dp) and rounded to 2 d.p. in RA

The IAU convention to truncate is rather annoying: http://cdsweb.u-strasbg.fr/Dic/iau-spec.html#S3.2.1
but I think we shouldn't be truncating to N d.p.'s after rounding to N+1 d.p.'s.

Also we should have a space between ILT and J.
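One possible way to build names by truncation only, with no intermediate rounding, is sketched below. It works in integer units (hundredths of a second of time for RA, tenths of an arcsecond for DEC) using decimal arithmetic so that floating-point noise cannot push a value across a digit boundary. The function name and the default prefix with a space are assumptions for illustration; this is not the code in utils/fix_sourceid.py.

```python
from decimal import Decimal, ROUND_FLOOR


def iau_name(ra_deg, dec_deg, prefix="ILT J"):
    """Build prefix + HHMMSS.ss+DDMMSS.s from degrees, truncating (not rounding)."""
    # 1 degree of RA = 240 s of time = 24000 hundredths of a second.
    ra_cs = int((Decimal(str(ra_deg)) * 24000).to_integral_value(rounding=ROUND_FLOOR))
    dec = Decimal(str(dec_deg))
    sign = "-" if dec < 0 else "+"
    # 1 degree of Dec = 3600 arcsec = 36000 tenths of an arcsecond.
    dec_ds = int((abs(dec) * 36000).to_integral_value(rounding=ROUND_FLOOR))

    h, rem = divmod(ra_cs, 360000)
    m, cs = divmod(rem, 6000)
    s_int, s_frac = divmod(cs, 100)

    d, rem = divmod(dec_ds, 36000)
    am, ds = divmod(rem, 600)
    as_int, as_frac = divmod(ds, 10)

    return (f"{prefix}{h:02d}{m:02d}{s_int:02d}.{s_frac:02d}"
            f"{sign}{d:02d}{am:02d}{as_int:02d}.{as_frac:d}")


# DEC = 45d00m00.99612s: truncation keeps 00.9, whereas rounding to 2 d.p.
# first (01.00) and then dropping the last digit would give 01.0.
name = iau_name(180.0, 45.0002767)
```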

Linking PyBDSF sources to optical catalogue sources

As we understand, the "Source_Name" in the optical source catalogue (OSC) is a final source name that in some cases corresponds to the original PyBDSF "Source_Name". However, in some cases the "Source_Name" is new (one PyBDSF source may be split in several OSC sources or many PyBDSF sources may be combined in a single OSC source) and there is no way to match the sources between catalogues.
It would be nice to have a table with the correspondence between sources in the two catalogues (like a many to many relationship; https://en.wikipedia.org/wiki/Many-to-many_(data_model))
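To make the request concrete, such a correspondence table could be a flat list of (PyBDSF name, OSC name) pairs, from which lookups in either direction follow directly. The sketch below uses invented placeholder names; the real table would be produced by the cataloguing code.

```python
from collections import defaultdict


def build_lookups(links):
    """Build both directions of a many-to-many mapping from (pybdsf, osc) pairs."""
    by_pybdsf = defaultdict(set)
    by_osc = defaultdict(set)
    for pybdsf_name, osc_name in links:
        by_pybdsf[pybdsf_name].add(osc_name)
        by_osc[osc_name].add(pybdsf_name)
    return by_pybdsf, by_osc


# Placeholder names: A and B are merged into OSC_1, C is split into OSC_2 and OSC_3.
links = [
    ("PYBDSF_A", "OSC_1"),
    ("PYBDSF_B", "OSC_1"),
    ("PYBDSF_C", "OSC_2"),
    ("PYBDSF_C", "OSC_3"),
]
by_pybdsf, by_osc = build_lookups(links)
```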

documentation

All sub-directories of the code should be documented at least at the level of what the scripts are.

-99s in phot_z catalogue

@dunkenj is it OK to replace the -99s in the phot_z catalogue with mask values? It drives me nuts when I try to do scatter plots in topcat, which will handle masked data just fine but thinks -99 needs to go on a plot.
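A minimal sketch of the replacement, assuming the sentinel is stored as exactly -99 and that NaN is an acceptable blank value for the floating-point columns. With numpy arrays, numpy.ma.masked_equal would do a similar job; the version below uses plain Python lists to stay self-contained.

```python
import math


def mask_sentinel(values, sentinel=-99.0):
    """Replace sentinel entries with NaN, leaving all other values untouched."""
    return [math.nan if v == sentinel else v for v in values]


# Illustrative values only.
z_phot = [0.52, -99.0, 1.17, -99.0]
masked = mask_sentinel(z_phot)
```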

Duplicated source in optical and component catalogues

I found one duplicated source in the optical catalogue (LOFAR_HBA_T1_DR1_merge_ID_optical_v1.0.fits):
Source Name ILTJ132633.10+484745.7. The match is the same (same AllWISE and objID) but the RA, DEC, major axis, etc, are slightly different. It is the same source selected from different mosaics.
This source also appears duplicated in the component catalogue (LOFAR_HBA_T1_DR1_merge_ID_v1.0.comp.fits).

Sizes are inconsistent

Size estimates for PyBDSF and LGZ sources are inconsistent. We should derive better sizes using the properties of the Gaussians in the LGZ sources as well as their separation.

Strange bad pixels in P196+55

There are odd individual pixels or collections of 2-3 pixels which have odd values in the P196+55 mosaic (most obvious towards the NW). I have been aware of this for a while but haven't got around to looking at FITS images and tracking down an actual location. This noise can't (I think) be a result of deconvolution -- it must appear in mosaicing. We ought to track it down. Fortunately I don't think it is likely to affect the cataloguing much.

artefacts in associations

Artefacts flagged by other means (in various visual inspections in the flowchart, and edge sources in particular) can be associated as part of a new LGZ source or can be resurrected if they were in the LGZ sample but not classified as an artefact. Probably best to filter them out as well.

Incorrect RA,Dec reported for some sources in value-added catalogue

I think there is an issue with some sources having incorrect RA, Dec reported in the value added catalogue. I am assuming the RA and DEC columns reported in the value-added catalogue should be identical to the radio only catalogue (i.e. it is not updated).

For example, I am comparing the position reported for ILTJ112416.41+513338.5 in LOFAR_HBA_T1_DR1_merge_ID_optical_v0.6.fits and LOFAR_HBA_T1_DR1_catalog_v0.9.srl.fixed.fits. Below is the snapshot where the RA and DEC reported in the value-added catalogue are indicated by the cross-hair. This is approximately 30" from the radio source (and the reported position in the radio-only catalogue is right on top of the source).

image

I didn't think the RA/DEC columns should be changed at all between the catalogues but you might be trying to report the centre of some FRIIs? Clearly something is amiss with this source. There are several sources like this but not as extreme (e.g. ILTJ150049.52+475105.3 and ILTJ124833.80+512800.1).

PS Just talked to Ken Duncan and he thinks Leah Morabito is also seeing something similar?

Naming of optical IDs

The ID_name column in the final catalogue is useful but inconsistent about its naming of sources depending on where they came from. Should probably be rewritten in a standard form after merging.

New masking columns

I've nearly finished making two new columns:

Number_Pointings -- Number of pointings that are mosaiced together that contain the source (takes account of the astronomic blanking too)
Number_Masked -- Number of the above pointings in which the source is masked.

Do we want these in the final catalogue, do you think? It does give an insight into the deconvolution but it's a bit tricky as e.g. I just use the catalogued RA and DEC, and it also doesn't reflect in any way the distances from the various pointing centres.

I don't think we need to try to expand it to take into account the full sizes of sources (users can look at the residual mosaics for these). But I suppose I could change it to have e.g. the fraction of source weight deconvolved that is masked (so summing up all the weight images with and without masks). Then we would probably only need that column rather than two new columns.

Point Sources Missing Flux (cutoff) Near Edge of Survey Region

There are a number of point sources that are cut off near the edge of the survey region. This means that their flux densities are underestimated and they look like peaked-spectrum sources. Need to think of a clever way of adding a flag or excising these sources from the catalogue.

Below are the names of a few of the sources that are cutoff at the edge of the survey, causing them to be missing flux:

ILTJ151524.99+543046.0
ILTJ150319.52+454944.5
ILTJ142450.82+502655.3
ILTJ142342.77+524831.0
ILTJ141236.02+512257.8
ILTJ124236.50+562844.6
ILTJ114646.72+562353.2

LGZ sizes too large

@mhardcastle I'm checking my 2MASX merging of components (uses Make_Shape) and the sizes seem too large by a factor 2. Shouldn't merging a single component give LGZ_Size, LGZ_Width \approx DC_Maj, DC_Min ? [should be an easy fix if you agree this is a problem...]

Plotting source LGZ_Size against component DC_Maj for all LGZ/2MASX sources (obviously quite different for many component sources), but should be the same for single component, or all components contained within another (for 2MASX sources with only one component I'm setting LGZ_Size=DC_Maj):
image

column 'Mosaic_ID' correct information?

I'm looking at LOFAR_HBA_T1_DR1_merge_ID_optical_v0.4.fits and it seems that somehow the column 'Mosaic_ID' is not properly catching the Mosaic_ID -- some are fine, with P214+55 but then there are columns where it's just 'P10Hetde' or 'P1Hetdex1' ... what's going on here? Is there a standard format that we should be following for the mosaic IDs or are they listed somewhere that's accessible?

catalogued 14 degree source...

@bmingo has flagged up a strange catalogue entry. Source ILTJ114824.92+551438.2 has an LGZ_Size of around 13.6 degrees. Its central coordinates aren't within the mosaic that is listed for it in the catalogue. Its optical ID is a long way away from the central coordinates.

It has an ID_Flag of 31 suggesting LGZv1, but it doesn't exist in earlier versions of the catalogue (I looked at version 0.6 as that's one we've processed previously - there is neither a source of that name, nor anything with such a large size). @mhardcastle or @wllwen007 do you have any idea what can have happened here? Obviously it's only one source, but would be good to know what's happened in case it's flagging up a wider bug. (I have checked and there aren't any other ridiculously large sources).

Source mismatch between components and optical catalogue

There are 22 sources that are in the optical catalogue but not in the components one and 82 sources that are in the component catalogue but not in the optical one.

These sources can be found by doing an exact value match in Topcat between the "Component_Name" column in the components table and the "Source_Name" column in the optical table with the settings "All matches" and "1 xor 2".
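The same "1 xor 2" comparison can be reproduced outside Topcat with set differences. This is a sketch assuming the two name columns have already been read into Python lists; the names used are placeholders.

```python
def unmatched_names(component_names, source_names):
    """Return (only_in_components, only_in_optical) as sorted lists.

    Names are stripped of surrounding whitespace before comparison.
    """
    comp = {n.strip() for n in component_names}
    src = {n.strip() for n in source_names}
    return sorted(comp - src), sorted(src - comp)


# Placeholder names for illustration.
components = ["SRC_A", "SRC_B", "SRC_C "]
optical = ["SRC_B", "SRC_C", "SRC_D"]
only_comp, only_opt = unmatched_names(components, optical)
```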

Resolved column?

The formula 1.25 + 3.01*(peak_flux/isl_rms)**(-0.53) seems reasonable at separating resolved and unresolved sources. Do we want a column in the catalogue indicating whether a source is resolved (according to this formula)? I don't mind either way really; the formula isn't perfect but perhaps it's useful to give an indication.
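A sketch of how such a column could be computed. The issue does not state which quantity is compared against the formula; the version below assumes it is the ratio of total to peak flux density, with a source called resolved when that ratio lies above the curve. The function names are hypothetical.

```python
def resolved_threshold(peak_flux, isl_rms):
    """Evaluate 1.25 + 3.01*(peak_flux/isl_rms)**(-0.53)."""
    return 1.25 + 3.01 * (peak_flux / isl_rms) ** (-0.53)


def is_resolved(total_flux, peak_flux, isl_rms):
    """Assumed criterion: total/peak flux ratio above the threshold curve."""
    return total_flux / peak_flux > resolved_threshold(peak_flux, isl_rms)


# Illustrative values at a signal-to-noise ratio of 100.
extended = is_resolved(total_flux=2.0, peak_flux=1.0, isl_rms=0.01)
compact = is_resolved(total_flux=1.1, peak_flux=1.0, isl_rms=0.01)
```

The threshold falls towards 1.25 at high signal-to-noise, so fainter sources need a larger flux ratio before being flagged.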

Repeated ID_name/AllWISE/ObjID values

There seem to be some catalogue entries in version 0.6 (also present in version 0.5) in which the same ID_name, AllWISE, and/or ObjID value is assigned to different Source_Name values (with different fluxes and slightly different coordinates).

In the case of the repeated ID_name, there is at least one known issue covered (entries with ID_name="mult" on the catalogue), but I seem to have picked up a further 49 pairs of sources with identical ID_names, all of which also have identical AllWISE and ObjID values (when those are present). All the pairs seem to also have had the IDs assigned via the same method (same ID_flag).

There are 97 pairs of sources with the same AllWISE value (partially overlapping the previous group). Some of these have a different ID_name for each source, and different values of ID_flag. Where available, all pairs also have the same ObjID value.

There are 82 pairs of sources with the same ObjID value. This group seems to be a subset of the previous one, as in all cases the AllWISE values are identical for both sources in each pair (there are two pairs where allWISE=N/A, and ID_name and ID_flag have different values, so this selection by ObjID is still necessary to catch those cases).
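Groupings like these can be reproduced by collecting Source_Names per identifier value. The sketch below assumes rows are dicts keyed by column name and that absent identifiers appear as None, an empty string, or "N/A"; all row contents are invented for illustration.

```python
from collections import defaultdict


def shared_ids(rows, key):
    """Return {identifier: [Source_Name, ...]} for identifiers used more than once."""
    groups = defaultdict(list)
    for row in rows:
        value = row.get(key)
        if value in (None, "", "N/A"):
            continue  # skip rows with no identifier in this column
        groups[value].append(row["Source_Name"])
    return {value: names for value, names in groups.items() if len(names) > 1}


# Hypothetical rows: SRC_A and SRC_B share an AllWISE value.
rows = [
    {"Source_Name": "SRC_A", "AllWISE": "W1", "ObjID": "N/A"},
    {"Source_Name": "SRC_B", "AllWISE": "W1", "ObjID": "N/A"},
    {"Source_Name": "SRC_C", "AllWISE": "W2", "ObjID": "O9"},
]
repeats = shared_ids(rows, "AllWISE")
```

Running it once per column (ID_name, AllWISE, ObjID) gives the three sets of groups, which can then be compared for overlap.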

I investigated this issue after finding that this galaxy zoo source had two separate entries in the catalogue (one matched through galaxy zoo, the other matched to a bright galaxy), which used different combinations of components:
https://www.zooniverse.org/projects/mjh22/lofar-galaxy-zoo/talk/subjects/10509209

I believe there may be two possible issues here:
The first is where one of the matches to the same optical source has been achieved through galaxy zoo, and the second through maximum likelihood or match to a bright galaxy.
The second case, more prevalent for repeated values of ID_name, has the same ID_flag for both entries in the pair with the same optical ID, likely meaning that different combinations of components are used for the same source through the same method (galaxy zoo or maximum likelihood).

I have saved the tables for the three sets of duplicate entries on the Herts LOFAR server, on
/data/lofar/bmingo/morph/hetdex/tests

Document ID_flag

flowchart/README.md should be updated with the latest meanings of ID_flag.
