mhardcastle / lotss-catalogue
Code for LOFAR catalogue creation
License: GNU General Public License v3.0
LGZ_Assoc (or Assoc in process_lgz) is set to 0 for LGZ sources with only one component. Should be 1 for consistency.
We need a set of automated tests to check for catalogue inconsistencies so that we catch problems that are introduced or reintroduced by updates in the cataloguing code.
I have started these in tests/testsuite.py. Additional contributions are very welcome.
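As an illustration of the kind of check such a suite could contain, here is a minimal sketch. It is not the content of tests/testsuite.py: the function name `find_problems` and the example rows are made up, and the three checks (stray whitespace in Source_Name, duplicate Source_Name, DEC outside a sensible range) are simply taken from problems reported elsewhere in this tracker.

```python
def find_problems(rows):
    """Return a list of human-readable problems found in catalogue rows.

    `rows` is assumed to be an iterable of dicts with 'Source_Name' and
    'DEC' keys (hypothetical layout, for illustration only).
    """
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        name = row["Source_Name"]
        # Leading or trailing whitespace in the name
        if name != name.strip():
            problems.append(f"row {i}: stray whitespace in Source_Name {name!r}")
        name = name.strip()
        # The same name appearing more than once
        if name in seen:
            problems.append(f"row {i}: duplicate Source_Name {name}")
        seen.add(name)
        # DEC = -90 has shown up as a bad value, so treat it as an error
        if not -90.0 < row["DEC"] <= 90.0:
            problems.append(f"row {i}: bad DEC {row['DEC']}")
    return problems


# Made-up rows that trigger each check once
rows = [
    {"Source_Name": "ILTJ112101.32+490457.3", "DEC": 49.08},
    {"Source_Name": "ILTJ112101.32+490457.3 ", "DEC": 49.08},
    {"Source_Name": "ILTJ105143.62+513451.5", "DEC": -90.0},
]
for problem in find_problems(rows):
    print(problem)
```

Returning a list of messages rather than raising on the first failure makes it easier to see every inconsistency in one run.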
As these ID_flag==2 sources appear like LGZ sources in other respects they should probably have quality flags -- otherwise we run the risk of them being filtered by accident.
The current ID_flag values aren't consistent with the documentation for the optical catalogue (e.g. what are 311, 312, etc.?). Please could the documentation be updated to explain what these are?
Minor one, but the new v0.5 has unnecessary whitespace at the end of the Source_Name entries (don't think this was there in the previous version). Fixable with an rstrip() in processing code, so not urgent but would be useful to fix with the next release.
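A minimal sketch of the rstrip() fix, assuming the names arrive as a plain sequence of str or bytes values (FITS string columns can come back as either). The helper name `strip_names` is hypothetical and not part of the processing code.

```python
def strip_names(names):
    """Return the names with trailing whitespace removed.

    Bytes values are decoded as ASCII first so the result is always str.
    """
    cleaned = []
    for name in names:
        if isinstance(name, bytes):
            name = name.decode("ascii")
        cleaned.append(name.rstrip())
    return cleaned


# Padded examples built from names quoted in this tracker
padded = ["ILTJ112101.32+490457.3   ", b"ILTJ105143.62+513451.5  "]
print(strip_names(padded))
```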
We need to decide what columns to keep in the component catalogue and document them.
Hi,
I think there are some duplicate sources in the v1.2 optical catalogue.
These are just some examples I found, but I suspect there might be more in there:
ILTJ112101.32+490457.3
ILTJ112101.29+490456.5
ILTJ105143.62+513451.5
ILTJ105143.76+513454.5
ILTJ110953.46+482703.8
ILTJ110953.36+482708.0
ILTJ115041.76+473207.7
ILTJ115041.38+473211.1
I found these by checking the nearest neighbour distance in a small subset of the catalogue I am using. It might be PyBDSF splitting a source into multiple sources?
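The nearest-neighbour check can be sketched roughly as below. This is not the code used to find the examples: it recovers approximate positions from the source names themselves (so they are only as precise as the names), uses a brute-force pair loop that would be too slow for the full catalogue, and the 10 arcsec threshold is an arbitrary assumption.

```python
import math
import re

# ILTJHHMMSS.ss+DDMMSS.s
NAME_RE = re.compile(
    r"ILTJ(\d{2})(\d{2})(\d{2}\.\d+)([+-])(\d{2})(\d{2})(\d{2}\.\d+)"
)


def name_to_deg(name):
    """Approximate (RA, Dec) in degrees decoded from an ILTJ source name."""
    m = NAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognised source name: {name!r}")
    h, mi, s, sign, d, dm, ds = m.groups()
    ra = 15.0 * (int(h) + int(mi) / 60 + float(s) / 3600)
    dec = int(d) + int(dm) / 60 + float(ds) / 3600
    return ra, (-dec if sign == "-" else dec)


def separation_arcsec(p, q):
    """Great-circle separation (haversine formula) in arcseconds."""
    ra1, dec1 = map(math.radians, p)
    ra2, dec2 = map(math.radians, q)
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a))) * 3600


def close_pairs(names, max_sep_arcsec=10.0):
    """All pairs of names closer than max_sep_arcsec, with their separation."""
    coords = [name_to_deg(n) for n in names]
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            sep = separation_arcsec(coords[i], coords[j])
            if sep <= max_sep_arcsec:
                pairs.append((names[i], names[j], sep))
    return pairs


names = [
    "ILTJ112101.32+490457.3", "ILTJ112101.29+490456.5",
    "ILTJ105143.62+513451.5", "ILTJ105143.76+513454.5",
    "ILTJ110953.46+482703.8", "ILTJ110953.36+482708.0",
    "ILTJ115041.76+473207.7", "ILTJ115041.38+473211.1",
]
for a, b, sep in close_pairs(names):
    print(f"{a}  {b}  {sep:.2f} arcsec")
```

For the real catalogue a spatial index (for example a k-d tree on the catalogued RA and DEC) would replace the double loop.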
Isl_ID was in the radio catalogue but is missing from the value-added catalogue. I thought we were going to keep this one, as it is perhaps useful for complex sources in the same island.
We have found some PyBDSF sources (52; in LOFAR_HBA_T1_DR1_catalog_v0.9.srl.fixed.fits) that are in the PyBDSF catalogue but do not appear either in the artefact list (LOFAR_HBA_T1_DR1_merge_ID_v1.1.art.fits) or in the components table (they are neither components nor "deblended_from" sources in LOFAR_HBA_T1_DR1_merge_ID_v1.1b.comp.fits).
We think that these sources may be artefacts that are missing from the artefact table. The number is very low but, if they are true artefacts, they could be added to the artefact list for consistency.
The list of PyBDSF Source_Names is:
diff_sources.csv.txt
There are a few sources with E_RA=0 and E_DEC=0 (and Min=0)
Fortunately, I think they're all artefacts so should just be flagged as such.
Some ML sources (ID_flag==1) don't have ID_ra and ID_dec values in the catalogue but do have a ML_LR value, so there must be a valid ID, but for some reason the coords are missing.
3 ID_flag==2 sources in v1.1 were missing the 2MASX name in their ID_name so ended up with no ID
ILTJ105900.19+500255.0 2MASXJ10590015+5002557
ILTJ123201.77+462057.2 2MASXJ12320182+4620528
ILTJ140921.74+490217.9 2MASXJ14092132+4902197
(fixed in 60ddc4a)
The current catalogue has source sizes convolved with a 6 arcsecond beam. It would be great to get the deconvolved sizes to use for conversion to proper linear sizes.
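For a Gaussian source convolved with a circular Gaussian beam, the FWHMs add in quadrature, so a rough deconvolved size follows from subtracting the beam in quadrature. The sketch below assumes exactly that (Gaussian source, circular 6 arcsec beam) and clips sources that come out no larger than the beam to zero; the function name is hypothetical, and this is not necessarily how deconvolved sizes would be derived for the catalogue.

```python
import math

BEAM_FWHM_ARCSEC = 6.0  # restoring beam quoted above


def deconvolve_fwhm(observed_arcsec, beam_arcsec=BEAM_FWHM_ARCSEC):
    """Deconvolved FWHM in arcsec, or 0.0 if the source is not larger than the beam."""
    diff = observed_arcsec ** 2 - beam_arcsec ** 2
    return math.sqrt(diff) if diff > 0 else 0.0


print(deconvolve_fwhm(10.0))  # sqrt(100 - 36)
print(deconvolve_fwhm(6.0))   # unresolved: clipped to 0.0
```

The same function can be applied separately to the major and minor axes, since a circular beam does not change the position angle.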
Hi @mhardcastle,
Currently working on the codes to transform the LoTSS DR2 combined catalogues into the required formats so I can generate our SV/trimester 1 catalogues. Are the uncertainties for the unWISE magnitudes present in the catalogue(s) pulled in from likelihood ratio analysis? If they are, can these be propagated into future versions of the merged catalog? - This would prevent the need for another expensive join back in to make those columns available in the WEAVE archive.
Here's a place to submit IDs of sources where the catalogued association seems wrong.
ILTJ131451.56+542820.5
There are quite a few of these and we should decide what to do with them -- they are probably all identified with catalogued bright galaxies and so perhaps their route is similar to the 2MASS sources dealt with in Wendy's code.
We should have a flag for the presence of intelligent life
Seems to be a lot of missing absolute w3Mag and w4Mag errors in LOFAR_HBA_T1_DR1_merge_ID_optical_v1.1b_restframe.fits.
This seems to occur only for sources with absolute mags > 17, and strangely errors are only missing where the corresponding w3Flux = 0, but the w3Flux errors do actually exist. This is the same case for w4 mags.
Same issue as #27 in the final (v1.2) catalogue.
In the final catalogue, the values of 'Mosaic_ID' are sometimes truncated to 8 characters. The width of the column is 11 but we can find the mosaics with long names truncated to 8 characters in some cases. For example, it is possible to find "P10Hetde" and "P10Hetdex" to refer to the same mosaic.
Fortunately, there is no degeneracy in the truncated values, and the original mosaic name can always be recovered from the truncated version. For the public release I think we should at least add a warning about this problem.
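Since each truncated value is a prefix of exactly one mosaic name, the full name can be restored by prefix matching. A minimal sketch, assuming a list of the full mosaic names is available; the short list used here only contains names quoted in this tracker and is not the real mosaic list, and `restore_mosaic_ids` is a hypothetical helper.

```python
def restore_mosaic_ids(values, full_names):
    """Map each (possibly truncated) Mosaic_ID to its full mosaic name.

    Raises ValueError if a value matches no name or more than one name.
    """
    restored = []
    for value in values:
        value = value.strip()
        if value in full_names:
            # Already a complete name
            restored.append(value)
            continue
        matches = [name for name in full_names if name.startswith(value)]
        if len(matches) != 1:
            raise ValueError(f"cannot restore {value!r}: candidates {matches}")
        restored.append(matches[0])
    return restored


# Assumed (incomplete) list of full mosaic names
full_names = ["P10Hetdex", "P214+55", "P196+55"]
print(restore_mosaic_ids(["P10Hetde", "P10Hetdex", "P214+55"], full_names))
```

Raising on an ambiguous or unknown prefix means any future degeneracy would be caught instead of silently mis-assigned.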
There seem to be 93 entries in the v0.8 components catalogue with bad DEC values (DEC=-90.). Most of them seem to have been flagged as artefacts, but not all.
I have saved the relevant rows on a directory in the LOFAR server:
/data/lofar/bmingo/tests/v0.8_comp_bad_DEC.fits
EDIT: while some of the sources seem to have artefact_flag=FALSE, they all seem to belong to the Artefact_flag subset
We should make sure we have complete documentation for all the columns in the final data products, including all the metadata that might be required for the VO.
The referee noticed that there are a few sources with ID_names of 2MASS instead of 2MASX J
There are indeed 4 of these, all with ID_flag = [31,32]. I think they get the 2MASS name in processing the hostbrokenup sources, but my merging code is not dealing with them correctly; their components are not already associated with any/the same 2MASX source.
When I download redshifts from SDSS DR14 I find a bunch of sources in the final catalogue (888 of them in total) that have a warning-free spectroscopic redshift in DR14 but no z_spec in our catalogue. These don't seem to be particularly different from the existing spec-zs (see image).
I understand that Joe may have some extra spec-zs too.
I guess what we want to do is insert these into Ken's catalogue -- there may be reasons to exclude these sources from the training set but we probably want them in the final output, and we want the rest-frame magnitudes to be correct for the spec-zs. Comments anyone?
In regions where there are very bright optical stars (G_gaia ~ 16), the statistics for LR matching will potentially be skewed and many of the optical sources will be spurious.
Ideally, we should mask the optical catalogues in advance of any x-ID efforts. But as a minimum it would be useful to flag radio sources (matches) that are in regions around the very brightest stars.
HELP are developing scripts to do this automatically, which may be applicable here and would save some effort on our part.
On the component catalogue there is a source with 9 duplicated rows:
Component_Name = ILTJ115037.81+465929.4
Beatriz has spotted that there are some duplicate source names in the current main table. These all have ID_flag==312, i.e. too zoomed in sources from LGZv1. In some cases the duplicate entries have different coordinates and fluxes, etc. There is a table of these entries on lofar.herts here: /data/lofar/bmingo/morph/hetdex/Multi_rows.fits .
We need visual inspection code to deal with the sources flagged as blends by LGZ (or otherwise).
The Source_Name has the form ILTJHHMMSS.ss+DDMMSS.s (except in some earlier iteration of the catalogue, which has ILTJHHMMSS.sss+DDMMSS.ss), but I think we may still have this a bit wrong.
Names are derived from the coordinates as follows:
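from astropy.coordinates import SkyCoord

# t is the catalogue table; its RA and DEC columns are assumed to carry units of degrees
ilt=[]
sc=SkyCoord(t['RA'],t['DEC'],frame='icrs')
# sexagesimal strings, rounded to 2 d.p. in both RA seconds and DEC arcseconds
strings=sc.to_string(style='hmsdms',sep='',precision=2)
for s in strings:
    # remove the space between RA and DEC, then drop the last DEC digit
    ilt.append(str('ILTJ'+s).replace(' ','')[:-1])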
(as in https://github.com/mhardcastle/lotss-catalogue/blob/master/utils/fix_sourceid.py)
So DEC is truncated to 1 d.p. (after rounding to 2 d.p.) and RA is rounded to 2 d.p.
The IAU convention to truncate is rather annoying: http://cdsweb.u-strasbg.fr/Dic/iau-spec.html#S3.2.1
but I think we shouldn't be truncating to N d.p.'s after rounding to N+1 d.p.'s.
Also we should have a space between ILT and J.
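For comparison, here is a sketch of what pure truncation (no intermediate rounding) with a space between ILT and J could look like. It is only an illustration of the convention discussed above, not a proposed replacement for fix_sourceid.py: `iau_name` is a hypothetical function, it takes plain degrees rather than a SkyCoord, and it uses Decimal and integer arithmetic so that truncation is not affected by floating-point rounding.

```python
from decimal import Decimal


def iau_name(ra_deg, dec_deg, prefix="ILT J"):
    """Build a JHHMMSS.ss+DDMMSS.s name by truncating, never rounding."""
    ra = Decimal(str(ra_deg)) % 360
    dec = Decimal(str(dec_deg))
    sign = "-" if dec < 0 else "+"
    dec = abs(dec)
    # RA in hundredths of a second of time: 1 deg = 240 s = 24000 units
    ra_cs = int(ra * 24000)    # int() truncates towards zero
    # Dec in tenths of an arcsecond: 1 deg = 3600 arcsec = 36000 units
    dec_ds = int(dec * 36000)
    h, rem = divmod(ra_cs, 360000)
    m, cs = divmod(rem, 6000)
    d, rem = divmod(dec_ds, 36000)
    dm, ds = divmod(rem, 600)
    return (f"{prefix}{h:02d}{m:02d}{cs // 100:02d}.{cs % 100:02d}"
            f"{sign}{d:02d}{dm:02d}{ds // 10:02d}.{ds % 10:d}")


print(iau_name(170.2555, 49.08259))
# Dec of 59.958 arcsec: truncation keeps 59.9, where rounding would carry to 1 arcmin
print(iau_name(15.0, 0.0166655))
```

The second call shows the case that matters: rounding to 2 d.p. first gives 59.96 and then 59.9 here, but a value such as 59.998 would round up to 60.00 and change the name, whereas truncation never carries.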
As we understand, the "Source_Name" in the optical source catalogue (OSC) is a final source name that in some cases corresponds to the original PyBDSF "Source_Name". However, in some cases the "Source_Name" is new (one PyBDSF source may be split in several OSC sources or many PyBDSF sources may be combined in a single OSC source) and there is no way to match the sources between catalogues.
It would be nice to have a table with the correspondence between sources in the two catalogues (like a many to many relationship; https://en.wikipedia.org/wiki/Many-to-many_(data_model))
All sub-directories of the code should be documented at least at the level of what the scripts are.
@dunkenj is it OK to replace the -99s in the phot_z catalogue with mask values? It drives me nuts when I try to do scatter plots in topcat, which will handle masked data just fine but thinks -99 needs to go on a plot.
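One simple way to do the replacement, sketched here with plain Python lists and NaN as the blank value (which, as far as I know, topcat treats as missing for floating-point columns). The function name and the example values are made up; for the real table a masked array or masked column would be the more natural choice.

```python
import math


def sentinel_to_nan(values, sentinel=-99.0):
    """Replace every occurrence of the sentinel value with NaN."""
    return [math.nan if v == sentinel else v for v in values]


# Made-up photo-z values with one missing entry
print(sentinel_to_nan([0.35, -99.0, 1.2]))
```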
I found one duplicated source in the optical catalogue (LOFAR_HBA_T1_DR1_merge_ID_optical_v1.0.fits):
Source Name ILTJ132633.10+484745.7. The match is the same (same AllWISE and objID) but the RA, DEC, major axis, etc, are slightly different. It is the same source selected from different mosaics.
This source also appears duplicated in the component catalogue (LOFAR_HBA_T1_DR1_merge_ID_v1.0.comp.fits).
Size estimates for PyBDSF and LGZ sources are inconsistent. We should derive better sizes using the properties of the Gaussians in the LGZ sources as well as their separation.
There are odd individual pixels or collections of 2-3 pixels which have odd values in the P196+55 mosaic (most obvious towards the NW). I have been aware of this for a while but haven't got around to looking at FITS images and tracking down an actual location. This noise can't (I think) be a result of deconvolution -- it must appear in mosaicing. We ought to track it down. Fortunately I don't think it is likely to affect the cataloguing much.
Artefacts flagged by other means (in various visual inspections in the flowchart, and edge sources in particular) can be associated as part of a new LGZ source or can be resurrected if they were in the LGZ sample but not classified as an artefact. Probably best to filter them out as well.
I think there is an issue with some sources having incorrect RA, Dec reported in the value added catalogue. I am assuming the RA and DEC columns reported in the value-added catalogue should be identical to the radio only catalogue (i.e. it is not updated).
For example, I am comparing the position reported for ILTJ112416.41+513338.5 in LOFAR_HBA_T1_DR1_merge_ID_optical_v0.6.fits and LOFAR_HBA_T1_DR1_catalog_v0.9.srl.fixed.fits. In the snapshot below, the cross-hair indicates the RA and DEC reported in the value-added catalogue. This is ~30" from the radio source (and the reported position in the radio-only catalogue is right on top of the source).
I didn't think the RA/DEC columns should be changed at all between the catalogues but you might be trying to report the centre of some FRIIs? Clearly something is amiss with this source. There are several sources like this but not as extreme (e.g. ILTJ150049.52+475105.3 and ILTJ124833.80+512800.1).
PS Just talked to Ken Duncan and he thinks Leah Morabito is also seeing something similar?
The ID_name column in the final catalogue is useful but inconsistent about its naming of sources depending on where they came from. Should probably be rewritten in a standard form after merging.
I've nearly finished making two new columns:
Number_Pointings -- Number of pointings that are mosaiced together that contain the source (takes account of the astronomic blanking too)
Number_Masked -- Number of the above pointings in which the source is masked.
Do we want these in the final catalogue, do you think? It does give an insight into the deconvolution, but it's a bit tricky as e.g. I just use the catalogued RA and DEC, and it also doesn't reflect in any way the distances from the various pointing centres.
I don't think we need to try to expand it to take into account the full sizes of sources (users can look at the residual mosaics for these). But I suppose I could change it to give e.g. the fraction of source weight deconvolved that is masked (so summing up all the weight images with and without masks). Then we would probably only need that one column rather than two new columns.
There are a number of point sources that are cut off near the edge of the survey region. This means that their flux densities are underestimated and they look like peaked-spectrum sources. We need to think of a clever way of adding a flag or excising these sources from the catalogue.
Below are the names of a few of the sources that are cutoff at the edge of the survey, causing them to be missing flux:
ILTJ151524.99+543046.0
ILTJ150319.52+454944.5
ILTJ142450.82+502655.3
ILTJ142342.77+524831.0
ILTJ141236.02+512257.8
ILTJ124236.50+562844.6
ILTJ114646.72+562353.2
@mhardcastle I'm checking my 2MASX merging of components (uses Make_Shape) and the sizes seem too large by a factor 2. Shouldn't merging a single component give LGZ_Size, LGZ_Width \approx DC_Maj, DC_Min ? [should be an easy fix if you agree this is a problem...]
Plotting source LGZ_Size against component DC_Maj for all LGZ/2MASX sources (obviously quite different for many component sources), but should be the same for single component, or all components contained within another (for 2MASX sources with only one component I'm setting LGZ_Size=DC_Maj):
Strip of data without PanSTARRS photometry needs a flag in the optical-based catalogues.
I'm looking at LOFAR_HBA_T1_DR1_merge_ID_optical_v0.4.fits and it seems that somehow the column 'Mosaic_ID' is not properly catching the Mosaic_ID -- some are fine, with P214+55 but then there are columns where it's just 'P10Hetde' or 'P1Hetdex1' ... what's going on here? Is there a standard format that we should be following for the mosaic IDs or are they listed somewhere that's accessible?
@bmingo has flagged up a strange catalogue entry. Source ILTJ114824.92+551438.2 has an LGZ_Size of around 13.6 degrees. Its central coordinates aren't within the mosaic that is listed for it in the catalogue. Its optical ID is a long way away from the central coordinates.
It has an ID_Flag of 31 suggesting LGZv1, but it doesn't exist in earlier versions of the catalogue (I looked at version 0.6 as that's one we've processed previously - there is neither a source of that name, or anything with such a large size). @mhardcastle or @wllwen007 do you have any idea what can have happened here? Obviously it's only one source, but would be good to know what's happened in case it's flagging up a wider bug. (I have checked and there aren't any other ridiculously large sources).
There are 22 sources that are in the optical catalogue but not in the components one and 82 sources that are in the component catalogue but not in the optical one.
These sources can be found by doing an exact value match in Topcat between the "Component_Name" column in the components table and the "Source_Name" column in the optical table with the settings "All matches" and "1 xor 2".
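The same "1 xor 2" comparison can be sketched outside Topcat with Python sets. This assumes the two name columns have been read into plain lists of strings; `unmatched_names` is a hypothetical helper and the lists below are toy examples made from names quoted in this tracker, not the real tables.

```python
def unmatched_names(component_names, source_names):
    """Names present in only one of the two columns.

    Returns (only_in_components, only_in_optical), both sorted.
    Whitespace padding is stripped before comparing.
    """
    comp = {name.strip() for name in component_names}
    src = {name.strip() for name in source_names}
    return sorted(comp - src), sorted(src - comp)


# Toy example lists
component_names = ["ILTJ115037.81+465929.4", "ILTJ132633.10+484745.7"]
source_names = ["ILTJ132633.10+484745.7 ", "ILTJ114824.92+551438.2"]
only_comp, only_opt = unmatched_names(component_names, source_names)
print(only_comp)
print(only_opt)
```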
The formula 1.25 + 3.01*(peak_flux/isl_rms)**(-0.53) seems reasonable at separating resolved and unresolved sources. Do we want a column in the catalogue indicating whether a source is resolved (according to this formula)? I don't mind either way really; the formula isn't perfect, but perhaps it's useful to give an indication.
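A sketch of how such a column could be computed. The issue does not say which quantity is compared with the formula; the code below assumes it is the ratio of total to peak flux density, with a source counted as resolved when that ratio exceeds the envelope. The function names and the example numbers are made up.

```python
def resolved_threshold(peak_flux, isl_rms):
    """Envelope from the formula above, as a function of peak_flux/isl_rms."""
    return 1.25 + 3.01 * (peak_flux / isl_rms) ** (-0.53)


def is_resolved(total_flux, peak_flux, isl_rms):
    """True if total/peak lies above the envelope (assumed criterion)."""
    return total_flux / peak_flux > resolved_threshold(peak_flux, isl_rms)


# Made-up fluxes in arbitrary but consistent units, signal-to-noise of 100
print(resolved_threshold(1.0, 0.01))
print(is_resolved(2.0, 1.0, 0.01))
print(is_resolved(1.0, 1.0, 0.01))
```

The envelope tends to 1.25 at high signal-to-noise and rises steeply for faint sources, so a faint source needs a much larger total-to-peak ratio before it is called resolved.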
There seem to be some catalogue entries in version 0.6 (also present in version 0.5) in which the same ID_name, AllWISE, and/or ObjID value is assigned to different Source_Name values (with different fluxes and slightly different coordinates).
In the case of the repeated ID_name, there is at least one known issue covered (entries with ID_name="mult" on the catalogue), but I seem to have picked up a further 49 pairs of sources with identical ID_names, all of which also have identical AllWISE and ObjID values (when those are present). All the pairs seem to also have had the IDs assigned via the same method (same ID_flag).
There are 97 pairs of sources with the same AllWISE value (partially overlapping the previous group). Some of these have a different ID_name for each source, and different values of ID_flag. Where available, all pairs also have the same ObjID value.
There are 82 pairs of sources with the same ObjID value. This group seems to be a subset of the previous one, as in all cases the AllWISE values are identical for both sources in each pair (there are two pairs where AllWISE=N/A, and ID_name and ID_flag have different values, so this selection by ObjID is still necessary to catch those cases).
I investigated this issue after finding that this galaxy zoo source had two separate entries in the catalogue (one matched through galaxy zoo, the other matched to a bright galaxy), which used different combinations of components:
https://www.zooniverse.org/projects/mjh22/lofar-galaxy-zoo/talk/subjects/10509209
I believe there may be two possible issues here:
The first is where one of the matches to the same optical source has been achieved through galaxy zoo, and the second through maximum likelihood or match to a bright galaxy.
The second case, more prevalent for repeated values of ID_name, has the same ID_flag for both entries in the pair with the same optical ID, likely meaning that different combinations of components are used for the same source through the same method (galaxy zoo or maximum likelihood).
I have saved the tables for the three sets of duplicate entries on the Herts LOFAR server, on
/data/lofar/bmingo/morph/hetdex/tests
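The three selections above (by ID_name, AllWISE and ObjID) all amount to grouping Source_Name values by an ID column and keeping the groups with more than one member. A minimal sketch of that grouping, assuming rows are available as dicts; `shared_ids`, the set of values treated as missing, and the toy rows with placeholder ID values are all assumptions for illustration.

```python
from collections import defaultdict


def shared_ids(rows, key, missing=("", "N/A", None)):
    """Map each ID value shared by several Source_Names to those names."""
    groups = defaultdict(set)
    for row in rows:
        value = row.get(key)
        if value in missing:
            continue  # no ID in this column, nothing to compare
        groups[value].add(row["Source_Name"])
    return {value: sorted(names)
            for value, names in groups.items() if len(names) > 1}


# Toy rows with placeholder ID values (not real catalogue entries)
rows = [
    {"Source_Name": "ILTJ112101.32+490457.3", "AllWISE": "wise-A", "ObjID": "1"},
    {"Source_Name": "ILTJ112101.29+490456.5", "AllWISE": "wise-A", "ObjID": "1"},
    {"Source_Name": "ILTJ105143.62+513451.5", "AllWISE": "N/A", "ObjID": "2"},
    {"Source_Name": "ILTJ105143.76+513454.5", "AllWISE": "N/A", "ObjID": "2"},
]
print(shared_ids(rows, "AllWISE"))
print(shared_ids(rows, "ObjID"))
```

The toy rows mirror the situation described above: the second pair has no AllWISE value, so it only shows up when grouping by ObjID.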
flowchart/README.md should be updated with the latest meanings of ID_flag.