jhurley13 / automatingcbc
Automating the Christmas Bird Count
License: Apache License 2.0
Select H$2:J$238 and format as Accounting/No symbol/0 decimal places (not conditional formatting)
Fix printing. May need to add some number of blank lines (see blank_row)
Add formatting rules so that zeros show as blanks (but are still 0)
see Barry's email 12-27-20 5:26PM Re: Automating the Christmas Bird Count article
John,
The way to get zeroes to show as blanks is to make a custom format that looks like “0;0;” (this means format with no decimals for positive and negative numbers and format as nothing for zeroes).
The custom formats are pretty powerful, and reminiscent of regular expressions. For example, they have this one:
($* #,##0);($* (#,##0);($* "-");(@_)
Not sure what it means! Like regexes, I know just enough to be dangerous!
Barry
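A minimal sketch of how the sections of a custom format such as "0;0;" are chosen, to make the email above concrete. It only models section selection (positive;negative;zero) and splits naively on ";", so it does not handle semicolons inside quoted strings or the fourth, text-only section. The function name pick_section is hypothetical.

```python
def pick_section(format_code, value):
    """Return the section of a custom number format that applies to a number.

    Assumption: sections are separated by plain ';' with no quoted semicolons.
    """
    sections = format_code.split(";")
    if len(sections) == 1:
        # A single section applies to every number.
        return sections[0]
    if len(sections) == 2:
        # Two sections: first for positives and zeros, second for negatives.
        return sections[1] if value < 0 else sections[0]
    # Three or more sections: positive; negative; zero.
    if value > 0:
        return sections[0]
    if value < 0:
        return sections[1]
    return sections[2]


if __name__ == "__main__":
    for number in (12, -3, 0):
        print(number, repr(pick_section("0;0;", number)))
```

With "0;0;" the zero section is an empty string, which is why a cell holding 0 displays as blank while its value stays 0.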
May require support from eBird team
Pete Dunten visits missing, but details are available
dunten_subids = ['S78036994', 'S78035225']
dunten_locids = ['L7194495', 'L13043990']
dunten_visits['obsDt'] = '26 Dec 2020'
As part of prepare_for_compiler, highlight CountSpecial count fields if all counts
in Group are zero. For example, count "peep sp." iff all others on the list
in "Sandpipers and Allies" are zero
Or "gull sp." iff "Gulls, Terns, and Skimmers" all zero
This isn't quite right though, since "peep sp." would require all Shorebirds to be 0
See e.g. "Western/Clark's Grebe"
SLASH [clear path]
For "Western/Clark's Grebe", both ['WEGR', 'CLGR'] entries must be zero
For "Short-billed/Long-billed Dowitcher" both ['SBDO', 'LBDO'] entries must be zero
Field in Taxonomy is "comNameCodes"
ISSF
Northern Flicker (Yellow-shafted) has ['NOFL'] in comNameCodes; This must be zero
INTERGRADE
Northern Flicker (Yellow-shafted x Red-shafted) has ['NOFL'] in comNameCodes; This must be zero
HYBRID
Northern x Gilded Flicker (hybrid) ['GIFL', 'NOFL'] in comNameCodes
SPUH
gull sp. has [] in comNameCodes
peep sp. has "Sandpipers and Allies" in familyComName, SPECIES_GROUP is "Shorebirds"
this means we have to populate familyComName as well
duck sp. familyComName is "Ducks, Geese, and Waterfowl"
DOMESTIC
Mallard (Domestic type) not countable
Muscovy Duck (Domestic type) not countable
To get 4 char code, it may be in comNameCodes, or if that's [], then bandingCodes
See e.g. Peregrine Falcon, also Wood Duck
familyComName - column J
comNameCodes - column G
SPECIES_GROUP - column X
bandingCodes - column F
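The two lookups described above can be sketched as follows. This is an illustration, not the project's code: it assumes each taxonomy row is a dict whose comNameCodes and bandingCodes values are lists, and the function names and sample rows are hypothetical.

```python
def four_letter_code(row):
    """Return the first code from comNameCodes, else from bandingCodes, else None."""
    for field in ("comNameCodes", "bandingCodes"):
        codes = row.get(field) or []
        if codes:
            return codes[0]
    return None


def base_counts_all_zero(com_name_codes, counts_by_code):
    """True when every base species named in comNameCodes has a zero count.

    An empty list (the SPUH case, e.g. "gull sp.") returns False here because
    it needs a family-based rule instead of a code-based one.
    """
    if not com_name_codes:
        return False
    return all(counts_by_code.get(code, 0) == 0 for code in com_name_codes)


if __name__ == "__main__":
    # Made-up rows for illustration only.
    print(four_letter_code({"comNameCodes": [], "bandingCodes": ["PEFA"]}))
    print(base_counts_all_zero(["WEGR", "CLGR"], {"WEGR": 0, "CLGR": 0}))
    print(base_counts_all_zero(["SBDO", "LBDO"], {"SBDO": 0, "LBDO": 4}))
```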
Leads to general rule for checklists: if SLASH/HYBRID/ISSF present,
then base species must be on list as well. Seen with Dark-eyed Junco
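A possible check for the rule above, assuming the checklist is available as a collection of four-letter codes and each SLASH/HYBRID/ISSF entry maps to its base codes (taken from comNameCodes). The function name and the sample entry are hypothetical.

```python
def missing_base_species(checklist_codes, derived_entries):
    """Report derived entries whose base species are absent from the checklist.

    checklist_codes: iterable of four-letter codes already on the checklist.
    derived_entries: dict mapping a SLASH/HYBRID/ISSF name to its base codes.
    Returns a dict of entry name -> list of base codes that are missing.
    """
    present = set(checklist_codes)
    missing = {}
    for name, base_codes in derived_entries.items():
        absent = [code for code in base_codes if code not in present]
        if absent:
            missing[name] = absent
    return missing


if __name__ == "__main__":
    # Illustrative data: the junco form is listed but the base species is not.
    entries = {"Dark-eyed Junco (Oregon)": ["DEJU"]}
    print(missing_base_species(["NOFL", "WEGR"], entries))
```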
See https://nationalaudubon.app.box.com/s/h5h73acc0ix86vap55hf8miici88bniw
"Total Number of Species" from
https://www.audubon.org/christmas-bird-count-compiler-resources
There just isn't enough data from eBird, but this is a big pain point
To support SCVAS.org results chart in the Avocet magazine
https://netapp.audubon.org/CBCObservation/CurrentYear/ResultsByCount.aspx
The Excel spreadsheet downloaded from here has other information besides species counts. Transform it into something that can be used for merging. The specific use case is to take results from CASJ, CACR, CAPA and CAMH and produce a summary for publication in the Avocet, the magazine of the Santa Clara Valley Audubon Society (SCVAS.org).
For example, Alum Rock is https://ebird.org/hotspot/L370800
This might not work for non-hotspots
Bob Hirt 26 Dec 2020 12:01 obsDt wrong in personal_checklists, should be 2020-12-26 12:01
ref raw_csv_to_checklist
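One way the obsDt value could be normalized, shown as a sketch. It assumes the incoming text always has the form "26 Dec 2020 12:01" with an English month abbreviation; normalize_obs_dt is a hypothetical helper, not a function in raw_csv_to_checklist.

```python
from datetime import datetime


def normalize_obs_dt(text):
    """Convert '26 Dec 2020 12:01' to '2020-12-26 12:01'.

    Assumption: the month abbreviation is English and the active locale
    parses it with %b.
    """
    parsed = datetime.strptime(text, "%d %b %Y %H:%M")
    return parsed.strftime("%Y-%m-%d %H:%M")


if __name__ == "__main__":
    print(normalize_obs_dt("26 Dec 2020 12:01"))
```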
For example, "WARNING *** file size (56569) not 512 + multiple of sector size (512)"
See https://stackoverflow.com/questions/7619319/python-xlrd-suppress-warning-messages.
Currently will die at line 208 of count_day_tasks.py, since outputs_path / f'{circle_prefix}Single.xlsx' not present.
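A guard along these lines might avoid the crash. It is a sketch only: the helper name is hypothetical and what the caller should do when the file is missing (skip, warn, or create it) is left open.

```python
from pathlib import Path
from typing import Optional


def find_single_workbook(outputs_path: Path, circle_prefix: str) -> Optional[Path]:
    """Return the path to '<circle_prefix>Single.xlsx' if it exists, else None."""
    candidate = outputs_path / f"{circle_prefix}Single.xlsx"
    return candidate if candidate.exists() else None


if __name__ == "__main__":
    workbook = find_single_workbook(Path("Outputs"), "CACR-2020-")
    if workbook is None:
        print("Single.xlsx not found; skipping this step")
    else:
        print(f"Using {workbook}")
```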
e.g. Bob Hirt ref L5551212
Lumping all the days together is just too much. Maybe one sheet per day in the EBirdSummary files. See CAPA for test case.
Also, don't mark anything as rare, since there is no base checklist.
see generate_frequency_combined_checklist for feature request from Mike Azevado
split into separate service
It looks like there is a limit of 200 checklists per request, see
GET Recent checklists feed
https://api.ebird.org/v2/product/lists/{{regionCode}}
CAMP CBC hits this.
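A small sketch for noticing when the limit may have been hit and for finding checklists that need to be fetched another way. It assumes the feed has already been downloaded as a list of dicts with a subId key; both helper names are hypothetical and no API call is made here.

```python
CHECKLIST_FEED_LIMIT = 200  # limit noted above; treated as an assumption


def may_be_truncated(feed, limit=CHECKLIST_FEED_LIMIT):
    """True when the feed is as long as the limit, so entries may be missing."""
    return len(feed) >= limit


def missing_sub_ids(feed, known_sub_ids):
    """Return known submission ids that do not appear in the feed, in order."""
    seen = {entry["subId"] for entry in feed}
    return [sub_id for sub_id in known_sub_ids if sub_id not in seen]


if __name__ == "__main__":
    feed = [{"subId": "S78036994"}]
    print(may_be_truncated(feed))
    print(missing_sub_ids(feed, ["S78036994", "S78035225"]))
```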
Document what can be done without eBird API key (or during outage!)
Where are all the inputs? This includes parameters, rarities, annotations, contacts
Services
CBCPipeline => CBCServices or BirdCountServices
Directories
2019 CBC email from Bill: "2019 Christmas Bird Count - Evergreen & Alum Rock" 12-03-19
(78 email addresses)
Here, True positives are the number of cases where the algorithm detects an example as an anomaly and in reality, it is an anomaly.
False Positives occur when the algorithm detects an example as anomalous but in the ground truth, it is not.
False Negative means the algorithm detects an example as not anomalous but in reality, it is an anomalous example.
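The three definitions above can be counted directly from paired labels. This sketch assumes predictions and ground truth are given as equal-length sequences of booleans, where True means "anomalous"; the function name is hypothetical.

```python
def confusion_counts(predicted, actual):
    """Count true positives, false positives and false negatives.

    predicted[i] is True when the algorithm flags example i as an anomaly;
    actual[i] is True when example i really is an anomaly.
    """
    pairs = list(zip(predicted, actual))
    true_positives = sum(1 for p, a in pairs if p and a)
    false_positives = sum(1 for p, a in pairs if p and not a)
    false_negatives = sum(1 for p, a in pairs if not p and a)
    return true_positives, false_positives, false_negatives


if __name__ == "__main__":
    predicted = [True, True, False, False]
    actual = [True, False, True, False]
    print(confusion_counts(predicted, actual))
```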
Documentation note: For CSV files, the header cannot have any spaces:
"CommonName,Total" not "CommonName, Total"
as the second column will be read as " Total"
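The effect is easy to reproduce with the standard csv module, which also offers skipinitialspace as a tolerant alternative. This is a sketch with made-up data; whether the project should rely on that option rather than on clean headers is a separate decision.

```python
import csv
import io


def read_header(text, skipinitialspace=False):
    """Return the header row of CSV text as a list of column names."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=skipinitialspace)
    return next(reader)


if __name__ == "__main__":
    raw = "CommonName, Total\nMallard, 12\n"
    print(read_header(raw))                         # second name keeps its leading space
    print(read_header(raw, skipinitialspace=True))  # leading space is dropped
```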
Post-mortem
Things that don't work:
no way to tell what things are rare, i.e. need writeup
Please don't put an X in for a species count
Add SPECIES/SPUH, etc
Note that things like "Muscovy Duck (Domestic type)" are not countable, so filter
Really a pain to merge data right now
It looks like Total column for Sectors is a number not a formula. Fix to allow additions
While it is possible to figure out who has to file a rare bird report from the Summary,
it would be much better to produce a list (in addition to pre-populating forms)
Really want to move back to my pipeline idea, where info for summary can come from
many sources and have different transformations done (e.g. merge, which inserts into species)
Must agree on hotspots to use when entering data to avoid having to manually
merge/max columns later. See e.g. Bruce Barrett and my mixup over Almaden Lakes
cache data for eBird generated locIds that don't have info, but where we can get
location info by scraping the web page for a checklist
See Pete Dunten issue
This is to fill in data that may have been dropped by the 200 checklist limit in get_visits. See #15
Note that nacc_sort_order == 0 is valid (Highland Tinamou)
For Barry/Ginger almost duplicates, add conditional formatting
Formula: =$N2<>$O2
applies to $N2:$O199
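The same comparison can be done outside the spreadsheet. The sketch below assumes each row is a dict keyed by column letter and that the first data row is spreadsheet row 2, matching the formula above; the function name is hypothetical.

```python
def mismatched_rows(rows, left="N", right="O", first_row=2):
    """Return spreadsheet row numbers where the two columns differ.

    Mirrors the conditional-formatting formula =$N2<>$O2.
    """
    return [
        row_number
        for row_number, row in enumerate(rows, start=first_row)
        if row[left] != row[right]
    ]


if __name__ == "__main__":
    rows = [{"N": 4, "O": 4}, {"N": 2, "O": 3}, {"N": 0, "O": 0}]
    print(mismatched_rows(rows))
```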
Barry & Ginger almost dup checklists: ['S78035409', 'S78022175', 'S78029825', 'S78013998']
Almaden Lakes sector a good one for finding dups and near dups, particularly locations
duplicate code4 can cause problems: ref HOGR for Horned Grebe and Hooded Grebe
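Collisions like this can be listed up front. A minimal sketch, assuming a mapping from common name to four-letter code is available; the function name and the sample mapping are for illustration only.

```python
from collections import defaultdict


def duplicate_codes(code_by_name):
    """Return {code: [names]} for every code shared by more than one name."""
    names_by_code = defaultdict(list)
    for name, code in code_by_name.items():
        names_by_code[code].append(name)
    return {code: names for code, names in names_by_code.items() if len(names) > 1}


if __name__ == "__main__":
    sample = {"Horned Grebe": "HOGR", "Hooded Grebe": "HOGR", "Wood Duck": "WODU"}
    print(duplicate_codes(sample))
```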
PARSE PROCESS FOR CACR
Download checklist from https://scvas.org/christmas-bird-count-party-leaders
rename Calero-Morgan+Hill+CBC+Checklist+2020.pdf => CACR-2020-checklist.pdf
Move to Inputs/Parse
Add column to ground_truths called CACR-2020
May have to update Inputs/Parse/CACR-2020-Annotations.xlsx
In Service-Parse, make sure count_prefix is set to 'CACR-2020-'
Under Kernel, Restart & Run all
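The rename and move steps above could be scripted roughly as follows. This is a sketch under assumptions: the target name is built as count_prefix + "checklist" + the original extension, which matches the CACR-2020 example, and both helper names are hypothetical.

```python
import shutil
from pathlib import Path


def staged_name(count_prefix, downloaded_name):
    """Build the target filename, e.g. 'CACR-2020-checklist.pdf'."""
    return f"{count_prefix}checklist{Path(downloaded_name).suffix}"


def stage_checklist(downloaded, parse_dir, count_prefix):
    """Move the downloaded checklist into parse_dir under its staged name."""
    downloaded = Path(downloaded)
    parse_dir = Path(parse_dir)
    parse_dir.mkdir(parents=True, exist_ok=True)
    target = parse_dir / staged_name(count_prefix, downloaded.name)
    shutil.move(str(downloaded), str(target))
    return target


if __name__ == "__main__":
    print(staged_name("CACR-2020-", "Calero-Morgan+Hill+CBC+Checklist+2020.pdf"))
```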
For Al Eisner
Greater White-fronted Goose, Red-breasted Merganser, Virginia Rail,
Brown Pelican, Red-breasted Nuthatch, House Wren (which has increased
in the last few years), Golden-crowned Kinglet, Brown-headed Cowbird.
1 Virginia Rail - Portola
L12968060-Tricia Gardner-11:07-07-Portola Valley-S77808640
1 House Wren - Palo Alto
L4970491-Richard Jeffers-15:24-02-Palo Alto-S77821260
2 GCKI - Portola
L3063713-Cédric Duhalde-07:21-07-Portola Valley-S77816096
L6581306-Whitney Mortimer-07:29-07-Portola Valley-S77807596
3 Brown-headed Cowbird - Portola
L12968060-Tricia Gardner-08:25-07-Portola Valley-S77800072
Here are the URLs for the checklists:
https://ebird.org/checklist/S77808640
https://ebird.org/checklist/S77821260
https://ebird.org/checklist/S77816096
https://ebird.org/checklist/S77807596
https://ebird.org/checklist/S77800072
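The checklist lines above appear to follow a fixed pattern, so the URLs can be derived rather than typed. The sketch assumes the layout is locId-observer-time-sector number-sector name-subId; those field names are a guess from the examples, and parse_entry is a hypothetical helper.

```python
import re

# Assumed layout: L<locId>-<observer>-<HH:MM>-<NN>-<sector name>-S<subId>
ENTRY_PATTERN = re.compile(
    r"^(?P<loc_id>L\d+)-(?P<observer>.+?)-(?P<time>\d{2}:\d{2})-"
    r"(?P<sector_number>\d{2})-(?P<sector_name>.+)-(?P<sub_id>S\d+)$"
)


def parse_entry(line):
    """Split one summary line into its fields and add the checklist URL."""
    match = ENTRY_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"Unrecognized entry: {line!r}")
    fields = match.groupdict()
    fields["url"] = f"https://ebird.org/checklist/{fields['sub_id']}"
    return fields


if __name__ == "__main__":
    entry = parse_entry("L12968060-Tricia Gardner-11:07-07-Portola Valley-S77808640")
    print(entry["observer"], entry["url"])
```

Sector names that contain a hyphen, such as "Mountain View-Sunnyvale", still parse because the submission id is anchored at the end of the line.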
Greater White-fronted Goose 0
Red-breasted Merganser 0
Virginia Rail 3
Brown Pelican 0
House Wren 1
Brown-headed Cowbird 3
Red-breasted Nuthatch 1 ***
Mountain View
L153282-Sarah Chan-11:05-04-Mountain View-Sunnyvale-S77846952
Golden-crowned Kinglet 4 ***
Palo Alto 1
L594012-Liz Frith-12:05-02-Palo Alto-S77711697
Menlo Park 1
L5453625-Barbara Coll-09:34-03-Menlo Park-Atherton-S77843938
Portola 2
L3063713-Cédric Duhalde-07:21-07-Portola Valley-S77816096
L6581306-Whitney Mortimer-07:29-07-Portola Valley-S77807596
https://ebird.org/checklist/S77846952 (Tue 22 Dec 2020)
https://ebird.org/checklist/S77711697 (Sat 19 Dec 2020)
https://ebird.org/checklist/S77843938 (Tue 22 Dec 2020)
https://ebird.org/checklist/S77816096 (Mon 21 Dec 2020)
https://ebird.org/checklist/S77807596 (Mon 21 Dec 2020)