
honours's Introduction

💣 Task manager

โ›๏ธ Data collection

  • Body mass (5400 spp)
    • EltonTraits 1.0 (will only use this for now)
      • collected
      • cleaned
    - [ ] ADW
      - [x] scraped
      - [ ] cleaned
    - [ ] AnAge
      - [x] collected
      - [ ] cleaned
    - [ ] PanTheria
      - [x] collected
      - [ ] cleaned
    - [ ] Smith et al.
      - [x] collected
      - [ ] cleaned
  • GBIF geographical location (4721 spp)
  • Phylogeny (6952 spp)
  • Human use (1472 spp)
    • collected from IUCN
    • try pivot_wider()
  • IUCN status (5934 spp)
    • collected
    • cleaned

🧮 Pre-analysis

  • Combine datasets (8308 spp)
  • More cleaning up
    • remove species with only genus or species name (8305 spp)
  • Add classification levels (taxize)
    • will have to come back to do this after itis is working again
  • Synonym matching (rotl)
    • create a species list in long form with the ID
  • h-index (specieshindex)
    • put quotation marks around synonyms
    • fix weird NA dataframe (putting quotation marks around the synonyms seems to have fixed the problem)
    • divide species list into 2 to fix timeout issue
    • run the 2nd list next week since I've reached the limit of 20,000 requests this week
  • Missing data (mice)
  • Google Trends (gtrendsR)
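
The name clean-up and h-index steps above (drop entries that are only a genus or only a species name, put quotation marks around each name, split the list in two) can be sketched as below. This is a minimal illustration in Python rather than the project's own R code; the helper names `is_binomial`, `quote` and `split_in_two` are hypothetical, and the genus-only entry is made up for the example.

```python
def is_binomial(name: str) -> bool:
    """True when the name has at least a genus and a species epithet."""
    return len(name.split()) >= 2


def quote(name: str) -> str:
    """Wrap a name in double quotes so it is searched as an exact phrase."""
    return f'"{name.strip()}"'


def split_in_two(items: list) -> tuple:
    """Split a list into two halves; the first half gets the extra item if odd."""
    mid = (len(items) + 1) // 2
    return items[:mid], items[mid:]


# Three names from the ADW sample in this document, plus one genus-only entry.
species = ["Ctenomys australis", "Ctenomys", "Cryptotis thomasi", "Ctenodactylus gundi"]

kept = [s for s in species if is_binomial(s)]
first_half, second_half = split_in_two([quote(s) for s in kept])

print(first_half)   # ['"Ctenomys australis"', '"Cryptotis thomasi"']
print(second_half)  # ['"Ctenodactylus gundi"']
```

Each half can then be submitted in a separate run, which is one way to stay under a weekly request limit.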

🎨 Making graphs

  • h vs mass
  • h vs phylogeny 📍
    • something wrong with the new artificial tree, need to find out why
  • h vs location
  • h-index map
  • h vs human use
  • h vs domestication
  • h vs iucn
  • h with vs without conservation keyword
    • check for patterns
  • Google Trends sum
  • For all plots
    • change font
    • change h-index axis scale to true values
    • add p-values / other stats on the plots

🚀 Statistical analysis 💎📍

  • correlation matrix of complete entries
  • Phylogenetic signals (phytools)
  • MCMCglmm
    • impute 10 datasets
    • trim all trees + pick 50 randomly
    • write nested loop + TESTING
    • katana
    • extract data from models 📍
    • Pagel's λ (calculated from MCMCglmm results) 📍
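
The MCMCglmm plan above (10 imputed datasets, 50 randomly picked trees, a nested loop) implies 10 × 50 = 500 model fits. The sketch below only enumerates those runs; it is written in Python for illustration, `fit_model` is a hypothetical placeholder rather than a real model call, and the pool of 1000 candidate trees is an assumption, not a figure from the project.

```python
import itertools
import random

N_IMPUTED = 10   # imputed datasets, as in the task list
N_TREES = 50     # trees picked at random, as in the task list


def pick_trees(tree_ids, k, seed=1):
    """Pick k distinct tree ids at random; the seed keeps the choice repeatable."""
    rng = random.Random(seed)
    return rng.sample(list(tree_ids), k)


def fit_model(dataset_id, tree_id):
    """Hypothetical placeholder for one model fit; returns only the run labels."""
    return {"dataset": dataset_id, "tree": tree_id}


all_trees = range(1000)  # assumed size of the tree pool
chosen = pick_trees(all_trees, N_TREES)

# Nested loop: every imputed dataset is paired with every chosen tree.
runs = [
    fit_model(dataset_id, tree_id)
    for dataset_id, tree_id in itertools.product(range(1, N_IMPUTED + 1), chosen)
]

print(len(runs))  # 500
```

Fixing the seed means the same 50 trees are used if the loop has to be restarted on the cluster.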

🧩 Other materials 💎📍

  • data summary table for thesis
  • supplementary material


Legend
What I'm working on now 📍
Priority 💎

honours's People

Contributors

wcornwell, itchyshin, jesstytam


honours's Issues

check for missing data

check if mammals with higher h-index have less missing data
convert data to binary (1 for data present, 0 for data missing)
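
A minimal sketch of the binary conversion described above, in Python for illustration. The field names, species labels and values are placeholders invented for the example, and `presence_row` and `completeness` are hypothetical helpers.

```python
FIELDS = ["body_mass", "iucn_status", "human_use"]  # assumed column names


def presence_row(record, fields):
    """Map each field to 1 if a value is present and 0 if it is missing."""
    return {f: 0 if record.get(f) is None else 1 for f in fields}


def completeness(record, fields):
    """Share of fields with data, between 0 and 1."""
    return sum(presence_row(record, fields).values()) / len(fields)


# Placeholder records; the numbers are not real data.
records = [
    {"species": "species_a", "h": 40, "body_mass": 900.0,
     "iucn_status": "LC", "human_use": "food"},
    {"species": "species_b", "h": 2, "body_mass": None,
     "iucn_status": "LC", "human_use": None},
]

for rec in records:
    print(rec["species"], rec["h"], presence_row(rec, FIELDS))
```

Plotting `completeness` against the h-index would then show whether well-studied mammals tend to have fewer gaps.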

plotting relationships!

ggplot - log10(h) vs log(BodyMass) etc. Also, you can explore relationships among all variables
ggplot - log(h) vs IUCN status and human use

parsing animaldiversity.ummz.umich.edu

@jessicatytam has got all the data but it's coming in like this:

Cryptotis squamipes\n      \n      \n      \n      \n      \n    \n        Cryptotis thomasi\n      \n      \n      \n      \n      \n    \n        Ctenodactylus gundi\n      \n      185\n      175\n      195\n      \n    \n        Ctenodactylus vali\n      \n      \n      \n      \n      \n    \n        Ctenomys argentinus\n      \n      \n      \n      \n      \n    \n        Ctenomys australis\n      \n      \n      \n      \n      \n    \n        Ctenomys azarae\n      \n      \n      \n      \n      \n    \n        Ctenomys boliviensis\n      \n      \n      \n      \n      \n    \n        Ctenomys boliviensis nattereri\n      \n      \n      \n      \n      \n    \n        Ctenomys bonettoi\n      \n      \n      \n      \n      \n    \n        Ctenomys brasiliensis\n      \n      \n      \n      \n      \n    \n        Ctenomys colburni\n      \n      \n      \n      \n      \n    \n        Ctenomys conoveri\n      \n      900\n      \n      \n      \n    \n        Ctenomys dorsalis\n      \n      \n      \n      \n      \n    \n        

so we need some clever regex help from @mlagisz
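
One possible approach, sketched in Python and assuming the layout seen in the sample above: a species name on its own line, followed by five lines indented by six spaces that are either empty or hold a number. What each of the five slots means is not stated here, so the sketch keeps them as an unnamed list. `parse_blob` is a hypothetical helper.

```python
import re

# A species line: capitalised genus followed by one or more lowercase epithets.
NAME = re.compile(r"^\s*([A-Z][a-z]+(?: [a-z]+)+)\s*$")
# A value slot: exactly six spaces, then an optional number.
SLOT = re.compile(r"^ {6}(\d+(?:\.\d+)?)?$")


def parse_blob(text):
    """Turn the scraped text into one record per species with its value slots."""
    records = []
    for line in text.split("\n"):
        name = NAME.match(line)
        if name:
            records.append({"species": name.group(1), "values": []})
            continue
        slot = SLOT.match(line)
        if slot and records:
            value = slot.group(1)
            records[-1]["values"].append(float(value) if value else None)
    return records


# Three entries copied from the sample above.
sample = (
    "Ctenodactylus gundi\n      \n      185\n      175\n      195\n      \n    \n"
    "        Ctenodactylus vali\n      \n      \n      \n      \n      \n    \n"
    "        Ctenomys conoveri\n      \n      900\n      \n      \n      \n    \n        "
)

for record in parse_blob(sample):
    print(record)
```

Lines with four or eight spaces and no text match neither pattern, so they act as separators and are skipped.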

MCMCglmm

Since I didn't match the synonyms properly before, I did it again and got a pretty similar list of mammals.
I imputed data from this new list, trimmed the tree, and ran the model again, but now body mass is not significant (everything else is the same). Is this bad?

         post.mean  l-95% CI  u-95% CI  eff.samp  pMCMC
logmass    0.03489  -0.01843   0.08686      1117  0.204
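
As a reading aid for the row above, the sketch below checks whether the 95% interval excludes zero, and shows one common way a two-sided pMCMC is computed from posterior samples: twice the smaller share of samples on either side of zero. That definition is an assumption about how the value was produced, the code is Python for illustration only, and the `toy` samples are invented, not model output.

```python
def interval_excludes_zero(lower, upper):
    """True when a credible interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0


def two_sided_pmcmc(samples):
    """Twice the smaller share of posterior samples on either side of zero."""
    n = len(samples)
    above = sum(1 for s in samples if s > 0) / n
    below = sum(1 for s in samples if s < 0) / n
    return 2 * min(above, below)


# Interval for logmass taken from the table above.
print(interval_excludes_zero(-0.01843, 0.08686))  # False

# Invented samples: 2 of 10 fall below zero.
toy = [-0.02, -0.01, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08]
print(two_sided_pmcmc(toy))  # 0.4
```

An interval that spans zero together with a pMCMC above 0.05 tells the same story: the sign of the body mass effect is not well determined in this run.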

human use dataset

Seems like IUCN data is a good plan for this, but it might need some parsing.

google trends

log_sumgtrends_latmedian
Google Trends is transformed using log10(sum of hits + 1)
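
The transformation can be written out as a small Python sketch. `log_sum_hits` is a hypothetical name and the hit counts are made up; adding 1 keeps species with no hits defined, at 0, rather than at log10(0).

```python
import math


def log_sum_hits(hits):
    """log10 of the summed hits plus one, so a total of zero maps to 0.0."""
    return math.log10(sum(hits) + 1)


print(log_sum_hits([0, 0, 0]))  # 0.0
print(log_sum_hits([40, 59]))   # 2.0
```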

group

group the mammals into smaller groups
check how they split
e.g. terrestrial vs marine, eutherian vs non-eutherians, etc.

IUCN analysis

ordinal variable for IUCN status: LC, VU, EN, CR, EW
other categories set to NA
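
A minimal sketch of this encoding in Python, using the five categories listed above in that order. `iucn_rank` is a hypothetical helper; it treats CE as an alias of CR, the IUCN code for Critically Endangered, and returns `None` (standing in for NA) for every other category.

```python
IUCN_ORDER = ["LC", "VU", "EN", "CR", "EW"]  # least to most threatened
ALIASES = {"CE": "CR"}                       # alternative spelling accepted


def iucn_rank(status):
    """Return 1..5 for the listed categories and None for anything else."""
    if status is None:
        return None
    code = status.strip().upper()
    code = ALIASES.get(code, code)
    return IUCN_ORDER.index(code) + 1 if code in IUCN_ORDER else None


for status in ["LC", "EN", "CE", "DD", None]:
    print(status, iucn_rank(status))
```

Keeping the order in one list makes it easy to change if further categories are added to the scale later.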

combining list for the 3rd time

OK, so I thought I had not matched the synonyms correctly, which removed some important spp from the list (e.g. sheep).
It turns out the problem is with the rotl phylogeny list, which didn't have some of these spp, and I had removed spp that were not on this list.
So I am now making a new list that keeps spp not on the rotl list, and getting h-index and Google Trends again.

scaling & models

need to use a model with log values
model with real or log values?

human use

change categories to binary
find the slope
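
Changing the categories to binary amounts to going from one row per species and use to one row per species with a 0/1 column per use. The sketch below shows that reshaping in Python for illustration; `to_binary_wide` is a hypothetical helper, and the species labels and use categories are placeholders rather than values from the IUCN data.

```python
def to_binary_wide(rows):
    """Reshape (species, use) pairs into {species: {use: 0 or 1}}."""
    categories = sorted({use for _, use in rows})
    wide = {}
    for species, use in rows:
        # Start every species with all categories at 0, then flag the ones seen.
        wide.setdefault(species, dict.fromkeys(categories, 0))[use] = 1
    return wide


# Placeholder long-form rows: one row per species and use.
long_rows = [
    ("species_a", "food"),
    ("species_a", "pets"),
    ("species_b", "food"),
]

for species, flags in to_binary_wide(long_rows).items():
    print(species, flags)
```

Species with no recorded use never appear in the long form, so they would need to be added back as all-zero rows before modelling.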
