Comments (14)
https://earthdata.nasa.gov/nasa-data-policy
You may need an Earthdata login to access some of the data; registration is free.
Also, here is a list of all the FTP and HTTP servers from data.gov, which includes many NASA FTP servers: https://gist.github.com/maxogden/9885244926c1ab576287ff5047dd0e5f
from data-rescue-pdx.
Working on Goddard Space Flight Center. Mr. Google sent me here:
And they encourage wget!!
https://disc.gsfc.nasa.gov/recipes/?q=recipes/How-to-Download-Data-Files-from-HTTP-Service-with-wget
I can code this up ... do we want to put it up on a server somewhere?
from data-rescue-pdx.
https://genelab-data.ndc.nasa.gov/genelab/projects
A very nice database for genetic research done IN SPACE!
from data-rescue-pdx.
Sam and I are doing NSIDC
from data-rescue-pdx.
The Earth Sciences Level 1 and Atmosphere Archive and Distribution System (LAADS) DAAC has archived all of its data on both FTP and HTTP sites:
ftp://ladsweb.modaps.eosdis.nasa.gov
https://ladsweb.nascom.nasa.gov/archive
A useful README describing the data and how to access it is here:
https://ladsweb.nascom.nasa.gov/archive/README
from data-rescue-pdx.
Actually, it looks like all the DAACs' data is contained in the Common Metadata Repository (CMR):
https://wiki.earthdata.nasa.gov/display/CMR/CMR+Client+Partner+User+Guide
Based on this, would we only need one scraper to pull all the data from this system?
from data-rescue-pdx.
I've got dibs on crawling https://opendap.larc.nasa.gov/opendap/ 🚀
from data-rescue-pdx.
I took a look at the CMR page and started parsing the metadata provided at https://cmr.sit.earthdata.nasa.gov/search/collections.json.
I put together a script that traces the files linked there with curl and outputs their final location after redirects: https://gist.github.com/crhallberg/eebc86dd74ec36e9f2f522ac1559cb7b.
That's just the bare-bones version. I also have one that does a lot more (saves collections.json, separates files into data, webpage, and broken, has status output) if needed.
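For anyone who wants to try the same idea without the gist, here is a minimal Python sketch (not the script linked above). It assumes the parsed collections.json has a `feed` → `entry` → `links` → `href` layout, which should be checked against a real response; the function names `extract_links` and `resolve_final_url` are made up for illustration.

```python
import urllib.request


def extract_links(collections_doc):
    """Return (collection_id, href) pairs from a parsed collections.json.

    Assumes a {"feed": {"entry": [{"id": ..., "links": [{"href": ...}]}]}}
    layout; entries or links without the expected keys are skipped.
    """
    pairs = []
    for entry in collections_doc.get("feed", {}).get("entry", []):
        for link in entry.get("links", []):
            href = link.get("href")
            if href:
                pairs.append((entry.get("id"), href))
    return pairs


def resolve_final_url(url, timeout=30):
    """Open the URL and return the address it ends up at.

    urllib follows HTTP redirects by default, so geturl() reports the
    final location. Network errors are left for the caller to handle.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()
```

To use it, load a saved collections.json with `json.load`, pass the result to `extract_links`, and call `resolve_final_url` on each href inside a try/except so broken links can be recorded rather than stopping the run.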
from data-rescue-pdx.
@crhallberg awesomeness, do you have an idea of how many datasets are available under that collections endpoint? is each collection a big group of datasets? do you have an example of the metadata that your script produces?
from data-rescue-pdx.
I'm glad you asked because I'm still very new to this. There is a LOT more info here than I thought. My initial thought was that what I was parsing was an update feed. Turns out I was on page 1 of 19,590 items. I still don't know how many there are in total. A part of the documentation I just found says "You can not page past the 1 millionth item." so there is (obviously) a heck of a lot.
Do you have any examples of good metadata that I can aim for as I iterate on this?
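On the paging point above, a rough sketch of how the page requests could be generated. It assumes the search endpoint takes `page_size` and `page_num` query parameters (please confirm the names and the maximum page size in the CMR documentation); `page_urls` and `MAX_PAGEABLE_ITEMS` are hypothetical names, and the one-million cap is taken from the sentence quoted above.

```python
import math
from urllib.parse import urlencode

# Limit quoted from the documentation: paging stops at the 1 millionth item.
MAX_PAGEABLE_ITEMS = 1_000_000


def page_urls(base_url, total_items, page_size=100):
    """Build one URL per results page, never paging past the item limit.

    page_size and page_num are assumed parameter names, not verified here.
    """
    reachable = min(total_items, MAX_PAGEABLE_ITEMS)
    pages = math.ceil(reachable / page_size)
    urls = []
    for page_num in range(1, pages + 1):
        query = urlencode({"page_size": page_size, "page_num": page_num})
        urls.append(f"{base_url}?{query}")
    return urls
```

The total item count has to come from the service itself, so in practice the first request would be made by hand and the rest generated from whatever count it reports.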
from data-rescue-pdx.
@crhallberg hah! that's a lot of data :) if you wanna check out the data.gov metadata, the gold standard in my opinion, check out this guide i wrote last month https://github.com/jsonlines/guide. the main idea is you have a JSON object for each dataset, and that object has an array of resource URLs, one for each data file.
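A small sketch of that shape, in case it helps: one JSON object per line, each with an array of resource URLs. The field names (`id`, `title`, `resources`) and the input keys are placeholders chosen for this example, not taken from the guide.

```python
import json


def to_jsonlines(datasets):
    """Serialize datasets as JSON Lines: one JSON object per line.

    Each input dict is expected to have "id" and optionally "title" and
    "urls"; these key names are assumptions made for this sketch.
    """
    lines = []
    for dataset in datasets:
        record = {
            "id": dataset["id"],
            "title": dataset.get("title", ""),
            # One entry per data file belonging to this dataset.
            "resources": list(dataset.get("urls", [])),
        }
        lines.append(json.dumps(record, sort_keys=True))
    return "".join(line + "\n" for line in lines)
```

Because every line is a complete JSON document, the output can be appended to as the crawl progresses and read back one line at a time with `json.loads`.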
from data-rescue-pdx.
Is this related to the tweet https://twitter.com/denormalize/status/838550043397234691 ? I was wondering if you found a solution to the parallel ftp problem.
from data-rescue-pdx.
Update: I've identified 48,126 links. Some are invalid, some are ftp folders, I'm weeding through now by checking headers. After I've separated the wheat links from the chaff links, I'll reconcile it with the original metadata.
I will place a link here when I have a centralized place to show and tell progress: https://github.com/crhallberg/nasa-cmr-scraper.
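For reference, a simplified sketch of the kind of header-based sorting described above, not the code in the repository. The categories come from this thread; the rules themselves (trailing slash means FTP folder, status 400 and up means broken, `text/html` means webpage) are assumptions, and `classify_link` is a hypothetical name.

```python
def classify_link(url, status, content_type):
    """Sort a link into a bucket using its URL and response headers.

    status is the HTTP status code (None if the request failed) and
    content_type is the Content-Type header value (None if absent).
    The rules below are illustrative guesses, not verified behavior.
    """
    if url.lower().startswith("ftp://"):
        # FTP has no HTTP headers, so fall back on the URL shape.
        return "ftp_folder" if url.endswith("/") else "ftp_file"
    if status is None or status >= 400:
        return "broken"
    if (content_type or "").lower().startswith("text/html"):
        return "webpage"
    return "data"
```

The status and Content-Type values would come from a HEAD request per link, so only headers are transferred while sorting.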
from data-rescue-pdx.
I wasn't sure where else to push this, so I just made a new repository: https://github.com/crhallberg/nasa-cmr-scraper
from data-rescue-pdx.