Comments (16)
Open Data Nepal does have these datasets, but most of them are in raw format and need to be analyzed to fit the NepalMap dataset pattern.
Since the National Data Portal and NepalMap follow the same dataset pattern (local bodies > districts > provinces > country), I suggest scraping the datasets from the National Data Portal; it should be easy and less time-consuming.
Maybe @amitness or @pratimakandel can write a Python script to scrape the entire portal without affecting the datasets' structure. Later, we can use volunteers and fellows to aggregate the scraped data according to our needs. Or we can organize a small datathon, as @amitness proposed previously.
from data.
@pratimakandel Are you interested in working on this? I have found the following pattern in the URLs:
Province
http://nationaldata.gov.np/Province/Index/1
Change number from 1 to 7 for the seven provinces.
For each province,
http://nationaldata.gov.np/District/Index/103
Here, the ID is the province followed by the district number: 103 means province 1, district 03. Start at 01 (e.g. 101) and keep increasing the district number until you get an error page. Repeat for each province.
For each district in province,
http://nationaldata.gov.np/LocalLevel/Index/10701
Here, 10701 means province 1, district 07, and local level 01. Keep increasing the local level number until an error page comes up.
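The URL pattern above can be sketched in Python roughly as follows. This is a minimal sketch, not a tested scraper: the function names are made up for illustration, and `fetch_page` assumes the portal answers a missing ID with an HTTP error status, which has not been verified. The `stop=99` cap follows from the two-digit district and local level fields.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterator, Optional, Tuple

BASE = "http://nationaldata.gov.np"


def province_url(province: int) -> str:
    return f"{BASE}/Province/Index/{province}"


def district_url(province: int, district: int) -> str:
    # e.g. 103 = province 1, district 03
    return f"{BASE}/District/Index/{province}{district:02d}"


def local_level_url(province: int, district: int, local: int) -> str:
    # e.g. 10701 = province 1, district 07, local level 01
    return f"{BASE}/LocalLevel/Index/{province}{district:02d}{local:02d}"


def fetch_page(url: str) -> Optional[str]:
    """Return the page body, or None if the request fails.

    Assumption: a missing ID produces an HTTP error status. If the portal
    returns a normal page containing an error message instead, this check
    has to inspect the body.
    """
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.URLError:  # HTTPError is a subclass of URLError
        return None


def crawl_until_error(
    make_url: Callable[[int], str],
    fetch: Callable[[str], Optional[str]] = fetch_page,
    start: int = 1,
    stop: int = 99,
) -> Iterator[Tuple[int, str]]:
    """Yield (number, page) for start, start+1, ... until a fetch fails."""
    for n in range(start, stop + 1):
        page = fetch(make_url(n))
        if page is None:
            return
        yield n, page
```

For example, `crawl_until_error(lambda d: district_url(1, d))` would walk the districts of province 1, and the same helper can be nested to walk local levels within each district.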
Yes, I can look into this.
@pratimakandel Great, thanks for looking into it. Please store the district/province/local level name as well when scraping, as we will need to map it to geocodes/geolevels in NepalMap later on.
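One way to keep the names next to the portal IDs is sketched below. The `AreaRecord` fields and the helper names are assumptions for illustration, the ID lengths (1, 3, 5 digits) are inferred from the URL pattern in this thread, and the names in the usage example are placeholders rather than real data.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable, Optional

# Inferred from the URL pattern: "1" province, "103" district, "10701" local level.
LEVEL_BY_LENGTH = {1: "province", 3: "district", 5: "local"}


@dataclass
class AreaRecord:
    portal_id: str
    level: str
    province: int
    district: Optional[int]
    local: Optional[int]
    name: str


def make_record(portal_id: str, name: str) -> AreaRecord:
    """Split a portal ID into its parts and attach the scraped name."""
    level = LEVEL_BY_LENGTH[len(portal_id)]  # KeyError on an unexpected length
    province = int(portal_id[0])
    district = int(portal_id[1:3]) if len(portal_id) >= 3 else None
    local = int(portal_id[3:5]) if len(portal_id) == 5 else None
    return AreaRecord(portal_id, level, province, district, local, name.strip())


def records_to_csv(records: Iterable[AreaRecord]) -> str:
    """Serialize records to CSV text; missing parts become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=[f.name for f in fields(AreaRecord)],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()


# Placeholder name, not real portal data.
print(records_to_csv([make_record("103", "Example District")]))
```

Keeping the raw `portal_id` alongside the name should make the later mapping to NepalMap geocodes a simple lookup.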
@pratimakandel How is the progress on this?
I am very sorry for the delay; I actually have not started working on this. I definitely know there are great web-scraping tools for this, but I will not be able to complete it quickly. I hope you can understand.
Any update, guys?
We can ask more volunteers to join this task if we need more helping hands. Let me know.
@pratimakandel emailed us to say that her availability is limited right now due to college.
So, yeah, we could use some help from volunteers for the data scraping.
No worries @amitness
I just discussed this issue with @nirmalrizal. He has good experience scraping data from Nepal government websites: https://github.com/nirmalrizal
Please send him an invite to join the C4N GitHub repo. His email address is: [email protected] cc @ravinepal @cliftonmcintosh
@nikeshbalami, I will work on this 😄
@amitness @nikeshbalami
I have scraped the province data with a script and, for now, uploaded it to this repo: https://github.com/nirmalrizal/nationaldata
Can anyone verify the structure of this data?
After that, I will start work on the District, LocalLevel and Ward data.
@nirmalrizal You can scrape in whatever format seems reasonable to you. I have explained below how the data is going to be used after it's scraped. Hopefully, that gives you some idea.
This data is going to be loaded into NepalMap, which needs a specific format. For example, you have scraped the population data here.
To load that population data into NepalMap, we will need to write a SQL script that sets the male/female population value for all possible levels (local/district/province/national). An example script for that task is here.
As seen in that SQL, we need a format like "geo_code, geo_level, key, value". The geo_code will be the id for the district/province/local level as per the mapping here. The values will be the scraped population data for that geography.
Let's assume we want to show literacy levels. Then we would need a SQL file like this:
geo_code, geo_level, gender, total
1, local, male, 50
1, local, female, 20
...
100, district, male, 50
...
2000, national, male, 75
2000, national, female, 60
As seen, we have values for all ids present in the geography.sql file.
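Producing a row for every level from local level data could look roughly like the sketch below. It assumes the values are counts that can be summed upward, and that a mapping from each local level to its district and province is available. The codes used here (`L1`, `D1`, `P1`, `NP`) are placeholders; the real ones would come from the geography mapping mentioned above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

NATIONAL_CODE = "NP"  # hypothetical placeholder for the national geo_code
LEVEL_ORDER = {"local": 0, "district": 1, "province": 2, "national": 3}

Row = Tuple[str, str, str, int]  # (geo_code, geo_level, key, value)


def roll_up(
    local_rows: Iterable[Tuple[str, str, int]],
    parents: Dict[str, Tuple[str, str]],
) -> List[Row]:
    """Sum local-level counts into district, province and national rows.

    local_rows: (local_code, key, value), e.g. ("L1", "male", 50)
    parents:    local_code -> (district_code, province_code)
    """
    totals: Dict[Tuple[str, str, str], int] = defaultdict(int)
    for local_code, key, value in local_rows:
        district_code, province_code = parents[local_code]
        for geo_code, geo_level in (
            (local_code, "local"),
            (district_code, "district"),
            (province_code, "province"),
            (NATIONAL_CODE, "national"),
        ):
            totals[(geo_code, geo_level, key)] += value
    rows = [(code, level, key, value) for (code, level, key), value in totals.items()]
    rows.sort(key=lambda r: (LEVEL_ORDER[r[1]], r[0], r[2]))
    return rows


if __name__ == "__main__":
    sample = [("L1", "male", 50), ("L1", "female", 20), ("L2", "male", 30)]
    mapping = {"L1": ("D1", "P1"), "L2": ("D1", "P1")}
    print("geo_code, geo_level, gender, total")
    for row in roll_up(sample, mapping):
        print(", ".join(str(cell) for cell in row))
```

If a value is a rate or percentage rather than a count, summing is not valid and the higher levels would need to be recomputed from the underlying counts.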
Thank you @nirmalrizal for picking this up so quickly. I believe you now have a good idea of how we are going to use the data after going through @amitness's explanation.
Local level data is the key, and the structure of the data you scraped looks like a perfect fit. However, each CSV file may need some further cleaning to make the developers' work easier. How about catching up on a hangout so that we can discuss it, start cleaning the data, and push it to the C4N data repo? Let me know what time works for you guys.
Thank you @amitness and @nikeshbalami dai for the explanation.
For now, I have uploaded all of the available data up to ward level to this repository, and in our meeting we can talk more about how I can help structure the data further to ease the work for developers.
This is great @nirmalrizal, thank you so much. I owe you a Chiya for this awesome work. Looking forward to discussing more.
I'm currently integrating the data into NepalMap. Thank you very much for helping with the scraping part, @nirmalrizal. The structure you scraped the data in was very close to what we needed. Closing this issue now.