Comments (16)

nikeshbalami avatar nikeshbalami commented on September 23, 2024

Open Data Nepal does have these datasets, but most of them are in raw format and need to be analyzed to fit the NepalMap dataset pattern.

Since the dataset pattern of both National Data Portal and NepalMap is the same (local bodies > districts > provinces > country), I suggest scraping the datasets from National Data Portal. It should be very easy and less time-consuming.

Maybe @amitness or @pratimakandel can write a Python script to scrape the entire portal without affecting the datasets' structure. Later, we can use volunteers and fellows to aggregate the scraped data according to our needs. Or we can organize a small datathon, as proposed by @amitness previously.

from data.

amitness avatar amitness commented on September 23, 2024

@pratimakandel Are you interested in working on this? I have found the following pattern in the URLs:

Province

http://nationaldata.gov.np/Province/Index/1
Change the number from 1 to 7 for the seven provinces.

For each province,
http://nationaldata.gov.np/District/Index/103

Here, 103 means province 1 and district 03 within that province. Start at 101 and keep increasing the district number until you get an error page. Repeat for each province.

For each district in province,
http://nationaldata.gov.np/LocalLevel/Index/10701

Here, 10701 means province 1, district 07, and local level 01. Keep increasing the local level number until an error page comes up.
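
The URL pattern above could be sketched in Python roughly as follows. This is only a minimal sketch, not a finished scraper: the function names are made up for illustration, and `page_exists` is a hypothetical callback, since how the portal signals an error page (status code, redirect, or page content) is not stated in this thread and would need to be checked against the real site.

```python
from itertools import count
from typing import Callable, Iterator

BASE_URL = "http://nationaldata.gov.np"


def province_url(province: int) -> str:
    # Provinces are numbered 1 to 7.
    return f"{BASE_URL}/Province/Index/{province}"


def district_url(province: int, district: int) -> str:
    # Province digit followed by a two-digit district number, e.g. 103.
    return f"{BASE_URL}/District/Index/{province}{district:02d}"


def local_level_url(province: int, district: int, local_level: int) -> str:
    # Province digit, two-digit district, two-digit local level, e.g. 10701.
    return f"{BASE_URL}/LocalLevel/Index/{province}{district:02d}{local_level:02d}"


def walk_until_error(
    make_url: Callable[[int], str],
    page_exists: Callable[[str], bool],
) -> Iterator[str]:
    # Count up from 1 and stop at the first URL that page_exists rejects.
    # page_exists is an assumption: plug in whatever check detects the
    # portal's error page.
    for number in count(1):
        url = make_url(number)
        if not page_exists(url):
            break
        yield url
```

For example, `walk_until_error(lambda d: district_url(1, d), page_exists)` would yield the district pages of province 1 one by one, and the same helper can be reused for local levels by passing `lambda n: local_level_url(1, 7, n)`.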

pratimakandel avatar pratimakandel commented on September 23, 2024

Yes, I can look into this.

amitness avatar amitness commented on September 23, 2024

@pratimakandel Great, thanks for looking into it. Please store the district/province/local name as well when scraping as we will need to map it to geocodes/geolevels in NepalMap later on.

amitness avatar amitness commented on September 23, 2024

@pratimakandel How is the progress on this?

pratimakandel avatar pratimakandel commented on September 23, 2024

@pratimakandel How is the progress on this?

I am very sorry for the delay. I actually have not started working on this. I definitely know there are great web-scraping tools to do this; however, I will not be able to complete this quickly. I hope you can understand.

nikeshbalami avatar nikeshbalami commented on September 23, 2024

Any update, guys?

We can ask some more volunteers to join this task if we need helping hands. Let me know.

amitness avatar amitness commented on September 23, 2024

@nikeshbalami

@pratimakandel emailed us about her limited availability right now due to college.

So, yeah, we could use some help from volunteers for the data scraping.

nikeshbalami avatar nikeshbalami commented on September 23, 2024

No worries @amitness

I just discussed this issue with @nirmalrizal. He has good experience scraping data from Nepal government websites: https://github.com/nirmalrizal

Please send him an invite to join the C4N GitHub repo, his email address is: [email protected] cc @ravinepal @cliftonmcintosh

nirmalrizal avatar nirmalrizal commented on September 23, 2024

@nikeshbalami, I will work on this 😄

nirmalrizal avatar nirmalrizal commented on September 23, 2024

@amitness @nikeshbalami
I have scraped the province data with a script and, for now, put that data in this repo: https://github.com/nirmalrizal/nationaldata

Can anyone verify the structure of this data?
After that, I will start work on the District, LocalLevel and Ward data.

amitness avatar amitness commented on September 23, 2024

@nirmalrizal You can scrape in a format that seems reasonable to you. I have explained below how the data is going to be used after it's scraped. Hopefully, that should give you some idea.

This data is going to be loaded into NepalMap which needs a specific format. For example, you have scraped the population data here.

To load that population data into NepalMap, we will need to write a SQL script that sets the male/female population value for all possible levels (local/district/province/national). An example script for that task is here

As seen in that SQL, we need a format like "geo_code, geo_level, key, value". The geo_code will be the id for the district/province/local level as per mapping here. The values will be the scraped population data for that geography.

Let's assume we want to show literacy levels. Then we would need a SQL file with rows like this:
geo_code, geo_level, gender, total
1, local, male, 50
1, local, female, 20
...
100, district, male, 50
...
2000, national, male, 75
2000, national, female, 60

As seen, we have values for all ids present in the geography.sql file.
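
As a rough illustration of that target layout, scraped records could be flattened into "geo_code, geo_level, gender, total" rows with something like the sketch below. The record fields (`level`, `name`, `male`, `female`) and the `geo_lookup` mapping are hypothetical, made up for this example. The real geo codes would come from the geography mapping mentioned above, and the scraped files may be shaped differently.

```python
import csv
import io


def to_rows(scraped, geo_lookup):
    # scraped: list of dicts with hypothetical keys level, name, male, female.
    # geo_lookup: hypothetical mapping of (level, name) -> geo_code.
    rows = []
    for record in scraped:
        geo_code = geo_lookup[(record["level"], record["name"])]
        for gender in ("male", "female"):
            rows.append((geo_code, record["level"], gender, record[gender]))
    return rows


def rows_to_csv(rows):
    # Serialize the rows with a header line matching the layout above.
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["geo_code", "geo_level", "gender", "total"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Keying the lookup on both level and name is one way to make use of the stored district/province/local names when mapping to geo codes, and it avoids clashes when a district and a local level share a name.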

nikeshbalami avatar nikeshbalami commented on September 23, 2024

Thank you @nirmalrizal for picking this up so quickly. I believe you now have a good idea of how we are going to use the data after going through @amitness's explanation.

Local level data is the key, and the structure of the data you scraped looks like a perfect fit. However, each CSV file may need some further cleaning to make the developers' work easier. How about catching up on a hangout so that we can discuss it, start working on the cleaning, and push the result to the C4N data repo? Let me know what time works for you guys.

nirmalrizal avatar nirmalrizal commented on September 23, 2024

Thank you @amitness and @nikeshbalami dai for the explanation.

For now, I have uploaded all of the available data up to ward level to this repository. We can talk more in our meeting about how I can help structure the data further to ease the work for developers.

nikeshbalami avatar nikeshbalami commented on September 23, 2024

This is great, @nirmalrizal, thank you so much. I owe you a Chiya for this awesome work. Looking forward to discussing more.

amitness avatar amitness commented on September 23, 2024

I'm currently integrating the data into NepalMap. Thank you very much for helping with the scraping part, @nirmalrizal. The structure you scraped the data in was very close to what we needed. Closing this issue now.
