Comments (4)
I am interested in taking this issue. I will split it into two parts, mirroring the Google POI pipelines:
- Scraper API
- The actual pipelines
I am currently still working on the Instagram scraper and exploring what's possible.
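Here is a minimal sketch of that two-part split. It assumes the scraper hands off to the pipeline through a raw JSON-lines file. `scrape_posts`, `save_raw`, and `load_raw` are hypothetical names. The scraper is a stub that returns one made-up post, not real Instagram data:

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


# Part 1: scraper API (hypothetical stub). A real implementation would fetch
# public posts by hashtag, location, or user; this one yields a fixed record.
def scrape_posts(hashtag: str) -> Iterator[dict]:
    """Yield raw post records exactly as the scraper returns them."""
    yield {"id": "1", "hashtag": hashtag, "location": {"lat": 52.52, "lng": 13.405}}


def save_raw(records: Iterable[dict], path: Path) -> int:
    """Write records unchanged, one JSON object per line; return the count."""
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
            count += 1
    return count


# Part 2: pipeline. It only reads the raw file, so it does not depend on
# how the scraper obtained the data.
def load_raw(path: Path) -> list[dict]:
    """Read the JSON-lines file back into a list of dicts, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Usage would be `save_raw(scrape_posts("berlin"), Path("instagram_raw.jsonl"))` followed by `load_raw(Path("instagram_raw.jsonl"))`. Because the handoff goes through a file, the pipeline can be rerun without scraping again.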
from kuwala.
It should be possible to scrape public Instagram posts using hashtags, locations, and (public) users. This article provides some insights and ideas: https://blog.apify.com/scrape-instagram-posts-comments-and-more-21d05506aeb3/
Since @bmahmoudyan can't continue working on this issue, it's up for grabs again. :)
from kuwala.
The requirements for a new pipeline are the following:
- Saving the raw data (in this case the scraping results) to a file as-is
- Converting at least the lat/lng coordinates to H3 indexes and flattening all nested properties into a column/table format with meaningful variable names (since we are switching to Postgres and dbt for transformations)
- Ideally saving the results in Parquet format, since it's much more storage-efficient and optimized for parallel processing (it takes just one command with PySpark)
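The flattening requirement could look roughly like this in plain Python. The thread does not settle the scraper's output format, so the raw key names (`location.lat`, `owner.username`, ...) and the `COLUMN_NAMES` mapping are hypothetical:

```python
from typing import Any

# Hypothetical mapping from flattened scraper keys to meaningful column names;
# keys not listed here keep their flattened name.
COLUMN_NAMES = {
    "id": "post_id",
    "location_lat": "latitude",
    "location_lng": "longitude",
}


def flatten(record: dict[str, Any], parent_key: str = "", sep: str = "_") -> dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with `sep`.

    Lists are kept as-is; they would likely need their own table
    (e.g. one row per comment) rather than being spread across columns.
    """
    flat: dict[str, Any] = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


def to_row(raw_post: dict[str, Any]) -> dict[str, Any]:
    """Flatten one raw post and rename its columns."""
    return {COLUMN_NAMES.get(k, k): v for k, v in flatten(raw_post).items()}
```

The H3 and Parquet steps could then be added on top. For H3, h3-py is one option, assuming version 4, where `h3.latlng_to_cell(lat, lng, resolution)` returns the cell index (version 3 named it `geo_to_h3`). For Parquet, PySpark's `DataFrame.write.parquet(path)` is the single command mentioned above.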
from kuwala.
PR for this issue #74
from kuwala.
Related Issues (20)
- Dockerizing the CLI
- Make the Robyn R demo executable with Python
- Snowflake connector
- create logging subsystem with access via web interface
- population density downloading; attributes besides 'total' breaks when creating parquet files
- OSM-POI: processing step fails
- Data Block creation with snowflake connector
- Why don't you see the data source MySQL?
- Data block from data warehouses
- Group: Transformation block from transformation catalog
- Transformation specifications
- Transformation catalog
- Transformation block on canvas
- Save project state
- Save result data set as CSV
- Visualization block for geospatial data
- Export result data set to Google Sheets
- Transformations to prepare data for MMM
- Model block for MMM
- Report block for MMM
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
π Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. πππ
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google β€οΈ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from kuwala.