alexlitel / congresstweets
Datasets of the daily Twitter output of Congress.
License: MIT License
Hi,
Thanks for building this! We've a daily Airflow job that grabs the raw JSON files from this repo and ingests them into the https://www.splitgraph.com/splitgraph/congress_tweets dataset (making it importable into Postgres / queryable over a REST API/SQL).
I noticed that https://raw.githubusercontent.com/alexlitel/congresstweets/master/data/2020-09-01.json and https://raw.githubusercontent.com/alexlitel/congresstweets/master/data/2020-09-02.json are empty -- is this supposed to happen?
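For anyone running a similar ingestion job, a minimal sketch of a guard that flags empty daily files before loading them. The function name `classify_daily_file` is hypothetical, and it assumes each daily file is expected to hold a JSON array of tweet objects.

```python
import json


def classify_daily_file(raw_text):
    """Return 'empty', 'invalid', or 'ok' for the raw text of one daily file."""
    stripped = raw_text.strip()
    if not stripped:
        # Zero-byte or whitespace-only file.
        return "empty"
    try:
        data = json.loads(stripped)
    except json.JSONDecodeError:
        return "invalid"
    # An empty list (or an empty JSON string) also carries no tweets.
    if not data:
        return "empty"
    return "ok"
```

The ingestion step could skip or log any file that does not come back as `ok` rather than loading it.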
Hi,
Nice resource. I'm a bit puzzled about what the id is supposed to represent in the dataset. In many cases (I think it's the case for retweets), the id doesn't match the number in the link. That sorta makes sense, since the link is the link for the original tweet, not the retweet. But I thought that was a bit odd, since I expected the link to go to the legislator's timeline, not the timeline of the person who tweeted it.
The reason I ask is that I'm trying to hydrate the full tweets for the tweet IDs in your dataset. I just tried a bunch, and I only got about half (30,000 out of 60,000). People do delete tweets, so those wouldn't get collected during the hydration process, but that's a much higher percentage of missing tweets than I've ever seen before. I'm not entirely sure yet which ones are missing: I started looking at the IDs and the links and got confused as to how the dataset was put together.
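To narrow down which records are affected, here is a rough sketch that compares each record's id with the status number embedded in its link. It assumes each tweet is a dict with `id` and `link` fields and that the link contains a `/status/<digits>` or `/statuses/<digits>` segment; the helper names are made up for illustration.

```python
import re

# Matches the numeric status ID in a tweet URL path.
STATUS_ID = re.compile(r"/status(?:es)?/(\d+)")


def link_status_id(link):
    """Return the status ID found in a tweet link, or None if there is none."""
    match = STATUS_ID.search(link)
    return match.group(1) if match else None


def id_matches_link(tweet):
    """True when the record's id equals the status ID in its link."""
    return str(tweet.get("id")) == link_status_id(tweet.get("link") or "")
```

Counting the records where `id_matches_link` is false, and checking whether those are the ones that fail to hydrate, would show whether the mismatch explains the missing half.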
Are the tweets no longer updated?
hi @alexlitel, I was wondering why you removed 2017 data in this commit? 4b11913
Also, have you considered mirroring to a public S3 bucket if there are any concerns with data size? The cost should be minimal.
Empty string for 2020-10-11 through 2020-10-17. Looks like the scraper is not pulling information successfully.
Hello,
I am interested in analyzing only Congress members' tweets. I realized that not all the info in the metadata file (users-filtered.json) matches the tweet data.
Can you suggest a reliable way to match Congress metadata info (e.g., party affiliation) to the tweet data? Screen names are not always the same in both data sources.
Thanks!
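One possible approach, sketched under assumptions: match on a stable numeric account ID first, and fall back to a case-insensitive screen name only when no ID match exists. The field names used here (`id`, `screen_name`, and `party` on accounts; `user_id` and `screen_name` on tweets) are guesses about the schema, not confirmed, and `accounts` is assumed to be a flat list of account dicts already extracted from the metadata file.

```python
def build_lookup(accounts):
    """Index account records by numeric ID and by lowercased screen name."""
    by_id, by_name = {}, {}
    for acct in accounts:
        if acct.get("id") is not None:
            by_id[str(acct["id"])] = acct
        if acct.get("screen_name"):
            by_name[acct["screen_name"].lower()] = acct
    return by_id, by_name


def find_account(tweet, by_id, by_name):
    """Prefer the account ID; fall back to a case-insensitive screen name."""
    acct = by_id.get(str(tweet.get("user_id")))
    if acct is None:
        acct = by_name.get((tweet.get("screen_name") or "").lower())
    return acct
```

An ID-based match survives screen name changes, which may be why the names disagree between the two sources.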
I really appreciate your incredible work on this project. I have a question about the data. In the randomly drawn samples of tweets I reviewed, I noticed that all the URLs in the posts are expanded (not shortened). To double-check, could I ask whether this was done systematically during data collection, or am I only seeing selected cases? I tried to find the answer in the readme and the source code but couldn't find it. I apologize if I overlooked it and would be grateful if you could let me know.
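A quick way to check this across the whole dataset rather than a sample is to scan the tweet text for links on Twitter's t.co shortener. This is only a sketch: the function name is hypothetical, and it looks at whatever string you pass in, so which field holds the tweet text is left to the caller.

```python
import re

# Links on Twitter's own shortener domain.
SHORT_URL = re.compile(r"https?://t\.co/\w+")


def shortened_urls(text):
    """Return every t.co link found in a tweet's text."""
    return SHORT_URL.findall(text)
```

If this returns an empty list for every tweet, the URLs were probably expanded systematically; any hits would point to the exceptions.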
Hi, the first entry in the 2018-01-08.json file is a single open bracket "[". It's hard to tell whether this means some data is missing here or there was simply a parsing error. Do you know what might have caused this and whether data might be missing in that spot?
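As a workaround while loading, a small sketch that separates usable tweet objects from stray entries such as a bare "[" string or an empty object. The function name `split_entries` is made up, and it assumes a file has already been parsed into a Python list.

```python
def split_entries(entries):
    """Split parsed entries into non-empty dicts and everything else."""
    valid, skipped = [], []
    for entry in entries:
        if isinstance(entry, dict) and entry:
            valid.append(entry)
        else:
            # Stray strings, empty objects, nulls, and so on.
            skipped.append(entry)
    return valid, skipped
```

Keeping the skipped entries around, instead of silently dropping them, makes it easier to report how many records in a given file look damaged.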
Tweets after 2021-06-23 are available.
BTW, thanks to your data, we published a paper at NAACL'21: Self Promotion in US Congressional Tweets. We really appreciated the time and effort you put in to collect and make the data available.
Thanks for the repo.
I understand that code is under MIT license, is the data in public domain?
Page 2 gives an error:
https://alexlitel.github.io/congresstweets/
https://alexlitel.github.io/page2
I am getting weird json files:
[{},{},{},{},{},{}, ....
https://alexlitel.github.io/congresstweets/data/2017-09-14.json