efischer19 / twitter-archive-parser

This project forked from timhutton/twitter-archive-parser

Python code to parse a Twitter archive and output in various ways

License: GNU General Public License v3.0

Python 100.00%

twitter-archive-parser's Introduction

How do I use it?

  1. Download your Twitter archive (Settings > Your account > Download an archive of your data).
  2. Unzip to a folder.
  3. Right-click the parser.py link, select "Save Link as", and save the file into the folder where you extracted the archive.
  4. Run parser.py with Python 3, e.g. python parser.py from a command prompt opened in that folder.

If you want to download full-sized images:

  1. Right-click the download_better_images.py link, select "Save Link as", and save the file into the folder where you extracted the archive.
  2. Run download_better_images.py with Python 3, e.g. python download_better_images.py from a command prompt opened in that folder.

If you are having problems, the discussion here might be useful: https://mathstodon.xyz/@timhutton/109316834651128246

What does it do?

The Twitter archive gives you a bunch of data and an HTML file (Your archive.html). Open that file to take a look! It lets you view your tweets in a nice interface. It has some flaws, but maybe that's all you need. If so, stop here: you don't need our script.

Flaws of the Twitter archive:

  • It shows you tweets you posted with images, but if you click on one of the images to expand it, it takes you to the Twitter website. If you are offline, have deleted your account, or twitter.com is down, that won't work.
  • The tweets are stored in a complex JSON structure, so you can't just copy them into your blog, for example.
  • The images they give you are smaller than the ones you uploaded. I don't know why they would do this to us.
  • The links are all obfuscated in a short form using t.co, which hides their origin and redirects traffic to Twitter, giving them analytics. Also they will stop working if t.co goes down.
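
To make the "complex JSON structure" point concrete: the archive's data files are JavaScript rather than plain JSON. The sketch below assumes a data file consists of a single assignment such as window.YTD.tweets.part0 = [ ... ] and that each entry wraps the tweet under a "tweet" key. Both the layout and the helper name parse_archive_js are assumptions for illustration, not code taken from parser.py.

```python
import json

def parse_archive_js(text):
    """Parse the contents of an archive .js data file into Python objects.

    Assumption: the file is a JavaScript assignment whose right-hand side
    is a JSON array, e.g. 'window.YTD.tweets.part0 = [ ... ]'.
    """
    start = text.find('[')  # skip everything before the JSON array
    if start == -1:
        raise ValueError('no JSON array found in archive file')
    return json.loads(text[start:])

# Hypothetical sample mimicking the assumed file layout.
sample = 'window.YTD.tweets.part0 = [ {"tweet": {"id_str": "1", "full_text": "hello"}} ]'
entries = parse_archive_js(sample)
print(entries[0]['tweet']['full_text'])
```

Once the prefix is stripped, the rest can be handled with the standard json module, which is why a small script is enough to get at the tweets.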

Our script does the following:

  • Converts the tweets to markdown with embedded images, videos and links.
  • Replaces t.co URLs with their original versions.
  • Copies used images to an output folder, to allow them to be moved to a new home.
  • Afterwards, it asks if you want to try downloading the original-size images using download_better_images.py.
  • It then asks if you want to convert to HTML using convert_to_html.py.
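
As a rough illustration of the first two steps above, the following sketch replaces t.co links with their original URLs and emits a small markdown snippet. It assumes each tweet carries an entities.urls list whose items have url (the t.co form) and expanded_url fields, plus full_text and created_at. The function names and the sample data are hypothetical, and the real parser.py may work differently.

```python
def expand_tco_links(tweet):
    """Return the tweet text with each t.co URL replaced by its expanded URL."""
    text = tweet['full_text']
    for url in tweet.get('entities', {}).get('urls', []):
        # Only substitute when both the short and the expanded form are present.
        if 'url' in url and 'expanded_url' in url:
            text = text.replace(url['url'], url['expanded_url'])
    return text

def tweet_to_markdown(tweet):
    """Format one tweet as markdown: the expanded text, then the timestamp in italics."""
    return f"{expand_tco_links(tweet)}\n\n*{tweet['created_at']}*\n"

# Hypothetical tweet in the assumed structure.
sample_tweet = {
    'full_text': 'Read this: https://t.co/abc123',
    'created_at': 'Sat Nov 05 12:00:00 +0000 2022',
    'entities': {
        'urls': [
            {'url': 'https://t.co/abc123',
             'expanded_url': 'https://example.com/article'},
        ],
    },
}

print(tweet_to_markdown(sample_tweet))
```

Because the substitution relies only on data already stored in the archive, it works offline and does not depend on t.co staying up.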

TODO:

  • Parse likes and DMs too (Issues #22 and #6)

Related tools:

If our script doesn't do what you want then maybe a different tool will help:

twitter-archive-parser's People

Contributors

timhutton, twoscomplement, sweh, andrewbaker-uk, granthenninger, masukomi, achisto, rossgrady, svisser, miniupnp, clayote
