chrome-extensions-archive's Introduction

Chrome Extensions Archive: no updates since Feb 4, 2019

In maintenance: the disk is full (2 TB).

The goal is to provide a complete archive of the Chrome Web Store, with version history.

You can see the current status of what's archived and download the files here: dam.io/chrome-extensions-archive/

Installing the extensions

To install an extension, go to chrome://extensions/ and drop the file onto the page.

To avoid auto-updates, unzip the file and load it as an unpacked extension.

Files are named .zip, but they are the exact same .crx files served by the store.
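
Because the archived files are .crx files under a .zip name, one way to get a folder that can be loaded as an unpacked extension is to skip the CRX header and extract the ZIP payload. The sketch below is not part of this repository: `crx_zip_offset` and `unpack_crx` are hypothetical names, and it assumes the usual CRX2/CRX3 header layout (magic `Cr24`, a little-endian version number, then length fields).

```python
import io
import struct
import zipfile


def crx_zip_offset(data):
    """Return the offset at which the ZIP payload starts inside a CRX file."""
    if data[:4] != b"Cr24":
        # No CRX magic: assume the file is already a plain ZIP archive.
        return 0
    (version,) = struct.unpack_from("<I", data, 4)
    if version == 2:
        # CRX2: public key length and signature length follow the version.
        pubkey_len, sig_len = struct.unpack_from("<II", data, 8)
        return 16 + pubkey_len + sig_len
    if version == 3:
        # CRX3: a single header length follows the version.
        (header_len,) = struct.unpack_from("<I", data, 8)
        return 12 + header_len
    raise ValueError("unsupported CRX version: {}".format(version))


def unpack_crx(path, dest_dir):
    """Extract the ZIP payload of a downloaded file into dest_dir."""
    with open(path, "rb") as f:
        data = f.read()
    payload = data[crx_zip_offset(data):]
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        zf.extractall(dest_dir)
```

The resulting folder is what you would point "Load unpacked" at in chrome://extensions/.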

Running the scripts

The scripts require Python 3.5+.

Install dependencies: pip3 install -r req.txt

Create some folders and initialize some files:

mkdir data
mkdir crawled
mkdir crawled/sitemap
mkdir crawled/pages
mkdir crawled/crx
mkdir crawled/tmp
mkdir ../site
mkdir ../site/chrome-extensions-archive
mkdir ../site/chrome-extensions-archive/ext
echo "{}" > data/not_in_sitemap.json
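
The same setup can be done from Python, which may be convenient on systems without a POSIX shell. This is a minimal sketch, not a script shipped with the project: `init_workspace` and `FOLDERS` are hypothetical names, and unlike the `echo` command above it leaves an existing data/not_in_sitemap.json untouched.

```python
from pathlib import Path

# Leaf folders from the list above; parents are created automatically.
FOLDERS = [
    "data",
    "crawled/sitemap",
    "crawled/pages",
    "crawled/crx",
    "crawled/tmp",
    "../site/chrome-extensions-archive/ext",
]


def init_workspace(base="."):
    """Create the expected folder layout relative to the scripts folder."""
    base = Path(base)
    for folder in FOLDERS:
        (base / folder).mkdir(parents=True, exist_ok=True)
    seed = base / "data" / "not_in_sitemap.json"
    if not seed.exists():
        # Start with an empty JSON object, as the echo command does.
        seed.write_text("{}\n")


if __name__ == "__main__":
    init_workspace()
```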

Crawling:

  • crawl_sitemap.py: writes the list of all extensions to data/sitemap.json
  • crawl_crx.py: uses data/sitemap.json to download the .crx files
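
To illustrate the sitemap step, here is a rough sketch of pulling extension IDs out of a sitemap document. It is not the code in crawl_sitemap.py: it assumes the sitemap follows the sitemaps.org XML schema and that each extension URL contains the 32-letter extension ID; `sitemap_locations` and `extension_ids` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org schema.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Extension IDs are 32 characters drawn from the letters a to p.
EXT_ID = re.compile(r"(?<![a-z])[a-p]{32}(?![a-z])")


def sitemap_locations(xml_text):
    """Return every <loc> URL found in a sitemap or sitemap index."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def extension_ids(urls):
    """Return the extension IDs found in the URLs, in order, without duplicates."""
    seen = []
    for url in urls:
        match = EXT_ID.search(url)
        if match and match.group(0) not in seen:
            seen.append(match.group(0))
    return seen
```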

Site & stats:

  • scan_pages_history_to_big_list.py: builds data/PAGES.json by scanning the pages you crawled
  • crx_stats.py: builds data/crx_stats.json (what's currently stored)
  • make_site.py: uses data/crx_stats.json + data/PAGES.json to generate the site
  • make_json_site.py: uses data/crx_stats.json + data/PAGES.json to generate JSON

Then I serve the files directly with nginx (see the nginx.conf file for an example).

Helping out

I have a few things in mind for the future:

  • diff of extension versions as a web interface
  • malware/adware analysis
  • running an alternative web store (better search, Firefox support, ...)

Don't hesitate to reach out (here on issues, [email protected] or @dam_io on twitter)

To propose changes, just do a PR.

chrome-extensions-archive's Issues

License

Can you include a LICENSE file?

Searching through the source code

Could you index extensions by manifest keys (e.g. permissions / optional_permissions, content_scripts), and add the ability to search through the source code of extensions that match the query?

For example, I'd like to know how many extensions use the webRequest API in a blocking fashion (=webRequest + webRequestBlocking permissions) and block "ping" requests (to make informed decisions on https://crbug.com/611453).
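
As a rough illustration of the manifest part of this request, the sketch below checks whether one extension declares both permissions. It is only an assumption of how such an index could start: `manifest_permissions` and `uses_blocking_web_request` are hypothetical names, the input is assumed to be an archive that Python's zipfile can open, and the manifest is assumed to be plain JSON without comments. Detecting blocked "ping" requests would still need a search through the source files.

```python
import io
import json
import zipfile


def manifest_permissions(zip_bytes):
    """Return the permission strings declared in manifest.json."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # utf-8-sig tolerates a leading byte order mark.
        manifest = json.loads(zf.read("manifest.json").decode("utf-8-sig"))
    declared = manifest.get("permissions", []) + manifest.get("optional_permissions", [])
    # Keep strings only; ignore any non-string entries.
    return {p for p in declared if isinstance(p, str)}


def uses_blocking_web_request(zip_bytes):
    """True if both webRequest and webRequestBlocking are declared."""
    return {"webRequest", "webRequestBlocking"} <= manifest_permissions(zip_bytes)
```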

(view source) manifest.json has no (WebStore) key

The main .zip (.crx) downloads include the Web Store key.
The manifest.json in the "view source" downloads does not. I've been looking at:
gmlllbghnfkpflemihljekbapjopfjik (Bookmark Manager (by Google))
meoeeoaohbmgbocpdpnjklmfmjjagkkf (Save To Google)
Is this intentional?

If I load the source as an unpacked extension, it generates a new key and I end up with two versions: one from the webstore and my unpacked version. After updating the downloaded source manifest with the original key, the unpacked version replaces the webstore instance, which is what I had expected.
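
For anyone repeating this workaround, a small sketch of the two steps involved: writing the original key into the unpacked manifest, and checking which extension ID a key maps to. The ID derivation (the first 16 bytes of the SHA-256 of the DER-encoded public key, with each hex digit mapped to a letter from a to p) is how Chrome computes IDs as far as I know; `extension_id_from_key` and `set_manifest_key` are hypothetical helper names, and the manifest is assumed to be plain JSON.

```python
import base64
import hashlib
import json


def extension_id_from_key(key_b64):
    """Derive the 32-letter extension ID from a base64 manifest "key" value."""
    der = base64.b64decode(key_b64)
    digest = hashlib.sha256(der).hexdigest()[:32]  # first 16 bytes as hex
    # Map each hex digit 0..f to a letter a..p.
    return "".join(chr(ord("a") + int(c, 16)) for c in digest)


def set_manifest_key(manifest_path, key_b64):
    """Write the given key into an unpacked extension's manifest.json."""
    with open(manifest_path, encoding="utf-8") as f:
        manifest = json.load(f)
    manifest["key"] = key_b64
    with open(manifest_path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2)
```

If `extension_id_from_key` returns the Web Store ID of the extension, the unpacked copy should take the place of the store copy rather than sit next to it.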

This is a great archive. I'm using the unpacked versions to recover Google extensions that have been discontinued, like the two above, discontinued 8/15/18.

Curious about hosting costs

Hey this is a pretty neat service.

I was just wondering roughly how much it costs per month to host this (since I see you don't have any ads).

I'm guessing since you only store top 20k, extension sizes are small, and updates can be stored as deltas the physical space used won't be too much, maybe on the order of 100 Gigs or so.

Also, since the built website is static, the main cost will be from scraping and processing extension pages.


Also, how were you able to discover the sitemap url for the chrome webstore? I thought about doing a similar service a few years back but gave up when I thought there was no way to programmatically get a list of extensions without using a full headless browser instance.

(Incidentally, doesn't google rate limit requests? How were you able to work around that?)

popup width problem

translator

I don't know what happened, but the pop-up window changed and is now too small to be useful. I tried resizing it manually, but only the height changes.
Is there a way to fix it? Maybe with a CSS file?
Thanks for helping.

Contributing

You should add a contribute section to the readme so people know how to contribute to this project.

I recommend you also add a requirements.txt file which would contain all the dependencies for this project and a gitter page where people can talk about this project.

search bar

there needs to be a search bar please

For all extensions on the web?

Is this only for archiving "your" extension or does it work for all extensions across the web?

How do you manage storage then?

Add removal request

Need to find a way to transparently filter out extensions whose authors requested removal, and to publish the removal request log.

  • transparency log: which extension (and the reason)
  • add a "request removal" button.
