pypi / inspector

🕵️ File browser for distributions on PyPI

Home Page: https://inspector.pypi.io

License: Apache License 2.0

Dockerfile 4.36% Makefile 2.82% Procfile 0.14% Python 54.37% HTML 16.96% Shell 2.07% CSS 19.28%

inspector's People

Contributors

angelod2022, dependabot[bot], di, ewdurbin, hugovk, import-pandas-as-numpy, miketheman, saip007

inspector's Issues

Don't 404 when package has been removed

Currently we depend on PyPI's JSON API to render a given project/release page. When that project/release has been removed, inspector returns a 404 as well.

Ideally, we'd still be able to display all data for removed releases, but this isn't currently possible.

May be blocked on #5.

Feature: support inspection of test.pypi.org packages

Inspector should also be able to examine packages that have been uploaded to test.pypi.org.

Some folks may only upload their packages to the test server, and ask users over chat to install from test.pypi.org with specific pip commands.

Currently inspector will show a 404 for any package on test.pypi.org as it only supports retrieval from the production index.

resp = requests.get(f"https://pypi.org/pypi/{project_name}/json")

I’m not sure if this should manifest as a separate test-inspector instance and differ via config, or if there’s another way we should support test retrieval like a specific header/query string.
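One way to picture the config-driven option is a small lookup from an index name to its host, used when building the JSON API URL. This is only a sketch: the `INDEX_HOSTS` mapping, the `"pypi"` / `"testpypi"` keys, and the `json_api_url` helper are hypothetical names, not anything that exists in inspector today.

```python
# Hypothetical mapping from an index name to its host; the keys are
# illustrative and could just as well come from an environment variable.
INDEX_HOSTS = {
    "pypi": "https://pypi.org",
    "testpypi": "https://test.pypi.org",
}


def json_api_url(project_name: str, index: str = "pypi") -> str:
    """Build the JSON API URL for a project on the selected index."""
    try:
        host = INDEX_HOSTS[index]
    except KeyError:
        raise ValueError(f"unknown index: {index!r}") from None
    return f"{host}/pypi/{project_name}/json"
```

A separate test-inspector instance would then only need a different default for `index`, while a query string or header approach would pass it per request.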

Update Application to Single Page Application (SPA)

As the project evolves, it's crucial to enhance the user experience by transforming the application into a Single Page Application (SPA). Currently, the application relies on traditional page navigation, which can result in longer loading times and a less fluid user interface. By converting it into a SPA, we can improve the overall performance and provide a more seamless browsing experience.

The primary goals of this task are as follows:

Implement a client-side routing mechanism: Integrate a Python-based library or framework (e.g., Flask, Django, or FastAPI) that enables client-side routing. This will allow us to handle navigation within the application without full page reloads.

Refactor the existing codebase: Modify the application's architecture to support the SPA model. This involves breaking down the user interface into modular components that can be dynamically loaded and rendered as needed.

Implement asynchronous data retrieval: Utilize AJAX or similar techniques to retrieve data from the server asynchronously, without requiring full page reloads. This will enable smoother transitions and improve overall performance.

Enhance user experience: Implement visual indicators or loading spinners to provide feedback during data fetching or navigation transitions. This will help users understand that the application is still actively processing their requests.

By transforming our application into a SPA, we can significantly enhance the user experience, reduce loading times, and create a more modern and responsive web application.

Feel free to add any additional ideas, suggestions, or insights to further improve this transition. Let's collaborate and work towards making our application a more efficient and user-friendly SPA!

Please note: This task may require refactoring and modifications to the existing codebase. Let's discuss the implementation strategy and any potential challenges together.

Shorter URLs

Currently URLs look something like this:

https://inspector.pypi.io/project/pip/22.1/packages/f3/77/23152f90de45957b59591c34dcb39b78194eb67d088d4f8799e9aa9726c4/pip-22.1-py3-none-any.whl/pip/_internal/models/format_control.py

That's pretty long! Obviously this is done because that gives enough information to fetch the URL from files.pythonhosted.org, but it might be nice to use shorter URLs, and query PyPI to get the long URL for the file distribution?

We could go as simple as:

https://inspector.pypi.io/file/pip/pip-22.1-py3-none-any.whl/pip/_internal/models/format_control.py

That's enough information to know the project name (since sdists don't have a well formed name) and the filename (which we can then look up on PyPI's /simple/<project>/ page), and get the long URL.

We could even go a bit simpler, and do:

https://inspector.pypi.io/file/pip-22.1-py3-none-any.whl/pip/_internal/models/format_control.py

Note that all this embeds is the filename, so we would need a way to look up the URL given nothing but the filename. Filenames are unique on PyPI, though, so we could just have a route on PyPI that redirects a filename to pythonhosted.org and does that lookup for us.

The main thing we'd lose is that these links would then "die" if the file is removed from PyPI but still exists in files.pythonhosted.org. Maybe with #5 we could store the filename => file url mapping as we load them, which would mean they would continue to work in the future.

Alternatively, maybe still support the long URLs, and have a button to turn the short URL into a permalink (think how GitHub does).

Alternatively, maybe this is a silly idea and we should just stick with the long URLs :)
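For the project-plus-filename variant, the lookup could be sketched roughly as below. It assumes the project's /simple/<project>/ page has already been fetched in its JSON form (PEP 691, requested with `Accept: application/vnd.pypi.simple.v1+json`) and parsed into a dict; `find_file_url` is a hypothetical helper name.

```python
from typing import Optional


def find_file_url(simple_index: dict, filename: str) -> Optional[str]:
    """Return the long download URL for `filename`, or None if it is not listed.

    `simple_index` is assumed to be the parsed JSON body of a PEP 691
    project page, which carries a "files" list of {"filename", "url", ...}.
    """
    for entry in simple_index.get("files", []):
        if entry.get("filename") == filename:
            return entry.get("url")
    return None
```

A `None` result is the "dead link" case described above, where the file no longer appears in the index.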

Serve 404 from Inspector instead of pypi.org

When using Inspector in an iframe, if the package isn't found, a 404 from pypi.org is served.

This makes setting frame-src directives in a content security policy longer, since now it has to allow two domains, instead of serving the 404 directly.

In or around here:

if resp.status_code != 200:
    return redirect(pypi_project_url, 307)

Add Search Button to Each Page for Easy Package Navigation

This proposal suggests a set of enhancements to the application's user interface (UI), including the addition of a search button to each page and transitioning to a Single Page Application (SPA) architecture. Additionally, it is proposed to enable seamless navigation between files and versions within the application.

Proposed Enhancements

UI Improvements

Refine the UI to enhance user-friendliness, efficiency, and intuitiveness. This includes improving the layout, styling, and responsiveness of the application across different devices.

Search Button on Each Page

Add a search button prominently to each page to simplify content navigation. This feature will allow users to quickly search for specific information within the application.

Seamless Navigation between Files/Versions

Implement a navigation mechanism that enables users to switch between different files and versions without the need to go back to the previous tag. This feature will streamline the browsing experience and provide quick access to the desired content.

Transition to Single Page Application (SPA)

Restructure the application's architecture to adopt a Single Page Application (SPA) approach. This transition will eliminate page refreshes, resulting in a faster and more seamless browsing experience.

Expected Benefits

  • Improved user experience: The UI enhancements will make the application more visually appealing and user-friendly.
  • Enhanced navigation: The addition of a search button on each page and seamless navigation between files/versions will improve efficiency in finding and accessing desired content.
  • Seamless browsing experience: Transitioning to a SPA architecture will eliminate page reloads and provide a smoother user experience.

Please provide any additional information or specific requirements you may have regarding the proposed enhancements.

Code that requires horizontal scrolling can easily be missed.

I encountered this package that appeared like this in my browser:
[screenshot: package-no-wrap]
Because this was on macOS, there was no horizontal scrollbar indicating there was text further to the right.

I added white-space: pre-wrap to the code block and this is what I found:
[screenshot: package-wrapping-on]

This solution messes with the line numbers, but it made it obvious where the malicious code was.

Ability to search files by sha256

This will be a generic method of reporting without meta information about the project, paths, etc.

This will be handy for some researchers and for automation purposes.
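As a rough illustration of the digest side of this, here is a sketch that hashes a file-like object in chunks, so the lookup key can be computed without holding the whole file in memory. The `sha256_of_stream` name and the chunk size are assumptions for the example.

```python
import hashlib
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 64 * 1024) -> str:
    """Return the hex SHA-256 digest of everything readable from `stream`."""
    digest = hashlib.sha256()
    # read() returns b"" at end of stream, which stops the iteration.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex string could then serve as the search key, independent of project name or path.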

Cert error while trying to access https://inspector.pypi.io/

Problem: The current certificate on inspector.pypi.io is invalid. Because the site uses HSTS, you cannot add an exception to bypass the error in Chrome, Edge, or Firefox without disabling HSTS, which is very insecure.

Firefox:

Websites prove their identity via certificates. Firefox does not trust this site because it uses a
certificate that is not valid for inspector.pypi.io. The certificate is only valid for the following
names: *.ingress.cmh1.psfhosted.org, test.pypi.org, upload.pypi.org, *.cmh1.psfhosted.com,
*.pyfound.org, *.ingress.cmh1.psfhosted.com, *.cmh1.psfhosted.org

feature: automatically identify code removed previously for being malicious.

A couple ideas for approaching this (just spitballing, possibly better solutions exist as well):

  • taking a cryptographic hash of a file (language agnostic but inflexible to minor code changes)
  • computing a locality-sensitive hash of the malicious file using opcode disassembly or AST features (python-specific)
    • the similarity of another file to a known malicious file could be measured as the Levenshtein distance between the two files' hashes.

This would obviously require a database of some sort (and committing thereto malicious file hashes in response to reports).
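To make the two ideas a little more concrete, here is a small sketch. Note that it swaps in an exact SHA-256 over the parsed AST rather than a true locality-sensitive hash: that tolerates comment and whitespace changes but not real code changes. The Levenshtein function is the standard dynamic-programming edit distance. Both function names are hypothetical.

```python
import ast
import hashlib


def ast_fingerprint(source: str) -> str:
    """Hash the AST dump of Python source, ignoring comments and layout.

    ast.dump() omits line/column attributes by default, so two files that
    differ only in formatting or comments produce the same fingerprint.
    """
    tree = ast.parse(source)
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]
```

A Levenshtein comparison is only meaningful on hashes designed so that similar inputs give similar outputs; applied to SHA-256 digests like the ones above it would say nothing about similarity.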

Inspector "Project Removed" Indicator Can Be Inaccurate

REF: #110

Problem: Inspector can serve a 'Project Removed' response when a package has not yet been removed.

Background: In our experience, when a package is uploaded it can take a moment for PyPI to serve the package's page, while Inspector is able to serve the contents of the files almost immediately.

Steps to Reproduce:

  1. Identify a recently uploaded package.
  2. Visit the inspector link of said package prior to the content being served on PyPI.

Example:
We were alerted to pipcryptov2 at 2:46PM.
I visited the Inspector URL to confirm malicious content. I was met with a package removed notification.
The PyPI page initially 404'd, but refreshing it moments later provided the appropriate webpage, and the package had not yet been removed.

Discussion: I understand this is probably a transient issue and likely not impactful to the service as a whole, as very few people visit inspector in the window between a package being uploaded and the PyPI content being served. Given that we tend to respond within ~60 seconds of receiving notification of a package upload, this will likely only affect our service and similar services, so on our end we can tell our team that this indicator should be ignored unless they are responding to a package significantly after the fact.

Don't load entire distribution into memory

Currently this fetches the distribution from PyPI into a BytesIO object, after doing a requests.get() call (not streaming).

That means that while we're inside of _get_dist, we'll currently be using 2x the file size of the distribution worth of extra RAM, and outside of it we'll be using 1x the file size of extra RAM.

This should probably buffer to a temporary file and use streaming requests so that a large distribution doesn't kill us on memory.

This might just be #5 but I wanted to call it out explicitly since this applies even if we're storing the files somewhere.
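A minimal sketch of the buffering side, assuming the download is consumed as an iterable of byte chunks (for example `resp.iter_content(chunk_size=...)` on a `requests.get(url, stream=True)` response). `spool_to_tempfile` and the 1 MiB threshold are illustrative choices, not existing inspector code.

```python
import tempfile
from typing import IO, Iterable


def spool_to_tempfile(chunks: Iterable[bytes], max_in_memory: int = 1024 * 1024) -> IO[bytes]:
    """Write chunks to a spooled temporary file and rewind it for reading.

    Data stays in memory up to `max_in_memory` bytes and rolls over to a
    real temporary file on disk beyond that, so peak RAM stays bounded.
    """
    spooled = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            spooled.write(chunk)
    spooled.seek(0)
    return spooled
```

The returned object is seekable, so it can be handed to `zipfile.ZipFile` or `tarfile.open(fileobj=...)` in place of a BytesIO.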

Set up CDN

The following routes will ~never change once a distribution is published:

  • /project/<project_name>/<version>/packages/<first>/<second>/<rest>/<distname>/
  • /project/<project_name>/<version>/packages/<first>/<second>/<rest>/<distname>/<path:filepath>

We should put these behind a CDN with a very long-lived expiry.
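As a sketch of how the application could signal this to a CDN, the helper below returns a long-lived `Cache-Control` header for paths matching the route shape above and nothing for everything else. The `cache_headers` name and the one-year `max-age` are assumptions; the actual expiry would be a deployment decision.

```python
from typing import Dict


def cache_headers(path: str) -> Dict[str, str]:
    """Return caching headers for a request path.

    Matches /project/<name>/<version>/packages/<first>/<second>/<rest>/<distname>/...
    which needs at least 8 path segments with fixed words at positions 0 and 3.
    """
    parts = path.strip("/").split("/")
    if len(parts) >= 8 and parts[0] == "project" and parts[3] == "packages":
        # One year, and marked immutable so clients skip revalidation.
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    return {}
```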

Method for selecting multiple lines

Right now this requires manually editing the anchor in the URL from something like line.1 to line.1-20.

Ideally this would be similar to GitHub (click & drag to select multiple lines) but I don't think the JS framework we're using supports that currently.
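The anchor format itself is simple enough to pin down in a few lines. The sketch below parses `line.N` and `line.N-M` into an inclusive range. It is written in Python only to illustrate the format (the real selection logic would live in the page's JavaScript), and `parse_line_anchor` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Accepts "line.12" or "line.12-20", with an optional leading "#".
_ANCHOR = re.compile(r"^#?line\.(\d+)(?:-(\d+))?$")


def parse_line_anchor(anchor: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) inclusive, or None if the anchor is not a line anchor."""
    match = _ANCHOR.match(anchor)
    if match is None:
        return None
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    if end < start:
        # Tolerate a reversed selection such as "line.20-1".
        start, end = end, start
    return start, end
```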

Bug: Line numbers above 9999 are wrapping improperly

Issue: Line numbers past 9999 wrap in their column element.

https://inspector.pypi.io/project/rp/0.1.914/packages/07/67/ceeb07d5b8165c270e729f6fb950061b7afea5283ecb546b6e1bed915ea8/rp-0.1.914.tar.gz/rp-0.1.914/rp/r.py#line.10250

This was missed behavior in the line wrapping change (#146). I'm aware of this, bringing it to your attention while I stumble my way through a fix here locally. Line linking still behaves correctly, just looks a bit strange graphically.

Provide a file tree

Currently this has just a flat listing of files in directories:

  • a/b/c/foo.txt
  • a/b/c/bar.txt

We should make this a tree instead:

  • a
    • b
      • c
        • foo.txt
        • bar.txt
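One possible way to get from the flat listing to the tree is to fold each path into nested dicts and then render them with indentation. This is a sketch with hypothetical helper names (`build_tree`, `render_tree`); files are represented as empty dicts for simplicity.

```python
from typing import Dict, Iterable, List


def build_tree(paths: Iterable[str]) -> Dict[str, dict]:
    """Fold 'a/b/c/foo.txt'-style paths into nested dicts keyed by path segment."""
    tree: Dict[str, dict] = {}
    for path in paths:
        node = tree
        for part in path.split("/"):
            node = node.setdefault(part, {})
    return tree


def render_tree(tree: Dict[str, dict], indent: int = 0) -> List[str]:
    """Return one line per entry, indented two spaces per nesting level."""
    lines: List[str] = []
    for name, children in tree.items():
        lines.append("  " * indent + name)
        lines.extend(render_tree(children, indent + 1))
    return lines
```

Because dicts preserve insertion order, entries appear in the order they were first seen in the archive listing.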

Figure out a datastore

Currently this loads all files into memory, and does some rudimentary in-process caching of files.

Ideally this would be replaced with something that would perform slightly better, without having to hold all of PyPI in memory forever. Something like Redis with a 24 hour timeout.

Any potential solution should probably not store files on disk.
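To illustrate just the expiry semantics being asked for, here is an in-process stand-in whose entries disappear after a fixed time-to-live. It is not a proposal to keep caching in memory: with Redis the same behaviour would come from setting keys with an expiry. The `ExpiringCache` class and its injectable clock are assumptions made so the sketch is easy to test.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringCache:
    """Key/value cache whose entries expire `ttl_seconds` after being set."""

    def __init__(
        self,
        ttl_seconds: float = 24 * 60 * 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def set(self, key: str, value: bytes) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Lazily evict on read once the deadline has passed.
            del self._entries[key]
            return None
        return value
```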

Show file sizes in index

When browsing an index of a package, it's helpful to see file sizes as well.

I don't know if that data is available in the contexts yet, but wanted to file this while it was in my head.
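For wheels, which are zip archives, the uncompressed size of each member is available from the archive metadata without extracting anything. The sketch below shows that, plus a small formatter. Both helper names are hypothetical, and sdists would need the `tarfile` equivalent (`TarInfo.size`).

```python
import io
import zipfile
from typing import List, Tuple


def list_file_sizes(archive_bytes: bytes) -> List[Tuple[str, int]]:
    """Return (filename, uncompressed size in bytes) for each file in a zip."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()
        ]


def human_size(num_bytes: int) -> str:
    """Format a byte count using 1024-based units, capped at GB."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024 or unit == "GB":
            return f"{size:.0f} {unit}" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GB"  # unreachable; keeps type checkers satisfied
```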

IPv6 Inspector

Can we add AAAA records please :). Happy to help if I can.

We only have legacy IP offered today:

[cooper@work ~]$ host inspector.pypi.io
inspector.pypi.io is an alias for inspector.cmh1.psfhosted.org.
inspector.cmh1.psfhosted.org has address 13.58.193.163
inspector.cmh1.psfhosted.org has address 18.217.27.127
inspector.cmh1.psfhosted.org has address 3.13.211.4

Support disassembling `.pyc` files

We've seen some instances of malware being hidden in .pyc files (example here). Currently this tool refuses to display .pyc files because they are binary. Instead, we should attempt to disassemble the bytecode to some degree and display as much as possible in the UI.
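A very rough sketch of what "to some degree" could look like with only the standard library is below. It has two big assumptions. First, `marshal` can only load bytecode produced by the same Python version as the running interpreter, so this handles just that case. Second, the `marshal` documentation warns that it is not safe against maliciously constructed data, so anything like this would have to run in a sandbox. The `disassemble_pyc` name is hypothetical.

```python
import dis
import importlib.util
import io
import marshal

# Python 3.7+ .pyc layout (PEP 552): 4-byte magic, 4-byte flags,
# then 8 bytes holding either mtime + source size or a source hash.
PYC_HEADER_SIZE = 16


def disassemble_pyc(pyc_bytes: bytes) -> str:
    """Return a textual disassembly of a .pyc built by this Python version."""
    if pyc_bytes[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("bytecode was compiled by a different Python version")
    # WARNING: marshal.loads is unsafe on untrusted input; sandbox this.
    code = marshal.loads(pyc_bytes[PYC_HEADER_SIZE:])
    out = io.StringIO()
    dis.dis(code, file=out)  # also recurses into nested code objects
    return out.getvalue()
```

Files compiled by other Python versions would need a version-aware bytecode parser, which is beyond this sketch.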
