pypi / inspector
🕵️ File browser for distributions on PyPI
Home Page: https://inspector.pypi.io
License: Apache License 2.0
Currently we depend on PyPI's JSON API to render a given project/release page. When that project/release has been removed, inspector returns a 404 as well.
Ideally, we'd still be able to display all data for removed releases, but this isn't currently possible.
May be blocked on #5.
It would be useful to be able to examine packages that have been uploaded to test.pypi.org as well.
Some folks may only upload their packages to the test server, and ask users to install from test.pypi.org via chats with specific pip commands.
Currently inspector will show a 404 for any package on test.pypi.org as it only supports retrieval from the production index.
Line 57 in cfd7b09
I’m not sure if this should manifest as a separate test-inspector instance and differ via config, or if there’s another way we should support test retrieval like a specific header/query string.
As the project evolves, it's crucial to enhance the user experience by transforming the application into a Single Page Application (SPA). Currently, the application relies on traditional page navigation, which can result in longer loading times and a less fluid user interface. By converting it into a SPA, we can improve the overall performance and provide a more seamless browsing experience.
The primary goals of this task are as follows:
Implement a client-side routing mechanism: Integrate a Python-based library or framework (e.g., Flask, Django, or FastAPI) that enables client-side routing. This will allow us to handle navigation within the application without full page reloads.
Refactor the existing codebase: Modify the application's architecture to support the SPA model. This involves breaking down the user interface into modular components that can be dynamically loaded and rendered as needed.
Implement asynchronous data retrieval: Utilize AJAX or similar techniques to retrieve data from the server asynchronously, without requiring full page reloads. This will enable smoother transitions and improve overall performance.
Enhance user experience: Implement visual indicators or loading spinners to provide feedback during data fetching or navigation transitions. This will help users understand that the application is still actively processing their requests.
By transforming our application into a SPA, we can significantly enhance the user experience, reduce loading times, and create a more modern and responsive web application.
Feel free to add any additional ideas, suggestions, or insights to further improve this transition. Let's collaborate and work towards making our application a more efficient and user-friendly SPA!
Please note: This task may require refactoring and modifications to the existing codebase. Let's discuss the implementation strategy and any potential challenges together.
Let me know if you need any further adjustments or information in the issue description!
Currently URLs look something like this:
https://inspector.pypi.io/project/pip/22.1/packages/f3/77/23152f90de45957b59591c34dcb39b78194eb67d088d4f8799e9aa9726c4/pip-22.1-py3-none-any.whl/pip/_internal/models/format_control.py
That's pretty long! Obviously this is done because that gives enough information to fetch the URL from files.pythonhosted.org, but it might be nice to use shorter URLs, and query PyPI to get the long URL for the file distribution?
We could go as simple as:
https://inspector.pypi.io/file/pip/pip-22.1-py3-none-any.whl/pip/_internal/models/format_control.py
That's enough information to know the project name (since sdists don't have a well formed name) and the filename (which we can then look up on PyPI's /simple/<project>/ page), and get the long URL.
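A minimal sketch of that lookup, assuming the project page is fetched in the JSON form of the simple API (PEP 691), where each entry under "files" carries a "filename" and a "url". The helper name find_file_url and the trimmed sample response are hypothetical; only the wheel URL is taken from the long URL above.

```python
from typing import Optional


def find_file_url(simple_index: dict, filename: str) -> Optional[str]:
    """Return the download URL for `filename` from a PEP 691 JSON project page."""
    for entry in simple_index.get("files", []):
        if entry.get("filename") == filename:
            return entry.get("url")
    # The file is not (or no longer) listed on the project page.
    return None


# Hypothetical, heavily trimmed response for /simple/pip/.
sample_index = {
    "name": "pip",
    "files": [
        {
            "filename": "pip-22.1-py3-none-any.whl",
            "url": "https://files.pythonhosted.org/packages/f3/77/"
            "23152f90de45957b59591c34dcb39b78194eb67d088d4f8799e9aa9726c4/"
            "pip-22.1-py3-none-any.whl",
        },
    ],
}

long_url = find_file_url(sample_index, "pip-22.1-py3-none-any.whl")
```

The cost of this approach is one extra request to PyPI per short URL, unless the filename to URL mapping is stored somewhere.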
We could even go a bit simpler, and do:
https://inspector.pypi.io/file/pip-22.1-py3-none-any.whl/pip/_internal/models/format_control.py
Note that all this embeds is the filename. We would need a way to look up the URL given nothing but the filename, but filenames are unique on PyPI, so we could just have a route on PyPI that redirects a filename to pythonhosted.org and does that lookup for us.
The main thing we'd lose is that these links would then "die" if the file is removed from PyPI but still exists in files.pythonhosted.org. Maybe with #5 we could store the filename => file url mapping as we load them, which would mean they would continue to work in the future.
Alternatively, maybe still support the long URLs, and have a button to turn the short URL into a permalink (similar to how GitHub does it).
Alternatively, maybe this is a silly idea and we should just stick with the long URLs :)
When using Inspector in an iframe, if the package isn't found, a 404 from pypi.org is served.
This makes the frame-src directive in a content security policy longer, since it now has to allow two domains. Serving the 404 directly from Inspector would avoid that.
In or around here:
Lines 67 to 68 in 5756f29
This proposal suggests a set of enhancements to the application's user interface (UI), including the addition of a search button to each page and transitioning to a Single Page Application (SPA) architecture. Additionally, it is proposed to enable seamless navigation between files and versions within the application.
Refine the UI to enhance user-friendliness, efficiency, and intuitiveness. This includes improving the layout, styling, and responsiveness of the application across different devices.
Add a search button prominently to each page to simplify content navigation. This feature will allow users to quickly search for specific information within the application.
Implement a navigation mechanism that enables users to switch between different files and versions without the need to go back to the previous tag. This feature will streamline the browsing experience and provide quick access to the desired content.
Restructure the application's architecture to adopt a Single Page Application (SPA) approach. This transition will eliminate page refreshes, resulting in a faster and more seamless browsing experience.
Please provide any additional information or specific requirements you may have regarding the proposed enhancements.
This would be incredibly useful in understanding what a new package contains.
I encountered this package that appeared like this in my browser:
Since this was on macOS, there was no horizontal scrollbar indicating there was text further to the right.
I added white-space: pre-wrap to the code block and this is what I found:
This solution messes with the line numbers, but it made it obvious where the malicious code was.
This will be a generic method of reporting without meta information about the project, paths, etc.
This will be handy for some researchers and for automation purposes.
Problem: The current certificate on inspector.pypi.io is invalid. Because the site uses HSTS, Chrome, Edge, and Firefox do not allow adding a certificate exception without disabling HSTS, which is very insecure.
Firefox:
Websites prove their identity via certificates. Firefox does not trust this site because it uses a
certificate that is not valid for inspector.pypi.io. The certificate is only valid for the following
names: *.ingress.cmh1.psfhosted.org, test.pypi.org, upload.pypi.org, *.cmh1.psfhosted.com,
*.pyfound.org, *.ingress.cmh1.psfhosted.com, *.cmh1.psfhosted.org
Line 760 is causing a container overflow and subsequently causing the page to assume an incorrect width.
We should do something other than 500-error for things like https://inspector.pypi.io/project/tensorflow/2.9.1/packages/51/86/f5db15a6403a8ecf377807e93cdcd5cddb2f57e73604143cc02917d24db4/tensorflow-2.9.1-cp310-cp310-macosx_10_14_x86_64.whl/tensorflow/libtensorflow_framework.2.9.1.dylib
A couple of ideas for approaching this (just spitballing, better solutions possibly exist as well):
This would obviously require a database of some sort (and committing malicious file hashes to it in response to reports).
REF: #110
Problem: Inspector can serve a 'Project Removed' response when a package has not yet been removed.
Background: When a package is uploaded, in our experience, it can often take a moment for PyPI to serve the appropriate content on the package's page, while Inspector is able to serve the contents of the files relatively immediately.
Steps to Reproduce:
Example:
We were alerted to pipcryptov2 at 2:46PM.
I visited the Inspector URL to confirm malicious content. I was met with a package removed notification.
The PyPI page initially 404'd, but refreshing it moments later provided the appropriate webpage, and the package had not yet been removed.
Discussion: I understand this is probably a transient issue and likely not impactful as a whole to the service, as very few people are visiting inspector within the time frame that a package is uploaded and the time the PyPI content is served. Given that we tend to respond within ~60 seconds of receiving notification of a package upload, this is likely an issue that will only affect our service and services similar, so from our end, we can inform our team accurately that this should be ignored unless responding to a package significantly after the fact.
Currently only supports .zip files
For error reporting
Currently this fetches the distribution from PyPI into a BytesIO object, after doing a requests.get() call (not streaming).
That means that while we're inside of _get_dist, we'll be using 2x the file size of the distribution in extra RAM, and outside of it we'll be using 1x the file size in extra RAM.
This should probably buffer to a temporary file and use streaming requests so that a large distribution doesn't kill us on memory.
This might just be #5 but I wanted to call it out explicitly since this applies even if we're storing the files somewhere.
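A rough sketch of the streaming approach, not the project's actual _get_dist. The helper names buffer_to_tempfile and fetch_dist and the 64 KiB chunk size are assumptions for illustration. The only requests features relied on are stream=True and Response.iter_content().

```python
import tempfile
from typing import BinaryIO, Iterable


def buffer_to_tempfile(chunks: Iterable[bytes]) -> BinaryIO:
    """Write chunks to an anonymous temporary file and rewind it for reading."""
    tmp = tempfile.TemporaryFile()
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            tmp.write(chunk)
    tmp.seek(0)
    return tmp


def fetch_dist(url: str, chunk_size: int = 64 * 1024) -> BinaryIO:
    """Stream a distribution to disk, holding roughly one chunk in RAM at a time."""
    import requests  # third-party; imported here so the helper above stays stdlib-only

    with requests.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        return buffer_to_tempfile(resp.iter_content(chunk_size=chunk_size))
```

The returned object is a regular seekable file, so it can be passed to zipfile.ZipFile or tarfile.open(fileobj=...) in place of the BytesIO.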
All file contents are placed in the HTML without anything preventing XSS.
Simple example: https://inspector.pypi.io/project/inspector-test-package/0.0.0/packages/71/9a/24c8c3286a09bd3f82e17723562493128c6dc89e8fe177b3697bd31bb524/inspector-test-package-0.0.0.tar.gz/inspector-test-package-0.0.0/inspector-test-package/__init__.py
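One possible mitigation is to escape file contents before they reach the page. The sketch below uses only the standard library; render_code_block is a hypothetical helper, not a function from this codebase. A template engine with autoescaping enabled would do the equivalent automatically.

```python
import html


def render_code_block(source: str) -> str:
    """Wrap untrusted file contents in <pre><code>, escaping HTML metacharacters."""
    # html.escape replaces &, <, > and, with quote=True, both quote characters.
    return "<pre><code>" + html.escape(source, quote=True) + "</code></pre>"


rendered = render_code_block("<script>alert(1)</script>")
```

After escaping, the browser displays the script tag as text instead of executing it.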
The following routes will ~never change once a distribution is published:
/project/<project_name>/<version>/packages/<first>/<second>/<rest>/<distname>/
/project/<project_name>/<version>/packages/<first>/<second>/<rest>/<distname>/<path:filepath>
We should put these behind a CDN with a very long-lived expiry.
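As an illustration, the application could mark those two routes as cacheable so that a CDN in front of it honors a long expiry. The regular expression, the one-year max-age, and the no-cache fallback for every other route are assumptions, not existing configuration.

```python
import re

# Matches /project/<project_name>/<version>/packages/<first>/<second>/<rest>/<distname>/
# with an optional file path after the trailing slash.
IMMUTABLE_ROUTE = re.compile(
    r"^/project/[^/]+/[^/]+/packages/[^/]+/[^/]+/[^/]+/[^/]+/.*$"
)


def cache_headers(path: str) -> dict:
    """Pick a Cache-Control header based on whether the route can change."""
    if IMMUTABLE_ROUTE.match(path):
        # One year, and tell caches the body will not change in that time.
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    # Project and release listings change as new files are uploaded.
    return {"Cache-Control": "no-cache"}
```

Note that a long expiry also means a cached page can outlive the removal of the underlying file from PyPI.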
It would be useful if pages like:
Also linked back to https://inspector.pypi.io/project/hatchling/
So for example, add a "hatchling" link between "Inspector" and "hatchling==0.6" here:
Right now this requires manually editing the anchor in the URL from something like line.1 to line.1-20.
Ideally this would be similar to GitHub (click & drag to select multiple lines) but I don't think the JS framework we're using supports that currently.
Issue: Line numbers past 9999 wrap in their column element.
https://inspector.pypi.io/project/rp/0.1.914/packages/07/67/ceeb07d5b8165c270e729f6fb950061b7afea5283ecb546b6e1bed915ea8/rp-0.1.914.tar.gz/rp-0.1.914/rp/r.py#line.10250
This was missed behavior in the line wrapping change (#146). I'm aware of this, bringing it to your attention while I stumble my way through a fix here locally. Line linking still behaves correctly, just looks a bit strange graphically.
Currently this has just a flat listing of files in directories:
We should make this a tree instead:
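A small sketch of how the flat listing could be turned into a nested structure before rendering. build_tree is a hypothetical helper. It assumes archive member names use "/" as the separator, that directory entries end with "/", and that no file shares a name with a directory.

```python
def build_tree(paths):
    """Convert flat archive paths into nested dicts; files map to None."""
    tree = {}
    for path in paths:
        is_dir = path.endswith("/")
        parts = [part for part in path.split("/") if part]
        if not parts:
            continue
        node = tree
        # Every component except a trailing file name is a directory.
        for directory in (parts if is_dir else parts[:-1]):
            node = node.setdefault(directory, {})
        if not is_dir:
            node[parts[-1]] = None
    return tree


tree = build_tree([
    "pip/",
    "pip/__init__.py",
    "pip/_internal/models/format_control.py",
    "pip-22.1.dist-info/METADATA",
])
```

A template can then walk the dict recursively and emit one nested list per directory.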
Currently this loads all files into memory, and does some rudimentary in-process caching of files.
Ideally this would be replaced with something that would perform slightly better, without having to hold all of PyPI in memory forever. Something like redis with a 24 hour timeout.
Any potential solution should probably not store files on disk.
When browsing an index of a package, it's helpful to see file sizes as well.
I don't know if that data is available in the contexts yet, but wanted to file this while it was in my head.
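For zip-based distributions (wheels), the standard library already exposes the uncompressed size of each member, so the data should be cheap to collect. A sketch, with list_file_sizes as a hypothetical helper and a tiny in-memory archive standing in for a real wheel:

```python
import io
import zipfile


def list_file_sizes(archive_bytes: bytes) -> dict:
    """Map each file in a zip archive to its uncompressed size in bytes."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return {
            info.filename: info.file_size
            for info in zf.infolist()
            if not info.is_dir()
        }


# Build a small archive in memory to stand in for a wheel.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("pkg/__init__.py", "")
    zf.writestr("pkg/module.py", "print('hello')\n")

sizes = list_file_sizes(buffer.getvalue())
```

For sdists, tarfile.TarInfo.size provides the same information.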
When processing the contents of .whl packages (zip files) of a project, all files show their content as "None".
Currently versions are sorted alphanumerically. This should use the sorting that https://github.com/pypa/packaging provides instead.
https://inspector.pypi.io/ shows:
502 Bad Gateway
nginx/1.13.9
Can we add AAAA records please :). Happy to help if I can.
We only have legacy IP offered today:
[cooper@work ~]$ host inspector.pypi.io
inspector.pypi.io is an alias for inspector.cmh1.psfhosted.org.
inspector.cmh1.psfhosted.org has address 13.58.193.163
inspector.cmh1.psfhosted.org has address 18.217.27.127
inspector.cmh1.psfhosted.org has address 3.13.211.4
Source:
inspector/inspector/templates/code.html
Lines 8 to 9 in 1ef5cda
When viewing a non-Python file via Inspector, like a README.md, the browser highlights the contents incorrectly.
Prism supports a lot of languages and recommends using its autoloader to load them.
We've seen some instances of malware being hidden in .pyc files (example here). Currently this tool refuses to display .pyc files because they are binary. Instead, we should attempt to disassemble the bytecode to some degree and display as much as possible in the UI.
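A very rough sketch of what disassembly could look like with the standard library alone. It assumes the 16-byte .pyc header used since Python 3.7 and only handles bytecode compiled by the same Python version that runs the server. disassemble_pyc is a hypothetical helper. The marshal documentation warns against loading untrusted data, so a real implementation would need to isolate this step.

```python
import dis
import importlib.util
import io
import marshal


def disassemble_pyc(data: bytes) -> str:
    """Return a dis listing for a .pyc file compiled by the running interpreter."""
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("bytecode was compiled by a different Python version")
    # Skip the header: magic (4 bytes), flags (4), then mtime+size or hash (8).
    code = marshal.loads(data[16:])
    listing = io.StringIO()
    dis.dis(code, file=listing)  # also recurses into nested code objects
    return listing.getvalue()
```

Supporting bytecode from other Python versions would need a third-party decompiler or disassembler, which is out of scope for this sketch.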
Currently the version numbers are sorted as strings (e.g. 0.1, 0.10, 0.11, 0.2) rather than as version numbers (e.g. 0.1, 0.2, 0.10).
See for instance https://inspector.pypi.io/project/whey/
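A sketch of the fix using the third-party packaging library. sort_versions is a hypothetical helper. Placing version strings that packaging cannot parse ahead of all valid versions is an assumption about how legacy releases should be handled.

```python
from packaging.version import InvalidVersion, Version


def sort_versions(versions):
    """Sort version strings by PEP 440 ordering instead of as plain strings."""
    def key(version):
        try:
            return (1, Version(version))
        except InvalidVersion:
            # Unparseable legacy versions sort first, alphabetically.
            return (0, version)

    return sorted(versions, key=key)


ordered = sort_versions(["0.1", "0.10", "0.11", "0.2"])
```

With this key, the example above comes out as 0.1, 0.2, 0.10, 0.11.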