

hexpm's Introduction

Hexpm

API server and website for https://hex.pm.

Contributing

To contribute to Hexpm you need to properly set up your development environment.

Also see the client repository: hex. The client uses hexpm for integration tests, so hexpm needs to support all versions the client supports. CI ensures that tests are run on all supported versions.

Setup

  1. Run mix setup to install dependencies, create and seed the database, etc.
  2. Run mix test
  3. Run iex -S mix phx.server and visit http://localhost:4000/

After this succeeds you should be good to go!

See the setup alias in mix.exs and the sections below for more information, or if you run into issues.

PostgreSQL Modules And Version

The PostgreSQL version should be >= 9.4, as Hexpm uses the jsonb type, which is available from PostgreSQL 9.4 onward.

Hexpm requires the PostgreSQL modules pg_trgm and pgcrypto to be available.

These modules are located in the "postgresql-contrib" package, though the package name can vary depending on your operating system. If the modules are not installed, the Ecto migrations will fail.
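
If you want to confirm the modules are available, here is a minimal sketch of an Ecto migration that enables them (the migration module name is illustrative, not the project's actual migration):

defmodule Hexpm.Repo.Migrations.AddExtensions do
  use Ecto.Migration

  def up do
    # Both extensions ship with postgresql-contrib.
    execute "CREATE EXTENSION IF NOT EXISTS pg_trgm"
    execute "CREATE EXTENSION IF NOT EXISTS pgcrypto"
  end

  def down do
    execute "DROP EXTENSION IF EXISTS pg_trgm"
    execute "DROP EXTENSION IF EXISTS pgcrypto"
  end
end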

Database

By default, Hexpm connects to a localhost PostgreSQL database hexpm_dev using the username postgres with the password postgres.
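
For reference, the development database config takes roughly this shape (a sketch of config/dev.exs; the exact repo module name may differ):

config :hexpm, Hexpm.Repo,
  username: "postgres",
  password: "postgres",
  database: "hexpm_dev",
  hostname: "localhost"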

Create the database and user 'postgres' if not already done:

docker-compose up -d db

Now you can create the hexpm_dev database and run the Ecto migrations:

mix do ecto.create, ecto.migrate

Sample Data

Using the following command you can seed your local Hexpm instance with some sample data:

mix run priv/repo/seeds.exs

Node Dependencies

To compile assets we need to install the Node dependencies:

cd assets && yarn install

If you don't have yarn installed, cd assets && npm install will work too.

Running Hexpm

Once the database is set up you can start Hexpm:

# with console
iex -S mix phx.server

# without console
mix phx.server

Hexpm will be available at http://localhost:4000/.

License

Copyright 2015 Six Colors AB

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

hexpm's People

Contributors

benjamin-philip, benwilson512, bjfish, ciaran, dependabot[bot], dsdshcym, eksperimental, ericmj, ferd, fteem, gfvcastro, henrik, jaimeiniesta, josevalim, leaexplores, marcandre, michalmuskala, milmazz, mkaszubowski, msoedov, nashby, pragmaticivan, starbelly, supersimple, taktes, tsloughter, tuxified, wesleimp, wojtekmach, yordis


hexpm's Issues

Change tarball checksum to sha-2

This is a backwards compatible change for clients because they do not verify the checksum yet.

Should we update the checksum for existing tarballs? One part of me says never change an existing tarball, but on the other hand the change is backwards compatible.
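
For reference, computing the new checksum is straightforward with :crypto (a sketch; the tarball name is illustrative). Base.encode16/1 defaults to uppercase:

tarball = File.read!("ecto-0.1.0.tar")
checksum = :crypto.hash(:sha256, tarball) |> Base.encode16()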

Local hex_web to mirror hex.pm

Is it possible to have a locally running hex_web that mirrors hex.pm? I'm thinking along the same lines as running something like geminabox to mirror rubygems.org.

The benefit would be that people would be able to publish private packages to their locally running hex_web, but still have everything from hex.pm available while only pointing to their mirror.

I think it'd be a cool feature to add to hex_web, rather than have it built as a separate project (as in the case of geminabox)

Redirect hexdocs.pm/package

Today we seem to be serving packages from hexdocs.pm/package, but we should redirect instead, otherwise we can have caching issues. There is an HTML meta-refresh tag we can (ab)use. For example, some people are experiencing caching issues with Phoenix.
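
A sketch of what such a meta-refresh page could look like, rendered from Elixir (the function and URL layout are hypothetical):

def refresh_page(package, version) do
  """
  <html>
    <head>
      <meta http-equiv="refresh" content="0; url=/#{package}/#{version}/">
    </head>
  </html>
  """
end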

Add full stack integration tests against older client versions

We need to run integration tests with older Hex clients to ensure that we maintain API and registry compatibility and don't break other stuff.

The tests need to use the Hex client code archives and run full stack tests through the Mix CLI.

Tests we need to run are:

  • Fetch deps
  • Update deps
  • Publish package (and then fetch it)
  • Publish docs
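
For example, the first test could be driven through the Mix CLI roughly like this (a sketch; the fixture path and the HEX_MIRROR variable pointing at a local hexpm are assumptions):

defmodule HexClientIntegrationTest do
  use ExUnit.Case

  test "fetch deps with the installed client" do
    # Run `mix deps.get` in a fixture project against the local hexpm.
    {output, exit_code} =
      System.cmd("mix", ["deps.get"],
        cd: "test/fixtures/sample_app",
        env: [{"HEX_MIRROR", "http://localhost:4000"}])

    assert exit_code == 0
    assert output =~ "Resolving Hex dependencies"
  end
end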

Render package README on package page

Using Earmark, we can do the following to render README files on package pages:

On package upload

Once we've worked out where the README file is located (e.g. README.md, Readme.md, etc.), we render its contents into HTML, stripping all <script> tags and removing all JavaScript attributes (e.g. onload, onclick) from it.

Once completed, we store this HTML alongside the package, either in the database or in a cache.
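
A sketch of the upload-time step (Earmark.as_html!/1 is Earmark's API; the regex-based stripping is a naive stand-in for a proper HTML sanitizer):

def render_readme(markdown) do
  markdown
  |> Earmark.as_html!()
  # Strip <script> tags and inline JavaScript attributes.
  |> String.replace(~r|<script.*?</script>|s, "")
  |> String.replace(~r|\son\w+="[^"]*"|, "")
end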

If the README exists but is plain text, we simply display the plaintext on the page, and if there is no README whatsoever, we use the description instead.

On /packages/:package

When the page is requested, we simply fetch the HTML from the database/cache and display it on the page.

This will get us closer to closing #25.

Thoughts and feedback would be appreciated, as always 😄

👯

Phase out PATCH "users/:name"

The password reset functionality of this route has been replaced by /api/users/:user/reset, and the email reset functionality is forthcoming.

If this is correct, I'll open up a PR.

@josevalim @ericmj

Display Elixir version requirement for packages

Here's a package with lots of versions: https://hex.pm/packages/exrm

At first sight it's not clear at all what each version offers compared to the other ones, or even which Elixir version it supports.

I would also go as far as to propose some mechanism to display the relevant changes for each version right on the site. @bitwalker has been doing a great job of keeping tags for all the releases of exrm, so it's at least possible to track commits between versions. But take a look at inflex (https://hex.pm/packages/inflex) – it has 0 tags and no changelog. You'd basically need to traverse the git history to find out what has changed between versions.

Consider limiting download stats per IP

We are seeing issues where one machine's script can run rampant and download packages many times per day, skewing the download stats. The lack of limiting also allows the download stats to be gamed to promote a package higher on the "download charts".

Download stats are calculated once each day for the previous day. A proposed way of limiting would be to only count an IP once per package instead of counting every download.
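
A sketch of the proposed counting, assuming the daily job has already parsed the access log into maps with :ip and :package keys (an assumption; the real input is S3 access logs):

downloads =
  log_entries
  # Count each IP at most once per package for the day.
  |> Enum.uniq_by(fn entry -> {entry.ip, entry.package} end)
  |> Enum.reduce(%{}, fn entry, acc ->
    Map.update(acc, entry.package, 1, &(&1 + 1))
  end)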

To verify the new algorithm we need to run it against the old S3 access logs and compare it to the current stats. We can provide logs to anyone who wants to play with this.

Future: Move registry to log structured format or similar structure

This is a change that would be both 100% backwards incompatible and would require a fair amount of work, but as the size of the registry continues to grow, it is something we need to, at the very least, consider. This is a discussion for the future, and is definitely not something we will be able to implement for a while yet!

Currently, whenever the registry is updated, users need to download the entire file again. At the moment that is perfectly fine, but as the registry file grows, this becomes an unnecessary step and could use a fair amount of unwanted bandwidth if packages are installed often.

My proposal is to format the registry differently from the current ETS table. The option I've looked at is moving the registry to a log-structured key-value hashtable. One example implementation is spotify/sparkey. It is basically a log and an index that are appended to with PUT or DELETE operations; data already in the file is completely immutable and append-only, allowing us to grab all the changes since we last downloaded it (the bytes after the last byte we have downloaded).
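
A sketch of the incremental fetch this would enable, using an HTTP Range request (:httpc ships with OTP; the registry URL and the offset bookkeeping are illustrative):

# Ask only for the bytes appended since the last sync.
headers = [{'range', String.to_char_list("bytes=#{offset}-")}]

{:ok, {{_, 206, _}, _headers, new_bytes}} =
  :httpc.request(:get, {'https://hex.pm/registry.log', headers}, [], [])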

Another benefit of using Sparkey is that any number of concurrent independent readers can be used, though it only allows one writer at a time per storage unit, whilst allowing high throughput when performing bulk writes.

Sparkey also allows Google Snappy compression, letting you select a block size which is then used to split the contents of the log into blocks. Each block is compressed independently with snappy.

As I stated earlier, this change would be both 100% backwards incompatible and require a fair amount of work, but it would dramatically cut down on unnecessary bandwidth usage and allow quicker pulling of the registry, as well as making Hex snappier for clients.

Note: this is intended more as a discussion starter than a full-blown proposal, and feedback would be appreciated!

Improve HTTP API and document it

This is an overarching issue whose goal is to improve the HTTP API and make it fully RESTful.

  • Make the API fully discoverable from the root (provide links to all resources)
  • Document API (follow http://apiary.io/blueprint for guidelines)
  • Use Content-Range for pagination instead of query parameters (see the sketch after this list)
  • Allow sorting on endpoints returning multiple resources (such as /packages)
  • Prettify JSON responses
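
A sketch of the Content-Range idea in a Plug-based controller (the range parsing and query helpers are hypothetical):

def index(conn, _params) do
  {from, to} = requested_range(conn)       # hypothetical header parser
  packages = Packages.slice(from, to)      # hypothetical query helper
  total = Packages.count()

  conn
  |> put_resp_header("content-range", "packages #{from}-#{to}/#{total}")
  |> send_resp(200, render_json(packages)) # render_json/1 stands in for real serialization
end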

Could not start Hex.

steps to recreate:

  • $ mix new my_project --bare
  • include :phoenix into mix.exs
  • $ mix deps.get

what should happen:

  • download the phoenix dependency

what does happen:

Could not start Hex. Try fetching a new version with `mix local.hex` or uninstalling it with `mix archive.uninstall hex.ez`
** (FunctionClauseError) no function clause matching in String.to_char_list/1
    (elixir) lib/string.ex:1344: String.to_char_list(nil)
    lib/hex.ex:70: Hex.proxy/1
    lib/hex.ex:27: Hex.start_api/0
    lib/hex.ex:9: Hex.start/0
    (mix) lib/mix/tasks/local.hex.ex:68: Mix.Tasks.Local.Hex.start/0
    (mix) lib/mix/dep/loader.ex:117: Mix.Dep.Loader.with_scm_and_app/1
    (mix) lib/mix/dep/loader.ex:86: Mix.Dep.Loader.to_dep/3
    (elixir) lib/enum.ex:977: Enum."-map/2-lc$^0/1-0-"/2

Further

  • Running $ mix by itself results in the same issue as above

OS:

  • NixOS
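
From the trace, Hex.proxy/1 appears to call String.to_char_list/1 on a nil proxy setting; a hedged guess at the kind of guard that would avoid the crash (the surrounding code is hypothetical, not Hex's actual implementation):

defp proxy(nil), do: :ok
defp proxy(proxy) when is_binary(proxy) do
  # Only convert when a proxy is actually configured.
  proxy |> String.to_char_list() |> set_proxy()  # set_proxy/1 stands in for the real call
end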

Publish doesn't work (http error timeout)

Currently I can't publish a new version of the facebook package and don't know why:

Publishing facebook v0.1.2
  Dependencies:
    libex_config ~> 0.1.0
  Excluded dependencies (not part of the Hex package):
    json
    hackney_lib
    hackney
    exlager
  Included files:
    lib/facebook.ex
    lib/facebook/config.ex
    lib/facebook/graph.ex
    mix.exs
    README.md
    LICENSE
Proceed? [Yn] y
Updating package facebook failed (http_error)
:timeout

The timeout might be client- or server-side, so I don't really know what's happening. Is there a way to enable verbose logging?

Search Packages by Tags or Keywords

With a rising number of packages, it will become more important to find matching packages.
This could be done using keywords or tags, like other package managers offer.

Examples:
https://www.npmjs.org/browse/keyword/logging
https://packagist.org/search/?tags=logging

The mix file could be extended with a keywords part:

  def project do
    [app: :my_project,
     version: "0.0.1",
     elixir: "~> 0.13.0",
     deps: deps(),
     keywords: ["logging"]]
  end
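
On the server side, matching could be a simple query against an array column (a sketch; the keywords column and schema are assumptions):

import Ecto.Query

def search_by_keyword(keyword) do
  from(p in Package,
    where: fragment("? = ANY(?)", ^keyword, p.keywords))
  |> Repo.all()
end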

What do you think about such a feature?

Look at options for storing application logs in S3/Glacier

Currently, all Logger messages are pushed to the :console backend and therefore stored exclusively in Heroku. While Heroku is great at managing logs, they are inaccessible from outside of their service, making debugging the application and observing past application behavior a challenge.

I think it would be extremely valuable to look into storing application logs from the Logger service in an S3 bucket + Glacier archive (The difference being cheaper pricing + higher retrieval time for Glacier).

I believe a good model for storing logs would be to append them to an object in S3 (sent from a GenEvent process in bulk), and move those logs to Glacier once daily (simply a job). (Glacier charges $0.050 per 1,000 UPLOAD/RETRIEVAL requests, therefore we'd use S3 as a buffer)

When we're processing the logs, if they are in a format we can easily read (e.g. JSON), we can analyze the errors and warnings from the past day, making sure that errors causing trouble for users don't go under the radar.
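
A rough sketch of the buffering idea, batching log lines in a process and flushing them periodically (the S3 upload itself is left as a hypothetical stub):

defmodule LogBuffer do
  use GenServer

  @flush_interval 60_000  # flush once a minute

  def start_link, do: GenServer.start_link(__MODULE__, [], name: __MODULE__)
  def log(line), do: GenServer.cast(__MODULE__, {:log, line})

  def init(lines) do
    Process.send_after(self(), :flush, @flush_interval)
    {:ok, lines}
  end

  def handle_cast({:log, line}, lines), do: {:noreply, [line | lines]}

  def handle_info(:flush, lines) do
    upload_to_s3(Enum.reverse(lines))  # hypothetical S3 append helper
    Process.send_after(self(), :flush, @flush_interval)
    {:noreply, []}
  end
end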

Redesign website

We want to move away from the temporary bootstrap design. @scrogson has already shown interest in helping out here. If you'd like to help or have some input please post in this issue.

Rebuild tarballs

  1. Remove duplicated files in compressed tarball.
  2. Change checksum to sha-2, ensure checksum is uppercase to match Base.encode16.
  3. Change metadata from listdict to map.
  4. Change dependency list from [{"ecto", "~> 0.1.0"}] to %{"ecto" => %{"requirement" => "~> 0.1.0", "optional" => false}}.
  5. Bump version.
  6. Remove directories from files list in metadata.

Add a FAQ page

For now, this issue is a place for discussion about questions (along with answers) that would be useful on an FAQ page.

🚨 under construction 🚨

Download rate limiting

As discussed in IRC by @josevalim, limiting downloads per 24 hours per key, user, or IP address would be useful, so as to stop scripts going crazy and downloading packages a few thousand times 😜

AFAIK, neither S3 nor Cloudflare supports rate limiting for assets, so I'm not sure how we could limit tarball fetches. 😕

Keep Download Statistics in cache outside of application

Currently, the download statistics are stored in PostgreSQL without any caching, which can be a slight performance bottleneck, considering the homepage of Hex.pm pulls these statistics.

One option would be to store the statistics in an in-memory ETS table. However, this breaks the Twelve-Factor rule of disposability, because the state is kept in a process, and whenever the Heroku dyno is killed, the data is lost with it.

Therefore, my solution is to still keep the download statistics inside PostgreSQL, but also keep them in a cache external to Heroku (e.g. an addon provider) where they can be stored and fetched at will.

The following example usage will use Redis as its datastore.

First of all, to store the download counts, we will use a combination of data types.
(From here on, we will use \xff (ÿ) as the key separator, since a byte at the end of the ASCII range sorts efficiently when comparing keys lexicographically. Spaces are added below for readability only and are not part of the actual keys.)

Total download counts per package

For total download counts, we will simply use a number.
Key: t \xff #{package_id}
Value: #{downloads}

Daily download counts

For daily download counts, we will simply use a number.
Key: d \xff #{year} \xff #{month} \xff #{day} \xff #{package_id}
Value: #{downloads}

Release download counts

Again, for release download counts, we will simply use a number.
Key: rd \xff #{year} \xff #{month} \xff #{day} \xff #{package_id} \xff #{release_id}
Value: #{downloads}

Weekly total download counts

Again, for weekly total counts, we will simply use a number.
Key: wtd \xff #{year} \xff #{month} \xff #{week}
Value: #{downloads}

All time download counts

Again, for all-time total counts, we will simply use a number.
Key: atd
Value: #{downloads}

Most downloaded packages

For the most downloaded packages list, we will use a sorted set (zset).
Key: mdp

For each package:

Score: #{downloads}
Member: #{package_id}

And simply, to get the top five packages, we can run the following command:
ZREVRANGEBYSCORE mdp +inf -inf WITHSCORES LIMIT 0 5
Which means: get the members of the set mdp ranked by score, from +infinity down to -infinity, returning the scores along with the members, limited to the first five elements.

This will result in the following using ExRedis:

Exredis.query(client, ["ZREVRANGEBYSCORE", "mdp", "+inf", "-inf", "WITHSCORES", "LIMIT", 0, 5])
|> Enum.chunk(2)
|> Enum.map(&:erlang.list_to_tuple/1)
|> Enum.map(fn {member, downloads} ->
  # :string.to_integer/1 returns {integer, rest}
  {downloads, _} = downloads |> to_char_list |> :string.to_integer
  {member, downloads}
end)

Which will (for example) return the following:

[{"ecto", 50}, {"dynamo", 40}, {"jazz", 30}, {"cauldron ", 20}, {"cldr", 10}]

I have a pull request ready to go if you're interested. 🔨

Also, thanks for Hex! It's really awesome! 😄

💃

Rate limit API use to prevent abuse.

To prevent abuse of the Hex API, I think it would be wise to implement some form of rate limiting.

In the past, I've implemented rate limiting like this:

First, we define two variables:

  • duration: duration of limit in milliseconds
  • max: max requests within duration

For example, let's say we'll allow 2500 requests per hour.

When the user sends the first request, we set the following keys in a K/V store (with an identifier, such as a username, as a prefix):

  • total: total number of requests we'll allow in duration (2500)
  • remaining: total number of requests left in duration (2499)
  • reset: time until remaining is reset (3600000 or 1 hour)

Note: Because we have to identify requests, this makes it a lot harder to rate limit anonymous (unauthenticated) requests.

For every request after the first, we simply decrement remaining by 1 and send these headers with the response: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset.

Finally, when the user has reached their limit, we can send a 429 (Too Many Requests) response code until reset has elapsed, at which point we reset the number of requests allowed.

I've got a working prototype of this that uses ETS, although it would work anywhere we can put key/value data, such as Redis, Memcached, or, if we really wanted, another process with a hashmap.
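
A minimal sketch of the ETS approach described above (:ets.update_counter/3 is atomic, so concurrent requests are counted safely; the table layout is illustrative, not the actual prototype):

defmodule RateLimit do
  @max 2500
  @duration 60 * 60 * 1000  # one hour, in milliseconds

  def setup do
    :ets.new(:rate_limit, [:named_table, :public, :set])
  end

  def allow?(id) do
    now = :erlang.system_time(:milli_seconds)

    case :ets.lookup(:rate_limit, id) do
      [{^id, count, reset}] when now < reset and count >= @max ->
        false

      [{^id, _count, reset}] when now < reset ->
        :ets.update_counter(:rate_limit, id, {2, 1})
        true

      _ ->
        # First request, or the window expired: start a fresh window.
        :ets.insert(:rate_limit, {id, 1, now + @duration})
        true
    end
  end
end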

Thoughts and feedback on this would be appreciated.

Serve Rebar 3 packages with Hex

Based on a discussion on IRC with @tsloughter (and to some extent @ferd), this is a proposal and discussion to have Rebar 3 packages stored and served from Hex.

Rationale

Hex already provides the whole structure for publishing and serving packages. Also, since Hex aims to be the package manager for the whole Erlang Ecosystem, with better tooling for Erlang already on the way, it makes sense for Rebar 3 and Hex teams to join efforts in the packages department.

Precompiled packages

Today Hex stores only the source code for packages. Rebar packages, on the other hand, are actually precompiled packages (with BEAM files) per erts version.

Although those approaches may look conflicting at first, Rebar packages are a natural expansion of the Hex model. Once a package (its source) is published, the author (or a build service) can push precompiled versions of the package based on the erts version. In simpler words, we will extend Hex so a package can have multiple precompiled versions.

Since the Rebar team is currently only interested in precompiled versions, they can fetch the package registry, filter the registry based on the current erts version, and just download those packages. In the future, the Rebar team may also consider supporting source-based packages, which may be an option for compiling packages with C dependencies.

While the Elixir tooling still requires the source, which will continue to be available in Hex, nothing stops Elixir from also using precompiled packages in the future, saving compilation time.

Hex overview

This is an overview of how Hex works today and the changes that will be made to support precompiled packages. This is also the API Rebar will use for fetching precompiled packages.

S3

The registry and packages are currently stored in S3. The available endpoints are:

  • /registry.ets.gz - gzipped ets file
  • /tarballs/PACKAGE-VERSION.tar - package tarball

We propose a new endpoint, /tarballs/erts/6.2/cowboy-1.0.0.tar, to be added for precompiled package tarballs per erts version as part of this discussion.

The tarball contents for precompiled packages are the same as the package tarball, except they also contain precompiled .beam files.

The registry

The registry is an ETS table generated with ets:tab2file/2. The items are:

  • {'$$version$$', Version} - the registry version
    • Version: integer, incremented on breaking changes
  • {Package, [Versions]} - all releases of a package
    • Package: binary string
    • Versions: list of binary string semver versions
  • {{Package, Version}, [Deps, Checksum, Builds | _]} - a package release's dependencies
    • Package: binary string
    • Version: binary string semver version
    • Deps: List of dependencies
      • Dep: [Name, Requirement, Optional, App]
        • Name: binary package name
        • Requirement: binary Mix.Version requirement
        • Optional: boolean true if it's an optional dependency
        • App: binary OTP application name
    • Checksum: Binary hex encoded sha256 checksum of package, see below
    • Builds: List of builds
      • Build: [Params]
        • Params: List of parameters
          • Param: [{Key, Value}]
            • Key: binary
            • Value: binary

The Builds field does not currently exist in the registry, and we propose it be added as part of this discussion. We have designed it as a proplist in case we want to support other types of builds in the future.
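
For reference, consuming the registry from Elixir looks roughly like this (:ets.file2tab/1 is the counterpart of ets:tab2file/2; assumes the downloaded file has already been gunzipped):

{:ok, tab} = :ets.file2tab('registry.ets')
[{_, version}] = :ets.lookup(tab, :"$$version$$")
[{_, [deps, checksum, builds | _]}] = :ets.lookup(tab, {"cowboy", "1.0.0"})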

Package tarball

The package tarball contains the following files:

  • VERSION
    • the tarball version as a single ASCII integer
  • CHECKSUM
    • sha256 hex encoded checksum of tarball
    • sha256(<<contents(VERSION)/binary, contents(metadata.exs)/binary, contents(contents.tar.gz)/binary>>)
  • metadata.exs
    • Erlang term file
  • contents.tar.gz
    • gzipped tarball with package contents
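
Following that layout, verifying CHECKSUM could look like this (a sketch; :erl_tar.extract/2 with :memory returns {filename, binary} pairs, and we assume CHECKSUM has no trailing newline):

{:ok, files} = :erl_tar.extract('cowboy-1.0.0.tar', [:memory])

contents =
  for name <- ['VERSION', 'metadata.exs', 'contents.tar.gz'],
      do: :proplists.get_value(name, files)

checksum = :proplists.get_value('CHECKSUM', files)

# :crypto.hash/2 accepts iodata, so the list of binaries can be hashed directly.
^checksum = :crypto.hash(:sha256, contents) |> Base.encode16()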

Next steps

This provides an overview of the current API and the work we are going to do for supporting precompiled packages in Hex. It also specifies the API that Rebar 3 will consume.

@tsloughter and @ferd, please validate if this API is enough for you to get started and if there is anything we should change and/or clarify in the short term. Also let us know if you need help or want to discuss anything on the Rebar 3 side of things.

Upgrade to latest Ecto release

Ecto is now at v0.7.2, and hex_web is still depending on v0.2.5!

Because Ecto has changed a lot since 0.2.5, I think it might take a lot of code changes to get hex_web onto the latest Ecto version.

🌊 🌄

Design of HexDocs

Here are some ideas for the homepage and error pages of http://hexdocs.pm 🌎

Homepage 🏠

For the homepage, a good example is Bower's package search, which consists of the following:

  • a large search box
  • a large, paginated table with the search results (defaults to all packages).
    • each row contains:
      • name
      • owner
      • stars
      • updated

[screenshot: Bower's package search]

I think we should base the homepage on this basic design, using Yahoo's Pure (a small, minimalistic, and modular CSS framework), and turn each table row into an accordion with a list of docsets:

[screenshot: accordion-style package list mockup]
^ something like this, but based on the table of search results

Error 🚧

For the error pages, we should keep it extremely basic, yet functional.

We'd set a large, visible header with the error (e.g. Page not found), and a link such as "Is something wrong? Let us know by opening an issue.".

[screenshot: minimal error page mockup]
^ something along these lines

As always, your feedback would be greatly appreciated! 😄

πŸ‡

Blacklist Erlang applications

We should not allow anyone to publish a package with the same name as an OTP application.
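
A sketch of the publish-time check, with @otp_apps holding the names listed below (validate_exclusion/4 is part of Ecto.Changeset):

def validate_name(changeset) do
  validate_exclusion(changeset, :name, @otp_apps,
    message: "is reserved by an OTP application")
end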

appmon
asn1
common_test
compiler
cosEvent
cosEventDomain
cosFileTransfer
cosNotification
cosProperty
cosTime
cosTransactions
crypto
debugger
dialyzer
diameter
edoc
eldap
erl_docgen
erl_interface
et
eunit
gs
hipe
ic
inets
jinterface
kernel
megaco
mnesia
observer
odbc
orber
os_mon
ose
otp_mibs
parsetools
percept
pman
public_key
reltool
runtime_tools
sasl
snmp
ssh
ssl
stdlib
syntax_tools
test_server
toolbar
tools
tv
typer
webtool
wx
xmerl
