helix-static's Introduction

Helix Static

Serve static files from GitHub (with some extras) for Project Helix

About

Helix Static is a shared microservice for Project Helix that serves static files from GitHub. It includes the following features:

  • MIME type detection
  • large file detection (large files get redirected and served from the CDN)
  • generation of long-cacheable URLs for JS and CSS assets
  • replacement of URLs in JS and CSS assets with references to long-cacheable URLs (through ESI)
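To illustrate the long-cacheable URL idea, the sketch below derives a URL from the asset's content. The URL changes whenever the content does, so an old URL can be cached indefinitely. The naming scheme (<name>.<hash>.<ext>), the truncated SHA-256 digest and the function name longCacheableUrl are assumptions for illustration only. They do not describe how Helix Static actually names its assets.

```javascript
const crypto = require('crypto');
const path = require('path');

// Hypothetical scheme: insert a short content hash before the file extension,
// e.g. /js/app.js -> /js/app.<8 hex chars>.js. Different content yields a
// different URL, so each URL always refers to exactly one version of the file.
function longCacheableUrl(assetPath, content, hashLength = 8) {
  const hash = crypto.createHash('sha256').update(content).digest('hex').slice(0, hashLength);
  const { dir, name, ext } = path.posix.parse(assetPath);
  return path.posix.join(dir, `${name}.${hash}${ext}`);
}
```

Because a given URL never points at different content, such URLs can be served with a very long max-age.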

Developing Helix Static

You need node>=8.0.0 and npm>=5.4.0. Follow the typical npm install, npm test workflow.

Contributions are highly welcome.

Deploying Helix Static

Deploying Helix Static requires the wsk command line client, authenticated to a namespace of your choice. For Project Helix, we use the helix namespace.

All commits to main that pass testing are deployed automatically. All commits to other branches that pass testing are deployed as /helix-services/static@ci<num> and tagged with the CI build number.

helix-static's People

Contributors

adobe-bot, ajloria, auniverseaway, davidnuescheler, dependabot[bot], distributedlock, dlemstra, greenkeeper[bot], koraa, kptdobe, marquiserosier, renovate-bot, renovate[bot], rofe, semantic-release-bot, silvia-odwyer, snyk-bot, stefan-guggisberg, triebben, trieloff, tripodsan

helix-static's Issues

GET / should report the x-version header

With the latest changes to the Pingdom status check, GET no longer returns the version,
so when helix-cli tries to get the version, it fails with a 403.

Suggest reporting the version independently of helix-status-check:

GET /

HTTP/1.1 204 No Content
x-version: 3.4.1
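Here is a minimal sketch of the suggested behavior. It assumes an OpenWhisk-style response object with statusCode and headers, the same shape as the error samples further down. It also assumes a version string supplied by the caller, e.g. read from package.json. The function name versionResponse is hypothetical.

```javascript
// Answer GET / with 204 No Content and an x-version header, independent of
// helix-status-check. Returns null for every other request so the caller can
// continue with normal static delivery.
function versionResponse(method, requestPath, version) {
  if (method.toUpperCase() === 'GET' && (requestPath === '/' || requestPath === '')) {
    return { statusCode: 204, headers: { 'x-version': version } };
  }
  return null;
}
```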

add support for Typekit proxying

A request made to /hlx_fonts/eic8tkf.css would be proxied to https://use.typekit.net/eic8tkf.css. In the response body, all references to https://use.typekit.net/ will be replaced with /hlx_fonts/ references.

A request made to /hlx_fonts/af/d91a29/00000000000000003b9af759/27/l?primer=34645566c6d4d8e7116ebd63bd1259d4c9689c1a505c3639ef9e73069e3e4176&fvd=i4&v=3 would be forwarded by Fastly to https://use.typekit.net/af/d91a29/00000000000000003b9af759/27/l?primer=34645566c6d4d8e7116ebd63bd1259d4c9689c1a505c3639ef9e73069e3e4176&fvd=i4&v=3
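The sketch below covers the two pieces described above. It maps a /hlx_fonts/ path to the upstream Typekit URL and rewrites Typekit references in the proxied CSS. The function names are hypothetical, and the upstream fetch itself is left out. According to the issue, Fastly would forward the follow-up font file requests, but the same path mapping applies there.

```javascript
const TYPEKIT_ORIGIN = 'https://use.typekit.net/';
const FONT_PREFIX = '/hlx_fonts/';

// Map a local /hlx_fonts/... request path (query string included) to the
// upstream Typekit URL. Returns null if the path is not a font request.
function typekitUpstreamUrl(requestPath) {
  if (!requestPath.startsWith(FONT_PREFIX)) {
    return null;
  }
  return TYPEKIT_ORIGIN + requestPath.slice(FONT_PREFIX.length);
}

// Replace every absolute Typekit reference in the proxied CSS with the local
// prefix, so the browser requests font files from the same host.
function rewriteTypekitReferences(css) {
  return css.split(TYPEKIT_ORIGIN).join(FONT_PREFIX);
}
```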

static returns 400 for malformed path

Trying to fetch a resource from GitHub with a malformed path returns a 400, which shouldn't be propagated to the client.
Serving a 404 would be better:

/en/publish/2020/01/14/-4635%' ORDER BY 1-- SrjWadobe-analytics-helps-retailers-bridge-online-shopping-and-physical-stores.html/index.html 

Logs:

2021-01-20T00:24:11.557Z       stdout: instrumenting epsagon.
2021-01-20T00:24:11.558Z       stdout: action-status-begin {"container":{"uuid":"c263e3229c3f2065b25e2e54c9502573","numInvocations":23368,"begin":{"mem":332095488,"age":68211166,"concurrency":1}}}
2021-01-20T00:24:11.559Z       stderr: [INFO] deliverStatic with adobe/helix-pages/e9bca53b46d741224511005b52b5a7722a807326 path=/en/publish/2020/01/14/-4635%' ORDER BY 1-- SrjWadobe-analytics-helps-retailers-bridge-online-shopping-and-physical-stores.html/index.html file=/en/publish/2020/01/14/-4635%'%20ORDER%20BY%201--%20SrjWadobe-analytics-helps-retailers-bridge-online-shopping-and-physical-stores.html/index.html allow=undefined deny=undefined root=/htdocs esi=false
2021-01-20T00:24:11.560Z       stderr: [INFO] deliverPlain: url=https://raw.githubusercontent.com/adobe/helix-pages/e9bca53b46d741224511005b52b5a7722a807326/htdocs/en/publish/2020/01/14/-4635%'%20ORDER%20BY%201--%20SrjWadobe-analytics-helps-retailers-bridge-online-shopping-and-physical-stores.html/index.html
2021-01-20T00:24:11.591Z       stderr: [INFO] got response. size=NaN, type=text/html
2021-01-20T00:24:11.637Z       stderr: [ERROR] unknown error while fetching content Bad Request
2021-01-20T00:24:11.638Z       stderr: [INFO] delivering error 400 Bad Request 400
2021-01-20T00:24:11.639Z       stdout: action-status-end {"container":{"uuid":"c263e3229c3f2065b25e2e54c9502573","numInvocations":23369,"begin":{"mem":332095488,"age":68211166,"concurrency":1},"end":{"mem":332009472,"age":68211247,"concurrency":2},"delta":{"mem":-86016,"age":81}}}
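The logged file path contains "%'" followed by a space, which is not a valid percent-escape. The request sent to raw.githubusercontent.com is therefore presumably malformed. The hedged sketch below shows two possible guards with hypothetical function names. The first rejects such paths before fetching. The second translates an upstream 400 into a 404 for the client.

```javascript
// Returns false if the path contains a '%' that does not start a valid
// percent-escape (decodeURIComponent throws a URIError in that case).
function isWellFormedPath(p) {
  try {
    decodeURIComponent(p);
    return true;
  } catch (e) {
    return false;
  }
}

// A 400 from GitHub caused by the requested path is reported to the client
// as 404; every other status is passed through unchanged.
function clientStatus(upstreamStatus) {
  return upstreamStatus === 400 ? 404 : upstreamStatus;
}
```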

Support authenticated GitHub access

Currently, content is fetched via unauthenticated GitHub raw requests. As a result, private GitHub repos are not accessible. Furthermore, unauthenticated GitHub raw requests might become subject to rate limiting/throttling.

This issue covers making GitHub requests using the provided GITHUB_TOKEN.
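This is a minimal sketch of building an authenticated raw request. It assumes the token reaches the action as GITHUB_TOKEN and is sent as an Authorization header of the form "token <TOKEN>". The function name rawGithubRequest is hypothetical.

```javascript
// Build the raw.githubusercontent.com URL and request headers for a file.
// With a token, the Authorization header lets GitHub serve files from
// private repositories; without one, the request stays anonymous.
function rawGithubRequest({ owner, repo, ref, path }, token) {
  const slash = path.startsWith('/') ? '' : '/';
  const url = `https://raw.githubusercontent.com/${owner}/${repo}/${ref}${slash}${path}`;
  const headers = {};
  if (token) {
    headers.authorization = `token ${token}`;
  }
  return { url, headers };
}
```

deliverPlain already logs the URL, but the headers (and therefore the token) should never be logged.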

Related issues

Avoid accessing GitHub directly

In light of today's outage, I thought about how we could put another cache between helix-static and raw.githubusercontent.com.

We have almost all the pieces in place to use Fastly as a GitHub proxy, and we don't even need to change the VCL. All that's needed is for helix-static to make requests to the original host (Fastly) and pass in an X-Request-Type and an X-Location header. The request type should be the same one we use for redirecting static resources larger than 1 MB. The X-Location is the current raw.githubusercontent.com URL.

Fastly and Runtime should cache the responses (which are immutable) for a long time, giving us some robustness for the next GitHub outage.
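Below is a sketch of the request described above, using hypothetical names. The issue names neither the request-type value used for large-file redirects nor the exact path to request on the original host, so the sketch passes both in as parameters.

```javascript
// Build a request that goes through the original (Fastly) host instead of
// calling raw.githubusercontent.com directly. Fastly is expected to read the
// headers, fetch the X-Location URL and cache the immutable response.
function fastlyProxyRequest(originalHost, requestPath, rawUrl, requestType) {
  return {
    url: new URL(requestPath, `https://${originalHost}`).href,
    headers: {
      'X-Request-Type': requestType,
      'X-Location': rawUrl,
    },
  };
}
```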

unexpected errors that abort/kill action

from the logs:

{"level":"error",
 "ow":{
   "activationId":"26fabd23482a4acfbabd23482a4acfab",
   "actionName":"/helix/helix-services-private/[email protected]",
   "transactionId":"Q5O7y82ufKQeOjM6hRHJ87xRDga1aWEH"},
 "message":"POST https://runtime.adobe.io/api/v1/namespaces/helix-pages/actions/08c7b7b4de6bcd21fb950e4f16d1257fa3500928/hlx--static?blocking=true Returned HTTP 502 (Bad Gateway) --> \"The action did not produce a valid response and exited unexpectedly.\"",
 "timestamp":"2020-01-19T22:12:39.573248309Z"}
$ wsk activation get "90937576109e4312937576109ec31261"
ok: got activation 90937576109e4312937576109ec31261
{
    "namespace": "helix-pages",
    "name": "[email protected]",
    "version": "0.0.1",
    "subject": "helix-pages",
    "activationId": "90937576109e4312937576109ec31261",
    "cause": "f3644ac3b1d14f32a44ac3b1d16f328d",
    "start": 1579471850119,
    "end": 1579471851723,
    "duration": 1604,
    "statusCode": 0,
    "response": {
        "status": "action developer error",
        "statusCode": 0,
        "success": false,
        "result": {
            "error": "The action did not produce a valid response and exited unexpectedly."
        }
    },
    "logs": [],
    "annotations": [
        {
            "key": "causedBy",
            "value": "sequence"
        },
        {
            "key": "path",
            "value": "helix/helix-services-private/[email protected]"
        },
        {
            "key": "kind",
            "value": "nodejs:10"
        },
        {
            "key": "timeout",
            "value": false
        },
        {
            "key": "limits",
            "value": {
                "concurrency": 200,
                "logs": 10,
                "memory": 256,
                "timeout": 60000
            }
        }
    ],
    "publish": false
}
$ wsk activation logs "90937576109e4312937576109ec31261"
2020-01-19T22:10:50.328Z       stderr: instrumenting epsagon.
2020-01-19T22:10:50.333Z       stderr: deliverStatic with adobe/helix-pages/21299f6930cc5c1dab371adb7cc40f53ed8f77eb path=/404.html file=/404.html allow=undefined deny=undefined root=/htdocs esi=false
2020-01-19T22:10:50.334Z       stderr: deliverPlain: url=https://raw.githubusercontent.com/adobe/helix-pages/21299f6930cc5c1dab371adb7cc40f53ed8f77eb/htdocs/404.html

add support for external image proxying

This issue is about implementing the external image support explained in adobe/helix-pages#93 (see there for details).

Suggestion

  • helix-static receives a path that matches /${base64_encoded}.external.image
  • helix-static detects the external image request and decodes the URL
  • it rejects requests with a missing or invalid Accept header (404 or 415)
  • helix-static fetches the external content
  • if the content-length of the response is < MAX_BYTES, the content is returned directly
  • otherwise, the SHA-1 of the content is computed
  • it checks if the respective blob is already available
  • if not, it uploads the content to Azure
  • it sends a 307 redirect to the Azure blob location

notes

  • downloads can be buffered in memory until a certain threshold is reached; beyond that, they would go to disk
  • if the external url is already a helix blob store url, don't refetch the content
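Here is a partial sketch of the first steps of the suggested flow. It decodes the path, checks the Accept header, and chooses between inline delivery and a SHA-1-addressed blob. The function names are hypothetical. The blob lookup, the Azure upload and the 307 redirect are left out.

```javascript
const crypto = require('crypto');

// Parse /<base64-encoded-url>.external.image and return the decoded http(s)
// URL, or null if the path does not match or does not decode to a valid URL.
function parseExternalImagePath(p) {
  const m = /^\/(.+)\.external\.image$/.exec(p);
  if (!m) {
    return null;
  }
  const decoded = Buffer.from(m[1], 'base64').toString('utf8');
  try {
    const url = new URL(decoded);
    return url.protocol === 'https:' || url.protocol === 'http:' ? url.href : null;
  } catch (e) {
    return null;
  }
}

// Only proceed when the client accepts images (image/* or */*).
function acceptsImages(accept) {
  return typeof accept === 'string' && /(^|,)\s*(image\/|\*\/\*)/.test(accept);
}

// Bodies below maxBytes are returned inline. Larger ones are identified by
// their SHA-1, which names the blob to look up or upload before redirecting.
function planDelivery(body, maxBytes) {
  if (body.length < maxBytes) {
    return { inline: true };
  }
  return { inline: false, sha1: crypto.createHash('sha1').update(body).digest('hex') };
}
```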

error while fetching content Error: getaddrinfo EMFILE raw.githubusercontent.com

Description
helix-static regularly throws EMFILE (too many open files) errors, as seen in the helix-pages application in Coralogix. I think we should try to understand why they are happening and avoid them:

{
  "level": "error",
  "ow": {
    "activationId": "[ID]",
    "actionName": "/helix/helix-services-private/[email protected]",
    "transactionId": "[ID]"
  },
  "message": "error while fetching content Error: connect EMFILE raw.githubusercontent.com raw.githubusercontent.com:443",
  "timestamp": "2020-01-19T10:28:25.927313699Z"
}

(I suspect these subsequently cause helix-log to log another EMFILE type error of its own)
See https://helix.coralogix.com/#/query/logs?id=2604zpGWgVh for samples.

Expected behavior
Ideally no such errors should be thrown.
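EMFILE means the process has run out of file descriptors. Every open socket and every pending DNS lookup consumes one. The errors may come from many concurrent requests in one container, since the action limits shown earlier allow a concurrency of 200. If so, one common mitigation is a single shared keep-alive agent with a cap on sockets per host. This is a sketch, not a verified fix. The limit of 50 and the helper name fetchRaw are arbitrary examples.

```javascript
const https = require('https');

// A single agent shared by all requests to raw.githubusercontent.com.
// keepAlive reuses sockets between requests. maxSockets caps concurrent
// sockets per host, so extra requests wait in the agent's queue instead of
// opening new file descriptors.
const githubAgent = new https.Agent({ keepAlive: true, maxSockets: 50 });

// Hypothetical helper: fetch a raw file through the shared agent.
function fetchRaw(url, onResponse) {
  return https.get(url, { agent: githubAgent }, onResponse);
}
```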

Activation logs are missing

Run

curl -v "https://runtime.adobe.io/api/v1/web/acapt/helix-services/static@latest?owner=kptdobe&repo=helix-demo&ref=master&root=&path=/style.css&entry=/style.css&plain=true"

Copy the value of the x-openwhisk-activation-id response header, then run:

wsk activation logs <x-openwhisk-activation-id>

There is no output, although the code does log messages.

Add blackbox config

Add a blackbox config with env variables, like we have for other Helix services.

This static sitemap runs against the size limit, but does not trigger a refetch

{
    "ref": "7966963696682b955c13ac0cefb8ed9af065f66a",
    "package": "8c8a56985d9b2624d338e98af8ba8cf03124dc11",
    "path": "/sitemap.xml",
    "params": "",
    "owner": "adobe",
    "branch": "staging",
    "esi": false,
    "plain": true,
    "root": "",
    "repo": "theblog"
}
OpenWhiskError - POST https://controller-a-rtbeta2-ew1-a.adobe-runtime.com/api/v1/namespaces/helix-pages/actions/helix-services/static@v1?blocking=true Returned HTTP 502 (Bad Gateway) --> 
"The action produced a response that exceeded the allowed length: 1250772 > 1048576 bytes. 

The truncated response was:

 {"statusCode":200,"headers":{"Content-Type":"application/xml","X-Static":"Raw/Static","Surrogate-Key":"KTsQcqy+lSSO6Ij1","Cache-Control":"max-age=86400, stale-while-revalidate=2592000","x-last-activation-id":"f6c6d61053f745fc86d61053f775fcfd"},"body":"<?xml version=\"1.0\" encoding=\"utf-8\"?><urlset 
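The limit applies to the serialized action result, not to the file alone. Headers add bytes, and JSON escaping makes the body larger still: every quote in the XML becomes \", as the truncated response shows. One plausible reason the refetch/redirect path did not kick in is that the size check looks only at the raw file size. The hedged sketch below checks the serialized result instead. The function name is hypothetical.

```javascript
// Maximum size of an action result, as reported in the error above.
const MAX_RESULT_BYTES = 1048576;

// True if the serialized result (status, headers and JSON-escaped body) would
// exceed the limit, in which case the caller should fall back to serving the
// file via a redirect instead of inlining it.
function exceedsResultLimit(result, limit = MAX_RESULT_BYTES) {
  return Buffer.byteLength(JSON.stringify(result), 'utf8') > limit;
}
```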

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Location: config
Error type: Invalid allowedVersions
Message: The following allowedVersions does not parse as a valid version or range: "<15>"

Add surrogate-key response header

In order to be able to invalidate the cache when a GitHub resource has changed, it would be helpful to use a surrogate key. This is easier than computing all potential external URLs of a static resource.

Suggestion

  • compute the surrogate hash based on the raw GitHub URL of the loaded resource, similar to helix-pipeline.
  • add the surrogate hash to the Surrogate-Key response header.

static should not fallback to master branch

See https://github.com/adobe/helix-static/blob/master/src/static.js#L75

Required for adobe/helix-pages#531: ref might be undetermined when requesting helix-static (default branch must be used). helix-static must then:

  • allow ref parameter to be omitted
  • if this is the case, determine the ref (default branch) - should be responsibility of adobe/helix-resolve-git-ref. See adobe/helix-resolve-git-ref#287.

Requires adobe/helix-resolve-git-ref/issues/299 for simplicity of the change.
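The sketch below handles a missing ref without the current fallback to master, under stated assumptions. The resolver is injected as a hypothetical async function resolveDefaultBranch(owner, repo), which stands in for a call to adobe/helix-resolve-git-ref. Its real interface is not shown here.

```javascript
// Return the ref to fetch from. If the caller omitted `ref`, ask the injected
// resolver for the repository's default branch instead of assuming "master".
async function effectiveRef({ owner, repo, ref }, resolveDefaultBranch) {
  if (ref) {
    return ref;
  }
  const branch = await resolveDefaultBranch(owner, repo);
  if (!branch) {
    throw new Error(`unable to determine default branch of ${owner}/${repo}`);
  }
  return branch;
}
```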

Use more inclusive language

  • Whitelist/blacklist to Allowed List and Blocked List (or Deny List - some software uses this instead) respectively. Google and many developers are formalizing allowlist and blocklist. You might want to lobby for those terms to be used in the UI.
  • Master/Slave to master and replica (or subordinate, if that makes more sense) respectively.

An in-range update of @adobe/openwhisk-action-builder is breaking the build 🚨

The dependency @adobe/openwhisk-action-builder was updated from 1.3.1 to 1.4.0.

This version is covered by your current version range and after updating it in your project the build failed.

@adobe/openwhisk-action-builder is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • ❌ ci/circleci: build: Your tests failed on CircleCI (Details).
  • ✅ Tidelift: Dependencies checked (Details).
  • ❌ build: * build - Failed
  • branch-deploy - Blocked

Commits

The new version differs by 7 commits.

  • 1e3a6e7 chore(release): 1.4.0 [skip ci]
  • 293ff57 feat(builder): trigger release
  • b89a261 test(unzip): replace unzip2 with yauzl due to security fixes (#61)
  • b752f94 refactor(lib): extract helper to own package (#60)
  • b544def chore(lint): use airbnb-base and update deps (#58)
  • c02a494 chore(tidelift): adding list of forbidden licenses
  • 8c915f1 chore(docs): update readme

HTTP Errors that are reported as Error 500

Some errors seen in the logs #158 (comment) that look misreported:

{ statusCode: 500,
  headers:
   { 'Content-Type': 'text/html',
     'X-Static': 'Raw/Static',
     'Cache-Control': 'max-age=300' },
  body:
   'Error: getaddrinfo EAI_AGAIN raw.githubusercontent.com raw.githubusercontent.com:443' }

should be 502

{ statusCode: 500,
  headers:
   { 'Content-Type': 'text/html',
     'X-Static': 'Raw/Static',
     'Cache-Control': 'max-age=300' },
  body:
   'Error: Client network socket disconnected before secure TLS connection was established' }

should be 502

{ statusCode: 500,
  headers:
   { 'Content-Type': 'text/html',
     'X-Static': 'Raw/Static',
     'Cache-Control': 'max-age=300' },
  body: 'Error: connect ECONNREFUSED 151.101.112.133:443' }

should be 502

{ statusCode: 500,
  headers:
   { 'Content-Type': 'text/html',
     'X-Static': 'Raw/Static',
     'Cache-Control': 'max-age=300' },
  body: 'Error: socket hang up' }

should be 502

Btw: one thing I haven't seen even once is rate limiting from GitHub/Fastly, unless you count the limit on open connections.
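Here is a sketch of the proposed status mapping. The function name is hypothetical. It assumes Node reports the four failures above with the codes EAI_AGAIN, ECONNREFUSED and ECONNRESET. Node uses ECONNRESET both for "socket hang up" and for a TLS socket that is closed before the handshake completes.

```javascript
// Error codes meaning raw.githubusercontent.com could not be reached.
const UPSTREAM_UNREACHABLE = new Set(['EAI_AGAIN', 'ECONNREFUSED', 'ECONNRESET']);

// Map a fetch error to the status returned to the client: 502 Bad Gateway
// when the upstream was unreachable, 500 for anything else.
function statusForFetchError(err) {
  return err && UPSTREAM_UNREACHABLE.has(err.code) ? 502 : 500;
}
```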

migrate to helix-deploy

see adobe/project-helix#508

  • migrate code to use helix-deploy adapter
  • update CI to use helix-deploy

Some static requests fail

2020-08-08T15:47:32.256Z: instrumenting epsagon.
2020-08-08T15:47:32.258Z: deliverStatic with adobe/ferrumjsorg/54d751f37633fa777ce0816390b3bdbe515d0295 path=/index.html file=/index.html allow=undefined deny=undefined root= esi=false
2020-08-08T15:47:32.259Z: deliverPlain: url=https://raw.githubusercontent.com/adobe/ferrumjsorg/54d751f37633fa777ce0816390b3bdbe515d0295/undefined.undefined
2020-08-08T15:47:32.438Z: got response. size=14, type=application/octet-stream
2020-08-08T15:47:32.438Z: delivering error undefined 404

This happens with versions > 1.10.49.

static returns status 400 when requesting a yaml file

When requesting an existing yaml file (e.g. test.yaml) helix-static returns status 400:

{
  "code": "jRwI4wBCWfdeO2N9hw5W5PjqvCtoN2FQ",
  "error": "Response type in header did not match generated content type."
}

To reproduce:

Add a file test.yaml to the root of a repo and then invoke the action:

curl -v "https://adobeioruntime.net/api/v1/web/helix/helix-services/[email protected]?owner=<owner>&repo=<repo>&path=test.yaml"

or e.g. run this curl command: curl -v "https://adobeioruntime.net/api/v1/web/helix/helix-services/static@v1?owner=trieloff&repo=helix-demo&path=helix-redirects.yaml"

cc @trieloff
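The error message comes from the OpenWhisk web action layer. Presumably the .yaml file is served with a content type that OpenWhisk treats as binary, while the body is plain text, so the two do not match; the issue does not confirm this. The hedged sketch below shows one possible fix. It maps YAML to a text type explicitly and base64-encodes only bodies with binary types. The table and function names are hypothetical.

```javascript
const path = require('path');

// Hypothetical extension-to-MIME table; YAML is mapped explicitly to a text type.
const MIME_TYPES = {
  '.css': 'text/css',
  '.html': 'text/html',
  '.js': 'application/javascript',
  '.png': 'image/png',
  '.yaml': 'text/yaml',
  '.yml': 'text/yaml',
};

function contentTypeFor(filePath) {
  return MIME_TYPES[path.extname(filePath).toLowerCase()] || 'application/octet-stream';
}

// Text-like types are delivered as UTF-8 strings. Everything else is
// base64-encoded, which is assumed to be what OpenWhisk expects for binary types.
function isTextual(contentType) {
  return /^text\//.test(contentType) || /^application\/(json|javascript|xml)\b/.test(contentType);
}

function encodeBody(buffer, contentType) {
  return isTextual(contentType) ? buffer.toString('utf8') : buffer.toString('base64');
}
```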

Handle connect timeout gracefully

I've (rarely) encountered connect timeouts (ETIMEDOUT) to raw GitHub. Such errors are currently propagated as status 500 and logged as "unknown error while fetching content". It would be better to return status 504 instead.

Type error when request for fonts fails

from epsagon:

TypeError - Cannot read property 'body' of undefined
1589087176.422
TypeError: Cannot read property 'body' of undefined
    at deliverFontCSS (/nodejsAction/XXChcIdO/main.js:269400:24)
    at process._tickCallback (internal/process/next_tick.js:68:7)

which corresponds to line:

body: e.response.body,
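The crash happens because network-level failures such as timeouts or DNS errors reject without a response object. The defensive sketch below assumes the HTTP client attaches response.statusCode and response.body to errors for non-2xx responses. The function name fontErrorResponse and the 502 fallback are illustrative choices.

```javascript
// Build an error response for a failed font request. Errors that carry no
// response (timeouts, DNS failures) fall back to the error message and a 502
// status instead of throwing on e.response.body.
function fontErrorResponse(e) {
  const response = e && e.response;
  return {
    statusCode: (response && response.statusCode) || 502,
    headers: { 'Content-Type': 'text/plain' },
    body: response ? response.body : String((e && e.message) || e),
  };
}
```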

Replace CSS/JS references in HTML with content-addressable links

I thought that we already have some magic with the JS/CSS in the pipeline and in helix-static?
Like, the pipeline creates an ESI include that gets replaced by helix-static with a content-hashed version?

We do have the magic in the pipeline, but in the helix-pages case, the header.html that contains the references is being served by helix-static, which does not do rewriting for HTML, only for CSS and JS.

We should add it.

Originally posted by @trieloff in adobe/helix-pages#298 (comment)
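This sketch shows what HTML rewriting in helix-static could look like. It assumes a Map from original asset paths to their content-hashed paths is already available, and it does not cover how that Map is built. The function name is hypothetical. A regex keeps the example short, although an HTML parser would be more robust.

```javascript
// Rewrite src/href attributes that reference known CSS/JS assets to their
// content-addressed URLs. References not found in `hashedUrls` are kept as-is.
function rewriteAssetReferences(html, hashedUrls) {
  return html.replace(
    /\b(src|href)=(["'])([^"']+\.(?:css|js))\2/g,
    (match, attr, quote, url) => {
      const hashed = hashedUrls.get(url);
      return hashed ? `${attr}=${quote}${hashed}${quote}` : match;
    },
  );
}
```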

Rewrite font URLs

Rewrite absolute Adobe Fonts (typekit) URLs to relative /hlx_fonts/* URLs. This will speed up the page because the font file can be delivered from the same host.

Change

<link rel="stylesheet" href="https://use.typekit.net/pps7abe.css"/>

To

<link rel="stylesheet" href="/hlx_fonts/pps7abe.css"/>

surrogate-key computation needs to use base ref instead of commit sha

The surrogate-key computation uses context.content.sources, which specifies the entire original URL from which the markdown content was fetched. For example:

https://raw.githubusercontent.com/tripodsan/hlxtest/bb8fc39dc3e01021bf1d30ac1ed4d0de04cbf5bc/image.png

The problem is that, in order to (soft) purge the Fastly cache when image.png is modified, the bot doesn't know exactly which SHA was used to deliver the resource.

The surrogate key should be computed using the original ref, e.g. master. Suggest adding a branch parameter to the action params.

The surrogate key will then be computed using:

https://raw.githubusercontent.com/${owner}/${repo}/${branch}${path}

@trieloff helix-dispatch also needs to include the new branch param.
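The sketch below builds the branch-based URL proposed above and derives a short key from it. The key then depends only on owner, repo, branch and path. A bot that knows a file changed on master can therefore compute the same key and purge it without knowing the commit SHA. The hash (base64 SHA-256 truncated to 16 characters) is an assumption chosen to resemble the 16-character Surrogate-Key in the sitemap response above. It is not necessarily what helix-pipeline uses.

```javascript
const crypto = require('crypto');

// Compute a surrogate key from the branch-based raw URL rather than the
// commit-SHA URL, so it stays stable across commits on the same branch.
function surrogateKey({ owner, repo, branch, path }) {
  const url = `https://raw.githubusercontent.com/${owner}/${repo}/${branch}${path}`;
  return crypto.createHash('sha256').update(url).digest('base64').slice(0, 16);
}
```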

Dependency Dashboard

This issue provides visibility into Renovate updates and their statuses. Learn more

Awaiting Schedule

These updates are awaiting their schedule. Click on a checkbox to get an update now.

  • fix(deps): update dependency postcss to v8.4.7

Ignored or Blocked

These are blocked by an existing closed PR and will not be recreated unless you click a checkbox below.


  • Check this box to trigger a request for Renovate to run again on this repository

Cleanup Code

The code has some leftovers that we should clean up:

  • remove staticBase()

    helix-static/src/index.js

    Lines 293 to 296 in 8cac0a0

    function staticBase(owner, repo, entry, ref, strain = 'default') {
      // todo: is this still needed?
      return `__HLX/${owner}/${repo}/${strain}/${ref}/${entry}/DIST__`;
    }
  • remove plain parameter
  • remove strain parameter
  • decide path vs entry parameter (I would stick to path)

@trieloff WDYT?
