stubattribution's Introduction

Stub Attribution

A service which accepts an attribution code and returns a modified installer. Despite its name, it can (and does) provide attribution for both stub and full installers.

stubattribution's People

Contributors

bhearsum, cvalaas, dependabot[bot], hoosteeno, jvehent, moz-astults, mozilla-github-standards, oremj, willdurand

stubattribution's Issues

Travis CI free usage ends Dec 3; mozilla repos should switch to other CI platforms

We're opening this issue because your project has used Travis CI within the last 6 months. If you have already migrated off it, you can close and ignore this issue.

Travis CI is ending free builds on public repositories. travis-ci.com stopped providing them in early November, and travis-ci.org will stop after December 31, 2020. To avoid disruptions to your workflows, you must migrate to another CI service.

For production use cases, we recommend switching to CircleCI. This service is already widely used within Mozilla. There is a guide to migrating from Travis CI to CircleCI available here.

For non-production use cases, we recommend either CircleCI or GitHub Actions. There is a guide to migrating from Travis CI to GitHub Actions available here. GitHub Actions usage within Mozilla is new, and you will have to work with our GitHub administrators to enable specific actions following this process.

If you have any questions, reach out in #github-admin:mozilla.org on matrix.

Consider removing S3 support

Instead of trying to update the very outdated dependencies in #136, which might have breaking changes according to the CHANGELOG file, let's see if we could remove support for the S3 backend.

My understanding is that we likely never used it in production anyway. I looked into some cloudops configs and we seem to use GCS, not AWS S3. That being said, we should make sure this is indeed the case. @jbuck WDYT?

If it is safe to remove S3/AWS support because it is unlikely to be used any time soon, we'll want to make sure that env variables like S3_BUCKET, S3_PREFIX and AWS_REGION are no longer supported.

ensure dlsource is not dropped when other attribution data fails validation

The new dlsource key that we added in #162 is somewhat different than existing data: it's meant to convey that a download was initiated through Bedrock, and override the dlsource value that already exists in our installers on archive.mozilla.org. Because of this, we should be a bit more liberal with how it's applied. For example, even if we fail the referer check for RTAMO, we should still attribute with dlsource (but not with other attribution data). There may be other cases like this as well - but that's the obvious one that stands out.
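The rule described above can be sketched as follows. This is a minimal illustration, not the service's actual code: the `Attribution` struct, its fields, and `applyRefererPolicy` are hypothetical names chosen for this example.

```go
package main

import "fmt"

// Attribution is a hypothetical, simplified view of decoded attribution data.
type Attribution struct {
	Source   string
	Campaign string
	Content  string
	DLSource string
}

// applyRefererPolicy returns the data that should still be written to the
// installer. When the referer check fails, every field except DLSource is
// dropped, so the Bedrock download marker survives.
func applyRefererPolicy(in Attribution, refererOK bool) Attribution {
	if refererOK {
		return in
	}
	return Attribution{DLSource: in.DLSource}
}

func main() {
	in := Attribution{
		Source:   "addons.mozilla.org",
		Campaign: "example-campaign",
		Content:  "rta:example",
		DLSource: "mozorg",
	}
	fmt.Printf("%+v\n", applyRefererPolicy(in, false))
}
```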

Log `client_id` in addition to `visit_id` to allow consumers to migrate to `client_id`

In #153, we added support for the client_id field but we still log the value using the visit_id key for historical reasons. After a discussion with @gleonard-m, we decided that it would be a good idea to also log the value with the new and more correct™ key, i.e. client_id.

By doing this, we shouldn't break anything and we allow consumers of the logs to move away from visit_id at their own pace. We'll have another issue to clean up the logs once we are confident that no one uses visit_id anymore.

FTR: the log statements are extremely important here since this is how we pass some useful information from the download flow to Redash.
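A minimal sketch of the dual-key logging idea, assuming the fields are passed to a structured logger as key/value pairs (the `logFields` helper is hypothetical and only builds the field map):

```go
package main

import "fmt"

// logFields builds the structured fields for the download log statement.
// The same value is emitted under both keys so that existing consumers of
// visit_id keep working while new consumers can switch to client_id.
func logFields(clientID string) map[string]string {
	return map[string]string{
		"visit_id":  clientID, // legacy key, kept for historical reasons
		"client_id": clientID, // new, more accurate key
	}
}

func main() {
	fmt.Println(logFields("1234567890.1234567890"))
}
```

Once no consumer reads `visit_id` anymore, the legacy entry can be removed in a follow-up change.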

Limit inputs

We should probably limit the product and code arguments to some regular expression to mitigate DoS.
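One possible shape for this check is sketched below. Both patterns are assumptions made for illustration: the allowed characters and the length bounds are not taken from the service and would need to be confirmed against real product names and codes.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns: short lowercase product names, and a bounded code
// made of base64-like and URL-escaping characters.
var (
	productRe = regexp.MustCompile(`^[a-z0-9-]{1,50}$`)
	codeRe    = regexp.MustCompile(`^[A-Za-z0-9_=.%-]{1,200}$`)
)

// validInputs reports whether both arguments match their patterns. Rejecting
// early keeps oversized or malformed input away from the expensive code paths.
func validInputs(product, code string) bool {
	return productRe.MatchString(product) && codeRe.MatchString(code)
}

func main() {
	fmt.Println(validInputs("firefox-stub", "c291cmNlPW1vemlsbGEub3Jn"))
	fmt.Println(validInputs("firefox stub", "abc"))
}
```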

Download Token Logs Are Mislabeling GA clientId and Not Recording GA visitId

Description

On download, we're creating a unique DL token for a download event and passing it to client side telemetry in order to attribute the new profiles.

As part of this process, we're logging the DL token along with the corresponding Google Analytics clientId in a restricted database to be able to match GA data with attributed Firefox profiles.

There are currently 2 issues with the current implementation:

    1. We're only recording Google Analytics clientId in our logs. However, we need to log both the clientId as well as the visitId in order to attribute (since attribution is a session level datapoint, not client level).
    2. We're recording the Google Analytics clientId as a visit_id in our logs, which is confusing.

Success Criteria

The Download Token Log table should report both the Google Analytics clientId as well as Google Analytics visitId associated with the download event.

The Download Token Log table should rename visit_id (which reports the Google Analytics clientId field) to visitor_id, or something less confusing.

Accept `client_id` as an alias of `visit_id`

This came up during the review of mozilla/bedrock#12794.

We use visit_id for what really is the Google Analytics "client" ID. Since Bedrock started to correct this small mistake, I think we could go one step further and allow client_id as the preferred alias for visit_id. That would allow Bedrock to refer to "client ID" everywhere.

In this project, we could also update VisitID to ClientID. The only thing we cannot do at the moment is rename the field in the log statements that are forwarded to BQ/Redash and used by the DS team. That is something we could do in the future, but it would be a breaking change compared to what is proposed here.

CODE_OF_CONDUCT.md file missing

As of January 1 2019, Mozilla requires that all GitHub projects include this CODE_OF_CONDUCT.md file in the project root. The file has two parts:

  1. Required Text - All text under the headings Community Participation Guidelines and How to Report, are required, and should not be altered.
  2. Optional Text - The Project Specific Etiquette heading provides a space to speak more specifically about ways people can work effectively and inclusively together. Some examples of those can be found on the Firefox Debugger project, and Common Voice. (The optional part is commented out in the raw template file, and will not be visible until you modify and uncomment that part.)

If you have any questions about this file, or Code of Conduct policies and procedures, please reach out to [email protected].

(Message COC001)

Use Go 1.20

Go 1.20 was released earlier this month (https://go.dev/blog/go1.20); we should upgrade eventually. Not following Golang (or dependency) updates will increase the maintenance costs in the future.

Potential data length issue

Hi,

We've noticed that certain campaigns are failing to make it into telemetry and I think I've found the answer to why. It seems that some of the campaigns put a long-ish string into utm_campaign, which results in a long base64 encoded string to send to the attribution service. I found that the attribution service rejects code strings of over 200 characters, which it turns out is easier to reach than I thought since that equates to a roughly 150 character decoded string. So I have two questions:

  1. Do you see a lot of rejections in the logs for the code being too long (over 200 chars)?
  2. Is it possible we could increase this limit?

Thanks!
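The 200-to-150 relationship mentioned above follows from base64 encoding every 3 input bytes as 4 output characters. A quick check, assuming padded standard base64 (the `encodedLen` and `maxDecodedLen` helpers are just thin wrappers for illustration):

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodedLen returns the padded base64 length for n input bytes.
func encodedLen(n int) int {
	return base64.StdEncoding.EncodedLen(n)
}

// maxDecodedLen returns the largest decoded size that fits in n encoded
// characters of padded base64.
func maxDecodedLen(n int) int {
	return base64.StdEncoding.DecodedLen(n)
}

func main() {
	fmt.Println(encodedLen(150))    // a 150-byte code encodes to exactly 200 characters
	fmt.Println(encodedLen(151))    // one more byte pushes the encoded form past 200
	fmt.Println(maxDecodedLen(200)) // upper bound on decoded bytes for a 200-character limit
}
```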

Update to CircleCI v2 API

I recently got an email from "The CircleCI Team" saying:

Last month, we pointed you toward resources to help you migrate active projects from CircleCI 1.0 to 2.0 with the Migration Center.

With only 30 days left until CircleCI 1.0 sunsets, we want to ensure you have everything you need to help you with any active projects that still need to migrate to 2.0. Our team has been busy creating resources and new features to help make it easier:

More info in the CircleCI Blog at https://circleci.com/blog/sunsetting-1-0/ and https://circleci.com/docs/2.0/migration-intro/

I also found https://circleci.com/docs/2.0/local-cli/ which points to a handy CircleCI v2 config validator (using something like $ circleci config validate -c .circleci/config.yml).

Add open source software license

This Mozilla repository has been identified as lacking a license. Consistent with Mozilla's Licensing Policy an open source license should be applied to the code in this repository.

Please add an appropriate LICENSE.md file to the root directory of the project. In general, Mozilla's licensing policies are as follows:

  • Client-side products created by Mozilla employees or contributors should use the Mozilla Public License, Version 2.0 (MPL).

  • Server-side products or utilities that support Mozilla products may use either the MPL or the Apache License 2.0 (Apache 2.0).

In special cases, another license might be appropriate. If the repository is a fork of another repository it must apply the license of the original. Similarly, another license might be appropriate to match that of a broader project (for example Rust crates that Firefox depends on are published under an Apache 2.0 / MIT dual license, as that is the dual license used by the Rust programming language and projects).

Please ensure that any license added to the LICENSE.md file matches other licensing information in the repository (for example, it should match any license indicated in a setup.py or package.json file).

Mozilla staff can access more information in our Software Licensing Runbook – search for “Licensing Runbook” in Confluence to find it.

If you have any questions you can contact Daniel Nazer who can be reached at dnazer on Mozilla email or Slack.

OPENLIC-2023-01

Limit RTAMO attribution to www referrals

Like in mozilla-services/go-bouncer#347, we should ignore attribution for RTAMO if the referral wasn't from www.mozilla.org, using a referer header check. If the request didn't originally come from www.mozilla.org, and it contains attribution for RTAMO, the attribution should be dropped.

(browsers should maintain the original referer information through the redirects they go through, so even if the user is redirected from download.mozilla.org to stubdownloader.services.mozilla.com technically the referer will still be www.mozilla.org)
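A referer check along these lines might look like the sketch below. How RTAMO attribution is detected is left out; the `isRTAMO` flag and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// refererIsWWW reports whether a Referer header value points at
// www.mozilla.org. Comparing the parsed hostname avoids being fooled by
// look-alike hosts such as www.mozilla.org.example.com.
func refererIsWWW(referer string) bool {
	u, err := url.Parse(referer)
	if err != nil {
		return false
	}
	return u.Hostname() == "www.mozilla.org"
}

// shouldDropAttribution applies the proposed rule: RTAMO attribution is only
// honored when the request was referred by www.mozilla.org.
func shouldDropAttribution(isRTAMO bool, referer string) bool {
	return isRTAMO && !refererIsWWW(referer)
}

func main() {
	fmt.Println(shouldDropAttribution(true, "https://www.mozilla.org/en-US/firefox/new/"))
	fmt.Println(shouldDropAttribution(true, "https://example.com/"))
}
```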

Allow any source values instead of replacing some with "(other)"

Originally Bug 1827985 - Stub Attribution "source" replaces many values with (other)

Seems like https://github.com/mozilla-services/stubattribution/blob/master/attributioncode/sourcewhitelist.go can be removed along with the callsites

Curious if we're able to see the logs of what has been filtered out?

logEntry.WithField("source", source).Error("source is not in whitelist")

LICENSE file missing from repository.

Hi, could you please add a LICENSE file to this repository (or make it private)? Typically folks use MPL2 plaintext, but whatever you think is best is fine.

Many Sentry events for "code is empty"

There are lots of Sentry events for "code is empty", which means we called the stub service without the base64 code.

The main problem here is that the / API route is a catch-all, which we can also confirm by looking at the nginx logs. I think we should return a 404 when the path isn't / in the HTTP handler. We'll need to double check that bedrock never calls the stub service with a path that isn't /, though.
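In Go's net/http, a handler registered on the "/" pattern receives every path that no other pattern matches, which is why the explicit path check is needed. A minimal sketch of the proposed fix (the handler body and the `statusFor` helper are placeholders, not the service's real code):

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// handler answers only the exact root path and returns 404 for anything
// else, instead of treating every path as an attribution request.
func handler(w http.ResponseWriter, r *http.Request) {
	if r.URL.Path != "/" {
		http.NotFound(w, r)
		return
	}
	// Placeholder for the real attribution handling.
	fmt.Fprintln(w, "ok")
}

// statusFor runs the handler against path and returns the response status.
func statusFor(path string) int {
	rec := httptest.NewRecorder()
	handler(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor("/"), statusFor("/favicon.ico"))
}
```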

ensure static attribution does not interfere with dynamic attribution

When https://bugzilla.mozilla.org/show_bug.cgi?id=1814727 lands we will begin adding an attribution code to our Windows installers that we upload to archive.mozilla.org. This means that the builds that the attribution service is modifying will no longer have a block of nul after the MOZCUSTOM tag - but rather an existing attribution code. Although unlikely, if the attribution service writes a code shorter than the one that already exists in the installer, we will end up with a corrupt or incorrect attribution code.

We should make sure that we always overwrite any existing attribution data when we do dynamic attribution.

OKR: Implement code linting and integrate into build-test/code-merge process

Under the "Services" portion of https://docs.google.com/document/d/1H7ivaEOqpknPHdb3ba-dgVI9KLUnoEFwAe8stHsaI6U/edit#heading=h.27q4mqxwvkok, we've got a few code/process-hygiene (I'm paraphrasing) OKRs to check/implement.

One of those is, "All projects should block merges on failure of linting warnings/errors."

@oremj - as you know the code + Go language best, obviously deferring to you for further investigation/implementation - thanks!

Set the correct sentry `release` on stage/prod

Currently, Sentry uses git describe as the release value for all events, in every environment. While this is good on -dev because it lets us associate an event with a commit, it would make more sense to use the git tags (versions) for stage and prod. We could then group the Sentry events by version and see, for instance, what is new between two deploys.

In order to fix this issue, we will likely have to read version.json to extract the value of the version in this file, and then pass that to Sentry when we configure it. We probably do not want to set anything if the version.json file does not exist.

Integrate with Sentry

From the docs, this should just be two steps:

pip install raven[flask]
from raven.contrib.flask import Sentry
sentry = Sentry(app, dsn='your dsn')

Adjust whitelist for new firefox.com sub-domains

In the whitelist for stub attribution:

https://github.com/mozilla-services/stubattribution/blob/master/attributioncode/sourcewhitelist.go

We have the following whitelisted sources related to firefox.com:

"firefox-com": true,
"firefox.com": true,
"www.firefox.com": true,
"accounts.firefox.com": true

With the launch of Firefox Screenshots, there is a new domain that we need whitelisted.

That is: screenshots.firefox.com

Would it make more sense to just make firefox.com a regex instead of making a bunch of one-off sub-domains?

Like:

regexp.MustCompile(`^(\w+\.)*firefox\.com$`),

If you don't want a pattern to match firefox.com and all sub-domains, then just add screenshots.firefox.com.

Also, while we are talking about the regexes, shouldn't the patterns on lines 19-24 have their periods escaped?

var sourceWhitelistRegexps = []*regexp.Regexp{
	regexp.MustCompile(`^[\w-]*.allizom.org$`),
	regexp.MustCompile(`^www.google(.com?)?.\w+$`),
	regexp.MustCompile(`^\w+.google.com$`),
	regexp.MustCompile(`^\w+.search.yahoo.com$`),
	regexp.MustCompile(`^\w+.facebook.com$`),
	regexp.MustCompile(`^[\w-]+.wikipedia.org$`),
}

like:

regexp.MustCompile(`^[\w-]*\.allizom\.org$`),

Thanks!
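The effect of the unescaped periods can be demonstrated with one of the patterns quoted above. An unescaped `.` matches any character, so the loose pattern accepts strings that are not google.com sub-domains at all. The variable and function names here are for illustration only.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Pattern as quoted in the issue: "." matches any single character.
	unescaped = regexp.MustCompile(`^\w+.google.com$`)
	// Same pattern with the periods escaped so they match only a literal dot.
	escaped = regexp.MustCompile(`^\w+\.google\.com$`)
)

// matches reports how both patterns treat the same source value.
func matches(source string) (loose, strict bool) {
	return unescaped.MatchString(source), escaped.MatchString(source)
}

func main() {
	for _, s := range []string{"maps.google.com", "wwwXgoogleYcom"} {
		loose, strict := matches(s)
		fmt.Println(s, loose, strict)
	}
}
```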

Document the `__pingdom__` endpoint

It looks like this project uses a __pingdom__ endpoint (see #47), which isn't part of Dockerflow but is definitely used (based on a Slack discussion with @jbuck). We should document this endpoint since it is "non-standard".

Add mozilla.org to regex section of whitelist

We currently include several critical global domains in the regex section of the whitelist:

https://github.com/mozilla-services/stubattribution/blob/master/attributioncode/sourcewhitelist.go#L18

We even include everything at our testing domain, *.allizom.org, and everything at *.firefox.com. But we don't include our production domain, mozilla.org. That means any mozilla subdomain that wishes to drive attributed product downloads must request a one-off rule to be added to the whitelist. Example: blog.mozilla.org, which spent a great deal of 2017 driving downloads, is not on the list and so we have no attribution data for it.

Can we add ^[\w-]*\.mozilla\.org$ to the regex whitelist? We can remove one-off subdomains from the individual list of rules at the same time.
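A quick check of what the proposed pattern would and would not accept. Note that, as written, it requires a leading dot before "mozilla.org", so the bare domain and multi-level sub-domains do not match; whether that is desired is a decision for the maintainers. The `allowed` helper is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern proposed in the issue.
var mozillaOrg = regexp.MustCompile(`^[\w-]*\.mozilla\.org$`)

// allowed reports whether source matches the proposed pattern.
func allowed(source string) bool {
	return mozillaOrg.MatchString(source)
}

func main() {
	for _, s := range []string{
		"blog.mozilla.org",
		"www.mozilla.org",
		"mozilla.org",
		"a.b.mozilla.org",
		"blog.mozilla.org.example.com",
	} {
		fmt.Println(s, allowed(s))
	}
}
```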

CDN configuration for RTAMO

When a new user follows the RTAMO flow (downloading a FF/Windows installer from AMO), the stub service builds a modified installer that gets uploaded to a CDN. From there, the stub service returns the (CDN) URL to the installer, which is then used to download the file.

It'd be interesting to have answers to these questions:

  1. how are the folders/URLs on the CDN generated?
  2. how long are the files kept on the CDN?
  3. can we introduce a referer check on the CDN side for RTAMO?

Please delete the "python" (protected) branch

This repo has a very outdated "python" branch that I cannot delete. This seems to be a prototype of this service written in python from... 2016.

@cvalaas could you please delete it? It's unlikely to be used/useful at this point. Thanks!

Add support for macOS attribution data

See: https://docs.google.com/document/d/1NLobogbVfSYofcm7b61tesTirlXpmxwIRYd6Gsd3tt8/edit#heading=h.hhe1o3e8alj5


We’re still settling on the exact details, but the stub attribution service will need to:

  • Read the “koly” block of the DMG
  • Use the “koly” block to read the XML plist of the DMG
  • Parse a Mozilla-specific “attribution” XML element to determine offsets, sizes, partial checksums, etc.
  • Replace the placeholder with new attribution details
  • Compute new CRC32 checksums

This will likely require some internal refactoring since this project has been very Windows-oriented so far.
