
proposals's Issues

Multi data controller / processor data sharing mechanism

This issue is to gather input from TAG representatives and others who commented on and are involved in issue #6, ahead of making an initial proposal.

SWAN.co complies with all major laws across the G7 and beyond. It provides both a conceptual and a concrete implementation of a multi data controller and processor sharing mechanism that complies with those laws.

SWAN.co is advocating for a new and improved mechanism for data transfer based on data controller / processor relationships under these laws. SWAN.co is asking web browser vendors to explicitly retain support for SWAN.co's interim implementation, which is optimally achieved via primitives including cookies (both first- and third-party) and redirects, until such time as a new and improved mechanism is widely deployed. This will provide much needed web ecosystem certainty and defuse a complex and often heated debate so that real progress can be made on the objectives of this group.

SWAN.co is always interested in ideas for improving user experience, privacy, or any other matters.

requestStorageAccessFor: Page-level cross-site cookie grant API

As requestStorageAccess in the Storage Access API is being switched to be frame-only, the former page-level behavior is now a gap. The ability to grant access for subresources in addition to iframes is likely important to preserve.

This proposal is to consider requestStorageAccessFor (name very much TBD) as a separate work item. It would work similarly to the old page-level requestStorageAccess behavior, but access would be requested by the top-level site on behalf of embedded origins. This both unlocks the old page-level behavior and ensures that the top-level site, which controls subresource loading, has control. This would require elevated trust to prevent abuse and potential security issues.

Note that requestStorageAccessFor was previously proposed as part of the Storage Access API, but the aforementioned frame-only behavior means that the new API should probably be a separate entity.

More context can be found in the old proposal under SAA or in the explainer.
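As a rough sketch of how a top-level site might call the proposed API (the method name is very much TBD per above, and https://embed.example is a hypothetical embedded origin):

// Sketch only: likely gated on a user gesture and elevated trust signals.
const button = document.querySelector("button")!;
button.addEventListener("click", async () => {
  try {
    // Cast because this proposed method isn't in standard typings.
    await (document as any).requestStorageAccessFor("https://embed.example");
    // Subresource requests to embed.example may now include cross-site cookies.
  } catch {
    // The user agent denied the request.
  }
});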

@bvandersloot-mozilla @johannhof

Bounce Tracking Protection

In the spirit of a community group, we’d like to share some of our Intelligent Tracking Prevention (ITP) research and see if cooperation can get us all to better tracking prevention for a problem we call bounce tracking.

Safari’s Old Cookie Policy

The original Safari default cookie policy, circa 2003, was this: Cookies may not be set in a third-party context unless the domain already has a cookie set in a first-party context. This effectively meant you had to “seed” your cookie jar as first party.

Bounce Tracking

When working on what became ITP, our research found that trackers were bypassing the third-party cookie policy through a pattern we call "bounce tracking" or "redirect tracking." Here's how it works:

  1. The content publisher's page embeds a third-party script from tracker.example.
  2. The third-party script tries to read third-party cookies for tracker.example.
  3. If it can't, it redirects the top level to tracker.example using window.location or by hijacking all links on the page.
  4. tracker.example is now first party and sets a cookie—it seeds its cookie jar.
  5. tracker.example redirects back to the original page URL or to the intended link destination.
  6. The tracker.example cookie can now be read back in third-party contexts.
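As an illustrative sketch of this flow (tracker.example, the endpoint path, and parameter names are all hypothetical):

// Embedded third-party script running on the publisher's page (steps 1-3).
(async () => {
  // Ask tracker.example whether its cookie is readable in a third-party
  // context (a credentialed request stands in for step 2).
  const res = await fetch("https://tracker.example/has-cookie", {
    credentials: "include",
  });
  if (!(await res.json()).hasCookie) {
    // Step 3: bounce the top level to tracker.example, remembering where
    // to return.
    location.href =
      "https://tracker.example/bounce?return=" +
      encodeURIComponent(location.href);
  }
  // Steps 4-5 happen on tracker.example's server: it sets its cookie as
  // first party, then redirects back to the URL in the return parameter.
})();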

Modern tracking prevention features generally block both reading and writing cookies in third-party contexts for domains believed to be trackers. However, it's easy to modify bounce tracking to circumvent such tracking prevention. Step 5 simply needs to pass the cookie value in a URL parameter, and step 6 can stash it in first-party storage on the landing page.

Bounce tracking is also hard to defend against since at the time of the request, the browser doesn’t know if it’ll be redirected.

Safari’s Current Defense Against Bounce Tracking

ITP defends against bounce tracking by periodically purging storage for classified domains that the user doesn't interact with. Performing navigational redirection is one of the conditions that can get a domain classified by ITP, so being a "pure bounce tracker" that never shows up in a third-party context does not suffice to avoid classification. The remaining issue is potential bounce tracking by sites that do not get their storage purged, for instance because the user is logged in to the site and uses it.

Can Privacy CG Find a Comprehensive Defense?

We believe other browsers with tracking prevention have no defense against bounce tracking (please correct if this is inaccurate) and it seems likely that bounce tracking is in active use. Because we've described bounce tracking publicly before, we don't consider the details in this issue to be a new privacy vulnerability disclosure. But we'd like the Privacy CG to define some kind of defense.

Here are a few ideas to get us started:

  • Adopt ITP’s current defense. This could be done as a periodic purge of cookies for websites without user interaction or combined with a classifier that only subjects domains that show bounce tracking behavior to this periodic purge.
  • Detect bounce tracking patterns and put offenders in a SameSite=Strict jail. This would mean the user can still be logged in to the offending websites by loading them directly, but they would see no cookies when they engaged in bounce tracking. Note though that a bounce tracker can navigate publisher.example -> tracker.example -> tracker.example -> destination.example where the second navigation to tracker.example is same-site and will possibly reveal SameSite=Strict cookies. SameSite=Strict cookies may have to be hardened against this kind of attack.
  • Detect bounce tracking patterns and put the offenders on some kind of block list. This could be done on-device based on web browsing or centralized through crawls. However, it would lead to broken page loads.
  • Detect a redirect on the response and re-raise the bounce request without cookies. This has load performance costs, could break some OAuth flows, and only addresses the “carry ID forward” part of the tracking, not the “user X clicked link Y on website Z” part. This protection would also be vulnerable to correlation between the initial request carrying cookies and the re-raised one.
  • Purge non-SameSite=Strict cookies after the domain has shown bounce tracking behavior or by combining it with the block list approach mentioned above.
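Several of these ideas share a detection step; here is a rough sketch of what on-device detection could look like (the data structure and logic are illustrative, not any browser's actual implementation):

interface NavigationHop {
  site: string;
  hadUserInteraction: boolean;
}

// Intermediate sites in a navigation chain that the user never interacted
// with are candidate bounce trackers; offenders could then be subject to a
// periodic purge, a SameSite=Strict jail, or a block list, per the ideas above.
function findBounceCandidates(chain: NavigationHop[]): string[] {
  return chain
    .slice(1, -1) // intermediate hops only
    .filter((hop) => !hop.hadUserInteraction)
    .map((hop) => hop.site);
}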

Privacy by design with browser-managed E2E encryption and Fenced Frames

Offline-only storage

TL;DR: Add built-in end-to-end encryption support to web browsers, and prevent the first party from leaking data by denying network access once it is enabled. Also, allow users to choose where their encrypted data should be stored (not an issue for the website publisher, since it can't read the data anyway thanks to E2E encryption).

Here is a summarized description of traditional architecture for web apps:

flowchart LR;
  subgraph P ["Back-end (Publisher's servers)"]
    SPB("Web app");
  end

  subgraph C ["Front-end (Consumer device)"]
    SC(User);
    subgraph BB ["Web browser"]
      SPF("Web app");
      B("Storage API");
    end
  end

  style BB fill:darkblue,stroke-width:0px;

  SC-->|Request/Update|SPF;
  SPF-->|Respond/Inform|SC;
  SPF-.->|Request/Update|SPB;
  SPB--->|Front-end code|SPF;
  SPB-.->|Respond/Inform|SPF;
  SPF-->|Process|SPF;
  SPB-.->|Process|SPB;
  SPF-->|"Write"|B;
  B-->|"Read"|SPF;

This architecture works great, except for privacy: the web app publisher owns users' data (or at least this architecture
requires the web app publisher to have full access to users' data in order for the web app to work properly). This is an
issue for users, but also for the web app publisher:

  • Costs increase along with the number of users
  • Managing user data means additional development, maintenance, and storage costs (authentication is needed, a
    database for user accounts is required, and so on...)
  • Users pay the price for weak server-side security (e.g. database leaks)

Actually, with this architecture, the web app publisher has no real choice but to own user data. The modern web
offers solutions to these issues (e.g. PWAs), but that's not enough for production web apps that could leverage
browser storage as their main storage provider (e.g. single-player games, IDEs, authenticator/password manager apps):

  • The security/privacy risk is still present: compromised front-end code can still intercept user data (even in E2E
    encrypted apps: listening to DOM changes is enough)
  • Browser storage still seems to be treated more like a cookie container than reliable application storage:
    • Users may accidentally and permanently lose their data, for instance by clearing their cookies in bulk
    • No data mobility/backup: each browser instance has its own storage state, saved locally. Authentication is then
      the only way for users to access their data on multiple devices.

These are real blockers for many web apps that could work entirely without a back end. Good examples of such apps are
single-player games and authenticator/password manager apps: the web app's publisher must not be able to access
highly sensitive user data, and once loaded, the app doesn't need a server to provide credentials (OTPs for instance), so
the best approach would be to store user data client-side only. But that's risky: the browser may not encrypt its
storage, and even worse, the user might accidentally and permanently clear application storage, which is extremely
dangerous for OTPs. As a result, authenticator/password manager apps can't be built as web apps, or at least they require
E2E encryption plus authentication, which is costly for the publisher and doesn't prevent all security risks (one
compromised update of the front-end code could still leak sensitive user data). As explained below, the solution proposed
here tries to address these issues with a small architecture change for web apps that opt in.

Proposed solution: External storage provider + Fenced frames

Offer web apps a way to store all data client-side rather than server-side, with the trade-off of losing
network access (to prevent data leaks):

flowchart RL;
  subgraph C ["Front-end (Consumer device)"]
    SC(User);
    subgraph BB ["Web browser"]
      B("Storage API");
      SPF("Web app");
    end
  end

  subgraph S ["Storage provider environment"]
    SSP("Storage provider");
  end

  subgraph P ["Back-end (Publisher's servers)"]
    SPB("Web app");
  end

  style BB fill:darkblue,stroke-width:0px;
  style SSP fill:green,stroke-width:0px;

  SC-->|Request/Update|SPF;
  SPF-->|Respond/Inform|SC;
  SPF-->|"Write"|B;
  B-->|"Read"|SPF;
  B==>|"Persist"|SSP;
  SSP==>|"Retrieve"|B;
  SPB--->|Front-end code|SPF;
  SPB-.->|Inform|SPF;
  SPF-->|Process|SPF;
  SPF-.->|Request/Update|SPB;

  linkStyle 4 stroke:green,stroke-width:4px;
  linkStyle 5 stroke:green,stroke-width:4px;
  linkStyle 9 stroke:red,stroke-width:4px;

As shown above, the proposed solution consists of two parts:

  • (in 🟢) Introduce an external storage provider that the browser uses to synchronize application data with.
    The external part is important: it lets users choose where their data should be stored: on-disk storage (probably
    the default), cloud-based storage (using an account, for multi-device synchronization), or anything else the community
    can provide (web extensions could take on the role of storage provider, for instance). Also, for privacy reasons,
    browsers should encrypt user data before passing it to the storage provider, so that providers have no access to
    what's stored, not even the web apps' origins (i.e. the user doesn't need to trust the storage provider; data could
    even be stored publicly (e.g. IPFS could work)). This follows points 2.8 and 2.11 of the W3C TAG Ethical Web
    Principles.
  • (in 🔴) Disable communication with first and third parties (like Fenced Frames, but applying to the top-level
    frame too, for instance). This is the final piece that adds privacy by design to all compatible web apps: this change
    ensures that even the web app publisher can't access user data. Web apps can still opt out at any time, but they will
    then lose access to what they stored (browsers won't delete the data, though; it will simply remain with the storage
    provider). This follows points 2.5 and 2.8 of the W3C TAG Ethical Web Principles.

As a result, web apps using offline-only storage will always respect user privacy. Browsers should then inform the
user of that fact, so that web apps get "rewarded" with happier users (much like the green lock shown when websites
preferred HTTPS over HTTP). That way, the web should tend to become safer at the same time.

For quite obvious reasons, the proposed solution has to be opt-in for web app publishers, or it would break the whole
web. The "how" is still to be discussed, though.

Pros

  • Privacy
    • Users now own their data on compatible websites. Organizations could store their members' data on premises.
    • Users only have to trust their browser (no longer the web app publisher (no data access) or the storage provider
      (encrypted data; it isn't even possible to infer which origin the data comes from)): since browsers are already
      trusted for their sandboxing capabilities, and most implementations are open source, it seems better that
      way (only having to trust the browser)
  • Security
    • Fixes a whole class of attacks: even the web app owner wouldn't be able to access user data! The only weak point
      left is the browser itself, but browsers have proven very fast at delivering security fixes, often faster than web
      app publishers
    • 🔥 Fixes another class of attacks: since even web app publishers don't have access to user data, there is
      nothing to leak! Moreover, since each user stores their own data, it becomes impossible to leak millions of users at
      once (attackers would have to infect each user's device). Finally, since storage providers only see encrypted
      data, even storage provider leaks are not an issue
  • Freedom
    • Web app publishers are no longer burdened with user data storage, related code maintenance, and legislation: it
      becomes possible for anyone to build a web app used by millions of users at no cost other than that of the
      web app's development (especially with SXGs). This can
      really benefit independent developers and small businesses, who generally can't afford good security measures
      for user data protection.
  • Experience
    • All browser features remain available
    • Users gain more control over the web apps they use: they could select which "application profile" to recover, on a
      per-origin basis
    • Authentication is often no longer required, since web apps no longer have to deal with user data management,
      resulting in improved user experience
    • Developer experience is improved too: less boilerplate code to maintain (authentication, communication with
      back-end code for user data retrieval)
    • Many browser features require user interaction (e.g. camera/mic, geolocation, reading the filesystem, etc.),
      primarily because of the sensitivity of the data that the web app could leak, potentially to any server on the
      internet. With communication disallowed, that risk no longer exists, so maybe access to some features could be
      relaxed, or at least the user could choose which features to enable by default on web apps supporting the solution
      described here. This new architecture also opens the door to new potential features, e.g. allowing reads of
      third-party and/or unpartitioned storage
  • Analytics/Advertising
    • Advertising is still possible, especially with upcoming proposals like TURTLEDOVE
    • Analytics are still possible, but only the user would be able to see them (and if web app publishers really need
      analytics, maybe some proposals could be requested for that too)
  • Compatibility
    • Communication with external storage providers could be polyfilled with web extensions, so that web apps relying on
      offline-only storage could still work on older browsers, without the privacy guarantees though; it's up to
      the user to acknowledge the risks or not.
  • Web's future
    • While offline-only storage restricts web app capabilities, it opens privacy-first evolution perspectives for
      future proposals (e.g. privacy-respecting multi-user communication, offline cross-origin storage access, ...)
    • The web is built around decentralization, an aspect that offline-only storage contributes to
  • Environment
    • No network communication means a reduced environmental footprint
    • Users can choose where their data is stored, so they can reduce their environmental footprint by selecting a
      greener storage provider (avoiding cloud-based ones, for instance)

Cons

  • Security
    • A new threat model emerges, where web apps may leak data via QR codes, filesystem access, and so on. These threats
      already exist, but they risk becoming more intrusive than before. Still, web apps already get flagged as malicious
      when they try to harm users, so mitigations are already available.
  • Analytics
    • Web app publishers may not like being unable to collect analytics, especially since they won't be able to detect
      cases where the app breaks because of outdated user data. However, Service Workers' cache has a similar issue
      that developers seem to handle quite well overall, so it's more a web app design issue. Also, nothing prevents
      a web app publisher from staying on the traditional model; that's the trade-off for the architecture
      proposed here.
  • Adoption (by publishers)
    • Not all web apps can be converted to this architecture (e.g. banking apps, multiplayer games), but future
      proposals may fix that, still in a privacy-by-design manner. In any case, at least the web apps that can, could.
    • The whole web is built around web servers, so a lot of tools would become useless. However, as explained before,
      not all web apps are compatible, and server-to-server communication, the semantic web, and so on still need these
      tools
    • Developers are not used to designing web apps without an accessible web server, so it may take some time for the
      ecosystem to adapt. Yet it's not impossible, as PWAs are already expected to work at least partially offline. Also,
      incoming communications are still allowed, so real-time user experience might still be possible
      (see Implementation details below for more details)

Implementation details

While this document is entitled "Offline-only", not all requests are expected to be blocked: here, the meaning
of "offline-only" is closer to "side-effect free", i.e. unable to leak user data content. For instance, requests resulting
from a user navigation are side-effect free, because the server can't infer anything about the user data content (assuming
the URL wasn't generated using user data; the anchor (<a>) case is still to be discussed, though). On the other hand,
it's not possible for the web browser to know which fetch('http(s)://...') requests are side-effect free, so they
should be blocked. JS-triggered requests may still be allowed in some cases, though, for instance with prefetched SXGs,
but that's still to be discussed. Furthermore, note that read-only communication is still allowed while having user data
access (e.g. via postMessage (with SharedArrayBuffer disallowed), or reading unpartitioned storage).

Depending on the feasibility, it might be possible to allow web apps to work in both trusted (user data access,
communication restrictions) and untrusted (no user data access, no communication restriction) modes, at the same time:
for instance, the service worker might be running in an untrusted context and the main page in a trusted one. The
service worker would be able to send messages via postMessage to the main page when a push notification is received
for instance.

Concerning how a web app requests trusted mode, multiple solutions are available for discussion:

  • Via a HTTP response header
    • + simple, consistent with CSP
    • + (probably) simpler to implement
    • - limited, no dynamic loading possible
  • Via a one-way switch API (e.g. navigator.offlineStorage.enable()):
    • + simple, powerful, consistent with Service Workers
    • + supports dynamic loading (do fetch requests, then call enable())
    • + supports location.reload() to leave the trusted context without leak
    • + supports live updates (e.g. spawn a web worker that manages a WebSocket connection to feed the main page
      via postMessage())
    • - (probably) harder to implement (active connections have to be closed, SharedArrayBuffers access invalidated)
  • (other solutions to discuss)
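As a rough sketch of the one-way switch option (navigator.offlineStorage is the hypothetical name from the list above; no browser implements it):

// Sketch only: the API name and semantics are placeholders for discussion.
async function enterTrustedMode(): Promise<void> {
  // Do any remaining network work up front: once enabled, outgoing
  // requests are blocked to prevent data leaks.
  const config = await fetch("/app-config.json").then((r) => r.json());

  // One-way switch: the browser would close active connections and
  // invalidate SharedArrayBuffer access before unlocking user data.
  await (navigator as any).offlineStorage.enable();

  // From here on, storage reads/writes are encrypted by the browser and
  // synchronized with the user's chosen storage provider.
  console.log("trusted mode active", config);
}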

Offline-only storage could work in private browsing, since no user data would leave the browser anyway.

Link with Fenced Frames

The proposed feature could work like Fenced Frames' read-only mode. The main difference, however, is that a web app should be able to self-enter isolation mode (via some API), i.e. it could work on top-level frames. This is important to allow browsers to notify the user that the displayed web page now operates privately (good for website reputation), and also to avoid top-level web documents having to display a lone Fenced Frame HTML element when they want to use privacy-restricted APIs.

Other points to discuss:

  • Same origin tabs communication
    • Does trusted mode apply to all tabs of the same origin, or per tab?
      • "All tabs" seems to add complexity to web apps' developers, and maybe to browsers' implementations too
    • Where to place the Service Worker?
      • Could be useful to have the worker intercept requests in trusted mode, but what should happen if other tabs are
        in untrusted mode? One Service Worker per mode, or only one in untrusted mode for simplicity?
    • Allow communication between same origin tabs in trusted mode?
  • Allow communication between cross-origin tabs in trusted mode?
    • Could be useful for some apps (e.g. analytics apps) but could interfere with existing browser security measures
  • New storage API or current storage (e.g. Local Storage, Indexed DB, ...) synchronization?
    • The Storage Access API could be a good candidate (e.g. adding a parameter for disabling outgoing communications)
  • Multi-devices synchronization (sharing the same storage provider for instance)?
    • Can be handled by good design decision if a new storage API is preferred
    • Even without real-time synchronization, conflicts would happen anyway, so it's still a point to discuss
  • Web apps can still manipulate the user into leaking their data (e.g. with a QR code); is that an issue?
    • I guess not: actually, it really depends on whether the first party cares about users' privacy or not, which seems
      to often be the case today. So, at least at the beginning, trusting web apps might be enough (not all
      apps' data are critical), as long as we keep an eye on how the web evolves.

Conclusion

The suggested feature is ambitious and there are implementation challenges to overcome, yet the implications for
security, privacy, and the web's future are worth a look. Not all web apps can use the proposed storage, but at least
some could, empowering users by letting them deal with their data as they wish. In any case, I hope this document can
contribute to W3C privacy goals for the web.

Related

  • TURTLEDOVE: same idea (let the browser own application state access) but for
    advertising only and without DOM access
  • Fenced Frames: mostly how communication restrictions should apply
  • #28: original idea, but not attractive enough (it was too restrictive on implementation details and did not solve
    the data mobility issue)
  • W3C Privacy Principles: W3C guidelines about privacy on the web. Considered related as I think the design proposed
    here fits well (especially High-Level Threats, since those threats no longer apply when parties can't access user
    data)
  • W3C TAG Ethical Web Principles
  • Trusted Execution Environments: conceptually, offline-only storage is a kind of trusted execution environment, but
    without requiring hardware. The goal is to allow web apps that wish to operate inside this kind of environment to do
    so, and to "reward" them for it.
  • Chrome Privacy Sandbox: this could be a good starting point for testing the viability of the feature described here
  • discourse.wicg.io topic

EDIT 2021/05/27: realized it's a form of E2E encryption (simpler to describe)
EDIT 2021/05/29: added the Storage Access API as a good candidate for implementing this API on
EDIT 2021/05/30: added links with Fenced Frames read-only mode

Cookies Having Independent Partitioned State (CHIPS)

Cookies Having Independent Partitioned State (CHIPS) is a proposal for a new cookie attribute, Partitioned. This attribute will indicate to user agents that these cross-site cookies should only be available in the same top-level context (top-level site or that site's First-Party Set if it has one) that the cookie was first set in.

In order to improve user privacy on the web, browsers are planning or have already begun to phase out third-party cookies (Chrome, Safari, Firefox). In order to continue to support cross-site cookie use cases which are restricted to a user's activity within a single top-level site, browsers have implemented and attempted to ship partitioned cookies, i.e. cookies which are only sent when a user agent is on the same top-level site as when the cookies were sent. For example, Firefox implemented partitioning cross-site cookies by top-level site by default in their Total Cookie Protection, which is enabled in ETP Strict mode. Safari previously tried implementing cookie partitioning, but eventually rolled it back. One of the stated reasons for the rollback was developer confusion with the partitioned semantics. They have since proposed leveraging the Storage Access API to allow developers to opt in to receiving partitioned state.

CHIPS differs from previous cookie partitioning design mainly because CHIPS requires third-party sites to set their cookies with the Partitioned attribute. We believe a third-party developer opt-in (instead of partitioning by default) will provide site owners the opportunity to migrate their system to the new cookie behavior before completely phasing out cross-site cookies that are set without the Partitioned attribute. In addition, it affords developers a choice between partitioned and unpartitioned cookies on user agents that continue to support unpartitioned cross-site cookies - whether by default, or for domains that may be allowlisted by user/device owner configuration. Unlike the Storage Access API, this proposal would also allow cross-site requests to receive cookies without requiring them to load a JavaScript execution context.

The developer opt-in also gives browser vendors an opportunity to incentivize developers to adopt best practices for cookies. For example, CHIPS requires cookies set with the Partitioned attribute must also be set with the __Host- prefix, requiring the cookies be host-bound and only sent over secure channels.
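For example, a third party opting in would set a cookie along these lines (name and value illustrative, following the explainer's example):

Set-Cookie: __Host-example=34d8g; SameSite=None; Secure; Path=/; Partitioned;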

CHIPS also proposes to use the top-level site’s First-Party Set (FPS) owner as the partition key. This allows third-party service providers to use the same session identifier across sites within the same FPS, and allows them to serve common workflows such as single sign-on, and consent management across the sites, as long as they are within the same FPS.

Finally, our proposal also suggests user agents to not apply the 180 per-domain cookie limit to partitioned cookies, since this leaks information about users' state across different top-level sites. Instead, we recommend user agents apply a per-partition limit on the number of cookies that third-party domains can store. To prevent partitioned cookies from having a large memory footprint, we recommend that this limit be small, on the order of ~5-10 cookies.

Compatibility Considerations

Older clients will ignore the Partitioned attribute, and treat the cookie as unpartitioned in cross-site contexts. In order to allow developers to disambiguate requests coming from a partitioned context, we are also proposing modifying/adding a Fetch Metadata request header (w3c/webappsec-fetch-metadata/issues/80) to indicate when a request is coming from a partition context.

Third-party Cookie Access Heuristics explainer

The web is moving to deprecate third-party cookies, and not every site developer will have the time and bandwidth to implement workarounds that mitigate user-facing breakage. In particular, flows involving authentication tokens from identity providers are a common web pattern that relies on third-party cookies.

There are established practices where a browser grants temporary storage access when a user satisfies a predefined flow. We have assessed a few existing heuristics for security and privacy concerns, and have decided to prototype the following two scenarios:

  1. When a third party is loaded in a popup, after possible redirects, and the third party receives user interaction, the third party receives storage access on the opener site for 30 days.
  2. When a first party redirects to a third party, the third party receives user interaction, and the third party navigates back to the first party, the third party receives storage access on the first-party site for 15 minutes.
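For illustration, scenario 1 from the opener page's perspective (idp.example is a hypothetical identity provider):

// If the user interacts with this popup, the heuristic would grant
// idp.example cross-site cookie access on the opener site for 30 days,
// with no explicit API call by either party.
window.open("https://idp.example/login", "login", "popup");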

We presented this proposal at TPAC to generally positive feedback:
Explainer: https://github.com/amaliev/3pcd-exemption-heuristics/blob/main/explainer.md
Slides: https://docs.google.com/presentation/d/e/2PACX-1vQAjOEnKv3fyXchlYwO2JbPGrvaT7w3Q24ikac_1YWO8IhFJhPvaWBpXZPTMx0wYud1jgiM_TkVQIvw/pub

We appreciate any additional feedback, comments, or concerns from the broader community. Thank you!

Fenced Frames

Web security has traditionally focused on preventing unauthorized leakage of information across origins. We've seen browsers go so far as to put different sites into different processes to prevent that leakage.

But what about preventing leakage of any information? Even authorized? What if you want to embed a cross-origin document in your page and you don't want it to be able to talk to any other frame on your page, or even to its own origin on other pages?

Why might you want to do that? Well, let's say that you wanted to give sensitive information about your user to a cross-origin frame to display something useful, but you don't trust that site with the data (e.g., it might want to leak it to storage, or another context on the browser). For instance, the user’s finances could be fed to a retirement or mortgage calculator, without the third party being able to store that data or relay it to anyone else on the page. Or, perhaps the browser itself has sensitive information that it could provide to such a restricted frame (e.g., cross-site data like isLoggedIn, the chosen ad of a lift experiment, an ad from FLEDGE, or possibly even cross-site storage access).

From a web security perspective, we'd now have to include collusion in the threat model. Entities within the fenced frame should not be allowed to communicate with those outside.

The Fenced Frames explainer represents our initial stab in this direction and we’d like to build on it together with the Privacy CG. The current draft is primarily focused on the use cases of embedding ads where we feel that it’s acceptable to allow some data to leak via navigation, but only on click. We envision future modes as well where the fenced frame is allowed access to any data, but is never allowed to communicate or only through approved channels.

As a rough outline, the basic features of a fenced frame are the following:

  1. It has a separate frame tree from its embedder.
  2. The frames within the fenced frame tree can communicate with each other as normal, and appear to have their normal origins.
  3. The frames within the fenced frame cannot communicate with any frames outside of the fenced frame. e.g., no external broadcastChannel, storage, shared caches, postMessage, etc. This also means that we need to limit the possible sizes of the frame, ignore frame resizing, and carefully consider information that could be passed from permissions policies. Some bits will leak in this way, but we will constrain them as much as possible.
  4. The fenced frame is not allowed network access (it leverages navigable web bundles instead). We're considering an exception for origins that have declared that they will only provide caching services and will not log user activity.
  5. The fenced frame can navigate on click. This allows for a leak of data and must be scrutinized further. The leak is at least limited to user clicks. The URL of the navigation may be required to be specified at fenced frame creation.
  6. The fenced frame can be given an opaque url (e.g., urn:uuid) that the embedder can't interpret but maps to a full URL within the fenced frame. This is how we intend to allow APIs to choose content for a fenced frame, which is returned to the embedder in the form of a urn:uuid, and is passed on to the fenced frame for display.
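As a minimal sketch of points 5 and 6 from the embedder's perspective (the element name follows the explainer; the urn:uuid value is a stand-in for what an API such as FLEDGE would return):

// Sketch only: element name and attribute plumbing are still under design.
const fencedFrame = document.createElement("fencedframe");
// Point 6: an opaque urn:uuid the embedder can't interpret; it maps to a
// real URL only inside the fenced frame.
fencedFrame.setAttribute("src", "urn:uuid:f2a0b4c6-90de-4abc-9def-0123456789ab");
document.body.appendChild(fencedFrame);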

Thanks for taking a look!

Import/export passwords in keepass format for all browsers

Hi all!

1. feature-name

Import/export passwords in keepass format for all browsers

2. feature-description

Importing and exporting browser passwords should use a universal, encrypted format. There is a technical discussion about Mozilla Firefox using the encryption format here: Encryption of passwords imported into a file.

  • The problem with passwords today is that every browser implements its own way to import and export passwords.
  • I argue that to have good security/privacy of user data, we need a universal standard for importing/exporting
    passwords, as a matter of technological, individual, philosophical, and rational freedom
  • I believe that for this to be done, we have to adopt the standard of open solutions with open licenses (MIT/GPL, etc.)
  • KeePass is an open-source solution for managing passwords; it would be nice to have the same file format for all web
    browsers. When I refer to a universal standard for importing/exporting passwords, I mean that it would be
    interesting, as I said before, to adopt the open KeePass format for this.
  • KeePass has been around for a long time and is a well-established, popular solution known all over the world.
  • Adopting the KeePass format for importing/exporting passwords gives people more freedom: KeePass is licensed under
    the GPL, the same license as Linux, so in theory it will always be openly available.
  • It's 2022, and the bookmarks format is the same as it was in 1999. So why don't we have a common KeePass format for
    importing/exporting passwords in all browsers? - reference here: Netscape bookmarks

2.2. Notes

  1. I would like to know if this idea is good or bad
  2. My goal is to help these communities: Privacy CG, Solid, W3C, WICG, WebAuthn, KeePass, and browsers (Brave, Vivaldi, Opera, Mozilla Firefox, LibreWolf, Google Chrome, etc.)
  3. I didn't find any link or resource about this in the Privacy CG community
  4. I'm not promoting any company, service, product, solution, or idea here - just adding the bibliographic reference links
  5. If I'm wrong about something, speak up, criticize, and correct me

3. References

DNS TLD for Privacy

Rumour has it that ICANN is considering another run of the new gTLD program.

Last time around, Google registered .app and runs it with an additional semantic: all domains in that TLD are automatically on the HSTS preload list, effectively enforcing HTTPS for any server with an .app domain.

What if something similar were done with a privacy focus? For the sake of argument, let's say .priv[1] is registered, and there's agreement that browsers will not allow any third-party requests from those domains. The registrar might also insert contractual terms that limit first-party tracking as well.

Sites with .priv domains could then believably market themselves as privacy-focused, giving them an advantage with privacy-conscious users / customers.

This would also provide an opportunity for browsers to try out new techniques for privacy in a 'sandbox' that's already privacy-focused.

Just thinking out loud here - any interest? Obviously it'd need good browser support. Best path forward might be to define an opt-in signal for sites first, just like HSTS did.
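For comparison, HSTS's site-level opt-in is a single response header:

Strict-Transport-Security: max-age=31536000; includeSubDomains; preload

A privacy analogue could take a similar shape (header name and syntax invented purely for illustration):

No-Third-Party-Requests: max-age=31536000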

Footnotes

  1. I suspect .priv is not the right name here, but let's not bikeshed that at the moment

Standardizing Global Privacy Control (GPC)

Background

On January 1, 2020 the California Consumer Privacy Act (CCPA) went into effect and established new privacy rights for California consumers. Specifically, it covers the rights to:

  1. Opt out from the sale of personal information (Do-Not-Sell),
  2. Access personal information, and
  3. Delete personal information.

A "sale" is understood broadly and likely covers, for example, a website making available or disclosing identifiers or location data to an ad network for purposes of monetization. The most recent regulations to the CCPA published by the California Attorney General specify that automatic signals communicating a user's decision to opt out must be respected. Here is the relevant language:

If a business collects personal information from consumers online, the business shall treat user-enabled global privacy controls, such as a browser plugin or privacy setting, device setting, or other mechanism, that communicate or signal the consumer’s choice to opt-out of the sale of their personal information as a valid request ... .

The CCPA appears to be a catalyst for implementing new privacy functionality in browsers and other clients. Other states beyond California are introducing similar privacy bills in their legislatures. Microsoft announced that it would honor the new CCPA privacy rights not only in California but in all other states as well. Similarly, Mozilla announced the option for its users anywhere to delete telemetry data.

In addition to the CCPA, the General Data Protection Regulation (GDPR) also mentions the option for clients to make privacy practices explicit via machine-readable icons:

The information to be provided to data subjects pursuant to Articles 13 and 14 may be provided in combination with standardised icons in order to give in an easily visible, intelligible and clearly legible manner a meaningful overview of the intended processing. Where the icons are presented electronically they shall be machine-readable.

Various efforts are underway to implement the new privacy rights. The Interactive Advertising Bureau has released the IAB CCPA Compliance Framework for Publishers & Technology Companies and the Digital Advertising Alliance CCPA tools. Efforts by W3C Working Groups include the Confinement with Origin Web Labels. There are also various approaches led by companies in this space, for example, the Data Transfer Project.

Some Initial Thoughts

At this point, it seems worthwhile to have a discussion of these developments with the goal of converging to a standard. In particular, a Do-Not-Sell signal could be implemented similar to the Do-Not-Track (DNT) signal via an HTTP header.
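For a concrete shape: the Global Privacy Control spec that emerged from this effort expresses the signal as a simple request header, with navigator.globalPrivacyControl as its JavaScript counterpart. An illustrative request:

GET /page HTTP/1.1
Host: example.com
Sec-GPC: 1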

Previously, the Tracking Protection Working Group developed the Tracking Preference Expression (DNT). There are certainly lots of learnings that can be taken from that effort for the question here. A big difference, though, is that recipients of a DNT signal are not required to comply with it. Per the California Online Privacy Protection Act (CalOPPA), they only need to say whether they comply.

There are multiple dimensions to the implementation of privacy rights:

  1. Which functionalities should be implemented? For example, a narrow implementation could just focus on a Do-Not-Sell signal, a simple binary signal. At the other end of the spectrum could be a full privacy communication channel that allows users not only to opt out from the sale of their data, but also, for example, to signal access requests and receive related data through the browser.
  2. Which types of clients or platforms should be covered? Especially on mobile devices, much of the user interaction happens through non-browser apps. Should operating system vendors get involved here to add or change existing APIs to accommodate privacy signals and communication?
  3. Which technologies should be used? The DNT effort relied on HTTP headers. Other choice mechanisms rely on HTTP cookies, many on third-party cookies and some on first-party cookies. Of relevance in this context, Google recently announced a plan to phase out support for third-party cookies in Chrome. Should Do-Not-Sell and similar functionalities even be part of the browser and other clients, or should there be a web platform (e.g., a Do-Not-Sell registry similar to the Do-Not-Call registry)?

Internet users, publishers, privacy organizations, and ad networks are some of the stakeholders in this question. Ultimately, there needs to be a consensus, because the proposed task here is not only one of technology but also one of policy. The implementation of privacy rights such that they can be meaningfully exercised and the evolution of the web ecosystem for all participants go hand in hand.

One concrete idea to move forward is the implementation of prototypes and testing them in usability studies. We already started this effort here at Wesleyan.

This issue is continuing a discussion of members of the Privacy Community Group on the mailing list.

Edit July 30, 2021: Below is a list of blog posts, public comments, and other responses on Global Privacy Control. I am updating the list on a regular basis. It is not comprehensive, but I am trying to cover all major developments.

Registry of Businesses and Domain Name Ownership

Introduction

This proposal puts forward the need for a single Authority/Registrar, or a number of them, that businesses can use to register as an entity that intends to control / process personal information on the web within certain jurisdictions, along with the entity's ownership of a domain name.

Goals

  • Provide a more transparent means for the user to govern when and how their personal information is used/stored/shared by businesses.
  • Provide a means for user agents to decide default behaviour in regards to allowing data to be accessed/stored/shared per domain.
  • Provide a means for businesses to register their domain names for the business as an entity that intends to control / process personal information within certain jurisdictions.
  • Provide a means for businesses to register their relationship with service providers, to the extent that a service provider is a separate business, and intends to process shared personal information.
  • Aid in the decline of consent banners

Non-goals

  • This proposal does not attempt to define the protocol of signals between the user agent and business (e.g. Do Not Sell). Rather, it could help to define the relationship that may be necessary in order for the communication to exist.
  • The ability for client-side data to be shared across domains that have the same business registration, although this could perhaps follow.

Background and arguments

With regard to the CCPA and GDPR, storage/access/control of personal information and personally identifiable information is not limited to domain names, but rather to businesses. When a user visits a site, very commonly they will be presented with a consent banner, which gives the user access to the privacy policy of the business that operates the site under the domain name. The privacy policy may also include the third-party service providers with whom the business may share the user's personal information. Currently, the user agent is not able to assist the user in a meaningful way when it comes to the proactive acceptance of a privacy policy.

The user agent is also not able to assist much in retroactive control of data; if the user wishes to view/remove data held by the user agent, they are only able to see a list of domain names that the data is partitioned to. Users likely have little understanding of the businesses that have access to the data when browsing the web, or of the relationships between those businesses where third-party service providers are concerned.

In order for the user agent to be able to assist the user, I believe it is necessary that information about the owner of the domain name, and the relationships they have with service providers, is accessible to the user agent. A business could publish the data itself and have it accessible in a .well-known location relative to the domain name. Indeed, this could be a valid first step in achieving the listed goals. However, when it comes to data protection, I believe it should be assumed that a domain cannot be fully trusted to keep this information correct by itself.

Businesses already have a large responsibility to fulfill data protection requirements, and depending on the jurisdiction, they are obligated to register themselves with a relevant authority[1]. This responsibility will no doubt get larger and more complex as laws are introduced in more jurisdictions. Implementing standard authorities on the web may help to normalise the process/data.

Proposal in slightly more detail

  • At the very least, a business registration should include the name of the business, and the information required to communicate with the business for data protection purposes. If a registration for a domain name is being made for a business, there should be a mechanism that acts as sufficient verification that it is the business or an agent of the business performing the registration. At any time, a business can query the domain names that have been registered under its ownership, and flag any issues, with sufficient verification that they have the power to do so.
  • Upon request, most likely at the time of domain name resolution, the user agent is given, or can query, the business registration for the domain name. In order to be deemed valid, the registration must be signed by an authority that the user agent has trusted. The registration may be cached at a number of locations between the user agent and the authority.
  • The user agent can use the existence and data of a business registration, and the preferences set by the user to determine if and how it allows the access/storage of client-side data for the domain by default, and the domains of the business's service providers.
  • Client-side user data can be partitioned to a business registration (or the absence of a business registration) under a domain. If the business registration meaningfully changes, the client-side data can undergo a process of transferral, or removal, controlled by the user agent.
  • For user data stored by the business (i.e. non-client-side), the user agent can send signals to the business and its service providers, requesting it to perform certain actions, i.e. Opt Out, Do Not Sell. This is provided that the business has agreed to comply with a certain standard/protocol, and the registration contains details on the protocol of communication. If the business does not show itself to comply with a certain standard, the user agent will have had the opportunity to deny access to data / prompt the user upfront.

I believe that by having businesses optionally comply to standards, and knowing that access to data can freely be rescinded, user agents have the potential to satisfactorily make decisions on the user's behalf, or prompt the user when necessary. Thus potentially removing the need for consent banners.

To make things more clear, I've put together a mockup to demonstrate how this proposal could open up possibilities for the user agent (I am in no way recommending this is how browsers should decide to implement it 🙂):

User Privacy Dialogues

Considerations

This is a rough start at a proposal, and it's purposefully vague both in definitions and technical specification. If there's interest in it, there are many things to be considered. Some that I can think of off-hand:

  • What would the exact definition of a business and domain name be, for the purpose of this proposal?
  • What would a business registration look like? What data would it hold?
  • What constitutes a sufficient verification of a business?
  • Would there be multiple business registrations allowed for a single domain, for the same business/organisation, to cater for different jurisdictions/laws?
  • How about domain names that many different businesses may use?
  • How can a business registration link/relate to registrations in existing authorities[1]?
  • What existing mechanisms if any could be used that this registry could piggyback off of? (domain name registrars, certificate authorities etc.).

How does this compare to similar existing proposals?

First-Party Sets and Domain Boundaries are perhaps similar, in that they offer a mechanism to group domains together under an umbrella, but in my opinion they serve different goals from this proposal. Control and trust over the user's data is the primary goal of this proposal, not the ability for businesses to share client-side data between domains. This proposal puts forward the necessity of an outside authority that the user agent trusts, rather than relying on .well-known locations or DNS records set up by the business. It also provides a means to register relationships with service providers that the user's information may be shared with, allowing for transparency/control over shared data.

IAB have published a framework to allow publishers to comply with CCPA legislation, by registering and signing an agreement. This proposal doesn't aim to compete with that framework, but it would be interesting to explore whether it could perhaps complement it.

Appendix

[1] Existing business registries:


Including my comment from April 29th here to improve visibility


Thanks a lot for the responses here and in the call. I've put some thought into how this could move along into a more concrete spec for consideration, while hopefully addressing some of the thoughts/concerns made so far.

Straw man spec

Data Structure

Upon request, the User Agent can access the following, per domain, which contains data relevant to the processing or control of personal information by the entity that owns the domain:

{
  "policy": {
    "type": "",
    "version": "",
    "clientStorageRequirement": "",
    "fullPolicyTextHref": "",
  },
  "serviceProviders": [
    {
      "entityId": "",
      "domainName": "",
      "processingCapability": ""
    }
  ],
  "interface": {
    "type": "",
    "version": "",
    "signalConformity": [
      "opt-in",
      "opt-out",
      "do-not-sell"
    ]
  },
  "signed": {
    "domainName: "",
    "entity": {
      "uniqueIdentifier: "",
      "name": "",
      "state": "",
      "country": "",
      "governmentAuthorityRegistrationId": ""
    },
    "expires": ""
  },
  "authority": "",
  "signature": ""
}

Policy

This details policy information that can be read programmatically. The schema is designed to be extensible, and the different types and standards are not part of this scope, other than what's considered the most basic.

Service Providers

This is an important aspect of the proposal, which has the potential to allow greater transparency and feed into the decisions the User Agent makes regarding sharing of data. Consent Management Platforms currently create an environment where the user agrees (in my opinion unwittingly) to the sharing of their data with hundreds of third-party services. This information is usually in the written privacy policy, but I think there's a great advantage to having it exposed to the User Agent.

Interface

This details how the User Agent is able to communicate with the entity regarding control of personal data. Again, it is designed to be extensible and to contain the standards to which the entity conforms, with perhaps some flexibility for unique customisation. Applied standards can borrow a lot from learnings elsewhere, including the TCF as mentioned by chrispaterson.

Signed

This is the portion of the data that is required to be signed by an authority. Here, it is the domain name to business/entity relationship.

Where should the data be accessed from?

I think the options are:

  1. Stored on a server that the entity controls, in a .well-known location relative to the domain
  2. Stored as a DNS TXT record, in a _well_known host relative to the domain

My preference is for a DNS record. It implies a certain amount of elevated privileges to implement, inherently verifies domain access by the entity, and avoids a potential issue with matching wildcard domains due to upstream proxies.
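For illustration, a record under option 2 might look something like this; the version tag, key names, and encoding are invented for this sketch:

_well_known.example.com.  IN  TXT  "v=bizreg1; entityId=...; authority=...; sig=..."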

How can the data be trusted?

An authority will be responsible for signing some of the data. In this straw man spec, only the domain name to entity relationship is signed, the rest of the data is separate. This is to allow for the easy updating of the policy, service providers and interface. The authority should act as a registry for the business entity, and should verify that the signature request is legitimate. Perhaps a similar process to EV certificates could be used for the validation process.

As far as trusting the policy goes, it's a difficult one. How would an authority audit the process, and monitor it over time? Right now, everything is inside a black box from the User Agent's perspective, and this proposal is attempting to bring these processes to light. It was mentioned in the call that attempting to standardise these processes could also have the advantage of giving businesses a better sense of how they should be handling the data.

Revocation

Some mechanism should be in place for the authority to signal that a record is now invalid, without having to wait for the expiry of the record. This needs more thought.

What can the User Agent do with the data?

Please see the UX mockup in the original post above. An API could be made accessible to JS perhaps for further functionality. To be clear, this proposal does not attempt to define standard behaviour of different browsers, or the API.

One further idea is the concept of the User Agent being in control of both transient and long-lived consent, given either implicitly or explicitly. Going further, there could be an identifier to represent this consent, which the User Agent could use to query the business/entity and its service providers for the existence of personal data associated with this identifier, revealing an audit trail of where and how the data is being used. This, again, is not in scope, but is perhaps made possible by the proposal.

What incentive does a business have to register and keep the policy up to date?

This was raised by a number of people on the call, and by sammacbeth above. My initial thinking was that, if the User Agent were to become more restrictive towards domains that do not provide a policy or business ownership, the incentive for the business would be to avoid being hit as heavily by those constraints. This will be especially true for non-essential third-party service providers, where the User Agent may enact more stringent measures, i.e. decide not to load them. This is the main contrast between this proposal and First Party Sets in my opinion: the goal of that proposal seems to be for the User Agent to treat multiple domains as the same site in terms of privacy, effectively lessening existing restrictions. Having said that, as this proposal's scope does not include the behaviour of the User Agent, a similar lessening of existing restrictions would be possible with this proposal too, and could certainly provide an incentive if that's what the User Agent decided to allow.

High-level Questions

  • Is the possible User Agent behaviour valuable? Is this something that would garner interest?
  • If yes, does the presence of this data fulfil the desired behaviour? Are there other ways to achieve the same goal?
  • Is there value in the data even if it isn't signed? Can the hard dependency on an authority be removed?

JS Isolation via Origin Labels and Membranes

This proposal has moved into its own repository. Please file issues there.

https://docs.google.com/document/d/1GFWONU2lq9ukQoj6dIGudOO4P3op7a1xt75Gb_jAA1c/edit#heading=h.8blcqbqrr76o

The above is an early-stage Brave proposal for how to constrain scripts from the document, using a programmatic, runtime approach.

The idea has similarities to the COWL design that didn't quite gain traction in WebAppSec a few years back, but with several significant differences:

  1. By design, intended to not require rewriting existing code
  2. API designed to both allow
    (i) page to protect itself from 3p scripts, and
    (ii) the browser (and extensions) to protect browser state from 1st and 3p (e.g. privacy protections)
  3. Designed to allow filter-list style curating and sharing of isolation policies
  4. Designed to use APIs and interfaces already in the browser (e.g. it looks much more Web like than previous suggestions)

In general, the idea is part of trying to tackle a larger category of problems we don’t have a standards-based approach to:

  1. How to protect the 1p context (especially storage)
  2. How to allow the browser or page to treat different scripts with different levels of trust / privilege

And especially to do so in a way that doesn't break existing sites / legacy code that'll never get rewritten.

bounce tracking mitigations

At TPAC we presented Chromium's proposal for bounce tracking mitigations in a number of meetings.

In general I think the feedback was positive and there was some interest from other browser vendors.

Does the privacycg want to host this proposal during incubation? Alternatively, we could host the proposal in WICG.

Note, while bounce tracking mitigations falls under the navigational tracking umbrella, I'm hoping it can be worked on as a separate item from the omnibus nav tracking report. That effort appears to cast a wide net trying to encompass the entire universe of navigational tracking.

If we move this into privacycg, would it be possible to keep bounce tracking as a separate repo and then point to it from the nav tracking report? I just want to avoid blocking bounce tracking on solving all of nav tracking.

Thank you.

Site Groups / First Party Sets v2

Site Groups

This document proposes a new web platform mechanism to declare a collection of related domains as being in a Site Group. This is an evolution of the First-Party Sets proposal to accommodate for several changes:

  • Removal of much language around “First-Party” as it has many historical connotations/denotations that may be less relevant or confusing in the future.

  • Renaming of the standard to “Site Groups” as not only does this remove the “First-Party” confusion, but is a more straightforward name that may even be usable in communication to users.

  • Proposes modifications to existing standard browser UA policies to remove “single organization control” as a requirement of Site Groups due to:

  1. Lack of public information that documents corporate/organizational ownership, and any clear way of defining a policy that can be fairly applied
  2. Inability of browsers to police organizational ownership
  3. Bias of this requirement towards large companies over small
  4. Organizational ownership not being discernible to the user, nor offering the user any comfort that their data would be used in a specific way
  • Specifically empowers the “owner” site with the incremental cross-site functionality, and denies “secondary” sites cross-site capabilities. This should allow for all required functionality across sites while minimally increasing the availability of user data beyond the origin.
  • Adds a requirement of a shared privacy policy, and a human-readable site group name across all domains in a Site Group.

Most of the language in this proposal is taken directly from the First-Party Sets proposal, and a significant amount of privacy-specific language was removed because no changes to it are proposed here. That is, I tried to remove any parts of the FPS proposal that were not modified in any way and/or were not relevant to expressing the changes in this version. If Site Groups is deemed to be a useful extension of FPS, then those elements can be reintegrated back in.

Thanks to the editors/writers of the First-Party Sets proposal:
Kaustubha Govind, Google
David Benjamin, Google

Introduction

Browsers have proposed a variety of tracking policies and privacy models which scope access to user identity to some notion of “first-party”. From the user’s perspective, first-party has typically meant a singular domain, but this limits how sites can provide services to the user. Site Groups aims to increase the ability of sites to provide valuable services to their users by widening the privacy boundary to include affiliated sites, while minimally impacting the user’s privacy. In redefining this scope, we must balance two goals: the scope should be small enough to meet the user's privacy expectations, yet large enough to provide the user's desired functionality on the site they are interacting with.

One natural scope is the domain name in the top-level origin. However, the website the user is interacting with may be deployed across multiple domain names. For example, https://google.com, https://google.co.uk, and https://youtube.com are owned by the same entity, as are https://apple.com and https://icloud.com, or https://amazon.com and https://amazon.de.
We may wish to allow user identity to span related origins, where consistent with privacy requirements. For example, Firefox ships an entity list that defines lists of domains belonging to the same organization. This explainer discusses a mechanism to allow organizations to each declare their own list of domains, which is then accepted by a browser if the set conforms to its policy.

Goals

  • Allow related domain names to declare themselves as within a Site Group.
  • Define a framework for browser policy on which declared names will be treated as the same site in privacy mechanisms.
  • Minimally increase the availability of user data to reduce potential privacy issues

Non-goals

  • Third-party sign-in between unrelated sites.
  • Information exchange between unrelated sites for ad targeting or conversion measurement.
  • Other use cases which involve unrelated sites.

Declaring a Site Group

A site group is identified by one owner registered domain and a list of secondary registered domains. (See alternative designs for a discussion of origins vs registered domains.)

An origin is in the site group if:

  • Its scheme is https; and
  • Its registered domain is either the owner or is one of the secondary domains.

The browser will consider domains to be members of a set if the domains opt in and the set meets UA policy, to incorporate both user and site needs. Domains opt in by hosting a JSON manifest at https://<domain>/.well-known/site-group. The secondary domains point to the owning domain while the owning domain lists the members of the set, a version number to trigger updates, and a set of signed assertions to inform UA policy (details below).

Suppose a.example, b.example, and c.example wish to form a site group, owned by a.example. The sites would then serve the following resources:
https://a.example/.well-known/site-group
{
  "owner": "a.example",
  "version": 1,
  "privacy-policy": "a.example/privacy-policy.html",
  "sg-name": "Human readable name of this site group",
  "members": ["b.example", "c.example"],
  "assertions": {
    "chrome-sg-v1": "<signed assertion>",
    "firefox-sg-v1": "<signed assertion>",
    "safari-sg-v1": "<signed assertion>"
  }
}

https://b.example/.well-known/site-group
{ "owner": "a.example" }

https://c.example/.well-known/site-group
{ "owner": "a.example" }

The browser then imposes additional constraints on the owner's manifest:

  • Entries in members that are not registrable domains are ignored.
  • Only entries in members that meet UA policy will be accepted. The others will be ignored. If the owner is not covered by UA policy, the entire set is rejected.
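As a minimal sketch, a browser might combine the two manifests when validating membership roughly like this (illustrative only; fetchUncredentialed is an assumed helper for the uncredentialed, cache-partitioned fetches described under "Discovering Site Groups", and UA-policy and registrable-domain checks are omitted):

// Returns true if memberDomain and its claimed owner agree that
// memberDomain belongs to the owner's site group.
async function isInSiteGroup(memberDomain) {
  // The member's manifest names its claimed owner...
  const member = await fetchUncredentialed(
      'https://' + memberDomain + '/.well-known/site-group');
  if (!member || !member.owner) return false;
  // ...and the owner's manifest must list the member back.
  const owner = await fetchUncredentialed(
      'https://' + member.owner + '/.well-known/site-group');
  return owner.owner === member.owner &&
      Array.isArray(owner.members) &&
      owner.members.includes(memberDomain);
}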

Owner Privileges

The owner domain of a given site group has special privileges within that site group. It can read and write "first party" data stores (first-party cookies, LocalStorage, Storage Access API, etc.) within the browser whenever the top-level origin is a domain within its site group. More plainly:

  • The owner domain has access to read/write all data from across the site group
  • Each secondary domain only has access to read/write data from its own domain (same as currently implemented)

This would generally require sites in a site group to include resources (iframes, scripts, etc.) from the owner domain in order to create applications using the site group, but this seems like a minimal issue. It also allows sites in a site group to avoid calling the owner domain where they want to lower the privacy/security risks of that request.

Discovering Site Groups

By default, every registrable domain is implicitly owned by itself. The browser discovers site groups as it makes network requests and stores the site group owner for each domain. On a top-level navigation, websites may send a Sec-Site-Group response header to inform the browser of its site group owner. For example https://b.example/some/page may send the following header:
Sec-Site-Group: owner="a.example", minVersion=1

If this header does not match the browser's current information for b.example (either the owner does not match, or its saved site group manifest is too old), the browser pauses navigation to fetch the two manifest resources. Here, it would fetch https://a.example/.well-known/site-group and https://b.example/.well-known/site-group.
These requests must be uncredentialed and with suitably partitioned network caches to not leak cross-site information. In particular, the fetch must not share caches with browsing activity under a.example. See also discussion on cross-site tracking vectors.

If the manifests show the domain is in the set, the browser records a.example as the owner of b.example (but not c.example) in its site-group storage. It evicts all domains currently recorded as owned by a.example that no longer match the new manifest. Then it clears all state for domains whose owners changed, including reloading all active documents. This should behave like Clear-Site-Data: *. This is needed to unlink any site identities that should no longer be linked. Note this also means that execution contexts (documents, workers, etc.) are scoped to a particular site group throughout their lifetime. If the group owner changes, existing ones are destroyed.

The browser then retries the request (state has since been cleared) and completes navigation. As retrying POSTs is undesirable, we should ignore the Sec-Site-Group header directives on POST navigations. Sites that require a site group to be picked up on POST navigations should perform a redirect (as is already common), and have the Sec-Site-Group directive apply on the redirect.
Subresource requests and subframe navigations are simpler as they cannot introduce a new first-party/site group context. If the request matches the origin URL's owner's manifest but is not currently recorded as being in that site group, the browser validates membership as above before making the request. Any Sec-Site-Group headers are ignored and, in particular, the browser should never read or write state for a site-group other than the current one. This simpler process also avoids questions of retrying requests. The minVersion parameter in the header ensures that the browser's view of the owner's manifest is up-to-date enough for this logic.

Design details

UA Policy

Defining acceptable sets

We should have some notion of what sets are acceptable or unacceptable. For instance, a set containing the entire web seems clearly unacceptable. Conversely, a set containing https://acme-corp-landing-page.example and https://acme-corp-online-store.example seems reasonable. There is a wide spectrum between these two scenarios. We should define where to draw the line.

Browsers implementing Site Groups will specify UA policy for which domains may be in the same set. While not required, it is desirable to have some consistency across UA policies. For a set of guiding principles in defining UA policy, we can look to how the various browser proposals describe first parties (emphasis added):

  • A Potential Privacy Model for the Web (Chromium Privacy Sandbox): "The notion of "First Party" may expand beyond eTLD+1, e.g. as proposed in First Party Sets. It is reasonable for the browser to relax its identity-sharing controls within that expanded notion, provided that the resulting identity scope is not too large and can be understood by the user."

  • Edge Tracking Protection Preview: "Not all organizations do business on the internet using just one domain name. In order to help keep sites working smoothly, we group domains owned and operated by the same organization together."

  • Mozilla Anti-Tracking Policy: "A first party is a resource or a set of resources on the web operated by the same organization, which is both easily discoverable by the user and with which the user intends to interact."

  • WebKit Tracking Prevention Policy: "A first party is a website that a user is intentionally and knowingly visiting, as displayed by the URL field of the browser, and the set of resources on the web operated by the same organization." and, under "Unintended Impact", "Single sign-on to multiple websites controlled by the same organization."

UA policies are at the discretion of each browser, but since this proposal does rely on UA policies being broadly aligned, making any required adjustments to those policies is important. Specifically, the requirement of ownership by a single organization has a variety of issues. These are laid out in the following issues:

WICG/first-party-sets#18
WICG/first-party-sets#17
WICG/first-party-sets#14

While it is obviously up to each browser vendor to decide on its own UA policy, removing this requirement does not seem to have a negative impact on overall privacy considerations. Two ways to mitigate for the removal of this requirement are:

  • Shared and declared privacy policy across all domains
  • Removal of cross-site privileges from all domains except the owner domain

Additionally, the robust, enforceable requirements of the First-Party Sets proposal remain:

  • Signed assertions by a trusted verification entity
  • Sites being able to join only a single site group

Given the UA policy, policy decisions must be delivered to the user’s browser. This can use either static lists or signed assertions. Note that site group membership requires being listed in the manifest in addition to meeting UA policy. This allows sites to quickly remove domains from their site group set.

Shared Privacy Policy

More so than organizational ownership, a shared privacy policy across all domains in a site group can give a user comfort that their data is being used in a consistent and understandable way within the site group. Additionally, including a shared privacy policy within the JSON declaration would allow any browser UI elements to include direct links to the policy for users to inspect.

Parties that were interested in verifying that a site group was "well-behaved" could easily validate that all member sites did indeed adhere to the shared privacy policy, and report violations to browsers directly or to any entities managing signed assertions for site groups (see below).

A given domain within a group could implement a stricter privacy policy than the site group-shared policy, but could not relax any of the policies from a sharing/transfer/usage perspective. This could also be validated by interested parties and external assertion-management entities.

This idea could be extended to give users direct links to "forget me" from the site group or similar functionality within the browser.

Static lists

The browser vendor could maintain a list of domains which meet its UA policy, and ship it in the browser. This is analogous to the list of domains owned by the same entity used by Edge and Firefox to control cross-site tracking mitigations.

A browser using such a list would then intersect site group manifests with the list. It would ignore the assertions field in the manifest. Note fetching the manifest is still necessary to ensure the site opts into being in a set. This avoids problems if, say, a domain was transferred to another entity and the static list is out of date.

Static lists are easy to reason about and easy for others to inspect. At the same time, they can develop deployment and scalability issues. Changes to the list must be pushed to each user's browser via some update mechanism. This complicates sites' ability to deploy new related domains, particularly in markets where network connectivity limits update frequency. They also scale poorly if the list gets too large.

Signed assertions

Alternatively, the browser vendor, or some entities it designates, can sign assertions for domains which meet UA policy, using some private key. A signed assertion has the same meaning as membership in a static list: these domains meet the signer’s policy. The browser would trust the signers’ public key and, as above, only accept domains covered by suitable assertions.
Assertions are delivered in the assertions field, which contains a dictionary mapping from signer name to signed assertion. Browsers ignore unused assertions. This format allows sites to serve assertions from multiple signers, so they can handle policy variations more smoothly. In particular, we expect policies to evolve over time, so browser vendors may wish to run their own signers. Note these assertions solve a different problem from the Web PKI and are delivered differently. However, many of the lessons are analogous.

As with a static list, signers maintain a full list of currently checked domains. They should publish this list at a well-known location, such as https://sg-signer.example/site-groups.json. Although browsers will not consume the list directly, this allows others to audit the list. The signer may wish to incorporate a Certificate-Transparency-like mechanism for stronger guarantees.
The signer then regularly produces fresh signed assertions for the current list state. For extensibility, the exact format and contents of this assertion are signer-specific (browsers completely ignore unknown signers, so there is no need for a common format). However, there should be a recommended format to avoid common mistakes. Each signed assertion must contain:

  • The domains that have been checked against the signer’s policy
  • An expiration time for the signature
  • A signature over the above, made by the signer’s private key
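As a purely hypothetical illustration of that shape (the actual format and field names are signer-specific):

{
  "domains": ["a.example", "b.example", "c.example"],
  "expires": "2021-12-31T00:00:00Z",
  "signature": "<signature by the signer's private key over the fields above>"
}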

Assertion lifetimes should be kept short, say two weeks. This reduces the lifetime of any mistakes. The browser vendor may also maintain a blocklist of revoked assertions to react more quickly, but the reduced lifetime reduces the size of such a list.
To avoid operational challenges for sites, the signer makes the latest assertions available at a well-known location, such as https://sg-signer.example/assertions/. We will provide automated tooling to refresh the manifest from these assertions, and sites with more specialized needs can build their own. To support such automation, the URL patterns must be standard across signers.

Note any duplicate domains in the assertions and members attribute should compress well with gzip.

UI Treatment

In order to provide transparency to users regarding the Site Group that a web page’s top-level domain belongs to, browsers may choose to present UI with information about the Site Group owner and the members list. One potential location in Chrome is the Origin/Page Info Bubble - this provides requisite information to discerning users, while avoiding the use of valuable screen real-estate or presenting confusing permission prompts. However, browsers are free to choose different presentation based on their UI patterns, or adjust as informed by user research.

Browser UI elements can also expose the shared privacy policy for the site group, as well as a human readable name of the site group that would desirably match any cross-site branding. This would hopefully give users even more context about the site group, how their information is used, and why the site is part of a given site group.

Note that Site Groups also gives browsers the opportunity to group per-site controls (such as those at chrome://settings/content/all) by the site group boundary instead of eTLD+1, which is not always the correct site boundary.

Pop-up Tracking Protection

Bringing this over from the comment here: #6 (comment); to fit with the guidelines of this repository, I'm offering a proposal to prevent it.

Pop-up tracking

Third parties are able to gain access to first-party storage using the following method:

  1. The content publisher's page embeds a third-party script from tracker.example.
  2. The third-party script tries to read third-party cookies for tracker.example.
  3. If it can't, it injects a tracker.example iframe on the publisher's page.
  4. User clicks on content in the iframe (intentionally or via click-jacking).
  5. Using window.open, a new tab/window is opened for tracker.example.
  6. tracker.example window is now first party and can read or write cookies.
  7. tracker.example window accesses a function on tracker.example iframe, via window.opener, to pass an identifier.
  8. tracker.example window closes itself.
  9. Identifier can be passed to initial third-party script via postMessage and stored in first-party storage for continued tracking on the site.

The pop-up only needs to be open for a very short amount of time. This method can also be used, less covertly, by third parties that the user wishes to interact with but has not expressly agreed to be tracked by.
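A minimal sketch of steps 4-9 (all names here are hypothetical):

// In the tracker.example iframe, after a user click:
window.receiveId = function (id) {
  // Step 9: hand the identifier to the embedding page's script, which
  // stores it in first-party storage for continued tracking.
  window.parent.postMessage({ trackerId: id }, '*');
};
window.open('https://tracker.example/sync'); // step 5

// In the pop-up, now first party for tracker.example (steps 6-8):
const id = document.cookie;   // read tracker.example's first-party cookies
window.opener.receiveId(id);  // same-origin access via window.opener
window.close();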

Proposal

On the condition that a third-party iframe uses window.open (or an anchor whose target is not "_top") without the "noopener" feature to open a new browser tab or window, the resulting window, when navigated to the same third-party domain as the iframe, should be considered to be running in a third-party context for the sake of storage.

Because having full script access to the window.opener is the offending article, the new window will no longer be able to break the barrier between the first-party and third-party contexts.

This will not affect pop-ups that were opened by a first-party, or opened with the "noopener" feature.

Considerations

  • Would this affect OAuth implementations?
  • Should this only affect "known trackers"?
  • Should this be implemented only where an alternative method of gaining access to first-party storage exists (i.e. the Storage Access API), either within the iframe or the pop-up?

Extending Storage Access API (SAA) to non-cookie storage

I'd like to propose the adoption of Extending Storage Access API (SAA) to non-cookie storage by the Privacy Community Group.

This work is being prototyped in Chrome as of today and was discussed at TPAC 2023.

Summary of Proposal:

We propose an extension of the Storage Access API (backwards compatible) to allow access to unpartitioned (cookie and non-cookie) storage in a third-party context, and imagine the API mechanics to be roughly like this (JS running in an embedded iframe):

// Request a new storage handle via rSA (this should prompt the user)
let handle = await document.requestStorageAccess({all: true});
// Write some cross-site localstorage
handle.localStorage.setItem("userid", "1234");
// Open or create an indexedDB that is shared with the 1P context
let messageDB = handle.defaultBucket.indexedDB.open("messages");

The same flow would be used by iframes to get a storage handle when their top-level ancestor has successfully called rSAFor, except that in this case the storage-access permission has already been granted, so the rSA call would not require a user gesture or show a prompt, allowing "hidden" iframes to access storage.
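A sketch of that second flow under the assumptions above (embed.example and the specific storage calls are illustrative):

// Top-level page, which embeds https://embed.example:
await document.requestStorageAccessFor('https://embed.example');

// Later, inside a (possibly hidden) embed.example iframe; no user
// gesture or prompt is needed since the permission is already granted:
const handle = await document.requestStorageAccess({all: true});
handle.localStorage.setItem('lastSync', Date.now().toString());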

Browsers currently shipping the Storage Access API apply varying methods of when or how to ask the user for permission to grant 3p cookie access to a site. Given that this proposal involves extending the existing Storage Access API, while maintaining largely the same implications (from a privacy/security perspective) for the user, a consistent prompt for cookie and non-cookie access is preferred. No prompt is needed when the origins are in the same RWS (Related Website Sets, the new name for First-Party Sets).

Privacy by design with browser-managed E2E encryption with FIDO Protocol and Hardware keys

Hi all!

1. feature-name

Privacy by design with browser-managed E2E encryption with FIDO Protocol and Hardware keys

2. feature-description

2.1 In summary

  1. I opened an issue here (1170559926) talking about some technical stuff and some abstract views and concepts.
  2. I'm opening a new issue here to make these points of view clearer and more objective.

2.2 Concepts

The security/privacy risk is still present: compromised front-end code can still intercept user data (even with E2E encrypted apps: listening to DOM changes is enough)

  • If the browser were integrated with an offline authentication device, would it be possible to solve this security issue?
  • In the case of passwordless login, we can usually authenticate with a USB device.
  • If the browser performs E2E encryption on an offline device, maybe that increases security?

A new threat model emerges, where web apps may leak data via QR codes, file system access, and so on. These threats already exist, but they risk becoming more intrusive than before. That said, web apps already get flagged as malicious when they try to harm users, so mitigation is already available.

  • Maybe 'E2E encryption in web browsers + FIDO protocol and hardware keys' could be an alternative way to solve this kind of problem?

2.3. Notes

  1. I would like to know whether this idea is good or bad
  2. My goal is to help these communities: PrivacyCG, Solid, W3C, WICG, WebAuthn, KeePass, browsers (Brave, Vivaldi, Opera, Mozilla Firefox, LibreWolf, Google Chrome, etc.)
  3. I didn't find any link or resource for this here in the PrivacyCG community
  4. I'm not promoting any company, service, product, solution, or idea here - just adding the bibliographic reference links
  5. If I'm wrong about something, speak up, criticize, correct me

3. References

Privacy policy discovery.

It would be ideal if sites' privacy policies were more discoverable to users, their agents, and to crawlers. To that end, I'd suggest that we:

  1. Pave the rel=privacy-policy cowpath (based on HTTP Archive data, this appears in at least 285,421 distinct documents) by defining a privacy-policy link type.

  2. Define a well-known URL that redirects to a host's privacy policy (e.g. /.well-known/privacy-policy).

There's quite a bit that could be done beyond discovery of course, but these two steps seem small, simple, and relatively easy to adopt.
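For illustration, a user agent or crawler might combine the two mechanisms like this (a sketch only; the selector and fallback order are my assumptions, not part of the proposal):

// Prefer an explicit link type if the page declares one...
const link = document.querySelector(
    'link[rel~="privacy-policy"], a[rel~="privacy-policy"]');
// ...otherwise fall back to the well-known URL.
const policyUrl = link
    ? link.href
    : new URL('/.well-known/privacy-policy', location.origin).href;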

I've written this up in a little more detail at https://mikewest.github.io/privacy-policy-discovery/, but there's not much to that document beyond what's written here.

WDYT?

Privacy-Safe Storage API

Privacy-Safe Storage API

Privacy issues on the web occur because an entity accesses data about a user who didn't want to share that data with that entity. However, if a user wants to visit a website, they have no choice other than to trust the website with their data. Therefore, the website owner is the only one responsible for what happens to data about users of its website.

This proposal aims to solve this problem by enabling a website to deliberately give up its cross-origin and network access in exchange for access to a long-term storage and a different browser UI.

Main goal

  • Protects users from data leaks even when a frequently visited website is compromised (even if the attacker manages to keep network access, it won't be able to access user data)
  • Offers a strong alternative to authentication for PWAs and decentralized applications (simpler to implement, transparent to the user)
  • Allows users to safely enter sensitive information on an untrusted website
  • Allows websites and web applications to engage with more users thanks to the new privacy-by-design architecture enforced by this API design
  • Is incrementally adoptable

This proposal takes some ideas from other proposals but is more general:

  • the Fenced Frames proposal suggests a new HTML node type, and therefore applies only to embedded frames. Instead, the Privacy-Safe Storage API suggests a JavaScript API that a main frame may call at any time. Also, this proposal might not need Web Bundles to be implemented.
  • the Shared Storage proposal is really close to this one, as it proposes a storage only accessible in a secure environment. The Privacy-Safe Storage API adds a new secure environment context: an offline main frame. However, Shared Storage is currently designed around worklets, whereas the current proposal covers any kind of context (frames, workers, service workers, and so on).

Design

Proposed surface API

const storage = await navigator.storage.requestPrivacySafeStorage();

Calling the above API has the following effects:

  • Embedded frames created before access to the storage will continue to work without any change; however, the main frame won't be able to communicate with them anymore.
  • Frames dynamically created while the Privacy-Safe Storage is accessible won't load. In the same way, workers created after the Privacy-Safe Storage is accessed will have access to this storage, but the same restrictions as for the main frame will apply.
  • The Service Worker also loses cross-origin and network access, but can still work with its cache, as if the user were really offline. The main frame can still issue fetch calls, since the network restriction applies at the Service Worker level. Other frames and workers that don't have access to the Privacy-Safe Storage will either use another instance of the Service Worker with network access, or not use a Service Worker at all (to be discussed).
  • In order to avoid leaks, only same-origin links that open in the current frame can be opened. (To be discussed too; see the Fenced Frame equivalent.)
  • All other storage APIs and cookies are either inaccessible or limited to reads only. Another solution is to have two storage instances per website, the first accessible from a non-secure context and the second from a secure context.
  • The main frame can't communicate with cross-origin and non-secure contexts, but the other way around is allowed. This way, it is possible to have a worker that forwards messages from a websocket connection to the main frame, but is unable to get any form of response from the main thread. Therefore, analytics collection is still possible (online session duration, web page loading time, browser information) but without access to sensitive information.

The storage API itself is yet to be discussed, but something like the Local Storage API may be a good starting point.
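As a purely illustrative sketch, assuming a localStorage-like handle as suggested above (only requestPrivacySafeStorage itself is part of this proposal):

const storage = await navigator.storage.requestPrivacySafeStorage();
// From this point on, the frame has lost network and cross-origin access.
storage.setItem('draft', JSON.stringify({text: 'private note'}));
const draft = JSON.parse(storage.getItem('draft') ?? 'null');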

Key scenarios

General use case
  1. A user visits a page on website A.
  2. The web page starts service worker installation.
  3. While the page loads, the service worker caches resources needed by the website to work offline.
  4. The web page is displayed and all resources are loaded; the web page requests Privacy-Safe Storage access.
  5. The browser drops any cross-origin and network access of the main frame, and updates its UI to indicate to the user that anything they do from now on is fully private.
  6. The browser returns a handle that gives the main frame access to the Privacy-Safe Storage.
  7. The website can do anything with what's inside the storage or with other APIs, nothing will leave the browser.

Adoption strategy

This proposal tries to mimic what happens when websites work offline, thus websites that already work offline won't have to change their code much to use the feature proposed here. For instance, web applications like those below shouldn't require a lot of work to adopt this feature:

  • Single or local multiplayer games, text/image/video/music editors, notepads, public newspapers: no need for "write access" to the network, and local analytics are still possible. Also, using the feature proposed here can attract users thanks to the updated UI and privacy guarantees. This proposal is also compatible with proposals like TURTLEDOVE, so ads may still be displayed safely without breaking those guarantees.
  • Password managers, authenticators (with OTP): security is critical, and attacks on those web applications are highly valuable to attackers. When using the Privacy-Safe Storage API, users are guaranteed that the remote server doesn't hold any sensitive information that might leak.

Furthermore, this feature can easily be turned off if it doesn't match expected needs, for instance after an origin trial.

FAQ

What would happen if a third-party script called the proposed API?

This proposal assumes it's the website developer's responsibility to be careful when choosing external libraries. Also, from the user's point of view, the only risk is seeing a broken website; the third-party script won't be able to leak storage information anyway.

Maybe an attacker can't retrieve the storage content, but can't it still alter/delete that content?

Third-party libraries can already alter/clear localStorage, but most don't. As above, this is outside the scope of this proposal.

Will the browser offer a new permission access level, such as "privacy-safe context only", when accessing the camera for instance?

That would be great, but this proposal primarily focuses on offering the storage mechanism first. (to be discussed too)

What is the purpose of such a feature since there's already the Local Storage, IndexedDB, and so on for long-term storage?

The main differences are that: (1) the API design and behavior prevent the Privacy-Safe Storage content from leaving the browser, and (2) the browser UI helps the user feel more comfortable when a secure website asks for some information.

If this proposal gets enough traction, maybe another proposal could request enabling the "clear cookies at the end of the session" setting by default.

What about data redundancy (backups, multi-devices)?

Since user data would be stored client-side, this can indeed be problematic in some cases (damaged devices, accessing the same site from multiple devices at the same time, and so on). This isn't a goal of this proposal, especially since most browsers offer profile synchronization.

Another possibility is to update the API to let the non-secure context of a website request encrypted snapshots of the Privacy-Safe Storage for its origin. The website then becomes responsible for dealing with storage synchronization across devices, and for backups, without having access to the storage content. (To be discussed.)

Speculative Request Control

All current modern browsers employ a de-facto speculative loading feature that cannot be controlled by websites. It was introduced to provide a moderate performance boost when loading typical webpages at the time. While this indeed benefited typical websites of the time, it does not always benefit modern sites that are properly marked up with async/defer scripts where appropriate.

Websites should be able to opt-out of eager speculative requests and be able to accept responsibility for their own site performance.

Giving site owners control over eager speculative requests improves the security implications of generating dynamic <meta> CSPs at runtime based on private locally-stored tracking consent data. Currently, client-side-generated <meta> CSPs are effectively unenforced until DOMContentLoaded due to eager speculative requests. With eager speculative requests disabled, these CSPs can be effectively applied and enforced immediately.

What is an eager speculative request?

An eager speculative request is a speculative request that is sent out before preceding synchronous scripts finish executing.

Motivating Use Cases

The motivating use case for this feature is to increase the ease with which sites could adopt a CSP based on locally-stored consent provided by a third-party JS library. In this use case, we can assume that the library vendor and site owner have taken the time to explicitly preload resources asynchronously where appropriate, as they must knowingly disable eager speculative requests.

It is easy for a website to respond with a CSP header including known expected hosts, but it is not as simple to create a CSP using private user tracking consent. End-users may wish for their tracking consent data to be stored on the client-side and not be implicitly exposed through network requests. It is possible to create a client-side JavaScript library (e.g. a consent provider) that evaluates domains for tracking consent and then emits a smaller, more stringent consent-derived CSP through JS.
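A minimal sketch of that pattern (the storage key and directive list are hypothetical):

// Consent-provider library, running before any third-party resources load:
const consented = JSON.parse(localStorage.getItem('consentedHosts') || '[]');
const meta = document.createElement('meta');
meta.httpEquiv = 'Content-Security-Policy';
// Only hosts the user has consented to may load scripts.
meta.content = "script-src 'self' " + consented.join(' ');
document.head.appendChild(meta);

With eager speculative requests disabled, no request to an unconsented host can slip out before this policy takes effect.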

Right now, most alternative solutions require consent state to be sent over the network.

More in my explainer draft.

Fingerprinting resistance

Most browsers have a variety of similar-but-different fingerprinting protections, which make things unpredictable for site authors and cause web-compat problems on all browsers.

This deliverable would be for PrivacyCG to (i) document, (ii) standardize and (iii) extend these fingerprinting protections.

This deliverable would aim (i) to give web authors a common set of expectations to target, (ii) to make it easier for browser users to understand the protections in the platform, and (iii) to reduce the web-compat problems fingerprinting protections cause, through a "strength in numbers" approach.

A "fingerprinting resistance mode" wouldn't need to be an explicit "mode" either. Some vendors might wish to have "fingerprinting resistance" enabled all the time, others might alias it with "private browsing", etc.

Suggested and User-Specified Hierarchical Interests (SUSHI)

This proposal builds on ideas from other proposals including PAURAQUE and Ad Topic Hints. It is centered around a hierarchical set of topics of interest that can be used in selecting relevant ads. This hierarchy might look something like the IAB's Content Taxonomy. It will work best in supporting interest-based advertising in a privacy preserving manner when combined with a proposal such as PARAKEET, which can support serving interest-based ads without allowing the site to tie those interests to the user. It might also be possible to adapt it for use by one of the TURTLEDOVE variants.

Suggested Interests

For a new user, SUSHI works similarly to PARAKEET and TURTLEDOVE in that advertisers (or their DSPs) can suggest topics that might be of interest to the user, and the browser remembers these suggestions. An important difference though is that the suggestions must be from a standardized hierarchy of ad topics. In most cases, the hierarchy will not be detailed enough to identify specific products such as a particular designer athletic shoe, but will instead be limited to the type of shoe, such as basketball shoe. This reduces the creepiness factor of having the same pair of shoes follow you around on the internet. Obviously if a particular advertiser only sells a single shoe of that style, then the shoes may still follow you.

Publishers will use the same interest hierarchy to suggest contextual topics that might be of interest to the user. However, in addition to using the topics for serving ads on the current page, the browser will remember the suggested topics to build an internal profile of the user's interests.

Over time, topics that are of particular interest to the user are likely to be suggested by multiple different websites. This will be more common at higher levels in the hierarchy. For example, different sites might suggest

  • sports→baseball→pros→Giants
  • sports→football→pros→49ers
  • sports→football→college→Stanford
  • sports→football→college→Berkeley

In this case sports has been suggested four times, football three times and college football twice ("pros" isn't counted as occurring twice, because its two occurrences are in different subtrees of the hierarchy). With enough suggestions from enough different sites, the browser can start to identify recurring topics that are likely to be of the most interest to the user.

In some instances, interest can be surmised from recurring topics suggested by a small number of publisher sites that the user visits frequently. However, this should not be implemented for advertiser suggestions. If only a single advertiser is suggesting a topic, they don't want their competitors to be able to benefit from the suggestion, enabling a competitor to serve ads that might steal away the customer. Thus, advertiser-suggested interests should remain exclusive to the advertiser until the same topic is suggested by a sufficient number of different advertisers and/or publishers.

Ranking

Browsers should keep a list of suggested topics for at least 30 days (unless cleared by the user) and use these to determine topics of interest to the user. Suggestions should only be considered for sites with which the user has interacted (so that sites cannot game the system by redirecting through a number of sites and making the same suggestion on each of them). A primary weighting factor should be the number of different sites that suggest a topic. The computation should also give more weight to recent suggestions. Frequent suggestions from a small number of sites that a user visits often can receive extra weight as well.

For example, assume that a user has visited 100 unique websites in the last 30 days, but only 10 in the last week. Of those 10, the user has visited 3 on at least 15 different days over the last month (every other day on average). Suggestions from these 3 should carry the most weight, followed by the 7 that were also visited recently. The remaining 90 should contribute less.

Suggestions older than 30 days should be forgotten, except for one use case. Many interests are cyclical/seasonal, such as sports leagues or holidays. A user may show very high interest in a particular sport or team until the season ends, after which little interest is shown until the start of the next season. If the browser keeps track of previous, strong interests, it may accelerate restoration of the topic's ranking when the user starts to show interest again at the start of the next season, rather than requiring it to start over from zero.

Each advertiser and/or publisher page should only be able to contribute a limited number of suggestions. If one page suggests five topics, while another only suggests one, the single topic might get five times as much weight as any of the five. Alternatively, we may want to allow the site suggesting five topics to rank its suggestions, perhaps assigning a weight of 50% to the primary topic, 20% to a secondary topic and 10% to each of the remaining topics.

User-Specified Interests

The browser should display a special advertising icon near the URL/bookmarks area that the user can click on at any time that advertising is present to learn more about the ads on the current page and to see their advertising preferences. When clicked, this should show:

  • If SUSHI was used in displaying ads on this page
  • The contextual topics suggested by the current page
  • The topics shared with PARAKEET for the ads on the current page
  • The highly ranked topics that appear to be of the most interest to the user

The user should be able to click on any topic and see the sites that have suggested that topic. They should be able to examine the topic hierarchy and manually specify any topic that they are currently interested in. They should also be able to block any topic from ever being identified as of interest to them.

Topics that are selected by the user should receive a high weighting factor. If the user has specified multiple topics of interest, those that have also received numerous suggestions could have higher weights than those with fewer suggestions. This might be accomplished by giving a boost to the weights computed for each user-specified interest. If the user-specified interest is not a leaf node in the hierarchy, then an incremental weight should also be added recursively to all its children.

Weights should be set to zero for topics that are not of interest to the user, as well as for child nodes of that topic.

A more advanced UI might allow the user to specify a duration for increased or blocked interest. For example, the user could indicate that they are always interested in the topic or that their interest is immediate and should expire within a few weeks. Similarly, after completing a big purchase, they might want to flag that they are not interested in that topic for the next 30 days (at which point all suggestions that resulted from their research/comparison shopping into this topic will have expired).

Ad Serving

When a website requests an ad using SUSHI, the browser will call PARAKEET with some subset of the suggested interests. The subset should always include all of the contextual interests suggested for the current page (PARAKEET may filter some of these out if the combination is uniquely identifying). The browser will also randomly select a few topics that have a sufficient number of suggestions, with those that have higher computed weights having a higher probability of being selected. Advertiser-specific suggestions should include the advertiser and its ad networks, so that PARAKEET can provide those suggestions only to the appropriate ad network if available.

The browser should include three flags for each suggested interest:

  • Interest was suggested by current page
  • Interest was specified by the user
  • Interest randomly selected from among previously suggested interests

PARAKEET may decide to use these flags only to help identify attributes that should be filtered out because the combination is too unique, or it may also share them with the ad network(s). Availability of these flags might influence bids based on how the interest for the user was identified. For example, if a topic is suggested by the current context (first flag) and it's also a topic that has been suggested a lot previously for this user (third flag), then an ad related to this topic might be worth more. PARAKEET would not receive the actual weights assigned by the browser to the topics.

Because only a randomized subset of interests is shared, and it changes over time based on the suggestions from sites most recently visited, the set of interests cannot easily be used to uniquely identify the user. PARAKEET can further restrict unique combinations (especially of those suggested by the current page), but as the combinations will change with each call, it may need to do less filtering than it would using its current design.

The browser might exclude any interest that is blocked by the user from ever being shared, or if the topic was suggested by the current page as a contextual topic, it may allow it to still be shared (with only the first flag set), as not doing so might be identifying.

Clicks and Conversions

When an ad is served, it should include a list of topics that contributed to the ad being selected. If the user clicks on the ad, that should indicate a higher level of interest in those topics. Topics associated with the ad that were passed by the browser to PARAKEET should perhaps receive more of a boost than topics returned with the ad that weren't in this initial list. Note, however, that the enhanced suggestions should only apply to that particular advertiser, unless the topic has previously been (or in the future is) suggested by a sufficient number of unique domains.

This feature could be integrated with the conversion reporting APIs, such as Google's proposed aggregate reporting API. These aggregate reports could include not only the ads viewed from this advertiser, but also the interests that inspired those ads. When the advertiser signals the conversion to the browser, the advertiser could provide a set of interests related to the item(s) purchased. For each interest, the advertiser should be able to state whether interest in that topic is likely to continue or not. For example, after purchasing a big item, the user is unlikely to purchase another for a long time, and that topic's weight should be decremented so that the advertiser does not continue to bid on it. If interest is expected to continue, the topic should receive a boost, making it more likely to be shared in the future (again only for this advertiser, unless it is a common interest).

After either clicks or conversions, the advertiser should be able to tell the browser to remove advertiser-specific attributes of a specified interest topic, so that they don't pay for continuing ads that they no longer feel are likely to be productive.

Hierarchy Detail

Interest Reporting across Hierarchy Levels

When the browser reports to PARAKEET that a user is interested in a particular topic that happens to be a leaf node in the hierarchy, that implies that the user is also interested in the topics of each higher-level node up to the root. For example, someone who is interested in the 49ers is also interested in football and in sports. In fact, using the ranking algorithm described above, those more general topics are going to be at least as likely, and generally more likely, to be suggested than the 49ers topic itself. However, while higher levels in the hierarchy are more likely, advertisers will pay more for lower levels of the hierarchy, as they allow for more focused ads.

To favor reporting interests lower in the hierarchy when sufficient interest in those lower levels has been suggested, we might take a couple of different approaches. One approach would be to simply reduce the weight of each suggestion for nodes higher up the hierarchy. For example, a suggestion for a leaf node gets a weight of 10, while its parent gets a 7 and its grandparent gets a 4. However, this has the drawback that if the user is really interested in the higher-level topic, and doesn't spend a lot of time in any combination of the lower-level topics, it may take longer for the general interest to manifest itself.

Another option would be to start the topic selection process by randomly selecting from only the top nodes in the hierarchy according to their weights. For the selected node, choose one of its child nodes with a probability equal to each node's weight relative to that of its siblings. Repeat down to the leaf nodes. Include only the final node in the ad request to PARAKEET. For example, if there are five child nodes with weights of 10, 7, 3, 0 and 0, then the first node would be selected 50% of the time, the second 35% of the time and the third 15% of the time, with the last two never being selected, because the user has not shown any interest in them thus far. We could further modify this so that some percentage of the time we don't select any child node and instead report the parent. The probability of choosing the parent should increase as the combined weight of the children decreases, as this means there has thus far been little interest in the lower levels. This node selection process would be repeated 5-10 times so that the call to PARAKEET could include up to 5-10 topics (the actual number sent to PARAKEET could be fewer if the algorithm selects the same topic multiple times, which is likely for topics in which the user has shown a lot of interest).
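An illustrative helper for the child-selection step described above (the node shape is assumed):

// Pick a child with probability proportional to its weight; returns null
// when all weights are zero, in which case the parent is reported instead.
function pickChild(children) {
  const total = children.reduce((sum, c) => sum + c.weight, 0);
  if (total === 0) return null;
  let r = Math.random() * total;
  for (const child of children) {
    r -= child.weight;
    if (r < 0) return child;
  }
  return children[children.length - 1]; // guard against rounding error
}

With weights [10, 7, 3, 0, 0], the first child is chosen 50% of the time, the second 35% and the third 15%, matching the example above.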

Hierarchy Levels

The current IAB hierarchy is not detailed enough to support this proposal. It probably requires at least one more level of detail in many branches of the tree. For example, the automobile section might add another level for the make of the car, while the sport sections might add a level for specific teams and the shoe sections could specify different styles of shoes (dress, casual, sports, with an additional level below these).

I also envision a method where ad networks could support a limited number of custom values one level deeper than the public hierarchy. There would be limits on the number of unique nodes that an ad network could provide, and these nodes would not be available to other ad networks, but the higher levels of the hierarchy at least would still get a boost when these lower levels are set. If there is sufficient interest in pursuing this, I can provide more details.

Privacy and Usability Properties

Explainability

When a user wants to know why they saw a particular ad, the browser can show them the interest(s) that resulted in that ad. For each interest, they can see the sites that suggested the user might be interested in that topic and the number/percentage of times that topic was suggested by each site.

User Profile

Within the limits of PARAKEET and Fenced Frames, even if an advertiser, ad network and/or publisher collude, they should not be able to tie ad requests back to specific users (except when the user clicks on an ad). Thus, these parties should not be able to use the interests to build a profile of the user. Because the interests change for each ad request and over time, fingerprinting should not be feasible.

Example Algorithm

The complete algorithm might look something like:

for each day of the last 30 days
    for each site visited on this day
        for each suggested topic
            compute the sum of weights for the topic (including all of its child topics) for each page (of this site)
            divide result by the square root of the number of pages visited on this site
            multiply weight by sqrt(30 minus days from current day) / sqrt(30)
for each suggested topic
    compute the sum of all of the above weights
normalize each topic's value to a value between 0 and 1
for each clicked ad
    for each topic on this clicked ad
        move topic's value 10% closer to 1
for each topic
    if user has flagged topic as interesting (or any of its ancestors)
        move topic's value 50% closer to 1
    else if user has flagged topic as blocked (or any of its ancestors)
        clear topic's value
    else if topic's value is smaller than some delta
        set topic's value to delta, so that it has a non-zero chance of sometimes being selected.
for each non-zero topic
    count unique publishers recommending topic
    count unique advertisers recommending topic
    if unique advertisers is greater than 0 and less than 5 and unique publishers is less than 5 and user has not flagged topic as interesting
        restrict topic to only these advertisers and publishers

Storage Access Headers

In the current Storage Access API specification, sites must explicitly opt into accessing their unpartitioned cookies. This explicit requirement is an intentional protection against CSRF, so that embedded iframes can choose whether they want to access unpartitioned cookies or not. However, there's currently only one way to opt into accessing unpartitioned cookies: by calling a JavaScript API. This effectively means that only embedded iframes can opt in, and they have to execute some JavaScript before being allowed to access cookies - even if the user has already granted the relevant permission. This causes unnecessary network traffic and resource loads/reloads.

At TPAC, I presented some slides (see slides 7-12) on ideas for new ways to "opt in" or "activate" an existing permission through the Storage Access API - in particular, the use of an HTTP response header instead of requiring an invocation of the JavaScript API. This generally got positive feedback (IMO), and other browser vendors seemed interested in exploring the ideas, in particular suggesting a complementary request header.

Explainer: https://github.com/cfredric/storage-access-headers

I'd appreciate any additional feedback, comments, concerns, etc. Thank you!
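To make the idea concrete, a hypothetical request/response exchange might look like the following (the header names here are illustrative only; see the explainer for the actual proposal):

GET /embedded-widget HTTP/1.1
Host: third-party.example
Sec-Fetch-Storage-Access: inactive    <- browser: a permission exists but is not active

HTTP/1.1 401 Unauthorized
Activate-Storage-Access: retry        <- server: activate the permission and retry with cookies

The browser could then activate the existing grant and resend the request with unpartitioned cookies attached, without any JavaScript having to run in an iframe first.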

High-level trust signal (requestInstall / requestTrust)?

Based on some of @michaelkleber 's comments in the last meeting, and some of the broader conversations about user permission, I wondered if we could view non-login state and login state as being on a spectrum. In this model, a site first asks to be "installed" or "added to my browser" or "trusted" or "remembered" and then later (or possibly simultaneously) can indicate the user is logged in. This is sort of a crazy idea but maybe it will spark some others.

Background

A few concerns have come up with isLoggedIn

  1. "Login" doesn't cover all of the cases where a site may want less aggressive state deletion (shopping carts, paywalls, avoiding MFA on subsequent logins)
  2. Sites may be motivated to and able to lie if a login flow must not be browser-mediated
  3. Sites may force users to create accounts (or install apps) just to persist state, resulting in more tracking
  4. If isLoggedIn provides binary information about authentication in third-party contexts, it may be vulnerable to fingerprinting

Some of the above issues are being solved by prompting users, but there is understandable reluctance to confront users with cryptic prompts they may be predisposed to accept without reading.

Idea

However, there is a place where users already grant permission in a way that feels (and is) weighty and provides a conceptual model we can leverage, which is installing an app or browser extension. Although generic and a bit heavy, it sets a gate for sites to do more that doesn't rely on heuristics or purpose-built APIs.

  1. A site can display a button asking a user to "install" the site that calls a requestInstall API. This would replace the current PWA concept of adding a site as a pseudo-app, which could be a checkbox option when "install"ing.

  2. The resulting prompt for installing a site would be fairly heavy, similar to installing an app, and it could have real estate for browsers to provide additional trust information or warnings.

  3. By default, an installed site might have a small number of additional rights

  • Longer cookie duration
  • The right to further prompt the user for permissions that non-installed sites in this new world don't get to ask for (say, notifications)
  4. This could be sufficient for most login, paywall meter, "seen this person before", cookie consent, analytics, and shopping cart cases. Sites currently pushing registration upfront could move to pushing anonymous "installs" instead.
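A hypothetical shape for the API (the name, return value, and helper below are my invention, purely to make the flow concrete):

// Hypothetical: ask the browser to "install"/remember this site.
document.querySelector('#install').addEventListener('click', async () => {
  const result = await navigator.requestInstall(); // browser shows its own heavyweight prompt
  if (result === 'granted') {
    showInstalledUI(); // hypothetical site-defined follow-up
  }
});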

Extensions

  1. In this model, isLoggedIn might only be for login information that wants to be advertised in a third-party context to know who to prompt for StorageAccess. A site could request to set this bit, and if the login flow is not mediated by the browser, the user could be prompted to allow this permission to be set. Gating the right to even ask for this permission on being "installed" reduces risk of permission spam and fingerprinting, especially when coupled with reminder UIs when this bit is accessed.

  2. If the "install" flow truly feels as weighty as installing an extension (possibly including manual review), users could even be subsequently prompted to grant wildcard StorageAccess across the web to trusted sites, just as they effectively can today with browser extensions or URL handlers, but in a better sandbox and with more visibility into when it is accessed.

  3. "Installed" sites could also receive the right to ask for more fingerprinting-prone APIs, thereby limiting the surface area over which they can be used for correlation and reducing permission spam.

  4. Browsers could automatically "install" sites that a user visits frequently with an "undo" option to reduce the hassle of managing all of this.

  5. Browsers could still purge "install" state for unused sites.

  6. As part of "install"ing, users could agree to share that permission across a listed first-party set of sites.

  7. Users could be prompted to "install" sites as part of logging in to a site. Maybe browser-mediated logins default to also installing the site. This is useful for banks or video services where almost all functionality is login gated anyway.

  8. Users could manage a list of "installed" sites in their browser.

  9. A user noticing they're on a site that is not installed might be a signal in phishing avoidance.

Concerns

The risk here is that the incentives for being installed are so high that sites push this quite hard and users do not net increase their privacy. The key is for the install barrier to be high enough and for policing by search engines to be strong enough to mitigate this. We saw this happen with native mobile apps, where after an initial rush of user-hostile install prompts, things settled at a low-key equilibrium.

Another risk is that the whole concept of "installing" a site is confusing and possibly antithetical to the web. It could raise the perceived weight of checking out a new website and managing the sites you've been to. Sites would also have to manage when they want to ask to be added to the browser vs being added as a full app.

Maybe "install" is just too overloaded a concept and a browser-mediated "remember me on this site" or "trust this site" or "save this site" that behind the scenes works the same is close enough? This is also discussed in privacycg/is-logged-in#9

Another risk is that if browsers continue to improve isolation between sites, then it would be less necessary to purge first-party state, and then some of this machinery is less useful.

Final thoughts

A lot of challenges arise from trying to prevent lowest-common-denominator behavior on the web by sites a user visits transiently (or that are known bad actors) while also empowering sites a user has a trusted relationship with. At the same time, we're trying to slice permissions finer while also recognizing that users don't respond well to tons of prompts. It seems like a first-approximation signal from the user about whom to trust would be quite valuable for a lot of things.

The IsLoggedIn API

We'd like to work on our proposed IsLoggedIn API in this community group. The API allows websites to inform the web browser of the user's login state.

Currently, web browsers have no way of knowing if the user is logged in to a particular website. Neither the existence of cookies nor frequent/recent user interaction can serve that purpose since most users have cookies for and interact with plenty of websites they are not logged in to.

Why do browsers need to know, you may ask.

The current behavior of the web is “logged in by default,” meaning as soon as the browser loads a webpage, that page can store data such as cookies virtually forever on the device. That is a serious privacy issue. Long term storage should instead be tied to where the user is truly logged in.

There could be other powerful features and relaxations of restrictions besides storage that the web browser only wants to offer to websites where the user is logged in.

The ability to do these things requires knowledge of where the user is logged in.

Full explainer, open issues we'd like to discuss, and a summary of previous feedback is found here: https://github.com/WebKit/explainers/blob/main/IsLoggedIn/README.md

Edit: Updated the link.
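For flavor, here is a sketch of how a site might use such an API (loosely based on the explainer; the exact names, arguments, and token types are still open questions there):

// On successful login, tell the browser (shape approximated from the explainer):
await navigator.setLoggedIn({
  username: 'jane',
  credentialTokenType: 'legacyAuthCookie',
});

// ...and on logout:
await navigator.setLoggedOut('jane');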

isKnown state

This builds off of the conversation around the isLoggedIn proposal by @johnwilander at https://github.com/privacycg/is-logged-in

This is a rough sketch, but I think we would like to create a more private way to recognize a user as a repeat visitor to a site without having to invade the user's privacy by requiring that they identify themselves.

I will explain with the use case I am most familiar with, paywalls. This is one case, but I do not think it is the only case. It is a distinct case from the one I see being discussed at privacycg/is-logged-in#9 which is a reaction to a logged in state. The isKnown state, as I am currently proposing it, would not require a user to log in.

Many publishers rely on paywalls, which are important for maintaining a profit. The way this works right now is that a site sets a cookie that records the number of visits; when the number of visits exceeds the number we wish to allow freely, we can trigger a paywall and block access to content.

This process suffers from potentially the same issues that maintaining a logged in state does (which is why the isLoggedIn is a useful reference):

  • It is often maintained via a fingerprinting process (in the same way that sites attempt to retain the user's login outside of the cookie by profiling and recording information about their device) which is privacy invasive.
  • It is dependent on cookies or storage in such a way that it could potentially leak additional data about the users to third parties.

It contains an additional threat to privacy: maintaining a connection with the user outside of login requires tracking the user without their explicit consent (in UI terms; I'm not diving into the legal side of this right now), which opens up problems that don't gel well with the future privacy-first web.

What I would like to have instead is a very simple non-invasive way to handle this, one not dependent on cookies or storage. This is important because, as we are already starting to see in anticipation of the post-cookie world, sites that feel they do not have some ability to meter access to content will push users to annoying UI on their first visit, asking them to log in to read even the first page they arrive on.

In a best-case scenario I'd imagine the isKnown state to be composed of two parts.

The first property is a single number which can be either incremented or reset one time per pageview by the site the user has landed on. This will prevent the site from trying to use the value for tracking or fingerprinting, but allow the site to own its definition of "known user" in a way consistent with how it is currently being done in the wild.

The second property is a single number that represents the period of hours (not seconds, not milliseconds, to avoid fingerprinting) since the last Known access.

Some potential flows:

New users:

  • User arrives at a site
  • Site sees isKnown.count == 0
  • User's isKnown is incremented
  • User's isKnown.count == 1

Returning users (short time period):

  • User arrives at a site for the second time
  • Site sees isKnown.count > 0
  • Site sees isKnown.time < 720
  • This means it is the second time the user has visited in less than 30 days
  • User's isKnown is incremented
  • User's isKnown.count == 2

Returning user (long period):

  • User arrives at a site for the second time
  • Site sees isKnown.count > 0
  • Site sees isKnown.time >= 720
  • This means it has been more than 30 days since the last visit
  • User's isKnown is reset
  • User's isKnown.count == 1
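Putting these flows together, a site's metering check might look like this (a sketch; navigator.isKnown and its members are the hypothetical API proposed here, and the free-visit threshold is made up):

const FREE_VISITS = 5;   // made-up metering threshold
const RESET_HOURS = 720; // 30 days, per the flows above

function shouldShowPaywall() {
  if (navigator.isKnown.count > 0 && navigator.isKnown.time >= RESET_HOURS) {
    navigator.isKnown.reset();     // long gap: count becomes 1, per the flow above
  } else {
    navigator.isKnown.increment(); // one increment allowed per pageview
  }
  return navigator.isKnown.count > FREE_VISITS;
}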

By having a way to measure access reliably while not relying on privacy invasive methods like fingerprinting we can better support an open web and legitimate publishers who would like to maintain users' privacy while also remaining profitable.

We've discussed this in calls a few times and I keep promising to write something up, but I have generally been low on time to do so, so I'll keep this brief and try to hash it out via Q&A to see if it makes sense to advance this idea further.

Proposed Work Item: First-Party Sets

First-Party Sets is a web platform mechanism that allows a set of registrable domains (or origins) to be defined as "first-party" to each other. Our primary motivation for this proposal is to define a privacy boundary that allows browsers to eliminate cross-site tracking that currently relies on mechanisms such as third-party cookies and fingerprinting. Tracking policies and privacy models from various browser vendors - Chromium, Edge, Mozilla, WebKit - scope access to user identity to some notion of first-party, which we refer to as a privacy boundary.

Although the top-level document’s registrable domain can act as a natural privacy boundary, multi-domain sites are a reality, which compels us to define a better alternative. For example, Firefox ships an entity list to group together domains belonging to the same organization.

Organizations generally prefer maintaining distinct domain names to manage branding, or to allow for future business sales/acquisitions. Additionally, choosing the registrable domain as the privacy boundary may compel organizations to move all their web properties to a single parent domain, even though the parent domain a property is hosted on may change with business ownership. It would also train users to make security decisions based on the subdomain component of URLs, which could make them more susceptible to phishing attacks.

First-Party Sets allows site operators to assert a list of domains as being associated with the same entity. This then allows us to define a top-level document’s First-Party Set as the privacy boundary. Browsers may choose to not impose cross-domain communication restrictions across members of a given First-Party Set (such as is done in practice with disconnect.me’s extension, Firefox ETP’s use of the entity list, and Edge Tracking Protection’s similar exception for same-party domains). However, it is important to apply a set of countervailing pressures:

  • Preventing abuse by unrelated websites forming a First-Party Set - This is achieved by requiring every organization to submit their list for acceptance based on conformance with a published UA policy.
  • Making site associations visible to the user - This is achieved by making First-Party Sets discoverable via various browser UI surfaces.
  • Discouraging formation of arbitrarily large sets by imposing storage and entropy limits - Browser storage limits and entropy limits (such as the proposed Privacy Budget) that are currently applied per-domain would instead be applied per First-Party Set.
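For illustration, a set could be asserted via a JSON resource served from a well-known location on each member domain, roughly along these lines (an approximation of the shape in the WICG explainer; field names may have changed since):

https://a.example/.well-known/first-party-set (served by the set owner):
{ "owner": "a.example", "members": ["b.example", "c.example"] }

https://b.example/.well-known/first-party-set (served by each member):
{ "owner": "a.example" }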

First-Party Sets has recently been the subject of discussion on various forums; including at PrivacyCG F2F, and WebAdvBG.

We have been working to incubate First-Party Sets in WICG, and it was recently transferred there: https://github.com/WICG/first-party-sets

We'd like to propose that the Privacy CG discuss it and see if the group would like to take it on as a Work Item.

Principles of User Privacy (PUP)

Dear friends,

I would like to propose the Principles of User Privacy (PUP) as a work item.

The goal of PUP is to provide a solid set of terms and tools to discuss and think about privacy, grounded in the domain's state of the art. With this document I hope to facilitate any and all conversations pertaining to privacy on the Web.

There is no specific conformance class for this document, it is intended to inform principled decisions, in line with TAG documents or RFC 8890.

Analytics mode

There are lots of APIs that both (i) have concrete privacy risk, and (ii) are useful in a minority of cases to debug site, network, or client issues.

This has caused lots of disagreement, heat and problems in privacy reviews.

Having an explicit "I am in debug mode, and I'd like to enable all the privacy-risky functionality to help fix this problem, but just for a bit" option would help cut the knot.

Web hardware revocation API

This proposal achieves privacy-friendly web hardware revocation (i.e., hardware bans). In particular, it makes a web servicer (i.e., web server) capable of blocking users who have previously abused it, without violating users' privacy.

Background

As is well known, malicious actions on the internet are increasing, and this is a big problem. One of the factors that makes prevention difficult is user anonymity: a servicer can't block users who have abused it in the past because it can't track them.

The easiest way to solve this problem is to track the user, meaning servicers would require strong identification schemes such as SMS or credit card authentication (i.e., 3D Secure). However, this causes privacy concerns.

Thus, we need a method that blocks users who have abused in the past, without tracking. In the mobile context, the DeviceCheck API on iOS satisfies this: it provides a hardware revocation scheme conscious of users' privacy. However, I can't find Web APIs like it. In addition, DeviceCheck assumes a common trusted execution environment on devices, so many devices can't support it.

Idea

This idea is for Web APIs to provide a hardware revocation method without violating user privacy.

Mainly, this idea consists of a cryptographic protocol and a hardware registration protocol. The cryptographic protocol achieves revocation without tracking risk, but it assumes that the user doesn't have multiple secret keys. The hardware registration protocol therefore limits the number of secret keys distributed to each user, to support that assumption.

The cryptographic protocol this idea uses is called an anonymous blacklisting protocol. The most popular anonymous blacklisting protocol is EPID (Enhanced Privacy ID). EPID is a signature scheme that ensures user anonymity while preserving revocability. First, EPID realizes strong user privacy: there is one group public key and multiple private keys, so the verifier can't track users because the same public key is used to verify all signatures. Second, EPID has strong revocability: the servicer (i.e., verifier) can revoke a user (i.e., signer) using the signatures that user produced for malicious actions. Note that the verifier doesn't need to track or identify users.

The hardware registration protocol limits the number of secret keys distributed to each user. It assumes a GM (i.e., a third party handling registration); the user attests their device ID to the GM and obtains an EPID secret key. Concretely, attestation schemes such as TPM EK attestation, Android ID Attestation, or iOS DeviceCheck are available for this.
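A conceptual sketch of the resulting flow (all function names here are hypothetical; real EPID libraries differ):

// 1. Registration: the device attests to the GM and receives one EPID member key.
const memberKey = await gm.join(deviceAttestation);

// 2. Signing: the user signs a request anonymously under the group key pair.
const sig = epidSign(memberKey, request);

// 3. Verification: the servicer checks the signature against the group public
//    key and a signature revocation list; past abusive signatures ban the
//    device without ever identifying the user.
if (!epidVerify(groupPublicKey, request, sig, signatureRevocationList)) {
  rejectRequest(); // revoked or invalid
}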

References

EPID:

TPM Attestation:

Android ID Attestation:

DeviceCheck API:

Note

I don't know if I should write it here, but this idea is strongly related to privacy.

Adopt Storage Access API as a deliverable

I propose we adopt the Storage Access API as a deliverable of the Community Group. It would facilitate discussion to have a separate repository for it, instead of trying to iron out all of the details in one issue thread at whatwg/html#3338.

The end goal would be to land pull requests in the HTML Standard and other applicable specifications.
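For readers unfamiliar with the API, the basic flow inside a cross-site iframe looks like this (requestStorageAccess generally requires user activation, e.g. a click):

const hasAccess = await document.hasStorageAccess();
if (!hasAccess) {
  try {
    await document.requestStorageAccess(); // may prompt the user
    // Unpartitioned cookies are now available to this frame.
  } catch {
    // The user or browser denied the request; proceed without them.
  }
}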

Referrer trimming

Referrers leak information about a user's browsing activity cross-site. Browser vendors have deployed a variety of mitigations to this vulnerability, ranging from spoofing the referrer to be same-origin to changing the default referrer policy.

Current status (please correct or add to this):

Last update: 2021-04-12

Firefox

  • Referrers default to strict-origin-when-cross-origin.
  • Trim document.referrer down to eTLD+1 when we observe tokens that can be used to circumvent our cross-site tracking protections.

Chrome

Referrers default to strict-origin-when-cross-origin.

Safari

  • All cross-site subresource HTTP referrers are trimmed to the page's origin. This matches strict-origin-when-cross-origin, but sites cannot override this setting.
  • Trim all cross-site document.referrer to the page's origin.
  • Trim document.referrer to eTLD+1 if the referrer has link decoration and the user was navigated from a domain classified by ITP. See ITP 2.3.

Brave

From this doc: Referrer values default to strict-origin-when-cross-origin and can only be tightened via referrer policy, not weakened.

Edge

Referrers default to strict-origin-when-cross-origin.

Future path

At the very least it seems like we can align on defaulting to strict-origin-when-cross-origin (see also: w3c/webappsec-referrer-policy#125). But even this default can still be overridden by motivated adversaries. This leads to the question of why only change the default, and not permanently trim cross-site referrers with no way to override?

  • Do we expect to see a lot of breakage?
  • Are there legitimate uses of full cross-site referrers that we want to continue to support?
  • Can we somehow prevent abuse but still allow some parties to receive full referrer?
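To ground these questions, here is what the strict-origin-when-cross-origin default sends for a page at https://store.example/cart?item=42:

Same-origin request:          Referer: https://store.example/cart?item=42
Cross-origin HTTPS request:   Referer: https://store.example/
HTTPS -> HTTP request:        (no Referer header at all)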

Referrer trimming: Edge's behaviour?

Hi @erik-anderson,
Relates to: #13
What's Edge behaviour as of today? Does Edge now default to strict-origin-when-cross-origin in Stable?

The Edge team has been experimenting with defaulting to strict-origin-when-cross-origin in our Dev and Canary channels. (comment April 2020)

Possible Intention Signal stronger than a simple user-gesture requirement

Users want to enable certain web features, developers want to make that process simple, while browsers want to mitigate spam and anti-patterns designed to exhaust or trick users into granting permissions they do not truly wish to give. This is an initial discussion on better ways to accomplish these goals.

Background

Some web features enable invasive or possibly exploitable access to users - in particular Web Push and PWA installation, especially with Safari Mobile getting ready to release Web Push. Apple has chosen to lock Web Push behind an installation requirement, which will strengthen a site's ability to persuade users to install the site and give it elevated access to the user in ways unrelated to Web Push. This is likely meant to avoid the situation that plagues Desktop Safari and every other browser: pervasive user prompts that annoy users every time they land on a page. The metrics and approaches used to date are insufficient to protect users from this annoyance on sites they do not wish to be annoyed by, and were the main reason for Apple refusing to implement Web Push until the EU's current regulatory stint.

Idea

Users need a simple way to enable these features when they want to, but avoid being annoyed by them until such time. The objective is to make non-annoying enabling easy, and annoying prompting painful and unreliable. This is driven by the fact that I have, to date, not found a use case for unilateral prompting.

One proposal is that sites can publish a manifest (or it can be part of the PWA manifest) that specifies an HTML class name. The browser will intercept pointer-down events on matching elements prior to any client-side scripts running and prompt the user as to whether they wish to grant that permission. If the user grants the permission, an event will be triggered on the document object that the site can subscribe to and respond to. Browsers are able to trust this event more because of the metrics they can implement around it. Currently, prompts are triggered by client-side JavaScript, and browsers implemented a basic user-gesture requirement to attempt to alleviate abuse. Because the browser does not know which elements will cause the function call to the permission prompt, it has a hard time implementing additional metrics. The metrics I foresee as compatible with declared elements are:

  • Screen Coverage: How much of the screen do the element(s) cover? If the percentage is significant, and especially if it is 100%, the event is ignored.
  • Visibility: The element should be fully visible (not opacity:0 or occluded by visible elements with "pointer-events:none" applied).
  • Reliably Stationary: The element should seldom move in relation to the document. A lower-effort metric is "last moved at", which can ignore the event if the button moved too recently (this is pretty much impossible with the current approach, as it would require tracking the movement of EVERY element and caching their last-moved-at timestamp, since we don't currently know WHICH elements will cause the prompt).
  • Text Content: The element should have relevant text content that describes the permission that will be granted.
  • HTML Children: The element can have HTML children such as SVGs for icons and TextNodes, but other types of HTML elements could be disqualifying.
  • Other metrics as browsers deem appropriate, though they should facilitate most legitimate use cases: static buttons, buttons in modals, buttons in flyout menus, etc.

If browser makers are too worried about the performance implications of monitoring "random" elements provided by the developer, these buttons could also be implemented as new standard HTML elements that are browser-implemented and can do this tracking / diagnostics within the element itself. The element could even go so far as to enforce overflow:hidden to guarantee that it can't have internal content that breaks out of its borders to try and capture unrelated touch events.

These approaches would enable less annoying but simultaneously convenient access to the user's ability to enable web features such as Web Push, PWA installation, Geolocation, Local File Access, etc. Give the user a button on the site that they can click to enable said feature and measure that button to make sure it's not doing anything untoward.
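A hypothetical sketch of the declared-element flow (the manifest key, class name, and event name below are all invented for illustration):

// Hypothetical manifest entry, e.g. in the PWA manifest:
//   "permission_buttons": { "push": ".enable-push-button" }
//
// The browser intercepts pointer-down on matching elements, shows its own
// prompt, then notifies the page of the outcome:
document.addEventListener('permissiongranted', (event) => {
  if (event.permission === 'push') {
    subscribeToPush(); // hypothetical site-defined follow-up
  }
});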

Concerns

The methods that unscrupulous developers may use to try and trick browsers and/or users are ever-changing. There may be gaming methods already in use that I am unaware of that circumvent these proposed metrics. The metrics browsers use to stamp down on misuse will need to change over time to keep up.

Final Thoughts

A stronger user intention signal can go a long way to alleviate wayward prompts. This can pave the way to a more user-friendly web in the future, allowing users to more easily trigger the events / features they want while ignoring the ones they don't.

Ad Topic Hints

What if we flipped digital advertising around?

Today, when you visit a website, each ad network roughly follows three steps:

  1. Figure out who you are,
  2. …then look up a profile of behavioral data about you. Stuff like what other websites you’ve visited.
  3. …then, based on this, try to infer what ad topics you might be interested in seeing.

What if we flipped the script entirely, to make the web more private, but also to put people in control:

  1. Visit website
  2. Multiple ad networks all ask your web browser: “What topics of ads should I show to this person for this website visit? Please select a topic they’ll find interesting / relevant.”

This not only skips over the resolution of the user-identity step (which is poised to break in light of browser tracking prevention efforts), it also means the ad networks no longer need to keep a profile of behavioral data about individuals.

But perhaps most interesting of all, it moves that decision of “what ad topics would you be interested in seeing” into a place where people can exert control over the process.

Through a combination of sensible automatic defaults, with the opportunity for people to manually override the system (if they so desire) perhaps we can have both relevant ads about interesting topics, and also preserve user privacy and autonomy.

Addressing the risk of fingerprinting

People have multiple interests, and these interests change over time. What’s more, people don’t necessarily know what they like. An important function of ads is to help people discover new products and services that they’d love, but didn’t know about before.

As such, the “Ad Topic Hints” returned by the browser should change constantly. Some topics of interest may show up more frequently than others, and the user might express the desire to see other topics less. And finally, there ought to be some randomness thrown in - to mix things up and explore topics they haven’t seen before.

This is great news from a privacy perspective, because it means these “Ad Topic Hints” couldn’t be used as some kind of tracking vector, or fingerprinting surface. If the “Ad Topic Hints” returned by the browser include a lot of random variation and change over time, not only across sites, but even across multiple sessions on the same site, we should be able to ensure they can’t be used for fingerprinting. This is one of the major points of criticism about FLoC that this “Ad Topics Hints” proposal seeks to address.

Addressing the risk of sensitive information disclosure

These ad interests aren’t communicating data about what websites a person has visited, their attributes or characteristics. FLoC indirectly does this (to some extent), and this is another piece of criticism this proposal seeks to address. Since we’ve flipped the script, this proposed API would instead be sending out information about characteristics of ads, not people.

But perhaps more importantly, this API would, by design, provide the user with the ability to inspect (and if they so desire, override) the set of “Ad Topic Hints” their browser is telling sites to show to them. Any inferences being made about what ad topics their browser thinks they may find interesting would be clearly displayed. Rather than have the browser vendor determine what is “sensitive” or not, if the person felt that a given “Ad Topic” revealed something they didn’t want revealed, they could ask their browser to stop requesting ads of that topic.

Ad topics as vectors of numbers

Rather than describe an “Ad Topic” with a categorical label, we propose using a neural network to convert ads into embedding vectors (introductory explanation here if you're not familiar with the concept). This has numerous benefits. It’s precise, can be universally applied without the need for human annotation, smoothly captures concepts that don’t have simple names, works across all languages, and allows us to compute the numerical “similarity” of any two ads.

Imagine an open-source ML system into which you feed an ad. It analyses the image / video as well as text, and emits a list of 64 numbers. Like this:

1.56, -3.82, 3.91, -2.27, -7.16, …, 1.81

Anyone can run this code on any ad to instantly get the list of numbers that are the “embedding” for that ad. We can design a system which can deal with all kinds of inputs, so that it works for image ads, video ads, text ads, anything.

This ML system would be designed so that ads about similar topics result in nearby points. In this way, we can easily compare any two ads to see how “similar” they are. We just compute the cosine of the angle between these two vectors. It’s as simple as just computing the dot-product of both embedding vectors and dividing by both magnitudes. It’s computationally fast and cheap.
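In code, that similarity computation is just a few lines:

// Cosine similarity between two embedding vectors (e.g. length-64 arrays).
function cosineSimilarity(a, b) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}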

Now that we have a simple, standard way to understand the “topic” of an ad, and a way to compare the similarity of two ads, let’s describe how it would be used.

The browser can select a vector that’s “similar” to other ads the person finds interesting / relevant. It can avoid selecting vectors similar to ads for which the person has expressed dislike. And every now and again, the browser should just pick a random area it hasn’t tried before - to explore, and learn if the person is interested in that topic or not.

Sensible defaults

Most people will not want to take the time to interact with an interface that asks them about their ad interests. That’s fine, so long as we have a reasonable default for people who don’t bother to configure this themselves.

The browser can collect information about ads the person is shown across the web, ads they click on, and ad conversion events.

Based on this information, the browser can infer what ad topics the person seems to engage with (and what they do not).

Autonomy through centralized transparency and control

Unlike much behavioural advertising today, where inferences derived from behavioural data are often invisible and unknowable - the browser can make all of this available to the user. It can show them not only the inferred interests it has generated, but also the raw data used to generate that prediction.

This leads to the second big difference with most forms of behavioural advertising. The user may choose to modify or override these inferred interests.

The fact that these inferences are all centralised within the browser is what makes this a tractable user experience. It’s not realistic for people to identify all the ad networks which may be making inferences about them based on behavioural data. It’s even less realistic to imagine that people will modify / override these inferences across all those networks. Centralisation gives the user a realistic form of control.

This should also address concerns about “autonomy”. When it’s possible to see all the data, and all the inferences, and to override / modify them in one place, we can say that this puts people in control over the ads they want to see and what information their browser transmits about those interests.

What’s more, the browser should allow people to configure how much “exploration” they’d like. Some people might desire more variety, while others might prefer their browser to stick to a narrower range of ad topics.

This proposal isn’t prescriptive about the exact algorithm the browser should use to select the ad interest vector to be sent to a given page, as this should be a great opportunity for browser vendors to compete with one another, in terms of ease of use and relevance of ads, as well as ease of user understanding and control.

Ideas about ways to incorporate user-stated/controlled interests

Several important proposals about ads and privacy involve labeling ads in a way that the browser can understand. While these proposals are primarily about attribution / measurement use-cases, we could utilize this here as well.

Once a browser understands what pieces of content are ads, it could potentially introduce a universal control that allows people to tell the browser how they feel about the “Ad Topic” of that ad. Perhaps a “right click” or a long-press on mobile could reveal options like “More ads of this topic” or “Fewer ads of this topic”.

Another idea would be for the browser to have a special UI somewhere with an infinite feed of ads. These could either be a hard-coded list, or could be fetched through ad requests to networks that wanted to participate in such a UI. People could go through this “ad feed” selecting “More ads of this topic” or “Fewer ads of this topic” on each. This would help the browser quickly understand more about what this person did / didn’t want to see ads about.

There are no doubt many other ideas out there which merit experimentation. This is just the beginning of this conversation.

Concern about centralized browser control

But there are also downsides to this level of centralization within the browser. Browser vendors who operate “Search Ads” that rely on first-party data would be able to personalize ads with or without this “Ad Topic Hints” API. They wouldn’t have much incentive to make this system work particularly well (from the perspective of ad monetization). As such, they might under-invest in this “Ad Topic Hints” API.

How can we stimulate more competition in this space? One possible approach would be to make this API “pluggable”. Such browser plugins would need to be reviewed / vetted to ensure user privacy and stop abuse. Plugins would have access to the ad-interaction data described in the “sensible defaults” section as well as user feedback on ads, and could design their own user-interfaces as well as algorithms to generate the “Ad Topic Hints” returned.

Making “Ad Topic Hints” pluggable is just one idea. There may be even better solutions available.

Understanding Ad Topic Hints

Advertisers will naturally want to develop some understanding of these “Ad Topic Hints” and map them to concepts they already understand, like the IAB taxonomy of ad topics.

The easiest way to understand these “Ad Topic Hints” would be to take a sample of ads that represent all the various categories in the IAB taxonomy of ad topics, and run them through the ML system. Ideally one would produce mappings for multiple examples of each category.

Then, for any “Ad Topic Hint” vector, one could compare it to these reference points. A simple approach would be to just consider the topic of the ad with the “closest” vector. A more sophisticated approach might consider the actual “distance”. If the closest reference point is sufficiently far away, this may be an unlabelled part of the ad topic spectrum. We may discover that additional categories need to be added to existing taxonomy systems.

To help illustrate this mapping process, imagine these embedding vectors were just two-dimensional. By coloring the space closest to a given reference point all the same color, you’d wind up with a Voronoi diagram like this:

Image of a Voronoi diagram from Wikipedia

Imagine that each of those black dots represents a “reference ad” deemed to belong to a particular “Ad Topic” in the IAB’s taxonomy. Any “Ad Topic” vector would fall into one of these colored regions. A simple approach would be to deem that topic the same as the reference point within that region.
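A minimal nearest-reference lookup, reusing the cosineSimilarity helper sketched above (the reference list format is an assumption for illustration):

// references: [{ label: 'Automotive', vector: [/* 64 numbers */] }, ...]
function closestTopic(hint, references, minSimilarity = 0) {
  let best = null;
  for (const ref of references) {
    const sim = cosineSimilarity(hint, ref.vector);
    if (!best || sim > best.sim) best = { label: ref.label, sim };
  }
  // If even the best match is far away, this may be an unlabelled
  // part of the ad-topic spectrum.
  return best && best.sim >= minSimilarity ? best.label : null;
}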

Opener Protections

I'd like to propose the adoption of Opener Storage Partitioning by the Privacy Community Group.

This work is being investigated by Chrome and was discussed at TPAC 2023.

Summary of Proposal:

Our goal is to maintain cross-page communication where important to web function while striking a better balance with user-privacy.

This will be done in two steps. First, whenever a frame navigates cross-origin, any other windows holding a window.opener handle pointing to the navigating frame will have that handle cleared. Second, either (a) any frames with a valid window.opener (or window.top.opener) handle at the time of navigation will have transient storage via a StorageKey nonce instead of access to standard first- and third-party StorageKeys, or (b) the opener will be severed by default until user interaction or an API call restores it.

The first proposal should be less disruptive than either variant of the second, but metrics will need to be gathered on both. Once implemented, these proposals together prevent any synchronous or asynchronous communication between a first- and third-party storage bucket for the same origin. Instead, communication between two buckets for the same origin will only be possible if one of the buckets is transient. This mitigates the threats we are concerned with.
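To illustrate the first step (a sketch of the proposed behavior, not of what browsers do today):

// Page A (https://a.example) opens a popup:
const popup = window.open('https://a.example/widget');
// Inside the popup, window.opener points back at A.

// Now A itself navigates cross-origin (say, to https://b.example).
// Under the proposal, the popup's window.opener handle is cleared at that
// moment, so the popup can no longer reach the document in A's tab.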

Top-Frame lifetime, partitioned storage for embedded frames

Currently, frames can either have storage blocked or allowed, with no in-between. This is not ideal for several reasons. Some examples include:

  1. There are cases where two eTLD+1-equivalent frames on the same page may wish to work together (and to use storage to do so) but are currently unable to without relying on the first party for intermediation.
  2. Users may wish to prevent frames from having storage for privacy reasons, but without breaking the embedded party's functionality. Partitioned, frame-lifetime storage would allow users / implementors a way to, by default, give frames storage "only for as long as I'm using it".
  3. Frames may appear and disappear while a user is interacting with the top-level frame. An option to tie the embedded frame's storage lifetime to the top-level frame provides an in-standard, temporary way for storage to be persisted for the third party, but only for as long as the user is interacting with the first party.

Safari's current approach of persistent, partitioned third-party storage allows more than is needed in such cases, and other browsers' "all block or all allow" approaches allow either too little (storage is blocked) or way too much (unpartitioned storage).

The proposal here is to provide an in-standard way of defining the following:

  1. By default all 3p frames created during the lifetime of the top level frame have storage cleared once the top level frame is closed
  2. Two eTLD+1 equiv frames see the same storage area
  3. These storage areas are partitioned (e.g. A under B sees different storage than A under C or A as top level)
  4. Storage Access API allows frames to request persistent, unpartitioned storage

Full storage partitioning / double-keying

Many privacy measures have focused on cookies and replacements for them. However, many other storage types can, without appropriate restrictions, be used for stateful cross-site tracking.

This includes explicit storage types like LocalStorage or IndexedDB, implicit storage, like the HTTP cache, communication channels like ServiceWorker and BroadcastChannel, and subtle state like HSTS flags.

When such state is accessible from a third-party context, it may enable cross-site tracking. Some state like this, e.g. the HTTP cache, can effectively be read by a passive resource, and thus any third-party resource may be affected. Other such state requires a scripting context in a third-party origin and thus would require an iframe or similar mechanism.

For any given storage mechanism, multiple approaches are possible. One approach is to totally deny access to the storage mechanism in a third party context. Another is to partition or double-key it; that is, have a completely separate instance of the storage based on the origin of the top-level browsing context. Yet another is to expose a unique ephemeral storage area.

WebKit has long partitioned many storage mechanisms, including (at one point controversially) the HTTP cache. Unfortunately, exactly what we did is not documented. Blink and Gecko also have work in progress to add pervasive double-keying.

It would be useful to agree on a common behavior, and to push these changes into standards as requirements.

Changes along these lines would ultimately go into HTML, Fetch, and perhaps various IETF deliverables. Perhaps also other standalone Web APIs that create a storage mechanism. However, it could be useful to have a central location and issue tracker to develop a plan and proposed behavior before filing issues/PRs against the relevant specifications.
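As a sketch of what double-keying means in practice (the key shape here is illustrative, not a spec proposal):

// Illustrative only: a double-keyed store adds the top-level site to the key.
const store = new Map();
const partitionKey = (topLevelSite, frameOrigin) => `${topLevelSite} ${frameOrigin}`;

store.set(partitionKey('https://news.example', 'https://tracker.example'), 'id=123');
store.set(partitionKey('https://shop.example', 'https://tracker.example'), 'id=456');

// The same tracker iframe sees different state under each top-level site,
// so the stored identifier can't link the two visits.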

FedCM as a trust signal for the Storage Access API

See privacycg/storage-access#196, this was intended to live in FedID CG but chairs thought that because of the way it integrates with SAA it may actually be a potential PrivacyCG work item. Comment from privacycg/storage-access#196:

In the FedID CG we have been discussing (fedidcg/FedCM#467) the merits of autogranting Storage Access calls based on existing FedCM grants. Based on the positive reception of this idea, we wrote up an explainer of how we think this should work from a technical perspective: https://github.com/explainers-by-googlers/storage-access-for-fedcm

Relevant for this specification is that instead of simply creating a new storage-access permission on a successful FedCM prompt, we'd update Storage Access to look at existing FedCM account connections to establish whether storage access can be granted without an additional prompt. Benefits to this include the ability to scope the grant to the privacy boundaries of FedCM, and avoiding two simultaneous permission grants for the user (agent) to manage.

This issue is tracking discussion and integration on the Privacy CG side.

cc @bvandersloot-mozilla @annevk @martinthomson @cfredric @hflanagan @samuelgoto @yi-gu

Prevent Service Worker as Cross-Site Proxy

As restrictions grow for third parties, it's reasonable to expect that they will look into ways in which they can piggyback off of the first party's site. Two methods come to mind to achieve this:

  • DNS record on a sub-domain (+ an SSL certificate to be hosted by third party)
  • Proxy on a sub-domain/path in the server/cdn config

In terms of privacy, having a third party masked as the first party should be something that is discouraged. Both of the above methods at least are likely to require elevated privileges for the first party to implement, and hopefully some level of careful consideration.

There is another method to achieve a similar thing though, using Service Workers, that is arguably easier to set up and requires less elevated privileges to implement:

  1. Ask the first party to host the following service worker script at first-party.com/third-party-path/sw.js:
/* sw installation goes here */

// Proxy every request from controlled pages through the third party, which
// can then serve arbitrary "first-party" responses for any URL in scope.
self.addEventListener('fetch', event => {
  event.respondWith(
    fetch(`https://third-party.com/?url=${encodeURIComponent(event.request.url)}`)
  );
});
  2. Ask the first party to load a third-party script / execute the following on its page:
navigator.serviceWorker.register('/third-party-path/sw.js')
  .then(registration => {
   // do whatever
  });

All subsequent requests under first-party.com/third-party-path/ will now be proxied to the third-party domain, and considered first party by the document and any CSP rules.

Proposal

Either:

  1. The origin of the Response passed in to FetchEvent.respondWith must match the origin of the FetchEvent request. This allows matching cached responses to be passed too, but not synthetic responses.
  2. The document treats the resource as having the url/origin from the propagated Response, rather than the one from the request.

*edited to take comments into consideration. Original: Cross-Site requests in Service Workers should be disallowed. First update: A fetch made within a Service Worker fetch handler must match the site (or origin?) of the request it is handling.
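Under option 1, a service worker could still satisfy requests from its cache or from the request's own origin; a compliant handler might look like this (a sketch of what the restriction would permit, not a current browser check):

self.addEventListener('fetch', event => {
  event.respondWith((async () => {
    // Cached responses for the same request are fine: their origin matches.
    const cached = await caches.match(event.request);
    if (cached) return cached;
    // Fetching the request itself (same URL, hence same origin) is also fine;
    // a synthetic response built from https://third-party.com would be rejected.
    return fetch(event.request);
  })());
});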

Considerations

  • Are there legitimate use cases to respond to requests with synthetic responses, or resources from different sites/origins?
