
This document is an individual draft proposal. It has not been adopted by the Private Advertising Technology Community Group.


Private Aggregation API explainer

Author: Alex Turner ([email protected])


Introduction

This proposal introduces a generic mechanism for measuring aggregate, cross-site data in a privacy preserving manner.

Browsers are now working to prevent cross-site user tracking, including by partitioning storage and removing third-party cookies. There are a range of API proposals to continue supporting legitimate use cases in a way that respects user privacy. Many of these proposals, including Shared Storage and Protected Audience, plan to isolate potentially identifying cross-site data in special contexts, which ensures that the data cannot escape the user agent.

Relative to cross-site data from an individual user, aggregate data about groups of users can be less sensitive and yet would be sufficient for a wide range of use cases. An aggregation service has been proposed to allow reporting noisy, aggregated cross-site data. This service was originally proposed for use by the Attribution Reporting API, but allowing more general aggregation would support additional use cases. In particular, the Protected Audience and Shared Storage proposals expect this functionality to become available.

So, to complement the Attribution Reporting API, we propose a general-purpose Private Aggregation API that can be called from a wide array of contexts, including isolated contexts that have access to cross-site data (such as a shared storage worklet). Within these contexts, potentially identifying data could be encapsulated into "aggregatable reports". To prevent leakage, the cross-site data in these reports would be encrypted to ensure it can only be processed by the aggregation service. During processing, this service will add noise and impose limits on how many queries can be performed.

This API introduces a contributeToHistogram() function; see examples below. This call registers a histogram contribution for reporting. Later, the browser constructs an aggregatable report, which contains an encrypted payload with the specified contribution(s) for later computation via the aggregation service. The API queues the constructed report to be sent to the reporting endpoint of the script's origin (in other words, the reporting origin) after a delay. The report format and endpoint paths are detailed below. After the endpoint receives the reports, it batches the reports and sends them to the aggregation service for processing. The output of that process is a summary report containing the (approximate) result, which is dispatched back to the reporting origin.

See the Private Aggregation API specification.

Examples

Protected Audience reporting

The Protected Audience API plans to run on-device ad auctions using cross-site data as an input. The Private Aggregation API will allow measurement of the auction results from within the isolated execution environments.

For example, a key measurement use case is to report the price of the auctions' winning bids. This tells the seller how much they should be paid and who should pay them. To support this, each seller's JavaScript could define a reportResult() function. For example:

function reportResult(auctionConfig, browserSignals) {
  // Helper functions that map each buyer to its predetermined bucket and scales
  // each bid appropriately for measurement, see scaling values below.
  function convertBuyerToBucketId(buyer_origin) {  }
  function convertBidToReportingValue(winning_bid_price) {  }

  // The user agent sends the report to the reporting endpoint of the script's
  // origin (that is, the caller of `runAdAuction()`) after a delay.
  privateAggregation.contributeToHistogram({
    // Note: the bucket must be a BigInt and the value an integer Number.
    bucket: convertBuyerToBucketId(browserSignals.interestGroupOwner),
    value: convertBidToReportingValue(browserSignals.bid)
  });
}

The buyer can make their own measurements, which could be used to verify the seller's information. To support this, each buyer's JavaScript would define a reportWin() function (and possibly also a reportLoss() function). For example:

function reportWin(auctionSignals, perBuyerSignals, sellerSignals, browserSignals) {
  // The buyer defines their own similar functions.
  function convertSellerToBucketId(seller_origin) {  }
  function convertBidToReportingValue(winning_bid_price) {  }

  privateAggregation.contributeToHistogram({
    bucket: convertSellerToBucketId(browserSignals.seller),
    value: convertBidToReportingValue(browserSignals.bid),
  });
}

Measuring user demographics with cross-site information

publisher.example wants to measure the demographics of its user base, for example, a histogram of the number of users split by age range. demo.example is a popular site that knows the demographics of its users. publisher.example embeds demo.example as a third party, allowing it to measure the demographics of the overlapping users.

First, demo.example saves these demographics to its shared storage when it is the top level site:

sharedStorage.set("demo", '{"age": "40-49", ...}');

Then, in a demo.example iframe on publisher.example, the appropriate shared storage operation is triggered once for each user:

await sharedStorage.worklet.addModule("measure-demo.js");
await sharedStorage.run("send-demo-report");

Shared Storage worklet script (i.e. measure-demo.js):

class SendDemoReportOperation {
  async run() {
    let demo_string = await sharedStorage.get("demo");
    let demo = {};
    if (demo_string) {
      demo = JSON.parse(demo_string);
    }

    // Helper function that maps each age range to its predetermined bucket, or
    // a special unknown bucket, e.g. if the user has not visited `demo.example`.
    function convertAgeToBucketId(age_string_or_undefined) {  }

    // The report will be sent to `demo.example`'s reporting endpoint after a
    // delay.
    privateAggregation.contributeToHistogram({
      bucket: convertAgeToBucketId(demo["age"]),
      value: 128,  // A predetermined fixed value, see scaling values below.
    });

    // Could add more contributeToHistogram() calls to measure other demographics.
  }
}
register("send-demo-report", SendDemoReportOperation);

Goals

This API aims to support a wide range of aggregation use cases, including measurement of demographics and reach, while remaining generic and flexible. We intend for this API to be callable in as many contexts and situations as possible, including the isolated contexts used by other privacy-preserving API proposals for processing cross-site data. This will help to foster continued growth, experimentation, and rapid iteration in the web ecosystem; to support a thriving, open web; and to prevent ossification and unnecessary rigidity.

This API also intends to avoid the privacy risks presented by unpartitioned storage and third-party cookies. In particular, it seeks to prevent off-browser cross-site recognition of users. Developer adoption of this API will help to replace the usage of third-party cookies, making the web more private by default.

Non-goals

This API does not intend to regulate what data is allowed as an input to aggregation. Instead, the aggregation service will protect this input by adding noise to limit the impact of any individual's input data on the output. Learn more about contribution bounding and budgeting below.

Further, this API does not seek to prevent (probabilistic) cross-site inference about sufficiently large groups of people. That is, learning high confidence properties of large groups is ok as long as we can bound how much an individual affects the aggregate measurement. See also discussion of this non-goal in other settings.

Operations

The current design supports one operation: constructing a histogram. This operation matches the description in the Attribution Reporting API with Aggregatable Reports explainer, with a fixed domain of 'buckets' that the reports contribute bounded integer 'values' to. Note that sums can be computed using the histogram operation by contributing values to a fixed, predetermined bucket and ignoring the returned values for all other buckets after querying.
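The bucket-reuse trick for sums can be sketched as follows. The fixed bucket constant, the summary-report shape, and the extraction helper are all hypothetical illustrations, not part of the API:

```javascript
// Sketch: computing a sum with the histogram operation by always
// contributing to one predetermined bucket. The bucket constant and the
// summary-report shape below are illustrative assumptions.
const SUM_BUCKET = 42n;  // hypothetical bucket reserved for the sum

// Each client-side call would contribute its value to the fixed bucket:
//   privateAggregation.contributeToHistogram({ bucket: SUM_BUCKET, value: v });

// After querying, the reporting origin reads only the fixed bucket and
// ignores every other bucket in the summary report.
function extractSum(summaryReport) {
  return summaryReport
    .filter((entry) => entry.bucket === SUM_BUCKET)
    .reduce((total, entry) => total + entry.value, 0);
}

// Hypothetical summary report: an array of {bucket, value} entries.
const report = [
  { bucket: 42n, value: 1300 },
  { bucket: 7n, value: 99 },  // unrelated bucket, ignored
];
```

Here `extractSum(report)` would recover the (noisy) sum from the reserved bucket alone.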

Over time, we should be able to support additional operations by extending the aggregation service infrastructure. For example, we could add a 'count distinct' operation that, like the histogram operation, uses a fixed domain of buckets, but without any values. Instead, the computed result would be (approximately) how many unique buckets the reports contributed to. Other possible additions include supporting federated learning via privately aggregating machine learning update vectors or extending the histogram operation to support values that are vectors of integers rather than only scalars.

The operation would be indicated by using the appropriate JavaScript call, e.g. contributeToHistogram() and contributeToDistinctCount() for histograms and count distinct, respectively.

Reports

The report, including the payload, will mirror the structure proposed for the Attribution Reporting API. However, a few details will change. For example, fields with no equivalent on this API (e.g. attribution_destination and source_registration_time) won't be present. Additionally, the api field will contain either "protected-audience" or "shared-storage" to reflect which API's context requested the report.

The following is an example report showing the JSON format:

{
  "shared_info": "{\"api\":\"protected-audience\",\"report_id\":\"[UUID]\",\"reporting_origin\":\"https://reporter.example\",\"scheduled_report_time\":\"[timestamp in seconds]\",\"version\":\"[api version]\"}",

  "aggregation_service_payloads": [
    {
      "payload": "[base64-encoded data]",
      "key_id": "[string]",

      // Optional debugging information if debugging is enabled
      "debug_cleartext_payload": "[base64-encoded unencrypted payload]",
    }
  ],

  // Optional debugging information if debugging is enabled and debug key specified
  "debug_key": "[64 bit unsigned integer]"
}

As described earlier, these reports will be sent to the reporting origin after a delay. The URL path used for sending the reports will be /.well-known/private-aggregation/report-protected-audience and /.well-known/private-aggregation/report-shared-storage for reports triggered within a Protected Audience or Shared Storage context, respectively.

Temporary debugging mechanism

While third-party cookies are still available, we plan to have a temporary mechanism available that allows for easier debugging. This mechanism involves temporarily relaxing some privacy constraints. It will help ensure that the API can be fully understood during roll-out, help flush out any bugs (either in browser or caller code), and allow developers to more easily compare performance with cookie-based alternatives.

This mechanism is similar to Attribution Reporting API's debug aggregatable reports. When the debug mode is enabled for a report, a cleartext version of the payload will be included in the report. Additionally, the shared_info will also include the flag "debug_mode": "enabled" to allow the aggregation service to support debugging functionality on these reports.

This data will only be available in a transitional phase while third-party cookies are available and are already capable of user tracking. The debug mode will only be enabled for contexts that are able to access third-party cookies. That is, it will be disabled if third-party cookies are disabled/deprecated generally or for a particular site/context; note that this also means debug mode will automatically become deprecated when third-party cookies are deprecated.

Though the debug mode is tied to third-party cookie availability, browsers may temporarily allow debug mode without third-party cookies in order to support testing, such as the browsers in the Mode B group of Chrome-facilitated testing.

Enabling

The following JavaScript call enables debug mode for all future reports requested in that context (e.g. a Shared Storage operation or Protected Audience function call):

privateAggregation.enableDebugMode();

The browser can optionally apply debug mode to reports requested earlier in that context.

This JavaScript function can only be called once per context. Any subsequent calls will throw an exception.
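Because a second call throws, callers that share a context may want a small guard. A minimal pattern, where the helper name is hypothetical and the `privateAggregation`-like object is passed in for illustration:

```javascript
// Illustrative guard: enableDebugMode() throws if called twice in one
// context, so a shared helper can swallow the repeat call.
function tryEnableDebugMode(privateAggregation, options) {
  try {
    privateAggregation.enableDebugMode(options);
    return true;
  } catch (e) {
    return false;  // debug mode was already enabled in this context
  }
}
```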

Debug keys

To allow sites to associate reports with the contexts that triggered them, we also allow setting 64-bit unsigned integer debug keys. These keys are passed as an optional field to the JavaScript call, for example:

privateAggregation.enableDebugMode({debugKey: 1234n});

Duplicate debug report

When debug mode is enabled, an additional, duplicate debug report will be sent immediately (i.e. without the random delay) to a separate debug endpoint. This endpoint will use a path like /.well-known/private-aggregation/debug/report-protected-audience (and the equivalent for Shared Storage).

The debug reports should be almost identical to the normal reports, including the additional debug fields. However, the payload ciphertext will differ due to repeating the encryption operation, and the key_id may differ if the previous key has expired or the browser randomly chose a different valid public key.

Reducing volume by batching

In the case of multiple calls to contributeToHistogram(), we can reduce report volume by sending a single report with multiple contributions instead of multiple reports. For this to be possible, the different calls must involve the same reporting origin and the same API (i.e. Protected Audience or Shared Storage). Additionally, the calls must be made at a similar time as the reporting time will necessarily be shared.

Batching scope

For calls within a Shared Storage worklet, calls within the same Shared Storage operation should be batched together.

For calls within a Protected Audience worklet, calls using the same reporting origin within the same auction should be batched together. This should happen even between different interest groups or Protected Audience function calls. However, reports triggered via window.fence.reportEvent() (see here for more detail), should only be batched per-event. This avoids excessive delay if this event is triggered substantially later. Reports for Protected Audience bidders may not share the same aggregation coordinator choice. In this case, calls should be batched separately for the different coordinator choices.

One consideration in the short term is that these calls may have different associated debug modes or keys. In this case, only calls sharing those details should be batched together.

Contributions limit

We will also need a limit on the number of contributions within a single report. In the case that too many contributions are specified within a 'batching scope', we should truncate them to the limit.

However, to reduce the impact of this limit, we will pre-aggregate (i.e. merge) any contributions that have the same bucket and filtering ID before truncation.

If necessary, we could instead split the contributions back into multiple reports, each respecting the limit.

Strawman limit: 20 contributions per report.
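The merge-then-truncate step could be sketched as follows. The contribution shape and the `filteringId` field name are assumptions for illustration; the limit of 20 is the strawman above:

```javascript
// Sketch of pre-aggregation before truncation: contributions sharing a
// bucket and filtering ID are merged, then the list is cut to the limit.
const MAX_CONTRIBUTIONS = 20;  // strawman limit from this explainer

function mergeAndTruncate(contributions, limit = MAX_CONTRIBUTIONS) {
  const merged = new Map();
  for (const c of contributions) {
    const key = `${c.bucket}:${c.filteringId ?? 0}`;
    const existing = merged.get(key);
    if (existing) {
      existing.value += c.value;  // same bucket + filtering ID: merge values
    } else {
      merged.set(key, { ...c });
    }
  }
  return [...merged.values()].slice(0, limit);  // then truncate to the limit
}
```

Merging first means two calls to the same bucket cost only one of the 20 slots.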

Padding

The size of the encrypted payload may reveal information about the number of contributions embedded in the aggregatable report. This can be mitigated by padding the plaintext payload (e.g. to a fixed size). In the shorter term, we plan to pad the payload by adding 'null' contributions (i.e. with value 0) to a fixed length. In the future, we plan to instead append bytes to a fixed length, but this will require updating the payload version.
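The shorter-term padding scheme could look roughly like this; the fixed length is an illustrative assumption:

```javascript
// Sketch of the shorter-term padding: append zero-value 'null'
// contributions until the payload holds a fixed number of entries, so
// the encrypted size no longer reveals the true contribution count.
const PADDED_LENGTH = 20;  // assumed fixed length for illustration

function padContributions(contributions, length = PADDED_LENGTH) {
  const padded = contributions.slice(0, length);
  while (padded.length < length) {
    padded.push({ bucket: 0n, value: 0 });  // 'null' contribution
  }
  return padded;
}
```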

Aggregation coordinator choice

This API should support multiple deployment options for the aggregation service, e.g. deployments on different cloud providers. To avoid a leak, this choice should not be possible from within an isolated context when a context ID is set.

We propose a new optional string field aggregationCoordinatorOrigin to allow developers to specify the deployment option, e.g. the origin for the aggregation service deployed on AWS, GCP, and other platforms in the future. The specified origin would need to be on an allowlist maintained by the browser. If none is specified, a default will be used.
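The allowlist-plus-default behavior can be sketched as follows, using the explainer's placeholder origin; the allowlist contents, the default, and the function name are assumptions:

```javascript
// Sketch of resolving aggregationCoordinatorOrigin: unspecified falls
// back to a default, and anything not on the allowlist is rejected.
// Both origins below are illustrative placeholders.
const COORDINATOR_ALLOWLIST = new Set(['https://coordinator.example']);
const DEFAULT_COORDINATOR = 'https://coordinator.example';

function resolveCoordinator(requested) {
  if (requested === undefined) return DEFAULT_COORDINATOR;
  if (!COORDINATOR_ALLOWLIST.has(requested)) {
    throw new TypeError('aggregationCoordinatorOrigin is not on the allowlist');
  }
  return requested;
}
```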

This allowlist matches the Attribution Reporting API's, available here.

Shared Storage callers would specify this field when calling the run() or selectURL() APIs, e.g.

sharedStorage.run('someOperation', {'privateAggregationConfig':
    {'aggregationCoordinatorOrigin': 'https://coordinator.example'}});

Note that we are reusing the privateAggregationConfig that currently allows for specifying a context ID (see here).

Protected Audience sellers would specify this field in the auctionConfig, e.g.

const myAuctionConfig = {
  ...
  'privateAggregationConfig': {
    'aggregationCoordinatorOrigin': 'https://coordinator.example'
  }
};
const auctionResultPromise = navigator.runAdAuction(myAuctionConfig);

For Protected Audience bidders, we plan to allow this choice to be set for each interest group via navigator.joinAdInterestGroup(), e.g.

const interestGroup = {
  ...
  'privateAggregationConfig': {
    'aggregationCoordinatorOrigin': 'https://coordinator.example'
  }
};
const auctionResultPromise = navigator.joinAdInterestGroup(interestGroup);

This setting would be able to be overridden via the typical interest group mechanisms. For example, the update mechanism could support a new privateAggregationConfig key matching the call to joinAdInterestGroup().

Privacy and security

Metadata readable by the reporting origin

Reports will, by default, come with a variety of (unencrypted) metadata that the reporting origin will be able to directly read. While this metadata can be useful, we must be careful to balance the impact on privacy. Here are some examples of metadata that could be included, along with some potential risks:

  • The originally scheduled reporting time (noised within an ~hour granularity)
    • Could be used to identify users on the reporting site within a time window
    • Note that combining this with the actual timestamp the report was received could reveal if the user's device was offline, etc.
  • The reporting origin
    • Determined by the execution context's origin, but a site could use different subdomains, e.g. to separate use cases.
  • Which API triggered the report
    • The api field and the endpoint path indicates which API triggered the report (e.g. Protected Audience or Shared Storage).
  • The API version
    • A version string used to allow future incompatible changes to the API. This should usually correspond to the browser version and should not change often.
  • Encrypted payload sizes
    • If we do not carefully add padding or enforce that all reports are of the same natural size, this may expose some information about the contents of the report.
  • Debugging information
    • If debugging is enabled, some additional metadata will be provided. While this information may be potentially identifying, it will only be available temporarily: while third-party cookies are enabled.
  • Developer-selected metadata

Open question: what metadata to allow

It remains an open question what metadata should be included or allowed in the report and how that metadata could be selected or configured. Note that any variation in the reporting endpoint (such as the URL path) would, for this analysis, be equivalent to including the selected endpoint as additional metadata.

While allowing a developer to specify arbitrary metadata from an isolated context would negate the privacy goals of the API, specifying a report's metadata from a non-isolated context (e.g. a main document) may be less worrisome. This could improve the API's utility and flexibility. For example, allowing this might simplify usage for a single origin using the API for different use cases. This non-isolated metadata selection could also allow for first-party trust signals to be associated with a report.

Alternatively, there may be ways to "noise" the metadata to achieve differential privacy. Further study and consideration is needed here.

Contribution bounding and budgeting

As described above, the aggregation service protects user privacy by adding noise, aiming to have a framework that could support differential privacy. However, simply protecting each query to the aggregation service or each report sent from a user agent would be vulnerable to an adversary that repeats queries or issues multiple reports, and combines the results. Instead, we propose the following structure. See below for the specific choices we have made in our current implementation.

First, each user agent will limit the contribution that it could make to the outputs of aggregation service queries. (Note that this limitation is a rate over time, not an absolute number, see Partition choice below.) In the case of a histogram operation, the user agent could bound the L1 norm of the values, i.e. the sum of all the contributions across all buckets. The user agent could consider other bounds, e.g. the L2 norm.

The user agent would also need to determine what the appropriate 'partition' is for this budgeting, see partition choice below. For example, there could be a separate L1 'budget' for each origin, resetting periodically. Exceeding these limits will cause future contributions to silently drop.

Second, the server-side processing will limit the number of queries that can be performed on reports containing the same 'shared ID' to a small number (e.g. a single query). See here for more detail. This also limits the number of queries that can contain the same report. The shared ID is a hash representing the partition. It is computed by the aggregation service using data available in each aggregatable report. Note that the hash's input data will need to differ from the Attribution Reporting API (e.g. to exclude fields like the destination site that don't exist in this API).

With the above restrictions, the processing servers only need to sample the noise for each query from a fixed distribution. In principle, this fixed noise could be used to achieve differential privacy, e.g. by using Laplace noise with the following parameter: (max number of queries per report) * (max L1 per user per partition) / epsilon.
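The parameter above can be turned into a small worked sketch. The concrete numbers are illustrative, and the inverse-CDF sampler is a standard construction, not something specified by this API:

```javascript
// Worked sketch of the Laplace noise parameter above:
//   scale b = (max queries per report) * (max L1 per user per partition) / epsilon
function laplaceScale(maxQueriesPerReport, maxL1PerPartition, epsilon) {
  return (maxQueriesPerReport * maxL1PerPartition) / epsilon;
}

// Standard inverse-CDF sampling from Laplace(0, scale).
function sampleLaplace(scale) {
  const u = Math.random() - 0.5;  // uniform in (-1/2, 1/2)
  return -scale * Math.sign(u) * Math.log(1 - 2 * Math.abs(u));
}

// e.g. one query per report, an L1 bound of 2^16, and epsilon = 1
// gives a scale of 65,536.
const scale = laplaceScale(1, 2 ** 16, 1);
```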

Scaling values

Developers will need to choose an appropriate scale for their measurements. In other words, they will likely want to multiply their values by a fixed, predetermined constant.

Scaling the values up, i.e. choosing a larger constant, will reduce the relative noise added by the server (as the noise has constant magnitude). However, this will also cause the limit on the L1 norm of the values contributed to reports, i.e. the sum of all contributions across all buckets, to be reached faster. Recall that no more reports can be sent after depleting the budget.

Scaling the values down, i.e. choosing a smaller constant, will increase the relative noise, but would also reduce the risk of reaching the budget limit. Developers will have to balance these considerations to choose the appropriate scale. The examples below explore this in more detail.

Examples

These examples use an L1 bound of 2^16 = 65,536.

Let's consider a basic measurement case: a binary histogram of counts. For example, using bucket 0 to indicate a user is a member of some group and bucket 1 to indicate they are not. Suppose that we don't want to measure anything else and we've set up our measurement so that each user is only measured once (per partition per time period). Then, each user could contribute their full limit (i.e. 2^16) to the appropriate bucket. After all the reports for all users are collected, a single query would be performed and the server would add noise (from a fixed distribution) to each bucket. We would then divide the values by 2^16 to obtain a fairly precise result (with standard deviation of 1/2^16 of the server's noise distribution).

If each user had instead just contributed a value of 1, we wouldn't have to divide the query result by 2^16. However, each user would end the time period with the vast majority of their budget remaining -- and the processing servers would still add the same noise. So, our result would be much less precise (with standard deviation equal to the server's noise distribution).

On the other hand, suppose we wanted to allow each user to report multiple times per time period to this same binary histogram. In this case, we would have to reduce each contribution from 2^16 to a lower predetermined value, say, 2^12. Then, each user would be allowed to contribute up to 16 times to the histogram. Note that you have to reduce each contribution by the worst-case number of contributions per user. Otherwise, users contributing too much will have reports dropped.
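The worst-case arithmetic above can be made concrete with a small sketch (the helper name is hypothetical):

```javascript
// Sketch: given an L1 bound and the worst-case number of contributions
// a single user might make per time period, compute the per-contribution
// value so that no user can exceed the bound.
function perContributionValue(l1Bound, worstCaseContributions) {
  return Math.floor(l1Bound / worstCaseContributions);
}

// With an L1 bound of 2^16 and up to 16 contributions per user,
// each contribution must be scaled down to 2^12 = 4096.
const scaled = perContributionValue(2 ** 16, 16);
```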

Partition choice

A narrow partition (e.g. giving each top-level URL a separate budget) may not sufficiently protect privacy. Unfortunately, very broad partitions (e.g. a single budget for the browser) may allow malicious (or simply greedy) actors to exhaust the budget, denying service to all others.

The ergonomics of the partition should also be considered. Some choices might require coordination between different entities (e.g. different third parties on one site) or complex delegation mechanisms. Other choices would require complex accounting; for example, requiring Shared Storage to record the source of each piece of data that could have contributed (even indirectly) to a report.

Note also that it is important to include a time component to the partition, e.g. resetting limits periodically. This does risk long-term information leakage from dedicated adversaries, but is essential for utility. Other options for recovering from an exhausted budget may be possible but need further exploration, e.g. allowing a site to clear its data to reset its budget.

Implementation plan

We plan to enforce a per-site budget that resets every 10 minutes; that is, we will bound the contributions that any site can make to a histogram over any 10-minute rolling window. We plan to use an L1 bound of 2^16 = 65,536 for this bound; this aligns with the Attribution Reporting API with Aggregatable Reports explainer.

As a backstop to limit worst-case leakage, we plan a separate, looser per-site bound that is enforced on a 24-hour rolling window, limiting the daily L1 norm to 2^20 = 1,048,576.
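The two rolling windows could be tracked roughly as follows. The bookkeeping structure and function names are assumptions for illustration; in practice this accounting is done internally by the browser:

```javascript
// Sketch of the two rolling-window budgets: a 10-minute L1 bound of 2^16
// and a 24-hour backstop of 2^20, tracked per site.
const TEN_MINUTES_MS = 10 * 60 * 1000;
const ONE_DAY_MS = 24 * 60 * 60 * 1000;
const SHORT_BUDGET = 2 ** 16;
const DAILY_BUDGET = 2 ** 20;

const history = new Map();  // site -> [{time, value}, ...]

function sumWithin(entries, now, windowMs) {
  return entries
    .filter((e) => now - e.time < windowMs)
    .reduce((sum, e) => sum + e.value, 0);
}

// Records the spend and returns true if both windows have room;
// otherwise the contribution is silently dropped, as described above.
function trySpend(site, value, now = Date.now()) {
  const entries = history.get(site) ?? [];
  if (sumWithin(entries, now, TEN_MINUTES_MS) + value > SHORT_BUDGET) return false;
  if (sumWithin(entries, now, ONE_DAY_MS) + value > DAILY_BUDGET) return false;
  entries.push({ time: now, value });
  history.set(site, entries);
  return true;
}
```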

This site will match the site of the execution environment, i.e. the site of the reporting origin, no matter which top-level sites are involved. For the earlier example, this would correspond to the runAdAuction() caller within reportResult() and the interest group owner within reportWin()/reportLoss().

We initially plan to have two separate budgets: one for calls within Shared Storage worklets and one for Protected Audience worklets. However, see shared contribution budget below.

Enrollment and attestation

Use of this API requires enrollment and attestation via the Privacy Sandbox enrollment attestation model.

When an aggregatable report is triggered, a check will be performed to determine whether the calling site is enrolled and attested. If this check fails, the report will be dropped (i.e. not sent).

Future Iterations

Supporting different aggregation modes

This API will support an optional parameter alternativeAggregationMode that accepts a string value. This parameter will allow developers to choose among different options for aggregation infrastructure supported by the user agent. This will allow experimentation with new technologies, and allows us to test new approaches without removing core functionality provided by the default option. The "experimental-poplar" option will implement a protocol similar to poplar VDAF in the PPM Framework.

Shared contribution budget

Separating contribution budgets for Shared Storage worklets and Protected Audience worklets provides additional flexibility; for example, some partition choices may not be compatible (e.g. a per-interest group budget). However, we could consider merging the two budgets in the future.

Authentication and data integrity

To ensure the integrity of the aggregated data, it may be desirable to support a mechanism for authentication. This would help limit the impact of reports sent from malicious or misbehaving clients on the results of each query.

To ensure privacy, the reporting endpoint should be able to determine whether a report came from a trusted client without determining which client sent it. We may be able to use trust tokens for this, but further design work is required.

Aggregate error reporting

Unfortunately, errors that occur within isolated execution contexts cannot be easily reported (e.g. to a non-isolated document or over the network). If allowed, such errors could be used as an information channel. While these errors could still appear in the console, it would also aid developers if we add a mechanism for aggregate error reporting. This reporting could be automatic or could be required to be configured according to the developers' preferences.




private-aggregation-api's Issues

More precisely define the contribution budget and window in the spec

The spec currently leaves the budget as fully implementation-defined (see here). The explainer, however, describes Chrome's implementation plan.

While it may be difficult to align on exact details across implementations, we should probably define the privacy unit more precisely while leaving some key parameters as implementation-defined (e.g. the precise epsilon/budget and perhaps the timespan used in the privacy unit).

Status of this document

The SotD section of the draft spec is currently empty. The Readme clearly states:

This document is an individual draft proposal. It has not been adopted by the Private Advertising Technology Community Group.

so at a minimum this needs to be reflected in the draft spec itself.

I would also argue that because this draft has not been adopted by PATCG (and nor is it a WICG spec), it should also not carry the W3C logo and branding. This has the potential to cause wider confusion over the spec's standardisation status.

Updating references to FLEDGE

Now that FLEDGE has been renamed to the Protected Audience API (see the blog post), we should update references in the API's operation. (We should also update references in the explainer and spec text -- e.g. see PR #34.)

Specifically, we're planning to replace the "api": "fledge" field in reports with "api": "protected-audience". We also plan to change the reporting path for aggregatable reports triggered in Protected Audience auctions from /.well-known/private-aggregation/report-fledge to /.well-known/private-aggregation/report-protected-audience.

Please let us know if you have any thoughts on this change!
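For anyone updating collection endpoints, the rename can be sketched as a simple mapping (the helper name here is illustrative, not part of the API):

```javascript
// Maps the "api" field value in an aggregatable report to its well-known
// reporting path, reflecting the fledge -> protected-audience rename.
function reportingPath(api) {
  return `/.well-known/private-aggregation/report-${api}`;
}

reportingPath("fledge");             // old path
reportingPath("protected-audience"); // new path
```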

Reliant specs and their layering

Reading through the spec, this seems like infrastructure that a couple of other specs are currently relying on, but one that could be potentially useful for future cases. (e.g. one can imagine a "cross-origin data worklet" that has access to cross-origin data and uses it to generate performance-related histograms in a secure way that doesn't leak any user-specific data)

Given that, I think it'd make sense to make this spec more agnostic to the specs that rely on it. To that end, I think it'd make sense to move all the SharedStorage/FLEDGE-specific bits to their respective specifications, and only define the infrastructure (that can serve both cases) here.

Consider adding a custom 'label' to allow more flexible batching

Currently, the aggregation service only allows each 'shared ID' to be present in one query. A set of reports with the same shared ID cannot be split for separate queries, even if the resulting batches are disjoint.

One option to add more flexibility is to support an optional, custom field (a ‘label’) that is factored into the shared ID generation. We could consider a few different options:

  1. Putting the field in the shared_info: The reporting origin would be able to easily split reports into separate batches based on the label. However, this approach would require the label to be set outside the isolated (Shared Storage or Protected Audience) context. It also would require the report to be deterministic similar to the context ID, i.e. sending a null report if no contributions are made. This approach is therefore unlikely to work for Protected Audience bidders (see related discussion) and could increase the number of reports sent.
  2. Putting the field in the payload: This avoids the deterministic report requirement and would allow the label to be based on cross-site data, i.e. set from inside the isolated contexts. But, this also prevents the reporting origin from directly determining the label embedded in the report. The reporting origin may therefore have to send a larger number of reports to the aggregation service and ask it to filter based on a given set of labels. For certain use cases, the reporting origin may be able to maintain a context ID to label mapping that would avoid this increased scale, albeit less ergonomically than above.
  3. Allowing bucket range filtering: Instead of using an explicit label, we could allow filtering based on a range of buckets, with budget only used for that range. This could be more flexible but also increases the complexity of the Aggregation Service’s privacy budgeting implementation.
  4. A combination of the above: We could implement multiple of the above options and allow them to be used together or in different situations.

For all of the above approaches, we’ll also need a mechanism to limit the scale impact on the Privacy Budget Service. For example, we want to prevent developers from specifying a unique ‘label’ per report. There are a few options we could consider, including:

  1. The Aggregation Service could limit the number of labels/bucket ranges or shared IDs per query
  2. We could limit the space of allowed labels/bucket ranges directly, e.g. only allowing integer labels up to a maximum value.

This functionality would also be useful for the Attribution Reporting API, so we may want to align on an approach. (For example, bucket range filtering has been proposed earlier.) Note that Attribution Reporting does not currently support making deterministic reports.

Knowing the publisher domain for Fledge

Hello,

We have two use cases which are hard to address with the current design of the Private Aggregation API, and they both need the publisher domain. The first is to report to the marketer on which publishers its ads were displayed. For brand safety reasons, an advertiser may wish not to have its ad displayed on publishers incompatible with its brand image (e.g. a website with a lot of offensive content). In some countries, adtechs are legally bound to report this information (see issue #14 in PATCG). The other is fraud detection (and prevention). An adtech should monitor for any fraudulent website set up specifically for siphoning money off legitimate publishers. Here reactivity is paramount: this kind of fraud must be detected in a minimal amount of time.

Encoding the publisher domain in the 128-bit key space is a tricky problem given its dynamic nature and the cardinality of this dimension. A particularly thorny point is that the domain is never available in the clear in Fledge (if it is not made available in this API).

An issue with some discussion and proposed solutions (including adding back the publisher domain in the metadata with empty reports for plausible deniability as a DP mechanism) was posted in ARA (issue #583), as the Private Aggregation API uses the same aggregatable reports, but the use cases were slightly different.

Enabling Debug Reports in Private Aggregation API in Mode B.

Hi,

Mode B is about to start, for which we have all been preparing for quite some time. This will be a great opportunity to check how well the Protected Audience operates in an environment without 3rd party cookies. Not only from a technical perspective but also from operational and business perspectives. Thanks to event-level reporting, we can gather information about campaign performance. A much greater challenge is collecting statistics on how auctions are conducted. Two very powerful tools serve this purpose:

  • forDebuggingOnly
  • Private Aggregation API

The first one will be available in Mode B, as per WICG/turtledove#632. Meanwhile, the Private Aggregation API is the ultimate solution, and it's worthwhile to start preparing for its use. However, as far as I understand, debug reports will not be available once third-party cookies are disabled.

Of course, ultimately, debug reports will not be available for privacy reasons, but during Mode B testing, their absence presents a significant obstacle to properly monitoring campaigns.

Let's summarize a few facts:

  1. In Mode B, there will be 0.75% of Chrome users with Protected Audience enabled (Private Aggregation API as well) and 3rd party cookies disabled.
  2. We have a limited number of reports that a user can send (AFAIK the queue size is 1000), so we limit ourselves to sending reports at the bidding level only for 1% of bids.
  3. The number of users signing up for the Interest Group varies (from smaller to very large advertisers).
  4. Considering the scale from points 1-3, the budget-related limitations (the client-side L1 budget of 2^16 per 10 minutes and 2^20 per day), and the noise imposed on the Aggregation Service side,
    I would like to explore potential solutions to the above problems:
    • Enabling debug reports for Mode B, or
    • Increasing the L1 budget per reporting origin site for Mode B.

Bests,
Michal

Clarify reporting origin

Hello,

I am trying to understand how the reporting origin is defined, so as to know on which domain to host the well-known endpoints for collecting the data.
For Fledge reports, I guess this is simply defined from the IG owner or the bidding URL (not a Fledge expert, so pardon my approximations).
For Shared Storage, I have no idea which domain is used. Is it the origin of the script?

Thanks a lot!

Potential Bug in the debug mode latency stats implementation

I believe there may be a bug in the implementation of debug mode for Private Aggregation API latency stats.

Some new fields introduced for this feature in the auctionConfig (auctionReportBuyerKeys, auctionReportBuyers, auctionReportBuyerDebugModeConfig) require BigInts. However, based on some failing tests on our end, it appears the auctionConfig is being stringified at some point during the navigator.runAdAuction() call. JSON.stringify() does not support BigInt conversion without a bit of extra parsing work...

Sample Error:

TypeError: Do not know how to serialize a BigInt
             at JSON.stringify (<anonymous>)
             at navigator.runAdAuction
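A minimal reproduction, plus one possible caller-side workaround using a JSON.stringify replacer. This is a sketch of the failure mode only; it does not fix stringification happening inside runAdAuction itself:

```javascript
// Reproducing the failure: JSON.stringify cannot serialize BigInt values,
// such as the bucket keys required by auctionReportBuyers.
const config = {
  auctionReportBuyers: { interestGroupCount: { bucket: 0n, scale: 1 } },
};

let failed = false;
try {
  JSON.stringify(config);
} catch (e) {
  failed = e instanceof TypeError; // "Do not know how to serialize a BigInt"
}

// Workaround sketch: a replacer that encodes BigInts as strings.
// The receiving side must decode them back to BigInt.
const json = JSON.stringify(config, (key, value) =>
  typeof value === "bigint" ? value.toString() : value
);
```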

Extended Private Aggregation API availability & Status

Hello,
During prototyping some use of the Private Aggregation API (and the extended Private Aggregation API) I ran into some confusion regarding the rollout of some of the new method names discussed in a June 1 update to https://developer.chrome.com/docs/privacy-sandbox/private-aggregation/

While prototyping, some of the methods and functionality seemed to not be available (were called, but no responses from Chrome received in Stable, Canary, or Beta), specifically privateAggregation.contributeToHistogram and privateAggregation.contributeToHistogramOnEvent.

Additionally, I attempted to configure both a test seller and test buyer for receiving "latency-stats" & "interest-group-counts" as discussed in https://github.com/WICG/turtledove/blob/main/FLEDGE_extended_PA_reporting.md#triggering-reports with no results. Is that functionality available?

For a quick overview of how that was set up, I had the test buyer set up to add:

sellerCapabilities: { '*': ['interest-group-counts', 'latency-stats'] },

to the interest group in navigator.joinAdInterestGroup() and set up the test seller to request "latency-stats" and "interest-group-counts" in the auctionConfig by adding the following to the auctionConfig:

auctionReportBuyers: { interestGroupCount: { bucket: 0n, scale: 1, }, totalGenerateBidLatency: { bucket: 2n, scale: 1, }, },

FLEDGE integration

Hi Alex,

I am very happy to see the Private Aggregation API Intent to Experiment. We're looking forward to test it with FLEDGE.

The I2E is planned for M107, which is very soon (link), and I was wondering if you could shed some more light on the planned FLEDGE integration.

More specifically, I was wondering, will it be possible to generate the histogram reports within generateBid, and only actually send them upon a specific future event? (E.g. winning the auction / showing the impression / user clicking the ad.) Please see WICG/turtledove#177 (comment) for additional context.

Best regards
Jonasz

Debug Mode Availability for Latency Stats?

Hello,

Wanted to inquire when debug mode for the Private Aggregation API would be available for latency-stats? Is this something that is planned in the near term?

Thank you!

Use for frequency capping model calibration

Thanks for sharing the proposal of the Private Aggregation API https://github.com/alexmturner/private-aggregation-api. After going through the proposal, I think it would be great if it could support the use case of frequency capping model calibration. There was an aggregated reporting proposal and an example use case for this, but it seems that proposal hasn't been updated for a while and I guess it might be out of date. However, we are still looking for a solution to calibrate the frequency capping model and I think the Private Aggregation API could be helpful.

I have learned that the Shared Storage API in the Chrome Privacy Sandbox can be used to enforce frequency capping in the browser. However, if we can use the Private Aggregation API to generate a ratio (like # ads per user in domain X / total # ads across all domains for users who visited X) to adjust the global cap, the adjusted cap can be used across different browsers.

Does this sound like a use case that the Private Aggregation API will support? Thanks for shedding light on this.

Meeting: Supporting Reach Measurement

We will hold a meeting in February on how the Private Aggregation API could best support the Reach use case. Additional meetings will be scheduled as needed.

We plan to discuss the recent proposals to extend the API to support more advanced reach reporting. We also hope to hear feedback from other stakeholders on these proposals and, more generally, what functionality is considered critical for supporting reach measurement.

We will meet for one hour on Wednesday 15 February starting at 4 pm Eastern Time (= 9 pm UTC = 10 pm Paris = 1 pm California).
The call will take place on Google Meet.
Video call link: https://meet.google.com/jsz-vnjk-dsr
Or dial: ‪(US) +1 253-289-6828‬ PIN: ‪185 375 697 81‬#
More phone numbers: https://tel.meet/jsz-vnjk-dsr?pin=6836240560245

Agenda proposals, attendance records, and notes will live in this Google Doc:
https://docs.google.com/document/d/1j8gWNkOvIRrI1vi2vg3Pv13hjKf-i7bugiEUZsrU4UU/edit?usp=sharing

For a speaker queue, we will use the Google Meet "Raise My Hand" feature.

If you want to participate in the call, please make sure you join the PATCG: w3.org/community/patcg.

Fetch based design as a considered alternative

When working on Pending Beacon, we got feedback that the design should be as close to fetch() as possible. There are parallels here, so I'm filing this issue as a means to think through the implications of that alternative design.

A few differences between fetch()/PendingBeacon and this API:

  • It's an anti-goal here to give the developer any control over when reports are being sent or knowledge that they were sent
  • The reports are sent to very specific endpoints, not to arbitrary reporting endpoints
  • The report's input is extremely specific (an array of PAHistogramContributions) and not amenable to fetch()'s general-purpose sending
  • There's no concept of a response (even though that's similar for PendingBeacon)

Given all the above, I think it makes sense to not try to align this with fetch(). At the same time, it's worthwhile to document the reasoning.

Support for Blobs in sharedStorage worklets

Hello,
We (Criteo) are trying to implement reach measurement through PAA and Shared Storage.
As it is, the worklet script for reach measurement (similar to this) needs to be hosted on another service/website, and accessed through window.sharedStorage.worklet.addModule(<PathOrURL/to/worklet>)
Classical worklets (example in this) support using Blob URLs as worklet paths, but sharedStorage worklets do not. Are there security reasons for not doing so?
If not, we would be keen to see a change in the sharedStorage implementation so that it, too, could accept Blob URLs as worklet modules.

Improving support for debugging (without debug mode)

Private Aggregation currently supports a temporary debugging mechanism called debug mode to help developers integrate with the API and debug their usage. However, this mechanism is tied to third-party cookie eligibility and will thus be deprecated along with third-party cookies.

We’re exploring ways to continue supporting this use case. In particular, we aim to allow developers to measure the frequency of certain ‘debug events’ and to split these measurements on relevant developer-specified dimensions (e.g. advertiser or code version). We also aim to support these use cases with minimal or no privacy regressions and for this support to be available after third-party cookie deprecation.

Design idea

One possible design is to introduce a new JavaScript call that allows developers to send a contribution if a browser-defined 'debug event' occurs, e.g.:

privateAggregation.contributeToHistogramOnDebugEvent(
  "insufficient-10min-budget", {bucket: 12345n, value: 67n, filteringId: 8n});

Note that this is conceptually quite similar to the contributeToHistogramOnEvent() method already available as part of the Protected Audience extensions to Private Aggregation.

Debug events

This mechanism could support sending contributions in a range of debug events, e.g. in case of:

  • insufficient budget (either 10 min or daily) to send a report
  • contributions being dropped due to the per-report limit
  • no reporting errors (i.e. a report is successfully scheduled to send)
  • a worklet crash

That list is not exhaustive, and it should be possible to continue extending it in future iterations.

Budgeting

To avoid privacy regressions, these contributions would be limited by the usual client-side contribution budget. However, this proposal would require a mechanism to “reserve” some portion of the budget for these debug events. This is necessary to allow for contributions measuring an insufficient-budget event to successfully send.

Aggregating separately

Ad techs may wish to process their debug event histogram contributions separately from other contributions. This would be possible by using a different filtering ID for those contributions. This mechanism would also allow for flexibility in which groups of contributions can be processed together or separately.

Note that this design differs slightly from Attribution Reporting’s earlier aggregate debugging proposal. That proposal sends separate debug and non-debug reports and makes sending a debug report deterministic when debug reporting is enabled. It is difficult to adapt this mechanism to Protected Audience buyers without privacy impacts.

Consider reducing maximum number of contributions per report

We plan to pad the encrypted payload, which will increase the size of the payload. To mitigate this size increase, we're considering reducing the number of contributions allowed in each report (e.g. from 50 to 20).

Note also that currently, any additional contributions would be dropped, but we could also consider batching contributions into multiple reports if this maximum value is limiting. Please let us know if you have any thoughts!
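The batching alternative mentioned above could look roughly like this (a sketch, assuming the per-report maximum of 20 under consideration):

```javascript
// Sketch: instead of dropping contributions beyond the per-report maximum,
// split them across several reports.
function batchContributions(contributions, maxPerReport = 20) {
  const reports = [];
  for (let i = 0; i < contributions.length; i += maxPerReport) {
    reports.push(contributions.slice(i, i + maxPerReport));
  }
  return reports;
}

// 50 contributions -> 3 reports (20 + 20 + 10).
batchContributions(
  Array.from({ length: 50 }, (_, i) => ({ bucket: BigInt(i), value: 1 }))
);
```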

Extending shared storage API to support advanced reach reporting

crossposting from here per @csharrison suggestion

We are excited about the shared storage API and its support of Reach measurement.

Given that Reach is a fundamental metric of brand advertising and that accurate assessment of ad campaign efficiency requires accurate and flexible reach measurement, we would like to request that shared storage API extends its functionality to support advanced scenarios of reach measurement in a privacy safe manner. In particular, it is important that the system scales to thousands of advertisers pulling interactive exploratory reports on the ongoing and finished campaigns daily. We believe high utility for Reach advertisers should be achievable with reasonable privacy budget settings.

Privacy Sandbox is critical for maintaining high quality of reach measurement. In the absence of the Privacy Sandbox features discussed below, reach modeling could use domain-partitioned cookies as a signal; however, the lack of a cross-domain deduplication signal poses a huge challenge for unique user counting. It is unclear when, or if, technology of sufficient quality using domain-partitioned cookies can be developed. Furthermore, developing such technology could pose an extra risk to user privacy, as accurate cross-domain deduplication done in the clear (compared to the on-device nature of the Chrome Privacy Sandbox) may have negative effects on user privacy.

Specifically, the following functionality is critical to ensure that important reach reporting scenarios can function with high quality while being powered by Shared Storage.

Availability of the secure report in the context of the event. The explainer states:
The report contents (e.g. key, value) are encrypted and sent after a delay.

This means that users of the system have to pre-define reporting segments before the ads are served. However, modern advertising reach reporting and optimization use cases enable interactive slicing of the traffic on various criteria directly linked to the event context, such as reporting time window, device type, time of day, location, etc.

Additionally, encoding all of these options into the key rather than creating reports on-demand would put unnecessary strain on the privacy budgets.

On the other hand, it should be possible to add sufficient noise to the final aggregates on demand, ensuring high standards of privacy protection without sacrificing the flexibility of reach slicing and without requiring delayed reporting, which limits the freshness of reporting capabilities.

Therefore we would request you to kindly consider the following approach:

Allow the aggregated report entity to be available immediately in the page JavaScript context, so that at a later time the ad tech would have the option to upload a batch of reports for further aggregation. Since aggregated reports are returned unconditionally for each impression, their arrival does not provide any extra information to the ad tech.

Since the final aggregated report would have a differentially private noise and appropriate privacy budget tracking, this option would maintain high privacy protection standards. Meanwhile it would keep the reach slicing flexibility that modern reach reporting flows rely upon now.

Alternatively the report could still be returned with a delay, but be accompanied with an event level identifier that would allow it to be joined to the original event.

Enable a count_distinct secure aggregation function. So far the explainer only mentions the sendHistogramReport function for sending reports.
The histogram report appears to be insufficient for implementing the Virtual People technology in the Privacy Sandbox. This technology is used by Google and the Cross-Media Measurement project.

Aggregating a histogram with per-bucket noise is insufficient, because each browser gets mapped to a virtual person, and it is the count of virtual people rather than browsers that matters. A histogram is good for counting unique browsers by some partitions, but is incapable of counting unique virtual people.

IAB Audience reach measurement guidelines page 4 reads: "deriving unduplicated audience reach people-based measures from digital activity and other research is the most difficult of the metrics however, it is also inherently the most valuable to users of measurement data."

Providing the count_distinct aggregation function would enable a natural implementation of the Virtual People technology, and proper differential privacy noise is capable of ensuring high privacy protection standards.

count_distinct could be implemented per histogram bucket, so that no new type of report would be required.

To support demographic composition and frequency scenarios, it should be possible to filter histogram buckets based on index and on the range of the value.

Enable pre-aggregation of the reports for further quick combination at serving time with low latency.
Interactive reports are critical to an advertiser's ability to understand the reach of the campaigns that they are running.

To enable interactive exploration, the aggregation API would need the ability to pre-aggregate histograms and return an intermediate data structure, encrypted. Such intermediate reports could then be pre-aggregated for atomic reporting units, and reach for a collection of reporting units extracted in real time when a report is required.

Enable combining reports with first party reports.
Campaigns could be running with some events served on first-party sites and others on third parties. One way to get deduplicated reach of such a campaign accurately would be to let the secure aggregation server digest a histogram constructed by the ad tech in the clear, along with a pre-aggregated encrypted histogram.

Secure aggregation should scale to impression-level reports.
Each ad impression would emit a reach report, and the secure-aggregation infrastructure should be scalable to large volumes to make sure that the reach use case is supported.

Again, thank you very much for providing this flexible, privacy-safe API, and thank you for your consideration.

Enforce contribution budgets at the origin level

Currently, Chrome enforces contribution budgets (2^16 per 10 min and 2^20 per day) at the site level.

Proposal

We propose that Chrome enforces contribution budgets at the origin level as well. We have multiple AdTechs (i.e., origins) sharing the same site, and we would like to limit the contribution budget consumption of each AdTech.

Note that we are proposing enforcement of contribution budgets at BOTH the site and origin levels. We understand that we cannot lift the enforcement at the site level. Otherwise, each site can create an unlimited number of origins to get unlimited contribution budgets.

It would be great if Chrome can support origin-level debugging and monitoring: #131. For example,

privateAggregation.contributeToHistogramOnDebugEvent(
  "insufficient-10min-budget-adtech1", {bucket: 12345n, value: 67n, filteringId: 8n});

Option 1: Percentages (Recommended)

We propose that Chrome allows AdTechs to specify a percentage (from 0 to 100) of the total per-site contribution budgets for each origin. The percentages of all origins under the site do not have to sum to 100. We assume there are 3 AdTechs under the same site (company.com): adtech1.company.com, adtech2.company.com and adtech3.company.com. Suppose the total contribution budgets at the site level is X. Here are several situations.

  1. AdTech1: 50, AdTech2: 30, AdTech3: 20. In this case, the percentages sum to 100. AdTech1 reserves 50% * X, AdTech2 reserves 30% * X, AdTech3 reserves 20% * X. There are no spillovers in this situation.
  2. AdTech1: 40, AdTech2: 40, AdTech3: 40. In this case, the percentages sum to 120 (> 100). This implies that we allow spillovers, namely, we allow each AdTech to consume up to 40% * X until company.com runs out of X at the site level. AdTechs can leverage spillovers to minimize unused contribution budgets cross origins.
  3. AdTech1: 30, AdTech2: 30, AdTech3: 30. In this case, the percentages sum to 90 (< 100). This implies that some contribution budgets are left unused at the site level. This might be less ideal for some AdTechs, but valid use cases for other AdTechs.

We would like to set the default percentage to 0 at the origin level so that new origins won’t interfere with other origins in unexpected ways.

Option 2: Weights

We propose that Chrome allows AdTechs to specify a weight for each origin. To decide the shares of the contribution budgets for each origin, we sum all weights and then divide the weight of each origin by the sum. Unlike Option 1 (the percentage approach), this option requires Chrome to calculate the shares for each origin according to the configurations of all origins. When one origin changes its weight, Chrome needs to recalculate the shares for all origins. We also need Chrome to add a percentage value that applies to all origins to support spillover. Note that with this approach, there is only one spillover value per site.

We assume there are 4 AdTechs under the same site (company.com): adtech1.company.com, adtech2.company.com, adtech3.company.com and adtech4.company.com. Here is an example. AdTech1 weight: 2, AdTech 2 weight: 0, AdTech3 weight:1, AdTech4 weight: 1. The sum is 4. AdTech1 gets 50%, AdTech2 gets 0%, each of AdTech3 and AdTech4 gets 25%. If the spillover value is 10%, AdTech1 may consume up to 60% of the contribution budgets if some contribution budgets are left unused at the site level. Stories for the other 3 AdTechs are similar.

The default weight should be 0 to minimize surprises. For example, origins might be created for testing purposes or accidentally.

Overall, we recommend Option 1 (the percentage approach).

Lastly, both proposals apply to both the per 10 min contribution budgets and the daily contribution budgets.
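A sketch of how Option 1's percentage caps could interact with the site-level budget. The function names and enforcement details here are assumptions for illustration, not a proposed Chrome implementation:

```javascript
// Site-level L1 budget per 10 minutes, per the current Chrome limits.
const SITE_BUDGET_10MIN = 2 ** 16;

// An origin's reserved cap: its configured percentage of the site budget.
function originCap(percentage, siteBudget = SITE_BUDGET_10MIN) {
  return Math.floor((percentage / 100) * siteBudget);
}

// With spillover (percentages summing above 100), an origin may consume up
// to its own cap, but never more than what remains of the site budget.
function remainingForOrigin(pct, originUsed, siteUsed, siteBudget = SITE_BUDGET_10MIN) {
  return Math.min(originCap(pct, siteBudget) - originUsed, siteBudget - siteUsed);
}

// Example 2 above: three origins at 40% each (sum 120 > 100). Each can use
// up to 40% of X until the site as a whole runs out.
remainingForOrigin(40, 0, 60000); // site nearly exhausted, origin untouched
```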

Questions on the private aggregation API and Aggregation Service scalability

Request:
We would like to have more information about the scaling capabilities of the Aggregated Service.

Background:
The use case that we have in mind is the one from "Request for event-level ReportLoss API" (Issue #930 · WICG/turtledove), where we would use the Private Aggregation API with the potential trigger described in that issue:

Aha! This sounds like a feature we ought to be able to add. Take a look at the section on Triggering reports — if we added a new trigger that was something like reserved.highest-losing-bid, would it address your needs?

Of course we need to figure out the exact semantics. But a solution that would let you get the information you're looking for out of the Private Aggregation API seems like an excellent goal.

Such a trigger would produce an aggregatable report for every component auction a buyer takes part in. This would represent billions of aggregatable reports per hour, which is at least one order of magnitude above what is defined in https://github.com/privacysandbox/aggregation-service/blob/main/docs/sizing-guidance.md. For the use case described in this issue, we’re investigating performing the aggregation daily, so up to 10^14 reports to be processed, with a set of up to 10^12 pre-declared bucket keys. To have a sufficiently wide representation of our feature space (hence the high number of pre-declared keys) and reach an acceptable level of noise for most buckets, we need to gather a lot of contributions, and ideally avoid applying any sampling strategy.

On a side note, we have several usages in mind regarding the private aggregation API (see Add new reporting signal script-errors · Issue #494 · WICG/turtledove) targeting different aggregation frequency (hourly, daily, …). To be able to properly lever the aggregation service (and satisfy the underlying rules described in https://github.com/WICG/attribution-reporting-api/blob/main/AGGREGATION_SERVICE_TEE.md#privacy-considerations), we would require that a solution such as the one described in https://github.com/patcg-individual-drafts/private-aggregation-api/blob/main/flexible_filtering.md is implemented.

Questions:

  • have you run experiments at a higher scale than what’s described in the guidance, either regarding the number of reports sent or the number of pre-declared aggregation buckets?
  • more generally, what’s the upper bound in terms of load that the Aggregation Service can handle? Would it be up to the figures provided in the background section?

Feedback on Contribution bounding value, scope, and epsilon

Hi all,

We're seeking some feedback on the Private Aggregation API's contribution budget. We'd appreciate any thoughts on both the value of the numeric bound as well as the scope (currently per-origin per-day and separate for FLEDGE and Shared Storage).

In particular, one change we're considering is moving the scope from per-origin to per-site. This would mitigate abuse potential for cases of things like wildcard domains which are (arguably) easier to mint than domains to exceed privacy limits. (See more discussion here.)

Thanks!

[January 2024 edit:] Additionally we would like to open the scope of this issue to understand feedback on epsilon. The Aggregation Service currently supports a range up to 64. Note that the Aggregation Service adds noise to summary reports that is distributed according to the Laplace distribution with a mean of zero and a standard deviation

sqrt(2) * L1 / epsilon

where L1 is currently 2^16. We are interested in understanding the smallest value of epsilon required to support the minimum viable functionality of your system.
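Plugging example values into the formula above (these epsilon values are illustrative, not recommendations), the noise standard deviation works out as:

```javascript
// Noise standard deviation per the formula above: sqrt(2) * L1 / epsilon,
// with L1 = 2^16.
const L1 = 2 ** 16;
const noiseStdDev = (epsilon) => (Math.sqrt(2) * L1) / epsilon;

noiseStdDev(10); // ≈ 9268
noiseStdDev(64); // ≈ 1448
```

Larger epsilon means less noise but weaker privacy, which is why pinning down the operative value matters for utility planning.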

Concrete epsilon values

I'm having an awfully hard time trying to work out what epsilon ($\varepsilon$) is used in Google's trials. I can't find it in the documentation here or on developers.google.com. There are old numbers for the attribution API, but those don't obviously translate across.

My poking around in the aggregation-service code only really indicates that a value is not baked into that code. I did learn that only Laplacian noise seems to be implemented, which in turn suggests that only L1 sensitivity is being applied; that's something.

I found two mentions of a value of 10 as a default, but the explanatory material says that there are two budgets in play. Those also appear to be only for testing purposes; besides, it looks like local testing deployments can pick any value up to 64 (which is awfully large; 10 is also quite large depending on the refresh interval, and 10 every 10 minutes provides no meaningful privacy).

Given that the privacy properties of this API depend on this value a great deal, documenting this value is essential.

Consider reducing the delay for reports with a context ID

It may be possible to reduce the delay of reports sent with a context ID, see here.

While this change is web visible, the existing randomized delays mean that developers cannot currently rely on reports arriving at particular times. So, this should not be breaking.

Improve the payload encoding efficiency

Currently, the payloads embedded in aggregatable reports are not as size efficient as they could be. Some possible changes to reduce the size are explored here.

However, changing the payload encoding would be a breaking change for anyone using the plaintext payload included when debug mode is enabled. After debug mode is no longer available (i.e. after third-party cookie deprecation), changing the encoding should be simpler as only the user agent and the aggregation service should require updates. However, on a more minor note, the encrypted payload size and version would also change.

Measuring buyer’s timeout

In PA-API, the seller can specify perBuyerCumulativeTimeouts, but it’s unclear how the buyers can measure:

  • How many times they hit the timeout (out of how many auctions).
  • How many interest groups managed to submit a bid (and how many interest groups in total).

The above metrics are essential for optimization. Any suggestions or thoughts on how to get those numbers?

I am aware that some of those metrics are sent to the seller (https://github.com/WICG/turtledove/blob/main/FLEDGE_extended_PA_reporting.md#reporting-per-buyer-latency-and-statistics-to-the-seller) via Private Aggregation API, but those metrics (and the denominators) will also be interesting for the buyers.

There’s also a discussion in the same article under “Reporting API informal specification” that touches on generateBid() hitting the timeout, but it’s not obvious how that handles per-auction metrics as opposed to per-interest-group metrics.
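In the meantime, one workaround is for a buyer to self-instrument. The sketch below is illustrative only: a mock stands in for the worklet-scoped privateAggregation object, and the bucket numbers are hypothetical. The idea is to contribute on entry to generateBid() and again just before returning, so the gap between the two aggregate counts approximates bids cut off by the cumulative timeout.

```javascript
// Mock in place of the real worklet-scoped object (sketch only).
const contributions = [];
const privateAggregation = {
  contributeToHistogram(c) { contributions.push(c); },
};

// Hypothetical bucket numbers chosen for this illustration.
const BUCKET_BID_STARTED = 1n;
const BUCKET_BID_COMPLETED = 2n;

function generateBid(interestGroup, auctionSignals) {
  // First-line contribution: recorded even if later logic is cut off.
  privateAggregation.contributeToHistogram({ bucket: BUCKET_BID_STARTED, value: 1 });
  // ... bidding logic that may exceed perBuyerCumulativeTimeouts ...
  const bid = { bid: 1.0, render: interestGroup.ads[0].renderUrl };
  // Last-line contribution: only recorded when generateBid() ran to completion.
  privateAggregation.contributeToHistogram({ bucket: BUCKET_BID_COMPLETED, value: 1 });
  return bid;
}

generateBid({ ads: [{ renderUrl: 'https://dsp.example/ad' }] }, {});
console.log(contributions.length); // 2
```

The aggregate count in BUCKET_BID_STARTED then serves as the denominator and the difference between the two buckets approximates timed-out bids, though this still doesn't give a per-auction breakdown.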

What is the delay after which a report is sent?

Hi Chrome Team,

Could you please clarify the delay after which the Private Aggregation API sends its reports?
I see it was changed to 0-10 minutes for aggregatable reports: https://github.com/WICG/attribution-reporting-api/blob/main/AGGREGATE.md#aggregatable-reports

For the Private Aggregation API it is not specified in this repository or the Shared Storage repository.
The Chrome developer portal says up to 1 hour. Do aggregatable reports have a different delay under the Private Aggregation API than under the Attribution Reporting API?

Thanks,
Anatolii

Consider splitting contributions into multiple reports instead of truncating

Feedback from the discussion in #44. See also this explainer section. The interaction between this and providing a context ID still needs to be thought through. To avoid performance concerns, we will likely still need a limit on the number of reports.

This would be a web-visible change, as the number of reports sent would change. However, the main compatibility concern is around feature detection: developers may wish to know whether they can safely use more contributions. One option is to provide a method or attribute exposing the maximum number of allowed contributions.
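If splitting were adopted, the user agent would partition queued contributions into report-sized chunks. A minimal sketch of that partitioning (the per-report limit of 20 is illustrative, not a normative value):

```javascript
// Sketch: split queued contributions into multiple reports instead of
// truncating. The per-report limit here (20) is only an illustration.
function splitIntoReports(contributions, maxPerReport = 20) {
  const reports = [];
  for (let i = 0; i < contributions.length; i += maxPerReport) {
    reports.push(contributions.slice(i, i + maxPerReport));
  }
  return reports;
}

// 45 queued contributions become three reports of 20, 20 and 5.
const queued = Array.from({ length: 45 }, (_, i) => ({ bucket: BigInt(i), value: 1 }));
const reports = splitIntoReports(queued);
console.log(reports.map((r) => r.length)); // [ 20, 20, 5 ]
```

Pairing this with a context ID is the open question: each extra report is an extra observable event, so the number of reports per context would presumably need to be fixed or capped.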

Scale estimation for usage in FLEDGE

Hi all,

We’re seeking some rough scale estimates for expected Private Aggregation API usage in FLEDGE auctions once both APIs have shipped and third-party cookies have been deprecated. This will help inform server designs to ensure they can handle the expected traffic.

We’d appreciate any estimates, even if there’s substantial uncertainty, and have provided a template below for providing them in a reply to this issue or through the Privacy Sandbox feedback form. Please feel free to omit answers to any of the questions if they are sensitive.


Template:

Estimates for: [company name]

We plan to participate in FLEDGE auctions as [a bidder/a seller/both a bidder and a seller].

  • We expect to run/participate in [number] FLEDGE auctions per day
  • We expect to record [number] histogram contributions (i.e. key/value pairs) on average per auction.¹ These histograms will include:
    • [example 1]
    • [example 2]
  • We plan to batch reports [hourly/daily/weekly/other] for processing in the aggregation service.

¹ Please include any contributions sent using the reportContributionForEvent() mechanism.

Effects of k-anonymity not described well in the spec

From the spec, it sounds like all Private Aggregation contributions from a non-k-anonymous bid in a Protected Audience auction will be processed. According to the Protected Audience explainer (and the Chrome implementation), however, the only contributions from non-k-anonymous bids that are counted are those that use the "reject-reason" signal.

Private Aggregation API one-week test summary [05.2023]

The aim of these tests is to verify the possibility of using the Private Aggregation API to collect statistics on Fledge auctions and related events, specifically the registration of impressions and clicks on the banner. In most cases, our banners consist of a main banner (ad) and "product" ad components.

What mechanisms will we be using?

  1. Registration of 2 bid histograms using contributeToHistogram (CTH).
  2. Registration of 2 win report histograms using contributeToHistogram.
  3. Registration of event histograms in the bidding function using contributeToHistogramOnEvent:
    • 2 reserved.win events.
    • 1 reserved.loss event using signalValue (bid-reject-reason).

We register only 1% of histograms related to bids (point 1 and the reserved.loss case from point 3).

During the tested period, the stable version of Chrome was 113.
The Private Aggregation API was enabled for 1% of traffic via the Origin Trial.
Extended Private Aggregation Reporting in FLEDGE was available from Chrome 115 (dev/canary).
114 was the beta Chrome version.

The following results are based on reports received over 7 consecutive days (2023-05-19 to 2023-05-26), limited to those reports whose schedule date falls between 2023-05-19 and 2023-05-24.

Time

[Chart: number of reports by scheduled_report_day]

Time between registering a histogram and the scheduled report date:

[Chart: chart_reg_schedule]

Let's compare the moment the report was received with the moment the histogram was registered:

[Chart: retrieve time - event time]

Comparing PAA reports with other sources

Impressions

Reported through contributeToHistogram

| chrome version | #impressions | #first report CTH | #second report CTH | #first report CTH / imp ratio |
| --- | --- | --- | --- | --- |
| Chrome/113.0.0.0 | 9385457 | 1598632 | 1598456 | 17.03% |
| Chrome/114.0.0.0 | 39131 | 38979 | 38970 | 99.61% |
| Chrome/115.0.0.0 | 9972 | 9678 | 9679 | 97.05% |

During the test period, in the stable version (113), the Private Aggregation API was enabled for 1% of Origin Trial traffic, while Protected Audience accounted for 6%, so the expected #first report CTH / imp ratio is ~1/6.
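As a sanity check of that expectation (figures taken from the table and text above):

```javascript
// Expected report/impression ratio on Chrome 113: the API was enabled for
// 1% of traffic while Protected Audience ran for 6%, so only 1 in 6
// auctions could emit a report.
const expected = 0.01 / 0.06;
const observed = 1598632 / 9385457; // #first report CTH / #impressions

console.log((expected * 100).toFixed(2) + '%'); // 16.67%
console.log((observed * 100).toFixed(2) + '%'); // 17.03%
```

The observed 17.03% is close to the expected ~16.7%, consistent with the overlap of the two trial populations.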

Reported through contributeToHistogramOnEvent(‘reserved.win’, …)

| chrome version | #impressions | #first report from event reserved.win | #second report from event reserved.win |
| --- | --- | --- | --- |
| Chrome/113.0.0.0 | 9385457 | 19 | 19 |
| Chrome/114.0.0.0 | 39131 | 310 | 310 |
| Chrome/115.0.0.0 | 9972 | 9757 | 10034 |

The extension for the Private Aggregation API was only available from version 115, which is why the number of reports and impressions is relatively small and susceptible to noise.

For the reserved.loss reports, the values were passed in separate buckets using signalValue with bid-reject-reason. The received values were consistent with the forDebuggingOnly reports from the bidding function.

Comparing debug reports to normal reports.

Normal reports are understood as "non-debug" reports retrieved via the /.well-known/private-aggregation/report-protected-audience endpoint.

All reports

| reports type | auctions | auction_perc |
| --- | --- | --- |
| ONLY normal received | 33039 | 0.69% |
| both debug and normal received | 4507755 | 94.62% |
| ONLY debug received | 223149 | 4.68% |

Reports sent for 1% of bids through contributeToHistogram

| reports type | auctions | auction_perc |
| --- | --- | --- |
| ONLY normal received | 19187 | 1.17% |
| both debug and normal received | 1546760 | 94.64% |
| ONLY debug received | 68345 | 4.18% |

Reports sent for all impressions through contributeToHistogram

| reports type | auctions | auction_perc |
| --- | --- | --- |
| ONLY normal received | 13655 | 0.44% |
| both debug and normal received | 2937018 | 94.60% |
| ONLY debug received | 153892 | 4.96% |

Summary

  1. During our testing of the Private Aggregation API in Fledge auctions, we encountered a limitation related to reaching the limit of 1000 pending reports in the browser. That’s why we decided to report only 1% of bids.

  2. One notable observation was the difference between debug reports and normal reports, with a gap as high as 5%.

  3. Currently, there is a waiting period of up to 12 hours to receive 95% of the reports. This delay can impact the timeliness of data analysis and machine learning processes.
  • Q: Could you consider reducing the delays in report transmission to decrease the waiting time and enable more real-time access to crucial insights?

  4. Furthermore, we would like to utilise the Private Aggregation API for various purposes such as machine learning, reporting, and system monitoring. Each of these use cases has distinct characteristics: some require data to be delivered as quickly as possible (e.g., monitoring), while others prioritise accuracy and precision. However, it is important to note that reports originating from a specific hour (the exact schedule date truncated to the hour) can only be processed once by the Aggregation Service.

  5. Expanding the scale of our tests is necessary to achieve credible results and effectively evaluate Extended Private Aggregation Reporting in FLEDGE. By conducting tests on a larger scale, we can gather more comprehensive data and gain more accurate insights.
  • Q: Would it be possible to extend support for Extended Private Aggregation Reporting in FLEDGE?
  • Q: We would like to plan the next steps regarding the replacement of forDebuggingOnly with Extended Private Aggregation Reporting in FLEDGE. Do we know when forDebuggingOnly will no longer be available? (WICG/turtledove#632)

Mitigating data loss and delays due to batching

While batching helps reduce report volume, a simple implementation would risk data loss if the user agent shuts down before a batch is complete. This is especially a concern for reports issued by the winning bidder of a FLEDGE auction, as they could be triggered by window.fence.reportEvent() substantially later; see here for details. Batching would also necessarily delay reports until the batch is complete, which is again especially a concern for the winning-bidder case.

To mitigate these concerns, we could consider modifying the batching scope. For example, we could batch contributions conditional on a window.fence.reportEvent() call separate from other contributions from the winning bidder. We could also consider persisting any pending contributions to disk to avoid data loss, allowing the batch to be processed when the user agent starts up next.
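The second mitigation could look roughly like the following sketch, where an in-memory Map stands in for on-disk storage and the class and method names are hypothetical:

```javascript
// Sketch: persist pending contributions so an incomplete batch survives a
// shutdown. `store` is an in-memory stand-in for on-disk storage.
class PendingContributionStore {
  constructor(store = new Map()) {
    this.store = store;
  }
  add(batchKey, contribution) {
    const batch = this.store.get(batchKey) ?? [];
    batch.push(contribution);
    this.store.set(batchKey, batch); // persisted before the batch completes
  }
  // On startup (or when the batch completes), drain and send as one report.
  flush(batchKey) {
    const batch = this.store.get(batchKey) ?? [];
    this.store.delete(batchKey);
    return batch;
  }
}

const disk = new Map(); // survives the simulated shutdown below
const before = new PendingContributionStore(disk);
before.add('auction-123', { bucket: 1n, value: 1 });
before.add('auction-123', { bucket: 2n, value: 1 });

// Simulated restart: a fresh store reads the same persisted state.
const after = new PendingContributionStore(disk);
const flushed = after.flush('auction-123');
console.log(flushed.length); // 2
```

Keying the persisted batches per auction (or per reportEvent() expectation) would also let the reportEvent()-conditional contributions be batched separately, as suggested above.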

Use with Shared Storage API + Origin Trial

I'm glad to see https://github.com/alexmturner/private-aggregation-api! I'm hoping to be able to use this in conjunction with https://github.com/pythagoraskitty/shared-storage for some fraud & abuse use cases to replace some 3P cookie functionality.

I do also want to +1 https://github.com/alexmturner/private-aggregation-api#authentication-and-data-integrity, as it's going to be important that we maintain the integrity of the aggregated data.

Do we expect an Origin Trial soon with the Private Aggregation and Shared Storage APIs so we have enough time to start experimenting with them?

Thanks!

Report verification for Protected Audience

We need to pick and implement a design for Protected Audience bidders (see here). We also still need to implement report verification for both bidders and sellers.

This should be backwards compatible as it will be additive, optional functionality.

Batch contributions for the same reporting origin together

The report scheme permits multiple contributions in the same report, so we should consolidate histogram contributions to reduce the volume of reports. For example, calls to sendHistogramReport() during the same Shared Storage operation could be batched together.
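The intended effect can be sketched with a mock in place of the real worklet API: calls made during one operation accumulate into a single multi-contribution report rather than producing one report per call.

```javascript
// Sketch only: `runOperation` and the mock object are illustrative, not the
// real Shared Storage worklet machinery.
function runOperation(operation) {
  const batch = [];
  const privateAggregation = {
    sendHistogramReport(c) { batch.push(c); },
  };
  operation(privateAggregation);
  // One report carrying every contribution from this operation.
  return batch.length ? [{ contributions: batch }] : [];
}

const reports = runOperation((pa) => {
  pa.sendHistogramReport({ bucket: 1n, value: 2 });
  pa.sendHistogramReport({ bucket: 2n, value: 3 });
});
console.log(reports.length, reports[0].contributions.length); // 1 2
```

Scoping the batch to the operation keeps the batching boundary unobservable to the page while halving (or better) the report volume for multi-contribution operations.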

Providing global configuration for AdTechs

Current Implementation

Buyers specify and store aggregationCoordinatorOrigin in each interest group when they join or update interest groups. This aggregationCoordinatorOrigin is the choice of the coordinator origin for the buyer’s reports.

const interestGroup = {
  ...
  'privateAggregationConfig': {
    'aggregationCoordinatorOrigin': 'https://coordinator.example'
  }
};

Sellers specify and store aggregationCoordinatorOrigin in auctionConfig when they call runAdAuction(). This aggregationCoordinatorOrigin is the choice of the coordinator origin for the seller's reports.

const myAuctionConfig = {
  ...
  'privateAggregationConfig': {
    'aggregationCoordinatorOrigin': 'https://coordinator.example'
  }
};
const auctionResultPromise = navigator.runAdAuction(myAuctionConfig);

During the auction process, buyers and sellers can call privateAggregation.contributeToHistogram() or privateAggregation.contributeToHistogramOnEvent() to make histogram contributions. Specifically, buyers can make histogram contributions from generateBid() and reportWin(); sellers can make histogram contributions from scoreAd() and reportResult().

After the end of the auction (plus a randomized delay), the histogram contributions are bundled together into reports. Chrome fetches keys from the specified coordinator origins (although note that Chrome has caching) to encrypt the reports. Chrome then sends the encrypted reports to buyers and sellers.

Later, when buyers and sellers send the encrypted reports to the Aggregation Service, the service fetches keys from the specified coordinator origins to decrypt them. Note that the coordinator origin used for encryption is the one specified when the interest group was joined or updated, and the decryption keys will not be available if the Aggregation Service is run on, or migrated to, a different cloud.

Concerns

Our DSPs are above the 10 MB kInterestGroupStorageMaxStoragePerOwner limit. Specifying aggregationCoordinatorOrigin in each interest group increases interest group sizes even further, and our DSPs would like to reduce them.

Our DSPs also need to repeatedly specify the same aggregationCoordinatorOrigin in each interest group across multiple endpoints.

Proposals

We propose that Chrome provide a global default configuration per AdTech, so that buyers and sellers do not have to repeatedly specify the same aggregationCoordinatorOrigin in each interest group or auctionConfig.

We propose that Chrome provide this global configuration at the origin level. Per-site configuration may not provide enough granularity for AdTechs; for example, interest group owners are defined at the origin level, and some AdTechs may share the same site or even origin. If two DSPs share the same origin and use both the AWS and GCP coordinators, they could default to the more commonly used coordinator and specify the other when joining or updating interest groups, or they could split at the origin level.
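Under this proposal, the effective coordinator might resolve in a fixed precedence order. A sketch of the lookup (the origin-level registry and all names here are hypothetical, and the coordinator URLs are placeholders):

```javascript
// Sketch of the proposed resolution order: an explicit per-interest-group /
// per-auctionConfig value wins, then the AdTech's origin-level default,
// then the browser-wide default (AWS today). All names are hypothetical.
const BROWSER_DEFAULT = 'https://aws.coordinator.example';
const originDefaults = new Map(); // hypothetical per-origin registry

function resolveCoordinator(adTechOrigin, explicitOrigin) {
  return explicitOrigin
      ?? originDefaults.get(adTechOrigin)
      ?? BROWSER_DEFAULT;
}

originDefaults.set('https://dsp.example', 'https://gcp.coordinator.example');
console.log(resolveCoordinator('https://dsp.example', undefined));
// the DSP's origin-level default applies
console.log(resolveCoordinator('https://dsp.example', 'https://aws.coordinator.example'));
// an explicit per-interest-group value still wins
```

This keeps the existing per-interest-group override intact while letting the common case omit the field entirely.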

This proposal should not raise any privacy concerns. In the current implementation, aggregationCoordinatorOrigin defaults to the AWS coordinator origin for all AdTechs. There are only two coordinators at the moment, AWS and GCP, so this proposal simply allows AdTechs to default to the GCP version. We only propose that Chrome allow each AdTech to specify the default coordinator origin for itself; the actual keys used for encryption and decryption remain as secure as before.

Note that we are not proposing to remove the ability to set aggregationCoordinatorOrigin in interest groups. This feature is still useful: for example, when DSPs migrate from one coordinator to the other, interest groups can reference both coordinators. In this case, Chrome sends two reports, one for each coordinator, and DSPs can compare them during the migration.

In the short term, our DSPs can specify aggregationCoordinatorOrigin in interest groups. However, we would like Chrome to support global configuration per origin ahead of 3PCD so that our DSPs can reduce interest group sizes in time.

Other Use Cases

Global configuration could potentially be useful for a few different things, for example, specifying contribution limits per AdTech: #81 (comment).
