Yes, I think it should provide enough protection (just my own opinion). If we can, we should validate with real-world testing to see whether the data is really necessary, though, since I think even without it you should be able to estimate the window with pretty high accuracy.
@johnivdel FYI.
from attribution-reporting-api.
I think this really hinges on whether we need to add random noise to report delays, since the existing API does not have any noise except to stagger queued reports at browser startup.
If we implement what is currently described in the explainer, then this proposal doesn't seem necessary. If you get a report associated with a click, you can tell which report window it came from by looking at the time delta from the click time. The only source of noise here is browser uptime / startup staggering, which hopefully should be minimal.
However, if we feel we want to add noise to the API, we should analyze why we are doing it and whether reporting the conversion window makes the noise less effective. For instance, in PCM noise is needed because click-time information is not known from a report, and conversions are reported after some delay relative to the conversion time. We may consider unifying this mode of delay (see privacycg/private-click-measurement#29 for more context) by e.g. adding noise on top of the conversion windows.
If we did something like that, the main thing we want to achieve is for PCM's privacy model to hold (i.e. your click time and your conversion time are well hidden). However, we may be able to come up with other solutions here that don't involve just adding the two delay mechanisms together.
Thanks for the thoughtful response @csharrison !
> The only source of noise here is from browser up-time / startup staggering which hopefully should be minimal.
This is entirely possible though I don't have the data to confirm one way or another.
On the other hand, even small inaccuracies in our reports can be significantly problematic for our trust with advertisers. In the past, Facebook has been criticized for inaccuracies of this magnitude in its metrics, so I hope you can see why I'm keen on getting the most accurate data possible, subject to privacy constraints.
> If we did something like that, the main thing we want to achieve is for PCM's privacy model to hold (i.e. your click time and your conversion time are well hidden).
If we report the conversion window at a day-level resolution, I think we can maintain the privacy of the click time and conversion time. For instance, suppose there's a random delay of 0-24 hours for conversion reports. If we got a conversion with a 7-day window, we would know that the click happened 7-8 days ago (still a 24-hour window of uncertainty) and that the conversion happened in the past 8 days. When you add in the additional noise of browsers being offline at the end of the window, it becomes unknowable.
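To make the argument above concrete, here is a small sketch (the function name and the uniform 0-24h delay are illustrative assumptions, not part of any API) showing that labeling a report with its conversion window does not narrow the observer's click-time uncertainty below the delay range:

```python
# Hypothetical inference an observer could make from a report labeled
# with its conversion window, assuming a uniform 0-24h report delay.

def click_time_bounds(report_time_days, window_days, max_delay_days=1.0):
    """Given a report received at `report_time_days` (days since some epoch)
    and labeled with a `window_days` conversion window, return the interval
    the click time must fall in. The interval width equals the delay range,
    so the window label by itself does not shrink click-time uncertainty."""
    latest_click = report_time_days - window_days                     # delay was 0h
    earliest_click = report_time_days - window_days - max_delay_days  # delay was 24h
    return earliest_click, latest_click

lo, hi = click_time_bounds(report_time_days=10.0, window_days=7.0)
print(hi - lo)  # uncertainty stays 1.0 day (24 hours)
```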
What do you think? If that doesn't work, how do you think we can move forward here? I'm really keen on finding solutions to let us credit conversions to the right window because advertisers have different targets based on them. Thanks!
Good point. I thought about this with @johnivdel, and I think we concluded that reporting a time that omits browser downtime (and the associated randomized report delays at startup) is fine for the privacy model of preventing identity joining, but it unfortunately leaks information about user behavior that we might not want to leak: how long it has been since the user last used their browser. That is an unexpected thing to be able to learn from the API, so I would be cautious about adding it unless it's very necessary.
Note that this problem only exists for the event-level API. For the aggregate API I believe we can send along the time the report was "scheduled" to be sent, since no report can be tied to any individual (unlike the event level API).
However, I wonder how big a problem this really is. E.g. with reporting windows of 1, 7, and 28 days, a user who converts between days 1-7 would need to be offline for more than two weeks before they are "mixed in" with users who convert in the 7-28 day window. I think this should be pretty rare.
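A quick back-of-envelope check of the cross-window mixing argument above, assuming the hypothetical 1/7/28-day window configuration mentioned (a report scheduled at one window close can only be confused with the next window if the browser stays offline until that later window closes):

```python
# Worst-case offline gap needed for a report from one window to be
# "mixed in" with reports from the next window, for assumed windows
# closing 1, 7, and 28 days after the click.
WINDOWS_DAYS = [1, 7, 28]

def min_offline_for_ambiguity(windows):
    """For each adjacent pair of reporting windows, return how long the
    browser must stay offline (in days) for a report scheduled at the
    earlier window close to arrive no sooner than the later one."""
    return [later - earlier for earlier, later in zip(windows, windows[1:])]

print(min_offline_for_ambiguity(WINDOWS_DAYS))  # [6, 21]
```

So a day 1-7 converter needs roughly three offline weeks (21 days) before their report arrival overlaps the 7-28 day cohort, consistent with "more than two weeks" above.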
Though I suppose in some sense you will be able to learn this with any implementation of the event-level API to some extent, especially if you configure just a single reporting window. In the existing case, if a report comes at time T after the last reporting window, you know (modulo some complexity with startup scheduling) that the user hasn't started the browser for time T, so maybe this is moot.
Still, I think I would be interested in seeing if this "implicit noise" is a problem in practice, since it does boost some of the privacy guarantees (like learning the true conversion delay).
> Though I suppose in some sense you will be able to learn this with any implementation of the event level API to some extent, especially if you configure just a single reporting window. In the existing case if a report comes T time after the last reporting window you know (% some complexity with startup scheduling) that the user hasn't started the browser for T time, so maybe this is moot.
Yeah, this is why I didn't consider it a new risk. We could always somewhat mitigate the problem by adding some random delay to firing off the report even after the browser is restarted.
You are right that in practice the cross-window case should be pretty rare. If we think the value of being 100% accurate outweighs the potential privacy risks, I'd still prefer this if possible.
> Note that this problem only exists for the event-level API. For the aggregate API I believe we can send along the time the report was "scheduled" to be sent, since no report can be tied to any individual (unlike the event level API).
That's great to hear.
Yes, I think the natural way to solve this is to introduce some noise at startup that hides browser uptime a bit more. Right now we add uniform noise to simulate a shuffle, but it might make sense to switch to exponential noise, whose key property is an unbounded tail: a user always has some plausible deniability that they got really unlucky but were actually online.
Right now Chrome's uniform delay is between 0 and 5 minutes, so maybe Exponential[1/3] minutes is a reasonable (somewhat conservative) starting point: the mean delay is 3 minutes and there is only a ~3.6% chance a report will be delayed more than 10 minutes at startup.
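A short sketch of the numbers behind this proposal (the rate parameter is the Exponential[1/3] suggested above; the delay draw is illustrative, not Chrome's implementation):

```python
import math
import random

# Proposed exponential startup delay: rate 1/3 per minute => mean 3 minutes.
RATE_PER_MIN = 1.0 / 3.0

mean_delay = 1.0 / RATE_PER_MIN            # 3.0 minutes
# Tail probability of an exponential: P(delay > t) = exp(-rate * t).
p_over_10 = math.exp(-RATE_PER_MIN * 10.0)  # ~0.036, i.e. ~3.6%

# One hypothetical startup-delay draw (minutes); unbounded tail gives
# every user some plausible deniability about actual uptime.
sampled_delay = random.expovariate(RATE_PER_MIN)

print(mean_delay, round(p_over_10, 3))
```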
The only problem with this approach is that it bunches reports up a bit more than the uniform noise we have currently (the exponential's mass is concentrated near zero), so timing attacks may be more effective at joining impressions from different publishers. We should weigh these two approaches, or maybe consider summing the noise from two noise sources to get the best of both worlds.
If we label the conversion window, I think we would be tolerant of a wider delay than Exponential[1/3]. We're not trying to figure out when people's browsers are back online; we just want to be able to assign conversions to the right bucket. What do you think?
It seems reasonable to me, but one problem with wider delays is that they risk perpetually delaying conversions from users who use the browser for very short periods before closing it (e.g. via Android killing the process). We will likely need to tune some of the parameters with data to make sure we're making the right trade-offs here.
That makes sense. Do you think that will give us enough protections to include the attribution window in the conversion report?
Following up on this:
We've gone in the direction of not adding substantial random delays to reports. In the Aggregate explainer, reports are annotated with their scheduled_report_time.
For parity between the two, it seems useful to also send the scheduled_report_time for event level reports.
This would also be helpful for debugging, as it lets the reporting endpoint know what state the browser was in when it sent the report.
As noted above, in some scenarios you may learn more information than before (e.g. a report in the 7-day window that got reported on day 40 because the browser was shut down can now be tied back to the 7-day window), but this should not affect the worst/"standard" case, because the report time is derived from impression-side data.
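As a sketch, an event-level report body carrying the scheduled time might look like the following (the exact field set and encoding are illustrative here, not a spec; scheduled_report_time as seconds since epoch mirrors the aggregate report annotation described above):

```json
{
  "source_event_id": "12345",
  "trigger_data": "2",
  "scheduled_report_time": "1643235574"
}
```

The reporting endpoint can then compare scheduled_report_time against the actual receipt time to see how long the report sat waiting for the browser to come back online.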