Minutes IETF116: pearg: Wed 04:00
minutes-116-pearg-202303290400-00
| Meeting Minutes | Privacy Enhancements and Assessments Research Group (pearg) RG |
| --- | --- |
| Date and time | 2023-03-29 04:00 |
| Title | Minutes IETF116: pearg: Wed 04:00 |
| State | Active |
| Other versions | markdown |
| Last updated | 2023-04-06 |
Chair welcome ("PearG"):
Note well / Wear masks, in person.
Draft updates (5 mins)

- RG draft statuses
  - IP Address Privacy Considerations:
    - No recent updates since the last meeting, but updates coming soon
  - Censorship:
    - Recent update
  - Numeric IDs:
    - Sent to RFC editor
  - Safe Internet measurements:
    - Review
    - Maybe interesting for PPM, as well
Presentations (100 mins)

Interoperable Private Attribution (Martin Thomson) - 30 mins

- Attribution: important piece of the ad industry
- Trains!
- Let's talk about the Tokyo subway system
- Actually, let's talk about identifiers, like access cards (e.g., PASMO)
  - Using passenger tracking for the purpose of capacity planning, performance, etc.
  - Specifically, for systems that track when a person enters the system and when the person exits
  - But logs are a privacy risk and can be used for other purposes, even if they are inherently pseudonymous - identities could be linked
  - Can we create a design that aggregates the data that's interesting, and provides individual privacy?
  - One design is using tokens with buckets
- Tokens need to be:
  - anonymous
  - authenticated
  - time-delayed "opening"/redemption
  - ephemeral
- Moving on to advertising
  - Attribution: taking information from one context and linking it in a different context
  - Answer a question: "How many people saw the ad, then came to the show?"
- Understanding whether certain advertising is working:
  - good placement
  - creatives
  - how much to spend
  - how long to run campaigns
- Currently, cross-context attribution allows linking people across contexts
- With advertising, the context is everything:
  - Whether an ad was shown, and if that ad was clicked
  - Was a product purchased, or not
  - Where was the ad shown
- Interoperable Private Attribution (IPA)
  - People have an identifier (significant protections against revealing the identifier)
  - Sites can request an encrypted secret share of that identifier
  - Sites have a view of the identifier, but it's not linkable cross-site
- Attribution in MPC (multi-party computation)
  - Sites gather events
  - MPC decrypts identifiers and performs attribution
  - Aggregated results are the output (histogram)
- MPC does not, itself, see the original query
- MPC:
  - Any computation is possible if you only need addition and multiplication
  - It can be expensive
  - IPA uses a three-party, honest-majority threat model
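The "addition and multiplication" building block mentioned above can be illustrated with additive secret sharing, a minimal sketch only (not the IPA specification; the prime and party count here are illustrative):

```python
import secrets

P = 2**61 - 1  # prime field modulus, chosen here only for illustration

def share(value: int, n_parties: int = 3) -> list[int]:
    """Split `value` into n additive shares; any n-1 shares reveal nothing."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % P

# Addition is "free" in this scheme: each party adds its shares locally,
# and the reconstructed result is the sum of the original values.
a, b = 42, 100
a_sh, b_sh = share(a), share(b)
sum_sh = [(x + y) % P for x, y in zip(a_sh, b_sh)]
assert reconstruct(sum_sh) == (a + b) % P
```

Multiplication of shared values requires interaction between the parties, which is one source of the communication cost discussed later in the Q&A.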
- Differential Privacy
  - (epsilon, delta)-DP for hiding individual contributions
  - Every site gets a query budget that renews each epoch (e.g., week)
    - This does permit some leakage across time (epochs); more research is needed in this area
  - Parameters are not fixed yet
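One common way to realize (epsilon, delta)-DP on an aggregate histogram is the Gaussian mechanism, sketched below. This is a hedged illustration: the minutes note that IPA's parameters are not fixed, and the noise distribution here is an assumption, not IPA's design.

```python
import math
import random

def gaussian_sigma(epsilon: float, delta: float, sensitivity: float = 1.0) -> float:
    # Classic analytic bound (valid for epsilon < 1):
    # sigma >= sqrt(2 * ln(1.25/delta)) * sensitivity / epsilon
    return math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon

def noisy_histogram(hist: list[int], epsilon: float, delta: float) -> list[float]:
    """Add calibrated Gaussian noise to each bin to hide any one contribution."""
    sigma = gaussian_sigma(epsilon, delta)
    return [count + random.gauss(0.0, sigma) for count in hist]

noisy = noisy_histogram([120, 45, 3], epsilon=0.5, delta=1e-6)
```

The per-site query budget then tracks how much epsilon each site has spent within an epoch.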
- Clients' encrypted identifiers are bound to:
  - the site that requested them
  - the epoch/week in which they are requested
  - the type of event: source (ad), trigger (purchase)
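One way such binding can work is to derive the cryptographic context from the site, epoch, and event type, so material requested for one context is unusable in another. The derivation below is purely hypothetical, an assumption for illustration, not IPA's wire format:

```python
import hashlib
import hmac

def bound_context_key(root_key: bytes, site: str, epoch: int, event_type: str) -> bytes:
    """Hypothetical: derive a per-(site, epoch, event-type) key via HMAC."""
    info = f"{site}|{epoch}|{event_type}".encode()
    return hmac.new(root_key, info, hashlib.sha256).digest()

# Different context -> different key, so ciphertexts cannot be replayed
# across sites, epochs, or event types.
k1 = bound_context_key(b"root", "news.example", 2717, "source")
k2 = bound_context_key(b"root", "shop.example", 2717, "trigger")
assert k1 != k2
```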
- IPA: advances and challenges
  - IPA's flexibility provides somewhat of a drop-in replacement for current anti-fraud systems
  - IPA's flexibility hurts accountability
    - Existing challenge in making the system auditable
  - MPC performance is a challenge, especially at the scale of 10s of billions
- Status: good progress overall, but still requires research in some areas
  - Currently running some synthetic trials
- Ongoing work in W3C working groups; the protocol may come to PPM in the future
- Brian Trammell: MPC performance is a challenge. Computation or communication complexity?
  - MT: A lot is algorithmic (linear), and some of that will likely improve, but much of it is communication cost. Originally, records were working on the order of ~40GB, but it's still multi-gigabytes in size
- Chris Wood: 1) What was the MPC functionality you needed (as defined by the existing adtech industry)? 2) Now that the functionality is defined, how do you implement it? How did you reach this design?
  - MT: Need more time. Lots of people took the steps to get here. Apple's PCM took an initial approach. This is mostly about understanding how the advertising industry uses measurement as a core part of its processes. There is a "need" vs. "want" difference of perspective among the parties, and those discussions are ongoing. If you add cross-device attribution, it gets more complicated.
  - CW: There is an academic research community that has spent a lot of time designing MPC protocols. There seems to be some overlap and collaboration opportunity here.
- Shivan: Who would run the servers in the MPC protocol?
  - MT: We need to trust them not to collude - to be determined
- Jonathan Hoyland: If it's run by a third party that is running an auction, what are the guarantees that they're actually running the MPC protocol?
  - MT: Currently leaning on oversight / auditing.
  - JH: Can the response include a proof?
  - MT: Recently asked if Verifiable MPC was considered - but VMPC is not ready yet. So, "trust and verify" is the current approach
Secure Partitioning Protocols (Phillipp Schoppmann) - 20 mins

- Let's go into more detail on scaling aggregation computations
  - Billions of impressions from billions of clients
  - All clients submit their reports to the MPC cluster
  - MPC outputs the aggregate results
- Goals
  - When sharding the MPC cluster, every client must use the same shard
  - We need a private mechanism for mapping one client to the same shard
  - This should have low communication cost
  - "Correctness" must not be affected
- Assumptions:
  - Bound on the number of contributions
  - Many clients, fewer shards
- Blueprint: partitioning from distributed OPRFs
  - Client has an index (i) and a payload (v)
  - One server (server 1) has an OPRF key
  - The other server (server 2) will learn the result of the OPRF computation
    - Server 1 must add some padding queries
  - Server 2's output of the OPRF is used for mapping the client to the target partition
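The blueprint above can be sketched with a plain keyed PRF standing in for the OPRF. This is a simplification, an assumption for illustration only: in a real OPRF, server 1 evaluates its key obliviously and never sees client indices in the clear, and the shard count here is made up.

```python
import hashlib
import hmac
import secrets

NUM_SHARDS = 8  # assumed small shard count (dense-partitioning case)

def prf(key: bytes, client_index: bytes) -> bytes:
    """Stand-in for the OPRF evaluation: a keyed PRF over the client index."""
    return hmac.new(key, client_index, hashlib.sha256).digest()

def shard_of(prf_output: bytes) -> int:
    """Dense partitioning: reduce the PRF output to a shard ID."""
    return int.from_bytes(prf_output, "big") % NUM_SHARDS

key = secrets.token_bytes(32)  # server 1's (O)PRF key
real = [prf(key, b"client-%d" % i) for i in range(5)]
# Padding queries on random inputs hide how many real clients there are:
padding = [prf(key, secrets.token_bytes(16)) for _ in range(3)]
shards = [shard_of(out) for out in real + padding]

# The same client deterministically maps to the same shard:
assert shard_of(prf(key, b"client-0")) == shards[0]
```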
- Dense Partitioning: OPRF output = shard ID
  - If there is only a small set of shards, then this is reasonable
- Sparse Partitioning: OPRF output = random client ID
  - Can the client's reports be aggregated before the MPC computation?
  - This doesn't result in creating a client identifier, because server 1 pads the set of known client identifiers with dummy values, so server 2 can't distinguish between real users and fake users
- How can the sparse histogram be private without seeing the actual histogram?
  - View the output of the OPRF as a histogram
  - Make sure frequency can't be linked to specific users
  - Choose a threshold; below the threshold, add dummy values; above the threshold [..] (?)
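The thresholding idea can be sketched as follows. This is a loose illustration under stated assumptions: the threshold value and the padding rule are invented here (the minutes leave the above-threshold behavior unspecified), and real designs would add calibrated noise rather than flat padding.

```python
import random
from collections import Counter

THRESHOLD = 5  # assumed threshold, for illustration only

def pad_histogram(hist: Counter) -> Counter:
    """Pad low-frequency bins and add fake bins so rare real users
    are indistinguishable from padding."""
    padded = Counter(hist)
    for bucket, count in hist.items():
        if count < THRESHOLD:
            padded[bucket] = THRESHOLD  # raise rare bins up to the threshold
    # Also insert entirely fake buckets so the *set* of IDs is hidden:
    for _ in range(3):
        padded[f"dummy-{random.getrandbits(32):08x}"] = THRESHOLD
    return padded

hist = Counter({"id-a": 12, "id-b": 2})
padded = pad_histogram(hist)
assert padded["id-b"] == THRESHOLD and padded["id-a"] == 12
```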
- Conclusion: efficient for these use cases
- Next steps: Is there general interest? Are there other protocols where this might be useful? Are there other properties that are needed?
- Chris Patton: Definitely interesting, but maybe not as an independent draft
  - PS: So, add this into individual drafts, instead of making a general-purpose protocol?
  - CP: Yes
- Martin Thomson: The bounds seem to be fundamental. How confident are you that these are required costs?
  - PS: The numbers are not the absolute lower bound; they are based on the current design described in this presentation
  - MT: IPA may not be able to set an upper bound on the number of contributions, for example due to a Sybil attack
  - PS: While any party can create reports, fraudulent reports may be able to be filtered downstream
DP3T: Deploying decentralized, privacy-preserving proximity tracing (Wouter Lueks) - 25 mins

- DP-3T started back in March 2020: first draft in May 2020; September 2020 - Summer 2021 working on presence tracing
- Non-traditional academic environment - scaling to millions of users on a small timescale
- Relying on existing infrastructure had a large impact
- The system was designed to be purpose-built, so it couldn't be re-used for other purposes
- Risks associated with digital contact tracing:
  - Must embed the social contact graph
  - location tracing
  - medical information
  - social interactions
  - social control risk
- Time has shown what can go wrong with designs/deployments like this:
  - police departments using the data for crime solving
  - data leaks
  - harassment of specific subgroups
- It is very important that systems be designed with purpose limitations in mind, so they can't be easily abused in other ways
- Relying on existing infrastructure: phones with BTLE sending beacons
  - Proximity can be derived based on the beacons they saw
  - Exposure notification works by the set intersection of the beacons the person (who tested positive) saw and all of the identifiers that another person broadcast
  - The design of these beacon broadcasts required that the OS vendor be involved
  - While the design was relatively simple, relying on existing hardware made the situation more difficult/complicated
  - The result of the collaboration with Google/Apple was the Google/Apple Exposure Notification (GAEN) Framework/API
- For full effect, you need privacy at all layers of the stack, including the Bluetooth protocol stack
  - MAC address must rotate at the same time as the beacons
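The set-intersection matching described above can be sketched as follows. This is a simplified illustration, not the real DP-3T/GAEN key schedule: the beacon derivation below is hypothetical (DP-3T derives rotating ephemeral identifiers from daily keys).

```python
import hashlib

def beacons_from_seed(seed: bytes, n: int = 4) -> set[bytes]:
    """Hypothetical derivation of rotating beacons from a daily seed."""
    return {hashlib.sha256(seed + bytes([i])).digest()[:16] for i in range(n)}

# Beacons the positive user broadcast (derivable by anyone who learns the seed):
positive_broadcast = beacons_from_seed(b"daily-seed-of-positive-user")

# Beacons my phone observed nearby (one of them came from the positive user):
observed_nearby = {next(iter(positive_broadcast))} | {b"\x00" * 16}

# Non-empty intersection -> exposure notification is raised locally.
exposed = bool(positive_broadcast & observed_nearby)
assert exposed
```

Because matching happens on the device, the server only ever learns the uploaded seeds of positive users, never the contact graph.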
- Similarly, at the network layer, a network adversary can detect the upload of the report of seen beacon identifiers (when reporting COVID-positive)
  - CH used dummy uploads to hide this
- Lessons learned:
  - Purpose limitations
  - Context matters (how/where systems are deployed)
  - Privacy at all layers
- Tommy Pauly: More comment than question: for privacy at all layers, Apple is routing the upload report through iCloud Private Relay
  - WL: While this is great, there might be other side channels we need to look at
- XXX: How do you authenticate IDs?
  - WL: There isn't any binding, but the upload requires knowing the underlying seed from which the beacon was derived
- Chris Wood: What would an ideal interface have looked like, and how would you have designed it differently?
  - WL: The strictness provided protections, but it introduced challenges as well. There isn't an easy answer.
LogPicker: Strengthening Certificate Transparency Against Covert Adversaries (Alexandra Dirksen) - 25 mins

- HTTPS is mostly a default now (90%+ of all page loads are HTTPS in Chrome)
- CAs are the trust anchors of the Web PKI
- There are recent illicit certificate creations, and they seem to be increasing:
  - WoSign
  - DigiCert
  - DigiNotar
  - Comodo
  - TurkTrust
- Rogue certificates: where you get a certificate for a domain that you don't own (e.g., HTTPS interception)
  - In the attacker scenario, a covert attacker obtains a rogue certificate
- Certificate Transparency overview
- CT is still vulnerable to this attack
  - All logs belong to a CA vendor
  - First compromise was in 2020
  - Vulnerable to collaboration attacks
  - Vulnerable to split-view attacks
- Gossip is proposed as a mitigation for split-view attacks
- LogPicker: a decentralized approach
  - CA contacts one log (the leader) from a large set of logs (the log pool)
  - The leader then contacts the other logs in the pool
  - The pool then selects one log at random
  - The selected log includes the certificate in its Merkle tree
  - The logs that participated in choosing the log create a proof, and that proof is aggregated and sent back to the CA for inclusion in the certificate
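The "pool selects one log at random" step could be realized with commit-then-reveal coin flipping, sketched below. This is an assumption for illustration: the minutes do not specify LogPicker's election protocol, only that the pool jointly picks a log so no single party controls the outcome.

```python
import hashlib
import secrets

def commit(value: bytes) -> bytes:
    """Hash commitment: binds a party to its value before anyone reveals."""
    return hashlib.sha256(value).digest()

pool = ["log-a", "log-b", "log-c", "log-d"]
contributions = [secrets.token_bytes(32) for _ in pool]  # each log's secret
commitments = [commit(c) for c in contributions]         # broadcast first

# After all commitments are exchanged, values are revealed and verified...
assert all(commit(c) == com for c, com in zip(contributions, commitments))

# ...then combined: as long as one log is honest, the result is unpredictable.
combined = hashlib.sha256(b"".join(contributions)).digest()
selected = pool[int.from_bytes(combined, "big") % len(pool)]
assert selected in pool
```

Committing before revealing prevents the last log from choosing its contribution to steer the selection.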
- This design meets the goals
- Chris Wood: The log pool uses an election protocol?
  - AD: Yes, two protocols
  - CW: Have you looked at alternative solutions that use threshold signing?
  - AD: The aggregated signature uses BLS, but which signature scheme is used is not strictly defined