
Minutes IETF116: pearg: Wed 04:00
minutes-116-pearg-202303290400-00

Meeting Minutes Privacy Enhancements and Assessments Research Group (pearg) RG
Date and time 2023-03-29 04:00
Title Minutes IETF116: pearg: Wed 04:00
State Active
Last updated 2023-04-06


Chair welcome ("PearG"):
Note well / Wear masks, in person.

Draft updates (5 mins)

  • RG draft statuses

    • IP Address Privacy Considerations:

      • No recent updates since the last meeting, but updates coming
        soon
    • Censorship:

      • Recent update
    • Numeric IDs

      • Sent to RFC editor
    • Safe Internet measurements:

      • Review
      • Maybe interesting for PPM, as well

Presentations (100 mins)

  • Interoperable Private Attribution (Martin Thomson) - 30 mins

    • Attribution: important piece of the ad industry
    • Trains!
    • Let's talk about the Tokyo subway system
    • Actually, let's talk about identifiers, like access cards (e.g.,
      PASMO)
    • Using passenger tracking for the purpose of capacity planning,
      performance, etc.
    • Specifically, for systems that track when a person enters the
      system and when the person exits
    • But logs are a privacy risk and can be used for other purposes,
      even if they are inherently pseudonymous - identities could be
      linked.
    • Can we create a design that aggregates the data that's
      interesting, and provides individual privacy?
    • One design is using tokens with buckets
    • Tokens need to be:

      • anonymous
      • authenticated
      • time-delayed "opening"/redemption
      • ephemeral
    • Moving on to advertising

    • Attribution: information from one context and linking it in a
      different context
    • Answer a question: "How many people saw the ad, then came to the
      show?"
    • Understanding whether certain advertising is working:

      • good placement
      • creatives
      • how much to spend
      • how long to run campaigns
    • Current cross-context attribution allows linking people across
      contexts

    • With advertising, the context is everything:

      • Whether an ad was shown, and if that ad was clicked
      • Was a product purchased, or not
      • where was the ad shown
    • Interoperable Private Attribution (IPA)

      • People have an identifier (significant protections against
        revealing the identifier)
      • Sites can request an encrypted and secret-share of that
        identifier
      • Sites have a view of the identifier, but it's not linkable
        cross-site
    • Attribution in MPC (multi-party computation)

      • sites gather events
      • MPC decrypts identifiers and performs attribution
      • aggregated results are the output (histogram)
    • MPC does not, itself, see the original query

    • MPC:

      • Any computation if you only need addition and multiplication
      • It can be expensive
      • IPA uses a three-party, honest-majority threat model
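The "addition and multiplication" building block above can be illustrated with additive secret sharing across three parties (a minimal sketch; the modulus, party count, and protocol details are illustrative, not IPA's actual parameters):

```python
import secrets

P = 2**61 - 1  # illustrative prime modulus, not IPA's actual field

def share(x, n=3):
    """Split x into n additive shares that sum to x mod P."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Each party holds one share of a and one of b; adding its shares
# locally yields a share of a + b without any party learning a or b.
a, b = 1234, 5678
sa, sb = share(a), share(b)
sc = [(x + y) % P for x, y in zip(sa, sb)]
assert reconstruct(sc) == a + b
```

Multiplication of shared values needs interaction between the parties, which is where the communication cost discussed later comes from.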
    • Differential Privacy

      • (epsilon, delta)-DP for hiding individual contributions
      • Every site gets a query budget that renews each epoch (e.g.,
        week)
      • This does allow some leakage across time (epochs); more
        research is needed in this area
      • Parameters are not fixed yet
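The per-bucket noising can be sketched with the pure epsilon-DP Laplace mechanism (a stand-in for illustration; IPA's actual (epsilon, delta) mechanism and parameters are, as noted, not fixed):

```python
import random

def noisy_histogram(hist, epsilon, sensitivity=1.0):
    """Add Laplace(sensitivity/epsilon) noise to each histogram bucket.

    Pure epsilon-DP for simplicity; an (epsilon, delta) mechanism
    would typically use Gaussian noise instead.
    """
    scale = sensitivity / epsilon
    def laplace():
        # The difference of two iid exponentials is Laplace-distributed.
        return random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return [count + laplace() for count in hist]

noisy = noisy_histogram([120, 45, 300], epsilon=1.0)
```

Each query a site runs consumes part of its per-epoch epsilon budget; once the budget is spent, no more queries until the budget renews.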
    • Clients' encrypted identifiers are bound to:

      • the site that requested them
      • the epoch/week they are requested
      • the type of event: source (ad), trigger (purchase)
    • IPA: advances and challenges

      • IPA's flexibility provides somewhat of a drop-in replacement
        for current anti-fraud systems
      • IPA's flexibility hurts accountability

        • Existing challenge in making the system auditable
      • MPC performance is a challenge, especially at the scale of
        10s of billions

    • Status: Good progress, overall, but still requires research in
      some areas

    • Currently running some synthetic trials
    • Ongoing work in W3C working groups, protocol may come to PPM in
      the future

    • Brian Trammel: MPC performance is a challenge. Computation or
      communication complexity?

    • MT: A lot is algorithmic (linear), and some of that will likely
      improve, but much of it is communication cost. Originally,
      records were on the order of ~40GB, but it's still
      multi-gigabytes in size
    • Chris Wood: 1) What was the MPC functionality you needed (as
      defined by the existing adtech industry)? 2) Now that the
      functionality is defined, how do you implement it? How did you
      reach this design?
    • MT: Need more time. Lots of people took the steps to get here.
      Apple's PCM took an initial approach. This is mostly about
      understanding how the advertising industry uses measurement as a
      core part of their processes. There is a "need" vs. "want"
      difference of perspective between the parties, and those
      discussions are on-going. If you add cross-device attribution,
      it gets more complicated.
    • CW: There is an academic research community that has spent a lot
      of time designing MPC protocols. There seems to be some overlap
      and collaboration opportunity here.
    • Shivan: Who would run the servers in the MPC protocol?
    • MT: We need to trust them to not collude - to be determined
    • Jonathan Hoyland: If it's run by a third-party that is running
      an auction, what are the guarantees that they're actually
      running the MPC protocol
    • MT: Currently leaning on the oversight / auditing.
    • JH: Can the response include a proof?
    • MT: Recently asked if Verifiable MPC was considered - but VMPC
      is not ready yet. So, "trust and verify" is the current approach
  • Secure Partitioning Protocols (Phillipp Schoppmann) - 20 mins

    • Let's go into more detail on scaling aggregation computations
    • Billions of impressions from billions of clients
    • All clients submit their reports to the MPC cluster
    • MPC outputs the aggregate results
    • Goals

      • When sharding the MPC cluster, every client must use the
        same shard
      • We need a private mechanism for mapping each client to the
        same shard
      • This should have low communication cost
      • "correctness" must not be affected
    • Assumptions:

      • Bound on the number of contributions
      • Many clients, fewer shards
    • Blueprint: partitioning from distributed OPRFs

      • client has an index (i), and payload (v)
      • One server has an OPRF key (server 1)
      • Other server (server 2) will learn the result of OPRF
        computation
      • server 1 must add some padding queries
      • Server 2's output of OPRF is used for mapping client to
        target partition
    • Dense Partitioning: OPRF Output = Shard ID

    • If there are only a small set of shards, then this is reasonable
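The dense mapping above can be sketched with a keyed PRF (HMAC-SHA256 stands in for the OPRF here; in the real protocol server 1 holds the key and the evaluation is oblivious, so neither server sees the other's input in the clear):

```python
import hmac
import hashlib

NUM_SHARDS = 16  # illustrative; "many clients, fewer shards"

def oprf(key: bytes, client_index: bytes) -> bytes:
    # HMAC-SHA256 stands in for the distributed OPRF evaluation.
    return hmac.new(key, client_index, hashlib.sha256).digest()

def shard_for(key: bytes, client_index: bytes) -> int:
    # Dense partitioning: interpret the PRF output directly as a shard
    # ID, so every report from the same client routes to the same shard.
    return int.from_bytes(oprf(key, client_index), "big") % NUM_SHARDS
```

Because the mapping is deterministic in the key, the same client always lands on the same shard, which is the first goal listed above; server 1's padding queries hide which outputs correspond to real clients.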
    • Sparse Partitioning: OPRF Output = Random Client ID

      • Can the client's reports be aggregated before the MPC
        computation?
      • This doesn't result in creating a client identifier because
        server 1 pads the set of known client identifiers with dummy
        values, so server 2 can't distinguish between real users and
        fake users
    • How can the sparse histogram be private without seeing the
      actual histogram?

      • View the output of the OPRF as a histogram
      • Make sure frequency can't be linked to specific users
      • Choose a threshold, below threshold add dummy values, above
        threshold [..] (?)
    • Conclusion: efficient for these use cases

    • Next steps: Is there general interest? Are there other protocols
      where this might be useful? Are there other properties that are
      needed?

    • Chris Patton: Definitely interesting, but maybe not as an
      independent draft

    • PS: So, add this into individual drafts, instead of making a
      general purpose protocol
    • CP: Yes
    • Martin Thomson: The bounds seem to be fundamental. How confident
      are you that these are required costs?
    • PS: The numbers are not the absolute lower bound; they are based
      on the current design described in this presentation
    • MT: IPA may not be able to set an upper bound on the number of
      contributions, for example due to a Sybil attack
    • PS: Any party can create reports, but fraudulent reports may be
      filtered downstream
  • DP3T: Deploying decentralized, privacy-preserving proximity tracing
    (Wouter Lueks) - 25 mins

    • DP-3T started back in March 2020; first draft in May 2020;
      September 2020 - Summer 2021 working on presence tracing
    • Non-traditional academic environment - scaling to millions of
      users on a small timescale
    • Relying on existing infrastructure had a large impact
    • The systems were designed to be purpose-built so they couldn't
      be re-used for other purposes
    • Risks associated with digital contact tracing:

      • Must embed the social contact graph
      • location tracing
      • medical information
      • social interactions
      • social control risk
    • Time has shown what can go wrong with designs/deployments like
      this

      • Police departments in crime solving
      • data leaks
      • harassment of specific subgroups
    • It is very important that systems should be designed with
      purpose-limitations in mind, so they can't be easily abused in
      other ways

    • Relying on existing infrastructure: phones sending beacons over
      Bluetooth LE (BTLE)
    • Proximity can be derived based on the beacons they saw
    • Exposure notification works by the set intersection of beacons
      the person (who tested positive) saw and all of the identifiers
      that another person broadcast
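The set-intersection check can be sketched as follows (the hash-chain beacon derivation is illustrative only; GAEN actually derives rolling proximity identifiers with HKDF/AES):

```python
import hashlib

def beacons_from_seed(seed: bytes, n: int = 96) -> list:
    """Derive the rotating beacon IDs broadcast from one seed.

    A simple hash chain stands in for GAEN's HKDF/AES derivation.
    """
    ids, cur = [], seed
    for _ in range(n):
        cur = hashlib.sha256(cur).digest()
        ids.append(cur[:16])
    return ids

def exposed(observed: set, uploaded_seeds) -> bool:
    # Notify if any beacon derivable from an uploaded (positive) seed
    # intersects the set of beacons this phone observed.
    return any(b in observed
               for seed in uploaded_seeds
               for b in beacons_from_seed(seed))
```

Uploading only the seed keeps the report small, and knowing the seed is what stands in for authentication of the upload (see the Q&A below on ID binding).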
    • The design of these beacon broadcasts required that the OS
      vendor must be involved
    • While the design was relatively simple, relying on existing
      hardware made the situation more difficult/complicated
    • The result of collaboration with Google/Apple, was the
      Google/Apple Exposure Notification (GAEN) Framework/API
    • For full effect, you need privacy at all layers of the stack,
      including the Bluetooth protocol stack

      • MAC address must rotate at the same time as the beacons
    • Similarly, at the network layer, a network adversary can detect
      uploading the report of seen beacon identifiers (when reporting
      covid positive) - CH used dummy uploads to hide

    • Lessons learned:

      • Purpose limitations
      • context matters (how/where they are deployed)
      • Privacy at all layers
    • Tommy Pauly: More comment than question: for privacy at all
      layers, Apple is routing upload reports through iCloud Private
      Relay (iCPR)

    • WL: While this is great, there might be other sidechannels we
      need to look at
    • XXX: How do you authenticate IDs?
    • WL: There isn't any binding, but the upload requires knowing the
      underlying seed from which the beacon was derived
    • Chris Wood: What would an ideal interface have looked like, and
      how would you have designed it differently?
    • WL: The strictness provided protections, but it introduced
      challenges, as well. There isn't an easy answer.
  • LogPicker: Strengthening Certificate Transparency Against Covert
    Adversaries (Alexandra Dirksen) - 25 mins

    • HTTPS is mostly a default now (90%+ of all page loads are HTTPS
      in Chrome)
    • CAs are the trust anchors of the Web PKI
    • There have been recent illicit certificate issuances, and they
      seem to be increasing

      • WoSign
      • Digicert
      • Diginotar
      • Comodo
      • TurkTrust
    • Rogue certificates: getting a certificate for a domain that you
      don't own (e.g., for HTTPS interception)

    • The attacker scenario: a covert attacker obtaining a rogue
      certificate
    • Certificate transparency overview
    • CT is still vulnerable to this attack

      • All logs belong to a CA vendor
      • First compromise was in 2020
      • vulnerable to collaboration attacks
      • vulnerable to split view attack
    • Gossip is proposed as a mitigation for Split View attacks

    • LogPicker: a decentralized approach

      • CA contacts one log (leader) from a large set of logs (log
        pool)
      • Leader then contacts the other logs in the pool
      • the pool then selects one log, at random
      • The selected log includes the certificate in its Merkle tree
      • The logs that participated in choosing the log create a
        proof, and that proof is aggregated and sent back to the CA
        for inclusion in the certificate
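The random selection step can be sketched as combining per-log randomness (a simplification; LogPicker's actual election uses commit-then-reveal and BLS proof aggregation, neither of which is shown here):

```python
import hashlib

def select_log(pool_size: int, contributions) -> int:
    """Pick one log index from the pool using every member's randomness.

    XOR-combining 32-byte contributions and hashing means no single
    log controls the outcome; a real election would commit to the
    contributions before revealing them to prevent last-mover bias.
    """
    acc = bytes(32)
    for c in contributions:
        acc = bytes(a ^ b for a, b in zip(acc, c))
    return int.from_bytes(hashlib.sha256(acc).digest(), "big") % pool_size
```

Given the same contributions, every pool member computes the same index, so each can verify that the leader announced the correct winning log.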
    • This design meets the goals

    • Chris Wood: The log pool uses an election protocol?
    • AD: Yes, two protocols
    • CW: Have you looked at alternative solutions that use threshold
      signing?
    • AD: The aggregated signature uses BLS, but which signature
      scheme is used is not strictly defined