Minutes IETF117: pearg: Fri 16:30
minutes-117-pearg-202307281630-00

Meeting Minutes Privacy Enhancements and Assessments Research Group (pearg) RG
Date and time 2023-07-28 16:30
Title Minutes IETF117: pearg: Fri 16:30
State Active
Last updated 2023-07-31

Privacy Enhancements and Assessments Research Group

IETF 117, San Francisco
Friday Session I, July 28, 2023, 09.30 - 11.30 am Pacific

Notetakers: Mallory Knodel

draft update: safe-internet-measurement

Mallory: Recent revision; this is the first full draft. Please review the
contents now. It is not meant to be a long document and includes guidance
in each section. Proactively reaching out to the Tor Safety Board, which
has its own guidelines for measurements of the Tor network; will also
send to maprg.

Are there other groups that would be good to review it?
Shivan: ppm.

talk: privacy in language models

By Reza Shokri:
https://datatracker.ietf.org/doc/slides-117-pearg-privacy-in-language-models/

Discussion:

Jonathan Hoyland: How can it be that the attacker already has the data?
He's already won before the game started. (How can they evaluate how the
model trains on data in this game unless they have all of the data?)

Reza: The attacker doesn't know whether your data is in the training set.

Jonathan: If the attacker has the data, then it doesn't matter whether
it's in the set or not. That's not measuring the right thing.

Reza: If I run an algorithm to train a model, it should extract patterns
from the data set. It shouldn't "remember" or produce a specific piece
of data like a social security number. It should learn on fake numbers,
not my number.

Chris Lemon: Is this vulnerable to "guessed" data, like medical data?
Could they guess that a participant in the study is prone to some
condition?

Reza: Rephrasing: is this way of evaluating privacy by prediction the
same as verifying that specific personal data is in the set? We are
developing an auditing technique to quantify the adversary's certainty
in determining that specific data is in the set.
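The membership-inference setting discussed above can be sketched as a loss-threshold test, one common way to audit whether a model "remembers" specific records. This is an illustrative toy, not the auditing technique from the talk; the stand-in model and all names are assumptions.

```python
class ToyModel:
    """Stand-in model that has 'memorized' its training records exactly."""

    def __init__(self, training_set):
        self.training_set = set(training_set)

    def loss(self, record):
        # Memorized records get near-zero loss; unseen records get high loss.
        return 0.01 if record in self.training_set else 2.0


def infer_membership(model, record, threshold=0.5):
    """Adversary's guess: 'member' when the model fits this exact record
    unusually well (loss below the threshold)."""
    return model.loss(record) < threshold


model = ToyModel(["alice: 123-45-6789"])
print(infer_membership(model, "alice: 123-45-6789"))  # memorized -> True
print(infer_membership(model, "bob: 999-99-9999"))    # unseen -> False
```

A well-trained model that only learned patterns, not individual records, would give the adversary no such loss gap to exploit.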

Nick Doty: Is deleting data a path forward?

Reza: Using only data that is intended to be public, which nobody owns,
and using private personalisation are related approaches. It's a
challenging problem: for smaller models, differential privacy pre-removes
data from the set, but in the context of language models I'm not sure
this would work. We should try to reduce contamination in advance to
avoid those issues.

talk: PrivacyTests.org -- open source tests of web browser privacy

By Arthur Edelstein:
https://datatracker.ietf.org/doc/slides-117-pearg-privacytestsorg-open-source-tests-of-web-browser-privacy/

Discussion:

Tommy Pauly: Add cost to the list of reasons why not: the cost of doing
all the IP address blocking, etc., is what needs to be figured out in
order to improve this. It's not for lack of desire.

If we're looking at Tor, then let's also include MASQUE proxies and
other proxies for private browsing as techniques you haven't yet looked
at.

Matthew Finkle: On cross-session tests, we need to decide what a session
means. Cookies persist beyond the time a browser window is open, after
it has been closed.

Have you talked with SDOs about moving these tests there, like WHATWG,
etc?

Arthur: I'd be happy to discuss that and I'm open to different ideas.

Nick Doty: Many of these functionalities are being used by websites, so
education among web admins about use and abuse is needed, and that might
be a barrier to adoption (from the earlier part of the talk).

Arthur: There's a lot that can be fixed without compromising web
functionality, but there are such trade-offs, and the goal should be to
find realistic agreement on what the browser should limit because of
leaking. Everything is partitionable, as far as I can see, for example.
We don't want to use functionality as an "out".

talk: Security and Privacy Implications of Transient Numeric Identifiers

By Fernando Gont/Ivan Arce:
https://datatracker.ietf.org/doc/slides-117-pearg-security-and-privacy-implications-of-transient-numeric-identifiers/

Discussion:

Nick Doty: Have you considered threats when IDs are rotated at different
times, which gives some identity overlap?

Fernando: Reuse of IDs is something we considered, and we suggest not
reusing them across different layers when not needed.

Ivan: There are other examples like this: stable addresses across
networks in IPv6 could be used for tracking, too. More generally, there
are problems when you have an ID field in your protocol and you add
semantics that are not needed. You have an ID and it carries topological
info; you have an ID but you need an ordering property; you have a
unique ID in your protocol but some implementer wanted to encode a value
in the ID. You start breaking the security and privacy requirements with
the semantics of these fields, creating unintended risks. We are trying
to make this guidance more mandatory for protocol designers so they are
aware of what is needed and not needed.

Jonathan Hoyland: Do you describe the difference between this and
cryptographic channel bindings?

Fernando & Ivan: We do not talk about this specifically, no.

Fernando: Concretely, QUIC was being specified at the time of writing
and didn't have its transient numeric IDs defined very well; the
attitude was "let's not care because we use TLS." Even if you use
cryptographic techniques, if you use IDs and port numbers you still need
to follow this guidance. You could have IDs that are selected from a
global counter, and that might leak, for example.

Ivan: Even if your protocol goes over an encrypted channel that doesn't
give you a free pass to generate IDs in whatever way you want because
there might be problems.
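The global-counter problem Fernando and Ivan describe can be sketched by contrasting two ID-generation strategies. This is an illustrative example, not code from the draft; function names are assumptions.

```python
import secrets
from itertools import count

# Global counter shared across all contexts (the problematic pattern).
_global_counter = count(1)


def counter_id():
    # Predictable: an observer who sees two IDs learns how many others
    # were issued in between, and can link activity across contexts,
    # even if the payload itself travels over an encrypted channel.
    return next(_global_counter)


def random_id(bits=32):
    # Unpredictable and semantics-free: a safer default when the
    # protocol only needs uniqueness, not ordering or any other meaning.
    return secrets.randbits(bits)


a, b = counter_id(), counter_id()
print(b - a)  # consecutive values reveal issuance order and volume
```

Where an ordering property genuinely is required, the guidance is to provide it explicitly rather than by overloading the identifier with extra semantics.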

Jonathan: Channel bindings are how you do that.

(taken to the chat.)

talk: France's Recent Proposals for DNS Blocking in Browsers

By Mallory Knodel:
https://datatracker.ietf.org/doc/slides-117-pearg-proposed-laws-on-dns-blocking/

Discussion: