I am a PhD student at ETH Zurich under the supervision of Kenny Paterson. I completed both my bachelor’s and master’s degrees in Mathematics at the University of Zagreb University of Zagreb.
research
I am interested in cryptography and probabilistic data structures.
Probabilistic data structures (PDS) are compact representations of high-volume data that provide approximate answers to queries about the data. They are commonplace in today’s computing systems, finding use in databases, networking and more. While PDS are designed to perform well under benign inputs, they are frequently used in applications where inputs may be adversarially chosen. This may lead to a violation of their expected behaviour, for example an increase in false positive rate.
Privacy Implications of AMQ-Based PQ TLS Authentication
During the TLS 1.3 handshake, an entity (e.g., a client application) commonly transmits a chain of certificates and certificates’ digital signatures to the remote party to authenticate themselves. Towards a transition to post-quantum TLS, several proposals have been made to switch to post-quantum digital signatures in these certificate chains.Since post-quantum digital signatures are much larger than those based on classical assumptions, there has been a significant line of work aiming to reduce the overhead required for post-quantum authentication. Notably, the privacy implications of the proposed changes to the authentication system in TLS have not been thoroughly evaluated. Several of these proposals suggest Intermediate Certificate Authority (ICA) suppression, in which certificates that are already known to the client can be removed from the certificate chain to reduce the communication cost. One approach uses probabilistic data structures to transmit this information; however, this technique introduces additional privacy leakage, revealing to the server all ICAs whose certificates the client used to authenticate one of their past connections.In this work, we evaluate the privacy implications of taking such approaches to ICA suppression and the severity of its impact on TLS clients. In doing so, we perform an exploratory analysis on the current state of certificate-based PKI, particularly focusing on the distribution of ICAs. We define an adversarial model and a set of experiments to concretely evaluate the privacy leakage under the outlined assumptions. Our work suggests there is a risk in including this additional vector of information to adversarial servers aiming to execute website fingerprinting attacks.
Probabilistic Data Structures in the Wild: A Security Analysis of Redis
Mia Filić, Jonas Hofmann, Sam A. Markelon, and 2 more authors
Redis (Remote Dictionary Server) is a general purpose, in-memory database that supports a rich array of functionality, including various Probabilistic Data Structures (PDS), such as Bloom filters, Cuckoo filters, as well as cardinality and frequency estimators. These PDS typically perform well in the average case. However, given that Redis is intended to be used across a diverse array of applications, it is crucial to evaluate how these PDS perform under worst-case scenarios, i.e., when faced with adversarial inputs. We offer a comprehensive analysis to address this question. We begin by carefully documenting the different PDS implementations in Redis, explaining how they deviate from those PDS as described in the literature. Then we show that these deviations enable a total of 10 novel attacks that are more severe than the corresponding attacks for generic versions of the PDS. We highlight the critical role of Redis’ decision to use non-cryptographic hash functions in the severity of these attacks. We conclude by discussing countermeasures to the attacks, or explaining why, in some cases, countermeasures are not possible.
Compact Frequency Estimators in Adversarial Environments
Count-Min Sketch (CMS) and HeavyKeeper (HK) are two realizations of a compact frequency estimator (CFE). These are a class of probabilistic data structures that maintain a compact summary of (typically) high-volume streaming data, and provides approximately correct estimates of the number of times any particular element has appeared. CFEs are often the base structure in systems looking for the highest-frequency elements (i.e., top-K elements, heavy hitters, elephant flows). Traditionally, probabilistic guarantees on the accuracy of frequency estimates are proved under the implicit assumption that stream elements do not depend upon the internal randomness of the structure. Said another way, they are proved in the presence of data streams that are created by non-adaptive adversaries. Yet in many practical use-cases, this assumption is not well-matched with reality; especially, in applications where malicious actors are incentivized to manipulate the data stream. We show that the CMS and HK structures can be forced to make significant estimation errors, by concrete attacks that exploit adaptivity. We analyze these attacks analytically and experimentally, with tight agreement between the two. Sadly, these negative results seem unavoidable for (at least) sketch-based CFEs with parameters that are reasonable in practice. On the positive side, we give a new CFE (Count-Keeper) that can be seen as a composition of the CMS and HK structures. Count-Keeper estimates are typically more accurate (by at least a factor of two) than CMS for “honest” streams; our attacks against CMS and HK are less effective (and more resource intensive) when used against Count-Keeper; and Count-Keeper has a native ability to flag estimates that are suspicious, which neither CMS or HK (or any other CFE, to our knowledge) admits.
Adversarial Correctness and Privacy for Probabilistic Data Structures
Mia Filić, Kenneth Paterson, Anupama Unnikrishnan, and 1 more author
We study the security of Probabilistic Data Structures (PDS) for handling Approximate Membership Queries (AMQ); prominent examples of AMQ-PDS are Bloom and Cuckoo filters. AMQ-PDS are increasingly being deployed in environments where adversaries can gain benefit from carefully selecting inputs, for example to increase the false positive rate of an AMQ-PDS. They are also being used in settings where the inputs are sensitive and should remain private in the face of adversaries who can access an AMQ-PDS through an API or who can learn its internal state by compromising the system running the AMQ-PDS. We develop simulation-based security definitions that speak to correctness and privacy of AMQ-PDS. Our definitions are general and apply to a broad range of adversarial settings. We use our definitions to analyse the behaviour of both Bloom filters and insertion-only Cuckoo filters. We show that these AMQ-PDS can be provably protected through replacement or composition of hash functions with keyed pseudorandom functions in their construction. We also examine the practical impact on storage size and computation of providing secure instances of Bloom and insertion-only Cuckoo filters.