## Computer Science Colloquia

*Monday, May 9, 2011*

**Aleksander Morgan**

Advisor: abhi shelat

Attending Faculty: Worthy Martin and Gabe Robins

Olsson 228E, 14:00:00

A Master's Thesis Presentation

**Analysis of the Hidden Subset Sum problem**

In many discrete-log-based protocols, one of the steps required is to generate (*x*, *g**x *mod *p*) for a fixed *g *and arbitrary *x*. Doing so requires multiple exponentiations, which take a nontrivial amount of time and often end up being the most expensive operation. Furthermore, since the numbers being exponentiated do not depend on the message and multiplication is homomorphic, obtaining such pairs can be done using precomputations. One such scheme was proposed by Boyko, Peinado, and Venkatesan, and its security is based on the hardness of the hidden subset sum problem: given *A *_ Z and *V *= {*v*1, ..., *v*m}, each *v*i _ Z/*A*Z, find *S *=*b*1, ..., *b*n, each *b**i *_ Z/*A*Z, such that for all *i*, *v*i = _*b**j *mod *A*, where *b**j *_ *T *for some *T *subset of *S*. Several papers have computed the hardness for high densities of generators; however, in the case of low density, which is the case that would make his approach preferable to other known precomputation methods, no promising results were known.

This paper addresses the hardness of the hidden subset sum problem for low densities. In particular, given a random *T*, we need to calculate on average how far the distribution defined by the *T *is from the uniform distribution. In calculating the standard deviation of the statistical differences, we divide the set of possible (*T*, _*S*) pairs into 2|T| subsets. Then by projecting onto a suitably chosen basis, we are able to eliminate all but three of them from the count. Finally, we bring several theorems which allow us to count the remaining three, and compute the exact value for the average standard deviation, an improvement over the status quo. We then use the same methods to bound the statistical distance between the two distributions. For that, we first introduce terminology into which we recast the problem. Then we introduce an algebraic basis for computing the statistical difference. To simplify the summation quantity, we introduce a geometric interpretation of the summation set, which allows us to simplify the quantity we are counting. Finally, we introduce a combinatorial recasting of the problem which allows us to reduce our geometric summation to a closed form.