Anonymous, zero-server-trust couple matching - Part 1

NOTE:: The algorithm used has been re-worked, and changed since this article was written. Expect a new article with the formal algorithm soon :)

Update: The new article with the latest algorithm is here on my blog. Do read that article as a follow up!

The queerly named Puppy Love platform has been running in my university since 2014, meant to help shy nerds meet their crush, made my the Programming Club (of which, I happen to be the current coordinator).The platform opens 1 week prior to the Valentine’s Day every year, and let’s people choose up to 4 of their crushes, encrypting their choices with their password. At the stroke of the midnight hour, everyone logs back, and their choices are decrypted, and are then matched to other people’s choices. Only the people who matched are informed about it. If your love was unrequited (the other person didn’t like you), you will not get to know. More importantly, if you did not like the other person, you would not know if that person liked you or not.

The above seemed like a very rosy thing, till people began to find flaws. Some of the issues were with the code, and nothing too serious. But as would have, people began to get doubtful about the trust worthiness of the server admins. This year, being my turn to face the onslaught of accusations of looking at people’s choices, I, with Vinayak Tantia, a friend and fellow Coordinator, came up with a much improved algorithm to implement the same thing.

Here’s the biggest issue with such a platform:

You do NOT trust me when I say my code on the server is NOT seeing your choices


What is required is, how I call it, a zero-server-trust based algorithm. Regardless of what code is running on the server, the client should be able to prove that any information it sends to the server shall not reveal any information about the client’s choices.

The algorithm

Email based initial auth

Instead of using the campus FTP or proxy based logins, which would have required you to send your password to the server (thus requiring trust), we send an initial email to the person’s official IIT Kanpur email address, and let the person create a password. The password’s hash is sent to the server for storage. The hash will be used for future logins, and thus the plain password never leaves the client.

Public Private Keys

On the initial password creating, we also create a new public-private keypair on the client itself. The private key is then encrypted with the plaintext password of the user. The plaintext public key and the encrypted private key are then kept on the server.

The usage

The user is allowed to query for people and add them to his tentative list of likes. The queries are routed to the IIT Kanpur internal student search, and thus the server cannot be logging the client’s requests. These choices are saved in a string, which has a few random bytes in the start. This string is encrypted with the private key, and sent to the server for storage and later retrieval. The random string is added to ensure that the server cannot deduce addition or removal of choices. Thus, whether the choices were changed or not, there shall be a change in the encrypted data.

Submission

Finally, when the user is ready to lock his choices, the client fetches the public keys of all the students on campus. This data should be the order of magnitude of a few mega bytes. The client encodes its preference (a boolean) in a string (described later) for each student on the campus, and sends these strings back to the server for storage. Each string is encrypted with the public key of the intended recepient, and thus they are meaningless for the server.

Matching

Note: This section is deprecated in favor of the algorithm described in the follow up post.

The most tricky part. How does the server match two people, without knowing any of their choices? Turns out, there’s something called Homomorphic computation, which is intended for this very purpose. Here’s a motivating example:


$$ m_1^{e} \times m_2^{e} = (m_1 \times m_2)^{e}\\ ((m_1 \times m_2)^{e})^{d} = (m_1 \times m_2) $$

So basically, the server could compute the product of m1 and m2, and convey the result to both the parties. They could separately get back the value of m1 × m2, without the server ever coming to know the product. All this requires is a shared public private key pair between every two people on the campus.

Of course, we’re not going to be using product, since a product has enough information to get the other person’s response. What is required is a message of the sort such that it encodes the choice, but can not be inverted to get the choice. Only when it is used with another message containing a positive choice is it possible to get any information out of the message.

Update: The above technique means that if the database is compromised, any person can take the other person’s response, forge a new response from his side, and see what choices were filled. To disallow this, the algorithm needs to be changed into a two-step method (both people share their choice in the form of two messages), where the second message is dependent on the first message of the other person. We are still working on deciding the exact form to be taken.

Server requirements

There shall be a large number of entries in the database. For each pair of people, there shall be one entry in the database storing their messages, and their shared secret. The server is only expected to be providing the service it is claiming to provide, that is, letting people match. Thus, we only assume that the server shall not be trying to jeopardize the whole matching game by not playing its part. At no point is the server expected to maintain any level of trust.

Conclusion

The project is being written using Node and JavaScript, and the source code is publicly available on Github here. We hope to finish it within a month, considering that the semester is on.