Zero Knowledge Proofs and Merkle Trees – An overview before diving into it

If you care about your personal data and preferences (and you get more and more worried with every data leak out there), like your official identification documents, social security or health insurance numbers, you probably encountered a few situations where you questioned if the service you’re going to consume or subscribe to really needs all that information. What does it do with your data, you may ask? Why does the drug store ask you for key information that may strongly relate you as an individual with the meds you are taking? Can this data be used by your health insurance to try and charge you more next year? Or maybe, if you wish to prove that you have what it takes to get a loan, you have a good credit score and the income, but you don’t want to share your means of living? I suppose you pay your taxes and the government probably knows exactly what you do, so why expose yourself again to a private company that nobody knows where it stores your data or what it does with it?

Trust is really important in business, but we have no idea who is handling and manipulating our private information. That happens with, or even without our knowledge, as we have signed it away and agreed with it by accepting the terms of use of that service. They are often too long with difficult words, and not everyone has the base knowledge needed to read it. The concepts I’m going to show you are not new, but they are now very viable and becoming of great importance nowadays.

What is Zero Knowledge Proof

Also known as ZKP, this was first conceived in a paper in 1985 [1] and it is a cryptographic method to prove that Person A has the information required by Person B, but without revealing said information or who is Person A to Person B.There are plenty of algorithms that implement this concept of ZKP, each for a kind of proof. Between them, there are two huge groups: the non-interactive and the interactive ZKP.

  • Interactive ZKP requires both parties to interact. Person A needs to convince Person B that something is true or not true, but the resulting proof is intransferable. Only both parts of the interaction know that the test was valid or not. If someone else wants for you to prove it again, you should do all the interactions to prove that all over again.

  • Non-interactive ZKP provides a set of information (proof) that anyone can verify themselves. The proof is just there and it is probably encrypted or hidden somehow, everyone can check it without asking someone to act in any forms.

Interactive Zero Knowledge Proofs

Ludic examples

There are a few classic examples that give shape at a very high level of proving something without telling what the truth is.

  1. The two balls and the colorblind: Assume that you have a friend that is colorblind, thus he cannot tell if something is green or red. There are two balls, one red and one green. He claims that the balls have the same color, and you want to prove to him that they are not the same color, without telling him which ball is red or green. The test will go like this: The colorblind will grab each ball with each hand, show you in which hand they are, then he’ll put them behind him. Now he chooses either he shows them as he just did or either he switches them without you noticing it. After that, he shows you both balls and you tell him if he switched them or not. This way you kind of proved that you know when he changes the balls or not with a pretty wide rate of 50% of confidence. Of course that is far from enough, since you may be colorblind as well and just threw a lucky shot and guessed the switch act. So to reduce this we repeat this test as many times as we want or the counterpart becomes convinced, and the answer being either one of “switched” or “not switched” we divide by two the chance of cheating on the colorblind friend. If we run the test twice, we have ½ * ½ = ¼ or 25% of chance of being cheated. Run it 10 times and have a (½)^10 chance of being cheated. [1]

  2. The Ali Baba Cave: There is a cave with two corridors (A and B) that encounter at some point, forming a circle but you cannot walk across them entering A and exiting B, because they are separated by a locked door. There is one entrance and one exit for these two corridors. The door may be locked by some sort of magic, some combination of keys… doesn’t matter. The only thing that matters is that no one besides one special person can open that door and walk freely entering either corridor A or B, and exiting at the another. Even more, nobody can see him opening the door, at the risk of disclosing the magic words or the hidden bolt that locks the door. For you visiting the cave, to prove that the door-keeper can open the door, you’ll need some algorithm that allows him to do his work opening the door without you seeing his secrets. The test can be done by you not looking when the keeper enters the corridor he chooses, then you yell for him to exit in the corridor you wish: A or B. If he appears in the right corridor, he probably has the ability to open that door, but that could be just a lucky shot. You should tell him to exit a few more times just to make sure. [1]

Another couple of pretty cool examples, all involving cryptography, are the ones that allow you to provide a company with your personal information without actually giving it to them. Of course they need to allow you to do so, by integrating their systems and services to this privacy measure. For instance, ING, the dutch bank, proposed an algorithm that allows clients to prove that their salaries are within a range (Zero Knowledge Range Proof), or either they are included in some set of allowances (Zero Knowledge Set Proof) [2]. They even wrote some code in Go so we can see some examples [3].

Non-Interactive Zero Knowledge Proofs

One of the most straightforward ways of implementing non-interactive ZKP is making a representation of a set, to be checked later if it contains part of the required information. We’re going to take a quick look at a way of implementing a Merkle Tree and make it hide information.

What’s a Merkle Tree?

Merkle Trees are binary trees that digest information that sits on their leaves, hashing each pair until it outputs one single hash, called Root [4].


Gently borrowed from Wikipedia [4].

But if we want to have a single hash to represent all the data at the input, we could simply hash (L1, L2, L3, L4), right? Well, it depends. Many sources of file sharing provide you with the digest hash of the file or even the checksum, so you can prove that the file you downloaded for hours is really the file you wanted. Here we are presented with the first problem: we need to have the whole file before checking it. So, this is where the Merkle Trees shines. Assuming we want to download something that is divided into many parts, and perhaps each part is hosted by a different source, we can be extra-paranoid and check if each of the downloaded parts is really part of the whole file you’re downloading. And then maybe if each file hoster is part of a distributed network, why not kick him out (or at least ignore him) if he is providing bad blocks of data?

The trick here is that Merkle Trees allow us to check if something is part of a whole and makes checking them easier for the computer to process, since there is less data to hash. But for both cases, the checksum of the entire file and the Merkle Proof of the tree of blocks need to be provided by a trusted source, like the entity that originally made the data available. They’ll initially publish something like “we’re providing this download divided into 256 parts across a file sharing network, and the root hash is 0xb123abc…In case you need to check if your downloaded part is really part of the full download, send us the number of the file and we’ll send you the path that generates the root hash”. Of course this process will probably be managed by a program, you don’t need to calculate it by hand.

To verify if something is a part of the tree, we’ll need to get a few hashes from the trusted provider. The number of hashes needed is described as the log2(N), being N the number of original leaves.


Gently borrowed from Smart Contract Programmer, on Youtube [5]

In the above example, if we want to check if Block 3 or 4 (counting the first block as Block 1) are a part of the tree, and we are sent 3 hashes, we’ll estimate the original tree as having (2³) 8 blocks of information. If we already know that there were originally 8 blocks, we’ll expect to receive (log2(8)) 3 hashes. The fourth hash is the one we have and want to check if it is part of the tree.


Gently borrowed from Belavadi Prahalad on his Medium Article [6]

The above example has 16 blocks, so we’ll need (log2(16)) 4 hashes to get to the root.

Merkle Tree’s zero knowledge proof in Blockchain

There is an example of an application that’s to include the root of a Merkle Tree in a block of a blockchain to prove that a bunch of transactions (perhaps in a sidechain) were included. For example, if we want to put 1000 transactions we can either calculate the hash of the concatenated transactions to obtain a single root hash, or we calculate the root hash of these transactions in a Merkle Tree. The first approach, for us to verify that the transaction was included, will require us to know every single transaction used to generate the hash. The second approach can hide the transactions from people that are not supposed to know them, and obtain a Merkle Proof, saying if the transaction belongs to that tree or not.

One drawback is exemplified in an anecdote about three bakers that knew a cookie recipe. Each one of them are nodes of a network that can make those cookies and prove they still know the recipe to other members of the network. No one else knows the recipe, but everyone can send them cookies and they’ll tell if it was made by one of them. The problem happens when they age and die, burying the recipe with them, and all that remains is the knowledge of a secret recipe, and perhaps some cookies (the result of the secret), but no one else can tell when a new cookie appears if it was made by the original bakers. The secret itself is lost forever. [7]

That’s why it’s good to keep the source that generated the tree referenced somewhere, because the hash registered on blockchain is useless if no one knows what it stands for.

Conclusion

In this article, we went through the initial steps and concepts of applying a technique to hide information from people that do not need to know it. We had a general idea of how it is possible to give just enough information to someone, to prove if the claimer is who it claims to be. All that in high level examples, with a pinch of Merkle Tree building. In the next article, I’ll show some code implementation and practical examples.

REFs

[1] https://en.wikipedia.org/wiki/Zero-knowledge_proof

[2] https://www.ingwb.com/themes/distributed-ledger-technology-articles/ing-launches-major-addition-to-blockchain-technology

[3] https://github.com/ing-bank/zkrp

[4] https://en.wikipedia.org/wiki/Merkle_tree

[5] https://www.youtube.com/watch?v=n6nEPaE7KZ8&ab_channel=SmartContractProgrammer

[6] https://medium.com/crypto-0-nite/merkle-proofs-explained-6dd429623dc5

[7] https://www.youtube.com/watch?v=OcmvMs4AMbM&ab_channel=SimplyExplained

We want to work with you. Check out our "What We Do" section!