Cryptography in Rust for Hackers

A large part of the documentation on cryptography is written by academics for other academics to read.

We explain introductory number theory and abstract algebra, symmetric encryption (block and stream ciphers), asymmetric encryption (RSA, elliptic curves), hashes, signatures, key exchange, polynomial commitments, and SNARKs and STARKs.

XChaCha20 and Poly1305, BLAKE2b, Poseidon, Argon2i, X25519, EdDSA (RFC 8032), Ed25519.

References

Books

Computational complexity theory

Number Theory

  • https://crypto.stanford.edu/pbc/notes/numbertheory/
  • https://explained-from-first-principles.com/number-theory/#extended-euclidean-algorithm
  • https://youtube.com/playlist?list=PL8yHsr3EFj53L8sMbzIhhXSAOpuZ1Fov8

Algebra

  • https://youtube.com/playlist?list=PL8yHsr3EFj52XDLrmvrFDgwcf6XOm2TEE
  • https://youtube.com/playlist?list=PLL0ATV5XYF8AQZuEYPnVwpiFy0jEipqN-
  • https://youtube.com/playlist?list=PLL0ATV5XYF8CP3A00vb4qjTt6jWmXL2K_
  • Modern computer algebra https://www.cambridge.org/core/books/modern-computer-algebra/DB3563D4013401734851CF683D2F03F0#
  • https://xn--2-umb.com/22/ntt-argument/index.html
  • https://youtu.be/HpzVD1l3Olw
  • https://youtu.be/Buv4Y74_z7I
  • [Arithmetic of Elliptic Curves](https://link.springer.com/book/10.1007/978-0-387-09494-6)
  • [Algebraic varieties](https://www.youtube.com/playlist?list=PL8yHsr3EFj53j51FG6wCbQKjBgpjKa5PX)

SNARKs

PLONK

STARKs

  • STARK Brainfuck
  • [Bitcoin stark](https://github.com/bitcoin-stark/khepri)
  • [Bitcoin stark 2](https://github.com/lucidLuckylee/zerosync)
  • [Arithmetization](https://cronokirby.com/posts/2022/09/notes-on-stark-arithmetization/)
  • [TritonVM](https://github.com/TritonVM/triton-vm)

Courses

Challenges

  • Cryptopals
  • [Matasano](https://www.youtube.com/watch?v=iZa_XKpj9X4)
  • [Cryptohack](https://cryptohack.org/challenges/)

Implementations

  • Monocypher
  • Disco-c
  • arkworks
  • [py_plonk](https://github.com/ethereum/research/tree/master/py_plonk)
  • [curdle_proofs](https://github.com/asn-d6/curdleproofs/tree/main/src)
  • [aes](https://github.com/secworks/aes)
  • [chacha](https://github.com/secworks/chacha/blob/master/src/rtl/chacha_core.v)

Examples of attacks

Applications

Miscellaneous

  • https://soatok.blog/2020/06/10/how-to-learn-cryptography-as-a-programmer/
  • Elliptic curves https://curves.xargs.org/
  • Lattice based cryptography https://medium.com/cryptoblog/what-is-lattice-based-cryptography-why-should-you-care-dbf9957ab717
  • https://www.youtube.com/watch?v=bI7lmKCAmA0
  • https://betterprogramming.pub/understanding-zero-knowledge-proofs-through-the-source-code-of-tornado-cash-41d335c5475f
  • https://medium.com/@boneh/using-zk-proofs-to-fight-disinformation-17e7d57fe52f
  • https://vitalik.eth.limo/general/2022/08/04/zkevm.html
  • https://tradelayer.substack.com/p/trade-offs-in-zk-design-space
  • https://youtu.be/mH0oCDa74tE
  • https://btc.usespiral.com/
  • https://github.com/ethereum/py_ecc/blob/master/py_ecc/bls12_381/bls12_381_pairing.py
  • https://zkrepl.dev/
  • https://zeroknowledge.fm/246-2/
  • https://zcash.github.io/halo2/#minimum-supported-rust-version
  • https://crypto.stanford.edu/cs355/22sp/schedule/
  • https://docs.gnark.consensys.net/en/latest/Concepts/schemes_curves/
  • https://github.com/baro77/ZKbasicsCS/blob/main/ZKbasicsCheatsheet20220621.pdf
  • https://hackmd.io/@gnark/eccbench
  • https://eprint.iacr.org/2022/1223
  • https://vitalik.ca/general/2017/01/14/exploring_ecp.html
  • [MSM](https://hackernoon.com/optimization-of-multi-scalar-multiplication-algorithm-sin7y-tech-review-21)
  • [MSM2](https://youtu.be/Bl5mQA7UL2I)
  • https://people.csail.mit.edu/silvio/Selected%20Scientific%20Papers/Proof%20Systems/Computationally_Sound_Proofs.pdf
  • https://eprint.iacr.org/2013/879.pdf
  • https://eprint.iacr.org/2022/1164
  • https://medium.com/@ingonyama/cloud-zk-a-toolkit-for-developing-zkp-acceleration-in-the-cloud-3d670c09c6ed
  • https://cronokirby.com/notes/2022/08/on-ram-in-structured-computation/
  • https://eprint.iacr.org/2012/071
  • https://eprint.iacr.org/2013/507
  • http://www.scipr-lab.org/doc/TinyRAM-spec-0.991.pdf
  • https://blog.fluidity.money/the-hunting-of-the-zk-snark-homomorphic-hidings-aa6c7824597?gi=49de0fc52df
  • https://www.michaelstraka.com/posts/recursivesnarks/
  • https://www.iacr.org/archive/asiacrypt2010/6477178/6477178.pdf
  • https://decentralizedthoughts.github.io/2020-12-22-what-is-a-merkle-tree/
  • https://decentralizedthoughts.github.io/2020-08-28-what-is-a-cryptographic-hash-function/
  • https://m.youtube.com/watch?v=g_eY7JXOc8U
  • https://people.csail.mit.edu/silvio/Selected%20Scientific%20Papers/Pseudo%20Randomness/Verifiable_Random_Functions.pdf
  • https://crypto.stanford.edu/pbc/notes/crypto/
  • https://github.com/supranational/sppark
  • https://aimath.org/news/congruentnumbers/howtomultiply.html
  • http://numbers.computation.free.fr/Constants/Algorithms/fft.html
  • https://cr.yp.to/papers/pippenger-20020118-retypeset20220327.pdf
  • https://cp-algorithms.com/algebra/fft.html#two-stripes
  • https://youtu.be/IOiZatlZtGU
  • https://cs3110.github.io/textbook/chapters/adv/curry-howard.html
  • https://www.pédrot.fr/slides/inria-junior-02-15.pdf
  • https://vitalik.ca/general/2022/03/14/trustedsetup.html
  • https://github.com/Entropy1729/bft-consensus-poc/
  • https://blog.cryptographyengineering.com/2011/09/29/what-is-random-oracle-model-and-why-3/
  • https://blog.cryptographyengineering.com/2017/01/21/zero-knowledge-proofs-an-illustrated-primer-part-2/
  • https://blog.cryptographyengineering.com/2014/11/27/zero-knowledge-proofs-illustrated-primer/
  • https://blog.cryptographyengineering.com/
  • https://eprint.iacr.org/2006/372.pdf
  • https://github.com/DelphinusLab/zkWasm.git
  • https://twitter.com/SalomonCrypto/status/1581314867243327489?s=20&t=SL7BbhF99hyqKHo1HinAbQ
  • https://twitter.com/BTC_Archive/status/1580857619664670725
  • https://arxiv.org/pdf/2210.00264.pdf
  • https://www.cryptologie.net/article/507/the-missing-explanation-of-zk-snarks/
  • https://a16zcrypto.com/measuring-snark-performance-frontends-backends-and-the-future/
  • https://www.quantamagazine.org/mathematicians-discover-the-perfect-way-to-multiply-20190411/
  • https://github.com/LeastAuthority/moonmath-manual/raw/main/main-moonmath.pdf
  • https://xiaohuiliu.medium.com/how-plonk-works-part-2-1072dcd7634a
  • https://xiaohuiliu.medium.com/how-plonk-works-part-1-bc8050f4805e
  • https://delendum.xyz/2022/10/12/a-brief-taxonomy-of-circuit-compilation-strategies.html
  • https://eprint.iacr.org/2013/279.pdf
  • https://eprint.iacr.org/2021/322
  • https://arxiv.org/abs/2112.01472
  • https://twitter.com/delendumv/status/1581068663448555521
  • https://github.com/thor314/pazk
  • https://twitter.com/salomoncrypto/status/1580677281474699264
  • https://twitter.com/salomoncrypto/status/1581695845023350785
  • https://hackernoon.com/trade-it-like-it-is-hot-a-review-of-popular-zk-projects-and-the-zero-knowledge-proof-technology
  • http://www.zeroknowledgeblog.com/index.php/the-pinocchio-protocol
  • https://medium.com/aztec-protocol/aztecs-zk-zk-rollup-looking-behind-the-cryptocurtain-2b8af1fca619
  • https://github.com/worldcoin/awesome-zkml
  • https://twitter.com/salomoncrypto/status/1583705993300492288
  • https://xiaohuiliu.medium.com/zk-snarks-on-bitcoin-239d96d182bd
  • https://www.espressosys.com/blog/veri-zexe-decentralized-private-computation-with-universal-setup
  • https://eprint.iacr.org/2022/802
  • https://github.com/fluidex/awesome-plonk
  • https://eprint.iacr.org/2020/499.pdf
  • https://www.iacr.org/cryptodb/data/paper.php?pubkey=30562
  • https://github.com/ZK-Garage/plonk/blob/master/plonk-book/src/chapter_2.md
  • https://github.com/arielgabizon/Lectures

Randomness

Pseudorandom number generators

Introduction

Applications

Information Entropy

Security requirements for CPRNG

Bad uses of PRNG

Foundations

Introduction

Math is at the core of cryptography. It gives us a framework to work with and to prove that cryptographic primitives are safe or, at least, that some problems are really hard to solve. Public key cryptography is possible because some calculations are fast to perform in one direction but very hard to reverse, and that is purely a matter of math. Even though a good understanding of math is important, many of the key concepts in cryptography are easy to grasp and related to elementary school math. Besides, you don't have to be really good at doing calculations; you just have to understand how to implement the different cryptographic primitives in a secure way. For example, the RSA cryptosystem relies on finding prime numbers, computing multiplicative inverses (in short, given \( a \), find \( b \) such that \( a\times b=1 \)) and efficiently computing powers of numbers; the method owes its strength to the difficulty of factoring a number into its prime factors. We will now explain each of these concepts in more detail (though many of these ideas will likely ring a bell) and, more importantly, how to implement them in practice. There are, of course, some underlying subtleties, and at some points we will need to go down the rabbit hole, but this will be much easier if we know the reasons why we need it.

Natural Numbers

These are the numbers we use to count objects, such as \( 0,1,2,3,4,5 \), etc. We represent them by \( \mathbb{N} \). In elementary school, we were taught how to add, subtract, multiply and divide two numbers from this set and the properties of these operations. We can think of any of them as a function \( f \) taking two inputs from the set, \( x \) and \( y \), and returning some output. For example, addition is simply \( f(x,y)=x+y \). This way of thinking will come in handy when we deal with more complex data types. Rust supports the natural numbers among its primitive data types: the unsigned integers u8, u16, u32, u64, u128 and usize.

Given any natural numbers \( a \) and \( b \) (with \( b \neq 0 \)), we can write \( a \) in the following way: \( a=q\times b+r \), where \( q \) and \( r \) are two numbers (called the quotient and the remainder), with \( r < b \). This is what we did at school when learning division with remainder. The decomposition is unique and it is known as the division algorithm. It will give us the basis for modular arithmetic, which has many applications in cryptography. If \( b=5 \), the remainder of the division of any number can only take the values \( 0,1,2,3,4 \), which will help us separate the natural numbers into different "boxes", according to their remainder. A useful application is the case \( b=2 \), where the remainder is only \( 0 \) or \( 1 \). When the remainder is \( 0 \) the number is even, otherwise it is odd: we can check whether any large number is odd or even just by checking the last bit!

We say that \( a \) is divisible by \( b \) if the remainder of the division of \( a \) by \( b \) is \( 0 \), that is, \( a=q\times b \) (in other words, \( a \) is a multiple of \( b \)). We will write this as \( b \mid a \); if \( b \) does not divide \( a \), we write \( b \not\mid a \). For example, \( 25 \) is divisible by \( 5 \) since \( 5\times 5=25 \), but not by \( 2 \), since \( 25=12\times 2+1 \) and \( r\neq 0 \).
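As a quick sanity check, here is a minimal Rust sketch of division with remainder and the parity trick, using the standard div_euclid and rem_euclid methods:

fn main() {
    let (a, b): (u64, u64) = (25, 7);

    // Division with remainder: a = q*b + r, with r < b.
    let q = a.div_euclid(b);
    let r = a.rem_euclid(b);
    assert_eq!((q, r), (3, 4));
    assert_eq!(a, q * b + r);

    // Parity: looking only at the last bit tells us whether a number is even or odd.
    let n: u64 = 1_000_003;
    assert_eq!(n & 1, 1); // last bit is 1, so n is odd
    assert_eq!(n % 2, 1); // same conclusion from the remainder
}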

Prime Numbers

An important subset of \( \mathbb{N} \) is that of prime numbers. We say that \( p>1 \) is a prime number if the only numbers that divide \( p \) are \( 1 \) and \( p \). For example, \( 2,3,5,7,11,13,17,19,23 \) are all prime numbers. On the other hand \( 4,6,8,10 \) have \( 2 \) as a divisor (besides \( 1 \) and themselves), so they are not prime (they are called composite numbers).

The fundamental theorem of arithmetic states that any number can be expressed in a unique way (except for the order) as the product of prime powers. Let's look at some examples: \( 5=5^1 \) \( 20=2^2\times 5 \) \( 60=2^2\times 3 \times 5 \) \( 35=5\times 7 \) \( 1000000=2^6\times 5^6 \)

When we get this decomposition, we have found the prime factors of the number and we say it is factorized. For small numbers, factorization is quite straightforward (we can try different primes, divide and check remainders); for very large numbers, however, it can be really challenging, as there are no general efficient algorithms.
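To make this concrete, here is a naive trial-division sketch in Rust (the helper name factorize is just illustrative). It is fine for small numbers and hopeless for the sizes used in cryptography, which is precisely the point:

// Factor n by trial division, returning (prime, exponent) pairs.
fn factorize(mut n: u64) -> Vec<(u64, u32)> {
    let mut factors = Vec::new();
    let mut p = 2;
    while p * p <= n {
        let mut e = 0;
        while n % p == 0 {
            n /= p;
            e += 1;
        }
        if e > 0 {
            factors.push((p, e));
        }
        p += 1;
    }
    if n > 1 {
        factors.push((n, 1)); // whatever remains is itself prime
    }
    factors
}

fn main() {
    assert_eq!(factorize(60), vec![(2, 2), (3, 1), (5, 1)]);
    assert_eq!(factorize(1_000_000), vec![(2, 6), (5, 6)]);
}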

Euclid's lemma states that if a prime number \( p \) divides the product of two numbers \( a \times b \), then \( p \mid a \) or \( p \mid b \).

Greatest Common Divisor: The Euclidean algorithm

Another important concept is that of the greatest common divisor: given two numbers \( a \) and \( b \), we want to find the largest number \( c \) such that \( c\mid a \) and \( c\mid b \). We denote this by \( c=gcd(a,b) \) or simply \( c=(a,b) \). For example, \( 20=2^2\times 5 \) and \( 50=2\times 5^2 \). Both numbers are divisible by \( 1,2,5,10 \). \( 10 \) is the greatest number dividing both, so \( gcd(20,50)=10 \). Two numbers \( a,b \) are called relatively prime (or coprime) if \( gcd(a,b)=1 \). If \( a \) and \( b \) are both prime (and different), \( 1 \) is the only common divisor. However, \( 8 \) and \( 9 \) are not prime themselves (\( 8=2^3 \) and \( 9=3^2 \)), but their only common divisor is \( 1 \), so they are coprime.

The greatest common divisor satisfies the following equation, for some integers \( x \) and \( y \) (this is known as Bézout's identity): \( x\times a+y\times b=gcd(a,b) \) The greatest common divisor can be found very efficiently using the Euclidean algorithm. A simple Rust implementation is given below:

pub fn gcd(a: u128, b: u128) -> u128 {

    // r1 holds the dividend and r0 the divisor; we assume a and b are positive.
    let mut r0: u128 = b;
    let mut r1: u128 = a;

    // If a is smaller than b, swap the roles so that the larger number is the dividend.
    if a < b {
        r0 = a;
        r1 = b;
    }

    let mut r2: u128 = 1;

    // Replace (dividend, divisor) by (divisor, remainder) until the remainder is zero;
    // the last non-zero remainder is the greatest common divisor.
    while r2 > 0 {
        r2 = r1.rem_euclid(r0);
        r1 = r0;
        r0 = r2;
    }

    r1
}

We take two numbers \( a \) and \( b \) and output their greatest common divisor. If \( a \) is smaller than \( b \), we initialize the dividend \( r_1 \) as \( b \) and the divisor \( r_0 \) as \( a \) (so that we always reduce the larger number by the smaller one); otherwise, we keep the original selection. At each step we reduce \( r_1 \) modulo \( r_0 \) and swap the roles (since the remainder \( r_2 \) is smaller than \( r_0 \)). A numerical example helps illustrate the point:

  1. Take a=12, b=8 (we can immediately see that the right answer is 4, but this helps us see how the algorithm finds it).
  2. \( r_0=8 \), \( r_1=12 \), \( r_2=1 \) so we immediately enter the while loop.
  3. \( r_2=4 \) since the remainder of \( 12 \) divided by \( 8 \) is 4.
  4. \( r_1=8 \) and \( r_0=4 \).
  5. Since \( r_2 \) is not zero, we stay inside the loop.
  6. \( r_2=0 \) (since \( 8 \) is divisible by \( 4 \)), \( r_1=4 \) and \( r_0=0 \).
  7. Now \( r_2=0 \) so we exit the loop and the function outputs \( gcd=4 \).

We can replace the division at each step of the algorithm with a subtraction. This saves us from performing an expensive operation such as division; since in many steps the quotient is \( 1 \), this change can improve performance. Of course, if the numbers are very different in size, doing the division will be faster, at least for the first few steps.

Application: Breaking RSA

The RSA cryptosystem contains a public parameter for each user, \( n \), which is the product of two large primes, \( p \) and \( q \). If we can factor \( n \), we can find the private key and decrypt messages or forge signatures. Suppose we have the public parameters \( n_k \) for many users. If two \( n_i \) share one prime factor (this is possible if the RSA implementation does not sample primes properly at random), we can use the greatest common divisor algorithm (which is very efficient) to find it. We take one \( n_j \) and compute the gcd with the product of all other \( n_k \) (this is much faster than trying each \( n_i \) against \( n_j \)): the gcd will either be \( 1 \) or one of the prime factors \( p \). In the latter case, we can compute \( q \) just by dividing \( n_j \) by \( p \).
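Here is a toy illustration of the attack, reusing the gcd function defined above with deliberately tiny "primes" (real RSA moduli are thousands of bits long):

fn main() {
    // Two toy RSA moduli that, because of a bad prime generator, share the factor 101.
    let n1: u128 = 101 * 103;
    let n2: u128 = 101 * 107;

    // The (efficient) gcd computation reveals the shared prime.
    let p = gcd(n1, n2);
    assert_eq!(p, 101);

    // Once one factor is known, the other follows from a single division.
    assert_eq!(n1 / p, 103);
    assert_eq!(n2 / p, 107);
}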

Chinese Remainder Theorem

Integers

One problem we have with the natural numbers is that we cannot always subtract two of them and stay in the set: for example, if we try \( 3-10 \) we get something that is not in the set. If we try that calculation using unsigned integers, Rust will panic with an overflow error. The subtraction operation is not closed in the natural numbers. The set of integers, \( \mathbb{Z} \), is obtained by adding the negative numbers: \( -1,-2,-3 \), etc. If we add a number and its negative, we get \( 0 \): \( 5+(-5)=0 \), \( 10+(-10)=0 \). We say that \( -a \) is the additive inverse (or opposite) of \( a \). Thanks to the incorporation of inverses, we can view subtraction \( a-b \) as the addition of \( a \) and \( -b \). The addition operation has three basic properties:

  1. We have an identity element, \( 0 \). For any \( a \) in \( \mathbb{Z} \), \( a+0=a \) .
  2. If \( a \) is in \( \mathbb{Z} \), so is \( b \) such that \( a+b=0 \); we commonly refer to \( b \) as \( -a \).
  3. For any \( a \), \( b \) in \( \mathbb{Z} \), then \( a+b \) is also in \( \mathbb{Z} \). This means that the operation is closed.

The set of integers \( \mathbb{Z} \) with the addition operation forms an important algebraic structure, which is called a group. Many different mathematical objects share the same structure, even if the operations and elements are totally different, and we have general results and tools to deal with them.

When we turn to division, we see that the operation is not closed. If we try \( 5/3 \), we do not get an integer. If we want the operation to be closed, we need to add new elements and arrive at the set of rational numbers, \( \mathbb{Q} \), which is the set of numbers that can be expressed as the ratio of two integers.

For cryptography, the integers (with the addition and multiplication of elementary school) have several drawbacks that limit their application. We can define both operations in a slightly different way, which will have interesting properties and provide us with operations that are easy to compute, but hard to reverse.

Modular Arithmetic

One inconvenience we face with computers is that the numbers we can work with are limited. Besides, in some cases we are not interested in a number itself, but rather in the class it belongs to. For example, when we bet at roulette, we can choose whether the result will be even or odd. If it is even, then \( r=2\times k \), for some \( k \in {0,1,2,...,18} \). If it is odd, then \( r=2\times k+1 \). We notice that if we want to check parity, we only need to look at the remainder, which can take two values in this case: \( 0 \) or \( 1 \). In fact, when we want to check whether a number is even on a computer, we look at the last bit and check whether it is zero or not. For the case of \( 2 \), we see that any number \( a \) satisfies either \( a\equiv 0 \pmod{2} \) or \( a\equiv 1 \pmod{2} \). We say that \( a \) is congruent to \( 0 \) (or \( 1 \)) modulo \( 2 \). This way, we split all the numbers into two categories: even and odd. We can do the same for any number \( p>1 \), remembering that the remainder satisfies \( 0 \leq r \leq p-1 \). The congruence \( a\equiv r \pmod{p} \) means that \( p\mid a-r \), or equivalently \( a=k\times p+r \). This notation was invented by Gauss and is really powerful for studying many complex problems. We can perform the usual operations, such as addition and multiplication, but we have to be careful about how they work, given that results will always have to land in the range \( 0 \leq r \leq p-1 \) (as a side note, we could choose a different set of representatives, such as \( -2,-1,0,1,2,...,p-3 \), but that can be confusing, so we will stick to our first choice).

In the case of the sum, we can add the numbers just as regular integers and, if the result is \( p \) or larger, take the remainder. For example, let's take \( p=7 \), so the elements we have are \( {0,1,2,3,4,5,6} \). First, we see that \( 0 \) is an element of the set and that adding it to any number does not change the result. If we add \( 2 \) and \( 3 \) the result is \( 5 \). If we add \( 5 \) and \( 4 \), we get \( 9 \), but \( 4+5=9\equiv 2 \pmod{7} \): \( 2 \) is just the remainder of the division of \( 9 \) by \( 7 \).

We see that the result stays in the original set. What happens when we add \( 4 \) and \( 3 \)? \( 4+3=7\equiv 0 \pmod{7} \) We get \( 0 \)! That is because \( 7 \) is divisible by itself and the remainder is \( 0 \). We see that \( 4 \) is the additive inverse of \( 3 \) under this arithmetic. Similarly, \( 1 \) and \( 6 \) are each other's inverse, as are \( 2 \) and \( 5 \). We can recognize that the set \( {0,1,2,3,4,5,6} \) with the sum done modulo \( 7 \) has the same properties as the ordinary addition in the integers. This means that the set with this addition modulo \( p \) forms a group. Subtraction can be easily defined as adding the inverse of the number or just performing ordinary subtraction and then taking the result modulo \( p \).

With multiplication we get something similar. For example, \( 4\times 5=20\equiv 6 \pmod{7} \). Taking the modulo operation ensures that we always stay inside the set. We also see that \( 1 \) works as the multiplicative identity since any number multiplied by \( 1 \) stays the same. Let's look at what happens with \( 6\times 6 \): \( 6\times 6=36\equiv 1 \pmod{7} \). We multiplied \( 6 \) by itself and got \( 1 \)! Division \( a/b \) can be restated as \( a\times b^{-1} \), where \( b\times b^{-1}=1=b^{-1}\times b \). We see that \( 6 \) is its own multiplicative inverse with the multiplication modulo \( p \). We can also see that: \( 3\times 5=15\equiv 1 \pmod{7} \) \( 2\times 4=8\equiv 1 \pmod{7} \) So, \( 3=5^{-1} \) and \( 2=4^{-1} \)! This can sound weird, but we have to remember that we are working with congruences. We can understand the precise meaning of this by rephrasing. Let's take the case of \( 6 \) and \( 6 \). There are two numbers \( a=q_1\times 7+6 \) and \( b=q_2\times 7+6 \) (because that is what the congruence means). Let's take the product \( a\times b \): \( a\times b=(q_1\times 7+6)\times (q_2\times 7+6) \) Let's apply the distributive law: \( a\times b=q_1\times q_2 \times 7^2+6\times 7\times (q_1+q_2)+36 \) Let's split \( 36=1+35=1+7\times 5 \) and regroup, taking \( 7 \) as a common factor: \( a\times b=7\times (q_1\times q_2\times 7+6\times(q_1+q_2)+5)+1 \) The first term is divisible by \( 7 \), so it is congruent to \( 0 \); equivalently, if we subtract \( 1 \) from \( a\times b \), we get something divisible by \( 7 \) (since it is the product of \( 7 \) and an integer), which is exactly what \( a\times b\equiv 1 \pmod{7} \) means.
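These congruences are easy to check in Rust with the built-in remainder operator; a minimal sketch for the examples modulo 7:

fn main() {
    let p: u64 = 7;

    // Addition wraps around: 4 + 5 = 9 ≡ 2 (mod 7).
    assert_eq!((4 + 5) % p, 2);
    // 4 + 3 = 7 ≡ 0 (mod 7), so 4 and 3 are additive inverses of each other.
    assert_eq!((4 + 3) % p, 0);

    // Multiplication: 4 * 5 = 20 ≡ 6 (mod 7).
    assert_eq!((4 * 5) % p, 6);
    // 6 * 6 = 36 ≡ 1 (mod 7), so 6 is its own multiplicative inverse.
    assert_eq!((6 * 6) % p, 1);
    // 3 and 5 are multiplicative inverses of each other, and so are 2 and 4.
    assert_eq!((3 * 5) % p, 1);
    assert_eq!((2 * 4) % p, 1);
}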

Defining operations

We first need to implement some of the arithmetic operations and define field elements. We will show how to do it in Rust.

use std::ops::{Add, Sub, Mul, Div};
pub struct FieldPoint {
    num: u128,
    prime: u128,
}

The first line imports the Add, Sub, Mul and Div traits from the standard library, which will allow us to overload these operators with the operations we need for modular arithmetic.

In the second line, we define a public structure named FieldPoint, which has two fields: num (a number in the range 0 to prime-1) and prime (the modulus; all operations will be performed modulo prime). For practical applications, we would need to replace the unsigned u128 integers with a type capable of storing much larger integers.

We can now instantiate some methods over FieldPoint, such as how to create one or how to multiply or divide field elements.

impl FieldPoint {
    pub fn new(num: u128, prime: u128) -> FieldPoint {
        if num >= prime {
            panic!("Not a valid input for a field point, num should be nonnegative and less than prime, obtained {}", num);
        } else {
            FieldPoint { num, prime }
        }
    }
}

Methods are defined following the keyword impl and the name of the struct. We have a constructor for FieldPoint, which takes two u128 unsigned integers and checks that num is already reduced modulo prime.

To define addition, we can implement the trait Add for FieldPoint in this way:

impl Add for FieldPoint {
    type Output = Self;
    fn add(self, other: Self) -> Self {
        if self.prime == other.prime {
            FieldPoint {num: (self.num + other.num).rem_euclid(self.prime), prime: self.prime}
        } else {
            panic!("Cannot add these field points, different prime values {},{}",self.prime,other.prime);
        }
    }
}

The addition simply adds the num fields and takes the remainder of the Euclidean division of the sum by the modulus prime, so the result stays in the valid range.

Multiplication works in a similar way:

impl Mul for FieldPoint {
    type Output = Self;
    fn mul(self, other: Self) -> Self {
        if self.prime == other.prime {
            FieldPoint {num: (self.num*other.num).rem_euclid(self.prime), prime: self.prime}
        } else {
            panic!("Cannot multiply these field points, different prime values, {},{}",self.prime,other.prime);
        }
    }
}
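Assuming the FieldPoint struct and the Add and Mul implementations above live in the same module (so the num field is visible), a short usage sketch could look like this:

fn main() {
    let p: u128 = 7;

    // 5 + 4 ≡ 2 (mod 7)
    let sum = FieldPoint::new(5, p) + FieldPoint::new(4, p);
    assert_eq!(sum.num, 2);

    // 4 * 5 ≡ 6 (mod 7)
    let product = FieldPoint::new(4, p) * FieldPoint::new(5, p);
    assert_eq!(product.num, 6);
}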

We need to define integer powers of FieldPoint. We can do this efficiently by repeated squaring, reducing modulo prime at each step:

// This method goes inside the impl FieldPoint block.
pub fn power(&self, index: u128) -> Self {
    if index == 0 {
        FieldPoint { num: 1u128, prime: self.prime }
    } else {
        // Reduce the exponent modulo prime - 1; this is valid for prime moduli
        // thanks to Fermat's little theorem.
        let mut aux = index.rem_euclid(self.prime - 1u128);
        let mut acc: u128 = 1;
        let mut base: u128 = self.num;
        // Square-and-multiply: square the base when the exponent is even,
        // multiply it into the accumulator when it is odd.
        while aux > 0 {
            if aux % 2 == 0 {
                base = (base * base).rem_euclid(self.prime);
                aux = aux / 2u128;
            } else {
                acc = (acc * base).rem_euclid(self.prime);
                aux = aux - 1u128;
            }
        }
        FieldPoint { num: acc, prime: self.prime }
    }
}

The power function takes a FieldPoint and an exponent index, a u128. If the index is \( 0 \), the result is trivial and we output a FieldPoint with num equal to \( 1 \). In any other case, we first reduce index: we can take the remainder of index modulo prime-1, which works when the modulus is prime, since Fermat's little theorem (to be presented soon) says that \( a^{p-1}\equiv 1 \pmod{p} \). For a general modulus \( n \), we would instead reduce the exponent modulo \( \phi(n) \), where \( \phi \) is Euler's totient function. The reduced exponent is stored in aux. We also define a variable acc to accumulate the result and base, which we will repeatedly square and reduce modulo prime.

We now focus on the squaring and the updating of the result:

while aux > 0 {
    if aux % 2 == 0 {
        base = (base * base).rem_euclid(self.prime);
        aux = aux / 2u128;
    } else {
        acc = (acc * base).rem_euclid(self.prime);
        aux = aux - 1u128;
    }
}

We keep decreasing the exponent stored in aux: if it is even (the first branch; this check could be done even faster by inspecting the last bit of aux), we divide aux by two and replace base with the remainder of its square. If it is odd, we update the result in acc and decrease aux by one (which means that in the next step it will be even).

To convince ourselves, let's follow the steps on a short numerical example. Let's take prime as 11, num as 4, and index as 39.

  1. We set aux equal to the remainder of 39 divided by 10 (and \( 10 \) is also \( \phi(11) \)). We get aux=9.
  2. Since \( 9>0 \), we go inside the while loop. \( 9 \) is odd, so acc=1*4=4 and aux=8.
  3. aux is even, so base=4*4=16; we have to reduce the number by taking the remainder by \( 11 \), so base=5 and aux=4.
  4. aux is even, so base=5*5=25 and we get base=3 and aux=2.
  5. aux is once again even, base=9 and aux=1.
  6. aux is odd, so we get acc=4*9=36, which reduces to 3 modulo 11, and aux=0.
  7. Since aux=0, we jump outside the while loop and the function returns the FieldPoint (num=3,prime=11).
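We can check the walkthrough with a couple of lines, assuming the power method above is placed inside the impl FieldPoint block:

fn main() {
    // 4^39 ≡ 4^9 ≡ 3 (mod 11), as in the walkthrough above.
    let result = FieldPoint::new(4, 11).power(39);
    assert_eq!(result.num, 3);
}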

Computing inverses: Exponentiation or extended Euclidean algorithm?

Fermat's little theorem states that if \( p \) is a prime number, then \( a^p\equiv a \pmod{p} \). When \( a \) is not divisible by \( p \), we can also write this as \( a^{p-1}\equiv 1 \pmod{p} \) and see that \( a^{-1}=a^{p-2} \). This gives us a way to compute the multiplicative inverse of \( a \) by taking powers of it. It is clear that this involves many multiplications, so performing modular division (or inversion) this way is a costly operation, especially when the modulus is very large. As a matter of fact, we will see that using \( p-2 \) as the exponent overestimates the power needed for some numbers, since the order of \( a \) may be smaller than \( p-1 \). One advantage of this method is that, for a given prime, we perform exactly the same number of operations regardless of \( a \), so it can be implemented in constant time (which makes it resistant to timing attacks).

Inverses can be calculated alternatively with help from the extended Euclidean algorithm:

// Computes the inverse of a modulo b using the extended Euclidean algorithm.
// Returns 0 if a is not invertible, that is, if gcd(a, b) != 1.
pub fn inversion(a: i128, b: i128) -> i128 {

    // t tracks the Bézout coefficient of a, while r tracks the remainders.
    let mut t = 0i128;
    let mut r = b;
    let mut t1 = 1i128;
    let mut r1 = a;

    while r1 != 0i128 {
        let q = r.div_euclid(r1);
        (t, t1) = (t1, t - q * t1);
        (r, r1) = (r1, r - q * r1);
    }

    // If the gcd is not 1, a has no inverse modulo b.
    if r != 1i128 {
        return 0i128;
    }
    // Normalize the result to the range [0, b).
    if t < 0 {
        t = t + b;
    }

    t
}

Let's see how it works for a simple case: \( a=3 \), \( b=5 \); the inverse of \( 3 \) (modulo 5) is \( 2 \). The algorithm begins:

  1. \( t=0 \), \( t_1=1 \), \( r=5 \), \( r_1=3 \).
  2. Since \( r_1=3 \neq 0 \) we loop: \( q=1 \), \( t=1 \), \( t_1=0-1\times 1=-1 \), \( r=3 \), \( r_1=2 \).
  3. \( r_1 \neq 0 \), \( q=1 \), \( t=-1 \), \( t_1=1-1\times (-1)=2 \), \( r=2 \), \( r_1=1 \).
  4. \( r_1 \neq 0 \), \( q=2 \), \( t=2 \), \( t_1=-1-2\times 2=-5 \), \( r=1 \) and \( r_1=0 \).
  5. \( r_1 = 0 \), so the function outputs \( t=2 \), which is the correct answer.

In many cases, the extended Euclidean algorithm will find the inverse faster; the drawback is that the time needed to find the answer depends on the number we are trying to invert, which makes a naive implementation vulnerable to timing attacks.

We see that both multiplication and addition modulo \( p \) are closed operations in the set \( {0,1,2,...,p-1} \). We will soon see that this set with these operations forms a finite field.

Groups

We define a group as a (non-empty) set \( G \) together with a binary operation (that is, an operation that takes two input elements from the set \( G \)) \( \times \) satisfying:

  • G1: If \( a \) and \( b \) are in the set, then \( a\times b=c \) is also in the set.
  • G2: There is an identity element, \( e \), such that \( e\times a=a\times e=a \).
  • G3: If \( a \) is in the set, there is some \( b \) in the set such that \( a\times b=e \). We say that \( b \) is the inverse of \( a \) and denote it \( b=a^{-1} \).
  • G4: For \( a,b,c \), \( a\times (b\times c)=(a\times b)\times c \).

The notation in groups is sometimes confusing, since people freely use additive (\( + \)) or multiplicative (\( \times \)) notation and call the identity either \( 0 \) or \( 1 \). This doesn't matter much, since the binary operation can be quite exotic (such as "addition" on elliptic curves). If you start by looking at things a bit more abstractly, it will pay off very quickly.

\( \mathbb{Z}/n\mathbb{Z} \) as a group. Cyclic groups.

You will frequently see these sets denoted as \( \mathbb{Z}/p\mathbb{Z} \). We have to be very careful if we want to work with a composite \( n \) in \( \mathbb{Z}/n\mathbb{Z} \) (in that case, it is not a finite field either). For example, let's try to solve this equation: \( (x+2)\times(x+1)\equiv 0 \pmod{12} \). We could use our usual reasoning that, when the product of two numbers is \( 0 \), at least one of them is \( 0 \) (spoiler alert: this will go wrong):

  1. \( (x+2)\equiv 0 \pmod{12} \). If \( x=10 \), then \( x+2=12\equiv 0 \), since it is divisible by 12.
  2. \( (x+1)\equiv 0 \pmod{12} \). If \( x=11 \), then \( x+1=12\equiv 0 \), since it is divisible by 12. Let's now pick \( 2 \) and see what happens: \( (2+2)\times(2+1)=12\equiv 0 \pmod{12} \). So \( 2 \) is a solution to the equation, but \( 2+2\equiv 4\not\equiv 0 \) and \( 2+1\equiv 3\not\equiv 0 \). This happens because \( 12 \) is not a prime number.

As a matter of fact, given \( a \) and \( n \), \( a \) has an inverse (modulo \( n \)) if and only if \( gcd(a,n)=1 \), that is, \( a \) and \( n \) are coprime. In the previous example, neither \( 3 \) nor \( 4 \) is coprime to \( 12 \), which is why their product can be \( 0 \) modulo \( 12 \).

If the set is not too large, we can find inverses just by trial and error. However, it would be nice to have some results that help us compute inverses and how to calculate (integer) powers of numbers.

Let's focus on a prime number \( p \) and take all the non-zero elements of the set, \( (\mathbb{Z}/p\mathbb{Z})^\star \). Let's fix \( p=7 \), so \( (\mathbb{Z}/p\mathbb{Z})^\star={1,2,3,4,5,6} \) and let's focus on multiplication over the set. We can define the power \( a^n=a\times a\times a\times ...\times a \). Obviously, \( 1 \) is not interesting, because \( 1^n=1 \), so let's take \( 5 \): \( 5^1\equiv 5 \pmod{7} \) \( 5^2\equiv 4 \pmod{7} \) \( 5^3\equiv 6 \pmod{7} \) \( 5^4\equiv 2 \pmod{7} \) \( 5^5\equiv 3 \pmod{7} \) \( 5^6\equiv 1 \pmod{7} \) \( 5^7\equiv 5 \pmod{7} \) \( 5^8\equiv 4 \pmod{7} \) \( 5^{13}\equiv 5 \pmod{7} \) We see that the powers of \( 5 \) span all the elements of the group. We also see that numbers repeat themselves at an interval of \( 6 \), that is \( 4=5^2=5^8=5^{14}... \). Let's look at \( 3 \): \( 3^1\equiv 3 \pmod{7} \) \( 3^2\equiv 2 \pmod{7} \) \( 3^3\equiv 6 \pmod{7} \) \( 3^4\equiv 4 \pmod{7} \) \( 3^5\equiv 5 \pmod{7} \) \( 3^6\equiv 1 \pmod{7} \) \( 3^7\equiv 3 \pmod{7} \) We got all the elements (albeit in a different order). Finally, let's look at \( 2 \): \( 2^1\equiv 2 \pmod{7} \) \( 2^2\equiv 4 \pmod{7} \) \( 2^3\equiv 1 \pmod{7} \) \( 2^4\equiv 2 \pmod{7} \) This time we didn't span all the elements of the group and we got to the same number after \( 3 \). We will show that these results are valid in general (provided we're working modulo a prime number).

First, we can prove that the set \( (\mathbb{Z}/p\mathbb{Z})^\star \) together with multiplication forms an abelian group (the product can never give \( 0 \), since none of the factors is divisible by \( p \) and \( p \) is prime). Second, the group is finite, since the number of elements is finite; its order is \( 6 \) in our example. We also saw that by repeatedly multiplying \( 5 \) by itself (that is, taking powers of \( 5 \)), we can generate all the elements of the group (note that everything repeats after \( 6 \) steps, which is the order of the group). Since the group can be generated by one of its elements, it is a (finite) cyclic group.

For an element \( a \), the lowest positive integer \( n \) such that \( a^n\equiv 1 \pmod{p} \) is known as the order of \( a \). The elements of the group with their respective orders in parentheses are: \( 1 (1) \), \( 2 (3) \), \( 3 (6) \), \( 4 (3) \), \( 5 (6) \), \( 6 (2) \). We can see that the order of each element divides the order of the group, \( 6 \). We will present the following theorems, which show that this is not a coincidence.
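A brute-force sketch that computes these orders (the helper name order is just illustrative and only meant for tiny groups):

// Multiplicative order of a modulo p: the smallest n > 0 with a^n ≡ 1 (mod p).
// Assumes a is not divisible by p; otherwise the loop would never terminate.
fn order(a: u64, p: u64) -> u64 {
    let mut acc = a % p;
    let mut n = 1;
    while acc != 1 {
        acc = (acc * a) % p;
        n += 1;
    }
    n
}

fn main() {
    let orders: Vec<u64> = (1..7).map(|a| order(a, 7)).collect();
    // Orders of 1, 2, 3, 4, 5, 6 modulo 7: each one divides 6, the order of the group.
    assert_eq!(orders, vec![1, 3, 6, 3, 6, 2]);
}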

Subgroups. Lagrange's theorem

We saw that the order of \( (\mathbb{Z}/7\mathbb{Z})^\star \) is 6 and that any element \( a \) satisfies \( a^6\equiv 1 \pmod{7} \). However, for \( 2 \) we already have \( 2^3\equiv 1 \pmod{7} \). A subgroup \( H \) is a subset of \( G \) that is itself a group, that is, it satisfies G1-G4. For example, the subset \( H={1} \) is a subgroup of order \( 1 \). Why? Because \( 1\times 1=1 \), so the operation is closed, and all other properties follow from the operations of the group \( G \). \( G \) is also a subgroup of itself. These two are called the trivial subgroups of \( G \) (which are not very interesting). The set \( {1,2,4} \) is a subgroup of \( (\mathbb{Z}/7\mathbb{Z})^\star \). To check this, we need to see that the identity belongs to the set, that if an element is in the set so is its inverse, and that the operation is closed. Let's check this:

  • \( 1 \) is in the set and \( 1 \) is its own inverse.
  • The operation is closed, because \( 2\times 2\equiv 4 \pmod{7} \), because \( 4\times 4=16\equiv 2 \pmod{7} \) and because \( 2\times 4=8\equiv 1 \pmod{7} \) (we don't need to check the products with \( 1 \) since that is obvious). We also checked the inverses, since \( 4=2^{-1} \).

The subset \( {1,2,4} \) forms a subgroup of order \( 3 \). Lagrange's theorem states that the order of a subgroup divides the order of the group. We have another subgroup \( {1,6} \), which is of order \( 2 \). These are non-trivial subgroups. If the order of a group is prime, then its only subgroups are the trivial subgroups (since \( p \) is prime, the subgroups can only be of order \( 1 \) and \( p \)). A group whose only subgroups are the trivial ones is known as a simple group. For example, \( \mathbb{Z}/7\mathbb{Z} \) with addition is the group \( {0,1,2,3,4,5,6} \) of order \( 7 \). There are no subgroups other than the whole group and \( {0} \). Note that the order of each element (other than zero, which has order \( 1 \)) is \( 7 \), since \( 7\times a=a+a+a+a+a+a+a \) is divisible by \( 7 \) and, therefore, congruent to \( 0 \) modulo \( 7 \). The fact that some groups can be broken down into smaller subgroups is of concern when working with elliptic curves: if the group is not of prime order, it can be broken down into smaller groups and an attacker may break the system by performing searches on these subgroups.

Rings

The set \( \mathbb{Z} \) together with addition and multiplication forms a ring. The polynomials with ordinary addition and multiplication also form a ring. \( n\times n \) matrices also form a ring under addition and multiplication. Formally, a ring is a set \( R \) with two operations \( + \) and \( \times \) such that:

  1. R is an abelian group under \( + \) (that is, R fulfills all the group conditions G1 to G4, plus commutativity of the operation).
  2. There is a multiplicative identity \( e \), such that \( a\times e=e\times a=a \). Frequently, we use \( e=1 \).
  3. Multiplication is associative.
  4. We have the distributive property of multiplication concerning addition.

Fields

We know examples of fields from elementary math. The rational, real and complex numbers with the usual notions of sum and multiplication are examples of fields (these are not finite though).

A finite field is a set equipped with two operations, which we will call \( + \) and \( * \) . These operations need to have certain properties in order for this to be a field:

  1. If \( a \) and \( b \) are in the set, then \( c=a+b \) and \( d=a*b \) should also be in the set. This is what is mathematically called a closed set under the operations \( + \), \( * \).
  2. There is a zero element, \( 0 \), such that \( a+0=a \) for any \( a \) in the set. This element is called the additive identity.
  3. There is an element, \( 1 \), such that \( 1*a=a \) for any \( a \) in the set. This element is the multiplicative identity.
  4. If \( a \) is in the set, there is an element \( b \), such that \( a+b=0 \). We call this element the additive inverse and we usually write it as \( -a \).
  5. If \( a \) is in the set and \( a \neq 0 \), there is an element \( c \) such that \( a*c=1 \). This element is called the multiplicative inverse and we write it as \( a^{-1} \).

Fermat's little theorem

Suppose we have a prime number \( p \) and we are working over the finite field \( \mathbb{F}_p \), with the operations done modulo \( p \). Given \( a \in \mathbb{F}_p \), Fermat's little theorem states that \[ a^p\equiv a \pmod{p} \] In other words, \( p \) divides \( a^p-a \). When \( a \) is not divisible by \( p \), we can write this alternatively as \[ a^{p-1}\equiv 1 \pmod{p} \] The theorem has many practical consequences. For example, if we want to calculate \( a^k \), we can reduce the calculation by taking the remainder of \( k \) modulo \( p-1 \). For example, take \( p=11 \) and suppose we want to calculate \( 5^{1578} \) modulo \( 11 \). We can quickly calculate \( p-1=11-1=10 \) and see that \( 1578\equiv 8 \pmod{10} \). The result is then \( 5^8\equiv 4 \pmod{11} \). Another consequence is that, if we want to compute the multiplicative inverse of \( a \), we can compute \( a^{p-2} \). We can check this easily, since \[ a \times a^{p-2}=a^{p-1}\equiv 1 \pmod{p} \] Using \( p=11 \) and \( a=5 \), we can compute \( 5^9\equiv 9 \pmod{11} \) and verify \( 5\times 9=45\equiv 1 \pmod{11} \).

The theorem provides a bound for the order of any element in the multiplicative group \( (\mathbb{Z}/p\mathbb{Z})^\star \), that is \( {1,2,3,...,p-1}\) with multiplication.
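A quick check in Rust, assuming the FieldPoint type with its power method and the inversion function from the previous sections are in scope:

fn main() {
    let p: u128 = 11;

    // Inverse of 5 modulo 11 via Fermat: 5^(p-2) = 5^9 ≡ 9 (mod 11).
    let inv = FieldPoint::new(5, p).power(p - 2);
    assert_eq!(inv.num, 9);
    assert_eq!((5 * inv.num) % p, 1);

    // The extended Euclidean algorithm (the inversion function above) agrees.
    assert_eq!(inversion(5, 11), 9);
}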

The discrete logarithm problem

Given a group, we can apply the operation repeatedly on an element \( g \) to get to an element \( P \), that is, \( g^x=g\times g\times \dots\times g=P \). For example, in \( (\mathbb{Z}/7\mathbb{Z})^\star \), \( 5 \) generates all the elements by successive multiplications with itself. We could then ask how many times \( x \) we should multiply \( 5 \) by itself to get \( 3 \), that is, \( 5^x\equiv 3 \pmod{7} \). Since we know that the order of the group is \( 6 \), we only need to consider exponents from \( 0 \) to \( 5 \). If we look above or try all combinations, \( 5^5\equiv 3 \pmod{7} \), so \( x=5 \). Similarly, if we look for \( y \) such that \( 5^y\equiv 4 \pmod{7} \), we get \( y=2 \). The problem of finding \( x \) such that \( g^x=P \) is known as the discrete logarithm problem (in number theory, \( x \) and \( y \) are known as indices). We quickly see that this logarithm works quite differently from the ordinary logarithm on the real numbers (though the idea is the same: given \( y \), find \( x \) such that \( e^x=y \)). There is no obvious pattern, it is not increasing, and if we had to search over a large set, it could be really daunting. Many cryptographic systems rely on the hardness of this problem over a finite cyclic group.
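For tiny groups, the brute-force search is easy to write; the sketch below (the helper name discrete_log is just illustrative) recovers the indices computed above, but this approach is hopeless for groups of cryptographic size:

// Brute-force discrete logarithm: find x with g^x ≡ target (mod p), if it exists.
fn discrete_log(g: u64, target: u64, p: u64) -> Option<u64> {
    let mut acc = 1u64;
    for x in 0..p - 1 {
        if acc == target {
            return Some(x);
        }
        acc = (acc * g) % p;
    }
    None
}

fn main() {
    // In (Z/7Z)*, 5^5 ≡ 3 and 5^2 ≡ 4, as computed in the text.
    assert_eq!(discrete_log(5, 3, 7), Some(5));
    assert_eq!(discrete_log(5, 4, 7), Some(2));
}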

Polynomials

A polynomial is an expression made of an indeterminate/variable (we typically call it \( x \)) and coefficients (elements of a ring, such as \( \mathbb{Z} \),\( \mathbb{Q} \),\( \mathbb{R} \), etc) of the form: \( p(x)=a_nx^n+a_{n-1}x^{n-1}+...+a_2x^2+a_1x+a_0 \) The greatest power of \( x \) determines the degree of the polynomial.

We can define polynomial addition between \( p_1 \) and \( p_2 \) by summing the coefficients corresponding to the same power of \( x \): \( p_1(x)+p_2(x)=(a_n+b_n)x^n+(a_{n-1}+b_{n-1})x^{n-1}+...(a_1+b_1)x+(a_0+b_0) \) Polynomial addition has an identity element (the polynomial whose coefficients are all zero), has inverses (which is simply taking the inverses of the coefficients, since the coefficients belong to a ring) and the operation is closed.

We can also multiply polynomials by applying the distributive property. We can also define the division of two polynomials, though this operation is not closed.

We can see that the set of polynomials with coefficients in a ring \( \mathcal{R} \) with addition and multiplication forms a ring structure.

We can also think of polynomials as functions: if we specify the value of \( x \), we can evaluate the expression and obtain a result. We say that \( x_0 \) is a root of a polynomial if \( p(x_0)=0 \). Over a field, a non-zero polynomial of degree \( n \) has at most \( n \) roots; the fundamental theorem of algebra states that, over the complex numbers, it has exactly \( n \) roots, counted with multiplicity.
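As a small illustration, a polynomial can be represented by its coefficient vector and evaluated with Horner's rule; the sketch below works over the integers:

// Coefficients are stored lowest degree first: [a_0, a_1, ..., a_n].
// Horner's rule evaluates the polynomial with n multiplications and n additions.
fn eval(coeffs: &[i64], x: i64) -> i64 {
    coeffs.iter().rev().fold(0, |acc, &c| acc * x + c)
}

fn main() {
    // p(x) = x^2 - 3x + 2 = (x - 1)(x - 2), so 1 and 2 are roots.
    let p = [2, -3, 1];
    assert_eq!(eval(&p, 1), 0);
    assert_eq!(eval(&p, 2), 0);
    assert_eq!(eval(&p, 3), 2);
}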

Polynomials will play an important role in the development of zk-SNARKs. Elliptic curves can be seen as the set of zeros of some polynomial in two variables.

Elliptic Curves

Elliptic curves are very useful objects because they allow us to obtain a group structure with interesting properties. Given a field \( \mathcal{F} \), an elliptic curve is the set of points \( (x,y) \) which satisfy the following equation: \[ y^2+a_1xy+a_3y=x^3+a_2x^2+a_4x+a_6 \] This is known as the general Weierstrass equation. In many cases, this can be written in the simpler form \[ y^2=x^3+ax+b \] which is the (Weierstrass) short form, valid when the characteristic of the field is not \( 2 \) or \( 3 \). Depending on the choice of the parameters \( a \) and \( b \) and the field, the curve may or may not have the properties we want. If \( 4a^3+27b^2 \neq 0 \), the curve is non-singular.

We can define an operation which allows us to sum elements belonging to the elliptic curve and obtain a group. This is done using a geometric construction, the chord-and-tangent rule. Given two points on the curve \( P_1=(x_1,y_1) \) and \( P_2=(x_2,y_2) \), we can draw a line connecting them. That line intersects the curve on a third point \( P_3=(x_3,y_3) \). We set the sum of \( P_1 \) and \( P_2 \) as \( (x_3,-y_3) \), that is, point \( P_3 \) flipped around the \( x \)-axis. The formulae are: \( s=\frac{y_2-y_1}{x_2-x_1} \) \( x_3=s^2-x_1-x_2 \) \( y_3=s(x_1-x_3)-y_1 \)

Figure: the addition of two different points on an elliptic curve over \( \mathbb{R} \).

We can easily see that we have a problem if we try to sum \( P_1=(x_1,y_1) \) and \( P_2=(x_1,-y_1) \). We need to add an additional point to the system, which we call the point at infinity \( \mathcal{O} \). This inclusion is necessary to be able to define the group structure and works as the identity element for the group operation.

Another problem appears when we want to add \( P_1 \) to itself, that is, compute \( P_3=2P_1 \), since the chord construction needs two distinct points. In this case, we draw the tangent line to the curve at \( P_1 \), which intersects the curve at another point. To perform this operation, we need to find the slope of the tangent line and the intersection: \( s=\frac{3x_1^2+a}{2y_1} \) \( x_3=s^2-2x_1 \) \( y_3=s(x_1-x_3)-y_1 \)

It takes a little bit of work, but we can prove that the elliptic curve with this operation has the properties of a group. We will use finite fields to work with these curves, and the groups that we will obtain are finite cyclic groups, that is, groups which can be generated by repeatedly applying the operation to a generator, \( g \): \( {g,2g,3g,4g,5g,...} \).

If we plot the collection of points on a graph, we see that they are distributed in a rather "random" fashion. For example, \( 2g \) could be very far from \( 3g \), which in turn is very far from \( 4g \). If we wanted to know how many times \( k \) we have to add the generator to arrive at a certain point \( P \) (that is, solving the equation \( kg=P \)), we see that we don't have an easy strategy and we are forced to perform a brute-force search over all possible \( k \). This problem is known as the (elliptic curve) discrete logarithm (log for friends) problem (other friends prefer ECDLP).

On the other hand, if we know \( k \), we can compute \( P=kg \) very fast. This offers us a way to hide things (in plain sight) inside the group. Of course, if you could break the DLP, you could recover \( k \), but that is rather infeasible. If we want to calculate \( 65536g \), we can do it by realizing that \( g+g=2g \), \( 2g+2g=4g \), \( 4g+4g=8g \), ... until \( 32768g+32768g=65536g \), so we reduced the work from 65536 additions to just 16 doublings. There are many useful algorithms that allow us to speed up the operations over elliptic curves, allowing us to avoid expensive calculations such as inversions, which appear when we want to calculate the slope.
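The same idea can be written once as a double-and-add routine. The sketch below is generic over the group operation; since a full curve implementation would be longer, it is exercised here with a stand-in group (addition modulo a small number), but the skeleton is exactly what one would use with curve points and the chord-and-tangent addition:

// Computes k*g with about log2(k) doublings plus a few additions.
// add stands for the group operation and identity for the point at infinity.
fn scalar_mul<T: Copy>(mut k: u128, mut g: T, identity: T, add: impl Fn(T, T) -> T) -> T {
    let mut acc = identity;
    while k > 0 {
        if k & 1 == 1 {
            acc = add(acc, g);
        }
        g = add(g, g); // doubling
        k >>= 1;
    }
    acc
}

fn main() {
    // Stand-in group: integers modulo 101 under addition (not an elliptic curve).
    let result = scalar_mul(65536u128, 7u64, 0u64, |a, b| (a + b) % 101);
    assert_eq!(result, (65536 * 7) % 101);
}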

Projective coordinates

We can save ourselves from costly inversions if we move from our nice 2-dimensional space to a 3-dimensional space. This idea was introduced by Möbius and also helps us represent the point at infinity properly. We can map the points \( (x,y) \) of our elliptic curve to points in projective space \( (X,Y,Z) \) as \( (x,y) \rightarrow (X=x,Y=y,Z=1) \) and \( \mathcal{O} \rightarrow (0,1,0) \). We can go back using the transformation \( (X,Y,Z) \rightarrow (x=X/Z,y=Y/Z) \), except for the point at infinity, where this is ill-defined. We can visualize this process with the following picture, where we take three points from an elliptic curve and transform them to 3-d.

We can think of this as transforming our 2-d points into lines passing through the origin in 3-d space. For example, the point \( (x_1,y_1) \) in 2-d transforms into the line \( (\mu x_1,\mu y_1, \mu) \), with \( \mu \) an element of the field. Thus, two points \( P_1=(X_1,Y_1,Z_1) \) and \( P_2=(X_2,Y_2,Z_2) \) are the same in 2-d (more precisely, are congruent) if we can find \( \eta \) such that \( (\eta X_1,\eta Y_1,\eta Z_1)=(X_2,Y_2,Z_2) \). These lines do not contain the origin \( (0,0,0) \). It is usual to write points in projective space as \( (X:Y:Z) \), instead of \( (X,Y,Z) \). In our picture, the point A (yellow) gets mapped to the point D (red, above it). All the points that lie on the same straight line passing through the origin and D (pink dashed) are considered equivalent to D. Similarly, point B (blue) is mapped to point F (light blue) and all the points on the light green dotted line (except the origin) are equivalent to F. When we add points in this space, the components \( (X,Y,Z) \) will change, but we can go back to the point belonging to the curve by retracing our steps to \( Z=1 \) along the line that passes through the origin. Why go to all this length? We will shortly see that we avoid inversions at each addition step and do just one at the time of recovering the 2-d point (for example, when we need to find \( r=x_1 \) in ECDSA). Of course, if we only have to do \( P=2g \) we didn't gain anything, but if we have to perform \( P=kg \) with \( k \) on the order of 256 bits, we save many costly inversions.

Making the substitutions into the elliptic curve equation, \[ \frac{Y^2}{Z^2} = \frac{X^3}{Z^3} + a \frac{X}{Z} + b \] we can multiply by \( Z^3 \) and get the equation \[ ZY^2=X^3+aXZ^2+bZ^3 \] If we want to sum \( P \) and \( Q \) to yield \( R=P+Q \) in projective space, we can use the formulae:

\( Z_R=Z_PZ_Q(X_PZ_Q-X_QZ_P)^3 \) \( X_R=(X_PZ_Q-X_QZ_P)(Z_QZ_P(Y_PZ_Q-Y_QZ_P)^2-(X_PZ_Q-X_QZ_P)^2(X_PZ_Q+X_QZ_P)) \) \( Y_R=Z_PZ_Q(X_QY_P-X_PY_Q)(X_PZ_Q-X_QZ_P)^2-(Y_PZ_Q-Y_QZ_P)A \) \( A=Z_PZ_Q(Y_PZ_Q-Y_QZ_P)^2-(X_PZ_Q+X_QZ_P)(X_PZ_Q-X_QZ_P)^2 \).

This looks more complicated and difficult than the simple formulae for 2 dimensional (2-d) space. However, we do not have to calculate any inverses! To get the sum, we have to perform 12 multiplications and 2 squarings. In 2-d, we have 2 multiplications, one squaring and one inversion. Inversions can be 20 times or more expensive than multiplications, so we've saved at least 10 multiplications (some authors say inversions are about 80 times more expensive than multiplications).

Some curves can go even faster. If \( x^3+ax+b \) has a root in \( \mathbb{F}_p \), we can work with an equivalent Jacobi quartic \( v^2=a^\prime u^4+du^2+1 \), where \( a^\prime \) and \( d \) depend on the root. We can transform the curve \( (u,v) \) to 3-d space \( (U,V,W) \) using \( u=U/W \) and \( v=V/W^2 \) and get the equation

\[ V^2=a^\prime U^4+dU^2W^2+W^4 \]

If we want to sum \( P_3=P_1+P_2 \), in these coordinates we have:

\( U_3=U_1W_1V_2+U_2W_2V_1 \) \( V_3=((W_1W_2)^2+a^\prime (U_1U_2)^2)(V_1V_2+dU_1U_2W_1W_2)+2a^\prime U_1U_2W_1W_2(U_1^2W_2^2+U_2^2W_1^2) \) \( W_3=(W_1W_2)^2-a^\prime (U_1U_2)^2 \)

These allow us to further reduce the cost of addition to 6 multiplications and 4 squarings. Other models with fast arithmetic are Edwards curves and Montgomery curves, which have some of the fastest known implementations.

Montgomery curves satisfy the following equation \[ By^2=x^3+Ax^2+x \] where \( B(A^2-4)\neq 0 \). This expression can be cast into the Weierstrass form by a change of variables. If we take \( (x,y) \) and map it to \( (x^\prime,y^\prime) \) given by \( (x,y)\rightarrow (x/B+A/(3B),y/B) \), we get \[ y^2=x^3+\left(\frac{3-A^2}{3B^2}\right)x+\frac{2A^3-9A}{27B^3} \] Transforming a Weierstrass curve into a Montgomery curve is not always possible, though: the order of the group must be divisible by \( 4 \) and \( x^3+ax+b=0 \) must have a root in the field.

Montgomery curves can also be related to twisted Edwards curves, which obey the following equation \[ ax^2+y^2=1+dx^2y^2 \] The parameters are related via \( A=2(a+d)/(a-d) \) and \( B=4/(a-d) \). We say these two curves are birationally equivalent. For example, the well-known Edwards curve edwards25519, defined over \( p=2^{255}-19 \), is birationally equivalent to the Montgomery curve \( t^2=u^3+486662u^2+u \) (Curve25519). The mappings are \( (x,y)=(\sqrt{-486664}u/t,(u-1)/(u+1)) \) and \( (u,t)=((1+y)/(1-y),\sqrt{-486664}(1+y)/(x(1-y))) \)

Montgomery curves have some interesting properties that lend themselves to constant-time implementations. We can work in projective coordinates using only the \( x \) coordinate, with the transformation \( x=X/Z \). Doubling a point takes the simple form: \( R=(X_1+Z_1)^2-(X_1-Z_1)^2 \) \( X_2=(X_1+Z_1)^2(X_1-Z_1)^2 \) \( Z_2=R((X_1-Z_1)^2+((A+2)/4)R) \)

Twisted Edwards curves have their advantages, too. The expressions for point addition and doubling are the same. Given \( P_1=(x_1,y_1) \) and \( P_2=(x_2,y_2) \), we get \( x_3=\frac{x_1y_2+x_2y_1}{1+dx_1x_2y_1y_2} \) \( y_3=\frac{y_1y_2-ax_1x_2}{1-dx_1x_2y_1y_2} \) If we let \( x_1=x_2 \) and \( y_1=y_2 \), we get the expressions for point doubling. There are several alternatives for speeding up the calculations, such as projective, inverted or extended coordinates.

There are some other tricks to add and multiply points over elliptic curves, such as the technique by Gallant, Lambert and Vanstone (GLV), later generalized by Galbraith, Lin and Scott (GLS).

Symmetric encryption

Encryption has been the main application of cryptography for a very long time. Its goal is to transform a message into another one and send it through an insecure channel, such that only the intended parties (who know all the elements necessary to reverse the transformation) can read it, while it looks like absolute nonsense to everybody else. For example, suppose that you are a general during a war and you need to communicate the battle plan to your reinforcement battalions (which are still far from you) and launch a surprise attack at the precise moment. If you sent a messenger with a letter containing the plans written in plain form, then anyone intercepting the letter would know your plans and act accordingly; what's more, the messenger could betray you, hand that information to your enemies and thwart your masterminded tactic.

Encryption uses an algorithm called a cipher and some key to change the message into random-looking text. More precisely, it takes a plaintext and, through some mathematical computations, outputs a ciphertext. The ciphertext can only be decrypted if the key is known. In modern encryption, only the key is secret; the details of the encryption algorithm are publicly known. This is in agreement with Kerckhoffs' principle, which states that, in a cryptographic system, only the key should be secret. In older times, people tried to hide the message by using unknown algorithms or strategies, hoping that the enemy would not be able to figure out the secret; this is known as security through obscurity. Needless to say, this strategy has failed numerous times with catastrophic consequences.

Symmetric encryption is widely used today and there are efficient algorithms, some of them even implemented in hardware. Examples of symmetric encryption algorithms are AES (Advanced Encryption Standard), 3DES, ChaCha, Salsa, Twofish, Blowfish, and Serpent, to name a few. In this type of encryption, the same key is used to encrypt and decrypt messages (therefore, anyone who can encrypt messages can also decrypt them). We will see in a later chapter that there is also asymmetric encryption (or public key cryptography), where we have two different keys: a public key (used to encrypt messages) and a private key (used to decrypt).

Once the parties share the key, they can exchange secure messages and it is unlikely that anyone else will decrypt them, thanks to the math behind the algorithms and appropriate security levels. However, we find ourselves with the problem of how the involved parties agree on the key in the first place: if we tried sending it in plaintext over an insecure channel, it could be compromised and the symmetric encryption would be pointless, since adversaries could have obtained it. We will focus in a later chapter on how to perform key exchanges.

There are two main types of ciphers for symmetric encryption: block ciphers and stream ciphers. We will analyze their characteristics in the next sections.

Formalization

We have two parties wanting to communicate securely, which we will call Alice and Bob (for A and B, respectively). Alice wants to send Bob a plaintext, \( P \), in such a way that only Bob can read it and learn its contents. They have previously agreed on a common secret key, \( k \), and they will be using some algorithm, such as AES. The encryption algorithm is some function, \( E \), taking the plaintext and the key and outputting the ciphertext \( C \): \[ E(P,k)=C \] The decryption algorithm, \( D \), on the other hand, takes the ciphertext and the key and returns the plaintext: \[ D(C,k)=P \]
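To make the notation concrete, here is a toy cipher where both \( E \) and \( D \) are a XOR with the key. This is only secure as a one-time pad (truly random key, as long as the message, never reused); it is shown purely to illustrate that \( D(E(P,k),k)=P \), not as something to deploy:

// Toy cipher: E(P, k) = P XOR k and D(C, k) = C XOR k (the key is cycled if shorter).
fn xor_cipher(input: &[u8], key: &[u8]) -> Vec<u8> {
    input.iter().zip(key.iter().cycle()).map(|(a, b)| a ^ b).collect()
}

fn main() {
    let plaintext = b"attack at dawn";
    let key = b"supersecretkey!";

    let ciphertext = xor_cipher(plaintext, key); // E(P, k) = C
    let recovered = xor_cipher(&ciphertext, key); // D(C, k) = P
    assert_eq!(recovered, plaintext.to_vec());
}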

There are a couple of things we would like from our encryption algorithm and the output ciphertext. First, the ciphertext should appear as random text, with no clear patterns. We would also like that, if we change even a single bit from the message, the resulting ciphertext is completely different from the original one: this is known as the avalanche effect.

These are related to two properties that a secure cipher should have: confusion and diffusion. Confusion serves to hide the relationship between the key and the ciphertext. Diffusion is related to the fact that the value in the ciphertext of one bit depends on others; equivalently, if we changed one bit from the plaintext, we could expect that many bits would also change their values, which is related to the avalanche effect.

To be secure, a cipher's permutation should satisfy the following three conditions:

  • The permutation should be determined by the key.
  • Different keys should give rise to different permutations.
  • The permutations should look random.

The first condition guarantees that we need the key to be able to decrypt. If the permutations are not given by the key, then it plays no role whatsoever in the process and things could be decrypted without it. The second one means that there are not two keys yielding the same permutation. If it were so, then the messages encrypted with one key could be decrypted with another, and that would make it easier to break the cryptosystem. The third one is simply that we should not be able to learn anything about the plaintext from the ciphertext (an example where this fails is on certain bitmaps with ECB mode encryption).

Information vs Computational Security

One important point relates to the security proofs of our cryptographic schemes. In some cases, one can prove that certain schemes are mathematically secure, even if the attacker has unbounded computational power. This is called an information-theoretically secure scheme. However, to be able to build practical cryptographic schemes, we need to introduce some assumptions. Modern cryptographic algorithms can be proven computationally secure, where the adversary has bounded computing power and will be able to break the system only after spending a very large amount of time and/or resources, even with the fastest and most powerful devices available nowadays.

Instead of perfect security, computational security relies on the following:

  • Security is preserved only against efficient adversaries.
  • Adversaries can succeed, but only with a very small probability.

If we have sufficiently good bounds for computational power and the probability of success is small enough, we can consider our schemes secure for practical purposes.

There are two common approaches to analyzing the security of our cryptographic protocols:

  1. Concrete.
  2. Asymptotic.

In the concrete case, we bound the probability of success, \( \epsilon \), after the attacker has spent time \(t \). We say that the scheme is \( (t,\epsilon) \)-secure if an adversary spending time \(t \) has a probability of success of at most \( \epsilon \).

The asymptotic approach is related to complexity theory and views the running time of the attacker and his success probability as functions of a security parameter, \( \lambda \) (for example, the size of the secret key). It only guarantees security provided \( \lambda \) is sufficiently large.

We say an algorithm is efficient if its running time is polynomial in \( \lambda \), that is, \( c_1 \lambda^{c_2} \) for some numbers \( c_1 \) and \( c_2 \). We can also write this in big O notation, \( \mathcal{O}(\lambda^{c_2}) \).

As for the probability of success, we consider it to be small if it is smaller than any inverse polynomial in \( \lambda \). More precisely, for every constant \( c \), the attacker's success probability is smaller than the inverse polynomial \( \lambda^{-c} \). A function growing slower than any inverse polynomial is called negligible.

A scheme is called secure if every probabilistic, polynomial time attacker succeeds in breaking it with only negligible probability.

Bit Operations

One operation that will be frequently used in cryptography is the exclusive OR operator (XOR). It is a binary operation, taking two bits and outputting another; we will represent the operation with the \( \oplus \) symbol. Its truth table is: \( 0\oplus 0=0\) \( 0\oplus 1=1\) \( 1\oplus 0=1\) \( 1\oplus 1=0\)

We can also view the XOR operation as addition modulo \( 2 \): \( 0+0\equiv 0 \pmod{2}\) \( 1+0\equiv 1 \pmod{2}\) \( 1+1\equiv 0 \pmod{2}\) This is expected: the addition of two odd or two even numbers is always even, whereas the addition of one odd and one even number is always odd.

Why is this operation useful? Suppose that we want to encrypt a message, given as a sequence of bits. One way to encrypt it is to generate a sequence of (pseudo) random bits and XOR each bit to get the ciphertext. An attacker can try to decipher the text, but he finds the following problem:

  • If he sees \( 0 \) in the ciphertext, it could be because the plaintext had \( 1 \) and the random bit was also \( 1 \), or both were zero. So, he has a \( 50\% \) chance of guessing correctly!
  • If he sees \( 1 \) in the ciphertext, either the plaintext is \( 1 \) and the random bit is \( 0 \) or the other way round. Again, he has a \( 50\% \) chance of guessing correctly.

If the message is composed of several bytes (for example, 16 bytes - 128 bits), the probability of guessing the correct message is \( 3\times 10^{-39} \)!

We see that the XOR operation is hard to reverse unless we know one of the original inputs. In that case, if \( c=m\oplus r\), then \[ m=c\oplus r\]
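A minimal sketch in Rust (the xor_bytes helper is defined just for this example, not taken from any particular crate): XOR-ing the ciphertext with the same bytes used to encrypt recovers the message.

fn xor_bytes(a: &[u8], b: &[u8]) -> Vec<u8> {
    // XOR two byte strings of the same length, element by element.
    a.iter().zip(b.iter()).map(|(x, y)| x ^ y).collect()
}

fn main() {
    let message = b"attack at dawn";
    // In a real cipher these bytes come from a keystream; here they are fixed
    // only to keep the example self-contained.
    let random = [0x7fu8; 14];
    let ciphertext = xor_bytes(message, &random);
    // Applying the same operation twice recovers the original message.
    let recovered = xor_bytes(&ciphertext, &random);
    assert_eq!(&recovered[..], &message[..]);
}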

Stream and block ciphers

A block cipher takes a message of fixed length (128 bits, for example) and encrypts it by applying a key-dependent permutation that should look random. Two values characterize the block cipher: the block size (for example, 16 bytes, or 128 bits) and the key size. Both determine the level of security of the cipher. This kind of cipher does not operate on individual bits, but on blocks of fixed size.

Block sizes should be neither very large nor very small. If the block is very large, the memory footprint and ciphertext length impact the cost and performance of the encryption. If the block size is small, the cipher is susceptible to codebook attacks.

In practice, a block cipher is the repetitive application of permutation and substitution steps; these take place in rounds. The main building blocks are:

  • Substitution boxes (S-boxes).
  • Mixing permutations.
  • Key schedule.

If we call \(f_k \) the function corresponding to round \( k \), the ciphertext is \[ C= f_n(f_{n-1}(...f_2(f_1(P))))\]

The functions for each round have the same operations but are parametrized by a different key (which leads to different substitutions and permutations). We should not use the same key for all steps, otherwise, our cryptosystem can be vulnerable to slide attacks.

Decryption is the successive application of the inverse functions \( g_k=f_k^{-1}\), \[ P=g_1(g_2(...g_{n-1}(g_n(C))))\]

Stream ciphers work in a very different way; instead of combining blocks of text and the key, they deterministically generate a sequence of "random" bits (called the keystream) from the key and perform XOR operations with the text.

The keystream, \( KS \), is derived from the secret key \( k \) and a public nonce \( \mathrm{nonce} \). If we have a message \( \mathrm{m} \) to encrypt, we compute \( C=KS \oplus \mathrm{m} \). To decrypt, we simply XOR again, \( \mathrm{m}=KS\oplus C\). We can easily see that the encryption and decryption operations are essentially the same; we only need the keystream to be able to do them. It is very important that \( \mathrm{nonce} \), which need not be secret, is never reused. To see why, suppose we have two messages \( \mathrm{m}_1 \) and \( \mathrm{m}_2\), and their corresponding ciphertexts, which have been encrypted using the same key \( k \) and \( \mathrm{nonce} \). We can recover the message \( \mathrm{m}_1 \) using the following operation: \[ \mathrm{m}_1=C_2\oplus C_1 \oplus \mathrm{m}_2 \]
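To make the danger concrete, here is a small sketch (with a fixed, made-up keystream standing in for the output of a real cipher) showing how an attacker who knows one plaintext recovers the other when the same key and nonce are reused:

fn xor_bytes(a: &[u8], b: &[u8]) -> Vec<u8> {
    a.iter().zip(b.iter()).map(|(x, y)| x ^ y).collect()
}

fn main() {
    // Two messages encrypted with the SAME keystream (same key and nonce).
    let keystream = [0x5au8; 16]; // stand-in for a real keystream
    let m1 = b"meet me at noon!";
    let m2 = b"the cake is nice";
    let c1 = xor_bytes(m1, &keystream);
    let c2 = xor_bytes(m2, &keystream);

    // The attacker only sees c1 and c2 and happens to know m2:
    // m1 = C2 XOR C1 XOR m2, without ever learning the keystream.
    let recovered_m1 = xor_bytes(&xor_bytes(&c2, &c1), m2);
    assert_eq!(&recovered_m1[..], &m1[..]);
}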

This was an implementation error that Microsoft Excel and Word had: they reused the same nonce, which meant that decryption could be done if two versions of the same file were available.

AES

The Advanced Encryption Standard (AES) was the result of an open competition, organized by NIST in 1997, which lasted for three years. The proposal by Rijmen and Daemen was selected as the winner and was standardized in 2001 by NIST.

AES offers 3 levels of security: AES-128, AES-192, and AES-256, with key sizes of 16, 24, and 32 bytes, respectively. As the key's size increases, so does security. However, for most applications, AES-128 provides sufficient security levels (the best-known attacks against AES are only slightly better than brute-force attacks, which would require \( 2^{128} \) operations).

AES is a block cipher: it takes a 16-byte block (128 bits) and the variable length key and outputs a 16-byte ciphertext. If the text has less than 16 bytes, then it is conveniently padded. After performing decryption, it should be possible to eliminate the padding to recover the message; therefore, random padding cannot be used, because we could not distinguish the original message from the random bits.

Remember that block ciphers are permutations: they map all the possible plaintexts into all possible ciphertexts.

The cipher sees the plaintext as a \( 4\times 4 \) matrix of bytes. AES has a round function, which is applied several times to the plaintext, scrambling and mixing everything well, until we obtain the ciphertext. Each round uses a different key (which is generated in a deterministic way from the secret key), making the smallest changes in the bits of the secret key result in entirely different encryption. The steps in each round function (except in the last one) are:

  • SubBytes
  • ShiftRows
  • MixColumns
  • AddRoundKey

The first three can be reversed without any secret; AddRoundKey is what ties the transformation to the key: it performs an XOR operation between the state and the round key, and without the key this step cannot be undone. All the steps are necessary to achieve the desired security levels.

AES-128 uses 10 rounds to perform encryption. All rounds contain the four operations, except for the initial step (where only the round key is added) and the 10th round (where MixColumns is omitted).

SubBytes (the substitution boxes, or S-boxes) provides the substitution step and is a nonlinear function. Given that the state is made of bytes, the substitution can be done with the aid of lookup tables.

ShiftRows cyclically shifts the rows of the state, and MixColumns mixes the bytes within each column.

To generate the keys for each round, the key schedule function is called: all the keys are derived from the secret key, using the substitution boxes and XOR operations. One drawback of this key scheduling is that, if an attacker learns one of the keys, he can reverse the algorithm and learn all other keys, including the secret key.

Why do we need all these operations to have a secure cipher?

  • The MixColumns and ShiftRows guarantee that all the elements are "well mixed". If one of them is missing, then we could break the cipher into smaller blocks and perform a codebook search over \( 2^{32} \) possibilities, which is far better than \( 2^{128} \).
  • SubBytes gives the nonlinear part to the cipher. Without it, all the operations are linear and easier to reverse.
  • AddRoundKey makes the ciphertext depend on the key. If we skipped this step, then we don't need any key to decipher.
  • The key schedule prevents us from reusing the same key all the time, making the cipher vulnerable to slide attacks.

When we want to encrypt a large message, we could divide it into blocks of 16 bytes and pad the last one, if necessary. This very simple approach is known as the electronic codebook mode (ECB) and should not be used. As encryption is deterministic, every time we encrypt a given plaintext, we will get the same ciphertext. This is problematic when we have, for example, an image with repetitive patterns or large areas of one color, since the ciphertext will exhibit those patterns too. To avoid this problem, there are several modes that we can use:

  • Cipher block chaining (CBC)
  • Propagating cipher block chaining (PCBC)
  • Cipher Feedback (CFB)
  • Output feedback (OFB)
  • Counter (CTR)

For example, in the CBC mode:

  1. Initialize a 16-byte random initialization vector (IV).
  2. Perform \( \tilde{B}_1=IV \oplus B_1 \), where \( B_1 \) is the first block, and set \( k=1 \).
  3. Use AES to encrypt \( \tilde{B}_1 \), obtaining \( E_1 \).
  4. Perform \( \tilde{B}_{k+1}=E_k \oplus B_{k+1} \).
  5. Use AES to encrypt \( \tilde{B}_{k+1} \), obtaining \( E_{k+1} \), and set \( k=k+1 \).
  6. If \( k \neq k_{max} \), go to step 4. Otherwise, the encryption is finished.

The IV guarantees that, even if the same plaintext is encrypted, the resulting ciphertext will be different.
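A minimal sketch of the CBC chaining logic; encrypt_block is a toy placeholder standing in for a real 16-byte block cipher such as AES, included only so the example runs.

// NOT a real cipher: XOR with the key and rotate each byte, only to have
// something deterministic to chain in this sketch.
fn encrypt_block(block: &[u8; 16], key: &[u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = (block[i] ^ key[i]).rotate_left(3);
    }
    out
}

fn cbc_encrypt(blocks: &[[u8; 16]], key: &[u8; 16], iv: [u8; 16]) -> Vec<[u8; 16]> {
    let mut previous = iv; // the IV plays the role of E_0
    let mut ciphertext = Vec::with_capacity(blocks.len());
    for block in blocks {
        // XOR the current plaintext block with the previous ciphertext block...
        let mut xored = [0u8; 16];
        for i in 0..16 {
            xored[i] = block[i] ^ previous[i];
        }
        // ...and encrypt the result.
        let encrypted = encrypt_block(&xored, key);
        ciphertext.push(encrypted);
        previous = encrypted;
    }
    ciphertext
}

fn main() {
    let key = [7u8; 16];
    let iv = [42u8; 16];
    let blocks = [[1u8; 16], [1u8; 16]]; // identical plaintext blocks...
    let ct = cbc_encrypt(&blocks, &key, iv);
    assert_ne!(ct[0], ct[1]); // ...produce different ciphertext blocks
}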

Another problem that we face is that, even though the message has been encrypted, we have no way of knowing that it has been modified by an attacker. To prevent modification of the ciphertext, we can add message authentication codes (MAC), which we will cover in another chapter.

ChaCha20

ChaCha20 is a modification of the Salsa20 cipher, invented by Daniel J. Bernstein in 2005. Its working principle is the same as all stream ciphers: it generates a keystream from the secret key and encrypts by performing an XOR operation between the plaintext and the keystream.

ChaCha20 generates the keystream by repeatedly calling a block function that outputs 64 bytes of keystream. It takes as input:

  • 256-bit key.
  • 96-bit nonce.
  • 32-bit counter.

Every time the function outputs 64 bytes of keystream, the counter is increased by one and the process continues until the keystream is at least as long as the plaintext; it is then truncated to the length of the plaintext and the XOR operation is performed. The maximum size we can encrypt is given by the maximum value of the counter, \( 2^{32} \), times the output of each call, 64 bytes, yielding a maximum of \( 2^{32}\times 64 \) bytes, or 256 GB.

The core operation is the Quarter Round. It takes 4 32-bit unsigned integers, denoted \( a,b,c \) and \( d \), and performs the following operations: \( a=a+b;\space d=d\oplus a;\space d<<<16\) \( c=c+d;\space b=b\oplus c;\space b<<<12\) \( a=a+b;\space d=d\oplus a;\space d<<<8\) \( c=c+d;\space b=b\oplus c;\space b<<<7\) where \( <<<n \) denotes an \( n \)-bit rotation towards the left.
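A direct translation of the quarter round into Rust, using wrapping addition and left rotation on 32-bit words; the test values are the quarter-round example from RFC 8439. A real implementation applies this function to the 16-word state in a fixed pattern of column and diagonal rounds.

// One ChaCha quarter round on four 32-bit words.
fn quarter_round(a: &mut u32, b: &mut u32, c: &mut u32, d: &mut u32) {
    *a = a.wrapping_add(*b); *d ^= *a; *d = d.rotate_left(16);
    *c = c.wrapping_add(*d); *b ^= *c; *b = b.rotate_left(12);
    *a = a.wrapping_add(*b); *d ^= *a; *d = d.rotate_left(8);
    *c = c.wrapping_add(*d); *b ^= *c; *b = b.rotate_left(7);
}

fn main() {
    // Test vector from RFC 8439, section 2.1.1.
    let (mut a, mut b, mut c, mut d) =
        (0x11111111u32, 0x01020304u32, 0x9b8d6f43u32, 0x01234567u32);
    quarter_round(&mut a, &mut b, &mut c, &mut d);
    assert_eq!((a, b, c, d), (0xea2a92f4, 0xcb1cf8ce, 0x4581472e, 0x5881c4bb));
}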

The ChaCha state is composed of 16 32-bit words: the first four are constants; the next 8 correspond to the key, followed by the counter and the nonce.

Asymmetric encryption

Introduction

Modern cryptography relies on problems that are hard to solve computationally, even if they can be stated in a very simple way. These are problems for which efficient algorithms are not yet known or the best algorithms would require enormous resources. For example, we learned at school that the number \( 120 \) could be written as \( 2^3\times 3 \times 5 \), which is its decomposition in prime factors. We can find the factors in small numbers by inspection or by trying all the different numbers less than the square root of the number and exhausting all possibilities. What happens, though, when we get a number with a thousand decimal digits? Even if we understand the problem, there may not be an efficient way to compute the factorization, especially if the number is made of two large prime factors.

Asymmetric encryption (or public key cryptography) was enabled by the existence of computationally hard problems. The RSA cryptosystem relies on the difficulty to factorize very large integers, whereas elliptic curve cryptography depends on the hardness of solving the discrete log problem.

First, we will need to explain the underlying math to better understand how the RSA cryptosystem works.

RSA cryptosystem

Math fundamentals

Key generation

Elliptic curve cryptography

ElGamal encryption?

Attacks

Primality Testing

We can test primality using the Miller-Rabin test. Given an odd number \( n \), we can write it as \( n=2^r\times d +1 \), for some \( r> 0 \) and \( d \) an odd number. If \( n \) is prime, then for any \( a \) not divisible by \( n \), at least one of the following holds: \( a^d \equiv 1 \pmod{n} \) or \( a^{2^s \times d}\equiv -1 \pmod{n} \) for some \( 0\leq s < r \). This follows because, if \( n \) is prime, it satisfies Fermat's little theorem and the only square roots of \( 1 \) are \( -1 \) and \( 1 \). If neither condition is fulfilled, \( n \) is not prime; if they pass, \( n \) could still be composite, depending on the choice of \( a \), known as the witness. Checking several witnesses \( a \) reduces the probability that a composite number passes the test. The decomposition is easy to do:

pub fn decompose(n: u128) -> (u128, u128) {
    // Write n - 1 = 2^r * d with d odd, returning (d, r).
    let mut d: u128 = n - 1;
    let mut r: u128 = 0;
    while d % 2 == 0 {
        d /= 2;
        r += 1;
    }
    (d, r)
}

Since \( n-1 \) is even, we can take factors of \( 2 \) repeatedly, until \( d \) is no longer divisible by \( 2 \). The condition can be checked faster by looking at the last bit of \( n-1 \).

The core of the Miller-Rabin test is (it yields true if it is probably prime):

fn miller_rabin(a: u128, n: u128, d: u128, r: u128) -> bool {
    // Returns true if n passes the test for witness a (probably prime).
    let n_minus_one: u128 = n - 1;
    // FieldPoint::new(a, n) represents a modulo n; power(d) computes a^d mod n.
    let field = FieldPoint::new(a, n);
    let mut x = field.power(d);
    let mut count: u128 = 1;
    // First condition: a^d = 1 or a^d = -1 (mod n).
    if x.num == 1 || x.num == n_minus_one {
        return true;
    }
    // Square repeatedly, looking for a^(2^s * d) = -1 (mod n) with s < r.
    while count < r {
        x = x.power(2u128);
        if x.num == n_minus_one {
            return true;
        }
        count += 1;
    }
    false
}

If the number is composite and we try several witnesses, it is very likely to fail the test for at least one of them (and we can stop at the first failure), so we can discard the number.
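A sketch of how the pieces fit together, assuming the decompose and miller_rabin functions above (and the FieldPoint type they rely on): we try a handful of fixed witnesses and declare the number composite at the first failure.

// Probabilistic primality check built on the functions above: decompose
// n - 1 = 2^r * d, then run the Miller-Rabin test with several witnesses.
fn is_probably_prime(n: u128) -> bool {
    if n < 2 {
        return false;
    }
    if n == 2 || n == 3 {
        return true;
    }
    if n % 2 == 0 {
        return false;
    }
    let (d, r) = decompose(n);
    // A few small witnesses; more witnesses lower the error probability.
    for &a in &[2u128, 3, 5, 7, 11, 13] {
        if a % n == 0 {
            continue; // skip witnesses that are multiples of n
        }
        if !miller_rabin(a, n, d, r) {
            return false; // definitely composite
        }
    }
    true // probably prime
}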

RSA

Fast Multiplication Algorithms

Karatsuba's multiplication

We all learned in elementary school how to multiply two numbers: we write one below the other, multiply the top number by each digit of the bottom number, shift each partial product according to the position of that digit, and finally add everything up:

      1234
×      152
——————————
      2468  ( = 1234 ×   2)
     6170   ( = 1234 ×  50)
    1234    ( = 1234 × 100)
——————————
    187568

This algorithm has complexity \( \mathcal{O}(n^2) \) in the number of digits. In 1960, Kolmogorov conjectured that this was the asymptotic bound for multiplication (that is, multiplying two numbers could not take fewer than \( \mathcal{O}(n^2) \) operations). He gave a lecture on the topic and one of the students, Karatsuba, then 23 years old, came up with an algorithm that runs in \( \mathcal{O}(n^{\log_2(3)}) \), thus disproving Kolmogorov's conjecture. The basic idea of Karatsuba's algorithm is the following: say we want to multiply \( x \) and \( y \); we can break them into smaller numbers: \( x=x_1\times 10^m +x_0 \) \( y=y_1\times 10^m +y_0 \) where both \( x_0 \) and \( y_0 \) are numbers less than \( 10^m \). The product \( x\times y \) is simply: \( x\times y=x_1\times y_1\times 10^{2m}+(x_1\times y_0+y_1\times x_0)\times 10^m+x_0\times y_0 \) Karatsuba found that \( x_1\times y_0+y_1\times x_0 \) can be calculated efficiently at the expense of some additions: \( x_1\times y_0+y_1\times x_0=(x_1+x_0)\times (y_1+y_0)-x_1\times y_1-x_0\times y_0 \). Even if there are some extra additions, the three multiplications operate over smaller numbers, resulting in an overall lower cost for large numbers.
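A minimal recursive sketch of Karatsuba's idea over u128, splitting in base 10 for readability; real implementations work on arrays of machine-word digits rather than single integers.

// Karatsuba multiplication sketch: three recursive multiplications of
// half-size numbers instead of four.
fn karatsuba(x: u128, y: u128) -> u128 {
    // For small operands, fall back to ordinary multiplication.
    if x < 10 || y < 10 {
        return x * y;
    }
    // Number of decimal digits of the larger operand and the split point m.
    let n = std::cmp::max(x.to_string().len(), y.to_string().len());
    let m = n / 2;
    let p = 10u128.pow(m as u32);

    // x = x1 * 10^m + x0, y = y1 * 10^m + y0
    let (x1, x0) = (x / p, x % p);
    let (y1, y0) = (y / p, y % p);

    let z2 = karatsuba(x1, y1);
    let z0 = karatsuba(x0, y0);
    // (x1 + x0)(y1 + y0) - x1*y1 - x0*y0 = x1*y0 + x0*y1
    let z1 = karatsuba(x1 + x0, y1 + y0) - z2 - z0;

    z2 * p * p + z1 * p + z0
}

fn main() {
    assert_eq!(karatsuba(1234, 152), 1234 * 152); // 187568
}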

Toom-Cook algorithm

The divide and conquer strategy can be taken further, leading to a reduction in the complexity of the multiplication algorithm. Toom and Cook developed several methods (known as Toom-X, X being a number), which consist of the following stages:

  1. Splitting
  2. Evaluation
  3. Pointwise multiplication
  4. Interpolation
  5. Recomposition

Several variants of the algorithms are implemented in GNU Multiple Precision Arithmetic Library. Toom-2 is the same as Karatsuba's algorithm. Toom-X begins by splitting the numbers \( x \) and \( y \) in X parts of equal length(1) and these are treated as the coefficients of some polynomial (we focus on Toom-3, but you can see more details here)(2): \( x(t)=x_2 t^2+x_1 t+x_0 \) \( y(t)=y_2 t^2+y_1 t+y_0 \) If we evaluate \( x \), \( y \) at \( t=b \), we get the numbers back. The multiplication of both numbers is equal to a polynomial of degree \( 2(X-1) \), \( w(t)=w_4t^4+w_3t^3+w_2t^2+w_1t+w_0 \) We can evaluate the polynomials at 5 different points, which will suffice to determine uniquely the polynomial \( w \) due to the interpolation theorem. We can choose 5 convenient points which make the evaluation and reconstruction of the polynomial easy. Common points are \( 0, 1, -1, 2 \) and \( \infty \) (this last one is just the product of the main coefficients). Let's see the form of each value: \( w(0)=x(0)y(0)=x_0y_0 \) \( w(1)=x(1)y(1)=(x_0+x_1+x_2)(y_0+y_1+y_2) \) \( w(-1)=x(-1)y(-1)=(x_0-x_1+x_2)(y_0-y_1+y_2) \) \( w(2)=x(2)y(2)=(x_0+2x_1+4x_2)(y_0+2y_1+4y_2) \) \( w(\infty)=x(\infty)y(\infty)=x_2y_2 \)

If we look at things from \( w \) and its coefficients, we get: \( w(0)=w_0 \) \( w(1)=w_4+w_3+w_2+w_1+w_0 \) \( w(-1)=w_4-w_3+w_2-w_1+w_0 \) \( w(2)=16w_4+8w_3+4w_2+2w_1+w_0 \) \( w(\infty)=w_4 \)

This is just solving one linear system (where 2 coefficients are straightforward). Once the coefficients are known, all that remains is to evaluate \( w \) at \( t=b \) and add. Toom-3 has a lower order (\( \mathcal{O}(n^{\log(5)/\log(3)})=\mathcal{O}(n^{1.46}) \)) than Karatsuba's method (\( \mathcal{O}(n^{1.58}) \)), so it runs faster for sufficiently large integers.

For larger integers (in the order of 10,000 to 40,000 digits), we can go faster by means of the Schönhage-Strassen algorithm, which uses the fast-Fourier transform (FFT) to achieve a complexity \( \mathcal{O}(n\log(n)\log\log(n)) \). Before we can explain the algorithm, we need to introduce the FFT. The order can be further reduced to \( \mathcal{O}(n\log(n)) \), but this algorithm is only practical for (super-ultra) incredibly large numbers and is an example of a galactic algorithm.

The Fast-Fourier Transform

The FFT is one of the key building blocks of many important algorithms, such as fast multiplication of very large numbers, polynomial multiplication, solving finite difference equations, error correcting codes (Reed-Solomon codes), and digital signal processing. It was used by Gauss early in the 19th century when he was trying to interpolate the orbits of asteroids Pallas and Juno. A simple implementation requires \( \mathcal{O}(n^2) \) operations. In 1965, Cooley and Tukey realized that the algorithm could be implemented more efficiently, reducing it to \( \mathcal{O}(n\log(n)) \), which led to its widespread use. Almost every language and numerical computation library have it implemented. In Rust, you can check this link.

To get an idea of the huge improvement over the naïve algorithm, let's look at the number of calculations for different sample sizes:

| Number of samples | \( 10^3 \) | \( 10^6 \) | \( 10^{12} \) |
| ----------------- | ---------- | ---------- | ------------- |
| DFT operations    | \( 10^6 \) | \( 10^{12} \) | \( 10^{24} \) |
| FFT operations    | \( 10^4 \) | \( 2\times10^{7} \) | \( 4\times10^{13} \) |

We see that the amount of computations is reduced by more than two orders of magnitude for samples with \( 1000 \) or more elements!

FFT over complex numbers

The Fourier transform maps a function from its original domain (space or time) to another function depending on the (space or time) frequency. Stated another way, it decomposes a function into a collection of sine waves with different frequencies and amplitudes, which are useful to analyze the behavior of a given system. We can also perform the inversion, adding all those waves to recover the original function. Even though (continuous) Fourier transforms have many applications, we will be interested in discrete Fourier transforms (DFT), where we have a finite collection of data. Given data \( x_0, x_1,...,x_{N-1} \), the DFT gives a sequence \( X_0, X_1,...,X_{N-1} \), where \( X_j=\sum_{k=0}^{N-1} x_k\exp(-2\pi i jk/N) \) and \( i^2=-1 \) is the imaginary unit. The inverse DFT is given by \( x_j=\frac{1}{N}\sum_{k=0}^{N-1} X_k\exp(2\pi i jk/N) \).

The DFT can be cast in the form of a matrix-vector product, \( X=Mx \), where \( M \) is the \( N\times N \) DFT matrix and \( \omega=\exp(-2\pi i/N) \): \( M= \left( {\begin{array}{ccccc} 1 & 1 & 1 & ... & 1 \\ 1 & \omega & \omega^2 & ... & \omega^{N-1} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & \omega^{N-1} & \omega^{2(N-1)} & ... & \omega^{(N-1)(N-1)} \end{array} } \right) \)

Implemented this way, the DFT requires \( N^2 \) operations, resulting from vector-matrix multiplication. The FFT will make this calculation more efficient, by taking advantage of the structure and using a divide and conquer strategy.

We can also see the DFT as evaluating a polynomial with coefficients \( x_k \) over the roots of unity. This will be useful when discussing fast polynomial multiplication.

The key point is that computing the DFT with \( N \) points can be reduced to calculating two DFTs with \( N/2 \) points. We can apply this recursively to break down a very large problem into a collection of smaller and easier-to-solve subproblems and then recombine those results to get the DFT.

The algorithm also takes advantage of the properties of the \( n \)-th roots of unity in the complex plane. A number \( z \) is known as an \( n \)-root of unity if \( z^n=1 \). These are of the form \( z_k=\exp(2\pi i k/n) \) for \( k=0,1,2,...,n-1 \). An interesting point is that these roots come in conjugate pairs: for each root \( r \) we have the corresponding \( \bar{r} \) (as a matter of fact, they form a finite group of order \( n \) under multiplication). For example, the fourth roots of unity are: \( 1, i, -1, -i \). It is easy to see which are the pairs.

To see how this all works, suppose we have a vector \( x=(x_0,x_1,x_2,...,x_{n-1}) \) and we want to compute its DFT. We can split the sum between even- and odd-numbered terms: \( X_j=\sum_{k=0}^{n/2-1} x_{2k}\exp(-2\pi i (2k)j/n)+\sum_{k=0}^{n/2-1} x_{2k+1}\exp(-2\pi i (2k+1)j/n) \) We can express the odd terms in a different way, by taking out a factor of \( \exp(-2\pi i j/n) \), \( X_j=\sum_{k=0}^{n/2-1} x_{2k}\exp(-2\pi i (2k)j/n)+\exp(-2\pi i j/n)\sum_{k=0}^{n/2-1} x_{2k+1}\exp(-2\pi i (2k)j/n) \) Another way to see this is to rearrange the terms by taking the \( 2 \) from the numerator of the exponential and sending it to the denominator: \( X_j=\sum_{k=0}^{n/2-1} x_{2k}\exp(-2\pi i kj/(n/2))+\exp(-2\pi i j/n)\sum_{k=0}^{n/2-1} x_{2k+1}\exp(-2\pi i kj/(n/2)) \) We now find that \( \sum_{k=0}^{n/2-1} x_{2k}\exp(-2\pi i kj/(n/2)) \) is just the DFT of the even terms, which contains \( n/2 \) points, and, similarly, \( \sum_{k=0}^{n/2-1} x_{2k+1}\exp(-2\pi i kj/(n/2)) \) is the DFT of the odd terms, also containing \( n/2 \) points. Moreover, the twiddle factors \( \exp(-2\pi i j/n) \) repeat themselves (up to a sign) whenever \( j \) is larger than \( n/2 \), so the two half-size DFTs only need to be evaluated for \( j=0,...,n/2-1 \). This way, we broke the \( n \)-point DFT into two smaller \( n/2 \)-point DFTs, which can be combined to yield the original one. Now, each of those \( n/2 \)-point DFTs can be broken into two smaller ones, so we can recursively reduce the number of computations by working with smaller samples (this way, we save ourselves the large matrix-vector product).
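A minimal recursive radix-2 FFT sketch, with complex numbers represented as (re, im) pairs to stay dependency-free; it assumes the input length is a power of two. Production code would use an iterative, optimized implementation (for example, the rustfft crate).

use std::f64::consts::PI;

type Complex = (f64, f64);

fn add(a: Complex, b: Complex) -> Complex { (a.0 + b.0, a.1 + b.1) }
fn sub(a: Complex, b: Complex) -> Complex { (a.0 - b.0, a.1 - b.1) }
fn mul(a: Complex, b: Complex) -> Complex {
    (a.0 * b.0 - a.1 * b.1, a.0 * b.1 + a.1 * b.0)
}

fn fft(x: &[Complex]) -> Vec<Complex> {
    let n = x.len();
    if n == 1 {
        return vec![x[0]];
    }
    // Split into even- and odd-indexed terms and recurse on each half.
    let even: Vec<Complex> = x.iter().step_by(2).copied().collect();
    let odd: Vec<Complex> = x.iter().skip(1).step_by(2).copied().collect();
    let (e, o) = (fft(&even), fft(&odd));

    // Combine: X_j = E_j + w^j O_j and X_{j+n/2} = E_j - w^j O_j,
    // where w = exp(-2*pi*i/n).
    let mut out = vec![(0.0, 0.0); n];
    for j in 0..n / 2 {
        let angle = -2.0 * PI * (j as f64) / (n as f64);
        let w = (angle.cos(), angle.sin());
        let t = mul(w, o[j]);
        out[j] = add(e[j], t);
        out[j + n / 2] = sub(e[j], t);
    }
    out
}

fn main() {
    // The DFT of the delta vector (1, 0, 0, 0) is (1, 1, 1, 1).
    for (re, im) in fft(&[(1.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]) {
        println!("{:.3} + {:.3}i", re, im);
    }
}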

Extending the FFT to arbitrary rings

The FFT can be extended from complex or real numbers to arbitrary rings, such as integers or polynomials (check our math survival kit). In particular, we can use the number theoretic transform, which specializes the FFT to \( \mathbb{Z}/p\mathbb{Z} \), that is, the integers modulo \( p \) (a prime number). Here we also have \( n \)-th roots of unity, given by \( \alpha^n\equiv 1 \pmod{p} \). It is important that we restrict ourselves to prime numbers: in this case, the square roots of \( 1 \) are just \( 1 \) and \( -1 \). For example, if we take \( p=5 \), \( 1^2\equiv 1 \pmod{5} \) and, since \( -1\equiv 4 \), \( 4^2 =16 \equiv 1 \pmod{5} \). This is not true modulo \( 8 \), since \( 1^2\equiv 3^2\equiv 5^2\equiv 7^2\equiv 1 \pmod{8} \) and we would have \( 4 \) square roots!

The problem with using FFT in finite fields is that we are not free to choose the domain and the field just as we please. We need to select a multiplicative subgroup of order \( 2^n \) (in other words, we need to select a group that is generated by an element \( g \) and which contains its powers up to \( 2^n \)). For example, if we take \( p=5 \), we have a group of order \( 4=2^2 \) which is generated by \( 2 \): \( {2^1=2,2^2=4,2^3\equiv 3, 2^4\equiv 1} \); it does not need to span all the elements of the field, though.

FFT multiplication algorithm

The algorithm follows the same line as Karatsuba's and Toom's:

  1. Split
  2. Evaluation
  3. Pointwise multiplication
  4. Interpolation
  5. Combination

The key difference lies in the use of the FFT to speed up calculations.

Polynomial multiplication

Let's start with polynomial multiplication. Given two polynomials, \( p(x)=p_d x^d+p_{d-1}x^{d-1}+...+p_0 \) and \( q(x)=q_d x^d+q_{d-1}x^{d-1}+...+q_0 \), we want to find their product, \( w(x)=p(x)q(x) \). The simplest algorithm would be to apply repeatedly the distributive property, perform the multiplications and rearrange everything. The product of two polynomials of degree \( d \) is a polynomial of degree \( 2d \). We can see that this strategy involves operations of the order \( \mathcal{O}(d^2) \), that is, operations grow quadratically with the degree of the polynomials involved. We can take advantage of the structure of the polynomials and the interpolation theorem. We have at least two forms to describe the same polynomial:

  • Giving the \( d+1 \) coefficients.
  • Specifying the value of the polynomial at exactly \( d+1 \) points (3).

What are the advantages of the second option? We get to choose the points freely and reduce the number of calculations. For example, if we have an even function, \( f(x)=f(-x) \), we need to evaluate fewer points. Similarly, if the function is odd, \( f(-x)=-f(x) \) and we only have to change the sign to get the value at \( -x \). So, choosing pairs \( x \) and \( -x \), we reduce the number of evaluations by half (except if we choose \( 0 \), for example). We can split our polynomial into two polynomials: one collecting the even-degree terms and the other the odd-degree ones, \( p(x)=p_e(x)+xp_o(x) \). For example, if \( p=x^5+3x^4+5x^3+2x^2+6x+3 \), we split it: \( p(x)=(3x^4+2x^2+3)+x(x^4+5x^2+6) \) We have then \( p_e=(3x^4+2x^2+3) \) and \( p_o=(x^4+5x^2+6) \), where both polynomials are even functions! This way, we easily see that \( p(-x)=p_e(x)-xp_o(x) \). If we have pairs \( (x_k,p(x_k)) \) and \( (x_k,q(x_k)) \), the product polynomial evaluated at \( x_k \) is \( (x_k,p(x_k)q(x_k)) \).

To determine the product polynomial, we need \( 2d+1 \) points; taking advantage of the above strategy, we need fewer point evaluations. If we could convert easily from the coefficient form to point evaluations, perform the multiplications in that form, and then transform back to coefficient form, we can achieve a lower complexity. We can recursively break the polynomials \( p_e(x^2) \) and \( p_o(x^2) \) into smaller polynomials.

We can choose as evaluation points the \( n \)-th roots of unity, which come in pairs: \( \exp(2\pi i k/n) \) with \( k=0,1,2,...,n-1 \). In other words, we can quickly calculate the DFT of the coefficient vectors, multiply the resulting evaluations pointwise, and invert the DFT once the product has been found. This leads to operations in the order of \( \mathcal{O}(d\log(d)) \).
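For reference, a sketch of the schoolbook \( \mathcal{O}(d^2) \) coefficient multiplication that the evaluation-based approach improves upon:

// Schoolbook polynomial multiplication. Coefficients are stored lowest
// degree first, so p[i] is the coefficient of x^i.
fn poly_mul_naive(p: &[i64], q: &[i64]) -> Vec<i64> {
    // The product of polynomials of degrees d_p and d_q has degree d_p + d_q.
    let mut w = vec![0i64; p.len() + q.len() - 1];
    for (i, &pi) in p.iter().enumerate() {
        for (j, &qj) in q.iter().enumerate() {
            w[i + j] += pi * qj;
        }
    }
    w
}

fn main() {
    // (1 + 2x)(3 + x) = 3 + 7x + 2x^2
    assert_eq!(poly_mul_naive(&[1, 2], &[3, 1]), vec![3, 7, 2]);
}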

Integer multiplication

To apply the FFT to integer multiplication, we need to transform our numbers into the coefficients of polynomials, perform the FFT multiplication, and finally reconstruct the result. Overall, this takes \( \mathcal{O}(n\log(n)\log(\log(n))) \) operations. There is a large overhead, which makes this algorithm practical only for very large integers. For example, if we want to multiply \( 3578 \) and \( 2457 \), we can define the vectors \( (8,7,5,3,0,0,0,0) \) and \( (7,5,4,2,0,0,0,0) \), where we conveniently pad the numbers with zeros.

Typically, operations are performed modulo \( 2^N+1 \), where \( N \) is larger than the combined number of bits of the integers \( x \) and \( y \), to make sure that results never wrap around.

The Fourier transform has the advantage that an operation such as the convolution of \( x \) and \( y \) can be calculated from the product of the transforms \( X \) and \( Y \) and transforming back: \( \sum_{k=0}^{N} x_k y_{N-k}=IFFT(FFT(y)\times FFT(x)) \)

The Schönhage-Strassen algorithm makes use of the negacyclic convolution. Given vectors \( x \) and \( y \) of length \( n \) and \( r \) a \( 2n \)-th (primitive) root of unity (that is, \( r^{2n}\equiv 1 \pmod{p} \) and \( r^k\not\equiv 1 \) if \( 0<k<2n \) ), we can define the following weight vectors: \( W_j=r^j \) for \( 0\leq j<n \) \( W_j^{-1}=r^{-j} \) for \( 0\leq j<n \) The negacyclic convolution (NCC) of \( x \) and \( y \) can be computed as: \( NCC(x,y)=W^{-1}IFFT(FFT(Wx)\times FFT(Wy)) \)

A comparison of the different methods implemented in GNU Multiple Precision Arithmetic Library is shown in this link.

Hash functions

Objectives

At the end of this chapter, the reader should:

  • Learn what a hash function is and its properties.
  • Become acquainted with different hash functions in use and their advantages and disadvantages.
  • Understand the working principles behind hash functions in current use.
  • Understand the different uses of hash functions.
  • Know different types of attacks that can be performed on hash functions.

Introduction

Hash functions are one of the most versatile tools in cryptography. They have many applications in signatures, integrity verification, message authentication, public key encryption, and commitments, among many others. For example, they can be used in storage systems or in criminal cases to detect whether a file has been modified or not. Git uses hashes to identify files in a repository. Bitcoin uses hash functions in its proof of work.

A hash function takes an (almost) arbitrary input and outputs a fixed-length string (typically, 32 bytes). The output is known as the digest. In cryptography, the hash function needs additional properties for it to be secure. Some examples of cryptographic hash functions are MD5 and SHA-1 (now obsolete), SHA-2, SHA-3, and Blake2.

Hash functions have a different working principle from stream or block ciphers. While ciphers achieve data confidentiality to ensure that the information cannot be read by unwanted parties, hash functions guarantee that the data has not been modified.

The strength of cryptographic hash functions relies on their unpredictability: if we modify a single bit of a given message, the hash changes significantly. We saw this idea when we talked about the avalanche effect. We can view a secure hash as a function that takes an arbitrary message and outputs a random-looking string.
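To see the avalanche effect in practice, we can hash two messages that differ in a single bit. A small sketch, assuming the RustCrypto sha2 crate is available as a dependency:

use sha2::{Digest, Sha256};

fn main() {
    // 'n' (0x6e) and 'o' (0x6f) differ in a single bit, yet the two digests
    // look completely unrelated.
    for msg in [&b"attack at dawn"[..], &b"attack at dawo"[..]] {
        let digest = Sha256::digest(msg);
        let hex: String = digest.iter().map(|b| format!("{:02x}", b)).collect();
        println!("{} -> {}", String::from_utf8_lossy(msg), hex);
    }
}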

Properties of a secure hash function

Given a function \( f \), we say that \( x \) is a preimage of \( y \) if \( f(x)=y \). If the function is injective (or one-to-one), then for each \( y \) in the image of \( f \) (that is, for every \(y \) that is a valid output of \( f \)), there is exactly one \(x \) that is the preimage of \( y \). For example, if we have a linear function over the real numbers \( f:\mathbb{R}\rightarrow \mathbb{R}/f(x)=5x+2\) and we take \( y=12 \), we can see that \( x=2 \) is the preimage of \( y \). In the case of the function \( g:\mathbb{R}\rightarrow \mathbb{R}/ g(x)=x^2\), if we take \( g(x)=1 \), we see that both \( x=1 \) and \( x=-1 \) are preimages (and can be found with ease).

We can view a hash as a function taking an arbitrary string of bits and yielding a string of fixed length \( N \), \( H:\{0,1\}^\star \rightarrow \{0,1\}^N \). Given that the size of the input is larger than the output, the function cannot be one-to-one. In other words, given an output \( h \), there may be \( m_1, m_2,...,m_k \) such that \( h=H(m_1)=H(m_2)=...=H(m_k) \).

The main property of secure hash functions is called preimage resistance. Given a random hash, preimage resistance guarantees that an attacker will not be able to find its preimage. In other words, given a hash \( h \), the attacker cannot find a valid \( m \) such that \( H(m)=h \). An ideal hash function behaves as a one-way function: we can compute the image easily, but we cannot go back! Of course, one way to try to break a hash function is to take all possible input messages and calculate their digests until we find a match (though this match might not be the same as the input originally used, since the function is not one-to-one). This approach is, however, computationally infeasible, given the size of the search space.

There are two types of preimage resistance, called first and second preimage resistance. The former states that it is (nearly) impossible to find a message \( m \) that hashes to \( h \), while the latter indicates that if we have \( h_1=H(m_1) \), it is infeasible to find \( m_2\neq m_1 \) such that \( H(m_2)=h_1\). If we find such an \( m_2 \), we say that we have found a collision for \( h_1 \). We can see that if a hash function is second preimage resistant, then it also has first preimage resistance.

The other property is collision resistance: it is computationally infeasible to find two messages \( m_1,m_2 \) which hash to the same value, that is, \( H(m_1)=H(m_2) \). We can see that if a hash function is collision-resistant, then it is second preimage resistant.

These properties mean that an adversary cannot modify or replace data without changing its hash value. Whereas collision resistance means that the attacker cannot create two messages with the same hash, second preimage resistance indicates that given a message, it is not possible to create a different one with the same hash.

Construction of hash functions

The easiest way to hash a message is to split it into pieces and process each of them in succession with a similar algorithm. This is known as iterative hashing and there are two main groups:

  • Merkle-Damgård construction, based on compression functions, which transform the input into an output of smaller size. This was the most widely used construction for hash functions until 2010 and includes MD5 (message digest 5), SHA-1 (secure hash algorithm-1), and SHA-2.
  • Sponge functions.

The Merkle-Damgård construction splits the message into blocks of the same size and combines them with an internal state, using compression functions. For example, given \( m \), we split it into blocks \( m_1,m_2,...,m_k \) (typically, 512 or 1024 bits) and we start with an initialization vector \( h_0 \). If the message cannot be split into blocks of the desired size, we just pad the message. We take \( m_1 \) and \( h_0 \) and give them to the compression function \( f \), yielding \( h_1=f(m_1,h_0) \). We then give \( h_1 \) and \( m_2 \) and calculate \( h_2=f(m_2,h_1) \) and continue recursively until we get the hash \( h \), \( h=h_k=f(m_k,h_{k-1}) \).
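A sketch of the Merkle-Damgård iteration, with a toy (insecure) compression function standing in for a real one such as the SHA-256 compression function, and zero padding instead of the length-encoding padding used in practice:

// Toy compression function: mixes an 8-byte block into a 64-bit state.
// NOT secure; it only makes the sketch runnable.
fn compress(block: &[u8; 8], state: u64) -> u64 {
    let mut h = state;
    for &byte in block {
        h = (h ^ byte as u64).wrapping_mul(0x100000001b3);
    }
    h
}

// Merkle-Damgard iteration: h_k = f(m_k, h_{k-1}), starting from a fixed IV.
fn md_hash(message: &[u8]) -> u64 {
    const IV: u64 = 0xcbf29ce484222325;
    let mut state = IV;
    // Pad with zeros so the message splits into 8-byte blocks; real
    // constructions use a padding that also encodes the message length.
    let mut padded = message.to_vec();
    while padded.len() % 8 != 0 {
        padded.push(0);
    }
    for chunk in padded.chunks(8) {
        let mut block = [0u8; 8];
        block.copy_from_slice(chunk);
        state = compress(&block, state);
    }
    state
}

fn main() {
    println!("{:016x}", md_hash(b"hello world"));
}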

Compression functions can be built from block ciphers. One common construction is the Davies-Meyer construction: we take \( h_{k-1} \) as the plaintext and encrypt it using \( m_k \) as the symmetric key, \( e=\mathrm{encrypt}(h_{k-1},m_k) \) and we then perform an XOR operation, \( h_k=h_{k-1} \oplus e\). The XOR is necessary for security because the block cipher can be inverted: you can take the final hash, decrypt it using the last block as key and get \( h_{k-1} \) and so on.

Sponge functions perform bit mixing using XOR operations between message blocks and an internal state. Sponge functions are more versatile than compression functions and have found applications not only in hashing but also as deterministic random bit generators, authenticated ciphers, and pseudorandom functions.

Sponge constructions have two different phases: absorbing and squeezing. The internal state consists of two parts: the rate, \( r \), and the capacity, \( c \). The capacity determines the security level of the scheme and prevents length extension attacks. The working principle is as follows:

  • Initialize the internal state of \( r+c \) bits, \( r_0, c_0 \).
  • The message is padded so that it can be split in equal blocks \( m_1,m_2,...m_k \) of \( r \) bits.
  • Perform \( i_1=m_1 \oplus r_0 \)
  • Apply the permutation function to \( i_1, c_0 \) to obtain the next internal state, \( r_1, c_1 \).
  • Take the next block \( m_{k+1} \), calculate \( i_{k+1}=m_{k+1}\oplus r_k \) and use the permutation function on \( i_{k+1},c_k \) to find the next internal state, \( r_{k+1},c_{k+1}\). This is the absorbing phase.
  • In the squeezing phase, take the first \( r \) bits of the internal state; if the length is less than the desired hash length, \( l_h \), apply the permutation function recursively and extract the next \( r \) bits, until the length is greater than \( l_h \).
  • If the resulting bit string is larger than \( l_h \), truncate the result to \( l_h \).

Merkle trees

We mentioned that hashes can be used to assure the authenticity of a file. This enables the use of remote storage since we can trust a large file to a server and verify that it has not been changed or replaced by checking its hash value, which we store ourselves. If we have \( N \) files, a naïve solution is to keep \( N \) hashes, but the amount of information then scales linearly with the number of files (that is, we have one hash for each file). A better way to do this is through a Merkle tree: the leaves are the hashes of the files, and these are combined pairwise to obtain the nodes of the next level, up to the root.

Say we have \( 2^N \) hashes, \( h_1, h_2, h_3,...h_{2^N} \). We can compute \( 2^{N-1} \) hashes from those, by taking the hash value of a pair, \( h_j^1=H(h_j,h_{j+1}) \), reducing the number of necessary hashes. We can proceed further, \( h_j^2=H(h_j^1,h^1_{j+1}) \), until we get to only one hash value, which is known as the root. We can check the integrity of the whole storage just by looking at the root. This has found applications in blockchains, too.
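A sketch of the root computation over a list of leaf hashes; the number of leaves is assumed to be a power of two, and a toy 64-bit mixer stands in for a real hash such as SHA-256:

// Toy stand-in for a cryptographic hash of two child hashes.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut h = 0xcbf29ce484222325u64;
    for &word in &[left, right] {
        for byte in word.to_le_bytes() {
            h = (h ^ byte as u64).wrapping_mul(0x100000001b3);
        }
    }
    h
}

// Combine pairs of hashes level by level until only the root remains.
fn merkle_root(mut level: Vec<u64>) -> u64 {
    assert!(level.len().is_power_of_two());
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| hash_pair(pair[0], pair[1]))
            .collect();
    }
    level[0]
}

fn main() {
    let leaves = vec![1u64, 2, 3, 4]; // leaf hashes h_1..h_4
    println!("root = {:016x}", merkle_root(leaves));
}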

Commitments

A commitment scheme allows one party to pledge to a certain message \( m \) (in truth, it can be almost anything, including functions or polynomials) by providing a commitment \( \mathrm{com} \), with the chance of later revealing it. We can think of the commitment scheme as a type of envelope, where we seal some message, store it safely and open it later to see whether the message was right. Anyone can see \( \mathrm{com} \), but nobody (except the sender) could learn its content before the right time. A commitment scheme should satisfy two properties:

  • Hiding: the commitment does not reveal anything about the message.
  • Binding: it is infeasible for the committer to output a different message than the one originally used in the commitment.

We can see that cryptographic hash functions and their properties are well-suited for commitment schemes. The hiding property is satisfied by the fact that the hash of an input looks random and the function is one-way. Collision resistance guarantees that it is infeasible to find \( m,m^\prime\) such that they hash to the same value. A simple commitment would thus be \( \mathrm{com}=H(m) \). The problem with this strategy is that, if the attacker has some information on the message, he can conduct a brute-force search over all possibilities and find the original message (this could be the case where we want to bet on the tossing of a coin and the chances are heads or tails).

A better way to commit would be to incorporate some randomness. We select at random a bit string \( r \) of length \( n \) and take the hash of the concatenation of \( m \) and \( r \): \( \mathrm{com}=H(m\vert \vert r) \).
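A sketch of this commitment, assuming the RustCrypto sha2 crate; the randomness is hard-coded only to keep the example self-contained and would come from a cryptographically secure RNG in practice.

use sha2::{Digest, Sha256};

// com = H(m || r): commit to a message together with fresh randomness r.
fn commit(message: &[u8], randomness: &[u8; 32]) -> Vec<u8> {
    let mut hasher = Sha256::new();
    hasher.update(message);
    hasher.update(randomness);
    hasher.finalize().to_vec()
}

fn main() {
    let r = [7u8; 32]; // placeholder: use a CSPRNG in real code
    let com = commit(b"heads", &r);
    // Later, revealing ("heads", r) lets anyone recompute and check com.
    assert_eq!(com, commit(b"heads", &r));
}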

Applications

Secure Hash Functions

Definition of collision resistant hash function, unpredictability, preimage resistance, finding collisions, etc

Commitments. Binding and hiding

Merkle trees

How things can go wrong

Compression based hash functions

Sponge constructions

ZKP friendly hashes

SHA-3

Blake2

Poseidon

Pseudorandom number generators

Digital signatures

Introduction

Applications

RSA signature

EdDSA

ECDSA

Message authentication codes

Introduction

We discussed previously how to ensure message confidentiality between two parties, Alice and Bob. We saw that we can use a symmetric key cipher, such as AES or ChaCha20, to encrypt messages between Alice and Bob so that only they can read them. However, when Bob gets a message from Alice, how does he know that it is truly from Alice and that it has not been changed by a malicious party? This is where authenticity comes into play. For example, during a key exchange in a Diffie-Hellman protocol, a man-in-the-middle (MIM) can try to impersonate both Alice and Bob. The scheme works as follows:

  1. Alice chooses a random number \( a \) and computes \( g^a \) and sends it.
  2. The MIM gets \( g^a \), chooses \( a^\prime \) and sends Bob \( g^{a^\prime} \).
  3. Bob chooses a random number \( b \) and computes \( g^b \) and sends it.
  4. The MIM gets \( g^b \), chooses \( b^\prime \) and sends Alice \( g^{b^\prime} \).
  5. Alice gets the shared secret \( g^{ab^\prime} \) and Bob gets \( g^{a^\prime b} \); their messages get decrypted by the MIM, who can read them and then re-encrypt them to Alice or Bob with the corresponding secret key.

Authenticity is also important in contexts where confidentiality is not needed. For example, we could have some authentication tag that gives proof of the integrity of files in our hard drive. If an attacker gets access to our hard drive, he may try to change files in our operating system. The authentication tags would tell us if there has been a modification in our files or not.

A message authentication code (MAC) is a primitive which allows us to ensure the integrity of a given message. There are several constructions that can be used, depending on the context. Two commonly used constructions are CBC-MAC and HMAC. MACs play an important role in internet protocol security (IPsec), secure shell (ssh), and transport layer security (TLS), generating authentication codes for each packet that is transmitted.

We will discuss later how to combine authentication codes with encryption to obtain authenticated encryption, which can guarantee both semantic security (that is, the attacker cannot learn anything from a given ciphertext) and ciphertext integrity, leading to secure encryption against tampering.

What is a MAC?

A message authentication code is a pair of efficient algorithms, signing and verification, \( S, V \), which work over a set of messages and tags and take keys. The key space is given by n-bit strings, \( \{0,1\}^n \). If we know the key, we can add authentication tags and verify them. The signing algorithm takes a message \( m \) and the key \( k \) and outputs a tag \( t \): \[ S(k,m)=t \] The verification algorithm gets a tag, \( t \), the key \( k \), and the message \( m \) and outputs a boolean variable \( b \), which tells us whether the tag corresponds to the given message: \[ V(k,t,m)=b \] To be useful, the MAC construction has to be secure; otherwise, an attacker could forge messages. We say that an attacker has produced a forgery if he can find a valid pair \( m,t \) without knowing the key.

Attacks against MAC

To see whether a MAC is secure or not, we need to establish the powers of the attacker and what would be a successful attack.

We suppose that the attacker can perform a chosen message attack (CMA). In simple words, the attacker is free to choose any messages \( m_i \) and can get access to the corresponding tags \( t_i=S(k,m_i) \), by having Alice or Bob calculate them. He does not have access to the key, though. While this may seem like an excessive amount of power (because he can get the tag of any message), it is something that could take place in the real world. The goal of the attacker is, given pairs \( (t_i,m_i) \) for \( i=1,2,...,j \), to find a new valid pair \( t,m \), where \( m\neq m_i \). This is called an existential forgery. We will say that the MAC is secure if it is existentially unforgeable under CMA.

MACs could also be rendered insecure by replay attacks. In this situation, an adversary may capture a message and its tag from Alice to Bob and then use it to impersonate Alice by sending the same message to Bob sometime later. To avoid this, MACs include a message number, which is increased with each new message, or a timestamp, and this is authenticated together with the original message in the MAC.

Construction of MAC by pseudo-random functions (PRF)

We saw examples of pseudo-random functions when we talked about block ciphers. We mentioned that these behave as pseudo-random permutations, mapping a message \( m \) to one of all the possible output messages. For example, the AES block cipher is a function \( f:K\times \{0,1\}^{128} \rightarrow \{0,1\}^{128} \), taking a message of 16 bytes and outputting a random-looking string of 16 bytes.

We can construct a MAC from a given PRF taking messages in a space \( X \) (for example, messages up to a GB long) and outputting a tag in \( Y \) (for example, a 128-bit string), \( g:K\times X \rightarrow Y \), by doing \[ t=g(k,m) \] This MAC is secure provided that the PRF \( g \) is secure and that the output set is large enough, that is, the number of elements \( \vert Y \vert \) is greater than \( 2^{80} \).

If the tag space is small, then the attacker has a high probability of outputting the correct tag.

Block ciphers and cryptographic hash functions behave as pseudo-random functions; therefore, their use in constructing MAC is reasonable. In the first case, we get CBC-MAC, while in the second, HMAC.

CBC-MAC

To build CBC-MAC, we need a pseudo-random permutation (PRP), such as a block cipher. We can picture the MAC function as \( f:K^2\times M \rightarrow {0,1}^n\). It takes two different keys, \(k_1,k_2 \), a message and outputs a tag. In the case of using AES as PRP, \( n=128 \). Given that AES works with 16-byte words, the message is split into equal blocks of 16 bytes. In case it is not a multiple of 16, we can always pad the message conveniently. Let's call \( m_0,m_1,...m_N \) the blocks composing the message and \( E(k,m)=C \) the AES encryption function, where the first argument is the key and the second is the message block. The algorithm proceeds as follows:

  1. Compute \( t_1=E(k_1,m_0)\).
  2. For \( j=1,...,N \), compute \( t_{j}^\prime=t_{j}\oplus m_j \) and \( t_{j+1}=E(k_1,t_{j}^\prime)\).
  3. Compute \( t=E(k_2,t_{N+1}) \).

This last step is critical to make the MAC secure against existential forgery. If step 3 were omitted, then we can perform the following chosen message attack:

  • Choose \( m \)
  • Request \( t=E(k,m) \)
  • Obtain a valid tag for the forged two-block message \( m\vert \vert (t\oplus m) \): its tag is again \( t \).

We can see that we have obtained a valid pair by performing the calculations: \( f(k,m\vert \vert t\oplus m)=E(k,E(k,m)\oplus t\oplus m)=E(k,t\oplus t\oplus m)=E(k,m)=t \) where we have used the fact that \( a\oplus a\oplus b=b\) (XORing \(b \) twice with the same bitstring returns \( b \)).

NMAC

The NMAC construction is based on the cascade diagram. In this case, the NMAC function is \( g:K^2\times M\rightarrow K \). As in CBC-MAC, the message is split in \( N+1 \) equal blocks, \( m_0,m_1,...m_N \). To obtain the tag,

  1. Set \( t_0=k_1 \).
  2. For \( i=1,...,N \) compute \( t_i=F(t_{i-1},m_{i-1})\).
  3. Pad \( t_N \) with a fixed pad element \( \mathrm{fpad} \) so that its length corresponds to the size of the elements in \( M \), obtaining \( t_{N+1} \).
  4. Compute \( t=F(k_2,t_{N+1}) \).

Step 2 corresponds to the cascade. Step 4 is necessary once again to prevent a length extension attack: if we knew the result of the cascade, \( \mathrm{cascade}(k,m)\), we could append any string \( w \) and obtain the output of the cascade \( \mathrm{cascade}(k,m\vert \vert w)\).

Even though NMAC could be used in combination with AES, this proves inconvenient in practice, since the key changes with every block and the key schedule must be recomputed each time. The strategy works best with cryptographic hash functions.

PMAC

The problem with the NMAC and CBC-MAC is that they are carried out sequentially. In the case of very long messages, this can be inconvenient, since we cannot leverage multiple processors to accelerate the calculation. Parallel MAC solves this problem by adopting a different scheme. To build PMAC we need two functions: a pseudo-random function \( F:K\times M\rightarrow M \) and a function taking a key and a counter \( P:K\times \mathbb{Z}_0^+ \rightarrow K \). To compute the tag:

  1. For \( i=0,1,...,N \) compute \( m_i^\prime=m_i\oplus P(k_1,i)\) and \( t_i=F(k_2,m_i^\prime) \).
  2. Compute \( t^\prime=t_0 \oplus t_1\oplus ...\oplus t_N \).
  3. Obtain \( t=F(k_2,t^\prime) \).

Universal hashing and One-time MAC

A faster alternative to PRF-based MACs is the one-time MAC; these can be secure against all adversaries, even computationally unbounded ones. They are based on universal hash functions, which are weaker than cryptographic hash functions (they do not need to be collision-resistant) but operate much faster. A universal hash function (UHF) takes a key, \( k \), and a message, \( m \), and gives the hash \( h_m=UHF(k,m) \). The only security requirement is that, for any two messages \( m_1,m_2 \), the probability that they hash to the same value for a random key is negligible: \[Pr(UHF(k,m_1)=UHF(k,m_2), k\leftarrow K)=\mathrm{neg}\ \forall m_1,m_2 \]

The idea behind this is to break the message into \( N \) blocks as before and interpret each of these blocks as a number over a large finite field, that is, every block is an element of \( \{0,1,2,...,q-1\} \). We can take each of them as a coefficient of a polynomial. To build the MAC, we fix a large prime \( q \) and take two random integers \( a,b \) in \( \{1,2,...,q-1\}\). The signing algorithm is \[ S(a,b,m)=a^{N+1}+m_{N}a^N+m_{N-1}a^{N-1}+...+m_1a+b \mod{q} \] The algorithm evaluates the polynomial with coefficients given by \( m_i \) at the point \( a \), adds \( b \), and reduces the result modulo \( q \), so that the tag is also an element of the finite field \( \mathbb{F}_q \).
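A sketch of this one-time MAC using Horner evaluation modulo a prime; the prime here is a small Mersenne prime chosen so the arithmetic fits in u128, whereas Poly1305 uses \( q=2^{130}-5 \).

// One-time MAC sketch: the tag is a^{N+1} + m_N a^N + ... + m_1 a + b (mod q).
// All inputs are assumed to be already reduced modulo Q.
const Q: u128 = 2_305_843_009_213_693_951; // 2^61 - 1, a Mersenne prime

fn one_time_mac(a: u128, b: u128, blocks: &[u128]) -> u128 {
    // Horner evaluation, starting from the leading coefficient 1 (the a^{N+1}
    // term) and consuming the blocks from m_N down to m_1.
    let mut acc: u128 = 1;
    for &m in blocks.iter().rev() {
        acc = (acc * a + m) % Q;
    }
    (acc * a + b) % Q
}

fn main() {
    let (a, b) = (123_456_789u128, 987_654_321u128); // secret, single-use key
    let blocks = [42u128, 7, 1001]; // message blocks m_1, m_2, m_3
    println!("tag = {}", one_time_mac(a, b, &blocks));
}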

Poly1305 is an example of such a construction and is used in combination with AES or ChaCha20 (we will see shortly why we need to combine them) to provide a fast MAC, used, for example, by Google to secure HTTPS connections. In particular, Poly1305 breaks the messages into blocks of 16 bytes, and each of them is interpreted as a 129-bit number in little-endian form, by appending an additional bit to each block. The modulus is \(q=2^{130}-5 \) and the final result is reduced by taking the remainder by \( 2^{128} \).

The problem with one-time MACs is that they can be used to authenticate only one message. If two messages are authenticated with the same key, an attacker can easily break the scheme and obtain both \( a \) and \( b \): for example, the tags of two single-block messages \( m_1=0 \) and \( m_1=1 \) differ exactly by \( a \bmod{q} \), and once \( a \) is known, \( b \) follows immediately. We can solve this problem by improving on the construction and incorporating a pseudorandom function.

Carter-Wegman MAC

The Carter-Wegman MAC combines a PRF with a one-time MAC. If \( F:K_F\times {0,1}^n \rightarrow {0,1}^n\) is the pseudo-random function and \( S(k_S,m) \) is a secure one-time MAC, the Carter-Wegman MAC is calculated as follows: Pick at random \( r \) in \( {0,1}^n \) and calculate \[ CW(k_F,k_S,m)=(r,F(k_F,r)\oplus S(k_S,m)) \] The input to the pseudo-random function is small and, even though \( F \) can be slower than \( S \), it will be computed very fast. We leave the message to the one-time MAC, which can deal efficiently even with large messages.

HMAC

To construct HMAC we need a key, \( k \), inner and outer paddings, \( \mathrm{ipad,opad} \), and a secure cryptographic hash function \( H:\{0,1\}^\star\rightarrow \{0,1\}^n \). The signing algorithm is \[ S(k,m)=H(k\oplus \mathrm{opad},H(k\oplus \mathrm{ipad},m)) \] The outer hash call prevents length extension attacks on the inner hash, so HMAC behaves as a secure pseudo-random function as long as the underlying hash function is secure.
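A sketch following the formula above, assuming the RustCrypto sha2 crate and a key already padded to the 64-byte SHA-256 block size (the full standard also specifies how to handle shorter and longer keys); production code should use the audited hmac crate instead.

use sha2::{Digest, Sha256};

// HMAC-SHA256 sketch: S(k, m) = H((k XOR opad) || H((k XOR ipad) || m)).
fn hmac_sha256(key: &[u8; 64], message: &[u8]) -> Vec<u8> {
    let ipad: Vec<u8> = key.iter().map(|b| b ^ 0x36).collect();
    let opad: Vec<u8> = key.iter().map(|b| b ^ 0x5c).collect();

    // Inner hash over (k XOR ipad) || m.
    let mut inner = Sha256::new();
    inner.update(&ipad);
    inner.update(message);
    let inner_digest = inner.finalize();

    // Outer hash over (k XOR opad) || inner digest.
    let mut outer = Sha256::new();
    outer.update(&opad);
    outer.update(&inner_digest);
    outer.finalize().to_vec()
}

fn main() {
    let key = [0x0bu8; 64];
    let tag = hmac_sha256(&key, b"what do ya want for nothing?");
    println!("{}", tag.iter().map(|b| format!("{:02x}", b)).collect::<String>());
}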

Timing attacks on tag verification

MAC verification can be subject to bugs or attacks if not done properly. One common attack against badly implemented MAC verification schemes is the timing attack. In the verification, the verifier takes the key, \( k \), and the message \( m \), computes the authentication tag \( t^\prime \), and compares it with the received tag \( t \). One naïve way to do this is by performing a byte-by-byte comparison, \[ t^\prime[i] == t[i]\] returning as soon as two bytes differ. The problem with checking this way is that, as soon as two bytes differ, say, for example, at byte number 3, an attacker can be certain that the first two bytes were correct. The attacker can then start trying different values for the third byte and measure the time it takes to verify: if it increases, he knows that he got at least another byte correct. The process can continue until he exhausts the number of bytes in the tag, obtaining the valid tag \( t \) for a message \( m \) and thus an existential forgery. The lesson is, therefore, to ensure that the verification is done in constant time so that no information is leaked from the tag.
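A minimal constant-time comparison sketch: the XOR of every byte pair is accumulated so the running time does not depend on where the first mismatch occurs. Production code should use a vetted crate such as subtle.

// Compare two tags without early exit on the first differing byte.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

fn main() {
    assert!(constant_time_eq(b"same tag", b"same tag"));
    assert!(!constant_time_eq(b"same tag", b"diff tag"));
}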

Need to change the key

To be secure, MAC tags need to be long enough; otherwise, they could be subject to brute-force attacks. We can find bounds for the number of messages we can MAC before we need to change keys. For example, in CBC-MAC, which outputs tags in \( {0,1}^n \), if the adversary can query \( q \) messages of length \( \ell \), then we need \[ \frac{q^2\ell^2}{2^n} \ll 1\] This means that \( q\ell \ll 2^{n/2} \). If we use AES, where \( n=128 \), and we consider \( 2^{-32}\approx 2\times 10^{-10}\) to be sufficiently small, then \( q\ell \leq 2^{48} \). Given that 1 GB of data is \( 2^{30} \) bytes, we can tag many messages amounting to several GB in total before we need to change the key.

In the case of HMAC with SHA-256, we have \( n=256 \), and the number of messages we can tag before reaching the limit is \( q \ll 2^{256/2} \), which in our case could be something like \( 2^{100}\approx 10^{30} \).

Summary

Encryption schemes, such as AES or ChaCha20, offer confidentiality, but they cannot ensure the authenticity of messages nor that they have not been tampered with. The lack of authenticity can lead to devastating attacks and break cryptographic schemes. Message authentication codes (MACs) provide ways to ensure the integrity of the message and can be combined with encryption schemes to provide authenticated encryption. To be secure, MACs need to satisfy existential unforgeability under chosen message attacks: given a new message \( m \), an attacker should not be able to generate a valid authentication tag \( t \), even if he has access to other valid pairs \( m_i,t_i \). MACs can be obtained from pseudo-random functions (such as hash functions or block ciphers, like AES) or universal hash functions, each offering advantages and disadvantages in terms of speed, size, parallel processing, etc.

SNARKS

Introduction

Succinct Non-Interactive Arguments of Knowledge (SNARKs) are powerful cryptographic primitives, with applications in decentralized finance, governance, and computation. There are many different SNARKs out there, such as Marlin (which is the one Aleo is based on), Plonk, STARKs, Groth16, etc., depending on the tools they are built on, and with differences in performance, proof sizes, verification times, etc. However, they are all based on some common principles and properties. Among SNARKs, the most important ones for private applications are zero-knowledge SNARKs (zk-SNARKs, for short). They allow us to prove that we know certain variables, called the witness, \( w \), such that the output of a function \( F \), evaluated at the witness and the instance (public variables), \( x \), is \( F(x,w)=y \), without revealing anything about \( w \).

A computer program can be converted to a function receiving inputs (some of which we may want to conceal), and we can prove its correct execution with SNARKs. For example, we can transform the program into an arithmetic circuit, \( C \), and, given the public input and output, \( x \), and the secret data, \( w \), we can prove that the program was correctly executed by showing the satisfiability of the circuit, \( C(x,w)=0 \). Since arithmetic circuit satisfiability is an NP-complete problem, we can reduce any NP problem to an arithmetic circuit (this is not the only alternative, though, and several constructions rely on different techniques).

Before jumping into the details, let us first explain the main properties of a SNARK and the precise meaning of each term in the name. A zk-SNARK involves two parties, the prover and the verifier, where the first one tries to convince the latter of a given statement, such as "the prover knows \( w \) such that \( C(x,w)=0 \)". The SNARK must fulfill:

  1. Completeness: If the prover knows \( w \), he will always be able to convince an honest verifier of the validity of the statement.
  2. Soundness: if the statement is false, then no cheating prover could convince the verifier to accept, except with very low probability.
  3. Zero-knowledge: the proof does not reveal anything about the witness.

As for the name, we have:

  1. Succinctness: the proofs have to be short and quick to verify. This would enable us to delegate expensive computations to untrusted parties and check their validity without having to run the program by ourselves.
  2. Non-interactive: no interaction between prover and verifier is required to generate or verify the proof. We will see that the construction relies first on interactive proofs, which we can then turn non-interactive by employing the Fiat-Shamir transform.
  3. Argument of knowledge: we can prove with a high probability that we know the witness.

Setup

SNARKs require a setup to generate the public parameters. Among the different kinds of setup, we can distinguish:

  • Uniform reference string or transparent setups (URS).
  • Structured reference string or private setup (SRS).

In the case of SRS, we can find two cases:

  • Universal (for example, MARLIN)
  • Specific (Groth16)

In practice, the private setup is carried out as a multiparty computation; the construction will be secure as long as at least one party is honest. The problem with a specific SRS is that the string depends on the program, and a new setup must be carried out for each one (an undesirable property).

Probabilistic proofs and capabilities of the verifier

The succinctness of the argument relies on probabilistic proofs. Before describing them, we first have to establish the "powers" or capabilities of the verifier. We have:

  • Interaction: the verifier is allowed to interact with the prover, sending challenges and receiving responses.
  • Multiprover: there are several provers, but they are isolated.
  • Randomness: the verifier can select random elements or queries.
  • Ability to perform queries to an oracle: the verifier may make queries to the prover's messages.

When the verifier has access to more than one of these powers, we get different types of proofs:

  • Interactivity + Randomness: Interactive proofs (IP).
  • Randomness + Oracle: Probabilistically checkable proofs (PCP).
  • Interactivity + Randomness + Oracle: Interactive Oracle Proofs (IOP).

There are other possibilities, such as MIOPs, but we will focus on the previous three. Currently, IOPs give the most efficient constructions for SNARKs: quasilinear time verification, linear size proof lengths, linear time provers, and efficient implementations. PCPs are interesting from an educational point of view, but are not efficient in practice (they do not result in succinct proofs, except with a linear number of queries). Finally, IPs give some interesting building blocks, in the form of subroutines.

In an IOP, the prover and verifier exchange messages. The prover sends arbitrary messages (in a polynomial IOP, the prover sends commitments to polynomials; see the next section) and the verifier sends random challenges. After some rounds, the verifier queries some values and decides whether to accept or reject.

Commitment schemes

The performance of SNARKs depends on the type of commitment used; there have been many advances in recent years to improve them.

We can think of a commitment as a safe box. We write down our bet, store it in the safe box, and hand the box to someone else. Once the result is known, we can reveal our bet by opening the safe box.

A commitment has to verify two properties:

  • Binding: we cannot produce two valid openings for the same commitment. In other words, if we committed to some value \( a \), it should be infeasible to find \( b \) such that \( \mathrm{cm}(a)=\mathrm{cm}(b) \).
  • Hiding: the commitment reveals nothing about the committed data.

One way to commit to a message is by using a collision-resistant hash function. If we have a message \( m \) and some random value \( r \), the commitment is \( \mathrm{cm}(m,r)=\mathrm{hash}(m\mid r) \). Given that the hash is collision-resistant, we have the binding property. We can afterward open the commitment and verify it: \( \mathrm{Verify}(m,r,\mathrm{cm})\rightarrow \) accept or reject. One advantage of commitments is that they tend to be short: for example, if we use SHA-256, the length of the commitment will be 32 bytes.
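
A minimal sketch of such a hash-based commitment, assuming the `sha2` crate; the randomness is passed in by the caller and fixed in the example purely for brevity, whereas in practice it must be sampled uniformly at random.

```rust
// Hash-based commitment sketch (assumes the `sha2` crate):
// cm(m, r) = SHA-256(m || r); opening reveals (m, r).
use sha2::{Digest, Sha256};

fn commit(message: &[u8], randomness: &[u8; 32]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(message);
    hasher.update(randomness);
    let mut cm = [0u8; 32];
    cm.copy_from_slice(&hasher.finalize());
    cm
}

// Verify an opening: recompute the commitment and compare.
fn verify(commitment: &[u8; 32], message: &[u8], randomness: &[u8; 32]) -> bool {
    &commit(message, randomness) == commitment
}

fn main() {
    // In practice `r` must be sampled uniformly at random; fixed here for brevity.
    let r = [7u8; 32];
    let cm = commit(b"my sealed bet", &r);
    assert!(verify(&cm, b"my sealed bet", &r));
    assert!(!verify(&cm, b"a different bet", &r));
}
```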

One important class of commitments is polynomial commitment schemes. Here are some constructions and the math they rely on:

  • Basic elliptic curves: bulletproofs
  • Bilinear groups: Kate-Zaverucha-Goldberg (KZG) polynomial commitments (pairings, trusted setup); DORY (no trusted setup)
  • Groups of unknown order: DARK
  • Hash functions only: FRI

Anatomy of a SNARK

A SNARK can be constructed by selecting the following two elements:

  1. A type of probabilistic proof: for example, probabilistically checkable proof (PCP) or interactive oracle proof (IOP). A special type of IOP is polynomial IOP (PIOP).
  2. A commitment scheme (cryptography). For example, polynomial/functional commitments, vector commitments, and linear encoding.

The probabilistic proof determines the type of computation. It can be either in the form of a machine computation (such as vmTinyRam) or a circuit.

The cryptographic element determines the cost to generate the proofs, whether the scheme will be post-quantum secure, and the type of setup (transparent or structured). The math tools we need to work with each of them are:

  • Linear encoding: Elliptic curve pairings (ECP) + Lattices
  • Vector commitment: Collision resistant hash (CRH) functions + ECP.
  • Polynomial commitment: CRH, ECP, prime-order (PO) groups, unknown-order (UO) groups

Some SNARK recipes are:

  1. Linear PCP + Linear encoding: Groth16, Groth-Maller 17
  2. PCP/IOP + vector commitment: STARKs
  3. Polynomial PCP/IOP + polynomial commitment: MARLIN, SONIC, Plonk, Spartan.

Bulletproofs take a somewhat different combination of the elements above and are based on cryptographic sum checks.

The proof's size depends strongly on the type of construction. For example, for PIOP with KZG polynomial commitments, proofs take less than 1 kB (two elliptic curve elements), whereas IOP with FRI (Fast Reed-Solomon Interactive Oracle Proofs of Proximity) needs around 100 kB, more than two orders of magnitude more! This is because FRI uses Merkle trees; the verification requires an authentication path, needing several hashes.

One problem we face with circuits is that the verifier would have to read the whole circuit, resulting in a verification time that depends linearly on its size (which would make the scheme non-succinct). To avoid this, we can preprocess the circuit and attain sublinear verification times.

We will now focus on the KZG polynomial commitment, which is the basis of Marlin and Plonk.

KZG commitment scheme

The polynomial commitment scheme has the following algorithms:

  1. Setup.
  2. Commit.
  3. Open.

To commit to a polynomial, we will evaluate the polynomial at a given, but unknown, point \( \alpha \).

The setup takes the maximum degree of the polynomial (which depends on the number of gates of the arithmetic circuit) and outputs the public parameters: the proving key and the verifying key. To be able to evaluate polynomials, we will use an elliptic curve (we will need it to be pairing-friendly, such as BLS 12-377) to hide \( \alpha \) and its powers (\( \alpha \) is chosen during the setup ceremony and discarded as toxic waste!). To do that, we pick a generator \( g \) of a group of large prime order and calculate the \( d+1 \) points \( H={g,\alpha g, \alpha^2 g, \alpha^3 g,..., \alpha^{d} g}={h_0,h_1,h_2,...,h_{d}} \). This gives us a very large collection of elliptic curve points (we will save them as a string), which will work as the proving key. In the case of a universal setup, the number of elements will match the maximum supported circuit size. Since elliptic curve points take on the order of \( 100 \) bytes, if we want to deal with \( 10^{8} \) gates, we will need more than 1 GB just to store the key. For a given circuit, which could be much smaller than the maximum, MARLIN trims the key, so that it is much simpler and faster to work with. The setup also depends on a security parameter \( \lambda \), but we will consider it to be fixed in our analysis. We therefore have \( \mathrm{setup}(\lambda,N)\rightarrow \mathrm{pp(pk,vk)} \).

The prover generates the polynomial \( P(x)=p_0+p_1x+...+p_dx^d \) and commits to it by evaluating it at \( \alpha \). Since we do not know \( \alpha \), only the group elements corresponding to its powers, the commitment is \( \mathrm{cm}(P)=p_0g+p_1\alpha g+...+p_d\alpha^d g=p_0h_0+p_1h_1+...+p_dh_d \). This is an instance of the multiscalar multiplication (MSM) problem. We see that the polynomial commitment corresponds to a single group element of the elliptic curve.
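
The structure of the setup and of the commitment can be sketched as follows. As an assumption for illustration only, the elliptic curve group is replaced by the multiplicative group of a small prime field (so \( h_i=g^{\alpha^i} \) and the commitment is \( g^{P(\alpha)} \), the multiplicative analogue of \( \sum p_i h_i \)); the prime, the generator, and \( \alpha \) are toy values, and a real scheme uses a pairing-friendly curve such as BLS 12-377 with \( \alpha \) discarded after the ceremony.

```rust
// Toy "KZG-style" setup and commitment over the multiplicative group of a
// small prime field, standing in for an elliptic curve group. All parameters
// here are illustrative assumptions, not real cryptographic values.

const P: u128 = 2_147_483_647; // group modulus: the Mersenne prime 2^31 - 1
const R: u128 = P - 1;         // order of the multiplicative group

fn pow_mod(mut base: u128, mut exp: u128, modulus: u128) -> u128 {
    let mut acc = 1;
    base %= modulus;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % modulus;
        }
        base = base * base % modulus;
        exp >>= 1;
    }
    acc
}

// Setup: compute the "powers of alpha in the group", h_i = g^(alpha^i).
// In a real ceremony alpha is sampled secretly and thrown away.
fn setup(g: u128, alpha: u128, max_degree: usize) -> Vec<u128> {
    let mut srs = Vec::with_capacity(max_degree + 1);
    let mut alpha_pow = 1u128; // alpha^0
    for _ in 0..=max_degree {
        srs.push(pow_mod(g, alpha_pow, P));
        alpha_pow = alpha_pow * alpha % R; // exponents live mod the group order
    }
    srs
}

// Commit to P(x) = p_0 + p_1 x + ... + p_d x^d by combining the SRS elements,
// i.e. computing g^{P(alpha)} in this toy group (the analogue of an MSM).
fn commit(srs: &[u128], coeffs: &[u128]) -> u128 {
    coeffs
        .iter()
        .zip(srs.iter())
        .fold(1u128, |acc, (&p_i, &h_i)| acc * pow_mod(h_i, p_i, P) % P)
}

fn main() {
    let srs = setup(7, 123_456, 8); // toy generator and toy secret alpha
    let cm = commit(&srs, &[5, 0, 3, 1]); // P(x) = 5 + 3x^2 + x^3
    println!("commitment = {}", cm);
}
```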

We could use a Merkle tree to commit to a polynomial, too. The problem with Merkle trees is that the size of the tree would be dependent on the degree of the polynomial. In the case of KZG, the commitment is just one group element, which is independent of the size. Besides, when we want to evaluate the polynomial in a proof, we need to send all the coefficients in the clear (which reveals the polynomial), with the verifier having to do linear work on the size of the polynomial. On the other hand, we will see that KZG mostly hides the polynomial (unless there are lots of queries) and the verification is independent of the degree of the polynomial. Additionally, KZG allows for batch proofs: if we have several commitments \( \mathrm{cm}_1 \), \( \mathrm{cm}_2 \), ..., \( \mathrm{cm}_k \), we can generate a single proof showing that all commitments are valid.

Once the polynomial is committed, the verifier (in an interactive scheme) can send random points \( r_k \) to the prover and the latter gives the polynomial evaluated at \( r_k \), \( P(r_k) \). To make it non-interactive, we use the Fiat-Shamir transform, where the random challenges will be generated from a cryptographic hash function.

Suppose that the prover wants to convince the verifier that \( P(u)=v \). He can transform that equation into a polynomial, \( g(x)=P(x)-v \), which has a root at \( x=u \). This means that \( g(x) \) is divisible by \( x-u \), which we can write as \( g(x)=P(x)-v=Q(x)(x-u) \), where \( Q \) is the quotient polynomial. The prover can commit to \( Q(x) \) in the same way as before, that is, \( \mathrm{cm}(Q)=q_0g+q_1\alpha g+...+q_{d-1}\alpha^{d-1} g=q_0h_0+q_1h_1+...+q_{d-1}h_{d-1} \), which is another MSM. The proof \( \pi \) contains this group element, which is constant size! The proof shows that \( P(u)=v \), that \( P \) is indeed a polynomial of degree at most \( d \), and that the commitment of \( P \) is \( \mathrm{cm}(P) \).
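
Computing \( Q(x)=(P(x)-v)/(x-u) \) is just synthetic division of the coefficients. The following sketch performs it over a toy prime field; the modulus is an illustrative choice, whereas a real prover works over the scalar field of the curve.

```rust
// Quotient polynomial Q(x) = (P(x) - v)/(x - u) over a toy prime field,
// computed by synthetic division. Field modulus is illustrative only.

const Q_MOD: u128 = 2_147_483_647; // toy field modulus, 2^31 - 1

fn add(a: u128, b: u128) -> u128 { (a + b) % Q_MOD }
fn mul(a: u128, b: u128) -> u128 { a * b % Q_MOD }
fn sub(a: u128, b: u128) -> u128 { (a + Q_MOD - b % Q_MOD) % Q_MOD }

// Evaluate P at x with Horner's rule (coefficients from constant term up).
fn eval(coeffs: &[u128], x: u128) -> u128 {
    coeffs.iter().rev().fold(0, |acc, &c| add(mul(acc, x), c))
}

// Synthetic division: returns the coefficients of Q(x) such that
// P(x) - P(u) = Q(x) (x - u). Requires deg P >= 1.
fn quotient(coeffs: &[u128], u: u128) -> Vec<u128> {
    let d = coeffs.len() - 1;
    let mut q = vec![0u128; d]; // Q has degree d - 1
    // Work from the leading coefficient down: q_{i-1} = p_i + u * q_i.
    let mut carry = 0u128;
    for i in (1..=d).rev() {
        carry = add(coeffs[i], mul(u, carry));
        q[i - 1] = carry;
    }
    q
}

fn main() {
    let p = [5u128, 0, 3, 1]; // P(x) = 5 + 3x^2 + x^3
    let u = 11u128;
    let v = eval(&p, u);
    let q = quotient(&p, u);
    // Sanity check of the identity P(x) - v = Q(x)(x - u) at another point.
    let x = 987_654u128;
    assert_eq!(sub(eval(&p, x), v), mul(eval(&q, x), sub(x, u)));
    println!("Q = {:?}, v = {}", q, v);
}
```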

The verifier accepts the proof if \( (\alpha-u)\mathrm{cm}(Q)=\mathrm{cm}(P)-vg \). The problem is that we do not know \( \alpha \). This is where pairings come to our aid: to perform the check, the verifier only needs the \( \mathcal{G}_2 \) counterparts of \( h_0 \) and \( h_1 \), that is, \( g_2 \) and \( \alpha g_2 \). Roughly speaking, an elliptic curve pairing is a function \( f: \mathcal{G}_1 \times \mathcal{G}_2\rightarrow \mathcal{G}_t \) which takes two elements, \( P \) from \( \mathcal{G}_1 \) and \( Q \) from \( \mathcal{G}_2 \), and outputs an element \( R \) from \( \mathcal{G}_t \). All the groups have the same order \( r \) and correspond to groups in pairing-friendly elliptic curves. In the case of MARLIN, the curve BLS 12-377 is used. The pairing satisfies \( f(P,Q)=f(g,g_2)^{pq} \), where \( g \) and \( g_2 \) are generators for the groups \( \mathcal{G}_1 \) and \( \mathcal{G}_2 \) (and \( P=pg \) and \( Q=qg_2 \)). The verification equation in terms of pairings is \( f(\mathrm{cm}(Q),(\alpha-u)g_2)=f(\mathrm{cm}(P)-vg,g_2) \), where the check is done over \( \mathcal{G}_t \). We only need to know \( \alpha g_2 \) from the trusted setup, since \( (\alpha-u)g_2=\alpha g_2-ug_2 \).

Now, how can we be convinced that, if we evaluate the polynomials at a single point and they coincide, then it is very likely that the argument is true? The answer lies with the Schwartz-Zippel lemma (we will state it for finite fields): for a non-zero polynomial \( P \) of degree \( d \) over a finite field of order \( p \), the probability that the polynomial is zero at a randomly sampled point \( r \) is \( \mathrm{Pr}(P(r)=0)\leq d/p \). Given that the maximum size of circuits (which gives the maximum degree of a polynomial) is around \( 2^{26}\approx 10^8 \) and the size of the field is larger than \( 2^{256} \), the probability is around \( 2^{-230}\approx 10^{-70} \). If two polynomials \( P_1 \) and \( P_2 \) coincide at a random point \( r \), we can construct the polynomial \( P(x)=P_1(x)-P_2(x) \) (since polynomials are closed under addition) and \( P(r)=0 \). Given how unlikely it is to hit a zero at random, we can be quite confident that \( P(x) \) is the zero polynomial.
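
The lemma translates directly into a probabilistic equality check: evaluate both polynomials at a random point and compare. The sketch below works over the same toy \( 2^{31}-5 \)-ish field used earlier and draws its "randomness" from the system clock, both illustrative shortcuts; a real system uses a ~256-bit field and proper randomness, which is what makes the error probability negligible.

```rust
// Probabilistic polynomial-equality check based on the Schwartz-Zippel lemma,
// over a toy field with modulus 2^31 - 1. Clock-based randomness is a toy choice.
use std::time::{SystemTime, UNIX_EPOCH};

const P: u128 = 2_147_483_647;

fn eval(coeffs: &[u128], x: u128) -> u128 {
    coeffs.iter().rev().fold(0, |acc, &c| (acc * x + c) % P)
}

// Accepts if P1 and P2 agree at a random point r; by Schwartz-Zippel a
// disagreement slips through with probability at most d / |field|.
fn probably_equal(p1: &[u128], p2: &[u128]) -> bool {
    let r = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_nanos() % P;
    eval(p1, r) == eval(p2, r)
}

fn main() {
    // x^2 + 2x + 1 equals itself at every point, so the check always accepts...
    assert!(probably_equal(&[1, 2, 1], &[1, 2, 1]));
    // ...while x^2 + 2x + 2 differs and is rejected with overwhelming probability.
    assert!(!probably_equal(&[2, 2, 1], &[1, 2, 1]));
}
```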

Summary

zk-SNARKs have started gaining attention due to their use in developing fully private applications. They provide succinct proofs that a certain computation was carried out correctly, without revealing sensitive information. There are many possible constructions, based on a probabilistic proof and a commitment scheme. The different choices lead to more or less efficient versions and determine the type of computation (machine or circuit computation). We explored the KZG commitment scheme, which shows the idea behind systems such as MARLIN and Plonk, and the calculations we need to carry out to generate and verify the proofs.

Multiscalar Multiplication

The Multi-Scalar Multiplication (MSM) problem consists of, given an elliptic curve, calculating \[ \sum_{i=1}^{n} k_i P_i \] for some scalars \( k_i \), some elliptic curve points \( P_i = (x_i, y_i) \), and some \( n \) (for example, on the order of \( 2^{26} \)).

It is estimated that around 80% of the time to produce a zk-SNARK proof is spent doing MSM, so optimizing it is very important for performance.

Bucketing method

We can break the MSM into smaller sums and reduce the total number of operations by repeatedly using a method called the windowing technique. If we want to compute each \( k_iP_i \), we can break the scalar into windows of size \( c \) \[ k_iP_i=k_{i0}P_i+k_{i1}2^{c} P_i+k_{i2}2^{2c} P_i+...+k_{i,m-1}2^{c(m-1)} P_i \] Using this, the MSM problem can be rewritten as \[P=\sum_{i} k_iP_i=\sum_{i}\sum_{j} k_{ij}2^{cj}P_i \] We can now change the order of the summations, \[ P=\sum_{i} k_iP_i=\sum_{j}2^{cj}\left(\sum_{i} k_{ij}P_i\right)=\sum_j 2^{cj} B_j \] In other words, we first divide the scalars into windows and then combine all the points in each window.

Now we can focus on how to calculate each \( B_j \) efficiently: \[ B_j=\sum_{i} k_{ij}P_i=\sum_{\lambda=0}^{2^c-1} \lambda \sum_{u(\lambda)} P_u \] where the summation over \( u(\lambda) \) is done only over points whose coefficient is \( \lambda \). For example, if \( c=3 \) and we have \( 15 \) points, \[ B_1=4P_1+3P_2+5P_3+1P_4+4P_5+6P_7+6P_8+3P_{14}+5P_{15} \] We can split the summation by the coefficients \( \lambda \), taking values from \( 1 \) to \( 7 \). For \( \lambda=1 \), \( \sum_u P_u=P_4 \) (because \( P_4 \) is the only one with coefficient \( 1 \)); for \( \lambda=4 \), \( \sum_u P_u=P_1+P_5 \), etc. What we are doing is just placing all points with a common coefficient \( \lambda \) into the \( \lambda \)-bucket. Thus, \[ B_j=\sum_\lambda \lambda S_{j\lambda}=S_{j1}+2S_{j2}+3S_{j3}+4S_{j4}+5S_{j5}+6S_{j6}+7S_{j7} \] We can calculate this with a minimum number of point additions, using partial sums: \[ T_{j1}=S_{j7},\quad T_{j2}=T_{j1}+S_{j6},\quad T_{j3}=T_{j2}+S_{j5},\quad T_{j4}=T_{j3}+S_{j4},\quad T_{j5}=T_{j4}+S_{j3},\quad T_{j6}=T_{j5}+S_{j2},\quad T_{j7}=T_{j6}+S_{j1} \] Each of these operations involves just one elliptic curve point addition. The final result can be obtained by summing these partial sums: \[ B_j=\sum_k T_{jk} \]
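
A sketch of the whole bucketing procedure is shown below. As an illustrative assumption, "points" are modeled as residues in a toy additive group (integers modulo a prime, with point addition as modular addition) instead of real elliptic curve points, and the window size \( c=3 \) is arbitrary; the algorithm itself is the one described above, and the result is checked against the naive sum.

```rust
// Bucketing (windowed) MSM over a toy additive group: "points" are integers
// modulo a prime and point addition is modular addition, standing in for
// elliptic curve points. Window size and group are illustrative choices.

const P_MOD: u64 = 2_147_483_647; // toy group order

type Point = u64; // toy stand-in for an elliptic curve point

fn point_add(a: Point, b: Point) -> Point {
    (a + b) % P_MOD
}

// Computes sum_i k_i * P_i using windows of `c` bits.
fn msm_bucketed(scalars: &[u64], points: &[Point], c: u32) -> Point {
    assert_eq!(scalars.len(), points.len());
    let windows = (64 + c - 1) / c; // number of c-bit windows in a 64-bit scalar
    let mut window_sums = Vec::with_capacity(windows as usize);

    for j in 0..windows {
        // Buckets S_{j,1}, ..., S_{j,2^c - 1}; lambda = 0 contributes nothing.
        let mut buckets = vec![0 as Point; (1usize << c) - 1];
        for (&k, &pt) in scalars.iter().zip(points.iter()) {
            let lambda = ((k >> (c * j)) & ((1 << c) - 1)) as usize;
            if lambda != 0 {
                buckets[lambda - 1] = point_add(buckets[lambda - 1], pt);
            }
        }
        // B_j = sum_lambda lambda * S_{j,lambda}, via running partial sums T_k.
        let (mut running, mut b_j) = (0 as Point, 0 as Point);
        for &s in buckets.iter().rev() {
            running = point_add(running, s);
            b_j = point_add(b_j, running);
        }
        window_sums.push(b_j);
    }

    // P = sum_j 2^{cj} B_j, folding from the most significant window down.
    let mut acc: Point = 0;
    for &b_j in window_sums.iter().rev() {
        for _ in 0..c {
            acc = point_add(acc, acc); // "double" c times, i.e. multiply by 2^c
        }
        acc = point_add(acc, b_j);
    }
    acc
}

fn main() {
    let scalars = [4u64, 3, 5, 1, 4, 0, 6, 6];
    let points: Vec<Point> = (1..=8).collect();
    let naive: u64 = scalars
        .iter()
        .zip(points.iter())
        .fold(0, |acc, (&k, &p)| (acc + k % P_MOD * p) % P_MOD);
    assert_eq!(msm_bucketed(&scalars, &points, 3), naive);
    println!("MSM result: {}", naive);
}
```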

We can improve the calculations by changing the expansion of the coefficients \( k_i \). In binary representation, the Hamming weight is the number of non-zero bits; ideally, we would like this weight to be as small as possible to reduce the number of additions (for example, \( 65537=2^{16}+1 \) is used as the public exponent for RSA in many implementations; since it has only two non-zero bits, the square-and-multiply algorithm requires just one multiplication besides the squarings). The average density of non-zero digits in a binary representation is \( 1/2 \); if we introduce a signed binary representation (with digits \( -1,0,1 \)), the average density drops to \( 1/3 \), with the consequent decrease in the number of additions (on average).
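
A standard way to obtain such a signed expansion is the non-adjacent form (NAF); the following sketch computes it for a non-negative scalar and checks the result, purely as an illustration of the representation mentioned above.

```rust
// Signed-digit (non-adjacent form) expansion: digits are in {-1, 0, 1} and,
// on average, only about 1/3 of them are non-zero, versus 1/2 for plain binary.

// Returns the NAF digits of k, least significant first (k must be non-negative).
fn naf(mut k: i64) -> Vec<i8> {
    assert!(k >= 0, "sketch handles non-negative scalars only");
    let mut digits = Vec::new();
    while k != 0 {
        if k & 1 == 1 {
            // Choose the digit in {-1, 1} that makes the next bit of (k - d) zero.
            let d = 2 - (k & 3) as i8; // k mod 4 == 1 -> +1, k mod 4 == 3 -> -1
            digits.push(d);
            k -= d as i64;
        } else {
            digits.push(0);
        }
        k >>= 1;
    }
    digits
}

fn main() {
    // 7 = 8 - 1: binary 111 has three non-zero digits, its NAF only two.
    let digits = naf(7);
    assert_eq!(digits, vec![-1i8, 0, 0, 1]);
    // Reconstruct the value as sum_i d_i * 2^i to check the expansion.
    let value: i64 = digits
        .iter()
        .enumerate()
        .map(|(i, &d)| d as i64 * (1i64 << i))
        .sum();
    assert_eq!(value, 7);
    println!("NAF of 7: {:?}", digits);
}
```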