Solving The Cryptopals Crypto Challenges (Set 1 - Challenge 6-8)

5 Dec 2024

Overview

I came across the cryptopals crypto challenges in this blog (A really cool blog, I recommend following it). The challenges looked interesting and I am solving them.

  • I am writing this blog series to record and analyse my process of solving a problem.
  • All published parts of the series can be found here.
  • All code in this series can be found here.
  • All solutions are in Golang.
  • As the intention of this exercise is to maximise learning, I will be doing it the old-fashioned way, sans LLMs.
  • I welcome all feedback and critique on any part of the blog series.

Challenge 6

Problem

The problem, given here, is to break the repeating XOR cipher.

There’s a file here. It’s been base64’d after being encrypted with repeating-key XOR.

Decrypt it.

Here’s how:

  1. Let KEYSIZE be the guessed length of the key; try values from 2 to (say) 40.
  2. Write a function to compute the edit distance/Hamming distance between two strings. The Hamming distance is just the number of differing bits.
  3. For each KEYSIZE, take the first KEYSIZE worth of bytes, and the second KEYSIZE worth of bytes, and find the edit distance between them. Normalize this result by dividing by KEYSIZE.
  4. The KEYSIZE with the smallest normalized edit distance is probably the key. You could proceed perhaps with the smallest 2-3 KEYSIZE values. Or take 4 KEYSIZE blocks instead of 2 and average the distances.
  5. Now that you probably know the KEYSIZE: break the ciphertext into blocks of KEYSIZE length.
  6. Now transpose the blocks: make a block that is the first byte of every block, and a block that is the second byte of every block, and so on.
  7. Solve each block as if it was single-character XOR. You already have code to do this.
  8. For each block, the single-byte XOR key that produces the best looking histogram is the repeating-key XOR key byte for that block. Put them together and you have the key.

Solutioning

I started working on the hamming distance calculation and the first implementation I did was

func hammingDistance(ip1 []byte, ip2 []byte) (int, error) {
	hDistance := 0
	if len(ip1) != len(ip2) {
		return hDistance, fmt.Errorf("Length of strings must be equal. In this case, len1 = %d, len2=%d", len(ip1), len(ip2))
	}

	for i := range len(ip1) {
		if ip1[i] != ip2[i] {
			hDistance ++
		}
	}
	return hDistance, nil
}

But this did not give the expected output. I went back to the problem and realised the hamming distance is the number of differing bits.

Calculating differing bits can be done by doing an XOR on the bytes and calculating the number of 1s when we represent the number in binary. While I could have done this by the old-fashioned modulus and division by 2 method, I wanted something clever. With some google-fu I came across a pretty cool solution here. It is based on the fact that n & (n-1) always eliminated the least significant 1. You can see it in the code below:

func hammingDistance(ip1 []byte, ip2 []byte) (int, error) {
	hDistance := 0
	if len(ip1) != len(ip2) {
		return hDistance, fmt.Errorf("Length of strings must be equal. In this case, len1 = %d, len2=%d", len(ip1), len(ip2))
	}

	numberOf1sinBinary := func(num int) int {
		count := 0
		for tmp := num; tmp != 0; {
			tmp = tmp & (tmp - 1)
			count++
		}
		return count
	}

	for i := range len(ip1) {
		xor := ip1[i] ^ ip2[i]
		count := numberOf1sinBinary(int(xor))
		hDistance += count
	}
	return hDistance, nil
}

The next step is to find the keysize. This is a bruteforce method. We try keysize, k (1 < k < 41) and try to find the normalised hamming distance for each keysize. We can do this by the following ways:

  1. Two the first 2 blocks of length k and find the hamming distance between them.
  2. Take the first 4 blocks of length k and find the average hamming distance between them.

I thought the more comparisons we have the better, so I split the input into n blocks of keysize k and compared all combinations of block n and found the hamming distances between them and found their averages.

All hamming distances are divided by the keysize k to normalise them. Here is the code for that.

func calculateNormEditDistance(ciphertext []byte, keySize float64) float64 {
	numOfBlocks := len(ciphertext) / int(keySize)
	var nBlocks = make([][]byte, numOfBlocks)
	var hDistances []float64
	normalisedHammingDistance := 0.0

	for i := range numOfBlocks {
		if (i*int(keySize))+int(keySize) > len(ciphertext) {
			tmp := ciphertext[i*int(keySize):]
			for j := 0; j < i*int(keySize)+int(keySize)-len(ciphertext); j++ {
				tmp = append(tmp, 0)
				nBlocks[i] = tmp
			}
		} else {
			nBlocks[i] = ciphertext[i*int(keySize) : (i*int(keySize))+int(keySize)]
		}
	}

	for i := range numOfBlocks {
		for j := range numOfBlocks {
			hDist := hammingDistance(nBlocks[i], nBlocks[j])
			hDistf := float64(hDist) / keySize
			hDistances = append(hDistances, hDistf)
		}
	}

	for _, i := range hDistances {
		normalisedHammingDistance += i
	}
	normalisedHammingDistance = normalisedHammingDistance / float64(len(hDistances))

	return normalisedHammingDistance
}

Once we found this score for everything, we get the minimum value out of all the keys and that should be key length. But as the problem said, we will be trying the 4 least values.

Once we find the possible keysize k, we split the ciphertext into blocks of length k and transpose all same indexed elements to a single block i.e., all elements in array index 0 of all blocks will now make up a new block and so on.

Once we have the transposed block, we just solve each block as a single key xor from the previous challenge and find an index of the key. Now we put this all together, we have the key.

While this theoretically sounds good, I implemented all the steps and did not get the answer. After hours of frustation, I decided that the only way to solve this would be to look up solutions. I did that and found this, this and this. I found that the scoring scheme that I used to rate a decryption of a single key XOR might be the problem, I changed this to the common algorithm that was used in all the above links.

func legitAnswerProbability(inp string) float64 {
	var characterFrequency = map[string]float64{
		"a": 0.0651738, "b": 0.0124248, "c": 0.0217339, "d": 0.0349835, "e": 0.1041442, "f": 0.0197881, "g": 0.0158610,
		"h": 0.0492888, "i": 0.0558094, "j": 0.0009033, "k": 0.0050529, "l": 0.0331490, "m": 0.0202124, "n": 0.0564513,
		"o": 0.0596302, "p": 0.0137645, "q": 0.0008606, "r": 0.0497563, "s": 0.0515760, "t": 0.0729357, "u": 0.0225134,
		"v": 0.0082903, "w": 0.0171272, "x": 0.0013692, "y": 0.0145984, "z": 0.0007836, " ": 0.1918182}

	score := 0.0
	for _, c := range inp {
		s := strings.ToLower(string(c))
		score += characterFrequency[s]
	}
	return score
}

This also did not give out any good results. That’s when I stumbled upon this. In this blog post, I stumbled upon the following line:

We use the .atob() function (MDN docs) to decode the base64 string,

and that’s when it hit me. The input was Base64 encoded. After slapping myself, just decoding the Base64 before processing the input gave me the answer. Below is the final code:

// Break repeating key XOR
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"math"
	"sort"
	"strings"
)

var input64 = `HUIfTQsPAh9PE048GmllH0kcDk4TAQsHThsBFkU2AB4BSWQgVB0dQzNTTmVS
.
.
Jk8DCkkcC3hFMQIEC0EbAVIqCFZBO1IdBgZUVA4QTgUWSR4QJwwRTWM=
`

func decodeBase64(ip string) []byte {
	op := make([]byte, base64.StdEncoding.DecodedLen(len(ip)))
	base64.StdEncoding.Decode(op, []byte(ip))
	return op
}

func hammingDistance(ip1 []byte, ip2 []byte) int {
	hDistance := 0
	if len(ip1) != len(ip2) {
		lenDiff := math.Abs(float64((len(ip1) - len(ip2))))
		if len(ip1) > len(ip2) {
			for i := 0; i < int(lenDiff); i++ {
				ip2 = append(ip2, 0)
			}
		} else {
			for i := 0; i < int(lenDiff); i++ {
				ip1 = append(ip1, 0)
			}
		}
	}

	numberOf1sinBinary := func(num int) int {
		count := 0
		for tmp := num; tmp != 0; {
			tmp = tmp & (tmp - 1)
			count++
		}
		return count
	}

	for i := range len(ip1) {
		xor := ip1[i] ^ ip2[i]
		count := numberOf1sinBinary(int(xor))
		hDistance += count
	}
	return hDistance
}

func getMinKeys(m map[int]float64, n int) []int {
	// This is to make it work with the sort.Slice function
	type kv struct {
		Key   int
		Value float64
	}
	var pairs []kv

	// Populate the slice with key-value pairs from the map
	for k, v := range m {
		pairs = append(pairs, kv{Key: k, Value: v})
	}

	sort.Slice(pairs, func(i, j int) bool {
		if pairs[i].Value == pairs[j].Value {
			return pairs[i].Key < pairs[j].Key // Break ties by key
		}
		return pairs[i].Value < pairs[j].Value // Sort by value
	})

	// Extract the first n keys
	result := []int{}
	for i := 0; i < len(pairs) && i < n; i++ {
		result = append(result, pairs[i].Key)
	}

	return result
}

func calculateNormEditDistance(ciphertext []byte, keySize float64) float64 {
	numOfBlocks := len(ciphertext) / int(keySize)
	var nBlocks = make([][]byte, numOfBlocks)
	var hDistances []float64
	normalisedHammingDistance := 0.0

	for i := range numOfBlocks {
		if (i*int(keySize))+int(keySize) > len(ciphertext) {
			tmp := ciphertext[i*int(keySize):]
			for j := 0; j < i*int(keySize)+int(keySize)-len(ciphertext); j++ {
				tmp = append(tmp, 0)
				nBlocks[i] = tmp
			}
		} else {
			nBlocks[i] = ciphertext[i*int(keySize) : (i*int(keySize))+int(keySize)]
		}
	}

	for i := range numOfBlocks {
		for j := range numOfBlocks {
			hDist := hammingDistance(nBlocks[i], nBlocks[j])
			hDistf := float64(hDist) / keySize
			hDistances = append(hDistances, hDistf)
		}
	}

	for _, i := range hDistances {
		normalisedHammingDistance += i
	}
	normalisedHammingDistance = normalisedHammingDistance / float64(len(hDistances))

	return normalisedHammingDistance
}

func transposeBlock(ciphertext []byte, keySize int) [][]byte {
	transposedBlocks := make([][]byte, keySize)

	for i, b := range ciphertext {
		transposedBlocks[i%keySize] = append(transposedBlocks[i%keySize], b)
	}
	return transposedBlocks
}

func xor(op1 []byte, op2 []byte) []byte {
	op := make([]byte, len(op1))
	for i := 0; i < len(op1); i++ {
		op[i] = op1[i] ^ op2[i]
	}
	return op
}

func legitAnswerProbability(inp string) float64 {
	var characterFrequency = map[string]float64{
		"a": 0.0651738, "b": 0.0124248, "c": 0.0217339, "d": 0.0349835, "e": 0.1041442, "f": 0.0197881, "g": 0.0158610,
		"h": 0.0492888, "i": 0.0558094, "j": 0.0009033, "k": 0.0050529, "l": 0.0331490, "m": 0.0202124, "n": 0.0564513,
		"o": 0.0596302, "p": 0.0137645, "q": 0.0008606, "r": 0.0497563, "s": 0.0515760, "t": 0.0729357, "u": 0.0225134,
		"v": 0.0082903, "w": 0.0171272, "x": 0.0013692, "y": 0.0145984, "z": 0.0007836, " ": 0.1918182}

	score := 0.0
	for _, c := range inp {
		s := strings.ToLower(string(c))
		score += characterFrequency[s]
	}
	return score
}

func decryptSingleKeyXor(inp []byte) byte {
	hex2 := make([]byte, len(inp))
	highest_ratio := 0.0
	// var legitAnswer, key string
	var key byte
	for i := 0; i <= 255; i++ {
		for j := 0; j < len(inp); j++ {
			hex2[j] = byte(i)
		}
		res := xor(inp, hex2)
		stringRatio := legitAnswerProbability(string(res))

		if highest_ratio <= stringRatio {
			highest_ratio = stringRatio
			// legitAnswer = string(res)
			key = byte(i)
		}
	}
	// fmt.Print(legitAnswer, "\n")
	return key
}

func repeatingKeyXor(plaintext []byte, key []byte) []byte {
	op := make([]byte, len(plaintext))
	for i := 0; i < len(plaintext); i++ {
		op[i] = plaintext[i] ^ key[i%len(key)]
	}
	return op
}

func main() {
	bruteforcedResults := map[int]float64{}
	input := decodeBase64(input64)
	for keysize := 2; keysize <= 40; keysize++ {
		nDist := calculateNormEditDistance(input, float64(keysize))
		fmt.Print(keysize, ":", nDist, "\n")
		bruteforcedResults[keysize] = nDist
	}
	candidateKeys := getMinKeys(bruteforcedResults, 4)

	for _, keySize := range candidateKeys {
		transposedBlocks := transposeBlock(input, keySize)
		var key []byte
		for _, block := range transposedBlocks {
			k := decryptSingleKeyXor(block)
			key = append(key, k)
		}
		fmt.Print("Key=", string(key), "\n")
		fmt.Print(string(repeatingKeyXor(input, key)), "\n")
	}
}

Challenge 7

Problem

There is an AES128-ECB encrypted text. We have to decrypt it. The key is given to us. Seriously, see for yourself.

Solutioning

This is simple. We use Golang’s built-in AES128 cipher and decrypt it.

// Decrypt AES-128-ECB

package main

import (
	"crypto/aes"
	"encoding/base64"
	"fmt"
	"os"
)

var ciphertextInBase64 = `CRIwqt4+szDbqkNY+I0qbDe3LQz0wiw0SuxBQtAM5TDdMbjCMD/venUDW9BL
.
.
S15AVD2QS1V6fhRimJSVyT6QuGb8tKRsl2N+a2Xze36vgMhw7XK7zh//jC2H
`

func decodeBase64(ip string) []byte {
	op := make([]byte, base64.StdEncoding.DecodedLen(len(ip)))
	base64.StdEncoding.Decode(op, []byte(ip))
	return op
}

func main() {
	key := "YELLOW SUBMARINE"
	ciphertext := decodeBase64(ciphertextInBase64)
	var plaintext = make([]byte, len(ciphertext))

	aes256, err := aes.NewCipher([]byte(key))
	if err != nil {
		fmt.Print(err)
		os.Exit(1)
	}
	for i := 0; i < len(ciphertext); i += 16 {
		aes256.Decrypt(plaintext[i:i+16], ciphertext[i:i+16])
	}

	fmt.Print(string(plaintext))
}

Challenge 8

Problem

The problem, as stated here, gives a bunch of hex encoded ciphertexts which have been encrypted by AES-128-ECB. We have to find out which is the encrypted text.

Solutioning

The idea is that AES ECB mode is deterministic. So, the same input gives the same output. So, we iterate the on all the blocks and see which ciphertext has blocks repeated.

// Detect AES-128-ECB

package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

var ciphertextInHex = `8a10247f90d0a05538888ad6205882196f5f6d05c21ec8dca0cb0be02c3f8b09e382963f443aa514daa501257b09a36bf8c4c392d8ca1bf4395f0d5f2542148c7e5ff22237969874bf66cb85357ef99956accf13ba1af36ca7a91a50533c4d89b7353f908c5a166774293b0bf6247391df69c87dacc4125a99ec417221b58170e633381e3847c6b1c28dda2913c011e13fc4406f8fe73bbf78e803e1d995ce4d
.
.
06df04188832b10afff94209d2aa1c8a123702de28234dcd3e0a7d36c1aa8449e6fa55e3e1e3d77d8424e87a45e38697755f84c49a99473797268113eb69098888947526035b246d00a630f6201ecc4075d8aa6604de73e2119e264e4c96751f2a67a2e46cf467a0df8f0520bcf4762b2715aba266d9b3f5e8fa67d12f9caac928b07ac3be99f41120655aa77f6433fc264673a92929e792187f87b5fda50cf2
`

func decodeHex(ip string) []byte {
	op := make([]byte, hex.DecodedLen(len(ip)))
	_, err := hex.Decode(op, []byte(ip))
	if err != nil {
		fmt.Print(err)
	}
	return op
}

func detectAes(ciphertext []byte) bool {
	blockMap := make(map[string]int)

	for i := 0; i < len(ciphertext); i += 16 {
		block := string(ciphertext[i : i+16])
		blockMap[block]++
	}

	for _, v := range blockMap {
		if v > 1 {
			return true
		}

	}
	return false
}

func main() {
	lines := strings.Split(ciphertextInHex, "\n")
	for _, i := range lines {
		ciphertext := decodeHex(i)
		isEncrypted := detectAes(ciphertext)
		if isEncrypted {
			fmt.Print(i, "\n")
		}
	}
}

Conclusion

Identify and validate all your assumptions. While this is an obvious statement, it has been something that I have personally had the most trouble with for a long time now. One thing that helped me was writing down everything about the problem and working through it step by step.

Tags