Hash 系列算法

哈希函数（Hash Function，意译为 “散列函数”）是一种从任何一种数据中创建小的数字 “指纹” 的方法。散列函数把消息或数据计算成摘要 (Digest)，使得数据量变小，将数据的格式固定下来。该函数将数据打乱混合，重新创建一个叫做哈希值（Hash value，又叫散列值）的指纹。

为什么叫 “Hash”？

麦当劳售卖薯饼 (Hash Brown) 早餐，即将一整个马铃薯削成丝状，再将丝状马铃薯重新油炸成一整个薯饼。由于本质上是将原先的一个整体打散再重新组合且过程不可逆，正好符合该类算法的设计思路，于是称该类算法为哈希 (Hash) 算法。

由于哈希算法所计算出来的哈希值具有不可逆的性质，因此可有效的保护密码。

哈希算法如何保护密码？

用户注册时，密码先在前端进行哈希，哈希值传回后端数据库存储。用户登录时，输入的密码在前端进行哈希，然后传回后端进行比较。

哈希算法的基本性质

出于密码学目的，哈希函数通常需要具备以下特性：

确定性输出：给定输入 A（如 “I love cats”），每次哈希 A 都会得到相同的输出。
扩散性：输入的微小变化会导致输出的巨大变化。例如，“I love cats” 和 “I love kats” 的哈希值完全不同，无法相互识别。
不可预测性：哈希结果应该是完全不可预测的；在生成的哈希值中不应存在可识别的模式。
不可逆性：无法通过给定的哈希值重构有效输入，因此唯一验证输入是否对应哈希值的方法是穷举法（暴力破解！）。
抗碰撞性：找到产生相同哈希值（或部分匹配哈希值）的两个输入应该非常困难。

Note

如果两个散列值相同，两个输入值并非必定是相同的。当两个散列值相同，但两个输入值不同时，这种情况就称为散列碰撞 (Collision)，这通常是两个不同长度的输入值，刻意计算出相同的输出值。
不抗碰撞的哈希函数会完全破坏算法。
理论上，一个良好的哈希函数应该是不可碰撞的，但是 MD5、SHA-1 等早期哈希算法已经被证明可碰撞，因此这些算法已经不再推荐用于加密，而是用于验证文件完整性等非加密用途。

大多数哈希算法的输出具有固定长度。由于所有输入本质上都是信息比特流，我们实际上是将任意长度的比特序列转换为看似随机的固定长度比特序列。

有时，为了增强哈希算法的抗碰撞性，我们会给明文加上 “盐值”🧂。盐值会设置成一个随机的字符串，然后和明文进行拼接。由于加盐后的明文非常长，因此几乎不可能被碰撞。下面的 CTF 示例就有加盐的 MD5。

下面介绍几种常见的哈希算法。

MD5

MD5 信息摘要算法 (MD5 Message-Digest Algorithm) 是 MD 系列算法的最常见版本。算法会生成一个 128 位的哈希值（长度为 32 的十六进制值）

用伪代码表示为：

// : All variables are unsigned 32 bit and wrap modulo 2^32 when calculating
var int s[64], K[64]
var int i

// s specifies the per-round shift amounts
s[ 0..15] := { 7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22 }
s[16..31] := { 5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20 }
s[32..47] := { 4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23 }
s[48..63] := { 6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21 }

// Use binary integer part of the sines of integers (Radians) as constants:
for i from 0 to 63 do
    K[i] := floor(232 × abs(sin(i + 1)))
end for
// (Or just use the following precomputed table):
K[ 0.. 3] := { 0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee }
K[ 4.. 7] := { 0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501 }
K[ 8..11] := { 0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be }
K[12..15] := { 0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821 }
K[16..19] := { 0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa }
K[20..23] := { 0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8 }
K[24..27] := { 0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed }
K[28..31] := { 0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a }
K[32..35] := { 0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c }
K[36..39] := { 0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70 }
K[40..43] := { 0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05 }
K[44..47] := { 0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665 }
K[48..51] := { 0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039 }
K[52..55] := { 0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1 }
K[56..59] := { 0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1 }
K[60..63] := { 0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391 }

// Initialize variables:
var int a0 := 0x67452301   // A
var int b0 := 0xefcdab89   // B
var int c0 := 0x98badcfe   // C
var int d0 := 0x10325476   // D

// Pre-processing: adding a single 1 bit
append "1" bit to message<
 // Notice: the input bytes are considered as bit strings,
 //  where the first bit is the most significant bit of the byte.[52]

// Pre-processing: padding with zeros
append "0" bit until message length in bits ≡ 448 (mod 512)

// Notice: the two padding steps above are implemented in a simpler way
  //  in implementations that only work with complete bytes: append 0x80
  //  and pad with 0x00 bytes so that the message length in bytes ≡ 56 (mod 64).

append original length in bits mod 264 to message

// Process the message in successive 512-bit chunks:
for each 512-bit chunk of padded message do
    break chunk into sixteen 32-bit words M[j], 0 ≤ j ≤ 15
    // Initialize hash value for this chunk:
    var int A := a0
    var int B := b0
    var int C := c0
    var int D := d0
    // Main loop:
    for i from 0 to 63 do
        var int F, g
        if 0 ≤ i ≤ 15 then
            F := (B and C) or ((not B) and D)
            g := i
        else if 16 ≤ i ≤ 31 then
            F := (D and B) or ((not D) and C)
            g := (5×i + 1) mod 16
        else if 32 ≤ i ≤ 47 then
            F := B xor C xor D
            g := (3×i + 5) mod 16
        else if 48 ≤ i ≤ 63 then
            F := C xor (B or (not D))
            g := (7×i) mod 16
        // Be wary of the below definitions of a,b,c,d
        F := F + A + K[i] + M[g]  // M[g] must be a 32-bit block
        A := D
        D := C
        C := B
        B := B + leftrotate(F, s[i])
    end for
    // Add this chunk's hash to result so far:
    a0 := a0 + A
    b0 := b0 + B
    c0 := c0 + C
    d0 := d0 + D
end for

var char digest[16] := a0 append b0 append c0 append d0 // (Output is in little-endian)

Tip

不需要手写 MD5 算法过程。主流的加密库早已支持 MD5 加密。

MD5 算法于 1993 年发现 “伪碰撞”，1996 年发现部分碰撞。2004 年，中国的王小云教授宣布发现了 MD5 的完整碰撞，该算法被证明不具有抗碰撞性。

目前破解 MD5 算法的方法就是查字典，比如 CMD5。

虽然 MD5 不再建议用于加密用途，但由于它比 SHA 系列更方便计算，因此仍有很多人将 MD5 用于非加密用途。

SHA-1

Coming s∞n

破解方法

由于 Hash 几乎无法逆向，因此我们一般使用穷举（爆破）方法解决 Hash 函数。

首先，一些比较简单的 Hash 可以从已有数据库中查询到原文。这是最简单且快速的方法了，如果一个查不到可以试试其他网站，本地也有像 Hashcat 这样的工具。

但对于 CTF 而言，出题者多半不会让你如此无脑得逞。原文可能较长，或者含有特殊字符，基本没有已有工具会破解到这些字符串，这时就需要自己写爆破脚本了。

如果自己写破解脚本，需要通过逆向知道原文的构成规律 —— 个人计算机大概率是无法承受完全穷举的工作量的，所以我们需要有目的、有限制地爆破 Hash。下面举几个例子：

L3HCTF 2025 - TemporalParadox

Tip

flag 形式为 L3HCTF{sha1(query string)}

本题需要得到 “正确的查询字符串 (query string)”

题目是有一个简单的花指令的，去除一下，得到：

main

INT64 main(undefined8 param_1,undefined8 param_2,allocator *param_3,undefined8 param_4)

{
  INT64 uVar2;
  char *pcVar1;
  int *piVar2;
  ostream *poVar3;
  INT64 uVar1;
  string *unaff_RBP;

  FUN_140001d1e();
  FUN_14000a510((undefined4 *)(unaff_RBP + 0x20));
  uVar2 = FUN_140001493((__time64_t *)0x0);
  *(INT64 *)(unaff_RBP + 200) = uVar2;
  if ((0x686d4080 < *(longlong *)(unaff_RBP + 200)) && (*(longlong *)(unaff_RBP + 200) < 1752052052)
     ) {
    FUN_140001963(unaff_RBP + -0x60);
    *(string **)(unaff_RBP + 0xc0) = unaff_RBP + 0xb6;
    pcVar1 = (char *)std::__cxx11::string::c_str();
    piVar2 = md5Encode((int *)(unaff_RBP + 0x20),pcVar1,param_3,param_4);
    param_3 = (allocator *)(unaff_RBP + 0xb6);
    FUN_14000a820((_Alloc_hider *)(unaff_RBP + -0x40),(char *)piVar2,param_3);
    FUN_14000a6e0();
    poVar3 = std::operator<<((ostream *)PTR_PTR_cout_14000c9e0,"query: ");
    poVar3 = std::operator<<(poVar3,unaff_RBP + -0x60);
    std::ostream::operator<<(poVar3,(_func_ostream_ptr_ostream_ptr *)&DAT_140002018);
    poVar3 = std::operator<<((ostream *)PTR_PTR_cout_14000c9e0,unaff_RBP + -0x40);
    std::ostream::operator<<(poVar3,(_func_ostream_ptr_ostream_ptr *)&DAT_140002018);
    std::__cxx11::string::~string(unaff_RBP + -0x40);
    std::__cxx11::string::~string(unaff_RBP + -0x60);
  }
  std::__cxx11::string::string(unaff_RBP);
  poVar3 = std::operator<<((ostream *)PTR_PTR_cout_14000c9e0,
                           "Please input the right query string I used:");
  std::ostream::operator<<(poVar3,(_func_ostream_ptr_ostream_ptr *)&DAT_140002018);
  std::operator>>((istream *)PTR_PTR_cin_14000c9d0,unaff_RBP);
  *(string **)(unaff_RBP + 0xb8) = unaff_RBP + 0xb7;
  pcVar1 = (char *)std::__cxx11::string::c_str();
  piVar2 = md5Encode((int *)(unaff_RBP + 0x20),pcVar1,param_3,param_4);
  FUN_14000a820((_Alloc_hider *)(unaff_RBP + -0x20),(char *)piVar2,(allocator *)(unaff_RBP + 0xb7));
  FUN_14000a6e0();
  uVar1 = md5Validate(unaff_RBP + -0x20,"8a2fc1e9e2830c37f8a7f51572a640aa");
  if ((char)uVar1 == 0) {
    poVar3 = std::operator<<((ostream *)PTR_PTR_cout_14000c9e0,"Wrong!");
    std::ostream::operator<<(poVar3,(_func_ostream_ptr_ostream_ptr *)&DAT_140002018);
  }
  else {
    poVar3 = std::operator<<((ostream *)PTR_PTR_cout_14000c9e0,"Congratulations!");
    std::ostream::operator<<(poVar3,(_func_ostream_ptr_ostream_ptr *)&DAT_140002018);
  }
  std::__cxx11::string::~string(unaff_RBP + -0x20);
  std::__cxx11::string::~string(unaff_RBP);
  return 0;
}

（顺便提一下，这是我用 Ghidra 反编译的，IDA 的反编译看得我眼花缭乱 X - X）

这个字符串的 MD5 是 8a2fc1e9e2830c37f8a7f51572a640aa，这个 MD5 应该是没有已知数据库有记录的，因此需要我们自己爆破。

这个比赛在 2025/7/12 进行。当时我们发现直接运行程序，不会输出 query: ...，这个 (0x686d4080 < *(longlong *)(unaff_RBP + 200)) && (*(longlong *)(unaff_RBP + 200) < 1752052052) 条件在控制。

向上查看发现 unaff_RBP + 200 实际上是从 FUN_140001493 中的 _time64 得到的 UNIX 时间戳。转换发现条件是本机 UTC 时间为 2025/7/9 00:00:00 - 2025/7/9 09:07:32 时，执行函数体并输出该时刻的查询字符串。此时断网并调整本机时间。

现在我们想知道 query 是如何生成的，阅读后知道输出的 unaff_RBP + -0x60 来自 FUN_140001963

FUN_140001963

undefined8 FUN_140001963(undefined8 param_1)

{
  uint uVar1;
  ostream *poVar2;
  double dVar3;
  double dVar4;
  double dVar5;
  double dVar6;
  stringstream local_218 [16];
  ostream aoStack_208 [384];
  string salt [44];
  uint r;
  longlong t;
  int local_4c;
  uint y;
  uint x;
  uint b;
  uint a;

  FUN_140001518(salt);
  t = FUN_140001493((__time64_t *)0x0);
  FUN_1400014b5((int)t);
  a = 0;
  b = 0;
  x = 0;
  y = 0;
  local_4c = 0;
  while (uVar1 = FUN_1400014d5(), local_4c < (int)uVar1) {
    a = FUN_1400014d5();
    b = FUN_1400014d5();
    x = FUN_1400014d5();
    y = FUN_1400014d5();
    local_4c = local_4c + 1;
  }
  r = FUN_1400014d5();
  std::__cxx11::stringstream::stringstream(local_218);
                    /* 00000061h */
  dVar5 = (double)DAT_14000b0e0;
  dVar3 = (double)FUN_1400033a0((double)(int)(x | a),DAT_14000c1d0);
                    /* 0000000Bh */
  dVar6 = (double)DAT_14000b0e4;
  dVar4 = (double)FUN_1400033a0((double)(int)(y | b),DAT_14000c1d0);
  if (dVar5 * dVar3 == dVar4 * dVar6) {
    poVar2 = std::operator<<(aoStack_208,"salt=");
    poVar2 = std::operator<<(poVar2,salt);
    poVar2 = std::operator<<(poVar2,"&t=");
    poVar2 = (ostream *)std::ostream::operator<<(poVar2,t);
    poVar2 = std::operator<<(poVar2,"&r=");
    poVar2 = (ostream *)std::ostream::operator<<(poVar2,r);
    poVar2 = std::operator<<(poVar2,"&cipher=");
    uVar1 = FUN_14000184d(r,(uint)t);
    std::ostream::operator<<(poVar2,uVar1 & 0xffff);
  }
  else {
    poVar2 = std::operator<<(aoStack_208,"salt=");
    poVar2 = std::operator<<(poVar2,salt);
    poVar2 = std::operator<<(poVar2,"&t=");
    poVar2 = (ostream *)std::ostream::operator<<(poVar2,t);
    poVar2 = std::operator<<(poVar2,"&r=");
    poVar2 = (ostream *)std::ostream::operator<<(poVar2,r);
    poVar2 = std::operator<<(poVar2,"&a=");
    poVar2 = (ostream *)std::ostream::operator<<(poVar2,a);
    poVar2 = std::operator<<(poVar2,"&b=");
    poVar2 = (ostream *)std::ostream::operator<<(poVar2,b);
    poVar2 = std::operator<<(poVar2,"&x=");
    poVar2 = (ostream *)std::ostream::operator<<(poVar2,x);
    poVar2 = std::operator<<(poVar2,"&y=");
    std::ostream::operator<<(poVar2,y);
  }
  std::__cxx11::stringstream::str();
  std::__cxx11::stringstream::~stringstream(local_218);
  std::__cxx11::string::~string(salt);
  return param_1;
}

先运行程序看看输出

1
2
3

query: salt=tlkyeueq7fej8vtzitt26yl24kswrgm5&t=1751994230&r=938796188&a=1227348758&b=2062353588&x=353278121&y=90222372

query: salt=tlkyeueq7fej8vtzitt26yl24kswrgm5&t=1751994263&r=1788731889&a=1248035063&b=1976747241&x=78756750&y=1271164846

多次运行后，发现 salt 值不变，于是我们需要生成 r、a、b、x 和 y。

这五个值均由 FUN_1400014d5 生成：

FUN_1400014d5

uint FUN_1400014d5(void)

{
  uint uVar1;

  uVar1 = time_you_get ^ time_you_get << 13;
  uVar1 = uVar1 ^ uVar1 >> 17;
  time_you_get = uVar1 ^ uVar1 << 5;
  return time_you_get & 0x7fffffff;
}

生成后按格式拼接，计算 MD5 值并比较。完整脚本：

#include <iostream>
#include <string>
#include <cstdio>
#include <openssl/md5.h> // 需要安装 OpenSSL 库

using namespace std;

unsigned int encodeYourTime(int *time)
{
    unsigned int temp;
    temp = (*time) ^ (*time) << 13;
    temp = temp ^ temp >> 17;
    (*time) = temp ^ temp << 5;
    return (*time) & 0x7fffffff;
}

string calculateMD5(const string &input)
{
    unsigned char digest[MD5_DIGEST_LENGTH];
    MD5((unsigned char *)input.c_str(), input.length(), digest);

    char mdString[33];
    for (int i = 0; i < 16; i++)
        sprintf(&mdString[i * 2], "%02x", (unsigned int)digest[i]);

    return string(mdString);
}

int main()
{
    for (int i = 1751990400; i <= 1752052051; i++)
    {
        unsigned int r = 0;
        unsigned int a = 0;
        unsigned int b = 0;
        unsigned int x = 0;
        unsigned int y = 0;

        int time = (int)i;

        // 生成随机参数
        for (int j = 0; j < (int)encodeYourTime(&time); j++)
        {
            a = encodeYourTime(&time);
            b = encodeYourTime(&time);
            x = encodeYourTime(&time);
            y = encodeYourTime(&time);
        }
        r = encodeYourTime(&time);

        // 构建数据字符串
        char buffer[256];
        snprintf(buffer, sizeof(buffer),
                 "salt=tlkyeueq7fej8vtzitt26yl24kswrgm5&t=%u&r=%u&a=%u&b=%u&x=%u&y=%u",
                 i, r, a, b, x, y);
        string data = buffer;

        // 计算MD5
        string md5Hash = calculateMD5(data);

        // 输出结果
        cout << "Generated data: " << data << "\n";
        cout << "MD5 Hash: " << md5Hash << "\n";
        if(md5Hash == "8a2fc1e9e2830c37f8a7f51572a640aa")
        {
            cout << "Find match MD5!" << "\n";
            break;
        }
    }
    return 0;
}

运行得到：

1
2
3

Generated data: salt=tlkyeueq7fej8vtzitt26yl24kswrgm5&t=1751994277&r=101356418&a=1388848462&b=441975230&x=1469980073&y=290308156
MD5 Hash: 8a2fc1e9e2830c37f8a7f51572a640aa
Find match MD5!

再计算 Generated data 的 SHA1 值即得到 flag。