Fuzzamoto Intermediate Representation

At a high level, generic p2p testcases represent a sequence of actions performed against one or more target nodes:

  • Changing system time
  • Establishing a new p2p connection
  • Sending protocol messages on an established connection

Protocol messages in particular are highly structured, as they are serialized using a custom format, contain various cryptographic primitives (hash commitments, signatures, checksums, ...) and must fullfil various other structural requirements to be considered valid, such as:

  • Block headers must point to a prior block via its hash
  • Transaction inputs must point to existing unspent transactions outputs via transaction identifiers
  • blocktxn messages are only processed if requested (after a prior cmpctblock message)
  • ...

Therefore, naively fuzzing a scenario with a byte-array fuzzer, using the following input format (with e.g. Arbitrary) will mostly result in fuzzing the message (de)serialization code and other easy to reach protocol flows.


#![allow(unused)]
fn main() {
pub enum Action {
    SetTime { ... },
    Connect { ... },
    SendMessage { ... },
}

pub struct TestCase {
    pub actions: Vec<Action>,
}
}

If we want to focus on fuzzing deeper logic instead, then we'll need to make input generation/mutation aware of the structural requirements. This is were an intermediate representation, that holds relevant type and structural information, becomes useful.

Fuzzamoto IR describes small programs that can be compiled into the simple testcase format from above (TestCase). For the purpose of mutation/generation, the fuzzer (fuzzamoto-libafl) operates on testcases encoded as the IR (as it contains relevant type and structural information) and only compiles it to the simple format for target execution.

┌───────────────────────────────────────Fuzzer───────────────────────────────────────┐
│                                                                                    │
│ ┌─────────────┐                  ┌────────┐                   ┌──────────────────┐ │
│ │ Corpus (IR) ├──────Select─────►│ Mutate ├──────Compile──────► Target Execution │ │
│ └─────────────┘                  └────────┘                   └──────────────────┘ │
│                                                                                    │
└────────────────────────────────────────────────────────────────────────────────────┘

Design

Fuzzamoto IR consists of a sequence of operations that take some input variables and produce variables as output. All variables are typed (see variable.rs) and operations expect variables of compatible type. The IR uses static single assignement form (SSA), which means every variable in the IR is assigned exactly once. SSA helps simplify define-use analysis, type inference and code generation/mutation among other things.

Each IR program is associated with a context that represents the snapshot state of the test VM:

  • Mock time
  • Number of nodes
  • Number of existing connections made by the scenario
  • Available transaction outputs (Used for LoadTxo instructions)
  • Available block headers (Used for LoadHeader instructions)

Programs might not be valid/useful in a different context. E.g. a program that was generated within the context of 10 nodes and 200 connections might not be valid in a context with 1 nodes and 8 connections, as it might refer to non-existent nodes or connections.

In the following simple example, the IR describes the creation of a tx message from raw bytes that is then send to a node via one of the existing connections:

// Context: nodes=1 connections=8 timestamp=1296688802
v0 <- LoadConnection(5)
v1 <- LoadMsgType("tx")
v2 <- LoadBytes("fefe8520fefefe0000fffffe8520")
SendRawMessage(v0, v1, v2)

Note that this is a human readable representation and not the internal in-memory structure used by the fuzzer.

This example would compile into a single TestCase::SendMessage operation.

The next example is more complex and better demonstrates the strengths of the IR:

// Context: nodes=1 connections=8 timestamp=1296688802
v0 <- LoadBytes("5656565656565656567a7a7a7a7a7a7a7a7a7aa9ffff5656567a506464649b64596464f16463646464")
v1 <- LoadTxo(083666c9bf066f9d3a28ad30f5c0ed6fe463f7777e033783875b2523ef5214bb:0, 2500000000, 00204ae81572f06e1b88fd5ced7a1a000945432e83e1551e6f721ee9c00b8cc33260, , 51)
v2 <- LoadConnection(4)
v3 <- LoadTxVersion(2)
v4 <- LoadLockTime(144)
BeginBuildTx(v3, v4) -> v5
  BeginBuildTxInputs -> v6
    v7 <- LoadSequence(4294967295)
    AddTxInput(v6, v1, v7)
  v8 <- EndBuildTxInputs(v6)
  BeginBuildTxOutputs(v8) -> v9
    BeginWitnessStack -> v10
    v11 <- EndWitnessStack(v10)
    v12 <- BuildPayToWitnessScriptHash(v0, v11)
    v13 <- LoadAmount(100000000)
    AddTxOutput(v9, v12, v13)
  v14 <- EndBuildTxOutputs(v9)
v15 <- EndBuildTx(v5, v8, v14)
v16 <- TakeTxo(v15)
v17 <- LoadLockTime(508195987)
v18 <- LoadTxVersion(2)
BeginBuildTx(v18, v17) -> v19
  BeginBuildTxInputs -> v20
    v21 <- LoadSequence(4294967294)
    AddTxInput(v20, v16, v21)
  v22 <- EndBuildTxInputs(v20)
  BeginBuildTxOutputs(v22) -> v23
    v24 <- LoadBytes("51")
    BeginWitnessStack -> v25
    v26 <- EndWitnessStack(v25)
    v27 <- BuildPayToWitnessScriptHash(v24, v26)
    v28 <- LoadAmount(98500000)
    AddTxOutput(v23, v27, v28)
  v29 <- EndBuildTxOutputs(v23)
v30 <- EndBuildTx(v19, v22, v29)
SendTx(v2, v30)
SendTx(v2, v15)

Two transactions are build (v15, v30), that are then send to the node under test through connection v2. v15 spends from an output in the snapshot state v1 (loaded by LoadTxo). The relationship between the two transactions (i.e. v30 is the parent of v15) is encoded in the IR through the use of variables. v16 (output of TakeTxo) represents the output created by v15 and is added to v30 via the AddTxInput instruction in the BuildTxInputs block. v30 is sent before v15 potentially triggering 1P1C logic.

This example would compile into two TestCase::SendMessage operations, containing the correctly serialized transactions v15 and v30.

Table of Operations

NameDescription
Load* operationsLoad constant values from the test context.
LoadBytesLoads a raw byte array.
LoadMsgTypeLoads a message type for SendRawMessage.
LoadNodeLoads an index for one of the test nodes.
LoadConnectionLoads an index for one of the p2p connections.
LoadConnectionTypeLoads a connection type string.
LoadDurationLoads a time duration.
LoadTimeLoads a timestamp.
LoadAmountLoads a bitcoin amount.
LoadSizeLoads a size in bytes.
LoadTxVersionLoads a transaction version.
LoadBlockVersionLoads a block version.
LoadLockTimeLoads a transaction lock time.
LoadSequenceLoads a transaction input sequence number.
LoadBlockHeightLoads a block height.
LoadCompactFilterTypeLoads a compact filter type.
LoadPrivateKeyLoads a private key.
LoadSigHashFlagsLoads signature hash flags.
LoadTxoLoads a transaction output from the context.
LoadHeaderLoads a block header from the context.
Time operationsManipulate the mock time.
AdvanceTimeAdvances time by a given duration.
SetTimeSets the mock time to a specific value.
Script buildingConstruct various bitcoin scripts.
BuildRawScriptsBuild raw scripts (scriptSig, scriptPubKey, witness).
BuildPayToWitnessScriptHashCreates a P2WSH script.
BuildPayToPubKeyCreates a P2PK script.
BuildPayToPubKeyHashCreates a P2PKH script.
BuildPayToWitnessPubKeyHashCreates a P2WPKH script.
BuildPayToScriptHashCreates a P2SH script.
BuildOpReturnScriptsCreates an OP_RETURN script.
BuildPayToAnchorCreates a P2A (pay-to-anchor) script for CPFP.
Witness stackConstruct a witness stack.
BeginWitnessStackBegins building a witness stack.
AddWitnessAdds an item to the witness stack.
EndWitnessStackFinishes building the witness stack.
Transaction buildingConstruct a transaction.
BeginBuildTxBegins building a transaction.
BeginBuildTxInputsBegins building transaction inputs.
AddTxInputAdds an input to the transaction.
EndBuildTxInputsFinishes building transaction inputs.
BeginBuildTxOutputsBegins building transaction outputs.
AddTxOutputAdds an output to the transaction.
EndBuildTxOutputsFinishes building transaction outputs.
EndBuildTxFinishes building the transaction.
TakeTxoExtracts a specific output from a transaction.
Block buildingConstruct a block.
BeginBlockTransactionsBegins building the list of transactions for a block.
AddTxAdds a transaction to the block.
EndBlockTransactionsFinishes building the list of transactions.
BuildBlockBuilds a block.
Inventory buildingConstruct an inventory for inv and getdata messages.
BeginBuildInventoryBegins building an inventory.
AddTxidInvAdds a txid to the inventory.
AddTxidWithWitnessInvAdds a txid (with witness) to the inventory.
AddWtxidInvAdds a wtxid to the inventory.
AddBlockInvAdds a block hash to the inventory.
AddBlockWithWitnessInvAdds a block hash (with witness) to the inventory.
AddFilteredBlockInvAdds a filtered block to the inventory.
EndBuildInventoryFinishes building the inventory.
Message sendingSend messages to a node.
SendRawMessageSends a raw, untyped message.
SendGetDataSends a getdata message.
SendInvSends an inv message.
SendTxSends a tx message.
SendTxNoWitSends a tx message without witness data.
SendHeaderSends a header message.
SendBlockSends a block message.
SendBlockNoWitSends a block message without witness data.
SendGetCFiltersSends a getcfilters message.
SendGetCFHeadersSends a getcfheaders message.
SendGetCFCheckptSends a getcfcheckpt message.
Other
NopNo operation. Used during minimization.

Mutators

Several mutation strategies are available for IR programs:

  • InputMutator: Replaces an instruction's input variable with another variable of compatible type. This mutation aims at changing the dataflow of a given program by making an instruction operate on a different value.
  • OperationMutator: Mutate an operation, e.g. mutate the input values of a Load* operation.
  • CombineMutator: Insert an entire IR program into another one at a random location. This mutation aims at changing a program's control flow by combining two programs into one.
  • ConcatMutator: Append an entire IR program to another one at a random location. This mutation aims at changing a program's control flow by appending another program to it.

Generators offer further mutation strategies that involve the generation of new programs or new instructions into existing programs. They can be used to bootstrap an initial corpus as well as to mutate existing inputs during a fuzzing campaign. The following generators are available:

  • SendMessageGenerator: Generates a new SendRawMessage instruction
  • AdvanceTimeGenerator: Generates new AdvanceTime and SetTime instructions
  • CompactFilterQueryGenerator: Generates new SendGetCFilters, SendGetCFHeaders and SendGetCFCheckpt instructions
  • BlockGenerator: Generates instructions to build a block
  • HeaderGenerator: Generates instructions to build a header
  • AddTxToBlockGenerator: Generates instructions to add a transaction to a block
  • OneParentOneChildGenerator: Generates instructions for building two new transactions (a 1-parent 1-child package) and sending them to a node
  • ... see generators/

Minimizers

Minimizers are used to reduce the size of an interesting program (e.g. bug or new coverage). The following minimization strategies are available:

  • Nopping: Attempt to nop out instructions in an effort to reduce the size of a program.
  • Cutting: Attempt to cut out instructions of the end of a program in an effort to reduce the size of a program.
  • BlockNopping: Attempt to nop out entire blocks of instructions in an effort to reduce the size of a program (e.g. nop all instructions between and including BeginBlockTransactions and EndBlockTransactions).