Fuzzamoto Intermediate Representation
At a high level, generic p2p testcases represent a sequence of actions performed against one or more target nodes:
- Changing system time
- Establishing a new p2p connection
- Sending protocol messages on an established connection
Protocol messages in particular are highly structured, as they are serialized using a custom format, contain various cryptographic primitives (hash commitments, signatures, checksums, ...) and must fullfil various other structural requirements to be considered valid, such as:
- Block headers must point to a prior block via its hash
- Transaction inputs must point to existing unspent transactions outputs via transaction identifiers
blocktxn
messages are only processed if requested (after a priorcmpctblock
message)- ...
Therefore, naively fuzzing a scenario with a byte-array fuzzer, using the
following input format (with e.g.
Arbitrary
) will mostly result in
fuzzing the message (de)serialization code and other easy to reach protocol
flows.
#![allow(unused)] fn main() { pub enum Action { SetTime { ... }, Connect { ... }, SendMessage { ... }, } pub struct TestCase { pub actions: Vec<Action>, } }
If we want to focus on fuzzing deeper logic instead, then we'll need to make input generation/mutation aware of the structural requirements. This is were an intermediate representation, that holds relevant type and structural information, becomes useful.
Fuzzamoto IR describes small programs that can be compiled into the simple
testcase format from above (TestCase
). For the purpose of
mutation/generation, the fuzzer (fuzzamoto-libafl
)
operates on testcases encoded as the IR (as it contains relevant type and
structural information) and only compiles it to the simple format for target
execution.
┌───────────────────────────────────────Fuzzer───────────────────────────────────────┐
│ │
│ ┌─────────────┐ ┌────────┐ ┌──────────────────┐ │
│ │ Corpus (IR) ├──────Select─────►│ Mutate ├──────Compile──────► Target Execution │ │
│ └─────────────┘ └────────┘ └──────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────────────────┘
Design
Fuzzamoto IR consists of a sequence of operations that take some input variables and produce variables as output. All variables are typed (see variable.rs) and operations expect variables of compatible type. The IR uses static single assignement form (SSA), which means every variable in the IR is assigned exactly once. SSA helps simplify define-use analysis, type inference and code generation/mutation among other things.
Each IR program is associated with a context that represents the snapshot state of the test VM:
- Mock time
- Number of nodes
- Number of existing connections made by the scenario
- Available transaction outputs (Used for
LoadTxo
instructions) - Available block headers (Used for
LoadHeader
instructions)
Programs might not be valid/useful in a different context. E.g. a program that was generated within the context of 10 nodes and 200 connections might not be valid in a context with 1 nodes and 8 connections, as it might refer to non-existent nodes or connections.
In the following simple example, the IR describes the creation of a tx message from raw bytes that is then send to a node via one of the existing connections:
// Context: nodes=1 connections=8 timestamp=1296688802
v0 <- LoadConnection(5)
v1 <- LoadMsgType("tx")
v2 <- LoadBytes("fefe8520fefefe0000fffffe8520")
SendRawMessage(v0, v1, v2)
Note that this is a human readable representation and not the internal in-memory structure used by the fuzzer.
This example would compile into a single TestCase::SendMessage
operation.
The next example is more complex and better demonstrates the strengths of the IR:
// Context: nodes=1 connections=8 timestamp=1296688802
v0 <- LoadBytes("5656565656565656567a7a7a7a7a7a7a7a7a7aa9ffff5656567a506464649b64596464f16463646464")
v1 <- LoadTxo(083666c9bf066f9d3a28ad30f5c0ed6fe463f7777e033783875b2523ef5214bb:0, 2500000000, 00204ae81572f06e1b88fd5ced7a1a000945432e83e1551e6f721ee9c00b8cc33260, , 51)
v2 <- LoadConnection(4)
v3 <- LoadTxVersion(2)
v4 <- LoadLockTime(144)
BeginBuildTx(v3, v4) -> v5
BeginBuildTxInputs -> v6
v7 <- LoadSequence(4294967295)
AddTxInput(v6, v1, v7)
v8 <- EndBuildTxInputs(v6)
BeginBuildTxOutputs(v8) -> v9
BeginWitnessStack -> v10
v11 <- EndWitnessStack(v10)
v12 <- BuildPayToWitnessScriptHash(v0, v11)
v13 <- LoadAmount(100000000)
AddTxOutput(v9, v12, v13)
v14 <- EndBuildTxOutputs(v9)
v15 <- EndBuildTx(v5, v8, v14)
v16 <- TakeTxo(v15)
v17 <- LoadLockTime(508195987)
v18 <- LoadTxVersion(2)
BeginBuildTx(v18, v17) -> v19
BeginBuildTxInputs -> v20
v21 <- LoadSequence(4294967294)
AddTxInput(v20, v16, v21)
v22 <- EndBuildTxInputs(v20)
BeginBuildTxOutputs(v22) -> v23
v24 <- LoadBytes("51")
BeginWitnessStack -> v25
v26 <- EndWitnessStack(v25)
v27 <- BuildPayToWitnessScriptHash(v24, v26)
v28 <- LoadAmount(98500000)
AddTxOutput(v23, v27, v28)
v29 <- EndBuildTxOutputs(v23)
v30 <- EndBuildTx(v19, v22, v29)
SendTx(v2, v30)
SendTx(v2, v15)
Two transactions are build (v15
, v30
), that are then send to the node under
test through connection v2
. v15
spends from an output in the snapshot state
v1
(loaded by LoadTxo
). The relationship between the two transactions (i.e.
v30
is the parent of v15
) is encoded in the IR through the use of
variables. v16
(output of TakeTxo
) represents the output created by v15
and is added to v30
via the AddTxInput
instruction in the BuildTxInputs
block. v30
is sent before v15
potentially triggering 1P1C logic.
This example would compile into two TestCase::SendMessage
operations,
containing the correctly serialized transactions v15
and v30
.
Table of Operations
Name | Description |
---|---|
Load* operations | Load constant values from the test context. |
LoadBytes | Loads a raw byte array. |
LoadMsgType | Loads a message type for SendRawMessage . |
LoadNode | Loads an index for one of the test nodes. |
LoadConnection | Loads an index for one of the p2p connections. |
LoadConnectionType | Loads a connection type string. |
LoadDuration | Loads a time duration. |
LoadTime | Loads a timestamp. |
LoadAmount | Loads a bitcoin amount. |
LoadSize | Loads a size in bytes. |
LoadTxVersion | Loads a transaction version. |
LoadBlockVersion | Loads a block version. |
LoadLockTime | Loads a transaction lock time. |
LoadSequence | Loads a transaction input sequence number. |
LoadBlockHeight | Loads a block height. |
LoadCompactFilterType | Loads a compact filter type. |
LoadPrivateKey | Loads a private key. |
LoadSigHashFlags | Loads signature hash flags. |
LoadTxo | Loads a transaction output from the context. |
LoadHeader | Loads a block header from the context. |
Time operations | Manipulate the mock time. |
AdvanceTime | Advances time by a given duration. |
SetTime | Sets the mock time to a specific value. |
Script building | Construct various bitcoin scripts. |
BuildRawScripts | Build raw scripts (scriptSig , scriptPubKey , witness). |
BuildPayToWitnessScriptHash | Creates a P2WSH script. |
BuildPayToPubKey | Creates a P2PK script. |
BuildPayToPubKeyHash | Creates a P2PKH script. |
BuildPayToWitnessPubKeyHash | Creates a P2WPKH script. |
BuildPayToScriptHash | Creates a P2SH script. |
BuildOpReturnScripts | Creates an OP_RETURN script. |
BuildPayToAnchor | Creates a P2A (pay-to-anchor) script for CPFP. |
Witness stack | Construct a witness stack. |
BeginWitnessStack | Begins building a witness stack. |
AddWitness | Adds an item to the witness stack. |
EndWitnessStack | Finishes building the witness stack. |
Transaction building | Construct a transaction. |
BeginBuildTx | Begins building a transaction. |
BeginBuildTxInputs | Begins building transaction inputs. |
AddTxInput | Adds an input to the transaction. |
EndBuildTxInputs | Finishes building transaction inputs. |
BeginBuildTxOutputs | Begins building transaction outputs. |
AddTxOutput | Adds an output to the transaction. |
EndBuildTxOutputs | Finishes building transaction outputs. |
EndBuildTx | Finishes building the transaction. |
TakeTxo | Extracts a specific output from a transaction. |
Block building | Construct a block. |
BeginBlockTransactions | Begins building the list of transactions for a block. |
AddTx | Adds a transaction to the block. |
EndBlockTransactions | Finishes building the list of transactions. |
BuildBlock | Builds a block. |
Inventory building | Construct an inventory for inv and getdata messages. |
BeginBuildInventory | Begins building an inventory. |
AddTxidInv | Adds a txid to the inventory. |
AddTxidWithWitnessInv | Adds a txid (with witness) to the inventory. |
AddWtxidInv | Adds a wtxid to the inventory. |
AddBlockInv | Adds a block hash to the inventory. |
AddBlockWithWitnessInv | Adds a block hash (with witness) to the inventory. |
AddFilteredBlockInv | Adds a filtered block to the inventory. |
EndBuildInventory | Finishes building the inventory. |
Message sending | Send messages to a node. |
SendRawMessage | Sends a raw, untyped message. |
SendGetData | Sends a getdata message. |
SendInv | Sends an inv message. |
SendTx | Sends a tx message. |
SendTxNoWit | Sends a tx message without witness data. |
SendHeader | Sends a header message. |
SendBlock | Sends a block message. |
SendBlockNoWit | Sends a block message without witness data. |
SendGetCFilters | Sends a getcfilters message. |
SendGetCFHeaders | Sends a getcfheaders message. |
SendGetCFCheckpt | Sends a getcfcheckpt message. |
Other | |
Nop | No operation. Used during minimization. |
Mutators
Several mutation strategies are available for IR programs:
InputMutator
: Replaces an instruction's input variable with another variable of compatible type. This mutation aims at changing the dataflow of a given program by making an instruction operate on a different value.OperationMutator
: Mutate an operation, e.g. mutate the input values of aLoad*
operation.CombineMutator
: Insert an entire IR program into another one at a random location. This mutation aims at changing a program's control flow by combining two programs into one.ConcatMutator
: Append an entire IR program to another one at a random location. This mutation aims at changing a program's control flow by appending another program to it.
Generators offer further mutation strategies that involve the generation of new programs or new instructions into existing programs. They can be used to bootstrap an initial corpus as well as to mutate existing inputs during a fuzzing campaign. The following generators are available:
SendMessageGenerator
: Generates a newSendRawMessage
instructionAdvanceTimeGenerator
: Generates newAdvanceTime
andSetTime
instructionsCompactFilterQueryGenerator
: Generates newSendGetCFilters
,SendGetCFHeaders
andSendGetCFCheckpt
instructionsBlockGenerator
: Generates instructions to build a blockHeaderGenerator
: Generates instructions to build a headerAddTxToBlockGenerator
: Generates instructions to add a transaction to a blockOneParentOneChildGenerator
: Generates instructions for building two new transactions (a 1-parent 1-child package) and sending them to a node- ... see generators/
Minimizers
Minimizers are used to reduce the size of an interesting program (e.g. bug or new coverage). The following minimization strategies are available:
Nopping
: Attempt to nop out instructions in an effort to reduce the size of a program.Cutting
: Attempt to cut out instructions of the end of a program in an effort to reduce the size of a program.BlockNopping
: Attempt to nop out entire blocks of instructions in an effort to reduce the size of a program (e.g. nop all instructions between and includingBeginBlockTransactions
andEndBlockTransactions
).