java.lang.Object | |
↳ | com.pnfsoftware.jeb.core.units.code.asm.cfg.CFG<InsnType extends com.pnfsoftware.jeb.core.units.code.IInstruction> |
This class represents a Control Flow Graph for a routine/method, or more generally, any
body of code. The CFG can be typed to handle specific instructions
.
This class provides basic Data Flow Analysis support. A client can request simple and full du- and ud-chains. Simple chains are for intra-block use and reference instructions by index; Full chains contain precise cross-block information and reference instructions by address.
There are two ways to build a CFG:
CFG(long, List)
and providing a list of pre-parsed basic
blocks
.CFG(List, List)
and providing a flat list of instructions.Constants | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
int | FLAG_ALLOW_ARTIFICIAL_BLOCK_END | ||||||||||
int | FLAG_SUBROUTINE_CALL_NOT_BREAKING |
Public Constructors | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
CFG(long entry, List<BasicBlock<InsnType>> blocks)
Create a standard CFG using a pre-constructed list of basic blocks.
| |||||||||||
CFG(List<? extends InsnType> insns, List<IrregularFlowData> irrdata)
Convenience constructor.
| |||||||||||
CFG(List<? extends InsnType> insns, List<IrregularFlowData> irrdata, IInstructionAugmenter augmenter, long base, long entry, int flags)
Create a CFG by recursively processing a list of sequential instructions.
| |||||||||||
CFG(Map<Long, InsnType> offsetToInsn, List<IrregularFlowData> irrdata, IInstructionAugmenter augmenter, long entry, int flags)
Create a CFG by recursively processing a collection of instructions.
|
Public Methods | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
boolean |
addEdge(BasicBlock<InsnType> x, BasicBlock<InsnType> y, boolean keepDFA)
Add a regular edge X->Y.
| ||||||||||
Iterable<AddressableInstruction<InsnType>> | addressableInstructions() | ||||||||||
void |
deleteAllEdges()
Remove all block edges.
| ||||||||||
boolean |
deleteDuplicateEdge(BasicBlock<InsnType> x, BasicBlock<InsnType> y, boolean keepDFA)
Delete one duplicate edge X->Y.
| ||||||||||
boolean |
deleteEdge(BasicBlock<InsnType> x, BasicBlock<InsnType> y, boolean keepDFA)
Delete a regular edge X->Y.
| ||||||||||
void |
doDataFlowAnalysis()
Perform data flow analysis on the CFG.
| ||||||||||
void |
doDataFlowAnalysis(boolean redo)
Perform data flow analysis on the CFG.
| ||||||||||
void |
doDataFlowAnalysis(boolean redo, boolean integrateCalculatedInputRegisters)
Perform data flow analysis on the CFG.
| ||||||||||
String |
format(boolean formatAddresses, int formatChains, boolean formatInOut, IFormattingContextFactory<InsnType> fcf)
Format the CFG into a printable string, with optional data chains.
| ||||||||||
String |
format(boolean formatAddresses, int formatChains, boolean formatInOut)
Format the CFG into a printable string, with optional data chains.
| ||||||||||
String |
format()
Format this CFG as an assembly-code listing with simple data chains and input/output
registers
Consider using a | ||||||||||
String |
formatSimple()
Format this CFG as an assembly-code listing without any DFA data.
| ||||||||||
BasicBlock<InsnType> |
get(int index)
Retrieve a basic block.
| ||||||||||
Map<Long, BasicBlock<InsnType>> |
getAddressBlockMap()
Get a complete map of the basic blocks and their addresses in the CFG.
| ||||||||||
BasicBlock<InsnType> |
getBlockAt(long base)
Get the basic block that starts at the provided address or offset.
| ||||||||||
BasicBlock<InsnType> |
getBlockContaining(long address)
Get the basic block that contains the provided address.
| ||||||||||
List<BasicBlock<InsnType>> |
getBlocks()
Get a copy of the block list of the CFG.
| ||||||||||
int |
getEffectiveSize()
Calculate the 'effective' size of this CFG, that is, the sum of the size of each basic block.
| ||||||||||
long |
getEndAddress()
Routine highest address (exclusive).
| ||||||||||
long |
getEntryAddress()
Routine entry-point address.
| ||||||||||
BasicBlock<InsnType> |
getEntryBlock()
Get the entry block.
| ||||||||||
List<BasicBlock<InsnType>> |
getExitBlocks()
Get the ordered list of exit blocks.
| ||||||||||
long |
getFirstAddress()
Routine lowest address (inclusive).
| ||||||||||
int | getFlags() | ||||||||||
List<Couple<Long, Long>> |
getGaps()
Get a list of gaps found in the routine CFG.
| ||||||||||
void |
getGraphRepresentation(List<int[]> edges, List<int[]> irregularEdges)
Get the a graph representation of the CFG.
| ||||||||||
Map<Integer, Set<Long>> | getInputMap() | ||||||||||
InsnType |
getInstruction(long address)
Get the instruction located at the exact address.
| ||||||||||
int |
getInstructionCount()
Get the total number of instructions in the CFG.
| ||||||||||
Couple<BasicBlock<InsnType>, Integer> | getInstructionLocation(long address) | ||||||||||
List<InsnType> |
getInstructions()
Get the instruction list of this CFG by aggregating each instruction of every block.
| ||||||||||
Map<Long, InsnType> |
getInstructionsMap()
Get the map of addresses to instructions that compose this CFG.
| ||||||||||
long |
getLastAddress()
Routine highest address (inclusive).
| ||||||||||
Map<Integer, Set<Long>> | getOuputMap() | ||||||||||
IVariableInformationProvider | getVariableInformationProvider() | ||||||||||
boolean |
hasExit()
Determine if this CFG has exit blocks, that is, blocks without out-edges.
| ||||||||||
boolean |
hasNoExit()
Determine if this CFG does not have any exit block.
| ||||||||||
Iterable<InsnType> | instructions() | ||||||||||
void |
invalidateDataFlowAnalysis()
Explicitly invalidate a previously performed data flow analysis.
| ||||||||||
Iterator<BasicBlock<InsnType>> | iterator() | ||||||||||
int |
pruneOrphans()
Remove orphan basic blocks, i.e.
| ||||||||||
int |
reconnectEdge(BasicBlock<InsnType> x, BasicBlock<InsnType> y, BasicBlock<InsnType> z, boolean keepDFA)
This method regularly reconnects a block X from Y to Z.
| ||||||||||
void |
removeBlock(BasicBlock<InsnType> b, boolean keepDFA)
Remove a block, and update the fixtures.
| ||||||||||
InsnType |
replaceInstruction(long address, InsnType insn)
Replace an instruction and invalidate the DFA.
| ||||||||||
InsnType |
replaceInstruction(long address, InsnType insn, boolean keepDFA)
Replace an instruction.
| ||||||||||
IVariableInformationProvider | setVariableInformationProvider(IVariableInformationProvider prv) | ||||||||||
CFG<InsnType> |
shallowCopy(boolean copyBlockReferences)
Shallow duplication of a CFG:
- Block references are optionally copied - DFA data is not copied | ||||||||||
int |
simplify(boolean mergeOnCalls, boolean mergeOnJumps, boolean mergeNonConsecutive)
Merge the blocks that can be merged, without changing the flow of the graph.
| ||||||||||
int |
size()
Get the number of blocks.
| ||||||||||
String | toString() |
[Expand]
Inherited Methods | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
![]() | |||||||||||
![]() | |||||||||||
![]() |
Create a standard CFG using a pre-constructed list of basic blocks.
Convenience constructor. Same as
CFG(insns, irrdata, null, 0,
0, 0)
.
insns | list of sequential instructions; they must be contiguous (no gap), which means that the first instruction is at address `base`, the second at `base`+sizeofFirstInstruction, etc. |
---|---|
irrdata | irregular control flow information |
Create a CFG by recursively processing a list of sequential instructions.
Delay-slot instruction sets are not supported by this constructor.
insns | list of sequential instructions; they must be contiguous (no gap), which means that the first instruction is at address `base`, the second at `base`+sizeofFirstInstruction, etc. |
---|---|
irrdata | irregular control flow information |
augmenter | optional instruction augmenter |
base | routine base address |
entry | routine entry-point address |
flags | parsing flags, see FLAG_xxx constants
|
Create a CFG by recursively processing a collection of instructions.
Delay-slot instruction sets are not supported by this constructor.
offsetToInsn | map of address-to-instructions |
---|---|
irrdata | irregular control flow information |
augmenter | optional instruction augmenter |
entry | routine entry-point address |
flags | parsing flags, see FLAG_xxx constants
|
Add a regular edge X->Y. This method fails if the edge already exists. Both blocks must be part of the CFG.
x | source block |
---|---|
y | destination block |
keepDFA | if true the DFA is considered to be still valid after the edge addition |
Remove all block edges.
Delete one duplicate edge X->Y.
x | source block |
---|---|
y | destination block |
keepDFA | if true, if true the DFA is considered to be still valid after the edge removal |
Delete a regular edge X->Y. Warning! The method throws if duplicate edges x->y are found.
x | source block |
---|---|
y | destination block |
keepDFA | if true, if true the DFA is considered to be still valid after the edge removal |
Perform data flow analysis on the CFG.
This method does not force a new analysis: Same as doDataFlowAnalysis(false)
.
Perform data flow analysis on the CFG. Input registers are integrated into
This method does not propagate input registers: Same as
doDataFlowAnalysis(redo, true)
.
redo | if true, force a new analysis even if one is not required |
---|
Perform data flow analysis on the CFG.
redo | if true, force a new analysis even if one is not required |
---|---|
integrateCalculatedInputRegisters | if true, the live registers determined after analysis will be integrated in the use-def chains |
Format the CFG into a printable string, with optional data chains.
Consider using a CFGFormatter
instead of this method.
formatAddresses | true to prepend instructions by their address |
---|---|
formatChains | 0=no, 1=simple chains, 2=full chains |
formatInOut | true to format the input(live) and output(reaching) registers |
Format the CFG into a printable string, with optional data chains.
Consider using a CFGFormatter
instead of this method.
Format this CFG as an assembly-code listing with simple data chains and input/output registers
Consider using a CFGFormatter
instead of this method.
Format this CFG as an assembly-code listing without any DFA data.
Consider using a CFGFormatter
instead of this method.
Get a complete map of the basic blocks and their addresses in the CFG. This method is recommended for use when multiple, repeated invocations of #getBlockAt(int) are to be made.
Get the basic block that starts at the provided address or offset.
base | the block address/offset |
---|
Get the basic block that contains the provided address.
Note that the address just needs to be in the block address range; it does not need to point to the beginning of an instruction within the block
address | an address within the block |
---|
Get a copy of the block list of the CFG. The list is ordered by ascending block address. Modifying the list does not impact the CFG.
Calculate the 'effective' size of this CFG, that is, the sum of the size of each basic block.
Routine highest address (exclusive).
Routine entry-point address. Note that this address may not be the lowest one in the CFG.
Get the entry block. The entry block is unique.
Get the ordered list of exit blocks. A CFG may have zero or more exit blocks; CFG representing standard routines will have at least one (generally one) exit block.
Routine lowest address (inclusive). Note that this address may not be the entry-point address for this routine.
Get a list of gaps found in the routine CFG. Gaps are memory spaces between blocks that do not belong to the CFG itself.
Optimizing compilers and obfuscators can produce routines with gaps.
Get the a graph representation of the CFG. The list of edges return use a 1-based node numbering scheme.
edges | (output) array of regular edges, eg: {{1,2},{1,3},{2,3}} |
---|---|
irregularEdges | (output) array of irregular edges |
Get the instruction located at the exact address.
Get the total number of instructions in the CFG. This method sums the number of instructions of each block of the CFG.
Get the instruction list of this CFG by aggregating each instruction of every block. The list is ordered by ascending address/offset.
Get the map of addresses to instructions that compose this CFG.
Routine highest address (inclusive).
Determine if this CFG has exit blocks, that is, blocks without out-edges.
Determine if this CFG does not have any exit block.
Explicitly invalidate a previously performed data flow analysis.
To be used for instance after a CFG (or parts of it) is modified, eg, an instruction is pulled out or optimized (changing its register usage), etc.
Remove orphan basic blocks, i.e. basic blocks with no parents which are not the entry block, and all their descendants.
This method regularly reconnects a block X from Y to Z. (x->y becomes x->z). Warning! The method throws if duplicate edges x->y are found.
x | the original source block |
---|---|
y | the original destination block |
z | the new destination block |
keepDFA | if true the DFA is considered to be still valid after the reconnect |
Remove a block, and update the fixtures. This method is content-agnostic.
A CFG should never contain empty blocks or orphan blocks (no parents). However, when the CFG is optimized, instructions and edges are removed, and some nodes might end up empty or orphans. This transient, stale state is fine AS LONG AS the client knows what it is doing and removes the blocks as soon as they're done, before passing it down the processing chain. This method removes such blocks and updates the edges of connected and connecting blocks.
Caveats:
Data flow analysis: not used; invalidation depends on keepDFA parameter.
b | block to be removed |
---|---|
keepDFA | if true the DFA is considered still valid after the removal (BE CAUTIOUS!!) |
Replace an instruction and invalidate the DFA. An existing instruction must exist at the provided address. No adjustment is done (eg, the replaced instruction size is not modified). It is up to the caller to adjust the instruction in order to keep a valid CFG obect.
address | the target address |
---|---|
insn | the instruction that will replace the instruction at the provided address |
Replace an instruction. An existing instruction must exist at the provided address. No adjustment is done (eg, the replaced instruction size is not modified). It is up to the caller to adjust the instruction in order to keep a valid CFG obect.
address | the target address |
---|---|
insn | the instruction that will replace the instruction at the provided address |
keepDFA | whether or not existing data flow analysis should be invalidated upon a successful replacement |
Shallow duplication of a CFG:
- Block references are optionally copied
- DFA data is not copied
Merge the blocks that can be merged, without changing the flow of the graph.
Necessary conditions include: adjacent blocks, first block falls thru the second one (possibly through two duplicate edges), no block shall have irregular outputs, and only the first block may have irregular inputs.
Data flow analysis: not used; invalidated (if simplifications were performed).
This method uses getBreakingFlow(long)
and
getRoutineCall(long)
to determine block ends.
mergeOnCalls | if true, this optimization will merge blocks ending by a call to a sub-routine |
---|---|
mergeOnJumps | if true, this optimization will merge blocks ending by a simple branching instruction |
mergeNonConsecutive | allow to merge non strictly consecutive blocks. There are some conditions to respect: the order must be correct, there must NOT exist a block between two consecutive blocks AND the second block must have no outblock. Be careful that there may remain a non-sense jump instruction and it must be necessary to add a jump to previously fallthrough address (caller has to manage it). |
Get the number of blocks.