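At the last study session, while we were on the Interpreter / Binary Translation chapter, the topic of translating an instruction into assembly code came up.
How would that work on IA-32, which is a CISC CPU? I was wondering what would make a simple reference for that question,
and since I couldn't sleep anyway, I downloaded disasm.zip from the OllyDbg site and had a look.
(What I'd really like to read is the IDA source code, if only that were possible, lol)
The Disasm() function looks like it will be helpful.
OllyDbg uses the following structure to hold the result of disassembling a command.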
typedef struct t_disasm { // Results of disassembling
ulong ip; // Instruction pointer
char dump[TEXTLEN]; // Hexadecimal dump of the command
char result[TEXTLEN]; // Disassembled command
char comment[TEXTLEN]; // Brief comment
int cmdtype; // One of C_xxx
int memtype; // Type of addressed variable in memory
int nprefix; // Number of prefixes
int indexed; // Address contains register(s)
ulong jmpconst; // Constant jump address
ulong jmptable; // Possible address of switch table
ulong adrconst; // Constant part of address
ulong immconst; // Immediate constant
int zeroconst; // Whether contains zero constant
int fixupoffset; // Possible offset of 32-bit fixups
int fixupsize; // Possible total size of fixups or 0
int error; // Error while disassembling command
int warnings; // Combination of DAW_xxx
} t_disasm;
The readme describes each member as follows:
Members:
- ip - address of the disassembled command;
- dump - ASCII string, formatted hexadecimal dump of the command;
- result - ASCII string, disassembled command itself;
- comment - ASCII string, brief comment that applies to the whole command;
- cmdtype - type of the disassembled command, one of C_xxx possibly ORed with C_RARE to indicate that command is seldom in ordinary Win32 applications. Commands of type C_MMX additionally contain size of MMX data in the 3 least significant bits (0 means 8-byte operands). Non-MMX commands may have C_EXPL bit set which means that some memory operand has size which is not conform with standard 80x86 rules;
- memtype - type of memory operand, one of DEC_xxx, or DEC_UNKNOWN if operand is non-standard or command does not access memory;
- nprefix - number of prefixes that this command contains;
- indexed - if memory address contains index register, set to scale, otherwise 0;
- jmpconst - address of jump destination if this address is a constant, and 0 otherwise;
- jmptable - if indirect jump can be interpreted as switch, base address of switch table and 0 otherwise;
- adrconst - constant part of memory address;
- immconst - immediate constant or 0 if command contains no immediate constant. The only command that contains two immediate constants is ENTER. Disasm() ignores second constant which is anyway 0 in most cases;
- zeroconst - nonzero if command contains immediate zero constant;
- fixupoffset - possible start of 32 bit fixup within the command, or 0 if command can't contain fixups;
- fixupsize - possible total size of fixups (0, 4 or 8). If command contains both immediate constant and immediate address, they are always adjacent on 80x86 processors;
- error - Disasm() was unable to disassemble command (for example, command does not exist or crosses end of memory block), one of DAE_xxx;
- warnings - command is suspicious or meaningless (for example, far jump or MOV EAX,EAX preceded with segment prefix), combination of DAW_xxx bits;
I was curious about cmdtype in the structure, so I looked at the #define values. This is how the types are divided:
#define C_CMD 0x00 // Ordinary instruction
#define C_PSH 0x10 // 1-word PUSH instruction
#define C_POP 0x20 // 1-word POP instruction
#define C_MMX 0x30 // MMX instruction
#define C_FLT 0x40 // FPU instruction
#define C_JMP 0x50 // JUMP instruction
#define C_JMC 0x60 // Conditional JUMP instruction
#define C_CAL 0x70 // CALL instruction
#define C_RET 0x80 // RET instruction
#define C_FLG 0x90 // Changes system flags
#define C_RTF 0xA0 // C_JMP and C_FLG simultaneously
#define C_REP 0xB0 // Instruction with REPxx prefix
#define C_PRI 0xC0 // Privileged instruction
#define C_DAT 0xD0 // Data (address) doubleword
#define C_NOW 0xE0 // 3DNow! instruction
#define C_BAD 0xF0 // Unrecognized command
#define C_RARE 0x08 // Rare command, seldom used in programs
#define C_SIZEMASK 0x07 // MMX data size or special flag
#define C_EXPL 0x01 // (non-MMX) Specify explicit memory size
#define C_DANGER95 0x01 // Command is dangerous under Win95/98
#define C_DANGER 0x03 // Command is dangerous everywhere
#define C_DANGERLOCK 0x07 // Dangerous with LOCK prefix
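Ordinary instructions / stack instructions / MMX / floating-point operations / jumps / conditional branches / flag setting / repeat / privileged / 3DNow! / ...
So the classification is somewhat finer-grained than what the Dispatch() function in the book does.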
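To make the layout of cmdtype a bit more concrete, here is a minimal sketch of how the value could be taken apart. The constants are copied from the list above. TYPE_MASK and the helper functions are hypothetical names introduced only for this illustration, and the assumption that the high nibble selects the command class is inferred from the values in the list rather than confirmed against the OllyDbg source.

```c
/* Constants copied from the C_xxx list above. */
#define C_MMX      0x30   /* MMX instruction */
#define C_JMP      0x50   /* JUMP instruction */
#define C_JMC      0x60   /* Conditional JUMP instruction */
#define C_CAL      0x70   /* CALL instruction */
#define C_RET      0x80   /* RET instruction */
#define C_RARE     0x08   /* Rare command */
#define C_SIZEMASK 0x07   /* MMX data size or special flag */

/* Assumption: the high nibble selects the command class.
   TYPE_MASK is a hypothetical name, not taken from the list above. */
#define TYPE_MASK  0xF0

/* Class part of cmdtype (one of the C_xxx class values). */
static int cmd_class(int cmdtype) {
    return cmdtype & TYPE_MASK;
}

/* Nonzero if the command carries the C_RARE bit. */
static int cmd_is_rare(int cmdtype) {
    return (cmdtype & C_RARE) != 0;
}

/* Nonzero if the command is a jump, conditional jump, call or return. */
static int cmd_is_branch(int cmdtype) {
    int c = cmd_class(cmdtype);
    return c == C_JMP || c == C_JMC || c == C_CAL || c == C_RET;
}

/* MMX operand size in bytes as the readme describes it: the 3 least
   significant bits, where 0 means 8 bytes. Returns 0 for non-MMX commands. */
static int mmx_operand_size(int cmdtype) {
    int n;
    if (cmd_class(cmdtype) != C_MMX)
        return 0;
    n = cmdtype & C_SIZEMASK;
    return n == 0 ? 8 : n;
}
```

For example, cmd_class(C_JMC | C_RARE) gives back C_JMC while cmd_is_rare() reports the extra bit, which is roughly the kind of check a Dispatch()-style routine would need before deciding how to handle a command.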
I also noticed this comment in the source:
// Correct 80x86 command may theoretically contain up to 4 prefixes belonging
// to different prefix groups. This limits maximal possible size of the
// command to MAXCMDSIZE=16 bytes. In order to maintain this limit, if
// Disasm() detects second prefix from the same group, it flushes first
// prefix in the sequence as a pseudocommand.
So there is a rule like this -_-;
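The rule in that comment can be sketched roughly as follows. The grouping of prefix bytes follows the four IA-32 prefix groups (lock/repeat, segment override, operand size, address size). The functions prefix_group() and scan_prefixes() are hypothetical helpers written for this post, not OllyDbg's actual code, and they only detect the repeated-group situation that the comment describes.

```c
#include <stddef.h>

/* Prefix group of a byte (1..4), or 0 if the byte is not a prefix.
   Group 1: LOCK, REPNE, REP. Group 2: segment overrides.
   Group 3: operand-size override. Group 4: address-size override. */
static int prefix_group(unsigned char b) {
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:
        return 1;
    case 0x26: case 0x2E: case 0x36: case 0x3E: case 0x64: case 0x65:
        return 2;
    case 0x66:
        return 3;
    case 0x67:
        return 4;
    default:
        return 0;
    }
}

/* Hypothetical helper: counts the leading prefixes in code[0..len) that
   belong to distinct groups. Scanning stops at the first non-prefix byte
   or at the first prefix whose group was already seen; in the latter case
   *repeated is set to 1, otherwise to 0. */
static size_t scan_prefixes(const unsigned char *code, size_t len,
                            int *repeated) {
    int seen[5] = {0, 0, 0, 0, 0};
    size_t n = 0;
    *repeated = 0;
    while (n < len) {
        int g = prefix_group(code[n]);
        if (g == 0)
            break;              /* opcode reached */
        if (seen[g]) {
            *repeated = 1;      /* second prefix from the same group */
            break;
        }
        seen[g] = 1;
        n++;
    }
    return n;
}
```

When *repeated comes back as 1, a disassembler following the quoted comment would presumably emit code[0] on its own as a one-byte pseudocommand and restart decoding at code + 1. Because a group can never appear twice, at most 4 prefixes are ever attached to one command.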
MadDisasm uses the following record:
type
TCodeInfo = record
IsValid : boolean; // was the specified code pointer valid?
Opcode : word; // Opcode, one byte ($00xx) or two byte ($0fxx)
ModRm : byte; // ModRm byte, if available, otherwise 0
Call : boolean; // is this instruction a call?
Jmp : boolean; // is this instruction a jmp?
RelTarget : boolean; // is this target relative (or absolute)?
Target : pointer; // absolute target address
PTarget : pointer; // pointer to the target information in the code
PPTarget : TPPointer; // pointer to pointer to the target information
TargetSize : integer; // size of the target information in bytes (1/2/4)
Enlargeable : boolean; // can the target size of this opcode be extended?
This : pointer; // where does this instruction begin?
Next : pointer; // next code location
end;
This one also classifies jumps and calls separately.
Since there is already the SIB byte, I'm not sure why the record needs a relative/absolute flag -ㅅ-;
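One possible reason for the RelTarget field: ModRM and SIB describe memory operands, whereas direct jumps and calls such as E8 (CALL rel32) and E9 (JMP rel32) store a displacement relative to the address of the next instruction, so a tool has to know which kind it is looking at before it can compute or patch the absolute target. The sketch below shows that computation. rel32_target() is a hypothetical helper for illustration and is not taken from MadDisasm.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: if code[0..len) starts with E8 (CALL rel32) or
   E9 (JMP rel32) located at virtual address addr, stores the absolute
   destination in *target and returns 1. Returns 0 for anything else. */
static int rel32_target(const unsigned char *code, size_t len,
                        uint32_t addr, uint32_t *target) {
    uint32_t rel;
    if (len < 5 || (code[0] != 0xE8 && code[0] != 0xE9))
        return 0;
    /* The 32-bit displacement is stored little-endian after the opcode. */
    rel = (uint32_t)code[1]
        | ((uint32_t)code[2] << 8)
        | ((uint32_t)code[3] << 16)
        | ((uint32_t)code[4] << 24);
    /* Destination = address of the next instruction + displacement.
       Unsigned wraparound handles negative displacements. */
    *target = addr + 5u + rel;
    return 1;
}
```

An indirect form such as FF /4 (JMP r/m32) is different: its operand is a register or memory location holding an absolute address, which is where ModRM and SIB come in.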
I should get some sleep and look at this again early tomorrow!
Link: http://www.ollydbg.de/disasm.zip
.
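Whoa! This is interesting.
I didn't understand all of it, lol, but the comment part is fun.
It mentions the reason for flushing the first prefix.
It seems to say that when a second prefix from the same group is detected, the first prefix is
set aside as a pseudocommand in order to keep the MAXCMDSIZE limit (is my reading right? haha)
So then the first prefix must be there for grouping.
Going further, it seems you can only tell whether the first prefix is there for grouping once a second prefix has been detected.
At least that is how I read it.
Then do the remaining 3 prefixes also play a part in the size?
With 4 bytes it would cover a size of up to 32 bytes, and with one byte taken away it covers up to 16 bytes..
(So you can tell the instruction size just by looking at the prefixes ..????)
Hmm... I think looking at how the prefixes are actually detected would make this easier to understand.
I'd like to go through every case one by one, from 0 prefixes up to 4!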