On Mon, 2017-05-29 at 23:48 +0200, Borislav Petkov wrote:
On Fri, May 05, 2017 at 11:17:07AM -0700, Ricardo Neri wrote:
String instructions are special because in protected mode, the linear address is always obtained via the ES segment register in operands that use the (E)DI register.
... and DS for rSI.
Right, I omitted this in the commit message.
If we're going to account for both operands of string instructions with two operands.
Btw, LODS and OUTS use only DS:rSI as a source operand. So we have to be careful with the generalization here. So if ES:rDI is the only seg. reg we want, then we don't need to look at those insns... (we assume DS by default).
My intention with this function is to write a function that does only one thing: identify string instructions, irrespective of the operands they use. A separate function, resolve_seg_register, will have the logic to decide what to segment register to use based on the registers used as operands, whether we are looking at a string instruction, whether we have segment override prefixes and whether such overrides should be ignored.
If I was to leave out string instructions from this function it should be renamed as is_string_instruction_non_lods_outs. In my opinion this separation makes the code more clear and I would end up having logic to decide which segment register to use in two places. Does it makes sense to you?
...
+/**
- is_string_instruction - Determine if instruction is a string instruction
- @insn: Instruction structure containing the opcode
- Return: true if the instruction, determined by the opcode, is any of the
- string instructions as defined in the Intel Software Development manual.
- False otherwise.
- */
+static bool is_string_instruction(struct insn *insn) +{
- insn_get_opcode(insn);
- /* all string instructions have a 1-byte opcode */
- if (insn->opcode.nbytes != 1)
return false;
- switch (insn->opcode.bytes[0]) {
- case INSB:
/* fall through */
- case INSW_INSD:
/* fall through */
- case OUTSB:
/* fall through */
- case OUTSW_OUTSD:
/* fall through */
- case MOVSB:
/* fall through */
- case MOVSW_MOVSD:
/* fall through */
- case CMPSB:
/* fall through */
- case CMPSW_CMPSD:
/* fall through */
- case STOSB:
/* fall through */
- case STOSW_STOSD:
/* fall through */
- case LODSB:
/* fall through */
- case LODSW_LODSD:
/* fall through */
- case SCASB:
/* fall through */
That "fall through" for every opcode is just too much. Also, you can use the regularity of the x86 opcode space and do:
case 0x6c ... 0x6f: /* INS/OUTS */ case 0xa4 ... 0xa7: /* MOVS/CMPS */ case 0xaa ... 0xaf: /* STOS/LODS/SCAS */ return true; default: return false; }
And voila, there's your compact is_string_insn() function! :^)
Thanks for the suggestion! It looks really nice. I will implement accordingly.
Thanks and BR, Ricardo