On Fri, May 05, 2017 at 11:17:07AM -0700, Ricardo Neri wrote:
String instructions are special because in protected mode, the linear address is always obtained via the ES segment register in operands that use the (E)DI register.
... and DS for rSI, if we're going to account for both operands of the string instructions that take two operands.
Btw, LODS and OUTS use only DS:rSI as a source operand. So we have to be careful with the generalization here. So if ES:rDI is the only seg. reg we want, then we don't need to look at those insns... (we assume DS by default).
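To make the operand split concrete, here is a minimal sketch that classifies each one-byte string opcode by the implicit memory operands it uses. The enum and the `string_operands()` helper are hypothetical, written only for illustration. The function takes a raw opcode byte rather than `struct insn`, and it assumes the caller has already ruled out multi-byte opcodes, as the patch does. It also reflects the SDM rule that a prefix can override DS for the rSI operand but never ES for the rDI operand.

```c
#include <stdint.h>

/* Hypothetical flags describing the implicit memory operands of a string insn. */
enum str_operands {
	STR_NONE   = 0,
	STR_DS_RSI = 1 << 0,	/* source at DS:rSI; a segment override prefix may replace DS */
	STR_ES_RDI = 1 << 1,	/* operand at ES:rDI; ES cannot be overridden */
};

/* Map a one-byte opcode to the implicit string operands it uses. */
static unsigned int string_operands(uint8_t opcode)
{
	switch (opcode) {
	case 0x6c: case 0x6d:	/* INS:  port -> ES:rDI */
	case 0xaa: case 0xab:	/* STOS: AL/rAX -> ES:rDI */
	case 0xae: case 0xaf:	/* SCAS: compare AL/rAX with ES:rDI */
		return STR_ES_RDI;
	case 0x6e: case 0x6f:	/* OUTS: DS:rSI -> port */
	case 0xac: case 0xad:	/* LODS: DS:rSI -> AL/rAX */
		return STR_DS_RSI;
	case 0xa4: case 0xa5:	/* MOVS: DS:rSI -> ES:rDI */
	case 0xa6: case 0xa7:	/* CMPS: compare DS:rSI with ES:rDI */
		return STR_DS_RSI | STR_ES_RDI;
	default:
		return STR_NONE;	/* not a string instruction */
	}
}
```

Only MOVS and CMPS carry both flags. If ES:rDI is the only operand of interest, OUTS and LODS drop out entirely.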
...
+/**
+ * is_string_instruction - Determine if instruction is a string instruction
+ * @insn:	Instruction structure containing the opcode
+ * Return: true if the instruction, determined by the opcode, is any of the
+ * string instructions as defined in the Intel Software Development manual.
+ * False otherwise.
+ */
+static bool is_string_instruction(struct insn *insn)
+{
+	insn_get_opcode(insn);
+	/* all string instructions have a 1-byte opcode */
+	if (insn->opcode.nbytes != 1)
+		return false;
+	switch (insn->opcode.bytes[0]) {
+	case INSB:
+		/* fall through */
+	case INSW_INSD:
+		/* fall through */
+	case OUTSB:
+		/* fall through */
+	case OUTSW_OUTSD:
+		/* fall through */
+	case MOVSB:
+		/* fall through */
+	case MOVSW_MOVSD:
+		/* fall through */
+	case CMPSB:
+		/* fall through */
+	case CMPSW_CMPSD:
+		/* fall through */
+	case STOSB:
+		/* fall through */
+	case STOSW_STOSD:
+		/* fall through */
+	case LODSB:
+		/* fall through */
+	case LODSW_LODSD:
+		/* fall through */
+	case SCASB:
+		/* fall through */
That "fall through" for every opcode is just too much. Also, you can use the regularity of the x86 opcode space and do:
	case 0x6c ... 0x6f:	/* INS/OUTS */
	case 0xa4 ... 0xa7:	/* MOVS/CMPS */
	case 0xaa ... 0xaf:	/* STOS/LODS/SCAS */
		return true;
	default:
		return false;
	}
And voila, there's your compact is_string_insn() function! :^)
(Modulo the exact list, as I mentioned above).
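Here is a standalone sketch of the compact version, with a narrower variant for the "modulo the exact list" point. Both function names are hypothetical. Both take a raw opcode byte instead of `struct insn` so the code compiles outside the kernel, and both assume the one-byte-opcode check has already been done. The `case lo ... hi:` syntax is a GNU C extension. The kernel is built as GNU C, and GCC and Clang accept it.

```c
#include <stdbool.h>
#include <stdint.h>

/* All one-byte string opcodes: INS/OUTS, MOVS/CMPS, STOS/LODS/SCAS. */
static bool is_string_insn(uint8_t opcode)
{
	switch (opcode) {
	case 0x6c ... 0x6f:	/* INS/OUTS */
	case 0xa4 ... 0xa7:	/* MOVS/CMPS */
	case 0xaa ... 0xaf:	/* STOS/LODS/SCAS */
		return true;
	default:
		return false;
	}
}

/*
 * Hypothetical narrower variant: only string insns with an ES:rDI operand,
 * i.e. the list above minus OUTS (0x6e/0x6f) and LODS (0xac/0xad), which
 * use DS:rSI only.
 */
static bool string_insn_uses_es_rdi(uint8_t opcode)
{
	switch (opcode) {
	case 0x6c ... 0x6d:	/* INS */
	case 0xa4 ... 0xa7:	/* MOVS/CMPS */
	case 0xaa ... 0xab:	/* STOS */
	case 0xae ... 0xaf:	/* SCAS */
		return true;
	default:
		return false;
	}
}
```

Which of the two the patch actually needs depends on whether the DS:rSI operand is going to be handled as well.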
Thanks.