Post by Scott Lurndal
Post by Martin Gregorie
Post by Dennis Lee Bieber
Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC interpreter in order
to enhance it, I guess that dis-assemblers and decompilers must now be
ten-a-penny,
especially for programs running under Windows where the structure of
Windows programs is well-known with an assumption that C was the source
language?
Actually, I think the use of disassemblers et al has fallen away.
Modern processors have so many peephole optimizations and out-of-order
execution streams that converting an executable back to assembly source
is almost meaningless -- and getting back to a high-level language is
near impossible. One would have to be an expert at the assembly for a
processor to have any chance of understanding the result.
The retro-computing guys - those who are fans of the MC6800 and MC6809
microprocessors, anyway - seem to be getting a rather good
semi-interactive disassembler up and running.
Security experts have several very powerful disassemblers and decompilers
they use for Intel/AMD/ARM processors.
https://en.wikibooks.org/wiki/X86_Disassembly/Disassemblers_and_Decompilers
Yes. I am certain that certain compilers and certain languages leave a
fingerprint: always THAT register, used to do THAT job, always that
particular sequence of assembly to mimic that high level construct.
I cut my teeth on microprocessor assembly. Then C. Some things that are
neat in assembler are ugly as sin in C. Take a call table. In assembler,
you set up a range of memory whose contents contain the addresses of
subroutines. You load the accumulator with a number, left shift it once,
add it to the content of a register set to point to the base of that
memory block, and use that register as pointing to an address whose
contents are the address you want to 'call'. Simple, efficient and,
provided you ensure nothing out of bounds is in the accumulator, bomb proof.
Now try that in C: you need an array of pointers to functions, and a
simple check on the index you engage, followed by a statement to call
the function whose address is in the array of pointers to functions. I
never ever managed to get an 8 bit compiler to actually do that. People
just don't call the contents of an array of pointers to functions.
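
For reference, the call table described above might be sketched in C roughly as follows. This is only an illustration: the handler names, their return values and the -1 error convention are made up for the example, and whether a given 8 bit compiler turns it into a tidy indexed indirect call depends entirely on the compiler.

```c
/* Hypothetical handlers; the names and return values are illustrative only. */
static int op_start(void) { return 10; }
static int op_stop(void)  { return 20; }
static int op_reset(void) { return 30; }

/* The call table: an array of pointers to functions. */
static int (*const call_table[])(void) = { op_start, op_stop, op_reset };

#define CALL_TABLE_LEN (sizeof call_table / sizeof call_table[0])

/* Check the index, then call through the table.
   Returns -1 for an out-of-range index (an assumed error convention). */
int dispatch(unsigned index)
{
    if (index >= CALL_TABLE_LEN)
        return -1;
    return call_table[index]();
}
```

Here dispatch(1) calls op_stop, and dispatch(3) falls outside the table and returns -1 without calling anything.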
It's easier by far to set up a switch statement, which takes care of out
of bounds defaults, and ends up producing a chain of if..else if.. else
conditional calls to hardwired functions.
That's how you write it, because it's pretty much as fast on a pipelined
processor, RAM is cheap and comprehensibility beats programming elegance
hands down in the real world.
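
A minimal sketch of the switch version, for comparison. The handler names are again hypothetical, and the -1 returned from the default case is an assumed convention. What the compiler actually emits for it, a chain of compares or a jump table of its own, depends on the compiler and the optimisation settings.

```c
/* Hypothetical handlers; the names and return values are illustrative only. */
static int cmd_start(void) { return 10; }
static int cmd_stop(void)  { return 20; }
static int cmd_reset(void) { return 30; }

/* Each case is a direct call to a hardwired function;
   the default case absorbs any out-of-bounds index. */
int dispatch_switch(unsigned index)
{
    switch (index) {
    case 0:  return cmd_start();
    case 1:  return cmd_stop();
    case 2:  return cmd_reset();
    default: return -1;   /* assumed error convention */
    }
}
```

The bounds check is implicit here: anything that is not 0, 1 or 2 lands in the default case.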
I've examined a lot of compiled machine code and it's pretty easy to tell
what language it is, and roughly what it was written as. Stack based
variables are a bit of a giveaway, pointing to C or a similar language.
Highly optimised compilers of course automatically obfuscate things, but
that's the fun, isn't it?
I gave up writing assembler for *86 CPUs when the GNU compiler was
patently doing a better job than I would in assembler, and the ability
to write something long winded and easy to understand and have the
compiler completely rearrange it and turn it into three lines of
incomprehensible assembler, was to be respected.
I think it is, up to a point, entirely possible to make an AI that
could replace machine code with editable and compilable source code.
But there will always be the Problem Of Induction. Many, many possible
constructs in source, using any of an infinite number of arbitrary variable
and function names, could compile to the same object code. And there is no
way to reinstate the comments either, so it becomes an exercise
ultimately in hand editing and reinstating the comments manually -
almost as big a job as writing from scratch.
I suspect this is how Linux writers write freeware drivers for
proprietary hardware. Disassemble the manufacturer's drivers, and at
least mimic the program flow, if not the actual source code.
--
“I know that most men, including those at ease with problems of the
greatest complexity, can seldom accept even the simplest and most
obvious truth if it be such as would oblige them to admit the falsity of
conclusions which they have delighted in explaining to colleagues, which
they have proudly taught to others, and which they have woven, thread by
thread, into the fabric of their lives.”
― Leo Tolstoy