AI and decompilation?

gareth evans

2021-01-04 11:00:29 UTC

Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?

But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?

I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

Ahem A Rivet's Shot

2021-01-04 11:42:52 UTC

On Mon, 4 Jan 2021 11:00:29 +0000

Post by gareth evans
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?

Now *that* would be an interesting AI project to see the results
of. I'm pretty sure the answer to your question is "Nobody knows, please
publish when you find out" or thereabouts.

There's plenty of training material available in the form of open
source compiled for all sorts of platforms. You just need to decide on an
AI architecture that's up to the job (hopefully something short of
AlphaGo Zero), build it (or rent it in "the cloud") and train it. It would
still be useful even if you had to train one for each instruction set (or
family).

The biggest challenge would be comparing the source codes, but code
that compiles to an equivalent binary would be good enough as long as it
didn't cheat (by creating a binary array and calling it, for example).

--
Steve O'Hara-Smith | Directable Mirror Arrays
C:\>WIN | A better way to focus the sun
The computer obeys and wins. | licences available see
You lose and Bill collects. | http://www.sohara.org/

Pancho

2021-01-04 13:08:10 UTC

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

I think a lot of the problem is defining the question.

What do you want it to do?

gareth evans

2021-01-04 17:51:14 UTC

Post by Pancho

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

I think a lot of the problem is defining the question.
What do you want it to do?

I don't want it to do anything. I want to play at a low level
with the thing ... large oaks from little acorns grow.

Pancho

2021-01-04 21:57:32 UTC

Post by gareth evans

Post by Pancho

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

I think a lot of the problem is defining the question.
What do you want it to do?

I don't want it to do anything. I want to play at a low level
with the thing ... large oaks from little acorns grow.

Play with what thing? What is an instruction set, what is the Binary
Blob? Why do you need an AI?

Most compilers leave fingerprints on executables; you don't need an AI to
detect them. I remember decompiling in the early '80s, but complex modern
code can often be a challenge to naively reverse engineer a high-level
understanding from, even if you do have the source code. Take away sensible
variable and function names and you are stuffed.

gareth evans

2021-01-04 22:23:14 UTC

Post by Pancho

Post by gareth evans

Post by Pancho

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

I think a lot of the problem is defining the question.
What do you want it to do?

I don't want it to do anything. I want to play at a low level
with the thing ... large oaks from little acorns grow.

Play with what thing? What is an instruction set, what is the Binary
Blob? Why do you need an AI?
Most compilers leave fingerprints on executables you don't need an AI to
detect them. I remember decompiling in the early 80's but complex modern
code can often be a challenge to naively reverse engineer a high level
understanding from even if you do have source code. Take away sensible
variable and function names and you are stuffed.

Somehow I think that we're not singing from the same hymn sheet.

Sorry.

Martin Gregorie

2021-01-04 23:09:07 UTC

Post by gareth evans
Somehow I think that we're not singing from the same hymn sheet.

There is an intermediate disassembler style that sits between a
traditional disassembler and the mythical AI disassembler: that is the
'semi-interactive' type I mentioned. Since I know of at least one of
these that is currently up and running I probably should have explained
it better, so here goes:

What I meant by this is a disassembler that initially generates an
assembly source file but doesn't just save it. Instead it shows that to
the user in an interactive, scrolling display which allows the user to
assign names to branch destinations, call targets and addresses of
variables, while simultaneously storing these in a symbol table, which is
also viewable, editable on screen and can be saved and later reloaded at
the start of a future session.

Most importantly, at any point you can rerun the disassembly, but this
time the disassembler will include the names from the symbol table in
its output. IOW, after you've added one or more
name/address pairs to the symbol table, rerunning the disassembler will
incorporate these into the new version of the disassembled source.
Working this way is obviously faster and less error-prone than saving the
first pass disassembler output and manually editing it.

For extra points the disassembler should be able to:

- start by reading a predefined symbol set that contains the OS API names
and names of OS public variables.

- be configurable to search for and read in more than one symbol set.

- use a modified version of the symbol table editor to add comments that
will appear as comment blocks in front of a nominated address or after
the address content as a trailing comment.

- generate a disassembled source file that can be assembled without
needing further changes.
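
The rerun-with-symbols idea described above can be sketched roughly as
follows. This is only an illustration in Python, not how any particular
disassembler works: the instruction tuples, the CALL/RET mnemonics and the
name render() are all made up for the example, and decoding the binary
into those tuples is assumed to have happened already.

```python
from typing import Optional

# One decoded instruction: (address, mnemonic, target address or None).
# This tuple layout is an assumption made for the sketch.
Instruction = tuple[int, str, Optional[int]]


def render(instructions: list[Instruction], symbols: dict[int, str]) -> list[str]:
    """Produce listing lines, using names from the symbol table where known."""
    lines = []
    for address, mnemonic, target in instructions:
        # Emit a label if the user has named this address.
        label = symbols.get(address, "")
        prefix = f"{label}:" if label else ""
        if target is None:
            operand = ""
        else:
            # Fall back to the raw octal address until a name is assigned.
            operand = symbols.get(target, f"{target:06o}")
        lines.append(f"{prefix:<10}{mnemonic} {operand}".rstrip())
    return lines


program = [(0o1000, "CALL", 0o2000), (0o2000, "RET", None)]

symbols: dict[int, str] = {}
first_pass = render(program, symbols)    # raw addresses only

symbols[0o2000] = "getchar"              # the user names a call target
second_pass = render(program, symbols)   # the rerun picks the name up

for line in second_pass:
    print(line)
```

Saving and reloading the symbol table between sessions would then just be
a matter of writing that dictionary out to a file and reading it back.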

--
Martin | martin at
Gregorie | gregorie dot org

Dan Espen

2021-01-04 22:50:16 UTC

Post by Pancho

Post by gareth evans

Post by Pancho

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

I think a lot of the problem is defining the question.
What do you want it to do?

I don't want it to do anything. I want to play at a low level
with the thing ... large oaks from little acorns grow.

Play with what thing? What is an instruction set, what is the Binary
Blob? Why do you need an AI?
Most compilers leave fingerprints on executables you don't need an AI
to detect them. I remember decompiling in the early 80's but complex
modern code can often be a challenge to naively reverse engineer a
high level understanding from even if you do have source code. Take
away sensible variable and function names and you are stuffed.

I've had more than one experience in putting those meaningful variable
names right back. It's actually pretty easy, a somewhat rote process.
Find the read input instruction. Since you know the layout of the input
record, you now have labels to many of the references to that input
area.

I think you can work out how to proceed.
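
As a rough illustration of that rote process, here is a small Python
sketch. The record layout, the field names and the buffer address are all
invented for the example; the only idea taken from the above is that once
you know where the input record is read to, every reference into that
area can be given a field name from the layout you already have.

```python
from typing import Optional

# Hypothetical record layout: (field name, offset in record, length in bytes).
LAYOUT = [
    ("CUST_ID", 0, 6),
    ("CUST_NAME", 6, 30),
    ("BALANCE", 36, 8),
]


def label_for(address: int, buffer_base: int, layout=LAYOUT) -> Optional[str]:
    """Name a referenced address if it falls inside the input record."""
    offset = address - buffer_base
    for name, start, length in layout:
        if start <= offset < start + length:
            # References into the middle of a field become NAME+n.
            return name if offset == start else f"{name}+{offset - start}"
    return None  # not part of the input area


INPUT_AREA = 0x4000  # assumed address the read instruction fills

print(label_for(0x4006, INPUT_AREA))  # start of the second field
print(label_for(0x4026, INPUT_AREA))  # two bytes into the third field
print(label_for(0x3000, INPUT_AREA))  # outside the record
```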

--
Dan Espen

Pancho

2021-01-04 23:00:54 UTC

Post by Dan Espen

Post by Pancho

Post by gareth evans

Post by Pancho

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

I think a lot of the problem is defining the question.
What do you want it to do?

I don't want it to do anything. I want to play at a low level
with the thing ... large oaks from little acorns grow.

Play with what thing? What is an instruction set, what is the Binary
Blob? Why do you need an AI?
Most compilers leave fingerprints on executables you don't need an AI
to detect them. I remember decompiling in the early 80's but complex
modern code can often be a challenge to naively reverse engineer a
high level understanding from even if you do have source code. Take
away sensible variable and function names and you are stuffed.

I've had more than one experience in putting those meaningful variable
names right back. It's actually pretty easy, a somewhat rote process.
Find the read input instruction. Since you know the layout of the input
record, you now have labels to many of the references to that input
area.
I think you can work out how to proceed.

Without the source how do you know any meaningful variable names in the
first place?

Peter Flass

2021-01-04 23:59:03 UTC

Post by Pancho

Post by Dan Espen

Post by Pancho

Post by gareth evans

Post by Pancho

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

I think a lot of the problem is defining the question.
What do you want it to do?

I don't want it to do anything. I want to play at a low level
with the thing ... large oaks from little acorns grow.

Play with what thing? What is an instruction set, what is the Binary
Blob? Why do you need an AI?
Most compilers leave fingerprints on executables you don't need an AI
to detect them. I remember decompiling in the early 80's but complex
modern code can often be a challenge to naively reverse engineer a
high level understanding from even if you do have source code. Take
away sensible variable and function names and you are stuffed.

I've had more than one experience in putting those meaningful variable
names right back. It's actually pretty easy, a somewhat rote process.
Find the read input instruction. Since you know the layout of the input
record, you now have labels to many of the references to that input
area.
I think you can work out how to proceed.

Without the source how do you know any meaningful variable names in the
first place?

I did a fun side project a few years back. The source for one module of
PL/I(F) was chooched on the distribution tape; about the last third was
missing. I disassembled the object module, and was able to recognize
variable names and standard compiler macros. I got my restored version back
to being identical to the original, and also a fairly readable source.

--
Pete

J. Clarke

2021-01-05 01:42:04 UTC

On Mon, 4 Jan 2021 23:00:54 +0000, Pancho

Post by Pancho

Post by Dan Espen

Post by Pancho

Post by gareth evans

Post by Pancho

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

I think a lot of the problem is defining the question.
What do you want it to do?

I don't want it to do anything. I want to play at a low level
with the thing ... large oaks from little acorns grow.

Play with what thing? What is an instruction set, what is the Binary
Blob? Why do you need an AI?
Most compilers leave fingerprints on executables you don't need an AI
to detect them. I remember decompiling in the early 80's but complex
modern code can often be a challenge to naively reverse engineer a
high level understanding from even if you do have source code. Take
away sensible variable and function names and you are stuffed.

I've had more than one experience in putting those meaningful variable
names right back. It's actually pretty easy, a somewhat rote process.
Find the read input instruction. Since you know the layout of the input
record, you now have labels to many of the references to that input
area.
I think you can work out how to proceed.

Without the source how do you know any meaningful variable names in the
first place?

You start with the inputs and outputs and work into the algorithms and
eventually maybe you can make sense of it.

Dan Espen

2021-01-05 01:59:35 UTC

Post by J. Clarke
On Mon, 4 Jan 2021 23:00:54 +0000, Pancho

Post by Pancho

Post by Dan Espen

Post by Pancho

Post by gareth evans

Post by Pancho

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

I think a lot of the problem is defining the question.
What do you want it to do?

I don't want it to do anything. I want to play at a low level
with the thing ... large oaks from little acorns grow.

Play with what thing? What is an instruction set, what is the Binary
Blob? Why do you need an AI?
Most compilers leave fingerprints on executables you don't need an AI
to detect them. I remember decompiling in the early 80's but complex
modern code can often be a challenge to naively reverse engineer a
high level understanding from even if you do have source code. Take
away sensible variable and function names and you are stuffed.

I've had more than one experience in putting those meaningful variable
names right back. It's actually pretty easy, a somewhat rote process.
Find the read input instruction. Since you know the layout of the input
record, you now have labels to many of the references to that input
area.
I think you can work out how to proceed.

Without the source how do you know any meaningful variable names in the
first place?

You start with the inputs and outputs and work into the algorithms and
eventually maybe you can make sense of it.

Yep.

One place I was working they had a lost source code program
reconstructed from object code and they were complaining no one
could work on it because of the variable and routine names.

Seemed easy enough to me and I fixed it up in a day or 2.

--
Dan Espen

Dan Espen

2021-01-05 01:55:36 UTC

Post by Pancho

Post by Dan Espen

Post by Pancho

Post by gareth evans

Post by Pancho

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

I think a lot of the problem is defining the question.
What do you want it to do?

I don't want it to do anything. I want to play at a low level
with the thing ... large oaks from little acorns grow.

Play with what thing? What is an instruction set, what is the Binary
Blob? Why do you need an AI?
Most compilers leave fingerprints on executables you don't need an AI
to detect them. I remember decompiling in the early 80's but complex
modern code can often be a challenge to naively reverse engineer a
high level understanding from even if you do have source code. Take
away sensible variable and function names and you are stuffed.

I've had more than one experience in putting those meaningful variable
names right back. It's actually pretty easy, a somewhat rote process.
Find the read input instruction. Since you know the layout of the input
record, you now have labels to many of the references to that input
area.
I think you can work out how to proceed.

Without the source how do you know any meaningful variable names in
the first place?

The programs were reading our files.
We already had record layouts for those files.

--
Dan Espen

Pancho

2021-01-05 10:38:16 UTC

Post by Dan Espen

Post by Pancho

Post by Dan Espen

Post by Pancho

Post by gareth evans

Post by Pancho

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

I think a lot of the problem is defining the question.
What do you want it to do?

I don't want it to do anything. I want to play at a low level
with the thing ... large oaks from little acorns grow.

Play with what thing? What is an instruction set, what is the Binary
Blob? Why do you need an AI?
Most compilers leave fingerprints on executables you don't need an AI
to detect them. I remember decompiling in the early 80's but complex
modern code can often be a challenge to naively reverse engineer a
high level understanding from even if you do have source code. Take
away sensible variable and function names and you are stuffed.

I've had more than one experience in putting those meaningful variable
names right back. It's actually pretty easy, a somewhat rote process.
Find the read input instruction. Since you know the layout of the input
record, you now have labels to many of the references to that input
area.
I think you can work out how to proceed.

Without the source how do you know any meaningful variable names in
the first place?

The programs were reading our files.
We already had record layouts for those files.

Yes, I understand how you can disassemble a simple program. I did it
myself in the 1980s.

However, modern programs are much more complex. They are built upon many
levels of indirection, libraries, composition, inheritance, function
pointers, events, etc, etc... We use structure, design patterns and
suchlike to allow us to recognise complex ideas quickly. That gets lost in
compilation.

I just can't see how I would reverse engineer an understanding of
anything but the most simple disassembly in any reasonable time frame.

Bob Eager

2021-01-05 12:46:13 UTC

Post by Pancho
I just can't see how I would reverse engineer an understanding of
anything but the most simple disassembly in any reasonable time frame.

One of my former colleagues did a Ph.D. on it:

https://kar.kent.ac.uk/61349/

--
Using UNIX since v6 (1975)...

Use the BIG mirror service in the UK:
http://www.mirrorservice.org

Pancho

2021-01-05 13:43:03 UTC

Post by Bob Eager

Post by Pancho
I just can't see how I would reverse engineer an understanding of
anything but the most simple disassembly in any reasonable time frame.

https://kar.kent.ac.uk/61349/

That looks like a surprisingly good fit for my idle, poorly thought out,
wonderings.

Bob Eager

2021-01-05 14:23:20 UTC

Post by Pancho

Post by Bob Eager

Post by Pancho
I just can't see how I would reverse engineer an understanding of
anything but the most simple disassembly in any reasonable time frame.

https://kar.kent.ac.uk/61349/

That looks like a surprisingly good fit for my idle, poorly thought out,
wonderings.

If you have trouble getting a full copy (you shouldn't) let me know. I
can ask him.

--
Using UNIX since v6 (1975)...

Use the BIG mirror service in the UK:
http://www.mirrorservice.org

gareth evans

2021-01-05 15:22:35 UTC

Post by Bob Eager

Post by Pancho
I just can't see how I would reverse engineer an understanding of
anything but the most simple disassembly in any reasonable time frame.

https://kar.kent.ac.uk/61349/

Thanks for the heads-up.

Downloaded for off line perusing.

This has been an interest of mine for some time, and, not
really being in contact with either industry or academia, I bethought
me to be a lone wolf, a voice crying in the wilderness, to borrow
a phrase from the Biblical religionists.

Dan Espen

2021-01-05 14:05:56 UTC

Post by Pancho
Yes, I understand how you can disassemble a simple program. I did it
myself in the 1980s.

The programs I remember were substantial.
Otherwise someone would have just written a new one instead of trying
to recover the source.

--
Dan Espen

The Natural Philosopher

2021-01-05 10:51:35 UTC

Post by Pancho

Post by Dan Espen

Post by Pancho

Post by gareth evans

Post by Pancho

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

I think a lot of the problem is defining the question.
What do you want it to do?

I don't want it to do anything. I want to play at a low level
with the thing ... large oaks from little acorns grow.

Play with what thing? What is an instruction set, what is the Binary
Blob? Why do you need an AI?
Most compilers leave fingerprints on executables you don't need an AI
to detect them. I remember decompiling in the early 80's but complex
modern code can often be a challenge to naively reverse engineer a
high level understanding from even if you do have source code. Take
away sensible variable and function names and you are stuffed.

I've had more than one experience in putting those meaningful variable
names right back. It's actually pretty easy, a somewhat rote process.
Find the read input instruction. Since you know the layout of the input
record, you now have labels to many of the references to that input
area.
I think you can work out how to proceed.

Without the source how do you know any meaningful variable names in the
first place?

Well, you have hints from what the code does. Let's say you have code
that loads data from two stack-based memory locations, adds them together
and then uses the result to access what is clearly an array. That gives a
strong hint that the original variables are integers, and that the index is
simply a temporary way to get at a value in that array, so you call it
'i' or 'arrayIndex' pro tem...

Then once you have an idea as to what data that array holds, you can
update it and the index to something more meaningful.

The whole process is actually covered in philosophy: It is the problem
of induction. How do you work back from results to causes?

Given that the answer to Life The Universe and Everything was '42', what
in fact was the question? (40+2)? (6x7)?

There are an infinite number of expressions that give that answer, and
an infinite number that don't.

This is where Karl Popper's philosophy of science steps in. Instead of
regarding there to be One True Reason why science works, namely that
scientists are in the business of discovering the Truth, he pointed out
that just because stuff worked (and 6x7 does indeed give 42) that was no
reason to suppose that some other completely different construct might
not work equally well, and that had indeed happened with relativity
and Newtonian gravity.

The Problem of Induction is that many theories can give the same
predicted result. Sherlock Holmes is a sham. The Dog That Didn't Bark in
the Night didn't bark, allegedly, because it knew the thief. Why? It
might have been abducted by aliens, drugged, actually out hunting
rabbits, in a soundproof box, or the Russians did it using a robot, or
it was just too plumb wore out with old age to care.

The truth is not provable. All we have is stuff that works. Given
running machine code, there are an infinite number of source codes that
might have produced it, and an infinite number that did not.

We aren't there, ultimately, to reproduce *the* exact source, but to
arrive at *an* editable source, that we can use.
Like science and religion, it doesn't have to be true to be useful,
and like science and religion, its ultimate content will be forever
truth-undecidable.
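
The point that many different sources can produce the same code is easy
to see on a small scale. The sketch below is only an illustration: it
uses CPython bytecode as a stand-in for machine code, and the two
functions and their names are invented for the example. The instruction
bytes refer to local variables by slot number, so the names chosen by the
programmer (and any comments) leave no trace in them.

```python
def original(count):
    total = count + 1
    return total


def reconstruction(n):
    # Different names and a comment, but the same operations.
    result = n + 1
    return result


# The raw instruction bytes of each compiled function.
original_code = original.__code__.co_code
rebuilt_code = reconstruction.__code__.co_code

same_bytes = original_code == rebuilt_code
same_names = original.__code__.co_varnames == reconstruction.__code__.co_varnames

print("same instruction bytes:", same_bytes)
print("same variable names:   ", same_names)
```

Python happens to keep the variable names in a separate table alongside
the bytecode; a stripped native executable keeps nothing of the sort,
which is why any names in a recovered source are a choice rather than a
recovery.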

--
"First, find out who are the people you can not criticise. They are your
oppressors."
- George Orwell

Martin Gregorie

2021-01-05 11:52:10 UTC

Post by The Natural Philosopher
We aren't there, ultimately, to reproduce *the* exact source, but to
arrive at *an* editable source, that we can use.
Like science and religion, it doesn't have to be true to be useful,
and like science and religion, its ultimate content will be forever
truth-undecidable.

+1

--
Martin | martin at
Gregorie | gregorie dot org

gareth evans

2021-01-05 12:06:29 UTC

Post by The Natural Philosopher
The whole process is actually covered in philosophy: It is the problem
of induction. How do you work back from results to causes?
Given that the answer to Life The Universe and Everything was '42', what
in fact was the question? (40+2)? (6x7)?
There are an infinite number of expressions that give that answer, and
an infinite number that don't.
This is where Karl Popper's philosophy of science steps in. Instead of
regarding there to be One True Reason why science works, namely that
scientists are in the business of discovering the Truth, he pointed out
that just because stuff worked (and 6x7 does indeed give 42) that was no
reason to suppose that some other completely different construct might
not work equally as well, and that had indeed happened with relativity
and Newtonian gravity.
The Problem of Induction is that many theories can give the same
predicted result. Sherlock Holmes is a sham. The Dog That Didn't Bark in
the Night didn't bark, allegedly, because it knew the thief. Why? It
might have been abducted by aliens, drugged, actually out hunting
rabbits, in a soundproof box, or the Russians did it using a robot, or
it was just too plumb wore out with old age to care.
The truth is not provable. All we have is stuff that works. Given
running machine code, there are an infinite number of source codes that
might have produced it, and an infinite number that did not.
We aren't there, ultimately, to reproduce *the* exact source, but to
arrive at *an* editable source, that we can use.
Like science, and religion, it doesn't have to be true, to be useful,
and like science, and religion, its ultimate content will be forever
truth-undecidable.

That's an interesting and thought-provoking aside!

Charlie Gibbs

2021-01-05 20:12:39 UTC

Post by The Natural Philosopher
Given that the answer to Life The Universe and Everything was '42',
what in fact was the question? (40+2)? (6x7)?

9x6 - if you're in base 13.

--
/~\ Charlie Gibbs | "Some of you may die,
\ / <***@kltpzyxm.invalid> | but it's a sacrifice
X I'm really at ac.dekanfrus | I'm willing to make."
/ \ if you read it the right way. | -- Lord Farquaad (Shrek)

gareth evans

2021-01-05 11:45:43 UTC

Post by Dan Espen

Post by Pancho
Most compilers leave fingerprints on executables you don't need an AI
to detect them. I remember decompiling in the early 80's but complex
modern code can often be a challenge to naively reverse engineer a
high level understanding from even if you do have source code. Take
away sensible variable and function names and you are stuffed.

I've had more than one experience in putting those meaningful variable
names right back. It's actually pretty easy, a somewhat rote process.
Find the read input instruction. Since you know the layout of the input
record, you now have labels to many of the references to that input
area.
I think you can work out how to proceed.

ISTR that my attack on the executable started by seeking out lines
of code that might be subroutine calls, "JSR PC, address" in the
PDP11 code. This served to create a number of identifiable and
separate blocks from which to proceed.

Of course, this was much easier as it was a stand-alone paper
tape program with no operating system underneath to muddy the
water.
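
The search for subroutine calls described above can be sketched in C. This is only an illustration, not the method actually used on the DEC interpreter: the function name find_jsr_targets and its parameters are hypothetical, and it assumes the image is already loaded as an array of 16-bit words. It looks for the two commonest encodings of "JSR PC, address" (004737 for absolute, 004767 for PC-relative) and reports where each one appears to point.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode words for the two commonest "JSR PC, address" encodings (octal). */
#define JSR_PC_ABSOLUTE 0004737 /* JSR PC, @#addr : next word is the address */
#define JSR_PC_RELATIVE 0004767 /* JSR PC, addr   : next word is a PC offset */

/*
 * Scan nwords 16-bit words loaded at byte address base and record the
 * apparent target of every candidate JSR PC. Returns the number of
 * candidates seen; at most max of them are stored in targets.
 * Hypothetical helper: data words that happen to match an opcode give
 * false positives, so the result is a list of leads, not facts.
 */
size_t find_jsr_targets(const uint16_t *mem, size_t nwords, uint16_t base,
                        uint16_t *targets, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i + 1 < nwords; i++) {
        uint16_t operand = mem[i + 1];
        uint16_t target;
        if (mem[i] == JSR_PC_ABSOLUTE) {
            target = operand;
        } else if (mem[i] == JSR_PC_RELATIVE) {
            /* PC already points past the operand word when the offset is added. */
            target = (uint16_t)(base + 2 * (i + 2) + operand);
        } else {
            continue;
        }
        if (found < max)
            targets[found] = target;
        found++;
        i++; /* skip the operand word */
    }
    return found;
}
```

Each reported target is a plausible start of a separate block; a human still has to weed out the coincidences.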

Peter Flass

2021-01-05 21:25:36 UTC

Post by gareth evans

Post by Dan Espen

Post by Pancho
Most compilers leave fingerprints on executables you don't need an AI
to detect them. I remember decompiling in the early 80's but complex
modern code can often be a challenge to naively reverse engineer a
high level understanding from even if you do have source code. Take
away sensible variable and function names and you are stuffed.

I've had more than one experience in putting those meaningful variable
names right back. It's actually pretty easy, a somewhat rote process.
Find the read input instruction. Since you know the layout of the input
record, you now have labels to many of the references to that input
area.
I think you can work out how to proceed.

ISTR that my attack on the executable started by seeking out lines
of code that might be subroutine calls, "JSR PC, address" in the
PDP11 code. This served to create a number of identifiable and
separate blocks from which to proceed.
Of course, this was much easier as it was a stand-alone paper
tape program with no operating system underneath to muddy the
water.

Most architectures seem to be simpler than x86 with its mix of random
instruction lengths. Start at almost any byte and a disassembler would
probably be able to find a run of “instructions” that don’t make any sense
when examined by a human. Disassemblers I have worked with allow for human
input to mark constants, for example, and allow them to be skipped.

--
Pete

J. Clarke

2021-01-05 01:38:50 UTC

On Mon, 4 Jan 2021 21:57:32 +0000, Pancho

Post by Pancho

Post by gareth evans

Post by Pancho

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

I think a lot of the problem is defining the question.
What do you want it to do?

I don't want it to do anything. I want to play at a low level
with the thing ... large oaks from little acorns grow.

Play with what thing?

The pieces of the hardware supported by the Blob.

Post by Pancho
What is an instruction set,

The list of binary codes that tell the processor what to do.

Post by Pancho
what is the Binary Blob?

On the Raspberry Pi it is the non-Open-Source proprietary code that is
provided by the chip manufacturer, including parts of the boot loader
and the 3D drivers among other things.

Post by Pancho
Why do you need an AI?

Why not?

Post by Pancho
Most compilers leave fingerprints on executables you don't need an AI to
detect them. I remember decompiling in the early 80's but complex modern
code can often be a challenge to naively reverse engineer a high level
understanding from even if you do have source code. Take away sensible
variable and function names and you are stuffed.

He's talking about something that you can give a pile of object code
from an unknown source (I mean _really_ unknown--it could be for Z/OS
or a VAX or Intel or Alpha or any other architecture, compiled from C
or PL/I or Fortran or pick a language at random), with it figuring out
from there what the code does.

gareth evans

2021-01-05 11:54:18 UTC

Post by J. Clarke
He's talking about something that you can give a pile of object code
from an unknown source (I mean _really_ unknown--it could be for Z/OS
or a VAX or Intel or Alpha or any other architecture, compiled from C
or PL/I or Fortran or pick a language at random, with it figuring from
there what the code does.

Indeed!

I've discussed this before (and probably too often, according to
my biographers and stalkers!), but I'm interested in computers for
themselves, as wonderful complex machines, and not interested in
what you can use them for.

My frustration lies with the Raspberry Pi series that come,
for very little outlay of pennies, with a multi processor
graphics chip which is believed to exceed the capabilities of
the associated ARM processor but about which no detailed
information is forthcoming.

Peter Flass

2021-01-05 21:06:56 UTC

Post by J. Clarke
On Mon, 4 Jan 2021 21:57:32 +0000, Pancho

Post by Pancho

Post by gareth evans

Post by Pancho

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

I think a lot of the problem is defining the question.
What do you want it to do?

I don't want it to do anything. I want to play at a low level
with the thing ... large oaks from little acorns grow.

Play with what thing?

The pieces of the hardware supported by the Blob.

Post by Pancho
What is an instruction set,

The list of binary codes that tell the processor what to do.

Post by Pancho
what is the Binary Blob?

On the Raspberry Pi it is the non-Open-Source proprietary code that is
provided by the chip manufacturer, including parts of the boot loader
and the 3D drivers among other things.

Post by Pancho
Why do you need an AI?

Why not?

Post by Pancho
Most compilers leave fingerprints on executables you don't need an AI to
detect them. I remember decompiling in the early 80's but complex modern
code can often be a challenge to naively reverse engineer a high level
understanding from even if you do have source code. Take away sensible
variable and function names and you are stuffed.

He's talking about something that you can give a pile of object code
from an unknown source (I mean _really_ unknown--it could be for Z/OS
or a VAX or Intel or Alpha or any other architecture, compiled from C
or PL/I or Fortran or pick a language at random, with it figuring from
there what the code does.

The object code format would give you a clue, at least for most mainstream
architectures.

--
Pete

gareth evans

2021-01-05 22:52:42 UTC

Post by Peter Flass

Post by J. Clarke
On Mon, 4 Jan 2021 21:57:32 +0000, Pancho

Post by Pancho

Post by gareth evans

Post by Pancho

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

I think a lot of the problem is defining the question.
What do you want it to do?

I don't want it to do anything. I want to play at a low level
with the thing ... large oaks from little acorns grow.

Play with what thing?

The pieces of the hardware supported by the Blob.

Post by Pancho
What is an instruction set,

The list of binary codes that tell the processor what to do.

Post by Pancho
what is the Binary Blob?

On the Raspberry Pi it is the non-Open-Source proprietary code that is
provided by the chip manufacturer, including parts of the boot loader
and the 3D drivers among other things.

Post by Pancho
Why do you need an AI?

Why not?

Post by Pancho
Most compilers leave fingerprints on executables you don't need an AI to
detect them. I remember decompiling in the early 80's but complex modern
code can often be a challenge to naively reverse engineer a high level
understanding from even if you do have source code. Take away sensible
variable and function names and you are stuffed.

He's talking about something that you can give a pile of object code
from an unknown source (I mean _really_ unknown--it could be for Z/OS
or a VAX or Intel or Alpha or any other architecture, compiled from C
or PL/I or Fortran or pick a language at random, with it figuring from
there what the code does.

The object code format would give you a clue, at least for most mainstream
architectures.

In the case of the RPi GPU the format is completely unknown.

The Natural Philosopher

2021-01-05 10:29:12 UTC

Post by Pancho
Most compilers leave fingerprints on executables you don't need an AI to
detect them. I remember decompiling in the early 80's but complex modern
code can often be a challenge to naively reverse engineer a high level
understanding from even if you do have source code. Take away sensible
variable and function names and you are stuffed.

+1001

--
"First, find out who are the people you can not criticise. They are your
oppressors."
- George Orwell

Dennis Lee Bieber

2021-01-04 16:05:55 UTC

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?

Actually, I think the use of disassemblers et al has fallen away.
Modern processors have so many peephole optimizations and out-of-order
execution streams that converting an executable back to assembly source is
almost meaningless -- and getting back to a high-level language is near
impossible. One would have to be an expert at the assembly for a processor
to have any chance of understanding the result.

--
Wulfraed Dennis Lee Bieber AF6VN
***@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/

Martin Gregorie

2021-01-04 17:07:35 UTC

Post by Dennis Lee Bieber

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC interpreter in order
to enhance it, I guess that dis-assemblers and decompilers must now be
ten-a-penny,
especially for programs running under Windows where the structure of
Windows programs is well-known with an assumption that C was the source
language?

Actually, I think the use of disassemblers et al has fallen away.
Modern processors have so many peephole optimizations and out-of-order
execution streams that converting an executable back to assembly source
is almost meaningless -- and getting back to a high-level language is
near impossible. One would have to be an expert at the assembly for a
processor to have any chance of understanding the result.

The retro-computing guys - those who are fans of the MC6800 and MC6809
microprocessors, anyway - seem to be getting a rather good
semi-interactive disassembler up and running. So far it understands
executables that run under FLEX and FLEX09 for the 6800 and 6809, and under
UniFlex and OS9/level 1 and 2 on a 6809, and can automatically detect
which OS the binary was compiled for. This is quite impressive, since all
four OSen have very different API call structures despite FLEX09, UniFlex
and OS/9 all running on the same chip.

--
--
Martin | martin at
Gregorie | gregorie dot org

Scott Lurndal

2021-01-04 17:52:31 UTC

Post by Martin Gregorie

Post by Dennis Lee Bieber

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC interpreter in order
to enhance it, I guess that dis-assemblers and decompilers must now be
ten-a-penny,
especially for programs running under Windows where the structure of
Windows programs is well-known with an assumption that C was the source
language?

Actually, I think the use of disassemblers et al has fallen away.
Modern processors have so many peephole optimizations and out-of-order
execution streams that converting an executable back to assembly source
is almost meaningless -- and getting back to a high-level language is
near impossible. One would have to be an expert at the assembly for a
processor to have any chance of understanding the result.

The retro-computing guys - those who are fans of the MC6800 and MC6809
microprocessors, anyway - seem to be getting a rather good
semi-interactive disassembler up and running.

Security experts have several very powerful disassemblers and decompilers
they use for Intel/AMD/ARM processors.

https://en.wikibooks.org/wiki/X86_Disassembly/Disassemblers_and_Decompilers

The Natural Philosopher

2021-01-05 10:28:26 UTC

Post by Scott Lurndal

Post by Martin Gregorie

Post by Dennis Lee Bieber

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC interpreter in order
to enhance it, I guess that dis-assemblers and decompilers must now be
ten-a-penny,
especially for programs running under Windows where the structure of
Windows programs is well-known with an assumption that C was the source
language?

Actually, I think the use of disassemblers et al has fallen away.
Modern processors have so many peephole optimizations and out-of-order
execution streams that converting an executable back to assembly source
is almost meaningless -- and getting back to a high-level language is
near impossible. One would have to be an expert at the assembly for a
processor to have any chance of understanding the result.

The retro-computing guys - those who are fans of the MC6800 and MC6809
microprocessors, anyway - seem to be getting a rather good
semi-interactive disassembler up and running.

Security experts have several very powerful disassemblers and decompilers
they use for Intel/AMD/ARM processors.
https://en.wikibooks.org/wiki/X86_Disassembly/Disassemblers_and_Decompilers

Yes. I am certain that certain compilers and certain languages leave a
fingerprint: always THAT register, used to do THAT job, always that
particular sequence of assembly to mimic that high level construct.
I cut my teeth on microprocessor assembly. Then C. Some things that are
neat in assembler are ugly as sin in C. Take a call table. In assembler,
you set up a range of memory whose contents are the addresses of
subroutines. You load the accumulator with a number, left shift it once,
add it to the content of a register set to point to the base of that
memory block, and use that register as a pointer to the location whose
contents are the address you want to 'call'. Simple, efficient and,
provided you ensure nothing out of bounds is in the accumulator, bomb proof.

Now try that in C, you need an array of pointers to functions, and a
simple check on the index you engage, followed by a declaration to call
the function whose address is in the array of pointers to functions. I
never ever managed to get an 8 bit compiler to actually do that. People
just don't call the contents of an array of pointers to functions.
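
For reference, the C construct described above (an array of pointers to functions, a check on the index, then a call through the table) might look like the following minimal sketch. The handler functions, the name dispatch and the table contents are all made up for illustration; nothing here says what any particular 8 bit compiler would generate from it.

```c
#include <stddef.h>

typedef int (*handler_fn)(int);

/* Three hypothetical handlers standing in for the subroutines. */
static int op_double(int x) { return x * 2; }
static int op_negate(int x) { return -x; }
static int op_square(int x) { return x * x; }

/* The call table: an array of pointers to functions. */
static const handler_fn call_table[] = { op_double, op_negate, op_square };

#define TABLE_SIZE (sizeof call_table / sizeof call_table[0])

/* Returns 0 and stores the result on success, -1 if index is out of range. */
int dispatch(size_t index, int arg, int *result)
{
    if (index >= TABLE_SIZE)
        return -1;                        /* the bounds check */
    *result = call_table[index](arg);     /* indirect call through the table */
    return 0;
}
```

The bounds check plays the role of "ensure nothing out of bounds is in the accumulator"; the indexing and scaling that the assembler version does by hand is left to the compiler.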

It's easier by far to set up a switch statement, which takes care of out
of bounds defaults, and ends up producing a chain of if..else if.. else
conditional calls to hardwired functions.

That's how you write it, because it's pretty much as fast on a pipelined
processor, RAM is cheap and comprehensibility beats programming elegance
hands down in the real world.

I've examined a lot of compiled machine code and it's pretty easy to tell
what language it is, and roughly what it was written as. Stack-based
variables are a bit of a giveaway, pointing to C or a similar language.
Highly optimised compilers of course automatically obfuscate things, but
that's the fun, isn't it?

I gave up writing assembler for *86 CPUs when the GNU compiler was
patently doing a better job than I would in assembler, and the ability
to write something long-winded and easy to understand and have the
compiler completely rearrange it and turn it into three lines of
incomprehensible assembler was to be respected.

I think it is, up to a point, entirely possible to make an AI that
could replace machine code with editable and compilable source code.
But there will always be the Problem Of Induction. Many, many possible
constructs in source, using an infinite number of random variable and
function names, could compile to the same object code. And there is no
way to reinstate the comments either, so it ultimately becomes an exercise
in hand editing and reinstating the comments manually -
almost as big a job as writing from scratch.

I suspect this is how Linux writers write freeware drivers for
proprietary hardware. Disassemble the manufacturer's drivers, and at
least mimic the program flow, if not the actual source code.

--
“I know that most men, including those at ease with problems of the
greatest complexity, can seldom accept even the simplest and most
obvious truth if it be such as would oblige them to admit the falsity of
conclusions which they have delighted in explaining to colleagues, which
they have proudly taught to others, and which they have woven, thread by
thread, into the fabric of their lives.”

― Leo Tolstoy

Thomas Koenig

2021-01-05 13:06:44 UTC

Post by The Natural Philosopher
Then C. Some things that are
neat in assembler are ugly as sin in C.

One thing that is hard to do with C is to have different entries
to the same function, something like:

bar:
.cfi_startproc
... do something
foo:
... do something else

ret

and then either call foo or bar.

The Natural Philosopher

2021-01-05 13:41:14 UTC

Post by Thomas Koenig

Post by The Natural Philosopher
Then C. Some things that are
neat in assembler are ugly as sin in C.

One thing that is hard to do with C is to have different entries
to the same function, something like:
bar:
.cfi_startproc
... do something
foo:
... do something else
ret
and then either call foo or bar.

Indeed.

I had to write an extended bit of assembler once to fit in a 2K EPROM - a
BIOS for larger disks using larger sectors than standard MS-DOS.

I was frustratingly about 80 bytes short.
I looked at the code for a long time trying to think what I could use in
various places - and found something.
The original coder was always in the habit of using the same two
registers - AX and BX - as scratch (sometimes he used DX as well),
so every subroutine started with PUSH AX, PUSH BX, and ended with POP
BX, POP AX, RET.

That was three words or four. But JMP MY_EXIT (or DX_EXIT) was only two...

DX_EXIT: POP DX
MY_EXIT: POP BX
POP AX
RET

But that is exactly the sort of thing a compiler optimised for code
size rather than speed would be expected to do: find common bits of
code and jump to them.

--
Climate Change: Socialism wearing a lab coat.

gareth evans

2021-01-05 15:30:06 UTC

Post by Thomas Koenig

Post by The Natural Philosopher
Then C. Some things that are
neat in assembler are ugly as sin in C.

One thing that is hard to do with C is to have different entries
to the same function, something like:
bar:
.cfi_startproc
... do something
foo:
... do something else
ret
and then either call foo or bar.

Blimey, that takes me back over 40 years to a neat trick of mine to
save a couple of bytes (but something today that might get
you the sack in these times of the high cost of software maintenance!).

It's a way of passing on the stack a zero / non zero value.

In PDP11 (Octal!!!!!!!) opcodes ..

012746
005046

The first word says push the following word (the value 005046) onto the
stack, but the second word, 005046, taken as an instruction, means clear
the next stack entry.

So, by jumping to the second word of the instruction, you
push a zero value!

That it warrants such an involved explanation is very good
reason why such techniques should be avoided today! :-)
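
To make the two readings of that word pair concrete, here is a small C sketch. It is a toy model, not a PDP-11 emulator: the function pushed_value and the 0xFFFF "unknown" marker are invented for illustration, and only the two opcodes in question are modelled (012746 is MOV #n, -(SP); 005046 is CLR -(SP)).

```c
#include <stdint.h>

#define OP_MOV_IMM_PUSH 0012746 /* MOV #n, -(SP): push the following word */
#define OP_CLR_PUSH     0005046 /* CLR -(SP): push a zero                 */

/*
 * Execute the single instruction found at word index pc in code and
 * return the value it pushes. Anything other than the two opcodes
 * above returns 0xFFFF as a hypothetical "unknown" marker.
 */
uint16_t pushed_value(const uint16_t *code, unsigned pc)
{
    switch (code[pc]) {
    case OP_MOV_IMM_PUSH:
        return code[pc + 1];   /* the immediate operand is the next word */
    case OP_CLR_PUSH:
        return 0;
    default:
        return 0xFFFF;
    }
}
```

With the two words 012746, 005046 in memory, entering at the first word pushes the non-zero value 005046, and entering at the second word pushes zero.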

Martin Gregorie

2021-01-05 16:42:57 UTC

That it warrants such an involved explanation is very good reason why
such techniques should be avoided today! :-)

Agreed.

That sort of thing is so much easier in Java or Algol 68, which both
recognise that methods/procedures with the same name but different
parameter lists are indeed different pieces of code rather than a stupid
mistake.

--
--
Martin | martin at
Gregorie | gregorie dot org

Charlie Gibbs

2021-01-05 20:12:42 UTC

Post by Martin Gregorie

That it warrants such an involved explanation is very good reason why
such techniques should be avoided today! :-)

Agreed.
That sort of thing is so much easier in Java or Algol 68, which both
recognise that methods/procedures with the same name but different
parameter lists are indeed different pieces of code rather than a stupid
mistake.

I avoid that technique - it can lead to other stupid mistakes.

--
/~\ Charlie Gibbs | "Some of you may die,
\ / <***@kltpzyxm.invalid> | but it's a sacrifice
X I'm really at ac.dekanfrus | I'm willing to make."
/ \ if you read it the right way. | -- Lord Farquaad (Shrek)

Peter Flass

2021-01-05 21:25:37 UTC

Post by Thomas Koenig

Post by The Natural Philosopher
Then C. Some things that are
neat in assembler are ugly as sin in C.

One thing that is hard to do with C is to have different entries
to the same function, something like:
bar:
.cfi_startproc
... do something
foo:
... do something else
ret
and then either call foo or bar.

Simple in PL/I, although it turns out there is more overhead than you’d
think, particularly if foo and bar have different return types. I used
multiple entries extensively in the Iron Spring PL/I compiler, but it turns
out the “package” construct (once I implemented it) is much cleaner.
Multiple entries is also error-prone if the entries have different
parameters.

--
Pete

Tauno Voipio

2021-01-06 12:17:30 UTC

Post by Thomas Koenig

Post by The Natural Philosopher
Then C. Some things that are
neat in assembler are ugly as sin in C.

One thing that is hard to do with C is to have different entries
to the same function, something like:
bar:
.cfi_startproc
... do something
foo:
... do something else
ret
and then either call foo or bar.

This is a common construction in compiler-generated
machine code, if the first function calls another
just before return.

bar: .cfi_startproc
... do something
call foo
ret

foo: .. do more ..
ret

If the functions have different stacks, there may be
a need to adjust the stack first before entering the
second function.

--
-TV

Ahem A Rivet's Shot

2021-01-06 12:42:05 UTC

On Wed, 6 Jan 2021 14:17:30 +0200

Post by Tauno Voipio
This is a common construction in compiler-generated
machine code, if the first function calls another
just before return.
bar: .cfi_startproc
... do something
call foo
ret

I recall optimising things like that by changing the last two lines
to:
jmp foo

Post by Tauno Voipio
foo: .. do more ..
ret

--
Steve O'Hara-Smith | Directable Mirror Arrays
C:\>WIN | A better way to focus the sun
The computer obeys and wins. | licences available see
You lose and Bill collects. | http://www.sohara.org/

Tauno Voipio

2021-01-06 14:42:53 UTC

Post by Ahem A Rivet's Shot
On Wed, 6 Jan 2021 14:17:30 +0200

Post by Tauno Voipio
This is a common construction in compiler-generated
machine code, if the first function calls another
just before return.
bar: .cfi_startproc
... do something
call foo
ret

I recall optimising things like that by changing the last two lines
jmp foo

Post by Tauno Voipio
foo: .. do more ..
ret

That's what I intended to say. Try the current release of
GCC for ARM Cortex.

There may be a register pop before the jump, to keep the
stack correct.
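
For anyone who wants to try this, a minimal C source to experiment with might be the following sketch. The function names and bodies are made up. GCC documents -foptimize-sibling-calls as enabled at -O2, so compiling with something like "gcc -O2 -S" and reading the assembly is one way to see whether the final call becomes a jump. The exact output depends on target and version, and a compiler that can see the body of foo may simply inline it instead.

```c
/* Shared finishing step, called in tail position by several entry points. */
int foo(int x)
{
    return x + 1;          /* "do more" */
}

int bar(int x)
{
    x *= 2;                /* "do something" */
    return foo(x);         /* tail call: nothing is left to do after foo returns */
}

int baz(int x)
{
    x -= 3;                /* a second entry sharing the same finish */
    return foo(x);
}
```

Because nothing happens in bar or baz after foo returns, the "call foo; ret" pair can be replaced by a plain jump to foo, whose own return then goes straight back to the original caller.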

--
-TV

Kerr-Mudd,John

2021-01-08 09:48:44 UTC

Post by Ahem A Rivet's Shot
On Wed, 6 Jan 2021 14:17:30 +0200

Post by Tauno Voipio
This is a common construction in compiler-generated
machine code, if the first function calls another
just before return.
bar: .cfi_startproc
... do something
call foo
ret

I recall optimising things like that by changing the last two lines
jmp foo

Post by Tauno Voipio
foo: .. do more ..
ret

I'm naive; what's the problem with:

bar: .cfi_startproc
... do something

;;; call foo
;;; ret
; just fallthru to execute foo and exit.

foo: .. do more ..
ret

--
Bah, and indeed, Humbug.

Ahem A Rivet's Shot

2021-01-08 10:27:51 UTC

On Fri, 8 Jan 2021 09:48:44 -0000 (UTC)

Post by Tauno Voipio

Post by Ahem A Rivet's Shot
On Wed, 6 Jan 2021 14:17:30 +0200

Post by Tauno Voipio
This is a common construction in compiler-generated
machine code, if the first function calls another
just before return.
bar: .cfi_startproc
... do something
call foo
ret

I recall optimising things like that by changing the last two lines
jmp foo

Post by Tauno Voipio
foo: .. do more ..
ret

bar: .cfi_startproc
... do something
;;; call foo
;;; ret
; just fallthru to execute foo and exit.
foo: .. do more ..
ret

Nothing, as long as you only have one bar for your foo; often foo
was the common finish for several bars.

--
Steve O'Hara-Smith | Directable Mirror Arrays
C:\>WIN | A better way to focus the sun
The computer obeys and wins. | licences available see
You lose and Bill collects. | http://www.sohara.org/

Questor

2021-01-08 21:40:40 UTC

On Wed, 6 Jan 2021 14:17:30 +0200, Tauno Voipio

Post by Tauno Voipio
This is a common construction in compiler-generated
machine code, if the first function calls another
just before return.
bar: .cfi_startproc
... do something
call foo
ret
foo: .. do more ..
ret

It's a common construction in human-generated assembly as well, at least on the
PDP10.

Instead of

BAR: [do bar stuff]
PUSHJ P, FOO
POPJ P,

One writes

BAR: [do bar stuff]
JRST FOO

and lets the POPJ at the end of FOO return from the call to BAR. Saves an
instruction. In PDP10 land, the mnemonic PJRST is defined to be the JRST
instruction in order to alert the reader of this intention, so one would write

BAR: [do bar stuff]
PJRST FOO

Similarly, routines will often pop (restore) saved registers off the stack
before returning. Rather than duplicate that code, one uses a PJRST to a label
in another routine that does the same thing.

BAR: PUSH P, T1
PUSH P, T2
[do bar stuff]
TPOPJ2: POP P, T2
TPOPJ1: POP P, T1
POPJ P,

FOO: PUSH P, T1
PUSH P, T2
[do foo stuff]
PJRST TPOPJ2

Peter Flass

2021-01-05 21:06:57 UTC

Post by The Natural Philosopher

Post by Scott Lurndal

Post by Martin Gregorie

Post by Dennis Lee Bieber

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC interpreter in order
to enhance it, I guess that dis-assemblers and decompilers must now be
ten-a-penny,
especially for programs running under Windows where the structure of
Windows programs is well-known with an assumption that C was the source
language?

Actually, I think the use of disassemblers et al has fallen away.
Modern processors have so many peephole optimizations and out-of-order
execution streams that converting an executable back to assembly source
is almost meaningless -- and getting back to a high-level language is
near impossible. One would have to be an expert at the assembly for a
processor to have any chance of understanding the result.

The retro-computing guys - those who are fans of the MC6800 and MC6809
microprocessors, anyway - seem to be getting a rather good
semi-interactive disassembler up and running.

Security experts have several very powerful disassemblers and decompilers
they use for Intel/AMD/ARM processors.
https://en.wikibooks.org/wiki/X86_Disassembly/Disassemblers_and_Decompilers

Yes. I am certain that certain compilers and certain languages leave a
fingerprint: always THAT register, used to do THAT job, always that
particular sequence of assembly to mimic that high level construct.
I cut my teeth on microprocessor assembly. Then C. Some things that are
neat in assembler are ugly as sin in C. Take a call table. In assembler,
you set up a range of memory whose contents are the addresses of
subroutines. You load the accumulator with a number, left shift it once,
add it to the content of a register set to point to the base of that
memory block, and use that register as a pointer to the location whose
contents are the address you want to 'call'. Simple, efficient and,
provided you ensure nothing out of bounds is in the accumulator, bomb proof.
Now try that in C, you need an array of pointers to functions, and a
simple check on the index you engage, followed by a declaration to call
the function whose address is in the array of pointers to functions. I
never ever managed to get an 8 bit compiler to actually do that. People
just don't call the contents of an array of pointers to functions.

You still have to set up the arguments for each in assembler, unless they
all take the same arguments (or a pointer to an argument list)

You shouldn’t need declarations in C unless you’re using one of those
new-fangled compilers that requires them. Old code should still be
supported, though.

--
Pete

Martin Gregorie

2021-01-05 22:27:26 UTC

Post by Peter Flass
You shouldn’t need declarations in C unless you’re using one of those
new-fangled compilers that requires them. Old code should still be
supported, though.

Last time I tried it, (about 2 months ago), the current GNU C compiler
accepts the old K&R C first edition procedure declaration syntax. I wish
more compilers worked this way.

--
--
Martin | martin at
Gregorie | gregorie dot org

Charlie Gibbs

2021-01-06 00:14:31 UTC

Post by Martin Gregorie

Post by Peter Flass
You shouldn’t need declarations in C unless you’re using one of those
new-fangled compilers that requires them. Old code should still be
supported, though.

Last time I tried it, (about 2 months ago), the current GNU C compiler
accepts the old K&R C first edition procedure declaration syntax. I wish
more compilers worked this way.

I write functions this way:

#ifdef PROTOTYPE
char *foo(char *bar, int baz)
#else
char *foo(bar, baz) char *bar; int baz;
#endif

One #define in a header file adapts it to any old or new compiler.
It works for declarations too.
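
A sketch of the declaration and the definition side by side, assuming PROTOTYPE is the single switch. It is defined at the top of the file here only so the example builds with an ANSI or later compiler; the body of foo is invented for illustration.

```c
#include <assert.h>
#include <string.h>

/* The one switch; in real use it would live in a shared header.
 * Remove it to get the K&R forms for a pre-ANSI compiler. */
#define PROTOTYPE

/* Declaration, in whichever style the compiler understands. */
#ifdef PROTOTYPE
char *foo(char *bar, int baz);
#else
char *foo();
#endif

/* Definition, same trick. Illustrative body: return a pointer baz
 * characters into bar, clamped to the terminating NUL. */
#ifdef PROTOTYPE
char *foo(char *bar, int baz)
#else
char *foo(bar, baz) char *bar; int baz;
#endif
{
    int len = (int)strlen(bar);
    assert(baz >= 0);
    return bar + (baz < len ? baz : len);
}
```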

--
/~\ Charlie Gibbs | "Some of you may die,
\ / <***@kltpzyxm.invalid> | but it's a sacrifice
X I'm really at ac.dekanfrus | I'm willing to make."
/ \ if you read it the right way. | -- Lord Farquaad (Shrek)

Martin Gregorie

2021-01-06 08:25:33 UTC

Post by Martin Gregorie

Post by Peter Flass
You shouldn’t need declarations in C unless you’re using one of those
new-fangled compilers that requires them. Old code should still be
supported, though.

Last time I tried it, (about 2 months ago), the current GNU C compiler
accepts the old K&R C first edition procedure declaration syntax. I
wish more compilers worked this way.

#ifdef PROTOTYPE
char *foo(char *bar, int baz)
#else
char *foo(bar, baz) char *bar; int baz;
#endif
One #define in a header file adapts it to any old or new compiler.
It works for declarations too.

That's safe but not necessary, for GNU C anyway.

The GNU C compiler series maintains backward compatibility to the year
dot. Dunno about other brands of C compiler, though. Just as well since I
have some sources that were written under OS/9 v2.4, so use the syntax
defined in the original K&R edition and I hate having to edit a source
file just because a new compiler version dropped support for everything
except the latest syntax.

That's one reason I don't like Python.

COBOL is another language that historically tended to support only the
latest syntax, which is a pain since source files can be huge. I've
worked on COBOL program modules that ran to over 5000 lines back in the
day, i.e. before 1978, when COBOL didn't yet support writing separately
compiled subroutines (no LINKAGE SECTION), though AFAIK COBOL has always
supported calling subroutines written in other languages.

--
--
Martin | martin at
Gregorie | gregorie dot org

Dennis Lee Bieber

2021-01-09 02:52:39 UTC

On Wed, 6 Jan 2021 08:25:33 -0000 (UTC), Martin Gregorie

Post by Martin Gregorie
COBOL is another language that historically tended to support only the
latest syntax, which is a pain since source files can be huge. I've
worked on COBOL program modules that ran to over 5000 lines back in the
day, i.e before 1978, when COBOL didn't yet support writing separately
compiled subroutines (no LINKAGE SECTION), though AFAIK COBOL has always
supported calling subroutines written in other languages).

LINKAGE SECTION was part of the COBOL-74 standard, and I recall it
existed on the Xerox Sigma-6 COBOL that was used at my college when I
attended (76-80). Our assignments may not have used it -- or we only had a
short intro to the concept.

However, I'm fairly certain my college compiler did not support "copy
books"... And since that time-frame meant 24x80 text terminals, and line
mode text editors, one would have to manually duplicate the section from a
listing... Or write the program on the IBM 029 card punch -- feeding the
linkage section into it in duplicate mode, then inserting the copy into the
second file...

--
Wulfraed Dennis Lee Bieber AF6VN
***@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/

Martin Gregorie

2021-01-09 15:37:17 UTC

Post by Dennis Lee Bieber
On Wed, 6 Jan 2021 08:25:33 -0000 (UTC), Martin Gregorie

Post by Martin Gregorie
COBOL is another language that historically tended to support only the
latest syntax, which is a pain since source files can be huge. I've
worked on COBOL program modules that ran to over 5000 lines back in the
day, i.e before 1978, when COBOL didn't yet support writing separately
compiled subroutines (no LINKAGE SECTION), though AFAIK COBOL has always
supported calling subroutines written in other languages).

LINKAGE SECTION was part of the COBOL-74 standard, and I recall it
existed on the Xerox Sigma-6 COBOL that was used at my college when I
attended (76-80). Our assignments may not have used it -- or we only had
a short intro to the concept.
However, I'm fairly certain my college compiler did not support "copy
books"... And since that time-frame meant 24x80 text terminals, and line
mode text editors, one would have to manually duplicate the section from
a listing... Or write the program on the IBM 029 card punch -- feeding
the linkage section into it in duplicate mode, then inserting the copy
into the second file...

Compiler features do vary: I don't recall seeing LINKAGE section in any
version of the ICL 1900 COBOL compilers, which I was using 1968-1977. If
LINKAGE sections had been available I'm sure I would have used them and
coded subroutines in COBOL, but though our COBOL code regularly called
subroutines, these were all written in PLAN (assembler). From 1978 onward
I was programming ICL 2900s: 2900 COBOL implemented LINKAGE sections and
we made extensive use of them to split large COBOL programs into modules.

After 1984 I wrote very little COBOL, and that was for DEC and MicroFocus
compilers. None of these projects used COBOL subroutines: the DEC RDB
interface module was language agnostic and so LINKAGE sections weren't
needed. The MicroFocus COBOL projects called C functions.

COPY books were fairly common on ICL 1900 projects.

The ICL 2900 world used COPY books too, though it implemented them as
calls to the Advanced Data Dictionary rather than as traditional copy
libraries, and handled the IDMSX database interactions via a preprocessor
that converted pseudo-COBOL statements in COBOL programs into COBOL
subroutine calls. The IDMSX schema processor converted schema definitions
into the COBOL subroutines called by application programs. All quite
neat, easy to use, and worked very well.

--
--
Martin | martin at
Gregorie | gregorie dot org

Peter Flass

2021-01-10 13:40:12 UTC

Post by Dennis Lee Bieber
On Wed, 6 Jan 2021 08:25:33 -0000 (UTC), Martin Gregorie

Post by Martin Gregorie
COBOL is another language that historically tended to support only the
latest syntax, which is a pain since source files can be huge. I've
worked on COBOL program modules that ran to over 5000 lines back in the
day, i.e before 1978, when COBOL didn't yet support writing separately
compiled subroutines (no LINKAGE SECTION), though AFAIK COBOL has always
supported calling subroutines written in other languages).

LINKAGE SECTION was part of the COBOL-74 standard, and I recall it
existed on the Xerox Sigma-6 COBOL that was used at my college when I
attended (76-80). Our assignments may not have used it -- or we only had a
short intro to the concept.
However, I'm fairly certain my college compiler did not support "copy
books"... And since that time-frame meant 24x80 text terminals, and line
mode text editors, one would have to manually duplicate the section from a
listing... Or write the program on the IBM 029 card punch -- feeding the
linkage section into it in duplicate mode, then inserting the copy into the
second file...

Either your memory is off or this was some site restriction. I did a lot of
COBOL on a Sigma 6, and I’m sure I would have remembered this. I’m trying
to recall the dates, but the numbers won’t come - mid 70s maybe? We started
using BPM/BTM and moved on to UTS when it was released.

--
Pete

Scott Lurndal

2021-01-06 15:15:36 UTC

Post by Martin Gregorie

You shouldn’t need declarations in C unless you’re using one of those
new-fangled compilers that requires them. Old code should still be
supported, though.

Last time I tried it, (about 2 months ago), the current GNU C compiler
accepts the old K&R C first edition procedure declaration syntax. I wish
more compilers worked this way.

It will not, however, accept the original V6 C "a =+ b" ambiguous syntax,
so older code may still need to be edited before compilation with a modern
compiler.

Martin Gregorie

2021-01-06 16:09:40 UTC

Post by Scott Lurndal

Post by Martin Gregorie

Post by Peter Flass
You shouldn’t need declarations in C unless you’re using one of those
new-fangled compilers that requires them. Old code should still be
supported, though.

Last time I tried it, (about 2 months ago), the current GNU C compiler
accepts the old K&R C first edition procedure declaration syntax. I wish
more compilers worked this way.

It will not, however, accept the original V6 C "a =+ b" ambiguous
syntax, so older code may still need to be edited before compilation
with a modern compiler.

I don't *think* I've ever written that or even seen it in valid code.

--
--
Martin | martin at
Gregorie | gregorie dot org

Scott Lurndal

2021-01-06 17:07:35 UTC

Post by Martin Gregorie

Post by Scott Lurndal

Post by Martin Gregorie

You shouldn’t need declarations in C unless you’re using one of those
new-fangled compilers that requires them. Old code should still be
supported, though.

Last time I tried it, (about 2 months ago), the current GNU C compiler
accepts the old K&R C first edition procedure declaration syntax. I wish
more compilers worked this way.

It will not, however, accept the original V6 C "a =+ b" ambiguous
syntax, so older code may still need to be edited before compilation
with a modern compiler.

I don't *think* I've ever written that or even seen it in valid code.

From the V6 C compiler source:

/*
 * The hash table locations of the keywords
 * are marked; if an identifier hashes to one of
 * these locations, it is looked up in in the keyword
 * table first.
 */
for (ip=kwtab; (sp = ip->kwname); ip++) {
	i = 0;
	while (*sp)
		i =+ *sp++;
	hshtab[i%hshsiz].hflag = FKEYW;
}

Note also that in that version of the compiler, MOS
(member of structure) names were global and could be
used with any pointer regardless of type.

https://github.com/mortdeus/legacy-cc/blob/master/last1120c/c00.c

This makes it difficult to build the original V6 c compiler using
a modern compiler :-)
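
Strictly speaking, a modern compiler will still parse that line, just not the way V6 did: `=+` is no longer a single operator, so `i =+ *sp++` is read as `i = +(*sp++)`, a plain assignment. A small sketch of the difference follows; the function names are invented here, and the loop mirrors the one quoted above.

```c
#include <assert.h>

/* Modern spelling: accumulate every character, as the V6 loop intended. */
int hash_sum(const char *sp)
{
    int i = 0;
    while (*sp)
        i += *sp++;
    return i;
}

/* V6 spelling under a modern compiler: parsed as i = +(*sp++), so each
 * pass overwrites i and only the last character survives. */
int hash_v6_spelling(const char *sp)
{
    int i = 0;
    while (*sp)
        i =+ *sp++;
    return i;
}

/* Usage: the two spellings disagree on any string longer than one char. */
void check_spellings(void)
{
    assert(hash_sum("ab") == 'a' + 'b');
    assert(hash_v6_spelling("ab") == 'b');
}
```

So the danger with old sources is less a hard error than a silent change of meaning, although some compilers do warn about the construct.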

Martin Gregorie

2021-01-06 17:38:47 UTC

Post by Martin Gregorie

Post by Scott Lurndal

Post by Martin Gregorie

Post by Peter Flass
You shouldn’t need declarations in C unless you’re using one of
those new-fangled compilers that requires them. Old code should
still be supported, though.

Last time I tried it, (about 2 months ago), the current GNU C compiler
accepts the old K&R C first edition procedure declaration syntax. I
wish more compilers worked this way.

It will not, however, accept the original V6 C "a =+ b" ambiguous
syntax, so older code may still need to be edited before compilation
with a modern compiler.

I don't *think* I've ever written that or even seen it in valid code.

/*
 * The hash table locations of the keywords
 * are marked; if an identifier hashes to one of
 * these locations, it is looked up in in the keyword
 * table first.
 */
for (ip=kwtab; (sp = ip->kwname); ip++) {
	i = 0;
	while (*sp)
		i =+ *sp++;
	hshtab[i%hshsiz].hflag = FKEYW;
}
Note also that in that version of the compiler, MOS (member of
structure) names were global and could be used with any pointer
regardless of type.
https://github.com/mortdeus/legacy-cc/blob/master/last1120c/c00.c
This makes it difficult to build the original V6 c compiler using a
modern compiler :-)

Quite!

--
--
Martin | martin at
Gregorie | gregorie dot org

druck

2021-01-05 21:20:52 UTC

Post by The Natural Philosopher
Yes. I am certain that certain compilers and certain languages leave a
fingerprint: always THAT register, used to do THAT job, always that
particular sequence of assembly to mimic that high level construct.

They certainly do, I wrote !ARMalyser to analyse RISC OS executables and
to aid the conversion from the old 26 bit ARM mode to modern AArch32. It
was very obvious if Norcroft C, GCC or handwritten assembly had been
used by looking at any chunk of the code, not just the obvious file headers.

Post by The Natural Philosopher
I think it is up to a limited point entirely possible to make an AI that
could replace machine code with editable and compilable source code.
But there will always be the Problem Of Induction. Many many possible
constructs in source using an infinite number of random variable and
function names, could compile to the same object code. And there is no
way to reinstate the comments either, so it becomes an exercise
ultimately in hand editing and reinstating the comments manually -
almost as big a job as writing from scratch.

I was not attempting to turn the executable into a high level language,
but to give the user as much help understanding the assembler code as
possible, to aid the conversion.

At the lowest level that means identifying what is code and what is data:
easy in well defined executable formats produced by compilers, but hard in
handwritten assembler, which had often used every trick in the book to
squeeze out performance on an 8MHz ARM2 with 512KB of RAM.

The next step was using knowledge of the Standard C Library functions
and SWI APIs to annotate the registers passed and returned from the APIs
and where those registers contain static addresses, the data blocks they
point to.

To allow code to be modified with additional instructions to recreate
flag preserving behaviour of the 26 bit code (in the few cases it is
actually necessary) and data added to make the larger 32 bit file
headers, all code and data addresses are identified and converted into
labels.
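
As a rough illustration of the flag preserving cases, the sketch below flags instruction words that would rewrite the PSR bits held in R15 in 26 bit mode: data processing with the S bit set and R15 as destination (MOVS PC,R14, TEQP and friends), and LDM with '^' and R15 in the register list. This is an assumption about what such a check could look like, not ARMalyser's actual logic, and the function name is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the 32-bit ARM instruction word would rewrite the PSR
 * flags held in R15 when run in 26-bit mode, 0 otherwise. */
int writes_psr_via_pc(uint32_t insn)
{
    if ((insn >> 28) == 0xFu)
        return 0;                       /* NV condition: never executed */

    if ((insn & 0x0C000000u) == 0) {    /* bits 27:26 = 00 */
        /* Multiply (and SWP) share this space: I = 0, bits 7 and 4 set. */
        if (!(insn & 0x02000000u) && (insn & 0x90u) == 0x90u)
            return 0;
        /* Data processing with S set and Rd = R15. */
        return (insn & 0x00100000u) && ((insn >> 12) & 0xFu) == 15u;
    }

    if ((insn & 0x0E000000u) == 0x08000000u) {  /* LDM/STM */
        /* Load, '^' set, and R15 in the register list. */
        return (insn & 0x00100000u) && (insn & 0x00400000u)
            && (insn & 0x00008000u);
    }
    return 0;
}

/* Usage: a few well-known encodings. */
void check_writes_psr(void)
{
    assert(writes_psr_via_pc(0xE1B0F00Eu) == 1);  /* MOVS PC,R14 */
    assert(writes_psr_via_pc(0xE1A0F00Eu) == 0);  /* MOV  PC,R14 */
}
```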

ARMalyser outputs in the standard Object Assembler syntax so it can be
reassembled to produce an identical executable, and subsequently
modified. It can also add syntax colouring in various formats such as
XML, HTML/CSS for viewing.

If you were in marketing you could say the code which does this is 'AI',
but it's really a huge chunk of tangled heuristics, which works well most
of the time but occasionally misidentifies code or data. It's a bit
too eager to identify code, due to the tricks assembler programmers
used; if I ripped all that out and only worked on compiler generated
executables, it would be a lot more reliable.
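
To give a flavour of the kind of heuristic meant here, this is a deliberately crude sketch for guessing whether a 32 bit word is data rather than an ARM instruction. The rules (NV condition code, an all-zero word, four bytes of text) and the function names are assumptions chosen for illustration, not what ARMalyser does, and like any such rule they will misfire on some real code.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if every byte is printable ASCII or NUL padding, with at
 * least one printable character (little-endian byte order assumed). */
static int looks_like_text(uint32_t w)
{
    int printable = 0;
    for (int i = 0; i < 4; i++) {
        unsigned b = (w >> (8 * i)) & 0xFFu;
        if (b >= 0x20u && b <= 0x7Eu)
            printable++;
        else if (b != 0)
            return 0;
    }
    return printable > 0;
}

/* Crude guess: 1 = probably data, 0 = possibly code. */
int probably_data(uint32_t w)
{
    if ((w >> 28) == 0xFu)
        return 1;   /* NV condition: an instruction that never executes */
    if (w == 0)
        return 1;   /* decodes as ANDEQ R0,R0,R0, usually just padding */
    return looks_like_text(w);
}

/* Usage: MOV PC,R14 is kept as code, the bytes "Hell" are not. */
void check_probably_data(void)
{
    assert(probably_data(0xE1A0F00Eu) == 0);
    assert(probably_data(0x6C6C6548u) == 1);
}
```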

---druck

gareth evans

2021-01-04 17:47:06 UTC

Post by Dennis Lee Bieber
Actually, I think the use of disassemblers et al has fallen away.
Modern processors have so many peephole optimizations and out-of-order
execution streams that converting an executable back to assembly source is
almost meaningless -- and getting back to a high-level language is near
impossible. One would have to be an expert at the assembly for a processor
to have any chance of understanding the result.

AF6VN DE G4SDW

But we Radio Hams thrive on such low level technicalities! :-)

73.

Dan Espen

2021-01-04 19:18:47 UTC

Post by Dennis Lee Bieber

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?

Actually, I think the use of disassemblers et al has fallen away.
Modern processors have so many peephole optimizations and out-of-order
execution streams that converting an executable back to assembly source is
almost meaningless -- and getting back to a high-level language is near
impossible. One would have to be an expert at the assembly for a processor
to have any chance of understanding the result.

Well, in my last job I often used disassemblers.
IBM z/OS.
Very useful for understanding IBM code.

I can't see what out of order execution has to do with a disassembler.
You disassemble executables.

Since I understand Assembler, I certainly got meaning out of it
even if the original was an optimized HLL. You can see what services
are being called.

--
Dan Espen

Peter Flass

2021-01-04 23:54:04 UTC

Post by Dan Espen

Post by Dennis Lee Bieber

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?

Actually, I think the use of disassemblers et al has fallen away.
Modern processors have so many peephole optimizations and out-of-order
execution streams that converting an executable back to assembly source is
almost meaningless -- and getting back to a high-level language is near
impossible. One would have to be an expert at the assembly for a processor
to have any chance of understanding the result.

Well, in my last job I often used disassemblers.
IBM z/OS.
Very useful for understanding IBM code.

I was going to say that disassemblers for IBM seem to work fairly well.
I’ve used them a few times.

Post by Dan Espen
I can't see what out of order execution has to do with a disassembler.
You disassemble executables.
Since I understand Assembler, I certainly got meaning out of it
even if the original was an optimized HLL. You can see what services
are being called.

I think, for example, that one disassembler might recognize the SVC
number. I think it put the macro name in as a comment (LINK, GETMAIN, etc.)
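
A minimal sketch of that sort of annotation: SVC is a two byte instruction (opcode X'0A' followed by the SVC number), so a disassembler can look the number up and emit the macro name as a comment. The table below is a small subset quoted from memory and should be checked against the IBM documentation; the structure and function names are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One table entry: SVC number and the macro that normally issues it. */
struct svc_name {
    unsigned char number;
    const char   *macro;
};

/* Illustrative subset only, not a complete or authoritative list. */
static const struct svc_name svc_names[] = {
    {  4, "GETMAIN" }, {  6, "LINK"  }, {  8, "LOAD" }, { 13, "ABEND" },
    { 19, "OPEN"    }, { 20, "CLOSE" }, { 35, "WTO"  },
};

/* insn points at two bytes known to start an instruction.
 * Returns the macro name for a recognised SVC, otherwise NULL. */
const char *svc_comment(const unsigned char *insn)
{
    size_t i;

    assert(insn != NULL);
    if (insn[0] != 0x0A)            /* not an SVC instruction */
        return NULL;
    for (i = 0; i < sizeof svc_names / sizeof svc_names[0]; i++)
        if (svc_names[i].number == insn[1])
            return svc_names[i].macro;
    return NULL;                    /* SVC, but not in the table */
}
```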

--
Pete

Theo

2021-01-04 23:01:05 UTC

Post by Dennis Lee Bieber
Actually, I think the use of disassemblers et al has fallen away.
Modern processors have so many peephole optimizations and out-of-order
execution streams that converting an executable back to assembly source is
almost meaningless -- and getting back to a high-level language is near
impossible. One would have to be an expert at the assembly for a processor
to have any chance of understanding the result.

Apple essentially do this for their Rosetta 2 x86-to-ARM converter. They
take existing x86 executables, which are likely generated by their Xcode
LLVM compiler. They convert the assembly back into LLVM's intermediate
representation, which is the idealised-assembly representation most of the
compiler stages work on. Then they push that IR through the regular ARM
LLVM backend, including optimiser stages, to produce 64-bit ARM executables.

It's not a language intended for humans to read, but it's high enough for
the compiler stages to work on. Doing it this way avoids having to emulate
any x86 instructions.

Theo

Eli the Bearded

2021-01-04 20:11:28 UTC

Post by gareth evans
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?

I suspect AI could be trained to do that, perhaps better than being
trained to read English. Not sure if anyone has ever tried.

Post by gareth evans
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

The info-sec people use disassemblers all the time, and don't limit
themselves to compiled from C and intended for Windows binaries. They
try to extract passwords and locate flaws in firmware for all sorts
of internet-connected things. I recall Cybergibbons creating some
tutorials in November or December. It was linked from his twitter
account, but I didn't pay that close attention to where it was. A
quick look at his blog and youtube didn't find them, but he's got a
robust web presence.

Elijah
------
have you searched if anyone else has reverse engineered it already?

Richard Kettlewell

2021-01-05 09:07:21 UTC

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

Why would you do that instead of reading a reference manual for the
target architecture?

--
https://www.greenend.org.uk/rjk/

Ahem A Rivet's Shot

2021-01-05 09:47:47 UTC

On Tue, 05 Jan 2021 09:07:21 +0000

Post by Richard Kettlewell

Post by gareth evans
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

Why would you do that instead of reading a reference manual for the
target architecture?

The documentation for the GPU on the RPi has not been published, he
seeks to reverse engineer it from the binary code that implements a
published API on it.

--
Steve O'Hara-Smith | Directable Mirror Arrays
C:\>WIN | A better way to focus the sun
The computer obeys and wins. | licences available see
You lose and Bill collects. | http://www.sohara.org/

Richard Kettlewell

2021-01-05 11:13:03 UTC

Post by Ahem A Rivet's Shot

Post by Richard Kettlewell

Post by gareth evans
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

Why would you do that instead of reading a reference manual for the
target architecture?

The documentation for the GPU on the RPi has not been published,
he seeks to reverse engineer it from the binary code that implements a
published API on it.

I was under the impression it was a VideoCore IV, which appears to be
sufficiently documented for a GNU toolchain port.

https://docs.broadcom.com/doc/12358545
https://github.com/itszor/vc4-toolchain

--
https://www.greenend.org.uk/rjk/

gareth evans

2021-01-05 12:12:11 UTC

Post by Richard Kettlewell

Post by Ahem A Rivet's Shot

Post by Richard Kettlewell

Post by gareth evans
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

Why would you do that instead of reading a reference manual for the
target architecture?

The documentation for the GPU on the RPi has not been published,
he seeks to reverse engineer it from the binary code that implements a
published API on it.

I was under the impression it was a VideoCore IV, which appears to be
sufficiently documented for GNU toolchain port.
https://docs.broadcom.com/doc/12358545
https://github.com/itszor/vc4-toolchain

The first of those does not produce anything.

Does the second describe the GPU in some detail and describe
the instruction set such that I might produce my own binary blob
to do something completely different?

Also, AIUI, a different GPU has been incorporated into the
64-bit RPis.

Anyway, thanks for your input.

Richard Kettlewell

2021-01-05 13:39:07 UTC

Post by gareth evans

Post by Richard Kettlewell

Post by Ahem A Rivet's Shot
The documentation for the GPU on the RPi has not been published,
he seeks to reverse engineer it from the binary code that implements a
published API on it.

I was under the impression it was a VideoCore IV, which appears to be
sufficiently documented for GNU toolchain port.
https://docs.broadcom.com/doc/12358545
https://github.com/itszor/vc4-toolchain

The first of those does not produce anything.

It’s the VideoCore IV 3D Architecture Reference Guide.

Post by gareth evans
Does the second describe the GPU in some detail and describe
the instruction set such that I might produce my own binary blob
to do something completely different?

It’s a toolchain port.

--
https://www.greenend.org.uk/rjk/

gareth evans

2021-01-05 15:32:08 UTC

Post by Richard Kettlewell

Post by gareth evans

Post by Richard Kettlewell

Post by Ahem A Rivet's Shot
The documentation for the GPU on the RPi has not been published,
he seeks to reverse engineer it from the binary code that implements a
published API on it.

I was under the impression it was a VideoCore IV, which appears to be
sufficiently documented for GNU toolchain port.
https://docs.broadcom.com/doc/12358545
https://github.com/itszor/vc4-toolchain

The first of those does not produce anything.

It’s the VideoCore IV 3D Architecture Reference Guide.

Perhaps not available to the Man On The Clapham Omnibus.

Are you in a privileged position with Broadcom to have
such access?

gareth evans

2021-01-05 15:40:26 UTC

Post by gareth evans

Post by Richard Kettlewell

Post by gareth evans

Post by Richard Kettlewell

Post by Ahem A Rivet's Shot
The documentation for the GPU on the RPi has not been published,
he seeks to reverse engineer it from the binary code that implements a
published API on it.

I was under the impression it was a VideoCore IV, which appears to be
sufficiently documented for GNU toolchain port.
https://docs.broadcom.com/doc/12358545
https://github.com/itszor/vc4-toolchain

The first of those does not produce anything.

It’s the VideoCore IV 3D Architecture Reference Guide.

Perhaps not available to the Man On The Clapham Omnibus.
Are you in a privileged position with Broadcom to have
such access?

Mea Culpa !!!!!!!!

Firefox did not display anything but was quietly downloading the PDF
in the background without me realising!

Scott Lurndal

2021-01-05 16:02:05 UTC

Post by gareth evans

Post by Richard Kettlewell

Post by Ahem A Rivet's Shot

Post by Richard Kettlewell

Post by gareth evans
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

Why would you do that instead of reading a reference manual for the
target architecture?

The documentation for the GPU on the RPi has not been published,
he seeks to reverse engineer it from the binary code that implements a
published API on it.

I was under the impression it was a VideoCore IV, which appears to be
sufficiently documented for GNU toolchain port.
https://docs.broadcom.com/doc/12358545
https://github.com/itszor/vc4-toolchain

The first of those does not produce anything.

Except a PDF entitled "VideoCore IV 3D Architecture Reference Guide".

Check your downloads directory.

e.g.

Fragment shaders are started automatically each time the FEP accumulates a vector of up to four quads (16
pixels) to shade together. The quad input data from the FEP is automatically written into per-thread QPU
registers when the fragment shader is started. The following data is written to these QPU registers, in addition
to the normal PC address, uniforms base address, and uniforms size:

Ahem A Rivet's Shot

2021-01-05 18:27:45 UTC

On Tue, 05 Jan 2021 11:13:03 +0000

Post by Richard Kettlewell
I was under the impression it was a VideoCore IV, which appears to be
sufficiently documented for GNU toolchain port.
https://docs.broadcom.com/doc/12358545

Whoah there, that was proprietary and NDA only last time I looked.
Seems Broadcom have been decent while I wasn't looking - good news!

--
Steve O'Hara-Smith | Directable Mirror Arrays
C:\>WIN | A better way to focus the sun
The computer obeys and wins. | licences available see
You lose and Bill collects. | http://www.sohara.org/

Vir Campestris

2021-01-05 21:15:36 UTC

Post by Richard Kettlewell
I was under the impression it was a VideoCore IV, which appears to be
sufficiently documented for GNU toolchain port.
https://docs.broadcom.com/doc/12358545
https://github.com/itszor/vc4-toolchain

I understand that the latest Pi is indeed a VC4.

Be aware that as a SIMD processor it's ... odd. Very odd.

That documentation ties up with what I remember about the device, and I
wish I'd had it when I was working on it.

Andy

gareth evans

2021-01-05 22:54:22 UTC

Post by Vir Campestris

Post by Richard Kettlewell
I was under the impression it was a VideoCore IV, which appears to be
sufficiently documented for GNU toolchain port.
https://docs.broadcom.com/doc/12358545
https://github.com/itszor/vc4-toolchain

I understand that the latest Pi is indeed a VC4.
Be aware that as a SIMD processor it's ... odd. Very odd.
That documentation ties up with what I remember about the device, and I
wish I'd had it when I was working on it.

You were working on it? What can you tell us?

gareth evans

2021-01-05 11:56:23 UTC

Post by Richard Kettlewell

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

Why would you do that instead of reading a reference manual for the
target architecture?

Because no such manuals are available. The BroadCom GPUs are
a closely guarded proprietary secret to hoi polloi.

A. Dumas

2021-01-05 14:11:33 UTC

Post by gareth evans
Because no such manuals are available. The BroadCom GPUs are
a closely guarded proprietary secret to hoi polloi.

Thanks for not writing the hoi polloi :)

Bob Eager

2021-01-05 14:22:00 UTC

Post by A. Dumas

Because no such manuals are available. The BroadCom GPUs are a closely
guarded proprietary secret to hoi polloi.

Thanks for not writing the hoi polloi :)

Rare, but good!

--
Using UNIX since v6 (1975)...

Use the BIG mirror service in the UK:
http://www.mirrorservice.org

gareth evans

2021-01-05 15:35:26 UTC

Post by A. Dumas

Post by gareth evans
Because no such manuals are available. The BroadCom GPUs are
a closely guarded proprietary secret to hoi polloi.

Thanks for not writing the hoi polloi :)

... and yet my multimeter will read AC current! :-)

... or, in the office, all electrical equipment has to
be PAT tested.

Ahem A Rivet's Shot

2021-01-05 18:47:45 UTC

On Tue, 5 Jan 2021 15:35:26 +0000

Post by gareth evans

Post by A. Dumas

Post by gareth evans
Because no such manuals are available. The BroadCom GPUs are
a closely guarded proprietary secret to hoi polloi.

Thanks for not writing the hoi polloi :)

... and yet my multimeter will read AC current! :-)
... or, in the office, all electrical equipment has to
be PAT tested.

Yes and you need your PIN number to use the ATM machine.

--
Steve O'Hara-Smith | Directable Mirror Arrays
C:\>WIN | A better way to focus the sun
The computer obeys and wins. | licences available see
You lose and Bill collects. | http://www.sohara.org/

Charlie Gibbs

2021-01-05 20:15:49 UTC

Post by Ahem A Rivet's Shot
On Tue, 5 Jan 2021 15:35:26 +0000

Post by gareth evans

Post by A. Dumas

Post by gareth evans
Because no such manuals are available. The BroadCom GPUs are
a closely guarded proprietary secret to hoi polloi.

Thanks for not writing the hoi polloi :)

... and yet my multimeter will read AC current! :-)
... or, in the office, all electrical equipment has to
be PAT tested.

Yes and you need your PIN number to use the ATM machine.

This message has been brought to you by the
Department of Redundancy Department (just down
the hall from the Department of Incomplete

--
/~\ Charlie Gibbs | "Some of you may die,
\ / <***@kltpzyxm.invalid> | but it's a sacrifice
X I'm really at ac.dekanfrus | I'm willing to make."
/ \ if you read it the right way. | -- Lord Farquaad (Shrek)

J. Clarke

2021-01-05 12:32:38 UTC

On Tue, 05 Jan 2021 09:07:21 +0000, Richard Kettlewell

Post by Richard Kettlewell

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

Why would you do that instead of reading a reference manual for the
target architecture?

Because there are features not described in the reference manual.

Scott Lurndal

2021-01-05 16:00:14 UTC

Post by Richard Kettlewell

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

Why would you do that instead of reading a reference manual for the
target architecture?

Unfortunately, Broadcom does not play well with others. There is no
reference manual for their graphics on the SoC used by the RPi
available without an NDA. Sure, documentation on the ARM core
is available from Arm, but the graphics are proprietary to Broadcom.

Adrian Caspersz

2021-01-05 11:26:13 UTC

Post by gareth evans
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?

If that became possible, it would not be a far step for an AI machine to
self-analyse itself or another AI machine. It could make clones and
unwittingly modify them.

Who knows where that could lead, or what mutations could happen? Life?

Post by gareth evans
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

The Chinese would be very interested in you.

I'm sure some of the architecture is provided in layers, some public
like frame buffers and some not, like acceleration features. So your
machine code experiments could be done on the former, to learn to walk
first. Or choose another, more open graphics chipset if you need more
documentation to get to first base. Perhaps there is one in a low-end
mobile phone?

Here's a manual way of reverse engineering random Chinese hardware.

[016] IT9919 Hacking - part 1 - Reading firmware with flashrom

Your AI solution would have to replicate the ability of the human.

--
Adrian C

Thomas Koenig

2021-01-05 13:07:22 UTC

Post by Adrian Caspersz
If that became possible, it would not be a far step for an AI machine to
self-analyse itself or another AI machine. It could make clones and
unwittingly modify them.

The solution to the halting problem :-)

The Natural Philosopher

2021-01-05 13:42:14 UTC

Post by Thomas Koenig

Post by Adrian Caspersz
If that became possible, it would not be a far step for an AI machine to
self-analyse itself or another AI machine. It could make clones and
unwittingly modify them.

The solution to the halting problem :-)

I like it when you talk dirty...

--
In a Time of Universal Deceit, Telling the Truth Is a Revolutionary Act.

- George Orwell

Eli the Bearded

2021-01-05 18:31:09 UTC

Post by Thomas Koenig

Post by Adrian Caspersz
If that became possible, it would not be a far step for an AI machine to
self-analyse itself or another AI machine. It could make clones and
unwittingly modify them.

The solution to the halting problem :-)

That olcott fellow in comp.theory, comp.ai.philosophy, comp.lang.c (and
who knows where else) would like you to believe he has solved that pesky
halting problem _already_.

Elijah
------
suggested http://web.mst.edu/%7Elmhall/WhatToDoWhenTrisectorComes.pdf

K. Krause

2021-01-05 15:01:07 UTC

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,

I remember DCP16 and DCP24 under OS/8, which I used some years ago
to disassemble PDP-8 binaries.
Very efficient tools, which didn't do the job automatically, but
they used tables with symbols, comments and directives on how to
interpret different parts of the binary: code, tables, strings,
variables and so on.
Interesting, efficient, great fun. :-)
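
To illustrate the table-driven idea in general terms (this is not how DCP16 or DCP24 actually work, and the region table format, names and output layout below are all made up), a minimal Python sketch might look like this:

```python
# Hypothetical directive table: (start, end, kind) for each region of the image.
REGIONS = [
    (0, 4, "code"),
    (4, 9, "string"),
    (9, 13, "words"),
]


def annotate(image: bytes, regions):
    """Render each region according to its directive instead of guessing."""
    lines = []
    for start, end, kind in regions:
        chunk = image[start:end]
        if kind == "string":
            text = chunk.decode("ascii", errors="replace")
            lines.append(f"{start:04o}: TEXT  {text!r}")
        elif kind == "words":
            # Two bytes per word, big-endian, shown in octal (an arbitrary choice).
            words = [int.from_bytes(chunk[i:i + 2], "big")
                     for i in range(0, len(chunk), 2)]
            lines.append(f"{start:04o}: DATA  " + " ".join(f"{w:04o}" for w in words))
        else:
            # A real tool would decode instructions here; this sketch only dumps bytes.
            lines.append(f"{start:04o}: CODE  " + chunk.hex(" "))
    return lines


image = bytes([0x12, 0x34, 0x56, 0x78]) + b"HELLO" + bytes([0, 1, 0, 8])
for line in annotate(image, REGIONS):
    print(line)
```

The human supplies the knowledge of what lives where, and the tool only does the tedious rendering.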

Klemens

druck

2021-01-05 20:51:33 UTC

Post by gareth evans
Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?

Modern compilers of any language output a structured executable file,
such as the Portable Executable (PE) format for Windows and ELF for Linux.

Post by gareth evans
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?

That's two separate problems. The first is taking any block of binary
and identifying if it contains an executable format of a particular
processor architecture and OS.

The second is taking a known executable format and turning it into a
human-readable form, such as a high-level language - which doesn't have
to be the same language it was written in.
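
For the well-known container formats, the first problem is mostly a matter of checking magic bytes. A rough Python sketch, assuming only ELF and PE matter (the function name and the short machine table are illustrative, not any particular tool's API):

```python
import struct

# A few e_machine values from the ELF specification.
ELF_MACHINES = {0x03: "x86", 0x28: "ARM", 0x3E: "x86-64", 0xB7: "AArch64"}


def identify_container(blob: bytes) -> str:
    """Guess the executable container format from its magic bytes."""
    if blob[:4] == b"\x7fELF":
        if len(blob) < 20:
            return "ELF (truncated header)"
        # EI_DATA (offset 5): 1 = little-endian, 2 = big-endian.
        endian = ">" if blob[5] == 2 else "<"
        # e_machine sits at offset 18 in both 32-bit and 64-bit ELF headers.
        (machine,) = struct.unpack_from(endian + "H", blob, 18)
        return "ELF, " + ELF_MACHINES.get(machine, f"machine 0x{machine:x}")
    if blob[:2] == b"MZ" and len(blob) >= 0x40:
        # The DOS header stores the file offset of the PE signature at 0x3C.
        (pe_offset,) = struct.unpack_from("<I", blob, 0x3C)
        if blob[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "PE (Windows)"
        return "MZ (DOS executable)"
    return "unknown"
```

A blob with no recognised header, such as raw firmware, falls through to "unknown", which is where the hard part starts.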

Post by gareth evans
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

That's a third problem. No matter how good your program is at
identifying the format and producing pseudo-source code, it needs
someone to put in a huge amount of work to interpret and document the
driver creating certain structures in memory and poking values into
registers.

---druck

gareth evans

2021-01-05 22:51:12 UTC

Post by druck

Post by gareth evans
But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?

That's two separate problems. The first is taking any block of binary
and identifying if it contains an executable format of a particular
processor architecture and OS.
The second is taking a known executable format, turning it into a human
readable form, such as a high level language - which doesn't have to be
the same language it was written in.

Sorry but neither. I'm positing the problem of analysing binary when it
does not feature in any known published format.

Dennis Lee Bieber

2021-01-09 03:23:13 UTC

Post by gareth evans
Sorry but neither. I'm positing the problem of analysing binary when it
does not feature in any known published format.

And no published instruction set either?

Consider this (assembly source) structure... (I may have some mistakes
in it, as my manuals are hiding in a storage facility).

arg1     data    1
arg2     data    2
retval   data    0
         ...

         bal,15  dostuff
         data    arg1        ;arg1/arg2/retval are the addresses of the data
         data    arg2
         data    retval
         ...

dostuff  lw,14   *15         ;* is indirect access operator
         stw,14  param1      ;retrieve and save address of param1
         adi,15  1           ;increment link register
         lw,14   *15
         stw,14  param2
         adi,15  1
         ld,10   *param1     ;access param1 data
         mw,10   *param2
         stw,10  *15         ;save result return value
         adi,15  1
         b       *15         ;return from routine
param1   data    0
param2   data    0
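
One way to read the listing: the three words after the call are addresses, not instructions, and the routine steps the link register past them before returning. A minimal Python model of that convention follows. It assumes word-addressed memory held in a list, follows the intent stated in the comments (the product ends up in retval) rather than the exact semantics of any real machine, and the function name and memory layout are invented for illustration.

```python
def call_with_inline_args(memory, call_site):
    """Model 'bal,15 dostuff' followed by three inline address words.

    memory    -- list of words; addresses are list indices (an assumption)
    call_site -- index of the branch-and-link instruction
    Returns the address at which execution resumes.
    """
    link = call_site + 1          # bal leaves the address of the next word in r15
    param1 = memory[link]         # lw,14 *15 / stw,14 param1
    link += 1                     # adi,15 1
    param2 = memory[link]         # lw,14 *15 / stw,14 param2
    link += 1                     # adi,15 1
    product = memory[param1] * memory[param2]   # ld,10 *param1 / mw,10 *param2
    memory[memory[link]] = product              # store through the retval address
    link += 1                     # adi,15 1
    return link                   # b *15


# Layout: data words first, then the call and its three inline address words.
ARG1, ARG2, RETVAL, CALL = 0, 1, 2, 3
memory = [1, 2, 0, "bal,15 dostuff", ARG1, ARG2, RETVAL, "next instruction"]
resume_at = call_with_inline_args(memory, CALL)
print(memory[RETVAL], resume_at)
```

A disassembler that does not know this convention would try to decode the words at CALL+1 to CALL+3 as instructions, which is part of why a raw binary is hard to analyse without knowing the conventions in use.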

--
Wulfraed Dennis Lee Bieber AF6VN
***@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/

Andy Burns

2021-02-12 15:40:36 UTC

Post by gareth evans
I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.

Here's something for you, a C64 emulator that runs baremetal on Pi
hardware (no Linux involved) so you can see exactly how it talks to the GPU

<https://github.com/randyrossi/bmc64>
