Discussion: what's a mainframe
Lars Poulsen
2024-09-12 02:24:14 UTC
Permalink
I think you'll find a pattern since the
CDC shock of making CPUs fast enough to keep the RAM and I/O devices busy while having
the error checking and recovery features so the systems keep running for years at a time.
So do these systems not require security patches?
Or do they apply PTFs to the running system? (reliably?)
They don't just update the software, they swap out entire hardware subsystems while the
overall system keeps running.
The multiprocessor systems I worked on were small (up to 4 CPUs), and the
processors shared one memory space, i.e. one OS image. But yes, I have
seen the descriptions of how the Z systems allow a "workload" (i.e. a
job?) to move from one processor to another, and the cluster has spare
processors to move them to.

My brother spent the latter half of his career in that world, in
operations groups supporting banking systems. In departments that
changed owners several times even as the customers remained the same. At
one point his group was part of IBM, then it became Maersk, then a bank
consortium, then spun out as a standalone company, which was then
acquired by an Italian holding company. His wife did software for bank
card processors ... same kind of shifting ownership.

I spent my life in smaller engineering companies, doing embedded micro
work and device drivers for custom peripherals in various minicomputer
OSes and RTOSes.
Lars Poulsen
2024-09-24 01:03:39 UTC
Permalink
From a programmer's perspective, VAX exception handling was very nice.
It may have been high overhead, though.
Very high overhead. But it was also language-independent, and integrated
into the procedure-calling convention, which also managed to be language-
independent.
There is an internal memo on Bitsavers somewhere, critiquing a proposal to
adopt the MIPS architecture (which DEC did, for just one machine, the
DECstation 3000 if I recall rightly), and one of the points against MIPS
was that it didn’t have language-independent exception handling. But then
no other architecture, before the VAX or since, has been able to do that.
VAX "call" instruction ABI was beautiful. As was the conceptual design
of the variable descriptors. But the descriptors collided with the
expectations of the programming languages, and especially C had problems
with both, so they ended up being little used in my experience.
Scott Alfter
2024-09-25 17:01:06 UTC
Permalink
There is an internal memo on Bitsavers somewhere, critiquing a proposal to
adopt the MIPS architecture (which DEC did, for just one machine, the
DECstation 3000 if I recall rightly), and one of the points against MIPS
was that it didn't have language-independent exception handling. But then
no other architecture, before the VAX or since, has been able to do that.
Speaking of MIPS and DECstations, somebody emulated one of those on an Intel
4004 recently. Of course it was used to boot Linux...which took the better
part of five days to get to a shell prompt:

https://www.tomshardware.com/pc-components/cpus/linux-takes-476-days-to-boot-on-an-ancient-intel-4004-cpu-cpu-precedes-the-os-by-20-years

At 1:11 in the linked time-lapse video, the kernel message that scrolls past
on the VFD is "This is a DECstation 2100/3100."
--
_/_
/ v \ Scott Alfter (remove the obvious to send mail)
(IIGS( https://alfter.us/ Top-posting!
\_^_/ >What's the most annoying thing on Usenet?
Lars Poulsen
2024-09-26 17:12:51 UTC
Permalink
Post by Scott Alfter
Speaking of MIPS and DECstations, somebody emulated one of those on an Intel
4004 recently. Of course it was used to boot Linux...which took the better
https://www.tomshardware.com/pc-components/cpus/linux-takes-476-days-to-boot-on-an-ancient-intel-4004-cpu-cpu-precedes-the-os-by-20-years
At 1:11 in the linked time-lapse video, the kernel message that scrolls past
on the VFD is "This is a DECstation 2100/3100."
I'm not sure what the point of this exercise is.

The 4004 is a very slow CPU. It has very limited resources, both in
registers and in memory. It is difficult even to add a limited amount
of additional resources, such as disk storage, to keep track of more bits
than will fit inside the system core.

The MIPS processor in the DECstation is a vastly more capable
processor, presumably with a full-fledged memory management system to
support paging. To emulate this on a 4004 is a task well beyond what
could be considered reasonable to get running or likely to succeed. So
this is about bragging rights: writing complex (but not useful) software
on a small system.

It would seem more interesting to me to hear that a MIPS emulator
running under Linux on a modern cheap desktop machine could boot the old
DECstation software (OSF/1?), and to hear how that system would perform
compared to the same software on the original hardware.

Likewise, I have been impressed that Hercules on a similar platform runs
OS/360 MVT with performance like a 1960s mainframe.
David Wade
2024-09-26 17:44:07 UTC
Permalink
Post by Lars Poulsen
Post by Scott Alfter
Speaking of MIPS and DECstations, somebody emulated one of those on an Intel
4004 recently. Of course it was used to boot Linux...which took the better
https://www.tomshardware.com/pc-components/cpus/linux-takes-476-days-to-boot-on-an-ancient-intel-4004-cpu-cpu-precedes-the-os-by-20-years
At 1:11 in the linked time-lapse video, the kernel message that scrolls past
on the VFD is "This is a DECstation 2100/3100."
I'm not sure what the point of this exercise is.
The 4004 is a very slow CPU. It has very limited resources, both in
registers and in memory. It is difficult even to add a limited amount
of additional resources, such as disk storage, to keep track of more bits
than will fit inside the system core.
The MIPS processor in the DECstation is a vastly more capable
processor, presumably with a full-fledged memory management system to
support paging. To emulate this on a 4004 is a task well beyond what
could be considered reasonable to get running or likely to succeed. So
this is about bragging rights: writing complex (but not useful) software
on a small system.
It would seem more interesting to me to hear that a MIPS emulator
running under Linux on a modern cheap desktop machine could boot the old
DECstation software (OSF/1?), and to hear how that system would perform
compared to the same software on the original hardware.
Not sure about a DECstation, but my laptop runs OpenVMS 7.3 faster than
my VAX VLC. But of course that's CISC emulation.
Post by Lars Poulsen
Likewise, I have been impressed that Hercules on a similar platform runs
OS/360 MVT with performance like a 1960s mainframe.
On my not very new laptop, it's a darn sight faster than a 1960s
mainframe. I get similar performance to a 4 MIPS mainframe on a Pi 4...

Dave
Kurt Weiske
2024-09-27 14:43:00 UTC
Permalink
To: David Wade
-=> David Wade wrote to alt.folklore.computers <=-

 DW> On my not very new laptop, it's a darn sight faster than a 1960s
 DW> mainframe. I get similar performance to a 4 MIPS mainframe on a Pi 4...

At the Vintage Computer Festival last year, I saw what looked like a
full-sized IBM midrange control panel with all of the blinkenlights
connected to a Pi 4 in the background. Apparently they were running the
whole stack on it.

kurt weiske | kweiske at realitycheckbbs dot org
| http://realitycheckbbs.org
| 1:218/***@fidonet




--- MultiMail/Win v0.52
--- Synchronet 3.20a-Win32 NewsLink 1.114
* realitycheckBBS - Aptos, CA - telnet://realitycheckbbs.org
D
2024-09-27 16:00:57 UTC
Permalink
Post by Kurt Weiske
To: David Wade
-=> David Wade wrote to alt.folklore.computers <=-
DW> On my not very new laptop, it's a darn sight faster than a 1960s
DW> mainframe. I get similar performance to a 4 MIPS mainframe on a Pi 4...
At the Vintage Computer Festival last year, I saw what looked like a
full-sized IBM midrange control panel with all of the blinkenlights
connected to a Pi 4 in the background. Apparently they were running the
whole stack on it.
never used a '60s mainframe ('70s, early cad) but since this is about
mainframe computer stories, this "early history of usenet" was posted
five years ago . . . it's quite interesting, here's a sample w/ t.o.c:

(using Tor Browser 13.5.5)
https://www.cs.columbia.edu/~smb/blog/2019-11/2019-11-14a.html
The Early History of Usenet, Part II: The Technological Setting
14 November 2019
Usenet--Netnews--was conceived almost exactly 40 years ago this month.
To understand where it came from and why certain decisions were made
the way they were, it's important to understand the technological
constraints of the time.
Metanote: this is a personal history as I remember it. None of us were
taking notes at the time; it's entirely possible that errors have crept
in, especially since my brain cells do not even have parity checking,
let alone ECC. Please send any corrections.
In 1979, mainframes still walked the earth. In fact, they were the
dominant form of computing. The IBM PC was about two years in the
future; the microcomputers of the time, as they were known, had too
little capability for more or less anything serious. For some purposes,
especially in research labs and process control systems, so-called
minicomputers--which were small, only the size of one or two full-size
refrigerators--were used. So-called "super-minis", which had the raw
CPU power of a mainframe though not the I/O bandwidth, were starting
to become available.
At the time, Unix ran on a popular line of minicomputers, the Digital
Equipment Corporation (DEC) PDP-11. The PDP-11 had a 16-bit address
space (though with the right OS, you could quasi-double that by using
one 16-bit address space for instructions and a separate one for data);
depending on the model, memory was limited to the 10s of kilobytes
(yes, kilobytes) to a very few megabytes. No one program could access
more than 64K at a time, but the extra physical memory meant that a
context switch could often be done without swapping, since other
processes might still be memory-resident. (Note well: I said "swapping",
not "paging"; the Unix of the time did not implement paging. There was
too little memory per process to make it worthwhile; it was easier to
just write the whole thing out to disk...)
For most people, networking was non-existent. The ARPANET existed (and
I had used it by then), but to be on it you had to be a defense contractor
or a university with a research contract from DARPA. IBM had assorted
forms of networking based on leased synchronous communications lines
(plus some older mechanisms for dial-up batch remote job entry), and
there was at least one public packet-switched network, but very, very
few places had connections to it. The only thing that was halfway
common was the dial-up modem, which ran at 300 bps. The Bell 212A full-
duplex, dial-up modem had just been introduced but it was rare. Why?
Because you more or less had to lease it from the phone company: Ma
Bell, more formally known as AT&T. It was technically legal to buy your
own modems, but to hardwire them to the phone network required going
through a leased adapter known as a DAA (data access arrangement) to
"protect the phone network". (Explaining that would take a far deeper
dive into telephony regulation than I have the energy for tonight.)
Usenet originated in a slightly different regulatory environment,
though: Duke University was served by Duke Telecom, a university entity
(and Durham was GTE territory), while UNC Chapel Hill, where I was a
student, was served by Chapel Hill Telephone; the university owned the
phone, power, water, and sewer systems, though around this time the
state legislature ordered that the utilities be divested.
There was one more piece to the puzzle: the computing environments at
UNC and Duke computer science. Duke had a PDP-11/70, then the high-end
model, running Unix. We had a PDP-11/45 intended as a dedicated machine
for molecular graphics modeling; it ran DOS, a minor DEC operating
system. It had a few extra terminal ports, but these didn't even have
modem control lines, i.e., the ports couldn't tell if the line had
dropped. We hooked these to the university computer center's Gandalf
port selector. With assistance from Duke, I and a few others brought up
6th Edition Unix on our PDP-11, as a part-time OS. Some of the faculty
were interested enough that they scrounged enough money to buy a better
8-port terminal adapter and some more RAM (which might have been core
storage, though around that time semiconductor RAM was starting to
become affordable). We got a pair of VAX-11/780s soon afterwards, but
Usenet originated on this small, slow 11/45.
The immediate impetus for Usenet was the desire to upgrade to 7th
Edition Unix. On 6th Edition Unix, Duke had used a modification they
got from elsewhere to provide an announcement facility to send messages
to users when they logged in. It wasn't desirable to always send such
messages; at 300 bps--30 characters a second--a five-line message took
annoyingly long to print (and yes, I do mean "print" and not "display";
hardcopy terminals were still very, very common). This modification was
not even vaguely compatible with the login command on 7th Edition; a
completely new implementation was necessary. And 7th Edition had uucp
(Unix-to-Unix Copy), a dial-up networking facility. This set the stage
for Usenet.
To be continued...
[end quote plain text]

(using Tor Browser 13.5.5)
https://www.cs.columbia.edu/~smb/blog/2019-11/2019-11-14.html
The Early History of Usenet, Part I: Prologue
https://www.cs.columbia.edu/~smb/blog/2019-11/2019-11-14a.html
The Early History of Usenet, Part II: The Technological Setting
https://www.cs.columbia.edu/~smb/blog/2019-11/2019-11-15.html
The Early History of Usenet, Part III: Hardware and Economics
https://www.cs.columbia.edu/~smb/blog/2019-11/2019-11-17.html
The Early History of Usenet, Part IV: File Format
https://www.cs.columbia.edu/~smb/blog/2019-11/2019-11-21.html
The Early History of Usenet, Part V: Implementation and User Experience
https://www.cs.columbia.edu/~smb/blog/2019-11/2019-11-22.html
The Early History of Usenet, Part VI: Authentication and Norms
https://www.cs.columbia.edu/~smb/blog/2019-11/2019-11-25.html
The Early History of Usenet, Part VII: The Public Announcement
https://www.cs.columbia.edu/~smb/blog/2019-11/2019-11-30.html
The Early History of Usenet, Part VIII: Usenet Growth and B-news
https://www.cs.columbia.edu/~smb/blog/2019-12/2019-12-26.html
The Early History of Usenet, Part IX: The Great Renaming
https://www.cs.columbia.edu/~smb/blog/2020-01/2020-01-09.html
The Early History of Usenet, Part X: Retrospective Thoughts
https://www.cs.columbia.edu/~smb/blog/2020-01/2020-01-09a.html
The Early History of Usenet, Part XI: Errata
https://www.cs.columbia.edu/~smb/blog/control/tag_index.html#TH_Usenet_history
The tag URL ...#TH_Usenet_history will always take you to an index of all
blog posts on this topic.
[end quote]
Lynn Wheeler
2024-09-26 18:39:41 UTC
Permalink
Post by Lars Poulsen
Likewise, I have been impressed that Hercules on a similar platform runs
OS/360 MVT with performance like a 1960s mainframe.
in the wake of the Future System implosion, I got con'ed by Endicott
into helping with 138/148 ECPS microcode (148 was about a 600KIPS
370) ... told that there was 6kbytes of microcode space and needed to
identify the highest-executed 6kbytes of kernel 370 execution segments.
370 instruction simulation ran avg ten native instructions per 370
instruction (about the same as some of the 80s i86 370 emulators)
... and dropping kernel 370 instructions into microcode was about
byte-for-byte ... running ten times faster. old archived (a.f.c.) post
with the top 370 6kbytes accounting for 79.55% of kernel execution:
https://www.garlic.com/~lynn/94.html#21
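
As a back-of-envelope check (a minimal Python sketch; the 79.55% figure
and the 10x microcode speedup are from the post above, the Amdahl's-law
framing is added for illustration):

  # Payoff of moving the hottest 6kbytes of kernel paths into microcode.
  hot_fraction = 0.7955   # top 6kbytes = 79.55% of kernel execution
  speedup_hot  = 10       # microcoded paths run ~10x faster
  # Amdahl's law: hot fraction sped up 10x, the rest unchanged.
  kernel_speedup = 1 / ((1 - hot_fraction) + hot_fraction / speedup_hot)
  print(f"overall kernel speedup ~ {kernel_speedup:.2f}x")   # ~3.5x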

little over a decade ago was asked to track down the IBM decision to
add virtual memory to all 370s, found staff member to executive making
the decision. Basically MVT storage management was so bad that region
sizes had to be specified four times larger than used ... as a result a
typical 1mbyte 370/165 only ran four concurrent regions, insufficient to
keep the system busy and justified. Mapping MVT to 16mbyte virtual
memory would allow concurrent regions to be increased by a factor of
four (capped at 15 by the 4-bit storage protect keys) with little or no
paging (aka VS2/SVS), sort of like running MVT in a CP/67 16mbyte
virtual machine.

Late 80s got approval for HA/6000 project, originally for NYTimes to
move their newspaper system (ATEX) off DEC VAXCluster to RS/6000. I
renamed it HA/CMP when I started doing numeric/scientific cluster
scale-up with the national labs and commercial cluster scale-up with
RDBMS vendors (Oracle, Sybase, Informix, and Ingres, which had RDBMS
VAXCluster support in the same source base with Unix).

IBM had been marketing a fault tolerant system as S/88 and the S/88
product administrator started taking us around to their customers
... and also got me to write a section for the corporate continuous
availability strategy document (the section got pulled when both
Rochester (AS/400) and POK (mainframe) complained that they couldn't
meet the requirements).

Early Jan1992, in meeting with Oracle CEO, AWD/Hester told Ellison that
we would have 16-system clusters by mid-92 and 128-system clusters by
year-end 92 ... however by end of Jan1992, cluster scale-up had been
transferred for announce as IBM Supercomputer and we were told we
couldn't work on anything with more than four processors (we left IBM a
few months later). Complaints from the other IBM groups likely
contributed to the decision.

(benchmarks are number of program iterations compared to reference
platform, not actual instruction count)

1993: eight processor ES/9000-982 : 408MIPS, 51MIPS/processor
1993: RS6000/990 : 126MIPS; 16-system: 2016MIPS, 128-system: 16,128MIPS

trivia: in the later half of the 90s, the i86 processor chip vendors do
a hardware layer that translates i86 instructions into RISC micro-ops
for execution.

1999 single IBM PowerPC 440 hits 1,000MIPS
1999 single Pentium3 (translation to RISC micro-ops for execution)
hits 2,054MIPS (twice PowerPC 440)

2003 single Pentium4 processor 9.7BIPS (9,700MIPS)

2010 E5-2600 XEON server blade, two chip, 16 processor, aggregate
500BIPS (31BIPS/processor)

The 2010-era mainframe was 80 processor z196 rated at 50BIPS aggregate
(625MIPS/processor), 1/10th XEON server blade
--
virtualization experience starting Jan1968, online at home since Mar1970
Waldek Hebisch
2024-09-28 14:30:45 UTC
Permalink
Post by Lynn Wheeler
2010 E5-2600 XEON server blade, two chip, 16 processor, aggregate
500BIPS (31BIPS/processor)
The 2010-era mainframe was 80 processor z196 rated at 50BIPS aggregate
(625MIPS/processor), 1/10th XEON server blade
I wonder how you get those numbers? Basically processor speed is clock
frequency times the width of the processor times the utilization of
that. IIUC Z series processors have a pretty high clock frequency. I
think that Z can execute 2 instructions in parallel while Xeon can
execute 4. My measurements and published figures indicate that on
average a Xeon gets about half of its peak execution rate. I have no
data for Z, but with a narrower machine it is easier to utilize the
execution units. And relatively simple scalar (width 1) machines get
about 0.7 instructions per clock, so while Z _may_ be worse at utilizing
its execution units than Xeon, it should not be _much_ worse. There is
also the question of the number of instructions needed to do the work:
360 was rather bad here, and Z should be better but may be worse than
Xeon. But this should be a factor of say 1.5 in favour of Xeon. So a
rather conservative estimate of Z capabilities would suggest that a
single Xeon processor (core) could do about 3 times the work of a single
Z processor, while your numbers imply 50 times more work.
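
His model, as arithmetic (a Python sketch with illustrative numbers;
only the issue widths and the about-half-of-peak Xeon utilization come
from the post, the clocks and the Z utilization are assumptions):

  xeon_clock, xeon_width, xeon_util = 3.0e9, 4, 0.5  # ~half of peak, per above
  z_clock, z_width, z_util = 5.0e9, 2, 0.35          # assume Z somewhat worse
  xeon_ips = xeon_clock * xeon_width * xeon_util     # ~6.0e9 instructions/s
  z_ips = z_clock * z_width * z_util                 # ~3.5e9 instructions/s
  # allow a factor ~1.5 in instruction count in Xeon's favour:
  print(f"Xeon/Z work ratio ~ {1.5 * xeon_ips / z_ips:.1f}")  # ~2.6: about 3x, not 50x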

In business data processing, movement of data may be the most important
thing, and Z seems to have huge caches and wide buses, probably giving
it some advantage over Xeon.

BTW: I have seen an old comparison between a Pentium and a Cray for
scientific computing. The Pentium was something like 4 times faster when
looking at the speed of arithmetic operations. But before performing
arithmetic the processor first needs to fetch data from memory, and for
the specific problem the Cray could do this faster. And the Cray was
faster overall because fetching data was the bottleneck: the Pentium's
arithmetic units stayed idle most of the time waiting for data.
--
Waldek Hebisch
Lynn Wheeler
2024-09-28 23:02:48 UTC
Permalink
Post by Waldek Hebisch
I wonder how you get those numbers? Basically processor speed is clock
frequency times the width of the processor times the utilization of that.
from original post:

(benchmarks are number of program iterations compared to reference
platform, not actual instruction count)

...

industry standard MIPS benchmark had been number of program iterations
compared to one of the reference platforms (370/158-3 assumed to be one
MIPS) ... not actual instruction count ... sort of normalizes across
large number of different architectures.
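
That is (a minimal Python sketch; the function name is illustrative):

  def relative_mips(iterations_per_sec, reference_iterations_per_sec):
      # "MIPS" here is relative benchmark throughput, not an instruction
      # count: iterations/sec divided by the 370/158-3 reference (1 MIPS).
      return iterations_per_sec / reference_iterations_per_sec

  print(relative_mips(625.0, 1.0))  # iterates 625x faster than the 158-3: "625 MIPS"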

consideration has been increasing processor rates w/o corresponding
improvement in memory latency. For instance IBM documentation claimed
that half of the per processor throughput increase going from z10 to
z196 was the introduction of some out-of-order execution (attempting
some compensation for cache miss and memory latency, features that have
been in other platforms for decades).

z10, 64 processors, 30BIPS (469MIPS/proc), Feb2008
z196, 80 processors, 50BIPS (625MIPS/proc), Jul2010

aka half of the 469MIPS/proc to 625MIPS/proc increase ... (625-469)/2 =
78MIPS per processor from z10 to z196 due to some out-of-order execution.

There have been some pubs noting that recent memory latency, when
measured in terms of processor clock cycles, is similar to 60s disk
latency when measured in terms of 60s processor clock cycles.

trivia: early 80s, I wrote a tome that disk relative system throughput
had declined by an order of magnitude since the mid-60s (i.e. disks got
3-5 times faster while systems got 40-50 times faster). Disk division
executive took exception and assigned the performance group to refute
the claims. After a few weeks they came back and effectively said I had
slightly understated the problem. They then respun the analysis into
configuring disks to increase system throughput (16Aug1984, SHARE 63,
B874).

trivia2: a little over a decade ago, I was asked to track down the
decision to add virtual memory to all IBM 370s. I found staff member to
executive making the decision. Basically MVT storage management was so
bad that region sizes had to be specified four times larger than used.
As a result a typical 1mbyte 370/165 only ran four concurrent regions at
a time, insufficient to keep the 165 busy and justified. Going to MVT in
16mbyte virtual memory (VS2/SVS) allowed increasing the number of
regions by a factor of four (capped at 15 because of the 4-bit storage
protect keys) with little or no paging ... similar to running MVT in a
CP67 16mbyte virtual machine (aka increasing overlapped execution while
waiting on disk I/O, as out-of-order execution increases overlapped
execution while waiting on memory).
--
virtualization experience starting Jan1968, online at home since Mar1970
Lynn Wheeler
2024-09-29 00:01:37 UTC
Permalink
emulation trivia

Note upthread mentions helping Endicott do 138/148 ECPS ... basically
manually compiling selected code into "native" (micro)code running ten
times faster. Then in the late 90s did some consulting for Fundamental
Software
https://web.archive.org/web/20240130182226/https://www.funsoft.com/

What is this zPDT? (and how does it fit in?)
https://www.itconline.com/wp-content/uploads/2017/07/What-is-zPDT.pdf

More recent versions of zPDT have added a "Just-In-Time" (JIT) compiled
mode to this. Some algorithm determines whether a section of code should
be interpreted or whether it would be better to invest some more initial
cycles to compile the System z instructions into equivalent x86
instructions (to simplify the process somewhat). This interpreter plus
JIT compiler is what FLEX-ES used to achieve its high performance.
FLEX-ES also cached the compiled sections of code for later reuse. I
have not been able to verify that zPDT does this caching also, but I
suspect so.

... snip ...
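
The interpret-or-compile decision described there amounts to something
like this (a hypothetical Python sketch with a hot-count threshold and a
compiled-block cache; not the actual zPDT or FLEX-ES internals):

  JIT_THRESHOLD = 50          # executions before compiling is worth the cost
  exec_counts = {}            # block address -> times interpreted so far
  compiled_cache = {}         # block address -> compiled native code

  def run_block(addr, interpret, compile_block):
      if addr in compiled_cache:                     # reuse cached translation
          return compiled_cache[addr]()
      exec_counts[addr] = exec_counts.get(addr, 0) + 1
      if exec_counts[addr] >= JIT_THRESHOLD:         # hot: pay compile cost once
          compiled_cache[addr] = compile_block(addr)
          return compiled_cache[addr]()
      return interpret(addr)                         # cold: just interpret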
--
virtualization experience starting Jan1968, online at home since Mar1970
John Ames
2024-09-30 18:04:47 UTC
Permalink
On Sat, 28 Sep 2024 14:01:37 -1000
Post by Lynn Wheeler
More recent versions of zPDT have added a "Just-In-Time" (JIT)
compiled mode to this. Some algorithm determines whether a section of
code should be interpreted or whether it would be better to invest
some more initial cycles to compile the System z instructions into
equivalent x86 instructions (to simplify the process somewhat).
[...]
FLEX-ES also cached the compiled sections of code for later reuse. I
have not been able to verify that zPDT does this caching also, but I
suspect so.
I understand the usefulness of JIT in circumstances where you want good
performance in an interpreted language but may never know ahead of
time what code you'll actually be running (e.g. web browsers), but I've
always wondered, in a system oriented towards running software built for
a virtual-machine architecture across-the-board, why you wouldn't just
statically translate VM instructions to native code at install time
(doing profiling/optimization while you're at it) and cache that
alongside the source "binary"...
Dennis Boone
2024-10-01 00:24:45 UTC
Permalink
Post by John Ames
I understand the usefulness of JIT in circumstances where you want good
performance in an interpreted language but may never know ahead of
time what code you'll actually be running (e.g. web browsers), but I've
always wondered, in a system oriented towards running software built for
a virtual-machine architecture across-the-board, why you wouldn't just
statically translate VM instructions to native code at install time
(doing profiling/optimization while you're at it) and cache that
alongside the source "binary"...
I think Android is doing pretty much exactly that, when it does the
"optimizing applications" phase during system upgrades.

De

Waldek Hebisch
2024-09-30 03:43:11 UTC
Permalink
Post by Lynn Wheeler
Post by Waldek Hebisch
I wonder how you get those numbers? Basically processor speed is clock
frequency times the width of the processor times the utilization of that.
(benchmarks are number of program iterations compared to reference
platform, not actual instruction count)
Well, if you mean Dhrystone MIPS, then on a Ryzen 5 3600 I get 30501
MIPS from Dhrystone 2.2. This Ryzen has cores faster than the Xeons that
I used, but various factors could give your number. Still, this is the
result for a single core, and due to thermal bounds it does not hold
when one wants to utilize all cores.
Post by Lynn Wheeler
...
industry standard MIPS benchmark had been number of program iterations
compared to one of the reference platforms (370/158-3 assumed to be one
MIPS) ... not actual instruction count ... sort of normalizes across
large number of different architectures.
IIUC the factor was chosen so that 1 MIPS corresponded to a 370
executing about 1 million instructions per second. The original
Dhrystone had an instruction mix which was supposed to match real
programs. With modern compilers Dhrystone results are inflated
because compilers can substantially reduce the number of instructions
needed to do the work. Dhrystone 2.2 is modified compared to the
original Dhrystone to make it less optimizable (more like real
programs). Anyway, I did not see any public results of
Dhrystone on Z (IIUC IBM forbids publishing benchmark results).
And the figure you gave looks way too low: to do the work a machine
needs to execute some number of instructions. How many depends
on optimization in the compiler and on the "efficiency" of the
architecture. IIUC compilers on Z are reasonably good and the
architecture is less efficient than i386, but this should not be a
big factor. The Ryzen that I tested probably worked at 4.2 GHz (the
clock is dynamic). Z was claimed to work above 5GHz, so it is highly
unclear to me how a 5GHz machine could benchmark at 625 Dhrystone MIPS.
To put it differently, simple scalar (width 1) processors tend
to benchmark at 0.7 to 1.5 Dhrystone MIPS per megahertz.
Z is supposed to be superscalar (of width 2 IIRC), so this
factor should be higher.
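
For reference, the usual conversion (1757 Dhrystones/second is the
conventional VAX-11/780 baseline; the sample score below is made up to
match the Ryzen figure above):

  def dmips(dhrystones_per_sec):
      # Dhrystone MIPS: score divided by the accepted VAX-11/780 figure.
      return dhrystones_per_sec / 1757.0

  print(dmips(53.6e6))           # ~30500 DMIPS, about the Ryzen result above
  print(dmips(53.6e6) / 4200)    # ~7.3 DMIPS/MHz at a ~4.2 GHz clock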
Post by Lynn Wheeler
consideration has been increasing processor rates w/o corresponding
improvement in memory latency. For instance IBM documentation claimed
that half of the per processor throughput increase going from z10 to
z196 was the introduction of some out-of-order execution (attempting
some compensation for cache miss and memory latency, features that have
been in other platforms for decades).
The Pentium Pro was out-of-order, but other architectures went
out-of-order much later.
Post by Lynn Wheeler
z10, 64 processors, 30BIPS (469MIPS/proc), Feb2008
z196, 80 processors, 50BIPS (625MIPS/proc), Jul2010
aka half of the 469MIPS/proc to 625MIPS/proc increase ... (625-469)/2 =
78MIPS per processor from z10 to z196 due to some out-of-order execution.
I wonder if that Z really has 80 processors. Maybe a single physical
processor is divided into multiple logical ones? That could explain the
low performance of a single processor.
--
Waldek Hebisch
Lawrence D'Oliveiro
2024-09-30 03:50:09 UTC
Permalink
IIUC the factor was chosen so that 1 MIPS corresponded to a 370 executing
about 1 million instructions per second.
The VAX-11/780 was the standard for “MIPS”, not IBM.
Waldek Hebisch
2024-09-30 10:37:03 UTC
Permalink
Post by Lawrence D'Oliveiro
IIUC the factor was chosen so that 1 MIPS corresponded to a 370 executing
about 1 million instructions per second.
The VAX-11/780 was the standard for “MIPS”, not IBM.
It is well known that a 1 MIPS VAX executed significantly less
than one million instructions per second (closer to half a million).
And this VAX was comparable in performance with IBM machines
actually doing one million instructions per second. So the
reference machine was a VAX, but the calibration factor was based
on IBM machines.
--
Waldek Hebisch
Bill Findlay
2024-09-27 22:59:55 UTC
Permalink
Post by Lars Poulsen
Post by Scott Alfter
Speaking of MIPS and DECstations, somebody emulated one of those on an Intel
4004 recently. Of course it was used to boot Linux...which took the better
https://www.tomshardware.com/pc-components/cpus/linux-takes-476-days-to-boot-on-an-ancient-intel-4004-cpu-cpu-precedes-the-os-by-20-years
At 1:11 in the linked time-lapse video, the kernel message that scrolls past
on the VFD is "This is a DECstation 2100/3100."
I'm not sure what the point of this exercise is.
Fun.
--
Bill Findlay
geodandw
2024-09-28 05:43:41 UTC
Permalink
Post by Lars Poulsen
Post by Scott Alfter
Speaking of MIPS and DECstations, somebody emulated one of those on an Intel
4004 recently. Of course it was used to boot Linux...which took the better
https://www.tomshardware.com/pc-components/cpus/linux-takes-476-days-to-boot-on-an-ancient-intel-4004-cpu-cpu-precedes-the-os-by-20-years
At 1:11 in the linked time-lapse video, the kernel message that scrolls past
on the VFD is "This is a DECstation 2100/3100."
I'm not sure what the point of this exercise is.
Fun.
I don't think an emulator for MIPS or a
DECstation could be written in the space available on a 4004.
Waldek Hebisch
2024-09-28 15:18:24 UTC
Permalink
Post by Lars Poulsen
Likewise, I have been impressed that Hercules on a similar platform runs
OS/360 MVT with performance like a 1960s mainframe.
Faithfully emulating a mainframe required substantial work. But
when discussing speed, most impressive is the raw speed of modern
hardware. Simple benchmarks indicate that an average modern
desktop core is about 10000 times faster than a VAX 750.
Straightforward emulation usually needs between 50 and 100
native instructions per emulated instruction. That gives a
speed of tens of MIPS for an emulated mainframe. And this
rough estimate agrees with my experience with Hercules.
There is a faster technique: translating parts of the emulated program
into native instructions (QEMU does this). With such
a technique one can get something like 5 native instructions
per emulated instruction (on average). IIUC commercial Turbo
Hercules wanted to do (did??) this to get much better
speed. But the business plan of Turbo Hercules did not work:
they wanted to use IBM's OS and IBM refused to license its
OS (and the court took IBM's side).

In a somewhat different spirit, I played with both Bochs and
Hercules on the same machine. Hercules gave me about 10
times better speed than Bochs (Bochs emulating i386).
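
As arithmetic (a Python sketch; the 10000x and 50-100 factors are from
the paragraph above, the VAX-11/750 rating is an assumption):

  vax750_mips = 0.6                    # rough VAX-11/750 rating (assumption)
  host_mips = 10_000 * vax750_mips     # ~6000 native MIPS for a modern core

  for overhead in (50, 100):           # native instructions per emulated one
      print(f"~{host_mips / overhead:.0f} emulated MIPS at {overhead}:1")
  # -> ~120 and ~60: "tens of MIPS", matching the Hercules experience above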

Waldek Hebisch