29 January 2017, 16:17 | #1 |
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
|
I am seeking a way to Amiga 1200...
I have a mathematical demo project: a program that calculates the number π. I am using FS-UAE, but its emulation of the Amiga 1200 is not quite accurate. Could anybody help by running it on genuine hardware? The attachment contains two programs, one for the Amiga 500 and one for the Amiga 1200. These programs are also on the provided disk image. I am gathering results for 100, 1000 and 3000 digits on the Amiga 1200 with both programs. Some project details are in another thread -
http://eab.abime.net/showthread.php?t=85525. Thanks in advance. EDIT. For the best results the console window must be maximized and cleared (by ECHO CTRL-L) before every run. Last edited by litwr; 30 January 2017 at 05:11. |
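For readers who have not seen a π spigot: later posts in this thread call the program a spigot, but which variant litwr's code uses is not stated here. The following is only an illustrative sketch of the well-known integer spigot that emits four digits per pass, written in Python rather than 68k assembly. The function name `pi_digits` and the choice of 14 array cells per 4 digits are assumptions taken from that classic formulation, not from the attached programs.

```python
def pi_digits(n_digits):
    """Return the first n_digits decimal digits of pi as a string.

    Illustrative sketch of the classic 4-digits-per-pass integer spigot;
    not necessarily the variant used by the programs in this thread.
    """
    groups = (n_digits + 3) // 4      # work in groups of 4 digits
    k = groups * 14                   # 14 array cells per 4-digit group
    r = [2000] * (k + 1)              # mixed-radix remainders, all start at 2000
    carry = 0                         # low 4 digits held back from the previous pass
    out = []
    while k > 0:
        d = 0
        i = k
        while True:
            d += r[i] * 10000         # scale this cell by 10^4
            b = 2 * i - 1             # denominator of the mixed-radix position
            r[i] = d % b              # keep the remainder for the next pass
            d //= b
            i -= 1
            if i == 0:
                break
            d *= i                    # numerator of the next position
        out.append("%04d" % (carry + d // 10000))
        carry = d % 10000
        k -= 14                       # later passes need fewer cells
    return "".join(out)[:n_digits]


if __name__ == "__main__":
    print(pi_digits(16))
```

With 14 cells per 4 digits, 9256 digits would need 32396 cells, which fits in a signed 16-bit index; whether that explains the 9256-digit maximum mentioned below is only a guess.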
31 January 2017, 00:33 | #2 |
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
I don't have an Amiga 1200 but I tried pi-amiga1200 on a real Amiga 3000T with a 68060@75MHz (the 68060 was released in 1994). The program appears to work correctly. I entered 9256 digits (the maximum), and many digits are printed, followed by a blank and a number. I maximized my shell before starting (800x600x16 RTG) and used the cls command (CLear Screen) before each run. The last number from each of 7 runs follows.
1) 25.10 2) 24919.5 3) 24920.0 4) 24920.0 5) 24919.5 6) 25.10 7) 24920.0
I have no clue what these numbers mean but I suspect there is some kind of timing error. I switched to 1600x1200x8 RTG so that no scrolling is necessary (all the digits take less than 1/2 of the window) and obtained the following results from 7 runs.
1) 25.05 2) 25.05 3) 25.05 4) 25.05 5) 25.05 6) 24917.5 7) 24918.0
You might receive more feedback if you gave better directions. There is no set resolution or depth even for a stock Amiga 1200, and different settings may affect the performance. Fast memory would also likely affect the performance. I doubt many active Amiga users are using a real stock Amiga 1200, which may reduce the amount of feedback you receive. The Amiga OS is called AmigaOS and not WB, by the way. |
31 January 2017, 08:05 | #3 |
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
|
Thank you very much. It is an unpleasant surprise for me that the Amiga has problems with timing. Here is my thread about it - http://eab.abime.net/showthread.php?t=82392.
I don't understand why the timing results with the upgraded A3000T are so weird, though 25 seconds looks like a correct timing. My A500 program uses a raster interrupt count; the A1200 program uses the timer at $bfea01. Does the Amiga 3000T have a slightly different timer at $bfea01? Did you try the A500 program on the A3000T? My programs are mostly meant to test hardware of the 80s... I still hope for help from collectors. EDIT. https://en.wikipedia.org/wiki/Workbench_(AmigaOS) says that Workbench and AmigaOS are almost the same thing. I used an Amiga 500 in 1990. In those days we used the word Workbench. Last edited by litwr; 31 January 2017 at 08:45. |
31 January 2017, 14:02 | #4 |
Registered User
Join Date: Oct 2009
Location: Germany
Posts: 3,305
|
On my A1200 040/40 32MB from shell:
pi-amiga1200 (with 100 digits): .02 (without text scrolling)
pi-amiga1200 (with 1000 digits): 1.00 (without text scrolling)
pi-amiga1200 (with 3000 digits): 7.28 (without text scrolling) and around 9.xx (with text scrolling)
pi-amiga1200 (with 9256 digits): 68.36 (with text scrolling)
Bonus:
Code:
pi_css5 4096
Calculation of PI using FFT and AGM, ver. LG1.1.2-MP1.5.2a.memsave
initializing...
nfft= 1024
radix= 10000
error_margin= 0.000176453
calculating 4096 digits of PI...
AGM iteration
precision= 48: 1.08 sec
precision= 80: 1.08 sec
precision= 176: 1.08 sec
precision= 352: 1.06 sec
precision= 688: 1.08 sec
precision= 1392: 1.08 sec
precision= 2784: 1.08 sec
precision= 5584: 1.08 sec
writing pi4096.txt...
11.82 sec. (real time)

pi_css5 16384
Calculation of PI using FFT and AGM, ver. LG1.1.2-MP1.5.2a.memsave
initializing...
nfft= 4096
radix= 10000
error_margin= 0.000878583
calculating 16384 digits of PI...
AGM iteration
precision= 48: 4.84 sec
precision= 80: 4.86 sec
precision= 176: 4.82 sec
precision= 352: 4.84 sec
precision= 688: 4.86 sec
precision= 1392: 4.84 sec
precision= 2784: 4.84 sec
precision= 5584: 4.86 sec
precision= 11168: 4.84 sec
precision= 22336: 4.86 sec
writing pi16384.txt...
62.56 sec. (real time)

Workbench (started by LoadWB) is "only" the GUI of AmigaOS IMHO. AmigaOS is the whole thing (KickROM + Workbench). I, for example, use DOpus5 as a WB replacement. So no WB = no OS?? However, many people say/think Workbench is (equal to) AmigaOS. |
31 January 2017, 19:25 | #5 | |||
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Quote:
The pi-amiga program crashes most of the time for me. I did get 2 runs with max digits that gave 18.05 and 17.92, which is strange as that would be faster than the 68020 version. I did some more tests with the pi-amiga1200 version and fewer digits.
68060@75MHz
100 digits: .00 .02 .00 .00 .00 .00 .02 .00 .00 .02 Average is .006
1000 digits: .33 .33 .33 .35 .33 .35 .35 .33 .35 .33 Average is .338
3000 digits: 2.75 2.77 2.75 2.75 2.77 2.77 2.75 2.77 2.75 2.77 Average is 2.76
These results were consistent over 10 runs each and are likely correct, so maybe you aren't so far off. There is probably some kind of bug when the number of digits is very high, and in the 68000/500 version, which crashes here.
Quote:
Quote:
Workbench is not even the GUI, which is Intuition/BOOPSI/Reaction/GadTools by default (built in). WB is a file manager, kind of like the early versions of Windows where DOS was the OS. Yes, WB is part of the AmigaOS but it is unnecessary even for this test. C= did not do a good job of making it clear. Last edited by matthey; 31 January 2017 at 19:31. |
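On the timing figures in this post: the averages can be re-derived from the ten readings per digit count, and the spacing between distinct readings hints at the timer resolution. A minimal sketch follows; the names `runs` and `summarize` are made up for illustration, and reading the 0.02 s step as one 50 Hz tick is an assumption that the thread does not confirm.

```python
from statistics import mean

# Readings copied from the post above (seconds, 68060@75MHz, pi-amiga1200).
runs = {
    100:  [0.00, 0.02, 0.00, 0.00, 0.00, 0.00, 0.02, 0.00, 0.00, 0.02],
    1000: [0.33, 0.33, 0.33, 0.35, 0.33, 0.35, 0.35, 0.33, 0.35, 0.33],
    3000: [2.75, 2.77, 2.75, 2.75, 2.77, 2.77, 2.75, 2.77, 2.75, 2.77],
}


def summarize(samples):
    """Return (average, smallest gap between distinct readings)."""
    avg = round(mean(samples), 3)
    distinct = sorted(set(samples))
    if len(distinct) < 2:
        return avg, None
    gap = round(min(b - a for a, b in zip(distinct, distinct[1:])), 2)
    return avg, gap


if __name__ == "__main__":
    for digits, samples in runs.items():
        avg, gap = summarize(samples)
        print(digits, "digits: average", avg, "s, reading step", gap, "s")
```

Every series only ever moves in steps of 0.02 s, so the 100-digit average of .006 is below what a single run can resolve.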
|||
31 January 2017, 21:06 | #6 | |||||||
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
|
Quote:
The 68060 is very good. It is sad that it was used so seldom. However, IMHO, Intel made a slightly better architecture. Motorola couldn't completely shake off the spirit of the dinosaur-like VAX. If they had supported and developed the 6502 architecture, they would most probably be the CPU leaders today.
Quote:
|
|||||||
31 January 2017, 23:24 | #7 | |||
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Quote:
Many of the early Amiga games failed because they were poorly programmed. AmigaOS upgrades were more likely to cause problems than a faster CPU. There were some CPU changes which did cause problems like the VBR moving but an MMU can now map back to the original address. Caches can be turned off for compatibility which also slows the CPU. The timing of most Amiga games is based on the video timing so they run at the proper speed. Quote:
Quote:
The 68020 ISA introduced some complex (VAX/PDP-11 like) addressing modes which were challenging for those early processors. The 68060 solved the problem enough that they are not a bottleneck (used sparingly in normal code) and that is with limited transistors. Modern OoO processors where transistor counts matter little would not have a problem with these addressing modes. A modern 68k CPU would probably not clock as high as an x86_64 CPU but likely could be more powerful for the clock speed in single core performance. It depends on if you want to use the colloquial (slang) term or the proper name. |
|||
01 February 2017, 01:07 | #8 |
Registered User
Join Date: Oct 2009
Location: Germany
Posts: 3,305
|
|
01 February 2017, 08:35 | #9 |
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
|
I ran the pi calculator on an emulated A1200 with WB 1.3.3 many times. It always worked well. So the problem appears only with a more modern OS.
The Intel architecture is not perfect but it has fewer oddities than Motorola's. The dedicated address registers, two carry flags, two stacks, ... - all this stuff looks like an implementation of some bad theory. I have to say that the NS320xx or DEC CPUs have even more oddities than the 680x0. Only the 6502 and ARM have a cleaner architecture than Intel x86. The 8086 is a 16-bit CPU and it was the first in the x86 line. |
01 February 2017, 10:29 | #10 | ||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,335
|
Quote:
Their protected mode is a totally incomprehensible mess. The registers all have special purposes and while we have AL, AH, we don't have anything to access any other byte, nor the upper word of EAX. Addressing modes are quite irregular. Segment registers weren't exactly the best thing they invented either. So yes 68k isn't perfect, but has fewer oddities than Intel.
Quote:
The dedicated registers allow encoding regs in only 3 bits instead of 4, allowing twice the number of instructions to be encoded in the same space, and that for the price of a very minor annoyance. And as data/address regs don't behave the same, it's often, in fact, useful to have them separate.
Two stacks are *necessary*. When an interrupt occurs, do you really want its data being pushed to the user program's stack ? There is only one supervisor stack but as many user stacks as we have programs (but perhaps you don't know how it works in a multitasking environment ?). Not having two stack pointers would simply mess things up.
Well, with all this said, i agree that two carry flags wasn't exactly the best idea they could have had - but guess what, x86 also has two carry flags (it even has a useless parity flag, and its overflow bit isn't in user land) so it has no lesson to give here.
Clear, but far too spartan and therefore weak. And just about everyone has a clearer architecture than Intel x86 (8086 wasn't so bad but the more x86 advanced, the worse it became).
Now if you don't agree - i believe this will be the case - it could be interesting to compare cpu families by comparing the same code for all of them (spigot seems too simple for this). Maybe worth opening a thread - I will if enough people want to write code for non-68k cpus. Good idea or not ? |
||
01 February 2017, 19:34 | #11 | |||||||
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
|
Quote:
Why? It was very clumsy on the 80286, but since the 80386 everyone uses paged mode, which is still the best known.
Quote:
It has been quite regular since the 80386.
Quote:
Spartans were not weak. They were great warriors.
Quote:
|
|||||||
01 February 2017, 20:46 | #12 | ||||||||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,335
|
Quote:
And i'm not speaking about encoding oddities ! Moto's are not postulated, they're easy to understand once you have coded enough on it. Just ask me if you want to know more.
Yet some things on x86 are just plain stupid, like that "DF" bit in the flags, making instructions behave in a different way depending on it, and, of course, making the auto-decrement inconsistent with that of the stack. On 68k the code always disassembles the same way, on x86 you now have 3 modes with incompatible code.
Quote:
I have read many manuals and still have no clue of what happens when you change the privilege level on x86. Perhaps you could explain - should be easy if it's "best known". Quote:
It didn't change in 80386, new things just got added. So we even have two very different ways to interpret an addressing mode now - worse than before. Anyway, no dep[sp], no [bp] mode, strange SIB byte in encoding - not what i'd call regular. Quote:
Quote:
Quote:
And anyway user programs don't need to care about this. Quote:
Can you tell me what are the exact operations that are taken when an interrupt comes in protected mode ? Quote:
Quote:
Quote:
But they didn't exactly live an enjoyable life... And a spartan cpu is weak, because it doesn't provide the tools to do the job, so many instructions have to be used in place of one. |
||||||||||
02 February 2017, 19:06 | #13 |
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
|
Motorola CPUs have a lot of attractive features but they also have some irritatingly bulky parts. Motorola always tried to make good features which couldn't be properly supported by the technology of the time. The 6800/6809 have 16-bit index registers. That is very good theoretically, but in practice, with an 8-bit ALU and an 8-bit data bus, it is very slow and bulky. The Motorola 680x0 have fewer bulky parts, but the 68000 or 68020 require more ticks for similar instructions than the 8086 or 80286. The 68000 or 68020 have more register space, and this helps to compensate for the extra ticks. I was a bit irritated by the 68000 because I had to work with 32-bit addresses for a 24-bit address bus. It was very irritating when you needed registers for data and couldn't use address registers for this.
I didn't notice any problem with Intel LAHF/SAHF. If you need to check OF, just use JO or JNO. If you need the 16-bit flags, just use PUSHF and POP AX. This is not a practical problem. It is a matter of seeking beauty, but tastes differ. SHL and SAL are names for the same instruction because logical and arithmetic shifts left are the same. IMHO it is obvious.
You say that you can explain all of Moto's oddities. Please explain the two carry flags.
What is wrong with the DF flag? It allows searching for a word in a text from left to right or from right to left. It is a very good feature. Moto's ISA has no instructions like the very useful and fast REP MOVS, REP SCAS, ...
Yes, Intel x86 has 3 modes: USE16, USE32, USE64. This is common for complex CPUs. Even the old 6809 or 65816 have modes. I write x86 programs sometimes and I can assure you I had no problems with it at all. x86 assembly code commonly uses only one mode; a programmer does not have to think about modes. Only bootstrap code may use them all, or some kind of OS which uses calls to the ROM BIOS. The modes are properly supported by assemblers and debuggers.
x86_64 is very easy with privilege levels. It has only two such levels: user and kernel. I don't quite understand your problem. A privilege level is set by the OS. It is common to use only 2 levels even with x86 software (Linux, Microsoft Windows). Sorry, I do not know anything about protected mode or the MMU on the 680x0. The Amiga 500 and 1200 didn't use it.
AH, BH, AL, R8L, DIL, ... are very useful for working with bytes. Intel has a big advantage here over the 680x0.
You say that you are disappointed by the 80386 instruction encoding. That is a bit odd. A man should not work as a disassembler. I know men from DEC times who liked to write programs directly in machine language. It is nonsense today. The 80386 provides an easy regular syntax for addressing, and it is fast in any of its formats. Just use an assembler. "Anyway, no dep[sp], no [bp] mode" - what is it about? It is possible to write, for example, MOV EAX,[ESP+EAX*8+120] or MOV ESP,[ESP+EBP*8+120].
It is slightly odd to say that two stacks are safer when facing x86's gigantic and very complex software. An interrupt in protected mode uses a special gate and a common OS stack. The details are not complex, just look them up on Wikipedia. However, that is for the simple cases only. It may require a task switch in the general case, and every task may have its own stack. I am not an OS writer so I have to check the documentation for the details too...
BCD never made any sense. It was just a fashion and an attempt to avoid slow and not always available hardware division and multiplication.
An example of using the parity bit is TEST AL,0A0H. Together with the sign flag, the parity bit then gives the value of bit 5 of AL.
It is a common misbelief that ARM instructions are too simple. On the contrary, an ARM instruction, as a rule, does the work of 2-4 instructions of Intel x86 or 680x0. For example, ADD R1,R2,R3,LSL #3, which assigns R1 the value of R2 + R3*8 without affecting flags.
EDIT. I have just corrected the information about interrupts in protected mode. Last edited by litwr; 02 February 2017 at 20:40. |
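The parity-flag example in this post can be checked with a small model. This is a sketch under stated assumptions: it models only the SF and PF results of an x86 `TEST AL, 0A0h` (SF is bit 7 of the masked result, PF is 1 when the low byte has an even number of set bits), and the helper names `flags_after_test` and `bit5_from_flags` are invented for illustration. It shows that PF alone is not enough; bit 5 falls out only once SF is taken into account.

```python
def flags_after_test(al, mask=0xA0):
    """Model SF and PF after x86 `TEST AL, mask` (only these two flags)."""
    result = al & mask & 0xFF
    sf = (result >> 7) & 1                               # sign flag: bit 7 of the result
    pf = 1 if bin(result).count("1") % 2 == 0 else 0     # PF = 1 for an even number of 1 bits
    return sf, pf


def bit5_from_flags(sf, pf):
    """Recover bit 5 of AL from SF and PF after TEST AL, 0A0h.

    Only bits 7 and 5 survive the mask, so the number of set bits is
    bit7 + bit5. PF says whether that count is even, SF gives bit 7.
    """
    return (sf + 1 - pf) % 2


if __name__ == "__main__":
    for al in (0x00, 0x20, 0x80, 0xA0):
        sf, pf = flags_after_test(al)
        print(hex(al), "SF =", sf, "PF =", pf, "bit 5 =", bit5_from_flags(sf, pf))
```

In assembly this corresponds to branching on both flags after a single TEST, which is how one instruction can classify two bits at once.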
02 February 2017, 20:32 | #14 |
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
|
@daxb
Could you detach your 68040 co-pro and run a bare A1200? Please. |
02 February 2017, 21:59 | #15 | ||||||||||||||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,335
|
Quote:
Having two carries is odd but it's not an annoyance. Next oddity, please. Quote:
And Moto's ISA can do ADD.W (A0)+,D0 while Intel's ISA just can't. We can even do MOVE.B (A0)+,-(A1) and x86 can't. Of course 68k can move memory to memory with any addressing modes. Quote:
Quote:
On 68k this is simple. We swap stacks if previously in user mode, then the address, sr and any extra data get pushed on the stack, then we go to supervisor mode, then we read the vector and jump there. Very easy. No change whether we're using the mmu or not. Now on x86 with protected mode ???
Quote:
Quote:
I can resource any 68k program. For x86 : nope - i can't do even one. Quote:
MOV EAX,[ESP+120] MOV EAX,[BP] ... or i missed something with this damned sib byte. Quote:
Quote:
Quote:
Quote:
You can get 5th bit of AL with BTS instruction. 68k has BTST instruction. Therefore, parity bit is useless. Quote:
Now do a division on the ARM. Oh, well. It doesn't even have it... The fact individual instructions can do a lot does not mean this whole bunch is really useful. Try to write a big enough program with ARM and you will see the opposite situation. Btw do you have ARM assembly for your pi calculation program ? |
||||||||||||||||
03 February 2017, 01:19 | #16 | ||||||
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
It is a drawback of the 68k that the separation between An and Dn registers is so hard. It was a design decision for the 68000 that separate register files would provide better performance. However, the 68020+ used a monolithic register file, and many of the arbitrary barriers were never lifted even though the encodings are often available and would reduce unnecessary data movement (my 68kF ISA shows it would be practical to open up An sources in most instructions, while An destinations are more complex and less consistent). The 68k still had 16 registers to the x86's 8 registers, so it is not like the 68k was at a disadvantage here. There is a little bit to learn about the 68k register division and the auto sign extension, but it can be useful and saves encoding bits as you pointed out. Many of those RISC code compressions like Thumb (and others I listed in the other thread) dropped back to 8 GP registers because 3 encoding bits was the most encoding-efficient choice found (immediate sizes were often reduced too, like the 68k quick instructions). The RISC code compressions tend to have access to fewer registers with more restrictions than the 68k while providing less code density. How diabolically ingenious to create something that is less than what you started with while getting paid all the way (CISC->RISC->RISC code compression).
Quote:
http://litwr2.atspace.eu/pi/pi-spigot-benchmark.html The code density was mediocre but the performance was good for 1986. |
||||||
03 February 2017, 04:11 | #17 | |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,796
|
Quote:
Can someone explain why code density is important? I don't get it. |
|
03 February 2017, 08:52 | #18 | ||
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Quote:
Quote:
http://eab.abime.net/showthread.php?...21#post1138521 Did you not understand? Larger code gives more ICache misses (slower and/or uses more electricity). There is also less chance for parallelism if the fetch is small. A low code density CPU requires more resources (larger ICache, more memory, wider/faster memory bus, larger instruction fetch, more power consumption, etc.) to keep up with a high code density CPU. |
||
03 February 2017, 09:56 | #19 | ||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,335
|
I'd like to resource PC programs and edit them and reassemble them, just like i can do on the Amiga.
If i could convert the code on the fly, perhaps a few game ports could even be done as well. However, no tools for that. I've checked several disassemblers on the PC and read several docs. No two of them agree on all details... There appears to be no full opcode map giving everything about x86 including the most recent additions, btw. Quote:
Quote:
And if you attempt to do decimal floating-point this way, even worse. Quote:
Well, ok. There are other reasons. Pressure on the ICache is something, but we could just have more of it. However, bandwidth is more limited. As an example, let's say your ICache can output 8 bytes per clock. If you have 2-byte instructions you can execute up to 4 of them per clock. If they are 4-byte, you can execute just 2. So it's more or less twice as fast if instructions are half the size (overly simplified but you see the idea).
Quote:
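The fetch-bandwidth arithmetic above, as a deliberately simplified sketch. The function name is made up for illustration, and the model assumes fixed-size instructions and ignores decode width, alignment and dependencies.

```python
def max_instructions_per_clock(fetch_bytes_per_clock, instruction_bytes):
    """Upper bound on instructions issued per clock, limited only by fetch width."""
    return fetch_bytes_per_clock // instruction_bytes


if __name__ == "__main__":
    # The example from the post: an ICache that delivers 8 bytes per clock.
    print(max_instructions_per_clock(8, 2))   # 2-byte instructions
    print(max_instructions_per_clock(8, 4))   # 4-byte instructions
```

The same bound applied to a 6-byte instruction gives 1 per clock from an 8-byte fetch, which is why long encodings hurt more than their size alone suggests in this simple model.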
Code to perform the operation for ABCD/SBCD doesn't sound trivial to me. I don't think you can "emulate" those with only a handful of instructions.
In the matter of BCD, the 6502 got it wrong. Code becomes unreadable because you never know whether your ADC/SBC is executed with the "D" bit set or not. x86 also got it wrong. You need to perform the regular operation and then adjust; this needs a secondary carry used only for that purpose, where a few instructions could have done it right away. Of course if BCD wasn't there at all, any program using it would have to do things "by hand", leading to cumbersome code. So perhaps the 68k got it right there too, in spite of what many people think about it.
More generally, people who claim that x86, arm, or whatever is better than 68k in some aspect like programming or code density always speak on theoretical grounds and never have any code to show. I could write code for an example in 68k while someone else does x86 or arm, to see if and how 68k is superior to x86 and arm (or, for that matter, to anything else). Any sample of significant size (20-40 instructions) doing some useful work is ok. For example I can do that pi-spigot main loop in just 9 instructions. I don't think x86 can do that. Nor do i think arm can. And this, while it's too short to show much anyway. Anyone can attempt to prove me wrong.
But I think this whole OT should be stopped now, or at least moved to a dedicated thread. Last edited by meynaf; 03 February 2017 at 09:58. Reason: forgot something |
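On the BCD point in this post, the two styles can be compared side by side. This is a sketch with invented function names, assuming valid packed-BCD input bytes: `bcd_add_direct` models a one-step packed-BCD add with carry in and out (the kind of result ABCD produces), and `bcd_add_then_adjust` models a binary add followed by a DAA-like fix-up that relies on a half carry out of the low nibble, the "secondary carry" mentioned above.

```python
def bcd_add_direct(a, b, carry_in=0):
    """Add two packed-BCD bytes digit by digit; return (result, carry_out)."""
    lo = (a & 0x0F) + (b & 0x0F) + carry_in
    carry = 0
    if lo > 9:
        lo -= 10
        carry = 1
    hi = (a >> 4) + (b >> 4) + carry
    carry_out = 0
    if hi > 9:
        hi -= 10
        carry_out = 1
    return (hi << 4) | lo, carry_out


def bcd_add_then_adjust(a, b, carry_in=0):
    """Binary add, then a DAA-style decimal adjust; return (result, carry_out)."""
    raw = a + b + carry_in
    half_carry = (a & 0x0F) + (b & 0x0F) + carry_in > 0x0F   # carry out of bit 3
    old_al = raw & 0xFF
    carry = raw > 0xFF
    al = old_al
    if (old_al & 0x0F) > 9 or half_carry:
        al += 0x06                      # fix the low digit
    if old_al > 0x99 or carry:
        al += 0x60                      # fix the high digit
        carry = True
    return al & 0xFF, int(carry)


if __name__ == "__main__":
    print([hex(x) for x in bcd_add_direct(0x38, 0x45)])       # 38 + 45
    print([hex(x) for x in bcd_add_then_adjust(0x99, 0x01)])  # 99 + 01
```

Both routines agree for every pair of valid BCD bytes; the difference the post is pointing at is that the second needs an extra flag and an extra step to get there.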
||||
03 February 2017, 10:46 | #20 | ||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,796
|
Thanks for the code density explanation
Quote:
Quote:
On 68k larger bases are still useful if you don't want largish tables (2 BCD digits x 2 BCD digits = 93636 byte table). It's also potentially better if code that uses tables wouldn't fit in the cache while mul + div would. Edit: Actually, mul + div might be faster than a table approach on 68020/30 because of all the overhead instructions. Last edited by Thorham; 03 February 2017 at 11:07. |
||