English Amiga Board


Old 16 June 2013, 09:07   #201
lukassid
Registered User
 
Join Date: Oct 2012
Location: Surrey
Posts: 390
Hey guys please let's not start giving personal preferences or wishes again.
Instead support Majsta in his quest.
Old 16 June 2013, 10:22   #202
adrianh78
Registered User
 
Join Date: Nov 2012
Location: Northampton
Posts: 25
14 MIPS on an A600 - absolutely incredible

Well done sir

@majsta - if you don't mind me saying so, this is totally your quest, so if your heart has got you this far on an A600 project, if it was me I would see it through to its conclusion rather than taking on other projects. Plenty of time for those later on, if you then wish?

If you could keep the MIPS higher that would be amazing; I could see SCUMM ECS as an immediate application opened up with more power available.

I'll finish my post with this
Old 16 June 2013, 10:29   #203
Dokugogagoji
 
Posts: n/a
Impressive work!

Can't wait to see your next results!
 
Old 16 June 2013, 10:32   #204
adrianh78
Registered User
 
Join Date: Nov 2012
Location: Northampton
Posts: 25
@majsta - take your time and enjoy the ride - we'll enjoy reading about it
Old 16 June 2013, 11:25   #205
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,574
Quote:
Originally Posted by matthey View Post
The TG68 isn't going to do it in a reasonably priced fpga despite what SysInfo says.
Exactly, do not use SysInfo to measure CPU speed. Never use SysInfo to measure CPU speed! At least do not use it to compare results with a 68060! Never ever.

The SysInfo speed benchmark is totally stupid: any CPU that can execute multiple instructions at the same time (68060) will get results that are too slow, because the code is extremely stupid. It also uses very long instruction sequences and no loops like normal real-world code has, which makes caches (at least the small 68020/030 caches) mostly useless.

Yes, it does test CPU speed but it does not test real world CPU speed.

Code:
<snip>
14077F90 4840                     SWAP.W D0
14077F92 c141                     EXG.L D0,D1
14077F94 4281                     CLR.L D1
14077F96 4480                     NEG.L D0
14077F98 d080                     ADD.L D0,D0
14077F9A d080                     ADD.L D0,D0
14077F9C d080                     ADD.L D0,D0
14077F9E d080                     ADD.L D0,D0
14077FA0 0680 0000 3039           ADD.L #$00003039,D0
14077FA6 0480 0000 3039           SUB.L #$00003039,D0
14077FAC 207c 1407 9250           MOVEA.L #$14079250,A0
14077FB2 2028 0004                MOVE.L (A0, $0004) == $14079254,D0
14077FB6 2028 0008                MOVE.L (A0, $0008) == $14079258,D0
14077FBA 2028 000c                MOVE.L (A0, $000c) == $1407925c,D0
</snip>
It is difficult for the CPU to execute multiple instructions at the same time when nearly all instructions use and modify the same registers!
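To make the dependency point concrete, here is a minimal sketch that walks the listing above with a toy dual-issue rule. The rule is an assumption made for illustration: two adjacent instructions may issue together only if the second neither reads nor overwrites a register the first writes. It ignores result forwarding and the 68060's real pairing rules, and the `Insn`, `can_pair` and `issue_cycles` names are hypothetical helpers, not part of any tool mentioned in the thread.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    text: str
    reads: frozenset
    writes: frozenset


def insn(text, reads="", writes=""):
    """Build an instruction record from space-separated register names."""
    return Insn(text, frozenset(reads.split()), frozenset(writes.split()))


# The 14 instructions from the listing, with registers annotated by hand.
SEQUENCE = [
    insn("SWAP.W D0", "D0", "D0"),
    insn("EXG.L D0,D1", "D0 D1", "D0 D1"),
    insn("CLR.L D1", "", "D1"),
    insn("NEG.L D0", "D0", "D0"),
    insn("ADD.L D0,D0", "D0", "D0"),
    insn("ADD.L D0,D0", "D0", "D0"),
    insn("ADD.L D0,D0", "D0", "D0"),
    insn("ADD.L D0,D0", "D0", "D0"),
    insn("ADD.L #$00003039,D0", "D0", "D0"),
    insn("SUB.L #$00003039,D0", "D0", "D0"),
    insn("MOVEA.L #$14079250,A0", "", "A0"),
    insn("MOVE.L $0004(A0),D0", "A0", "D0"),
    insn("MOVE.L $0008(A0),D0", "A0", "D0"),
    insn("MOVE.L $000c(A0),D0", "A0", "D0"),
]


def can_pair(first, second):
    """Toy rule (assumption): the second instruction may not read or
    overwrite any register that the first one writes."""
    return not (first.writes & (second.reads | second.writes))


def issue_cycles(seq):
    """Greedy in-order dual issue: two instructions per cycle when the
    toy rule allows it, otherwise one."""
    cycles = 0
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and can_pair(seq[i], seq[i + 1]):
            i += 2
        else:
            i += 1
        cycles += 1
    return cycles


print(len(SEQUENCE), "instructions,", issue_cycles(SEQUENCE), "cycles")
```

Under this toy rule only two pairs can issue together (CLR with NEG, and SUB with MOVEA), so the 14 instructions take 12 issue cycles, which is barely better than a single-issue pipeline.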
Old 16 June 2013, 11:40   #206
majsta
www.majsta.com
 
Join Date: Jun 2010
Location: Banjaluka/Republic of Srpska
Age: 43
Posts: 448
@Loedown You are missing one important thing regarding PPC and BGA. Emulating PPC in an FPGA would require an FPGA with a lot of LEs, and the only FPGAs capable of holding complex designs with lots of LEs also come in BGA packages, actually FBGA or UFBGA. Another thing: the design will be open-sourced, so there are no special requirements for a specific FPGA or other parts.

@all Yes, many projects for the Amiga were cancelled because the community constantly demanded changes or upgrades, but here we are beyond that, because on the same hardware design we can add whatever we want later. It is enough to properly connect all unused FPGA pins to a header, and on that header we can add the extra hardware. That is the reason why I created 2 headers that are compatible with almost all cheap Arduino add-on boards. If you look at the Arduino project you will notice that simple wireless or USB adapters cost a few USD, so that way we will never have expensive Amiga hardware. What I'm trying to do is stop the exploitation of the Amiga scene, where people pay a bunch of money for 1 MB of memory. The complete goal is to have the fastest Amiga accelerator for the A600 at the lowest possible price, and I have managed to remove some parts from the design to reduce manufacturing costs and drop the price to 90 euros. What is most important here is that if you don't like the price you can try to build it yourself, because you will have all the documents needed when the time comes.
Old 16 June 2013, 12:03   #207
Loedown
Precious & fragile things
 
Join Date: Feb 2009
Location: Victoria, Australia
Posts: 1,946
Quote:
Originally Posted by majsta View Post
@Loedown You are missing one important thing regarding PPC and BGA. Emulating PPC in an FPGA would require an FPGA with a lot of LEs, and the only FPGAs capable of holding complex designs with lots of LEs also come in BGA packages, actually FBGA or UFBGA. Another thing: the design will be open-sourced, so there are no special requirements for a specific FPGA or other parts.
So sadly true, but I have noticed that manufacturers seem to be starting to steer away from BGA; it's a flawed system IMHO.

Is it then possible to use several of your FPGAs in parallel to do complex emulation?

I am genuinely not trying to hijack the thread but I am also interested in future-proofing your solution.
Old 16 June 2013, 12:21   #208
majsta
www.majsta.com
 
Join Date: Jun 2010
Location: Banjaluka/Republic of Srpska
Age: 43
Posts: 448
Connecting several FPGAs could be done; actually it is an easy task. For example, one could be used just for cache, but there is no point in doing that under current conditions. The design that achieved the latest performance I presented takes about 6000 LE and could be shrunk further with code optimizations, so using a larger FPGA is not an option, simply because there is no need for it. For example, complete Amiga recreation code can take about 20,000 LE, and these days we have FPGAs with more than 100,000 LE.
Old 16 June 2013, 15:15   #209
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by majsta View Post
@codeflash as I have stated a number of times, it is not important what core you use, what speed that core runs at, or even what CPU you use; it is all about the cache. Any core or CPU used has to replicate 7MHz bus cycles on one side, while on the other side it works at a higher speed when communicating with SDRAM. So without cache the total performance of this design is about 2.64 MIPS; adding cache is what gives the performance boost.
Cache performance makes a huge difference. That is the biggest reason why a 68k processor destroyed the PowerPC 440 and NIOS processors in the benchmarks I linked. When the cache handling is efficient, the next bottleneck will limit performance. That may be the lack of pipelining on the TG68 or something else.
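As a rough illustration of why the cache dominates, the sketch below blends a "miss" rate and a "hit" rate of execution by time per instruction. The 2.64 MIPS no-cache figure comes from the quoted post; the 20 MIPS all-hits figure and the function name `effective_mips` are assumptions chosen only for illustration, and treating every instruction as either a full hit or a full miss is a deliberate simplification.

```python
def effective_mips(hit_rate, hit_mips, miss_mips):
    """Blend two execution rates by time per instruction.

    hit_rate  -- fraction of instructions served from cache (0.0 to 1.0)
    hit_mips  -- throughput if every access hits the cache (assumed value)
    miss_mips -- throughput if every access goes to the slow side
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Average time per instruction is the weighted mean of the two times.
    time_per_insn = hit_rate / hit_mips + (1.0 - hit_rate) / miss_mips
    return 1.0 / time_per_insn


NO_CACHE_MIPS = 2.64          # figure quoted in the thread
HYPOTHETICAL_HIT_MIPS = 20.0  # assumption, not a measured value

for rate in (0.0, 0.5, 0.9, 0.99):
    mips = effective_mips(rate, HYPOTHETICAL_HIT_MIPS, NO_CACHE_MIPS)
    print(f"hit rate {rate:.2f}: {mips:.2f} MIPS")
```

Because the slow accesses are weighted by time, a modest miss rate already pulls the result well below the all-hits figure, which is why cache handling tends to be the first bottleneck.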

I agree with your other points and nice work on the performance. I hope you find the memory problem.

Quote:
Originally Posted by Toni Wilen View Post
Exactly, do not use SysInfo to measure CPU speed. Never use SysInfo to measure CPU speed! At least do not use it to compare results with a 68060! Never ever.

...

Yes, it does test CPU speed but it does not test real world CPU speed.
It's about the worst benchmark ever.

Quote:
Originally Posted by Toni Wilen View Post
It is difficult for the CPU to execute multiple instructions at the same time when nearly all instructions use and modify same registers!
Some modern processors would have a bigger problem but the 68060 sometimes handles multiple consecutive instructions using the same register by using early instruction completion and register forwarding when the results are longword. There usually is a change/use delay when loading an address register (or scale register) and using it right away (none in this case for 060) but the same applies to the 68040 (which has bigger penalties). The 68060 can only do SWAP and EXG in the pOEP (primary integer unit) so they aren't 2x as fast. They should have put SWAP in both OEP considering the 68k does not have a shift >8 and the result can be forwarded.

Last edited by matthey; 16 June 2013 at 17:58.
Old 16 June 2013, 16:29   #210
Mrs Beanbag
Glastonbridge Software
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
Quote:
Originally Posted by matthey View Post
Some modern processors would have a problem but the 68060 handles multiple consecutive instructions using the same register by using register forwarding as long as the results are longword which they are in the code you provided.
Great for basic one-instruction-per-cycle pipelining, but not superscalar. You can't execute two instructions at once if they use the same register, or you would have to forward the result back in time!
Old 16 June 2013, 17:00   #211
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by Mrs Beanbag View Post
Great for basic one-instruction-per-cycle pipelining, but not superscalar. You can't execute two instructions at once if they use the same register, or you would have to forward the result back in time!
Sometimes you can. I should say the 68060 handles these problems as well as possible when the registers are a longword. There are several common cases where the 68060 can do multiple instructions in the same cycle using the same register because of early instruction completion/retirement and register forwarding. This code is just about worst case for a modern processor but the 68060 can still do better than 1 instruction per cycle (barely) and doesn't need time travel.

Last edited by matthey; 16 June 2013 at 17:52.
Old 16 June 2013, 17:37   #212
Mrs Beanbag
Glastonbridge Software
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
Quote:
Originally Posted by matthey View Post
Sometimes you can. I should say the 68060 handles these problems as well as possible when the registers are a longword. There are several common cases where the 68060 can do multiple instructions in the same cycle using the same register because of early instruction completion/retirement and register forwarding.
If the source operand is the same for both instructions it can do what it likes... if the second instruction depends on the result of the first, it can't do the second until after the first has executed in the ALU. Register forwarding bypasses the register writeback/register read cycle to pass the result straight back into the ALU ready for the next cycle. You can't do two calculations at the same time and have one of them depend on the result of the other, that's not just an architectural problem, that's a basic logical impossibility.*

Well there are instances, I suppose, where two instructions could be replaced by one. For instance the above code where Add D0,D0; Add D0,D0 can be replaced by Lsl #2,D0. I believe people do write such things in real code because on a 68000 it is faster than the equivalent shift. But you'd have to be some kind of maniac to design a CPU to do that when a programmer/compiler could just avoid doing such tricks.
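The equivalence mentioned here can be checked with a small model of 32-bit register arithmetic. This is only a sketch of the result values: the helper names `add_l`, `lsl_l` and `double_twice` are made up for the example, and condition codes and cycle counts are not modelled.

```python
MASK32 = 0xFFFFFFFF  # a 68k data register holds 32 bits


def add_l(dst, src):
    """Result of ADD.L src,dst, truncated to 32 bits (flags ignored)."""
    return (dst + src) & MASK32


def lsl_l(value, count):
    """Result of LSL.L #count,Dn, truncated to 32 bits (flags ignored)."""
    return (value << count) & MASK32


def double_twice(d0):
    """ADD.L D0,D0 followed by ADD.L D0,D0."""
    d0 = add_l(d0, d0)
    d0 = add_l(d0, d0)
    return d0


samples = [0, 1, 0x3039, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
for value in samples:
    # Both sequences multiply by four modulo 2**32.
    assert double_twice(value) == lsl_l(value, 2)
print("two ADD.L D0,D0 match LSL.L #2,D0 for all samples")
```

Both forms multiply the register by four modulo 2^32, so the choice between them only affects timing and flags, not the value left in D0.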

Also if the second instruction is a Move, well the result of the first instruction is going to get trashed anyway so the CPU could just not even bother doing it. But that would also be a crazy thing for the programmer to do.

*Until they invent practical quantum computers
Old 16 June 2013, 17:59   #213
codeflash
Registered User
 
Join Date: Jun 2013
Location: Erimang
Posts: 45
If the FPGA is enough for a complete Amiga, one could implement AGA inside it and stream the output to the main computer bus to be displayed by the original graphics.
Old 16 June 2013, 18:12   #214
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by Mrs Beanbag View Post
Well there are instances, I suppose, where two instructions could be replaced by one. For instance the above code where Add D0,D0; Add D0,D0 can be replaced by Lsl #2,D0. I believe people do write such things in real code because on a 68000 it is faster than the equivalent shift. But you'd have to be some kind of maniac to design a CPU to do that when a programmer/compiler could just avoid doing such tricks.
An all register ADD instruction could be completed early in the EA calculate stage and forwarded to the other OEP for the main ALU execute stage. That is similar to how longword immediates are handled in the 68060.

Code:
 moveq #9,d0 ;pOEP cycle 1, EA calc->
 add.l d0,d0 ;sOEP cycle 1, ->ALU execute
This code deserves to be in the SysInfo benchmark except the 68060 can handle it too well.

Last edited by matthey; 16 June 2013 at 18:21.
Old 16 June 2013, 18:30   #215
Mrs Beanbag
Glastonbridge Software
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
Quote:
Originally Posted by matthey View Post
An all register ADD instruction could be completed early in the EA calculate stage and forwarded to the other OEP for the main ALU execute stage. That is similar to how longword immediates are handled in the 68060.
Well one could certainly do
lea (D0,A0),A0
add.l D0,A0
and the result of the first would be available for the second, but a standard integer ADD instruction being executed by the Address Generation stage is something I never heard of.

Quote:
Originally Posted by matthey View Post
Code:
 moveq #9,d0 ;pOEP cycle 1, EA calc->
 add.l d0,d0 ;sOEP cycle 1, ->ALU execute
This code deserves to be in the SysInfo benchmark except the 68060 can handle it too well.
Of course this pair is trivial since the #9 is available immediately in any case, without the ALU having to do anything to it. Although why you wouldn't just moveq #18 I can't imagine.
Old 16 June 2013, 18:48   #216
desiv
Registered User
 
Join Date: Oct 2009
Location: Salem, OR
Posts: 1,770
Quote:
Originally Posted by matthey View Post
It takes a powerful fpga and/or a high clock speed and a good core design to beat a 68060. The TG68 isn't going to do it in a reasonably priced fpga despite what SysInfo says.
Today...

But the nice thing about designing an FPGA solution is that FPGAs are getting more powerful and less expensive over time..
Not so with the actual 060's. (At least the ones that are compatible with the Amiga)

It's just a matter of time.. ;-)

desiv
Old 16 June 2013, 19:28   #217
Mrs Beanbag
Glastonbridge Software
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
Quote:
Originally Posted by desiv View Post
Today...

But the nice thing about designing an FPGA solution is that FPGAs are getting more powerful and less expensive over time..
Not so with the actual 060's. (At least the ones that are compatible with the Amiga)

It's just a matter of time.. ;-)

desiv
TG68 is a pretty basic 68000 implementation from what I gather (I don't say that to do it down any, it's still a great achievement). A fully pipelined implementation, even without such luxuries as superscalar execution, could do very well. If we got even half of a 68060's instructions per clock, at 80MHz, it would be worth doing. Especially since memory access could be that much faster, caches that much bigger etc..
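The back-of-envelope estimate behind this is just clock rate times instructions per clock. In the sketch below the 80 MHz clock and the "half of a 68060" factor come from the post, while the 68060 instructions-per-clock value is a placeholder assumption (the thread gives no figure), and `estimated_mips` is a hypothetical helper name.

```python
def estimated_mips(clock_mhz, instructions_per_clock):
    """Throughput in MIPS for a core at the given clock and average IPC."""
    return clock_mhz * instructions_per_clock


ASSUMED_68060_IPC = 1.0  # placeholder assumption, not a measured value
CLOCK_MHZ = 80.0         # clock mentioned in the post

# "Half of a 68060's instructions per clock" at 80 MHz.
pipelined_core = estimated_mips(CLOCK_MHZ, 0.5 * ASSUMED_68060_IPC)
print(f"estimated throughput: {pipelined_core:.1f} MIPS")
```

Changing `ASSUMED_68060_IPC` scales the estimate linearly, so the conclusion depends entirely on what a real 68060 sustains on the workload in question.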
Old 16 June 2013, 19:50   #218
kipper2k
Registered User
 
Join Date: Sep 2006
Location: Thunder Bay, Canada
Posts: 4,323
I think that with the reduced availability of the older processors the time is right for FPGAs to take over. With the constant increase in the performance of these things, the speed capabilities of future upgrades for the Amiga/Atari are quite literally looking very bright.

Hats off to Majsta for sticking it out all this time; I know there were times when he felt as though it was too tough to continue.
Old 16 June 2013, 20:14   #219
amiman99
Registered User
 
Join Date: Sep 2009
Location: San Antonio, TX USA
Age: 50
Posts: 1,185
Quote:
Originally Posted by Lord Aga View Post
Madre de Dios !
Sweet Mother of God !
14.35 Mips !
Great job majsta, I remember when you almost gave up.

It's funny how we get so excited about 14.35 MIPS, when we can get processors with 177,000 MIPS (i7)
Old 16 June 2013, 20:15   #220
Mrs Beanbag
Glastonbridge Software
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
Yeah I don't understand how anyone can have anything negative to say about this project.

Whether it can ever match a 68060 or not, people are still buying 68030 and even 68020 accelerators, anyway 10-20x speed-up for an A600 with a completely home-made card is impressive. I'd like to see some real software running on it though, would be a better demo than a synthetic benchmark.

Quote:
Originally Posted by amiman99 View Post
It's funny how we get so excited about 14.35 MIPS, when we can get processors with 177,000 MIPS (i7)
14.35 MIPS is a triumph, 177,000 MIPS is a statistic.