English Amiga Board


Old 17 June 2013, 00:13   #221
Stedy
Registered User
 
Join Date: Jan 2008
Location: United Kingdom
Age: 46
Posts: 733
@Majsta

Well done. I know you had a few dark days but you persevered and have some impressive results.

With regard to the SDRAM stability issues I have a few ideas that may help.

1) How did you extend the SDRAM controller from 4 bit to 16 bit? You must update the page refresh boundaries.
2) On your PCB, did you match all PCB track lengths on all SDRAM signals to +/- 10mm?
3) Do you use a testbench to verify the FPGA design? You should. Did you know you can get VHDL/Verilog models of the SDRAM?
4) Did you update the code to write the mode registers in the SDRAM when you moved to 16 bit?

If you tell me the SDRAM part numbers, old and new, I can double check a few items for you.

Ian
Old 17 June 2013, 00:25   #222
codeflash
Registered User
 
Join Date: Jun 2013
Location: Erimang
Posts: 45
177 000 MIPS is used in a system that wastes enough to make it feel like a 2 MIPS system ..
For starters, protected mode can reduce the actual speed by a factor of 3 (80386).

As for performance benchmarking: there ought to be something mathematical one can calculate to get a more real-life benchmark, like Huffman coding, the Fibonacci sequence, Pascal's triangle, sorting or some fixed-point trigonometry, i.e. how long some real work actually takes.

SysInfo tries to measure and estimate how long real work would take. But if one measures real work instead of some virtual task, the result ought to be correct. The task used for measuring should probably be larger than any cache.

As for old hardware and FPGAs in general: better to get accurate hardware descriptions (VHDL/Verilog) while there is still anything to model and test. With time old hardware will go *p00f* and FPGAs will get cheaper, faster and larger.
Old 17 June 2013, 00:31   #223
TCD
HOL/FTP busy bee
 
Join Date: Sep 2006
Location: Germany
Age: 46
Posts: 31,525
Quote:
Originally Posted by codeflash
177 000 MIPS is used in a system that wastes enough to make it feel like a 2 MIPS system ..
You sure know your stuff

The whole thread is getting derailed into a general 'FPGA possibilities' discussion. Might be time to split some posts into a new thread...
Old 18 June 2013, 05:33   #224
codeflash
Registered User
 
Join Date: Jun 2013
Location: Erimang
Posts: 45
@Majsta, any progress?
Old 19 June 2013, 17:28   #225
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by codeflash
As for performance benchmarking: there ought to be something mathematical one can calculate to get a more real-life benchmark, like Huffman coding, the Fibonacci sequence, Pascal's triangle, sorting or some fixed-point trigonometry, i.e. how long some real work actually takes.
How about a simple bubble sort? It's a tight loop that gives many modern processors fits (especially superscalar RISC), but it's realistic and a really good test. Here is an 8400 byte Amiga 68k program I prepared:

http://www.heywheel.com/matthey/Amiga/sortbench

It requires a 68020+ but does not use the FPU, ixemul.library or 64 bit integer instructions. Here is some more info on the benchmark:

http://www.apollo-core.com/sortbench/
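
For anyone who has not seen the kind of loop being timed, here is a minimal sketch in Python of a bubble sort with a throughput figure attached. This is not the sortbench source (that is C, see the links above). The 4-byte element size, the function names and the MB/s definition used here (bytes stepped over by the inner loop, divided by elapsed time) are all assumptions made for illustration, so the numbers it prints are not comparable with sortbench results.

```python
import random
import time

ELEM_SIZE = 4  # assumed bytes per element (hypothetical, not taken from sortbench)


def bubble_sort(data):
    """Sort data in place and return the number of comparisons made."""
    comparisons = 0
    n = len(data)
    while n > 1:
        last_swap = 0
        for i in range(1, n):
            comparisons += 1
            if data[i - 1] > data[i]:
                data[i - 1], data[i] = data[i], data[i - 1]
                last_swap = i
        # Everything from last_swap onward is already in its final place.
        n = last_swap
    return comparisons


def scan_rate_mb_s(count, seed=1):
    """Hypothetical metric: bytes stepped over by the inner loop per second."""
    rng = random.Random(seed)
    data = [rng.randrange(1 << 16) for _ in range(count)]
    start = time.perf_counter()
    comparisons = bubble_sort(data)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    assert data == sorted(data)
    return comparisons * ELEM_SIZE / elapsed / 1e6


if __name__ == "__main__":
    for count in (256, 512, 1024):
        print(f"{count:5d} elements: {scan_rate_mb_s(count):8.2f} MB/s")
```

The inner loop is only a compare, a conditional swap and a branch, which is why results depend so heavily on branch handling and on whether the array fits in the data cache.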

The 68060 results listed there are from version 1.0, which used the FPU. Here are the best results from this integer-only version on my 68060@75MHz:

1k 204 MB/s
2k 114 MB/s
4k 88 MB/s
8k 77 MB/s
16k 71 MB/s
32k 69 MB/s

The first few numbers can vary a lot while the later ones are consistent. The 68060 performance in MB/s/MHz is very good (beaten only by OoO x86/x64). I'd love to see the numbers from the TG68 to see how close it is. My guess is that it is not close. It should test well compared to a 68020 or 68030 though.
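
To make the MB/s/MHz comparison concrete, the table above can be divided by the clock speed. This is a small Python sketch of that arithmetic only; the figures and the 75 MHz clock come from this post, while the function and variable names are made up for illustration.

```python
CLOCK_MHZ = 75  # from the post: 68060 @ 75 MHz

# (size label, MB/s) pairs copied from the results above
RESULTS_68060 = [
    ("1k", 204),
    ("2k", 114),
    ("4k", 88),
    ("8k", 77),
    ("16k", 71),
    ("32k", 69),
]


def per_mhz(results, clock_mhz):
    """Normalise each MB/s figure by the clock so different CPUs can be compared."""
    return [(label, mb_s / clock_mhz) for label, mb_s in results]


for label, value in per_mhz(RESULTS_68060, CLOCK_MHZ):
    print(f"{label:>3}: {value:.2f} MB/s/MHz")
```

That works out to roughly 2.72 MB/s/MHz at 1k, dropping to 0.92 MB/s/MHz at 32k.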

Quote:
Originally Posted by codeflash
SysInfo tries to measure and estimate how long real work would take. But if one measures real work instead of some virtual task, the result ought to be correct. The task used for measuring should probably be larger than any cache.
The cache performance tells you what the CPU is capable of and how good the design is. Not using the caches measures the memory performance too, which is less interesting. Most realistic programs operate in the caches. Don't we want realistic performance results?
Old 19 June 2013, 17:36   #226
blizz1220
Registered User
 
Join Date: Jun 2013
Location: Belgrade / Serbia
Posts: 48
Quote:
Originally Posted by matthey
It requires a 68020+ but does not use the FPU, ixemul.library or 64 bit integer instructions. Here is some more info on the benchmark:

http://www.apollo-core.com/sortbench/
I'm scared to ask, but what is this Apollo core? Who, what, where???
Old 19 June 2013, 17:58   #227
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by blizz1220
I'm scared to ask, but what is this Apollo core? Who, what, where???
It's an enhanced 68k FPGA CPU programmed in VHDL, much like the TG68. It's based on Jens' N68050 for the Natami. Gunnar von Boehn has done most of the new work. The current design is superscalar, fully pipelined and has an efficient caching system, as did the N68050. In other words, it's more advanced than the TG68, if it ever works.
Old 19 June 2013, 18:35   #228
blizz1220
Registered User
 
Join Date: Jun 2013
Location: Belgrade / Serbia
Posts: 48
Quote:
Originally Posted by matthey
It's an enhanced 68k FPGA CPU programmed in VHDL, much like the TG68. It's based on Jens' N68050 for the Natami. Gunnar von Boehn has done most of the new work. The current design is superscalar, fully pipelined and has an efficient caching system, as did the N68050. In other words, it's more advanced than the TG68, if it ever works.
Has anyone tried it so far???
Old 19 June 2013, 18:51   #229
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by blizz1220
Has anyone tried it so far???
Only in simulation and on a custom development board so far. Sortbench needs the Amiga timers, so there are no results from it yet. I've been trying to talk Gunnar into providing it to the TiNA project for testing. I think their FPGA is big enough, unlike the one Majsta uses. The FPGA Arcade guys seem to be too busy to talk or release anything :/.
Old 19 June 2013, 19:05   #230
blizz1220
Registered User
 
Join Date: Jun 2013
Location: Belgrade / Serbia
Posts: 48
Quote:
Originally Posted by matthey
Only in simulation and on a custom development board so far. Sortbench needs the Amiga timers, so there are no results from it yet. I've been trying to talk Gunnar into providing it to the TiNA project for testing. I think their FPGA is big enough, unlike the one Majsta uses. The FPGA Arcade guys seem to be too busy to talk or release anything :/.
Thanks for the information...
MikeJ is making his own core; I remember reading that he has no interest in this... If Gunnar REALLY pulled it off, hats off to him...
Old 19 June 2013, 19:17   #231
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by blizz1220
Thanks for the information...
MikeJ is making his own core; I remember reading that he has no interest in this... If Gunnar REALLY pulled it off, hats off to him...
Isn't Mike's core a customized TG68? I don't have much confidence in Gunnar, but I continue to support his attempts, hoping something good will come from it. I wish Jens (N68050) was leading the project and more involved.
Old 19 June 2013, 21:29   #232
codeflash
Registered User
 
Join Date: Jun 2013
Location: Erimang
Posts: 45
@matthey, I think a test has to be able to run on a bare-bones A500 to be able to compare the whole product line of Amigas. Perhaps this could also make it possible to run the same test on Atari STs to see how the whole system compares when it comes to raw performance.

You also have a good point regarding cache. It's more relevant to test within cache. But testing cache-thrashing routines could have its uses too.

Are there any Amiga farms anywhere, where you can test a program on all machines in one go?
Old 19 June 2013, 22:23   #233
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by codeflash
@matthey, I think a test has to be able to run on a bare-bones A500 to be able to compare the whole product line of Amigas. Perhaps this could also make it possible to run the same test on Atari STs to see how the whole system compares when it comes to raw performance.
The original C source code is given on the Apollo site. I modified it to use 64 bit integer math instead of the FPU and figured out how to get it to compile with vbcc (gcc needs the ixemul.library, which is big and slow) and code from Frank's posix library (it didn't work out of the box). I can compile for 68000, but does anyone really want to test on a 68000? It would probably take half an hour to finish and I'm not sure it would even run on AmigaOS 1.x. The 68000 might be <1 MB/s and not even register. How about someone posts results from something faster first?

Quote:
Originally Posted by codeflash
You also have a good point regarding cache. It's more relevant to test within cache. But testing cache-thrashing routines could have its uses too.
Our 68k processors don't have large enough caches for the larger sizes in this test, so it really tests both. Modern processors (32k/32k, lvl 1 & 2 caches) have pretty consistent performance for all sizes, while my 68060 at least falls off quickly past its 8k/8k cache limits but actually performs well from memory too.

Quote:
Originally Posted by codeflash
Are there any Amiga farms anywhere, where you can test a program on all machines in one go?
There are probably some Amiga "collections" that could.
Old 19 June 2013, 22:52   #234
blizz1220
Registered User
 
Join Date: Jun 2013
Location: Belgrade / Serbia
Posts: 48
@matthey

I could be wrong, Matt, but I'm pretty sure that MikeJ said that the core was initially based on Minimig but that later he practically had to write it from scratch to get the best AGA compatibility. He also said that he intends to release the code as GPL, but only when it is complete and the boards are out. This is from memory, though...
Old 20 June 2013, 01:07   #235
majsta
www.majsta.com
 
Join Date: Jun 2010
Location: Banjaluka/Republic of Srpska
Age: 43
Posts: 448
@matthey
First of all, I have no idea how you compiled the code, because I have been trying to compile it for about 5 hours without luck.
Ixemul just reported something like "ssystem() not supported anymore".
Did you compile it with gcc -o -noixemul, or did you use gccv, or were there some changes in the code? I would really like to know how you compiled it. Oh, now I have read the rest of the posts.
Anyhow, yesterday I received this compiled version and ran the TG68 tests, without cache or anything, but I don't want to say that the results are low.

1k 2 MB/s

Then the system crashed due to errors in my code. But I think my system can't be tested like this, because these tests exercise the caches inside the core, and the TG68 doesn't have a data cache, for example. All the performance I get comes from the external cache between the TG68 and the SDRAM.

For comparison, MC68040:

1k 8 MB/s

Regarding adding a different core to my project: I said that it could take me about 2 hours to redesign the complete thing for, let's say, a Cyclone 5 or something. It has 115 000 LEs, or whatever they call them these days. But then again, let's finish what I have started, and for now it seems that I'm going nowhere. Today I attached a new SDRAM controller, completely different from those used before, and the problems remain. So maybe the problems are somewhere else. We will see.

Last edited by majsta; 20 June 2013 at 01:16.
Old 20 June 2013, 01:24   #236
blizz1220
Registered User
 
Join Date: Jun 2013
Location: Belgrade / Serbia
Posts: 48
@majsta

Great work.
Get plenty of rest and don't push yourself; we're not in a hurry.
Old 20 June 2013, 02:19   #237
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by majsta
@matthey
First of all, I have no idea how you compiled the code, because I have been trying to compile it for about 5 hours without luck.
Ixemul just reported something like "ssystem() not supported anymore".
Did you compile it with gcc -o -noixemul, or did you use gccv, or were there some changes in the code? I would really like to know how you compiled it. Oh, now I have read the rest of the posts.
GCC 3.4.0 worked the first time here but required an FPU and ixemul (the default). I used "GCC -O2 -m68060 -fomit-frame-pointer". "-O3" works also but won't inline for some reason, and the output is the same size. The -m68060 keeps the integer 64 bit math functions from being used, which could be a problem on an FPGA 68k CPU. Also, there are different versions of ixemul.library that users can have, and it takes 150k-300k of memory. My vbcc build is integer only with no ixemul dependency. The source is not pretty because I included parts of Frank Wille's posix library to get it to build. I'm also using a new unreleased version of vbcc that is in testing. Vbcc generates slightly better code than GCC for this test. The sort algorithm was originally a 68k assembler routine (look at the variable names), but 68k C compilers could not faithfully reproduce it, so I did a few hand optimizations where needed, as Gunnar requested. Surprisingly, the hand optimizations don't make much difference to the results.

Quote:
Originally Posted by majsta
Anyhow, yesterday I received this compiled version and ran the TG68 tests, without cache or anything, but I don't want to say that the results are low.

1k 2 MB/s

Then the system crashed due to errors in my code. But I think my system can't be tested like this, because these tests exercise the caches inside the core, and the TG68 doesn't have a data cache, for example. All the performance I get comes from the external cache between the TG68 and the SDRAM.

For comparison, MC68040:

1k 8 MB/s
The 68040 with 4k/4k caches only tested 8 MB/s and the TG68 did 2 MB/s without a data cache? I'm surprised the 68040 is that weak, but the TG68 did well in comparison. The best comparison to your current TG68 would probably be a 68020, as it also has no data cache. It may not test high enough to give 1 MB/s though. I look forward to hearing some more results as the bugs are fixed and a data cache is added. A data cache should provide a nice speedup for this benchmark.

Quote:
Originally Posted by majsta
Regarding adding a different core to my project: I said that it could take me about 2 hours to redesign the complete thing for, let's say, a Cyclone 5 or something. It has 115 000 LEs, or whatever they call them these days. But then again, let's finish what I have started, and for now it seems that I'm going nowhere. Today I attached a new SDRAM controller, completely different from those used before, and the problems remain. So maybe the problems are somewhere else. We will see.
It makes sense to test the hardware with a simple and more tested FPGA CPU design first. You are doing a good job of getting some performance out of it. I don't think Gunnar's CPU is quite ready to try yet anyway.
Old 20 June 2013, 18:28   #238
majsta
www.majsta.com
 
Join Date: Jun 2010
Location: Banjaluka/Republic of Srpska
Age: 43
Posts: 448
But as I recall, there was an ixemul build meant for porting code from Unix to the Amiga, and there was also an ixemul without FPU. Never mind, it was just so strange to me that I couldn't compile it. Regarding the benchmark tests, I think that when I solve the problems with my code I will have better results.
Old 20 June 2013, 20:55   #239
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by majsta
But as I recall, there was an ixemul build meant for porting code from Unix to the Amiga, and there was also an ixemul without FPU. Never mind, it was just so strange to me that I couldn't compile it. Regarding the benchmark tests, I think that when I solve the problems with my code I will have better results.
Getting a properly working GCC install on the Amiga is no easy task. There are also several versions of ixemul that behave differently and have different bugs. They are not very Amiga friendly, being basically a BSD (Unix/Linux like) OS running on the Amiga. It's a mess. Vbcc generates slightly better code for this test anyway. Here is the messy code (contains parts of Frank Wille's posix lib) that I have included:

http://www.heywheel.com/matthey/Amiga/sortbench.c

It should be easy to compile now with "vc -c99 -O2 -cpu=68060 sortbench.c". The "-O2" and "-cpu=68060" can be changed to anything of your liking.

I think the sortbench results could easily double with a data cache. Tobias Gubener's TG68k design performs very well considering the limited pipelining. It looks like 68040 performance will be surpassed in a Cyclone series fpga which is awesome!
Old 20 June 2013, 21:45   #240
Mrs Beanbag
Glastonbridge Software
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
Not convinced that a bubble sort is an especially realistic benchmark. It's a very simple and slow sort, so it's only ever used in limited circumstances (and even then insertion sort is often preferable). Quicksort is still one of the most popular sorts used in real programs, although you could try a few different sorts; heapsort is very neat.
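
To put rough numbers on "simple and slow", here is a small Python sketch that counts comparisons for a plain bubble sort and an insertion sort on the same input. As an O(n log n) reference it also counts comparisons made by Python's built-in sort, which is Timsort and not quicksort, so treat that line as a stand-in only. All function names are made up for illustration, and comparison counts say nothing about cache or branch behaviour on a real 68k.

```python
import random
from functools import cmp_to_key


def bubble_comparisons(values):
    """Plain bubble sort without early exit; returns the comparison count."""
    data = list(values)
    count = 0
    for end in range(len(data) - 1, 0, -1):
        for i in range(end):
            count += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    assert data == sorted(values)
    return count


def insertion_comparisons(values):
    """Insertion sort; returns the comparison count."""
    data = list(values)
    count = 0
    for j in range(1, len(data)):
        key = data[j]
        i = j - 1
        while i >= 0:
            count += 1
            if data[i] <= key:
                break
            data[i + 1] = data[i]
            i -= 1
        data[i + 1] = key
    assert data == sorted(values)
    return count


def builtin_comparisons(values):
    """Comparisons made by Python's built-in sort (Timsort), for reference."""
    count = 0

    def compare(a, b):
        nonlocal count
        count += 1
        return (a > b) - (a < b)

    sorted(values, key=cmp_to_key(compare))
    return count


rng = random.Random(42)
sample = [rng.randrange(100000) for _ in range(1000)]
print("bubble:   ", bubble_comparisons(sample))
print("insertion:", insertion_comparisons(sample))
print("built-in: ", builtin_comparisons(sample))
```

The plain bubble sort always makes n(n-1)/2 comparisons, while insertion sort makes only n-1 on already sorted input, which is one reason it is often preferred for small or nearly sorted data.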

I was reading about the Dhrystone benchmark the other day. Now something to point out, since somebody mentioned the 177,000 MIPS on the Core i7: that's not really MIPS, it's DMIPS (Dhrystone MIPS). Just a little history here... the VAX 11/780 was taken as a canonical 1 MIPS computer, and the Dhrystone benchmark was calibrated to that, so that the VAX got exactly 1 DMIPS. There are problems with interpreting the result as "MIPS" because:

1. it actually measures overall system performance not just "instructions executed," since Dhrystone doesn't actually define a fixed number of machine-language instructions, but rather a set of "realistic" tasks in a loop. One simply counts the number of loop iterations performed in a certain time and divides by the VAX score to get the DMIPS. Newer CPUs can fit the whole loop in the cache, vastly inflating the score. The Core i7 doesn't really execute 9 instructions per core per clock! It can do up to 6 if the wind is blowing the right way.

2. The VAX was an assumed 1 MIPS machine, but in reality it got about half of that.
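
As a worked example of the arithmetic, here is a short Python sketch. The divisor 1757 is the VAX 11/780 Dhrystones-per-second score conventionally used to define 1 DMIPS. The 177,000 figure is the one quoted in this thread; the 6 cores and 3.33 GHz clock are my assumptions about which Core i7 was meant, and the function names are made up.

```python
VAX_DHRYSTONES_PER_SEC = 1757  # VAX 11/780 reference score that defines 1 DMIPS


def dmips(dhrystones_per_sec):
    """Convert a raw Dhrystone loop rate into DMIPS."""
    return dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC


def dmips_per_core_per_mhz(dmips_value, cores, clock_mhz):
    """Spread a whole-chip DMIPS figure over cores and clock."""
    return dmips_value / (cores * clock_mhz)


# 177,000 DMIPS is quoted in the thread; core count and clock are assumptions.
per_clock = dmips_per_core_per_mhz(177_000, cores=6, clock_mhz=3330)
print(f"{per_clock:.2f} DMIPS per core per MHz")
```

Under those assumptions the result is a little under 9 per core per clock, which is where the implausible "9 instructions per core per clock" reading comes from.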