English Amiga Board


13 August 2018, 12:10   #41
roondar
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,409
I very much doubt that the speed difference between assembly and C code is a factor of 10, all the more so because quite a few games on the Amiga (mostly American strategy games and RPGs) were in fact coded fully in C, and that worked out just fine.

There will be a speed difference, but at the moment I remain fairly convinced that the main problem here is the algorithms you use rather than the code the compiler generates.

Anyway, even if you want an assembly version as an end result for optimal performance, experimenting with the algorithms in C to see what works best might still be the best option - it'll get results faster and when you're happy you can take what you've learned and use that as the base for your assembly version.
13 August 2018, 18:42   #42
deimos
It's coming back!
Join Date: Jul 2018
Location: comp.sys.amiga
Posts: 762
Quote:
Originally Posted by roondar
at the moment I remain fairly convinced the main problem here is probably in the algorithms you use more than what actual code is being generated.
All of them though?

To get the speed I want I need to drastically reduce the time taken by all three parts of the code (3D calculations, scanline edge tracking and horizontal line drawing).
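To make the "scanline edge tracking" part concrete, here is a minimal C sketch of the usual incremental approach, assuming a 16.16 fixed-point x coordinate and non-negative screen coordinates. The names `Edge`, `edge_init` and `edge_step` are hypothetical and not taken from deimos's code. The point is that each edge costs one division at setup, after which every scanline needs only an addition and a shift.

```c
#include <stdint.h>

/* Hypothetical edge tracker: x advances by a constant 16.16 slope per scanline. */
typedef struct {
    int32_t x;      /* current x in 16.16 fixed point */
    int32_t dxdy;   /* x increment per scanline, 16.16 */
    int     y;      /* current scanline */
    int     y_end;  /* last scanline (exclusive) */
} Edge;

/* Set up an edge from (x0,y0) to (x1,y1).
 * Assumes y1 > y0 and small, non-negative screen coordinates. */
void edge_init(Edge *e, int x0, int y0, int x1, int y1)
{
    e->x     = (int32_t)x0 * 65536;
    e->dxdy  = ((int32_t)(x1 - x0) * 65536) / (y1 - y0);  /* one divide per edge */
    e->y     = y0;
    e->y_end = y1;
}

/* Return the integer x for the current scanline, then advance to the next one. */
int edge_step(Edge *e)
{
    int x = (int)(e->x >> 16);   /* integer part of the 16.16 value */
    e->x += e->dxdy;             /* per-scanline cost: one add */
    e->y++;
    return x;
}
```

A polygon filler would keep a left and a right `Edge`, call `edge_step` on both while `y < y_end`, and hand the two x values to the horizontal line drawer.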
13 August 2018, 22:36   #43
roondar
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,409
Well, the reason I'm guessing the algorithms are probably the issue is that I don't think the C compiler will slow down your code so drastically as to make it unusable.

As I've said before, I'm not a 3D expert, but reading the thread and the responses you've got from people who are more knowledgeable about 3D on the Amiga, it seems to me you may be overdoing what is needed.

What I can offer is some pointers, namely a) part of the Starglider Atari ST source (https://tcrf.net/Starglider_(Atari_ST)) and b) a neat blog from someone who managed to get a fast 3D drawing routine on Atari/Amiga 'back in the day' (https://grenouillebouillie.wordpress...n-of-3d-games/)

Of particular note are the blog's remarks on how he sped up the calculations.

Edit: another neat find might be this PDF: http://www.atarimania.com/documents/...0ST%20User.pdf
Do note it's for the Atari ST and doesn't go into too much depth, but may still be useful.

Last edited by roondar; 13 August 2018 at 22:56.
14 August 2018, 11:22   #44
zero
Registered User
Join Date: Jun 2016
Location: UK
Posts: 428
On the 68000, the compiler can cause massive slowdowns for stuff like matrix maths.

If you hand optimize the inner loops they can be orders of magnitude faster just by managing registers better.

Even the best C compilers on the most popular architectures often don't do the best job with register management because they have a lot of rules to enable function/library calls.
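One concrete way a C compiler can slow down matrix maths on the 68000 (offered as an illustration, not as zero's exact point): the 68000's MULS instruction only multiplies 16 × 16 bits into a 32-bit result, so with a compiler whose `int` is 32 bits, a plain 32 × 32 product may turn into a call to a runtime multiply routine. The minimal sketch below assumes 16-bit matrix coefficients in a hypothetical 2.14 fixed-point format; narrowing both operands to 16 bits gives the compiler the chance to emit a single MULS, though whether it actually does depends on the compiler.

```c
#include <stdint.h>

/* Assumed fixed-point format for this sketch: 2.14, so 1.0 == 16384. */
#define FIX_ONE 16384

/* 16 x 16 -> 32 multiply. Both operands are 16-bit, which lets a 68000
 * compiler map this onto MULS.W instead of a 32-bit multiply routine. */
int32_t mul16(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Dot product of a 2.14 matrix row with an integer vector, rescaled to integer.
 * Assumes coefficients within +/-FIX_ONE and small coordinates so the sum fits. */
int16_t dot3(const int16_t row[3], const int16_t v[3])
{
    int32_t acc = mul16(row[0], v[0]) + mul16(row[1], v[1]) + mul16(row[2], v[2]);
    return (int16_t)(acc >> 14);   /* drop the 14 fraction bits */
}
```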
14 August 2018, 14:06   #45
chb
Registered User
Join Date: Dec 2014
Location: germany
Posts: 439
@roondar: Nice find with Alpha Waves, thanks.

I'd also say that C just does not give you enough control for optimizing the time-critical inner loops. If portability is not important for you, I'd do at least the most critical stuff (3D math, line drawing, filling) in asm.

Regarding filling: I honestly do not think the blitter is very helpful here. It has a theoretical speed advantage, because it has twice the raw memory transfer speed compared to the CPU, but this is hard to realize:
  • In fill mode, the blitter needs to read a word for every word written, plus there is an idle cycle (which can be used by the CPU), so 6 clock cycles and 2 memory cycles per word. A movem costs 4 clock cycles and one memory cycle per word. With movem.l each register holds 32 bits, so 10 registers are sufficient to fill a 320-pixel-wide span in one bitplane. Again, you can pre-generate code for every span length and position (modulo 16).
  • Blitter fill is not well suited for overlapping polygons - you'd need to draw them in an extra buffer, then do a fill, then a two-source copy, then erase the buffer. A CPU filler only has to care about the background in the first and last word of a span and can just overwrite the words in between.
  • Blitter fill works on rectangular areas, so if your poly fills only a rather small part of the rectangle, a lot of memory accesses are simply wasted. You can subdivide, of course, but this increases complexity.
  • Blitter fill will not benefit much from faster CPUs and fast RAM.
  • Setup time: it might be easier just to draw lines and fill with the blitter than to calculate span lengths for every polygon row and call the fill code, but the blitter needs either CPU polling, interrupts or copper lists to control the blits.
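The CPU-filler point in the list (only the first and last word of a span are merged with the background) can be sketched in plain C as follows. This is an illustration, assuming a single bitplane row stored as 16-bit words with bit 15 as the leftmost pixel, as on the Amiga; `fill_span` and its parameters are hypothetical. A real 68000 version would replace the middle loop with movem.l stores or pre-generated code.

```c
#include <stdint.h>

/* Fill pixels x0..x1 (inclusive, x0 <= x1) of one bitplane row with 1 bits.
 * row: the bitplane row as 16-bit words; bit 15 of each word is its leftmost pixel. */
void fill_span(uint16_t *row, int x0, int x1)
{
    int w0 = x0 >> 4, w1 = x1 >> 4;                            /* first and last word index */
    uint16_t left  = (uint16_t)(0xFFFFu >> (x0 & 15));         /* x0 .. end of its word */
    uint16_t right = (uint16_t)(0xFFFFu << (15 - (x1 & 15)));  /* start of word .. x1 */

    if (w0 == w1) {                 /* span lies inside a single word */
        row[w0] |= left & right;
        return;
    }
    row[w0] |= left;                /* edge words are merged with the background */
    for (int w = w0 + 1; w < w1; w++)
        row[w] = 0xFFFF;            /* inner words are simply overwritten */
    row[w1] |= right;
}
```

Only the two edge words need a read-modify-write; everything in between is a plain store, which is where the movem.l trick pays off.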
I'd say that, apart from some simple things like clearing, it is not really worth the hassle to use the blitter in a general 3D routine with overlapping polys. The situation is different if you only have one simple convex object like your sphere without background, but that's probably not what you are aiming for.
If you have some memory cycles left, e.g. when you're using CPU multiplication or in the border, using it may give a nice little speed-up.
Anecdotal evidence: Need for Speed No Second Prize :-D, one of the fastest 3D vector games on the Amiga, with a rather large screen and large polygons, does not use the blitter at all.

Last edited by chb; 14 August 2018 at 15:03. Reason: got the name wrong :)
14 August 2018, 15:12   #46
grond
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,918
Quote:
Originally Posted by chb
In fillmode, the blitter needs to read a word for every word written, plus there is an idle cycle (which can be used by the cpu), so 6 clock cycles and 2 memory cycles per word. A movem is 4 cycles per word, and one memory cycle. A movem is 32 bit per register, so 10 registers are sufficient to fill a 320 pix wide span in one bitplane.
A 32-bit move to chip RAM will take two memory cycles on OCS machines and the A600, which only have a 16-bit chip RAM bus.
14 August 2018, 15:24   #47
chb
Registered User
Join Date: Dec 2014
Location: germany
Posts: 439
Quote:
Originally Posted by grond
A 32bit move to chipmem will take two mem cycles for OCS and A600 which only have a 16bit chipmem bus.
Yes, sure, did I write something else? The CPU can access memory at most every second memory cycle (once every 4 CPU clock cycles), 16 bits wide. But movem from registers to memory takes at least 8 + 4n clock cycles, with n being the number of 16-bit words written (8 + 8n for n longword registers) and 8 the overhead for the instruction fetch, so using the longword instruction, which writes twice as much data per register, reduces the overhead for the same amount of data written.
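The cycle figures in this exchange can be turned into a small worked calculation. The C sketch below only does the arithmetic, assuming the 6 clocks per word quoted in this thread for a blitter fill and the 68000's movem.l register-to-memory timing of 8 + 8n clocks for n longword registers. It ignores setup, register loading and DMA contention, and assumes the span fits in a single movem.l, so it is a rough comparison rather than a benchmark.

```c
/* Rough clock count for a blitter fill of one bitplane span of `pixels` pixels. */
long blitter_fill_clocks(long pixels)
{
    long words = (pixels + 15) / 16;   /* 16-bit words covered by the span */
    return 6 * words;                  /* fill mode: 6 clocks per word (figure from the thread) */
}

/* Rough clock count for storing the same span with one movem.l on a 68000.
 * Assumes the span fits in a single movem.l (320 pixels needs 10 registers). */
long movem_fill_clocks(long pixels)
{
    long words = (pixels + 15) / 16;
    long regs  = (words + 1) / 2;      /* movem.l stores 32 bits per register */
    return 8 + 8 * regs;               /* movem.l regs,(An): 8 + 8n clocks */
}
```

For a 320-pixel span (20 words) this gives 120 clocks for the blitter against 88 for one movem.l of 10 registers, which is the kind of gap chb is pointing at.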
14 August 2018, 16:04   #48
ovale
Registered User
Join Date: Jun 2014
Location: milan / italy
Posts: 174
@deimos, in your last test you are using EHB. Assuming you are on OCS, this almost halves the memory access slots available to the CPU. If your reference is Starglider, then use 4 bitplanes.
14 August 2018, 16:19   #49
deimos
It's coming back!
Join Date: Jul 2018
Location: comp.sys.amiga
Posts: 762
That SG2 code looks like it’s using the same basic scanline fill and fast horizontal line draw algorithms as me (which is not surprising, they’re well known). I’ll be writing my own assembly versions of them soon to see what difference it makes.

Quote:
Originally Posted by roondar
Well, the reason I'm guessing the algorithms are probably the issue is that I don't think the C compiler will slow down your code so drastically as to make it unusable.

I've said before, I'm not a 3D expert, but reading the thread and the responses you've got from people who are more knowledgeable about 3D on the Amiga, it seems to me you may be overdoing what is needed.

One thing I can offer is some insights, namely a) part of the starglider Atari ST source and (https://tcrf.net/Starglider_(Atari_ST)) b) a neat blog from someone who managed to get a fast 3D drawing routine on Atari/Amiga 'back in the day' (https://grenouillebouillie.wordpress...n-of-3d-games/)

Of particular note here are the notes the blog makes about how he sped up the calculations.

Edit: another neat find might be this pdf: http://www.atarimania.com/documents/...0ST%20User.pdf
Do note it's for the Atari ST and doesn't go into too much depth, but may still be useful.
14 August 2018, 16:27   #50
deimos
It's coming back!
Join Date: Jul 2018
Location: comp.sys.amiga
Posts: 762
Quote:
Originally Posted by ovale
@deimos, in your last test you are using EHB. Assuming you are on OCS, then this almost halves the memory access slots for the CPU. If your reference is Starglider then use 4 bitplanes.
I have actually tested using 1, 2, 3, 4, 5 and 6 bitplanes. The difference in speed is only in the horizontal line draw code and is pretty much exactly proportional to the number of planes. There’s no dramatic difference between 5 and 6 planes, although I expected one. I don’t know if this is because I’ve got some fast memory, so the only chip memory access the CPU needs is the actual drawing.
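The observation that line-draw time is proportional to the plane count follows from the planar layout: a span of one colour has to be written once per bitplane, setting or clearing it according to the matching bit of the colour index. A minimal sketch, assuming separate (non-interleaved) bitplanes with bit 15 as the leftmost pixel of each word; `draw_span`, `span_plane` and their parameters are hypothetical.

```c
#include <stdint.h>

/* Set (on != 0) or clear pixels x0..x1 inclusive in one bitplane row.
 * Bit 15 of each 16-bit word is its leftmost pixel; assumes x0 <= x1. */
static void span_plane(uint16_t *row, int x0, int x1, int on)
{
    int w0 = x0 >> 4, w1 = x1 >> 4;
    uint16_t left  = (uint16_t)(0xFFFFu >> (x0 & 15));
    uint16_t right = (uint16_t)(0xFFFFu << (15 - (x1 & 15)));

    for (int w = w0; w <= w1; w++) {
        uint16_t m = 0xFFFF;
        if (w == w0) m &= left;     /* partial first word */
        if (w == w1) m &= right;    /* partial last word */
        row[w] = on ? (uint16_t)(row[w] | m) : (uint16_t)(row[w] & ~m);
    }
}

/* Draw a horizontal span of `colour` on scanline y into `nplanes` bitplanes.
 * One pass per plane, so the cost grows linearly with the number of planes. */
void draw_span(uint16_t *planes[], int nplanes, int y, int words_per_row,
               int x0, int x1, unsigned colour)
{
    for (int p = 0; p < nplanes; p++)
        span_plane(planes[p] + y * words_per_row, x0, x1, (colour >> p) & 1);
}
```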

I should probably test with 6 planes being displayed but only drawing the ball to the first 3, and without fastmem.

Last edited by deimos; 14 August 2018 at 16:40.
14 August 2018, 17:30   #51
ovale
Registered User
Join Date: Jun 2014
Location: milan / italy
Posts: 174
Yes, fast mem explains what you are observing. In fast mem, the CPU does not have to compete with DMA.
14 August 2018, 22:39   #52
roondar
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,409
Quote:
Originally Posted by zero
On 68000 the compiler can cause massive slow downs for stuff like matrix maths.

If you hand optimize the inner loops they can be orders of magnitude faster just by managing registers better.
Orders of magnitude!?

Again, I'm not saying that C compilers generate code that can compete with properly optimised assembly, but that does seem like an exaggeration to me. As I pointed out earlier, there are in fact quite a few commercial Amiga games that were written fully or almost fully in C (just about all the stuff made by Westwood, for instance).

These tend to be slower paced games, yes. But even then, most of them will include at least some multidimensional array accesses and operations between items in one or more of such arrays. That's awfully close to what matrix operations do (really differing only in the operations done), and yet - these games are perfectly playable.

In short, I'm not seeing it - if C compilers back in the day truly made code that was that much slower, we'd never see anything of any serious complexity made in C - it would be far too slow.

Now I could still be wrong, but I would like to see some examples of such a big difference. A difference, sure. A big difference (say two to four times)? Possible.

But 100s or 1000s of times? No, that doesn't seem right at all.
Quote:
Even the best C compilers on the most popular architectures often don't do the best job with register management because they have a lot of rules to enable function/library calls.
This is quite true, but I feel (like I said above) that this effect will have less of an impact than you seem to suggest. It'll definitely slow things down though.
15 August 2018, 09:55   #53
zero
Registered User
Join Date: Jun 2016
Location: UK
Posts: 428
Quote:
Originally Posted by ovale
Yes, fast mem explains what you are observing. In fast mem the CPU has not to compete with DMA
If only the 512k expansions for the A500 had been fast RAM, the speed boost would have been considerable.
15 August 2018, 15:04   #54
hooverphonique
ex. demoscener "Bigmama"
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,624
Quote:
Originally Posted by zero
If only the 512k expansions for the A500 had been fast RAM, the speed boost would have been considerable.

Implementing trapdoor slow RAM on the A500 costs basically nothing more than a PCB and the RAM chips themselves, which is why it is so widespread. The trapdoor slot lacks the signals to implement proper fast RAM.
15 August 2018, 18:29   #55
Estrayk
Registered User
Join Date: Apr 2015
Location: Spain
Posts: 511
Quote:
Originally Posted by chb
Anecdotal evidence, Need for Speed No Second Prize :-D, one of the fastest 3D vector games on the amiga, with rather large screen and large polygons, does not use the blitter at all.
You said, one of the fastest. Is there anything faster than NSP on Amiga OCS?

I ask because I thought it was an Atari ST port.
15 August 2018, 22:17   #56
roondar
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,409
Quote:
Originally Posted by Estrayk
You said, one of the fastest. Is there anything faster than NSP for Amiga OCS?

I ask because I thought it was an Atari ST port.
There is that jet fighter game that apparently ran at 50Hz, but I've forgotten the name. It came up on here a few months back too.

Back in the day I was personally rather impressed with Xiphos, but I have no idea if that is as fast as or slower than others named in this thread.
16 August 2018, 09:47   #57
zero
Registered User
Join Date: Jun 2016
Location: UK
Posts: 428
Quote:
Originally Posted by roondar
In short, I'm not seeing it - if C compilers back in the day truly made code that was that much slower, we'd never see anything of any serious complexity made in C - it would be far too slow.
While C compilers do a reasonable job overall, for certain specific tasks like matrix multiplication they don't.

The fastest way to do it on 68000 is to use all the registers and create a highly optimized unrolled loop. Avoid unnecessary memory access by keeping everything in registers for as long as possible. Also align everything in memory to make lookups as fast as possible.

C compilers will struggle to do that. You have to remember that the compiler can't tell that the matrix math code is timing critical and so it will use the same strategy for register management as the rest of the code.
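The strategy described in this post (keep the matrix in registers, unroll, avoid memory traffic) can be approximated in C by loading the nine coefficients into locals once per object and writing the 3 × 3 product out by hand. This is a minimal sketch under stated assumptions: a hypothetical 2.14 fixed-point format, 16-bit coordinates and the hypothetical names `Vec3` and `transform_points`. Whether those locals actually stay in registers is up to the compiler, which is exactly the problem being described; the 68000 has only eight data registers, so some will probably spill.

```c
#include <stdint.h>

typedef struct { int16_t x, y, z; } Vec3;

/* Transform n points by a 3x3 matrix in 2.14 fixed point (1.0 == 16384).
 * Assumes |coefficients| <= 16384 and small coordinates so sums fit in 32 bits. */
void transform_points(const int16_t m[9], const Vec3 *in, Vec3 *out, int n)
{
    /* Load the coefficients once so the compiler *may* keep them in registers
     * for the whole loop; with only 8 data registers, some may still spill. */
    const int16_t m0 = m[0], m1 = m[1], m2 = m[2];
    const int16_t m3 = m[3], m4 = m[4], m5 = m[5];
    const int16_t m6 = m[6], m7 = m[7], m8 = m[8];

    for (int i = 0; i < n; i++) {
        const int16_t x = in[i].x, y = in[i].y, z = in[i].z;
        /* Rows written out by hand (unrolled): 16x16->32 products, drop 14 fraction bits. */
        out[i].x = (int16_t)(((int32_t)m0 * x + (int32_t)m1 * y + (int32_t)m2 * z) >> 14);
        out[i].y = (int16_t)(((int32_t)m3 * x + (int32_t)m4 * y + (int32_t)m5 * z) >> 14);
        out[i].z = (int16_t)(((int32_t)m6 * x + (int32_t)m7 * y + (int32_t)m8 * z) >> 14);
    }
}
```

Transforming all of an object's vertices in one call amortises the coefficient loads, which is the C-level equivalent of keeping the matrix resident in registers.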

You can sometimes manually tune the C code to get better performance. I've seen an order of magnitude improvement just from doing that before. I have a polyphonic music player that I managed to get down from 80% CPU consumption to 16% just by adjusting the C code, and that was with GCC, which has a relatively good optimizer for AVR.
16 August 2018, 14:25   #58
chb
Registered User
Join Date: Dec 2014
Location: germany
Posts: 439
Quote:
Originally Posted by Estrayk
You said, one of the fastest. Is there anything faster than NSP for Amiga OCS?

I ask because I thought it was an Atari ST port.
No idea if it was the fastest (how would you measure that, btw?), but I do not know any game that was faster at that level of detail. It's an ST port for sure, but a) Thalion showed their competence on both machines, so it's not a quick, shoddy port, and b) for polygonal 3D games the ST and Amiga are actually quite similar as long as you do not use the blitter, which IMHO doesn't give you a lot of advantages for that type of engine.
20 August 2018, 12:11   #59
Hewitson
Registered User
Join Date: Feb 2007
Location: Melbourne, Australia
Age: 41
Posts: 3,772
Quote:
Originally Posted by roondar
I very much doubt that the speed difference between assembly and C code is a factor of 10. More so because quite a few games (mostly American strategy and RPG) on the Amiga were in fact coded fully in C and that worked out just fine.
Whilst I agree that the factor of 10 is an exaggeration, I would argue that those American RPGs ran like shit.

Even a basic one like Ultima runs very slowly.
20 August 2018, 14:45   #60
roondar
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,409
Quote:
Originally Posted by Hewitson
Whilst I agree that the factor of 10 is an exaggeration, I would argue that those American RPGs ran like shit.

Even a basic one like Ultima runs very slowly.
There are counterexamples, such as Eye of the Beholder I & II, which are perfectly playable and run quite nicely.

Likewise, Dune II, though not fast, runs quite respectably considering what that game is doing.

A lot of the Cinemaware stuff also runs pretty decently and IIRC these are also done in C (though I'm not 100% sure here).
All times are GMT +2.