13 August 2018, 12:10 | #41 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,410
|
I very much doubt that the speed difference between assembly and C code is a factor of 10. More so because quite a few games (mostly American strategy and RPG) on the Amiga were in fact coded fully in C and that worked out just fine.
There will be a speed difference, but at the moment I remain fairly convinced the main problem here is probably in the algorithms you use more than what actual code is being generated. Anyway, even if you want an assembly version as an end result for optimal performance, experimenting with the algorithms in C to see what works best might still be the best option - it'll get results faster and when you're happy you can take what you've learned and use that as the base for your assembly version. |
13 August 2018, 18:42 | #42 | |
It's coming back!
Join Date: Jul 2018
Location: comp.sys.amiga
Posts: 762
|
Quote:
To get the speed I want I need to drastically reduce the time taken by all three parts of the code (3D calculations, scanline edge tracking and horizontal line drawing). |
|
13 August 2018, 22:36 | #43 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,410
|
Well, the reason I'm guessing the algorithms are probably the issue is that I don't think the C compiler will slow down your code so drastically as to make it unusable.
I've said before, I'm not a 3D expert, but reading the thread and the responses you've got from people who are more knowledgeable about 3D on the Amiga, it seems to me you may be overdoing what is needed. One thing I can offer is some insights, namely a) part of the Starglider Atari ST source (https://tcrf.net/Starglider_(Atari_ST)) and b) a neat blog from someone who managed to get a fast 3D drawing routine on Atari/Amiga 'back in the day' (https://grenouillebouillie.wordpress...n-of-3d-games/). Of particular note here are the notes the blog makes about how he sped up the calculations. Edit: another neat find might be this pdf: http://www.atarimania.com/documents/...0ST%20User.pdf Do note it's for the Atari ST and doesn't go into too much depth, but may still be useful. Last edited by roondar; 13 August 2018 at 22:56. |
14 August 2018, 11:22 | #44 |
Registered User
Join Date: Jun 2016
Location: UK
Posts: 428
|
On 68000 the compiler can cause massive slow downs for stuff like matrix maths.
If you hand optimize the inner loops they can be orders of magnitude faster just by managing registers better. Even the best C compilers on the most popular architectures often don't do the best job with register management because they have a lot of rules to enable function/library calls. |
14 August 2018, 14:06 | #45 |
Registered User
Join Date: Dec 2014
Location: germany
Posts: 439
|
@roondar: Nice find with Alpha Waves, thanks.
I'd also say that C just does not give you enough control for optimizing the time critical inner loops. If portability is not important for you, I'd do at least the most critical stuff (3d math, line drawing, filling) in asm. Regarding filling: I honestly do not think the blitter is very helpful here. It has a theoretical speed advantage, because it has twice the raw memory transfer speed compared to the CPU, but this is hard to realize:
If you have some memory cycles left, e.g. when you're using cpu multiplication or in the border, using it may give a nice little speed up. Anecdotal evidence, Need for Speed No Second Prize :-D, one of the fastest 3D vector games on the amiga, with rather large screen and large polygons, does not use the blitter at all. Last edited by chb; 14 August 2018 at 15:03. Reason: got the name wrong :) |
14 August 2018, 15:12 | #46 | |
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,918
|
Quote:
|
|
14 August 2018, 15:24 | #47 |
Registered User
Join Date: Dec 2014
Location: germany
Posts: 439
|
Yes, sure, did I write something else? The cpu can access the memory at max every 2nd memory cycle (once in 4 cpu clock cycles), 16 bit wide. But movem from register to memory takes at minimum 8 + 4n cycles, where n is the number of registers written and 8 is the overhead for instruction fetch, so using the longword instruction reduces the overhead for the same amount of data written.
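To put rough numbers on the overhead argument, here is a small C sketch. It assumes the commonly quoted 68000 timings for MOVEM from registers to memory with plain (An) addressing, 8 + 4n cycles for word and 8 + 8n cycles for longword transfers, with no wait states and no DMA contention. The function names are made up for illustration.

```c
/* Assumed 68000 timings for MOVEM registers -> memory, (An) addressing,
 * no wait states: 8 + 4n cycles (.w) and 8 + 8n cycles (.l), n = registers. */
static unsigned movem_w_cycles(unsigned n) { return 8u + 4u * n; }
static unsigned movem_l_cycles(unsigned n) { return 8u + 8u * n; }

/* Cycles needed to write `bytes` bytes using repeated MOVEM instructions
 * that each store `regs` registers of `reg_bytes` bytes (2 = .w, 4 = .l). */
static unsigned cycles_for_bytes(unsigned bytes, unsigned regs, unsigned reg_bytes)
{
    unsigned per_insn_bytes = regs * reg_bytes;
    /* Round up: a partial block still costs a whole instruction. */
    unsigned insns = (bytes + per_insn_bytes - 1u) / per_insn_bytes;
    unsigned per_insn_cycles = (reg_bytes == 4u) ? movem_l_cycles(regs)
                                                 : movem_w_cycles(regs);
    return insns * per_insn_cycles;
}
```

Under these assumptions, writing 120 bytes with 15 registers per instruction takes 2 x 128 = 256 cycles with movem.l versus 4 x 68 = 272 cycles with movem.w, against a bus-limited floor of 240 cycles (60 words at 4 cycles each). The gain comes purely from paying the fixed 8-cycle overhead half as often.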
|
14 August 2018, 16:04 | #48 |
Registered User
Join Date: Jun 2014
Location: milan / italy
Posts: 174
|
@deimos, in your last test you are using EHB. Assuming you are on OCS, then this almost halves the memory access slots for the CPU. If your reference is Starglider then use 4 bitplanes.
|
14 August 2018, 16:19 | #49 | |
It's coming back!
Join Date: Jul 2018
Location: comp.sys.amiga
Posts: 762
|
That SG2 code looks like it’s using the same basic scanline fill and fast horizontal line draw algorithms as me (which is not surprising, they’re well known). I’ll be writing my own assembly versions of them soon to see what difference it makes.
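For readers unfamiliar with the horizontal line part, here is a minimal C sketch of the usual word-at-a-time span fill on a single bitplane. It is a generic illustration, not the code from this thread or from the SG2 source; the name `hline_fill` and the layout (one row as an array of 16-bit words, bit 15 being the leftmost pixel) are assumptions made for the example. The edge tracker would supply `x0` and `x1` for each scanline.

```c
#include <stdint.h>

/* Set pixels x0..x1 (inclusive) on one row of a single bitplane.
 * The row is an array of 16-bit words; bit 15 of each word is the leftmost
 * pixel. The caller guarantees x0 <= x1 and that x1 lies inside the row. */
static void hline_fill(uint16_t *row, unsigned x0, unsigned x1)
{
    unsigned w0 = x0 >> 4;                        /* first word touched */
    unsigned w1 = x1 >> 4;                        /* last word touched  */
    /* Mask with pixel x0 and everything to its right within the word. */
    uint16_t left  = (uint16_t)(0xFFFFu >> (x0 & 15u));
    /* Mask with pixel x1 and everything to its left within the word. */
    uint16_t right = (uint16_t)(0xFFFFu << (15u - (x1 & 15u)));

    if (w0 == w1) {                               /* span fits in one word */
        row[w0] |= (uint16_t)(left & right);
        return;
    }
    row[w0] |= left;                              /* partial left edge     */
    for (unsigned w = w0 + 1u; w < w1; ++w)       /* full words in between */
        row[w] = 0xFFFFu;
    row[w1] |= right;                             /* partial right edge    */
}
```

Only the two edge words need a read-modify-write; everything in between is a plain store, which is the part an assembly version would typically unroll or replace with multi-register writes.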
Quote:
|
|
14 August 2018, 16:27 | #50 | |
It's coming back!
Join Date: Jul 2018
Location: comp.sys.amiga
Posts: 762
|
Quote:
I should probably test with 6 planes being displayed but only drawing the ball to the first 3, and without fastmem. Last edited by deimos; 14 August 2018 at 16:40. |
|
14 August 2018, 17:30 | #51 |
Registered User
Join Date: Jun 2014
Location: milan / italy
Posts: 174
|
Yes, Fast mem explains what you are observing. In Fast mem the CPU does not have to compete with DMA.
|
14 August 2018, 22:39 | #52 | ||
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,410
|
Quote:
Again, I'm not saying that C compilers generate code that can compete with properly optimised assembly, but that does seem like an exaggeration to me. Like I pointed out earlier, there are in fact quite a few commercial Amiga games that were written fully or almost fully in C (just about all the stuff made by Westwood for instance). These tend to be slower paced games, yes. But even then, most of them will include at least some multidimensional array accesses and operations between items in one or more of such arrays. That's awfully close to what matrix operations do (really differing only in the operations done), and yet - these games are perfectly playable. In short, I'm not seeing it - if C compilers back in the day truly made code that was that much slower, we'd never see anything of any serious complexity made in C - it would be far too slow. Now I could still be wrong, but I would like to see some examples of such a big difference. A difference, sure. A big difference (say two to four times)? Possible. But 100's or 1000's? No, that doesn't seem right at all. Quote:
|
||
15 August 2018, 09:55 | #53 |
Registered User
Join Date: Jun 2016
Location: UK
Posts: 428
|
|
15 August 2018, 15:04 | #54 | |
ex. demoscener "Bigmama"
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,624
|
Quote:
Implementing trapdoor slow ram on the A500 costs basically nothing more than a pcb and the ram chips themselves, which is why it is so widespread. It lacks the signals to implement proper fast ram. |
|
15 August 2018, 18:29 | #55 | |
Registered User
Join Date: Apr 2015
Location: Spain
Posts: 511
|
Quote:
I ask because I thought it was an Atari ST port. |
|
15 August 2018, 22:17 | #56 | |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,410
|
Quote:
Back in the day I was personally rather impressed with Xiphos, but I have no idea if that is as fast as or slower than others named in this thread. |
|
16 August 2018, 09:47 | #57 | |
Registered User
Join Date: Jun 2016
Location: UK
Posts: 428
|
Quote:
The fastest way to do it on 68000 is to use all the registers and create a highly optimized unrolled loop. Avoid unnecessary memory access by keeping everything in registers for as long as possible. Also align everything in memory to make lookups as fast as possible. C compilers will struggle to do that. You have to remember that the compiler can't tell that the matrix math code is timing critical and so it will use the same strategy for register management as the rest of the code. You can sometimes manually tune the C code to get better performance. I've seen an order of magnitude improvement just from doing that before. I have a polyphonic music player that I managed to get down from 80% CPU consumption to 16% just by adjusting the C code, and that was with GCC which has a relatively good optimizer for AVR. |
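As an illustration of what "adjusting the C code" can look like for the matrix case, here is a hedged sketch. It is not the code from my player or from anyone's engine in this thread; the `Vec3` type, the 2.14 fixed-point format and the flat row-major matrix layout are assumptions chosen for the example. The idea is simply to copy the coefficients into locals once, so the compiler is free to keep them in registers rather than re-reading the array for every point. Whether a particular compiler actually does so is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int16_t x, y, z; } Vec3;

/* Rotate `count` points by a 3x3 matrix stored row-major in m[0..8],
 * in 2.14 fixed point (16384 == 1.0). Coefficients are hoisted into locals
 * outside the loop. Assumes >> on negative values is an arithmetic shift,
 * which is implementation-defined in C but true for common compilers. */
static void rotate_points(const int16_t m[9], const Vec3 *in, Vec3 *out, size_t count)
{
    const int32_t m00 = m[0], m01 = m[1], m02 = m[2];
    const int32_t m10 = m[3], m11 = m[4], m12 = m[5];
    const int32_t m20 = m[6], m21 = m[7], m22 = m[8];

    for (size_t i = 0; i < count; ++i) {
        /* Read each input coordinate once, then work from locals only. */
        const int32_t x = in[i].x, y = in[i].y, z = in[i].z;
        out[i].x = (int16_t)((m00 * x + m01 * y + m02 * z) >> 14);
        out[i].y = (int16_t)((m10 * x + m11 * y + m12 * z) >> 14);
        out[i].z = (int16_t)((m20 * x + m21 * y + m22 * z) >> 14);
    }
}
```

With coefficients no larger than 1.0 in magnitude, the three-term sums stay within 32 bits. An assembly version would go further by pinning values to specific registers and unrolling the loop, which is the kind of control C does not give you.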
|
16 August 2018, 14:25 | #58 |
Registered User
Join Date: Dec 2014
Location: germany
Posts: 439
|
No idea if it was the fastest (how would you measure that, btw?), but I do not know any game that was faster at that level of detail. It's an ST port for sure, but a) Thalion showed their competence on both machines, so not a quick shoddy port, and b) for polygonal 3d games ST and Amiga are actually quite similar as long as you do not use the blitter, which IMHO isn't giving you a lot of advantages for that type of engine.
|
20 August 2018, 12:11 | #59 | |
Registered User
Join Date: Feb 2007
Location: Melbourne, Australia
Age: 41
Posts: 3,772
|
Quote:
Even a basic one like Ultima runs very slowly. |
|
20 August 2018, 14:45 | #60 | |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,410
|
Quote:
Likewise, Dune II, though not fast, runs pretty respectably considering what that game is doing. A lot of the Cinemaware stuff also runs pretty decently and IIRC these are also done in C (though I'm not 100% sure here). |
|