English Amiga Board


Old 04 March 2024, 15:57   #161
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by Dunny View Post
Yeah with Emu68 it depends heavily on a few factors:

1. What code is running - does it cause cache flushes, does it branch a lot and cause blocks of compiled ARM code to drop out of the internal emulation cache
I plan to measure the exact size [in Bytes] of the lowest 3 innermost loops so that we can have a meaningful debate on pros/cons of each approach.

From my current understanding, this is not Emu68-specific. Caching and branching are major performance considerations on the 060 (and to some extent the 040). Since I will be using some FPU code, I don't have to care about anything below 040 (besides, it's RTG, so it's not like it can reasonably run on an A1200 anyway).

Quote:
Originally Posted by Dunny View Post
2. How often it hits the chipset. The PiStorm has halved chipset bandwidth for some accesses, so if you're working in ChipRAM then you're gonna be limited unless you're really careful.
So, this is the million dollar question. If all I am doing is rendering to a FrameBuffer in RAM (no Blitter, no Sprites) does it mean that the only time it's hitting the chipset is when the RTG driver does C2P when I present the framebuffer via RTG ?

Or does RTG completely bypass the chipset malarkey ? I figure it should be possible: if the board has a separate video out, then it can safely ignore the chipset, no ? I've seen plenty of YT vids where there are 2 monitors connected - one to the chipset output and another to the RTG one - so it looks like this is the reason (well, best as I can guess anyway)

Last edited by VladR; 04 March 2024 at 15:59. Reason: typos
Old 04 March 2024, 17:18   #162
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,165
I'd like a VamPiStorm.
Old 04 March 2024, 17:37   #163
paraj
Registered User
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,104
Quote:
Originally Posted by VladR View Post
I plan to measure the exact size [in Bytes] of the lowest 3 innermost loops so that we can have a meaningful debate on pros/cons of each approach.

From my current understanding, this is not Emu68-specific. Caching and branching are major performance considerations on the 060 (and to some extent the 040). Since I will be using some FPU code, I don't have to care about anything below 040 (besides, it's RTG, so it's not like it can reasonably run on an A1200 anyway).
Not touching chip ram unless you have to, not using unimplemented instructions (64-bit mul/div) in time critical code, and proper cache utilization are in that order the most important things on 060 (IME).

Quote:
Originally Posted by VladR View Post
So, this is the million dollar question. If all I am doing is rendering to a FrameBuffer in RAM (no Blitter, no Sprites) does it mean that the only time it's hitting the chipset is when the RTG driver does C2P when I present the framebuffer via RTG ?

Or does RTG completely bypass the chipset malarkey ? I figure it should be possible: if the board has a separate video out, then it can safely ignore the chipset, no ? I've seen plenty of YT vids where there are 2 monitors connected - one to the chipset output and another to the RTG one - so it looks like this is the reason (well, best as I can guess anyway)
No C2P/chipset involved when you use RTG. BitMap is essentially in PI video memory, so once you're done with your frame, you just lock the buffer and dump it there (and unlock buffer of course). Very fast, and no chipset involved.
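To make the "lock the buffer and dump it there" step concrete, here is a minimal sketch in portable C. It assumes the RTG lock call hands back a base address and a bytes-per-row pitch; the names LockedSurface and present_frame are made up for illustration and are not part of any RTG API. The actual lock and unlock calls are left out on purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a locked RTG bitmap. On a real system the
   base address and pitch would be returned by the RTG API's lock call. */
typedef struct {
    uint8_t *base;          /* start of the visible surface */
    size_t   bytes_per_row; /* pitch; may be larger than the row payload */
} LockedSurface;

/* Copy a tightly packed frame into the locked surface one row at a time,
   so padding bytes at the end of each destination row stay untouched. */
static void present_frame(const LockedSurface *dst, const uint8_t *frame,
                          size_t width_bytes, size_t height)
{
    assert(dst->bytes_per_row >= width_bytes);
    for (size_t y = 0; y < height; ++y) {
        memcpy(dst->base + y * dst->bytes_per_row,
               frame + y * width_bytes,
               width_bytes);
    }
}
```

If the pitch happens to equal the row width, the whole frame could go over in a single memcpy; the row loop is just the safe general case.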
Old 05 March 2024, 14:31   #164
alexh
Thalion Webshrine
 
Join Date: Jan 2004
Location: Oxford
Posts: 14,354
Quote:
Originally Posted by Karlos View Post
I'd like a VamPiStorm.
That is possible I guess. MShulz could add the 68080 instructions to Emu68.
Old 05 March 2024, 14:44   #165
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by alexh View Post
That is possible I guess. MShulz could add the 68080 instructions to Emu68.
Yeah, as much as I would like that, having written some AMMX code on Vampire for my flatshader (I genuinely love the AMMX feature-set and don't consider it a "heresy" as many do), would such a feature even make it to the Top 10 on his ToDo list ? I'm guessing not.

Of course, this is most definitely something I could contribute myself, given how many different assembler backends I wrote for my compiler. Once I have a dev env set up, I don't really foresee AMMX support taking more than 2 weeks of work, as it should be just a simple replacement of each AMMX op with an equivalent stream of ARM ops.
But you need real Amiga HW for that, so we're back to square one on this one...
Old 05 March 2024, 14:55   #166
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by paraj View Post
Not touching chip ram unless you have to
That's good to hear - I certainly don't need chip-ram for a SW rasterizer - I just wasn't sure about the driver side of RTG (like, perhaps the bitmap has to be shuffled through chip-ram to be displayed).

Quote:
Originally Posted by paraj View Post
not using unimplemented instructions (64-bit mul/div) in time critical code, and proper cache utilization are in that order the most important things on 060 (IME).
MC68060UM.pdf, section 11.1.2.2.4 mentions 3 unimplemented ops (PTEST, MOVEC of MMUSR, PLPA). Then there's PFLUSH and Section C.2 (DIV(U/S).L, MUL(U/S).L, CHK2, CMP2, CAS2). I'm sad we can't have 64-bit div/mul


Quote:
Originally Posted by paraj View Post
No C2P/chipset involved when you use RTG. BitMap is essentially in PI video memory, so once you're done with your frame, you just lock the buffer and dump it there (and unlock buffer of course). Very fast, and no chipset involved.
Fantastic. Thanks for confirming !
Still, I will keep an option for not displaying the final frame so that we can benchmark how long it takes for RTG to display the rendered frame (mostly for BFG and other cards).
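A rough sketch of how such a "render only" versus "render and present" switch could be timed, in portable C. Everything here is an assumption for illustration: the FrameFn callback type, time_frames, and the two stub callbacks are hypothetical names, and clock() measures CPU time with whatever granularity the C library offers, so a real benchmark would likely want a finer timer.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: one step of the frame loop. */
typedef void (*FrameFn)(void *ctx);

/* Run `frames` iterations and return the average CPU seconds per frame.
   Pass present == NULL to time rendering alone. */
static double time_frames(FrameFn render, FrameFn present, void *ctx, int frames)
{
    assert(render != NULL && frames > 0);
    clock_t start = clock();
    for (int i = 0; i < frames; ++i) {
        render(ctx);
        if (present != NULL)
            present(ctx);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC / frames;
}

/* Placeholder callbacks that only count calls; ctx points at int[2]. */
static void stub_render(void *ctx)  { ((int *)ctx)[0]++; }
static void stub_present(void *ctx) { ((int *)ctx)[1]++; }
```

Subtracting the render-only average from the render-and-present average gives an estimate of what the display step costs on a given card.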
Old 05 March 2024, 17:55   #167
paraj
Registered User
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,104
Quote:
Originally Posted by VladR View Post
MC68060UM.pdf, section 11.1.2.2.4 mentions 3 unimplemented ops (PTEST, MOVEC of MMUSR, PLPA). Then there's PFLUSH and Section C.2 (DIV(U/S).L, MUL(U/S).L, CHK2, CMP2, CAS2). I'm sad we can't have 64-bit div/mul
There are also the unimplemented FPU ones (FSINCOS etc), but you don't need to forgo 64-bit math (or sin/cos) - just don't do it with the unimplemented instructions. Call library functions or inline the code, or use the FPU (if you require that anyway) - extended precision gives exact 64-bit integer results.
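As an illustration of the "inline the code" option, here is a sketch of an unsigned 32x32 -> 64 multiply built from 16-bit partial products, using only 32-bit arithmetic in the multiply itself. The names U64Pair, mulu_32x32_64 and mulu_self_check are hypothetical; the self-check uses the compiler's own 64-bit type purely as an oracle.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit result as two 32-bit halves, the way a register pair would hold it. */
typedef struct { uint32_t hi, lo; } U64Pair;

/* Unsigned 32x32 -> 64 multiply from four 16x16 -> 32 partial products. */
static U64Pair mulu_32x32_64(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t ll = a_lo * b_lo;   /* contributes to bits  0..31 */
    uint32_t lh = a_lo * b_hi;   /* contributes to bits 16..47 */
    uint32_t hl = a_hi * b_lo;   /* contributes to bits 16..47 */
    uint32_t hh = a_hi * b_hi;   /* contributes to bits 32..63 */

    /* Sum everything that lands in bits 16..31; this cannot overflow
       32 bits because each of the three terms is at most 0xFFFF. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    U64Pair r;
    r.lo = (mid << 16) | (ll & 0xFFFFu);
    r.hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
    return r;
}

/* Compare against the compiler's native 64-bit multiply. */
static void mulu_self_check(uint32_t a, uint32_t b)
{
    uint64_t expect = (uint64_t)a * b;
    U64Pair got = mulu_32x32_64(a, b);
    assert(got.hi == (uint32_t)(expect >> 32));
    assert(got.lo == (uint32_t)expect);
}
```

A signed variant can be derived from this by correcting the high word for negative operands; that step is not shown here.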
Old 05 March 2024, 18:48   #168
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by paraj View Post
There are also the unimplemented FPU ones (FSINCOS etc), but you don't need to forgo 64-bit math (or sin/cos) - just don't do it with the unimplemented instructions. Call library functions or inline the code, or use the FPU (if you require that anyway) - extended precision gives exact 64-bit integer results.
Thanks, I forgot about those. Just found them in section C.3 - it's 28 unimplemented FP ops in total.

On Vampire I noticed a significant boost when using FP and INT ops in parallel, especially for scanline traversal, so I plan on using FP as much as possible (though I intend to keep the whole pipeline/coordinate system Integer/FixedPoint).

Did Motorola run out of money when working on 060 or something ? Why did they butcher the functionality they already had in the 040 ? Heat/silicon/size perhaps ?
Old 05 March 2024, 19:10   #169
paraj
Registered User
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,104
Quote:
Originally Posted by VladR View Post
Thanks, I forgot about those. Just found them in section C.3 - it's 28 unimplemented FP ops in total.

On Vampire I noticed a significant boost when using FP and INT ops in parallel, especially for scanline traversal, so I plan on using FP as much as possible (though I intend to keep the whole pipeline/coordinate system Integer/FixedPoint).

Did Motorola run out of money when working on 060 or something ? Why did they butcher the functionality they already had in the 040 ? Heat/silicon/size perhaps ?
FP operations can run in parallel on 060 as well (notably you can fdiv while doing integer ops).
I won't speculate too much on why MC did what they did (lest we move too much off topic), but if you don't need to support 64-bit mul/div in HW you can have a strictly 32-bit ALU which I imagine is a significant win. For the FPU it's obvious (most (all?) were already not in silicon on 040 either). In the documentation MC claims unimplemented instructions are rare in the programs they looked at (which is probably true). Apparently MacOS used FPU for most of its 64-bit math anyway, so they wouldn't be hurting that market (though I don't know if 060 was even available there..), and for something like a "big integer" library you would re-do that anyway for the new faster processor.
Old 05 March 2024, 19:37   #170
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by paraj View Post
FP operations can run in parallel on 060 as well (notably you can fdiv while doing integer ops).
I won't speculate too much on why MC did what they did (lest we move too much off topic), but if you don't need to support 64-bit mul/div in HW you can have a strictly 32-bit ALU which I imagine is a significant win. For the FPU it's obvious (most (all?) were already not in silicon on 040 either). In the documentation MC claims unimplemented instructions are rare in the programs they looked at (which is probably true). Apparently MacOS used FPU for most of its 64-bit math anyway, so they wouldn't be hurting that market (though I don't know if 060 was even available there..), and for something like a "big integer" library you would re-do that anyway for the new faster processor.
Thanks for the insights. I'll create a different thread for the 060 questions I have so we don't go too OT here.

I was considering creating a separate branch for the 64-bit coordinate system given that I am targeting high-end specs, as that would significantly simplify things for many scenarios (based on early estimates in my excel), but based on this thread I'll just keep it 32-bit as it is and focus on other features instead.

I truly appreciate the opportunity to learn from these threads

Last edited by VladR; 05 March 2024 at 19:38. Reason: typos
Old 05 March 2024, 22:57   #171
NovaCoder
Registered User
 
Join Date: Sep 2007
Location: Melbourne/Australia
Posts: 4,400
Personally for my new 1200 build I'm thinking about a PiStorm running Emu68 and just sticking with AGA for that authentic feel. Either that or I'll just go with a TF030 for a basic classic experience.

I've never been that interested in the Vampire, always seemed a bit too far removed from the original hardware in my eyes.
Old 05 March 2024, 23:26   #172
Dunny
Registered User
 
Join Date: Aug 2006
Location: Scunthorpe/United Kingdom
Posts: 1,989
Quote:
Originally Posted by VladR View Post
Yeah, as much as I would like that, having written some AMMX code on Vampire for my flatshader (I genuinely love the AMMX feature-set and don't consider it a "heresy" as many do), would such a feature even make it to the Top 10 on his ToDo list ? I'm guessing not.

Of course, this is most definitely something I could contribute myself, given how many different assembler backends I wrote for my compiler. Once I have a dev env set up, I don't really foresee AMMX support taking more than 2 weeks of work, as it should be just a simple replacement of each AMMX op with an equivalent stream of ARM ops.
But you need real Amiga HW for that, so we're back to square one on this one...
I'm sure that if 68080/Vampire-only software becomes popular and numerous, someone will add it to Emu68. It's open source so there's no real reason for anyone not to.
Old 06 March 2024, 09:55   #173
Kin Hell
0ld0r Git
 
Join Date: Mar 2009
Location: Cornwall, UK
Posts: 1,581
Quote:
Originally Posted by VladR View Post
Yeah, as much as I would like that, having written some AMMX code on Vampire for my flatshader (I genuinely love the AMMX feature-set and don't consider it a "heresy" as many do), would such a feature even make it to the Top 10 on his ToDo list ? I'm guessing not.

Of course, this is most definitely something I could contribute myself, given how many different assembler backends I wrote for my compiler. Once I have a dev env set up, I don't really foresee AMMX support taking more than 2 weeks of work, as it should be just a simple replacement of each AMMX op with an equivalent stream of ARM ops.
But you need real Amiga HW for that, so we're back to square one on this one...
Coding & such is way beyond my level of Engineering but for crying out loud, isn't there someone local to VladR that could loan him an Amiga for a few weeks to get this stuff tried out?

Maybe a sponsorship or something to raise funds for the hardware required?? - I wouldn't mind chipping in a few UK bucks to kick off.
Old 06 March 2024, 14:50   #174
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by Kin Hell View Post
Coding & such is way beyond my level of Engineering but for crying out loud, isn't there someone local to VladR that could loan him an Amiga for a few weeks to get this stuff tried out?

Maybe a sponsorship or something to raise funds for the hardware required?? - I wouldn't mind chipping in a few UK bucks to kick off.
Oh no, sponsoring hardware is a very bad idea that I would not dare to do.
There's nobody in North Dakota anyway. There's folks in Minneapolis, but that's a 12-hour roundtrip drive.

EDIT: This year has only started but I'll try to budget $1,500 for a properly expandable A1200 with 060 accelerator (by the end of 2024). But I only do temp jobs to cover 3-4 month coding spurts anyway.

Last edited by VladR; 06 March 2024 at 16:10. Reason: .
Old 06 March 2024, 15:26   #175
alexh
Thalion Webshrine
 
Join Date: Jan 2004
Location: Oxford
Posts: 14,354
If you could scrape together the low-level technical information about the "parts" of 68080 instruction set you would need (e.g. AMMX) and maybe a test program then it could be submitted to Michal for consideration. I imagine it would be low down on his list but at least it would be there.
Old 06 March 2024, 16:21   #176
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by alexh View Post
If you could scrape together the low-level technical information about the "parts" of 68080 instruction set you would need (e.g. AMMX) and maybe a test program then it could be submitted to Michal for consideration. I imagine it would be low down on his list but at least it would be there.
True, but I also realized this morning while thinking about it during my commute that I could simply get some ARM emulator (I assume there must be some? I'll google later), and just tinker with it locally on PC.
  • Basically, for each AMMX op, create a function that replicates that functionality in ARM ASM.
  • Have a few unit & functional tests run as part of the build.
  • It's basically the same thing I've done when creating my Higgs compiler - each CPU backend (Z80/6502/RISC/68000) started that exact way.
Technically, that can totally be done using a regular stand-alone rPi - I wouldn't even need Emu68 (or actual Amiga HW) for this. I'm sure we could figure out how to merge it into Emu68 if & when it's done...
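The approach in the list above (one function per op, checked by unit tests) can be sketched in plain C before any ARM assembly exists. The example below models a generic "add four packed 16-bit lanes" operation on a 64-bit value with per-lane wraparound. It is only an illustration of the workflow: it is not claimed to match the exact semantics or naming of any particular AMMX instruction, and packed_add16, packed_add16_swar and check_packed_add16 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model: slow, obvious, one lane at a time. */
static uint64_t packed_add16(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 4; ++lane) {
        int shift = lane * 16;
        uint16_t x = (uint16_t)(a >> shift);
        uint16_t y = (uint16_t)(b >> shift);
        uint16_t sum = (uint16_t)(x + y);      /* wraps modulo 2^16 */
        result |= (uint64_t)sum << shift;
    }
    return result;
}

/* Candidate implementation: all four lanes at once. The top bit of each
   lane is masked off so carries cannot cross lane boundaries, then
   restored with an XOR. */
static uint64_t packed_add16_swar(uint64_t a, uint64_t b)
{
    const uint64_t high = 0x8000800080008000ULL;
    uint64_t low_sum = (a & ~high) + (b & ~high);
    return low_sum ^ ((a ^ b) & high);
}

/* Unit-test helper: the candidate must agree with the reference. */
static void check_packed_add16(uint64_t a, uint64_t b)
{
    assert(packed_add16_swar(a, b) == packed_add16(a, b));
}
```

In a real port the candidate would be the emitted ARM sequence rather than a second C function, but the test harness would look much the same.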
Old 06 March 2024, 16:56   #177
alexh
Thalion Webshrine
 
Join Date: Jan 2004
Location: Oxford
Posts: 14,354
Makes sense. Just remember that the BCM2837 (RPi3) and BCM2711 (RPi4) which are required for PiStorm both have the ARMv8-A ISA. Their equivalent of AMMX is I *think* Neon?
Old 06 March 2024, 23:07   #178
Kin Hell
0ld0r Git
 
Join Date: Mar 2009
Location: Cornwall, UK
Posts: 1,581
Quote:
Originally Posted by VladR View Post
Oh no, sponsoring hardware is a very bad idea that I would not dare to do.
There's nobody in North Dakota anyway. There's folks in Minneapolis, but that's a 12-hour roundtrip drive.

EDIT: This year has only started but I'll try to budget $1,500 for a properly expandable A1200 with 060 accelerator (by the end of 2024). But I only do temp jobs to cover 3-4 month coding spurts anyway.
If you pardon the pun....

Just trying to think of ways to get you "kickstart"ed....

Good luck whichever way you decide to go.
 

