English Amiga Board


Old 19 October 2020, 10:02   #1
kamelito
Zone Friend
 
Join Date: May 2006
Location: France
Posts: 1,801
Blitter emulator for 68k

Hi
I know that there is a C blitter emulator for PPC Amigas.
I wonder if there are existing 68k routines that can replace the blitter.
You feed the exact same HW registers but instead of triggering the blitter using bltsize you pass the parameters to a routine that does the job using the CPU.

Does such code exist?
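
To make the idea concrete, here is a minimal sketch (in Python, for readability, not 68k asm) of a routine that takes the same values a program would write to the blitter registers and does a plain A->D copy on the CPU side. The `regs` dictionary, the `mem` bytearray standing in for chip RAM, and the function names are assumptions for illustration only; shifts, first/last word masks and descending mode are left out.

```python
def decode_bltsize(bltsize):
    """Split a BLTSIZE value into (height in lines, width in words)."""
    height = (bltsize >> 6) & 0x3FF   # bits 15-6
    width = bltsize & 0x3F            # bits 5-0
    # A field of 0 means the maximum: 1024 lines or 64 words.
    return height or 1024, width or 64


def cpu_copy_blit(mem, regs):
    """A->D copy driven by blitter-style register values.

    Simplified: ascending mode only, no shift, no first/last word masks.
    mem is a bytearray standing in for chip RAM (hypothetical model).
    """
    height, width = decode_bltsize(regs["BLTSIZE"])
    src, dst = regs["BLTAPT"], regs["BLTDPT"]
    for _ in range(height):
        for _ in range(width):
            mem[dst:dst + 2] = mem[src:src + 2]   # copy one 16-bit word
            src += 2
            dst += 2
        # Modulos are byte counts added at the end of each line.
        src += regs["BLTAMOD"]
        dst += regs["BLTDMOD"]
    return mem
```

The caller fills `regs` exactly as it would fill the hardware registers, then calls `cpu_copy_blit` instead of writing BLTSIZE.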
Old 19 October 2020, 10:24   #2
alexh
Thalion Webshrine
 
Join Date: Jan 2004
Location: Oxford
Posts: 14,330
Most programs only use a handful of the blitter functions so a 1:1 emulator of all functions of the blitter would seem to be a bit overkill?

I imagine most late Amiga programs detect the CPU and switch to home-made SW functions.

Here is a nice article about using the blitter and software blitter side by side

http://powerprograms.nl/amiga/cpu-blit-assist.html

Including the code.
Old 19 October 2020, 10:26   #3
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,406
I don't have exactly what you're asking for, but I did make some soft-blitting routines for the 68020 as a part of my experiment in using the CPU and Blitter concurrently.

You can find my article & video describing how it works here: http://powerprograms.nl/amiga/cpu-blit-assist.html

There's a link to the source code in the article, which contains assembly routines to do CPU based "Blitter copying" and CPU based "Blitter Cookie-Cutting".
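
For readers who have not seen it spelled out, the logic behind a "cookie-cut" is a single blitter minterm, which a CPU routine can reproduce word by word. The sketch below is a generic illustration and is not taken from the linked source; the function names are assumptions. It uses the usual convention of A = mask, B = image, C = background and minterm $CA (D = AB + ~AC).

```python
COPY_A = 0xF0       # D = A
COOKIE_CUT = 0xCA   # D = (A and B) or (not A and C)


def apply_minterm(lf, a, b, c):
    """Apply an 8-bit blitter minterm (LF byte) to three 16-bit words.

    Bit i of lf is the output for the input combination where
    A = bit 2 of i, B = bit 1 of i, C = bit 0 of i.
    """
    d = 0
    for i in range(8):
        if (lf >> i) & 1:
            ta = a if i & 4 else ~a
            tb = b if i & 2 else ~b
            tc = c if i & 1 else ~c
            d |= ta & tb & tc
    return d & 0xFFFF


def cookie_cut_word(mask, image, background):
    """Combine one word of image and background through the mask."""
    return apply_minterm(COOKIE_CUT, mask, image, background)
```

A real routine would hard-code the expression for the minterm it needs rather than loop over all eight terms; the loop is only there to show the general rule.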

Edit:
Quote:
Originally Posted by alexh View Post
Here is a nice article about using the blitter and software blitter side by side

http://powerprograms.nl/amiga/cpu-blit-assist.html

Including the code.
This is not concurrent blitting, but rather concurrent posting
Old 19 October 2020, 11:04   #4
phx
Natteravn
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,496
Quote:
Originally Posted by kamelito View Post
You feed the exact same HW registers but instead of triggering the blitter using bltsize you pass the parameters to a routine that does the job using the CPU.
A Blitter-emulator running on Amigas or on other 68k hardware? You certainly know that most registers are read-only or write-only.

You would need quite powerful hardware and an MMU to catch and handle all accesses to the custom chip area. A 040/060 MMU has a minimum page size of 4K so you have to emulate the whole custom chipset (or pass the read/write accesses through for the non-blitter registers). The exception handling would hurt general performance, even if you can theoretically implement a faster Blitter on 060.
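
A rough sketch of the dispatch decision such a handler would face, assuming the custom chips sit at $DFF000 and a single 4K page is trapped: only the blitter registers are emulated, and every other access inside the page has to be passed through. The function name and the return values are made up for illustration; the register offsets are the standard OCS ones.

```python
CUSTOM_BASE = 0xDFF000
PAGE_SIZE = 0x1000   # 4K, the minimum page size mentioned above

# Word offsets of the OCS blitter registers within the custom chip area.
BLITTER_REGS = {
    0x040: "BLTCON0", 0x042: "BLTCON1", 0x044: "BLTAFWM", 0x046: "BLTALWM",
    0x048: "BLTCPTH", 0x04A: "BLTCPTL", 0x04C: "BLTBPTH", 0x04E: "BLTBPTL",
    0x050: "BLTAPTH", 0x052: "BLTAPTL", 0x054: "BLTDPTH", 0x056: "BLTDPTL",
    0x058: "BLTSIZE", 0x060: "BLTCMOD", 0x062: "BLTBMOD", 0x064: "BLTAMOD",
    0x066: "BLTDMOD", 0x070: "BLTCDAT", 0x072: "BLTBDAT", 0x074: "BLTADAT",
}


def classify_access(address):
    """Decide what a trap handler would do with one faulting address."""
    if not CUSTOM_BASE <= address < CUSTOM_BASE + PAGE_SIZE:
        return ("outside", None)            # not in the trapped page
    name = BLITTER_REGS.get(address - CUSTOM_BASE)
    if name is not None:
        return ("emulate", name)            # handled by the soft blitter
    return ("pass_through", None)           # perform the real access
```

Every colour, sprite or audio register write would still take the exception just to end up in the pass-through branch, which is where the general performance hit comes from.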

When I got my A4000 in 1993 I had similar crazy ideas to implement a software Action Replay Cartridge with the MMU, which remembers all write accesses to the custom chips before performing them. So you can analyze the state of your hardware at any point (what UAE does for me today). I never finished that project, of course.
Old 19 October 2020, 11:04   #5
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,322
It could be as simple as compiling that C code to 68k, then optimising it in asm if performance is an issue.

The question is missing an important piece of information: the reason why such a blitter emulation is needed.
I can't see a case where full emulation is required, except of course if you wanted to write an Amiga emulator for another 68k machine...
Old 19 October 2020, 13:56   #6
kamelito
Zone Friend
 
Join Date: May 2006
Location: France
Posts: 1,801
Reasons: being able to run modified programs on hardware where the blitter is not present, and improving speed on Amigas with a faster CPU by replacing blitter code with CPU routines.
@roondar and alexh: thanks, I’ll have a look.
@phx: nice project. Having all accesses trapped using an MMU could be nice, but I’m not sure all this can be properly synchronized.
@meynaf: the C emulator is on Aminet, written by Peter Gordon.
Old 19 October 2020, 14:30   #7
jotd
This cat is no more
 
Join Date: Dec 2004
Location: FRANCE
Age: 52
Posts: 8,160
Quote:
When I got my A4000 in 1993 I had similar crazy ideas to implement a software Action Replay Cartridge with the MMU, which remembers all write accesses to the custom chips before performing them. So you can analyze the state of your hardware at any point (what UAE does for me today). I never finished that project, of course.
WHDLoad does that, but it doesn't give the info back to any monitor.
Old 19 October 2020, 15:05   #8
phx
Natteravn
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,496
Quote:
Originally Posted by jotd View Post
WHDLoad does that.
Really? Wow! And without much impact on the performance?
Old 19 October 2020, 21:00   #9
jotd
This cat is no more
 
Join Date: Dec 2004
Location: FRANCE
Age: 52
Posts: 8,160
It's terrible on a 68040: it slows down games a lot. On WinUAE it's not noticeable. Each write to a custom or CIA register probably triggers a page fault, and that exception must be handled.

It's useful to detect wrong bits set in custom registers, word reads from CIA registers and reads from write-only registers, to analyse copperlists, and to catch wrong blitter operations or blitter writes while a blitter operation hasn't completed...
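
As an illustration of the last check in that list (and not a description of how WHDLoad implements it), a snoop layer that sees every custom register write could flag blitter writes during a running blit roughly like this. The class and method names are hypothetical; the only hardware facts used are the blitter register range and that writing BLTSIZE starts the blit.

```python
BLITTER_FIRST = 0x040   # BLTCON0
BLITTER_LAST = 0x074    # BLTADAT
BLTSIZE = 0x058


class BlitterSnoop:
    """Records blitter register writes made while a blit is running."""

    def __init__(self):
        self.busy = False
        self.warnings = []

    def on_write(self, offset, value):
        """Called for every trapped write to a custom register offset."""
        if BLITTER_FIRST <= offset <= BLITTER_LAST and self.busy:
            self.warnings.append(
                f"write ${value:04X} to ${offset:03X} while blitter busy")
        if offset == BLTSIZE:
            self.busy = True    # writing BLTSIZE starts the blit

    def on_blit_finished(self):
        """Called when the blit is known to have completed."""
        self.busy = False
```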
Old 21 October 2020, 08:14   #10
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,214
Quote:
Originally Posted by phx View Post
A Blitter-emulator running on Amigas or on other 68k hardware? You certainly know that most registers are read-only or write-only.

You would need quite powerful hardware and an MMU to catch and handle all accesses to the custom chip area. A 040/060 MMU has a minimum page size of 4K so you have to emulate the whole custom chipset (or pass the read/write accesses through for the non-blitter registers). The exception handling would hurt general performance, even if you can theoretically implement a faster Blitter on 060.

It is not quite as bad. Yes, the minimum page size is 4K, but you would not need to emulate the whole chipset. The mmulib gives you access to the read/written data, so you can catch the data that was supposed to be written and perform the write manually. The exception handler of the MuLib is advanced enough to support such "virtual hardware". Actually, it was designed to allow exactly that - even on a 68040 or 68060.


Yes, of course the thing is slow as it has to go through the exception, but the burden on the implementation is not as high as if you had to do all this yourself.
Old 28 October 2020, 17:14   #11
alexh
Thalion Webshrine
 
Join Date: Jan 2004
Location: Oxford
Posts: 14,330
Out of curiosity how does FBLIT work and is it relevant for this conversation?

http://aminet.net/package/util/boot/fblit

Someone looked within the AmigaOS gfx routines to see which used the blitter and manually patched them?

It looks "smarter" than that otherwise it would need different patches for the different AmigaOS versions?
Old 29 October 2020, 09:12   #12
Wepl
Moderator
 
Join Date: Nov 2001
Location: Germany
Posts: 866
Quote:
Originally Posted by jotd View Post
WHDLoad does that, but it doesn't give the info back to any monitor.
WHDLoad returns the table via Control and Private1/3 (data/flags), if you make the changes to the monitor.
Old 29 October 2020, 18:18   #13
kamelito
Zone Friend
 
Join Date: May 2006
Location: France
Posts: 1,801
FBlit surely does have interesting bits, but I’ve looked at the files and it is messy IMO.
Old 29 October 2020, 20:04   #14
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,214
Quote:
Originally Posted by alexh View Post
Out of curiosity how does FBLIT work and is it relevant for this conversation?

It is not a generic blitter emulation. It is an emulation of the blitter function provided by the graphics.library and its API. P96 contains emulation of the graphics.library API as well, but not only for planar data (Amiga chipset) but also for chunky, hi-color and true-color graphics.
Old 31 October 2020, 12:43   #15
BastyCDGS
Registered User
 
Join Date: Nov 2015
Location: Freiburg / Germany
Age: 44
Posts: 200
What about a hybrid approach in order to increase speed?

Use the MMU to catch the instructions which access the blitter and then patch that address with a jsr $xxxxxxxx to your own code which executes the original instruction and your emulation stuff (storing writes to hardware registers to re-read them later, etc.).

This way the slow MMU exceptions are only executed once for each triggering instruction.
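
A toy model of that trade-off, purely to show the bookkeeping: the first access from a given PC takes the slow trap and gets patched, later executions of the same PC go straight to the emulation. The class is hypothetical, and it assumes the PC is known and the instruction can actually be patched.

```python
class PatchOnFirstFault:
    """Counts slow traps versus fast patched calls per instruction address."""

    def __init__(self):
        self.patched = set()   # PCs already rewritten to call the emulation
        self.slow_traps = 0
        self.fast_calls = 0

    def execute(self, pc):
        """Simulate executing the blitter-accessing instruction at pc."""
        if pc in self.patched:
            self.fast_calls += 1       # patched: direct call, no exception
        else:
            self.slow_traps += 1       # first time: MMU exception path
            self.patched.add(pc)       # patch so it does not trap again
```

With a loop that hits the same three instructions a thousand times, this model would count three slow traps and 2997 fast calls.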

Just an idea...
Old 31 October 2020, 14:24   #16
Wepl
Moderator
 
Join Date: Nov 2001
Location: Germany
Posts: 866
There are many instructions with the size of only one or two words (e.g. move dx,(d16,ax)). I think this is complicated.
On the 68040 you may not even have the PC of the faulting instruction.
Old 01 November 2020, 13:07   #17
BastyCDGS
Registered User
 
Join Date: Nov 2015
Location: Freiburg / Germany
Age: 44
Posts: 200
Quote:
Originally Posted by Wepl View Post
There are many instructions with the size of only one or two words (e.g. move dx,(d16,ax)). I think this is complicated.
One could have a look at the WinUAE or WinFellow source code to see how to do it efficiently. Looking at the JIT code might help, too.

The main question would be if this is still faster in a significant way than the MMU exception overhead.

Quote:
Originally Posted by Wepl View Post
On the 68040 you may not even have the PC of the faulting instruction.
Does 68040 point to the next instruction in this case? If so, the 68040 will need special handling. How does, e.g., WHDLoad do it with the MMU snoop options enabled?
Old 01 November 2020, 17:52   #18
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,214
Quote:
Originally Posted by BastyCDGS View Post
Does 68040 point to the next instruction in this case?
Not guaranteed, and not necessarily. If the next instruction after the faulting one is a jump, the PC on the exception stack frame could be somewhere else entirely. The reason for this is the 68040 push buffer. The exception is not reported until the write reaches the MMU, which may take a couple of cycles, and the PC may already be ahead. How far ahead depends on the instructions and on what else is in the push buffer at that point. The 68040 can stack up to two 32-bit words in this case. (Actually, the push buffer is four 32-bit lines, but due to the way the exceptions happen, access faults through the MMU can only delay two 32-bit words. Physical errors can stack more.)



The entire way the 68040 reports exceptions, and how these exceptions have to be repaired, is quite complicated. The worst case is FPU write accesses.


If you want to read the full story, download MMULib from Aminet and read the section about the MMULib exception handler. The library tries to hide all this complexity from you and abstract it from the MMU model.
Old 01 November 2020, 17:55   #19
phx
Natteravn
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,496
Quote:
Originally Posted by BastyCDGS View Post
One could have a look at the WinUAE or WinFellow source code to see how to do it efficiently.

How does that help to patch the code? Did you read what you quoted?
Wepl said that there may be short instructions like move.w d0,REG(A0) (4 bytes) or even move.w d0,(a0) (2 bytes) writing to hardware registers. You cannot patch them with a jsr (6 bytes). And you would still have to find a way to encode the information which data was written to which register, before calling your handler.
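
The size mismatch is easy to check from the encodings. The sketch below spells out the opcode words for the two example stores and for a JSR to an absolute long address; the helper names are made up for illustration, and $58 (BLTSIZE) is used as an example displacement in place of REG.

```python
MOVE_W_D0_IND_A0 = bytes.fromhex("3080")        # move.w d0,(a0)
MOVE_W_D0_D16_A0 = bytes.fromhex("31400058")    # move.w d0,$58(a0)


def encode_jsr_abs_long(target):
    """Encode jsr target with a 32-bit absolute address: $4EB9 + long."""
    return bytes([0x4E, 0xB9]) + target.to_bytes(4, "big")


def fits_in_place(original, patch):
    """A patch can only replace an instruction that is at least as long."""
    return len(patch) <= len(original)
```

`fits_in_place` is false for both stores against a 6-byte JSR, so an in-place patch would overwrite the following instruction as well.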
Old 01 November 2020, 19:01   #20
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,214
The way MuRedox solves this problem is that it uses a JSR.W and reserves the last 32K of the 32-bit address space for a jump table. Thus, you encode the register and the target in the target address of the JSR.W, which points into the last 32K, where you place another JMP to the actual emulation. For FPU instructions, that always works, as they are always at least 32 bits in size.
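
A sketch of the addressing trick, with the caveat that the slot layout below (one 6-byte JMP per slot, starting at $FFFF8000) is an assumption for illustration and not necessarily how MuRedox arranges its table. The point is that a 16-bit absolute address is sign-extended, so $8000-$FFFF reach the last 32K, and the whole patch is only 4 bytes.

```python
TABLE_BASE = 0xFFFF8000   # start of the last 32K of the 32-bit address space
SLOT_SIZE = 6             # assumed: one JMP (xxx).L per slot


def sign_extend_16(word):
    """What the CPU does with an absolute short address."""
    return word | 0xFFFF0000 if word & 0x8000 else word


def slot_address(slot):
    """Address of a jump-table slot; each slot stands for one patched case."""
    address = TABLE_BASE + slot * SLOT_SIZE
    if slot < 0 or address + SLOT_SIZE > 0x1_0000_0000:
        raise ValueError("slot outside the last 32K")
    return address


def encode_jsr_abs_short(target):
    """Encode jsr target with a 16-bit absolute address: $4EB8 + word."""
    if sign_extend_16(target & 0xFFFF) != target:
        raise ValueError("target not reachable with a 16-bit address")
    return bytes([0x4E, 0xB8]) + (target & 0xFFFF).to_bytes(2, "big")


def encode_jmp_abs_long(target):
    """Encode the table entry: jmp target, $4EF9 + long."""
    return bytes([0x4E, 0xF9]) + target.to_bytes(4, "big")
```

Because each slot has its own address, the slot number itself carries the information about which register and which handler the patched instruction refers to.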

So, that part is essentially doable, though MuRedox has no problem finding the PC of the faulting instruction, and the instruction is always 32 bits. Here, you have neither: instructions can be 16 bits ("move.w d0,(a0)"), and you do not have the PC.

You can still capture the data that was written and go through a complete virtual hardware emulation, but that is really slow.

Replacing the graphics.library functions is a better option (as done by P96), but it cannot cover the cases where applications hit the blitter directly, for example DPaint.
 

