Crazy Library Idea

Inner200k · 22 June 2023, 11:07

Would there be any benefit patching certain math*.library(s) to call host hardware to do the calculations?

Would we get any speed improvement?

Not that it matters too much: my i9 with a 3080 runs a pretty fast Amiga by itself. I'd be interested in thoughts, as I've always wondered why no one in all this time has thought of it, or built a virtual expansion card that adds the host CPU as an accelerator.

Sani11 · 22 June 2023, 13:31

That's an interesting idea, not gonna lie. While I don't have the deep technical knowledge to assess the viability of patching math*.library(s) for host hardware calls, I think it's worth exploring. The main question, as you mentioned, is whether the speed improvement would be significant enough to justify the effort.

For the second part, using the host CPU as an accelerator, there might be some complexities, but it could bring some solid improvements to performance. It's a neat idea that I'm surprised hasn't been brought up more often.

SpeedGeek · 22 June 2023, 14:54

Quote:

Originally Posted by Inner200k

Would there be any benefit patching certain math*.library(s) to call host hardware to do the calculations?

Would we get any speed improvement?

Not that it matters too much: my i9 with a 3080 runs a pretty fast Amiga by itself. I'd be interested in thoughts, as I've always wondered why no one in all this time has thought of it, or built a virtual expansion card that adds the host CPU as an accelerator.

The P5 libraries already do patch the math libraries. The problem is they were developed under OS 3.1 and they can't guarantee compatibility with later versions of the math libraries. Cosmos released an updated version of the P5 libraries some years ago which disables this patching.

This allows P5 library users to use the fpsp.resource to avoid the exception trap overhead, and it also provides compatibility with later math library updates.

Of course, the 68881/2 FPU systems don't need the FPSP code at all.

PeterK · 23 June 2023, 00:54

Speeding up the mathieee libraries gains nothing at all for real applications. Programs that really need a lot of floating point calculations will always use inline FPU instructions in their code instead of calling the mathlibs over a slow interface with a lot of inefficient parameter ping-pong in the registers and on the stack.

A typical application program spends far less than 1-3% of its time in the mathlibs' floating point functions. So it doesn't matter how much you accelerate the mathlibs: you will never save more than this optimistic 1-3% of your program's runtime.
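The 1-3% ceiling is Amdahl's law at work. As a rough sketch of the arithmetic (the function names are made up for illustration, and the 3% share is simply the optimistic figure from the post, not a measurement):

```python
def overall_speedup(fraction: float, factor: float) -> float:
    """Amdahl's law: whole-program speedup when `fraction` of the runtime
    becomes `factor` times faster and everything else is unchanged."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def runtime_saved(fraction: float, factor: float) -> float:
    """Share of the original runtime that disappears (0.0 to 1.0)."""
    return 1.0 - 1.0 / overall_speedup(fraction, factor)


if __name__ == "__main__":
    # Assumption: 3% of the runtime is spent inside the mathlibs.
    for factor in (2.0, 10.0, 1000.0, float("inf")):
        saved = runtime_saved(0.03, factor)
        print(f"mathlibs {factor}x faster -> {saved:.2%} of runtime saved")
```

Even with an infinitely fast math library, the saving can never exceed the share of time the program spent there in the first place.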

There are already many different mathlibs on Aminet that try to use the FPU hardware instead of the CPU, and even the OS libs do that if an FPU is available.

I tried that when I wrote MathLibsWinUAE, and the benchmarks did indeed show a big speed gain, but I could never find a single application that benefits from these improvements, because nothing calls the mathlib functions thousands of times like a stupid synthetic benchmark.

That's why I decided, many years ago, to install some of my floating point routines in WinUAE 1.0 so that it uses the x87 FPU directly and inline 68k FPU instructions benefit and speed up, too. The code was later much improved by Toni Wilen when he added the 80-bit FPU support.

The x87 FPU might not be the best choice for floating point calculations nowadays. Modern CPUs implement other instruction sets that could do this better, but I'm not a PC coder and I don't need that additional speed for AmigaOS anyway. Why? Just for higher benchmark results?

Inner200k · 23 June 2023, 10:22

Quote:

Originally Posted by PeterK

The x87 FPU might not be the best choice for floating point calculations nowadays. Modern CPUs implement other instruction sets that could do this better, but I'm not a PC coder and I don't need that additional speed for AmigaOS anyway. Why? Just for higher benchmark results?

I'm not sure 'why' exactly. It's a bit like driving a car: your stock engine goes x speed and you could put a better engine in it, but if you did you'd still have to obey the road rules, so have you gained anything? Not really. 1-3% ain't much of a gain though; at best it was an interesting musing from an overworked, underpaid McDonald's employee.

It comes from a curiosity of how fast can we make this Amiga run under emulation.

paraj · 23 June 2023, 18:25

The idea is fine in theory, and I actually did something similar for a university project (not for 68k) with a virtual PCI card in QEMU that allowed the guest OS to use HW acceleration for large linear algebra operations (BLAS routines). You need to target something that's both more frequently used and has a larger granularity to make it worthwhile though.

Graphics functions (or maybe something like CopyMem or whatever) would be better candidates I think.
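The granularity point can be put as a simple cost model. This is only a sketch: the per-operation costs and the fixed guest-to-host call overhead are hypothetical numbers in arbitrary time units, not measurements from any emulator.

```python
import math
from typing import Optional


def emulated_cost(n_ops: int, guest_cost_per_op: float) -> float:
    """Cost of doing n_ops entirely inside the emulated machine."""
    return n_ops * guest_cost_per_op


def offloaded_cost(n_ops: int, host_cost_per_op: float, call_overhead: float) -> float:
    """Cost of one guest-to-host call that performs n_ops on the host."""
    return call_overhead + n_ops * host_cost_per_op


def break_even_ops(guest_cost_per_op: float, host_cost_per_op: float,
                   call_overhead: float) -> Optional[int]:
    """Smallest number of operations per call at which offloading is
    strictly cheaper, or None if the host is not faster per operation."""
    gain = guest_cost_per_op - host_cost_per_op
    if gain <= 0:
        return None
    return math.floor(call_overhead / gain) + 1


if __name__ == "__main__":
    # Hypothetical units: 5 per emulated op, 1 per host op, 200 per call.
    print(break_even_ops(5.0, 1.0, 200.0))
```

With those made-up numbers a single add or compare per call is a clear loss, while a call that carries a few hundred operations (a matrix multiply, a large copy) starts to pay for the trip.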

Photon · 23 June 2023, 23:00

Math libraries are much too fine-grained. If you don't have an FPU, being able to just add or compare two numbers is vital, and corresponds to a lot of (integer arithmetic) instructions. Makes sense.

It doesn't quite make sense to use a library function to add or compare two FP numbers, if you can just write FADD or FCMP instead.

Something like a linear algebra library where you can transform or project a set of vectors using matrices would be much more sensible. Each function corresponds to many FPU instructions.
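To illustrate what such a coarse-grained function might look like, here is a minimal sketch; `transform_points` and `flops_per_call` are hypothetical names, not part of any existing Amiga library.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def transform_points(matrix: Mat3, points: Sequence[Vec3]) -> List[Vec3]:
    """Apply a 3x3 matrix to every point in a single library-style call."""
    out: List[Vec3] = []
    for x, y, z in points:
        # One row of the matrix produces one output coordinate.
        tx, ty, tz = (row[0] * x + row[1] * y + row[2] * z for row in matrix)
        out.append((tx, ty, tz))
    return out


def flops_per_call(n_points: int) -> int:
    """9 multiplies plus 6 adds per transformed point."""
    return 15 * n_points


if __name__ == "__main__":
    scale = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
    print(transform_points(scale, [(1.0, 2.0, 3.0)]))
    print(flops_per_call(1000))
```

One call over a thousand points covers 15,000 floating point operations, whereas a library call that adds two numbers covers exactly one.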
