30 August 2022, 01:04 | #441 | |
Registered User
Join Date: Jun 2018
Location: Calgary/Canada
Posts: 247
|
Quote:
http://amiga.resource.cx/exp/delfina

And looking beyond Amiga: https://en.wikipedia.org/wiki/Atari_Falcon

The NeXT also had it: https://en.wikipedia.org/wiki/NeXTcube

The DSP3210 is fine, of course, if you're okay with the state of the art in 1992. I know Dave Haynie loved it, and I know there are prototypes and some libraries for it, but there are a few* things I disagree with him on and this is one of them.

- It really was never anything more than Dave's dream. The DSP56K at least made it into some real products for the Amiga. No Amiga was ever sold with the DSP3210 in it and no expansion exists with it.
- It needs more RAM throughput and storage for the same amount of data.
- It leans on the 68040 to convert to and from integer formats that can actually be used to, e.g., draw actual pixels on screen.
- It's an architectural dead end; there was nothing after it.
- It wasn't as fast as AT&T claimed. The Radius PhotoEngine (quad 66MHz DSP3210) on a Quadra only gave you a 2-4 times increase in Photoshop (over a 33MHz 68040!)
- They were stupidly expensive; the Radius PhotoEngine retailed for $1,099 in 1994, when a brand-new Power Mac 6100 was about $1,750 and made EVERYTHING two to four times as fast.

They maxed out around 66MHz. Modern DSP56Ks clock at 250MHz and are dual core.

If this is about just resurrecting the best 1994 had to offer, then fine. If this is about making something amazing, now, then the DSP3210 is laughable and even the DSP56K is reaching its end-of-life. But if you want SOME compatibility, it's the only thing remotely modern.

* The other was the pointlessly complex AAA chipset when the industry had already moved to simpler chunky bitmap graphics.

Last edited by nonarkitten; 30 August 2022 at 01:09. |
|
30 August 2022, 01:11 | #442 |
Registered User
Join Date: Jun 2018
Location: Calgary/Canada
Posts: 247
|
Like do the math -- a 100MHz 68060 beats a quad 66MHz DSP3210 set up. It's so "meh" it transitions from uninteresting to being the literal embodiment of an anti-pattern.
|
30 August 2022, 01:54 | #443 | ||||||||||||||
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,361
|
Quote:
Quote:
Quote:
Of course we are talking here about vintage implementations in both cases. Quote:
This combination was sold as a working and certified ultrasonic unit to many medical doctors and hospitals. (ATL HDI 1000 Ultrasound machine) Quote:
Quote:
Quote:
Why else would we discuss how to reimplement them today? AT&T's history is quite complex. The fact that they stopped developing DSPs is not really a sign of anything regarding the merits of that specific design, is it? Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Last edited by Gorf; 30 August 2022 at 02:29. |
||||||||||||||
30 August 2022, 02:23 | #444 | |
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,361
|
Quote:
This comparison of old chips to even older chips doesn't tell us very much about how a modern reimplementation of one or the other concept/ISA would behave within something like the Vampire or any other FPGA (and potentially ASIC) implementation. Last edited by Gorf; 30 August 2022 at 02:42. |
|
30 August 2022, 06:53 | #445 |
Registered User
Join Date: Sep 2013
Location: Poland
Posts: 847
|
@Gorf - the point is - and should always be considered - what advantage does a DSP3210 implemented in an FPGA have over a "hard processor" in silicon? While you can hook it up directly to a common, large, DMA-enabled, chip-RAM-like local memory shared by SAGA and the AC68080, that's the only real advantage. It won't get anywhere near a real 56k running at 250MHz. And it needs both a larger and more expensive FPGA (which is a first "NO") and additional coding effort to implement it and keep it in line with the rest of the "virtual chipset" (which is a second "NO"). Had the Coldfire V4 been more compatible with the 68060, there'd have been no need to make the AC68080 in the first place. The same applies to the DSP... although it's not as if the DSP56k codebase for Amiga users is large, or as if introducing it would have much impact on the Amiga world (and the same goes for AMMX at the moment). And to use DSP features (either from a hard, external DSP or a softcore inside the FPGA) you'll have to make an extra effort, since it's a different architecture from the main processor and has its own set of development tools. In this respect I must say that AMMX is the straightforward approach. It gives some performance benefits while allowing you to use one set of coding tools with an updated compiler. Should there be any effort to make a heterogeneous architecture, there are plenty of other choices out there with even greater performance. And since the potential code base for either the 3210 or the 56k isn't big enough to make a difference in the Amiga world, we wouldn't lose much from dumping both of those solutions anyway.
As for the blitter - blitter and copper are co-processors with very limited programming capabilities. That's because anything more would've been more expensive at that time. One way of making them better is making them faster. Another way is expanding the bandwidth and range of accessed memory. Both are done in SAGA afaik. There's also the option of adding a fully programmable unit close by. And the Apollo card doesn't really need that, since the AC68080 is as close as it can be - by design. Since I am a fan of the on-board chipset (the genuine Commodore chipset), I'd rather see a solution which allows the original chipset and CPU to coexist, and perhaps adds a 3rd coprocessor working on chipram in between CPU and chipset cycles. That'd most likely require dropping support for on-board chipram and moving it to e.g. fast SRAM or PSRAM under FPGA control, but that might introduce new effects with relatively simple and inexpensive hardware. Just think about e.g. a RISC-V softcore moving around a few dozen software sprites while the 68k just handles regular stuff within its DMA time slot limits, and so does Agnus/Alice. BTW, what is the timetable for getting an ASIC? |
30 August 2022, 10:10 | #446 | ||||
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,920
|
Quote:
Quote:
Quote:
Quote:
Why do you skip zero pixels instead of writing them? Is this to preserve background information? If not, don't clear the buffer and write out zeros, avoiding all the branching, as this should be faster. In any case I would treat two pixels at once using a 1024 byte table instead of a 64 byte table and thereby avoid the bitfields altogether by working on byte indices. Or is this loop run only very few times and needs a new table set up each time it gets called? In this case it is no wonder that using AMMX instructions for the scatter operation isn't much faster. Instead of bitfields I would expect masks and AND-instructions working on a register full of pixel data to be faster. |
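To make the two-pixels-at-once suggestion concrete, here is a minimal C sketch. It assumes 4-bit source pixels packed two per byte and 16-bit destination pixels, so a 256-entry table of 32-bit words comes to exactly 1024 bytes. The names `pal16`, `pair_table`, `build_pair_table` and `expand_opaque` are made up for illustration and are not from GNGEO, and the nibble order (high nibble first) is an assumption. It only covers the opaque case, where zeros are written rather than skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 256-entry table: one 32-bit word (two 16-bit pixels)
   for every possible byte of packed 4-bit pixel data. 256 * 4 = 1024 bytes. */
static uint32_t pair_table[256];

/* Build the table from a 16-entry palette of 16-bit pixels.
   Assumption: the high nibble is the first (left) pixel. */
static void build_pair_table(const uint16_t pal16[16])
{
    assert(pal16 != NULL);
    for (unsigned b = 0; b < 256; b++) {
        uint32_t first  = pal16[b >> 4];
        uint32_t second = pal16[b & 0x0F];
        pair_table[b] = (first << 16) | second;
    }
}

/* Opaque draw: expand n packed bytes into 2*n pixels.
   One table lookup per byte, no bitfield extraction, no branching on zero. */
static void expand_opaque(const uint8_t *src, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t two = pair_table[src[i]];
        dst[2 * i]     = (uint16_t)(two >> 16);
        dst[2 * i + 1] = (uint16_t)(two & 0xFFFF);
    }
}
```

The cost is that the table has to be rebuilt whenever the palette changes, which is the "new table set up each time" case raised above.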
||||
30 August 2022, 10:18 | #447 | |||
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,920
|
Quote:
Quote:
Quote:
|
|||
30 August 2022, 12:06 | #448 | |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,249
|
Quote:
Needless to say, this is terribly slow. Not because of the microcode, but because the 68060 breaks up the instructions into multiple reads. I know because I'm just through a couple of optimizations of the latest P96 release where the blitter emulation changed for the 68060 for exactly this reason. You are better off reading the data manually and shifting it in place rather than using the bitfields. For the 68030, the situation is interestingly just the reverse. Compared with the rest of the CPU, the bitfield instructions are fast. They surely require multiple cycles, but so do many other instructions, and they operate with a single bus cycle if possible, not with multiple cycles. |
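As an illustration of "reading the data manually and shifting it in place", here is a rough C sketch of an unsigned bitfield extract done with one assembled 32-bit value and two shifts. This is not the P96 code; `extract_field` is a hypothetical helper. It assumes 68K-style bitfield numbering, where bit offset 0 is the most significant bit of the base byte, and it limits the width to 25 bits so the field always fits in the four bytes it reads.

```c
#include <assert.h>
#include <stdint.h>

/* Extract `width` bits starting `bit_offset` bits after `base`, counting
   from the most significant bit (the numbering the 68K bitfield
   instructions use). The caller must guarantee that four bytes are
   readable starting at base + bit_offset / 8. */
static uint32_t extract_field(const uint8_t *base, uint32_t bit_offset,
                              unsigned width)
{
    assert(width >= 1 && width <= 25);

    const uint8_t *p = base + (bit_offset >> 3);
    unsigned skip = bit_offset & 7;   /* bits to drop at the top */

    /* Assemble a big-endian 32-bit value; on a 68K this is a single
       long read. */
    uint32_t word = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                    ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];

    /* Shift the unwanted leading bits out, then right-align the field. */
    return (word << skip) >> (32 - width);
}
```

Whether a compiler turns this into something faster than a bitfield instruction on a given 68K core is exactly the kind of thing that has to be measured per CPU, as the 68030 versus 68060 difference above shows.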
|
30 August 2022, 18:43 | #449 | |||||||
Registered User
Join Date: Jun 2018
Location: Calgary/Canada
Posts: 247
|
Quote:
Quote:
Quote:
Quote:
Those are transparent pixels. We're rendering sprites. Keep up. Quote:
Quote:
Coalescing two 16-bit writes into a single 32-bit write MIGHT speed things up a tiny bit, but with caching probably not. Quote:
Since on the NEOGEO, everything's a sprite, this code is executed exhaustively for the entire screen. Possibly many times per pixel since there's no "overdraw" testing -- I would guess in the 2-3 times territory. I also like how you ignored the actual metrics and are still harping on "tables" and "crap code" to try and prove some point. AMMX was 10% faster on sprite rendering in GNGEO. Only 10% over my so-called "crap code". If you think you can write better 68K code, then that only proves my point further that AMMX is basically rubbish in something it was specifically designed for. |
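For anyone following along, the kind of scalar inner loop being argued about can be sketched in C roughly like this. It is not the GNGEO source. `draw_sprite_row` and `pal16` are made-up names, and the layout (eight 4-bit pixels per 32-bit word, leftmost pixel in the top nibble, 16-bit destination pixels) is an assumption. Pixel value 0 is treated as transparent, so whatever is already in the destination survives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Draw one row of eight 4-bit sprite pixels into a 16-bit destination.
   Pixel value 0 is transparent: the destination is left untouched. */
static void draw_sprite_row(uint32_t packed, const uint16_t pal16[16],
                            uint16_t *dst)
{
    assert(pal16 != NULL && dst != NULL);
    for (int i = 0; i < 8; i++) {
        /* Assumption: leftmost pixel lives in the most significant nibble. */
        unsigned pix = (packed >> (28 - 4 * i)) & 0x0F;
        if (pix != 0)
            dst[i] = pal16[pix];
    }
}
```

The per-pixel test is the branch that the earlier post suggests removing, which only works when nothing behind the sprite needs to be preserved.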
|||||||
30 August 2022, 18:48 | #450 | ||
Registered User
Join Date: Jun 2018
Location: Calgary/Canada
Posts: 247
|
Quote:
Quote:
So you don't like the Amiga then? Interesting. Why are you here then? |
||
30 August 2022, 19:01 | #451 | ||||
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,361
|
Quote:
Quote:
I really have nothing against the DSP56K or any other DSP per se. It is just that the DSP3210 was part of the A3000+ and these ultrasound machines, and now even newly rebuilt A3000+ boards exist and are actually running ... Yes, the DSP56K is on the Delfina, but I could not find any software that makes use of it other than sound effects directly on this Zorro II card ... That said: I would have nothing against some DSP features built directly into Paula ... Quote:
Quote:
And if you think about it, the Blitter is by definition already a DSP: it takes one or more input streams, aka signals, and transforms them into one output stream. In this case the operations on the data are rather simple, but it is nevertheless digital signal processing ... |
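The "streams in, one stream out" view can be sketched in C. On the original chipset the blitter combines up to three source channels (A, B, C) into a destination D through an 8-bit minterm that selects one of the 256 possible logic functions of three inputs. The sketch below models only that logic stage; shifting, masking, fill and line modes are left out, and `blit_word` and `blit_stream` are hypothetical names, not a real API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Apply an 8-bit minterm to one 16-bit word from each source channel.
   For every bit position, the three source bits form an index
   (A = bit 2, B = bit 1, C = bit 0) into the minterm byte. */
static uint16_t blit_word(uint16_t a, uint16_t b, uint16_t c, uint8_t minterm)
{
    uint16_t d = 0;
    for (int bit = 0; bit < 16; bit++) {
        unsigned idx = (((a >> bit) & 1u) << 2) |
                       (((b >> bit) & 1u) << 1) |
                        ((c >> bit) & 1u);
        if ((minterm >> idx) & 1u)
            d |= (uint16_t)(1u << bit);
    }
    return d;
}

/* Three input streams, one output stream. */
static void blit_stream(const uint16_t *a, const uint16_t *b,
                        const uint16_t *c, uint16_t *d,
                        size_t n, uint8_t minterm)
{
    assert(a != NULL && b != NULL && c != NULL && d != NULL);
    for (size_t i = 0; i < n; i++)
        d[i] = blit_word(a[i], b[i], c[i], minterm);
}
```

With minterm 0xCA this computes D = (A AND B) OR (NOT A AND C), the usual "cookie cut" where A acts as a mask selecting between B and C.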
||||
30 August 2022, 19:31 | #452 | |||||||||||||
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,920
|
In some things yes, in others it is superior, in some inferior. Who would've thunk?
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Last edited by grond; 30 August 2022 at 19:37. |
|||||||||||||
30 August 2022, 19:36 | #453 | ||
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,920
|
Quote:
Quote:
|
||
30 August 2022, 19:48 | #454 | |
Registered User
Join Date: Jun 2018
Location: Calgary/Canada
Posts: 247
|
Quote:
Now, can you make this code run equally fast on the 68060? Maybe. Probably. But running pure 68K isn't for the benefit of people with 68060's since GNGEO could not, ever, practically run on one -- it's to be able to debug in UAE. UAE doesn't have AMMX. Probably never will. But compiling and debugging ON the Vampire is painful in comparison. And all that optimization to make it faster on the 68060 would make it a lot slower on the Vampire. So yeah, on the 68040 and 68060 this is perhaps not the fastest code. It's what GCC gave me, it runs well enough to debug and test on UAE and then run on the Vampire to check performance. AMMX only gets in the way here. |
|
30 August 2022, 19:53 | #455 |
Registered User
Join Date: Jun 2018
Location: Calgary/Canada
Posts: 247
|
|
30 August 2022, 20:18 | #456 | |
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,361
|
Quote:
Well OK ... If we are talking about real chips now ... the fastest easily available DSP would probably be some TMS320xxxx @ 1.25 GHz |
|
30 August 2022, 20:25 | #457 |
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,920
|
And we could add a 5 GHz Ryzen processor. We could have an Amiga task allocate the Ryzen as a resource and load Linux or Windows into it. No more need for PCTask...
|
30 August 2022, 20:31 | #458 | ||||||||||||
Registered User
Join Date: Jun 2018
Location: Calgary/Canada
Posts: 247
|
Quote:
Quote:
Quote:
Quote:
The move is necessary to move the sprite data from sprite memory to screen memory. This is not useless. It's required by the engine. You can shove your condescending tone right along with your sarcasm. Quote:
Quote:
Quote:
Quote:
If I was trying to make GNGEO run on a real, physical 68060, I might care about this level of hyper-optimization. But you people are so far off the reservation at this point. Quote:
Quote:
Quote:
Quote:
For the record, here's the AMMX inner loop. At the time, GCC didn't understand AMMX (not sure if it does yet), so that's all using DC.W with the original instructions in the comments. Code:
__asm__ volatile ( "\n"
    "\tmove.w 0(%0),d0 \n"
    "\tmove.w 2(%0),d1 \n"
    "\tmove.w 4(%0),d2 \n"
    "\tmove.w 6(%0),d3 \n"
    // TRANSi takes 8, 4-bit values from source and uses
    // words stored in E8 thru E23 to write the dest
    // since this needs 128-bit, this uses a register pair
    "\tdc.w 0xfe00,0x1803 \n" // TRANSi-LO D0, E0:E1
    "\tdc.w 0xfe01,0x1a03 \n" // TRANSi-LO D1, E2:E3
    "\tdc.w 0xfe02,0x1c03 \n" // TRANSi-LO D2, E4:E5
    "\tdc.w 0xfe03,0x1e03 \n" // TRANSi-LO D3, E6:E7
    // STOREM3 will conditionally store each word
    "\tdc.w 0xfe11,0x9926 \n" // STOREM3.W E1,E1,(A1)
    "\tdc.w 0xfe29,0xbb26,0x0008 \n" // STOREM3.W E3,E3,(8,A1)
    "\tdc.w 0xfe29,0xdd26,0x0010 \n" // STOREM3.W E5,E5,(16,A1)
    "\tdc.w 0xfe29,0xff26,0x0018 \n" // STOREM3.W E7,E7,(24,A1)
    : "+a"(gfxdata),"+a"(tilepos)
    :: "d0","d1","d2","d3"
);
|
||||||||||||
30 August 2022, 20:33 | #459 | |
Registered User
Join Date: Jun 2018
Location: Calgary/Canada
Posts: 247
|
Quote:
Yes, there are some impressive DSP's from TI. Absolutely zero legacy with the Amiga though and zero code that would use them. |
|
30 August 2022, 20:33 | #460 |
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,361
|
|