31 May 2014, 11:49 | #121 | |
Registered User
Join Date: May 2014
Location: inside the emulator
Posts: 377
|
Quote:
However that is also the problem: if one wants a semi-compatible system capable of running some old software, MorphOS and AOS4 do provide that. But people interested in those systems are extremely few compared to those running their original Amigas for nostalgia reasons. |
|
01 June 2014, 11:14 | #122 |
Registered User
Join Date: Apr 2014
Location: Germany
Posts: 154
|
On an accelerated AMIGA the copy from fastmem to chip mem is the main bottleneck.
For example, looking at the Phoenix_demo4, the CPU on the A600-Vampire could reach 45 FPS if the chipmem bus were not the bottleneck. The best solution is to add video out to the turbo card. The next CPU card comes with video out, also supporting chunky / truecolor. This will remove the bottleneck, allowing much faster games. |
02 June 2014, 08:55 | #123 | |||||||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
|
Well, dunno. How many clocks for fsin in a 68882 ? That must be the target.
I don't remember the exact fpga model, but it was probably Altera. Quote:
Also HAM rendering with a good quality is several orders of magnitude more complex than a simple MOVEP... Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
|
|||||||||
02 June 2014, 12:10 | #124 | |
Registered User
Join Date: Apr 2014
Location: Germany
Posts: 154
|
Quote:
Where actually is the difference between doing an instruction in software and doing an instruction in hardware? Let's compare some 68000 implementations. Phoenix is hardwired and does all its instructions in hardware, and all normal ones in a single cycle. The original 68000 did all instructions in software, in several cycles! The software for them was in the ROM inside the 68K CPU. The FSIN on the 68882 was a routine that was in fact executed from the ROM of the 68882. When the 68040 was designed, Motorola figured that ROMs holding FSIN routines would occupy valuable chip space, and that spending this space on a bigger cache was the better decision, as the increased cache size benefits overall CPU performance. Motorola's logic was good, and nothing has changed since then. Instead of adding ROMs with the routines, spending the chip space on bigger caches is the most sensible solution. And Motorola is not alone in this idea: all chip companies figured the same ... |
|
02 June 2014, 13:35 | #125 |
Registered User
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 1,157
|
The difference, to my mind, is simply in whether or not the implementation is completely transparent to the software (including OS) running on the machine. Thus microcode counts as "hardware", whereas traps don't - even though, depending upon implementation, the only practical difference might well be that trap code is visible to the rest of the computer.
|
02 June 2014, 14:38 | #126 | ||||
Registered User
Join Date: Jun 2010
Location: PL?
Posts: 2,867
|
Quote:
As a programmer you may reuse a library with a soft FSIN, you may reuse code with FSIN, or do this in whatever flavor you want, as the purpose of Sine(x) can be context dependent: sometimes a simple LUT is sufficient, sometimes not. In other words, which method is optimal depends on how smart a programmer/developer you are and how well you know the problem you want to solve. In usual practice Sine is substituted by simpler approximations which are sufficient from the problem's point of view, and you don't need 80 bit FP precision. Quote:
Quote:
Quote:
But this is a plain dispute, as there is no open code providing HQ HAM conversion. The existing (open) C code for HAM conversion is quite simple and it should not be too difficult to implement in VHDL/Verilog (but I agree, it will be very poor, so perhaps it should be improved, but in a way that is still usable with the limited amount of LEs we have). We have a CPU accelerator with no other way to display data than feeding CHIP mem (or banging registers), so I would say that we should focus on how to use the existing display hardware. I see C2P and HAM usage as most important, especially for OCS/ECS Amiga models. |
||||
02 June 2014, 14:39 | #127 | |
Registered User
Join Date: Apr 2014
Location: Germany
Posts: 154
|
Quote:
but placed in an external ROM? Let's say, just like the MICROCODE ROM of the original 68K, this external ROM is there and does not depend on OS or library support. So even any "old" software would run out of the box. The main difference to the 68882 ROM would be that the new ROM is external for cost reduction. What would you call this setup? |
|
02 June 2014, 14:47 | #128 | |
Registered User
Join Date: Apr 2014
Location: Germany
Posts: 154
|
Quote:
There are several ways to improve this. From a software perspective a nice solution would be a C2P instruction combined with multithreading. This combination would allow 1) running a C2P at high speed from fastmem to fastmem, and 2) running a C2P from fastmem to slow chipmem in parallel, with low system resource usage. Another way to improve the whole setup is to add RGB out to the FPGA card. This solution will of course open up a lot more options, with a fast, high-resolution truecolor screen. |
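For readers who have not met the conversion being discussed, here is a minimal sketch of what a chunky-to-planar (C2P) pass does, written in Python for readability. It is a naive bit-by-bit version, not the proposed instruction and not an optimised 68k routine; the function name `c2p_row` and its parameters are made up for illustration.

```python
def c2p_row(chunky, width, depth):
    """Convert one row of chunky pixels (one colour index per byte)
    into `depth` separate bitplane rows, Amiga style."""
    assert width % 8 == 0, "row width must be a multiple of 8 pixels"
    planes = [bytearray(width // 8) for _ in range(depth)]
    for x in range(width):
        pixel = chunky[x]
        byte_index, bit = divmod(x, 8)
        mask = 0x80 >> bit            # leftmost pixel sits in the most significant bit
        for p in range(depth):
            if (pixel >> p) & 1:      # bit p of the colour index goes to plane p
                planes[p][byte_index] |= mask
    return planes


# Eight pixels, two bitplanes: colour 1 on the left, colour 3 on the right.
row = bytes([1, 0, 0, 0, 0, 0, 0, 3])
planes = c2p_row(row, width=8, depth=2)
```

Every output byte depends on eight input bytes, which is why the conversion is usually done in fastmem first and only the finished planes are copied over the slow chipmem bus.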
|
02 June 2014, 15:39 | #129 |
Registered User
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 1,157
|
The key distinction is whether or not this ROM appears somewhere in the Amiga's memory map and uses exceptions / traps / autoconfig initialization, or whether it's only visible to the CPU core itself, connected via some new designed-for-the-task mechanism, and thus completely transparent to the Amiga.
|
02 June 2014, 15:43 | #130 | ||
Registered User
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 1,157
|
Quote:
Quote:
|
||
02 June 2014, 16:22 | #131 | ||
Registered User
Join Date: Apr 2014
Location: Germany
Posts: 154
|
Quote:
But of course a DMA engine is always limited in flexibility. A second CPU thread is a lot more flexible. Threads can be used for many tasks, e.g. handling IDE or network traffic. Many tasks which "traditionally" used DMA could also be handled very well with hardware threads. Quote:
I have an LCD TV connected to the AMIGA. The normal display reaches it via SCART. The new display can reach it via HDMI. This is easy to use. Adding a flicker fixer to the FPGA is not difficult. Putting a whole chipset in the FPGA is more work, but it has also been done before. |
||
02 June 2014, 16:27 | #132 | |
Registered User
Join Date: Jun 2010
Location: PL?
Posts: 2,867
|
Quote:
IMHO, as this is performed 25, 30, 50 or 60 times per second on the full screen, it can be more beneficial than a fully extended-precision transcendental FPU implementation (a 64KB LUT can cover Sine with 32b float and a resolution of 0.005deg, and I assume it will be the fastest way to have FSIN). Adding video output can be done with the help of an additional board placed over Denise; video from Denise can then be captured and rerouted back to VIDIOT. However, it may also be possible to feed VIDIOT with new video data directly from the FPGA, with Denise video visible as an overlay (in a controlled window, perhaps with a resizer/rescaler). Thus it should be possible to have noninterlaced output with the original video filling the whole screen, the original video as a window inside a bigger added/new video, etc. The link between boards can be a modern fast serial one (an HDMI/DVI type of interface: video serializer and deserializer). But IMHO it would then be better to recreate the whole Amiga (or use a principle similar to A-Clone), or to connect all the main ICs (Agnus, Paula, Denise) around one FPGA and provide access to memory through the FPGA (as OCS\ECS\AGA use a very low amount of bandwidth, this can be seen as a UMA type architecture, and CHIP can be unified with FAST). |
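As a rough check of the 64KB LUT figure above, here is a small Python sketch. It assumes the table stores only a quarter wave (0 to 90 degrees) and mirrors it for the other quadrants; that assumption is mine, not something stated in the post, but it is what brings 16384 single-precision entries to roughly 0.0055 degrees per step. The names `ENTRIES`, `STEP_DEG`, `TABLE` and `lut_sin` are illustrative only.

```python
import math
from array import array

ENTRIES = 64 * 1024 // 4        # 64 KB of 32-bit floats = 16384 entries
STEP_DEG = 90.0 / ENTRIES       # about 0.0055 degrees per entry

# Quarter-wave table, stored as 32-bit floats like the LUT in the post.
TABLE = array("f", (math.sin(math.radians(i * STEP_DEG)) for i in range(ENTRIES)))


def lut_sin(deg):
    """Approximate sin(deg) by nearest-entry lookup plus quadrant symmetry."""
    deg %= 360.0
    sign = 1.0
    if deg >= 180.0:            # second half of the wave is the first half negated
        deg -= 180.0
        sign = -1.0
    if deg > 90.0:              # mirror the second quadrant onto the first
        deg = 180.0 - deg
    index = int(deg / STEP_DEG + 0.5)
    if index >= ENTRIES:        # 90 degrees itself lies one step past the table
        return sign
    return sign * TABLE[index]
```

With nearest-entry lookup the worst-case error is about half a step (roughly 5e-5 in absolute terms), so this is far from 68882 extended precision but plenty for typical demo and game use.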
|
02 June 2014, 16:34 | #133 | |
Registered User
Join Date: Aug 2012
Location: Australia
Posts: 651
|
Quote:
Hell how about an improved fpga blitter and c2p from "fast mem" area into the "chip mem" area Last edited by Vot; 02 June 2014 at 16:42. |
|
02 June 2014, 21:24 | #134 | |
Registered User
Join Date: Jun 2010
Location: PL?
Posts: 2,867
|
Quote:
http://www.totalamiga.org/files/TA25...iewExtract.pdf BTW, it seems that Denise is one of the "easiest" ICs in the Amiga to recreate in an FPGA. |
|
03 June 2014, 18:12 | #135 | |||||||
Registered User
Join Date: May 2014
Location: inside the emulator
Posts: 377
|
Quote:
Quote:
Quote:
And why wouldn't 50 cycles be possible with an optimized trap mechanism? Quote:
Another way to support it would be using an extension of my prefix mechanism. A third way would be to implement the prefix system and document that using complex instructions can overwrite extended registers (D8-D15, A8-A15). It wouldn't be a problem for existing code, and new code could just avoid those instructions. There are other options too. NB that a register file implemented in the smallest available type of memory block has 32 or 64 registers, so a design with separate data and address register files has plenty to use for this purpose. Quote:
Quote:
If one of the two (for Xilinx FPGAs) free bits per instruction word is set the instruction could be trapped. Quote:
So why would nostalgia require support of all instructions ever existing in the 68k ISA? |
|||||||
03 June 2014, 18:14 | #136 | |
Registered User
Join Date: May 2014
Location: inside the emulator
Posts: 377
|
Quote:
|
|
03 June 2014, 19:33 | #137 | |
Registered User
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 1,157
|
Quote:
Note - that's just clarifying a distinction, not saying either is "better". If you want me to say which i think is better, then I'd say it's far more important for the base 68000 instruction set to be "hardware" than it is for FPU instructions. |
|
05 June 2014, 14:46 | #138 | ||||||||||||||||||||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
|
Quote:
We have the choice between :
- hardwired (full hardware)
- iterative (like HW but in several passes)
- microcode
- emulation
Only the last solution is unacceptable.
Yes, good idea. A 68000 doing MOVEP is ok. A 68030 doing MOVEP is ok. A 68060 doing MOVEP is not ok. Easy to see the difference, really. Quote:
What prevents you from doing the same with e.g. MOVEP or FSIN ? I know that the 7000 LEs of the Vampire aren't enough. But for the full Apollo, why not ? You have enough space for several 68k in 100k LEs ! Quote:
But if you want to do instructions such as MOVEP exactly like they were done in the 68000 and absolutely no difference is visible in comparison to it, then it's fine with me. Alas, while microcode is 100% transparent, software emulation is not. The 68000's microcode was NOT 68k instructions. It was VLIW. It did not have to save regs, change the PC, decode instructions by software, and return to the caller. I guess the 68030's microcode is similar. And the 68000 was only 68000 transistors (hence its name). Boy, what a cost nowadays. Quote:
Also the 68k family started to decay at the time of the 68040. Not for nothing. The 68040's implementation was very poor and is really not a good example of a right choice. Quote:
No, on the contrary, Moto's logic was anything but good. It was good up to the 68030, which ruled the world in its time. Not after, when it changed. Quote:
Quote:
Furthermore, a C2P is 100% Amiga specific - which the 68k must NOT be in any manner IMO. Quote:
Following your logic, no fpu at all is better. Perhaps this is what you want ? The limit with the c2p is the chipmem bandwidth, not the cpu. Therefore a hardware c2p wouldn't be much faster. Anyway, i don't want fsin for use myself especially. I want it mainly because it was there before. Quote:
Never forget that architectures persist longer than implementations. Do you accept adding instructions specific to solve some hardware problem ? Not me. Look at MOVEP for an example : designed for some specific purpose - now in the way and must be kept. Quote:
A good HAM rendering method has to read a pixel, find out whether it's closer to a fixed, red, green or blue pixel, and then emit it according to that choice. Doing that gives a quite big routine already (mine is around 240 bytes of code and you can bet it's optimised to death). If you wish to do HAM conversion in HW (good quality), you have to know that big TABLES are used there. Quote:
So i see little use for HAM in HW. Quote:
Quote:
A ROM inside the CPU is a lot faster than a ROM outside. Perhaps you forgot that a ROM has latencies, and they're quite big even at 100mhz. You may want to "hide" these latencies - but then you're gonna pollute the icache with that ROM - which isn't the case with microcode, obviously. Quote:
Quote:
Back at the time of the Natami's 68050 I wrote a small emu lib for it, so I know what I'm talking about. Even if you remove some of the bottlenecks, what remains is still a horror to handle. Quote:
Quote:
Quote:
Quote:
Quote:
Anyway, the 68882 is one thing; regular integer instructions are something else. We may talk about CAS, or the bitfields for example. May be a lot more interesting than FSIN, huh ? Quote:
Nostalgia wants a cpu that's easy to code on, has a complete instruction set, not a cpu that's the fastest possible and sacrifices everything for that chimeric goal (as you're not gonna be competitive anyway with other current families). If you want to do a 68k, you do a 68k, period. If you want to take a subset of its instruction set, then you can reencode it fully and it'll be another story. I want to code in asm because I like the freedom of it. And only the 68k (or possibly a derived cpu family) is appropriate for that. Basically this is what I defend here. The ISA should be extended, not reduced, even if this costs a few mhz. Quote:
Because it involves executing many more instructions than microcode and would be a lot slower. As we're running on an FPGA, why not implement BOTH solutions anyway ? It's possible to switch even at runtime ! So everyone would be happy. The costs and benefits of each solution readily available for direct study. No long, useless discussions. But perhaps some are afraid of what they would discover ? That, for example, having all instructions isn't much slower than removing some ? |
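To make the HAM rendering step described earlier in this post concrete (read a pixel, decide whether a fixed palette colour or a red, green or blue modification is closest, then emit that choice), here is a brute-force Python sketch for HAM6. It is not the 240-byte routine mentioned above and uses no lookup tables; the function names, the squared-error metric and starting each row from palette entry 0 are all assumptions made for the example.

```python
# HAM6 control codes (upper two bits of each 6-bit pixel).
SET_PALETTE, MODIFY_BLUE, MODIFY_RED, MODIFY_GREEN = 0b00, 0b01, 0b10, 0b11


def squared_error(a, b):
    """Sum of squared per-channel differences between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ham6_encode_row(pixels, palette):
    """Encode one row of 4-bit-per-channel RGB pixels as 6-bit HAM6 codes.

    pixels  : list of (r, g, b) tuples, each channel in 0..15
    palette : list of up to 16 (r, g, b) tuples (the fixed colours)
    """
    prev = palette[0]                 # assumed starting colour for the row
    codes = []
    for target in pixels:
        r, g, b = target
        # Option 1: jump straight to any fixed palette colour.
        candidates = [((SET_PALETTE << 4) | i, colour)
                      for i, colour in enumerate(palette)]
        # Options 2-4: keep the previous pixel and replace one channel.
        candidates += [
            ((MODIFY_RED << 4) | r, (r, prev[1], prev[2])),
            ((MODIFY_GREEN << 4) | g, (prev[0], g, prev[2])),
            ((MODIFY_BLUE << 4) | b, (prev[0], prev[1], b)),
        ]
        code, prev = min(candidates, key=lambda c: squared_error(c[1], target))
        codes.append(code)
    return codes


palette = [(0, 0, 0), (15, 15, 15)]
codes = ham6_encode_row([(15, 15, 15), (15, 15, 0), (0, 0, 0)], palette)
```

Even this greedy version needs up to 19 distance computations per pixel, and each pixel depends on the result of the previous one, which gives some feel for why good-quality HAM conversion is costly in either software or hardware.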
||||||||||||||||||||||
05 June 2014, 15:39 | #139 | ||
Registered User
Join Date: Apr 2014
Location: Germany
Posts: 154
|
Quote:
MOVEP.L on the 68000 took 24 clocks.
24 clocks @ 7 MHz is equivalent to 411 clocks @ 120 MHz.
Doing a trap costs me 8 clocks.
This means you have 403 clocks to do MOVEP in software and would still not be slower.... Sounds doable... Quote:
* MOVEM is useful
* MOVEP is by far not that important
Of course it's possible to include every instruction.... But would you also include CALLM ? If not, why not? Last edited by Gunnar; 05 June 2014 at 15:53. |
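The MOVEP cycle budget in this post can be reproduced with a few lines of Python. The figures (24 clocks, 7 MHz, 120 MHz, 8 clocks of trap overhead) are taken from the post; the function name `equivalent_clocks` is just an illustrative helper.

```python
def equivalent_clocks(clocks, old_mhz, new_mhz):
    """Number of clocks at new_mhz that take as long as `clocks` at old_mhz."""
    return clocks * new_mhz / old_mhz


MOVEP_L_CLOCKS_68000 = 24     # MOVEP.L timing on the 68000, per the post
TRAP_OVERHEAD = 8             # trap cost on the new core, per the post

budget = equivalent_clocks(MOVEP_L_CLOCKS_68000, old_mhz=7, new_mhz=120)  # about 411.4
remaining = budget - TRAP_OVERHEAD                                        # about 403.4

print(f"budget: {budget:.0f} clocks, left after trap entry: {remaining:.0f} clocks")
```

This only compares against a stock 7 MHz 68000; measured against a faster original CPU the budget shrinks in proportion.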
||
06 June 2014, 14:10 | #140 | ||||||||||||||
Registered User
Join Date: May 2014
Location: inside the emulator
Posts: 377
|
Quote:
Quote:
Quote:
Quote:
Quote:
Note that I don't propose to implement such a mechanism as I think the normal trap mechanism will be more than enough. The predecode bits have better uses that can potentially accelerate all instructions instead of only accelerating unimplemented instruction emulation. Quote:
Quote:
We have hobbyists hacking in their spare time, low end FPGAs and no market. Quote:
Multiprocessing is hard to retrofit into the Amiga, and my previous attempts to discuss the topic didn't result in any feedback, so I guess nobody is interested in even trying to get it to work. Quote:
Quote:
Quote:
Quote:
But most people never used that kind of system. Quote:
Quote:
BTW it will be slower with all instructions implemented. Even having MOVEM support decreases performance, but it has to be supported. |
||||||||||||||