27 April 2019, 01:35 | #161 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
lsr #8: the trick is similar: Code:
; d0.w = $xx00
	moveq #0,d1
	....
	move.w d0,-(sp)
	move.b (sp)+,d1
	....
; d1.w = $00xx
|
27 April 2019, 05:51 | #162 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
|
[lsl #8]
My personal preference is to start the program with "clr.l -(sp)", match it before the end with "move.l (sp)+,d0", and in between use pairs of "move.b dX,(sp)" and "move.w (sp),dX".
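A minimal sketch of how those pairs might look, assuming a plain 68000 and that nothing else has been pushed since the clr.l. The choice of d0 is only for illustration. By the standard 68000 timing tables the pair costs 8+8 = 16 cycles, against 6+2*8 = 22 for lsl.w #8,d0.

```asm
        clr.l   -(sp)           ; reserve a zeroed scratch long on the stack
        ; ...
        ; d0.w = $00xx  ->  d0.w = $xx00 (same low word as lsl.w #8,d0)
        move.b  d0,(sp)         ; store xx into the high byte of the scratch word
        move.w  (sp),d0         ; read the word back: high byte xx, low byte still 0
        ; ...
        move.l  (sp)+,d0        ; release the scratch long before returning
```

This only works while (sp) still points at the scratch word. Inside a routine entered with jsr or bsr, (sp) holds the return address instead, so the scratch long has to be reserved in the routine that uses it.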
27 April 2019, 09:15 | #163 | |
Registered User
Join Date: Feb 2015
Location: Copehagen
Posts: 36
|
Quote:
I just tried with $ff56 and the result was $ff. Isn't it only when you use movem.w that the upper word is cleared or set depending on bit 15?
Last edited by PeterJ; 27 April 2019 at 09:30. Reason: edit just add some stuf
|
27 April 2019, 10:21 | #164 | ||
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
You just have to be careful not to use it in a nested routine, so your sentence should read "to start the subroutine with".
As it should (I simply wrote d0=$xx00 because the low bits are lost anyway, so they can be anything). But from your next sentence, didn't you mean the asr instruction?
Quote:
movem deals with words (or longs) and never with bytes.
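To illustrate the movem.w behaviour asked about above, here is a small sketch. The values and registers are chosen only for illustration. When movem.w loads from memory into registers, each word is sign-extended to 32 bits, so the upper word follows bit 15.

```asm
        move.w  #$ff56,-(sp)    ; push a word with bit 15 set
        movem.w (sp)+,d0        ; d0 = $ffffff56 (upper word filled with ones)

        move.w  #$7f56,-(sp)    ; push a word with bit 15 clear
        movem.w (sp)+,d1        ; d1 = $00007f56 (upper word cleared)
```

A plain move.b or move.w into a data register, by contrast, leaves the untouched upper bits as they were, which is why the byte trick above starts with moveq #0,d1.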
||
29 April 2019, 12:36 | #165 |
ex. demoscener "Bigmama"
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,624
|
|
14 August 2019, 20:51 | #166 | |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
|
Quote:
|
|
17 August 2019, 00:08 | #167 | |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
|
Quote:
Later, on-chip caches affected performance more than the number and length of instructions, which kept utility applications from being bogged down and reduced this factor. But even after this hardware acceleration (circa 1990), applications such as games and demos would never use C (or C++) in time-critical sections for another half a decade, as we know.
It's true to this day that any high-level language (or one posing as such!) will always be beaten by a great margin by "simply" writing the program in Assembly. (The advantage of truly portable languages is of course the portability and having less code to write, if you're not using macros.)
All this to make clear that there is no language level higher than Assembly that will ever generate code as efficient (or as small) as writing it in Assembly. It's self-evident. But just to give one factor for the performance loss paid: the compiler doesn't know what you're trying to do, so it can't deliver the perfect translation.
|
04 February 2020, 15:19 | #168 |
Registered User
Join Date: Sep 2019
Location: Essen/Germany
Age: 55
Posts: 463
|
Maybe there is a faster way to clear the upper word of a register?
Replace this (16 cycles): Code:
	and.l #$ffff,d0
with this (12 cycles, but a second register is needed): Code:
	moveq #0,d1
	move.w d0,d1
	move.l d1,d0
Also 12 cycles, but only one register needed: Code:
	swap d0
	clr.w d0
	swap d0
Last edited by sparhawk; 04 February 2020 at 15:24.
04 February 2020, 16:51 | #169 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
ext.l dx (4 cycles). But usually I keep a register with the upper part zeroed outside the main loop and then move data only into the lower part.
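A rough sketch of the "zeroed register outside the loop" idea, under assumed conditions: a0 points at a table of unsigned words, d0.w holds the element count minus one, and d2 is an accumulator. All three are hypothetical names for this example. Note that ext.l sign-extends, so it only clears the upper word when bit 15 of the value is known to be 0.

```asm
        moveq   #0,d2           ; hypothetical 32-bit accumulator
        moveq   #0,d1           ; clear all 32 bits of d1 once, outside the loop
.loop:
        move.w  (a0)+,d1        ; only the low word changes, upper word stays 0
        add.l   d1,d2           ; d1 is already a zero-extended 32-bit value
        dbf     d0,.loop        ; d0.w = number of words - 1
```

Inside the loop no extra instruction is spent on clearing, which is the saving compared with any of the per-value sequences.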
|
04 February 2020, 16:53 | #170 | |
Registered User
Join Date: Sep 2019
Location: Essen/Germany
Age: 55
Posts: 463
|
Quote:
Yes, that would be the obvious solution. But it depends, so in the general case, I can't know that. I usually do a lot of prototyping in Easy68k to see if I can find faster solutions, as it tells me the cycle count, which is IMO a great feature for that.
|
21 February 2020, 14:03 | #171 |
OCS forever!
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
|
One I saw in another thread that made me scratch my head for a while is using add dx,dx to simultaneously test and clear a "flag". For example, you have a loop where you set a flag from 0 to 1 if something occurred. Then at the end of the loop you check the flag to see if you need to loop again, and reset the flag (I did this in a sorting routine).
Instead of: Code:
.loop:
	moveq #0,d0 ;reset flag
	...
	;If something occurred, flag it
	moveq #1,d0
	...
	;Do we need to loop again?
	tst.w d0
	bne.s .loop
you can do: Code:
	moveq #0,d0 ;reset flag once
.loop:
	...
	;If something occurred, flag it
	moveq #-128,d0 ;set flag = $80 ($ffffff80)
	...
	;Do we need to loop again? Also reset flag
	add.b d0,d0
	bcs.s .loop
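As a fuller illustration, here is a sketch of a bubble sort over signed words built around the same flag trick. It is not the routine from the post. The register assignments are assumptions for this example: a0 points at the array, d7.w holds the element count (at least 2), and d0, d1, d6, d7 and a1 are trashed.

```asm
BubbleSort:
        moveq   #0,d0           ; flag byte starts clear (done once)
        subq.w  #2,d7           ; dbf count for n-1 comparisons per pass
.pass:
        move.l  a0,a1           ; restart at the beginning of the array
        move.w  d7,d6
.cmp:
        move.w  (a1)+,d1        ; d1 = current element, a1 -> next element
        cmp.w   (a1),d1         ; computes d1 - next
        ble.s   .ok             ; already in order (signed compare)
        move.w  (a1),-2(a1)     ; swap: next goes into the current slot
        move.w  d1,(a1)         ;       current goes into the next slot
        moveq   #-128,d0        ; flag it: low byte = $80
.ok:
        dbf     d6,.cmp
        add.b   d0,d0           ; $80+$80 sets carry and leaves the byte at 0
        bcs.s   .pass           ; carry set = something was swapped, go again
        rts
```

After a pass with swaps, add.b turns the low byte from $80 into $00 and sets carry, so the flag is tested and reset in one instruction. After a pass without swaps the byte is still 0, 0+0 gives no carry, and the loop ends. moveq #-128 is used because moveq only takes values from -128 to 127, so $80 has to be written as -128.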
21 February 2020, 14:50 | #172 |
ex. demoscener "Bigmama"
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,624
|
I stared at this for a while without seeing how it would reset the flag (carry), but I suppose you meant "reset" in the sense of returning d0 to zero?
|
21 February 2020, 15:00 | #173 | |
Registered User
Join Date: Dec 2019
Location: Preston
Posts: 100
|
Quote:
The key part, I believe, is not the add.b but the fact that d0 contains $xxxxxx80 beforehand from the moveq. Forgive me if I'm wrong, as I'm just starting out. Mike
|
21 February 2020, 15:05 | #174 |
OCS forever!
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
|
|
21 February 2020, 15:42 | #175 | |
ex. demoscener "Bigmama"
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,624
|
Quote:
|
|
02 March 2020, 16:05 | #176 | ||||
Registered User
Join Date: Dec 2019
Location: North Dakota
Posts: 741
|
Quote:
Quote:
I burnt through a great heap of money during the last two years designing Higgs (current estimate is between $150,000 and $200,000, rising daily as I keep working on it alongside the game, so keep this in mind before you ask for a free download: it's literally like asking me to donate an average American house). Higgs is slightly lower-level than C, but I designed it to be identical in speed to hand-written ASM. Current features are:
- full access to all registers and ASM instructions
- choice of WorkingRegister to use by Higgs if the feature requires it
- global/local variables/constants
- byte/word/long access via .bwl (default is long, so no need to specify .l)
- arrays
- structures
- typecasting (word/long)
- conditions
- loops (continue + break)
- blocks {} allowing to pollute the name-space only within the current block
- debug printing
- function declarations with parameters (your choice of registers or global or local variables)
- function call with or without parameters
- local functions (invisible to outside world) like in Pascal
- push/pop stack syntax
- basic math operations (signed var1 = var2 * var3), (var3 += var1)
All of the features above are possible to implement (Higgs is written in C#) with the exact same instruction footprint as if you wrote it manually in Asm.
Some common C features like switch or do-while are high on my to-do list. To my surprise, I somehow managed to write the game without them, so they simply didn't get implemented yet. On-Demand Inlining (i.e. only when you want it, though you can still force it always) is in my Top 5. On the 6502 target I have an Unroll Loop; this still needs to be implemented for the 68000 target.
You still have to think in terms of byte/word/long access and still have to prefer registers to variables (but don't have to if you don't feel like it). You are solely responsible for the contents of registers, but if you want, you have the option to code using just variables.
Primarily, this targets .68000. Most of the features are also implemented for a RISC backend (Jaguar's GPU and DSP processors). I also have .6502 and .6502C targets (though those are currently the simplest). Once networking gets enabled in core for Vampire (and I can start deploying builds to my V4), I will make a .68080 target, eventually with AMMX support.
Quick example: Code:
; Arrays of structures are supposed to be accessed sequentially
; each time you simply advance the pointer via Next () which is a simple add.l #StructSizeOf,ptrStruct
array SLaserShot LaserShots [MaxLaserShots] ; Player's lasers
SLaserShot.UseRegister (a2) ; Use this register for access

Animate_LaserShots:
{
    ; Animate (localZ + camY) Already Active LaserShots
    { ; Player's LS
        register d7:lpMain
        ; Keep d1 as WorldSpeed, since SLaserShot_UpdateZ requires d1 as input
        register d1:WorldSpeed ; PlayerSpeed + LS_Speed
        register d2:CurrentPlayerSpeed
        CurrentPlayerSpeed = PlayerSpeed >> #3
        SLaserShot.InitRegister (LaserShots)
        loop (lpMain = #MaxLaserShots)
        {
            WorldSpeed = CurrentPlayerSpeed + SLaserShot.Speed
            ; print2H (SLaserShot.camY,SLaserShot.camZ,#110,#50)
            if.l (SLaserShot.IsActive == #1)
            {
                if.l (SLaserShot.FrameDeactivate <= Frame:d0)
                { ; Disable LS if it travelled too far
                    SLaserShot.IsActive = #0
                }
                else
                { ; LS can still remain active
                    SLaserShot_UpdateZ () ; Update Z
                    SLaserShot_UpdateY () ; Update Y (after Z, so it is sync'ed)
                }
            }
            SLaserShot.Next ()
        }
    }
    rts
}
Quote:
Granted, it's lower level than C, as it's not supposed to be completely safe and idiot-proof like C is. But it's infinitely easier to add/remove Higgs code compared to ASM. The mental effort required for pure ASM (nested irregular conditions, etc.) makes it hard to simply discard the code you wrote. In Higgs, I don't even think about that: I simply delete the code and rewrite it from scratch. Let the compiler insert all the jump labels and figure out the proper comparison/Bcc instruction based on the parameters.
Quote:
On Atari Jaguar, 98% of the code was written for the 68000 and only 4 KB in RISC (3D transform and rasterizer loop). So, quite literally, everything else is 68000. That's:
- input,
- Z-sorting,
- culling World track mesh,
- double-buffering,
- creating doublebuffered polygon list for RISC GPU,
- strafing physics,
- collision detection,
- camera,
- full 8-state AI,
- spawning enemies,
- procedural random generation of enemy RPG parameters,
- HUD,
- managing Jaguar's ObjectProcessor list (and related IRQ),
- damage equations.
And about two dozen things I didn't think of right this moment.
On Jag, about 90% of that was rewritten into Higgs (I started with 100% ASM and, as I kept adding Higgs features, gradually rewrote additional parts; on Amiga it is 100%), yet benchmarks showed that it only took 10% of frame time on the 13.3 MHz 68000. Meaning, I could still run the full logic of the game ten times per frame, yet keep 60 fps. So, even if the Motorola was 10x slower, at roughly 1.3 MHz, it still should fit within a frame time. Now that's funny
||||
03 March 2020, 15:11 | #177 | |
Natteravn
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,496
|
Quote:
But the claim that it reaches identical speed to hand-optimized assembler cannot be true, so I have to defend Photon's statement here. You always have to make compromises when translating a high-level (even the lowest high-level) language into machine code. Give me a program generated by Higgs and I (and many other coders here) will always be able to show you sequences which allow optimization.
|
03 March 2020, 20:17 | #178 | |
Registered User
Join Date: Dec 2019
Location: North Dakota
Posts: 741
|
Quote:
That's how it started: first with macros, then macro modifications at compile-time, and eventually parsing expressions and simple commands (loops, conditions, blocks, etc.).
Yeah, I probably wouldn't use the term "hand-optimized". Rather, I use "hand-written". Meaning, the same efficiency as if I wrote it by hand in ASM (though it is certainly possible to write a slightly faster version, if you are willing to bastardize the code to the point it's unreadable later).
It's always possible, in ASM, to rearrange and rewrite certain combinations of instructions to save some cycles (as this thread has demonstrated probably dozens of times). But that creates unmaintainable code (long-term). You save 4 cycles by abusing some fluke register dependency, and when you need to change the code, boom. You burn half a day debugging wth is going on.
I'm sure we all did the same thing:
- you write version 1
- it works, it is nicely documented or even self-documented
- you spot something, make version 2 and it saves some cycles
- you do the same and have version 3
- 3 months later you make some change elsewhere that breaks some of the dependencies brought by the optimizations (because you now use the higher 16 bits or whatever else it is).
Now, it is possible to implement a final Optimizer pass that would go over the code, examine the register status and replace certain combinations of ops with different, faster ones (like the ones mentioned in this thread). That would indeed be useful for the 68000, but since I now focus on Vampire and 68040-68060, it's not really critical for me.
That brings up the question: is there already some kind of optimizer like this for the 68000? Something that would do such analysis of the code and find combos of ops that are safe to replace with faster ones?
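As one illustration of what "safe to replace" could mean, here is the upper-word clear from post #168 viewed as a rewrite rule. This is only a sketch of a single candidate rule, not a description of any existing optimizer. Cycle counts are for a plain 68000.

```asm
; before: 16 cycles
        and.l   #$ffff,d0       ; d0 = $0000xxxx, N/Z from the long result, V=C=0

; after: 12 cycles, same size, no scratch register
        swap    d0              ; d0 = $xxxx????
        clr.w   d0              ; d0 = $xxxx0000
        swap    d0              ; d0 = $0000xxxx, N/Z from the long result, V=C=0
```

Both sequences leave the same value in d0 and the same N, Z, V and C flags, and neither touches X, so the replacement does not depend on what the following code reads. The three-instruction variant through d1 would additionally require the pass to prove that d1 is dead at that point.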
|
03 March 2020, 20:38 | #179 | |
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
|
Quote:
And neither does it mean the code is unmaintainable. |
|
03 March 2020, 20:55 | #180 |
Registered User
Join Date: Aug 2006
Location: Finland
Age: 51
Posts: 241
|
|