English Amiga Board    


Old 29 April 2012, 17:07   #41
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 38
Posts: 11,944
Quote:
Originally Posted by Thorham View Post
What I want to know is why TAS shouldn't be used for memory. Doesn't seem like it could hurt. I've actually tried (chipmem) it and it didn't seem to cause any weird behavior.
http://eab.abime.net/showpost.php?p=745876&postcount=3

TAS seems to work fine if there is no competing DMA active.

TAS Dx is perfectly safe, it does not do any memory accesses.
Old 29 April 2012, 17:10   #42
mark_k
Registered User
 
Join Date: Aug 2004
Location:
Posts: 716
Quote:
Originally Posted by Thorham View Post
What I want to know is why TAS shouldn't be used for memory. Doesn't seem like it could hurt. I've actually tried (chipmem) it and it didn't seem to cause any weird behavior.
As I understand it, on 68000/010 (only) the special read-modify-write cycle used when TAS accesses memory can interfere with the custom chips' access to chip memory. Also some Zorro II fast RAM cards might not support the read-modify-write cycle.
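A rough model of what TAS does to a memory byte may help here. The sketch below is in Python rather than 68k asm so it can be run anywhere. The function name `tas` and the bytearray standing in for memory are illustrative assumptions, and it only models the data and flag behaviour, not the indivisible read-modify-write bus cycle that causes the trouble on the Amiga.

```python
def tas(memory, addr):
    """Model of TAS <ea>: test the byte, then set its top bit.

    Returns (n, z) computed from the ORIGINAL byte value.
    On real hardware the read and the write form one indivisible bus cycle;
    that part is not modelled here.
    """
    old = memory[addr] & 0xFF
    n = bool(old & 0x80)        # N flag: bit 7 of the old value
    z = old == 0                # Z flag: old value was zero
    memory[addr] = old | 0x80   # write back with bit 7 set
    return n, z


mem = bytearray(4)        # hypothetical stand-in for a few bytes of RAM
first = tas(mem, 0)       # byte was 0, so Z is set ("lock was free")
second = tas(mem, 0)      # byte is now $80, so N is set ("lock already taken")
```

Used as a lock primitive, the caller would branch on Z (or N) after the instruction; the second call above shows what a competing task would see.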

You could try running a test program, something like this:
Code:
Loop:
   tas   d0
   tas   d0
   tas   d0
   tas   d0
   bra.b   Loop
On a 68000-based Amiga, set your Workbench screen to 16 colours high-res, then run the test program. Maybe also experiment with reading/writing a floppy disk while the test program is running, and/or run a program which uses the blitter.

There's an old newsgroup posting which mentions some of the possible issues with TAS, see this.
Old 29 April 2012, 22:04   #43
Ricardo
Registered User
Join Date: Jul 2009
Location: Nottingham / UK
Posts: 122
Cool topic
Old 29 April 2012, 23:13   #44
Photon
Oldskool Demo Coder
Join Date: Nov 2004
Location: Hult / Sweden
Age: 41
Posts: 3,674
Many caveats in HRM can be taken with a grain of salt, and the TAS warning like the other ones can be ignored - if you know where it's appropriate.

TAS, even to memory, is fine. But it has to be executed when no DMA (or memory refresh slots) want to fight its bus ownership. What happens is that the CPU does whatever it wants with the bus, and these other two users of the bus can't set its DMA or memory refresh address or data.

This can of course cause havoc including losing data in memory, and DMA writing to the wrong address including executing programs. You could avoid the former if you do your own memory refresh of desired ranges in memory you care about, such as where the running code and data and the hardware vectors are, and avoid the latter by disabling the copper and never executing TAS instructions while blitting or bitplane/sprite/audio DMA is active, but I think this limits its usefulness somewhat...

TAS Dn can of course be used anytime, anywhere, since it doesn't hog the bus.
__________________
Henrik. Programs Amiga demos, iPhone apps, websites, etc.
A1000/512k - A500 2.0/040@28/4M/.5M slowmem/8M/SCSI/CF - A600 portable II 3.1/ACA630/WiFi/CF - 'A1700' 3.1/68060@80/64M/IDE-Fix Express/CF - etc.
"The difference between PC and Amiga is that 10yo PCs are worth $0. 20yo Amigas are worth a lot, and Amigas that are only 15yo cost a fortune!"
If you like Portal 2, try my >> single player and cooperation maps <<
Old 29 April 2012, 23:47   #45
Galahad/FLT
Going nowhere
Join Date: Oct 2001
Location: United Kingdom
Age: 39
Posts: 5,028
I simply never use it, because I know damn well I'll forget that it shouldn't be used to access memory, and then be scratching my head wondering why the fuck my program doesn't work!
__________________
Former member of: LSD, Scoopex, Razor 1911, Dual Crew Shining, Rednex, Fairlight.

www.southwestscrap.co.uk
Old 30 April 2012, 12:22   #46
mark_k
Registered User
 
Join Date: Aug 2004
Location:
Posts: 716
Quote:
Originally Posted by mark_k View Post
You could try running a test program, something like this:
Code:
Loop:
   tas   d0
   tas   d0
   tas   d0
   tas   d0
   bra.b   Loop
Doh. Of course that will always work fine since TAS is on a data register! It should be something like this instead:
Code:
   suba.l a0,a0       ;Address 0 is in chip memory
Loop:
   tas    (a0)
   tas    (a0)
   tas    (a0)
   tas    (a0)
   bra.b  Loop
Or if you have a 68010 CPU you can make use of its loop mode:
Code:
   suba.l a0,a0       ;Address 0 is in chip memory
Loop1:
   moveq  #-1,d0
Loop2:
   tas    (a0)
   dbf    d0,Loop2
   bra.b  Loop1
(Of course, writing to location 0, while mostly harmless, is not really legal. But it should be fine for the purposes of this test on 68000/010. If you want to be more legal you could allocate some chip memory, or have a chip data hunk, and point A0 to that.)
Old 30 April 2012, 14:12   #47
Photon
Oldskool Demo Coder
Join Date: Nov 2004
Location: Hult / Sweden
Age: 41
Posts: 3,674
As stated, if you don't perform the tests in a DMA- and memory refresh-controlled environment, memory rows will be lost until you are running random words instead of code.

If you don't take care of those two things AND turn the system off, you will get trashed memory, rogue coppers, and a crash.

If system is off and it's running in real fastram, the effects should be limited to rogue coppers, if disk DMA is off and blits have finished.

Last edited by Photon; 30 April 2012 at 14:20.
Old 30 April 2012, 19:08   #48
Samurai_Crow
Team Chaos Member
Join Date: Aug 2007
Location: Pierre, SD USA
Age: 38
Posts: 235
There are softcores on the way for the NatAmi and possibly the FPGAArcade Replay board that will not support the instructions that are not legal on the Amiga. The encoding for TAS, CAS and such was kind of added on as an afterthought in the instruction set but the possibilities of adding more useful single-core opcodes to the Apollo N68070 are likely to yield better performance in the long run.
__________________
Member: Total Chaos team and AROS Development Team.
Old 30 April 2012, 21:10   #49
Thorham
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 36
Posts: 1,588
Thanks guys for explaining the TAS problem. Could be useful when used with a data register. Shame that CAS/CAS2 always work on memory.

Quote:
Originally Posted by Samurai_Crow View Post
There are softcores on the way for the NatAmi and possibly the FPGAArcade Replay board that will not support the instructions that are not legal on the Amiga. The encoding for TAS, CAS and such was kind of added on as an afterthought in the instruction set but the possibilities of adding more useful single-core opcodes to the Apollo N68070 are likely to yield better performance in the long run.
That's not a very smart move, seeing how those instructions can actually work, especially TAS.
__________________
Random number generation is the art of producing pure gibberish as quickly as possible.
- Bob Jenkins
Old 30 April 2012, 21:30   #50
pmc
is long gone
Join Date: Apr 2007
Location: London
Posts: 1,590
I agree with Thorham on that.

I thought the Natami was supposed to be compatible with classic Amigas?

Not supporting all the opcodes that the native Amiga CPUs use surely automatically makes it less than 100% compatible.
Old 01 May 2012, 01:17   #51
Samurai_Crow
Team Chaos Member
Join Date: Aug 2007
Location: Pierre, SD USA
Age: 38
Posts: 235
The goal for the FPGAArcade Replay is compatibility at all costs.

The goal for the NatAmi is a bit different. We want performance at all costs with just enough compatibility to run WHDLoad and all of its slaves. If you look at our homepage you'll see that we want to leave out legacy drawbacks in order to make way for more performance.

Besides the legal threshold for the word "compatible" is 90% of software running on it. I think we can meet that threshold easily.
Old 01 May 2012, 02:24   #52
matthey
Registered User
 
Join Date: Jan 2010
Location: Kansas
Posts: 205
The Natami's Apollo FPGA CPU currently supports TAS in hardware without the read-modify-write cycle, but it's recommended NOT to use it in case it could be used later for its intended purpose. The following instructions will likely be trapped:

MOVEP, MOVES, PACK, UNPK

They should not be used as they will be slow and the encoding space could be reused at a later time. It would be wise to replace them in any patches. MOVEP is trapped on the 68060 as well. Instructions that will likely not be supported and may have their encoding space reused are:

CAS, CAS2, CMP2, CHK2, CALLM, RTM and BKPT

CAS and CAS2 are not reliable on the Amiga. CMP2 and CHK2 are trapped on the 68060. CALLM/RTM are only on the 68020 and not really useful. If anyone sees a glaring 68k compatibility problem with this then speak up. The Apollo core should be very forgiving of self-modifying code which will probably be better for 68k compatibility than supporting some rarely used instructions.
Old 01 May 2012, 23:20   #53
Photon
Oldskool Demo Coder
Join Date: Nov 2004
Location: Hult / Sweden
Age: 41
Posts: 3,674
PRE-EDIT: gosh, why do I reply to the message in the inbox. Kept the below text cos I think it voices similar previously expressed opinions in more detail.

Quote:
Originally Posted by Samurai_Crow View Post
There are softcores on the way for the NatAmi and possibly the FPGAArcade Replay board that will not support the instructions that are not legal on the Amiga.
That's just silly!

In the same list of "illegal" instructions are: clr.w certain custom chip registers, yet you wouldn't remove the clr instruction, would you? Betcha a movep to BLTSIZ wouldn't be very healthy either!

Backwards compatibility is easy, forwards compatibility is harder. Now possibly made impossible by not supporting an instruction that has been in every 680x0 CPU! Please tell me they are reconsidering.

Imagine that, a single sentence in a book becoming law. I think this is how myths become religions.

Also: betcha I can make CAS work fine on accelerated Amigas. Trapping for patches is ...*oof* fine I guess... *grumble* *suspicious look*
Old 02 May 2012, 01:21   #54
Thorham
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 36
Posts: 1,588
Quote:
Originally Posted by matthey View Post
MOVEP, MOVES, PACK, UNPK

They should not be used as they will be slow and the encoding space could be reused at a later time. It would be wise to replace them in any patches.
That's a nice way to reduce 680x0 compatibility, great design choice.

Quote:
Originally Posted by matthey View Post
MOVEP is trapped on the 68060 as well. Instructions that will likely not be supported and may have their encoding space reused are:

CAS, CAS2, CMP2, CHK2, CALLM, RTM and BKPT

CAS and CAS2 are not reliable on the Amiga. CMP2 and CHK2 are trapped on the 68060. CALLM/RTM are only on the 68020 and not really useful.
Who cares about 'trapped on 68060'? Is the 68060 now the 680x0 law?

Quote:
Originally Posted by matthey View Post
If anyone sees a glaring 68k compatibility problem with this then speak up. The Apollo core should be very forgiving of self-modifying code which will probably be better for 68k compatibility than supporting some rarely used instructions.
Rarely used or not, 680x0 is 680x0, and by going down this road, the Natami 680x0 core won't be very 680x0, now will it?

Perhaps FPGAArcade is a better choice for classic users, seeing how Natami removes whatever it wants so that it can reuse the encoding space. All those extra instructions aren't any good anyway. Software that would work perfectly fine on 680x0 won't work now. Natami is looking less attractive by the second.
Old 02 May 2012, 01:53   #55
matthey
Registered User
 
Join Date: Jan 2010
Location: Kansas
Posts: 205
Quote:
Originally Posted by Photon View Post
Backwards compatibility is easy, forwards compatibility is harder. Now possibly made impossible by not supporting an instruction that has been in every 680x0 CPU! Please tell me they are reconsidering.
Some 68k instructions are baggage. They are poorly encoded (slows decoder for all instructions) or poorly implemented and don't work well in a superscalar environment. Not many programmers used them because they were too slow. The 68060 designers could see that it wasn't practical to support them in hardware. They trapped the less common ones for backward compatibility but this made them even slower. Reusing the encoding space is not the preferred choice, but there isn't a lot of encoding space left without using A-line or F-line for integer instructions, and doing that has its downsides too. We are paying attention to which instructions were used on the Amiga and that's why MOVEP, TAS and most of the BCD instructions will most surely be supported. We are open to reconsidering removals if you can show us examples where CAS, CAS2, CMP2 and CHK2 are used in a legitimate way on the Amiga while being faster than using other instructions and being common enough that patching the executable to remove them would be a major problem. Note that these removals do not affect 68000 compatibility which, interestingly enough for this thread, should be excellent.

Quote:
Originally Posted by Photon View Post
Also: betcha I can make CAS work fine on accelerated Amigas. Trapping for patches is ...*oof* fine I guess... *grumble* *suspicious look*
I bet you can too, but I bet you can't find an advantage to using it on the Amiga because of its slow timing (on the 68040 and 68060 at least). How many instructions can the 68060 execute in:

CAS 19 cycles
TAS 17 cycles (in memory)

You can do the same faster with replacement code. If it was faster in any case on the Amiga then someone would have used it but it's not and that's why it's gone unused. CAS is useless without a working read-modify-write cycle that's slow on a modern processor.
Old 02 May 2012, 02:45   #56
matthey
Registered User
 
Join Date: Jan 2010
Location: Kansas
Posts: 205
Quote:
Originally Posted by matthey View Post
MOVEP, MOVES, PACK, UNPK

They should not be used as they will be slow and the encoding space could be reused at a later time. It would be wise to replace them in any patches.
Quote:
Originally Posted by Thorham View Post
That's a nice way to reduce 680x0 compatibility, great design choice
The trapping emulation code will be in fast flash memory and any use of these instructions shouldn't need to be fast. I doubt you would notice that they are trapped at all.

Quote:
Originally Posted by Thorham View Post
Who cares about 'trapped on 68060'? Is the 68060 now the 680x0 law?
The 68060 made mostly good choices in the direction it chose in order to be fast. Unused instructions sitting around in hardware are slowing down the decoder and taking up space for faster and more useful instructions. Would you rather have a 68040 or 68060? The 68040 supported almost everything 68k integer. If you are happy with that or a 68020/68030 then that's fine. Some of us would rather have more speed instead.

Quote:
Originally Posted by Thorham View Post
Rarely used or not, 680x0 is 680x0, and by going down this road, the Natami 680x0 core won't be very 680x0, now will it?

Perhaps FPGAArcade is a better choice for classic users, seeing how Natami removes whatever it wants so that it can reuse the encoding space. All those extra instructions aren't any good anyway. Software that would work perfectly fine on 680x0 won't work now. Natami is looking less attractive by the second.
The name was changed to Apollo. Maybe we aren't 68k anymore. The ColdFire wasn't 68k and less than 50% of 68k software runs unmodified on its most recent version. I think the Apollo will be able to run >99.9% of 68k code. I guess it's up to the user to decide though. Some purists will choose an Amiga 1000 with 68000 as even the 68020/68030 are blasphemy. Motorola changed MOVEM behavior and made MOVE SR,<ea> a supervisor mode instruction, causing 68000 code to crash on the 68020+. Yep, everything after the 68000 isn't really 68k compatible as the true path does not allow for any incompatibility. I should feel bad for enjoying my 68060 so much, I guess.
Old 02 May 2012, 08:22   #57
pmc
is long gone
Join Date: Apr 2007
Location: London
Posts: 1,590
I won't say anything about what I do or don't think are the rights and wrongs of Natami compatibility or about who I do or don't agree with in the preceding discussion.

The reason I won't is that I can see that my (our) nice cosy 68000 optimisations thread, that up until now has been very useful, will very quickly descend into an off topic Natami compatibility thread.

As it happens I think a thread about Natami CPU compatibility would be very interesting, but please, really do go and make it another thread.
Old 02 May 2012, 10:33   #58
TheDarkCoder
Registered User
 
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 114
Quote:
Originally Posted by Photon View Post

The instructions take the cycles they take, and there's no instruction reorder optimizations on 68000 apart from the prefetch after the write to BLTSIZ and the hard-to-know odd-cycle alignment wait of instructions that take 6/10/14 etc cycles.
Could you explain these two optimizations to me?
Old 02 May 2012, 11:10   #59
Thorham
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 36
Posts: 1,588
Quote:
Originally Posted by pmc View Post
The reason I won't is that I can see that my (our) nice cosy 68000 optimisations thread, that up until now has been very useful, will very quickly descend into an off topic Natami compatibility thread.
You're absolutely right.

Here's one. It clamps d0.b to 255 on overflow (carry set). Use it right after a value has been added to d0 (d1 gets trashed):

Code:
	subx.b	d1,d1
	or.b	d1,d0
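To see why the two instructions above saturate, here is a small Python model of the flag arithmetic. It assumes the preceding instruction was an `add.b` of some value into d0, so that X holds the carry out of bit 7; the function name `add_b_clamped` is only for illustration.

```python
def add_b_clamped(d0, value):
    """Model of: add.b value,d0 / subx.b d1,d1 / or.b d1,d0 (byte-sized, unsigned)."""
    total = (d0 & 0xFF) + (value & 0xFF)
    x = total >> 8          # X flag after add.b: carry out of bit 7
    d0 = total & 0xFF       # wrapped byte result of the add
    d1 = (-x) & 0xFF        # subx.b d1,d1: d1 - d1 - X gives $00 or $FF
    return d0 | d1          # or.b d1,d0: forces $FF when the add overflowed


no_overflow = add_b_clamped(100, 50)
overflow = add_b_clamped(200, 100)
```

When there is no carry, d1 becomes 0 and the OR leaves the sum alone; with a carry, d1 becomes $FF and the OR pins the result at 255.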
Old 02 May 2012, 22:07   #60
Photon
Oldskool Demo Coder
Join Date: Nov 2004
Location: Hult / Sweden
Age: 41
Posts: 3,674
Quote:
Originally Posted by pmc View Post
(this) will very quickly descend into an off topic Natami compatibility thread.

As it happens I think a thread about Natami CPU compatibility would be very interesting, but please, really do go and make it another thread.
I agree. It would be a friendly gesture and perhaps prepare a few programmers to adjust.

CAS was mentioned. The thing is, compatibility has never been about performance. In the case of the Amiga, it's about not having to rely on Amiga programmers, who have perhaps moved on (in any sense), to update software so that it doesn't become obsolete. It might have already been used in 4K ztravaganza ztrordinairez, or I might do one just to spite you! (Well, not really feeling like it, but there's the size aspect of optimization as well.)

There's always been a limit to how many % of software a given ever so compatible Amiga or Amiga successor platform can support, but it's been due to other things than yanking out instructions and more about instruction behavior, caching, speed, and OS code. A compiler can make the programmer forget about the instruction set altogether, but this forum IS basically about the instruction set. And the chipsets.

I'm not poking my finger at a potential success story like Natami, and trapping and patching is one way (grumble) to get compatibility. But, as pointed out, telling real Amiga users not to use instructions that demonstrably work on an Amiga is... not really relevant to optimization, which is this topic.

I think the fact that specifics of what Natami supports are being discussed points to it nearing completion, which is exciting! I'll help move any posts you wish regarding it or other nextgen Amiga clones to the NextGen forum.
Old 03 May 2012, 02:00   #61
Photon
Oldskool Demo Coder
Join Date: Nov 2004
Location: Hult / Sweden
Age: 41
Posts: 3,674
Quote:
Originally Posted by TheDarkCoder View Post
may you explain to me these two optimizations?
Well, the prefetch feature of the 68000 is no secret. My theory was, this will cause an internal stage and another internal stage XOR a word memory access (total 2x 4 cycles) to be executed before a blit starts. The theory hasn't been tested for much more than DIVSing perspective while drawing lines 23 years ago, basically cos I couldn't find anything else that was more useful that wasn't a normal sub-12 cycle instruction. Sometimes the lines would be shorter than the DIVS cycle time of course, but it was rare. The point was that it was started immediately thus calculated internally in parallel finishing faster.

The other one is aligning table lookups (usually 14c) or a taken branch (10c) and similar with the alternating 4-cycle CMA/DMA memory access. In the vblank period, when no DMA is active it's "as written", just sum up the cycles. But while actually displaying something some of them would just be out of luck and have their CMA execute the NEXT 4-cycle slot the bitplane DMA wasn't hogging access.

Normally this is too much work really (really!) since you can't really go "oh, I'll halve the number of colors on screen and I'll be able to fit one or two CMA's between bitplane accesses" cos you'd have ruined the original idea (by making it look shit) and also would have gained much more already, by removing a bitplane's DMA, both for blitter and CPU.

So I haven't tried this unless I got some routine right by accident so I expect someone is going to debunk it instantly (and thunderously!)

But it's basically the last straw to grip when a frameful of effect is a sequence of perfectly and godlikely optimized instructions (according to yourself of course). Considering you'd likely have to sync with raster at the start of something, you'd probably lose more by that sync than you gained! But there might be a situation. Not one that wouldn't be completely 'surpassed' by precalc or infinite bobs or whatever, of course. Hah.
Old 20 May 2012, 17:26   #62
meynaf
68k wisdom
Join Date: Nov 2007
Location: Lyon (France)
Age: 40
Posts: 979
A few coding tricks, not specific to the 68000 but still OK there, I think:
Code:
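; sgn - returns d1=0 if d0=0, d1=1 if d0>0, or d1=-1 if d0<0 (d0 is trashed)
 add.l d0,d0       ; X = sign bit of d0
 subx.l d1,d1      ; d1 = -X (0 or -1), X unchanged
 negx.l d0         ; X = 1 unless the original d0 was 0
 addx.l d1,d1      ; d1 = 2*d1 + X

; quick test to check if any byte of d0 is 00 (d0 and d1 are trashed)
 move.l d0,d1
 not.l d0
 sub.l #$01010101,d1
 and.l #$80808080,d0
 and.l d0,d1
 bne null_byte_found

; check if a=b (d0=a, d1=b, range 0000-7FFF), but also treated as equal whenever b=$ffff
 eor.w d1,d0
 bgt not_equal

; instead of :
 scs d0
 ext.w d0  ; or a single extb.l d0 on 68020+
 ext.l d0  ;
; write (SUBX reads X rather than C, so use it where both flags match, e.g. right after add/sub) :
 subx.l d0,d0

; to see if a value is between $FFFF8000 and $7FFF, better put it in An reg :
 cmpa.w a0,a0  ; compares a0.w sign-extended to .l against a0.l; Z set if in range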
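For anyone who wants to convince themselves that these tricks hold for the edge cases, the sketch below models three of them in Python, tracking the X flag by hand. The function names (`sgn`, `has_zero_byte`, `fits_signed_word`) are made up for this illustration, and the flag behaviour is my reading of the ADD/SUBX/NEGX/ADDX/CMPA definitions rather than something measured on hardware.

```python
MASK32 = 0xFFFFFFFF


def sgn(d0):
    """Model of add.l d0,d0 / subx.l d1,d1 / negx.l d0 / addx.l d1,d1."""
    d0 &= MASK32
    x = d0 >> 31                        # add.l d0,d0: X = old sign bit
    d0 = (d0 << 1) & MASK32
    d1 = (-x) & MASK32                  # subx.l d1,d1: 0 or -1, X stays the same
    x = 1 if (d0 + x) != 0 else 0       # negx.l d0: borrow unless d0 and X are both 0
    d1 = (d1 + d1 + x) & MASK32         # addx.l d1,d1
    return d1 - (1 << 32) if d1 & 0x80000000 else d1


def has_zero_byte(d0):
    """Model of the null-byte quick test; True where 'bne null_byte_found' is taken."""
    d0 &= MASK32
    d1 = (d0 - 0x01010101) & MASK32     # sub.l #$01010101,d1
    d0 = (~d0) & 0x80808080             # not.l d0 / and.l #$80808080,d0
    return (d0 & d1) != 0               # and.l d0,d1


def fits_signed_word(a0):
    """Model of cmpa.w a0,a0; True where the Z flag would be set."""
    a0 &= MASK32
    low = a0 & 0xFFFF
    extended = low | 0xFFFF0000 if low & 0x8000 else low
    return extended == a0


signs = [sgn(v) for v in (0, 5, -3, 0x7FFFFFFF, 0x80000000)]
```

The `sgn` model includes $80000000, where the doubled value is 0 but X is still set, which is the case most likely to trip up a hand check.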
__________________
He who insults the other in a discussion is the one who's wrong.
Old 25 May 2012, 08:36   #63
Photon
Oldskool Demo Coder
Join Date: Nov 2004
Location: Hult / Sweden
Age: 41
Posts: 3,674
Trying to optimize often leads to a few lines of strange new code that does it faster or shorter but looks irrelevant to the task. Even though this looks more like good code for implementing variable typing in a higher level language, I liked it.
Old 25 May 2012, 10:28   #64
pmc
is long gone
Join Date: Apr 2007
Location: London
Posts: 1,590
Quote:
Originally Posted by Photon
Trying to optimize often leads to a few lines of strange new code that does it faster or shorter but looks irrelevant to the task
So true. Leading to the weird experience of looking at some of your very own code and thinking: huh? what the hell was I doing that for?

Followed by a few minutes of groping through hazy memories and realising: oh, yeah, that's why.

It's another reason why I personally find it difficult to nigh on impossible to figure out other people's demo code. They did so many weird little things and optimisations that only they understood the reason for that I've got no chance.

Much easier to code your own effects from scratch than figure out how some other coder did it their way.
Old 25 May 2012, 12:58   #65
phx
Registered User
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 264
That's why most programming languages allow comments.
Old 28 May 2012, 09:40   #66
meynaf
68k wisdom
Join Date: Nov 2007
Location: Lyon (France)
Age: 40
Posts: 979
Quote:
Originally Posted by phx View Post
That's why most programming languages allow comments.
I second that. Whenever i use a "trick" in asm, i add comments for each line.

But, of course, when you re-source a program, the comments are gone.
Old 31 May 2012, 22:31   #67
Lonewolf10
AMOS Extensions Developer
Join Date: Jun 2007
Location: near Cambridge, UK
Age: 33
Posts: 927
Some great tips here

Anyone else think this thread is worthy of being made "sticky"?


Regards,
Lonewolf10
__________________
My stuff on Aminet is here

All Square by Digital Dalmatian is here
Old 04 June 2012, 09:25   #68
TheDarkCoder
Registered User
 
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 114
I second that. But I would suggest moving the thread to the ASM section.
Old 07 June 2012, 08:47   #69
meynaf
68k wisdom
Join Date: Nov 2007
Location: Lyon (France)
Age: 40
Posts: 979
Perhaps there is a little bit too much OT here to do that, huh ?
Old 15 June 2012, 17:26   #70
pmc
is long gone
Join Date: Apr 2007
Location: London
Posts: 1,590
Instead of this:
Code:
                    cmpi.l              #num,Dn
this:
Code:
                    moveq.l             #num,Dn
                    cmp.l               Dn,Dn
where (to suit the moveq.l) num is in the range -128 to +127

Last edited by pmc; 16 June 2012 at 11:36. Reason: got Stung
Old 15 June 2012, 21:21   #71
Photon
Oldskool Demo Coder
Join Date: Nov 2004
Location: Hult / Sweden
Age: 41
Posts: 3,674
Yes, and one I use a lot is masking or subtracting stuff "to another register". When you do gfx stuff you often go

Code:
moveq #15,d1
and.w d0,d1
which retains the value in d0 should you need to mirror it or something.

Thread moved to asm forum and made sticky 8)
Old 15 June 2012, 23:23   #72
StingRay
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 4,547
Quote:
Originally Posted by pmc View Post
Instead of this:
Code:
                    cmpi.l              #num,Dn
this:
Code:
                    moveq.l             #num,Dn
                    cmp.l               Dn,Dn
where (to suit the moveq.l) num is in the range -127 to +127
moveq range is -128 to +127.
__________________
Makes me sick when I hear all the shit that you say
So much crap coming out, it must take you all day
There's a space kept in hell with your name on the seat
With a spike in the chair just to make it complete
Old 16 June 2012, 11:35   #73
pmc
is long gone
Join Date: Apr 2007
Location: London
Posts: 1,590
Doh! Stung again!

Original post now edited.
Old 16 June 2012, 16:55   #74
TheDarkCoder
Registered User
 
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 114
@pmc: you probably meant to keep the small immediate value in a register different to the one against which you do the cmp:

moveq.l #num,Dx
cmp.l Dx,Dn
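As a rough way to see what the replacement buys, the tally below adds up cycles and bytes for both forms. The figures are what I believe the standard 68000 timing tables give (no DMA contention, zero wait states) and should be treated as assumptions; the helper names are invented for this sketch.

```python
# Assumed 68000 costs as (cycles, bytes); check against the timing tables before relying on them.
COST = {
    "cmpi.l #imm,Dn": (14, 6),
    "moveq #imm,Dx": (4, 2),
    "cmp.l Dx,Dn": (6, 2),
}


def fits_moveq(num):
    """moveq takes a signed 8-bit immediate: -128 to +127."""
    return -128 <= num <= 127


def compare_cost(num):
    """Pick the cheaper sequence for comparing Dn against num and total its cost."""
    if fits_moveq(num):
        parts = ["moveq #imm,Dx", "cmp.l Dx,Dn"]
    else:
        parts = ["cmpi.l #imm,Dn"]
    cycles = sum(COST[p][0] for p in parts)
    size = sum(COST[p][1] for p in parts)
    return parts, cycles, size


small = compare_cost(100)
large = compare_cost(128)
```

The saving only applies if a spare data register (Dx) is available; otherwise the cost of freeing one has to be counted too.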
Old 16 June 2012, 17:21   #75
pmc
is long gone
Join Date: Apr 2007
Location: London
Posts: 1,590
Yes. That was supposed to be implied but, as you say, it reads rather ambiguously. It's much clearer written your way.
Old 20 June 2012, 08:41   #76
pmc
is long gone
 
pmc's Avatar
 
Join Date: Apr 2007
Location: London
Posts: 1,590
Here's an optimisation I worked out ages ago using shifting and adding instead of multiplying.

Maybe obvious or well known to others, or maybe not but well... might be useful to someone

As an example, the eight binary digits in a byte have the decimal place values:

128 64 32 16 8 4 2 1

So, multiplying a number by shifting is pretty easy: shift left once to multiply a number by 2 for example.

Code:
                    lsl.w               #1,d0
or shift left by three to multiply a number by 8 etc. etc.

But what about multiplying by other numbers?

I've found that it's possible to do a couple of shifts and an add to multiply by other numbers.

For example, if I wanted to multiply a number by 40 - in the binary digits above, there's no 40 but there are a 32 and an 8.

And, as luck would have it, 32 + 8 = 40

So, if I shift a number left by five (multiply by 32) and take the same number and shift it left by three (multiply by 8) and then add the two results

Code:
                    move.w              d0,d1
                    lsl.w               #3,d0
                    lsl.w               #5,d1
                    add.w               d1,d0
I get the original number in d0 multiplied by 40 but without having to use a comparatively slow mulu.w

It can also work with more than two terms - if I wanted to multiply a number by 56 (32 + 16 + 8):

Code:
                    move.w              d0,d1
                    move.w              d0,d2
                    lsl.w               #3,d0
                    lsl.w               #4,d1
                    lsl.w               #5,d2
                    add.w               d2,d1
                    add.w               d1,d0
This works for other numbers too - try it and see what works. Obviously there'll be a cut off somewhere where all the shifting and adding will mount up and it might be quicker or no different to use a mulu.w instead.

Also, you do have to watch out for shifting digits "off the end" of the size the registers in use can hold.
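To put a number on the "off the end" warning: with word sized operations, Number * 40 only fits in 16 bits (unsigned) while Number is 1638 or less. A minimal sketch of one way around that, assuming an unsigned 16 bit input in d0 and a 32 bit result being acceptable, is to zero-extend first and use long operations:

Code:
                    moveq               #0,d1
                    move.w              d0,d1           ;d1 = Number, zero-extended to 32 bits
                    lsl.l               #3,d1           ;d1 = Number * 8
                    move.l              d1,d0           ;d0 = Number * 8
                    lsl.l               #2,d1           ;d1 = Number * 32
                    add.l               d1,d0           ;d0 = Number * 40 (32 bit result)

Long shifts are a little slower than word shifts on 68000, so this is only worth it when the result really can exceed 16 bits.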
Old 20 June 2012, 08:52   #77
StingRay
move.l #$c0ff33,throat
 
StingRay's Avatar
 
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 4,547
On 68000, instead of doing shifts/adds, you could also use a multiplication table. Disadvantage is that you might need a spare address register (depending on where in memory your table is) and that the table needs some memory of course. Advantage is that it is faster than lots of shifts+adds.

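So, for example, if you want to multiply a number in d0 by 56 you'd do this:

Code:
        lea     multab(pc),a0   ;a0 = table of words: 0*56, 1*56, 2*56, ...
        add.w   d0,d0           ;scale the index to a word offset
        move.w  (a0,d0.w),d0    ;d0 = Number * 56
        ...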
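The table itself has to come from somewhere. A minimal sketch of filling it once at startup might look like the following. The names InitMulTab and TABSIZE and the size of 256 entries are assumptions made up for this example, and the table is assumed to sit close enough to the code for the pc-relative lea.

Code:
TABSIZE equ     256             ;assumed number of entries

InitMulTab:
        lea     multab(pc),a0
        moveq   #0,d0           ;d0 = 0 * 56
        move.w  #TABSIZE-1,d1   ;dbf runs the loop TABSIZE times
.fill   move.w  d0,(a0)+        ;store the current multiple
        add.w   #56,d0          ;next multiple of 56
        dbf     d1,.fill
        rts

multab  ds.w    TABSIZE         ;one word per entry

With 256 entries the largest value is 255 * 56 = 14280, so word entries are enough here. The index in d0 must stay below TABSIZE or the lookup reads past the end of the table.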
Old 20 June 2012, 09:16   #78
pmc
is long gone
 
pmc's Avatar
 
Join Date: Apr 2007
Location: London
Posts: 1,590
Oh yes, definitely - pre-multiplying your values and just pulling them out of a table by index is better than doing calculations in the code, but you know, it's nice to have options
Old 20 June 2012, 09:48   #79
StingRay
move.l #$c0ff33,throat
 
StingRay's Avatar
 
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 4,547
Of course it is. Sometimes you don't even have the memory required for the table, for example. I just mentioned it for the sake of completeness.
Old 20 June 2012, 10:24   #80
Codetapper
Moderator
 
Codetapper's Avatar
 
Join Date: May 2001
Location: Auckland / New Zealand
Age: 38
Posts: 2,463
If you want to multiply by "nice" numbers like 40 on a 68000 without using a multiplication table, do the following rather than pmc's method; it avoids a second slow lsl operation:

Code:
        lsl.w   #3,d0   ;d0 = Number * 8
        move.w  d0,d1   ;d1 = Number * 8
        add.w   d1,d1   ;d1 = Number * 16
        add.w   d1,d1   ;d1 = Number * 32
        add.w   d1,d0   ;d0 = Number * 40
If shifting left by 1 or 2 places, it's quicker to use add's than a shift (on 68000 only!). If you wanted to multiply by 48, for example, it's actually quicker than multiplying by 40:

Code:
        lsl.w   #4,d0   ;d0 = Number * 16
        move.w  d0,d1   ;d1 = Number * 16
        add.w   d1,d1   ;d1 = Number * 32
        add.w   d1,d0   ;d0 = Number * 48
To multiply by 56:

Code:
        lsl.w   #3,d0   ;d0 = Number * 8
        move.w  d0,d1   ;d1 = Number * 8
        add.w   d1,d1   ;d1 = Number * 16
        move.w  d1,d2   ;d2 = Number * 16
        add.w   d2,d2   ;d2 = Number * 32
        add.w   d2,d1   ;d1 = Number * 48
        add.w   d1,d0   ;d0 = Number * 56
The other alternative is to use the fact that 56 is 64 - 8:

Code:
        lsl.w   #3,d0   ;d0 = Number * 8
        move.w  d0,d1   ;d1 = Number * 8
        lsl.w   #3,d0   ;d0 = Number * 64
        sub.w   d1,d0   ;d0 = Number * 56
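For shifts of 3 or more places, use a shift operation rather than a chain of add's: at 3 places the nominal cycle count is the same (12 cycles either way) but the shift is one instruction word instead of three, and beyond that the shift is faster.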
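
The same idea applied to a small multiplier, as a sketch: multiplying by 10 needs nothing but add's, because every shift involved is only 1 or 2 places. Using d1 as the scratch register is an assumption, as in the examples above.

Code:
        add.w   d0,d0   ;d0 = Number * 2
        move.w  d0,d1   ;d1 = Number * 2
        add.w   d1,d1   ;d1 = Number * 4
        add.w   d1,d1   ;d1 = Number * 8
        add.w   d1,d0   ;d0 = Number * 10

Five register-to-register word instructions at a nominal 4 cycles each comes to 20 cycles on a 68000.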

Last edited by Codetapper; 20 June 2012 at 10:44.
All times are GMT +2. The time now is 07:59.
