68k details - Page 34

Don_Adan · 04 November 2018, 21:30

Quote:

Originally Posted by plasmab

OK, but for the sake of me being stupid: I don't understand how you can jump more than plus or minus 32768 bytes without something helping you out, either by patching the jump or by putting the destinations in a table. Either way, that's not relocatable.

I'm not claiming it can't be done, I just can't see how.

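Code:

Start
        lea     Start(PC),A0            ; run-time address of Start
        add.l   #LongJump-Start,A0      ; add the assembly-time distance to the target
        jmp     (A0)                    ; jump beyond the 16-bit branch range

        ds.b    100000                  ; filler, puts LongJump more than 32K away

LongJump
        rts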

plasmab · 04 November 2018, 21:43

Quote:

Originally Posted by Don_Adan

Start
lea Start(PC),A0
add.l #LongJump-Start,A0
jmp (A0)

ds.b 100000

LongJump
rts

That will work. Rather hacky, but it will work. I guess you'd use it very sparingly.

StingRay · 04 November 2018, 21:46

Too bad Don Adan presented the solution, I wanted you to think about it a bit! And there is nothing hacky about it!

plasmab · 04 November 2018, 21:48

Quote:

Originally Posted by StingRay

Too bad Don Adan presented the solution, I wanted you to think about it a bit! And there is nothing hacky about it!

My brain is engaged in other Amiga things, way more hacky than this, but I still think that's pretty nasty.

ross · 04 November 2018, 22:23

I usually do it this way:

Code:

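l       move.l  #farcode-l,d0           ; assembly-time distance from l to the target
        jmp     l(pc,d0.l)              ; PC-relative jump with a 32-bit index

        ds.b    100000                  ; filler, puts farcode more than 32K away

farcode nop                             ; my dist>32k code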

plasmab · 04 November 2018, 22:30

Quote:

Originally Posted by ross

I usually do it this way:

Code:

l   move.l  #farcode-l,d0
    jmp     l(pc,d0.l)

    ds.b 100000

farcode nop ;my dist>32k code

That's much cleaner.

The previous code example did pointer arithmetic to get the correct address.

ross · 04 November 2018, 22:39

Of course, it's the same concept as Don's code.

Only 2 bytes shorter (but it uses a spare register).

It can also be done in other ways, like with indexed jump tables.
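A minimal sketch of what such an indexed jump table might look like, assuming the routine index arrives in d0.w. All labels here (Dispatch, Table, Routine0 to Routine2) are hypothetical and not from the post. The table stores 32-bit offsets relative to its own start, so nothing needs patching at load time:

```asm
; d0.w = routine index (0, 1 or 2); labels are illustrative only
Dispatch
        lsl.w   #2,d0                   ; index * 4, one longword per entry
        move.l  Table(pc,d0.w),d0       ; fetch 32-bit offset relative to Table
        jmp     Table(pc,d0.l)          ; jump to Table + offset

Table
        dc.l    Routine0-Table
        dc.l    Routine1-Table
        dc.l    Routine2-Table

Routine0
        rts

        ds.b    100000                  ; filler, pushes the other routines far away

Routine1
        rts
Routine2
        rts
```

The table has to sit within the 8-bit displacement range of the two indexed instructions, which is why it directly follows the jmp here. If every target is within 32K of the table, dc.w entries with a d0.w index would do.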

Don_Adan · 04 November 2018, 22:40

Quote:

Originally Posted by StingRay

Too bad Don Adan presented the solution, I wanted you to think about it a bit! And there is nothing hacky about it!

Sorry.

plasmab · 04 November 2018, 22:42

Quote:

Originally Posted by ross

Of course, it's the same concept as Don's code.

Only 2 bytes shorter (but it uses a spare register).

It can also be done in other ways, like with indexed jump tables.

All of these things are effectively hacks to work around the fact that the CPU doesn't have a long relative jump. You're hand-rolling the bit that's missing in the CPU. The techniques would work pretty much the same on any CPU.

If the code was properly relocatable you wouldn't have to do this.

Don_Adan · 04 November 2018, 22:44

Quote:

Originally Posted by plasmab

That will work. Rather hacky, but it will work. I guess you'd use it very sparingly.

Nothing hacky, and this is only one example; more options exist, e.g. a pea version. Similar code is often used to reach routines without a direct branch/jump, e.g. in protection code for games/utilities.
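The pea version is not shown in the post, so the following is only a guess at what it might look like. It builds the target address on the stack and lets rts jump to it, so no register is needed:

```asm
Start
        pea     Start(pc)               ; push the run-time address of Start
        add.l   #LongJump-Start,(sp)    ; add the assembly-time distance to the target
        rts                             ; pop the computed address into the PC

        ds.b    100000                  ; filler, puts LongJump more than 32K away

LongJump
        rts
```

Because the destination is computed at run time and reached through rts, a disassembler sees no direct branch to LongJump, which is presumably why this style turns up in protection code.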

Don_Adan · 04 November 2018, 22:58

Quote:

Originally Posted by plasmab

All of these things are effectively hacks to work around the fact that the CPU doesn't have a long relative jump. You're hand-rolling the bit that's missing in the CPU. The techniques would work pretty much the same on any CPU.

If the code was properly relocatable you wouldn't have to do this.

You can use bra.l/bsr.l on 68020+. Or create your own relocation routine, called at the beginning.
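A minimal sketch of the 68020+ variant, reusing the labels from the earlier example. The CPU-selection directive is an assumption and differs between assemblers, so it is left as a comment:

```asm
;       mc68020                         ; assumed directive, check your assembler

Start
        bra.l   LongJump                ; 68020+: 32-bit PC-relative displacement

        ds.b    100000                  ; filler, puts LongJump more than 32K away

LongJump
        rts
```

For calls, bsr.l works the same way. It is worth checking how your assembler interprets the .l suffix on a branch when targeting a plain 68000, since that encoding does not exist there.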

plasmab · 04 November 2018, 23:17

bxx.l is fine. I'm utterly happy with the hacky way, just please don't pretend it isn't hacky.

Don_Adan · 04 November 2018, 23:20

Quote:

Originally Posted by plasmab

bxx.l is fine. I'm utterly happy with the hacky way, just please don't pretend it isn't hacky.

It seems you haven't seen hacky code. Hacky would be, e.g., the RNC Copylock coder, but not the code posted by me or by Ross. I think you have very minimal knowledge of 68k coding. Try to resource 10MB of code from different 68k platforms and you will maybe understand which code is hacky.

plasmab · 04 November 2018, 23:33

Quote:

Originally Posted by Don_Adan

It seems you haven't seen hacky code. Hacky would be, e.g., the RNC Copylock coder, but not the code posted by me or by Ross. I think you have very minimal knowledge of 68k coding. Try to resource 10MB of code from different 68k platforms and you will maybe understand which code is hacky.

I review code for a living. I see hacks every day. And I listen to the excuses and BS from developers who try to pretend those aren't hacks. The better coders are the ones that at least admit the code is hacky.

hth313 · 05 November 2018, 01:18

Quote:

Originally Posted by plasmab

All of these things are effectively hacks to work around the fact that the CPU doesn't have a long relative jump. You're hand-rolling the bit that's missing in the CPU. The techniques would work pretty much the same on any CPU.

If the code was properly relocatable you wouldn't have to do this.

Do you mean "position independent" when you say "properly relocatable"?

If you do mean relocatable, a decent loader (or linker if generating for a fixed address space) should be able to relocate the destination in a JSR.L.

So I suppose you are talking about position independent then...
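To make the distinction concrete, here is a small sketch contrasting the two forms (the label LongJump is borrowed from the earlier examples):

```asm
        jsr     LongJump                ; absolute long: relocatable, not position
                                        ; independent; the loader patches the
                                        ; 32-bit address at load time

        jsr     LongJump(pc)            ; PC-relative: position independent, no
                                        ; patching needed, but the displacement is
                                        ; only 16 bits (-32768..+32767) on the 68000
```

On AmigaOS, for instance, patching the first form is what the HUNK_RELOC32 entries in an executable are for.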

frank_b · 05 November 2018, 06:24

Quote:

Originally Posted by litwr

I have also read a quite interesting cite about 68000 in the very solid Byte magazine recently - https://archive.org/details/byte-mag...5-09/page/n197

There's another interesting article earlier on in that issue. A system designer discounting the 286 out of hand because it's "at least a generation behind the 68000".

StingRay · 05 November 2018, 11:41

Quote:

Originally Posted by plasmab

but I still think that's pretty nasty.

What exactly is nasty about perfectly valid code?

roondar · 05 November 2018, 12:33

Quote:

Originally Posted by frank_b

There's another interesting article earlier on in that issue. A system designer discounting the 286 out of hand because it's "at least a generation behind the 68000".

Don't be silly now, litwr clearly never accepts anything that supports the 68000 as being as good as or better than the 8086, as he has determined that any such information is clearly biased nonsense. Obviously, only articles that detract from the 68000 and praise Intel are accurate and the rest should be ignored.

Quote:

Originally Posted by litwr

I can't agree, this feature of DOS relies heavily on the segment registers, which are the part of the ISA that gives the 8086 some superiority. I can also mention CP/M-86, MP/M-86, ...

The 68k has had a lot of OSes and none of them used a headerless format, so IMHO it was not as easy as you might think. However, I am ready not to take the header bytes of 68020 code into account. Even though it is not 100% fair to x86, it is a clear handicap for 68k.

Kalms and others have already argued my point quite well here. Suffice to say I don't agree with the assertion it'd be hard. Writing PC relative code on 68000 is not difficult. It even gets you smaller (and sometimes faster) code.

Quote:

Let's look at http://www.roylongbottom.org.uk/mips.htm#anchorAcorn

We can take several lines there.

Code:

System              CPU        MHz    MIPS
ARCHIMEDES          ARM2         8     4.5
MOMENTUM 21096      68020       20     6
42/40               68030       33     8
AMS/5000            80486       25    15
QI PCi              80386       25     5
VX FTserver         80486       25    15
6386E/33            80386       33     7.7
6386/25             80386       25     6.9

Then we can project the next lines

Code:

CPU     MHz    MIPS
ARM     12      6.8
80386   25      6.9
80386   33      7.7
68030   33      8
ARM     25     14
80486   25     15

A couple of things about your list:
1) you forgot to mention the 68040 result, which is significantly higher than the 80486 result. It scores 21 MIPS in your list (see manufacturer Motorola).

2) you can't just uprate the ARM2 to 25MHz. At the time we're talking about, there was no memory fast enough to service such a chip. Which is exactly why there was no ARM2@25MHz and also why the ARM3 (which does do 25MHz) has a 4KB cache.

Interestingly, this seems to be the only real difference between the ARM2 and ARM3. More interestingly (but expected, as the memory the ARM3 uses has to be slower than it would need to be in order to prevent wait states), the ARM3@25MHz has a manufacturer's speed claim of 12 MIPS, not 14 as you extrapolated (https://en.wikipedia.org/wiki/List_o...oarchitectures).

3) the 486 result seems really low, I have seen claims of 20 MIPS@25MHz elsewhere. Which would make some sense as this chip was seen to be competitive with the 68040 and that clearly wouldn't have been the case if it ran upwards of 30% slower.

I suspect the 486's named in this list are actually mostly the 486SL 'low cost' version (which was released somewhat after the 486 itself), as opposed to the 486DX (which is the original released version). The 'original' 486 has a MIPS rating of 20 according to other sources.

For instance, see this quote from http://lowendmac.com/2014/cpus-intel-80486/
"Byte magazine (May 1993) notes that the 486 has a MIPS (million instructions per second) rating of 20 at 25 MHz and 54 at 66 MHz."

However, it doesn't really matter anyway, as the claim that the 12MHz ARM2 was competitive with a 25MHz 486 is still plainly false.

Quote:

They show that ARM is a bit slower than 80486 and at 12 MHz it is even slower than 80386 @25MHz.

They show the ARM2@12MHz is at best 45% of the speed of a 25MHz 486. Or, perhaps easier to grasp, the 486 is at least 2x the speed of the ARM2.
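For reference, the arithmetic behind those two figures, using the 6.8 MIPS projected above for a 12MHz ARM2 and the 15 MIPS listed for the 25MHz 486. The second line assumes the 20 MIPS figure quoted from Byte instead:

```latex
\[
\frac{6.8}{15} \approx 0.45, \qquad \frac{15}{6.8} \approx 2.2
\]
\[
\frac{6.8}{20} = 0.34
\]
```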

Calling that difference 'a bit slower' is disingenuous at best.

They also show that the ARM2@12MHz is slower than both the 386@33MHz and the 68030@33MHz (let alone one at 50MHz). Which conforms exactly to what I stated at the start of our little exchange about the ARM2.

Quote:

However, IMHO these results are rather biased. There were no compilers for ARM as good as those for 68k or x86. Look at https://news.ycombinator.com/item?id=17793878 - it shows that even with FP the Archimedes can be faster. Indeed, the very fast hardware division of x86 could also change the picture. Maybe I don't have 100% proof, but I am almost sure that ARM@25MHz can outperform 80486@25MHz with integers without division, for example with the line drawing algorithm discussed in this thread. I am also almost sure that ARM@12MHz can outperform 80386@33MHz. I have just made an approximate clock calculation for the line drawing main loop. It takes 52 cycles on the 80386, 24 cycles on the 80486, and only 14 cycles on ARM, and some of the ARM's cycles are the idle S-cycles. Sorry, I am not very proficient with 68k, so I dare to ask somebody to count the 68000/68020 clocks.

The article you linked through also supports my position and not yours as it claims that the ARM2@8MHz challenges, but does not always beat, a 16MHz 386 at integer tasks. And loses at floating point heavy tasks & sorting. Extrapolating that to a 33MHz 386 and a 12MHz ARM2 would give you a 50% bonus for the ARM, but a 100% bonus for the 386 => the 33MHz 386 should be faster and that is exactly what we already knew from the tables above.

Even if you look at the rather impressive Dhrystone results of the ARM2@8 vs the 386@16, scaling these up to 12 and 33 MHz still has the ARM2 lose.
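The clock scaling in that extrapolation can be written out as follows, assuming performance scales linearly with clock frequency (which is optimistic for the ARM2, given the memory speed point above):

```latex
\[
\frac{12}{8} = 1.5 \quad (+50\%), \qquad \frac{33}{16} \approx 2.06 \quad (+106\%)
\]
\[
\frac{1.5}{2.06} \approx 0.73
\]
```

So whatever ratio the ARM2@8MHz achieves against the 386@16MHz on a given test shrinks to roughly 73% of that value for 12MHz vs 33MHz.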

In other words, the evidence you managed to find does not support any of your claims and in fact supports everything I've said, but you're going to continue claiming your earlier opinions are probably correct anyway. Got ya.

And seriously, approximate cycle counts for an untested bit of code? What use are those exactly (I mean, exact cycle counts might be useful, but approximate seems rather useless)? And what exactly does one tiny algorithm prove? (answer: nothing, really! It might be an outlier and considering other benchmarks disagree with these results, it is actually likely that it is an outlier)

Lastly, I want to stress (again) that I actually really like the ARM2's and think they offered great performance. However, I just feel that it's best to remain honest about the pro's and con's and not get carried away with opinions over facts. As nice as these CPU's were, they were not actually as fast as you've claimed.

-----
And all of this is without accounting for the fact we're comparing the wrong CPUs. As I researched (ok, Googled) this post, I found the 1991 Archimedes at GBP999 was not running a 12MHz ARM2. It was in fact the A5000 running a 25MHz ARM3*. Which indeed gets a lot closer to the 486/68040 running at the same speed, although the ARM3 MIPS rating is still clearly lower than either of these two.

However, the given price of the A5000 did not include a hard disk or monitor, where the 486 I quoted did have a monitor and hard disk. As such, I'm still not convinced about the price/performance ratio being in Acorn's favour.

*) The 12MHz variation seems to be the A3010, which was released in 1992 for GBP499. There may in fact be other 12MHz variants prior to 1991, but the information on what is actually in the various Archimedes models is somewhat scarce. However, even if they did exist, all potential candidates prior to 1991 were a lot more expensive than the GBP999 A5000.

plasmab · 05 November 2018, 12:38

Quote:

Originally Posted by StingRay

What exactly is nasty about perfectly valid code?

Many things..

The OS helps you out and does this for you. So why hand roll it?

Second, you’re probably doing it wrong if you are mixing code and data to the extent that you need this. That’s what sections are for. Or hard disks!

Valid does not mean something isn’t a hack.

roondar · 05 November 2018, 13:24

Without getting into the 'is this specific bit of code a hack' business (as I feel that is rather subjective and both sides of that argument make sense), I do wonder what else jmp d(pc,ix.l)/jsr d(pc,ix.l) could've been meant for.

To me it does seem to be designed for the purpose of getting around short-range branches while retaining 'address independence'. After all, you really shouldn't need a long index for jump tables.

And again, I really don't care about the hack-vs-non-hack aspect here - I'm just interested in figuring out the reason for designing it as is.

Edit: I do agree using the OS is generally the better option (silly hardware banging code like I sometimes write excluded, as that *is* indeed rather hacky), which is one extra reason to not want .COM files.

Last edited by roondar; 05 November 2018 at 13:33.
