VASM wrong assembling?

deadwood · 30 November 2014, 21:48

Hi,

I have a portion of assembler code generated by gcc:

Code:

MakeGadgets:
	lea (-304,%sp),%sp
	movem.l #16190,-(%sp)

A disassembled .o file assembled by GAS shows this:

Code:

00001ab0 <MakeGadgets>:
    1ab0:	4fef fed0      	lea %sp@(-304),%sp
    1ab4:	48e7 3f3e      	moveml %d2-%d7/%a2-%fp,%sp@-

A disassembled .o file assembled by VASM (elf, std) shows this:

Code:

00001d9c <MakeGadgets>:
    1d9c:	4fef fed0      	lea %sp@(-304),%sp
    1da0:	48e7 7cfc      	moveml %d1-%d5/%a0-%a5,%sp@-

In both cases the return is:

Code:

	movem.l (%sp)+,#31996
	lea (304,%sp),%sp
	rts

Code:

    2096:	4cdf 7cfc      	moveml %sp@+,%d2-%d7/%a2-%fp
    209a:	4fef 0130      	lea %sp@(304),%sp
    209e:	4e75           	rts

So VASM saves one set of registers and restores a different one.

Any idea why this is happening? Using VASM 1.7a.

NOTE: I'm an m68k assembler newbie, so let me know if I did something obviously wrong.

TheDarkCoder · 01 December 2014, 09:13

According to standard Motorola syntax, movem does not support the immediate addressing mode.
I.e., IMHO movem.l #some value,-(sp) should give an error.

I imagine the intended interpretation is that the immediate operand encodes which registers are involved. I think it's bad practice for gcc to generate such code.

phx · 01 December 2014, 12:48

Yes, this is one of those gas "extensions", because it is probably easier for the compiler to output the register list as bit mask.

Vasm supports it, but obviously I didn't expect that you have to reverse the bitmask yourself when a pre-decrement addressing mode is used. Vasm reversed the immediate mask a second time, which is why the wrong registers were saved.

This can be easily fixed in cpus/m68k/opcodes.h by replacing D2R (reversed) with D16 for QI (quick immediate) addressing modes:

Code:

diff -r1.33 opcodes.h
755c755
<   "movem",    {QI,PA},      {{D2R,SEA},        {0x4880,0},2|WL|S_WL6,m68000up},
---
>   "movem",    {QI,PA},      {{D16,SEA},        {0x4880,0},2|WL|S_WL6,m68000up},
775c775
<   "movm",     {QI,PA},      {{D2R,SEA},        {0x4880,0},2|WL|S_WL6,m68000up},
---
>   "movm",     {QI,PA},      {{D16,SEA},        {0x4880,0},2|WL|S_WL6,m68000up},

The fix is committed and will be available with tomorrow's snapshot. Thanks for reporting!
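
To see where the two encodings come from, here is a small Python sketch. It is not vasm or gas code, and the helper names `reverse16` and `decode_movem` are made up for illustration. It assumes the usual 68k movem mask layout: bit 0 = d0 up to bit 15 = a7 for control and postincrement modes, and the reverse order for -(An).

```python
# Register order for the movem mask word in control/postincrement modes:
# bit 0 = d0 ... bit 7 = d7, bit 8 = a0 ... bit 15 = a7.
REGS = [f"d{i}" for i in range(8)] + [f"a{i}" for i in range(8)]


def reverse16(mask):
    """Reverse the bit order of a 16-bit value."""
    return int(f"{mask & 0xFFFF:016b}"[::-1], 2)


def decode_movem(mask, predecrement):
    """List the registers selected by an encoded movem mask word.

    For -(An) the mask is stored reversed: bit 0 = a7 ... bit 15 = d0.
    """
    if predecrement:
        mask = reverse16(mask)
    return [reg for bit, reg in enumerate(REGS) if mask & (1 << bit)]


gas_word = 16190                   # gas emits the immediate unchanged: 0x3f3e
old_vasm_word = reverse16(16190)   # reversing it once more gives 0x7cfc

print(hex(gas_word), decode_movem(gas_word, predecrement=True))
print(hex(old_vasm_word), decode_movem(old_vasm_word, predecrement=True))
print(hex(31996), decode_movem(31996, predecrement=False))
```

The first and third lines list d2-d7/a2-a6, so gcc's two immediates describe the same register set. The second line lists d1-d5/a0-a5, which is what the disassembly of the old vasm output showed.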

deadwood · 01 December 2014, 18:56

Thanks for the fix!

More questions:

1) .word is treated by VASM as 32 bits, while GAS treats it as 16 bits. Is this a known issue?

2) Any chance of adding the .balign[wl] directives?

phx · 02 December 2014, 12:13

Quote:

Originally Posted by deadwood

1) .word is treated by VASM as 32 bits, while GAS treats it as 16 bits. Is this a known issue?

AFAIK this kind of directive (also .int, .long, etc.) is target-dependent. I don't really know which size GAS prefers on which target. Many years ago I implemented those directives to match the PPC architecture, where .word definitely is 4 bytes (at least for 32-bit PPCs).
I don't understand why .word should be 16 bits for M68k, as it is also a 32-bit CPU. Or does it change for >=68020? I can improve the current implementation, but I need more information.

Quote:

2) Any chance of adding the .balign[wl] directives?

.balign should already work, although the third argument is ignored (do you need that?).
.balignw and .balignl are missing indeed. I'm working on it.

robinsonb5 · 02 December 2014, 13:28

Quote:

Originally Posted by phx

I don't understand why .word should be 16 bits for M68k, as it is also a 32-bit CPU.

Simply that in M68K nomenclature a word is 16 bits, as in move.w, add.w, etc, whereas a longword is 32-bits - move.l, etc. Thus any context in which something called just "word" is 32-bits is going to surprise M68K veterans!
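
A quick Python sketch of why the size matters for gcc output. The `emit` helper and the sample table are hypothetical, and big-endian layout is assumed as on m68k. The same three directives produce different data depending on whether .word means 2 or 4 bytes, so everything after the first .word ends up at a different offset.

```python
import struct


def emit(directive, value, word_bytes=2):
    """Encode one big-endian constant for a hypothetical .word/.long emitter.

    word_bytes is the size chosen for .word: 2 matches gas on m68k,
    4 is the PPC-style size discussed in this thread.
    """
    size = {".byte": 1, ".word": word_bytes, ".long": 4}[directive]
    fmt = {1: ">B", 2: ">H", 4: ">I"}[size]
    return struct.pack(fmt, value)


table = [(".word", 0x1234), (".word", 0x5678), (".long", 0xDEADBEEF)]

as_m68k = b"".join(emit(d, v, word_bytes=2) for d, v in table)
as_ppc_style = b"".join(emit(d, v, word_bytes=4) for d, v in table)

print(len(as_m68k), as_m68k.hex())
print(len(as_ppc_style), as_ppc_style.hex())
```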

Lonewolf10 · 02 December 2014, 20:04

Quote:

Originally Posted by robinsonb5

Thus any context in which something called just "word" is 32-bits is going to surprise M68K veterans!

Agreed, on Amigas having a word that is anything other than 16 bits will confuse a lot of people.

For those of us aware of older computers and mainframes (pre 1980s), a 'word' can be as much as 64 (72?) bits!

mc6809e · 02 December 2014, 20:09

Quote:

Originally Posted by Lonewolf10

Agreed, on Amigas having a word that is anything other than 16 bits will confuse a lot of people.

For those of us aware of older computers and mainframes (pre 1980s), a 'word' can be as much as 64 (72?) bits!

I'm a big fan of 48-bit words.

I'm surprised they haven't come back, given how often computers are used in multimedia. 48 bits is a convenient size for all sorts of things.

Megol · 02 December 2014, 21:44

Quote:

Originally Posted by mc6809e

I'm a big fan of 48-bit words.

I'm surprised they haven't come back, given how often computers are used in multimedia. 48 bits is a convenient size for all sorts of things.

Yes, but why stop there? 64 bits is a more natural extension for machines with 8-bit bytes and is also pretty convenient.

mc6809e · 02 December 2014, 22:34

Quote:

Originally Posted by Megol

Yes, but why stop there? 64 bits is a more natural extension for machines with 8-bit bytes and is also pretty convenient.

Well, as a member of the silicon conservation society, I object to the wastefulness of 64 bit datums!

Seriously, though, there is a penalty for using bigger values. Cache is too valuable to be filled with so many leading zeros!

48 bits is nice for audio processing -- you can process a 24-bit audio stereo pair.

48 bits is also nice for image processing when you need high dynamic range. 16 bits per channel is very convenient.

And most DACs max out at 24-bit resolution, so most processing of real world data is nicely paired with a 48-bit word.

Yeah, I know. I'm nutz.

Check out some of the FPGAs and DSPs out there, though. You'll find a surprising amount of support for 24/48 bit processing.

wawa · 02 December 2014, 23:09

Sorry guys, but can't this thread be kept for its original purpose?

phx · 03 December 2014, 10:29

Quote:

Originally Posted by robinsonb5

Simply that in M68K nomenclature a word is 16 bits, as in move.w, add.w, etc, whereas a longword is 32-bits - move.l, etc. Thus any context in which something called just "word" is 32-bits is going to surprise M68K veterans!

Ok. So a "word" is 16 bits, because the 68000 worked with 16-bit words on the data bus, and this nomenclature didn't change with the 68020.
Now I only need a list, for all CPUs supported by vasm, which defines the "traditional" word-size. Also 16-bits for x86?

I agree that I have to fix that, of course.

deadwood · 03 December 2014, 20:16

Quote:

Originally Posted by phx

.balign should already work, although the third argument is ignored (do you need that?).
.balignw and .balignl are missing indeed. I'm working on it.

Yup, in my use case the third argument is given. Thanks for looking into .balignw and .balignl.

deadwood · 03 December 2014, 20:52

Also, I sometimes get this warning:

Code:

warning 1007 in line 3 of "/tmp/ccFFZW8l.s": scratch at end of line
>	.section	.rodata.str1.1,"aMS",@progbits,1

matthey · 04 December 2014, 22:20

Quote:

Originally Posted by phx

Ok. So a "word" is 16 bits, because the 68000 worked with 16-bit words on the data bus, and this nomenclature didn't change with the 68020.
Now I only need a list, for all CPUs supported by vasm, which defines the "traditional" word-size. Also 16-bits for x86?

A CPU's word size can mean the natural operation and register size of the CPU, the original word size on an older version of the CPU, or whatever size the ISA/ABI calls a word. It's confusing and inconsistent: the Wikipedia article "Word (Computer Architecture)" lists a 68k word as 32 bits, right after explaining the VAX and PDP-11 naming conventions from which the 68k names are derived.

http://en.wikipedia.org/wiki/Word_%2...rchitecture%29

Fortunately, GAS appears to use the consistent 68k/VAX/PDP-11 definition which is always word=16 bits (2 bytes) and longword=32 bits (4 bytes).

Quote:

7.6 .balign[wl] abs-expr, abs-expr, abs-expr

Pad the location counter (in the current subsection) to a particular storage boundary. The first expression (which must be absolute) is the alignment request in bytes. For example `.balign 8' advances the location counter until it is a multiple of 8. If the location counter is already a multiple of 8, no change is needed.
The second expression (also absolute) gives the fill value to be stored in the padding bytes. It (and the comma) may be omitted. If it is omitted, the padding bytes are normally zero. However, on some systems, if the section is marked as containing code and the fill value is omitted, the space is filled with no-op instructions.
The third expression is also absolute, and is also optional. If it is present, it is the maximum number of bytes that should be skipped by this alignment directive. If doing the alignment would require skipping more bytes than the specified maximum, then the alignment is not done at all. You can omit the fill value (the second argument) entirely by simply using two commas after the required alignment; this can be useful if you want the alignment to be filled with no-op instructions when appropriate.
The .balignw and .balignl directives are variants of the .balign directive. The .balignw directive treats the fill pattern as a two byte word value. The .balignl directive treats the fill pattern as a four byte longword value. For example, .balignw 4,0x368d will align to a multiple of 4. If it skips two bytes, they will be filled in with the value 0x368d (the exact placement of the bytes depends upon the endianness of the processor). If it skips 1 or 3 bytes, the fill value is undefined.

https://www.sourceware.org/binutils/docs-2.10/as_7.html
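
The rules quoted above can be sketched in a few lines of Python. This is only an illustration of the manual text, not code from gas or vasm. The function name `balign_padding` and its parameters are made up, big-endian byte order is assumed, and the "undefined" case is arbitrarily filled with zero bytes.

```python
def balign_padding(location, alignment, fill=0, max_skip=None, fill_size=1):
    """Return the padding bytes a .balign-style directive would emit.

    fill_size selects the variant: 1 for .balign, 2 for .balignw,
    4 for .balignl. Big-endian byte order is assumed, as on m68k.
    """
    skip = -location % alignment          # bytes needed to reach the boundary
    if max_skip is not None and skip > max_skip:
        return b""                        # too far away: no alignment at all
    pattern = (fill % (1 << (8 * fill_size))).to_bytes(fill_size, "big")
    whole, rest = divmod(skip, fill_size)
    # The manual leaves the fill undefined when the pattern does not fit
    # evenly; this sketch arbitrarily uses zero bytes for the remainder.
    return b"\x00" * rest + pattern * whole


print(balign_padding(6, 4, 0x368D, fill_size=2).hex())   # .balignw 4,0x368d
print(balign_padding(5, 8, max_skip=3).hex())             # .balign 8,,3
print(balign_padding(1, 8, max_skip=3).hex())             # needs 7 > 3 bytes
```

The third call returns nothing, because reaching the next multiple of 8 from location 1 would take 7 bytes and the maximum is 3.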

phx · 05 December 2014, 13:20

Quote:

Originally Posted by deadwood

Also, I sometimes get this warning:

Code:

warning 1007 in line 3 of "/tmp/ccFFZW8l.s": scratch at end of line
>    .section    .rodata.str1.1,"aMS",@progbits,1

Vasm's std syntax module only reads the section name and attributes. The rest is ignored. I think @progbits is ELF specific. Don't know what the last argument means.
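
For reference, the GNU as manual describes the ELF form as .section name, "flags", @type, flag_specific_arguments. When the flags contain M (mergeable), the extra argument is the entry size, so the trailing 1 here is the size of one string element in bytes. A rough Python sketch of how the operands break down follows. The `parse_section` helper is hypothetical and only covers this simple comma-separated case.

```python
# A few of the ELF section flag letters documented for gas.
FLAG_NAMES = {
    "a": "allocatable",
    "w": "writable",
    "x": "executable",
    "M": "mergeable",
    "S": "contains zero-terminated strings",
}


def parse_section(operands):
    """Split the operands of an ELF-style .section directive.

    Hypothetical helper: assumes plain comma-separated operands and
    treats the fourth operand as the entry size when the M flag is set.
    """
    parts = [p.strip() for p in operands.split(",")]
    flags = parts[1].strip('"') if len(parts) > 1 else ""
    return {
        "name": parts[0],
        "flags": [FLAG_NAMES.get(f, f) for f in flags],
        "type": parts[2] if len(parts) > 2 else None,
        "entsize": int(parts[3]) if "M" in flags and len(parts) > 3 else None,
    }


info = parse_section('.rodata.str1.1,"aMS",@progbits,1')
print(info)
```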

phx · 05 December 2014, 13:21

Quote:

Originally Posted by matthey

Fortunately, GAS appears to use the consistent 68k/VAX/PDP-11 definition which is always word=16 bits (2 bytes) and longword=32 bits (4 bytes).

It is not consistent. For example .word generates 32-bit constants for ARM.

phx · 05 December 2014, 13:24

Quote:

Originally Posted by deadwood

Yup, in my use case the third argument is given. Thanks for looking into .balignw and .balignl.

Ok. Done. Tomorrow's snapshot includes support for .balignw, .balignl, .p2alignw and .p2alignl. Also the third argument (maximum number of padding bytes) is now evaluated and working.
And, for the moment, I changed .word to emit 16-bit constants, until I implement a better, backend-dependent solution.

deadwood · 06 December 2014, 19:44

Thank you! I'll test those changes shortly. For now, I think I found a problem with weak symbols.

Here is the original code as generated by GCC:

Code:

	pea __LIBS_LIST__
	jsr _set_open_libraries_list
	addq.l #8,%sp
	tst.l %d0
	jeq .L10
	move.l %a6,-(%sp)
	pea 1.w
	pea 1.w
	pea __INIT_LIST__
	jsr _set_call_funcs

Here is disassembled code, generated by vasm:

Code:

  88:	4879 0000 0000 	pea 0 <Workbench_3_Workbench_ExpungeLib>
			8a: R_68K_32	__LIBS_LIST__
  8e:	4eb9 0000 0000 	jsr 0 <Workbench_3_Workbench_ExpungeLib>
			90: R_68K_32	_set_open_libraries_list
  94:	508f           	addql #8,%sp
  96:	4a80           	tstl %d0
  98:	6700 008a      	beqw 124 <Workbench_InitLib+0xc4>
  9c:	2f0e           	movel %fp,%sp@-
  9e:	4878 0001      	pea 1 <Workbench_3_Workbench_ExpungeLib+0x1>
  a2:	4878 0001      	pea 1 <Workbench_3_Workbench_ExpungeLib+0x1>
  a6:	4879 0000 00b8 	pea b8 <Workbench_InitLib+0x58>
			a8: R_68K_32	.rodata+0xb8
  ac:	4eb9 0000 0000 	jsr 0 <Workbench_3_Workbench_ExpungeLib>
			ae: R_68K_32	_set_call_funcs

Here is disassembled code, generated by gas:

Code:

  88:	4879 0000 0000 	pea 0 <Workbench_3_Workbench_ExpungeLib>
			8a: R_68K_32	__LIBS_LIST__
  8e:	4eb9 0000 0000 	jsr 0 <Workbench_3_Workbench_ExpungeLib>
			90: R_68K_32	_set_open_libraries_list
  94:	508f           	addql #8,%sp
  96:	4a80           	tstl %d0
  98:	6700 0082      	beqw 11c <Workbench_InitLib+0xbc>
  9c:	2f0e           	movel %fp,%sp@-
  9e:	4878 0001      	pea 1 <Workbench_3_Workbench_ExpungeLib+0x1>
  a2:	4878 0001      	pea 1 <Workbench_3_Workbench_ExpungeLib+0x1>
  a6:	4879 0000 0000 	pea 0 <Workbench_3_Workbench_ExpungeLib>
			a8: R_68K_32	__INIT_LIST__
  ac:	4eb9 0000 0000 	jsr 0 <Workbench_3_Workbench_ExpungeLib>
			ae: R_68K_32	_set_call_funcs

The difference is at offset a6. __INIT_LIST__ is a weak symbol defined in the same compilation unit as the code:

Code:

	.weak	__INIT_LIST__
	.align	2
	.type	__INIT_LIST__, @object
	.size	__INIT_LIST__, 8
__INIT_LIST__:
	.skip	8

In this case, however, the non-weak __INIT_LIST__ symbol comes from another compilation unit. The GAS output keeps the reference to the symbol, while the VASM output uses an offset into the local .rodata section. The non-weak __INIT_LIST__ contains the functions that should be run at library init. I checked, and with the VASM version they are indeed not run.

When I changed __INIT_LIST__ to be external and non-weak, things started to work as they should.

phx · 06 December 2014, 22:45

Yes, that's a serious problem with references to weak symbols. Thanks for reporting. It was never much tested, because usually AmigaOS and MorphOS programs don't need them.

Unlike gas, vasm always tries to convert relocations with defined symbols into section+offset, so the symbol is no longer needed. Doing that with weak symbols is fatal of course, because the linker may change the reference.

Fixed for aout and ELF formats. Please try tomorrow's snapshot.
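
The decision described above can be sketched like this. It is a simplified Python model, not vasm's actual code. The `Symbol` class and the `make_reloc` function are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Symbol:
    name: str
    section: Optional[str] = None   # None: not defined in this unit
    offset: int = 0
    weak: bool = False


def make_reloc(sym, addend=0):
    """Choose what a relocation should refer to (hypothetical model).

    A locally defined, non-weak symbol can be folded into section+offset.
    A weak or undefined symbol must stay a symbol reference, because the
    linker may bind it to a definition from another object file.
    """
    if sym.section is not None and not sym.weak:
        return (sym.section, sym.offset + addend)
    return (sym.name, addend)


weak_list = Symbol("__INIT_LIST__", section=".rodata", offset=0xB8, weak=True)
local_data = Symbol("local_data", section=".rodata", offset=0xB8)

print(make_reloc(weak_list))    # stays a reference to the symbol
print(make_reloc(local_data))   # folded into .rodata + 0xb8
```

With the weak check in place the relocation for __INIT_LIST__ keeps the symbol name, matching the R_68K_32 __INIT_LIST__ entry in the gas output. Without it, the result would be the .rodata+0xb8 form seen in the vasm disassembly.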
