Maybe (just maybe), I've found (and fixed) something crucial, precisely about the part I highlighted in the previous post.
For sending the ACK/NACK after reading, I was doing this:
1. make SDA output
2. pull SCL low
3. drive SDA as needed
4. pull SCL high
For receiving the ACK/NACK after writing, I was doing this:
1. make SDA input
2. pull SCL low
3. pull SCL high
4. read SDA
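For illustration only, the two sequences above can be sketched in C against a simulated port. Everything here is hypothetical: the `Port` struct, the `SDA`/`SCL` bit masks and the helper names are made up for the example and are neither the real hardware registers nor the attached code. The sketch only shows the order of the operations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a bit-banged I2C port; not the real registers. */
enum { SDA = 1u << 0, SCL = 1u << 1 };

typedef struct {
    unsigned dir;       /* bit set: we drive the line; clear: line released (input) */
    unsigned out;       /* level driven on the lines configured as outputs */
    bool     slave_sda; /* level the simulated slave puts on SDA (false = ACK) */
} Port;

static void set_bit(unsigned *reg, unsigned mask, bool on)
{
    if (on) *reg |= mask; else *reg &= ~mask;
}

/* Level seen on SDA: our own output if we drive it, else the slave's. */
static bool sda_level(const Port *p)
{
    return (p->dir & SDA) ? (p->out & SDA) != 0 : p->slave_sda;
}

/* Sending ACK (nack == false) or NACK (nack == true) after reading a byte. */
static void send_ack_original(Port *p, bool nack)
{
    set_bit(&p->dir, SDA, true);   /* 1. make SDA output     */
    set_bit(&p->out, SCL, false);  /* 2. pull SCL low        */
    set_bit(&p->out, SDA, nack);   /* 3. drive SDA as needed */
    set_bit(&p->out, SCL, true);   /* 4. pull SCL high       */
}

/* Receiving ACK/NACK after writing a byte; returns true on NACK. */
static bool recv_ack_original(Port *p)
{
    set_bit(&p->dir, SDA, false);  /* 1. make SDA input */
    set_bit(&p->out, SCL, false);  /* 2. pull SCL low   */
    set_bit(&p->out, SCL, true);   /* 3. pull SCL high  */
    return sda_level(p);           /* 4. read SDA       */
}

/* Usage: the simulated slave acknowledges, then we send a NACK. */
static void demo(void)
{
    Port p = { SDA | SCL, SDA | SCL, false };
    assert(recv_ack_original(&p) == false);
    send_ack_original(&p, true);
    assert(sda_level(&p) == true);
}
```

Calling `demo()` from a `main` is enough to exercise both sequences. Note that in `recv_ack_original` SDA is released before SCL goes low.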
The ATMEL datasheet doesn't mention anything specific about that.
The official I2C specification (UM10204 by NXP) says:
"The Acknowledge signal is defined as follows: the transmitter releases the SDA line during the acknowledge clock pulse so the receiver can pull the SDA line LOW and it remains stable LOW during the HIGH period of this clock pulse (see Figure 4)."
That clearly says that the SDA line must not be released before the acknowledge clock cycle, which is what I was doing. However, if I swapped steps 1 and 2, things wouldn't work even in WinUAE (I had already discovered this moons ago, but now I've double-checked). The only remaining option was to start the cycle and release the line at the same time. In fact, in the much better and richer datasheet of the same chip by Microchip, I've found this:
"An ACK is accomplished by the transmitting device first releasing the SDA line at the falling edge of the eighth clock cycle followed by the receiving device responding with a logic ‘0’ during the entire high period of the ninth clock cycle" - and that is also accompanied by this nice diagram that leaves no room for interpretation:
After I changed the code (attached), I updated both SkillGrid and the test suite to check whether they'd still produce correct results - and they did. If anyone can and wants to give it a go, the test suite is still available from
https://www.retream.com/_temporary/NVRAMRTS.lha. I hope the friend of mine who has the suitable machine for testing will be able to run the tests soonish.
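As a rough illustration of the difference between the old and the new way of receiving the ACK/NACK, here is a small C sketch against a simulated port. It is not the attached code: the `Port` struct, the bit masks, `port_write` and the `early_release` flag are all hypothetical, and the model simply assumes that the SDA direction and the SCL level can be changed in one single write.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a bit-banged I2C port; not the real registers. */
enum { SDA = 1u << 0, SCL = 1u << 1 };

typedef struct {
    unsigned dir;           /* bit set: we drive the line; clear: released */
    unsigned out;           /* level driven on the output lines */
    bool     slave_sda;     /* level the simulated slave puts on SDA (false = ACK) */
    bool     early_release; /* set if SDA was released while SCL stayed high */
} Port;

/* Apply a new direction/output pair as ONE write, so both change together. */
static void port_write(Port *p, unsigned dir, unsigned out)
{
    bool scl_was_high = (p->out & SCL) != 0;
    bool scl_is_high  = (out & SCL) != 0;
    bool releases_sda = (p->dir & SDA) != 0 && (dir & SDA) == 0;

    if (releases_sda && scl_was_high && scl_is_high)
        p->early_release = true;
    p->dir = dir;
    p->out = out;
}

/* Level seen on SDA: our own output if we drive it, else the slave's. */
static bool sda_level(const Port *p)
{
    return (p->dir & SDA) ? (p->out & SDA) != 0 : p->slave_sda;
}

/* Old order: release SDA first, then start the ninth clock. */
static bool recv_ack_original(Port *p)
{
    port_write(p, p->dir & ~SDA, p->out);         /* make SDA input */
    port_write(p, p->dir, p->out & ~SCL);         /* pull SCL low   */
    port_write(p, p->dir, p->out | SCL);          /* pull SCL high  */
    return sda_level(p);                          /* true = NACK    */
}

/* New order: release SDA on the falling edge that ends the eighth clock. */
static bool recv_ack_changed(Port *p)
{
    port_write(p, p->dir & ~SDA, p->out & ~SCL);  /* release SDA + pull SCL low */
    port_write(p, p->dir, p->out | SCL);          /* pull SCL high */
    return sda_level(p);                          /* true = NACK   */
}

/* Usage: start both from the state right after the eighth data bit
   (SCL high, SDA driven low). */
static void demo(void)
{
    Port a = { SDA | SCL, SCL, false, false };
    Port b = a;

    assert(recv_ack_original(&a) == false && a.early_release);
    assert(recv_ack_changed(&b)  == false && !b.early_release);
}
```

In this model both versions read the same ACK, but only the old order releases SDA while SCL is still high, which is the situation the quoted datasheet text rules out.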