PED81C - pseudo-native, no C2P chunky screens for AGA - Page 5

saimo · 06 December 2023, 13:53

Quote:

Originally Posted by paraj

Looks like you're not setting the transparent translation registers? MMUlib clears them (sets them to $FFFF6040). You should set them to something like:

ITT0/1 = $00ffc000
DTT0 = $0000c040
DTT1 = $00ffc000

before disabling paging, and restore before re-enabling.

Well spotted

Since it's been decades since I last touched the MMU, I thought that I'd better leave the TT registers alone, wrongly thinking that ATC-based translation would override them (when, instead, it's the other way around) and assuming that they were already correctly set.
Many thanks for the suggestion.
There's one thing I don't understand with the settings you suggested for the DTTs, though:
* DTT0 = $0000c040 means that addresses 00xxxxxx are transparently translated and not writeable;
* DTT1 = $00ffc000 means that all addresses are transparently translated and writeable.
The settings conflict with each other and, although the M68060UM doesn't explicitly say how such conflicts are resolved, paragraph 4.4 suggests that write protection would prevail ("When write protection is enabled for a block..."): in that case, writes to CHIP RAM and chip registers would fail.
Shouldn't all registers be set to $00ffc000?

EDIT: hmm... I'm not convinced the problem is related to the TTs, because:
* when the NOMMU switch is not used, the MMU is not fiddled with, so the jerks should not appear (i.e. I must have introduced a bug somewhere else);
* given that, as you reported, the TTs are disabled from outside, when the program disables the ATC-based translation the CPU basically behaves like a 68EC060, i.e. it uses the addresses literally and does not perform any extra caching/writeability handling on them. Doh, I was forgetting about cache coherency issues.
I'll try also with all the TTs disabled. On second thought, I'd better transparently translate the whole address space to mark it writethrough.

Karlos · 06 December 2023, 15:48

Colour me curious. I wonder if the MMU could be problematic for TKG, it has quite a few randomly accessed tables.

saimo · 06 December 2023, 16:05

Quote:

Originally Posted by Karlos

Colour me curious. I wonder if the MMU could be problematic for TKG, it has quite a few randomly accessed tables.

Inevitably the MMU has an impact on performance. I guess that there must be discussions about it here on EAB (the one I know is this one as I had posted in it).

paraj · 06 December 2023, 17:34

Quote:

Originally Posted by saimo

Well spotted

Since it's been decades since I last touched the MMU, I thought that I'd better leave the TT registers alone, wrongly thinking that ATC-based translation would override them (when, instead, it's the other way around) and assuming that they were already correctly set.
Many thanks for the suggestion.
There's one thing I don't understand with the settings you suggested for the DTTs, though:
* DTT0 = $0000c040 means that addresses 00xxxxxx are transparently translated and not writeable;
* DTT1 = $00ffc000 means that all addresses are transparently translated and writeable.
The settings conflict with each other and, although the M68060UM doesn't explicitly say how such conflicts are resolved, paragraph 4.4 suggests that write protection would prevail ("When write protection is enabled for a block..."): in that case, writes to CHIP RAM and chip registers would fail.
Shouldn't all registers be set to $00ffc000?

EDIT: hmm... I'm not convinced the problem is related to the TTs, because:
* when the NOMMU switch is not used, the MMU is not fiddled with, so the jerks should not appear (i.e. I must have introduced a bug somewhere else);
* given that, as you reported, the TTs are disabled from outside, when the program disables the ATC-based translation the CPU basically behaves like a 68EC060, i.e. it uses the addresses literally and does not perform any extra caching/writeability handling on them. Doh, I was forgetting about cache coherency issues.
I'll try also with all the TTs disabled. On second thought, I'd better transparently translate the whole address space to mark it writethrough.

Not exactly the MMU expert either, but $0000c040 should mean E=1, CM=%10 for the lower 16MB, and writing that, it should probably be $0000c060 instead (i.e. CM=%11 => Cache-Inhibited, Imprecise Exception Model). Write through sounds dangerous for custom registers.

Section 4.4 states that "If both registers match, the TT0 status bits are used for the access." so that's guaranteed. If neither match and paging is disabled, you get the default values specified in the translation control register.
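
The field layout discussed above can be checked with a small decoder. This is only a sketch in Python, not code from the thread: `decode_ttr` and `CM_NAMES` are made-up names, and the bit positions (base in bits 31-24, mask in 23-16, E in bit 15, S in 14-13, CM in 6-5, W in bit 2) are assumed from the transparent translation register description in the 68060 user's manual, with the cache mode names following the 68060 wording.

```python
# Sketch: split a 68040/68060 transparent translation register value into fields.
# Bit positions are an assumption taken from the user's manual, not from the thread.
CM_NAMES = {
    0b00: "cachable, writethrough",
    0b01: "cachable, copyback",
    0b10: "cache-inhibited, precise",
    0b11: "cache-inhibited, imprecise",
}

def decode_ttr(value):
    """Return the fields of a TTR value as a dictionary."""
    return {
        "base": (value >> 24) & 0xFF,          # logical address base (A31-A24)
        "mask": (value >> 16) & 0xFF,          # 1 bits are "don't care" in the compare
        "enabled": bool(value & 0x8000),       # E bit
        "s_field": (value >> 13) & 0b11,       # supervisor/user matching
        "cm": (value >> 5) & 0b11,             # cache mode
        "write_protected": bool(value & 0x04), # W bit
    }

for v in (0x0000C040, 0x0000C060, 0x00FFC000):
    f = decode_ttr(v)
    print(f"${v:08x}: base=${f['base']:02x} mask=${f['mask']:02x} "
          f"E={int(f['enabled'])} CM={CM_NAMES[f['cm']]} W={int(f['write_protected'])}")
```

Under these assumptions, none of the three values sets the W bit: the `$40` in `$0000c040` lands in the CM field, while write protection would be `$04`.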

modrobert · 06 December 2023, 18:00

I tried PVE on a stock A1200 with fast RAM, even though it's only 6 FPS on average it's running smooth for a "3D" engine, impressive results.

Code:

total number of frames rendered:    940
total number of frames shown:       7780
frames rendered per second average: 6.04
frames per render average:          8.27

Thinking back of "3D" games like Captain Blood and Virus, this definitely feels more responsive.
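
As a cross-check of the printout, the two averages can be recomputed from the two frame counts. This is a sketch in Python that assumes a 50 Hz (PAL) display refresh, which the printout itself does not state.

```python
# Figures copied from the printout above; the 50 Hz refresh rate is an assumption (PAL).
frames_rendered = 940
frames_shown = 7780
refresh_hz = 50

# Each rendered frame stays on screen for this many display refreshes on average.
frames_per_render = frames_shown / frames_rendered

# Rendered frames per second follow from the refresh rate.
fps = refresh_hz / frames_per_render

print(f"frames per render average:          {frames_per_render:.3f}")
print(f"frames rendered per second average: {fps:.2f}")
```

The results agree with the printed 6.04 fps; the printed 8.27 frames per render matches the computed value if the program truncates rather than rounds.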

saimo · 06 December 2023, 18:09

Quote:

Originally Posted by paraj

Not exactly the MMU expert either, but $0000c040 should mean E=1, CM=%10 for the lower 16MB, and writing that, it should probably be $0000c060 instead (i.e. CM=%11 => Cache-Inhibited, Imprecise Exception Model). Write through sounds dangerous for custom registers.

Section 4.4 states that "If both registers match, the TT0 status bits are used for the access." so that's guaranteed. If neither match and paging is disabled, you get the default values specified in the translation control register.

How I could swap the bottom nibbles of the transparent translation registers around ($40 <-> $04) and miss the information you posted...

Anyway, I agree that going for the precise mode is best. But I'll give it another thought later/though (EDIT: "though" was supposed to be "tomorrow"... can't think straight...) (too tired now).
The attached build sets the MMU as you suggested (and has also a couple of other little changes, but not related to the MMU or bugfixes).

paraj · 06 December 2023, 18:41

Get some rest and return with a fresh perspective

Still really no speed difference with your latest build, however nommu now seems to be less glitchy! https://i.imgur.com/rvgztVC.mp4
Maybe the cache modification thing you started doing is causing it (for some reason)?

Quote:

Originally Posted by Karlos

Colour me curious. I wonder if the MMU could be problematic for TKG, it has quite a few randomly accessed tables.

I've tried the disable "trick" with some of my own stuff, but also TKG, and it seems to cause a slowdown rather than a speedup. Not sure why yet, but maybe because the gains are outweighed by losing the benefit of a proper MMU setup for the lower 16 MB (I also have the zero page and ROM moved to fast RAM using the MMU).

Also ATC misses, which are the only ones you'll avoid by switching to TT, should normally not have that big of an effect. Of course if there is really no locality in your memory accesses it's going to be slow (around the same as a chip read apparently!), but my test is really pathological.

saimo · 06 December 2023, 18:41

Quote:

Originally Posted by modrobert

I tried PVE on a stock A1200 with fast RAM, even though it's only 6 FPS on average it's running smooth for a "3D" engine, impressive results.

Code:

total number of frames rendered:    940
total number of frames shown:       7780
frames rendered per second average: 6.04
frames per render average:          8.27

Thinking back of "3D" games like Captain Blood and Virus, this definitely feels more responsive.

Wow, thanks! Nice to hear

I guess that such a machine would be capable of an even better performance, as during the initial development I had tried a version whose renderer code was less efficient than the current one* on a stock A1200 and it reached 5 fps already. That was thanks to the fact that back then graphics were being rendered directly to a raster in CHIP RAM. However, seeing that the speed was too low and considering that with just 2 MB the maps have to be small, after a while I decided to make FAST RAM compulsory and changed the buffering strategy: before there were 3 buffers in CHIP RAM; now there are 2 buffers in CHIP RAM and 2 buffers in FAST RAM, and graphics get rendered in FAST RAM first and then copied to CHIP RAM (that, if I remember correctly, produced a gain of 1 or 2 fps on my 68030 machine, but I'm pretty sure that it isn't ideal for a machine that only has additional FAST RAM).

*I came up with several optimizations afterwards. Also, that version did not render the background, but performed clearing by continuing the drawing of the columns. That was really inefficient, but it was basically placeholder code for drawing the background with the idea of adding also skewing some day. Eventually I dropped the idea as that would affect the speed too much.

paraj · 06 December 2023, 20:03

After a bit of guru meditation, I figured out that the reason it was slower is that cache should of course be configured to writeback not writethrough for fast mem. This gives a super tiny benefit in starting pose of original TKG with random build I had lying around (and another quick test).

I currently - subject to further tests - think the best way to achieve your no-MMU setup on 060 is something like (in pseudo-code):

Code:

  # Paged address translation is enabled, and DTT0/1 are not in use?
  if TC.e and not DTT0.e and not DTT1.e:
    DTT0 = $403fc020 # Enable writeback, transparent translation for Z3 fast RAM

Bit unsure about the mask to use, my card is mapped at $68xxxxxx. Maybe you could scan the memlist or something.
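
On the mask question, a short Python sketch can show which addresses a base/mask pair covers. The helper names `ttr_matches` and `ttr_for_block` are hypothetical, the S field is ignored, and the 128 MB card size in the example is invented purely for illustration; only the `$403fc020` value and the `$68xxxxxx` mapping come from the post.

```python
def ttr_matches(ttr, address):
    """True if an enabled TTR matches the top address byte (S field ignored here)."""
    if not ttr & 0x8000:                      # E bit clear: register is disabled
        return False
    base = (ttr >> 24) & 0xFF
    mask = (ttr >> 16) & 0xFF
    # Bits set in the mask are ignored; all the others must equal the base.
    return (((address >> 24) ^ base) & ~mask & 0xFF) == 0

def ttr_for_block(first_byte, blocks_16mb, low_word=0xC020):
    """TTR value for a naturally aligned, power-of-two run of 16 MB blocks."""
    if blocks_16mb < 1 or blocks_16mb & (blocks_16mb - 1) or first_byte % blocks_16mb:
        raise ValueError("run must be a power of two in length and naturally aligned")
    return (first_byte << 24) | ((blocks_16mb - 1) << 16) | low_word

# The value from the post covers $40000000-$7fffffff, so a card at $68xxxxxx matches.
print(ttr_matches(0x403FC020, 0x68000000))
print(ttr_matches(0x403FC020, 0x00DFF000))

# Hypothetical tighter setting for a 128 MB card (8 blocks) starting at $68000000.
print(f"${ttr_for_block(0x68, 8):08x}")
```

Since a TTR only compares the top eight address bits, the smallest region it can describe is 16 MB, and anything that is not a naturally aligned power-of-two run of such blocks needs a wider mask than the memory itself.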

saimo · 07 December 2023, 00:23

Gosh, I had swapped ITT0 and DTT0 around!
New build, with that fixed. Also, it features another change, unrelated to the MMU but related to the screen refresh, that makes a certain piece of code more robust - although it should not have played any role in the glitches (unless there was expansion hardware causing NMI transitions - which wasn't the case according to the exceptions occurrences log). Sorry for not providing a description: if it weren't so long to explain and if I weren't supposed to be sleeping (for at least a month straight...), I'd have explained.

EDIT: the version originally uploaded with this post included a broken table; if you had already downloaded the archive, please re-download it - sorry.

@paraj

Moreover, I've used CM = 10 (precise/serialized model) for the first 16 MB: isn't that more recommendable, after all?
If this finally works, the next step will be trying copyback for the addresses >= $1000000.

paraj · 07 December 2023, 18:31

Still glitchy, and crashed when running bb from RAM:

Precise mode is certainly safer, but I don't think it's needed (though I haven't tested it). Be aware that it means the store buffer can't be used for chip RAM. Maybe not a big deal in this case since you're not C2Ping.

But really, I stand by my suggestion from yesterday: set up only DTT0 for fast RAM, and don't fiddle with anything else. You get all of the benefits you're looking for and none of the downsides.

For your next test build, maybe try to disable all of the advanced stuff including the data cache thing (at least with an option). I have a small tool that can setup DTT0 as I suggest, and I know it works, and will report if that improves things like we expect.

saimo · 07 December 2023, 20:01

Quote:

Originally Posted by paraj

Precise mode is certainly safer, but I don't think it's needed (though I haven't tested it). Be aware that it means the store buffer can't be used for chip RAM. Maybe not a big deal in this case since you're not C2Ping.

But really, I stand by my suggestion from yesterday: set up only DTT0 for fast RAM, and don't fiddle with anything else. You get all of the benefits you're looking for and none of the downsides.

Wait, wasn't the setup supposed to be this?

ITT0 = $00ffc000
ITT1 = $00ffc000
DTT0 = $0000c060
DTT1 = $00ffc000

(Which totally makes sense to me.)

Quote:

For your next test build, maybe try to disable all of the advanced stuff including the data cache thing (at least with an option). I have a small tool that can setup DTT0 as I suggest, and I know it works, and will report if that improves things like we expect.

I trust your experience (and appreciate your support), and I'm more than happy to set DTT0 to $c060 (I thought I'd use $c040 to play safe given the current misbehaviour). However, if $c040 didn't work, how could the less strict $c060 improve things? I mean, the problem can't be there. Also, the thing is that if the NOMMU switch isn't used, the MMU is not touched at all. That also indicates that the problem must not be in the MMU setup.

The rendering and buffering stuff did not change since the last build that worked, which did include manipulating the caches. And, regarding them, there's nothing advanced - this is what happens (on 68040 and 68060) in the user mode code that renders the graphics:
1. trap #0;
2. render landscape;
3. trap #1.

The 68060 trap #0 handler, which disables the data cache, is as simple as this:

Code:

   move.l  #$20808000,d0 ;(.ESB,EBC,EIC)
   movec.l d0,cacr
   rte

And the trap #1 handler, which re-enables the data cache, is:

Code:

   move.l  #$a0808000,d0 ;(.EDC,ESB,EBC,EIC)
   movec.l d0,cacr
   rte

In both cases, d0 can be trashed.
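
To double-check the two constants, here is a small Python sketch that lists which of the four CACR bits named in the handler comments are set in each value. The bit numbers (EDC = 31, ESB = 29, EBC = 23, EIC = 15) are assumed from the 68060 user's manual; `CACR_060_BITS` and `cacr_flags` are made-up names.

```python
# Only the four 68060 CACR bits mentioned in the handler comments are modelled here.
CACR_060_BITS = {31: "EDC", 29: "ESB", 23: "EBC", 15: "EIC"}

def cacr_flags(value):
    """Names of the modelled CACR bits that are set in value."""
    return [name for bit, name in CACR_060_BITS.items() if value & (1 << bit)]

off_value = 0x20808000  # written by the trap #0 handler
on_value = 0xA0808000   # written by the trap #1 handler

print(cacr_flags(off_value))
print(cacr_flags(on_value))
print(f"difference: ${off_value ^ on_value:08x}")
```

The two values differ only in bit 31, so under these assumptions the handlers toggle the data cache and leave the store buffer, branch cache and instruction cache enabled.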

Attached is an archive that contains two builds:
* both set the MMU registers as indicated above (but only when NOMMU is used: the MMU registers are not changed otherwise);
* one has the traps and the other doesn't;
* both count the occurrences of the exceptions: the final printout should have only the counter of vector 27 different from zero in the no-traps build; the other should have also the counters of vectors 32 and 33 different from 0, and all the counters should be equal (EDIT: the last part is true only if rendering is done at 50 fps).

paraj · 07 December 2023, 20:20

PVE: normal build - glitches (like before)

bb/fb crashes

PVE_no_traps: no glitches

bb 19.131 (261 x 27 int, no others; same for rest of tests), fb: 17.340 (288 x 27)

Enable my DTT0 thing: bb 25.400 (197 x 27 int), fb: 22.451 (223 x 27)

Not 100% sure, but you probably need to flush caches before disabling the DC. Since writeback is likely enabled before, bad things probably happen if that data isn't written back before disabling them.

Regarding *TT0/1 setup: Looks about right, but again, I think what you really want is just to avoid ATC-miss penalties (which you should get from only setting up one DTT register). Everything else is just asking for trouble for no benefit. It's interesting to experiment with getting it set up correctly otherwise (I'm learning a lot) but seems tangential to your goal.

saimo · 08 December 2023, 01:35

@paraj

Alright, chances are that your last report enlightened me and finally we have a working solution!

Quote:

Originally Posted by paraj

PVE: normal build - glitches (like before)

bb/fb crashes

PVE_no_traps: no glitches

bb 19.131 (261 x 27 int, no others; same for rest of tests), fb: 17.340 (288 x 27)

Enable my DTT0 thing: bb 25.400 (197 x 27 int), fb: 22.451 (223 x 27)

Just FYI: the interrupt counter differs in the various cases as that's the number of frames shown (given that it's the COPER interrupt) and thus depends on how long the program has been running.
EDIT: just noticed that the counters of exceptions 27, 32 and 33 differ in the same test on 68040 and 68060 (but not on 68030), and I guess that was what you referred to; a difference is not normal when running benchmarks - I'll investigate that. The code works fine; it was the "and all the counters should be equal" statement that was wrong - well, mostly: the counters are equal if execution runs at 50 fps, as the number of traps indicates the number of frames rendered.

Quote:

Not 100% sure, but you probably need to flush caches before disabling the DC. Since writeback is likely enabled before, bad things probably happen if that data isn't written back before disabling them.

This.
And so the solution is: set up the MMU properly so that also caching is handled correctly and turning the data cache on/off as needed (without flushing) has no negative effect. It's either both caches and MMU or none.

I think I know now why I was so confused (besides the fact of having no experience of coding specifically for 68060 and sleeplessness, that is):
* the critical point was the false belief that, before I started fiddling with the MMU, PVE worked fine;
* that was an illusion: quite a few posts back (haven't checked) at some point, for the first time you reported that PVE didn't work anymore;
* that was when I had started fiddling with the caches;
* the next build, which had some initialization stuff changed, worked again, and so I thought that caches were OK;
* I started fiddling with the MMU and got lost;
* at some other point, I realized that due to another initialization issue (CPU ID variable being used before it was set due to the shuffling of some code) the generic 68020 code started being used;
* that code was exactly what allowed PVE to work, as it does not contain the data cache toggling!
* however, I didn't realize that and kept on believing that the caches were just fine and the problems (which had abruptly appeared after fixing the CPU ID bug) were related to the MMU only;
* the turning point was your idea of making a test without touching the caches.

Quote:

Regarding *TT0/1 setup: Looks about right, but again, I think what you really want is just to avoid ATC-miss penalties (which you should get from only setting up one DTT register). Everything else is just asking for trouble for no benefit. It's interesting to experiment with getting it set up correctly otherwise (I'm learning a lot) but seems tangential to your goal.

Yes, my primary goal is to get PVE to run as fast as possible, but I'm enjoying the journey as well

So, I came to these conclusions:
* no more NOMMU switch: to achieve the best performance it is necessary to bypass the table searches and enable/disable the data cache (burst) on the fly, so the MMU setup has to be customized always (on all non-EC CPUs, that is); at most, if I get requests, I'll add a SAFE/S switch that disables both caches and MMU handling;
* ITT0 = $00ffc000;
* DTT0 = $0000C060;
* DTT1 = $00ffc000;
* data cache turned on/off as shown in the previous post (with $80008000/$00008000 for 68040; on 68030 things are different: the data cache is always on, but the burst is enabled only when copying blocks of memory).

The attached build implements the above.
If it works, I'd like to also try DTT1 = $00ffc020 (copyback) in order to gain a little more speed (although I expect only a tiny improvement, as the writes to FAST RAM to update variables are very few and all outside of the rendering core loop).
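
The combined effect of the DTT0/DTT1 values listed above can be sketched in Python, using the rule quoted earlier in the thread that the TT0 status bits win when both registers match. The helper names and the sample addresses are illustrative assumptions (`$dff000` is the usual custom chip register base, `$68000000` is the Zorro III address mentioned earlier), and the S field is ignored.

```python
CM_NAMES = {0: "writethrough", 1: "copyback",
            2: "cache-inhibited, precise", 3: "cache-inhibited, imprecise"}

def ttr_matches(ttr, address):
    """True if an enabled TTR matches the top address byte (S field ignored here)."""
    if not ttr & 0x8000:
        return False
    base = (ttr >> 24) & 0xFF
    mask = (ttr >> 16) & 0xFF
    return (((address >> 24) ^ base) & ~mask & 0xFF) == 0

def data_cache_mode(address, dtt0=0x0000C060, dtt1=0x00FFC000):
    """Return (register name, CM field) for a data access, DTT0 taking priority."""
    for name, ttr in (("DTT0", dtt0), ("DTT1", dtt1)):
        if ttr_matches(ttr, address):
            return name, (ttr >> 5) & 0b11
    return None, None  # no match: paging or the TC defaults would apply

for addr in (0x00DFF000, 0x00000400, 0x68000000):
    name, cm = data_cache_mode(addr)
    print(f"${addr:08x}: {name}, {CM_NAMES[cm]}")
```

With these values every data address matches at least DTT1, so table searches are never needed for data, and the first 16 MB (CHIP RAM and chip registers) get the cache-inhibited mode from DTT0.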

Wepl · 08 December 2023, 12:31

Why do you touch ITT* at all?
It should not be relevant IMHO.

If you change something in the MMU setup you also need to flush/clear the caches (CPUSH/CINV) because the caches operate independently.

saimo · 08 December 2023, 13:01

Quote:

Originally Posted by Wepl

Why do you touch ITT* at all?
It should not be relevant IMHO.

To avoid that the way the MMU is set up when the program is launched affects performance. By setting ITT0 to $00ffc000 table searches never happen.

Quote:

If you change something in the MMU setup you also need to flush/clear the caches (CPUSH/CINV) because the caches operate independently.

Indeed.
I think that the exact cause of the glitches/crashes was that the initialization code, together with (partially) incorrect MMU registers values, did this at some point after taking the system over:
1. clear caches (after modifying the code in a few places);
2. set interrupt and traps vectors (with the vbr pointing to a table in FAST RAM);
3. change the MMU setup.

If (as probable) the caches were operating in copyback mode, the vectors did not get written to RAM, so it was just luck that the program showed something at all.
Now the code clears the caches together with the MMU setup.
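
The failure mode described here can be illustrated with a deliberately simplified Python model. It is a toy, not a description of the 68060's real line-based cache: `ToyCopybackCache` is a made-up class, `push` only stands in for what CPUSH achieves, and `invalidate` for CINV.

```python
class ToyCopybackCache:
    """Toy model: writes stay in the cache until they are explicitly pushed."""

    def __init__(self, memory):
        self.memory = memory   # dict standing in for external RAM
        self.dirty = {}        # modified data not yet written back

    def write(self, address, value):
        self.dirty[address] = value       # copyback: RAM is not updated here

    def read(self, address):
        return self.dirty.get(address, self.memory.get(address))

    def push(self):
        self.memory.update(self.dirty)    # write dirty data back (CPUSH-like)
        self.dirty.clear()

    def invalidate(self):
        self.dirty.clear()                # discard without writing (CINV-like)

ram = {}
cache = ToyCopybackCache(ram)
vector_offset = 27 * 4                    # vector 27, as counted in the thread
cache.write(vector_offset, 0x00123456)    # hypothetical handler address

print(cache.read(vector_offset) == 0x00123456)  # visible through the cache
print(vector_offset in ram)                     # but not yet in RAM
cache.push()
print(ram[vector_offset] == 0x00123456)         # in RAM only after the push
```

In this model anything that bypasses or discards the cache before the push sees stale RAM, which is the reason for pushing dirty data before changing the cache or MMU configuration.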

By the way, I have received a report that PVE now works fine on an A4000/060 when running normally; the same report says that the benchmarks don't work, but I still have to look into that.

Wepl · 08 December 2023, 14:08

Quote:

Originally Posted by saimo

To avoid that the way the MMU is set up when the program is launched affects performance. By setting ITT0 to $00ffc000 table searches never happen.

Yeah, right, sorry.
I first assumed you leave the MMU on and only additionally set the TT*s.

It's a pity that there are only 64 ATC entries per data/instruction.

It would be interesting what would happen if the mmu tables would be enabled to be cached. I assume mmu.library sets them to noncacheable (also makes sense in my eyes). Table searches then would still occur but most of them could be satisfied via the data cache.
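
To put the 64-entry figure in perspective, the address range that one ATC can map without a table search is simply entries times page size. The Python sketch below assumes the 4 KB and 8 KB page sizes supported by the 68040/68060 MMU; `atc_reach_kb` is a made-up helper name.

```python
ATC_ENTRIES = 64          # per ATC (one for data, one for instructions), as stated above
TTR_BLOCK_KB = 16 * 1024  # smallest region a transparent translation register can cover

def atc_reach_kb(page_kb, entries=ATC_ENTRIES):
    """Memory reachable through a fully populated ATC, in KB."""
    return entries * page_kb

for page_kb in (4, 8):    # page sizes assumed from the user's manual
    reach = atc_reach_kb(page_kb)
    print(f"{page_kb} KB pages: {reach} KB reach, "
          f"1/{TTR_BLOCK_KB // reach} of a single 16 MB TTR block")
```

Data spread over more than 256 KB (or 512 KB with 8 KB pages) therefore cannot stay fully resident in the ATC, whereas a single TTR covers at least 16 MB with no table search at all.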

saimo · 08 December 2023, 15:25

Quote:

Originally Posted by Wepl

Yeah, right, sorry.
I first assumed you leave the MMU on and only additionally set the TT*s.

No problem!
By the way: actually I do leave the MMU on (that is, if it's enabled to begin with, as I do not modify TC), but the way the TT registers are set up effectively disables table-based translation.

Quote:

It's a pity that there are only 64 ATC entries per data/instruction.

It would be interesting what would happen if the mmu tables would be enabled to be cached. I assume mmu.library sets them to noncacheable (also makes sense in my eyes). Table searches then would still occur but most of them could be satisfied via the data cache.

It looks like that can't be done on 68060 - from the MC68060UM, at the end of section 4.2.1:

Quote:

The processor does not use the data cache when performing a table search. Therefore, translation tables must not be placed in copyback space, since the normal accesses which build the translation tables would be cached and not written to external memory, but the processor only uses tables in external memory. This is a functional difference between the MC68060 and the MC68040.

Still, using the data cache for tables would affect the performance when accessing the other data.
Anyway, in contexts of hardware hitting software like this, table-based translation is hardly needed... oh, wait: maybe you're thinking of WHDLoad?

Wepl · 08 December 2023, 17:40

Quote:

Originally Posted by saimo

No problem!
By the way: actually I do leave the MMU on (that is, if it's enabled to begin with, as I do not modify TC), but the way the TT registers are set up effectively disables table-based translation.

But then you do not need to touch instruction TT.
As I understand you only have a problem with data. Then there is no need to change anything on the instruction side. ATC are also separate for data and inst.

Quote:

Originally Posted by saimo

It looks like that can't be done on 68060 - from the MC68060UM, at the end of section 4.2.1:

Yes correct, I forgot.

Quote:

Originally Posted by saimo

Still, using the data cache for tables would affect the performance when accessing the other data.
Anyway, in contexts of hardware hitting software like this, table-based translation is hardly needed... oh, wait: maybe you're thinking of WHDLoad?

Not related to WHDLoad.

I think a good idea would be to allocate a 16M aligned memory, make this transparent translated and put all data in this segment which should not be cached. The people will need probably 48M RAM for this to have a 16M free aligned block.

Another idea would be to make the TT only for supervisor mode and run your code in user or super mode depending on the wished cache use (or vice versa). Don't know if this is feasible. Probably too complicated.

saimo · 08 December 2023, 23:15

Quote:

Originally Posted by Wepl

But then you do not need to touch instruction TT.
As I understand you only have a problem with data. Then there is no need to change anything on the instruction side. ATC are also separate for data and inst.

I don't want the MMU to access the RAM at all, so table-based translation must be avoided also for instructions fetches.

Quote:

I think a good idea would be to allocate a 16M aligned memory, make this transparent translated and put all data in this segment which should not be cached. The people will need probably 48M RAM for this to have a 16M free aligned block.

Another idea would be to make the TT only for supervisor mode and run your code in user or super mode depending on the wished cache use (or vice versa). Don't know if this is feasible. Probably too complicated.

To be honest I can't follow you

Is it some general idea, or is it intended for this little project? In the latter case, nothing that expensive and complicated is needed: just transparent translation for the whole address space, with special care for the first 16 MB and caching.


