CrcWarmCopyPage CRC mismatch

Kulunu , 01-29-2018, 12:26 AM
Dear All,
Dear Robert,

I ran stressapptest on my custom hardware design for 2 days and got the final results below. (The complete log file is attached.)

Log: Thread 1 found 39 hardware incidents
Log: Thread 2 found 1 hardware incidents
Log: Thread 5 found 3 hardware incidents
Log: Thread 6 found 37 hardware incidents
Stats: Found 80 hardware incidents
Stats: Completed: 165702896.00M in 1001.73s 165416.61MB/s, with 80 hardware incidents, 0 errors
Stats: Memory Copy: 165702896.00M at 165427.95MB/s
Stats: File Copy: 0.00M at 0.00MB/s
Stats: Net Copy: 0.00M at 0.00MB/s
Stats: Data Check: 0.00M at 0.00MB/s
Stats: Invert Data: 0.00M at 0.00MB/s
Stats: Disk: 0.00M at 0.00MB/s

Status: FAIL - test discovered HW problems

In detail, the log file shows errors of the following type:

Log: Seconds remaining: 164140
Log: CrcWarmCopyPage CRC mismatch ffffffff011000001ff0110173ffff009010174024b0090 != ffffffff01ffffffff0110173ffff009010173ffff0090, but no miscompares found. Retrying with fresh data.
Report Error: miscompare : DIMM Unknown : 1 : 8669s
Hardware Error: miscompare on CPU 0(0xF) at 0x751d76d8(0x8df616da:DIMM Unknown): read:0x0000010000020100, reread:0x0000010000020100 expected:0x0000010000000100
Log: Seconds remaining: 164130

Could you please explain this? How can I evaluate the stressapptest results technically? How can I work out whether the problem is in the memory module or the PCB layout? Is there any document I can refer to?

robertferanec , 01-29-2018, 04:18 PM
Honestly, if you would like to know more about what these errors mean, the best way is to ask the guy who programmed the app. I talked to him once, when we had seen some problems related to Ethernet. However, if you are running only the memory test, you should not see any errors.

A couple of times I have tried to identify a memory error from the log files, but in many cases the error itself is not going to help you find out what is wrong with the memory (unless it is something very specific, such as a bad connection on one bit). If the memory problem is a wrong layout, wrong DDR controller settings, power, etc., then the specific error information is useless, as it will be very random; in most cases I was not able to find any patterns.
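
For the one case where the log can help (a single stuck or badly connected bit), a rough check is to XOR the read and expected values from each miscompare line and count which bit positions keep turning up. Below is a minimal Python sketch, assuming the log lines end with the same read/reread/expected fields as the line quoted in this thread. The function names are made up for illustration and are not part of stressapptest.

```python
import re
from collections import Counter

# Matches the tail of a "Hardware Error: miscompare" line as quoted in this
# thread. Assumption: every miscompare line in the log ends the same way.
MISCOMPARE_RE = re.compile(
    r"read:0x(?P<read>[0-9a-fA-F]+),\s*"
    r"reread:0x(?P<reread>[0-9a-fA-F]+)\s+"
    r"expected:0x(?P<expected>[0-9a-fA-F]+)"
)


def flipped_bits(read, expected):
    """Return the bit positions (0 = least significant) where the values differ."""
    diff = read ^ expected
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


def tally_flipped_bits(log_lines):
    """Count how often each bit position is wrong across all miscompare lines."""
    counts = Counter()
    for line in log_lines:
        match = MISCOMPARE_RE.search(line)
        if match is None:
            continue  # not a miscompare line
        read = int(match.group("read"), 16)
        expected = int(match.group("expected"), 16)
        counts.update(flipped_bits(read, expected))
    return counts


# Tail of the miscompare line quoted in this thread.
sample = "read:0x0000010000020100, reread:0x0000010000020100 expected:0x0000010000000100"
print(tally_flipped_bits([sample]))  # Counter({17: 1})
```

For the quoted line, the only difference is bit 17 of the 64-bit word. If the same one or two bit positions dominate across the whole log, that may point to a specific data line. If the positions are spread all over, it is more likely one of the random cases described above. How a bit position maps to a physical DQ pin depends on the board and the controller, so check that against the schematic.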

Usually it works this way: if the memory is OK, there are absolutely no errors. If there is something wrong, you will see an error, but this error is not going to tell you what is wrong. At least, this is my experience.

Kulunu, 02-05-2018, 03:48 AM
Dear Robert,

Many thanks for your reply. Could you please tell me how I can contact the guy who made the stressapptest tool?

Regards,
Kulunu.

robertferanec , 02-05-2018, 08:56 AM
