Re: [Yaffs] Yaffs2 erasure issue on MT29 NAND part

Author: Andrew McKay
Date:  
To: Charles Manning
CC: yaffs
Subject: Re: [Yaffs] Yaffs2 erasure issue on MT29 NAND part
Hey Charles,

I did end up finding a strange issue with the NAND part itself from a bunch of
testing I did this weekend.

>
>> I'm going to go back to testing the NAND directly with the MTD layer to see
>> if I can get the NAND to do strange things from there. I'm also going to
>> look into back porting newer MTD code into our 2.6.20.4 kernel to see if
>> that fixes the problem. I've mentioned some of my issues on the MTD
>> mailing list but haven't really gotten a response on that end.
>
> Let us know how you get on. This is interesting for everyone.
>


With my logic analyzer I verified that the erase process is working. When
YAFFS (or MTD) claims that an erase failed, the part itself is reporting that
the erase failed. However, after the failure this block is no longer readable
or writable. I tested both with mtd-utils applications. This of course means
that when MTD tries to mark the block bad, it can't. We might be able to fix
the issue if we move to using a bad block table written to flash. Currently we
just scan the part for bad blocks on boot and use a RAM-based bad block table.
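
To illustrate why the failed marker write matters with a scan-on-boot table,
here is a minimal Python sketch. It is a toy model, not MTD code: the Block
class, mark_bad and scan_bbt are hypothetical names, and the marker check (a
non-0xff byte at the start of the OOB of the first two pages) is an assumed
simplification of the marking scheme we use.

```python
# Toy model of a scan-on-boot bad block table (BBT). All names here are
# hypothetical; this is not the MTD implementation.
from dataclasses import dataclass, field

PAGES_PER_BLOCK = 64   # 262144-byte block / 4096-byte page
OOB_SIZE = 128


def _erased_oob():
    return [bytearray(b"\xff" * OOB_SIZE) for _ in range(PAGES_PER_BLOCK)]


@dataclass
class Block:
    oob: list = field(default_factory=_erased_oob)
    writable: bool = True   # False models the stuck state after a failed erase

    def write_oob(self, page, offset, data):
        """Program OOB bytes; returns False if the part rejects the write."""
        if not self.writable:
            return False
        self.oob[page][offset:offset + len(data)] = data
        return True


def mark_bad(block):
    """Write 0x00 0x00 to the start of the OOB of the first two pages."""
    return all(block.write_oob(page, 0, b"\x00\x00") for page in (0, 1))


def scan_bbt(blocks):
    """Rebuild the RAM BBT the way a boot-time scan would: from markers only."""
    return {
        index
        for index, block in enumerate(blocks)
        if any(block.oob[page][0] != 0xFF for page in (0, 1))
    }


blocks = [Block() for _ in range(4)]
blocks[2].writable = False        # block 2 is stuck after a failed erase
print(mark_bad(blocks[1]))        # True: an ordinary bad block gets its marker
print(mark_bad(blocks[2]))        # False: the marker write is rejected
print(scan_bbt(blocks))           # {1}: block 2 is forgotten after a "reboot"
```

In this model a table stored in flash would record block 2 independently of
the in-block marker, which is the reason a flash-based BBT might help here.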

I have verified timing of all the NAND control lines and used a scope to verify
the signals are clean and don't have excessive overshoot or undershoot. I can't
see anything wrong at our end. We're trying to source different 16Gbit parts,
and I'm trying to get into contact with a Micron FAE to see if they have seen
this issue.

Here's the email I sent off to the MTD mailing list detailing the test and the
problems I saw.

-------------------------------------------------------------------------------
Hey guys,

I'm having issues with MT29F16G08DAA parts with MTD on Linux. I have found a
strange issue with block erasure failures. The part seems to get into a state
where, if a block fails erasure, all pages within that block (including the OOB
area) will read all 0x00. I realize that internally the part may write all bits
to 0 to prevent over erasure of already erased bits, however if the device is
power cycled, the data that was in that block before the erase is still there.

Here is a dump of what I'm seeing.

/mnt/zen/mtd-utils # ./nanddump -s 0x29a00000 -l 0x1000 -p /dev/mtd12
ECC failed: 0
ECC corrected: 0
Number of bad blocks: 204
Number of bbt blocks: 0
Block size 262144, page size 4096, OOB size 128
Dumping data starting at 0x29a00000 and ending at 0x29a01000...
0x29a00000: 03 00 00 00 09 75 00 00 ff ff 2e 73 76 6e 00 00
0x29a00010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00100: 00 00 00 00 00 00 00 00 00 00 ff ff ed 41 00 00
0x29a00110: 00 00 00 00 00 00 00 00 0a d6 7c 4a 0b d6 7c 4a
0x29a00120: 0b d6 7c 4a ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00130: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00140: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00150: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[SNIP]
0x29a00f40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00f50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00f60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00f70: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00f90: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00fa0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00fb0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00fc0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00fd0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00fe0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00ff0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
OOB Data: ff ff cb 19 00 00 1d 75 00 30 09 75 00 80 00 00
OOB Data: 00 00 25 7c 05 c4 06 00 00 00 f9 ff ff ff ff ff
OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
OOB Data: ff 00 c3 a9 6a 67 ff ff ff ff ff ff ff ff ff ff
OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff


/mnt/zen/mtd-utils # ./flash_erase /dev/mtd12 0x29a00000
Erase Total 1 Units
Performing Flash Erase of length 262144 at offset 0x29a00000
nand_erase: start = 0x29a00000, len = 262144
single_erase_cmd
nand_erase: Failed erase, page 0x00029a00

MTD Erase failure: Input/output error
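
As a sanity check on the output above, the page number in the kernel's "Failed
erase" message lines up with the offset passed to flash_erase. A small Python
sketch of the arithmetic, using the block and page sizes nanddump reported (the
locate helper is a hypothetical name used only for illustration):

```python
# Geometry as reported by nanddump above.
BLOCK_SIZE = 262144
PAGE_SIZE = 4096


def locate(offset):
    """Return (block index, absolute page index, pages per block) for a byte offset."""
    return offset // BLOCK_SIZE, offset // PAGE_SIZE, BLOCK_SIZE // PAGE_SIZE


block, page, pages_per_block = locate(0x29A00000)
print(block, hex(page), pages_per_block)   # 2664 0x29a00 64
```

So offset 0x29a00000 is the first page of block 2664, and page 0x29a00 matches
the page reported by nand_erase.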


/mnt/zen/mtd-utils # ./nanddump -s 0x29a00000 -l 0x1000 -p /dev/mtd12
ECC failed: 0
ECC corrected:ECC: BAD
0
Number of baECC: BAD
d blocks: 204
NECC: BAD
umber of bbt bloECC: BAD
cks: 0
Block siECC: BAD
ze 262144, page ECC: BAD
size 4096, OOB sECC: BAD
ize 128
DumpingECC: BAD
data starting aECC: BAD
t 0x29a00000 andECC: BAD
ending at 0x29aECC: BAD
01000...
ECC: BAD
ECC: BAD
ECC: BAD
ECC: BAD
ECC: BAD
ECC: 16 uncorrectable bitflip(s) at offset 0x29a00000
0x29a00000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[SNIP]
0x29a00f40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00f50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00f60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00f70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00f90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00fa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
/mnt/zen/mtd-utils #
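
For repeated testing, the all-0x00 state can be detected from a captured dump
rather than by eye. The following Python sketch assumes the text layout shown
in the dumps above ("0x...:" data lines and "OOB Data:" lines); dump_bytes and
classify are hypothetical helper names, and only the lines present in the
capture are checked, so a snipped dump gives a partial answer.

```python
import re

# Matches data lines such as "0x29a00000: 00 00 ..." and "OOB Data: ff ff ...".
LINE = re.compile(r"^\s*(?:0x[0-9a-fA-F]+|OOB Data):((?:\s+[0-9a-fA-F]{2})+)\s*$")


def dump_bytes(text):
    """Collect every data and OOB byte from nanddump-style text output."""
    out = bytearray()
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            out += bytes.fromhex("".join(match.group(1).split()))
    return bytes(out)


def classify(text):
    """Return 'empty', 'all-0x00', 'erased' (all 0xff) or 'data'."""
    data = dump_bytes(text)
    if not data:
        return "empty"
    if all(b == 0x00 for b in data):
        return "all-0x00"
    if all(b == 0xFF for b in data):
        return "erased"
    return "data"
```

Feeding it the saved text of the second dump should give 'all-0x00', while the
first dump, which still holds file data, should give 'data'.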

The big issue here is that I've also tested writing to this block after a
failed erasure, and I can't seem to write to it either. I tried forcing the
block to all zeros after the failed erase, and on reboot the previous data is
still in the block. This means that when MTD is told to mark this block as bad,
the write of the first two bytes in the first two pages of the block also fails.
Therefore the block never gets marked bad. Eventually an erase will work and
the block goes back to a "working" state. However it'll end up in this strange
state again at some point.

Has anyone seen this happen before with NAND parts? Is there a way to avoid
this? I have tried issuing a RESET command to the part after a failed erase,
but all the pages in the block stay in this strange state. The only thing that
seems to recover the device is to power cycle it.

I suppose I could change to using a BBT that is written to the NAND device;
hopefully then the bad blocks would be kept track of. Though I've never had an
issue with marking blocks bad by writing the first two bytes of the first two
pages to 0x00 and having Linux build a BBT on the fly during boot-up.
-------------------------------------------------------------------------------



Andrew McKay
Iders Inc.