Re: [Yaffs] Bad Block definition

Author: Qi Wang 王起 (qiwang)
Date:  
To: Ricard Wanderlof, Charles Manning
CC: yaffs@lists.aleph1.co.uk
Subject: Re: [Yaffs] Bad Block definition
Hi, Ricard and Charles

> > But now, MTD layer read function only return -EUCLEAN to YAFFS2, YAFFS2 cannot
> > get how many bit flip occur.
>
> Since a couple of kernel versions back, the mtd layer now includes a
> method of notifying how many bits were corrected during a read operation,
> and not just -EUCLEAN. So the mechanism to find out how many bitflips have
> occurred actually exists now.


> (Not all NAND flash controllers with hardware ECC support this however,
> i.e. they correct the data but it's not possible to read from the hardware
> how many bits were corrected (or in some cases even if bit correction
> occurred).)


Yes, you are right. I checked the latest Linux kernel just now, and there is indeed
a mechanism to find out how many bitflips have occurred.

> > But actually, in NAND flash, only program and erase error can be marked bad
> > block. Bit flip symptom is easy happen after a page is read many cycles.
> >
> > If a system use YAFFS2, and never power down this system, user will see a
> > lot of bad block after they run a time, But this block isn't a real bad block.
> >
> > How about just refresh the block when bit flip occur, but not record the bit
> > flip count, and mark it as bad block?
>
> I would think that one factor in deciding if the block is going really bad
> would be to estimate how many read cycles have been done since the last
> rewrite. If there have been a lot of read cycles, or a lot of time has
> passed since the last rewrite, it would be more expected for bitflips to
> occur and hence the block should not be marked as bad.


Exactly, but I am afraid it is hard to track read cycles per block in YAFFS2.
Since the standard Linux driver can report the number of bitflips, I think it is possible
to use the method Charles just mentioned, with graded error levels:
* NO_ERROR : No problems.
* REFRESH_NEEDED: (new) Refresh block, don't worry about it going bad.
* FIXED_BUT_SUSPECT: Treated same as FIXED is now, retire the block if it does this again.
* UNFIXED

Define two thresholds: one for REFRESH_NEEDED and the other for FIXED_BUT_SUSPECT.
The REFRESH_NEEDED threshold should be lower than the FIXED_BUT_SUSPECT threshold.
Under normal conditions, a block is refreshed as soon as its bitflip count reaches the
REFRESH_NEEDED threshold, so it should never exceed the FIXED_BUT_SUSPECT threshold.
When the bitflip count does exceed the FIXED_BUT_SUSPECT threshold, I assume something
abnormal has occurred. YAFFS2 can record this for the block, and if it happens more than
3 times, YAFFS2 can mark the block as bad.
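
To make the idea concrete, here is a rough sketch in C of how the two thresholds
and the abnormal-event counter could fit together. It assumes the higher threshold
belongs to FIXED_BUT_SUSPECT. All names and threshold values below are hypothetical
and are not taken from the YAFFS2 or MTD sources; real values would depend on the
ECC strength of the device.

```c
#include <stdbool.h>

/* Error levels from the list above (the enum names are hypothetical). */
enum ecc_result {
    ECC_NO_ERROR,
    ECC_REFRESH_NEEDED,
    ECC_FIXED_BUT_SUSPECT,
    ECC_UNFIXED
};

/* Hypothetical tuning values, chosen only for illustration. */
#define REFRESH_THRESHOLD 2u  /* refresh once the bitflip count reaches this */
#define SUSPECT_THRESHOLD 4u  /* exceeding this is treated as abnormal */
#define MAX_STRIKES       3u  /* retire after more than this many abnormal reads */

/* Hypothetical per-block bookkeeping. */
struct block_state {
    unsigned strikes;    /* reads that exceeded SUSPECT_THRESHOLD */
    bool needs_refresh;  /* block should be rewritten */
    bool retired;        /* block should be marked bad */
};

/* Map a corrected-bitflip count to one of the error levels. */
enum ecc_result classify_read(unsigned bitflips, bool uncorrectable)
{
    if (uncorrectable)
        return ECC_UNFIXED;
    if (bitflips > SUSPECT_THRESHOLD)
        return ECC_FIXED_BUT_SUSPECT;
    if (bitflips >= REFRESH_THRESHOLD)
        return ECC_REFRESH_NEEDED;
    return ECC_NO_ERROR;
}

/* Update the block bookkeeping after one page read. */
enum ecc_result handle_read(struct block_state *b, unsigned bitflips,
                            bool uncorrectable)
{
    enum ecc_result r = classify_read(bitflips, uncorrectable);

    switch (r) {
    case ECC_REFRESH_NEEDED:
        b->needs_refresh = true;  /* refresh only, no strike recorded */
        break;
    case ECC_FIXED_BUT_SUSPECT:
        b->needs_refresh = true;
        if (++b->strikes > MAX_STRIKES)
            b->retired = true;    /* abnormal more than MAX_STRIKES times */
        break;
    default:
        /* NO_ERROR needs nothing; UNFIXED is left to the existing handling. */
        break;
    }
    return r;
}
```

With these example values, a read with 2 to 4 corrected bitflips only schedules a
refresh, while a read with 5 or more also counts as one abnormal event for the block.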

I am not sure whether this process makes sense. What do you think? If it is OK with you,
I will prepare a YAFFS2 patch and submit it to you.
Thanks