Re: [Yaffs] Modifications to GC

Author: Hugo Etchegoyen
Date:
To: Charles Manning
CC: yaffs
Subject: Re: [Yaffs] Modifications to GC

OK, thanks. Will report back asap.

Charles Manning wrote:
> On Thursday 04 March 2010 04:17:53 Hugo Etchegoyen wrote:
>
>> Charles,
>>
>> Thank you very much for answering. In the meantime I discovered there is
>> a yaffs list and I've subscribed to it, so if you prefer to continue
>> this thread in the list just advise.
>>
>> It looks like you are already addressing my issues, so you can count me
>> as a guinea-pig. My timescale for this project is 2-3 months.
>>
>> To be a little bit more precise about my application, I will try to give
>> you some timing figures. They are not very precise because my test board
>> is not working right now so I have to rely on my memory, but I think
>> you'll get the idea.
>>
>> I have some 200 millisecs available for shutdown. I can save state and
>> sync yaffs in less than 100 millisecs if GC doesn't interfere, but when
>> it does I can easily run out of time. I did some profiling and realised
>> that the execution time of yaffs_GarbageCollectBlock() could climb to
>> some 300 millisecs, depending on how many chunks were copied at a time,
>> so I came up with the idea of a smaller value for maxCopies. I guess
>> your proposed code is safer than just setting maxCopies to 3.
>>
>
> 300ms is excessive for gc. I will have a look to see if there is something
> interfering.
>
>
>> Background GC would be great since it would improve the average writing
>> speed, as most writes would not involve in-line GC. However, since the
>> probability of entering GC at power fail would be much smaller, but not
>> zero, I think it would still be necessary to decrease maxCopies in order
>> to cover the worst case.
>>
>
> Yes indeed, background gc would not block for long.
>
>
>> Your idea of updating directory times in
>> background is also very nice. My application doesn't care about file or
>> directory times, so I will mount yaffs with "noatime" and "nodiratime"
>> in order to increase performance and flash endurance.
>>
> yaffs does not support atimes because it is expensive to do a header write for
> each access, so yaffs is already noatime and nodiratime whether you want it
> or not :-).
>
> The directory times I'm talking about are the directory mtime and ctime that
> get changed every time the directory is changed so that client apps can know
> to refresh their view etc. Every time a file is created, renamed or unlinked
> we write a directory entry to reflect the change. This doubles the number of
> writes associated with creating or deleting empty files.
>
> The new approach will put the "dirty" directories into a dirty directory list
> and use the background thread to periodically (every second or so) update the
> directory times. That should reduce the number of directory update writes
> significantly.
>
> This is particularly bad when running test code which, say, creates 1000 files
> in a directory then deletes them resulting in 2000 directory update writes
> (ie. 4000 writes in total), half of which only last for a very brief time.
> With the new code there would only be a handful of directory update writes.
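
The dirty directory list described above could be sketched roughly as
follows. This is only an illustration of the coalescing idea: the struct and
function names are hypothetical and are not taken from the yaffs source, and
the "write" is simulated by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the yaffs data structures. */
struct dir {
    bool dirty;              /* true while queued on the dirty list */
    struct dir *next_dirty;  /* link to the next queued directory */
    unsigned header_writes;  /* counts simulated directory header writes */
};

struct dirty_list {
    struct dir *head;
};

/* Called on create/rename/unlink: queue the directory once, write nothing. */
void mark_dir_dirty(struct dirty_list *list, struct dir *d)
{
    assert(list != NULL && d != NULL);
    if (d->dirty)
        return;              /* already queued: repeated changes coalesce */
    d->dirty = true;
    d->next_dirty = list->head;
    list->head = d;
}

/* Called periodically by a background thread: one write per dirty directory.
 * Returns the number of simulated header writes performed. */
unsigned flush_dirty_dirs(struct dirty_list *list)
{
    unsigned writes = 0;

    assert(list != NULL);
    while (list->head != NULL) {
        struct dir *d = list->head;

        list->head = d->next_dirty;
        d->next_dirty = NULL;
        d->dirty = false;
        d->header_writes++;  /* stands in for writing mtime/ctime to flash */
        writes++;
    }
    return writes;
}
```

With this shape, creating and deleting many files in one directory between
two flushes marks the directory dirty many times but costs a single
directory update write at the next flush.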
>
>
>
>> I agree that refreshing old data looks much better than refreshing
>> random blocks and at the same time does static wear leveling, so as soon
>> as you have code for this (or any hints) I will gladly try it.
>>
>
> It is in cvs. Give it a go.
>
>
>> Cheers,
>> Hugo
>>
>> Charles Manning wrote:
>>
>>> Hugo
>>>
>>> This is interesting as I am looking at implementing background gc and
>>> revising the gc policies. What are your timescales? I'd be interested in
>>> someone being a bleeding edge guinea-pig.
>>>
>>> On Tuesday 02 March 2010 09:41:45 Hugo Etchegoyen wrote:
>>>
>>>> Dear Charles,
>>>>
>>>> I intend to use yaffs2 in a uCLinux embedded application, but I need
>>>> some tuning.
>>>>
>>>> My hardware detects power failure and gives me a short time for
>>>> shutdown.
>>>>
>>> How short is short?
>>>
>>>
>>>> During this time I must save state to flash. This involves a
>>>> few small writes totalling much less than the size of an erase unit.
>>>> Then I sync (and unmount?) the yaffs partition.
>>>>
>>>> As there is little time available for shutdown, I need to make some
>>>> modifications to garbage collection:
>>>>
>>>> 1. I must be able to disable GC during power fail so that it will not
>>>> waste valuable time.
>>>> 2. GC must be more spread out than it is now, i.e.
>>>> yaffs_GarbageCollectBlock() should only copy a few chunks at a time,
>>>> whether aggressive or not, so that power fail will not have to wait too
>>>> much before disabling GC.
>>>>
>>>> As an additional requirement, my flash will be pretty full. It will have
>>>> lots of static data and the dynamic data will be very hot, which is a
>>>> bad case for endurance. For that reason I also need to:
>>>>
>>>> 3. Add static GC to yaffs.
>>>>
>>>> I think (1) can be easily satisfied by raising a flag on power fail and
>>>> having yaffs_CheckGarbageCollection() check the flag and do nothing if
>>>> it is set.
>>>>
>>> At present it is quite easy to defeat gc by just setting dev->isDoingGC
>>> to 1.
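
A minimal sketch of the flag-based approach from (1), assuming a simplified
device struct. Only the field name isDoingGC comes from the thread;
power_failing, gc_passes and the function names are hypothetical, and a real
driver would need proper synchronisation if the flag is raised from an
interrupt handler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the yaffs device; only the fields used here. */
struct gc_dev {
    int isDoingGC;       /* field name as mentioned in the thread */
    int power_failing;   /* hypothetical flag raised on power fail */
    unsigned gc_passes;  /* counts passes actually run, for illustration */
};

/* Hypothetical hook called by the power-fail handler. */
void on_power_fail(struct gc_dev *dev)
{
    assert(dev != NULL);
    dev->power_failing = 1;
}

/* Returns 1 if a gc pass ran, 0 if it was skipped. */
int check_garbage_collection(struct gc_dev *dev)
{
    assert(dev != NULL);
    if (dev->isDoingGC || dev->power_failing)
        return 0;        /* gc already running, or shutdown in progress */

    dev->isDoingGC = 1;
    dev->gc_passes++;    /* the real block selection and copying goes here */
    dev->isDoingGC = 0;
    return 1;
}
```

A separate flag keeps the power-fail condition distinct from the re-entrancy
guard, so clearing isDoingGC at the end of a pass cannot re-enable gc during
shutdown.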
>>>
>>>
>>>> Regarding (2), I have already tried a simple modification changing line
>>>> 3036 of yaffs_guts.c, in function yaffs_GarbageCollectBlock() as
>>>> follows:
>>>>
>>>> Original:    maxCopies = (wholeBlock) ? dev->nChunksPerBlock : 10;
>>>> New version: maxCopies = 3;
>>>>
>>>> I've tried this under rather heavy load and everything seems to work
>>>> properly. I hope a figure of 3 guarantees that we will not run out of
>>>> free blocks, because during GC three chunks are freed for each new chunk
>>>> written.
>>>>
>>> There are some conditions where this policy might fail, but I would
>>> expect 3 to probably work.
>>>
>>> Perhaps
>>> maxCopies = (wholeBlock) ? 5 : 3 or something like that would be safer.
>>>
>>>
>>>> Regarding (3), I am thinking of modifying
>>>> yaffs_FindBlockForGarbageCollection() so that once in a while, let's say
>>>> one time out of 100, it will just choose a random full block. This
>>>> should tend to move static data around.
>>>>
>>> It is potentially dangerous to just randomly gc a block unless you also
>>> check it is not disqualified.
>>>
>>> The new block refreshing code will also do what you're wanting. It
>>> infrequently gc's the oldest block thus rewriting static data.
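
The "refresh the oldest block, but only if it is not disqualified" idea
might look something like the sketch below. It assumes each block carries a
sequence number where a lower value means older data, plus a disqualified
flag; the struct, the field names and both functions are hypothetical and are
not the actual block refreshing code.

```c
#include <assert.h>
#include <stddef.h>

enum blk_state { BLK_EMPTY, BLK_ALLOCATING, BLK_FULL };

/* Hypothetical per-block bookkeeping. */
struct blk {
    enum blk_state state;
    unsigned seq;        /* assumed: lower value = written longer ago */
    int disqualified;    /* assumed: block must not be collected right now */
};

/* Returns the index of the oldest eligible full block, or -1 if none. */
int find_oldest_block(const struct blk *blocks, size_t n)
{
    int best = -1;

    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (blocks[i].state != BLK_FULL || blocks[i].disqualified)
            continue;    /* skip blocks that are not safe to collect */
        if (best < 0 || blocks[i].seq < blocks[(size_t)best].seq)
            best = (int)i;
    }
    return best;
}

/* Returns 1 on every refresh_period-th call, 0 otherwise. */
int should_refresh(unsigned *counter, unsigned refresh_period)
{
    assert(counter != NULL && refresh_period > 0);
    if (++*counter < refresh_period)
        return 0;
    *counter = 0;
    return 1;
}
```

Picking the oldest block rather than a random one is deterministic, and it
targets the data that has sat still the longest, which is what static wear
leveling needs.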
>>>
>>>
>>>> I'm aware that all the intended modifications tend to postpone or slow
>>>> down GC. Do you think it would be advisable to increase the value of
>>>> nReservedBlocks in the yaffs device to compensate, and by how much?
>>>>
>>> nReservedBlocks of 5 is already pretty conservative. The dangers are:
>>> a) The possibility of a block going bad during gc means we need to keep
>>> some extra handy.
>>> b) With the current gc logic, nErasedBlocks should never drop below 4,
>>> maybe 3. maxCopies = 3 might make things a bit worse because a gc can
>>> sometimes end up with very little reclaim (no reclaim is a possibility).
>>> That might mean we lose one block of headroom.
>>>
>>> Thus from a quick mental audit, it sounds like the current level is safe.
>>>
>>> I might be inclined to do something like:
>>>
>>>     /* Should never happen. This is just a safety net
>>>      * if erased space gets really low. */
>>>     if (dev->nErasedBlocks < 3)
>>>         maxCopies = dev->nChunksPerBlock;
>>>     else if (wholeBlock)
>>>         maxCopies = 5;
>>>     else
>>>         maxCopies = 3;
>>>
>>> There is currently an ugliness caused by "shrink headers" which constrain
>>> the gc order meaning that often we can't collect the blocks that would
>>> give the most free space. Sometimes these end up in "chains" meaning
>>> there is a sequence of these gc's which causes gc to grind hard.
>>>
>>> As I mentioned at the top, I am revising gc in two ways:
>>> 1) Making it less "lumpy" by doing more smaller gcs.
>>> 2) Adding a background thread to do gc behind the scenes meaning that it
>>> should only very seldom need to be done in the writing thread. The
>>> background thread will also do directory time updating which should cut
>>> down on the number of writes when creating files etc.
>>>
>>> The background gc will target:
>>> 1) Trying to keep at least 25% of space in erased blocks. That means
>>> writes should happen fast.
>>> 2) In-write gc will only happen if erased space gets very low.
>>> 3) Eliminate or reduce those pesky shrink header chains so that if gc
>>> does happen in the write thread it doesn't become too onerous.
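
The first two targets above can be read as a simple decision rule. The
sketch below is one possible reading, not the planned implementation: the
enum, the function name and the low_water parameter are hypothetical, and
only the 25% figure comes from the message.

```c
#include <assert.h>

enum gc_action { GC_NONE, GC_BACKGROUND, GC_IN_WRITE };

/* Hypothetical policy helper.
 * in_write_path: nonzero when called from the writing thread.
 * low_water: assumed threshold below which in-write gc is forced. */
enum gc_action choose_gc_action(unsigned erased_blocks, unsigned total_blocks,
                                unsigned low_water, int in_write_path)
{
    assert(total_blocks > 0 && erased_blocks <= total_blocks);

    if (in_write_path)
        /* The writer only collects when erased space is very low. */
        return (erased_blocks < low_water) ? GC_IN_WRITE : GC_NONE;

    /* Background thread: keep at least 25% of blocks erased.
     * erased * 4 < total is the integer form of erased / total < 0.25. */
    return (erased_blocks * 4 < total_blocks) ? GC_BACKGROUND : GC_NONE;
}
```

For example, with 100 blocks and a low-water mark of 3, the background thread
collects whenever fewer than 25 blocks are erased, while the writing thread
stays out of gc until fewer than 3 remain.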
>>>
>>> -- Charles
>>>
>
>
>
>
>

--

Ing. Hugo Eduardo Etchegoyen
Manager, Base Software Dept.

Compañía Hasar | Grupo Hasar
Marcos Sastre y José Ingenieros
El Talar. Pacheco
[B1618CSD] Buenos Aires. Argentina
Tel [54 11] 4117 8900 | Fax [54 11] 4117 8998
E-mail: hetchegoyen@hasar.com
Visit us at: www.hasar.com <http://www.hasar.com>
Legal information and confidentiality policy:
www.grupohasar.com/disclaimer <http://www.grupohasar.com/disclaimer>
