Re: [Yaffs] Periodic Checkpointing

Author: Charles Manning
Date:  
To: yaffs
Subject: Re: [Yaffs] Periodic Checkpointing
On Thursday 01 April 2010 04:00:36 Josh Gelinske wrote:
> I have two large (4GB) NANDs in my system and I want to try implementing a
> periodic checkpoint mechanism to help speed up boot. Given the heavy amount
> of IO and the amount of state the checkpoint snapshots for these NANDs I
> cannot guarantee I can get a clean unmount (more like I can guarantee I
> will get an unclean shutdown).
>
> My current thought is to add a kernel thread that will checkpoint every 30
> seconds to 1 minute. Each time I "auto" checkpoint I would remove the older
> checkpoint once the new one is written. Of course given the timing of
> shutdown can't be guaranteed this means I need to handle cleaning up that
> old checkpoint at boot as well in case it did not get erased. I have this
> much working and it improves boot time but of course there are issues.
>
> The part I am missing and am unsure the best way to implement is the scan
> that I need to do for any changes after my checkpoint. Currently any
> modifications after my auto checkpoint are lost and this can cause
> corruption of existing data etc. Any ideas on the best way to do this? Am I
> completely going down the wrong path here?


Hi Josh

One of the things I love about answering emails like this is that it prompts
new thinking and just responding has helped to seed a couple of ideas.

I've considered this a few times and have been thinking around what is
required to make an incremental checkpoint. There are some headaches...

The checkpoint holds quite a bit of cross-referenced state that would need to
be fixed up. For example:
* Writing a new chunk into a file will change the file chunk list, file info
and the block info (chunk use bitmaps, block state...).
* Doing a garbage collection might change many file chunk lists, block states,
chunk lists etc.
... and so on...

It would be fairly easy if all changes were incremental (i.e. just adding
chunks and no deleting). For example, if we knew the checkpoint reflected
state up to the allocation of block sequence 5000 and then saw that a block
had been added with sequence 5001, we could just restore the checkpoint, scan
block 5001 and apply those changes.
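
As a rough illustration of that append-only case, here is a minimal sketch in
C. The struct and function names (blk_info, blocks_to_rescan) are made up for
this example and are not the real YAFFS structures. It assumes the checkpoint
records the highest block sequence number it covers, and it only picks out the
blocks allocated after that point; it does nothing about garbage collection or
deletions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block record; not the real YAFFS block info. */
struct blk_info {
    uint32_t seq;    /* sequence number given to the block when allocated */
    int      in_use; /* non-zero if the block has been allocated */
};

/*
 * Fill out[] with the indices of blocks allocated after the checkpoint
 * (seq > ckpt_seq), ordered by ascending sequence number. The ordering is
 * a choice made for this sketch. out[] must have room for n entries.
 * Returns the number of blocks that need a rescan.
 */
size_t blocks_to_rescan(const struct blk_info *blks, size_t n,
                        uint32_t ckpt_seq, size_t *out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (!blks[i].in_use || blks[i].seq <= ckpt_seq)
            continue;

        /* Insertion sort keeps out[] ordered by sequence number. */
        size_t j = count++;
        while (j > 0 && blks[out[j - 1]].seq > blks[i].seq) {
            out[j] = out[j - 1];
            j--;
        }
        out[j] = i;
    }
    return count;
}
```

With a checkpoint at sequence 5000 and blocks carrying sequences 4999, 5000
and 5001, only the block with sequence 5001 would be returned for scanning.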

It gets a lot trickier if state has been changed and we then lose track of
those changes. For example, garbage collections, file deletions, etc.

I've started fiddling around with some ideas for checkpoint "patches". These
would be written to the end of the checkpoint and tell a post-scanner that it
needs to scan some blocks to fix up the state. The benefit is that a patch
would be very fast to write (maybe as part of a shutdown handler, or maybe
every time a block is deleted). Then every minute or so a fresh checkpoint
could be written. This might get quite complicated... I don't like
complicated because it makes a breeding ground for bugs.
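
To make the patch idea a bit more concrete, here is a minimal sketch in C. It
is only one possible shape, not a design that exists in YAFFS: the record
layout, the names (ckpt_patch, patch_append, patch_rescan_set) and the sizes
are all assumptions made up for illustration. Each patch just names a block
whose checkpointed state is stale, and the post-scanner collapses the patches
into the set of blocks it has to rescan.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BLOCKS  64 /* illustrative device size, not a real limit */
#define MAX_PATCHES 16 /* illustrative capacity of the patch area */

/* Hypothetical patch record: "state for this block is stale, rescan it". */
struct ckpt_patch {
    uint32_t block;
};

/* Hypothetical patch area that follows the checkpoint proper. */
struct ckpt_patch_log {
    struct ckpt_patch rec[MAX_PATCHES];
    size_t count;
};

/*
 * Append one patch. Returns 0 on success, or -1 if the block number is out
 * of range or the patch area is full (a cue to write a fresh checkpoint).
 */
int patch_append(struct ckpt_patch_log *log, uint32_t block)
{
    if (block >= MAX_BLOCKS || log->count >= MAX_PATCHES)
        return -1;
    log->rec[log->count++].block = block;
    return 0;
}

/*
 * Post-scanner side: collapse the patches into one flag per block.
 * Returns the number of distinct blocks that need rescanning.
 */
size_t patch_rescan_set(const struct ckpt_patch_log *log,
                        uint8_t needs_scan[MAX_BLOCKS])
{
    size_t distinct = 0;

    memset(needs_scan, 0, MAX_BLOCKS);
    for (size_t i = 0; i < log->count; i++) {
        uint32_t b = log->rec[i].block;
        if (!needs_scan[b]) {
            needs_scan[b] = 1;
            distinct++;
        }
    }
    return distinct;
}
```

Keeping each patch to a single block number is what would make it cheap to
write; the cost moves to mount time, where the flagged blocks get scanned.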

Another approach that I keep mentioning, and should really get around to
doing, is implementing "block summaries". This would use the last chunk in
each block to hold all the tags for chunks in the block. That makes scanning
way, way faster, meaning that the fallback for the case when there is no
current checkpoint still approaches checkpoint speed.
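
A minimal sketch of what a block summary could look like, in C. Everything
here is an assumption for illustration: the geometry (64 chunks per block,
2048-byte chunks), the cut-down tag fields, the convention that obj_id 0 marks
an unused slot, and the function names. It is not the real YAFFS tag layout.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNKS_PER_BLOCK 64                     /* assumed geometry */
#define CHUNK_SIZE       2048                   /* assumed chunk size, bytes */
#define DATA_CHUNKS      (CHUNKS_PER_BLOCK - 1) /* last chunk holds the summary */

/* Cut-down, hypothetical tags; obj_id 0 means "slot unused" in this sketch. */
struct chunk_tags {
    uint32_t obj_id;
    uint32_t chunk_id;
    uint32_t n_bytes;
};

/* Contents of the last chunk: one tags entry per data chunk in the block. */
struct block_summary {
    struct chunk_tags tags[DATA_CHUNKS];
};

/* The whole idea only works if the summary fits in a single chunk. */
_Static_assert(sizeof(struct block_summary) <= CHUNK_SIZE,
               "summary must fit in one chunk");

/* Record the tags of data chunk idx as it is written. Returns 0 or -1. */
int summary_record(struct block_summary *s, size_t idx, struct chunk_tags t)
{
    if (idx >= DATA_CHUNKS)
        return -1;
    s->tags[idx] = t;
    return 0;
}

/* Scan side: count used slots from the summary alone, no per-chunk reads. */
size_t summary_used(const struct block_summary *s)
{
    size_t used = 0;

    for (size_t i = 0; i < DATA_CHUNKS; i++)
        if (s->tags[i].obj_id != 0)
            used++;
    return used;
}
```

The point of the layout is that a scanner reads one chunk per block rather
than visiting every chunk's tags, at the price of one chunk per block given
over to the summary.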

These are not mutually exclusive and can work together, i.e.:
Use checkpoint if we can.
If not, scan, using block summaries for blocks that have them.
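
That fallback order can be sketched in C as follows. The names (plan_mount,
mount_result) are hypothetical, and the read counts are a simple model chosen
for illustration: one read for a block with a summary, one read per chunk for
a block without. No real flash access is done.

```c
#include <stddef.h>
#include <stdint.h>

enum mount_path {
    MOUNT_FROM_CHECKPOINT, /* state restored from a valid checkpoint */
    MOUNT_BY_SCAN          /* state rebuilt by scanning blocks */
};

struct mount_result {
    enum mount_path path;
    size_t tag_reads; /* modelled number of reads needed to collect tags */
};

/*
 * Decide how to mount. has_summary[i] is non-zero if block i carries a
 * block summary in its last chunk.
 */
struct mount_result plan_mount(int ckpt_valid, const uint8_t *has_summary,
                               size_t n_blocks, size_t chunks_per_block)
{
    struct mount_result r = { MOUNT_FROM_CHECKPOINT, 0 };

    /* Use the checkpoint if we can: no scanning needed. */
    if (ckpt_valid)
        return r;

    /* Otherwise scan, taking the summary shortcut where one exists. */
    r.path = MOUNT_BY_SCAN;
    for (size_t i = 0; i < n_blocks; i++)
        r.tag_reads += has_summary[i] ? 1 : chunks_per_block;
    return r;
}
```

In this model, three 64-chunk blocks of which two have summaries cost
1 + 64 + 1 = 66 reads, against 192 with no summaries at all.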

-- Charles