Welcome to YAFFS, the first file system developed specifically for NAND flash.

It is now YAFFS2 - original YAFFS (YAFFS1) only supports 512-byte page
NAND and is now deprecated. YAFFS2 supports 512-byte pages in 'YAFFS1
compatibility' mode (CONFIG_YAFFS_YAFFS1) and 2K or larger page NAND
in YAFFS2 mode (CONFIG_YAFFS_YAFFS2).


A note on licensing
-------------------
YAFFS is available under the GPL and via alternative licensing
arrangements with Aleph One. If you're using YAFFS as a Linux kernel
file system then it will be under the GPL. For use in other situations
you should discuss licensing issues with Aleph One.


Terminology
-----------
Page -   NAND addressable unit (normally 512 bytes or 2K bytes in size) -
         can be read, written, marked bad. Has associated OOB.
Block -  Erasable unit. 64 Pages. (128K on 2K NAND, 32K on 512b NAND)
OOB -    'spare area' of each page for ECC, bad block marker and YAFFS
         tags. 16 bytes per 512b - 64 bytes for 2K page size.
Chunk -  Basic YAFFS addressable unit. Same size as Page.
Object - YAFFS Object: File, Directory, Link, Device etc.

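The block and OOB sizes quoted above follow from the page size. A
minimal sketch of that arithmetic, using only the figures from the list
(the function and constant names are made up for illustration):

```python
PAGES_PER_BLOCK = 64      # "64 Pages" per block, from the list above
OOB_BYTES_PER_512 = 16    # 16 bytes of OOB per 512 bytes of page data


def geometry(page_size):
    """Return (block_bytes, oob_bytes) for a page size given in bytes."""
    block_bytes = PAGES_PER_BLOCK * page_size
    oob_bytes = OOB_BYTES_PER_512 * (page_size // 512)
    return block_bytes, oob_bytes


if __name__ == "__main__":
    for size in (512, 2048):
        block_bytes, oob_bytes = geometry(size)
        print(size, block_bytes // 1024, oob_bytes)
```
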
YAFFS design
------------

YAFFS is a log-structured filesystem. It is designed particularly for
NAND (as opposed to NOR) flash, to be flash-friendly, robust due to
journalling, and to have low RAM and boot time overheads. File data is
stored in 'chunks'. Chunks are the same size as NAND pages. Each page
is marked with file id and chunk number. These marking 'tags' are
stored in the OOB (or 'spare') region of the flash. The chunk number
is determined by dividing the file position by the chunk size. Each
chunk has a number of valid bytes, which equals the page size for all
except the last chunk in a file.

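The mapping from file position to chunk can be sketched as follows.
This is an illustration, not the YAFFS source: the function names are
invented, and numbering data chunks from 1 is an assumption made here
because the tag description reserves chunk ID 0 for the file header.

```python
def locate(file_pos, chunk_size):
    """Map a byte offset in a file to (chunk_id, offset_within_chunk)."""
    # Assumption: data chunks start at 1, leaving chunk 0 for the header.
    return file_pos // chunk_size + 1, file_pos % chunk_size


def valid_bytes(file_size, chunk_id, chunk_size):
    """Number of valid data bytes held by data chunk `chunk_id`."""
    start = (chunk_id - 1) * chunk_size
    # Every chunk is full except possibly the last one in the file.
    return max(0, min(chunk_size, file_size - start))
```

For a 5000-byte file on 2K pages, offset 5000 - 1 lands in the third
data chunk, and that chunk holds only the bytes left over after two
full chunks.
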
File 'headers' are stored as the first page in a file, marked as a
different type to data pages. The same mechanism is used to store
directories, device files, links etc. The first page describes which
type of object it is.

YAFFS2 never re-writes a page, because the spec of NAND chips does not
allow it. (YAFFS1 used to mark a block 'deleted' in the OOB). Deletion
is managed by moving deleted objects to the special, hidden 'unlinked'
directory. These records are preserved until all the pages containing
the object have been erased (we know when this happens by keeping a
count of chunks remaining on the system for each object - when it
reaches zero the object really is gone).

When data in a file is overwritten, the relevant chunks are replaced
by writing new pages to flash containing the new data but the same
tags.

Pages are also marked with a short (2 bit) serial number that
increments each time the page at this position is replaced. The
reason for this is that if power loss/crash/other act of demonic
forces happens before the replaced page is marked as discarded, it is
possible to have two pages with the same tags. The serial number is
used to arbitrate.

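One way to arbitrate with a 2-bit counter is to treat the copy whose
serial number is one step ahead (modulo 4) as the newer one. The sketch
below illustrates that idea only; the helper names are hypothetical and
the exact comparison used by YAFFS is not spelled out in this text.

```python
SERIAL_BITS = 2
SERIAL_MASK = (1 << SERIAL_BITS) - 1   # 0b11: serials wrap 0, 1, 2, 3, 0, ...


def next_serial(serial):
    """Serial number given to the page that replaces one tagged `serial`."""
    return (serial + 1) & SERIAL_MASK


def newer(serial_a, serial_b):
    """Return 'a' or 'b' for the newer of two copies, or None if unclear."""
    if next_serial(serial_a) == serial_b:
        return "b"
    if next_serial(serial_b) == serial_a:
        return "a"
    return None  # not one step apart; the serials alone cannot decide
```

The wrap-around matters: a copy with serial 0 is newer than one with
serial 3, because 0 is what follows 3.
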
A block containing only discarded pages (termed a dirty block) is an
obvious candidate for garbage collection. Otherwise valid pages can be
copied off a block thus rendering the whole block discarded and ready
for garbage collection.

In theory you don't need to hold the file structure in RAM... you
could just scan the whole flash looking for pages when you need them.
In practice though you'd want better file access times than that! The
mechanism proposed here is to have a list of __u16 page addresses
associated with each file. Since there are 2^18 pages in a 128MB NAND,
a __u16 is insufficient to uniquely identify a page but it does
identify a group of 4 pages - a small enough region to search
exhaustively. This mechanism is clearly expandable to larger NAND
devices - within reason. The RAM overhead with this approach is approx
2 bytes per page - 512kB of RAM for a whole 128MB NAND.

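The figures in the paragraph above can be checked with a few lines of
arithmetic. This sketch assumes 512-byte pages (which is what gives
2^18 pages in 128MB); the names are illustrative only.

```python
NAND_BYTES = 128 * 1024 * 1024
PAGE_BYTES = 512          # assumption: small-page NAND
ENTRY_BITS = 16           # width of a __u16 entry

pages = NAND_BYTES // PAGE_BYTES             # 2**18 pages
group_size = max(1, pages >> ENTRY_BITS)     # pages sharing one 16-bit value
ram_bytes = pages * (ENTRY_BITS // 8)        # one 2-byte entry per page


def group_of(page):
    """16-bit value stored in RAM for a given page address."""
    return page // group_size


def candidates(entry):
    """Page addresses to search exhaustively for a stored 16-bit value."""
    return range(entry * group_size, (entry + 1) * group_size)
```
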
Boot-time scanning to build the file structure lists only requires
one pass reading NAND. If proper shutdowns happen the current RAM
summary of the filesystem status is saved to flash, called
'checkpointing'. This saves re-scanning the flash on startup, and gives
huge boot/mount time savings.

YAFFS regenerates its state by 'replaying the tape' - i.e. by
scanning the chunks in their allocation order (i.e. block sequence ID
order), which is usually different from the media block order. Each
block is still only read once - starting from the end of the media and
working back.

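A simplified model of this backwards replay: visit blocks from the
highest sequence number to the lowest, walk each block's pages from
last written to first, and keep only the first copy seen of each
(object, chunk) pair. The data layout and names below are assumptions
made for the sketch, not YAFFS structures.

```python
def replay(blocks):
    """Rebuild the newest copy of every chunk.

    `blocks` is a list of (sequence_number, pages); `pages` lists
    (object_id, chunk_id, payload) tuples in the order they were written.
    Returns {(object_id, chunk_id): payload}.
    """
    state = {}
    # Newest block first, so the first copy we meet is the current one.
    for _, pages in sorted(blocks, key=lambda b: b[0], reverse=True):
        for object_id, chunk_id, payload in reversed(pages):
            # setdefault leaves an already-seen (newer) copy untouched.
            state.setdefault((object_id, chunk_id), payload)
    return state
```

Each block is visited exactly once, and older copies of a chunk are
ignored as soon as a newer one has been recorded.
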
YAFFS tags in YAFFS1 mode:

18-bit Object ID (2^18 files, i.e. > 260,000 files). File id 0 is not
       valid and indicates a deleted page. File id 0x3ffff is also not
       valid. Synonymous with inode.
2-bit  serial number
20-bit Chunk ID within file. Limit of 2^20 chunks/pages per file (i.e.
       > 500MB max file size). Chunk ID 0 is the file header for the file.
10-bit counter of the number of bytes used in the page.
12-bit ECC on tags

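The five YAFFS1 fields add up to 62 bits, so they fit in 8 bytes of
OOB. The sketch below packs them into a single integer to show the
widths at work. The bit order (first listed field in the lowest bits)
is an assumption for illustration; the on-flash layout is not described
here.

```python
# (name, width in bits), in the order listed above.
FIELDS = (
    ("object_id", 18),
    ("serial", 2),
    ("chunk_id", 20),
    ("byte_count", 10),
    ("ecc", 12),
)


def pack(**values):
    """Pack the named fields into one integer, lowest bits first."""
    word, shift = 0, 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack(word):
    """Inverse of pack(): split an integer back into named fields."""
    fields = {}
    for name, width in FIELDS:
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields
```
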
YAFFS tags in YAFFS2 mode:
  4 bytes 32-bit chunk ID
  4 bytes 32-bit object ID
  2 bytes Number of data bytes in this chunk
  4 bytes Sequence number for this block
  3 bytes ECC on tags
 12 bytes ECC on data (3 bytes per 256 bytes of data)


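As a quick size check, the YAFFS2 fields listed above can be laid out
with Python's struct module. Little-endian byte order, the listed field
order and the absence of padding are all assumptions of this sketch;
the ECC sizes are simply taken from the list.

```python
import struct

# chunk ID, object ID, byte count, block sequence number
TAGS = struct.Struct("<IIHI")    # "<" means little-endian, no padding
TAG_ECC_BYTES = 3
DATA_ECC_BYTES = 12
OOB_BYTES_2K = 64                # OOB size for a 2K page

total_bytes = TAGS.size + TAG_ECC_BYTES + DATA_ECC_BYTES

# Example record: chunk 1 of object 257, a full 2048-byte chunk.
raw = TAGS.pack(1, 257, 2048, 4097)
```

Under these assumptions the tags and ECC together take 29 bytes, which
leaves room in a 64-byte OOB for the bad block marker and anything the
MTD layer needs.
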
Page allocation and garbage collection

Pages are allocated sequentially from the currently selected block.
When all the pages in the block are filled, another clean block is
selected for allocation. At least two or three clean blocks are
reserved for garbage collection purposes. If there are insufficient
clean blocks available, then a dirty block (i.e. one containing only
discarded pages) is erased to free it up as a clean block. If no dirty
blocks are available, then the dirtiest block is selected for garbage
collection.

Garbage collection is performed by copying the valid data pages into
new data pages thus rendering all the pages in this block dirty and
freeing it up for erasure. I also like the idea of selecting a block
at random some small percentage of the time - thus reducing the chance
of wear differences.

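The selection rule described above (erase a fully dirty block if there
is one, otherwise collect the dirtiest) can be sketched as a single
'most discarded pages wins' choice. The data structure and names are
invented for the sketch, and the occasional random pick is left out.

```python
def choose_block(discarded, pages_per_block=64):
    """Pick the next block to reclaim.

    `discarded` maps block number -> count of discarded pages, for
    fully written blocks only. Returns (block, pages_to_copy), or None
    if there is nothing to reclaim.
    """
    if not discarded:
        return None
    victim = max(discarded, key=discarded.get)
    # Valid pages must be copied elsewhere before the block is erased;
    # a fully dirty block needs no copying at all.
    return victim, pages_per_block - discarded[victim]
```

A block with all 64 pages discarded always wins and costs no copies,
which matches erasing dirty blocks before falling back to the dirtiest
block.
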
YAFFS is single-threaded. Garbage-collection is done as a parasitic
task of writing data. So each time some data is written, a bit of
pending garbage collection is done. More pages are garbage-collected
when free space is tight.


Flash writing

YAFFS only ever writes each page once, complying with the requirements
of the most restrictive NAND devices.

Wear levelling

This comes as a side-effect of the block-allocation strategy. Data is
always written to the next free block, so the free blocks are all used
equally. Blocks containing data that is written but never erased will
not get back into the free list, so wear is levelled over only blocks
which are free or become free, not blocks which never change.



Some helpful info
-----------------

Formatting a YAFFS device is simply done by erasing it.

Making an initial filesystem can be tricky because YAFFS uses the OOB
and thus the bytes that get written depend on the YAFFS data (tags),
and the ECC bytes and bad block markers which are dictated by the
hardware and/or the MTD subsystem. The data layout also depends on the
device page size (512b or 2K). Because YAFFS is only responsible for
some of the OOB data, generating a filesystem offline requires
detailed knowledge of what the other parts (MTD and NAND
driver/hardware) are going to do.

To make a YAFFS filesystem you have 3 options:

1) Boot the system with an empty NAND device mounted as YAFFS and copy
   stuff on.

2) Make a filesystem image offline, then boot the system and use
   MTDutils to write an image to flash.

3) Make a filesystem image offline and use some tool like a bootloader to
   write it to flash.

Option 1 avoids a lot of issues because the parts
(YAFFS/MTD/hardware) all take care of their own bits and (if you have
put things together properly) it will 'just work'. YAFFS just needs to
know how many bytes of the OOB it can use. However, sometimes it is not
practical.

Option 2 lets MTD/hardware take care of the ECC so the filesystem
image just has to know which bytes to use for YAFFS Tags.

Option 3 is hardest as the image creator needs to know exactly what
ECC bytes, endianness and algorithm to use as well as which bytes are
available to YAFFS.

mkyaffs2image creates an image suitable for option 3 for the
particular case of yaffs2 on 2K page NAND with default MTD layout.

mkyaffsimage creates an equivalent image for 512b page NAND (i.e.
yaffs1 format).

Bootloaders
-----------

A bootloader using YAFFS needs to know how MTD is laying out the OOB
so that it can skip bad blocks.

YAFFS Tracing
-------------