4. File systems.pdf - File Systems, INF 551, Wensheng Wu

Roadmap
• Files and directories
  – CRUD operations
• How to implement them
  – Data structures
  – Access methods

Files and directories
• File: stored in blocks on a storage device
  – Has a user-defined name: hello.txt
  – And a low-level name, e.g., inode number: 410689
• Files are organized into directories (folders)
  – Each may have a list of files and/or subdirectories
  – That is, directories can be nested

Example
[Figure: a directory tree showing the root directory and an empty directory]

Operations on files
• Create
• Read
• Update
• Delete

Create
• User interface, e.g., via GUI
• Implementation, e.g., via a C program with a call to the system function open()
• int fd = open("foo", O_CREAT | O_WRONLY | O_TRUNC, 0644);
  – Open with flags indicating the specifics (|: bitwise OR operator)
  – O_CREAT: create the file if it does not exist
  – O_WRONLY: write only
  – O_TRUNC: remove existing contents if the file exists
  – With O_CREAT, open() also takes a third argument giving the permission mode of the new file (0644 here is just an example)

File descriptor
• Note that open() returns a file descriptor
  – Typically a small integer
  – Reserved fds: stdin 0, stdout 1, stderr 2

Read
• read(fd, buffer, size)
  – Read <size> bytes from file "fd"
  – And store them in buffer (size must not exceed the capacity of buffer)
• Read starts from the current offset of fd
  – Initially 0

Write
• write(fd, buffer, size)
  – Write to file fd the <size> bytes stored in buffer
  – Also starts writing from the current offset
• Annotation on the read/write offset: every open file has a read/write position. When a file is opened, the position normally points to the beginning of the file; if the file is opened in append mode (O_APPEND), writes take place at the end of the file. Each read() or write() advances the position, and lseek() is used to control it. The parameter fildes is the descriptor of the open file, and offset is the displacement by which the position is moved, interpreted according to whence.
• Offset: the distance to move, in bytes; it may be positive or negative (forward or backward).
• whence is one of the following (SEEK_SET, SEEK_CUR, and SEEK_END are 0, 1, and 2, respectively):
  – SEEK_SET: set the position to offset bytes from the beginning of the file.
  – SEEK_CUR: move the position offset bytes from its current location.
  – SEEK_END: set the position to the end of the file plus offset bytes.
  – With SEEK_CUR or SEEK_END, offset may be negative.

Random read and write (start reading or writing at an arbitrary position)
• off_t lseek(int fd, off_t offset, int whence)
  – If whence is SEEK_SET, the offset is set to <offset> bytes from the beginning of the file
  – If whence is SEEK_CUR, the offset is set to its current location plus <offset> bytes
  – If whence is SEEK_END, the offset is set to the size of the file plus <offset> bytes (often offset is negative, e.g., -8 for 8 bytes from the end)
• whence: from where
• Common uses:
  1) Move the position to the beginning of the file: lseek(fildes, 0, SEEK_SET);
  2) Move the position to the end of the file: lseek(fildes, 0, SEEK_END);
  3) Get the current position: lseek(fildes, 0, SEEK_CUR);

Copy a file
[Figure: C program that copies a file; the annotations on it follow]
• A leading "0" starts an octal number => permissions 0644: 110 (owner) rw-, 100 (group) r--, 100 (others) r--
• The buffer argument is a pointer to a character array
• Question from the notes: what does "executable" (the x in r-x) mean? (x grants permission to run a file as a program, or to traverse a directory)
• Treat rwx as a binary number, 1 if the permission is present and 0 if not. Then rwx r-x r-- becomes 111 101 100, and converting each group of three bits to one octal digit gives 754.
• For example, suppose a.txt should have these permissions:

                 Owner   Group   Others
    Readable     yes     yes     yes
    Writable     yes     yes
    Executable

  The permission string is rw-rw-r--, which in binary is 110 110 100; converting each group of three bits to one octal digit gives 664, so we run:
  chmod 664 a.txt

File permission mode
• rw-r--r-- => 110 (owner permission) 100 (group) 100 (others)

Resources for system calls
• open: (system_call)
• read: (system_call)
• write: (system_call)
• close: (system_call)
• man -S 2 read
  – Finds it in Section 2 of the manual

Install gcc on EC2
• sudo yum groupinstall "Development Tools"
  – Will install other dev tools too
  – E.g., perl, bison, flex, automake, autoconf
• Usage:
  – gcc -o copy2 copy2.c

File and directory
• When creating a file
  – Bookkeeping data structure (inode) created: recording size of file, location of its blocks, etc.
    (Annotation: NameNode)
  – Linking a human-readable name to the file
  – Putting the link in a directory

Info about file (stored in inode)
    struct stat {
        dev_t     st_dev;     /* ID of device containing file */
        ino_t     st_ino;     /* inode number */
        mode_t    st_mode;    /* protection */
        nlink_t   st_nlink;   /* number of (hard) links */
        uid_t     st_uid;     /* user ID of owner */
        gid_t     st_gid;     /* group ID of owner */
        dev_t     st_rdev;    /* device ID (if special file) */
        off_t     st_size;    /* total size, in bytes */
        blksize_t st_blksize; /* blocksize for filesystem I/O */
        blkcnt_t  st_blocks;  /* number of blocks allocated */
        time_t    st_atime;   /* last time file content was examined */
        time_t    st_mtime;   /* last time file content was changed */
        time_t    st_ctime;   /* last time inode was changed */
    };
• Annotations: run "stat a.txt" to see the metadata; the inode number may look random

inode
• Stores metadata/attributes about the file
• Also stores locations of blocks holding the content of the file

Example
[Figure: stat output for a.txt (content: "abc def" repeated three times), labeled with device id, inode #, access permission, # of (hard) links, user id, group id, block size, and # of blocks allocated]
• A copy of a file gets a different inode number
• A file created with the ln command shares the same inode; its link count becomes 2

Working with directories
• Create: mkdir() system call
  – Used to implement commands, e.g., mkdir xyz
• Read: opendir(), readdir(), closedir()
  – ls xyz
• Delete: rmdir()

Roadmap
• Files and directories
  – CRUD operations
• Implementation
  – Data structures: how to organize the blocks
  – Access methods: map system calls to operations on data structures

Organization of blocks
• Array-based
  – Disk consists of a list of blocks
  – We will assume this
• Tree-based, e.g., SGI XFS
  – Blocks are organized into variable-length extents
  – Use B+-tree to quickly find free extents

Blocks
• Consider a disk with 64 blocks
  – 4KB/block
  – 512B/sector (we assume this in this lecture)
    • So there are 2^12 / 2^9 = 2^3 = 8 sectors/block
  – Capacity of disk = 64 * 4KB = 256KB

Data region
• 56 blocks used to store data (files)
  – Blocks #8 to #63

Metadata
• For each file, the file system
  keeps track of
  – Location of the blocks that comprise the file
  – Size of the file
  – Owner and access rights
  – Access and modify times
  – Etc. (see the stat struct a few slides back)
• These metadata are stored in an inode (index node)

inodes
• Index nodes
• Stored in blocks #3 to #7 (i.e., 5 blocks)
• Together they are called the inode table
• Annotations: when moving the file, remember to copy the superblock (mount command?); a bitmap is a bit vector such as 1101…1; 56 bits = 7 bytes

How many inodes are there?
• 256 bytes/inode
• 5 blocks, 4KB/block
  => 4KB / 256B = 2^12 / 2^8 = 16 inodes/block
  => 5 blocks * 16 inodes/block = 80 inodes (equivalently, 5 * 4KB / 256B = 80)
  => File system can store at most 80 files
• One inode holds the metadata of exactly one file, so even if a block group still has free blocks, no new file can be created once the inodes run out

Free space management using bitmaps
• Bitmap: a vector of bits
  – 0 for free (inode/block), 1 for in-use
• Inode bitmap (imap)
  – Keeps track of which inodes in the inode table are available
  – (Annotation: the inode number assigned to a newly created file may look random)
• Data bitmap (dmap)
  – Keeps track of which blocks in the data region are available

Bitmaps
• Each bitmap is stored in a block
  – Block "i": keeps track of the 80 inodes (80 bits = 10 bytes; a 4KB block has 4096 * 8 = 32K bits, so it could track 32K inodes)
  – Block "d": keeps track of the 56 data blocks

Superblock
• Tracks where the i/d blocks and the inode table are
  – E.g., inode table starts at block 3; there are 80 inodes and 56 data blocks, etc.
• Indicates the type of file system
• Read first when the file system is mounted

inumber
• Each inode is identified by a number
  – The low-level name of the file
• Can figure out the location of an inode from its inumber
• Annotations:
  – Bit i of a bitmap tells whether the i-th inode/block is in use: 1 means yes (taken), 0 means no
  – 80 inodes => 80 bits = 10 bytes; 56 data blocks => 56 bits = 7 bytes
  – 1 block holds 16 inodes; 1 sector holds 2 inodes

inumber => location
• inumber = 32 => address: offset in bytes from the beginning => which sector?
  (12KB + inumber * 256B) / 512B (sector size)
  – 12KB + inumber * 256B is the number of bytes counted from the start of the disk, i.e., from the superblock
  – A block has 16 inodes, 2 inodes occupy a sector, and each block has 8 sectors

inumber => location of inode
• Address: 12KB + 32 * 256B = 20KB
• Sector #: 20KB / 512B = 40
  – More generally: (inodeStartAddress + inumber * inode size) / sector size

inode => location of data blocks
• A number of direct pointers
  – E.g., 8 pointers, each pointing to a data block
  – Enough for a file of 8 * 4KB = 32KB
  – One pointer points to one block, so the maximum file size is (# of pointers) * (block size)
• Also has a slot for an indirect pointer
  – Points to a data block that stores direct pointers
  – (Annotation: the indirect pointer points to a block of pointers; where is that pointer block stored? In a data block, as stated above)
  – Assume 4 bytes per block address (e.g., represented in CHS), so 4KB / 4B = 1024 pointers/block
  – Now a file can have (8 + 1024) blocks, or 4,128KB

Multi-level index
• Pointers may be organized into multiple levels
  – Indirect pointer (as in previous slide)
    • Inode (pointer1, pointer2, …, indirect pointer)
    • Indirect pointer -> a block of direct pointers
  – Double indirect pointers
    • Inode (pointer1, pointer2, …, indirect pointer)
    • Indirect pointer -> a block of indirect pointers instead -> each points to a block of direct pointers
  – Triple indirect pointers
    • Indirect pointer -> a block of indirect pointers -> each points to a block of indirect pointers -> each points to a block of direct pointers

Double indirect pointers
[Figure: inode -> indirect pointer -> block of indirect pointers -> blocks of direct pointers -> 4KB data blocks]

Advantages of multi-level index
• Grow to more levels as needed
• Direct pointers handle most of the cases
  – Many files are small
• Annotation (assuming 12 direct pointers, numbered 1 to 12): each direct pointer points to one data block. With 4KB blocks they cover files up to 12 * 4KB = 48KB, so a file within 48KB can be located with direct pointers alone.
• Indirect pointer: it does not point to data but to a pointer block, which is also 4KB and stores addresses instead of data. With 4 bytes per pointer, the block holds 1024 pointers to 1024 data blocks, so it covers 1024 * 4KB = 4096KB = 4MB.
• Double indirect pointer: reaching the data passes through two pointer blocks, so by the same reasoning it covers 1024 * 1024 * 4KB = 4GB.
• Triple indirect pointer: reaching the data passes through three pointer blocks, so by the same reasoning it covers 1024 * 1024 * 1024 * 4KB = 4TB.

Directory organization
• Directory itself stored as a file
• For each file in the directory, it stores:
  – name, inumber, record length, string length (the actual length of the name)
• Annotations: cd .. goes to the parent directory, cd . stays in the current directory, cd ~ goes to the home directory (cd / goes to the root directory)

Record length vs string length
• String length = # of characters in file name + 1 (for \0: end of string)
• Record length >= string length
  – Due to entry reuse

Reusing directory entries
• In Linux, a file is uniquely identified by its inode number, not by its name (the inode is part of the file's metadata but does not contain the file name; the inode number is the index node number)
• If a file is deleted (using the rm command) or a name is unlinked (using the unlink command)
  – The file is finally deleted when its last (hard) link is removed
• Then the inumber in its directory entry is set to 0 (reserved for empty entries)
  – So we know it can be reused

Storing a directory
• Also as a file with its own inode + data block
• inode:
  – file type: directory (instead of regular file)
  – pointer to block(s) in data region storing directory entries

Roadmap
• Files and directories
  – CRUD operations
• Implementation
  – Data structures: how to organize blocks, e.g., into array/tree
  – Access methods: turn system calls into operations on data structures

Open for read
• fd = open("/foo/bar", O_RDONLY)
  – Need to locate the inode of the file "/foo/bar"
  – Assume the inumber of root, say 2, is known (e.g., when the file system is mounted)
  – (Annotation: fd is an integer; 0, 1, and 2 correspond to cin, cout, and cerr in C++)

Open for read
1. Read inode and content of / (2 reads)
   – Look for "foo" in / -> foo's inumber
2. Read inode and content of /foo (2 reads)
   – Look for "bar" in /foo -> bar's inumber
3.
   Read inode of /foo/bar (1 read)
   – Permission check + allocate file descriptor

Cost of open()
• root inumber -> root's inode (R) -> root's directory content (R) -> foo's inumber -> foo's inode (R) -> foo's content (R) -> bar's inumber (from bar's entry in foo, or report an error if there is none) -> bar's inode (R)
  – bar's inode is only read, in order to check the permissions
• Need 5 reads of inode/data blocks
• Annotations: each read is on the scale of milliseconds; the data of a directory holds its directory entries

Reading the file
• read(fd, buffer, size)
  – Note fd is maintained in the per-process open-file table
  – Table translates fd -> inumber of file
  – Returns the actual number of bytes read

File-open table per process
    File descriptor   File name   Inumber   Position offset
    3                 /foo/bar    32382     0
    4                 /foo/more   48482     512
    …
• Use the inumber to track the inode
• (Annotation: a read returns, e.g., 4KB)

Reading the file
• read(fd, buffer, size)
  1. Consult bar's inode to locate its 1st block
  2. Read the block
  3. Update inode with newest file access time
  4. Update open-file table with new offset
  5. Continue steps 2, 3, 4 until done (after the first block, read the next block)
  6. Deallocate file descriptor (when the file is closed)
• Annotation on file timestamps:
  1. Access time: updated each time the content of the file is read, e.g., by more or cat. ls and stat do not change it.
  2. Modification time: the last time the content of the file was changed, e.g., saving the file in vi. This is the time shown by ls -l.
  3. Status change time: the last time the inode of the file was changed; changing the file's attributes with chmod or chown updates it.

Cost for reading a block
• 3 I/O's:
  – read inode, read data block, write inode
• Access time: read; modification time: write; change time: status change. In this case the access time changes and is written to the inode.

Open for write
• int fd = open("/foo/bar", O_WRONLY)
  – Assume bar is a new file under foo (creating it also requires O_CREAT in the flags, as on the Create slide)
  – (note the difference from the reading chapter!)
  – Ensure there is space for the new inode

Open for write
• int fd = open("/foo/bar", O_WRONLY)
  1. Read '/' inode & content (2 reads)
     – Obtain foo's inumber
  2. Read '/foo' inode & content (2 reads)
     – Check if bar exists
  3. Read imap, to find a free inode for bar
  4. Update imap, setting 1 for allocated inode
  5. Write bar's inode
  6.
     Update foo's content block
     – Adding an entry for bar
  7. Update foo's inode
     – Update its modification time

Cost for "open for write"
• int fd = open("/foo/bar", O_WRONLY)
• Need 9 I/O's for create():
  1. read root inode
  2. read root data
  3. read foo inode
  4. read foo data
  5. read inode bitmap
  6. write inode bitmap (change the content of imap)
  7. write bar inode (a new inode, because bar is a new file)
  8. write foo data (new entry for bar: inumber and name)
  9. write foo inode (update the modification time)
[Figure: I/O timeline with columns for data bitmap, inode bitmap, root/foo/bar inodes, root/foo data, and bar data[0..2]]

Writing the file: /foo/bar
1. Read inode of bar (by looking up its inumber in the file-open table)
2. Allocate new data block
   – Read and write the data bitmap (bmap)
3. Write to data block of bar
4. Update bar inode
   – New modified time, add pointer to block

Cost of writing /foo/bar
• 5 I/O's to write a block:
  1. read bar inode
  2. read data bitmap
  3. write data bitmap
  4. write bar data block
  5. write bar inode
[Figure: the same I/O timeline, showing create() followed by write()]

Caching for read
• First read may be slow
  – But subsequent ones will speed up
• Good idea to cache popular blocks
  – E.g., determined via the LRU (least recently used) strategy, which evicts data based on its access history; the core idea is that data accessed recently is more likely to be accessed again

Buffering for delayed write
• Improve write performance via:
  – Batching (e.g., two updates to the same imap)
  – Scheduling (reordering for better performance; annotation: on a hard disk, many files are written to different sectors)
  – Avoiding writes (if a file is created, then quickly deleted)
• Problem: updates may be lost when the system crashes

Example file systems
• NTFS
  – New Technology File System, Microsoft proprietary
• FAT
  – File Allocation Table
  – FAT16, FAT32, …
  – FAT32: 32 bits for the # of sectors; 512B/sector => 2^32 * 512B = 2TB limit, 4KB/sector => 16TB limit (these bound the volume size; a single FAT32 file is separately capped at 4GB by its 32-bit size field)
• Ext4
  – Fourth extended file system, common in Linux
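
Worked example (not from the slides): the arithmetic on the "inumber => location of inode" and "inode => location of data blocks" slides can be sketched in C as below. This is a minimal sketch under the slides' example layout (4KB blocks, 512B sectors, 256B inodes, inode table in blocks 3 to 7, 8 direct pointers, 4-byte block addresses). The constant names and the functions inode_count(), inode_sector(), and max_file_blocks() are hypothetical helpers introduced here for illustration; they are not part of any real file system API.

```c
#include <assert.h>
#include <stdint.h>

/* Layout constants assumed from the slides' 64-block example disk. */
enum {
    BLOCK_SIZE       = 4096, /* 4KB per block */
    SECTOR_SIZE      = 512,  /* 512B per sector */
    INODE_SIZE       = 256,  /* 256B per inode */
    INODE_TABLE_BLK  = 3,    /* inode table starts at block 3 */
    INODE_TABLE_BLKS = 5,    /* and spans 5 blocks (3..7) */
    DIRECT_PTRS      = 8,    /* direct pointers per inode */
    PTR_SIZE         = 4     /* bytes per block address */
};

/* Number of inodes the inode table can hold: 5 * (4KB / 256B). */
uint32_t inode_count(void) {
    return INODE_TABLE_BLKS * (BLOCK_SIZE / INODE_SIZE);
}

/* Sector holding inode `inumber`:
 * (inodeStartAddress + inumber * inode size) / sector size. */
uint32_t inode_sector(uint32_t inumber) {
    assert(inumber < inode_count());
    uint32_t start = INODE_TABLE_BLK * BLOCK_SIZE; /* 12KB */
    return (start + inumber * INODE_SIZE) / SECTOR_SIZE;
}

/* Largest file, in blocks, reachable through the direct pointers plus
 * one single-indirect pointer (levels >= 1), one double-indirect
 * pointer (levels >= 2), and one triple-indirect pointer (levels == 3). */
uint64_t max_file_blocks(int levels) {
    assert(levels >= 0 && levels <= 3);
    uint64_t per_block = BLOCK_SIZE / PTR_SIZE; /* 1024 pointers per block */
    uint64_t total = DIRECT_PTRS;
    uint64_t reach = 1;
    for (int i = 0; i < levels; i++) {
        reach *= per_block; /* 1024, then 1024^2, then 1024^3 */
        total += reach;
    }
    return total;
}
```

With these assumptions, inode_sector(32) evaluates to 40 and max_file_blocks(1) to 1032 blocks, i.e., 1032 * 4KB = 4,128KB, matching the numbers on the slides.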
  • Fall '14
