Linux File System Stack – 2
A Linux file system is expected to handle (endure?) two species of data structure, dentries and inodes; they are the defining characteristic of a file system running inside the Linux kernel. For example, the path “/sludge/mastodon” contains three elements, “/”, “sludge” and “mastodon”, and each of them will have its own dentry and inode. Among a lot of other things, a dentry encapsulates the name, a pointer to the parent dentry and a pointer to the corresponding inode.
What happens when we type “cd /sludge/mastodon”?
Setting the current working directory involves pointing the process “task_struct” to the dentry associated with “mastodon”. Now, how can we locate that dentry? We can imagine the following steps.
- “/” at the beginning of the string indicates the root.
- The root dentry is furnished during the file system mount, so the VFS has a point from which it can start its search for a file or a directory.
- A file system module is expected to be able to search for a child when given the parent dentry, so the VFS will request the dentry for “sludge” by providing its parent dentry (root).
- Now it is up to the file system module to find the child entry using the parent dentry (*the parent dentry also has a pointer to its own inode, which should hold the key?*).
The above steps are repeated, but this time the parent will be “sludge” and “mastodon” will be the child; eventually the VFS will have a list of the dentries in a path.
Linux is geared to run on sluggish hard disks buttressed with gargantuan DRAM memories. This means there can be an ocean of dentries and inodes cached in RAM, and whenever a cache miss is encountered the VFS tries to resolve it using the above steps by calling the file system module's “lookup” function.
Fundamentally a file system module is only expected to work on top of inodes: Linux will request operations like creation and deletion of inodes, lookup of inodes, linking of inodes and allocation of storage blocks for inodes.
Path parsing and cache management are abstracted in the kernel as part of the VFS, and buffer management as part of the block driver framework.
Let's try to look at how simple operations like file reads and writes work!
Writing a new file:
- From user space, pass the buffer to be written using the “write” system call.
- VFS allocates a kernel page and associates that with the write offset in the “address_space” of that inode (Yes, each inode has its own address_space!!).
- Every write needs to eventually end up on the storage device, so the new page in the cache (in RAM) needs to be mapped to a block on the storage device; for this the VFS calls the “get_block” interface of the file system module and sets up the mapping. Now we are all set!
- A “copy_from_user” routine moves the contents into that page and marks it as dirty.
- Finally, control returns to the application.
Overwriting the contents of a file differs in two aspects: the offset being written to might already have a page allocated in the cache, and that page should already be mapped to a block in storage, so it is just a matter of a memcpy from the user space buffer to the kernel space page. All the dirty pages are written out when the kernel flusher threads kick in, and at this point the storage mapping established earlier helps the kernel identify which storage block each page must go to.
Reading a file follows similar steps, except that the contents need to be read from the device into the page and then into the user space buffer. If an up-to-date page is already present in the page cache then the device read is avoided, and hence such reads are faster.