A Logical Filesystem

Jeff Bone jbone at deepfile.com
Fri Jul 25 11:40:36 PDT 2003


	

The next-gen filesystem cluetrain gathers steam...  This generated some 
buzz at USENIX, apparently:

	http://www.usenix.org/events/usenix03/tech/padioleau.html

----

Write-up from LtU:

	http://lambda.weblogs.com/discuss/msgReader$7919#7921

Ehud Lamm - Re: USENIX2003: A Logic File System  blueArrow
7/24/03; 1:05:15 AM (reads: 511, responses: 0)
A Logic File System. Yoann Padioleau and Olivier Ridoux, IRISA / 
University of Rennes. General Track, pp. 99-112 of the Proceedings 
http://www.usenix.org/events/usenix03/tech/padioleau.html

This is the most interesting, insightful, and innovative talk at this 
conference.

The goal of the talk is to combine the navigation and querying in a 
file system. Roughly speaking, the talk attempted to bring filesystems 
from the age of hierarchical and network databases into the age of 
relational -- and, moreover, deductive databases -- and yet maintain 
the familiar cd/ls/mkdir/mv interface, file paths and shell globs.

The current filesystems are built around the hierarchical model (links, 
if present, are only uni-directional. Therefore filesystems do not 
actually follow the network database model). Unfortunately, a hierarchy 
is a tree rather than a lattice: a hierarchy decomposes the system 
around one aspect disregarding the others. For example, the typical 
layout of the filesystem emphasizes the scope: /, /usr, /usr/local, 
/usr/local/share/ghostscript/7.05. Other aspects, e.g., library code, 
are smeared all over the place: /lib, /usr/lib, /usr/local/lib and 
/usr/local/share/ghostscript/7.05/lib. Locating files that are local to 
an installation is easy (/usr/local), locating all the library files is 
hard. Many interesting queries are difficult to execute or even to 
formulate. For example, to list all directories that contain only empty 
subdirectories, to "cd" to a directory with files foo.h or foo.c 
(disjunctive query), or to list all the files *.h or *.c in the given 
tree that are not in (sub)directories named Attic. To answer such 
queries, we have to write quite complex logical formulas, made more 
complex by the fact that me must keep the navigational structure in 
mind.

Many operating systems have tools that impose a relational layer on the 
file system -- glimpse, locate, find -- and offer a keyword-based 
search. However these tools often return a great number of results, 
which are not grouped in any particular way. For example, "locate 
sendmail.cf" may return a couple of file names. But "locate 
!sendmail.cf" (where ! stands for negation), if it were allowed, would 
list almost any file on the system. Clearly the user will find such a 
result unmanageable. The user would much prefer if the output were 
automatically grouped according to the most general and informative 
categories: the most general keywords that _will_ refine the search. 
Furthermore, the user should be able to navigate such a virtual 
directory of results using the familiar 'cd' and 'ls' commands, or 
reclassify a file with a 'mv' command. Finally, if the user adds or 
deletes a file, the virtual file tree should be updated automatically.

We would like to "cd /lib" and descend into the "directory" that 
contains only library files, grouped into subdirectories "root", "usr", 
"local". We would like to "cd /bin|/sbin" and get to the root of the 
"filesystem tree" that contains only executable files (again, grouped 
into "subdirectories"). We may even do "cd !src" to get the view of the 
filesystem without any source files. The Logical File System described 
in this talk (LISFS for short, not to be confused with the Log File 
system) can accomplish all that -- now. The prototype implementation is 
fully functional and reasonably fast. An ls-benchmark shows hardly any 
performance overhead of far more expressive filesystem queries.

The logical filesystem LISFS takes a relational view: files are 
entities that are described by their properties (attributes). To locate 
a file, we execute a query. LISFS lets a user compose a query 
incrementally: "cd /query" executes the query and makes the 
corresponding collection current. "cd query1" refines the original 
query. Incidentally, this means that "cd a/b" and "cd b/a" are the 
same: the path separator stands for a conjunction, and therefore, is 
commutative. The closed-world assumption makes negation queries 
possible (e.g., "cd !/lib"). LISFS goes farther than the naive 
relational model. A user may create taxonomies of attribute values. For 
example, the user may create a category "lib" with sub-categories 
"shared-lib" and "static-lib" -- using the mkdir command. LISFS will 
infer that anything classified as a shared library is also a library. 
The inference rules can be more complex: we can define a category 
"dual-lib" for project's libraries that can be linked both statically 
and dynamically. When LISFS reifies the result of a query as a 
filesystem directory, the most general categories that partition the 
result become the names of subdirectories. LISFS thus lets us navigate 
the inference tree. The complex inference requires a logical engine -- 
and indeed the current version of LISFS includes a Prolog interpreter, 
as a part of the filesystem code! Truly a file path is a logical 
formula.

Like the semantic file system, LISFS has intrinsic attributes, whose 
values are derived from the content of a file (e.g., file's size or the 
resolution of a JPEG image). Unlike the semantic file system, a user 
can define his own attributes and organize them into arbitrary 
taxonomies.

The paper describes the algorithms, the implementation of directory 
operations, and the obvious optimizations (e.g., caching).

LISFS is implemented truly at the low, file system level, as a 
user-level NFS daemon. Therefore, none of the applications need to be 
modified in any way to take advantage of LISFS. For example, the 
wildcard expansion in a shell (glob) "just works".

LISFS is written with the help of a PerlFS toolkit. However, Yoann told 
me that the implementation language was not truly Perl. Yoann designed 
his own language with types and an ability to express post-conditions 
and invariants. That source language was then mechanically translated 
into Perl. Perhaps Yoann might wish to supply more details?

The only filesystem that comes close to LISFS in terms of integration 
of query and navigation was the file system of Apple Newton (so-called 
'soup'). But NewtonScript never had any logical inference.

In the future research, Yoann plans to implement views of a file. We 
could then edit selected sections from a TeX document as if they were a 
separate file (Emacs' all-mode is a glimpse of that functionality). It 
seems that LISFS can be meaningfully integrated with Semantic Web.

Yoann was asked a question if a common person (without the knowledge of 
logic, etc) would be able to meaningfully navigate LISFS. Yoann gave an 
excellent answer: a common person can use Google. LISFS is not much 
different.

The prototype and more information are available at 
http://www.irisa.fr/LIS

Cimarron Taylor - Re: USENIX2003: A Logic File System  blueArrow
7/25/03; 1:01:14 AM (reads: 119, responses: 0)



More information about the FoRK mailing list