- Thomas' Webjar full of joy -
| Author: | Thomas Langewouters |
|---|---|
| tags: | software |
| description: | A distributed offline file-keeping system |
Caution!
This is mostly a draft. When linking, keep in mind that this document changes a lot and that there is presently no changelog or version info available. Little of what is described here is implemented at this time, but things are looking good: the design is almost complete.
Once you have multiple (portable) PCs, you find your files scattered across PCs, disks and maybe even locations.
I'm growing tired of having to power on machines to check whether some of my stuff is on them. I also can't just pull one of my laptops out of the closet, sync it up quickly, and later pick up my work on it where I left it on my desktop.
I looked into distributed filesystems such as the Andrew and Coda filesystems, but found them unsuitable. Novell iFolder comes closer, but iFolder isn't easy to set up.
Over time I shaped my own vision on how some of these things could be implemented, and I named it litter.
Features:
Advantages:
Organising your files over multiple ltr volumes will bring advantages. Here are some examples of volumes I will be using:
Litter does not seek to replace version control tools; it solves a distinctly different problem. Integration with a version control system might be welcome, depending on the kind of data stored in a volume, and ltr is designed with this in mind.
I keep a copy of my audio on my iPod running Rockbox. Since this removable storage device is accessed directly by the Rockbox OS, a cruder approach is used.
(FIXME: this piece is missing here)
Flash-based players with less storage would profit from ltr's partial volumes capability.
| Volume: | (litter volume) a folder containing files. Volumes are distributed (mirrored) over a pool. |
|---|---|
| Peers: | Two or more computers that keep the same volume. |
| Pool: | A group of volume peers. |
| Slicekeeper: | A host that has a subset of a volume's files (a slice) on disk. |
| Ghost: | A file not kept on local disk. |
| Store: | Contains a snapshot of the volume's files (not necessarily all files, but at all times all digests). |
| Digest: | A file tree mirroring the store, but its files contain hash+filesize. |
| Track: | A list of changes made to the store by a commit. |
I considered two approaches. The first uses a combination of unionfs (aufs) and a simple FUSE filesystem. The second uses a complex FUSE process to expose the volume to the user. The first design has some serious limitations, so I opted for a full FUSE mount.
Advantages/disadvantages of first versus second design:
Components:
Note
If a filename is postfixed with /, the next indented block refers to the folder's contents. If a filename is postfixed with ::, the next indented block refers to the file's ASCII contents.
/usr/ltr/volumename
`-- mounts/
| `-- :home:thomas:myvolume ::
| | |inbox0 rw
| | |*inbox1 ro
| | |store O
| | |peer0 sshfs#whirlpool:/usr/ltr/myvolume
| `-- :export:samba:myvolume ::
| |inbox2 rw
| |inbox1 ro
| |store A1
`-- store/
| `-- data/
| `-- meta/
| `-- local/
| `-- tracks/
| `-- HEAD::
| `-- hooks/
`-- inbox0/
| `-- data/
| `-- meta/
`-- peer0/
`-- store/
`-- inbox0/
^
| `-- store/
| `-- data/
| | (volume contents)
| `-- meta/
| | (digest tree)
| `-- local/
| | (excluded files)
| `-- trash/
| | (rm's get moved here if keeptrash enabled)
| `-- excluded::
| | |.zshrc
| | |.bash*
| | |.gnome2/
| | |.gconfd/
| | |.local/share/Trash/
| | |.cache/
| |
| `-- SUMS::
| | |(sorted list of checksums)
| `-- tracks/
| `-- HEAD::
| | O/A1/A1B2
| `-- hooks/
| | `--- pre-commit
| | `--- post-commit
| | | date1rand.sh
| | `-- anotherhostname::
| | date0rand.sh
| | date2rand.sh
| `-- behaviour::
| | |slicekeeper false
| | |keeptrash true
| | |keeneye true
| | |badfs true
| | |directa true
| `-- LOCK::
| (lockfile to prevent concurrent access)
\/
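Both the mount files under mounts/ and the behaviour file are small ASCII files with one "key value" entry per line. A minimal Python sketch of a reader for them follows. The function names are hypothetical, and it assumes the key is the first whitespace-separated word and the rest of the line is the value. A leading * on a key (as in *inbox1) is kept verbatim, since its meaning isn't spelled out here.

```python
from typing import Dict, List, Tuple

def parse_kv_file(text: str) -> List[Tuple[str, str]]:
    """Parse a '::' control file made of 'key value' lines (assumed format).

    Order is preserved because a mount file lists its layers
    (inboxes, store, peers) and that order may matter.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # key, then the rest of the line
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        entries.append((key, value))
    return entries

def parse_behaviour(text: str) -> Dict[str, bool]:
    """Turn a behaviour file ('slicekeeper false', ...) into booleans."""
    return {key: value.lower() == "true" for key, value in parse_kv_file(text)}
```

Read against the :home:thomas:myvolume mount file, this would give the layers in order: inbox0 (rw), *inbox1 (ro), the store at track O, and the sshfs peer.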
Data is kept in the data/ and local/ directories. The volume's directory tree is replicated there to the extent necessary to keep each file at the same relative path as in the volume.
The excluded file contains a plain-text list of which paths should at all times be redirected to the ``local/`` overlay.
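One way to apply the excluded list is sketched below. It assumes (the text doesn't say) that entries are shell-style globs relative to the volume root, and that an entry ending in / covers that directory and everything beneath it. is_excluded is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import List

def load_patterns(text: str) -> List[str]:
    """One pattern per line; blank lines ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def is_excluded(rel_path: str, patterns: List[str]) -> bool:
    """True if rel_path should live in the local/ overlay instead of data/."""
    rel_path = rel_path.lstrip("/")
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory entry: match the directory itself or anything below it.
            # fnmatch's '*' also matches '/', so 'base/*' covers all depths.
            base = pattern.rstrip("/")
            if fnmatchcase(rel_path, base) or fnmatchcase(rel_path, base + "/*"):
                return True
        elif fnmatchcase(rel_path, pattern):
            return True
    return False
```

With the example list, .bash_history and anything under .cache/ are redirected to local/, while a file such as Documents/.zshrc.bak is not, because patterns are anchored at the volume root.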
hooks/ is a place to put scripts; e.g., you can put a script there that runs at post-commit and checks the new snapshot into a VCS.
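How hooks are invoked isn't fixed in the text. The tree above shows post-commit holding date1rand.sh, so this sketch assumes each event name is a directory of executables that are run in name order with the volume root as the working directory. find_hooks and run_hooks are hypothetical names.

```python
import os
import subprocess
from pathlib import Path
from typing import List, Tuple

def find_hooks(hooks_dir: Path, event: str) -> List[Path]:
    """Executable files in hooks/<event>/, sorted by name (assumed layout)."""
    event_dir = hooks_dir / event
    if not event_dir.is_dir():
        return []
    return [p for p in sorted(event_dir.iterdir())
            if p.is_file() and os.access(p, os.X_OK)]

def run_hooks(hooks_dir: Path, event: str, volume_root: Path) -> List[Tuple[str, int]]:
    """Run each hook from the volume root and collect (name, exit code)."""
    results = []
    for script in find_hooks(hooks_dir, event):
        proc = subprocess.run([str(script)], cwd=volume_root, check=False)
        results.append((script.name, proc.returncode))
    return results
```

A commit would then call run_hooks(..., "pre-commit", ...) and could abort on a non-zero exit code, and call the post-commit hooks once the new snapshot exists.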
The validity of the lockfile can, for instance, be ensured by updating its mtime at regular intervals. If the mtime has not changed for n intervals, the lockfile is considered stale.
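The mtime heartbeat can be sketched as follows. The interval length and n are assumed values, and the function names are hypothetical. The takeover of a stale lock is not race-free (two hosts could both judge it stale), so this only illustrates the idea.

```python
import os
import socket
import time
from pathlib import Path
from typing import Optional

HEARTBEAT_INTERVAL = 10.0  # seconds between mtime refreshes (assumed value)
STALE_AFTER = 3            # the "n" from the text: missed intervals (assumed)

def is_stale(lock_path: Path, now: Optional[float] = None) -> bool:
    """A lock is stale if its mtime hasn't moved for STALE_AFTER intervals."""
    now = time.time() if now is None else now
    try:
        age = now - lock_path.stat().st_mtime
    except FileNotFoundError:
        return True
    return age > HEARTBEAT_INTERVAL * STALE_AFTER

def acquire_lock(lock_path: Path) -> bool:
    """Create LOCK atomically; take over a stale one at most once."""
    for _attempt in range(2):
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if not is_stale(lock_path):
                return False  # someone is actively holding it
            lock_path.unlink(missing_ok=True)  # stale: remove and retry
            continue
        with os.fdopen(fd, "w") as f:
            f.write(f"{socket.gethostname()} {os.getpid()}\n")  # who holds it
        return True
    return False

def refresh_lock(lock_path: Path) -> None:
    """Heartbeat: call every HEARTBEAT_INTERVAL seconds while holding the lock."""
    os.utime(lock_path)  # sets atime and mtime to the current time
```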
tracks/
`-- O/
| `-- diff::
| `-- trail::
| `-- A1/
| | `-- diff::
| | `-- trail::
| | `-- A1B1/
| | | `-- diff::
| | | `-- trail::
The diff file contains a file list (one per line) with file checksum and plus/minus indicator.
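The exact line layout of a diff isn't given. Assuming `<+|-> <checksum> <path>` (path last, so it may contain spaces), replaying the diffs from O down to HEAD rebuilds the store's path-to-checksum map. The names below are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

DiffEntry = Tuple[str, str, str]  # (indicator, checksum, path)

def parse_diff(text: str) -> List[DiffEntry]:
    """Parse '<+|-> <checksum> <path>' lines (assumed layout)."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        sign, checksum, path = line.split(" ", 2)  # path may contain spaces
        if sign not in ("+", "-"):
            raise ValueError(f"bad indicator in diff line: {line!r}")
        entries.append((sign, checksum, path))
    return entries

def apply_diff(state: Dict[str, str], entries: List[DiffEntry]) -> Dict[str, str]:
    """Return a new path -> checksum map with one track's diff applied.

    Removals go first, so a modified file may appear as '- old' and
    '+ new' in either order.
    """
    new_state = dict(state)
    for sign, checksum, path in entries:
        if sign == "-":
            if new_state.get(path) != checksum:
                raise ValueError(f"diff removes {path!r} with an unexpected checksum")
            del new_state[path]
    for sign, checksum, path in entries:
        if sign == "+":
            new_state[path] = checksum
    return new_state

def replay(diff_texts: Iterable[str]) -> Dict[str, str]:
    """Apply the diffs of O, A1, A1B2, ... in order, starting from nothing."""
    state: Dict[str, str] = {}
    for text in diff_texts:
        state = apply_diff(state, parse_diff(text))
    return state
```

Feeding it the diffs along HEAD's path (O, then A1, then A1B2) would yield the expected contents of the store at HEAD.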
The trail is a list of the volume instances based on this tree. Once all instances have moved past a certain point, the old trails (the lowest ones, closest to the root) can be removed, and the new lowest common tree can be used as the main track.
(O is origin, and its diff may be empty)
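If every instance records its HEAD as a track path like O/A1/A1B2 (as in the HEAD file shown earlier), the new lowest common tree is the longest common prefix of those paths, compared component by component. A sketch with a hypothetical function name:

```python
from typing import Iterable, List

def lowest_common_track(heads: Iterable[str]) -> str:
    """Deepest track that every instance's HEAD has passed through."""
    paths: List[List[str]] = [h.strip("/").split("/") for h in heads]
    if not paths:
        raise ValueError("need at least one instance HEAD")
    common: List[str] = []
    for parts in zip(*paths):  # level by level; stops at the shortest path
        if all(p == parts[0] for p in parts):
            common.append(parts[0])
        else:
            break
    return "/".join(common)
```

For instances at O/A1/A1B2, O/A1 and O/A1/A1B1 this gives O/A1, so the trails of O and O/A1 could be folded into a single main track.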
The data folder holds new and modified file content. The volume's directory tree is replicated there to the extent necessary to host files at the same path, relative to the volume root, as in the store.
The modified data's metadata is also applied.
The contents of files in meta/ are never presented to the user. meta/ only contains four relevant kinds of information:
`-- peer0/
`-- store/
`-- inbox0/
Probably a network mount, or a bind mount of a volume on a removable storage device.
On a slicekeeper host, you can pick which files to keep locally in several ways (CLI, GUI).
Possible actions on each file:
Implementation:
Possible problems this can cause:
All hosts abandon a file:
We run out of disk space on one of the hosts:
CLI tool: ltr pick +filename or -filename; a lone - means read the list from stdin.
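A sketch of how the ltr pick arguments could be parsed. It assumes + means "keep a local copy" and - means "leave it a ghost", that a lone - reads further +/- lines from stdin, and that later entries override earlier ones; none of this beyond the +/- and stdin syntax is stated in the text, and parse_pick_args is a hypothetical name.

```python
import sys
from typing import Dict, Iterable, Optional, TextIO

def parse_pick_args(args: Iterable[str], stdin: Optional[TextIO] = None) -> Dict[str, bool]:
    """Map path -> True (keep locally) or False (ghost)."""
    choices: Dict[str, bool] = {}

    def handle(token: str) -> None:
        if len(token) < 2 or token[0] not in "+-":
            raise ValueError(f"expected +path or -path, got {token!r}")
        choices[token[1:]] = token[0] == "+"  # later entries win

    for arg in args:
        if arg == "-":
            for line in (stdin or sys.stdin):  # read the list from stdin
                line = line.strip()
                if line:
                    handle(line)
        else:
            handle(arg)
    return choices
```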
If available, a networked remote host can be used to access files not cached locally. This feature will be mostly used in the case of archive volumes or slice-keeping of a volume with big files (like media) on it.
A store is mounted and locked before using files from it. Note that the file hash has to match the wanted file.
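Per the glossary, a digest holds hash+filesize. Assuming the digest file's text is `<sha1 hex> <size in bytes>`, a file fetched from a remote store could be checked like this before use (the size is compared first because it is cheap). Function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Tuple

def read_digest(digest_path: Path) -> Tuple[str, int]:
    """Read a meta/ digest file, assumed to hold '<sha1 hex> <size>'."""
    sha1, size = digest_path.read_text().split()
    return sha1.lower(), int(size)

def file_matches_digest(data_path: Path, sha1: str, size: int, chunk: int = 1 << 20) -> bool:
    """True only if the candidate file has the expected size and SHA-1."""
    if data_path.stat().st_size != size:  # cheap check before hashing
        return False
    h = hashlib.sha1()
    with data_path.open("rb") as f:
        for block in iter(lambda: f.read(chunk), b""):  # hash in chunks
            h.update(block)
    return h.hexdigest() == sha1
```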
| `-- mount0.info::
| |(hostname) NFS/SSHFS
`-- mount0/
| (store contents)
| LOCK
Partial commits, e.g.: ltr commit podcasts/
ltr use volume@otherhost
ltr <volume> sync <host>
ltr status
ltr freeze (spawns new inbox)
ltr commit [-a] [filespec] [list on stdin]
ltr [!]lo [not] localonly (hide/show ghost files)
Implemented by the inbox class.
The indexer should keep track of sha1 sums and produce a list of duplicates in a special file.
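A sketch of the duplicate search. Files are grouped by size first so that only files sharing a size get hashed, then by SHA-1. The layout of the "special file" isn't specified; here it is assumed to be one line per checksum followed by the tab-separated paths. Function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

def sha1_of(path: Path, chunk: int = 1 << 20) -> str:
    h = hashlib.sha1()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def find_duplicates(root: Path) -> Dict[str, List[str]]:
    """Map sha1 -> relative paths, for every checksum seen more than once."""
    by_size: Dict[int, List[Path]] = defaultdict(list)
    for p in sorted(root.rglob("*")):
        if p.is_file() and not p.is_symlink():
            by_size[p.stat().st_size].append(p)
    by_hash: Dict[str, List[str]] = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate; skip hashing
        for p in paths:
            by_hash[sha1_of(p)].append(p.relative_to(root).as_posix())
    return {h: ps for h, ps in by_hash.items() if len(ps) > 1}

def format_duplicates(dups: Dict[str, List[str]]) -> str:
    """One line per checksum: sha1, then the tab-separated paths (assumed)."""
    return "".join(f"{h}\t" + "\t".join(ps) + "\n" for h, ps in sorted(dups.items()))
```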
Committing switches the active inbox to read-only (ro) and composes a to-do list (keep a list of open files!).
Implemented by the volume class.
Conflicts:
This is complicated by the fact that file moves are recognised.
-> should really build a proof of concept for this...
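As a first step toward such a proof of concept: if changes are expressed as +/- entries with checksums (as in the track diffs), a move can be recognised as a removal and an addition sharing a checksum. Two peers that move the same file to different places then conflict. The entry format and function names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Entry = Tuple[str, str, str]  # (indicator '+'/'-', checksum, path)

def detect_moves(entries: List[Entry]):
    """Split entries into moves (src, dst), plain additions and deletions."""
    removed: Dict[str, List[str]] = defaultdict(list)
    for sign, checksum, path in entries:
        if sign == "-":
            removed[checksum].append(path)
    moves: List[Tuple[str, str]] = []
    added: List[Tuple[str, str]] = []
    for sign, checksum, path in entries:
        if sign != "+":
            continue
        if removed.get(checksum):  # same content disappeared elsewhere
            moves.append((removed[checksum].pop(0), path))
        else:
            added.append((checksum, path))
    deleted = [(c, p) for c, paths in removed.items() for p in paths]
    return moves, added, deleted

def move_conflicts(moves_a: List[Tuple[str, str]], moves_b: List[Tuple[str, str]]):
    """Sources that two peers moved to different destinations."""
    dest_a = dict(moves_a)
    return [(src, dest_a[src], dst) for src, dst in moves_b
            if src in dest_a and dest_a[src] != dst]
```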
In this case, the store can be directly manipulated by other computers and the user (using the Rockbox file manager).
A technique has to be devised to track volume manipulations efficiently. We could compute SHA1 hashes of each file on every sync, but this takes a lot of time.
Therefore I suggest we consider the volume 'dirty'. When changes are to be applied to the volume, we can use superficial scans of mtimes and file existence. If the volume is to be used as a data source by another ltr volume, however, checksums must be matched for each file that is used.
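A superficial scan of this kind can be sketched as a snapshot of (mtime, size) per path, compared against the previous snapshot. Only the paths it reports would need a full SHA1 check. The .ltrbin directory is skipped; the function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, List, Tuple

Snapshot = Dict[str, Tuple[float, int]]  # relative path -> (mtime, size)

def quick_scan(root: Path, skip: str = ".ltrbin") -> Snapshot:
    """Record mtime and size of every file, without reading contents."""
    snap: Snapshot = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != skip]  # prune ltr metadata
        for name in filenames:
            full = Path(dirpath) / name
            st = full.stat()
            snap[full.relative_to(root).as_posix()] = (st.st_mtime, st.st_size)
    return snap

def changed_paths(old: Snapshot, new: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """(added, removed, possibly modified) paths between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, removed, modified
```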
The ltr command has to work in such a volume (e.g., the presence of .ltrbin has to be checked in parent directories).
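The parent-directory check can be as simple as walking up from the current directory until a directory containing .ltrbin is found. A sketch, with a hypothetical function name:

```python
from pathlib import Path
from typing import Optional

def find_volume_root(start: Path, marker: str = ".ltrbin") -> Optional[Path]:
    """Return the nearest directory (start or an ancestor) holding `marker`."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    return None  # not inside a direct access volume
```

On the iPod layout shown below, running ltr from inside audio/ would find the mountpoint, since that is where .ltrbin lives.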
When a direct access volume is found, it can be watched with Linux inotify, and a log of file operations can be kept. Care should be taken to exclude file operations performed by an ltr process (e.g., a sync-up with a peer volume).
Suggested directory structure (incomplete):
ipod mountpoint
| `---(other files)
| `---audio/
| | (volume contents)
| `-- .ltrbin/
| `-- HEAD:
| `-- tracks/
| `-- manifest::
| | (which files were present)
| `-- meta/
| (tree with sha1+size files)
Many IMAP mail servers use maildirs to store email, saving each mail as a plain-text file. This got me thinking: instead of using IMAP, I could just store my mail (in maildirs) on a separate litter volume, like this setup:
It is still possible to use your mail from any IMAP-aware application (possibly on a hand-held) if you set up an IMAP service on a server and have it operate on a litter volume.
Write it in Python. Use existing VCS extensions, like tortoisehg-dev/contrib/nautilus-thg.py, as a starting point.
Features:
This got messed up when the design changed direction toward a FUSE-based implementation, so I removed it temporarily.