Thomas makes, writes


PenguinPages Preprocessor

screenshot of old website

When I started maintaining a personal website years ago, I kept all content in static HTML files, which I uploaded to the webhost. The webhost didn't support server-side scripting, so for a while I used JavaScript to provide a navigation menu.

The JavaScript menu fixed just that: navigation. Devising a publishing-side scripting system also allowed me to separate the look of the website (the template) from the content.

I created a set of PHP scripts, which I named Pepp, to combine the page content with a navigation menu into static HTML files ready to upload to the webspace.

Pepp is a command-line tool that uses the command-line (and CGI) build of PHP. Over time I added handy features like photo album macros. By the end of its lifespan, it made use of powerful technologies like XML, XSLT and PHP macros.


Pepp's behaviour was controlled by a few files. Pepp needs information such as a list of the pages on the website, the general layout of a page, and the menu structure to include.


The webpages.xml file is an XML database of every page on the site (except the photo albums, which were generated by a function call from one of the pages, pictures.php).

It was an ordinary XML file with a <page> element for each page. Each <page> element contained a few child elements: a <title>, the path to the file on disk, and optionally elements for the HTML description and keywords meta tags.
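As a rough illustration of how such a page database can be read, here is a small Python sketch using the standard library's ElementTree (Pepp itself used PHP's SimpleXML). Only <page> and <title> come from the description above; the root element <pages> and the child names <file>, <description> and <keywords> are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical webpages.xml: <page> and <title> come from the text; the root
# <pages> and the <file>, <description>, <keywords> names are assumed.
SAMPLE = """<pages>
  <page>
    <title>Home</title>
    <file>content/index.php</file>
    <description>Personal homepage</description>
    <keywords>linux, photos</keywords>
  </page>
  <page>
    <title>Pictures</title>
    <file>content/pictures.php</file>
  </page>
</pages>"""


def load_pages(xml_text):
    """Return one dict per <page>; the meta-tag fields are optional."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.findall("page"):
        pages.append({
            "title": page.findtext("title"),
            "file": page.findtext("file"),
            # findtext returns None when the optional element is absent
            "description": page.findtext("description"),
            "keywords": page.findtext("keywords"),
        })
    return pages


if __name__ == "__main__":
    for p in load_pages(SAMPLE):
        print(p["title"], p["file"], p["description"])
```

Optional fields simply come back as None, so a template can skip the meta tag when a page doesn't define it.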


The layout.xml file contains a <theme> entry for each template. Each entry holds data such as the CSS file to use, the name of the theme, and the string that gets appended to the filename. For example, the 'index' page is saved as index.ar.html when it's processed with the 'Arctic' theme, because that theme's 'filename-extension' parameter is set to '.ar'.
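To make the naming rule concrete, here is a Python sketch (not Pepp's PHP code) that derives a page's output filename for each theme. The text only names the <theme> entry and its 'filename-extension' parameter; the root <layouts>, the child elements <name> and <css>, and the second, extension-less theme are assumptions added to show the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout.xml: <theme> and the filename-extension parameter come
# from the text; <layouts>, <name>, <css> and the "Plain" theme are assumed.
SAMPLE = """<layouts>
  <theme>
    <name>Arctic</name>
    <css>arctic.css</css>
    <filename-extension>.ar</filename-extension>
  </theme>
  <theme>
    <name>Plain</name>
    <css>plain.css</css>
    <filename-extension></filename-extension>
  </theme>
</layouts>"""


def output_names(page, xml_text):
    """Map each theme name to the HTML file the page is written to."""
    root = ET.fromstring(xml_text)
    names = {}
    for theme in root.findall("theme"):
        # An empty element yields "", so that theme writes plain page.html
        ext = theme.findtext("filename-extension", default="")
        names[theme.findtext("name")] = page + ext + ".html"
    return names


if __name__ == "__main__":
    print(output_names("index", SAMPLE))
```

Appending the extension before ".html" keeps every themed copy of a page side by side in the same output folder.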

The command-line interface


Here's a list of the program's command-line options:

thomas@whirlpool www $ siteman --help
Siteman Version 0.7.0
by Thomas Langewouters
go              -   process all files
do filename     -   process filename
clean           -   clean html output folder
sync            -   synchronize website with hosting
--help          -   show this help message

I can also combine commands: 'siteman do index sync', for instance, processes the page 'index' and then synchronizes the local website with the hosting.
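Combined commands like these can be handled by walking the arguments left to right and letting 'do' consume the page name that follows it. This Python sketch shows only that parsing idea; the run function and the action strings it returns are hypothetical stand-ins for Siteman's real handlers.

```python
def run(args):
    """Execute each command in order and return a log of the actions.

    The handlers only record what would happen; they are placeholders,
    not Siteman's actual processing or upload code.
    """
    actions = []
    i = 0
    while i < len(args):
        cmd = args[i]
        if cmd == "go":
            actions.append("process all pages")
        elif cmd == "do":
            if i + 1 >= len(args):
                raise ValueError("'do' needs a page name")
            i += 1  # 'do' consumes the following token as the page name
            actions.append("process " + args[i])
        elif cmd == "clean":
            actions.append("clean output folder")
        elif cmd == "sync":
            actions.append("sync with hosting")
        elif cmd == "--help":
            actions.append("show help")
        else:
            raise ValueError("unknown command: " + cmd)
        i += 1
    return actions
```

For example, run(["do", "index", "sync"]) returns ["process index", "sync with hosting"], so the page is rebuilt before the upload starts.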

How it works

When siteman is invoked with the argument 'do pagename', it looks up which layouts are defined in layouts.xml and runs "smexecutive pagename layoutname" for each layout. smexecutive in turn invokes the pageparser with two arguments, the page's entry from webpages.xml and the layout's entry from layouts.xml, both as SimpleXML objects. The output of pageparser is compared with the current content of pagename + the layout's filename_extension + .html, and the file is saved only if the content differs. This avoids changing the file's timestamp needlessly, so sitecopy will only upload the files whose content has actually changed.
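The compare-before-save step is what keeps unchanged pages from being re-uploaded. Here is a minimal Python sketch of that idea (Pepp did this in PHP); the function name write_if_changed and the UTF-8 encoding are assumptions.

```python
import os
import tempfile


def write_if_changed(path, html):
    """Write html to path only if it differs from what is already there.

    Leaving an unchanged file untouched keeps its modification time, so a
    timestamp-based uploader such as sitecopy skips it. Returns True when
    the file was (re)written.
    """
    try:
        with open(path, encoding="utf-8") as f:
            if f.read() == html:
                return False
    except FileNotFoundError:
        pass  # first run: there is nothing to compare against yet
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return True


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "index.ar.html")
    print(write_if_changed(out, "<p>hi</p>"))  # True: the file did not exist
    print(write_if_changed(out, "<p>hi</p>"))  # False: content is identical
```

Reading the old file costs a little time on every build, but it is far cheaper than re-uploading every page after each run.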

When siteman is invoked with the argument 'go' (process all pages), it looks up which pages are in webpages.xml and runs "siteman do pagename" for each of them.


With the basic infrastructure in place, I made a few handy tools to fancy up the webpages.

Code highlighting: GeSHi

To highlight program code on pages I used GeSHi, which made it easy to include syntax-highlighted code listings in the content.

Fileview extension

This extension generated a list of the files in a folder. Its primary purpose was the downloads page, but after a few hacks it could also generate photo albums, and it was put to use for those too.

echo MakeFileList("downloads",                                     // folder to list
                  configfolder . "filelist/styles/downloads.xml",  // stylefile
                  configfolder . "filelist/styles/downloads.xml",  // themefile
                  "all");                                          // which files

As you can see, MakeFileList takes four parameters:

  • the folder to make a file list of
  • the stylefile to use (contains data about the layout of the list)
  • the themefile to use (contains a list of MIME types with matching icons)
  • which files to include (e.g. all, pictures, exe)
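The folder and "which files" parameters can be sketched in a few lines of Python; the stylefile and themefile are left out here. The filter rules behind "all", "pictures" and "exe" are assumptions (MIME type image/* for pictures, a .exe suffix for executables), and make_file_list is a hypothetical name, not the original PHP MakeFileList.

```python
import mimetypes
import os
import tempfile

# Hypothetical filter rules: only the names all, pictures and exe come from
# the text; how each one decides which files to keep is assumed.
FILTERS = {
    "all": lambda name, mime: True,
    "pictures": lambda name, mime: mime is not None and mime.startswith("image/"),
    "exe": lambda name, mime: name.lower().endswith(".exe"),
}


def make_file_list(folder, which="all"):
    """Return (filename, MIME type) pairs for the matching files in folder."""
    keep = FILTERS[which]
    result = []
    for name in sorted(os.listdir(folder)):
        if not os.path.isfile(os.path.join(folder, name)):
            continue  # skip subfolders
        mime, _encoding = mimetypes.guess_type(name)  # guessed from extension
        if keep(name, mime):
            result.append((name, mime))
    return result


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    for name in ("photo.jpg", "setup.exe", "readme.txt"):
        open(os.path.join(folder, name), "w").close()
    print(make_file_list(folder, "pictures"))
```

Returning the MIME type alongside each name leaves room for a later step to pick an icon per file.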

The stylefile was also a poor attempt at building an XML templating system for the filelist generator. The stylefile can contain inline PHP, which is how picalbum.xml works: the code that creates the thumbnails is included in that stylefile and invokes the convert tool from ImageMagick.
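The thumbnail step boils down to a single convert call per picture. This Python sketch only illustrates that call; the 160-pixel size and the function names are assumptions, not what picalbum.xml actually used.

```python
import shutil
import subprocess


def thumbnail_command(src, dst, size=160):
    """Build an ImageMagick 'convert' call that shrinks src into dst.

    -thumbnail resizes the image to fit within size x size while keeping
    the aspect ratio, and strips most profiles to keep the file small.
    The default size of 160 is an arbitrary assumption.
    """
    return ["convert", src, "-thumbnail", f"{size}x{size}", dst]


def make_thumbnail(src, dst, size=160):
    """Run convert if it is installed; return False when it is missing."""
    if shutil.which("convert") is None:
        return False
    subprocess.run(thumbnail_command(src, dst, size), check=True)
    return True
```

Building the argument list separately from running it keeps the command easy to inspect, and passing a list avoids shell quoting problems with odd filenames.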

The themefile is an XML file that specifies an icon filename for each mime-type.
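A themefile like that amounts to a lookup table. In this Python sketch the <icon type="..."> markup and the "default" fallback entry are assumptions; the text only says the file maps each MIME type to an icon filename.

```python
import xml.etree.ElementTree as ET

# Hypothetical themefile markup; the element and attribute names and the
# "default" fallback entry are assumed for illustration.
SAMPLE = """<theme>
  <icon type="image/jpeg">picture.png</icon>
  <icon type="application/zip">archive.png</icon>
  <icon type="default">file.png</icon>
</theme>"""


def load_icons(xml_text):
    """Return a dict from MIME type to icon filename."""
    root = ET.fromstring(xml_text)
    return {icon.get("type"): icon.text for icon in root.findall("icon")}


def icon_for(mime, icons):
    """Look up the icon for a MIME type, falling back to the default icon."""
    return icons.get(mime, icons.get("default"))
```

A fallback entry keeps the listing usable when a file has an unknown or missing MIME type.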

Note and warning extension

This extension made it easy to insert note and warning boxes into the pages.

Smiley extension

A PHP directive made it possible to insert smileys.


PHP 5: any version of PHP 5 will do, provided it has SimpleXML and XSL support.

Sitecopy for uploading

Siteman uses sitecopy to upload the website. Sitecopy is a utility that synchronizes the local and remote site by looking at the files' timestamps. I just run "sitecopy -u homepage", and sitecopy uploads the changed and new files and removes deleted files from the remote site. (Note: the "siteman sync" command actually executes "sitecopy -u homepage".)