This note describes the history of expandfile, a simple Unix command line program for expanding templates.
History
I wrote expandfile in 2002 to replace a collection of ad hoc Perl programs that I had been using to maintain the multicians.org website since 1995. Since 2002, I've added features to expandfile occasionally.
Web pages are written in the HTML language and sent from a web server program on a server host to a web browser program running on a user device. I chose to create static HTML pages and have the web server serve them without any change at run-time. Because multicians.org had mirror sites where I could not modify the web server configuration or execute code on the server, I could not dynamically generate web pages, even if I wanted to. Furthermore, at the time, people implementing web sites by generating pages dynamically from databases, like bulletin board systems, were encountering web server performance and security problems.
My ISP's web servers supported a "server side include" feature, which let users insert the contents of auxiliary files when serving a web page. When I tried to separate my pages into content and boilerplate, I found that I wanted to have the boilerplate be slightly different for each page, so I sought a method where the boilerplate content could be tailored depending on the page. The organization I chose was:
- Each "object" HTML web page was created from a "source" file.
- Source files were stored in the file system on my computer.
- There was a source file for each object page, plus boilerplate source files included by multiple pages.
- Source files were compiled into HTML object files on my computer when necessary.
- Object files were stored in the file system on my computer and copied to the server computers when changed.
- Object files in server computers' file systems were HTML ready to serve.
The idea of a simple macro expander dates back to Christopher Strachey's GPM macro expander, which I had experimented with in the mid 60s on CTSS at MIT Project MAC. Later, I used several computer languages that used pre-processors to add features to their languages, such as ALM, PL/I, and C: they provided features such as including other input files.
RUNOFF is an archetype text-transformation language system; I had used it in the 60s on CTSS. RUNOFF's input is either text to be copied to the output, or commands that change the state of execution and affect how later processing works. Later implementations of RUNOFF-like languages in Multics and elsewhere included the ideas of semantic markup, macro execution, and multi-pass processing.
I decided to write my own source text expansion tool that did not parse the underlying input language, similar to GPM: it would just transform text strings into other text strings, with a minimal way of defining macros. This made the program more general and freed it from dependency on the syntax of the underlying language; I didn't have to write or maintain an HTML parser, and changes in the HTML spec would rarely require the tool to change. As in GPM and some RUNOFFs, I could set and evaluate string variables, and expand macros that accepted string variables as arguments.
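The idea of transforming text strings without parsing the underlying language can be sketched in a few lines. The Python below is only an illustration, not expandfile's Perl implementation: the function name expand, the restriction of variable names to word characters, and the choice to expand undefined names to an empty string are all assumptions made for this sketch. The %[ ... ]% delimiters are the ones expandfile uses.

```python
import re

# Matches %[name]% where name is made of word characters (an assumption).
_VAR = re.compile(r"%\[(\w+)\]%")

def expand(text, variables):
    """Replace each %[name]% with its value; unknown names become ''."""
    return _VAR.sub(lambda m: variables.get(m.group(1), ""), text)

# The expander never looks at the HTML around the markers.
template = "<h1>%[title]%</h1>\n<p>Updated %[updated]%</p>"
print(expand(template, {"title": "History", "updated": "2021"}))
```

Because the expander treats everything outside the markers as opaque text, the same function works unchanged on HTML, XML, or shell script input.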
I looked at the Unix m4 tool, written by friends of mine from Multics days. It wasn't available for the computer I had then; Perl was available on Unix, Windows, and Mac.
Wherever possible, I used what was already available: I didn't invent a file storage system, but used my computer's file system; I didn't invent a text file format or editor, but used the tools provided on the computer.
I studied the errors I made often when maintaining multicians.org, and chose tools that would prevent them. Having each fact in one place was good only if I made sure to regenerate all object pages that used that fact: this led me to use the make program. Another common error was forgetting to update server files when I modified a file on my computer: this suggested the use of rsync.
Using Expandfile to Generate Web Sites
I first used expandfile to translate "HTML with extensions" input (which I called HTMX) into HTML, mostly to include common boilerplate, such as page banners and CSS layout instructions used on all my web pages. I added features to allow source files to set variables that boilerplate files could use, like "page title" and "date updated" in the page header and footer.
Adding builtin functions that could transform variables' values came next, then the ability to capture the output of external shell commands, and then integration with SQL. I used these features to simplify my work flow for maintaining websites I created, and to eliminate special-purpose Perl programs in favor of logic in HTMX web page templates.
The big advance for me was introducing the *block builtin, and the pattern of writing HTMX files that
- Set parameter variables, define a body block that includes variable expansions, and possibly define other blocks.
- At the end, *include a wrapper file that outputs headers, *expands the body block, and then outputs footers. (The header and footer can expand the parameter variables to set page titles and so on.)
This pattern separates site boilerplate from page content, provides an independent source file for each HTML page, and makes it easy to regenerate a single page.
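The pattern above can be sketched in Python, assuming a simplified expander that only substitutes %[name]% variables. The names WRAPPER and build_page and the wrapper's contents are hypothetical; a real HTMX file would use the *block, *include, and *expand builtins instead.

```python
import re

_VAR = re.compile(r"%\[(\w+)\]%")

def expand(text, variables):
    """Replace each %[name]% with its value; unknown names become ''."""
    return _VAR.sub(lambda m: variables.get(m.group(1), ""), text)

# Hypothetical site-wide wrapper: header, expanded body, footer.
WRAPPER = (
    "<html><head><title>%[title]%</title></head>\n"
    "<body>\n%[body]%\n"
    "<p>Updated %[updated]%</p>\n</body></html>"
)

def build_page(params, body_block):
    """Expand the page's body block, then place it inside the wrapper."""
    variables = dict(params)
    variables["body"] = expand(body_block, variables)
    return expand(WRAPPER, variables)

print(build_page({"title": "History", "updated": "2021"},
                 "<h1>%[title]%</h1>\n<p>Page content.</p>"))
```

Each page supplies only its parameters and body, so changing the wrapper changes the header and footer of every page on the next rebuild.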
Using expandfile was valuable when I made global changes to every page on a site to change every page's appearance, to conform to changing HTML specifications, to improve support for mobile devices, or to use new browser features.
Connecting expandfile to my local MySQL database and supporting *sqlloop was the next big step. This provided consistent formatting for lists of people, publications, glossary entries, and website page indexes, and defined a lightweight way for any HTMX file to refer to data from these lists. I used this to have multiple pages refer to facts that were kept in one place in SQL data. These changes reduced the chance that an editing mistake would screw up a whole page, or that I'd forget to update all the pages that used a fact.
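A rough sketch of the idea behind a SQL loop follows: run a query and format one output fragment per row, so every list of people or publications is formatted the same way. This is not how *sqlloop is implemented. It uses Python's built-in sqlite3 module as a stand-in for MySQL, Python's {name} format placeholders instead of expandfile syntax, and a made-up people table with placeholder rows.

```python
import sqlite3
from html import escape

def sql_loop(conn, query, row_template):
    """Run query and return one formatted fragment per result row."""
    cur = conn.execute(query)
    names = [d[0] for d in cur.description]  # column names become variables
    out = []
    for row in cur:
        values = {n: escape(str(v)) for n, v in zip(names, row)}
        out.append(row_template.format(**values))
    return "".join(out)

# Hypothetical table and placeholder data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, site TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("Person A", "Site 1"), ("Person B", "Site 2")])
print(sql_loop(conn, "SELECT name, site FROM people ORDER BY name",
               "<li>{name} ({site})</li>\n"))
```

Since each fact lives in one table row, correcting it there corrects every page that loops over the table.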
The third big step I took was using traditional Unix tools to automate site building and publishing. Using make (created for Unix by Multician Stu Feldman) to invoke expandfile only when an HTMX file was newer than its corresponding HTML file meant that I could make a one-line change to a file and then just type make install to recompile the minimum number of files and automatically rsync them to the deployment site.
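The timestamp rule that make applies can be sketched as follows. This is only an illustration of the rule in Python, not a replacement for a Makefile; the function names needs_rebuild and stale_pages and the file names are hypothetical.

```python
import os
import tempfile

def needs_rebuild(source, target):
    """True if target is missing or older than source."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)

def stale_pages(pairs):
    """Return the (source, target) pairs that must be recompiled."""
    return [(s, t) for s, t in pairs if needs_rebuild(s, t)]

# Demonstration with a throwaway directory: the object file does not
# exist yet, so the page is reported as stale.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "index.htmx")
    obj = os.path.join(d, "index.html")
    open(src, "w").close()
    print(stale_pages([(src, obj)]) == [(src, obj)])
```

Only the stale pairs need to be passed to the expander, which is what keeps a one-line change from triggering a full site rebuild.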
Macros
There were times when I wanted to create repetitive formatting of a sequence of items, like setting up picture galleries. I added the *callv builtin to expandfile to be able to easily execute a series of steps, for example generating the HTML code to display a photo. Then I set up a library of these macros that I could *include into a source file when generating an HTML object.
The macros for inserting IMG tags into a page will output a warning if the image file cannot be found; if the file is found, the macros invoke a small helper program using *shell to get the dimensions of the picture so they can be included in the IMG tag's WIDTH and HEIGHT attributes. Supplying the dimensions speeds up the loading of the HTML page, because the browser can lay out the page before the images arrive.
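As a sketch of what such an image macro does, the Python below checks for the file, reads its dimensions, and emits the tag. It is not the author's macro or helper program: it handles only PNG files (by reading the IHDR chunk directly), the names png_size and img_tag are hypothetical, and returning an empty string when the file is missing is an assumption.

```python
import os
import struct
import sys
import tempfile
from html import escape

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_size(path):
    """Read (width, height) from the IHDR chunk at the start of a PNG."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file: " + path)
    return struct.unpack(">II", header[16:24])

def img_tag(path, alt):
    """Return an IMG tag with dimensions, or warn and return ''."""
    if not os.path.exists(path):
        print("warning: image not found: " + path, file=sys.stderr)
        return ""
    width, height = png_size(path)
    return '<img src="%s" width="%d" height="%d" alt="%s">' % (
        escape(path), width, height, escape(alt))

# Demonstration with a fabricated 24-byte PNG header (no pixel data).
with tempfile.TemporaryDirectory() as d:
    demo = os.path.join(d, "photo.png")
    with open(demo, "wb") as f:
        f.write(PNG_SIGNATURE + struct.pack(">I4sII", 13, b"IHDR", 640, 480))
    print(img_tag(demo, "A photo"))
```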
Another example of macro use is the main expandfile page, whose source you can view. That page has a section for each builtin function implemented by expandfile. The page defines a block called bif and begins each section with an invocation of the bif macro, like *callv,bif,prototype,result,text. Using the macro ensures that each section heading is correctly formatted. The macro also formats a <tr> row for the summary table of all builtin functions, which is added to a text variable that will be inserted into a TABLE of functions using *include. The HTMX source that generates the main page is here.
Security
Several design choices I made were aimed at reducing the chance of attacks on my sites.
Early web sites were attacked by outsiders who found bugs in host operating systems and web servers. I decided that I wanted to use only the simplest features of a web server to transmit the contents of read-only files. This design meant that each web page could be read-only, cached in the server memory, and served with quick response.
I chose not to depend on dynamic generation of pages, or to allow site visitors to cause any computation on the server. If visitors could not upload any data, there would be less chance that attack code could get executed.
Other Applications of Expandfile
As expandfile developed, I found other uses for it, including reformatting database files and preparing input for other programs such as input to procmail, RSS feed declarations in XML, shell scripts, input to the dot graphical compiler, and XML sitemap files for the Google crawler.
For some features, I use expandfile to expand a template which generates HTMX files which are in turn expanded by expandfile; this lets me do "two pass" expansions so I can add up counters and then display the values above the detailed information.
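The two-pass idea can be sketched in Python under simplifying assumptions: the first pass writes the detail rows and leaves a %[total]% marker above them while it adds up a counter, and the second pass expands the marker once the total is known. The function names and the sample data are hypothetical, and the real mechanism generates intermediate HTMX files rather than in-memory strings.

```python
import re

_VAR = re.compile(r"%\[(\w+)\]%")

def expand(text, variables):
    """Replace each %[name]% with its value; unknown names become ''."""
    return _VAR.sub(lambda m: variables.get(m.group(1), ""), text)

def first_pass(items):
    """Emit detail rows, leaving a %[total]% marker above them."""
    total = 0
    rows = []
    for name, count in items:
        total += count
        rows.append("<li>%s: %d</li>" % (name, count))
    intermediate = "<p>Total: %[total]%</p>\n<ul>\n" + "\n".join(rows) + "\n</ul>"
    return intermediate, {"total": str(total)}

def second_pass(intermediate, variables):
    """Expand the markers left behind by the first pass."""
    return expand(intermediate, variables)

text, counters = first_pass([("page-a", 2), ("page-b", 3)])
print(second_pass(text, counters))
```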
In 2004, I wrote a web statistics application, Super Webtrax, that uses expandfile to produce a daily web page with charts and tables generated from web server log data loaded into SQL.
For a document formatting application, I extracted data from data files in a proprietary format, translated it to SQL, and loaded it into a local MySQL database. I then used expandfile's *sqlloop builtin to generate HTML, used a browser to generate visually formatted output, printed the browser output to PostScript, and used page impression tools to generate a booklet.
I have also built template files that use the *shell builtin to fire off curl commands that fetch XML data from Web APIs, and then parse the result with *xmlloop to generate HTML reports.
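The parsing half of that workflow can be sketched with Python's standard xml.etree.ElementTree module. This is an analogy for what an XML loop does, not the *xmlloop implementation: the function name xml_loop, the {name} placeholders, and the SAMPLE document are all made up, and the curl fetch is omitted so the example runs without a network.

```python
import xml.etree.ElementTree as ET
from html import escape

def xml_loop(xml_text, item_path, row_template):
    """Format one fragment per element matching item_path.

    Each child element's tag becomes a variable holding its text.
    """
    root = ET.fromstring(xml_text)
    out = []
    for item in root.findall(item_path):
        values = {child.tag: escape(child.text or "") for child in item}
        out.append(row_template.format(**values))
    return "".join(out)

# Hypothetical API response, for illustration only.
SAMPLE = ("<feed>"
          "<entry><title>A &amp; B</title><link>x.html</link></entry>"
          "<entry><title>C</title><link>y.html</link></entry>"
          "</feed>")

print(xml_loop(SAMPLE, "entry", '<li><a href="{link}">{title}</a></li>\n'))
```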
Language Issues
I originally wrote expandfile in Perl 5, and used it on Unix, macOS and Windows, through years of evolution of my program, the Perl language, and the features provided on different platforms. I showed expandfile to friends, but they were put off by the difficulty of installing and configuring the Perl implementation:
- Using CPAN library modules requires installing them. For example, expandfile uses Perl module DBD::mysql, which must be available even if a particular HTMX program does not use SQL. Similarly, expandfile requires XML::LibXML even if you don't use *xmlloop. Installing the Perl CPAN libraries for expandfile may take hours of downloading, installing, and configuring.
- Installing CPAN modules sometimes requires that other utility programs be installed on the computer and configured first. For example, installing DBD::mysql fails unless MySQL is installed and configured.
- Different revision levels are provided on different platforms: e.g. macOS no longer provides Perl, so users have to install it.
Later Improvements
Early versions of expandfile had some features that I later decided were mistakes. Fortunately, nobody but me was using the program, and I knew where all the HTMX source files were. So I backed up the program and sources, created and tested a new program version, modified every source file that had to change, recompiled everything, compared old output to new, and accounted for changes before switching over to the new version.
Some of the changes were bug fixes or added new features, such as *xmlloop and *format. A few changes were made when Perl syntax changed and the program had to be updated.
I made the following changes in early 2021:
- Eliminated little-used syntax that gave special meaning to two characters: instead, implemented the functions as library macros for the few places that need it.
- Eliminated unnecessary control argument -config; configuration files are just expanded for side effect before other input files.
- Renamed implementation variables to prefix their names with _xf_ to avoid collisions with user variable names.
- Added configuration variable _xf_expand_multics to enable or disable Multics features; eliminated -mult control argument.
- Allowed multiple args to *shell, *fwrite, *fappend, and *htmlescape -- concatenate them with no separator.
- Added the *bindcsv function to replace a potentially exploitable practice.
- Made error messages more specific and added runtime checks for installation and implementation errors.
- Reimplemented, tested, and documented the configuration and install mechanisms.
- Created a comprehensive test suite.
- Updated documentation and added a Unix man page.
- Placed all source on GitHub with MIT Open Source license.
Comparison to other approaches
PHP
PHP programs are parsed and interpreted at runtime on every view; their execution can output HTML, access databases, and so on. PHP constructs look like <?php echo '<p>Hello World</p>'; ?> You can set and refer to variables that have string values. PHP has over 1000 builtin functions, including SQL access, and is very popular. (You can install a caching module into your web server that avoids unnecessary parsing.)
Markdown
Markdown converts a simple text language into HTML. Some Web bulletin board systems allow users to type in their postings in Markdown, and translate the input to HTML. Markdown is often used as a way to collect user comments and input, and allow formatting of the input, without requiring users to learn HTML syntax. It's easy to learn, but it doesn't support the features needed for a full-featured web page.
Hugo
I learned a little about Hugo only recently. Hugo is based on Go instead of Perl; it uses Go's conventions about modules and source organization. Similar to expandfile, Hugo input consists of HTML and extension constructs -- Hugo uses {{ ... }} instead of %[ ... ]% . It has variables, function expansion, and function definition. It looks slick. It uses a kind of "markdown" to avoid writing HTML. It supports multi-language translation, themes, and static websites.