Bibhtml

Bibhtml consists of a set of BibTeX style files, which allow you to use BibTeX to produce bibliographies in HTML. These are modelled closely on the standard BibTeX style files.

To accompany them, this package includes a pair of XSLT scripts which illustrate how you might integrate these generated bibliographies into an XML/HTML workflow.

The long-term URL for this package is http://purl.org/nxg/dist/bibhtml

This documentation describes bibhtml version 2.0.2, released 2013 September 8.

For sample output of these style files, see the References section below.

The output of these style files is usable as-is, but it benefits from some post-processing, to remove TeX-isms. There’s a sed script in the distribution which does exactly that, called detex.sed. If you want to make a version of that in some other regexp-supporting language, let me know and I can include it in the distribution.

As well, the package includes a Perl script which orchestrates the various steps required to manage such a bibliography for one or more HTML files. The references in the text are linked directly to the corresponding bibliography entry, and if a URL is defined in the entry within the BibTeX database file, then the generated bibliography entry is linked to this.

The BibTeX style files are abbrvhtml.bst, alphahtml.bst, plainhtml.bst and unsrthtml.bst. As well, there are .bst files which produce their output in date order. To use them, you should generate an .aux file by some appropriate means, and include the line \bibstyle{plainhtml} in it. Run BibTeX, and the result is a .bbl file, in broadly the same style as the corresponding traditional BibTeX one, but formatted using HTML rather than LaTeX. This might form a useful component of an XSLT-based workflow. For further details, see the discussion of the style files below.
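As a rough illustration of what such an .aux file contains, the following Python sketch writes the three kinds of line that BibTeX reads from it: \citation, \bibdata and \bibstyle. The function name make_aux and the database name mybib are placeholders for illustration, not part of the distribution.

```python
def make_aux(keys, bibdata, bibstyle):
    """Return the text of a minimal .aux file for BibTeX.

    keys     -- citation keys to include (hypothetical examples below)
    bibdata  -- name of the .bib database, without the extension
    bibstyle -- name of the .bst style, without the extension
    """
    lines = [r'\citation{%s}' % key for key in keys]
    lines.append(r'\bibdata{%s}' % bibdata)
    lines.append(r'\bibstyle{%s}' % bibstyle)
    return '\n'.join(lines) + '\n'


aux_text = make_aux(['surname99'], 'mybib', 'plainhtml')
print(aux_text, end='')
```

Writing aux_text to mydoc.aux and then running bibtex mydoc should produce mydoc.bbl, provided mybib.bib and plainhtml.bst can be found.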

There is also a Perl script, bibhtml, which can orchestrate generating and using this .aux file. This script isn’t really maintained any more, but it is still distributed, and documented below.

Bibhtml works with a standard BibTeX database – it is intended to be compatible with a database used in the standard way with LaTeX. The BibTeX style files distributed with this package define an additional url field: if this is present, then the generated entry will contain a link to this URL. They also define an eprint field – if you do not use the LANL preprint archive, this will be of no interest to you.

BibTeX style files

The package includes several BibTeX style files. As well as the ones directly derived from the standard styles, there are also the plainhtmldate.bst, plainhtmldater.bst, alphahtmldate.bst and alphahtmldater.bst styles. These are derived from the standard plain.bst and alpha.bst styles, but sort the output by date (*date.bst) or reverse date (*dater.bst), rather than by author.

In version 2 of the bibhtml package, the *html.bst files are derived from the traditional files using the urlbst package, and then minimally adjusted so as to produce HTML rather than LaTeX.

Since they are derived via the urlbst package, these style files support an additional entry type, @webpage, and two additional fields on all entry types, url and lastchecked, which give the URL associated with the reference, and the date at which the URL was last verified to be still present.

The distributed .bst files have two configurable parameters, which you might want to adjust for your installation:

The variable 'xxxmirror gives the host name of the arXiv mirror which will be used when generating links to eprints. The default setting in the .bst files is:

"xxx.arxiv.org" 'xxxmirror :=

By default, the style files generate link targets in the bibliography with the same name as the citation key. Thus a BibTeX entry with key surname99, say, would appear in the generated HTML .bbl file wrapped in <a name="surname99">...</a>. If this is inconvenient, perhaps because it conflicts with other links within the file, then you can adjust the 'hrefprefix variable within the style file, to specify a prefix which should appear in the link key. Thus setting

"ref:" 'hrefprefix :=

in the .bst file would produce links like <a name="ref:surname99">...</a> in the .bbl file.
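If you change 'hrefprefix, the links in your text have to use the same prefix. A small Python sketch of the naming rule follows; the helper names link_target and citation_link are hypothetical, and the code only mirrors the behaviour described above rather than anything in the .bst files.

```python
def link_target(key, hrefprefix=''):
    """Name of the anchor generated for a citation key."""
    return hrefprefix + key


def citation_link(key, text, hrefprefix='', bibfile=''):
    """HTML link from the text to the bibliography entry for key.

    bibfile is empty when the bibliography is in the same file.
    """
    return '<a href="%s#%s">%s</a>' % (bibfile, link_target(key, hrefprefix), text)


print(link_target('surname99'))
print(link_target('surname99', 'ref:'))
print(citation_link('surname99', '[1]', hrefprefix='ref:'))
```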

XSLT scripts

The distribution includes a pair of sample XSLT scripts, bibhtml-extract-aux.xslt and bibhtml-insert-bib.xslt, both of which appear in the workflow below.

The scripts assume that a source file is in XHTML, and has citations marked up as

<span class='cite'>ref99</span>

and that the bibliography is indicated with

<?bibliography bibdata bibstyle?>
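To make the markup conventions concrete, here is a Python sketch which pulls the citation keys and the bibliography instruction out of such a file. It is not a translation of the distributed XSLT; the function name extract_citations and the sample document are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET


def extract_citations(xhtml):
    """Return (keys, bib), where keys lists the cited keys in document
    order and bib is a (bibdata, bibstyle) pair, or None if absent."""
    root = ET.fromstring(xhtml)
    keys = []
    for element in root.iter():
        # Tags may carry the XHTML namespace, as in "{...xhtml}span".
        local_name = element.tag.rsplit('}', 1)[-1]
        if local_name == 'span' and element.get('class') == 'cite':
            key = (element.text or '').strip()
            if key and key not in keys:
                keys.append(key)
    # The default ElementTree parser drops processing instructions,
    # so the bibliography PI is located in the raw text instead.
    match = re.search(r'<\?bibliography\s+(\S+)\s+(\S+?)\s*\?>', xhtml)
    bib = (match.group(1), match.group(2)) if match else None
    return keys, bib


sample = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<p>See <span class='cite'>ref99</span> and <span class='cite'>other01</span>.</p>
<?bibliography bibrefs unsrthtml?>
</body></html>"""

keys, bib = extract_citations(sample)
print(keys, bib)
```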

A suitable workflow, taking as an example the source file for the page you are reading, is:

% xsltproc bibhtml-extract-aux.xslt bibhtml.html >bibhtml.aux
% bibtex bibhtml
This is BibTeX, Version 0.99c (Web2C 7.5.7)
The top-level auxiliary file: bibhtml.aux
The style file: unsrthtml.bst
Database file #1: bibrefs.bib
% sed -f detex.sed bibhtml.bbl >bibhtml.bbl.tmp
% mv bibhtml.bbl.tmp bibhtml.bbl
% xsltproc --stringparam bibfile-name bibhtml \
    bibhtml-insert-bib.xslt bibhtml.html >bibhtml-new.html

The bibhtml-extract-aux.xslt script, when run over a source file, generates an .aux file suitable for processing with BibTeX. The resulting .bbl file, possibly after post-processing, can be included in the source XHTML with an XSLT script which includes something like:

<xsl:template match="processing-instruction('bibliography')">
  <xsl:copy-of select="document('mybib.bbl')"/>
</xsl:template>

Postprocessing HTML bibliographies

The output of the BibTeX styles is designed so that it is generally reasonably usable without any post-processing. However, it is not ideal, since there are occasionally TeX-isms such as backslash-escaped characters and the like, depending on what is in the source .bib file. Also, without post-processing, any DOIs in the source file aren’t formed into links.

The distribution includes a sed file, detex.sed, which can do appropriate post-processing. Thus the normal workflow is:

% bibtex mydoc
% sed -f detex.sed mydoc.bbl >mydoc.bbl.tmp
% mv mydoc.bbl.tmp mydoc.bbl

Since it uses sed, this is fairly obviously Unix-specific, but if anyone would like to contribute a script with similar functionality (it’s just a few moderately tortuous regular expressions), I’d be delighted to include it in the distribution.
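To give a flavour of what such a script might look like in another language, here is a Python sketch of this kind of clean-up. It is not a port of detex.sed: the particular rules, the choice of &ndash; and &mdash; for dashes, and the function name detex are all assumptions made for illustration, and the DOI linking is not attempted.

```python
import re


def detex(text):
    """Apply a few illustrative TeX-to-HTML substitutions.

    These rules are guesses at typical TeX-isms, not the rules in
    detex.sed.  They are deliberately naive: for example, a '~' or
    '--' inside a URL or an HTML comment would also be rewritten.
    """
    text = text.replace(r'\&', '&amp;')           # escaped ampersand
    text = re.sub(r'\\([%_#$])', r'\1', text)     # other escaped characters
    text = text.replace('---', '&mdash;')         # em dash (assumed entity)
    text = text.replace('--', '&ndash;')          # en dash (assumed entity)
    text = text.replace('~', '&nbsp;')            # tie
    text = re.sub(r'\{([^{}]*)\}', r'\1', text)   # innermost protective braces
    return text


print(detex(r'Smith \& Jones, pp.~10--12, {TeX}'))
```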

Installation

The .bst files have to be installed ‘somewhere where LaTeX can find them’. If you give the command kpsepath bst you can see the list of directories that BibTeX searches for .bst files – on my system, I’d put them into /usr/local/texlive/texmf-local/bibtex/bst, which is a directory for system-wide local additions.

If you wish, you may change the distributed BibTeX style files (see above) to the extent of changing the ‘eprint’ mirror site from the master xxx.arxiv.org to a more local mirror. If you don’t use the LANL preprint archive, this will be of no interest to you.

The bibhtml script

As noted above, this script should still work and is distributed on that basis, but it’s no longer maintained, and won’t be further developed. The XSLT-based mechanism described above is probably more robust; also, the interface described in this section is not the same as the interface of the XSLT scripts section above.

The BibTeX database

TeX features such as ~ and -- are translated to corresponding HTML entities (controlled with the +3 switch, see below), but other TeX constructions will make their way into the generated HTML, and look a little odd. I might try to deal with these in future versions.

Preparing the text

You prepare your text simply by including links to the bibliography file (the default is bibliography.html), followed by a fragment composed of the BibTeX citation key. Thus, you might cite [grendel89] with

<a href="bibliography.html#grendel89">(Grendel, 1989)</a>

(of course, the link text can be anything you like). That’s all there is to it. When you run bibhtml, it generates an .aux file which makes BibTeX produce references for exactly those keys which appear in this way.
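The scan for citation keys can be pictured with the following Python sketch, which collects the fragment of every link pointing into the bibliography file. The real script is written in Perl and may well work differently; the class name CitationScanner is hypothetical.

```python
from html.parser import HTMLParser


class CitationScanner(HTMLParser):
    """Collect citation keys from links of the form bibfile#key."""

    def __init__(self, bibfile='bibliography.html'):
        super().__init__()
        self.prefix = bibfile + '#'
        self.keys = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value and value.startswith(self.prefix):
                key = value[len(self.prefix):]
                # Keep each key once, in order of first appearance.
                if key and key not in self.keys:
                    self.keys.append(key)


scanner = CitationScanner()
scanner.feed('<p>See <a href="bibliography.html#grendel89">(Grendel, 1989)</a> '
             'and <a href="other.html#x">elsewhere</a>.</p>')
print(scanner.keys)
```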

Preparing the bibliography file – processing instructions supported

The bibliography file is an ordinary HTML document (which may itself have citations within it), distinguished only by having two processing instructions within it. Bibhtml replaces everything between <?bibhtml start ?> and <?bibhtml end ?> (which should be on lines by themselves) with the formatted bibliography. It leaves those instructions in place, naturally, so once this file is set up, you shouldn’t have to touch it again. Older versions of bibhtml used the magic comments <!-- bibhtml start --> and <!-- bibhtml end -->: these are still supported, but are deprecated and may disappear in a future version.

Alternatively, you may include the processing instruction <?bibhtml insert?>. This acts broadly like the start and end processing instructions, except that the line is completely replaced by the inserted bibliography. This is useful if the file being processed is a generated file (perhaps the output of a separate XML tool-chain, for example), which will not therefore have to be rescanned in future.

You can specify the bibliography database and style file either on the command line (see below) or using the <?bibhtml bibdata bibfile?> and <?bibhtml bibstyle stylefile?> instructions. The value of ‘bibdata’ is cumulative, and appends to any value specified on the command line. A value of ‘bibstyle’ specified on the command line, in contrast, overrides any value in the file.

As a special case, bibhtml also replaces the line after the processing instruction <?bibhtml today ?> with today’s date.
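The replacement rules for <?bibhtml start ?>, <?bibhtml end ?> and <?bibhtml insert?> can be sketched in Python as follows. This only illustrates the behaviour described above, working line by line; the function merge_bibliography is hypothetical and is not taken from the bibhtml script.

```python
import re

START = re.compile(r'<\?bibhtml\s+start\s*\?>')
END = re.compile(r'<\?bibhtml\s+end\s*\?>')
INSERT = re.compile(r'<\?bibhtml\s+insert\s*\?>')


def merge_bibliography(html_lines, bbl_lines):
    """Return html_lines with the formatted bibliography merged in."""
    out = []
    skipping = False  # True while inside an old start...end region
    for line in html_lines:
        if skipping:
            if END.search(line):
                skipping = False
                out.append(line)       # the end instruction stays in place
            continue                   # old bibliography lines are dropped
        if START.search(line):
            out.append(line)           # the start instruction stays in place
            out.extend(bbl_lines)
            skipping = True
        elif INSERT.search(line):
            out.extend(bbl_lines)      # the insert line itself is replaced
        else:
            out.append(line)
    return out


page = ['<h1>Refs</h1>', '<?bibhtml start ?>', '<p>old</p>',
        '<?bibhtml end ?>', '<p>tail</p>']
merged = merge_bibliography(page, ['<dl>', '</dl>'])
print('\n'.join(merged))
```

Because the start and end instructions survive, running the merge again over its own output gives the same result, which is why the file need not be touched once it is set up.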

Summary of processing instructions:

<?bibhtml start?> and <?bibhtml end?>
Bracket the bibliography – any text between these PIs is replaced when bibhtml is next run.
<?bibhtml insert?>
This PI is replaced by the bibliography when bibhtml is next run. This PI is always removed, irrespective of the presence or absence of the --strip option.
<?bibhtml bibdata bibfile?>
Specify the bibliography database to be used. This is the analogue of a \bibliography{bibfile} command in a LaTeX file; see also the -b command-line option.
<?bibhtml bibstyle stylefile?>
Specify the bibliography style to be used. This is the analogue of a \bibliographystyle{stylefile} command in a LaTeX file; see also the -s command-line option.
<?bibhtml today?>
Replace the following line by today’s date.

Supported options

Usage

% bibhtml [options...] filename...
% bibhtml --merge file.bbl file.html

The filename argument is the name of a file to be scanned.

Bibhtml takes a list of HTML files as argument (though see below for a two-pass variant). It creates an .aux file, runs BibTeX, and merges the resulting .bbl file (if it exists) into bibliography.html, or whatever has been specified as the bibliography file name.

There are several options:

-3, +3
Set this to +3 if you want ~ translated to &nbsp;, and -- to &enspace;. Or set it to -3 (the default) if you don’t.
-a
If this option is set, bibhtml won’t bother scanning any files at all, and will generate references for all the entries in your database. This is equivalent to \nocite{*} in LaTeX.
-b bibdata
The name of your BibTeX database file, as it would be specified in a \bibliography{} command in LaTeX. Unless you happen to keep all your references in a file called bib.bib, you’ll probably want to change this. Or you can use the <?bibhtml bibdata xxx?> processing instruction.
-c configfile
Specifies a configuration file which contains a single line of options, which are inserted in the command line at that point.
--merge
In this special case, bibhtml takes two arguments, a .bbl file and an .html file, merges the first into the second, and nothing else. It’s intended to be used when you have generated a .bbl file by a separate run of BibTeX, and simply wish to merge the results into your bibliography file. As such, it will most likely be useful as part of a script, or other post-processing system.
-r rootname
Specify this and you’ll create rootname.html, rootname.aux and so on. Why not just stick with the default ‘bibliography’...?
-s bibstyle
The name of the BibTeX bibliography style you want to use, as it would be specified for the \bibliographystyle command in LaTeX. If you want to have a different layout for your HTML bibliographies, please don’t change the file plainhtml.bst distributed with bibhtml. Instead, make a copy of plainhtml.bst under a different name, edit it as much as you like, and use this option of bibhtml to use the modified version instead of the default. Or you can use the <?bibhtml bibstyle xxx?> processing instruction.
--strip
If this option is set, then strip all processing-instruction lines from the output file. This means that the resulting file cannot be processed again by bibhtml, and so is appropriate when the file is the output of a separate tool-chain.
-V, --version
Bibhtml prints the version information and exits.
-v, -q
Do you want the program to be verbose or quiet? The default is -v, verbose.

The defaults for the various parameters are unlikely to be helpful, so you’re likely to want to set one or more of them every time you run the program. It is for this reason, and because you’re likely to want the same set of options every time you create the bibliography for a set of files in a directory, that you can put a collection of options in a configuration file. This can be specified on the command line with the option -c configfilename. If this option is not given, then bibhtml looks for a file named bibhtml.config. For example, the configuration file might contain:

-b mybib +3 -r refs
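The way a configuration file is spliced into the command line at the position of the -c option can be pictured with this Python sketch. It is not the code in the bibhtml script: the function expand_config is hypothetical, and splitting the line with shell-style quoting rules (shlex) is an assumption. The fallback to bibhtml.config when -c is absent is left out.

```python
import shlex


def expand_config(argv, read_file):
    """Replace each '-c FILE' pair in argv with the options in FILE.

    read_file is a callable returning the text of the named file; it
    is passed in so that the sketch can be tried without real files.
    """
    expanded = []
    args = iter(argv)
    for arg in args:
        if arg == '-c':
            path = next(args)          # the configuration file name
            expanded.extend(shlex.split(read_file(path)))
        else:
            expanded.append(arg)
    return expanded


fake_files = {'my.config': '-b mybib +3 -r refs\n'}
print(expand_config(['-q', '-c', 'my.config', 'page.html'], fake_files.get))
```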

Two-pass bibhtml

You might sometimes need to invoke the two phases separately. If you make a symbolic link to the program via ln -s bibhtml bibhtml2, then you can generate an aux-file alone by giving the command bibhtml2 *.html, say; and you can merge a bbl-file into the bibliography file with the command bibhtml2 bibliography.bbl. The command line option --merge above may be more suitable if you are calling this code from a script, as it requires you to specify both the .bbl file and the .html file it is being merged with (and so is more robust and more flexible).

On Unix, this works because the program is able to detect the name it was invoked under. This may be more difficult on other platforms. If so, please get in touch with suggestions.
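Detecting the invocation name amounts to looking at the last path component of the program name. A Python sketch of the idea follows; the function choose_phase, and the rule that a .bbl argument selects the merge phase, are assumptions for illustration rather than a description of the Perl script.

```python
import os


def choose_phase(argv0, args):
    """Guess which phase to run from the invocation name and arguments."""
    name = os.path.basename(argv0)
    if name != 'bibhtml2':
        return 'both'                  # ordinary one-pass run
    if args and all(arg.endswith('.bbl') for arg in args):
        return 'merge'                 # bibhtml2 bibliography.bbl
    return 'aux'                       # bibhtml2 *.html


print(choose_phase('/usr/local/bin/bibhtml2', ['a.html', 'b.html']))
print(choose_phase('./bibhtml2', ['bibliography.bbl']))
print(choose_phase('bibhtml', ['a.html']))
```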

Example

There are multiple sources of advice for how to cite electronic documents. The most formal are an ISO standard [url:iso690], and the American Psychological Association’s guidance in Sect. 6.31 of [apastyle]. As well, there are older but still useful discussions in [walker06] and [emory95].

References

[url:iso690] International Standards Organisation.
ISO 690-2 [online, cited 9 August 2009].
[apastyle] American Psychological Association.
Publication Manual of the American Psychological Association, 6th edition, 2009 [cited 9 August 2009].
[walker06] Janice R. Walker and Todd Taylor.
The Columbia Guide to Online Style. Columbia University Press, 2nd edition, 2006 [cited 9 August 2009].
[emory95] Goizueta Business Library.
Citation formats [online, cited 9 August 2009].

See also the documentation for the urlbst package, which generates BibTeX style files for ordinary LaTeX output (which also supports a @webpage entry type, and url and lastchecked fields), and which contains a similar list of references concerned with citing online sources.

Distribution

Obtaining bibhtml

Bibhtml is available on CTAN at biblio/bibtex/contrib/bibhtml/, and at http://purl.org/nxg/dist/bibhtml.

Download the distribution: bibhtml-2.0.2.tar.gz or bibhtml-2.0.2.zip.

Do let me know if you use this stuff. I’d be grateful for any bug reports, and bug fixes, and also for any comments on the clarity or otherwise of this documentation.

The project source code is hosted at bitbucket.org. You can check out the source code from there, and you are welcome to add issues to the bugparade.

Licence

This software is copyright 1999, 2005, 2006, 2009, 2013 Norman Gray. It is released under the terms of the GNU General Public Licence. See the copyright declaration at the top of file bibhtml, and the file LICENCE for the licence conditions. You can find an online copy of the GPL at http://www.gnu.org/copyleft/gpl.html.

If this licence is a problem for you, get in touch and we can work something out.

Changes

2.0.2, 2013 September 8
Minor bugfixes: XSLT namespace fixes, and DOI formatting.
2.0.1, 2009 November 2
The *date.bst and *dater.bst styles now work again.
2.0, 2009 August 9
First real release of v2.0, after mild use elsewhere.
2.0b1, 2009 August 9
Substantial rewrite, taking the style files from the urlbst package (thus there are more styles than before), adding more XSLT scripts, and de-emphasising the Perl script. The code is now hosted at bitbucket.org.
1.3, 2006 October 31
1.2, 2005 September 19
1.2b2, 2005 August 30
1.2b1, 2005 August 19
Norman
2013 September 8