pdfbib readme

Max Snauth

Revision History
Revision 0.1, 09 October 2006, TB


pdfbib is a CGI script with an Ajax interface that reads the metadata of your PDFs and lets you compare and synchronize it with the data in a (bibliographic) database. You can edit both the PDF metadata and the database records. The goal of this project is threefold:

  • make local indexing and search software more efficient by providing accurate metadata
  • make sure that the data, which unfortunately has to be stored twice (once in the database and once in the file itself), is at least kept synchronized
  • provide a nice interface to do this boring work
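To make "compare and synchronize" a bit more concrete, here is a minimal sketch of the comparison step in plain Perl. It is not taken from pdfbib.pl: the field mapping, the column names and the function names (diff_record, normalize) are assumptions made up for illustration, and the two records are passed in as plain hashes instead of being read from a PDF or a database.

```perl
use strict;
use warnings;

# Hypothetical mapping from PDF Info keys to database column names.
# These column names are assumptions, not the confirmed bibus schema.
my %FIELD_MAP = (
    Title   => 'Title',
    Author  => 'Author',
    Subject => 'Abstract',
);

# Treat undef as empty and ignore leading/trailing whitespace.
sub normalize {
    my ($value) = @_;
    $value = '' unless defined $value;
    $value =~ s/^\s+|\s+$//g;
    return $value;
}

# Compare one PDF's metadata with its database record.
# Returns a hash reference: PDF key => [ pdf value, database value ]
# for every mapped field whose normalized values differ.
sub diff_record {
    my ($pdf_info, $db_row) = @_;
    my %diff;
    for my $pdf_key (sort keys %FIELD_MAP) {
        my $pdf_val = normalize($pdf_info->{$pdf_key});
        my $db_val  = normalize($db_row->{ $FIELD_MAP{$pdf_key} });
        $diff{$pdf_key} = [ $pdf_val, $db_val ] if $pdf_val ne $db_val;
    }
    return \%diff;
}

# Example with made-up records.
my $diff = diff_record(
    { Title => 'A Study of Things ', Author => 'Doe, J.' },
    { Title => 'A Study of Things',  Author => 'Doe, Jane', Abstract => '' },
);
print "$_: pdf='$diff->{$_}[0]' db='$diff->{$_}[1]'\n" for sort keys %$diff;
```

Only fields that really differ are reported, so whoever does the boring work only has to decide which side wins for those.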

My current configuration is a web server with direct read/write access to the PDFs and to the SQLite 2 database used by bibus, which is IMHO currently the most mature bibliographic database (besides BibTeX and all its related software, of course) and interacts well with OpenOffice.org.


As with all my scripts, this CGI is heavily based on Perl modules, which have to be installed for it to work. Some of them are part of standard Perl distributions, some are not. More specifically, you will need:

  • obviously, a web server with Perl CGI installed and working
  • DBI with the SQLite2 driver (because bibus uses SQLite 2)
  • PDF::API2 (for reading and writing PDF metadata)
  • CGI::Ajax
  • CAM::PDF (to extract ASCII text from PDFs; not strictly necessary)
  • the "find" command: out of laziness I use it to search for all PDFs in the path. This should be fixed to include non-Linux users, see the todo section.

Get the Perl modules from CPAN.
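A quick way to see which modules are still missing is a small check script like the sketch below. It is not part of pdfbib; the module names CGI, DBI and DBD::SQLite2 are my reading of "Perl CGI" and "DBI with the SQLite2 driver" in the list above, and missing_modules is a hypothetical helper name.

```perl
use strict;
use warnings;

# Modules named in the list above (CPAN names are assumptions, see lead-in).
my @required = qw(CGI DBI DBD::SQLite2 PDF::API2 CGI::Ajax);
my @optional = qw(CAM::PDF);

# Return those modules from the argument list that cannot be loaded.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        # Turn Foo::Bar into Foo/Bar.pm so that require accepts a variable.
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

my @missing_required = missing_modules(@required);
my @missing_optional = missing_modules(@optional);

print "Missing required modules: @missing_required\n" if @missing_required;
print "Missing optional modules: @missing_optional\n" if @missing_optional;
print "All required modules found.\n" unless @missing_required;
```

Run it as the same user the web server uses, since that user's module search path may differ from yours.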


I include my bibliographic database in bibus' format (bibus.dat), but you really should start with an empty one. Get bibus and choose SQLite as the database format. More information on bibus' database format can be found on bibus' community wiki.

Put the main CGI script (pdfbib.pl) into your CGI directory and make sure that it is executable. Then adapt at least the following parameters:

  • where the SQLite database resides. Make sure you have read/write permission for this file.
  • where your PDFs reside. This can be adjusted later from the user interface. The path is searched recursively.
  • the user and password for the database. The shipped example database comes without a password set; the username is berker.
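As a rough illustration of these settings, the sketch below keeps them in a hash and checks the two paths before anything else happens. The key names, the placeholder paths and the helpers config_problems and dsn_for are hypothetical and do not reflect the actual variable names in pdfbib.pl; only the DSN format is the one DBD::SQLite2 expects.

```perl
use strict;
use warnings;

# Hypothetical settings; key names and paths are placeholders.
my %config = (
    db_file => '/path/to/bibus.dat',   # the SQLite 2 database used by bibus
    pdf_dir => '/path/to/pdfs',        # searched recursively for PDFs
    db_user => 'berker',               # username of the shipped example database
    db_pass => '',                     # the example database has no password set
);

# Return a list of human-readable problems with the configured paths.
sub config_problems {
    my (%cfg) = @_;
    my @problems;
    push @problems, "database file is not readable and writable: $cfg{db_file}"
        unless -f $cfg{db_file} && -r _ && -w _;
    push @problems, "PDF directory is not readable: $cfg{pdf_dir}"
        unless -d $cfg{pdf_dir} && -r _;
    return @problems;
}

# Build a DBI data source name for DBD::SQLite2.
sub dsn_for {
    my ($db_file) = @_;
    return "dbi:SQLite2:dbname=$db_file";
}

print "Problem: $_\n" for config_problems(%config);
print "DSN: ", dsn_for($config{db_file}), "\n";
```

The checks only mean something when they run as the web server's user, because that is the user that needs read/write access.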

One naming convention is crucial: each PDF file must be named <Identifier>.pdf, where Identifier is the value of the "identifier" field in bibus' database and has to be unique.
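The naming convention can be expressed in a few lines of Perl. This is only a sketch with hypothetical function names (identifier_for, duplicate_identifiers), not code from pdfbib.pl. It derives the identifier from a file path and reports identifiers that are claimed by more than one file, which would break the uniqueness requirement.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Derive the identifier from a path such as /docs/Smith2004.pdf.
# Returns undef if the file name does not end in .pdf (any letter case)
# or if nothing is left in front of the suffix.
sub identifier_for {
    my ($path) = @_;
    my ($name, undef, $suffix) = fileparse($path, qr/\.pdf/i);
    return undef unless length $suffix && length $name;
    return $name;
}

# Group paths by identifier and keep only identifiers used by two or more files.
# Returns a hash reference: identifier => [ paths ].
sub duplicate_identifiers {
    my %paths_by_id;
    for my $path (@_) {
        my $id = identifier_for($path);
        push @{ $paths_by_id{$id} }, $path if defined $id;
    }
    delete $paths_by_id{$_}
        for grep { @{ $paths_by_id{$_} } < 2 } keys %paths_by_id;
    return \%paths_by_id;
}

# Example with made-up paths.
my $dups = duplicate_identifiers('/a/Smith2004.pdf', '/b/Smith2004.pdf', '/a/Doe1999.pdf');
print "$_ is used by: @{ $dups->{$_} }\n" for sort keys %$dups;
```

Because the PDF path is searched recursively, two files with the same name in different subdirectories are the most likely way to end up with a duplicate.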


The interface should be self-explanatory with so much Ajax ;-)


  • make the whole thing faster, for instance by:
      • not using "find" but built-in Perl functions
  • write a user guide
  • provide some mechanism to filter the PDFs that have missing metadata/database fields
  • many more?
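For the "find" item, one possible replacement is the core module File::Find, which also works on non-Linux systems. The following is a sketch of that idea, not the code currently in pdfbib.pl; find_pdfs is a hypothetical function name.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Recursively collect all regular files ending in .pdf (any letter case)
# below $dir and return their paths in sorted order.
sub find_pdfs {
    my ($dir) = @_;
    my @pdfs;
    find(
        {
            # With no_chdir set, $_ holds the full path, same as $File::Find::name.
            wanted   => sub { push @pdfs, $File::Find::name if -f $_ && /\.pdf\z/i },
            no_chdir => 1,
        },
        $dir,
    );
    my @sorted = sort @pdfs;
    return @sorted;
}

# Usage: perl find_pdfs.pl /path/to/pdfs
if (@ARGV) {
    print "$_\n" for find_pdfs($ARGV[0]);
}
```

This avoids starting an external process for every search, which may also help a little with the speed item.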


The current release is hosted on SourceForge: http://sourceforge.net/project/showfiles.php?group_id=178613


Copyright © 2006 Thomas Berker

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.