PDF to Web Reader (for handheld devices)

Written by: NetworkError, on 08-08-2008 17:29
Last update: 03-04-2009 13:43
Published in: Public, Technical Wootness

My latest project is reading books on my new Blackberry 8330. The books are in PDF format, and they seem to be a bit much for the poor, stupid Blackberry PDF distiller. I tried converting a whole file to HTML with a command line utility called pdftohtml, but the result was a bit large as well.

So I had to come up with a way to read a PDF file one piece at a time. I figured the path of least resistance would be to create a web-based PDF-to-HTML converter that did its work one page at a time.

That brings us to today's project. I created a class (and controller) that wraps the Linux command line utility "pdftohtml". The controller tells the class what "book" (PDF file) to read and which page to read. The class then outputs the HTML for that page. The HTML is mostly generated from "pdftohtml", but I'm going back through the output and re-writing parts of it. I'm also adding pagination, a book selector page, and I'm saving the reader's place in a few cookies.

The command line program is from the nice folks at processtext.com.

I've pasted the code below. Here is a link to the project so you can see it in action. Let's walk through the features.

Let's start with the controller file. It looks for post variables that tell it:

  • What book to read.
  • Go to the next page.
  • Go to the previous page.
  • Go to a specific page.
  • Close the book.

Each of these variables triggers a call to a method in our BookReader object. When the BookReader object has been configured, we call printHTML(). This will output one of the following, depending on what condition the object is in.

  1. Print out the book list if no books are selected.
  2. Print out the selected page of the selected book.
  3. Print out errors if something is amiss.

The BookReader class is fairly straightforward. The methods for setting and getting the book and page all have error checking and store their settings in a cookie. The constructor gets a list of PDF files from the 'books' directory, and the various print methods output our UI.

Since we're running a command line program with user input for parameters, it's really important to scrub the input for potential code injection attacks.
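As a rough illustration of that scrubbing, here is a minimal sketch that builds the pdftohtml command from untrusted input. The buildCommand() helper and its parameters are hypothetical and not part of the class below. It assumes the book has to match a known file name and the page has to be a whole number of 1 or more.

```php
<?php
/**
 * Build a pdftohtml command for one page of one book.
 * buildCommand() is a hypothetical helper used only for illustration.
 *
 * @param array  $books list of allowed file names
 * @param string $book  requested file name (untrusted)
 * @param mixed  $page  requested page number (untrusted)
 *
 * @return string|false the command, or false if the input is rejected
 */
function buildCommand(array $books, $book, $page)
{
    // Only accept a book that is in the list of known files.
    if (!in_array($book, $books, true)) {
        return false;
    }

    // Only accept a whole number of 1 or more.
    $page = filter_var($page, FILTER_VALIDATE_INT, array('options' => array('min_range' => 1)));
    if ($page === false) {
        return false;
    }

    // escapeshellarg() quotes the path so the shell sees it as one argument.
    return 'pdftohtml -stdout -f '.$page.' -l '.$page.' '.escapeshellarg('books/'.$book);
}
```

Anything that is not a known book or a plain page number is rejected before it gets near the shell, so a value like "1; rm -rf /" never becomes part of the command.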

I should probably enhance this by putting the object in the session so I only cache the book list once. (I avoid accessing the drive whenever possible.) The only time I'll clear the cache is when the closeBook() method is called. (That will refresh the book list page.)
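A minimal sketch of that caching idea follows. It is not in the code below. The getBookList() and clearBookList() functions and the 'books' cache key are hypothetical names, and the $cache array stands in for $_SESSION so the example can run on its own.

```php
<?php
/**
 * Return the book list, scanning the directory only when nothing is cached.
 * $cache stands in for $_SESSION; the 'books' key is a hypothetical name.
 *
 * @param array  $cache     cache storage (passed by reference)
 * @param string $directory directory to scan
 *
 * @return array list of file names
 */
function getBookList(array &$cache, $directory)
{
    if (!isset($cache['books'])) {
        // array_diff() drops the "." and ".." entries; array_values() re-indexes.
        $cache['books'] = array_values(array_diff(scandir($directory), array('.', '..')));
    }

    return $cache['books'];
}

/**
 * Forget the cached list so the next call rescans the directory.
 *
 * @param array $cache cache storage (passed by reference)
 *
 * @return void
 */
function clearBookList(array &$cache)
{
    unset($cache['books']);
}
```

With real sessions this might be called as getBookList($_SESSION, 'books') after session_start(), with clearBookList($_SESSION) called from closeBook().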

I could extend this class too. It wouldn't be hard to put an authentication layer on and override the store and fetch book/page methods to use a database. This would allow multiple users to use the system and save their place without stomping each other's settings.
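To give a feel for that, here is a small sketch of per-user storage. The PlaceStore class is hypothetical, and a plain array stands in for the database table so the example runs by itself. A real version would read and write rows keyed by the authenticated user.

```php
<?php
/**
 * Hypothetical per-user storage for a reader's place.
 * An array stands in for the database table here.
 */
class PlaceStore
{
    /**
     * Saved places, keyed by user name.
     */
    protected $rows = array();

    /**
     * Save a user's book and page.
     *
     * @return void
     */
    public function save($user, $book, $page)
    {
        $this->rows[$user] = array('book' => $book, 'page' => (int) $page);
    }

    /**
     * Fetch a user's place (no book, page 1 if nothing is saved).
     *
     * @return array with the keys 'book' and 'page'
     */
    public function fetch($user)
    {
        if (array_key_exists($user, $this->rows)) {
            return $this->rows[$user];
        }

        return array('book' => false, 'page' => 1);
    }
}

// Two readers keep separate places.
$store = new PlaceStore();
$store->save('alice', 'first.pdf', 12);
$store->save('bob', 'second.pdf', 3);
```

A subclass of BookReader could then override saveBook(), savePage(), fetchSavedBook(), and fetchSavedPage() to call a store like this instead of reading and writing cookies.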

Anyway... That's my project. The rest of the functionality is pretty well explained by looking at the doc-blocks below. If you have any comments about this project, please feel free to contact me.

 

<?php

/**
* @file
* This is a simple script to convert a PDF book to HTML one page at a time.
* It also keeps track of your place with a few cookies.
* I wrote this to read books on my blackberry.
*
* @author NetworkError <junk@networkerror.org>
**/

// Instantiate our book reader class.
$book_reader = new BookReader();

/**
* Scan for post variables and set the members of the BookReader class accordingly.
**/
// Set book.
if (array_key_exists('book', $_POST)) {
$book_reader->setBook($_POST['book']);
}

// Set page.
if (array_key_exists('next_page', $_POST)) {
$book_reader->nextPage();
} elseif (array_key_exists('previous_page', $_POST)) {
$book_reader->previousPage();
} elseif (array_key_exists('page', $_POST)) {
$book_reader->setPage($_POST['page']);
}

// Close book.
if (array_key_exists('close_book', $_POST)) {
$book_reader->closeBook();
}

/**
* Print the HTML output from the book reader.
**/
$book_reader->printHTML();


// Print debug.
if (!empty($_GET['debug'])) {
$book_reader->printDebug();
}

/**
* This class handles the business of listing books, displaying pages, and
* tracking your place in a cookie.
**/
class BookReader
{

/**
* An array containing the filenames of all PDF books in the books directory.
**/
protected $books = array();

/**
* Currently selected book.
**/
protected $book = false;

/**
* Currently selected page.
**/
protected $page;
// Default to 1.
protected $default_page = 1;

/**
* How long cookies should last before expiring (in seconds).
**/
protected $cookie_timeout = 31536000; // 1 year.

/**
* Our array of errors.
**/
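protected $errors = array();

/**
* The last conversion command that was run (shown in the debug output).
**/
protected $command = '';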

/**
* Set up the initial variables.
**/
public function __construct()
{
// Get list of books from the ./books directory.
// Drop "." and ".." by name, since they are not always the first two entries.
$this->books = array_diff(scandir('books'), array('.', '..'));

// Set cookie timeout.
$this->cookie_timeout += time();

// Read saved values.
$this->book = $this->fetchSavedBook();
$this->page = $this->fetchSavedPage();
}

/**
* Select a book.
*
* @param $book string book (File name of PDF book in the books directory.)
*
* @return bool (success/fail)
**/
public function setBook($book)
{
if (in_array($book, $this->books)) {
// Set the book.
$this->book = $book;
$this->saveBook($book);
return true;
} else {
$this->logError('Selected book "'.$book.'" is not in the list of available books.');
return false;
}
}

/**
* Save the book. If no book specified, delete saved book.
*
* @param [$book] string book (filename)
*
* @return void
**/
protected function saveBook($book = false)
{
if ($book === false) {
setcookie('book', '', time() - 3600);
} else {
setcookie('book', $book, $this->cookie_timeout);
}
}

/**
* Fetch the saved book.
*
* @return string book (filename) || bool false (if no value found)
**/
protected function fetchSavedBook()
{
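// Only trust a saved book that is still in the list of available books.
if (array_key_exists('book', $_COOKIE) && in_array($_COOKIE['book'], $this->books)) {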
return $_COOKIE['book'];
} else {
return false;
}
}

/**
* Advance the page by 1.
*
* @return void
**/
public function nextPage()
{
$this->setPage($this->page + 1);
}

/**
* Decrease the page by 1.
*
* @return void
**/
public function previousPage()
{
// Don't let them go lower than 1.
if ($this->page > 1) {
$this->setPage($this->page - 1);
}
}

/**
* Set the page number manually.
*
* @param $page int page_number (what page number do you want?)
*
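* @return bool (success/fail)
**/
public function setPage($page)
{
// Accept numeric values of 1 or more, and store the page as an integer.
if (is_numeric($page) && (int) $page >= 1) {
$this->page = (int) $page;
$this->savePage($this->page);
return true;
} else {
$this->logError('Selected page "'.$page.'" is not a valid page number.');
return false;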
}
}

/**
* Save the page number. If none specified, delete saved page number.
*
* @param [$page] int page_number
*
* @return void
**/
protected function savePage($page = false)
{
if ($page === false) {
setcookie('page', '', time() - 3600);
} else {
setcookie('page', $page, $this->cookie_timeout);
}
}

/**
* Fetch the saved page number.
*
* @return int page (1 if none found)
**/
protected function fetchSavedPage()
{
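// Only trust a saved page that is a number of 1 or more.
if (array_key_exists('page', $_COOKIE) && is_numeric($_COOKIE['page']) && (int) $_COOKIE['page'] >= 1) {
return (int) $_COOKIE['page'];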
} else {
return $this->default_page;
}
}

/**
* Close the book. Clear cookies, $this->page, and $this->book.
*
* @return void
**/
public function closeBook()
{
$this->page = 1;
$this->savePage();

$this->book = false;
$this->saveBook();
}

/**
* Output the book list or page, if they have selected a book.
* This doesn't return HTML, it just echos it.
*
* @return void
**/
public function printHTML()
{
if ($this->errors()) {
$this->printErrors();
} else {
// Print HTML
if (!$this->book) {
$this->printBookSelector();
} else {
$this->printPage();
}
}
}

/**
* Output HTML page for book list.
*
* @return void
**/
public function printBookSelector()
{
// Print the book selector.
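// Assumed markup: a minimal form that posts the "book" variable
// the controller looks for.
echo '<html><body>';
echo '<h3>Available books:</h3>';
echo '<form method="post">';
foreach ($this->books as $book) {
$name = htmlspecialchars($book);
echo '<input type="radio" name="book" value="'.$name.'" /> '.$name.'<br />';
}
echo '<br />';
echo '<input type="submit" value="Read" />';
echo '</form>';
echo '</body></html>';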
}

/**
* Output HTML page - book page.
*
* @return void
**/
public function printPage()
{
// Print the book page.
$html = $this->getPage();

if (!$this->errors()) {
// Fix body background color.
$html = str_replace('bgcolor="#A0A0A0"', 'bgcolor="white"', $html);

// Remove color errors.
$html = str_replace('Error : Bad color', '', $html);

// Add page selector
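// Assumed markup: minimal forms that post the variables the controller reads.
$pagination = '<hr />';
$pagination .= '<form method="post">';
$pagination .= '<input type="submit" name="previous_page" value="&lt;"'.(($this->page == 1)?' disabled="true"':'').' /> Page '.$this->page.' ';
$pagination .= '<input type="submit" name="next_page" value="&gt;" />';
$pagination .= '</form>';
$pagination .= '<form method="post">';
$pagination .= 'Goto Page: <input type="text" name="page" size="4" /> ';
$pagination .= '<input type="submit" value="Go" />';
$pagination .= '</form>';
$pagination .= '<form method="post">';
$pagination .= '<input type="submit" name="close_book" value="Close Book" />';
$pagination .= '</form>';
// str_ireplace() matches the closing tag whether it is </BODY> or </body>.
$html = str_ireplace('</body>', $pagination.'</body>', $html);

// Move annoying converter thingy to the footnote.
// The trailing <br> after the credit line is an assumption about the converter output.
$credit = 'Generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html';
$html = str_replace($credit.'<br>', '', $html);
$html = str_ireplace('</body>', '<hr />'.$credit.'<br /></body>', $html);
echo $html;
} else {
$this->printErrors();
}
}

/**
* Print error output.
*
* @return void
**/
public function printErrors()
{
if (is_array($this->errors)) {
foreach ($this->errors as $error) {
// Escape the values, since the arguments can contain user input.
echo '<b>'.htmlspecialchars($error['method']).'</b>'.htmlspecialchars($error['args']).'<br />';
echo 'Details: '.htmlspecialchars($error['details']).'<br /><br />';
}
} else {
echo 'There are no errors.';
}
}

/**
* Print debug output.
*
* @return void
**/
public function printDebug()
{
echo '<pre>';
echo 'POST:'."\n";
var_export($_POST);
echo "\n".'COOKIE:'."\n";
var_export($_COOKIE);
echo "\n".'Command:'."\n";
echo $this->command;
echo '</pre>';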
}

/**
* Are there errors? (bool)
*
* @return bool (true == yes, false == no)
**/
public function errors()
{
if (count($this->errors) > 0) {
return true;
} else {
return false;
}
}

/**
* Get a page worth of HTML (unscrubbed).
*
* @return string HTML
**/
protected function getPage()
{
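// Cast the page to an integer and shell-escape the path so neither can inject extra commands.
$page = (int) $this->page;
$this->command = 'pdftohtml -stdout -f '.$page.' -l '.$page.' '.escapeshellarg('books/'.$this->book);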
$converter = popen($this->command, 'r');
if ($converter !== false) {
$html = '';
while (!feof($converter)) {
$html .= fread($converter, 1024);
}
pclose($converter);
} else {
$html = false;
$this->logError('There was an error opening the "pdftohtml" conversion process.');
}

return $html;
}

/**
* Log an error.
*
* @param $error_message string error_message (The description of the error.)
**/
protected function logError($error_message)
{
$backtrace = debug_backtrace();
$error = array(
'method' => $backtrace[1]['function'],
'args' => '('.implode(', ', $backtrace[1]['args']).')',
'details' => $error_message,
);

$this->errors[] = $error;
}
}

 
