Wednesday, August 15, 2012

Converting .docx to PDF (or .doc to PDF, or .doc to .odt, etc.) with LibreOffice on a web server on the fly using PHP

SkyHi @ Wednesday, August 15, 2012

Ok, so I needed to convert .docx files to .pdf files on the fly, but none of the free PHP libraries that were available let me do it on my own server (a web service was not good enough).
Basically either I needed to pay for a library (and have it maybe suck) or just deal with the free ones that didn't convert the formatting well enough.
Not good enough!
I found that LibreOffice (OpenOffice's successor) can convert files from the command line using its own conversion engine (which DID preserve the formatting like I wanted and generally worked great).
I loaded the latest version of Ubuntu (http://www.ubuntu.com/download/ubuntu/download) into VirtualBox (https://www.virtualbox.org/wiki/Downloads) on my computer and found that I was able to easily convert files from the command line like this:
libreoffice --headless -convert-to pdf fileToConvert.docx -outdir output/path/for/pdf
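When the same command is assembled from PHP, each variable part should be quoted before it reaches the shell. Here is a minimal sketch of that, assuming a hypothetical helper name (buildConvertCommand) and a Unix-like shell. It uses the double-dash spellings --convert-to and --outdir that LibreOffice's documentation lists; the single-dash form above is what I actually ran.

```php
<?php
// Hypothetical helper (not part of the original post): build the LibreOffice
// conversion command with every variable part quoted for the shell.
function buildConvertCommand(string $binary, string $inputFile, string $outDir, string $format = 'pdf'): string
{
    return sprintf(
        '%s --headless --convert-to %s --outdir %s %s',
        escapeshellarg($binary),     // e.g. "libreoffice" or the CDE wrapper
        escapeshellarg($format),     // target format, "pdf" by default
        escapeshellarg($outDir),     // directory that receives the converted file
        escapeshellarg($inputFile)   // document to convert
    );
}
```

The returned string can then be handed to exec(); a filename containing spaces or shell metacharacters stays a single argument instead of being interpreted by the shell.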
I thought: sweet... but I don't have admin rights on my host's web server. I tried a "portable" version of LibreOffice that I obtained from http://portablelinuxapps.org/, but I could not get it to work on my host's web server because the server didn't have all the dependencies it needed (Dependency Hell! http://en.wikipedia.org/wiki/Dependency_hell).
I was at a loss as to how to make it work, until I ran across a cool project called CDE, made by a Ph.D. student at Stanford (Philip J. Guo): http://www.stanford.edu/~pgbovine/cde.html
I will let you look at his explanations of how it works (I followed his video demonstration, starting at about 32:00, as well as the directions on his site). In short, it lets you avoid dependency hell by copying all the files that are used when you run certain commands, recreating the Linux environment where the commands worked. I was able to use this to run LibreOffice without having to resort to someone's portable version of it, and it worked just like it did on Ubuntu with the command above, with one tweak: I needed to run the LibreOffice wrapper that CDE generated.
So, below is my PHP code that calls it. In this code snippet, the name of the file to be converted is passed in as $_POST["filename"]. I copy the file to the same spot inside the CDE package where I originally ran the conversion, convert it, copy the PDF back, and then delete all the files (so that the directory doesn't keep filling up).
I did it this way because I wasn't able to make it work otherwise on the web server. If there is a Linux + web server ninja out there who can figure out how to make it work without doing this, I would be interested to know what you did. Please post a comment or something if you did.
 
<?php
//use only the base name so the posted value cannot point outside the expected directories
$filename = basename($_POST["filename"]);
//name of the converted file: swap a trailing .docx for .pdf
$pdfName = preg_replace('/\.docx$/i', '.pdf', $filename);
//first copy the file to the magic place where we can convert it to a pdf on the fly
copy($filename, "../LibreOffice/cde-package/cde-root/home/robert/Desktop/".$filename);
//change to that directory
chdir('../LibreOffice/cde-package/cde-root/home/robert');
//the magic command that does the conversion (the filename is quoted before it reaches the shell)
$myCommand = "./libreoffice.cde --headless -convert-to pdf ".escapeshellarg("Desktop/".$filename)." -outdir Desktop/";
exec($myCommand);
//copy the file back
copy("Desktop/".$pdfName, "../../../../../documents/".$pdfName);
//delete all the files out of the magic place where we can convert it to a pdf on the fly
//my files that I generated all happened to start with a number.
$pattern = '/^[0-9]/';
foreach (scandir('Desktop') as $value)
{
    if (preg_match($pattern, $value) === 1)
    {
        unlink("Desktop/".$value);
    }
}
//changing the header to the location of the file makes it work well on androids
header('Location: '.$pdfName);
?>
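Two things are worth checking around a conversion like this: that the posted name really is a plain .docx filename, and that the conversion command actually succeeded before the result is copied back. Below is a minimal sketch of both checks. The helper names (pdfNameFor, runChecked) and the allowed-character rule are assumptions for illustration, not part of the working example.

```php
<?php
// Hypothetical helper: map a requested name to its PDF name, or return null
// if it is not a plain .docx filename (no directories, no odd characters).
function pdfNameFor(string $requested): ?string
{
    if (basename($requested) !== $requested) {
        return null;                                  // carried a path component
    }
    if (!preg_match('/\A[A-Za-z0-9._-]+\.docx\z/i', $requested)) {
        return null;                                  // not a simple .docx name
    }
    return substr($requested, 0, -5) . '.pdf';        // drop ".docx", add ".pdf"
}

// Hypothetical helper: run a command, collect its output (stderr included),
// and report whether it exited with status 0.
function runChecked(string $command, array &$output = []): bool
{
    exec($command . ' 2>&1', $output, $exitCode);
    return $exitCode === 0;
}
```

With these, the script can stop early when pdfNameFor() returns null, and skip the copy-back and redirect when runChecked() returns false, logging $output instead.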

And here is the tar.gz file I generated with CDE. See below for a working example and complete, documented code.
Success! I made a truly portable version of LibreOffice that can convert files on the fly on a webserver using 100% free, open source software!
Note: CDE only packages the files that were used by the commands I ran under it, and I only converted a .docx to a .pdf, so my tar.gz file above will probably only work for that conversion. To get it to do other things, you will have to do them with CDE first.
*****************************************************************************
UPDATE: since several people have had questions on how to get it working or had issues making it work, I am putting a complete working example out there for you to play with and modify.
Click here for working example.

And here is the tar.gz of the working example, tied up in a nice bow for you. To make sure the permissions don't get screwed up, I recommend uploading the tar.gz file to your server and then unpacking it there.
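If the conversion silently does nothing after unpacking, lost execute permissions are a likely cause. A small sketch for checking that from PHP follows; the helper name notExecutable is made up for illustration, and the wrapper path in the comment is the one from my snippet above.

```php
<?php
// Hypothetical helper: return the entries of $paths that are missing
// or do not have execute permission for the current (web server) user.
function notExecutable(array $paths): array
{
    $problems = [];
    foreach ($paths as $path) {
        clearstatcache(true, $path);      // avoid a stale cached result after chmod
        if (!is_executable($path)) {
            $problems[] = $path;
        }
    }
    return $problems;
}

// Usage (path relative to the calling script, as in the post):
//   $bad = notExecutable(['../LibreOffice/cde-package/cde-root/home/robert/libreoffice.cde']);
//   foreach ($bad as $path) { error_log("Not executable or missing: $path"); }
```

An empty result means every listed file can be run by the user PHP executes as, which is the user that matters on a shared host.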

This is my way of giving back to all the great people out there that have helped me out by doing these kinds of things for me. Pay it forward, guys! [licensed under the MIT license.]