Can OpenOffice convert a docx XML file into HTML?
I have a web application that converts Microsoft Word documents to HTML
using LibreOffice. Everything has been working great, but I am having some
issues with its ability to interpret indentation in certain files.
Using suggestions from other kind developers, I have discovered how to
pull apart the docx file by converting it to a zip, unzipping it, and then
extracting the document.xml file. In the process, I have noticed that
LibreOffice is consistently incapable of interpreting any <w:tab> tags and
rendering them as indentations.
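
For anyone who wants to reproduce the inspection step, here is a minimal sketch in Python (not the PHP/bash setup described above). It assumes the standard .docx layout, where the main body lives at word/document.xml inside the zip archive. The function names read_document_xml and count_run_tabs are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w: prefix in document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_document_xml(docx_path):
    """Return the raw bytes of word/document.xml from a .docx archive."""
    # A .docx is already a zip archive, so no renaming is needed to open it.
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read("word/document.xml")


def count_run_tabs(document_xml):
    """Count <w:tab> elements that are direct children of a run (<w:r>).

    Tab-stop definitions inside <w:tabs> share the same element name,
    so only tabs found directly inside runs are counted here.
    """
    root = ET.fromstring(document_xml)
    total = 0
    for run in root.iter(f"{{{W_NS}}}r"):
        total += len(run.findall(f"{{{W_NS}}}tab"))
    return total
```

Calling count_run_tabs(read_document_xml("example.docx")) on a file (the file name is a placeholder) gives a quick check of how many tab characters a document actually contains before it goes to LibreOffice.
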
I have tried many things to fix this, but I am running out of ideas. My
last-ditch effort was going to be to use PHP to programmatically replace
all of the <w:tab> tags with <w:ind> tags (which LibreOffice successfully
interprets as indentation). However, as soon as I convert the file to .zip
with bash, I am unable to seal it back up as a docx. I mean, I can do it, but
LibreOffice no longer recognizes it and throws a strange error at me.
Is there any way I can get an HTML rendering from just the extracted
document.xml file? If not, does anybody know how to seal these documents
back up again? Any help is much appreciated. Thanks!
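
On the resealing step, one possible cause of the error (an assumption, since the exact error is not shown here) is that the new archive wraps everything in an extra top-level folder. A .docx package expects [Content_Types].xml and the word/ directory at the root of the zip. The sketch below, in Python rather than bash, avoids the unzip/rezip round trip by copying every entry from the original file under its original name and swapping in only the edited word/document.xml. The function name replace_document_xml is hypothetical.

```python
import zipfile


def replace_document_xml(src_docx, dst_docx, new_document_xml):
    """Copy src_docx to dst_docx, swapping in new word/document.xml bytes."""
    with zipfile.ZipFile(src_docx) as src, \
            zipfile.ZipFile(dst_docx, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                data = new_document_xml
            # Reuse the original entry name so every part keeps the same
            # path relative to the archive root.
            dst.writestr(item.filename, data)
```

The edited XML still has to be valid WordprocessingML for the result to open. This sketch only handles the packaging, not the <w:tab> to <w:ind> rewrite itself.
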