Formatting XML Files with xmllint

I am currently working on a project where I need to parse Open XML files coming from the Office suite. As you may already know, this involves opening what is essentially a zip file full of XML files.

When writing a parser, though, having source files to refer to is very useful, and the files stored in these zip archives are unformatted, i.e. they have no newlines or indentation, which makes navigating them very painful.

After opening a file, I usually ran the vim command %!xmllint --format %, which runs the file through xmllint, formats it, and then replaces the buffer's contents with the formatted version of the file.

This worked well. However, I kept opening new files, and kept forgetting to save after formatting them… So that command got run a lot!

First try to solve it

I decided to just format all the XML files in one go after unpacking the zip. I thought this would be as easy as:

find . -name "*.xml*" -exec xmllint --format {} \;

However, this just writes each formatted file to stdout, and xmllint has no dedicated flag for formatting a file in place.

Second try

Well, maybe we could just redirect the output back into the file.

find . -name "*.xml*" -exec xmllint --format {} > {} \;

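But, alas, this did not work. The command now appeared to do nothing. At first I suspected that you cannot reference the filename multiple times (turns out, you can…). The real culprit is the redirection: the shell handles > {} itself before find even starts, so find never sees it, and all the output ends up in a single file literally named {} in the current directory.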
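One way to check what the shell does with that redirection is to reproduce the command in a scratch directory. This is only a sketch: echo stands in for xmllint (an assumption, so it runs without xmllint installed), and the file name one.xml is made up for illustration.

```shell
# Scratch directory with one unformatted file (names are hypothetical).
demo=$(mktemp -d)
printf '<a><b/></a>' > "$demo/one.xml"

# Same shape as the second try, with echo standing in for xmllint.
# The shell opens a file literally named {} for stdout before find starts.
(cd "$demo" && find . -name "*.xml" -exec echo formatted {} > {} \;)

# one.xml is untouched; a stray file called {} now holds the output.
ls "$demo"
cat "$demo/{}"
```

Nothing is printed by find itself, which is why the command looks like it did nothing, while the original file stays exactly as it was.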

Third try

Ok, so I created a script to wrap the command in instead.

#!/bin/bash
xmllint --format "$1" > "$1"

and calling that

find . -name "*.xml*" -exec format.sh {} \;

But now xmllint gives this error for every file:

parser error : Document is empty

So we are destroying the file before xmllint gets to parse it: the shell truncates the redirection target before it even starts xmllint.
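The same effect can be reproduced without xmllint. In this sketch, sed stands in for xmllint (an assumption, made so the example runs anywhere), reading from and redirecting into the same temporary file:

```shell
# Temporary file with some content (sed stands in for xmllint here).
f=$(mktemp)
printf '<a><b/></a>\n' > "$f"

# The shell truncates "$f" for the redirection before sed starts,
# so sed reads an empty file and writes nothing back.
sed 's/a/A/' "$f" > "$f"

# Print the resulting size in bytes; the content is gone.
wc -c < "$f"
```

Any command that reads a file while its output is redirected into that same file runs into this, which is why writing to a second file first is the usual way out.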

Final try

Instead of finding an elegant way to do this, I thought: Well, if I were doing this by hand, I would just format to a new file, then remove the original file, and rename the new file to the old name…

So that’s what I did:

#!/bin/bash
# Quote "$1" so that paths containing spaces survive
xmllint --format "$1" > "$1.tmp"
rm "$1"
mv "$1.tmp" "$1"

I then extended it a bit, and placed it in my scripts folder:

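#!/bin/bash
if [[ $1 == "all" ]]
then
  # Re-run this script once for every matching file
  find . -name "*.xml*" -exec "$0" {} \;
else
  # Format to a temporary file, then swap it in for the original
  xmllint --format "$1" > "$1.tmp"
  rm "$1"
  mv "$1.tmp" "$1"
fi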

and now I can just be in an unzipped folder of XML files and run

formatxml all
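One weakness of the rm/mv approach is that a file xmllint fails to parse still gets replaced, this time by an empty temporary file. A slightly more defensive variant could look like the sketch below. It is not the script I actually use: format_one is a made-up helper name, and it assumes mktemp is available.

```shell
#!/bin/bash
# Hypothetical, more defensive variant of the formatting step (a sketch).
# format_one is a made-up helper name, not part of xmllint.
format_one() {
  fo_file=$1
  # Temporary file outside the folder being formatted.
  fo_tmp=$(mktemp) || return 1
  # Only overwrite the original if xmllint exited successfully.
  if xmllint --format "$fo_file" > "$fo_tmp"; then
    cat "$fo_tmp" > "$fo_file"
    fo_status=0
  else
    fo_status=1
  fi
  rm -f "$fo_tmp"
  return "$fo_status"
}
```

Calling format_one on a path such as word/document.xml (just an example path) would then leave the file untouched if xmllint reports a parse error, and copying the result back with cat keeps the original file's permissions.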