UNIX shell scripting: extract the value of XML-tagged data from a text file

UNIX shell scripts are often used for interface data file manipulation tasks (basic validation, renaming, archiving, etc.). This is fine when the attributes being manipulated are, say, in the file name. But what if the script needs to grab a value from inside the data file, and that file is in XML format?

For example, suppose the XML data file contains the following line:

<FileSequenceNum >0000015</FileSequenceNum>

…and the value of FileSequenceNum is required by the shell script. There must be loads of ways to do this, but here are three examples:


VALUE=`sed -ne "/$TAG/s/[^0-9]*\([0-9]*\)..*/\1/p" "$FILE"`


VALUE=`grep -w "$TAG" "$FILE" | awk -F'>' '{ print $2 }' | awk -F'<' '{ print $1 }'`


TAG1="FileSequenceNum "
TAG2="FileSequenceNum"

VALUE=`grep -w "$TAG2" dt_test.xml | sed -e 's/^[ \t]*//' | sed "s/<$TAG1>\(.*\)<\/$TAG2.*/\1/"`

echo "Value is: $VALUE"

In the last example, two tag variables ($TAG1 and $TAG2) are used because the data file contains a trailing space in the opening XML tag name, but no space in the closing tag name:

<FileSequenceNum >0000015</FileSequenceNum>
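
If the spacing inside the tags cannot be relied on, one possible alternative is to keep a single tag name and let the pattern accept optional spaces before the closing ">". The following is only a minimal sketch under a few assumptions: the sample data is written to a temporary file purely so the snippet runs on its own, the surrounding <Header> element is made up for illustration, and the element is assumed to sit on a single line, as in the example above.

```shell
#!/bin/sh
# Hypothetical sample data, written to a temporary file so the sketch is self-contained.
FILE=$(mktemp)
printf '%s\n' '<Header>' '  <FileSequenceNum >0000015</FileSequenceNum>' '</Header>' > "$FILE"

TAG="FileSequenceNum"

# " *" allows zero or more spaces between the tag name and ">",
# so the same TAG value matches both the opening and the closing tag.
# -n together with the trailing "p" prints only the line where the substitution succeeded.
VALUE=$(sed -n "s/.*<$TAG *>\(.*\)<\/$TAG *>.*/\1/p" "$FILE")

rm -f "$FILE"
echo "Value is: $VALUE"
```

Because the leading ".*" absorbs any indentation, the separate sed step that strips leading whitespace is not needed in this variant.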