UNIX shell scripting: extract the value of XML-tagged data from a text file

UNIX shell scripts are often used for interface data file manipulation tasks (basic validation, renaming, archiving, etc.). This works well when the attributes being manipulated are, say, part of the file name. But what if the script needs to grab a value from inside the data file, and that file is in XML format?

For example, the XML data file contains the following string:

<FileSequenceNum >0000015</FileSequenceNum>

…and the value of FileSequenceNum is required by the shell script. There must be loads of ways to do this, but here are three examples:


VALUE=$(sed -ne "/$TAG/s/[^0-9]*\([0-9]*\)..*/\1/p" "$FILE")
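A self-contained way to try the sed one-liner, assuming a throwaway sample file created with mktemp (the surrounding Header element is made up for illustration). The comments break the regular expression into its parts. Because it captures the first run of digits on the matching line, it suits a numeric value like this sequence number but not arbitrary text.

```shell
#!/bin/sh
# Sketch only: the temporary file and its contents are assumptions.
FILE=$(mktemp)
printf '%s\n' '<Header>' '<FileSequenceNum >0000015</FileSequenceNum>' '</Header>' > "$FILE"
TAG=FileSequenceNum

# /$TAG/        act only on lines that mention the tag
# [^0-9]*       skip everything before the first digit (the opening tag)
# \([0-9]*\)    capture the run of digits
# ..*           swallow the rest of the line (the closing tag)
# -n with p     print only lines where the substitution was made
VALUE=$(sed -ne "/$TAG/s/[^0-9]*\([0-9]*\)..*/\1/p" "$FILE")
echo "Value is: $VALUE"
rm -f "$FILE"
```

Running it should print `Value is: 0000015`.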


VALUE=$(grep -w "$TAG" "$FILE" | awk -F'>' '{ print $2 }' | awk -F'<' '{ print $1 }')
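The grep/awk pipeline can be split into stages so each intermediate value can be inspected; the temporary file is again an assumption. The last assignment shows an alternative that is not part of the original: a single awk call whose field separator is the bracket expression `[<>]`, which leaves the value in field 3.

```shell
#!/bin/sh
# Sketch only: the temporary file is an assumption.
FILE=$(mktemp)
printf '%s\n' '<FileSequenceNum >0000015</FileSequenceNum>' > "$FILE"
TAG=FileSequenceNum

# Stage 1: grep -w keeps lines containing TAG as a whole word
LINE=$(grep -w "$TAG" "$FILE")
# Stage 2: split on '>'; field 2 is "0000015</FileSequenceNum"
AFTER_OPEN=$(printf '%s\n' "$LINE" | awk -F'>' '{ print $2 }')
# Stage 3: split on '<'; field 1 is "0000015"
VALUE=$(printf '%s\n' "$AFTER_OPEN" | awk -F'<' '{ print $1 }')

# Alternative: split on either '<' or '>' in one pass.
# Fields: "", "FileSequenceNum ", "0000015", "/FileSequenceNum", ""
VALUE2=$(grep -w "$TAG" "$FILE" | awk -F'[<>]' '{ print $3 }')

echo "$AFTER_OPEN"
echo "Value is: $VALUE (single awk: $VALUE2)"
rm -f "$FILE"
```

It should print `0000015</FileSequenceNum` followed by `Value is: 0000015 (single awk: 0000015)`.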


TAG1="FileSequenceNum "
TAG2="FileSequenceNum"

VALUE=$(grep -w "$TAG2" dt_test.xml | sed -e 's/^[[:space:]]*//' | sed "s/<$TAG1>\(.*\)<\/$TAG2.*/\1/")

echo "Value is: $VALUE"

In the last example, two tag variables (TAG1 and TAG2) are used because the opening tag in the data file has a trailing space after the tag name, but the closing tag does not:

<FileSequenceNum >0000015</FileSequenceNum>
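A runnable sketch of the last approach, assuming a temporary file whose element is indented (the indentation is an assumption, added to show why the leading-whitespace sed is in the pipeline: `s///` replaces only the matched part of the line, so anything in front of the `<` survives). As an alternative not taken from the original, it also shows a single-variable pattern where `[[:space:]]*` allows optional whitespace before the `>` of either tag.

```shell
#!/bin/sh
# Sketch only: the temporary file and its indented content are assumptions.
FILE=$(mktemp)
printf '    <FileSequenceNum >0000015</FileSequenceNum>\n' > "$FILE"

TAG1="FileSequenceNum "   # opening tag name, including the trailing space
TAG2="FileSequenceNum"    # closing tag name, no space

# Without the whitespace strip, s/// replaces only the matched part,
# so the indentation in front of '<' is left in the result.
RAW=$(grep -w "$TAG2" "$FILE" | sed "s/<$TAG1>\(.*\)<\/$TAG2.*/\1/")

# With the strip, only the value remains.
VALUE=$(grep -w "$TAG2" "$FILE" | sed -e 's/^[[:space:]]*//' | sed "s/<$TAG1>\(.*\)<\/$TAG2.*/\1/")

# Alternative (not from the original): one tag variable; the leading .*
# consumes any indentation and [[:space:]]* tolerates a space before '>'.
# Assumes the element is on one line, has no attributes, and the value has no '<'.
TAG=FileSequenceNum
VALUE2=$(sed -n "s/.*<$TAG[[:space:]]*>\([^<]*\)<\/$TAG[[:space:]]*>.*/\1/p" "$FILE")

printf 'raw=[%s] stripped=[%s] single-tag=[%s]\n' "$RAW" "$VALUE" "$VALUE2"
rm -f "$FILE"
```

It should print `raw=[    0000015] stripped=[0000015] single-tag=[0000015]`. The single-tag form avoids keeping two nearly identical variables in sync, at the cost of a slightly longer regular expression.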
