Friday, August 31, 2012

Awk script that checks for duplicate values in an XML file

I wanted to test whether a key combination of 3 field values appeared multiple times in an XML file. A colleague of mine gave me a head start on text processing with Awk. OS X ships with the BSD version of awk, so this came in quite handy.

The input XML:

<ehbo>
    <Order>
        <SiebelOrderID>10</SiebelOrderID>
        <SiebelOrderNumber>20</SiebelOrderNumber>
        <CordysOrderNumber>30</CordysOrderNumber>
    </Order>
    <Order>
        <SiebelOrderID>10</SiebelOrderID>
        <SiebelOrderNumber>20</SiebelOrderNumber>
        <CordysOrderNumber>30</CordysOrderNumber>
    </Order>
</ehbo>

The Awk script:

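BEGIN {
    print "START";
    print "";
}

# Each pattern below is case-sensitive and fires on the line holding that
# element, e.g. <SiebelOrderID>10</SiebelOrderID>. Splitting on ">" and then
# on "<" isolates the text between the opening and closing tag.
/SiebelOrderID/ {
    split($0, a, ">");
    split(a[2], b, "<");
    siebelOrderID = b[1];
    #print "a" siebelOrderID;
}
/SiebelOrderNumber/ {
    split($0, c, ">");
    split(c[2], d, "<");
    SiebelOrderNumber = d[1];
    #print "b" SiebelOrderNumber;
}
/CordysOrderNumber/ {
    split($0, e, ">");
    split(e[2], f, "<");
    CordysOrderNumber = f[1];
    #print "c" CordysOrderNumber;
}

# On the closing </Order> tag: build the combined key, count it,
# and reset the fields for the next order.
/\/Order/ {
    plep = siebelOrderID "_" SiebelOrderNumber "_" CordysOrderNumber;
    lijst[plep]++;
    #print "d" plep;

    siebelOrderID = "";
    SiebelOrderNumber = "";
    CordysOrderNumber = "";
}

END {
    # Print every key that was counted more than once, with its count.
    for (i in lijst) {
        if (lijst[i] > 1) {
            print i, lijst[i];
        }
    }

    print "";
    print "FINISHED!";
}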
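
To see what the two split() calls in the script do in isolation, here is a minimal sketch that feeds a single line to awk. The sample line is made up for illustration and assumes the element is written as <SiebelOrderID>, the casing the script's pattern matches on.

```shell
# Hypothetical sample line, indented the way it would be inside an <Order> element.
line='        <SiebelOrderID>10</SiebelOrderID>'

# First split on ">":  a[1] = "        <SiebelOrderID", a[2] = "10</SiebelOrderID", a[3] = ""
# Second split on "<": b[1] = "10",                     b[2] = "/SiebelOrderID"
value=$(printf '%s\n' "$line" | awk '{ split($0, a, ">"); split(a[2], b, "<"); print b[1] }')

echo "$value"
```

This only works because each element sits on its own line with exactly one value between the tags, which is the layout of the input shown above.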


To make life easier, I put the script in a file called "parser.awk" and then used the following command in Terminal:
awk -f parser.awk input.xml

This prints each field combination that occurs more than once, followed by the number of times it occurs.
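
The same check can probably be written more compactly by letting awk split each line on the angle brackets itself. The following is only a sketch of that alternative, not the script I actually used: the temporary sample file, the variable names (id, num, cor, seen) and the third order with different values are assumptions for illustration, and the element names are assumed to be cased the way the script's patterns expect.

```shell
# Hypothetical sample file: two identical orders and one that differs.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<ehbo>
    <Order>
        <SiebelOrderID>10</SiebelOrderID>
        <SiebelOrderNumber>20</SiebelOrderNumber>
        <CordysOrderNumber>30</CordysOrderNumber>
    </Order>
    <Order>
        <SiebelOrderID>10</SiebelOrderID>
        <SiebelOrderNumber>20</SiebelOrderNumber>
        <CordysOrderNumber>30</CordysOrderNumber>
    </Order>
    <Order>
        <SiebelOrderID>11</SiebelOrderID>
        <SiebelOrderNumber>21</SiebelOrderNumber>
        <CordysOrderNumber>31</CordysOrderNumber>
    </Order>
</ehbo>
EOF

# With "<" and ">" as field separators, $2 is the tag name and $3 the value.
duplicates=$(awk -F'[<>]' '
    $2 == "SiebelOrderID"     { id  = $3 }
    $2 == "SiebelOrderNumber" { num = $3 }
    $2 == "CordysOrderNumber" { cor = $3 }
    # Closing tag of an order: count the combined key and reset the fields.
    $2 == "/Order"            { seen[id "_" num "_" cor]++; id = num = cor = "" }
    END { for (k in seen) if (seen[k] > 1) print k, seen[k] }
' "$sample")

rm -f "$sample"
echo "$duplicates"
```

Comparing $2 against the exact tag name avoids accidental matches that a plain regular expression on the whole line could produce, at the cost of assuming one element per line.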
