Modify XML with XPath

Written by Jim Ownby

Today, we’re going to discuss how to modify XML with XPath or XSL templates. How do you accomplish this? Let’s dive in.

How Do I Modify XML with XPath or XSL Templates?

With XPath, you can change the query that XSLT uses to select nodes, which in turn changes the resulting output. With XSL templates, in combination with XPath, you get alternative, reusable formatting for that output. Let’s take a few steps back and first review what XPath and XSL are.


What Are XPath And XSL

In the world of Information Technology, technical terminology tends to be specific, but there are a few exceptions. A megabyte of RAM is a measurement of memory while a megabyte of disk space is a measurement of storage. Same term, different use.

XSL, as a term, has this kind of grey area, too. Extensible Stylesheet Language (XSL) is usually referred to as a family of languages used to transform (or manipulate) and render XML documents.

XPath and XSL Transformation (XSLT) are separate language specifications within that family and they have very different uses.

XPath is a query language for XML documents, much like SQL is for a relational database such as Oracle or SQL Server, or jQuery selectors are for HTML. It allows a consumer (like a program) or a provider (like a service) to navigate an XML document and select specific sections or values that match the query parameters.

XSLT is a language for transforming XML into other formats, such as XML, HTML, plain text, etc. It uses XPath and specific pattern-matching definitions to identify and format input into a defined output. Most modern browsers have an XSLT processing engine built in to handle the display of XML documents.

  • XSL Templates are modular constructs used within XSLT to encapsulate formatting logic.
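As a rough illustration of the browser-side processing mentioned above, an XML document can point a browser at a stylesheet with an xml-stylesheet processing instruction. This is only a sketch: the file name phonelist.xsl is a hypothetical placeholder, and the element names are borrowed from the directory example later in this article.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical file name; href points at the XSLT file the browser should apply -->
<?xml-stylesheet type="text/xsl" href="phonelist.xsl"?>
<PhoneList>
  <Region>
    <PhoneNumber Name="Main" Value="+1 (222) 555-1212" Primary="English (United States)" Secondary="" />
  </Region>
</PhoneList>
```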

Situation

Here’s where I ran into an issue with how a phone directory was showing up in a browser.

It was simply displaying a single list of phone numbers, but what was needed was a different format, where some phone numbers would display for some people and not others.


The web page in question was a default page provided by an enterprise application, so there was no real access to how the page was generated on the web server. However, the web server was using an XSLT file to process the directory.

A quick, minor change to the file confirmed that this was how the list was formatted.

The original XSLT looked like this:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/PhoneList">
    <xsl:for-each select="Region/PhoneNumber">

      <tr>
        <td class="region">
          <xsl:value-of select="@Name"/>
        </td>
        <td class="number">
          <xsl:value-of select="@Value"/>
        </td>
        <td class="language">
          <span class="defaultLanguage">
            <xsl:value-of select="@Primary"/>
          </span>
          <xsl:value-of select="@Secondary"/>
        </td>
      </tr>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

So, at first glance, this looks like straightforward XML, with two interesting distinctions:

  • Most of the element names carry an xsl: prefix.

  • The middle section looks like the HTML tags for table rows (tr) and table data cells (td).

The xsl: prefix marks the elements that the XSLT processor acts on. That prefix, together with the specific instructions defined under it, lets the processor tell the template structure apart from the output format.

In this case, the middle section is the output format of HTML and, specifically, fills a simple table with the phone number details.

While the exact input XML source is somewhat unknown, this is a basic breakdown of what its structure looks like:

<PhoneList>
    <Region>
        <PhoneNumber Name="Main" Value="+1 (222) 555-1212" Primary="English (United States)" Secondary="" />
    </Region>
</PhoneList>

The output looks like this:

<table class="numbers" cellspacing="0" cellpadding="0">
        <tbody><tr><th class="attribute" id="txtPhoneFormRegion">Region</th><th class="attribute" id="txtPhoneFormNumber">Number</th><th class="attribute" id="txtPhoneFormLangs">Available Languages</th></tr>
        <!--?xml version="1.0" encoding="utf-8"?-->
<tr>
  <td class="region">Main</td>
  <td class="number">+1 (222) 555-1212</td>
  <td class="language">
    <span class="defaultLanguage">English (United States)</span></td>
</tr>
    </tbody></table>

When properly tied together, the output looks decent. But what if the input XML is a bit more involved, like this:

<PhoneList>
    <Region>
        <PhoneNumber Name="Main" Value="+1 (222) 555-1212" Primary="English (United States)" Secondary="" />
        <PhoneNumber Name="Main.VIP" Value="+1 (222) 555-1213" Primary="English (United States)" Secondary="" />
    </Region>
    <Region>
        <PhoneNumber Name="Main (Europe)" Value="+49 (0101) 056789" Primary="Deutsch (Germany)" Secondary="" />
        <PhoneNumber Name="Main.VIP  (Europe)" Value="+49 (0101) 056790" Primary="Deutsch (Germany)" Secondary="" />
    </Region>
    <Region>
        <PhoneNumber Name="Main (Asia)" Value="+61 3 1234 5678" Primary="English (Australia)" Secondary="" />
        <PhoneNumber Name="Main.VIP  (Asia)" Value="+61 3 1234 5680" Primary="English (Australia)" Secondary="" />
    </Region>
</PhoneList>

In this case, I needed only the “VIP” entries to show if the input XML had them listed. If the “VIP” entries weren’t there, the list should be formatted normally. Based on these criteria, we have two basic options: XPath or XSL Templates.


Option 1: XPath

The XPath “option” simply means relying on XPath queries within a single XSL template (the .xsl file itself, or the “main” template). This requires a broad change to the main template but is arguably lower in overall complexity. We start by inserting an <xsl:choose> node into the <xsl:template> node. This is the start of the XSL version of an if statement. Instead of:

if...
    then...
else...

XSL uses:

choose...
    when...
otherwise...
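As a minimal sketch of that pattern in real XSLT 1.0 syntax, the fragment below prints a phone number's Secondary attribute when it has a value and a fallback otherwise. It assumes it sits inside a template whose context node is a PhoneNumber; the fallback text "None" is purely illustrative and not part of the original stylesheet.

```xml
<xsl:choose>
  <!-- The first xsl:when whose test evaluates to true is the one that runs -->
  <xsl:when test="@Secondary != ''">
    <xsl:value-of select="@Secondary"/>
  </xsl:when>
  <!-- xsl:otherwise is the optional fallback and takes no test attribute -->
  <xsl:otherwise>
    <xsl:text>None</xsl:text>
  </xsl:otherwise>
</xsl:choose>
```

Unlike a typical if/else, there is no separate "else if": additional conditions are simply more <xsl:when> nodes inside the same <xsl:choose>.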

We’ll use the <xsl:when test=""> node to inspect the value of the Name attribute to determine if the condition we are looking for matches up. Here is where XPath comes into it: we need to know, before any of the data gets processed, if our condition matches. XPath gives us that with the following:

//PhoneNumber[contains(@Name, 'VIP')]

Let’s break this down:

  • “//”: This is XPath shorthand for “select the following nodes at any depth below the current point in the XML hierarchy.” At the start of an expression, that means anywhere in the document.
  • “//PhoneNumber”: So, now we are asking for all PhoneNumber nodes (it’s case-sensitive) in the input XML document.
  • “//PhoneNumber[...]”: The square brackets hold a predicate, a condition that filters the selected nodes so that only those for which it is true are kept.
  • “//PhoneNumber[contains(@Name, 'VIP')]”: contains() is an XPath function that returns a boolean indicating whether the first string includes the second. The @ symbol is shorthand for an attribute of the node being tested, so @Name is the PhoneNumber’s Name attribute. Again, this is all case-sensitive.
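To see the breakdown above in action, the following fragment (a sketch meant to be dropped inside any template, not part of the original stylesheet) uses the standard XPath count() function to report how many nodes each expression selects. The counts in the comments assume the three-region input document shown earlier.

```xml
<!-- Every PhoneNumber whose Name contains "VIP": 3 for the three-region input -->
<xsl:value-of select="count(//PhoneNumber[contains(@Name, 'VIP')])"/>

<!-- Matching is case-sensitive, so lowercase 'vip' selects nothing: 0 -->
<xsl:value-of select="count(//PhoneNumber[contains(@Name, 'vip')])"/>
```

When a node-set like this is used directly as a test, an empty set counts as false and a non-empty set counts as true, which is why the expression can serve as a condition without an explicit count().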

When we put it all together, it looks like this:

<xsl:when test="//PhoneNumber[contains(@Name, 'VIP')]">
          ...
</xsl:when>

When we combine this new logic with the original template, we have the following:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/PhoneList">
    <xsl:choose>
      <xsl:when test="//PhoneNumber[contains(@Name, 'VIP')]">
            <xsl:for-each select="Region/PhoneNumber">
                <!-- The context node is already a PhoneNumber, so test its Name attribute directly -->
                <xsl:if test="contains(@Name, 'VIP')">
                  <tr>
                    <td class="region">
                      <xsl:value-of select="@Name"/>
                    </td>
                    <td class="number">
                      <xsl:value-of select="@Value"/>
                    </td>
                    <td class="language">
                      <span class="defaultLanguage">
                        <xsl:value-of select="@Primary"/>
                      </span>
                      <xsl:value-of select="@Secondary"/>
                    </td>
                  </tr>
                </xsl:if>
            </xsl:for-each>
       </xsl:when>
       <xsl:otherwise>
            <xsl:for-each select="Region/PhoneNumber">

              <tr>
                <td class="region">
                  <xsl:value-of select="@Name"/>
                </td>
                <td class="number">
                  <xsl:value-of select="@Value"/>
                </td>
                <td class="language">
                  <span class="defaultLanguage">
                    <xsl:value-of select="@Primary"/>
                  </span>
                  <xsl:value-of select="@Secondary"/>
                </td>
              </tr>
            </xsl:for-each>
        </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

So we placed the original template inside of the <xsl:when> and then repeated it inside the <xsl:otherwise>. We also had to use an <xsl:if> inside the first template occurrence, since it would otherwise have printed every PhoneNumber node returned by the <xsl:for-each>. As far as the first option goes, this does work, but let’s look at the alternative option.
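One possible variation on this option, shown here only as a sketch, moves the filter into the select of the <xsl:for-each> so that the loop only ever sees the VIP entries and the inner <xsl:if> is no longer needed. The language cell is left out for brevity.

```xml
<xsl:when test="//PhoneNumber[contains(@Name, 'VIP')]">
  <!-- The predicate filters the node-set before the loop runs,
       so the loop body needs no xsl:if -->
  <xsl:for-each select="Region/PhoneNumber[contains(@Name, 'VIP')]">
    <tr>
      <td class="region"><xsl:value-of select="@Name"/></td>
      <td class="number"><xsl:value-of select="@Value"/></td>
    </tr>
  </xsl:for-each>
</xsl:when>
```

This keeps everything in a single template but still duplicates the row markup between the <xsl:when> and <xsl:otherwise> branches.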

Option 2: XSL Template

The XSL Template “option,” in this case, means we will actually be using two XSL templates: the main one and a referenced one. We still need to use the choose...when...otherwise pattern, like before, but we will attempt to establish a bit more organization:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/PhoneList">
    <xsl:choose>
      <xsl:when test="//PhoneNumber[contains(@Name, 'VIP')]">
          <xsl:apply-templates select="//PhoneNumber[contains(@Name, 'VIP')]" />
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="//PhoneNumber" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="PhoneNumber">
    <tr>
      <td class="region">
        <xsl:value-of select="@Name"/>
      </td>
      <td class="number">
        <xsl:value-of select="@Value"/>
      </td>
      <td class="language">
        <span class="defaultLanguage">
          <xsl:value-of select="@Primary"/>
        </span>
        <xsl:value-of select="@Secondary"/>
      </td>
    </tr>
  </xsl:template>
</xsl:stylesheet>

So, we are using two <xsl:template> sections to define our output instead of one. This gives us more flexibility down the road, in case the criteria become more diverse. And, instead of using an <xsl:if> node, the select attribute of <xsl:apply-templates> gives us the same functionality by filtering the nodes before the referenced template is applied.

The basic layout of the original template is virtually untouched in the second template section but the complexity of the overall file layout has increased slightly.
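As one hypothetical example of that flexibility, a third template could give VIP entries their own markup without touching the other two. The sketch below would sit alongside the existing match="PhoneNumber" template; the "vip" class name is an assumption made up for illustration.

```xml
<!-- Hypothetical addition: a more specific template for VIP entries.
     In XSLT 1.0 a pattern with a predicate has a higher default priority (0.5)
     than a plain element-name pattern (0), so this one wins for VIP nodes. -->
<xsl:template match="PhoneNumber[contains(@Name, 'VIP')]">
  <tr class="vip">
    <td class="region"><xsl:value-of select="@Name"/></td>
    <td class="number"><xsl:value-of select="@Value"/></td>
    <td class="language">
      <span class="defaultLanguage"><xsl:value-of select="@Primary"/></span>
      <xsl:value-of select="@Secondary"/>
    </td>
  </tr>
</xsl:template>
```

Any PhoneNumber that does not match the predicate still falls through to the plain match="PhoneNumber" template.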

Conclusion

In the end, both of these options work well enough. Over time, though, the XSL template option will probably prove to be the easier one to enhance and/or maintain. I went with the second option for the flexibility, as projects are constantly changing.
