XSLT to the Rescue

The immense amount of work I have had in recent months kept me away from this blog for much longer than I would have wished, but it also forced me to brush up on some skills that I started acquiring long ago.

I had so much work that I was forced to subcontract a large part of it. One of the projects that I needed to subcontract required the translator to follow the provided terminology strictly.

I had a MultiTerm termbase, but how do I provide that to a translator who only uses CafeTran? The termbase had entries in many languages, and also entries flagged as blacklisted.

Fortunately, you can export the whole termbase from MultiTerm in TBX (TermBase eXchange) format. CafeTran can read TBX, but the file still contained all the redundant languages and blacklisted terms, and as much as I like CafeTran, it is not smart enough to filter those out. But there is this thing called XSLT, which is used to transform XML files into other XML files or other formats. And TBX is basically XML under the hood.

A brief look at the TBX file helped me determine what I wanted to keep and what to skip. The following code did the job:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Identity template: copy every node and attribute unchanged. -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Empty template: drop every langSet that is neither EN-GB nor PL-PL. -->
 <xsl:template match="langSet[not(@xml:lang='EN-GB' or @xml:lang='PL-PL')]"/>
 <!-- Empty template: drop every tig whose Forbidden descrip is 'True'. -->
 <xsl:template match="tig[descripGrp/descrip[@type='Forbidden']='True']"/>
</xsl:stylesheet>



I'm not much of an XSLT guru, and I found what I needed on Stack Overflow. The only real work I did was to figure out what to put in the last two elements before the closing tag. What this stylesheet does is make a copy of the TBX file, except for anything that matches the conditions stated in those last two elements. And these conditions are: skip langSet elements unless their xml:lang attribute is EN-GB (English) or PL-PL (Polish), and skip tig elements whose Forbidden flag is set to True.
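
For readers who find the XPath conditions hard to read, here is a rough sketch of the same filtering logic in plain Python, using only the standard library. This is not what I used (the XSLT above did the job); it is only meant to spell out what the two empty templates do. The sample TBX fragment, the terms in it, and the names is_unwanted and filter_termbase are all made up for illustration, and the fragment is shaped only after the XPath in the stylesheet, so a real MultiTerm export may look different.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
KEEP_LANGS = {"EN-GB", "PL-PL"}

# Hypothetical miniature termbase, not taken from a real export.
SAMPLE = """<martif type="TBX">
  <text><body>
    <termEntry id="c1">
      <langSet xml:lang="EN-GB">
        <tig><term>invoice</term></tig>
        <tig>
          <term>bill</term>
          <descripGrp><descrip type="Forbidden">True</descrip></descripGrp>
        </tig>
      </langSet>
      <langSet xml:lang="PL-PL"><tig><term>faktura</term></tig></langSet>
      <langSet xml:lang="DE-DE"><tig><term>Rechnung</term></tig></langSet>
    </termEntry>
  </body></text>
</martif>"""


def is_unwanted(elem):
    """Mirror the two empty templates: True means 'drop this element'."""
    if elem.tag == "langSet":
        # Drop language sets that are neither English nor Polish.
        return elem.get(XML_LANG) not in KEEP_LANGS
    if elem.tag == "tig":
        # Drop terms that carry a Forbidden descrip with the value 'True'.
        return any(
            d.get("type") == "Forbidden" and "".join(d.itertext()) == "True"
            for d in elem.findall("descripGrp/descrip")
        )
    return False


def filter_termbase(root):
    """Remove unwanted langSet and tig elements in place and return root."""
    # Snapshot the element list first, because the tree is modified below.
    for parent in list(root.iter()):
        for child in list(parent):
            if is_unwanted(child):
                parent.remove(child)
    return root


root = filter_termbase(ET.fromstring(SAMPLE))
kept_terms = [t.text for t in root.iter("term")]
kept_langs = [ls.get(XML_LANG) for ls in root.iter("langSet")]
print(kept_terms)
print(kept_langs)
```

Everything that is not explicitly removed stays in the tree, which is the same "copy everything, then list the exceptions" idea as the identity template in the stylesheet.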

To make any practical use of this stylesheet you need a tool that understands XML and can perform an XSLT transformation. If you are a translator, then Okapi Rainbow is a good choice. It is a free toolkit that offers many other functions useful for translators. Load the file to be transformed on the first tab, then load the XSLT file under Utilities > XML Utilities > XSL Transformation... Shortly after clicking Execute you should get your transformed file, with a file name similar to the original, provided that your stylesheet is correct.

All in all, XSLT is a powerful tool, and if you have a knack for these things (I mean writing a bit of code), it can save you a lot of manual work.
