OCR – Exit ABBYY FineReader, Enter Tesseract I've used the former for many years, and in many ways it is excellent software, but there are some things about it that cause us to part ways: the software is Russian, and the company that owns it, ABBYY, deregistered itself in Russia shortly before the war on Ukraine began. That is just a smokescreen; you can find out more here: https://ain.capital/2022/08/11/russian-abbyy-still-works-in-ukraine/ A few days ago I uninstalled FineReader from my computer, freeing up a lot of disk space; it is truly a behemoth. Today was actually the first time I ran OCR on a PDF with Tesseract in order to translate it. There was a small table that I needed to re-create manually (Tesseract can't do tables out of the box), but that is an inconvenience I am willing to suffer in order not to use ABBYY software any more. This article is a work in progress, to be continued...
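The post does not say how Tesseract was run on the PDF, so the following is only a minimal sketch of one possible workflow, under stated assumptions: the Tesseract command-line tool generally expects images rather than PDFs, so the pages are first rendered to PNG with `pdftoppm` from Poppler (an assumption, not something the post mentions), and each image is then passed to `tesseract`. The function names, the `page` file prefix, and the 300 dpi default are all illustrative.

```python
import subprocess
from pathlib import Path


def pdftoppm_command(pdf_path, prefix, dpi=300):
    """Build the pdftoppm call that renders every PDF page to a PNG image."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(prefix)]


def tesseract_command(image_path, out_base, lang="eng"):
    """Build the tesseract call; tesseract writes its text to <out_base>.txt."""
    return ["tesseract", str(image_path), str(out_base), "-l", lang]


def ocr_pdf(pdf_path, work_dir, lang="eng"):
    """Render the PDF to images, OCR each page, and return the joined text.

    Assumes both pdftoppm and tesseract are installed and on the PATH.
    """
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdftoppm_command(pdf_path, work / "page"), check=True)

    texts = []
    for image in sorted(work.glob("page*.png")):
        out_base = image.with_suffix("")  # e.g. page-1.png -> page-1
        subprocess.run(tesseract_command(image, out_base, lang), check=True)
        texts.append(out_base.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(texts)
```

Calling `ocr_pdf("scan.pdf", "ocr_work", lang="pol")` would, under these assumptions, return plain text only; tables still come out as loose lines and need to be rebuilt by hand, as described above.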
Python Scripts That Make Filling In JPK_VAT7M from jpklibre Easier (It's been a long time since I wrote anything on the blog!) Until recently I prepared my JPK files with an Excel spreadsheet, but the new computer has 64-bit Excel, so the spreadsheet does not work. I won't go into why. It is time for a universal solution that works regardless of the operating system: the jpklibre workbook, because I have one computer with Windows and several with Linux. You can of course fill in the sheet by hand, but that is tedious. You can also question the point of spreadsheets when JPK files should be generated by accounting software. Mine does generate them, but it insists on making corrections that have nothing to do with reality, so I do it myself, semi-automatically. To avoid filling in jpklibre entirely by hand, you can use ScriptForge, which has been available "out of the box" for LibreOffice Calc since LibreOffice 7.3. You only need to install the APSO extension, which makes it easier to run macros
Translating on Linux, the First Steps
Translating on Linux More than a year since my last post, that's really something! But never mind. Suffice it to say that life isn't always a bed of roses. In this post I'm going to write about my most recent experience with Linux. This isn't my first encounter with this operating system. More than a decade ago I bought a laptop without an operating system, so I put Linux on it, and I even went through the painstaking process of installing Oracle's Java on it so that I could use one of the more reliable CAT tools that worked on Linux at that time, namely Swordfish from Maxprograms. But that is history. Some three years ago I bought a low-end Lenovo Ideapad 100 laptop. I needed something for my daily ‘field trips’, and the previous piece of hardware had started to fail. The laptop came with Windows 10 preinstalled, which ran satisfactorily for about a year or so, even though it came bundled with Lenovo's bloatware and as a translator I also installed my set of t
Python 3 script for DeepL Translator Pro Among the MT (machine translation) providers, DeepL is the new kid on the block, but it has been making waves ever since it appeared. Frankly speaking, as soon as I found out how good the DeepL MT is, I abandoned Google Translate. If I am to pay for a service, I'll pay for the one which delivers better results. Before DeepL announced the paid Pro service, there was (and still is) the free Python module called pydeepl. It used to work until recently, but the ‘backdoor’ that made it possible to get free translations programmatically in the era of the paid service has now been plugged. (For the record, I subscribed to the paid service as soon as it appeared.) There is also a DeepL Translator plugin for SDL Trados Studio. It is quite simple, and almost no settings can be entered except for the API key. My Python script for pydeepl was more sophisticated; for example, I added a replacement table for cases where I did not like Deep
USING A CAT TOOL WITHOUT SPENDING A PENNY OR A CENT So, here I am again after four months. And I wouldn't be writing this if one of my clients weren't doing maintenance on their online translation system, preventing me from downloading files for offline work at the weekend. Nowadays, translators working with translation agencies are often asked what CAT tool they use and/or are even expected to use a CAT tool. A CAT tool, regardless of the vendor, is your translation environment. It converts the original to its internal format (more often than not a flavor of the XLIFF format), it helps you re-use your translations by saving them to a translation memory, and it helps you stay consistent with terminology by offering glossary or termbase functionality. When you are done translating, it converts your translation back to the original format. These are the most basic functions of a CAT tool. But what if you're just starting out as a freelancer and cannot af
XSLT to the Rescue
XSLT TO THE RESCUE The immense amount of work that I have had in recent months kept me away from this blog for much longer than I would wish, but also forced me to brush up on some skills that I started acquiring long ago. I had so much work that I was forced to subcontract a large part of it. One of the projects that I needed to subcontract required that the translator follow the provided terminology strictly. I had a MultiTerm termbase, but how do I provide that to a translator who only uses Cafetran? The termbase had entries in many languages and also entries flagged as blacklisted. Fortunately, you can export the whole termbase from MultiTerm in TBX (TermBase eXchange) format. Cafetran can read TBX, but the file still contained all the redundant languages and blacklisted terms, and as much as I like Cafetran, it is too dumb to filter those out. But there is this thing called XSLT, which is used to transform XML files into other XML files or other formats. And TBX is basically XML
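The excerpt breaks off before the stylesheet itself, so it is not reproduced here. Purely to illustrate the filtering idea (keep only the wanted languages, drop blacklisted entries), here is a rough sketch in Python with `xml.etree.ElementTree` rather than XSLT. It assumes a plain TBX layout of `termEntry` elements containing `langSet` elements with an `xml:lang` attribute. How a MultiTerm export actually marks a blacklisted entry is not stated in the text, so the `descrip` element with the text "Blacklisted" checked below is a hypothetical placeholder, as are the function names.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def has_blacklist_flag(entry):
    """Hypothetical check: treat any <descrip> reading 'Blacklisted' as the flag."""
    return any(
        (descrip.text or "").strip().lower() == "blacklisted"
        for descrip in entry.iter("descrip")
    )


def filter_tbx(tbx_text, keep_langs, is_blacklisted=has_blacklist_flag):
    """Return TBX text without blacklisted entries and without unwanted languages."""
    root = ET.fromstring(tbx_text)
    # Snapshot the elements first so removals do not disturb the traversal.
    for parent in list(root.iter()):
        for entry in list(parent.findall("termEntry")):
            if is_blacklisted(entry):
                parent.remove(entry)
                continue
            for lang_set in list(entry.findall("langSet")):
                if lang_set.get(XML_LANG) not in keep_langs:
                    entry.remove(lang_set)
    return ET.tostring(root, encoding="unicode")
```

A call such as `filter_tbx(exported_text, {"en", "pl"})` would keep only the English and Polish `langSet` elements; a real export may need a different blacklist test passed in as `is_blacklisted`.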
Macros, scripts, coding In the early years of my freelancing I worked mostly in Microsoft Word. I either did not use any CAT tool or I used the free version of Wordfast (now called Wordfast Classic), which is itself a set of macros. I was looking for ways to make the tedious work of translating in Microsoft Word a bit easier. Soon I found out about macros. Initially I created them by recording, then I learned more about coding them. Now, though most of my translation work is done outside of Microsoft Word, in CAT tools, I still have most of the macros I created more than 10 years ago. Many of them have fallen into disuse, but there is one that I use regularly when I need to do some work in Word. It is the macro I called ‘TextFieldFit’. When I draw a text box, I run the macro to reduce the inner margins, remove the border around the box and reduce the font size. I use text boxes to translate drawings that did not get imported into my preferred CAT tool, SDL Trados Studio. I