swapTerms_withDict – つぶやき @ m2pi

Have you ever thought it’d be spiffy if you could:

Prepare a list of terminology translations in Microsoft Excel and save it as a UTF-8 CSV file,
… then apply those translations to a document instantly?

Well… I came up with this little shell tool named ‘swapTerms_withDict’ that does just that.

Here’s the code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# swapTerms_withDict
# (c) 2019 Media to People, Incorporated.

import csv
import os
import sys

USAGE = 'Usage: swapTerms_withDict textFile.txt CSV_dictionary_file.csv\nExiting...'

# Two arguments are required: the source text file and the CSV dictionary.
if len(sys.argv) < 3:
    print(USAGE)
    sys.exit(1)

sourceFile = sys.argv[1]
dFile = sys.argv[2]

# The dictionary must be a .csv file.
if not dFile.lower().endswith('.csv'):
    print(USAGE)
    sys.exit(1)

# Read the whole source document as UTF-8.
with open(sourceFile, encoding='utf-8') as doc:
    docBuf = doc.read()

# Read the dictionary: one "term,replacement" pair per row.
# 'utf-8-sig' drops the BOM that Excel writes at the start of a UTF-8 CSV;
# newline='' lets the csv module handle \r\n or \r line endings and
# quoted cells that contain commas.
dic = {}
with open(dFile, encoding='utf-8-sig', newline='') as d:
    for row in csv.reader(d):
        # Skip rows that are not exactly two cells, and rows with an empty
        # term (replacing '' would insert the replacement between every character).
        if len(row) != 2 or not row[0]:
            continue
        dic[row[0]] = row[1]

# Replace longer terms first, so a long term is not broken up by an
# earlier replacement of a shorter term it contains.
for dItem in sorted(dic, key=len, reverse=True):
    docBuf = docBuf.replace(dItem, dic[dItem])

# doc.txt -> doc_WITH_WORDS_SWAPPED.txt. Using splitext (rather than
# replacing '.txt') means a source without a .txt suffix is never overwritten.
base, ext = os.path.splitext(sourceFile)
targetFilename = base + '_WITH_WORDS_SWAPPED' + (ext or '.txt')

try:
    with open(targetFilename, 'w', encoding='utf-8') as targetFile:
        targetFile.write(docBuf)
except OSError:
    print('Write error: File "' + targetFilename + '" could not be written. Exiting...')
    sys.exit(1)
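One thing worth knowing about the approach: the script sorts the terms longest-first and then calls str.replace once per term, so each term is a separate pass over the whole text. That means the output of one replacement can be rewritten again by a later entry. The sketch that follows compares that sequential approach with a single-pass regular-expression alternative. The function names and the sample dictionary are hypothetical, for illustration only; the single-pass version is a variation, not what swapTerms_withDict itself does.

```python
import re

def swap_sequential(text, dic):
    # Same idea as the script: longest terms first, one str.replace per term.
    for term in sorted(dic, key=len, reverse=True):
        text = text.replace(term, dic[term])
    return text

def swap_single_pass(text, dic):
    # Alternative (hypothetical): one regex alternation over all terms.
    # Python's re tries alternatives left to right, so sorting longest-first
    # makes the longest term win at each position. Each piece of the
    # ORIGINAL text is replaced at most once, so replacements never cascade.
    terms = [t for t in sorted(dic, key=len, reverse=True) if t]
    if not terms:
        return text
    pattern = re.compile('|'.join(re.escape(t) for t in terms))
    return pattern.sub(lambda m: dic[m.group(0)], text)

# Hypothetical sample dictionary: note that 'cat' and 'dog' swap with each other.
dic = {'apple pie': 'tarte aux pommes', 'apple': 'pomme', 'cat': 'dog', 'dog': 'cat'}
text = 'apple pie, apple, cat, dog'

print(swap_sequential(text, dic))   # -> tarte aux pommes, pomme, cat, cat
print(swap_single_pass(text, dic))  # -> tarte aux pommes, pomme, dog, cat
```

For a typical Japanese-to-English term list this rarely matters, because the English output will not match any Japanese key. It only bites when a replacement happens to contain another entry’s term.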

You are welcome to use it, if you like 😉

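Usage (after saving the script as swapTerms_withDict, making it executable with chmod +x and putting it somewhere on your PATH):

swapTerms_withDict sourceTxt_UTF-8.txt Dict_file.csv

The result is written next to the source as sourceTxt_UTF-8_WITH_WORDS_SWAPPED.txt; the source file itself is left untouched.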
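The dictionary file is expected to have two columns per row: the term to find in column A and its replacement in column B. Here is a minimal sketch of reading such a file the way Excel’s “CSV UTF-8” export saves it (a leading BOM and CRLF line endings), using Python’s csv module. The function name load_dictionary and the sample rows are hypothetical, for illustration only.

```python
import csv
import os
import tempfile

def load_dictionary(path):
    # Hypothetical helper: read a two-column CSV (term, replacement).
    # 'utf-8-sig' drops a leading BOM; newline='' lets csv.reader handle
    # CRLF line endings and quoted cells that contain commas.
    dic = {}
    with open(path, encoding='utf-8-sig', newline='') as f:
        for row in csv.reader(f):
            # Keep only rows with exactly two cells and a non-empty term.
            if len(row) == 2 and row[0]:
                dic[row[0]] = row[1]
    return dic

# Hypothetical sample: BOM, CRLF endings, a quoted cell with a comma,
# and a row with an empty term that should be skipped.
sample = '\ufeffcolour,color\r\n"Smith, John",John Smith\r\n,orphan\r\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'Dict_file.csv')
    with open(path, 'w', encoding='utf-8', newline='') as f:
        f.write(sample)
    print(load_dictionary(path))  # -> {'colour': 'color', 'Smith, John': 'John Smith'}
```

Splitting each line on ',' by hand would turn the quoted cell "Smith, John" into three pieces, which is why the csv module is the safer choice for files that come out of Excel.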

Enjoy! 😉