Have you ever thought it’d be spiffy if you could:
- Prepare a list of terminology translations using Microsoft Excel, saved as a UTF-8 CSV file,
- … then apply those translations to a document, instantaneously?
Well… I came up with a little command-line tool named ‘swapTerms_withDict’ that does just that.
Here’s the code:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# swapTerms_withDict
# (c) 2019 Media to People, Incorporated.
import sys, os

USAGE = 'Usage: swapTerms_withDict textFile.txt CSV_dictionary_file.csv\nExiting...'

try:
    sourceFile = sys.argv[1]
    dFile = sys.argv[2]
except IndexError:
    print(USAGE)
    sys.exit(1)

# The dictionary must be a CSV file.
if not dFile.lower().endswith('.csv'):
    print(USAGE)
    sys.exit(1)

# Read the document and the dictionary. 'utf-8-sig' drops the BOM that Excel
# writes; text mode already turns '\r\n' and '\r' line endings into '\n'.
with open(sourceFile, encoding='utf-8') as doc:
    docBuf = doc.read()
with open(dFile, encoding='utf-8-sig') as d:
    dBuf = d.read()

# Build the dictionary from rows of exactly two columns: term,translation.
dic = {}
for l in dBuf.split('\n'):
    ll = l.split(',')
    # Skip malformed rows and rows with an empty term.
    if len(ll) != 2 or not ll[0]:
        continue
    dic[ll[0]] = ll[1]

# Replace the longest terms first, so shorter terms cannot clobber them.
keys = sorted(dic, key=len, reverse=True)
for dItem in keys:
    docBuf = docBuf.replace(dItem, dic[dItem])

# Write the result next to the source: foo.txt -> foo_WITH_WORDS_SWAPPED.txt
base, ext = os.path.splitext(sourceFile)
targetFilename = base + '_WITH_WORDS_SWAPPED' + ext
try:
    with open(targetFilename, 'w', encoding='utf-8') as targetFile:
        targetFile.write(docBuf)
except OSError:
    print('Write error: File "' + targetFilename + '" could not be written. Exiting...')
    sys.exit(1)
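One thing to watch out for: the script applies the terms one after another, so a translation that happens to contain another dictionary term gets swapped a second time. If that ever bites you, here is a minimal sketch of a one-pass variant built on a single regular expression. The function name swap_terms_once and the sample dictionary are made up for this sketch; they are not part of the script above.

```python
import re

def swap_terms_once(text, dic):
    """Replace every key of dic found in text in one pass (hypothetical helper)."""
    # Longest terms first, so 'New York City' wins over 'New York'.
    # Empty keys are dropped because they would match everywhere.
    keys = [k for k in sorted(dic, key=len, reverse=True) if k]
    if not keys:
        return text
    # re.escape keeps characters such as '.' or '(' in a term literal.
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    # Each match is replaced exactly once; the output is never rescanned.
    return pattern.sub(lambda m: dic[m.group(0)], text)

sample = {'cat': 'dog', 'dog': 'wolf'}
print(swap_terms_once('cat and dog', sample))  # prints: dog and wolf
```

With the same dictionary, the one-after-another approach would give "wolf and wolf", because the "dog" produced by the first swap is caught by the second.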
You are welcome to use it if you like 😉
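Usage:
swapTerms_withDict sourceTxt_UTF-8.txt Dict_file.csv
The result is saved as sourceTxt_UTF-8_WITH_WORDS_SWAPPED.txt; the original file is left untouched.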
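In case you are wondering what goes in Dict_file.csv: one term and its translation per row, separated by a comma. One caveat is that the script splits each row on commas by hand, so a term that itself contains a comma (which Excel wraps in double quotes) has more than two pieces and is skipped. The sketch below shows how Python's standard csv module could read such rows instead; the function name load_dict and the sample rows are assumptions made up for illustration.

```python
import csv
import io

def load_dict(csv_text):
    """Parse 'term,translation' rows into a dict (hypothetical helper)."""
    dic = {}
    # newline='' leaves line endings alone so the csv module handles them.
    for row in csv.reader(io.StringIO(csv_text, newline='')):
        # Keep only rows with exactly two columns and a non-empty term.
        if len(row) == 2 and row[0]:
            dic[row[0]] = row[1]
    return dic

# Made-up sample: a plain row, a quoted row with commas, a malformed row.
sample_csv = 'hello,bonjour\r\n"Smith, John","Smith, Jean"\r\nbroken row\r\n'
print(load_dict(sample_csv))
# prints: {'hello': 'bonjour', 'Smith, John': 'Smith, Jean'}
```

To read a real file this way, open it with encoding='utf-8-sig' and newline='' and hand the file object straight to csv.reader.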
Enjoy! 😉
