Have you ever thought it’d be spiffy if you could:
- Prepare a list of terminology translations using Microsoft Excel, saved as a UTF-8 CSV file,
- … then apply those translations to a document, instantaneously?
Well… I came up with this little command-line tool named ‘swapTerms_withDict’ that does just that.
Here’s the code:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# swapTerms_withDict
# (c) 2019 Media to People, Incorporated.

import sys

USAGE = 'Usage: swapTerms_withDict textFile.txt CSV_dictionary_file.csv\nExiting...'

# Both the text file and the dictionary file are required.
try:
    sourceFile = sys.argv[1]
    dFile = sys.argv[2]
except IndexError:
    print(USAGE)
    sys.exit(1)

# The dictionary must be a .csv file.
if '.csv' not in dFile:
    print(USAGE)
    sys.exit(1)

with open(sourceFile, encoding='utf-8') as doc:
    docBuf = doc.read()

# utf-8-sig strips the byte-order mark that Excel adds to UTF-8 CSV files.
# Text mode already converts \r\n and \r line endings to \n.
with open(dFile, encoding='utf-8-sig') as d:
    dBuf = d.read()

dic = {}
for l in dBuf.split('\n'):
    ll = l.split(',')
    # Keep only "term,translation" lines with a non-empty term.
    if len(ll) != 2 or not ll[0]:
        continue
    dic[ll[0]] = ll[1]

# Replace longer terms first so that shorter terms cannot break them up.
keys = sorted(dic, key=len, reverse=True)
for dItem in keys:
    docBuf = docBuf.replace(dItem, dic[dItem])

targetFilename = sourceFile.replace('.txt', '_WITH_WORDS_SWAPPED.txt')
try:
    with open(targetFilename, 'w', encoding='utf-8') as targetFile:
        targetFile.write(docBuf)
except OSError:
    print('Write error: File "' + targetFilename + '" could not be written. Exiting...')
    sys.exit(1)
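One thing to watch: the script sorts the terms longest-first, but it still replaces them one at a time, so text inserted by one replacement can be picked up again by a later one. Here is a minimal sketch of a single-pass variant, assuming the same kind of term-to-translation mapping as `dic`. The function name `swap_terms` is made up for this example and is not part of the tool:

```python
import re

def swap_terms(text, dic):
    """Replace every key of dic found in text with its value, in one pass."""
    if not dic:
        return text
    # Longest terms first, so "catalog" is tried before "cat".
    keys = sorted(dic, key=len, reverse=True)
    # re.escape makes each term match literally rather than as a regex.
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    # Each match is looked up once; replaced text is never scanned again.
    return pattern.sub(lambda m: dic[m.group(0)], text)

if __name__ == '__main__':
    sample = {'cat': 'dog', 'dog': 'bird', 'catalog': 'index'}
    print(swap_terms('cat dog catalog', sample))  # dog bird index
```

With the one-at-a-time approach, the same sample would come out as "bird bird index", because the "dog" produced from "cat" is replaced again.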
You are welcome to use it, if you like 😉
Usage:
swapTerms_withDict sourceTxt_UTF-8.txt Dict_file.csv
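A note on the dictionary file: the script splits each line on every comma, so an entry whose term or translation itself contains a comma (Excel writes those in double quotes) ends up with more than two fields and gets skipped. If you need such entries, here is a rough sketch of a loader built on Python's standard `csv` module. The function name `load_dictionary` is hypothetical and not part of the tool:

```python
import csv

def load_dictionary(path):
    """Read a two-column CSV of (term, translation) pairs into a dict."""
    dic = {}
    # utf-8-sig drops the byte-order mark that Excel writes; newline=''
    # lets the csv module handle the line endings itself.
    with open(path, encoding='utf-8-sig', newline='') as f:
        for row in csv.reader(f):
            # Skip blank rows, rows without exactly two columns,
            # and rows whose term is empty.
            if len(row) != 2 or not row[0]:
                continue
            dic[row[0]] = row[1]
    return dic
```

The returned dict can stand in for `dic` in the script above.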
Enjoy! 😉