Character encoding tools Encoding validator: used for cleansing corpora that include unexpected characters. Character encoding converter: can optionally emulate non-ASCII characters with ASCII strings. with encoding
GICAS Asian Scripts (most of the website is in Japanese). GICAS: Grammatological Informatics based on Corpora of Asian Scripts gicas with asian encodings scripts software
IANA charset registry These are the official names for character sets that may be used in the Internet and may be referred to in Internet documentation. with character encodings
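
The validator and converter described in the first entry can be approximated with Python's standard library. What follows is only a rough sketch, not the code of the linked tools: the function names are hypothetical, UTF-8 is assumed as the expected corpus encoding, and backslash escapes are just one possible way to emulate non-ASCII characters with ASCII strings. The last helper checks a charset name against Python's own codec registry, which accepts many IANA names and aliases but is not the IANA registry itself.

```python
import codecs


def find_invalid_bytes(data: bytes, encoding: str = "utf-8") -> list[int]:
    """Return the byte offsets at which `data` fails to decode as `encoding`."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode(encoding)
            break  # the remainder decodes cleanly
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice data[pos:]
            offsets.append(pos + err.start)
            pos += err.end  # resume after the offending bytes
    return offsets


def emulate_with_ascii(text: str) -> str:
    """Replace each non-ASCII character with an ASCII backslash escape."""
    return text.encode("ascii", errors="backslashreplace").decode("ascii")


def is_known_charset(name: str) -> bool:
    """Report whether Python's codec registry recognises the charset name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False
```

For a corpus file, find_invalid_bytes can be run on the raw bytes before decoding, and any reported offsets inspected or cleaned by hand. Passing errors="xmlcharrefreplace" instead of "backslashreplace" would give numeric character references rather than backslash escapes.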