Differences in punctuation marks between English and Chinese or Japanese
Here is a short explanation that may come in handy if you ever need to work with computer text in Chinese or Japanese.
When you compare the same text written in English and in Chinese (or Japanese), you will notice some differences that are easy to overlook.
Compare these two short texts written in English and Chinese:
Shut up! Be quiet!
The syntax of the English text is:
WORD+SPACE+WORD+EXCLAMATION MARK+SPACE+(another sentence)
Pretty common, right? And not only in English, but in most languages written in Latin script.
What are the differences in Chinese text?
First, there is no WORD+SPACE+WORD sequence. Chinese is written without spaces between words: the characters, and the words they form, simply follow one another. So we have something like WORDWORD syntax in Chinese.
Instead of “Shut up” we have 闭嘴 (and not 闭 嘴).
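This matters as soon as a program looks for words. A minimal Python sketch using only the built-in `str.split()`: with no arguments it splits on whitespace, which finds the two English words but leaves the Chinese phrase as a single token.

```python
english = "Shut up"
chinese = "闭嘴"

# str.split() with no arguments splits on runs of whitespace.
print(english.split())  # ['Shut', 'up']
print(chinese.split())  # ['闭嘴']: no spaces, so the whole phrase stays one token
```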
What is between sentences?
Now comes the interesting part: how are sentences separated?
In English we use an exclamation mark followed by a space: “! ”.
It might look similar in the Chinese text, but it isn’t.
Because Chinese does not use spaces between words, there is no reason to put spaces between sentences either. So in the Chinese version, the next sentence starts right after the exclamation mark.
But wait! Look at the Chinese text again.
Isn’t there some space between the sentences? No, there is not. Really!
But you can see the space, can’t you? Yes, you can, but it belongs to the Chinese version of the exclamation mark.
It is called the Fullwidth Exclamation Mark, and it is a different character from the Latin Exclamation Mark. Look, these two characters are not the same: “!” and “！”.
The two characters even belong to different Unicode blocks:
Latin Exclamation Mark: U+0021 (Basic Latin)
Fullwidth Exclamation Mark: U+FF01 (Halfwidth and Fullwidth Forms)
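You can confirm the code points from code. A short sketch with Python's standard `unicodedata` module; note that the official Unicode name of U+0021 is simply EXCLAMATION MARK.

```python
import unicodedata

for ch in ["!", "！"]:
    # ord() gives the code point; unicodedata.name() gives the official Unicode name.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+0021 EXCLAMATION MARK
# U+FF01 FULLWIDTH EXCLAMATION MARK
```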
The Fullwidth Exclamation Mark includes that space in its glyph: it is as wide as a Chinese character, while the mark itself fills only part of that width. There is a visible space, but no written space!
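Unicode records this width in the East Asian Width property. A sketch using `unicodedata.east_asian_width()` from Python's standard library; the sample string `"闭嘴！"` is a hypothetical example, not taken from a real subtitle.

```python
import unicodedata

# East_Asian_Width property: "Na" = narrow, "F" = fullwidth, "W" = wide.
for ch in ["!", "！", "闭"]:
    print(repr(ch), unicodedata.east_asian_width(ch))
# '!' Na
# '！' F
# '闭' W

# The "space" you see after ！ is part of its glyph, not a separate character.
sample = "闭嘴！"  # hypothetical sample string
print(len("！"), " " in sample)  # 1 False
```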
Chinese syntax is:
SENTENCE1+FULLWIDTH EXCLAMATION MARK+SENTENCE2
No space is written between sentences, just one character: “！”.
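This is where a parser that expects a mark followed by a space goes wrong on Chinese text. Below is a minimal sketch, not the actual Subfilter code: `naive_split` and `split_sentences` are hypothetical helpers, the set of end marks is an assumption, and the Chinese sample `"闭嘴！安静！"` is a made-up string (闭嘴 = “shut up”, 安静 = “quiet”).

```python
import re

def naive_split(text: str) -> list[str]:
    # Assumes every sentence ends with "! " (Latin mark + space).
    return text.split("! ")

# Hypothetical set of end marks, Latin and fullwidth.
# Inside [...] the characters ? and . are literal, so no escaping is needed.
SENTENCE = re.compile(r"[^!?.！？]+[!?.！？]*")

def split_sentences(text: str) -> list[str]:
    """Split after end marks; a space after the mark is optional, not required."""
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]

print(naive_split("闭嘴！安静！"))             # ['闭嘴！安静！']: nothing was split
print(split_sentences("Shut up! Be quiet!"))  # ['Shut up!', 'Be quiet!']
print(split_sentences("闭嘴！安静！"))          # ['闭嘴！', '安静！']
```

The key design choice is treating the space after a mark as optional, so the same rule covers both “! ” in English and “！” in Chinese.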
There are fullwidth variants of many other Latin characters too. Some examples:
║ Name ║ Latin ║ Fullwidth ║
║ Exclamation mark ║ ! ║ ！ ║
║ Question mark ║ ? ║ ？ ║
║ Comma ║ , ║ ， ║
║ Colon ║ : ║ ： ║
║ Left Parenthesis ║ ( ║ （ ║
║ Right Parenthesis ║ ) ║ ） ║
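For every row in this table, the fullwidth code point is the Latin code point plus 0xFEE0, and Unicode NFKC normalization maps the fullwidth form back to the Latin one. A short check in Python, with the rows copied from the table:

```python
import unicodedata

# (name, Latin, fullwidth) rows copied from the table above.
ROWS = [
    ("Exclamation mark", "!", "！"),
    ("Question mark", "?", "？"),
    ("Comma", ",", "，"),
    ("Colon", ":", "："),
    ("Left Parenthesis", "(", "（"),
    ("Right Parenthesis", ")", "）"),
]

for name, latin, full in ROWS:
    offset = ord(full) - ord(latin)             # 0xFEE0 for every row
    nfkc = unicodedata.normalize("NFKC", full)  # compatibility mapping back to the Latin form
    print(f"{name:<18} U+{ord(latin):04X} -> U+{ord(full):04X}  offset={offset:#x}  NFKC={nfkc!r}")
```

NFKC is handy when you want a parser to treat both forms the same way, but it changes the text, so it is safer to normalize a copy used for matching rather than the text you display.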
You can find them in any Chinese text (e.g. Chinese Wikipedia).
In Japanese, the mechanism is pretty similar. I found some small differences, e.g. the Japanese comma “、” (Ideographic Comma, U+3001) is a different character from the Fullwidth Comma “，” used in Chinese, for historical reasons that I have not explored yet.
Korean does not need these fullwidth variants. Korean puts spaces between words, so sentences are separated the same way we are used to in Latin scripts.
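One practical consequence: the Japanese “、” is not a fullwidth form, so NFKC normalization does not turn it into a Latin comma, and a parser has to list it explicitly. A sketch using Python's standard `unicodedata` module:

```python
import unicodedata

for ch in ["、", "，", ","]:
    # Show code point, official name, and what NFKC normalization turns it into.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "NFKC ->", repr(unicodedata.normalize("NFKC", ch)))
# U+3001 IDEOGRAPHIC COMMA NFKC -> '、'
# U+FF0C FULLWIDTH COMMA NFKC -> ','
# U+002C COMMA NFKC -> ','
```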
Why I needed to learn this
I know neither Chinese nor Japanese, but I needed to understand this syntax for my project Subfilter for Netflix, to find out why my parsing did not work for those languages.
Well, in the next version it will! You can try it (versions for Chrome and Firefox are available), especially if you like movies and learning foreign languages.
I hope you found this article interesting.
If you would like more information like this, you can follow me on Twitter.