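CJK Unified Ideographs (中日韓統一表意文字)①, also known as the unified Han character set (Unihan), were created so that ideographs from Chinese, Japanese, Korean, Vietnamese, Zhuang, and Ryukyuan writing that share the same origin and the same basic meaning, and that are identical or only slightly different in shape, receive the same code point in ISO 10646 and the Unicode Standard. In the Unicode Standard this process is called Han unification. The resulting CJK Unified Ideographs are maintained in the Unihan database, which is built by the Unicode Consortium.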
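At the time of writing, the latest Unicode version for Han characters is Unicode v15, which comprises 15.0 (2022) and 15.1 (2023).
The table further down lists each version (v1.0 – v15.1) with its release year and the Han characters added in it.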
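See also:
➤ The number of CJK Unified Ideographs added in each Unicode version and the cumulative total.
The simplest way to browse the Han characters of Unicode v15 is ➤ Unicode v15: 180-character overview (only the first 180 characters of each block).
There is also the ➤ Unicode v15.1 test version, which shows only the first and last character of each block.
To see a complete single block, click one of the links below to view every character this site provides for that block (the basic block has more than 20,000 characters and Extension B has more than 40,000).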
Basic Han characters – Extensions: Ext-A – Ext-B – Ext-C – Ext-D – Ext-E – Ext-F – Ext-G – Ext-H – Ext-I
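This site provides only the character set of each block, not details about individual characters. For a deeper look at a complete block and detailed information on each character in it, go to the "Encoding (字碼)" page of "字統网 zi.tools", follow that page's link for the block you want, and then click the character you are interested in; the site will show its detailed record. For reference, attached is a screenshot of the entry for the second character in Extension I (the set newly added in the recently released Unicode v15.1).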
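Besides the ideograph blocks, the Unicode v15: 180-character overview also includes:
Tai Xuan Jing symbols: 87
Yijing (I Ching) hexagram symbols: 64 +
Kangxi radicals: 214
CJK symbols and punctuation: 64 +
CJK compatibility symbols: 32 +
GB symbol set: 715 + 167 ...
CJK radicals supplement: 115
CJK strokes: 36
CJK compatibility ideographs: 472
CJK compatibility ideographs supplement: 542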
Unicode versions (year of release) and the number of characters added in each
Unicode version (year) | Block: basic Han characters and extensions (Ext) | Characters added |
1.0 (1991) | CJK Unified Ideographs (4E00–9FFF) on Plane 0 | 20,992 |
3.0 (1999) | CJK Unified Ideographs Extension A (3400–4DBF) on Plane 0 | 6,582 |
3.1 (2001) | CJK Unified Ideographs Extension B (20000–2A6DF) on Plane 2 | 42,720 |
5.2 (2009) | CJK Unified Ideographs Extension C (2A700–2B73F) on Plane 2 | 4,154 |
6.0 (2010) | CJK Unified Ideographs Extension D (2B740–2B81F) on Plane 2 | 222 |
8.0 (2015) | CJK Unified Ideographs Extension E (2B820–2CEAF) on Plane 2 | 5,762 |
10.0 (2017) | CJK Unified Ideographs Extension F (2CEB0–2EBEF) on Plane 2 | 7,473 |
13.0 (2020) | CJK Unified Ideographs Extension G (30000–3134F) on Plane 3 | 4,939 |
15.0 (2022) | CJK Unified Ideographs Extension H (31350–323AF) on Plane 3 | 4,192 |
15.1 (2023) | CJK Unified Ideographs Extension I (2EBF0–2EE5F) on Plane 2 | 622 |
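- Click a link in the table above to view every character in that block. The basic block (more than 20,000 characters) and Extension B (more than 40,000 characters) are large, so they take longer to download.
- Plane 0 – 3: see ➤ Unicode character planes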
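The ranges in the table can be turned into a small lookup. The sketch below is illustrative only: the names `CJK_BLOCKS`, `plane_of`, `block_of`, and `listed_total` are made up for this example, and the ranges and counts are simply transcribed from the table as listed. The plane of a code point is its value divided by 0x10000, which is why blocks starting at 2xxxx are on Plane 2 and those starting at 3xxxx are on Plane 3.

```python
from typing import Optional

# Transcribed from the table: (block name, first code point, last code point, listed count).
CJK_BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF, 20_992),
    ("Extension A", 0x3400, 0x4DBF, 6_582),
    ("Extension B", 0x20000, 0x2A6DF, 42_720),
    ("Extension C", 0x2A700, 0x2B73F, 4_154),
    ("Extension D", 0x2B740, 0x2B81F, 222),
    ("Extension E", 0x2B820, 0x2CEAF, 5_762),
    ("Extension F", 0x2CEB0, 0x2EBEF, 7_473),
    ("Extension G", 0x30000, 0x3134F, 4_939),
    ("Extension H", 0x31350, 0x323AF, 4_192),
    ("Extension I", 0x2EBF0, 0x2EE5F, 622),
]


def plane_of(code_point: int) -> int:
    """Return the Unicode plane number (each plane holds 0x10000 code points)."""
    return code_point >> 16


def block_of(char: str) -> Optional[str]:
    """Return the name of the table block containing `char`, or None if it is in none of them."""
    cp = ord(char)
    for name, first, last, _count in CJK_BLOCKS:
        if first <= cp <= last:
            return name
    return None


def listed_total() -> int:
    """Sum the 'characters added' column exactly as it appears in the table."""
    return sum(block[3] for block in CJK_BLOCKS)


if __name__ == "__main__":
    for ch in ("中", "㐀", "\U00020000", "A"):
        cp = ord(ch)
        print(f"U+{cp:04X}  plane {plane_of(cp)}  {block_of(ch)}")
    print("sum of listed counts:", listed_total())
```

Note that a range check only says a code point falls inside a block's range; a few code points at the end of some ranges may be unassigned, so the listed counts can be slightly smaller than the size of the range.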
Notes:
- CJK Unified Ideographs (4E00–9FFF): The first 20,902 of the 20,992 characters in the block are arranged by Kangxi Dictionary radical, and within each radical the characters with the fewest strokes come first. The remaining characters were added later and are therefore not in radical order.
- Extension B includes most of the characters used in the Kangxi Dictionary that are not in the basic CJK Unified Ideographs block (which is why it holds more than twice as many characters as the basic block).
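The arithmetic behind these two notes can be checked in a few lines. This is a minimal sketch that assumes the 20,902 radical-ordered characters occupy the start of the block contiguously from U+4E00, which is how the note describes them; the variable names are made up for this example.

```python
BASIC_FIRST = 0x4E00      # first code point of the basic block
BASIC_LAST = 0x9FFF       # last code point of the basic block
RADICAL_ORDERED = 20_902  # characters in radical/stroke order (from the note)
EXT_B_COUNT = 42_720      # Extension B count as listed in the table

# Size of the basic block, derived from its range.
block_size = BASIC_LAST - BASIC_FIRST + 1

# Last code point of the radical-ordered run, assuming it starts at U+4E00.
last_ordered = BASIC_FIRST + RADICAL_ORDERED - 1

# Characters appended afterwards, outside the radical order.
later_additions = block_size - RADICAL_ORDERED

# How many times larger Extension B is than the basic block.
ext_b_ratio = EXT_B_COUNT / block_size

print(block_size)                  # 20992
print(f"U+{last_ordered:04X}")     # U+9FA5
print(later_additions)             # 90
print(round(ext_b_ratio, 3))       # 2.035
```

So the radical-ordered part runs from U+4E00 to U+9FA5, the 90 later additions sit after it, and Extension B is a little over twice the size of the basic block.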
Related data: Unicode versions and the total number of characters in each year
Sources:
1. CJK Unified Ideographs
2. 中日韓統一表意文字
①:ideographic vs. logographic
The English Wikipedia article "CJK Unified Ideographs" argues that Chinese characters should be called "logographic" rather than "ideographic":
The term ideographs is a misnomer, as the Chinese script is not ideographic but rather logographic.
This remark does not seem to appear in the Chinese-language article "中日韓統一表意文字".
Wikipedia's own definitions of "ideograph" and "logograph" are as follows:
An ideogram or ideograph (from Greek ἰδέα idéa “idea” and γράφω gráphō “to write”) is a graphic symbol that represents an idea or concept, independent of any particular language, and specific words or phrases. Some ideograms are comprehensible only by familiarity with prior convention; others convey their meaning through pictorial resemblance to a physical object, and thus may also be referred to as pictograms.
The numerals and mathematical symbols are ideograms – 1 ‘one’, 2 ‘two’, + ‘plus’, = ‘equals’, and so on (compare the section “Mathematics” below). In English, the ampersand & is used for ‘and’ and (as in many languages) for Latin et (as in &c for et cetera), % for ‘percent’ (‘per cent’), # for ‘number’ (or ‘pound’, among other meanings), § for ‘section’, $ for ‘dollar’, € for ‘euro’, £ for ‘pound’, ° for ‘degree’, @ for ‘at’, and so on. The reason they are ideograms rather than logograms is that they do not denote fixed morphemes: they can be read in many different languages, not just English. There is not always only a single way to read them and they are in some cases read as a complex phrase rather than a single word.
In a written language, a logogram, logograph, or lexigraph (from Greek logo, “word”, and gramma “that which is drawn or written”) is a written character that represents a word or morpheme. Chinese characters (pronounced Hànzì in Mandarin Chinese, Kanji in Japanese, Hanja in Korean, chữ Hán in Vietnamese and Sawgun in Standard Zhuang) are generally logograms, as are many hieroglyphic and cuneiform characters. The use of logograms in writing is called logography, and a writing system that is based on logograms is called a logography or logographic system. All known logographies have some phonetic component, generally based on the rebus principle.
Alphabets and syllabaries are distinct from logographies in that they use individual written characters to represent sounds directly. Such characters are called phonograms in linguistics. Unlike logograms, phonograms do not have any inherent meaning. Writing language in this way is called phonemic writing or orthographic writing.