MySQL Support for utf8mb4

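<p>MySQL added the utf8mb4 character set in version 5.5.3. The "mb4" suffix stands for "most bytes 4", that is, at most four bytes per character. In short, utf8mb4 is a superset of utf8, is fully compatible with it, and can use four bytes to store a wider range of characters.</p>
<p>In MySQL, utf8 is an alias for utf8mb3. Standard UTF-8 encodes 21-bit code points in 1 to 4 bytes, but MySQL's utf8 implementation uses at most 3 bytes per character, so utf8mb4 is the only true UTF-8.</p>
<p>If a column's character set is not utf8mb4 and you insert a four-byte character such as an emoji:</p>
<ul class=" list-paddingleft-2"><li><p>In strict mode, the insert fails with an error such as: Incorrect string value: \xF0\xA1\x8B\xBE\xE5\xA2… for column &#39;name&#39;</p></li><li><p>In non-strict mode, the data from that character onward is truncated</p></li></ul>
<hr/>
<h2>Collations</h2>
<table><thead><tr class="firstRow"><th>Suffix</th><th>Meaning</th><th>Remark</th></tr></thead><tbody><tr><td>_ai</td><td>Accent insensitive</td><td></td></tr><tr><td>_as</td><td>Accent sensitive</td><td></td></tr><tr><td>_ci</td><td>Case insensitive</td><td>Upper and lower case compare as equal</td></tr><tr><td>_cs</td><td>Case sensitive</td><td>Upper and lower case compare as different</td></tr><tr><td>_bin</td><td>Binary</td><td>Binary comparison, case sensitive</td></tr></tbody></table>
<h3>utf8mb4_unicode_ci vs. utf8mb4_general_ci</h3>
<ul class=" list-paddingleft-2"><li><p>utf8mb4_general_ci compares and sorts faster, but is slightly less accurate.</p></li><li><p>utf8mb4_unicode_ci is more accurate, but compares and sorts slightly more slowly.</p></li></ul>
<p>Databases generally default to utf8mb4_general_ci;<br/> 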
if your application handles German, French, or Russian text, be sure to use utf8mb4_unicode_ci.</p>
<h2>Configuration</h2>
<pre>vim /etc/my.cnf

[client]
default-character-set=utf8mb4

[mysql]
default-character-set=utf8mb4

[mysqld]
#character-set-client-handshake=FALSE
character-set-server=utf8mb4
collation-server=utf8mb4_general_ci
init_connect=&#39;SET NAMES utf8mb4&#39;</pre>
<h2>Checking MySQL's current character sets</h2>
<pre>mysql&gt; SHOW VARIABLES WHERE Variable_name LIKE &#39;character\_set\_%&#39; OR Variable_name LIKE &#39;collation%&#39;;
+--------------------------+-----------------+
| Variable_name            | Value           |
+--------------------------+-----------------+
| character_set_client     | utf8            |
| character_set_connection | utf8            |
| character_set_database   | utf8            |
| character_set_filesystem | binary          |
| character_set_results    | utf8            |
| character_set_server     | utf8            |
| character_set_system     | utf8            |
| collation_connection     | utf8_general_ci |
| collation_database       | utf8_general_ci |
| collation_server         | utf8_general_ci |
+--------------------------+-----------------+
10 rows in set (0.00 sec)</pre>
<h2>Summary</h2>
<ul class=" list-paddingleft-2"><li><p>MySQL version 5.5.3+</p></li><li><p>MySQL Connector/J (the Java driver) 5.1.13+</p></li><li><p>Add the settings to /etc/my.cnf, as shown above</p></li><li><p>Use utf8mb4_unicode_ci as the collation</p></li><li><p>Character set precedence: column (field) &gt; table &gt; database</p></li></ul>
<p>Note:<br/> As long as the server supports utf8mb4 (check with show char set), you can set the character set of a database, table, or column to utf8mb4 directly, even if the default character set in the MySQL configuration (/etc/my.cnf) is utf8, and then specify utf8mb4 as the character set when connecting.<br/> For example, set characterEncoding=utf8mb4 in the Java connection parameters.</p>
<p>To list the character sets MySQL supports:</p>
<p>root@localhost:3306.sock [test]&gt;show char set;<br/> +----------+---------------------------------+---------------------+--------+<br/> | Charset | Description | Default collation | Maxlen |<br/> +----------+---------------------------------+---------------------+--------+<br/> | big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |<br/> | dec8 | DEC West European | dec8_swedish_ci | 1 |<br/> | cp850 | DOS West European | cp850_general_ci | 1 |<br/> | hp8 | HP West European | hp8_english_ci | 1 |<br/> | koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |<br/> | latin1 | cp1252 West European | 
latin1_swedish_ci | 1 |<br/> | latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |<br/> | swe7 | 7bit Swedish | swe7_swedish_ci | 1 |<br/> | ascii | US ASCII | ascii_general_ci | 1 |<br/> | ujis | EUC-JP Japanese | ujis_japanese_ci | 3 |<br/> | sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 |<br/> | hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |<br/> | tis620 | TIS620 Thai | tis620_thai_ci | 1 |<br/> | euckr | EUC-KR Korean | euckr_korean_ci | 2 |<br/> | koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 |<br/> | gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |<br/> | greek | ISO 8859-7 Greek | greek_general_ci | 1 |<br/> | cp1250 | Windows Central European | cp1250_general_ci | 1 |<br/> | gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |<br/> | latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |<br/> | armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |<br/> | utf8 | UTF-8 Unicode | utf8_general_ci | 3 |<br/> | ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 |<br/> | cp866 | DOS Russian | cp866_general_ci | 1 |<br/> | keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 |<br/> | macce | Mac Central European | macce_general_ci | 1 |<br/> | macroman | Mac West European | macroman_general_ci | 1 |<br/> | cp852 | DOS Central European | cp852_general_ci | 1 |<br/> | latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |<br/> | utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 |<br/> | cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |<br/> | utf16 | UTF-16 Unicode | utf16_general_ci | 4 |<br/> | utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 |<br/> | cp1256 | Windows Arabic | cp1256_general_ci | 1 |<br/> | cp1257 | Windows Baltic | cp1257_general_ci | 1 |<br/> | utf32 | UTF-32 Unicode | utf32_general_ci | 4 |<br/> | binary | Binary pseudo charset | binary | 1 |<br/> | geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 |<br/> | cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 |<br/> | eucjpms | UJIS for Windows 
Japanese | eucjpms_japanese_ci | 3 |<br/> | gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 |<br/> +----------+---------------------------------+---------------------+--------+<br/> 41 rows in set (0.00 sec)</p><h2>Recommendation</h2><p>When you set a database's character set, use utf8mb4.</p>
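<p>As a rough illustration of the three-byte limit behind this recommendation, the Python sketch below checks whether a string contains characters that take four bytes in UTF-8 and so would not fit in a MySQL utf8 (utf8mb3) column. The helper names <code>utf8_length</code> and <code>needs_utf8mb4</code> and the sample strings are hypothetical, chosen only for this example. It relies on the fact that the characters needing four bytes are exactly those with code points above U+FFFF.</p>

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes one character occupies in standard UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """Return True if text has a character outside the Basic Multilingual Plane.

    Such characters (code points above U+FFFF) take four bytes in UTF-8,
    which MySQL's three-byte utf8 (utf8mb3) cannot store.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


if __name__ == "__main__":
    # Hypothetical sample inputs: ASCII, Chinese, and text with an emoji.
    for sample in ["abc", "中文", "hi 😀"]:
        sizes = [utf8_length(ch) for ch in sample]
        print(repr(sample), sizes, "needs utf8mb4:", needs_utf8mb4(sample))
```

<p>A check of this kind could be run on application input before it reaches a column that still uses utf8, to find values that would be rejected or truncated.</p>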