Use byte[] getBytes(String charsetName) instead of byte[] getBytes().
If you look at the Java API, it describes the method like this:
-----------
byte[] getBytes() Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.
----------------
The point to note here is "the platform's default charset".
In String(byte[] bytes, String charsetName), the bytes parameter must have been encoded in exactly the charset named by charsetName.
Only then can the text be displayed correctly.
However, if you fetch the byte array using "the platform's default charset"
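To illustrate the point about matching charsets, here is a minimal sketch. The class and method names are made up for this example, and it uses the Charset overloads of getBytes and the String constructor (rather than the String charsetName overloads quoted above) only to avoid the checked exception.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GetBytesCharsetSketch {
    // Encodes text with one charset and decodes the resulting bytes with another.
    static String recode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String text = "\uD55C\uAE00"; // two Hangul syllables, written as escapes

        // Same charset on both sides: the text survives the round trip.
        System.out.println(
            recode(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(text));

        // Mismatched charsets: the bytes are misread and the text is garbled.
        System.out.println(
            recode(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals(text));

        // getBytes() with no argument uses this value, which differs between machines.
        System.out.println(Charset.defaultCharset());
    }
}
```

The first line printed is true and the second is false; the third depends on where the program runs, which is exactly why the no-argument getBytes() is risky.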
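k9200544, thank you for the answer. I hadn't thought of it that way. I had vaguely assumed that both Encode and Decode just worked..
So does Java not have an Encode function?
Offhand, it seems unlikely that there would be no Encode function.
Searching the Internet, I see that things like OutputStreamWriter let you specify the encoding, which would mean [encoding only needs to happen when data enters or leaves the system], right?
But I also have the feeling that it can't(?) have been designed that way. When writing a program, it seems convenient to encode the text first and send it to the server byte by byte later. (For example, the byte size of a text depends on the encoding, so you have to encode first before you can know the byte size. ㅡ,.ㅡ;;) It also seems like someone(?) must have implemented an encoding function. (Though when I actually search, all I find is new String(txt.getBytes()). T_T)
That's all. If I'm thinking about this the wrong way, please point it out. Have a nice day.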
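On the byte-size question above, a small sketch may help: String.getBytes(Charset) and Charset.encode are both encode operations in the standard library, and the resulting length does vary by charset. The class name, the helper name and the sample text are assumptions made for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedSizeSketch {
    // Number of bytes the text occupies once encoded with the given charset.
    static int encodedSize(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "\uD55C\uAE00 ab"; // two Hangul syllables, a space, "ab"

        // UTF-8: 3 + 3 bytes for the Hangul, 1 byte each for the rest.
        System.out.println(encodedSize(text, StandardCharsets.UTF_8));    // 9
        // UTF-16BE: 2 bytes for each of the 5 chars, no byte-order mark.
        System.out.println(encodedSize(text, StandardCharsets.UTF_16BE)); // 10

        // Charset.encode is the java.nio counterpart and returns a ByteBuffer.
        ByteBuffer buffer = StandardCharsets.UTF_8.encode(text);
        System.out.println(buffer.remaining());                           // 9
    }
}
```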
Can happiness not be gained without sacrifice? Can an era not be overcome without misfortune?
I have the String
String hex = "6174656ec3a7c3a36f";
and I want to get the String
String output = "atenção"
but in my test I only get
String output = "aten????o";
What am I doing wrong?
Tags: java, utf-8, hex, ascii
Your hex string decodes to: atenção. Basically, your hex string represents the hexadecimal encoding of the bytes that represent the characters in the string atenção when encoded in UTF-8. To decode:
You first have to go from your hex string to bytes (AAA)
Then go from bytes to chars (BBB) -- this is dependent on the encoding, in your case UTF-8.
Then go from chars to a string (CCC)
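The three steps above can be sketched as follows. The class and method names are invented for this illustration, and the AAA/BBB/CCC comments mark where each step happens; treat it as one possible way to do it, not the only one.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

class HexToUtf8Sketch {
    static String decodeHex(String hex) {
        // AAA: hex string -> bytes (two hex digits per byte)
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        // BBB: bytes -> chars, which depends on the encoding (UTF-8 here)
        CharBuffer chars = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes));
        // CCC: chars -> String
        return chars.toString();
    }

    public static void main(String[] args) {
        String output = decodeHex("6174656ec3a7c3a36f");
        // Compare against the expected text written with Unicode escapes.
        System.out.println(output.equals("aten\u00e7\u00e3o")); // true
    }
}
```

Steps BBB and CCC can also be collapsed into new String(bytes, StandardCharsets.UTF_8) if you do not need the intermediate CharBuffer.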
Your hex string appears to denote a UTF-8 string, rather than ISO-8859-1. The reason I can say this is that if it was ISO-8859-1, you'd have two hex digits per character. Your hex string has 18 hex digits (9 bytes), but your expected output is only 7 characters. Hence, the hex string must be a variable-width encoding, and not a single byte per character like ISO-8859-1. Decoding the bytes as UTF-8 produces the output: atenção. If you change UTF-8 to ISO-8859-1, you'll see: atenÃ§Ã£o.
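To make the counting argument concrete, here is a small sketch that decodes the same bytes both ways and compares lengths. The class and helper names are assumptions for this example.

```java
import java.nio.charset.StandardCharsets;

class Utf8VersusLatin1Sketch {
    // Parses a hex string into bytes, two hex digits per byte.
    static byte[] hexToBytes(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] bytes = hexToBytes("6174656ec3a7c3a36f");
        String utf8 = new String(bytes, StandardCharsets.UTF_8);
        String latin1 = new String(bytes, StandardCharsets.ISO_8859_1);

        System.out.println(bytes.length);    // 9 bytes from 18 hex digits
        System.out.println(utf8.length());   // 7: two characters took two bytes each
        System.out.println(latin1.length()); // 9: one character per byte
    }
}
```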
Java Strings are Unicode: each char is a 16-bit UTF-16 code unit. Your String is, I suppose, a "C" string. You have to know the name of the character encoding and use CharsetDecoder.
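As a rough sketch of the CharsetDecoder route, the following decodes a byte array with a named charset and reports bad input instead of silently replacing it. The class and method names are hypothetical; the decoder calls are the standard java.nio.charset ones.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

class StrictDecodeSketch {
    // Decodes bytes with the named charset and throws if the bytes are not
    // valid in that charset, rather than substituting replacement characters.
    static String decodeStrict(byte[] bytes, String charsetName)
            throws CharacterCodingException {
        CharsetDecoder decoder = Charset.forName(charsetName)
                .newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] valid = {(byte) 0xC3, (byte) 0xA7}; // UTF-8 bytes for U+00E7
        byte[] broken = {(byte) 0xC3};             // truncated UTF-8 sequence
        try {
            System.out.println(decodeStrict(valid, "UTF-8").length()); // 1
            decodeStrict(broken, "UTF-8");
            System.out.println("decoded");
        } catch (CharacterCodingException e) {
            System.out.println("malformed input");
        }
    }
}
```

A strict decoder like this is mostly useful when you want a wrong charset guess to fail loudly; new String(bytes, charset) never throws and quietly inserts replacement characters instead.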
The ç and ã each take two bytes in UTF-8, so they are not represented by a single byte as you assume in your decode routine. Instead of converting each byte to a char, I would collect the bytes into a byte array and then use a String constructor to decode the whole array, leaving Java the dull task of doing the decoding. Of course, if you don't name a charset, Java uses the platform default, which may be the wrong one, so you need to know ahead of time what the encoding is, as per the answers given by @Aubin or @Martin Ellis.
Here’s a Java example to show how to convert Hex to ASCII or vice versa. The conversion process depends on this formula: “Hex<==>Decimal<==>ASCII“.
ASCII to Hex – Convert the String to a char array, cast each char to an integer (decimal), then use Integer.toHexString to convert it to a hex value.
Hex to ASCII – Cut the hex value into pairs, convert each pair to a radix-16 integer (decimal) with Integer.parseInt(hex, 16), and cast it back to char.
Example
public class StringToHex {

    // Converts each char to a two-digit hex value (ASCII input is assumed).
    public String convertStringToHex(String str) {
        char[] chars = str.toCharArray();
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < chars.length; i++) {
            String h = Integer.toHexString((int) chars[i]);
            // pad values below 0x10 so that every char yields exactly two digits
            if (h.length() < 2) {
                hex.append('0');
            }
            hex.append(h);
        }
        return hex.toString();
    }

    // Converts a hex string back to text, one char per pair of hex digits.
    public String convertHexToString(String hex) {
        StringBuilder sb = new StringBuilder();
        StringBuilder temp = new StringBuilder();
        // 49204c6f7665204a617661 split into two characters 49, 20, 4c...
        for (int i = 0; i < hex.length() - 1; i += 2) {
            // grab the hex in pairs
            String output = hex.substring(i, i + 2);
            // convert hex to decimal
            int decimal = Integer.parseInt(output, 16);
            // convert the decimal to character
            sb.append((char) decimal);
            temp.append(decimal);
        }
        System.out.println("Decimal : " + temp.toString());
        return sb.toString();
    }

    public static void main(String[] args) {
        StringToHex strToHex = new StringToHex();
        System.out.println("\n***** Convert ASCII to Hex *****");
        String str = "I Love Java!";
        System.out.println("Original input : " + str);
        String hex = strToHex.convertStringToHex(str);
        System.out.println("Hex : " + hex);
        System.out.println("\n***** Convert Hex to ASCII *****");
        System.out.println("Hex : " + hex);
        System.out.println("ASCII : " + strToHex.convertHexToString(hex));
    }
}
Output
***** Convert ASCII to Hex *****
Original input : I Love Java!
Hex : 49204c6f7665204a61766121
***** Convert Hex to ASCII *****
Hex : 49204c6f7665204a61766121
Decimal : 7332761111181013274971189733
ASCII : I Love Java!
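Note that the example above maps one char to one byte, which only holds for ASCII text. For non-ASCII text, a variant that goes through an explicit charset might look like the sketch below; the class and method names are made up for this illustration and are not part of the original example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetAwareHexSketch {
    // Encodes the text with the given charset and prints each byte as two hex digits.
    static String toHex(String text, Charset charset) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            hex.append(String.format("%02x", b & 0xff)); // mask to an unsigned value
        }
        return hex.toString();
    }

    // Parses pairs of hex digits into bytes and decodes them with the given charset.
    static String fromHex(String hex, Charset charset) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "aten\u00e7\u00e3o";
        String hex = toHex(text, StandardCharsets.UTF_8);
        System.out.println(hex); // 6174656ec3a7c3a36f
        System.out.println(fromHex(hex, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```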
The java.io.InputStreamReader, java.io.OutputStreamWriter, java.lang.String classes, and classes in the java.nio.charset package can convert between Unicode and a number of other character encodings. The supported encodings vary between different implementations of the Java Platform, Standard Edition 7 (Java SE 7). The class description for java.nio.charset.Charset lists the encodings that any implementation of the Java Platform, Standard Edition 7 is required to support.
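As a minimal sketch of the stream classes mentioned above, the following writes text through an OutputStreamWriter and reads it back through an InputStreamReader with an explicit charset. The class and method names are assumptions for this example, and in-memory byte array streams stand in for a real file or socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamCharsetSketch {
    // Encodes text to bytes on the way out of the program.
    static byte[] write(String text, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        }
        return out.toByteArray();
    }

    // Decodes bytes to text on the way into the program.
    static String read(byte[] bytes, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "aten\u00e7\u00e3o";
        byte[] bytes = write(text, StandardCharsets.UTF_8);
        System.out.println(bytes.length);                                   // 9
        System.out.println(read(bytes, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```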
Oracle's Java SE Development Kit 7 (Java SE 7) for all platforms (Solaris, Linux, and Microsoft Windows) and the Java SE Runtime Environment 7 (JRE 7) for Solaris and Linux support all encodings shown on this page. Oracle's JRE 7 for Microsoft Windows may be installed as a complete international version or as a European languages version. By default, the JRE 7 installer installs a European languages version if it recognizes that the host operating system only supports European languages. If the installer recognizes that any other language is needed, or if the user requests support for non-European languages in a customized installation, a complete international version is installed. The European languages version only supports the encodings shown in the following Basic Encoding Set table. The international version (which includes the lib/charsets.jar file) supports all encodings shown on this page.
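Because the available set depends on the installation, code that wants an optional encoding can check for it at run time. The sketch below uses Charset.isSupported for that; the class name, the pick helper and the fallback policy are assumptions for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SupportedCharsetSketch {
    // Returns the preferred charset if this JVM supports it, otherwise the fallback.
    static Charset pick(String preferred, Charset fallback) {
        return Charset.isSupported(preferred) ? Charset.forName(preferred) : fallback;
    }

    public static void main(String[] args) {
        // UTF-8 is one of the charsets every implementation must support.
        System.out.println(pick("UTF-8", StandardCharsets.ISO_8859_1));
        // EUC-KR is in the extended set, so the result depends on the installation.
        System.out.println(pick("EUC-KR", StandardCharsets.UTF_8));
        // Number of charsets this particular JVM knows about.
        System.out.println(Charset.availableCharsets().size());
    }
}
```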
The following tables show the encoding sets supported by Java SE 7. The canonical names used by the new java.nio APIs are in many cases not the same as those used in the java.io and java.lang APIs.
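The difference between the two naming schemes can be observed directly. In this sketch (class and method names are made up for the example), Charset.forName accepts the java.io/java.lang style name as an alias and reports the java.nio canonical name, while InputStreamReader.getEncoding reports the name used by the older API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

class CanonicalNameSketch {
    // Resolves any accepted name or alias to the java.nio canonical name.
    static String nioName(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(nioName("UTF8"));      // UTF-8
        System.out.println(nioName("ISO8859_1")); // ISO-8859-1

        // The java.io side reports its own name for the same encoding.
        try (InputStreamReader reader =
                 new InputStreamReader(new ByteArrayInputStream(new byte[0]), "UTF-8")) {
            System.out.println(reader.getEncoding());
        }
    }
}
```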
Basic Encoding Set (contained in lib/rt.jar)
| Canonical Name for java.nio API | Canonical Name for java.io API and java.lang API | Description |
| --- | --- | --- |
| IBM00858 | Cp858 | Variant of Cp850 with Euro character |
| IBM437 | Cp437 | MS-DOS United States, Australia, New Zealand, South Africa |
| IBM775 | Cp775 | PC Baltic |
| IBM850 | Cp850 | MS-DOS Latin-1 |
| IBM852 | Cp852 | MS-DOS Latin-2 |
| IBM855 | Cp855 | IBM Cyrillic |
| IBM857 | Cp857 | IBM Turkish |
| IBM862 | Cp862 | PC Hebrew |
| IBM866 | Cp866 | MS-DOS Russian |
| ISO-8859-1 | ISO8859_1 | ISO-8859-1, Latin Alphabet No. 1 |
| ISO-8859-2 | ISO8859_2 | Latin Alphabet No. 2 |
| ISO-8859-4 | ISO8859_4 | Latin Alphabet No. 4 |
| ISO-8859-5 | ISO8859_5 | Latin/Cyrillic Alphabet |
| ISO-8859-7 | ISO8859_7 | Latin/Greek Alphabet (ISO-8859-7:2003) |
| ISO-8859-9 | ISO8859_9 | Latin Alphabet No. 5 |
| ISO-8859-13 | ISO8859_13 | Latin Alphabet No. 7 |
| ISO-8859-15 | ISO8859_15 | Latin Alphabet No. 9 |
| KOI8-R | KOI8_R | KOI8-R, Russian |
| KOI8-U | KOI8_U | KOI8-U, Ukrainian |
| US-ASCII | ASCII | American Standard Code for Information Interchange |
| UTF-8 | UTF8 | Eight-bit Unicode (or UCS) Transformation Format |
| UTF-16 | UTF-16 | Sixteen-bit Unicode (or UCS) Transformation Format, byte order identified by an optional byte-order mark |
| UTF-16BE | UnicodeBigUnmarked | Sixteen-bit Unicode (or UCS) Transformation Format, big-endian byte order |
| UTF-16LE | UnicodeLittleUnmarked | Sixteen-bit Unicode (or UCS) Transformation Format, little-endian byte order |
| UTF-32 | UTF_32 | 32-bit Unicode (or UCS) Transformation Format, byte order identified by an optional byte-order mark |
| UTF-32BE | UTF_32BE | 32-bit Unicode (or UCS) Transformation Format, big-endian byte order |
| UTF-32LE | UTF_32LE | 32-bit Unicode (or UCS) Transformation Format, little-endian byte order |
| x-UTF-32BE-BOM | UTF_32BE_BOM | 32-bit Unicode (or UCS) Transformation Format, big-endian byte order, with byte-order mark |
| x-UTF-32LE-BOM | UTF_32LE_BOM | 32-bit Unicode (or UCS) Transformation Format, little-endian byte order, with byte-order mark |
| windows-1250 | Cp1250 | Windows Eastern European |
| windows-1251 | Cp1251 | Windows Cyrillic |
| windows-1252 | Cp1252 | Windows Latin-1 |
| windows-1253 | Cp1253 | Windows Greek |
| windows-1254 | Cp1254 | Windows Turkish |
| windows-1257 | Cp1257 | Windows Baltic |
| Not available | UnicodeBig | Sixteen-bit Unicode (or UCS) Transformation Format, big-endian byte order, with byte-order mark |
| x-IBM737 | Cp737 | PC Greek |
| x-IBM874 | Cp874 | IBM Thai |
| x-UTF-16LE-BOM | UnicodeLittle | Sixteen-bit Unicode (or UCS) Transformation Format, little-endian byte order, with byte-order mark |
Extended Encoding Set (contained in lib/charsets.jar)
| Canonical Name for java.nio API | Canonical Name for java.io API and java.lang API | Description |
| --- | --- | --- |
| Big5 | Big5 | Big5, Traditional Chinese |
| Big5-HKSCS | Big5_HKSCS | Big5 with Hong Kong extensions, Traditional Chinese (incorporating 2001 revision) |
| EUC-JP | EUC_JP | JISX 0201, 0208 and 0212, EUC encoding Japanese |
| EUC-KR | EUC_KR | KS C 5601, EUC encoding, Korean |
| GB18030 | GB18030 | Simplified Chinese, PRC standard |
| GB2312 | EUC_CN | GB2312, EUC encoding, Simplified Chinese |
| GBK | GBK | GBK, Simplified Chinese |
| IBM-Thai | Cp838 | IBM Thailand extended SBCS |
| IBM01140 | Cp1140 | Variant of Cp037 with Euro character |
| IBM01141 | Cp1141 | Variant of Cp273 with Euro character |
| IBM01142 | Cp1142 | Variant of Cp277 with Euro character |
| IBM01143 | Cp1143 | Variant of Cp278 with Euro character |
| IBM01144 | Cp1144 | Variant of Cp280 with Euro character |
| IBM01145 | Cp1145 | Variant of Cp284 with Euro character |
| IBM01146 | Cp1146 | Variant of Cp285 with Euro character |
| IBM01147 | Cp1147 | Variant of Cp297 with Euro character |
| IBM01148 | Cp1148 | Variant of Cp500 with Euro character |
| IBM01149 | Cp1149 | Variant of Cp871 with Euro character |
| IBM037 | Cp037 | USA, Canada (Bilingual, French), Netherlands, Portugal, Brazil, Australia |
| IBM1026 | Cp1026 | IBM Latin-5, Turkey |
| IBM1047 | Cp1047 | Latin-1 character set for EBCDIC hosts |
| IBM273 | Cp273 | IBM Austria, Germany |
| IBM277 | Cp277 | IBM Denmark, Norway |
| IBM278 | Cp278 | IBM Finland, Sweden |
| IBM280 | Cp280 | IBM Italy |
| IBM284 | Cp284 | IBM Catalan/Spain, Spanish Latin America |
| IBM285 | Cp285 | IBM United Kingdom, Ireland |
| IBM297 | Cp297 | IBM France |
| IBM420 | Cp420 | IBM Arabic |
| IBM424 | Cp424 | IBM Hebrew |
| IBM500 | Cp500 | EBCDIC 500V1 |
| IBM860 | Cp860 | MS-DOS Portuguese |
| IBM861 | Cp861 | MS-DOS Icelandic |
| IBM863 | Cp863 | MS-DOS Canadian French |
| IBM864 | Cp864 | PC Arabic |
| IBM865 | Cp865 | MS-DOS Nordic |
| IBM868 | Cp868 | MS-DOS Pakistan |
| IBM869 | Cp869 | IBM Modern Greek |
| IBM870 | Cp870 | IBM Multilingual Latin-2 |
| IBM871 | Cp871 | IBM Iceland |
| IBM918 | Cp918 | IBM Pakistan (Urdu) |
| ISO-2022-CN | ISO2022CN | GB2312 and CNS11643 in ISO 2022 CN form, Simplified and Traditional Chinese (conversion to Unicode only) |
| ISO-2022-JP | ISO2022JP | JIS X 0201, 0208, in ISO 2022 form, Japanese |
| ISO-2022-KR | ISO2022KR | ISO 2022 KR, Korean |
| ISO-8859-3 | ISO8859_3 | Latin Alphabet No. 3 |
| ISO-8859-6 | ISO8859_6 | Latin/Arabic Alphabet |
| ISO-8859-8 | ISO8859_8 | Latin/Hebrew Alphabet |
| JIS_X0201 | JIS_X0201 | JIS X 0201 |
| JIS_X0212-1990 | JIS_X0212-1990 | JIS X 0212 |
| Shift_JIS | SJIS | Shift-JIS, Japanese |
| TIS-620 | TIS620 | TIS620, Thai |
| windows-1255 | Cp1255 | Windows Hebrew |
| windows-1256 | Cp1256 | Windows Arabic |
| windows-1258 | Cp1258 | Windows Vietnamese |
| windows-31j | MS932 | Windows Japanese |
| x-Big5-Solaris | Big5_Solaris | Big5 with seven additional Hanzi ideograph character mappings for the Solaris zh_TW.BIG5 locale |
| x-euc-jp-linux | EUC_JP_LINUX | JISX 0201, 0208, EUC encoding Japanese |
| x-EUC-TW | EUC_TW | CNS11643 (Plane 1-7,15), EUC encoding, Traditional Chinese |
| x-eucJP-Open | EUC_JP_Solaris | JISX 0201, 0208, 0212, EUC encoding Japanese |
| x-IBM1006 | Cp1006 | IBM AIX Pakistan (Urdu) |
| x-IBM1025 | Cp1025 | IBM Multilingual Cyrillic: Bulgaria, Bosnia, Herzegovina, Macedonia (FYR) |
| x-IBM1046 | Cp1046 | IBM Arabic - Windows |
| x-IBM1097 | Cp1097 | IBM Iran (Farsi)/Persian |
| x-IBM1098 | Cp1098 | IBM Iran (Farsi)/Persian (PC) |
| x-IBM1112 | Cp1112 | IBM Latvia, Lithuania |
| x-IBM1122 | Cp1122 | IBM Estonia |
| x-IBM1123 | Cp1123 | IBM Ukraine |
| x-IBM1124 | Cp1124 | IBM AIX Ukraine |
| x-IBM1381 | Cp1381 | IBM OS/2, DOS People's Republic of China (PRC) |
| x-IBM1383 | Cp1383 | IBM AIX People's Republic of China (PRC) |
| x-IBM33722 | Cp33722 | IBM-eucJP - Japanese (superset of 5050) |
| x-IBM834 | Cp834 | IBM EBCDIC DBCS-only Korean |
| x-IBM856 | Cp856 | IBM Hebrew |
| x-IBM875 | Cp875 | IBM Greek |
| x-IBM921 | Cp921 | IBM Latvia, Lithuania (AIX, DOS) |
| x-IBM922 | Cp922 | IBM Estonia (AIX, DOS) |
| x-IBM930 | Cp930 | Japanese Katakana-Kanji mixed with 4370 UDC, superset of 5026 |
| x-IBM933 | Cp933 | Korean Mixed with 1880 UDC, superset of 5029 |
| x-IBM935 | Cp935 | Simplified Chinese Host mixed with 1880 UDC, superset of 5031 |
| x-IBM937 | Cp937 | Traditional Chinese Host mixed with 6204 UDC, superset of 5033 |
| x-IBM939 | Cp939 | Japanese Latin Kanji mixed with 4370 UDC, superset of 5035 |
| x-IBM942 | Cp942 | IBM OS/2 Japanese, superset of Cp932 |
| x-IBM942C | Cp942C | Variant of Cp942 |
| x-IBM943 | Cp943 | IBM OS/2 Japanese, superset of Cp932 and Shift-JIS |
| x-IBM943C | Cp943C | Variant of Cp943 |
| x-IBM948 | Cp948 | OS/2 Chinese (Taiwan) superset of 938 |
| x-IBM949 | Cp949 | PC Korean |
| x-IBM949C | Cp949C | Variant of Cp949 |
| x-IBM950 | Cp950 | PC Chinese (Hong Kong, Taiwan) |
| x-IBM964 | Cp964 | AIX Chinese (Taiwan) |
| x-IBM970 | Cp970 | AIX Korean |
| x-ISCII91 | ISCII91 | ISCII91 encoding of Indic scripts |
| x-ISO2022-CN-CNS | ISO2022_CN_CNS | CNS11643 in ISO 2022 CN form, Traditional Chinese (conversion from Unicode only) |
| x-ISO2022-CN-GB | ISO2022_CN_GB | GB2312 in ISO 2022 CN form, Simplified Chinese (conversion from Unicode only) |
| x-iso-8859-11 | x-iso-8859-11 | Latin/Thai Alphabet |
| x-JIS0208 | x-JIS0208 | JIS X 0208 |
| x-JISAutoDetect | JISAutoDetect | Detects and converts from Shift-JIS, EUC-JP, ISO 2022 JP (conversion to Unicode only) |
| x-Johab | x-Johab | Korean, Johab character set |
| x-MacArabic | MacArabic | Macintosh Arabic |
| x-MacCentralEurope | MacCentralEurope | Macintosh Latin-2 |
| x-MacCroatian | MacCroatian | Macintosh Croatian |
| x-MacCyrillic | MacCyrillic | Macintosh Cyrillic |
| x-MacDingbat | MacDingbat | Macintosh Dingbat |
| x-MacGreek | MacGreek | Macintosh Greek |
| x-MacHebrew | MacHebrew | Macintosh Hebrew |
| x-MacIceland | MacIceland | Macintosh Iceland |
| x-MacRoman | MacRoman | Macintosh Roman |
| x-MacRomania | MacRomania | Macintosh Romania |
| x-MacSymbol | MacSymbol | Macintosh Symbol |
| x-MacThai | MacThai | Macintosh Thai |
| x-MacTurkish | MacTurkish | Macintosh Turkish |
| x-MacUkraine | MacUkraine | Macintosh Ukraine |
| x-MS950-HKSCS | MS950_HKSCS | Windows Traditional Chinese with Hong Kong extensions |