JAVA

Java: correct encoding of Korean, English, Japanese, and Chinese text, and hex <-> UTF-8 string conversion

AlrepondTech 2020. 9. 22. 04:33

=======================

JAVA - Checking every Korean encoding conversion in one go

String word = "무궁화 꽃이 피었습니다.";
System.out.println("utf-8 -> euc-kr        : " + new String(word.getBytes("utf-8"), "euc-kr"));
System.out.println("utf-8 -> ksc5601       : " + new String(word.getBytes("utf-8"), "ksc5601"));
System.out.println("utf-8 -> x-windows-949 : " + new String(word.getBytes("utf-8"), "x-windows-949"));
System.out.println("utf-8 -> iso-8859-1    : " + new String(word.getBytes("utf-8"), "iso-8859-1"));

System.out.println("iso-8859-1 -> euc-kr        : " + new String(word.getBytes("iso-8859-1"), "euc-kr"));
System.out.println("iso-8859-1 -> ksc5601       : " + new String(word.getBytes("iso-8859-1"), "ksc5601"));
System.out.println("iso-8859-1 -> x-windows-949 : " + new String(word.getBytes("iso-8859-1"), "x-windows-949"));
System.out.println("iso-8859-1 -> utf-8         : " + new String(word.getBytes("iso-8859-1"), "utf-8"));

System.out.println("euc-kr -> utf-8         : " + new String(word.getBytes("euc-kr"), "utf-8"));
System.out.println("euc-kr -> ksc5601       : " + new String(word.getBytes("euc-kr"), "ksc5601"));
System.out.println("euc-kr -> x-windows-949 : " + new String(word.getBytes("euc-kr"), "x-windows-949"));
System.out.println("euc-kr -> iso-8859-1    : " + new String(word.getBytes("euc-kr"), "iso-8859-1"));

System.out.println("ksc5601 -> euc-kr        : " + new String(word.getBytes("ksc5601"), "euc-kr"));
System.out.println("ksc5601 -> utf-8         : " + new String(word.getBytes("ksc5601"), "utf-8"));
System.out.println("ksc5601 -> x-windows-949 : " + new String(word.getBytes("ksc5601"), "x-windows-949"));
System.out.println("ksc5601 -> iso-8859-1    : " + new String(word.getBytes("ksc5601"), "iso-8859-1"));

System.out.println("x-windows-949 -> euc-kr     : " + new String(word.getBytes("x-windows-949"), "euc-kr"));
System.out.println("x-windows-949 -> utf-8      : " + new String(word.getBytes("x-windows-949"), "utf-8"));
System.out.println("x-windows-949 -> ksc5601    : " + new String(word.getBytes("x-windows-949"), "ksc5601"));
System.out.println("x-windows-949 -> iso-8859-1 : " + new String(word.getBytes("x-windows-949"), "iso-8859-1"));

Source - http://titis.tistory.com/entry/java-인코딩-변환-한방에-해결

=======================

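Source: http://blog.javarouka.me/2011/09/new-string.html

A common beginner mistake with new String() in Java...

Label: java

There is a myth (?), or rather a piece of mythical code (?), about character set conversion that keeps circulating on blogs and elsewhere on the internet.

// Code that does not behave as expected
String convert = new String(message.getBytes("euc-kr"), "utf-8");

This code misbehaves because it is based on a misunderstanding of the API.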

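String::getBytes returns the Unicode string that Java manages internally as a byte array encoded in the character set given as the argument, and the new String(byte array, charset) constructor builds a string by interpreting the given byte array as being in the given character set.

Consider the following example.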

String d = "안녕 親9"; // Java handles every string internally as Unicode

// Convert the Unicode string to a byte array in the UTF-8 character set
byte [] utf8 = d.getBytes("UTF-8");

// Convert the Unicode string to a byte array in the EUC-KR character set
byte [] euckr = d.getBytes("EUC-KR");

// These are different byte arrays, so naturally their sizes differ.
System.out.println("byte length > " + utf8.length); // byte length > 11
System.out.println("byte length > " + euckr.length); // byte length > 8

// The mistake.
// A byte array laid out in the UTF-8 character set cannot be interpreted as EUC-KR.
System.out.println(new String(utf8, "EUC-KR"));

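Never make the mistake of writing new String(byte array, the charset you wish you had) and calling it character set conversion.

Strings inside Java are all handled in the same Unicode form, so when you need to send text to a different kind of system, it is enough to convert the string to a byte array in the required charset with getBytes() and send that.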
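As a rough sketch of what an actual conversion looks like when you start from bytes rather than from a String: decode with the charset the bytes were written in, then encode with the charset you need. The class name TranscodeSketch and the helper transcode are made up for illustration, `\uD55C\uAE00` is the escaped form of 한글, and the code assumes the running JDK provides the EUC-KR charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeSketch {
    // Assumption: the JDK in use ships the EUC-KR charset (extended charsets).
    static final Charset EUC_KR = Charset.forName("EUC-KR");

    // Decode with the charset the bytes were written in,
    // then encode the resulting String with the target charset.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String text = new String(input, from); // bytes -> Unicode String
        return text.getBytes(to);              // Unicode String -> bytes
    }

    public static void main(String[] args) {
        String text = "\uD55C\uAE00"; // two Hangul syllables
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] eucKr = transcode(utf8, StandardCharsets.UTF_8, EUC_KR);
        System.out.println(utf8.length + " bytes in UTF-8, " + eucKr.length + " bytes in EUC-KR");
        // Decoding with the matching charset gives the original text back
        System.out.println(new String(eucKr, EUC_KR).equals(text));
    }
}
```

The charset passed to new String always names what the bytes already are, never what you would like them to become.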

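That said, projects built on old web servers sometimes use the new String pattern above to restore a string to its original characters. Such code either corrects a character set that the web server interpreted wrongly, or reinterprets the bytes between similar character encodings; keep in mind that it is not character set conversion.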
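To make the "repair, not conversion" point concrete, here is a minimal sketch that simulates a server decoding EUC-KR bytes as ISO-8859-1 and then undoes that mistake. The names MisdecodeRepairSketch and repair are hypothetical, and the ISO-8859-1 misdecode is only one assumed scenario; the trick relies on ISO-8859-1 mapping every byte value to exactly one char.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MisdecodeRepairSketch {
    // Assumption: the JDK in use ships the EUC-KR charset.
    static final Charset EUC_KR = Charset.forName("EUC-KR");

    // Undoes a wrong ISO-8859-1 decode. ISO-8859-1 maps each byte to one char,
    // so getBytes(ISO_8859_1) hands back the original bytes unchanged.
    static String repair(String misdecoded, Charset actual) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, actual);
    }

    public static void main(String[] args) {
        String text = "\uD55C\uAE00";
        byte[] wire = text.getBytes(EUC_KR);                            // what the client sent
        String broken = new String(wire, StandardCharsets.ISO_8859_1);  // what the server produced
        System.out.println(broken.equals(text));                        // false
        System.out.println(repair(broken, EUC_KR).equals(text));        // true
    }
}
```

This only works because the wrong decode lost no information. A wrong decode that replaces bytes with U+FFFD cannot be undone this way.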

Here is a link to a good reference article.

Understanding the Java Character Set


Time: 2:04 PM

Author: Hanghee Yi

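9 comments:

Anonymous, June 14, 2012, 2:18 AM
Great post, I'm saving a copy ^^
Hanghee Yi, March 28, 2013, 2:09 AM
Thank you ^^

Zany Lee, March 26, 2013, 3:03 PM
A really good post. Enjoyed reading it~~!
Hanghee Yi, March 28, 2013, 2:09 AM
Thanks for the comment! ^^

JONGHO JAMES JOO, April 8, 2013, 5:04 PM
Wow, thanks for the great post ^^
I had been struggling because this wasn't working.. I'll give it another try :)

Jaejin Lee, August 1, 2013, 1:44 PM
Enjoyed the post. ^^

손종현, April 6, 2014, 3:59 PM
Thanks for the useful content.
I was developing on top of old source code and puzzling over some code that misbehaved, and this turned out to be the culprit ^^

최지원, August 14, 2014, 2:17 AM
Wow.. thank you!! Thanks to this, my problem with sending Korean strings from Java to MFC is solved!!! Thank you so much ><

휘파람, May 29, 2016, 10:02 PM
Thank you... this part always confused me, and now I understand it clearly.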

=======================

Source: https://kldp.org/node/116309

Posted by: ckbcorp, Date: Fri, 2010/07/16 - 7:48 PM

Programming QnA

First... I think showing the code will make this easier to explain, so here it is.

String fileEncoding = System.getProperty("file.encoding");
System.out.println("file.encoding = " + fileEncoding);

String Encoding = "한글";
try {
    String toBinaryRaw = new String(Encoding.getBytes());
    System.out.println("Binary Raw Data:" + toBinaryRaw);
    ShowAllByte(toBinaryRaw);
    String toISO_8859 = new String(Encoding.getBytes(), "ISO-8859-1");
    System.out.println("ISO-8859-1 Encoding : " + toISO_8859);
    ShowAllByte(toISO_8859);
    String toUtf_8 = new String(Encoding.getBytes(), "utf-8");
    System.out.println("UTF-8 Encoding : " + toUtf_8);
    ShowAllByte(toUtf_8);
    String toEUCKR = new String(Encoding.getBytes(), "euc-kr");
    System.out.println("toEUCKR Encoding : " + toEUCKR);
    ShowAllByte(toEUCKR);
    String toUTF8_EUCKR = new String(Encoding.getBytes("utf-8"), "euc-kr");
    System.out.println("toUTF8_EUCKR Encoding : " + toUTF8_EUCKR);
    ShowAllByte(toUTF8_EUCKR);
    String toksc5601 = new String(Encoding.getBytes(), "KSC5601");
    System.out.println("KSC5601 Encoding : " + toksc5601);
    ShowAllByte(toksc5601);
    String toms949 = new String(Encoding.getBytes(), "ms949");
    System.out.println("MS949 Encoding : " + toms949);
    ShowAllByte(toms949);
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}

As you can see, this is test code for encoding and decoding Korean text.
My Eclipse environment is set to UTF-8, so running it prints:

file.encoding = utf-8
Binary Raw Data:한글
Size: 6 Byte: [0xed 0x95 0x9c 0xea 0xb8 0x80 )
ISO-8859-1 Encoding : íê¸
Size: 12 Byte: [0xc3 0xad 0xc2 0x95 0xc2 0x9c 0xc3 0xaa 0xc2 0xb8 0xc2 0x80 )
UTF-8 Encoding : 한글
Size: 6 Byte: [0xed 0x95 0x9c 0xea 0xb8 0x80 )
toEUCKR Encoding : ��
Size: 9 Byte: [0xef 0xbf 0xbd 0xef 0xbf 0xbd 0xef 0xbf 0xbd )
toUTF8_EUCKR Encoding : ��
Size: 9 Byte: [0xef 0xbf 0xbd 0xef 0xbf 0xbd 0xef 0xbf 0xbd )
KSC5601 Encoding : ��
Size: 9 Byte: [0xef 0xbf 0xbd 0xef 0xbf 0xbd 0xef 0xbf 0xbd )
MS949 Encoding : �쒓�
Size: 9 Byte: [0xef 0xbf 0xbd 0xec 0x92 0x93 0xef 0xbf 0xbd )

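That is the output.

1. The goal is to send Korean text from a UTF-8 environment over TCP/IP and view it in an EUC-KR environment. Since the client is UTF-8 and the server is EUC-KR, I want to convert UTF-8 to EUC-KR on the client side.

But when I first tried new String( Encoding.getBytes(), "EUC-KR"); the Korean text kept coming out garbled, so I got suspicious, printed the binary values on the client before sending, and got the results above.

I expected String( Encoding.getBytes(), "EUC-KR" ) to turn the 6-byte UTF-8 "한글" into a 4-byte EUC-KR "한글", but when I printed it, it came out as 9 bytes, not 6.

I also tried new String( Encoding.getBytes("UTF-8"), "EUC-KR"); but... the result was the same.

Searching the internet, other people seem to get by with just String( text.getBytes( FromEncode ), ToEncode ) without any problem,
so I can't tell what I'm doing wrong.

If my understanding itself is wrong, I'd appreciate it if you could point out where.
And if you have run into something similar or have any advice, please don't hesitate to share it.

That's all. Have a good evening.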


Use byte[] getBytes(String charsetName)

Posted by: emptynote, Date: Sat, 2010/07/17 - 1:44 AM

Use byte[] getBytes(String charsetName) instead of byte[] getBytes().

The Java API describes it like this:
-----------
byte[] getBytes()
Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.
----------------
The point to note here is "the platform's default charset".

In String(byte[] bytes, String charsetName), the bytes parameter must actually be encoded in the charsetName you pass.

Only then can the text be displayed correctly.

Your code, however, gets the byte array in the platform's default charset and then decodes it as each of the other charsets, so it is expected that the output looks wrong.
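A small sketch of the advice above, assuming nothing beyond the standard library: pass the charset explicitly on both sides and use the same one for encoding and decoding. The class name ExplicitCharsetSketch and the helper roundTrip are made up for illustration, and `\uD55C\uAE00` is the escaped form of 한글.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetSketch {
    // Encodes and decodes with the same, explicitly named charset.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "\uD55C\uAE00";
        // getBytes() with no argument uses this value, which differs between machines
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));
        // Assumption: the JDK in use ships the EUC-KR charset
        System.out.println(roundTrip(text, Charset.forName("EUC-KR")).equals(text));
    }
}
```

Using the Charset overloads instead of the String-name overloads also avoids the checked UnsupportedEncodingException.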


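Thank you for the answer.

Posted by: ckbcorp, Date: Tue, 2010/07/20 - 4:02 PM

k9200544, thank you for the answer. I hadn't thought of that. I just vaguely assumed that encoding and decoding both worked..

So does Java have no encode function?

Offhand, I can't imagine there isn't one.

Searching the internet, I see that things like OutputStreamWriter let you specify an encoding, which implies that [encoding only needs to happen when data crosses into or out of the system], right?

But I also find it hard to believe (?) it was designed that way.
When writing a program it seems convenient to encode the text first and send it to the server byte by byte later.
(For example, to get the byte size of the text you have to encode it first, since the byte size depends on the encoding. ㅡ,.ㅡ;;)
It also seems likely that somebody (?) has already implemented an encoding function. (Though when I actually search, all I find is new String(txt.getBytes()). T_T)

That's all. If I'm thinking about this wrong, please point it out, and have a nice day.

Can happiness not be gained without sacrifice?
Can an era not be overcome without misfortune?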


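(Passerby)

Posted by: Anonymous user, Date: Tue, 2011/08/09 - 6:53 PM

(In my opinion)
I don't think there is such a function.
There seems to be no need to convert between character sets anywhere other than at string construction. If anything, the code you wrote yourself is that function.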
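For what it's worth, the standard library does expose encoding as an explicit step through java.nio.charset.Charset, alongside the OutputStreamWriter route mentioned in the question. The sketch below shows both under the assumption that a ByteArrayOutputStream stands in for the real socket stream; the class and method names are hypothetical, and `\uD55C\uAE00` is the escaped form of 한글.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class EncodeAtBoundarySketch {
    // Byte size of the text once encoded in the given charset.
    static int encodedSize(String text, Charset charset) {
        ByteBuffer buffer = charset.encode(text);
        return buffer.remaining();
    }

    // Encodes while writing, the way a socket or file stream would be wrapped.
    static byte[] writeEncoded(String text, Charset charset) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for a socket stream
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the JDK in use ships the EUC-KR charset
        Charset eucKr = Charset.forName("EUC-KR");
        String text = "\uD55C\uAE00";
        System.out.println(encodedSize(text, eucKr));         // 4
        System.out.println(writeEncoded(text, eucKr).length); // 4
    }
}
```

Either way the String itself is never "in EUC-KR"; only the bytes produced at the boundary are.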


Try it like this~

Posted by: Anonymous user, Date: Thu, 2013/11/28 - 12:34 AM

new String( Encoding.getBytes("ISO-8859-1"),"euc-kr")

=======================

Source: http://stackoverflow.com/questions/15749475/java-string-hex-to-string-ascii-with-accentuation

I have the String

String hex = "6174656ec3a7c3a36f";

and I want to get

String output = "atenção";

but in my test I only get

String output = "aten????o";

What am I doing wrong?

Tags: java, utf-8, hex, ascii

Marked as a duplicate of "Converting UTF-8 to ISO-8859-1 in Java - how to keep it as single byte" by casperOne, Apr 2 '13 at 20:32.

4 Answers


Consider
Which prints: atenção
Basically, your hex string represents the hexadecimal encoding of the bytes that represent the characters in the string atenção when encoded in UTF-8.
To decode:

You first have to go from your hex string to bytes (AAA)
Then go from bytes to chars (BBB) -- this is dependent on the encoding, in your case UTF-8.
Then go from chars to a string (CCC)
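The answer's own code did not survive in this copy, so the following is only a sketch of the three steps listed above, not the original program. The class name HexDecodeSketch and the method decodeUtf8Hex are made up, and the input is assumed to contain an even number of valid hex digits.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class HexDecodeSketch {
    static String decodeUtf8Hex(String hex) {
        // Step 1: hex digits -> bytes (two digits per byte)
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        // Step 2: bytes -> chars; this depends on the encoding (UTF-8 here)
        CharBuffer chars = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes));
        // Step 3: chars -> String
        return chars.toString();
    }

    public static void main(String[] args) {
        // Hex string taken from the question
        System.out.println(decodeUtf8Hex("6174656ec3a7c3a36f"));
    }
}
```

Steps 2 and 3 can be collapsed into new String(bytes, StandardCharsets.UTF_8); they are kept apart here only to mirror the list.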

Your hex string appears to denote a UTF-8 string, rather than ISO-8859-1.
The reason I can say this is that if it was ISO-8859-1, you'd have two hex digits per character. Your hex string has 18 characters, but your expected output is only 7 characters. Hence, the hex string must be a variable width encoding, and not a single byte per character like ISO-8859-1.
The following program produces the output: atenção
If you change UTF-8 to ISO-8859-1, you'll see: atenÃ§Ã£o.
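The program this answer refers to is also missing from this copy. As a hedged illustration of its point, the sketch below decodes the four bytes of the two accented letters both ways and compares the resulting lengths; CompareDecodingsSketch and charCount are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CompareDecodingsSketch {
    // Number of chars produced when the bytes are decoded with the given charset.
    static int charCount(byte[] bytes, Charset charset) {
        return new String(bytes, charset).length();
    }

    public static void main(String[] args) {
        // c3 a7 and c3 a3 are the bytes of the two accented letters in the question's hex string
        byte[] bytes = {(byte) 0xc3, (byte) 0xa7, (byte) 0xc3, (byte) 0xa3};

        // UTF-8 reads each pair of bytes as one letter
        System.out.println(charCount(bytes, StandardCharsets.UTF_8));      // 2
        // ISO-8859-1 reads every byte as its own character
        System.out.println(charCount(bytes, StandardCharsets.ISO_8859_1)); // 4
    }
}
```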

The Java Strings are Unicode: each character is encoded on 16 bits. Your String is - I suppose - a "C" string. You have to know the name of the character encoder and use CharsetDecoder.

The ç and ã are 16-bit characters, so they are not represented by a byte as you assume in your decode routine, but rather by a full word.
I would, instead of converting each byte to a char, convert the bytes to java Bytes, and then use a string routine to decode the array of Bytes to a string, allowing java the dull task of determining the decoding routine.
Of course, java may guess wrong, so you might have to know ahead of time what the encoding is, as per the answer given by @Aubin or @Martin Ellis
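One way to notice a wrong guess instead of silently getting replacement characters is a strict CharsetDecoder, as hinted at in the answers above. This is a minimal sketch with hypothetical names (StrictDecodeSketch, decodesCleanly); the sample bytes c7 d1 b1 db are assumed to be EUC-KR text, which is not well-formed UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeSketch {
    // Returns true only if the bytes are well-formed in the given charset.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xc7, (byte) 0xd1, (byte) 0xb1, (byte) 0xdb};
        System.out.println(decodesCleanly(sample, StandardCharsets.UTF_8));      // false
        System.out.println(decodesCleanly(sample, StandardCharsets.ISO_8859_1)); // true
    }
}
```

A clean decode is not proof of the right charset: ISO-8859-1 accepts any byte sequence, so the check is only useful for rejecting candidates such as UTF-8.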

=======================

Source: http://www.mkyong.com/java/how-to-convert-hex-to-ascii-in-java/

Here’s a Java example to show how to convert Hex to ASCII or vice versa in Java. The conversion process depends on this formula “Hex<==>Decimal<==>ASCII“.

ASCII to Hex – Convert the String to a char array, cast each char to an integer (decimal), then use Integer.toHexString to convert it to a Hex value.
Hex to ASCII – Cut the Hex value into pairs, convert each pair to a radix-16 integer (decimal) with Integer.parseInt(hex, 16), and cast it back to char.

Example

public class StringToHex{
 
  public String convertStringToHex(String str){
 
	  char[] chars = str.toCharArray();
 
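	  // Pad each value to two hex digits so values below 0x10 still form a pair.
	  // Like the rest of this example, this assumes single-byte (ASCII) characters.
	  StringBuilder hex = new StringBuilder();
	  for(int i = 0; i < chars.length; i++){
	    hex.append(String.format("%02x", (int)chars[i]));
	  }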
 
	  return hex.toString();
  }
 
  public String convertHexToString(String hex){
 
	  StringBuilder sb = new StringBuilder();
	  StringBuilder temp = new StringBuilder();
 
	  //49204c6f7665204a617661 split into two characters 49, 20, 4c...
	  for( int i=0; i<hex.length()-1; i+=2 ){
 
	      //grab the hex in pairs
	      String output = hex.substring(i, (i + 2));
	      //convert hex to decimal
	      int decimal = Integer.parseInt(output, 16);
	      //convert the decimal to character
	      sb.append((char)decimal);
 
	      temp.append(decimal);
	  }
	  System.out.println("Decimal : " + temp.toString());
 
	  return sb.toString();
  }
 
  public static void main(String[] args) {
 
	  StringToHex strToHex = new StringToHex();
	  System.out.println("\n***** Convert ASCII to Hex *****");
	  String str = "I Love Java!";  
	  System.out.println("Original input : " + str);
 
	  String hex = strToHex.convertStringToHex(str);
 
	  System.out.println("Hex : " + hex);
 
	  System.out.println("\n***** Convert Hex to ASCII *****");
	  System.out.println("Hex : " + hex);
	  System.out.println("ASCII : " + strToHex.convertHexToString(hex));
  }
}

Output

***** Convert ASCII to Hex *****
Original input : I Love Java!
Hex : 49204c6f7665204a61766121
 
***** Convert Hex to ASCII *****
Hex : 49204c6f7665204a61766121
Decimal : 7332761111181013274971189733
ASCII : I Love Java!
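The example above works per char and therefore only for ASCII text. A charset-aware variant, sketched here as an assumption rather than as part of the original tutorial, goes through the encoded bytes instead, so the same code handles Korean or accented text. CharsetAwareHexSketch, toHex, and fromHex are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetAwareHexSketch {
    // Encodes the text with the given charset and writes every byte as two hex digits.
    static String toHex(String text, Charset charset) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // Reverses toHex; assumes an even number of valid hex digits.
    static String fromHex(String hex, Charset charset) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String hex = toHex("I Love Java!", StandardCharsets.UTF_8);
        System.out.println(hex); // 49204c6f7665204a61766121
        System.out.println(fromHex(hex, StandardCharsets.UTF_8));
    }
}
```

The same charset must be used in both directions, for the reasons given in the earlier sections.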

Reference

1. http://en.wikipedia.org/wiki/Hexadecimal
2. http://mindprod.com/jgloss/hex.html


=======================

Source: https://docs.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html

Supported Encodings

The java.io.InputStreamReader, java.io.OutputStreamWriter, java.lang.String classes, and classes in the java.nio.charset package can convert between Unicode and a number of other character encodings. The supported encodings vary between different implementations of the Java Platform, Standard Edition 7 (Java SE 7). The class description for java.nio.charset.Charset lists the encodings that any implementation of the Java Platform, Standard Edition 7 is required to support.

Oracle's Java SE Development Kit 7 (Java SE 7) for all platforms (Solaris, Linux, and Microsoft Windows) and the Java SE Runtime Environment 7 (JRE 7) for Solaris and Linux support all encodings shown on this page. Oracle's JRE 7 for Microsoft Windows may be installed as a complete international version or as a European languages version. By default, the JRE 7 installer installs a European languages version if it recognizes that the host operating system only supports European languages. If the installer recognizes that any other language is needed, or if the user requests support for non-European languages in a customized installation, a complete international version is installed. The European languages version only supports the encodings shown in the following Basic Encoding Set table. The international version (which includes the lib/charsets.jar file) supports all encodings shown on this page.

The following tables show the encoding sets supported by Java SE 7. The canonical names used by the new java.nio APIs are in many cases not the same as those used in the java.io and java.lang APIs.
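To see the difference between the two naming schemes at runtime, a short sketch like the following can resolve a java.io/java.lang style name to its java.nio canonical name. CharsetNameSketch and canonicalName are made-up names, and whether the extended names resolve depends on the charsets installed with the runtime, so those are only probed with Charset.isSupported.

```java
import java.nio.charset.Charset;

public class CharsetNameSketch {
    // Resolves any supported name or alias to the canonical java.nio name.
    static String canonicalName(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    public static void main(String[] args) {
        System.out.println(canonicalName("UTF8"));      // UTF-8
        System.out.println(canonicalName("ISO8859_1")); // ISO-8859-1
        // Extended charsets may be missing from a reduced runtime, so check first
        for (String name : new String[] {"EUC_KR", "MS949"}) {
            System.out.println(name + " supported: " + Charset.isSupported(name));
        }
    }
}
```

Charset.availableCharsets() returns the full map of canonical names if you want to list everything a given runtime supports.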

Basic Encoding Set (contained in lib/rt.jar)

Canonical Name for java.nio API	Canonical Name for java.io API and java.lang API	Description

IBM00858	Cp858	Variant of Cp850 with Euro character
IBM437	Cp437	MS-DOS United States, Australia, New Zealand, South Africa
IBM775	Cp775	PC Baltic
IBM850	Cp850	MS-DOS Latin-1
IBM852	Cp852	MS-DOS Latin-2
IBM855	Cp855	IBM Cyrillic
IBM857	Cp857	IBM Turkish
IBM862	Cp862	PC Hebrew
IBM866	Cp866	MS-DOS Russian
ISO-8859-1	ISO8859_1	ISO-8859-1, Latin Alphabet No. 1
ISO-8859-2	ISO8859_2	Latin Alphabet No. 2
ISO-8859-4	ISO8859_4	Latin Alphabet No. 4
ISO-8859-5	ISO8859_5	Latin/Cyrillic Alphabet
ISO-8859-7	ISO8859_7	Latin/Greek Alphabet (ISO-8859-7:2003)
ISO-8859-9	ISO8859_9	Latin Alphabet No. 5
ISO-8859-13	ISO8859_13	Latin Alphabet No. 7
ISO-8859-15	ISO8859_15	Latin Alphabet No. 9
KOI8-R	KOI8_R	KOI8-R, Russian
KOI8-U	KOI8_U	KOI8-U, Ukrainian
US-ASCII	ASCII	American Standard Code for Information Interchange
UTF-8	UTF8	Eight-bit Unicode (or UCS) Transformation Format
UTF-16	UTF-16	Sixteen-bit Unicode (or UCS) Transformation Format, byte order identified by an optional byte-order mark
UTF-16BE	UnicodeBigUnmarked	Sixteen-bit Unicode (or UCS) Transformation Format, big-endian byte order
UTF-16LE	UnicodeLittleUnmarked	Sixteen-bit Unicode (or UCS) Transformation Format, little-endian byte order
UTF-32	UTF_32	32-bit Unicode (or UCS) Transformation Format, byte order identified by an optional byte-order mark
UTF-32BE	UTF_32BE	32-bit Unicode (or UCS) Transformation Format, big-endian byte order
UTF-32LE	UTF_32LE	32-bit Unicode (or UCS) Transformation Format, little-endian byte order
x-UTF-32BE-BOM	UTF_32BE_BOM	32-bit Unicode (or UCS) Transformation Format, big-endian byte order, with byte-order mark
x-UTF-32LE-BOM	UTF_32LE_BOM	32-bit Unicode (or UCS) Transformation Format, little-endian byte order, with byte-order mark
windows-1250	Cp1250	Windows Eastern European
windows-1251	Cp1251	Windows Cyrillic
windows-1252	Cp1252	Windows Latin-1
windows-1253	Cp1253	Windows Greek
windows-1254	Cp1254	Windows Turkish
windows-1257	Cp1257	Windows Baltic
Not available	UnicodeBig	Sixteen-bit Unicode (or UCS) Transformation Format, big-endian byte order, with byte-order mark
x-IBM737	Cp737	PC Greek
x-IBM874	Cp874	IBM Thai
x-UTF-16LE-BOM	UnicodeLittle	Sixteen-bit Unicode (or UCS) Transformation Format, little-endian byte order, with byte-order mark

Extended Encoding Set (contained in lib/charsets.jar)

Canonical Name for java.nio API	Canonical Name for java.io API and java.lang API	Description

Big5	Big5	Big5, Traditional Chinese
Big5-HKSCS	Big5_HKSCS	Big5 with Hong Kong extensions, Traditional Chinese (incorporating 2001 revision)
EUC-JP	EUC_JP	JISX 0201, 0208 and 0212, EUC encoding Japanese
EUC-KR	EUC_KR	KS C 5601, EUC encoding, Korean
GB18030	GB18030	Simplified Chinese, PRC standard
GB2312	EUC_CN	GB2312, EUC encoding, Simplified Chinese
GBK	GBK	GBK, Simplified Chinese
IBM-Thai	Cp838	IBM Thailand extended SBCS
IBM01140	Cp1140	Variant of Cp037 with Euro character
IBM01141	Cp1141	Variant of Cp273 with Euro character
IBM01142	Cp1142	Variant of Cp277 with Euro character
IBM01143	Cp1143	Variant of Cp278 with Euro character
IBM01144	Cp1144	Variant of Cp280 with Euro character
IBM01145	Cp1145	Variant of Cp284 with Euro character
IBM01146	Cp1146	Variant of Cp285 with Euro character
IBM01147	Cp1147	Variant of Cp297 with Euro character
IBM01148	Cp1148	Variant of Cp500 with Euro character
IBM01149	Cp1149	Variant of Cp871 with Euro character
IBM037	Cp037	USA, Canada (Bilingual, French), Netherlands, Portugal, Brazil, Australia
IBM1026	Cp1026	IBM Latin-5, Turkey
IBM1047	Cp1047	Latin-1 character set for EBCDIC hosts
IBM273	Cp273	IBM Austria, Germany
IBM277	Cp277	IBM Denmark, Norway
IBM278	Cp278	IBM Finland, Sweden
IBM280	Cp280	IBM Italy
IBM284	Cp284	IBM Catalan/Spain, Spanish Latin America
IBM285	Cp285	IBM United Kingdom, Ireland
IBM297	Cp297	IBM France
IBM420	Cp420	IBM Arabic
IBM424	Cp424	IBM Hebrew
IBM500	Cp500	EBCDIC 500V1
IBM860	Cp860	MS-DOS Portuguese
IBM861	Cp861	MS-DOS Icelandic
IBM863	Cp863	MS-DOS Canadian French
IBM864	Cp864	PC Arabic
IBM865	Cp865	MS-DOS Nordic
IBM868	Cp868	MS-DOS Pakistan
IBM869	Cp869	IBM Modern Greek
IBM870	Cp870	IBM Multilingual Latin-2
IBM871	Cp871	IBM Iceland
IBM918	Cp918	IBM Pakistan (Urdu)
ISO-2022-CN	ISO2022CN	GB2312 and CNS11643 in ISO 2022 CN form, Simplified and Traditional Chinese (conversion to Unicode only)
ISO-2022-JP	ISO2022JP	JIS X 0201, 0208, in ISO 2022 form, Japanese
ISO-2022-KR	ISO2022KR	ISO 2022 KR, Korean
ISO-8859-3	ISO8859_3	Latin Alphabet No. 3
ISO-8859-6	ISO8859_6	Latin/Arabic Alphabet
ISO-8859-8	ISO8859_8	Latin/Hebrew Alphabet
JIS_X0201	JIS_X0201	JIS X 0201
JIS_X0212-1990	JIS_X0212-1990	JIS X 0212
Shift_JIS	SJIS	Shift-JIS, Japanese
TIS-620	TIS620	TIS620, Thai
windows-1255	Cp1255	Windows Hebrew
windows-1256	Cp1256	Windows Arabic
windows-1258	Cp1258	Windows Vietnamese
windows-31j	MS932	Windows Japanese
x-Big5-Solaris	Big5_Solaris	Big5 with seven additional Hanzi ideograph character mappings for the Solaris zh_TW.BIG5 locale
x-euc-jp-linux	EUC_JP_LINUX	JISX 0201, 0208, EUC encoding Japanese
x-EUC-TW	EUC_TW	CNS11643 (Plane 1-7,15), EUC encoding, Traditional Chinese
x-eucJP-Open	EUC_JP_Solaris	JISX 0201, 0208, 0212, EUC encoding Japanese
x-IBM1006	Cp1006	IBM AIX Pakistan (Urdu)
x-IBM1025	Cp1025	IBM Multilingual Cyrillic: Bulgaria, Bosnia, Herzegovina, Macedonia (FYR)
x-IBM1046	Cp1046	IBM Arabic - Windows
x-IBM1097	Cp1097	IBM Iran (Farsi)/Persian
x-IBM1098	Cp1098	IBM Iran (Farsi)/Persian (PC)
x-IBM1112	Cp1112	IBM Latvia, Lithuania
x-IBM1122	Cp1122	IBM Estonia
x-IBM1123	Cp1123	IBM Ukraine
x-IBM1124	Cp1124	IBM AIX Ukraine
x-IBM1381	Cp1381	IBM OS/2, DOS People's Republic of China (PRC)
x-IBM1383	Cp1383	IBM AIX People's Republic of China (PRC)
x-IBM33722	Cp33722	IBM-eucJP - Japanese (superset of 5050)
x-IBM834	Cp834	IBM EBCDIC DBCS-only Korean
x-IBM856	Cp856	IBM Hebrew
x-IBM875	Cp875	IBM Greek
x-IBM921	Cp921	IBM Latvia, Lithuania (AIX, DOS)
x-IBM922	Cp922	IBM Estonia (AIX, DOS)
x-IBM930	Cp930	Japanese Katakana-Kanji mixed with 4370 UDC, superset of 5026
x-IBM933	Cp933	Korean Mixed with 1880 UDC, superset of 5029
x-IBM935	Cp935	Simplified Chinese Host mixed with 1880 UDC, superset of 5031
x-IBM937	Cp937	Traditional Chinese Host mixed with 6204 UDC, superset of 5033
x-IBM939	Cp939	Japanese Latin Kanji mixed with 4370 UDC, superset of 5035
x-IBM942	Cp942	IBM OS/2 Japanese, superset of Cp932
x-IBM942C	Cp942C	Variant of Cp942
x-IBM943	Cp943	IBM OS/2 Japanese, superset of Cp932 and Shift-JIS
x-IBM943C	Cp943C	Variant of Cp943
x-IBM948	Cp948	OS/2 Chinese (Taiwan) superset of 938
x-IBM949	Cp949	PC Korean
x-IBM949C	Cp949C	Variant of Cp949
x-IBM950	Cp950	PC Chinese (Hong Kong, Taiwan)
x-IBM964	Cp964	AIX Chinese (Taiwan)
x-IBM970	Cp970	AIX Korean
x-ISCII91	ISCII91	ISCII91 encoding of Indic scripts
x-ISO2022-CN-CNS	ISO2022_CN_CNS	CNS11643 in ISO 2022 CN form, Traditional Chinese (conversion from Unicode only)
x-ISO2022-CN-GB	ISO2022_CN_GB	GB2312 in ISO 2022 CN form, Simplified Chinese (conversion from Unicode only)
x-iso-8859-11	x-iso-8859-11	Latin/Thai Alphabet
x-JIS0208	x-JIS0208	JIS X 0208
x-JISAutoDetect	JISAutoDetect	Detects and converts from Shift-JIS, EUC-JP, ISO 2022 JP (conversion to Unicode only)
x-Johab	x-Johab	Korean, Johab character set
x-MacArabic	MacArabic	Macintosh Arabic
x-MacCentralEurope	MacCentralEurope	Macintosh Latin-2
x-MacCroatian	MacCroatian	Macintosh Croatian
x-MacCyrillic	MacCyrillic	Macintosh Cyrillic
x-MacDingbat	MacDingbat	Macintosh Dingbat
x-MacGreek	MacGreek	Macintosh Greek
x-MacHebrew	MacHebrew	Macintosh Hebrew
x-MacIceland	MacIceland	Macintosh Iceland
x-MacRoman	MacRoman	Macintosh Roman
x-MacRomania	MacRomania	Macintosh Romania
x-MacSymbol	MacSymbol	Macintosh Symbol
x-MacThai	MacThai	Macintosh Thai
x-MacTurkish	MacTurkish	Macintosh Turkish
x-MacUkraine	MacUkraine	Macintosh Ukraine
x-MS950-HKSCS	MS950_HKSCS	Windows Traditional Chinese with Hong Kong extensions
x-mswin-936	MS936	Windows Simplified Chinese
x-PCK	PCK	Solaris version of Shift_JIS
x-SJIS_0213	x-SJIS_0213	Shift_JISX0213
x-windows-50220	Cp50220	Windows Codepage 50220 (7-bit implementation)
x-windows-50221	Cp50221	Windows Codepage 50221 (7-bit implementation)
x-windows-874	MS874	Windows Thai
x-windows-949	MS949	Windows Korean
x-windows-950	MS950	Windows Traditional Chinese
x-windows-iso2022jp	x-windows-iso2022jp	Variant ISO-2022-JP (MS932 based)

=======================
