www.webdeveloper.com
Thread: How to save webpages without modifying their original charset?

  1. #1
    Join Date
    Dec 2007
    Posts
    19

    How to save webpages without modifying their original charset?

    I am writing a Java project that saves Chinese webpages.
    My local OS default charset is gb2312, the Chinese national standard charset.
    First I load the specified webpage into a StringBuffer, then write the buffer to a specified file.
    The key code is below:


    Code:
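    import java.io.BufferedReader;
    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.URL;

    // Reads the page at url, decoding its bytes with the given encoding (e.g. "UTF-8").
    public static StringBuffer webPage2Buffer(URL url, String encoding)
      throws IOException
    {
        StringBuffer result = new StringBuffer();
        // try-with-resources closes the stream and the reader even if reading fails
        try (InputStream in = url.openStream();
             BufferedReader buffRead = new BufferedReader(new InputStreamReader(in, encoding)))
        {
            int c;
            while ((c = buffRead.read()) != -1) result.append((char) c);
        }
        return result;
    }

    // Writes the buffered page to storingPlace + writeTime + ".html".
    public static void Buffer2File(StringBuffer strBuf, String writeTime, String storingPlace)
    {
        File rltFile = new File(storingPlace + writeTime + ".html");
        try (PrintWriter printer = new PrintWriter(rltFile))
        {
            printer.println(strBuf.toString());
        }
        catch (IOException e1)
        {
            e1.printStackTrace();
        }
    }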
    The webpages are originally encoded in UTF-8, but the saved files end up encoded in gb2312,
    which is not what I want. How can I save them without changing the original charset?
    Thanks!

  2. #2
    Join Date
    Dec 2007
    Posts
    19

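    Solved.

    Solved. PrintWriter(File) encodes text with the platform default charset (gb2312 on my machine),
    so the fix is to wrap the file stream in an OutputStreamWriter with an explicit charset:

    Code:
    			PrintWriter printer = new PrintWriter( new OutputStreamWriter( new FileOutputStream(rltFile), "UTF-8"));
    Hope this helps!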
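
For anyone landing here later, the fix can be sketched as a self-contained example. It is only a sketch under a few assumptions: Java 7 or newer (for try-with-resources and java.nio.file), and the class name CharsetSafeSave and the methods writeUtf8 and copyBytes are hypothetical names made up for illustration, not part of the original project. The first method writes already-decoded text with an explicit UTF-8 encoder. The second is an alternative not used in the thread: it copies the raw bytes without decoding them, so whatever charset the server sent is preserved as-is.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper class; names are illustrative only.
class CharsetSafeSave {

    // Option 1: the page is already a String; encode it explicitly as UTF-8
    // instead of relying on the platform default charset.
    public static void writeUtf8(String text, Path target) throws IOException {
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    // Option 2: never decode at all; copy the bytes exactly as received,
    // which keeps the page's original charset whatever it is.
    public static void copyBytes(InputStream in, Path target) throws IOException {
        try (InputStream src = in; OutputStream out = Files.newOutputStream(target)) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = src.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes keep this source file independent of the compiler's encoding.
        String page = "<html><body>\u4e2d\u6587</body></html>";

        Path textFile = Files.createTempFile("page-text", ".html");
        writeUtf8(page, textFile);
        boolean sameAsUtf8 = Arrays.equals(
                page.getBytes(StandardCharsets.UTF_8), Files.readAllBytes(textFile));
        System.out.println("written bytes match UTF-8 encoding: " + sameAsUtf8);

        // Stand-in for url.openStream(): two arbitrary non-ASCII bytes.
        byte[] raw = {(byte) 0xD6, (byte) 0xD0};
        Path rawFile = Files.createTempFile("page-raw", ".html");
        copyBytes(new ByteArrayInputStream(raw), rawFile);
        boolean sameAsRaw = Arrays.equals(raw, Files.readAllBytes(rawFile));
        System.out.println("copied bytes match source bytes: " + sameAsRaw);
    }
}
```

In the original code, option 1 would replace the PrintWriter in Buffer2File, while option 2 would take url.openStream() directly and skip the StringBuffer. Option 2 only fits when the page text does not need to be inspected or modified before saving.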
