UTF8 charset problem


UTF8 charset problem

Pinocchio
Hello there!

I'm trying to use Thymeleaf 1.1.2 and Spring 3.1 with the UTF-8 charset, but it doesn't work.

My configuration is:

servlet-context.xml

<beans:bean class="org.thymeleaf.spring3.view.ThymeleafViewResolver">
	<beans:property name="templateEngine" ref="templateEngine" />
	<beans:property name="characterEncoding" value="UTF-8" />
	<beans:property name="contentType" value="text/html; charset=UTF-8" />
</beans:bean>

home.html

<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org">
<head lang="pl">
	<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
	<title>Insert title here</title>
</head>
<body>
<p>À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß</p>
<p>Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď Đ đ Ē ē Ĕ ĕ Ė ė Ę ę Ě ě Ĝ ĝ Ğ ğ</p>
<p>Ȁ ȁ Ȃ ȃ Ȅ ȅ Ȇ ȇ Ȉ ȉ Ȋ ȋ Ȍ ȍ Ȏ ȏ Ȑ ȑ Ȓ ȓ Ȕ ȕ Ȗ ȗ Ș ș Ț ț Ȝ ȝ Ȟ ȟ</p>
<p>𦤀 𦤁 𦤂 𦤃 𦤄 𦤅 𦤆 𦤇 𦤈 𦤉 𦤊 𦤋 𦤌 𦤍 𦤎 𦤏 𦤐 𦤑 𦤒 𦤓 𦤔 𦤕 𦤖 𦤗 𦤘 𦤙 𦤚 𦤛 𦤜 𦤝 𦤞 𦤟</p>
<p>Τη γλώσσα μου έδωσαν ελληνική<br/>
το σπίτι φτωχικό στις αμμουδιές του Ομήρου.<br/>
Μονάχη έγνοια η γλώσσα μου στις αμμουδιές του Ομήρου</p>
<p>На берегу пустынных волн<br/>
Стоял он, дум великих полн,<br/>
И вдаль глядел. Пред ним широко<br/>
Река неслася; бедный чёлн<br/>
По ней стремился одиноко.<br/>
По мшистым, топким берегам<br/>
Чернели избы здесь и там,<br/>
Приют убогого чухонца;<br/>
И лес, неведомый лучам<br/>
В тумане спрятанного солнца,<br/>
Кругом шумел.</p>
<p>Mogę jeść szkło i mi nie szkodzi.</p>
<p>Pchnąć w tę łódź jeża lub osiem skrzyń fig</p>
</body>
</html>

Result from Thymeleaf:

À � Â � Ä Å Æ Ç � É Ê Ë Ì Í Î Ï � Ñ Ò Ó Ô Õ Ö × � Ù Ú Û Ü Ý Þ ß

Ā � Ă � Ą ą Ć ć � ĉ Ċ ċ Č č Ď ď � đ Ē ē Ĕ ĕ Ė ė � ę Ě ě Ĝ ĝ Ğ ğ

Ȁ � Ȃ � Ȅ ȅ Ȇ ȇ � ȉ Ȋ ȋ Ȍ ȍ Ȏ ȏ � ȑ Ȓ ȓ Ȕ ȕ Ȗ ȗ � ș Ț ț Ȝ ȝ Ȟ ȟ

𦤀 � 𦤂 � 𦤄 𦤅 𦤆 𦤇 � 𦤉 𦤊 𦤋 𦤌 𦤍 𦤎 𦤏 � 𦤑 𦤒 𦤓 𦤔 𦤕 𦤖 𦤗 � 𦤙 𦤚 𦤛 𦤜 𦤝 𦤞 𦤟

Τη γλώ��α μου έδω�αν ελληνική
το �πίτι φτωχικό �τις αμμουδιές του Ομή�ου.
Μονάχη έγνοια η γλώ��α μου �τις αμμουδιές του Ομή�ου

На берег� п��тынных волн
Стоял он, д�м великих полн,
� вдаль глядел. Пред ним �ироко
Река не�ла�я; бедный чёлн
По ней �тремил�я одиноко.
По м�и�тым, топким берегам
Чернели избы зде�ь и там,
Приют �богого ч�хонца;
� ле�, неведомый л�чам
В т�мане �прятанного �олнца,
Кр�гом ��мел.

Mogę jeść szkło i mi nie szkodzi.

Pchnąć w tę łódź jeża lub osiem skrzyń fig

Result from pure html page:

À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß

Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď Đ đ Ē ē Ĕ ĕ Ė ė Ę ę Ě ě Ĝ ĝ Ğ ğ

Ȁ ȁ Ȃ ȃ Ȅ ȅ Ȇ ȇ Ȉ ȉ Ȋ ȋ Ȍ ȍ Ȏ ȏ Ȑ ȑ Ȓ ȓ Ȕ ȕ Ȗ ȗ Ș ș Ț ț Ȝ ȝ Ȟ ȟ

𦤀 𦤁 𦤂 𦤃 𦤄 𦤅 𦤆 𦤇 𦤈 𦤉 𦤊 𦤋 𦤌 𦤍 𦤎 𦤏 𦤐 𦤑 𦤒 𦤓 𦤔 𦤕 𦤖 𦤗 𦤘 𦤙 𦤚 𦤛 𦤜 𦤝 𦤞 𦤟

Τη γλώσσα μου έδωσαν ελληνική
το σπίτι φτωχικό στις αμμουδιές του Ομήρου.
Μονάχη έγνοια η γλώσσα μου στις αμμουδιές του Ομήρου

На берегу пустынных волн
Стоял он, дум великих полн,
И вдаль глядел. Пред ним широко
Река неслася; бедный чёлн
По ней стремился одиноко.
По мшистым, топким берегам
Чернели избы здесь и там,
Приют убогого чухонца;
И лес, неведомый лучам
В тумане спрятанного солнца,
Кругом шумел.

Mogę jeść szkło i mi nie szkodzi.

Pchnąć w tę łódź jeża lub osiem skrzyń fig

The Tomcat server connector charset is set to UTF-8.
Thanks a lot for your help :)

Re: UTF8 charset problem

Zemi
Administrator
Hello,

Try also setting the UTF-8 character encoding on the TemplateResolver bean, something like:

  <beans:bean id="templateResolver"
        class="org.thymeleaf.templateresolver.ServletContextTemplateResolver">
    <beans:property name="prefix" value="/WEB-INF/templates/" />
    <beans:property name="suffix" value=".html" />
    <beans:property name="characterEncoding" value="UTF-8" />
    <beans:property name="templateMode" value="HTML5" />
  </beans:bean>


Re: UTF8 charset problem

Pinocchio
It works :)
Character encoding should be set in TemplateResolver and ThymeleafViewResolver.

Thx!

Re: UTF8 charset problem

danielfernandez
Administrator
Pinocchio wrote
Character encoding should be set in TemplateResolver and ThymeleafViewResolver.
Hi,

Just for the sake of documentation:

1. In TemplateResolver you set the character encoding in which your files are stored on disk, that is, the one used for reading them and converting their bytes into chars (characters themselves are encoding-independent).

2. In ThymeleafViewResolver you set the character encoding in which your processed pages are written to the HTTP response's output stream, that is, the one used for converting your characters back into bytes for transfer over the network.
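
The two steps above can be illustrated with plain JDK calls. This is only a minimal sketch, not Thymeleaf code: the class TemplateEncodingDemo and its two methods are hypothetical stand-ins for the reading step (TemplateResolver) and the writing step (ThymeleafViewResolver).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only mimics the two conversions described above.
final class TemplateEncodingDemo {

    // Reading step: bytes stored on disk are decoded into chars.
    static String readTemplate(byte[] fileBytes, Charset readCharset) {
        return new String(fileBytes, readCharset);
    }

    // Writing step: chars are encoded back into bytes for the HTTP response.
    static byte[] writeResponse(String page, Charset writeCharset) {
        return page.getBytes(writeCharset);
    }

    public static void main(String[] args) {
        // "szklo" with a Polish l-stroke, written as an escape so the
        // encoding of this source file does not matter.
        String original = "szk\u0142o";
        byte[] onDisk = original.getBytes(StandardCharsets.UTF_8);

        String readAsUtf8 = readTemplate(onDisk, StandardCharsets.UTF_8);
        String readAsLatin1 = readTemplate(onDisk, StandardCharsets.ISO_8859_1);

        // Decoding with the charset the file was saved in restores the text.
        System.out.println(readAsUtf8.equals(original));
        // Decoding with a different charset turns one letter into two wrong chars.
        System.out.println(readAsLatin1.equals(original));
        System.out.println(readAsLatin1.length() + " chars instead of " + original.length());
    }
}
```

Once the chars are wrong after the reading step, no setting on the writing step can bring them back, which is why both encodings have to be configured.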

Regards,
Daniel.


Re: UTF8 charset problem

tsuyoshi
I hope default encoding is utf-8:)

Re: UTF8 charset problem

Selim Ober
tsuyoshi wrote
I hope default encoding is utf-8:)
It seems not. My UTF-8 encoded files on disk started to render correctly only after adding characterEncoding to ServletContextTemplateResolver.

Thanks for the heads-up.

Re: UTF8 charset problem

danielfernandez
Administrator
Selim Ober wrote
tsuyoshi wrote
I hope default encoding is utf-8:)
It seems not. My UTF-8 encoded files on disk started to render correctly only after adding characterEncoding to ServletContextTemplateResolver.
If you are using Spring, the default encoding will be ISO-8859-1, because that is the default encoding of Spring MVC applications when using JSP. Thymeleaf just follows the principle of least surprise there.

Regards,
Daniel.


Re: UTF8 charset problem

blandger
I've run into an error with UTF-8 in my project, so I decided to revive this old thread.

It looks like I have everything set up, but something is still missing. Your help is appreciated.
Basically, I get the error after a POST form submit.
The text fields come back garbled after the page is re-rendered (after a validation error, for example).

1.
Here is my Java config settings:

    @Bean
    public ServletContextTemplateResolver templateResolver() {
        ServletContextTemplateResolver resolver = new ServletContextTemplateResolver();
        resolver.setPrefix("/WEB-INF/templates/");
        resolver.setTemplateMode("HTML5");
        resolver.setCharacterEncoding("UTF-8");
       ...........

    @Bean
    public AjaxThymeleafViewResolver tilesViewResolver() {
        AjaxThymeleafViewResolver thymeleafViewResolver = new AjaxThymeleafViewResolver(); // web flow ajax
        thymeleafViewResolver.setViewClass(FlowAjaxThymeleafTilesView.class); // web flow + thymeleaf ajax
        thymeleafViewResolver.setCharacterEncoding("UTF-8");
        thymeleafViewResolver.setContentType("text/html; charset=UTF-8");

       ...........

    @Bean
    public ThymeleafTilesConfigurer tilesConfigurer() {
        ThymeleafTilesConfigurer tilesConfigurer = new ThymeleafTilesConfigurer();
        tilesConfigurer.setDefinitions(new String[]{"/WEB-INF/**/tiles-defs.xml"});
        return tilesConfigurer;
    }

2.
In my Tiles 'layout.html' template file I have following:
<!DOCTYPE html>
<html lang="en"
      xmlns="http://www.w3.org/1999/xhtml"
      xmlns:th="http://www.thymeleaf.org"
      xmlns:tiles="http://www.thymeleaf.org"
      xmlns:sec="http://www.thymeleaf.org/thymeleaf-extras-springsecurity3">
<head>
    <meta charset="utf-8" />
.......
</head>
<body>
</body>
</html>

3.
I use the IntelliJ IDE; my project files are UTF-8 encoded (without a BOM, I suppose).

4. I use Tomcat 7.0.32 x64 on Windows 7 x64.
I've changed Tomcat's config in /conf/server.xml like this:
  <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" URIEncoding="UTF-8" /> 

5.
The 'GET' method receives the same text value correctly, and the correct value is displayed inside the generated HTML form.

But if I submit the page (POST), whether I type different Russian text into the field or leave the received value, I see garbled values bound to that text field in the form bean in the Spring controller class.


    @RequestMapping(value = "/venue", method = RequestMethod.GET)
    public String addVenuePlace(
            ......
            @RequestParam String address,
            @ModelAttribute("formBean") VenuePlace formBean,
            Model model) {
      ....... // I receive correct russian 'address' param value by GET
        formBean.setAddress(address); // I'm setting it to formBean instance. it's shown correctly on page
        return "venue/venue"; // renders /WEB-INF/templates/venue/venue.html
    }

    @RequestMapping(value = "/venue", method = RequestMethod.POST)
    public String saveVenue(
            Locale locale,
            @ModelAttribute("formBean") @Valid VenuePlace formBean,
            BindingResult result,
            RedirectAttributes redirectAttrs,
            SessionStatus sessionStatus,
            Model model) {
........
     formBean.getAddress(); // shows spoiled!! value in debugger
     // value = "ЧеÑ\u0080вонозаводÑ\u0081кий Ñ\u0080айон"
...........


Here are the headers Chrome shows for the form submit:
Request URL:http://localhost:8080/venue
Request Method:POST
Status Code:200 OK
Request Headers
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
Content-Length:172
Content-Type:application/x-www-form-urlencoded
Cookie:............
Host:localhost:8080
Origin:http://localhost:8080
Referer:http://localhost:8080/venue
User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17
Form Data
address:Коминтерновский район
Response Headers
Cache-Control:no-cache
Cache-Control:no-store
Content-Language:en-US
Content-Length:6092
Content-Type:text/html;charset=UTF-8
Date:Mon, 18 Feb 2013 19:25:11 GMT
Expires:Thu, 01 Jan 1970 00:00:00 GMT
Pragma:no-cache
Server:Apache-Coyote/1.1

What else could be wrong with my config or files?

Re: UTF8 charset problem

Zemi
Administrator
Hello,

according to
   http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8

you should set up a filter like

    <filter>
        <filter-name>encodingFilter</filter-name>
        <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
        <init-param>
            <param-name>encoding</param-name>
            <param-value>UTF-8</param-value>
        </init-param>
        <init-param>
            <param-name>forceEncoding</param-name>
            <param-value>true</param-value>
        </init-param>
    </filter>

and make sure no other valves or filters run before it (change the order if necessary).


Re: UTF8 charset problem

blandger
Thanks for mentioning that. I forgot about it, but... I do have it in my web Java config:

public class MainWebAppInitializer implements WebApplicationInitializer {
    public void onStartup(ServletContext servletContext) throws ServletException {
.......
        registerCharacterEncodingFilter(servletContext);
........
    }

    private void registerCharacterEncodingFilter(final ServletContext servletContext) {
        CharacterEncodingFilter characterEncodingFilter = new CharacterEncodingFilter();
        characterEncodingFilter.setEncoding("UTF-8");
        characterEncodingFilter.setForceEncoding(true);
        FilterRegistration.Dynamic filter = servletContext.addFilter("characterEncodingFilter", characterEncodingFilter);
        filter.addMappingForUrlPatterns(null, true, "/*");
    }
.........
}

Re: UTF8 charset problem

blandger
It's still unsolved; I've run out of ideas about where to look.

Re: UTF8 charset problem

Emanuel
Administrator
It sounds like the form isn't submitting the data correctly (you said the data is incorrect in the bean after a POST), so maybe the form/browser isn't submitting the data using UTF8.

I read that you can control the character encoding that the form will use when submitting, using the 'accept-charset' attribute.  http://www.whatwg.org/specs/web-apps/current-work/multipage/forms.html#attr-form-accept-charset  Have you tried that?

<form accept-charset="utf-8">

If your form doesn't have the accept-charset attribute, the browser submits the data using the character encoding of the document containing the form (specified by the content-type HTTP response header when you received the document). http://reference.sitepoint.com/html/form/accept-charset
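
To make the effect of the form's charset concrete, here is a minimal sketch using only the JDK (it assumes Java 10 or later for the Charset overloads of URLEncoder and URLDecoder). The class FormEncodingDemo and its method names are hypothetical; browserEncode stands in for what the browser does with an application/x-www-form-urlencoded body, and serverDecode for what the server does when it parses it.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any framework.
final class FormEncodingDemo {

    // What the browser does: percent-encode the bytes of the value
    // in the charset chosen for the form submission.
    static String browserEncode(String value, Charset formCharset) {
        return URLEncoder.encode(value, formCharset);
    }

    // What the server does: turn the percent-encoded bytes back into chars
    // using whatever charset it believes the request body is in.
    static String serverDecode(String encoded, Charset requestCharset) {
        return URLDecoder.decode(encoded, requestCharset);
    }

    public static void main(String[] args) {
        // A six-letter Cyrillic word, written with escapes so the
        // encoding of this source file does not matter.
        String value = "\u0444\u0443\u0442\u0431\u043e\u043b";

        String onTheWire = browserEncode(value, StandardCharsets.UTF_8);
        System.out.println(onTheWire); // each Cyrillic letter becomes two %XX bytes

        // Same charset on both sides: the value survives.
        System.out.println(serverDecode(onTheWire, StandardCharsets.UTF_8).equals(value));
        // Server assumes ISO-8859-1: every byte becomes its own (wrong) char.
        System.out.println(serverDecode(onTheWire, StandardCharsets.ISO_8859_1).length());
    }
}
```

The bytes sent by the browser are identical in both cases; only the charset the server applies when decoding them differs.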

Re: UTF8 charset problem

blandger
Thank you, Emanuel.
I've tried adding the attribute to the form, but that didn't help.

I'm basically using Spring Web Flow. As you probably know, Web Flow uses the PRG (Post/Redirect/Get) approach. Here are some logs from the Chrome browser.

1. I go to the first flow page and see these params in the browser:
http://localhost:8080/event/registerEvent?execution=e2s1
Content-Type:text/html;charset=UTF-8
......

2. I click the submit button with 'correct' UTF-8 data in one field and some invalid data in another, so that the validation error returns me to the same page and I can see my 'spoiled' data right away.

Browser log:
Request URL:http://localhost:8080/event/registerEvent?execution=e2s1
Request Method:POST
Status Code:302 Found
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
Content-Length:211
Content-Type:application/x-www-form-urlencoded
Host:localhost:8080
Origin:http://localhost:8080
Referer:http://localhost:8080/event/registerEvent?execution=e2s1
User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.97 Safari/537.22
.........
I can also see the data posted by the browser, and it has the correct UTF-8 value for one of the fields:
startDateTime[0]:01/30/2013 16:00
duration:1
sportName:футбол


3. When the page comes back, I see a log like:
Request URL:http://localhost:8080/event/registerEvent?execution=e2s1
Request Method:GET
Status Code:200 OK
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Host:localhost:8080
Referer:http://localhost:8080/event/registerEvent?execution=e2s1
User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.97 Safari/537.22
Cache-Control:no-cache
Cache-Control:no-store
Content-Language:en-US
Content-Type:text/html;charset=UTF-8
Date:Wed, 27 Feb 2013 08:23:17 GMT
Expires:Thu, 01 Jan 1970 00:00:00 GMT
Pragma:no-cache
Server:Apache-Coyote/1.1
Transfer-Encoding:chunked

but the UTF-8 data comes back garbled and unreadable. I can see a weird value in the input.

I should probably ask on the Spring Web Flow forum. It is odd behavior; this is the first time I've seen something so strange that I can't solve it myself.

Re: UTF8 charset problem

Emanuel
Administrator
Are you able to inspect the values once they get to your code on the server?  Also, can you do a request.getCharacterEncoding() in the server-side code, after you submit the form, to find out what the value is?
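
When inspecting a garbled value on the server, one quick check is whether it is simply UTF-8 bytes that were decoded as ISO-8859-1. The sketch below is a hypothetical diagnostic helper (MojibakeCheck is not part of Spring or Thymeleaf) that reverses that specific mistake; if the repaired string is readable, the request body was almost certainly UTF-8 but was parsed with the wrong charset.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper for use in a debugger or a unit test.
final class MojibakeCheck {

    // Reproduces the mistake: UTF-8 bytes interpreted as ISO-8859-1 chars.
    static String spoil(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake: recover the original bytes, then decode them as UTF-8.
    static String repair(String spoiled) {
        return new String(spoiled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The five Cyrillic letters of the last word in the debugger value
        // from the earlier post, as they looked when spoiled
        // (written with escapes so the source file encoding does not matter).
        String spoiled = "\u00D1\u0080\u00D0\u00B0\u00D0\u00B9\u00D0\u00BE\u00D0\u00BD";
        String repaired = repair(spoiled);

        System.out.println(repaired.length());                // 5
        System.out.println(spoil(repaired).equals(spoiled));  // true
    }
}
```

This is only a way to confirm the diagnosis. The actual fix is to make the server decode the request as UTF-8 in the first place.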

It seems like the CharacterEncodingFilter isn't kicking in.  Just like Zemi said, and from what I read from this StackOverflow question/answer (http://stackoverflow.com/questions/5578833/url-encoded-character-gets-parsed-wrongly-by-webflow-el-jsf), the CharacterEncodingFilter should set the encoding on the request to UTF8 and get those special characters working on the server side.

Is it possible to also try adding the CharacterEncodingFilter to your web.xml file before all other filters instead of the onStartup() / registerCharacterEncodingFilter() methods?  Or try changing the line:

filter.addMappingForUrlPatterns(null, true, "/*");

To:

filter.addMappingForUrlPatterns(null, false, "/*");

I think that filter needs to be one of the first things to happen, but the 'true' value makes it get evaluated after all other filters.  (Note: I'm not sure if the true/false change will fix it since the Javadocs for that method say that a value of false will cause the filter to be evaluated "...before any declared filter mappings of the ServletContext from which this FilterRegistration was obtained..." and I don't know if the Dynamic object is the same as that used to register all the filters in your web.xml, or those 'magically' registered by Spring Web Flow).
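
One plausible reason the position of the filter matters: per the Servlet API, setCharacterEncoding only has an effect if it is called before the request parameters are first read. The following is a deliberately simplified model in plain Java, not the real Servlet API. FakeRequest, the single-parameter body, and the "other filter that reads a parameter" are all assumptions made for illustration; the ISO-8859-1 default matches what Tomcat uses for request bodies when no encoding is set.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Simplified model of filter ordering; every type here is hypothetical.
final class FilterOrderModel {

    // Stand-in for a servlet request with a single parameter in its body.
    static final class FakeRequest {
        private final byte[] rawBody;
        private Charset encoding = StandardCharsets.ISO_8859_1; // assumed container default
        private String parsed; // filled on first read, then never re-parsed

        FakeRequest(byte[] rawBody) {
            this.rawBody = rawBody;
        }

        // Like the real API, this is ignored once parameters have been parsed.
        void setCharacterEncoding(Charset charset) {
            if (parsed == null) {
                encoding = charset;
            }
        }

        String getParameter() {
            if (parsed == null) {
                parsed = new String(rawBody, encoding);
            }
            return parsed;
        }
    }

    // Builds a chain with one pre-declared filter that reads the parameter,
    // then adds the encoding filter after it (true) or before it (false).
    static String run(boolean isMatchAfter) {
        List<Consumer<FakeRequest>> chain = new ArrayList<>();
        chain.add(request -> request.getParameter()); // some other filter

        Consumer<FakeRequest> encodingFilter =
                request -> request.setCharacterEncoding(StandardCharsets.UTF_8);
        if (isMatchAfter) {
            chain.add(encodingFilter);
        } else {
            chain.add(0, encodingFilter);
        }

        FakeRequest request = new FakeRequest("\u0444".getBytes(StandardCharsets.UTF_8));
        for (Consumer<FakeRequest> filter : chain) {
            filter.accept(request);
        }
        return request.getParameter(); // what the controller would see
    }

    public static void main(String[] args) {
        System.out.println(run(false).equals("\u0444")); // encoding filter ran first
        System.out.println(run(true).equals("\u0444"));  // encoding filter ran too late
    }
}
```

Whether another filter really read the parameters first in this particular application is not something the thread confirms; the model only shows why running the encoding filter last can be equivalent to not running it at all.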

I was also going to suggest going to the Spring Web Flow forums to ask there (this seems to be more of a Spring issue than a Thymeleaf one), but I just took a look at those forums and I see you've posted there already.

Re: UTF8 charset problem

blandger
Thank you a lot, Emanuel.

That one little change has completely solved the issue.
filter.addMappingForUrlPatterns(null, false, "/*");

Now it works fine and handles utf-8 characters correctly.