url rewriting


user
Hello,

  When I use the expression @{/ctx/my-link}, the HttpServletResponse.encodeURL method isn't called. I think it should be, so that URL rewriting works. As I understand it, this means the Tuckey (http://www.tuckey.org/) rewrite filter won't work with the Thymeleaf framework?

Re: url rewriting

danielfernandez
Administrator
Hi,

Only the parameter values should be URL-encoded, not the whole of the URL.

"/ctx/my-link" is a URL without parameters, so there is nothing to URL-encode there. However, in a URL like @{/ctx/my-link(myparam=${myvariables})}, the value of the "myparam" parameter will be URL-encoded.
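To illustrate what "URL-encoding the parameter value" means here (percent-encoding, which is a different thing from HttpServletResponse.encodeURL), here is a minimal Java sketch. It is not Thymeleaf's code: the class and method names are made up for illustration, and java.net.URLEncoder is only a stand-in, so the exact escaping Thymeleaf applies (for instance to spaces) may differ. It assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ParamEncodingSketch {

    // Hypothetical helper: the path is left untouched, only the parameter value is escaped.
    static String link(String path, String name, String value) {
        return path + "?" + name + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The space and the ampersand in the value are escaped; the path is not.
        System.out.println(link("/ctx/my-link", "myparam", "a b&c"));
        // prints /ctx/my-link?myparam=a+b%26c
    }
}
```

Nothing in this step consults the servlet response, which is why a filter that hooks into encodeURL never sees the link.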

Regards,
Daniel.

Re: url rewriting

user
http://code.google.com/p/urlrewritefilter/source/browse/trunk/src/main/java/org/tuckey/web/filters/urlrewrite/UrlRewriteWrappedResponse.java

I think many people would like to use UrlRewriteFilter, but with Thymeleaf it is impossible.

Re: url rewriting

Emanuel
Administrator
What part of UrlRewriteFilter isn't working for you when using Thymeleaf?  I've got both UrlRewriteFilter and Thymeleaf working together, although for a limited set of use cases, e.g. redirecting to /blog/ instead of just /, or redirecting based on user, action type (POST or GET), or request params.

Re: url rewriting

user
Your redirection is in your controller code, not in a Thymeleaf @{...} attribute.

Re: url rewriting

Emanuel
Administrator
So are you saying that redirects don't work when they're in your controller code?  If so, why are redirects in your controller code?  UrlRewriteFilter is supposed to be used as a filter with redirect rules in a urlrewrite.xml file.

Re: url rewriting

user
I am saying that if you use a link expression (@{...}) in a Thymeleaf template, then you can't write a rewrite rule to replace, e.g., /my-servlet with /your-servlet.

Re: url rewriting

user
In reply to this post by Emanuel
Can you show me your rewrite rules? And how do you use @{...} expressions?

Re: url rewriting

Emanuel
Administrator
Below are some of my rewrite rules - they're only very basic ones.  Also, I don't use any @{..} expressions (I actually didn't know about them when I started redesigning for Thymeleaf, so I used the <base> HTML element instead).

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE urlrewrite PUBLIC "-//tuckey.org//DTD UrlRewrite 3.2//EN" "http://www.tuckey.org/res/dtds/urlrewrite3.2.dtd">

<urlrewrite>

  <!-- Default/Homepage is the blog -->
  <rule>
    <from>^/$</from>
    <to>/blog/</to>
  </rule>

  <!-- Some sections default to their 'about' page -->
  <rule>
    <from>^/(campaign|redhorizon)/?$</from>
    <to>/$1/about/</to>
  </rule>

  <!-- Redirect the old RSS feed URLs -->
  <rule>
    <from>^/(blog|writing)/feed/$</from>
    <to type="redirect">/feed/</to>
  </rule>

  <!-- Redirect moved content to their new URLs -->
  <rule>
    <from>^/writing/reviews/SearedChickenBurger$</from>
    <to type="redirect">/blog/Review_SearedChickenBurger</to>
  </rule>

</urlrewrite>

Re: url rewriting

user
Now I see. You are using incoming rules. I want to use outgoing ones (the <outbound-rule> element):
  http://urlrewritefilter.googlecode.com/svn/trunk/src/doc/manual/4.0/index.html
But with Thymeleaf I can't, because LinkExpression doesn't call response.encodeURL(). I am using version 2.0.12; maybe I can customize LinkExpression?

Re: url rewriting

sergey
We have the same issue.

Thymeleaf's code treats URL rewriting as merely appending ;jsessionid in certain cases. It should instead rely on the container to do it properly, i.e. call response.encodeURL, the same way JSP's c:url does.

Re: url rewriting

danielfernandez
Administrator
Hi,

I have modified the way links are computed so that response.encodeURL(...) is used. This has been added to 2.0.15-SNAPSHOT and, if everything works well, will be a part of 2.0.15 when it is released.

If you want to test the snapshot version, you can learn how to use snapshots in your project at the FAQ: http://www.thymeleaf.org/faq.html

Regards,
Daniel.

Re: url rewriting

sergey
I've looked at the code. It is almost what we need! Thank you for your quick response.

The only problem is that the encodeURL call is conditional and is skipped for Googlebot, which will be a nightmare in production for some developers (it will be very hard to work out why everything works except for Google). And currently there is no easy way to turn off this Thymeleaf default behavior.

So I would recommend removing this isUserAgentGoogleBot check, because many other options are available:
1. UrlRewriteFilter http://urlrewritefilter.googlecode.com/svn/trunk/src/doc/manual/4.0/guide.html
2. Spring Security disable-url-rewriting http://static.springsource.org/spring-security/site/docs/3.0.x/reference/appendix-namespace.html#nsa-http-attributes
3. Tomcat 7 <session-config> <tracking-mode>COOKIE</tracking-mode> </session-config> (or Tomcat 6 disableURLRewriting="true" http://tomcat.apache.org/tomcat-6.0-doc/config/context.html)
4. Apache RewriteRule  ^/(.*);jsessionid=(.*) /$1 [R=301,L]
5. some other options http://stackoverflow.com/questions/962729/is-it-possible-to-disable-jsessionid-in-tomcat-servlet



Re: url rewriting

danielfernandez
Administrator
Hi,

The reason URL rewriting is disabled for Google Search Engine's Bot (not Google Chrome) is that Google works in a strange way compared to other search engines (like e.g. Bing), and whenever it encounters links with a jsessionid fragment during its crawling --which it does, because the Google Bot crawler does not support cookies--, it keeps the "jsessionid" fragment in its index, and shows it as a part of the search result, a potential security risk. Google considers it the developers' responsibility to avoid this.

As a general rule, a public internet application should never create a session (an HttpSession object) before a user has really logged in (esp. for performance reasons), so search engine crawlers should never be an issue. But if the application has to do this for specific reasons, this check in thymeleaf's link creation logic avoids the potential security issue of indexing session-bound URLs (imagine an application with very-long-lasting sessions where the same HttpSession object --and session ID therefore-- is used before and after user log-in...).

Could you describe in more detail the issues you see in this behaviour?

Your solutions, as far as I can understand them, completely disable URL rewriting (and for every user agent, not only Google Bot). In fact, using response.encodeURL(...) and then disabling URL rewriting is basically the same as doing nothing...

Regards,
Daniel.

Re: url rewriting

sergey
Actually, solution 1 (using UrlRewriteFilter) is customizable, and an example in the manual provides a similar solution to this "googlebot" problem. But it is more flexible/configurable and does not hardcode "googlebot".

http://urlrewritefilter.googlecode.com/svn/trunk/src/doc/manual/4.0/guide.html

  <outbound-rule>
    <name>Strip URL Session ID's</name>
    <condition name="user-agent">googlebot</condition>
    <from>^(.*?)(?:\;jsessionid=[^\?#]*)?(\?[^#]*)?(#.*)?$</from>
    <to>$1$2$3</to>
  </outbound-rule>
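
To make the effect of that expression concrete, here is a minimal Java sketch that applies the same pattern with java.util.regex. It is not UrlRewriteFilter's implementation: the class and method names are invented for illustration, and the user-agent condition is approximated by a case-insensitive substring check, which is an assumption about how the condition behaves.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class StripSessionIdSketch {

    // The <from> expression above, written as a Java string literal.
    private static final Pattern SESSION_ID =
            Pattern.compile("^(.*?)(?:;jsessionid=[^?#]*)?(\\?[^#]*)?(#.*)?$");

    // Hypothetical stand-in for the outbound rule: strips ;jsessionid=... for Googlebot only.
    static String outbound(String url, String userAgent) {
        boolean isGoogleBot = userAgent != null
                && userAgent.toLowerCase(Locale.ROOT).contains("googlebot");
        if (!isGoogleBot) {
            return url;
        }
        // $1 = path, $2 = query string, $3 = fragment; groups that did not match expand to nothing.
        return SESSION_ID.matcher(url).replaceFirst("$1$2$3");
    }

    public static void main(String[] args) {
        String url = "/blog/post;jsessionid=ABC123?page=2#top";
        System.out.println(outbound(url, "Googlebot/2.1")); // prints /blog/post?page=2#top
        System.out.println(outbound(url, "Mozilla/5.0"));   // prints the URL unchanged
    }
}
```

The rule only runs on URLs that the page passes through response.encodeURL(), so it depends on the template engine making that call.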

So UrlRewriteFilter can solve the problem that "isUserAgentGoogleBot" aims to fix.


Using response.encodeURL and disabling URL rewriting is NOT like doing nothing if you use UrlRewriteFilter, because it relies on a call to response.encodeURL() to perform "outbound" URL rewriting.

The problem:
For instance, you had a URL like /old/url
Now you want it to look like /new/url

You write the UrlRewriteFilter rules:
    <rule>
        <from>^/new/url$</from>
        <to>/old/url</to>
    </rule>
    <outbound-rule>
        <from>^/old/url$</from>
        <to>/new/url</to>
    </outbound-rule>

Here is the Thymeleaf template (notice that it still mentions the old URL):

<a th:href="@{/old/url}">...

The output should be:

<a href="/new/url">...


The issue is that Googlebot will still see
<a href="/old/url">...

on the pages, because response.encodeURL() was not called, while everyone else will see /new/url.
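
As a rough illustration of that difference, here is a small Java sketch. It does not use the servlet API or Thymeleaf: the UnaryOperator is a stand-in for the wrapped response.encodeURL(), and the two render methods are hypothetical models of a link renderer that skips the call for Googlebot versus one that always makes it.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

public class OutboundRewriteSketch {

    // Stand-in for the wrapped response.encodeURL(...): applies the outbound rule /old/url -> /new/url.
    static final UnaryOperator<String> ENCODE_URL =
            url -> url.replaceFirst("^/old/url$", "/new/url");

    // Hypothetical renderer that skips the encodeURL hook when the user agent looks like Googlebot.
    static String renderConditional(String url, String userAgent) {
        boolean isGoogleBot = userAgent != null
                && userAgent.toLowerCase(Locale.ROOT).contains("googlebot");
        return isGoogleBot ? url : ENCODE_URL.apply(url);
    }

    // Hypothetical renderer that always goes through the hook.
    static String renderAlways(String url) {
        return ENCODE_URL.apply(url);
    }

    public static void main(String[] args) {
        System.out.println(renderConditional("/old/url", "Mozilla/5.0"));   // prints /new/url
        System.out.println(renderConditional("/old/url", "Googlebot/2.1")); // prints /old/url
        System.out.println(renderAlways("/old/url"));                       // prints /new/url
    }
}
```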

This is a trivial example. In reality we have much more complex outbound rules.

So, for UrlRewriteFilter to work correctly with Thymeleaf, we at least need a way to disable the "isUserAgentGoogleBot" check.

Re: url rewriting

danielfernandez
Administrator
OK, understood. Thanks for your examples.

From what I've learned, this could also affect some load-balancing systems, also using wrapped HttpServletResponse objects.

I've modified it to make every URL go through response.encodeURL(...) in web environments, even absolute URLs. Checking for Google Bot (and preventing anonymous session IDs from showing up there) should now be the application developer's responsibility.

2.0.15-SNAPSHOT has been updated including this last fix.

Regards,
Daniel.

Re: url rewriting

sergey
Yep!

The code looks OK now.
Hope to see 2.0.15 released soon.

Thank you.