Demystifying shortened and extension-less URLs in AEM

Your company has decided to migrate their web presence to Adobe Experience Manager and you’re getting to the tail end of the project. This is usually the point when you realise your URLs need to be shortened, because, let’s be frank: who wants to see “/content” in their URL? And, whilst we’re at it, you should probably get rid of that “html” extension as well.

So the problem we’re trying to solve is how to turn a URL like http://acme.com/content/acme/en/about.html into http://acme.com/about. There are various ways of going about it, and naturally each one comes with trade-offs. In this post I’m going to summarise each approach and its trade-offs.

Web Server Rewrite Rules

One approach is to use a rewrite module in your web server (e.g. mod_rewrite). The module rewrites each incoming URL to a path that can be resolved to a resource in AEM before handing the request over to the Dispatcher. This is fairly easy to do with one VirtualHost per domain, each carrying the rewrite rules for that domain. There are many resources available on how to achieve this, so I won’t go into the details.
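
To make the idea concrete, here is a minimal sketch of the logic such a rewrite rule implements, written in Python purely for illustration (in practice this would live in your mod_rewrite configuration). The content root `/content/acme/en` and the function name are assumptions taken from the example URL above, not part of any AEM or Apache API.

```python
# Hypothetical content root, taken from the example URL in this post.
CONTENT_ROOT = "/content/acme/en"


def rewrite_incoming(path: str) -> str:
    """Model of a web-server rewrite: short public path -> AEM content path."""
    # Paths that already point into the repository are left untouched.
    if path.startswith("/content/"):
        return path
    # The site root maps to the language root page.
    if path in ("", "/"):
        return CONTENT_ROOT + ".html"
    trimmed = path.rstrip("/")
    # Append ".html" only when the last segment has no extension.
    last_segment = trimmed.rsplit("/", 1)[-1]
    suffix = "" if "." in last_segment else ".html"
    return CONTENT_ROOT + trimmed + suffix


print(rewrite_incoming("/about"))
```

With these assumptions, a request for `/about` is handed to the Dispatcher as `/content/acme/en/about.html`.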

[Figure: Request processing flow when the web server rewrites incoming URLs]

Benefits

The advantage of this approach is that the Dispatcher cache path mirrors the content path, so you will not have any issues invalidating content on activation. Moreover, rewrite rules are quite powerful, so you can do some pretty fancy things.

Drawbacks

However, this also means that your AEM instances are not aware of the link rewriting logic, so you will most likely have to create a custom Link Transformer to rewrite links within your pages before serving them up to the end users.

Alternatively, you can create a Tag Library to rewrite links, but that will not work for URLs contained in authored content. This is not much fun because it requires more work on your part and is duplicating the link rewriting logic.
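
The duplication mentioned above is easiest to see in a sketch. The Python below is only a model of what a custom link transformer has to do (a real one is a Java component that works on SAX events in the Sling rewriter pipeline); the content root and function names are assumptions for illustration. Note that it is the mirror image of whatever rules the web server applies, so both must be kept in sync.

```python
import re

# Hypothetical content root; must mirror the web server's rewrite rules.
CONTENT_ROOT = "/content/acme/en"
HREF = re.compile(r'href="([^"]*)"')


def shorten_path(path: str) -> str:
    """Content path -> short public path (reverse of the web-server rewrite)."""
    rest = path[len(CONTENT_ROOT):]
    # Leave external links and other content trees (e.g. /content/acme/en-gb) alone.
    if not path.startswith(CONTENT_ROOT) or (rest and rest[0] not in "/."):
        return path
    if rest.endswith(".html"):
        rest = rest[: -len(".html")]
    return rest or "/"


def rewrite_links(html: str) -> str:
    """Shorten every href attribute in the given markup."""
    return HREF.sub(lambda m: 'href="%s"' % shorten_path(m.group(1)), html)


print(rewrite_links('<a href="/content/acme/en/about.html">About</a>'))
```

This sketch ignores query strings, fragments and attributes other than `href`, all of which a production transformer would need to handle.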

Sling Resource Resolution

The next approach leverages the Sling resource resolution mechanism that comes with AEM. With this approach, the resource resolution engine associates a path with a content resource. The benefit here is that not only can AEM resolve incoming requests to content paths, but it can also rewrite links to their shortened versions automagically.

[Figure: Request processing flow when AEM rewrites incoming URLs]

The most straightforward way to achieve this setup is to configure the “JCR Resource Resolver Factory” service, defining the mappings in its “URL Mappings” property (a.k.a. resource.resolver.mapping).

Using the we.retail sample site as an example, once you apply the configuration below you can now browse to the equipment page using the following URL: http://localhost:4502/equipment.html


<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:sling="http://sling.apache.org/jcr/sling/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0"
    jcr:primaryType="sling:OsgiConfig"
    resource.resolver.searchpath="[/apps,/libs,/apps/foundation/components/primary,/libs/foundation/components/primary]"
    resource.resolver.manglenamespaces="{Boolean}true"
    resource.resolver.allowDirect="{Boolean}true"
    resource.resolver.required.providers="[org.apache.sling.jcr.resource.internal.helper.jcr.JcrResourceProviderFactory]"
    resource.resolver.virtual="[/:/]"
    resource.resolver.mapping="[/-/,/content/we-retail/us/en/-/]"
    resource.resolver.map.location="/etc/map"
    resource.resolver.default.vanity.redirect.status="302"/>

It’s also important to note that a “reverse” mapping is created from this entry, meaning that the built-in “Day CQ Link Checker Transformer” can shorten links within our HTML pages automatically (provided link rewriting is enabled). You may also combine this with the “Strip HTML Extension” property of the built-in transformer in order to remove the “.html” extension from your links (note: you will need a web server to append the “.html” extension to your incoming requests so that they can be processed).


<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:sling="http://sling.apache.org/jcr/sling/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0"
    jcr:primaryType="sling:OsgiConfig"
    linkcheckertransformer.strictExtensionCheck="{Boolean}false"
    linkcheckertransformer.rewriteElements="[a:href,area:href,form:action]"
    linkcheckertransformer.disableRewriting="{Boolean}false"
    linkcheckertransformer.disableChecking="{Boolean}false"
    linkcheckertransformer.stripHtmltExtension="{Boolean}true"
    linkcheckertransformer.mapCacheSize="{Long}5000"/>

The resulting HTML markup for the we.retail navigation will look like the snippet below:


<ul class="nav navbar-nav navbar-center">
  <li class="visible-xs">
    <a href="/">we.<strong class="text-primary">Retail</strong></a>
  </li>
  <li>
    <a href="/experience">Experience</a>
  </li>
  <li>
    <a href="/men">Men</a>
  </li>
  <li>
    <a href="/women">Women</a>
  </li>
  <li>
    <a href="/equipment">Equipment</a>
  </li>
</ul>
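
The two directions of the prefix mapping can be pictured with a small Python model. This is a sketch of the behaviour described above, not Sling’s implementation: it assumes the `/content/we-retail/us/en/-/` entry reads as “internal prefix, separator, external prefix”, it ignores the identity entry `/-/`, and it leaves out the check that the resolved resource actually exists. The function names are made up for illustration.

```python
# (internal prefix, external prefix), modelled on the
# "/content/we-retail/us/en/-/" entry of resource.resolver.mapping.
MAPPING = ("/content/we-retail/us/en/", "/")


def resolve_path(request_path: str) -> str:
    """Incoming direction: public path -> repository path."""
    internal, external = MAPPING
    if request_path.startswith(internal):
        return request_path  # already a full content path
    if request_path.startswith(external):
        return internal + request_path[len(external):]
    return request_path


def map_link(content_path: str, strip_html: bool = True) -> str:
    """Outgoing direction: repository path -> shortened link.

    strip_html models the transformer's "Strip HTML Extension" option.
    """
    internal, external = MAPPING
    if not content_path.startswith(internal):
        return content_path
    short = external + content_path[len(internal):]
    if strip_html and short.endswith(".html"):
        short = short[: -len(".html")]
    return short


print(resolve_path("/equipment.html"))
print(map_link("/content/we-retail/us/en/equipment.html"))
```

Because one entry drives both functions, the link rewriting never drifts out of sync with the request resolution, which is the main attraction of this approach.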

Benefits

This method works great for sites that have simple URL handling requirements and do not need to support multi-tenancy. It’s very easy to configure (only 2 configuration files) and doesn’t require AEM to be aware of the various domains it’s serving content to.

Drawbacks

Whilst this approach is easy to configure, this simplicity comes at a cost:

  • It will not work for websites where the URLs do not nicely match up with the JCR content because it doesn’t support regular expressions.
  • It doesn’t support cross-site links because this rewriting method is not domain-aware.
  • If a path is duplicated across multiple sites, it will resolve to the first match. For example, given Geometrixx Outdoors and Geometrixx, when requesting http://localhost:4502/company.html, the resource resolution will either resolve to /content/geometrixx-outdoors/en/company or /content/geometrixx/en/company, depending on which mapping was defined first.
  • When relying on the LinkCheckerTransformer to provide you with extension-less URLs, your calls to ResourceResolver#map on the backend will still contain the “html” extension. This may not be desired when rendering links to be contained in emails.
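
The first-match behaviour from the third point can be illustrated with a toy model in Python. The repository contents, prefixes and function name below are hypothetical, and the loop is a simplification of the behaviour described above rather than Sling’s actual algorithm.

```python
# Hypothetical repository content: both sites have a "company" page.
REPOSITORY = {
    "/content/geometrixx/en/company",
    "/content/geometrixx-outdoors/en/company",
    "/content/geometrixx-outdoors/en/gear",
}


def resolve_first_match(request_path, internal_prefixes):
    """Return the first candidate, in configuration order, that exists."""
    name = request_path.lstrip("/")
    if name.endswith(".html"):
        name = name[: -len(".html")]
    for prefix in internal_prefixes:
        candidate = prefix + name
        if candidate in REPOSITORY:
            return candidate
    return None


geometrixx_first = ["/content/geometrixx/en/", "/content/geometrixx-outdoors/en/"]
outdoors_first = list(reversed(geometrixx_first))

print(resolve_first_match("/company.html", geometrixx_first))
print(resolve_first_match("/company.html", outdoors_first))
```

The same request resolves to a different site purely because of the order of the mappings, which is why this approach does not suit multi-tenant setups.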

Pulling out the big guns

Another approach is to define Sling Mappings under /etc/map. Sling Mappings are made up of a series of nodes that dictate how the mapping should function for each configured domain. They take more time to set up but are a great compromise: they are almost as powerful as rewrite rules, and they also make your AEM application aware of the link processing logic.

Here’s what our Sling mappings would look like for the we.retail site:


{
  "jcr:primaryType": "sling:Folder",
  "weretail.com": {
    "jcr:primaryType": "sling:Mapping",
    "sling:internalRedirect": [
      "/content/we-retail/us/en"
    ],
    "weretail_com_content": {
      "jcr:primaryType": "sling:Mapping",
      "sling:match": "(.+)$",
      "sling:internalRedirect": [
        "/content/we-retail/us/en/$1",
        "/$1"
      ]
    },
    "reverse_mapping_content": {
      "jcr:primaryType": "sling:Mapping",
      "sling:match": "$1",
      "sling:internalRedirect": [
        "/content/we-retail/us/en/(.*).html"
      ]
    },
    "reverse_mapping_content_nohtml": {
      "jcr:primaryType": "sling:Mapping",
      "sling:match": "$1",
      "sling:internalRedirect": [
        "/content/we-retail/us/en/(.*)"
      ]
    },
    "reverse_mapping_root": {
      "jcr:primaryType": "sling:Mapping",
      "sling:match": "$",
      "sling:internalRedirect": [
        "/content/we-retail/us/en(.html)?"
      ]
    }
  },
  "weretail_com_root": {
    "jcr:primaryType": "sling:Mapping",
    "sling:match": "weretail.com$",
    "sling:internalRedirect": [
      "/content/we-retail/us/en.html"
    ]
  }
}

This set of Sling Mappings has more entries than strictly necessary but provides the following:

  • They handle the rewriting of links created by selecting a resource using the pathfield widget where no “html” extension is provided
  • Requests to ResourceResolver#map will return the same URL as the one shown to the end user because it isn’t dependent on any transformer to rewrite links
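
If the node structure is hard to read, the following Python sketch models what the entries do in each direction. It is a deliberate simplification: the real resolver also takes the scheme and port into account and has its own rules for ordering entries, so treat the patterns, the sample repository and the function names as assumptions for illustration only.

```python
import re

# Hypothetical repository content used to decide which candidate exists.
REPOSITORY = {
    "/content/we-retail/us/en.html",
    "/content/we-retail/us/en/equipment.html",
    "/etc/designs/we-retail.css",
}

# Forward entries: regex on "host/path" -> candidate targets, tried in order.
FORWARD = [
    (re.compile(r"^weretail\.com/(.+)$"), ["/content/we-retail/us/en/{0}", "/{0}"]),
    (re.compile(r"^weretail\.com/?$"), ["/content/we-retail/us/en.html"]),
]

# Reverse entries: regex on the content path -> public URL.
REVERSE = [
    (re.compile(r"^/content/we-retail/us/en/(.*)\.html$"), r"http://weretail.com/\1"),
    (re.compile(r"^/content/we-retail/us/en/(.*)$"), r"http://weretail.com/\1"),
    (re.compile(r"^/content/we-retail/us/en(\.html)?$"), "http://weretail.com/"),
]


def resolve_request(host, path):
    """Incoming: return the first candidate target that exists, else None."""
    for pattern, candidates in FORWARD:
        match = pattern.match(host + path)
        if not match:
            continue
        for template in candidates:
            candidate = template.format(*match.groups())
            if candidate in REPOSITORY:
                return candidate
    return None


def map_url(content_path):
    """Outgoing: apply the first reverse entry that matches."""
    for pattern, replacement in REVERSE:
        if pattern.match(content_path):
            return pattern.sub(replacement, content_path)
    return content_path


print(resolve_request("weretail.com", "/equipment.html"))
print(map_url("/content/we-retail/us/en/equipment"))
```

The second forward candidate (`/{0}`) is what lets non-content paths such as `/etc/designs/we-retail.css` through, and the second reverse entry covers pathfield links that carry no “html” extension.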

The resulting HTML markup for the we.retail navigation will look like the snippet below:


<ul class="nav navbar-nav navbar-center">
  <li class="visible-xs">
    <a href="http://weretail.com/">we.<strong class="text-primary">Retail</strong></a>
  </li>
  <li>
    <a href="http://weretail.com/experience">Experience</a>
  </li>
  <li>
    <a href="http://weretail.com/men">Men</a>
  </li>
  <li>
    <a href="http://weretail.com/women">Women</a>
  </li>
  <li>
    <a href="http://weretail.com/equipment">Equipment</a>
  </li>
</ul>

Note: The links shown above are absolute links because no web server was set up with the “weretail.com” domain to serve this content. If a web server was set up, the LinkCheckerTransformer would generate internal links like “/equipment” instead of “http://weretail.com/equipment”.

Benefits

  • Leverages Sling resource resolution so the rules that are defined in /etc/map will be used by the LinkCheckerTransformer to shorten URLs within web pages.
  • You can leverage capturing groups to perform sophisticated resource resolution/mapping rules.
  • With this method, the LinkCheckerTransformer will be able to generate cross-site links. This means that all links will be internal unless the content path points to another website, in which case an absolute link will be rendered.

Drawbacks

  • It can be a bit tricky to understand the various properties that can be applied to Sling Mappings.
  • The domain-awareness can make it difficult to maintain the mappings in an environment where AEM instances are provisioned on the fly (you may need a script to generate the mappings dynamically when an instance is allocated a domain name).

General Sling Processing Gotchas

As you can see, Sling Resource Resolution is a powerful tool for URL handling. However, there are a few points to watch out for:

  • When rewriting a link, if the path provided cannot be resolved to a resource, the link will not be rewritten at all. You will most likely run into this problem if you are using a mix of Sling Filters and request forwards to render your content. A way around this is to write a custom Transformer to force the resource mapping to occur.
  • The Dispatcher cache path will not match the content path. More specifically, content will be cached using the shortened path but when a cache invalidation request is sent, the full path will be provided. A way around this is to write rewrite rules on the web server to correct the path to be invalidated or to use the Dispatcher Flush Rules provided by ACS Commons (as of version 1.5.0, regular expressions are supported for more complex invalidation logic).
  • Whilst Sling does a lot for you, you will still need to set up some rewrite rules to append the “.html” extension in order for your content to be cached by the Dispatcher.
  • The LinkCheckerTransformer works just like any other transformer in the rewriting pipeline: it responds to SAX events from the HTML parser. This means that it will not rewrite a link unless the parser is configured to generate an event for the element that contains it. For example, INPUT tags may be used to hold redirect URLs; these will not be rewritten unless the INPUT tag is added to the HTML parser and input:value is added to the list of rewrite elements.
  • The HTML markup must be valid or else link rewriting will cease to work when it comes across invalid markup.
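
The point about rewrite elements can be demonstrated with a small event-driven model. The sketch below uses Python’s standard HTML parser as a stand-in for the SAX pipeline and only models the “rewrite elements” list (it does not model the separate HTML parser configuration that decides which tags produce events). The class, function and content root are hypothetical.

```python
from html.parser import HTMLParser

CONTENT_ROOT = "/content/we-retail/us/en"  # assumed site root


def shorten(path):
    """Content path -> short link, leaving unrelated values untouched."""
    if not path.startswith(CONTENT_ROOT + "/"):
        return path
    short = path[len(CONTENT_ROOT):]
    return short[: -len(".html")] if short.endswith(".html") else short


class RewriteModel(HTMLParser):
    """Rewrites only the element:attribute pairs it was configured with."""

    def __init__(self, rewrite_elements):
        super().__init__()
        self.rewrite_elements = {tuple(e.split(":")) for e in rewrite_elements}
        self.seen = []  # (tag, attribute, value after processing)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if (tag, name) in self.rewrite_elements:
                value = shorten(value)
            self.seen.append((tag, name, value))


def process(html, rewrite_elements):
    model = RewriteModel(rewrite_elements)
    model.feed(html)
    model.close()
    return model.seen


MARKUP = (
    '<a href="/content/we-retail/us/en/men.html">Men</a>'
    '<input name="redirect" value="/content/we-retail/us/en/women.html">'
)
DEFAULT = ["a:href", "area:href", "form:action"]

print(process(MARKUP, DEFAULT))
print(process(MARKUP, DEFAULT + ["input:value"]))
```

With the default list, the anchor is shortened but the hidden redirect keeps its full content path; adding `input:value` is what brings it into scope.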

Conclusion

Hopefully this has given you some insight into URL handling in AEM. It is a tricky topic, and it gets even more complex when vanity URLs are thrown into the mix. Avoid them if possible: they get unruly very quickly! If you take away anything from this post, it should be that the Sling Resource Resolution mechanism can do the heavy lifting for you, so I highly recommend you consider it when implementing your URL handling.

mickjleroy@gmail.com

Consultant at Shine Solutions and Adobe Certified AEM Lead Developer

Comments
  • L Hawk
    Posted at 07:43h, 28 January

    This was really helpful. It’s a bit over my head though. We have a site that uses AEM and we’re just working on one section of the site. We’d like to simplify the URLs by removing unnecessary subdirectories and file extensions. The issue is we aren’t going to be changing URLs for the entire site. Again, just one section. Is this possible?

    • Michael Leroy
      Posted at 10:37h, 30 January

      I’m glad you found this article helpful!

      Regarding your question, because you’re only working on a section of the site, I recommend you perform the rewriting using Apache Rewrite rules and use a custom Link Transformer to rewrite the links within your pages. This method will provide you with more flexibility regarding which links you rewrite as well as the extensions you remove. Sling Mappings are easier to implement when the same rewriting logic needs to be applied throughout an entire website.

  • Jay Proulx (@jay_proulx)
    Posted at 01:18h, 03 March

    What do you mean by this:

    Note: The links shown above are absolute links because no web server was set up with the “weretail.com” domain to serve this content. If a web server was set up, the LinkCheckerTransformer would generate internal links like “/equipment” instead of “http://weretail.com/equipment”.

    I don’t quite follow how LinkCheckerTransformer is aware of a “web server” being set up. We have dispatchers configured, but the domain is still being prefixed. We just want the short URL without the domain (i.e. /equipment).

  • Roli
    Posted at 21:00h, 29 August

    Hi @mickleroy,

    Just saw your comment on this question: https://stackoverflow.com/questions/42034285/how-to-setup-multiple-domains-on-aem-dispatcher#

    My question is: do I need to create multiple httpd files, or can I have multiple host entries in the same file?

  • ronnyfm
    Posted at 09:16h, 30 November

    Thanks for the explanation. But have you tried this scenario?

    The Dispatcher is blocking everything by default. We are not removing the .html from the URLs. Then, using Sling mappings, we have redirects for old URLs, some of them without an extension. Therefore, those requests are blocked by the Dispatcher. The redirect works fine on publish. How do we tell the Dispatcher to let extension-less requests through?

  • Krystian Panek
    Posted at 17:00h, 14 January

    Thanks for providing these mappings. Based on them, I prepared an automated setup of AEM instances and Dispatcher for the We Retail website. Just take a look:
    https://github.com/Cognifide/gradle-aem-boot

    Your mappings are used here:
    https://github.com/Cognifide/gradle-aem-boot/blob/master/gradle/instance/mapping/we-retail.json

    What I missed in this article is how to handle the missing “html” extension, which must be present by the time a request reaches the Sling / AEM instance. I decided to add the following lines to the virtual host configuration:

    https://github.com/Cognifide/gradle-aem-boot/blob/master/gradle/environment/httpd/conf/vhost/we-retail.example.com.conf#L17

    Greetings, Krystian
