URLs were developed and agreed upon so that anyone, at any time, can form a link to any resource on the internet.
This particular article applies to content that is produced on a periodic basis with a defined structure.
What a URL should be
Nathan Ashby-Kuhlman, in his week-long series on article URLs, mentioned four clear principles for what a good URL should be:
Jakob Nielsen came to a similar conclusion in his Alertbox column, adding "a short and easy to remember domain name" to the list.
The internet is made to be a fluid atmosphere, with links flowing from one piece of content to another in ever-changing patterns. However, that flow is quickly interrupted if those links break because of address changes. Tim Berners-Lee, the inventor of the World Wide Web, addresses this in "Cool URIs don't change".
Berners-Lee suggests keeping the following information out of the URL, because it is likely to change:
- Software mechanisms - for instance http://example.com/cgi-bin/page.html or http://example.com/page.pl?id=423. How you build your pages is expected to change as technology improves, but your URL structure should be able to outlast any such change.
- Author's name
- Status - old, draft, rewrite, correction, update, etc.
- Access - public, private, gold, silver, or bronze subscription level
- Disk name - absolutely unnecessary, but very often done, especially as a subdomain, such as http://raven.example.com/page.html
Berners-Lee also suggests eliminating the file name extension and the subject from the URL. However, doing so may sacrifice readability and hierarchy for the sake of added permanence.
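As a rough illustration of the list above, the following sketch flags parts of a URL that are likely to change. The function name, the word lists, and the extension list are assumptions made for this example, not part of Berners-Lee's article. Author names and disk names cannot be detected generically, so they are left out.

```python
from urllib.parse import urlsplit

# Hypothetical word lists for illustration; adjust them to your own site.
STATUS_WORDS = {"old", "draft", "rewrite", "correction", "update"}
ACCESS_WORDS = {"public", "private", "gold", "silver", "bronze"}
SCRIPT_EXTENSIONS = (".pl", ".cgi", ".php", ".asp", ".jsp")


def find_fragile_parts(url):
    """Return a list of reasons the URL may break when the site changes."""
    parts = urlsplit(url)
    segments = [s.lower() for s in parts.path.split("/") if s]
    reasons = []
    # Software mechanisms: a cgi-bin directory or a script file extension.
    if "cgi-bin" in segments or parts.path.lower().endswith(SCRIPT_EXTENSIONS):
        reasons.append("software mechanism in path")
    # A query string usually exposes an internal id or parameter.
    if parts.query:
        reasons.append("query string exposes an internal id or parameter")
    if STATUS_WORDS.intersection(segments):
        reasons.append("status word in path")
    if ACCESS_WORDS.intersection(segments):
        reasons.append("access level in path")
    return reasons


print(find_fragile_parts("http://example.com/page.pl?id=423"))
```

A check like this only catches the obvious cases; it is meant as a reminder during URL design, not as a guarantee of permanence.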
Minor changes or additions to a story, or a switch to paid access (for instance paid archives), should not affect the URL. When switching to a paid archive system, there is an obvious advantage in having links from other sites, emails, and search engines remain intact. Nathan Ashby-Kuhlman praises the Arizona Daily Sun for handling a story's transition from free to paid appropriately.
In a perfect world, URLs would just be the machine-readable addresses hidden behind well-written links on web pages. However, in his article on user-centered URL design, Jesse James Garrett argues that we don't live in a perfect world and that URLs need to be both computer and human readable. Just as a computer uses the address to figure out which file in which folder on which computer a user is requesting, a user should be able to figure out what they are requesting, where it is, and what else might be available by reading the URL.
To illustrate this point, imagine reading an article and stumbling upon the following passage:
But Senator Lindsey Graham says Rumsfeld also was preparing the public for more disturbing events.
At this point, you have a decision to make. Do you follow the link or continue reading the story? To decide, you mouse over the link to see where it goes. Does the link lead to a transcript of Rumsfeld's testimony, a related story, or a paid contextual ad? Without context provided within the URL itself, it is impossible to make an informed decision.
To support readability, or "guessability", it is important to keep CMS-specific IDs, template names, session information, and the like out of the URL. Instead, use information that adds context to the link, for instance the date published, the section, and/or a short title.
The importance of hierarchical URLs is connected to the importance of having "hackable URLs", which allow users to modify the URL to get a broader set of information. For instance, Nathan Ashby-Kuhlman suggests a good news organization would use the URL structure example.com/section/subsection/YYYY/MM/DD/slug, where YYYY/MM/DD is the date (for instance 2004/05/07) and the slug is a short title (for instance rumsfeld-transcript). With this structure, hacking the URL should allow users to get to these pages:
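A minimal sketch of building such a contextual address from the publication date, the section, and a short title follows. The function names, the four-word slug limit, and the /section/YYYY/MM/DD/slug layout are assumptions chosen for illustration.

```python
import re
import unicodedata
from datetime import date


def slugify(title, max_words=4):
    """Turn a title into a short, lowercase, hyphen-separated slug."""
    # Strip accents and drop anything that cannot be expressed in ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    return "-".join(words[:max_words])


def article_path(section, published, title):
    """Build a path that carries the section, date, and short title."""
    return f"/{slugify(section)}/{published:%Y/%m/%d}/{slugify(title)}"


print(article_path("News", date(2004, 5, 7), "Rumsfeld Transcript"))
```

Nothing in the resulting path depends on a CMS id, a template name, or a session, so a reader who mouses over the link can tell roughly what it points to.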
- example.com/section/YYYY/MM/DD/, a list of all articles in that section on that day
- example.com/section/YYYY/MM/, a list of all articles in that section in that month
- example.com/section/YYYY/, a list of the articles-by-month pages or long list of articles
- example.com/section/, the index page for the requested section
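The "hacking" described above amounts to trimming one path segment at a time. The sketch below lists the broader URLs a reader could reach that way; the function name is hypothetical, and whether each address actually serves a listing page depends on the site.

```python
from urllib.parse import urlsplit, urlunsplit


def broader_urls(url):
    """List the URLs reached by trimming one path segment at a time."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    results = []
    # Drop the last segment repeatedly, stopping before the site root.
    for end in range(len(segments) - 1, 0, -1):
        path = "/" + "/".join(segments[:end]) + "/"
        results.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return results


for candidate in broader_urls("http://example.com/news/2004/05/07/rumsfeld-transcript"):
    print(candidate)
```

A site could run a check like this against its own articles to confirm that every trimmed address returns a useful page rather than an error.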
Some argue that the majority of users are not sophisticated enough to "hack" the URL and that it is therefore a waste of time to organize the structure this way. However, Peter Seebach, in his Cranky User column, makes the case for making sites and URLs "expert-friendly" in addition to user-friendly. The argument is that users are normally smarter than sites give them credit for, but they are also often more easily frustrated and might be inclined to navigate elsewhere.
Brief and Clean
URLs take on a life of their own once they're published. They may travel through email, in a document, into a book, or by word of mouth. As with luggage or a fragile parcel, the owner should do everything possible to make sure they reach their final destination.
- When passed on through email messages, links should be no more than 78 characters to avoid wrapping.
- Many characters are difficult to tell apart on paper (1, l, O, or 0). If possible, they should be avoided or made obvious from context.
- Many characters are not supposed to be in URLs (~, spaces, etc.) and therefore are not guaranteed to work in all browsers.
- Since links are often underlined, underscores can be difficult to see. Of note, Google understands dog-pound as two words, while it reads dog_pound as one (rare) word.
- URL paths are case sensitive (although many web servers compensate for this). To avoid confusion, all URLs should be lowercase.
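Several of the guidelines above can be checked mechanically. The following is a sketch under stated assumptions: the function name and the deliberately strict character whitelist are choices made for this example, and the check for look-alike characters such as 1 and l is left out because it depends on context.

```python
import re
from urllib.parse import urlsplit

MAX_EMAIL_LENGTH = 78  # wrapping limit quoted in the list above
# Deliberately strict whitelist, an assumption made for this example.
SAFE_PATH = re.compile(r"[a-z0-9/.\-]*")


def lint_url(url):
    """Return a list of guideline violations found in the URL."""
    problems = []
    if len(url) > MAX_EMAIL_LENGTH:
        problems.append("longer than 78 characters")
    path = urlsplit(url).path
    if path != path.lower():
        problems.append("uppercase letters in path")
    if "_" in path:
        problems.append("underscore in path")
    # Case and underscores are reported above, so ignore them here.
    if not SAFE_PATH.fullmatch(path.lower().replace("_", "")):
        problems.append("characters outside a-z, 0-9, '/', '.', '-'")
    return problems


print(lint_url("http://example.com/News/dog_pound"))
```

Only the path is inspected for case and characters, since the scheme and host name are not case sensitive.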
For the vast majority of articles on news sites, Nathan Ashby-Kuhlman suggested the best URL format. It is example.com/section/subsection/YYYY/MM/DD/slug, the structure described earlier.
With a readable section, subsection, and slug, this format yields a readable and hackable URL. As long as special characters are avoided, names are reasonably short, and the article is permanently accessible, these URLs should have higher success rates in search engines, allow for deeper navigation by readers, and make for happier readers overall.