Wednesday, January 9, 2008

What is a DOCTYPE? Part 2 of 2 - TUTORIAL

Audience: Those who have at least a basic understanding of HTML and CSS, and who have read or understand part 1 of this tutorial.

In part one of this tutorial, we learned that the strange bit of text that appears at the top of the HTML code of some web pages (see below) is a "DOCTYPE declaration".

Sample DOCTYPE declaration:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

We also learned its purpose: to inform the web browser which version of HTML a particular web page is written in.

A question you may ask is, "But I've seen plenty of web pages that don't have a DOCTYPE declaration at the top of the HTML code. Why do those pages still seem to work ok?".

There are indeed many web pages on the internet with no DOCTYPE declaration. In fact, even www.google.com doesn't have one (at least as of January 9, 2008)! Yet the page still seems to display properly in any web browser.

The reason pages like www.google.com will still load in a web browser, despite not having a DOCTYPE declaration, is that web browsers are very forgiving of badly written HTML. What I mean by "badly written" is any HTML code that does not comply with the rules by which HTML is supposed to be written. These rules are specified in the "Document Type Definition" (DTD) of a given version of HTML (if you are unsure what this means, please refer to part 1 of this tutorial series).

So, even if a web page does not tell the web browser which version of HTML it is written in, the web browser will still attempt to display it. When this happens, the web browser is said to enter "quirks mode" (as opposed to "standards mode", in which the web browser follows the published rules for HTML and CSS as closely as it can). Quirks mode isn't good because there is no clearly defined set of rules for how a web browser is supposed to interpret HTML when in quirks mode; the browser simply imitates the non-standard behaviour of older browsers. This means a web page with no DOCTYPE declaration can look very different in one web browser compared to another.

Unfortunately, using a DOCTYPE declaration still won't guarantee that a web page looks the same in different web browsers. One reason for this is that web browsers do not all interpret the same version of HTML in exactly the same way: each browser has its own bugs and gaps in what it supports. However, it is generally accepted that using a proper DOCTYPE declaration makes it easier to build a web page that will look the same (or close enough) in any web browser (also known as a web page having "cross browser support"). The reason is that with a proper DOCTYPE declaration, different web browsers are at least trying to interpret your HTML code in the same way.
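If you are curious which mode your own web browser has chosen for a page, the small test page below is one way to find out. It is only a sketch: it assumes your web browser supports the document.compatMode property (most do), and the page title and the id "mode" are names I made up for the example.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Which mode am I in?</title>
</head>
<body>
<p id="mode">Checking...</p>
<script type="text/javascript">
// document.compatMode is "CSS1Compat" in standards mode
// and "BackCompat" in quirks mode.
var mode = document.compatMode;
document.getElementById("mode").innerHTML =
    (mode === "CSS1Compat") ? "Standards mode" : "Quirks mode";
</script>
</body>
</html>
```

Save this as a file and open it in your web browser. Then delete the first two lines (the DOCTYPE declaration), reload the page, and the message should change from "Standards mode" to "Quirks mode".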

I will now go through the available HTML DOCTYPEs:

(Note: HTML versions 2 and 3 both have DOCTYPEs. But these versions of HTML are so old, they are not worth focusing on).

The earliest version of HTML worth worrying about is HTML 4.01, which comes in 3 types:
  • HTML 4.01 Strict
  • HTML 4.01 Transitional
  • HTML 4.01 Frameset
It would take up too much time to list every single difference between these versions. However, the general differences are as follows:

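HTML 4.01 Strict:
"Style" elements are not allowed to appear within the HTML code of Strict web pages. By style elements I mean tags and attributes whose only job is to control appearance, such as the FONT and CENTER tags or the bgcolor attribute. Things like colors, font sizes and background images should all be reserved for the CSS (Cascading Style Sheet).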

HTML 4.01 Transitional:
Some style elements are allowed in the HTML code of an HTML 4.01 Transitional web page, but the FRAMESET tag is not allowed.
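As a rough illustration of the difference between Strict and Transitional, here is the same heading written two ways. These are two separate fragments, not one page, and the colors, text and class name are made up for the example.

```html
<!-- Fragment 1. Fine in HTML 4.01 Transitional, NOT allowed in Strict:
     the appearance is set by tags and attributes in the HTML itself. -->
<body bgcolor="#ffffcc">
<center><font color="red" size="5">Welcome!</font></center>
</body>

<!-- Fragment 2. Fine in HTML 4.01 Strict:
     the HTML only says "this is a heading", the CSS says how it looks. -->
<head>
<title>Welcome</title>
<style type="text/css">
body { background-color: #ffffcc; }
h1.welcome { color: red; text-align: center; }
</style>
</head>
<body>
<h1 class="welcome">Welcome!</h1>
</body>
```

Note that the STYLE tag in the second fragment is allowed in Strict, because all it does is hold the CSS. It is tags like FONT and CENTER, and attributes like bgcolor, that Strict leaves out.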

HTML 4.01 Frameset:
Very similar to Transitional, except that the FRAMESET tag is allowed to be used (in place of the BODY tag).
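For reference, a Frameset page might look something like the sketch below. The file names menu.html and content.html, and the frame names, are placeholders I have invented; you would swap in your own pages.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
"http://www.w3.org/TR/html4/frameset.dtd">
<html>
<head>
<title>A page built from two frames</title>
</head>
<!-- FRAMESET takes the place of BODY and splits the window into two columns -->
<frameset cols="25%,75%">
<frame src="menu.html" name="menu">
<frame src="content.html" name="content">
<!-- NOFRAMES is shown by web browsers that cannot display frames -->
<noframes>
<body>
<p>This page uses frames. <a href="content.html">View the content without frames</a>.</p>
</body>
</noframes>
</frameset>
</html>
```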

You may be wondering, "Why are there 3 different types of the same version?". The answer is that the people in charge of releasing new versions of HTML (the W3C) understand how difficult it is to build a web page that will work properly in every web browser. It is considered "ideal" to use the Strict DOCTYPE, but in reality you may find it very hard to get your web page working in all web browsers if you're not allowed to use any style tags in your HTML code. The Transitional DOCTYPE exists for that situation. The end goal is to have all style information moved into the CSS, with none left in the HTML code.

(For a more complete list of what is and isn't allowed with Strict and Transitional web pages, see Roger Johansson's article 'Transitional vs. Strict Markup'. You can just scroll straight to the section titled 'Elements that are not allowed in Strict DOCTYPEs', and start from there).

Another important version of HTML is XHTML 1.0. This version of HTML comes in the same 3 types as HTML 4.01:
  • XHTML 1.0 Strict
  • XHTML 1.0 Transitional
  • XHTML 1.0 Frameset
The differences between the Strict, Transitional and Frameset versions of XHTML 1.0 are pretty much the same as the differences between them in HTML 4.01. The important thing here is the difference between HTML 4.01 (all versions) and XHTML 1.0 (all versions).

The major difference between HTML and XHTML is not what tags are or aren't allowed. The difference lies in how the HTML code should be written. XHTML code must be written according to some very specific rules. For example:

Some XHTML rules:
  • All tags must be written in lowercase (e.g., <h1>, <p>, and not <H1> or <P>).
  • All tags that are opened must be closed (e.g., <h1> must have a corresponding </h1>). Tags that have no content, such as <br>, are closed within the tag itself: <br />.
  • If more than one tag is open at the same time, they must be closed in the reverse order they were opened. For example:
Good:
<p><i>Watch as I close the italics tag before the paragraph tag, just as XHTML requires me to</i></p>

Bad:
<p><i>This is bad! I am closing the p tag before the i tag!</p></i>

(For a more complete list of the rules you must follow when writing XHTML, see Linda Roeder's article 'Basics of XHTML - Why, What and How'. The list of rules starts at about the third paragraph).
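To see the rules above working together, here is a small XHTML 1.0 Strict page. Treat it as a sketch: the title and text are made up, and it uses two things not covered in the list above, namely the xmlns attribute that XHTML requires on the html tag, and a meta tag stating the character encoding.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<!-- A tag with no content is closed within itself, using " />" -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>My first XHTML page</title>
</head>
<body>
<!-- Every tag is lowercase, and every tag that is opened is closed -->
<h1>Hello</h1>
<!-- Tags are closed in the reverse order they were opened -->
<p><i>First line of italic text,<br />second line of italic text.</i></p>
</body>
</html>
```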

As you might be able to guess, one major reason that XHTML was invented is to try to get everyone to write their HTML code in the same way. This would have major benefits for web browsers (in their attempts to display web pages properly) and for web designers (whose code will now look very similar, if not exactly the same, as any other web designer's).

The last version of HTML that you need to worry about is XHTML 1.1. This version of HTML does not come in 3 types as you have seen previously; it's just plain old XHTML 1.1. This version is very similar to XHTML 1.0 Strict. It must also follow the same rules about how XHTML code is to be written as the other XHTML versions (lowercase tags, etc). So what's the difference between XHTML 1.0 Strict and XHTML 1.1? Unfortunately, this question can't really be answered if you don't know what XML is (that's not a typo; XML is not the same thing as XHTML). But don't worry. At this stage, XHTML 1.0 is still more popular than XHTML 1.1, so just stick with one of the earlier versions for now.

Below I have again listed all of the HTML versions that I have talked about, along with their DOCTYPE declarations. You might have seen elsewhere that you can just "copy and paste" these bits of text into the top of your own HTML code - and indeed, there is nothing wrong with that.
In fact, it's probably better to do so, because if you get just one character wrong in your DOCTYPE declaration, it will probably throw your web browser into "quirks mode", possibly without you even realising. So, once you have decided which version of HTML you want to write your web page in, just copy and paste the relevant DOCTYPE as it appears below. I have listed each DOCTYPE under a heading - the heading is NOT part of the DOCTYPE declaration and should NOT be copied and pasted into your HTML code! (Please note, when copying and pasting these DOCTYPE declarations, the DOCTYPE should be the first thing that appears in your HTML code).
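To show where the DOCTYPE declaration goes, here is a complete (if very small) HTML 4.01 Strict page. The title and paragraph text are just placeholders; the point is that nothing comes before the DOCTYPE.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>My first Strict page</title>
</head>
<body>
<p>Hello, world.</p>
</body>
</html>
```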

HTML 4.01 Strict
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

HTML 4.01 Transitional
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

HTML 4.01 Frameset
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
"http://www.w3.org/TR/html4/frameset.dtd">

XHTML 1.0 Strict
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

XHTML 1.0 Transitional
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

XHTML 1.0 Frameset
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

XHTML 1.1
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

The W3C have come up with a great way of testing whether a web page has been written properly, as per the rules of its DOCTYPE. It's called the W3C markup validator, and you can use it to test your own web pages (or even any web page that is on the internet).

The people responsible for HTML are working towards a single HTML version that all web browsers handle in the same way, so that as long as you build your web page in that version, you shouldn't run into any "cross browser support" issues. Unfortunately, that day is probably still some time away. But the reason for encouraging web designers to start using DOCTYPEs, and to follow the rules that go with them, is so that one day a common standard can be reached.

---

Please comment on this tutorial, and let me know if there's any way I could have improved it. Most importantly, what could I have done to make it easier to understand?

3 comments:

vasquezprince said...

VERY INFORMATIVE.
Thanks for sharing.

Anonymous said...

Thx man ;)

Jitendra Vyas said...

great article . easy to understand. thx