COM271, Week 2

HTML "Rules": Extensible markup languages;
"Well-formed" and "valid" XHTML;
Validators and the Mozilla Web Developer Add-on.


XHTML: "Well formed" and "valid" pages

The web and its principal languages (HTML, CSS, and JavaScript) continually evolve, constantly finding new ways to build on new technologies. You may think of yourself as someone learning to develop web pages for a desktop or laptop computer, but the same languages are also intended (see W3C notes) to work regardless of the device (have you tried opening your web page on a smartphone?). And it is reasonable to expect that the page you post today will still work when someone browses to it next year, in 5 years, and, if it lasts that long, 20 years from now.

The goal of setting and maintaining a set of rules by which we can build durable pages and timeless (relatively speaking) browsers is also compatible with a philosophy that says page structure, layout, and programming should work together in consistent ways.

The idea behind the Extensible Markup Language (XML) is that we should be able to start from a core language and add to or modify it by following a common set of rules. Further, we should be able to build a browser that works by first looking at the page to determine which set of rules applies to it, and then working through (parsing) the page according to those rules. Old page? Old rules. A new browser downloads the rules and correctly renders the page.

We looked (here) at the means to download the set of rules: the !DOCTYPE declaration at the top of the page, which names a DTD (document type definition). So which rules should we use, and what do we have to do to use them?
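To make the "look first, then parse" idea concrete, here is a toy sketch in Python, chosen simply because it is easy to run and check. It reads the public identifier in a page's !DOCTYPE and reports which rule set the page asks for. The RULE_SETS table and the rules_for function are my own hypothetical illustration, not how any particular browser is built. The identifier strings are the standard W3C ones for XHTML 1.0 and HTML 4.01.

```python
import re

# Hypothetical lookup table: DOCTYPE public identifier -> name of the rule set it asks for.
RULE_SETS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XHTML 1.0 Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "XHTML 1.0 Transitional",
    "-//W3C//DTD HTML 4.01//EN": "HTML 4.01 Strict",
}

# Matches <!DOCTYPE html ...>, capturing the quoted public identifier if there is one.
DOCTYPE_PATTERN = re.compile(r'<!DOCTYPE\s+html(?:\s+PUBLIC\s+"([^"]*)")?', re.IGNORECASE)

def rules_for(page):
    """Look at the page's DOCTYPE first, then report which set of rules it declares."""
    match = DOCTYPE_PATTERN.search(page)
    if match is None:
        return "none declared"
    public_id = match.group(1)
    if public_id is None:  # the short HTML5 form: <!DOCTYPE html>
        return "HTML5"
    return RULE_SETS.get(public_id, "unknown DTD: " + public_id)

page = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>My page</title></head>
<body><p>Hello</p></body>
</html>"""

print(rules_for(page))               # XHTML 1.0 Strict
print(rules_for("<!DOCTYPE html>"))  # HTML5
```

A real browser does far more than this. The point is only the order of operations: find the declared rules, then parse by them.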

I'll suggest an answer, but first let's remember that we are climbing aboard a constantly moving train. We may think we know where it is heading, but sometimes we are surprised. We need to be prepared for unexpected turns.

Thomas Powell is a talented California (UCSD) web authority who writes really good reference books (which you should have on your shelf: here are a couple I'd start with). In 2003, Powell's HTML & XHTML: The Complete Reference (McGraw Hill / Osborne, 932 p.) made it clear that the train was on a path toward widespread adoption of a set of rules for writing code, guided by XML. HTML would become influenced by XML. In practical terms, this led to a set of strong recommendations (and there aren't really more than these):

The Rules of HTML
(Extracting from or paraphrasing Powell, 2003, pp. 18-23)

HTML is Not Case Sensitive; XHTML is: I suggest that you write everything in lower case: element names, attributes, and even values (but see next). Always!
HTML/XHTML Attribute Values May be Case Sensitive: If you reference a file name (using src or href attributes, for example), you need to match the case of the file name exactly (e.g., src="MyDogPhoto.jpg").
HTML/XHTML is Sensitive to a Single White Space Character:
Subtle errors tend to creep into HTML files where white space is concerned; be especially careful with spacing around <img> and <a> tags. For example, consider the markup here:

<a href="http://www.demo.com">
<img src="demo.gif" />
</a>

The line return in the HTML just after the image tag will (properly!) create a blank inside the link tags. That blank is displayed like any other character in the link, complete with the link's blue underline. This artifact is called a "tick" (paraphrasing Powell, 2003, p. 19). Note: some browsers do this; others do not. The remedy is to put all of the code on a single line, making sure no blank space is trapped inside the link.^*
XHTML/HTML Follows a Content Model: Certain elements can contain only certain other elements, and some elements must be contained. The best example is the unordered list, an element that contains list-item elements, like this:
<ul>
<li>First Item in a list</li>
<li>Second item</li>
</ul>
While you could use a list item alone to, say, effect indentation, it is improper: list items don't exist except within a list!
Elements Should Have Close Tags Unless Empty: If I set off content by prefacing it with a paragraph tag, I also need to mark the end of that content with a closing tag. A few elements (e.g., img, br, hr, link) don't enclose content; rather, they create a line break or a horizontal rule, or they reference a file (which is then included on the page). These elements are "empty," and in XHTML they close themselves with a trailing slash, as in <br />.
Elements Should Nest: "A simple rule states that tags should nest, not cross, thus
<b><i>is wrong because tags cross</b></i>
whereas
<b><i>is not because the tags are properly nested</i></b>"
Attributes should be quoted: If you assign a value to an attribute, the value needs to be inside quotes. This applies to all values: text strings, numbers, etc.
Browsers will ignore Unknown Attributes and Elements: Watch your spelling, and remember that you don't get to make stuff up.
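One rough way to see most of these rules in action is to hand small fragments to a strict XML parser. This sketch uses Python's standard xml.etree.ElementTree, which checks well-formedness only: case-sensitive matching tags, closed empty elements, nesting, and quoted attributes. It does not load a DTD, so content-model mistakes and misspelled element names (the things a validator catches) slip through. The helper name is_well_formed is my own.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if a strict XML parser accepts the fragment (no DTD, so no validity check)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Each fragment exercises one of the rules above.
fragments = [
    ("lower case, matching tags", "<p>Hello</p>"),                # True
    ("case mismatch", "<P>Hello</p>"),                             # False
    ("empty element closed", "<p>Line one<br />Line two</p>"),     # True
    ("empty element left open", "<p>Line one<br>Line two</p>"),    # False
    ("properly nested", "<p><b><i>nested</i></b></p>"),            # True
    ("crossed tags", "<p><b><i>crossed</b></i></p>"),              # False
    ("quoted attributes", '<img src="demo.gif" alt="demo" />'),    # True
    ("unquoted attributes", "<img src=demo.gif alt=demo />"),      # False
    ("orphan list item", "<li>Not inside a list</li>"),            # True: content model is a validity issue
    ("misspelled element", "<p><blnk>Oops</blnk></p>"),            # True: spelling is a validity issue
]

for label, markup in fragments:
    print(f"{label:26} well formed: {is_well_formed(markup)}")
```

The last two lines are the interesting ones: a parser happily accepts them, which is why "well formed" and "valid" are two different checks.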

^*Powell notes that beginning programmers often use the non-breaking space entity, &nbsp;, in multiples to effect indenting:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;I'm indented!
This adds a lot of characters that must be downloaded (six characters for each blank space). The better approach is to assign a margin using a CSS style, e.g.:
<p style="margin-left: 3em;">I'm indented properly!</p>
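Back to the white-space rule and the "tick": a parser keeps the line breaks in the multi-line link as real text inside the <a> element, and that is the blank a browser may draw with an underline. This minimal Python sketch, using the standard xml.etree.ElementTree, makes that trapped text visible. The function name has_trapped_whitespace is my own, and demo.com/demo.gif are just the placeholder names from the tick example.

```python
import xml.etree.ElementTree as ET

def has_trapped_whitespace(markup):
    """Return True if the outer element holds whitespace-only text before, between, or after its children."""
    element = ET.fromstring(markup)
    # ElementTree stores text before the first child in .text and text after each child in child.tail.
    pieces = [element.text] + [child.tail for child in element]
    return any(piece and not piece.strip() for piece in pieces)

# Spread over three lines, as in the tick example: the line breaks end up inside the link.
multi_line = '<a href="http://www.demo.com">\n<img src="demo.gif" />\n</a>'
# All on one line: nothing is trapped between the tags.
one_line = '<a href="http://www.demo.com"><img src="demo.gif" /></a>'

print(has_trapped_whitespace(multi_line))  # True
print(has_trapped_whitespace(one_line))    # False
```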

If you follow these simple rules (I insist that you do!), your code will be said to be both valid and well formed. This means, I believe, that your pages will be more likely to endure, and far more likely to work on the next great thing that reads them (glasses-mounted mini projectors that display holographic web pages that seem to float in space 12 inches from your eyes!).

Interestingly, Powell's more recent HTML and CSS (5th edition, 2010, McGraw Hill, 832 p.) makes it clear that he now wonders whether he was mistaken in 2003 about how certain it was that XHTML was the way of the future. The newer book features HTML5, a sharp departure from the tighter standards of XHTML. Powell comments:

Starting in 2004, a group of well-known organizations and individuals got together [...] to produce a new version of HTML. The exact reasons and motivations for this effort seem to vary depending on who you talk to—slow uptake of XHTML, frustration with the lack of movement by the Web standards body, need for innovation, or any one of many other reasons—but, whatever the case, the aim was to create a new, rich future for Web applications that include HTML as a foundation element. Aspects of the emerging specification such as the canvas element have already shown up in browsers like Safari and Firefox, so by 2008, the efforts of this group were rolled into the W3C and drafts began to emerge. Whether this makes HTML5 become official or likely to be fully adopted is obviously somewhat at the mercy of the browser vendors and the market, but clearly another very likely path for the future of markup goes through HTML5. Already we see Google adopting it in various places, so its future looks bright.

(Powell, ibid., p. 52)

[...] HTML5 is quite a bit different than HTML4 [...]. There are many new tags and there is a tremendous emphasis on interactivity and Web application development. However, probably the most interesting aspect of HTML5 is the focus on defining what browsers—or, more widely, user agents in general—are supposed to do when they encounter ill-formed markup. HTML5, by defining known outcomes, makes it much more likely that today's "tag soup" will be parsed predictably by tomorrow's browsers. Unfortunately, read another way, it provides yet more reasons for those who create such a mess of markup not to change their bad habits.

Likely, the future of markup has more than one possible outcome. My opinion is that those who produce professional-grade markup or who write tools to do so will continue to embrace standards, XML or not, while those who dabble with code and have fun will continue to work with little understanding of the rules they break and will have no worries about doing so. The forgiveness that HTML allows is both the key to its popularity and, ultimately, the curse of the unpredictability often associated with it.

(Powell, ibid., p. 53)

What's important is to remember that web developers are on a moving train. In this course, you will come up to speed with the way things work at the moment, in a way that allows you to ride the train. In a few years, the web will have moved somewhere new. In the meantime, code written with the discipline of an XHTML framework (i.e., code that follows the few rules outlined above) has a high probability of being useful wherever and however users access your pages.

Validators

If you code "properly," XHTML thinking goes, and begin your page with a !DOCTYPE declaration that names a DTD, you may also want to take advantage of the W3C's validator. It is accessible directly, or through a really amazing Mozilla Web Developer add-on for Firefox. You can obtain these here:

W3C Markup Validator Service: http://validator.w3.org/
Mozilla's Web Developer Addon: https://addons.mozilla.org/en-US/firefox/addon/web-developer/

To use the validator, copy the URL of your page (e.g., www.com.uri.edu/com271/zeus/index.html) and paste it into the validator's text box. The page is then "read," and any parts that do not conform to the rules of your specified document type (determined by the DTD named at the top of your page; for us, the XHTML rules above) are pointed out: you are given the line number, shown where your code is not "valid," and told why. You can then go back and tweak your page. The feeling you get when it all comes back "valid" is, I am told, a real rush!
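A validator goes further than a parser. Besides well-formedness, it checks the content model that the DTD spells out, such as "a list item lives only inside a ul or ol list." Here is a toy, hand-written Python sketch of just that one check. The ALLOWED_PARENTS table and the content_model_errors function are hypothetical stand-ins for what a real DTD and validator do far more thoroughly; this is not how the W3C validator is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written slice of a content model: which parents may hold each child.
ALLOWED_PARENTS = {"li": {"ul", "ol"}}

def content_model_errors(markup):
    """Return a list of messages for elements whose parent is not allowed by ALLOWED_PARENTS."""
    root = ET.fromstring(markup)
    errors = []
    # ElementTree has no parent pointers, so walk every element and inspect its direct children.
    for parent in root.iter():
        for child in parent:
            allowed = ALLOWED_PARENTS.get(child.tag)
            if allowed is not None and parent.tag not in allowed:
                errors.append(f"<{child.tag}> is not allowed inside <{parent.tag}>")
    # The root element has no parent at all, so it cannot satisfy a "must be contained" rule.
    if root.tag in ALLOWED_PARENTS:
        errors.append(f"<{root.tag}> must be contained in one of {sorted(ALLOWED_PARENTS[root.tag])}")
    return errors

good = "<body><ul><li>First Item in a list</li><li>Second item</li></ul></body>"
bad = "<body><li>Indented by misuse</li></body>"

print(content_model_errors(good))  # []
print(content_model_errors(bad))   # ['<li> is not allowed inside <body>']
```

Both fragments are well formed; only the second breaks the content model, which is exactly the kind of error the validator reports and a plain parser does not.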

The Mozilla add-on creates a new toolbar. It includes a menu of tools for validating HTML (i.e., a direct link to the W3C validator), plus tools for checking CSS, accessibility under the government's Section 508 rules (we'll cover these later), and more. There is a whole raft of other features, which I'll try to highlight in a later edition of this page and in the lab. For now, download the add-on and play; I think you'll figure it out, or we can look at it together in lab.
