<!--#include virtual="commontop.html" -->
<title>Web Programming Step by Step, Lecture 18: XML</title>
</head>
<body>
<div class="layout">
<div id="controls"><!-- DO NOT EDIT --></div>
<div id="currentSlide"><!-- DO NOT EDIT --></div>
<div id="header"></div>
<div id="footer">
<h1><em>Web Programming Step by Step</em>, Lecture 18</h1>
<h2>XML</h2>
</div>
</div>
<div class="presentation">
<div class="slide">
<h1><a href="http://www.webstepbook.com/">Web Programming Step by Step</a></h1>
<h3>Lecture 18 <br /> XML</h3>
<h4>Reading: 10.3 - 10.4</h4>
<p class="license">
Except where otherwise noted, the contents of this presentation are Copyright 2010 Marty Stepp and Jessica Miller.
</p>
<div class="w3c">
<a href="http://validator.w3.org/check/referer"><img src="images/w3c-xhtml11.png" alt="Valid XHTML 1.1" /></a>
<a href="http://jigsaw.w3.org/css-validator/check/referer"><img src="images/w3c-css.png" alt="Valid CSS!" /></a>
</div>
</div>
<!--
Bad way to store text data
just a bunch of lines
with key=value pairings
show .ini file from old Windows?
idea of old days with lots of config files, etc.
why bad?
not standardized
error-prone
no common software to read/write them
-->
<div class="slide">
<h1>The bad way to store data</h1>
<pre class="examplecode xml">
My note:
BEGIN
TO: Tove
FROM: Jani
SUBJECT: Reminder
MESSAGE (english):
Hey there,
Don't forget to call me this weekend!
END
</pre>
<ul>
<li>we could use an Ajax request to fetch a file like this from the server</li>
<li>what's wrong with this approach?</li>
</ul>
</div>
<div class="slide">
<h1>What is XML?</h1>
<ul>
<li><strong>XML</strong>: a "skeleton" for creating markup languages</li>
<li>you already know it!
<ul>
<li>syntax is identical to XHTML's:
<pre class="examplecode xml"><code><<var>tagName</var> <var>attribute</var>="<var>value</var>"><var>content</var></<var>tagName</var>>
<<var>tagName</var> <var>attribute</var>="<var>value</var>"/></code></pre>
</li>
</ul>
</li>
<li>languages written in XML specify:
<ul>
<li>names of tags <span class="aside">in XHTML: <code>h1</code>, <code>div</code>, <code>img</code>, <code>a</code>, etc.</span></li>
<li>names of attributes <span class="aside">in XHTML: <code>id</code>, <code>class</code>, <code>src</code>, <code>href</code>, etc.</span></li>
<li>rules about how they go together <span class="aside">in XHTML: certain elements must be inside others (block/inline)</span></li>
</ul>
</li>
<li>used to present complex data in human-readable form
<ul>
<li>"self-describing data"</li>
</ul>
</li>
</ul>
</div>
<div class="slide">
<h1>Anatomy of an XML file</h1>
<pre class="examplecode xml">
<em><?xml version="1.0" encoding="UTF-8"?></em> <span class="comment"><!-- <span class="term">XML prolog</span> --></span>
<em><note></em> <span class="comment"><!-- <span class="term">root element</span> --></span>
<em><to>Tove</to></em> <span class="comment"><!-- <span class="term">element</span> --></span>
<em><from></em>Jani<em></from></em> <span class="comment"><!-- <span class="term">open/closing tags</span> --></span>
<subject><em>Reminder</em></subject> <span class="comment"><!-- <span class="term">content</span> --></span>
<message <em>language</em>="<em>english</em>"> <span class="comment"><!-- <span class="term">attribute</span> and its <span class="term">value</span> --></span>
Don't forget me this weekend!
</message>
<em></note></em>
</pre>
<ul>
<li>begins with an <code><?xml ... ?></code> header tag ("<span class="term">prolog</span>")</li>
<li>has a single <span class="term">root element</span> (in this case, <code>note</code>)</li>
<li>tag, attribute, and comment syntax is just like XHTML</li>
</ul>
</div>
<div class="slide">
<h1>Doctypes and Schemas</h1>
<ul>
<li>"rule books" for individual flavors of XML
<ul>
<li>XML provides universal basic <em>syntax</em></li>
<li>doctypes and schemas specify <em>lexicon</em> (names of tags and attributes) and <em>grammar</em> (how they can be used together)</li>
</ul>
</li>
<li>used to <em>validate</em> XML files to make sure they follow the rules of that "flavor"
<ul>
<li>the W3C HTML validator uses the XHTML doctype to validate your HTML</li>
</ul>
</li>
<li>for more info:
<ul>
<li><a href="http://en.wikipedia.org/wiki/Document_Type_Definition">Document Type Definition (DTD)</a> ("doctype")</li>
<li><a href="http://en.wikipedia.org/wiki/XML_Schema_%28W3C%29">W3C XML Schema</a></li>
</ul>
</li>
<li>optional — if you don't have one, there are no rules beyond having well-formed XML syntax</li>
<li>(we won't cover these any further here)</li>
</ul>
</div>
<div class="slide">
<h1>Uses of XML</h1>
<ul>
<li>XML data comes from many sources on the web:
<ul>
<li><span class="term">web servers</span> store data as XML files</li>
<li><span class="term">databases</span> sometimes return query results as XML</li>
<li><span class="term">web services</span> use XML to communicate</li>
</ul>
</li>
<li>XML is the <em>de facto</em> universal format for exchange of data</li>
<li>XML languages are used for <a href="http://en.wikipedia.org/wiki/MusicXML">music</a>, <a href="http://en.wikipedia.org/wiki/MathML">math</a>, <a href="http://en.wikipedia.org/wiki/SVG">vector graphics</a></li>
<li>popular use: <a href="http://en.wikipedia.org/wiki/RSS">RSS</a> for news feeds & podcasts</li>
</ul>
</div>
<div class="slide">
<h1>Pros and cons of XML</h1>
<ul>
<li>pro:
<ul style="margin-top: 0em">
<li style="margin-top: 0em">easy to read (for humans and computers)</li>
<li>standard format makes automation easy</li>
<li>don't have to "reinvent the wheel" for storing new types of data</li>
<li>international, platform-independent, open/free standard</li>
<li>can represent almost any general kind of data (record, list, tree)</li>
</ul>
</li>
<li>
con:
<ul style="margin-top: 0em">
<li style="margin-top: 0em">bulky syntax/structure makes files large; can decrease performance
<ul>
<li>example: <a href="http://en.wikipedia.org/wiki/MathML#Example">quadratic formula in MathML</a></li>
</ul>
</li>
<li>can be hard to "shoehorn" data into a good XML format</li>
</ul>
</li>
</ul>
</div>
<div class="slide">
<h1>What tags are legal in XML?</h1>
<ul>
<li><em>any tags you want!</em></li>
<li>examples:
<ul>
<li>an email message might use tags called <code>to</code>, <code>from</code>, <code>subject</code></li>
<li>a library might use tags called <code>book</code>, <code>title</code>, <code>author</code></li>
</ul>
</li>
<li>when designing an XML file, <em>you</em> choose the tags and attributes that best represent the data</li>
<li>rule of thumb: data = tag, metadata = attribute</li>
</ul>
</div>
<div class="slide">
<h1>XML and Ajax</h1>
<div class="rightfigure">
<img src="images/ajax_bleach.gif" alt="Ajax bleach" />
</div>
<ul>
<li>
web browsers can display XML files, but often you instead want to fetch one and analyze its data
</li>
<li>
the XML data is fetched, processed, and displayed using Ajax
<ul>
<li>
(XML is the "X" in "Ajax")
</li>
</ul>
</li>
<li>
It would be very clunky to examine a complex XML structure as just a giant string!
</li>
<li>luckily, the browser can break apart (<strong>parse</strong>) XML data into a set of objects
<ul>
<li>there is an XML DOM, very similar to the (X)HTML DOM</li>
</ul>
</li>
</ul>
</div>
<!--
<div class="slide">
<h1>Structure of an XML document</h1>
<p>
XML's syntax looks much like XHTML's:
</p>
<ul>
<li>a header, then a single document tag that can contain other tags
<pre class="example xml">
<?xml version="1.0" encoding="UTF-8"?>
<var>document tag</var>
</pre>
</li>
<li>tag syntax:
<pre class="example xml">
<<var>element</var> <var>attribute(s)</var>>
<var>text or tags</var>
</<var>element</var>>
</pre>
<ul>
<li>or a tag with no inner content can end with /></li>
</ul>
</li>
<li>attribute syntax:
<pre class="example xml">
<var>name</var>="<var>value</var>"
</pre>
</li>
<li>comments: <code><!- <var>comment</var> -></code></li>
<li>entities: <code>&lt;</code></li>
</ul>
</div>
-->
<div class="slide">
<h1>XML DOM tree structure</h1>
<div class="rightfigure" style="width: 45%">
<img src="images/figure_6_textbook_cat.gif" alt="node tree" style="width: 100%" />
</div>
<pre class="examplecode xml">
<?xml version="1.0" encoding="UTF-8"?>
<categories>
<category>children</category>
<category>computers</category>
...
</categories>
</pre>
<ul>
<li>the XML tags have a tree structure</li>
<li>DOM nodes have parents, children, and siblings</li>
</ul>
</div>
<div class="slide">
<h1>Recall: Javascript XML (XHTML) DOM</h1>
<p>
The DOM properties and methods* we already know can be used on XML nodes:
</p>
<ul>
<li>properties:
<ul>
<li><code>firstChild</code>, <code>lastChild</code>, <code>childNodes</code>, <code>nextSibling</code>, <code>previousSibling</code>, <code>parentNode</code></li>
<li><code><strong>nodeName</strong></code>, <code><strong>nodeType</strong></code>, <code><strong>nodeValue</strong></code>, <code><strong>attributes</strong></code></li>
</ul>
</li>
<li>methods:
<ul>
<li><code>appendChild</code>, <code>insertBefore</code>, <code>removeChild</code>, <code>replaceChild</code></li>
<li><code><strong>getElementsByTagName</strong></code>, <strong><code>getAttribute</code></strong>, <code><strong>hasAttributes</strong></code>, <code><strong>hasChildNodes</strong></code></li>
</ul>
</li>
<li>caution: cannot use HTML-specific properties like <code>innerHTML</code> in the XML DOM!</li>
<!--
<li>Prototype methods:
<ul>
<li>
<a href="http://prototypejs.org/api/element/ancestors"><code>ancestors</code></a>,
<a href="http://prototypejs.org/api/element/childElements"><code>childElements</code></a>,
<a href="http://prototypejs.org/api/element/descendants"><code>descendants</code></a>,
<a href="http://prototypejs.org/api/element/firstDescendant"><code>firstDescendant</code></a>,
<a href="http://prototypejs.org/api/element/descendantOf"><code>descendantOf</code></a>,
<a href="http://prototypejs.org/api/element/next"><code>next</code></a>,
<a href="http://prototypejs.org/api/element/previous"><code>previous</code></a>,
<a href="http://prototypejs.org/api/element/siblings"><code>siblings</code></a>,
<a href="http://prototypejs.org/api/element/previousSiblings"><code>previousSiblings</code></a>,
<a href="http://prototypejs.org/api/element/nextSiblings"><code>nextSiblings</code></a>,
<a href="http://prototypejs.org/api/element/adjacent"><code>adjacent</code></a>
</li>
</ul>
</li>
-->
</ul>
<p class="aside" style="margin-top: 2em">
* (though not Prototype's, such as <code>up</code>, <code>down</code>, <code>ancestors</code>, <code>childElements</code>, or <code>siblings</code>)
</p>
</div>
<div class="slide">
<h1>Navigating the node tree</h1>
<ul>
<li>
caution: can <em>only</em> use standard DOM methods/properties in XML DOM (<strong>NOT Prototype's</strong>)
</li>
<li>
caution: can't use <code>id</code>s or <code>class</code>es to use to get specific nodes (no <code>$</code> or <code>$$</code>). Instead:
<pre class="syntaxtemplate js">
<span class="comment">// returns all child tags inside node that use the given element</span>
var elms = <var>node</var>.getElementsByTagName("<var>tagName</var>");
</pre>
</li>
<li class="incremental">
caution: can't use <code>innerHTML</code> to get the text inside a node. Instead:
<pre class="syntaxtemplate js">
var text = <var>node</var>.firstChild.nodeValue;
</pre>
</li>
<li class="incremental">
caution: can't use <code>.<var>attributeName</var></code> to get an attribute's value from a node. Instead:
<pre class="syntaxtemplate js">
var attrValue = <var>node</var>.getAttribute("<var>attrName</var>");
</pre>
</li>
</ul>
</div>
<div class="slide">
<h1>Using XML data in a web page</h1>
<ol style="font-size: smaller;">
<li>use Ajax to fetch data</li>
<li class="incremental">use DOM methods to examine XML:
<ul>
<li><code><var>XMLnode</var>.getElementsByTagName("<var>tag</var>")</code></li>
</ul>
</li>
<li class="incremental">extract the data we need from the XML:
<ul>
<li><code><var>XMLelement</var>.getAttribute("<var>name</var>")</code>, <code><var>XMLelement</var>.firstChild.nodeValue</code>, etc.</li>
</ul>
</li>
<li class="incremental">create new HTML nodes and populate with extracted data:
<ul>
<li><code>document.createElement("<var>tag</var>")</code>, <code><var>HTMLelement</var>.innerHTML</code></li>
</ul>
</li>
<li class="incremental">inject newly-created HTML nodes into page
<ul>
<li><code><var>HTMLelement</var>.appendChild(<var>element</var>)</code></li>
</ul>
</li>
</ol>
</ul>
</div>
<div class="slide">
<h1>Fetching XML using AJAX (template)</h1>
<pre class="syntaxtemplate js">
new Ajax.Request("<var>url</var>",
{
method: "get",
onSuccess: <var>functionName</var>
}
);
...
function <var>functionName</var>(ajax) {
<var>do something with ajax.response<em>XML</em></var>;
}
</pre>
<ul>
<li><code>ajax.response<em>Text</em></code> contains the XML data in plain text</li>
<li><code>ajax.<a href="http://www.w3schools.com/ajax/ajax_responsexml.asp">response<em>XML</em></a></code> is a pre-parsed XML DOM object</li>
</ul>
</div>
<div class="slide">
<h1>Analyzing a fetched XML file using DOM</h1>
<pre class="examplecode xml" style="font-size: smaller">
<?xml version="1.0" encoding="UTF-8"?>
<employees>
<lawyer money="5"/>
<janitor name="Sue"><vacuumcleaner/></janitor>
<janitor name="Bill">too poor</janitor>
</employees>
</pre>
<p>We can use DOM properties and methods on <code>ajax.responseXML</code>:</p>
<pre class="examplecode js" style="font-size: smaller">
<span class="comment">// zeroth element of array of length 1</span>
var employeesTag = <em>ajax.responseXML</em>.getElementsByTagName("employees")<em>[0]</em>;
<span class="comment">// how much money does the lawyer make?</span>
var lawyerTag = <em>employeesTag</em>.getElementsByTagName("lawyer")[0];
var salary = lawyerTag.<em>getAttribute("money")</em>; <span class="comment">// "5"</span>
<span class="comment">// array of 2 janitors</span>
var janitorTags = employeesTag.getElementsByTagName("janitor");
var excuse = janitorTags[1].<em>firstChild.nodeValue</em>; <span class="comment">// " too poor "</span>
</pre>
</div>
<div class="slide">
<h1>Analyzing a fetched XML file using DOM (2)</h1>
<pre class="examplecode xml" style="font-size: smaller">
<?xml version="1.0" encoding="UTF-8"?>
<employees>
<lawyer money="5"/>
<janitor name="Bill"><vacuumcleaner/></janitor>
<janitor name="Sue">too poor</janitor>
</employees>
</pre>
<p>
What are the results of the following expressions?
</p>
<pre class="examplecode js" style="font-size: smaller">
<span class="comment">// zeroth element of array of length 1</span>
var employeesTag = <em>ajax.responseXML</em>.getElementsByTagName("employees")<em>[0]</em>;
</pre>
<ul style="font-size: smaller">
<li>
<code>employeesTag.firstChild</code>
</li>
<li>
<code>ajax.responseXML.getElementsByTagName("lawyer")</code>
</li>
<li>
<code>employeesTag.getElementsByTagName("janitor").length</code>
</li>
<li>
<code>employeesTag.getElementsByTagName("janitor")[0].firstChild</code>
</li>
<li>
<code>employeesTag.getElementsByTagName("janitor")[1].firstChild</code>
</li>
<li>
<code>employeesTag.getElementsByTagName("janitor")[0].nextSibling</code>
</li>
</ul>
</div>
<div class="slide">
<h1>Larger XML file example</h1>
<pre class="examplecode xml" style="font-size: smaller">
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year><price>30.00</price>
</book>
<book category="computers">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<year>2003</year><price>49.99</price>
</book>
<book category="children">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year><price>29.99</price>
</book>
<book category="computers">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year><price>39.95</price>
</book>
</bookstore>
</pre>
</div>
<!--
<div class="slide">
<h1>Details about XML node <a href="http://www.w3schools.com/dom/dom_nodes_info.asp">properties</a></h1>
<ul>
<li><code>nodeType</code> : what kind of node it is
<table>
<tr><th class="spaced">Kind of node</th><th><code>nodeType</code> value</th></tr>
<tr><td>element</td><td>1</td></tr>
<tr><td>attribute</td><td>2</td></tr>
<tr><td>text</td><td>3</td></tr>
<tr><td>comment</td><td>8</td></tr>
<tr><td>document</td><td>9</td></tr>
</table>
</li>
<li><code>nodeName</code> : uppercase version of tag such as <code>"DIV"</code> or <code>"ARTICLE"</code>
<ul>
<li>an attribute node's name is the attribute's name</li>
<li>all text nodes have name <code>"#text"</code></li>
<li>document node has name <code>"#document"</code></li>
</ul>
</li>
<li><code>nodeValue</code> : text inside a text node, or value of an attribute node</li>
</ul>
</div>
-->
<div class="slide">
<h1>Navigating node tree example</h1>
<pre class="examplecode js narrow">
<span class="comment">// make a paragraph for each book about computers</span>
var books = ajax.responseXML.getElementsByTagName("book");
for (var i = 0; i < books.length; i++) {
var category = books[i].getAttribute("category");
if (category == "computers") {
<span class="comment">// extract data from XML</span>
var title = books[i].getElementsByTagName("title")[0].firstChild.nodeValue;
var author = books[i].getElementsByTagName("author")[0].firstChild.nodeValue;
<span class="comment">// make an XHTML <p> tag containing data from XML</span>
var p = document.createElement("p");
p.innerHTML = title + ", by " + author;
document.body.appendChild(p);
}
}
</pre>
</div>
<div class="slide">
<h1>Exercise: Late day distribution</h1>
<ul>
<li>Write a program that shows how many students turn homework in late for each assignment.</li>
<li>Data service here: <a href="http://webster.cs.washington.edu/cse190m/hw/hw.php"><code>http://webster.cs.washington.edu/cse190m/hw/hw.php</code></a>
<ul>
<li>parameter: <code>assignment=hw<var>N</var></code></li>
</ul>
</li>
</ul>
</div>
<div class="slide">
<h1>A historical interlude: why XHTML?</h1>
<ul>
<li>in XML, different "flavors" can be combined in single document</li>
<li>theoretical benefit of including other XML data in XHTML
<ul>
<li>nobody does this</li>
</ul>
</li>
<li>most embedded data are in non-XML formats (e.g., Flash)
<ul>
<li>non-XML data must be embedded another way (we'll talk about this later on)</li>
</ul>
</li>
<li>requires browser/plugin support for other "flavor" of XML
<ul>
<li>development slow to nonexistent</li>
<li>most XML flavors are specialized uses</li>
</ul>
</li>
</ul>
</div>
<div class="slide">
<h1>Exercise: Animal game</h1>
<ul>
<li>Write a program that guesses which animal the user is thinking of. The program will arrive at a guess based on the user's responses to yes or no questions. The questions come from a web app named <code>animalgame.php</code>.</li>
</ul>
<div class="centerfigure">
<img src="images/animalgame.png" alt="The Animal Game" />
</div>
</div>
<div class="slide">
<h1>Practice problem: Animal game (cont'd)</h1>
<ul>
<li>The data comes in the following format:
<pre class="syntaxtemplate xml">
<node nodeid="<var>id</var>">
<question><var>question</var></question>
<yes nodeid="<var>id</var>" />
<no nodeid="<var>id</var>" />
</node>
</pre>
<pre class="syntaxtemplate xml">
<node nodeid="<var>id</var>">
<answer><var>answer</var></answer>
</node>
</pre>
</li>
<li>to get a node with a given id: <code>animalgame.php?nodeid=<var>id</var></code>
</li>
<li>start by requesting the node with <code>nodeid</code> of <code>1</code> to get the first question</li>
</ul>
</div>
<div class="slide">
<h1>Attacking the problem</h1>
<p>Questions we should ask ourselves:</p>
<ul>
<li>How do I retrieve data from the web app? (what URL, etc.)</li>
<li>Once I retrieve a piece of data, what should I do with it?</li>
<li>When the user clicks "Yes", what should I do?</li>
<li>When the user clicks "No", what should I do?</li>
<li>How do I know when the game is over? What should I do in this case?</li>
</ul>
</div>
<div class="slide">
<h1>Debugging <code>responseXML</code> in Firebug</h1>
<div class="centerfigure">
<img src="images/firebug_debug_ajax.png" alt="Firebug Debug Ajax" style="max-width: 70%;" />
</div>
<ul>
<li>can examine the entire XML document, its node/tree structure</li>
</ul>
</div>
<!--#include virtual="commonbottom.html" -->