XML Overview
Kevin Grossnicklaus
SSE
Kvgros@sseinc.com
Overview
History
Basics
Comma Delimited Flat Files
Example:
"Kevin","Grossnicklaus","(555) 555-5555"
"George","Castanza","(111) 111-1111"
Drawbacks
Hard to describe
No relationships
Supports only simple kinds of data
What if a field includes control characters (CRs, LFs)?
No Typing
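To make the control-character and typing drawbacks concrete, here is a minimal sketch in Python. The record with a line break inside a quoted field is made up for illustration; it is not from the original example.

```python
import csv
import io

# Hypothetical data: the last field of the second record contains a CR/LF.
raw = (
    '"Kevin","Grossnicklaus","(555) 555-5555"\r\n'
    '"George","Castanza","Line one\r\nLine two"\r\n'
)

# Naive approach: one record per line, fields split on commas.
naive_rows = [line.split(",") for line in raw.splitlines()]

# A quote-aware parser keeps the embedded line break inside the field.
parsed_rows = list(csv.reader(io.StringIO(raw, newline="")))

print(len(naive_rows))    # the naive split sees three "records"
print(len(parsed_rows))   # the csv module sees two

# No typing: every value comes back as a plain string.
print(all(isinstance(v, str) for row in parsed_rows for v in row))
```

The naive split breaks the second record in half, and even the quote-aware parser has no way to say which fields are names, phone numbers, or numbers.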
HTML
Tag based
Derived from SGML
Designed for web based UI presentation
Few Rules
Some browsers (Hint: IE) do a little extra cleanup making us sloppy coders
Example of HTML
<body>
Here is an example of some HTML<br>
Notice that the above <br> put a line break in my code<br>
<table>
<tr>
<td>Kevin</td>
<td>Grossnicklaus</td>
</tr>
<tr>
<td>George</td>
<td>Castanza</td>
</tr>
</table>
Drawbacks of HTML to store data
Conflicting Standards
No forced structure
No way to specify language
Hard coded links
XML is a STANDARD
XML is a set of specifications from the World Wide Web Consortium (W3C)
Publicly, freely available specs at http://www.w3.org/TR
Universal industry support
Anyone can contribute, comment and implement
XML
Goals (as given in the version 1.0 spec)
XML shall be straightforwardly usable over the Internet
XML shall support a wide variety of applications
XML shall be compatible with SGML
It shall be easy to write programs which process XML documents
The number of optional features in XML is to be kept to the absolute minimum, ideally zero
XML documents should be human-legible and reasonably clear
The XML design should be prepared quickly
The design of XML shall be formal and concise
XML documents shall be easy to create
Terseness in XML markup is of minimal importance
XML Structure: Elements
Elements
With Content:
<ElementName>Content</ElementName>
Without Content
<ElementName/>
Note:  <ElementName/> = <ElementName></ElementName>
Can be nested
XML Structure: Attributes
Defined inside an element's start tag
<ElementName AttributeName="AttributeValue"></ElementName>
Unordered
Can be declared only once per element
Can be used in empty elements
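The element and attribute rules above can be tried out with Python's standard xml.etree.ElementTree module. This is only a sketch: the "Phone" element and its "Type" attribute are invented for the example.

```python
import xml.etree.ElementTree as ET

# Element with one attribute (attributes are name/value pairs, order does not matter).
person = ET.Element("person", {"Employee": "Yes"})

# Nested element with content.
first = ET.SubElement(person, "First_Name")
first.text = "Kevin"

# Nested empty element that still carries an attribute (hypothetical names).
ET.SubElement(person, "Phone", {"Type": "None"})

xml_text = ET.tostring(person, encoding="unicode")
print(xml_text)

# <ElementName/> and <ElementName></ElementName> parse to the same thing.
short_form = ET.fromstring("<ElementName/>")
long_form = ET.fromstring("<ElementName></ElementName>")
print(short_form.text, long_form.text, len(short_form), len(long_form))

# Declaring the same attribute twice on one element is rejected by the parser.
try:
    ET.fromstring('<ElementName A="1" A="2"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True
print(duplicate_rejected)
```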
XML Structure: Comments
Example:
<!-- Comment Text -->
Ignored by most XML processors
Basic Element Example
<?xml version="1.0"?>
<!-- This is a sample of a person XML -->
<people>
<person Employee="Yes">
<First_Name>Kevin</First_Name>
<Last_Name>Grossnicklaus</Last_Name>
</person>
<person Employee="No">
<First_Name>George</First_Name>
<Last_Name>Castanza</Last_Name>
</person>
</people>
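As a rough illustration of what a program sees, the people sample can be loaded with Python's standard library parser. The sketch assumes the sample is typed with plain straight quotes and a standard XML declaration; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!-- This is a sample of a person XML -->
<people>
  <person Employee="Yes">
    <First_Name>Kevin</First_Name>
    <Last_Name>Grossnicklaus</Last_Name>
  </person>
  <person Employee="No">
    <First_Name>George</First_Name>
    <Last_Name>Castanza</Last_Name>
  </person>
</people>"""

root = ET.fromstring(doc)  # the comment is skipped by the default parser

# Walk the nested elements and read the attribute on each person.
people = [
    (p.findtext("First_Name"), p.findtext("Last_Name"), p.get("Employee"))
    for p in root.findall("person")
]
employees = [first for first, _last, flag in people if flag == "Yes"]

print(root.tag)
print(people)
print(employees)
```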
XML RULES
The document must contain a single root element
Every element must be correctly nested
Each attribute can have only one value
All attribute values must be enclosed in double quotation marks or single quotation marks
Elements must have begin and end tags, unless they are empty elements
Empty elements are denoted by a single tag ending with a slash (/)
Isolated markup characters are not allowed in content.  The special characters <, &, and > are represented as &lt;, &amp;, and &gt; in content sections.
A double quotation mark is represented as &quot; and a single quotation mark is represented as &apos; in content sections
The sequence ]]> cannot be used in content, because it marks the end of a CDATA section (<![CDATA[ ... ]]>)
If a document does not have a DTD, the values for all attributes are treated as type CDATA by default
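The escaping rules are easy to get wrong by hand, so here is a small sketch using helpers from Python's standard library (xml.sax.saxutils). The sample strings and the "note" and "title" names are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

text = "AT&T says 1 < 2 > 0"

# escape() rewrites &, < and > as entity references.
escaped = escape(text)
print(escaped)

# The escaped text parses, and the parser hands back the original characters.
round_trip = ET.fromstring(f"<note>{escaped}</note>").text

# The raw text contains isolated markup characters, so it is rejected.
try:
    ET.fromstring(f"<note>{text}</note>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False

# quoteattr() picks suitable quotation marks and escapes the attribute value.
title = 'He said "hi" & left'
attr_round_trip = ET.fromstring(f"<note title={quoteattr(title)}/>").get("title")

print(round_trip == text, raw_accepted, attr_round_trip == title)
```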
Element Normal vs. Attribute Normal
Example 1:
<Person NameFirst="Kevin" NameLast="Grossnicklaus"/>
Example 2:
<Person>
<NameFirst>Kevin</NameFirst>
<NameLast>Grossnicklaus</NameLast>
</Person>
What is the difference?
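One practical difference shows up in the code that reads the data. The sketch below, using Python's standard library, reads the same person from both forms; it is meant as an illustration, not a recommendation of either style.

```python
import xml.etree.ElementTree as ET

attribute_normal = ET.fromstring(
    '<Person NameFirst="Kevin" NameLast="Grossnicklaus"/>'
)
element_normal = ET.fromstring(
    "<Person><NameFirst>Kevin</NameFirst><NameLast>Grossnicklaus</NameLast></Person>"
)

# Attribute normal: values are looked up by name on the element itself.
from_attributes = (
    attribute_normal.get("NameFirst"),
    attribute_normal.get("NameLast"),
)

# Element normal: values are the text of child elements, which keep their
# document order and could be repeated or nested further.
from_elements = (
    element_normal.findtext("NameFirst"),
    element_normal.findtext("NameLast"),
)
child_order = [child.tag for child in element_normal]

print(from_attributes == from_elements)
print(child_order)
```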
Other Hints
Element names ARE CASE SENSITIVE
Names must start with a letter or “_”.  Can include letters, digits, “_”, “.”, “-”, but NO SPACES
Indent your XML to make it human readable
Most applications ignore the white space used for indentation between elements
Well Formed vs. Valid XML
Well Formed XML – XML data that follows the XML syntax rules (single root element, correctly nested and closed tags, quoted attribute values).  No DTD is required.
Valid XML – Well formed XML data that references a DTD and has been validated against it
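A well-formedness check can be sketched with Python's standard library parser, which reports syntax errors but does not validate against a DTD (that would need a validating parser, which is outside this sketch). The helper name is_well_formed is made up.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False on any syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


good = "<Person><NameFirst>Kevin</NameFirst></Person>"
mismatched = "<Person><NameLast>Grossnicklaus</NameFirst></Person>"  # wrong end tag
two_roots = "<Person/><Person/>"                                     # no single root
unquoted = "<Person Employee=Yes/>"                                  # unquoted attribute

results = [is_well_formed(x) for x in (good, mismatched, two_roots, unquoted)]
print(results)
```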
Buzzwords
DTD – Defines the valid syntax of a class of XML documents.  A DTD is not itself an XML document.
XML Schema – The definition of a specific XML structure.  A schema is itself an XML document.
Programming XML
There are two widely embraced interface suites for dealing with XML: SAX and DOM
The Simple API for XML (SAX) is a streaming interface.  It models an XML document as a sequence of method calls on a receiver.
The Document Object Model (DOM) is a traversal interface.  It models an XML document as a hierarchical graph of typed nodes.
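The two styles can be compared side by side with Python's standard library (xml.sax and xml.dom.minidom). This is a minimal sketch; the NameCollector class and the shortened document are invented for the example.

```python
import xml.sax
from xml.dom import minidom

doc = (
    "<people>"
    "<person><First_Name>Kevin</First_Name></person>"
    "<person><First_Name>George</First_Name></person>"
    "</people>"
)


class NameCollector(xml.sax.ContentHandler):
    """SAX receiver: the parser calls these methods as it streams the document."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "First_Name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "First_Name":
            self.names.append("".join(self._buffer))
            self._in_name = False


# SAX: nothing is kept in memory except what the handler chooses to keep.
handler = NameCollector()
xml.sax.parseString(doc.encode("utf-8"), handler)
sax_names = handler.names

# DOM: the whole document is loaded as a tree of nodes, then traversed.
dom = minidom.parseString(doc)
dom_names = [n.firstChild.data for n in dom.getElementsByTagName("First_Name")]

print(sax_names)
print(dom_names)
```

SAX tends to suit large documents read once from start to finish; DOM is more convenient when the program needs to move around the document or change it.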
More Buzzwords
XPath – Language for querying specific XML data from an XML document.
XSLT – XML-based language for applying a formatting filter to an existing XML document
SOAP – XML format for making remote procedure calls through an HTTP interface
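To give a feel for XPath from the list above, here is a small sketch using the limited XPath subset built into Python's xml.etree.ElementTree (it is not a full XPath engine). The document is a shortened version of the people sample.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<people>"
    '<person Employee="Yes"><First_Name>Kevin</First_Name>'
    "<Last_Name>Grossnicklaus</Last_Name></person>"
    '<person Employee="No"><First_Name>George</First_Name>'
    "<Last_Name>Castanza</Last_Name></person>"
    "</people>"
)

# Last names of every person whose Employee attribute is "Yes".
employee_last_names = [
    e.text for e in root.findall("./person[@Employee='Yes']/Last_Name")
]

# Person elements that have a First_Name child with the text "George".
georges = root.findall("./person[First_Name='George']")

print(employee_last_names)
print([p.findtext("Last_Name") for p in georges])
```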
Microsoft.NET = XML
Visual Studio.NET
.NET Framework
SOAP, SOAP, SOAP
Links
Microsoft’s XML Developer Center
http://msdn.microsoft.com/xml/default.asp
Me
Kvgros@sseinc.com