wellformedness



XML: An Introduction…

An Introduction

XML gives you considerable power to choose your own XML element types and invent your own grammars to create custom-made markup languages.

Flexibility, however, can be dangerous for XML parsers if they don’t have some minimal rules to protect them. A parser dedicated to a single markup language, such as an HTML (not XHTML) browser, can accept some sloppiness in markup, because the set of tags is small and there is far less complexity in a web page. XML processors, on the other hand, have to be prepared for any kind of markup language, and thus a set of ground rules is necessary.

These ground rules are very simple syntax constraints. All tags must use the proper delimiters; an end tag must follow a start tag; elements can’t overlap; and so on. Documents that satisfy these rules are said to be wellformed and will thus pass the wellformedness check.

Wellformedness checks are done by a parser, which checks your document(s) to make sure they follow all of these rules. Some of the rules are listed in the following slides.

Rules For Creating Wellformed XML…

Rule 1: Elements are case-sensitive.

If you define your language to use lowercase elements, then all instances of those elements must be in lowercase. A language can, however, also be defined to use both uppercase and lowercase elements, or just uppercase elements. XHTML, for example, has been defined to use lowercase for all of its elements.

Bad Examples…

    <H1>Sample Heading</H1>    (XHTML)
    <h1>Sample Heading</H1>    (XHTML)
    <H1>Sample Heading</h1>    (XHTML)

These examples are all valid under HTML, but under XHTML they are invalid since they are in the incorrect case. The second and third are also not wellformed in any XML language, because the start and end tag names do not match.

Good Examples…

    <h1>Sample Heading</h1>               (XHTML)
    <Roster-Name>CIS97YT</Roster-Name>    (Roster Language)

Rule 2: All elements that contain text or other elements must have both start and end tags.

Thus in XHTML, for example, a paragraph is only correct if both a <p> and a </p> are present.

Bad Examples…

    XHTML:
    <ul>
      <li>item 1
      <li>item 2
      <li>item 3
    </ul>

    Kitchen Language:
    <list>
      <listitem>tomatoes
      <listitem>lettuce
      <listitem>green onion
    </list>

These examples are incorrect due to the missing </li> tags and the missing </listitem> tags.

Good Examples…

    XHTML:
    <ul>
      <li>item 1</li>
      <li>item 2</li>
      <li>item 3</li>
    </ul>

    Kitchen Language:
    <list>
      <listitem>tomatoes</listitem>
      <listitem>lettuce</listitem>
      <listitem>green onion</listitem>
    </list>

Rule 3: All empty elements (commonly known as standalone tags) must have a slash (/) before the end of the tag.

Thus in XHTML, for example, if you were to include a break or horizontal rule, you would enter <br /> or <hr />. In XHTML, you should also insert a space before the slash; other XML based languages do not require this space. The space keeps the documents compatible with older browsers.

Bad Examples…

    <img src="icon.png">             (XHTML)
    <graphic filename="icon.png">    (Graphic Language)

These examples are invalid since they are both empty tags missing the slash (/) at the end of the tag that concludes the element.

Good Examples…

    <img src="icon.png" />            (XHTML)
    <graphic filename="icon.png"/>    (Graphic Language)

Rule 4: All attributes must have a value.

Any attribute that had no value in HTML and also exists in XHTML now takes a value that is the same as the attribute name in lowercase. For example, the noshade attribute in a horizontal rule (<hr>) element will have a value of noshade and thus would appear as:

    <hr noshade="noshade"/>

Bad Examples…

    <hr noshade />                           (XHTML)
    <graphic filename="icon.png" border/>    (Graphic Language)

These examples are invalid since they are both elements having attributes without their required values.

Good Examples…

    <hr noshade="noshade" />                     (XHTML)
    <graphic filename="icon.png" border="1"/>    (Graphic Language)

Rule 5: All attribute values must be contained in quotes, either single or double – no exceptions!

Thus in XHTML, unlike HTML, all values must be contained in single or double quotes, including literals (strings) and numerical values, in order to be wellformed.

Bad Examples…

    <img src=icon.png />            (XHTML)
    <graphic filename=icon.png/>    (Graphic Language)

These examples are invalid since they are both elements with attributes whose values are not enclosed in quotes.

Good Examples…

    <img src="icon.png" />            (XHTML)
    <graphic filename="icon.png"/>    (Graphic Language)

Rule 6: Elements may not overlap.

Elements must be nested properly within other elements. An element cannot start before a subelement and end within that subelement.

Bad Examples…

    XHTML:
    <a>A bad example of <b>nesting</a> elements.</b>

    Kitchen Language:
    <list>
      <listitem>tomatoes</list>
    </listitem>

These examples are invalid since they are both examples of overlapping elements, or improper nesting.

Good Examples…

    XHTML:
    <a>A good example of <b>nesting</b> elements.</a>

    Kitchen Language:
    <list>
      <listitem>tomatoes</listitem>
    </list>

Rule 7: Isolated markup characters (characters essential to creating markup documents) may not appear in parsed content as is.

Isolated markup characters must be represented as a character entity and include the following: <, [, ], >, ', " and &. Strictly speaking, the XML specification forbids only a literal < and & in parsed content; escaping the others is always safe.

Isolated Markup Characters

    Character    Character entity
    <            &lt;
    [            &#91;
    ]            &#93;
    >            &gt;
    '            &apos;
    "            &quot;
    &            &amp;

Bad Examples…

    <h1>Jack & Jill</h1>           (XHTML)
    <equation>5 < 2</equation>     (Math Language)

These examples are invalid since they both include isolated markup characters that are not written as character entities.

Good Examples…

    <h1>Jack &amp; Jill</h1>         (XHTML)
    <equation>5 &lt; 2</equation>    (Math Language)

Rule 8: Character entities must start with an ampersand (&) and end with a semi-colon (;) – no exceptions!

Thus in XHTML, if you wanted to include a non-breaking space you would enter &nbsp; and not &nbsp (without a semi-colon).

Bad Examples…

    <h1>Jack &amp Jill</h1>         (XHTML)
    <equation>5 &lt 2</equation>    (Math Language)

These examples are invalid since they both omit the semi-colon that must follow the character entity.

Good Examples…

    <h1>Jack &amp; Jill</h1>         (XHTML)
    <equation>5 &lt; 2</equation>    (Math Language)

Rule 9: Numerical character entities must have a hash (#) immediately after the ampersand (&) – no exceptions!

Thus in XHTML, if you wanted to include a mid-sized dot you would enter &#183; and not &183; (without a hash).

Bad Examples…

    <div>&183; Mid-sized dot</div>    (XHTML)
    <equation>5 &60; 2</equation>     (Math Language)

These examples are invalid since they both omit the hash symbol that must follow the ampersand.

Good Examples…

    <div>&#183; Mid-sized dot</div>    (XHTML)
    <equation>5 &#60; 2</equation>     (Math Language)

Rule 10: Element (and attribute) names must start with either a letter (uppercase or lowercase) or an underscore.

Element names may contain letters, numbers, hyphens, periods and underscores. Colons are only allowed for namespaces.

Bad Examples…

    <bad*characters>
    <illegal space>
    <99number-start>

Good Examples…

    <example-one>
    <_example2>
    <Example.Three>

Checking For Wellformedness…

Checking for Wellformedness

Now that we know the rules, we must check ALL of our XML based documents to make sure they are wellformed, and we will do this using the Xerces parser.

At school Xerces is installed on both the Wintel and Unix machines, but at home you will need to download the xmljar files. They are located at the following URL:

    http://puma.deanza.fhda.edu/distribute/marie/CIS97YT%20-%20XML/Download%20Area/

If you are using Windows, proceed to the Windows directory, and if you are using Linux, proceed to the Linux directory.

If you do not have Java 1.3 or higher installed on your machine at home, please install that prior to installing your XML jar files.

[NOTE: If all this installation business scares you, you can always ssh (or telnet) to voyager and check your documents there.]

Once you have an environment that is capable of checking your documents for wellformedness, you must either download or construct a batch file (Wintel) or shell script (Unix). You can find these files in the same download area under batch files and shell scripts.

Wintel Checking for Wellformedness

The contents of the wellformed.bat batch file will look like the following:

    echo off
    java -cp f:\xmljar\xerces-2_1_0\xercesImpl.jar;f:\xmljar\xerces-2_1_0\xmlParserAPIs.jar;f:\xmljar\xerces-2_1_0\xercesSamples.jar dom.Counter %1

To run the checker, enter the following command at the DOS command line:

    > wellformed my_xml_instance_file.xml

Unix/Linux Checking for Wellformedness

The contents of the wellformed.sh shell script will look like the following:

    #!/bin/sh
    java -cp /usr/local/xmljar/xerces-2_0_0/xercesImpl.jar:/usr/local/xmljar/xerces-2_0_0/xmlParserAPIs.jar:/usr/local/xmljar/xerces-2_0_0/xercesSamples.jar dom.Counter "$1"

To run the checker, enter the following command at the Unix/Linux command line:

    > wellformed.sh my_xml_instance_file.xml

Don’t forget to set the script’s permissions to 755!

Checking for Wellformedness

Finally, as you are running your checker you will discover errors in your XML instance file. Note that every single time the parser finds an error, it will stop and tell you where the error is. Locate the error in your instance file, fix it, save the file and then check it again!

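The fix-and-recheck cycle described above can be tried out without Xerces. What follows is only a minimal sketch, assuming Python's standard-library xml.etree.ElementTree module (which is built on the expat parser, not Xerces). The first_error helper and the sample document are hypothetical names made up for illustration; they are not part of the course files.

```python
import xml.etree.ElementTree as ET


def first_error(xml_text):
    """Return None if xml_text is wellformed, else a (line, column, message) tuple."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


# Hypothetical instance document containing two mistakes:
# an isolated ampersand and a missing </listitem> end tag.
draft = """<list>
  <listitem>salt & pepper</listitem>
  <listitem>lettuce
</list>"""

# Fixes applied one at a time, the way you would edit and recheck by hand.
fixes = [
    ("salt & pepper", "salt &amp; pepper"),
    ("lettuce", "lettuce</listitem>"),
]

report = []
for old, new in fixes:
    report.append(first_error(draft))  # only the first problem is reported
    draft = draft.replace(old, new)
report.append(first_error(draft))      # None once every error is fixed

for entry in report:
    print(entry)
```

As with the Xerces checker, each run reports only the first problem the parser meets, so a document with several mistakes needs several passes.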
Repeat this procedure until you get a positive message stating ONLY your filename followed by a few statistics:

    myFile.xml: 148;33;0 ms (16 elems, 8 attrs, 0 spaces, 245 chars)

Other Notes…

More about XML Based Documents

All XML and XHTML documents must technically begin with the XML specification processing instruction (formally called the XML declaration), which must appear before any other mark-up or comments. The XML specification processing instruction (PI) must include the version attribute along with the version of XML that the document is based upon. The XML specification will look like the following:

    <?xml version="1.0"?>

Warning: When Internet Explorer interprets an XHTML document on the web, the PI must not be present.

The XML specification PI tag also supports several other attributes, including the encoding attribute and the standalone attribute.

The encoding attribute allows us to specify how the document is encoded. The default value is UTF-8, but you can specify other values as well. For our assignments we will assume UTF-8.

The standalone attribute allows us to indicate whether or not there is a grammar file associated with this XML instance document. By default it is assumed that there is a grammar file, which gives the standalone attribute a value of ‘no’; if there isn’t one, you need to specify this with a value of ‘yes’.

If we were to include all of the attributes within the XML specification PI for a document using the default attribute values, it would look like:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>

Given that we know the location of the grammar that XML and XHTML documents are based upon, a DOCTYPE tag will also be included AFTER the XML specification processing instruction. The DOCTYPE links to the Document Type Definition (DTD) that the document should be based upon, and it is used to validate the document.

The XHTML DOCTYPE tag will look like the following:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

XML Design Considerations…

XML Document Design Goals

The design goals for XML according to the W3C are:

1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.

Lab Work..

Now that we have finished the first Unit, you can begin your FIRST assignment. Check the Calendar for the due date!

Next Time..

Next time we will investigate ways in which we can present our XML in the browser, including CSS and Data Islands.
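As a recap of the wellformedness rules in this unit, the sketch below runs one bad and one good example per rule through a parser. It is an illustration only, assuming Python's standard-library xml.etree.ElementTree (expat) rather than the Xerces setup used in class. The is_wellformed helper and the EXAMPLES table are hypothetical names; the snippets are taken from, or modelled on, the slide examples. The parser checks wellformedness only and does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_wellformed(xml_text):
    """Return True if xml_text parses as a wellformed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# (label, bad example, good example)
EXAMPLES = [
    ("Rule 1, case-sensitive names",
     "<h1>Sample Heading</H1>", "<h1>Sample Heading</h1>"),
    ("Rule 2, end tags required",
     "<ul><li>item 1<li>item 2</ul>", "<ul><li>item 1</li><li>item 2</li></ul>"),
    ("Rule 3, empty elements need a slash",
     '<img src="icon.png">', '<img src="icon.png" />'),
    ("Rule 4, attributes need values",
     "<hr noshade />", '<hr noshade="noshade" />'),
    ("Rule 5, attribute values are quoted",
     "<img src=icon.png />", '<img src="icon.png" />'),
    ("Rule 6, no overlapping elements",
     "<a>bad <b>nesting</a> here</b>", "<a>good <b>nesting</b> here</a>"),
    ("Rule 7, escape markup characters",
     "<h1>Jack & Jill</h1>", "<h1>Jack &amp; Jill</h1>"),
    ("Rule 8, entities end with a semi-colon",
     "<h1>Jack &amp Jill</h1>", "<h1>Jack &amp; Jill</h1>"),
    ("Rule 9, numeric entities need a hash",
     "<div>&183; Mid-sized dot</div>", "<div>&#183; Mid-sized dot</div>"),
    ("Rule 10, legal element names",
     "<99number-start/>", "<_example2/>"),
    ("XML specification PI comes first",
     '<!-- note --><?xml version="1.0"?><a/>', '<?xml version="1.0"?><a/>'),
]

# Each entry records whether the parser accepted the bad and the good example.
results = [(label, is_wellformed(bad), is_wellformed(good))
           for label, bad, good in EXAMPLES]

for label, bad_ok, good_ok in results:
    print(f"{label}: bad accepted={bad_ok}, good accepted={good_ok}")
```

Every bad example should be rejected and every good one accepted. The snippets for Rules 1 and 3 fail here because the tags are mismatched or unclosed, not because of any XHTML-specific casing rule, which only a validating parser with the XHTML DTD would enforce.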

This note was uploaded on 12/11/2011 for the course CIS 92 taught by Professor Taylor-harper during the Fall '11 term at DeAnza College.
