File:
[LON-CAPA] /
doc /
gutshtml /
SessionFou1.html
Revision
1.1:
Fri Jun 28 20:30:29 2002 UTC (22 years, 3 months ago) by
www
Branches:
MAIN
CVS tags:
version_0_99_3,
version_0_99_2,
version_0_99_1,
version_0_99_0,
version_0_6_2,
version_0_6,
version_0_5_1,
version_0_5,
version_0_4,
stable_2002_july,
conference_2003,
STABLE,
HEAD
HTML version of GUTS manual. Individual files will still need cleanup.
1: <html>
2: <head>
3: <meta name=Title
4: content="Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style Files) (Guy)">
5: <meta http-equiv=Content-Type content="text/html; charset=macintosh">
6: <link rel=Edit-Time-Data href="Session%20Fou1_files/editdata.mso">
7: <title>Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style
8: Files) (Guy)</title>
9: <style><!--
10: .MsoHeader
11: {tab-stops:center 3.0in right 6.0in;
12: font-size:10.0pt;
13: font-family:"Times New Roman";}
14: .MsoPlainText
15: {font-size:10.0pt;
16: font-family:"Courier New";}
17: .Section1
18: {page:Section1;}
19: .Section2
20: {page:Section2;}
21: -->
22: </style>
23: </head>
24: <body bgcolor=#FFFFFF link=blue vlink=purple class="Normal" lang=EN-US>
25: <div class=Section1>
26: <h2>Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style
27: Files) (Guy)</h2>
28: <h3><a name="_Toc421867121">XML Files</a></h3>
29: <p><span style='color:black'>All HTML / XML files are run through the lonxml
30: handler before being served to a user. This allows us to rewrite many portions
31: of a document and to support server-side tags. There are two ways to add new
32: tags to the XML parsing engine: through LON-CAPA style files, or by
33: writing Perl tag handlers for the desired tags. </span></p>
34: <p><span style='color:black'><b>Global Variables</b></span></p>
35: <p><span style='color:black'>*
36: <i>$Apache::lonxml::debug</i></span><span
37: style='color:black'> - debugging control </span></p>
38: <p><span style='color:black'>*
39: <i>@Apache::lonxml::pwd</i></span><span
40: style='color:black'> - path to the directory containing the file currently being
41: processed </span></p>
42: <p><span style='color:black'>*
43: <i>@Apache::lonxml::outputstack</i></span><span
44: style='color:black'> </span></p>
45: <p><span style='color:black'>* <i>$Apache::lonxml::redirection</i></span><span
46: style='color:black'> - this and <i>@Apache::lonxml::outputstack</i> are used for capturing a subset of the output
47: for later processing. Do not touch them directly; use &startredirection
48: and &endredirection instead. </span></p>
49: <p><span style='color:black'>*
50: <i>$Apache::lonxml::import</i></span><span
51: style='color:black'> - controls whether the <import> tag actually does anything
52: </span></p>
53: <p><span style='color:black'>*
54: <i>@Apache::lonxml::extlinks</i></span><span
55: style='color:black'> - a list of URLs that the user is allowed to look at because
56: of the current resource (images, and links) </span></p>
57: <p><span style='color:black'>*
58: <i>$Apache::lonxml::metamode</i></span><span
59: style='color:black'> - some output is turned off because the meta target wants a specific
60: subset; use <output> to guarantee that the contained data will be in
61: the parsing output </span></p>
62: <p><span style='color:black'>*
63: <i>$Apache::lonxml::evaluate</i></span><span
64: style='color:black'> - controls whether run::evaluate actually dereferences variable
65: references </span></p>
66: <p><span style='color:black'>*
67: <i>%Apache::lonxml::insertlist</i></span><span
68: style='color:black'> - data structure for edit mode, determines what tags can
69: go into what other tags </span></p>
70: <p><span style='color:black'>*
71: <i>@Apache::lonxml::namespace</i></span><span
72: style='color:black'> - stores the list of tag namespaces used in the insertlist.tab
73: file that are currently active, used only in edit mode. </span></p>
74: <p><span style='color:black'>*
75: <i>$Apache::lonxml::registered</i></span><span
76: style='color:black'> - set to 1 once the remote has been updated to know what
77: resource we are looking at. </span></p>
78: <p><span style='color:black'>*
79: <i>$Apache::lonxml::request</i></span><span
80: style='color:black'> - current Apache request object, or undef </span></p>
81: <p><span style='color:black'>*
82: <i>$Apache::lonxml::curdepth</i></span><span
83: style='color:black'> - current position in the overall parse. Will be a string
84: like 2_3_1 (the first tag in the third second-level tag in the second top-level
85: tag). It gets set by callsub, and can be used in Perl tag implementations.
86: It relies upon the internal globals: <i>@Apache::lonxml::depthcounter</i></span><span
87: style='color:black'>, <i>$Apache::lonxml::depth</i></span><span
88: style='color:black'>, <i>$Apache::lonxml::olddepth</i></span><span
89: style='color:black'> </span></p>
90: <p><span style='color:black'>*
91: <i>$Apache::lonxml::prevent_entity_encode</i></span><span
92: style='color:black'> - by default the XML parser will try to re-encode any 8-bit
93: characters into HTML entity codes. If this is set to a true value, that re-encoding is
94: prevented. </span></p>
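<p><span style='color:black'>The <i>$Apache::lonxml::curdepth</i> string described above can be illustrated with a minimal sketch. This is not the lonxml implementation: the variables below are local stand-ins for the internal depth globals, and the helpers open_tag and close_tag are hypothetical names used only to show how a label such as 2_3_1 could be derived from per-level counters.</span></p>

```perl
use strict;
use warnings;

# Local stand-ins (assumptions) for the internal depth bookkeeping.
my @depthcounter;    # one counter per nesting level
my $depth = -1;      # -1 means "outside every tag"

# Hypothetical helper: called when a start tag is seen.
sub open_tag {
    $depth++;
    $depthcounter[$depth]++;      # next sibling at this level
    $#depthcounter = $depth;      # forget counters left over from deeper levels
    return join('_', @depthcounter[0 .. $depth]);
}

# Hypothetical helper: called when an end tag is seen.
sub close_tag { $depth--; }

# Walk a small tag sequence and record the label given to each start tag.
my @labels;
for my $event (qw(open close open open close open close open open)) {
    if ($event eq 'open') { push @labels, open_tag(); }
    else                  { close_tag(); }
}
print join(' ', @labels), "\n";
```

<p><span style='color:black'>The last start tag in this walk is the first tag inside the third second-level tag of the second top-level tag, so it receives the label 2_3_1.</span></p>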
95: <p><span style='color:black'>In common usage, <i>$Apache::lonxml::prevent_entity_encode</i></span><span
96: style='color:black'>, <i>$Apache::lonxml::evaluate</i></span><span
97: style='color:black'>, <i>$Apache::lonxml::metamode</i></span><span
98: style='color:black'>, <i>$Apache::lonxml::import</i></span><span
99: style='color:black'> should never be set to a value directly, but rather incremented
100: when you want the effect on, and decremented when you want the effect off.
101: </span></p>
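<p><span style='color:black'>The increment/decrement convention can be sketched as follows. The script does not load Apache::lonxml; it only declares a variable with the same name so that it runs standalone, and the helper with_metamode is a hypothetical name introduced for illustration. The point is that nested code can switch the effect on and off without clobbering the state an outer caller relies on.</span></p>

```perl
use strict;
use warnings;

# Stand-in for the real global; Apache::lonxml itself is not loaded here.
$Apache::lonxml::metamode = 0;

# Hypothetical helper: run $code with the effect switched on, then restore.
sub with_metamode {
    my ($code) = @_;
    $Apache::lonxml::metamode++;             # effect on
    my @result = eval { $code->() };
    my $err = $@;
    $Apache::lonxml::metamode--;             # effect off for this level only
    die $err if $err;
    return @result;
}

my @seen;
with_metamode(sub {
    push @seen, $Apache::lonxml::metamode;                         # outer level
    with_metamode(sub { push @seen, $Apache::lonxml::metamode });  # nested level
    push @seen, $Apache::lonxml::metamode;                         # outer level again
});
push @seen, $Apache::lonxml::metamode;                             # everything off
print join(',', @seen), "\n";
```

<p><span style='color:black'>Had the nested code assigned 0 on exit instead of decrementing, the outer level would have lost the effect before it was finished.</span></p>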
102: <p><span style='color:black'><b>Notable Perl subroutines</b></span></p>
103: <p><span style='color:black'>If not specified these functions are in Apache::lonxml
104: </span></p>
105: <p><span style='color:black'>*
106: <i>xmlparse</i></span><span
107: style='color:black'> - see the XMLPARSE figure. It is not callable from inside
108: a tag; if one needs to restart parsing, either add a new LCParser to
109: the parser stack using the newparser function, or call inner_xmlparse
110: (see the xmlparse function in scripttag.pm). </span></p>
111: <p><span style='color:black'>*
112: <i>recurse</i></span><span
113: style='color:black'> - acts just like <i>xmlparse</i></span><span
114: style='color:black'>, except it doesn't do the style definition check; it always
115: calls <i>callsub</i></span><span style='color:black'> </span></p>
116: <p><span style='color:black'>*
117: <i>callsub</i></span><span
118: style='color:black'> - checks whether a Perl subroutine is defined for the current
119: tag and calls it. Otherwise it just returns the tag as it was read in. It also
120: adds a default editing interface unless the tag has a defined subroutine
121: that either returns something or requests that callsub not add the editing
122: interface. </span></p>
123: <p><span style='color:black'>*
124: <i>afterburn</i></span><span
125: style='color:black'> - called on the output of xmlparse; it can add highlights,
126: anchors, and links to regular expression matches in the output. </span></p>
127: <p><span style='color:black'>*
128: <i>register_insert</i></span><span
129: style='color:black'> - builds the %Apache::lonxml::insertlist structure of what
130: tags can have what other tags inside. </span></p>
131: <p><span style='color:black'>*
132: <i>whichuser</i></span><span
133: style='color:black'> - returns a list of $symb, $courseid, $domain, $name that
134: is correct for calls to lonnet functions for this setup. Uses form.grade_
135: parameters, if the user is allowed to mgr in the course </span></p>
136: <p><span style='color:black'>*
137: <i>setup_globals</i></span><span
138: style='color:black'> - initializes all lonxml globals when xmlparse is called.
139: If you intend to create a new target you will likely need to tweak how the
140: globals are set up at startup. </span></p>
141: <p><span style='color:black'>*
142: <i>init_safespace</i></span><span
143: style='color:black'> - creates Holes to external functions, creates some global
144: variables, and sets the permitted operators of the global Safe space interpreter.
145: </span></p>
146: <p><span style='color:black'><b>Functions Tag Handlers can use</b></span></p>
147: <p><span style='color:black'>If not specified these functions are in Apache::lonxml
148: </span></p>
149: <p><span style='color:black'>*
150: <i>debug</i></span><span
151: style='color:black'> - a function to call to print out debugging messages. Will
152: only print when $Apache::lonxml::debug is set to 1 </span></p>
153: <p><span style='color:black'>*
154: <i>warning</i></span><span
155: style='color:black'> - a function to use for warning messages. The message will
156: appear at the top of a resource when it is viewed in construction space only.
157: </span></p>
158: <p><span style='color:black'>*
159: <i>error</i></span><span
160: style='color:black'> - a function to use for error messages. The message will
161: appear at the top of a resource when it is viewed in construction space; otherwise
162: it is sent to the resource author and course instructor, and the
163: student is informed that an error has occurred. </span></p>
164: <p><span style='color:black'>*
165: <i>get_all_text</i></span><span
166: style='color:black'> - 2 args: the tag to look for (use /tag to look for an
167: end tag) and an HTML::TokeParser reference. It will repeatedly get text from
168: the TokeParser until the requested tag is found, and will return all of the
169: document it pulled from the TokeParser. (See Apache::scripttag::start_script
170: for an example of usage.) </span></p>
171: <p><span style='color:black'>*
172: <i>get_param</i></span><span
173: style='color:black'> - 4 arguments, first is a scalar string naming the argument needed,
174: second is a reference to the parser arguments stack, third is a reference
175: to the Safe space, and fourth is an optional "context" value. This
176: subroutine allows a tag to get a tag argument, after being interpolated inside
177: the Safe space. This should be used if the tag might use a safe space variable
178: reference for the tag argument. (See Apache::scripttag::start_script for an
179: example.) This version only handles scalar variables. </span></p>
180: <p><span style='color:black'>*
181: <i>get_param_var</i></span><span
182: style='color:black'> - 4 arguments, first is a scalar string naming the argument needed,
183: second is a reference to the parser arguments stack, third is a reference
184: to the Safe space, and fourth is an optional "context" value. This
185: subroutine allows a tag to get a tag argument, after being interpolated inside
186: the Safe space. This should be used if the tag might use a safe space variable
187: reference for the tag argument. (See Apache::scripttag::start_script for an
188: example.) This version can handle list or hash variables properly. </span></p>
189: <p><span style='color:black'>*
190: <i>description</i></span><span
191: style='color:black'> - 1 argument, the token object. This will return the textual
192: description of the current tag from the insertlist.tab file. </span></p>
193: <p><span style='color:black'>*
194: <i>whichuser</i></span><span
195: style='color:black'> - 0 arguments. This will take a look at the current environment
196: setting and return the current $symb, $courseid, $udom, $uname. You should
197: always use this function if you want to determine who the current user is.
198: (An instructor might be trying to view a student's version of a resource.)
199: </span></p>
200: <p><span style='color:black'>*
201: <i>inner_xmlparse</i></span><span
202: style='color:black'> - 6 arguments: the target, an array pointer to the current
203: stack of tags, an array pointer to the current stack of tag arguments, an
204: array pointer to the current stack of LCParsers, a pointer to the current
205: Safe space, and a pointer to the hash of current style definitions. </span></p>
206: <p><span style='color:black'>*
207: <i>newparser</i></span><span
208: style='color:black'> - 3 args: first is a reference to the parser stack, second
209: should be a reference to a scalar string containing the text the new parser should
210: run over, third should be a scalar holding the directory path of the file the parser
211: is parsing. (See Apache::scripttag::start_import for an example.) </span></p>
212: <p><span style='color:black'>*
213: <i>register</i></span><span
214: style='color:black'> - should be called in a file's BEGIN block. 2 arguments,
215: a scalar string and a list of strings. This allows a file to register which
216: tags it handles, and what the namespace of those tags is. Example: </span></p>
217: <p><span style='font-family:"Courier New";color:black'>sub BEGIN {</span></p>
218: <p><span style='font-family:"Courier New";color:black'> &Apache::lonxml::register('Apache::scripttag',('script','display'));</span></p>
219: <p><span style='font-family:"Courier New";color:black'>}</span></p>
220: <p><span style='color:black'>This tells xmlparse that in Apache::scripttag it
221: can find handlers for <script> and <display>. If one registers
222: a tag that was already registered, the previous registration is remembered and will
223: be restored on a deregister. </span></p>
224: <p><span style='color:black'>*
225: <i>deregister</i></span><span
226: style='color:black'> - used to remove a previously registered tag implementation.
227: It will restore the previous registration if there was one. </span></p>
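<p><span style='color:black'>The remember-and-restore behaviour of register and deregister can be pictured as a stack of registrations per tag. The sketch below is an assumption about how such bookkeeping might look, not the lonxml code: the subroutines are local stand-ins defined in this script, handler_for is a hypothetical lookup helper, and My::Override is a made-up package name.</span></p>

```perl
use strict;
use warnings;

# Hypothetical bookkeeping: tag name => stack of package names, newest last.
my %handlers;

# Local stand-in for register: package name first, then the tags it handles.
sub register {
    my ($package, @tags) = @_;
    push @{ $handlers{$_} }, $package for @tags;
}

# Local stand-in for deregister: drop this package's newest registration.
sub deregister {
    my ($package, @tags) = @_;
    for my $tag (@tags) {
        my $stack = $handlers{$tag} or next;
        for my $i (reverse 0 .. $#$stack) {
            if ($stack->[$i] eq $package) { splice @$stack, $i, 1; last; }
        }
        delete $handlers{$tag} unless @$stack;
    }
}

# Hypothetical lookup: the newest registration wins.
sub handler_for {
    my ($tag) = @_;
    my $stack = $handlers{$tag} or return;
    return $stack->[-1];
}

register('Apache::scripttag', 'script', 'display');
register('My::Override', 'display');          # shadows the earlier handler
my $during = handler_for('display');
deregister('My::Override', 'display');        # earlier handler is restored
my $after = handler_for('display');
print "$during then $after\n";
```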
228: <p><span style='color:black'>*
229: <i>startredirection</i></span><span
230: style='color:black'> - used when a tag wants to save a portion of the document
231: for its end tag to use, but wants the intervening document to be normally
232: processed. (See Apache::scripttag::start_window for an example.) </span></p>
233: <p><span style='color:black'>*
234: <i>endredirection</i></span><span
235: style='color:black'> - used to end the capture begun by startredirection, so that xmlparse stops hiding output. The
236: return value is everything that xmlparse has processed since the corresponding
237: startredirection. (See Apache::scripttag::end_window for an example.) </span></p>
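<p><span style='color:black'>The capture performed by startredirection and endredirection can be sketched with a stack of output buffers. This is a simplified model under stated assumptions, not the lonxml implementation: the two variables are local stand-ins for the output stack and redirection counter, the subroutines are defined in this script, and emit is a hypothetical helper standing in for the parser appending output.</span></p>

```perl
use strict;
use warnings;

# Local stand-ins (assumptions) for the output stack and redirection counter.
my @outputstack = ('');    # one buffer per nesting level, innermost last
my $redirection = 0;

# Hypothetical helper: the parser appends processed output to the innermost buffer.
sub emit { $outputstack[-1] .= $_[0]; }

# Start capturing: later output goes to a fresh buffer.
sub startredirection { $redirection++; push @outputstack, ''; }

# Stop capturing: hand back everything processed since the matching start.
sub endredirection {
    die "endredirection without startredirection\n" unless $redirection;
    $redirection--;
    return pop @outputstack;
}

emit('before ');
startredirection();
emit('captured text');          # processed normally, but kept out of the main output
my $captured = endredirection();
emit('after');
print "main: $outputstack[0]\n";
print "captured: $captured\n";
```

<p><span style='color:black'>An end tag handler would typically take the captured string and decide how to place it in the final output.</span></p>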
238: <p><span style='color:black'>*
239: <i>Apache::run::evaluate</i></span><span
240: style='color:black'> - 3 args, first a string, second a reference to the Safe
241: space, third a string to be evaluated before the first arg. This subroutine will
242: do variable interpolation and simple function interpolations on the first
243: argument. (See Apache::lonxml::inner_xmlparse for an example.) </span></p>
244: <p><span style='color:black'>*
245: <i>Apache::run::run</i></span><span
246: style='color:black'> - 2 args, first a string, second a reference to the Safe
247: space. This handles passing the passed string into the Safe space for evaluation
248: and then returns the result. (See Apache::scripttag::start_script for an example.)</span></p>
249: <h3><a name="_Toc421867122">Style Files</a></h3>
250: <p><span style='color:black'> <img width=432 height=255
251: src="Session%20Fou1_files/image002.jpg" v:shapes="_x0000_i1025"> </span></p>
252: <p><span style='font-size:14.0pt;color:black'><b>Fig. 2.4.1</b></span><span
253: style='font-size:14.0pt;color:black'> - Using a style file</span></p>
254: <p><span style='color:black'><b>Style File specific tags</b></span></p>
255: <p><span style='color:black'><b><definetag></b></span><span
256: style='color:black'> - 2 arguments, <i>name</i></span><span style='color:black'>
257: the name of the new tag being defined (required; if preceded by a /, an end tag is being defined);
258: <i>parms</i></span><span style='color:black'> the parameters of the
259: new tag; the value of each parameter can be accessed as $parametername. </span></p>
260: <p><span style='color:black'>*
261: <b><render></b></span><span
262: style='color:black'> - define what the new tag does for a non meta target </span></p>
263: <p><span style='color:black'>*
264: <b><meta></b></span><span
265: style='color:black'> - define what the new tag does for a meta target </span></p>
266: <p><span style='color:black'>*
267: <b><tex> / <web> / <latexsource></b></span><span style='color:black'>
268: - define what a new tag does for a specific non-meta target; all data inside
269: a <render> is rendered to all targets except when surrounded by a specific
270: target tag.</span><span style='font-size:16.0pt;color:black'> </span></p>
271: <p class=MsoHeader> <img width=432 height=243
272: src="Session%20Fou1_files/image005.png" v:shapes="_x0000_i1026"> </p>
273: <p><span style='font-size:14.0pt'><b>Fig. 2.4.2</b></span><span
274: style='font-size:14.0pt'> - The parser</span></p>
275: <h3><a name="_Toc421867123">HTML::LCParser - Alternative HTML::Parser interface</a></h3>
276: <p class=MsoPlainText>SYNOPSIS</p>
277: <p class=MsoPlainText> require HTML::LCParser;</p>
278: <p class=MsoPlainText> $p = HTML::LCParser->new("index.html")
279: || die "Can't open: $!";</p>
280: <p class=MsoPlainText> while (my $token = $p->get_token) {</p>
281: <p class=MsoPlainText> #...</p>
282: <p class=MsoPlainText> }</p>
283: <p class=MsoPlainText>DESCRIPTION</p>
284: <p class=MsoPlainText>HTML::LCParser is an alternative interface
285: to the</p>
286: <p class=MsoPlainText>HTML::Parser class. It is an HTML::PullParser
287: subclass.</p>
288: <p class=MsoPlainText>The following methods are available:</p>
289: <p class=MsoPlainText>* $p = HTML::LCParser->new( $file_or_doc );</p>
290: <p class=MsoPlainText>The object constructor argument is either a file name,
291: a file handle</p>
292: <p class=MsoPlainText>object, or the complete document to be parsed.</p>
293: <p class=MsoPlainText>If the argument is a plain scalar, then it is taken as
294: the name of a</p>
295: <p class=MsoPlainText>file to be opened and parsed. If the file can't
296: be opened for</p>
297: <p class=MsoPlainText>reading, then the constructor will return an undefined
298: value and $!</p>
299: <p class=MsoPlainText>will tell you why it failed.</p>
300: <p class=MsoPlainText>If the argument is a reference to a plain scalar, then
301: this scalar is</p>
302: <p class=MsoPlainText>taken to be the literal document to parse. The value
303: of this</p>
304: <p class=MsoPlainText>scalar should not be changed before all tokens have been
305: extracted.</p>
306: <p class=MsoPlainText>Otherwise the argument is taken to be some object that
307: the</p>
308: <p class=MsoPlainText>HTML::LCParser can read() from when it needs
309: more data. Typically</p>
310: <p class=MsoPlainText>it will be a filehandle of some kind. The stream
311: will be read() until</p>
312: <p class=MsoPlainText>EOF, but not closed.</p>
313: <p class=MsoPlainText>It also will turn attr_encoded on by default.</p>
314: <p class=MsoPlainText>* $p->get_token</p>
315: <p class=MsoPlainText>This method will return the next token found
316: in the HTML document,</p>
317: <p class=MsoPlainText>or undef at the end of the document. The
318: token is returned as an</p>
319: <p class=MsoPlainText>array reference. The first element of the array
320: will be a (mostly)</p>
321: <p class=MsoPlainText>single character string denoting the type of this token:
322: "S" for start</p>
323: <p class=MsoPlainText>tag, "E" for end tag, "T" for text,
324: "C" for comment, "D" for</p>
325: <p class=MsoPlainText>declaration, and "PI" for processing instructions.
326: The rest of the array</p>
327: <p class=MsoPlainText>is the same as the arguments passed to the corresponding
328: HTML::Parser</p>
329: <p class=MsoPlainText>v2 compatible callbacks (see HTML::Parser).
330: In summary, returned</p>
331: <p class=MsoPlainText>tokens look like this:</p>
332: <p class=MsoPlainText> ["S", $tag, $attr, $attrseq, $text,
333: $line]</p>
334: <p class=MsoPlainText> ["E", $tag, $text, $line]</p>
335: <p class=MsoPlainText> ["T", $text, $is_data, $line]</p>
336: <p class=MsoPlainText> ["C", $text, $line]</p>
337: <p class=MsoPlainText> ["D", $text, $line]</p>
338: <p class=MsoPlainText> ["PI", $token0, $text, $line]</p>
339: <p class=MsoPlainText>where $attr is a hash reference, $attrseq is an array
340: reference and</p>
341: <p class=MsoPlainText>the rest are plain scalars.</p>
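<p class=MsoPlainText>The token shapes listed above can be handled by dispatching on the first array element. The sketch below uses hand-built tokens rather than a real parser, so that it runs without HTML::LCParser installed; describe_token is a hypothetical helper name, and the sample values are invented for illustration.</p>

```perl
use strict;
use warnings;

# Hand-built tokens in the shapes listed above; no parser is loaded.
my @tokens = (
    ['S', 'a', { href => 'index.html' }, ['href'], '<a href="index.html">', 1],
    ['T', 'Home', 0, 1],
    ['E', 'a', '</a>', 1],
    ['C', '<!-- note -->', 2],
);

# Hypothetical helper: summarise one token according to its type code.
sub describe_token {
    my ($token) = @_;
    my ($type, @rest) = @$token;
    if ($type eq 'S') {
        my ($tag, $attr, $attrseq) = @rest;
        # $attrseq keeps the attribute order; $attr maps names to values.
        my $attrs = join ' ', map { "$_=$attr->{$_}" } @$attrseq;
        return "start <$tag> [$attrs]";
    }
    return "end </$rest[0]>" if $type eq 'E';
    return "text '$rest[0]'" if $type eq 'T';
    return "other ($type)";
}

my @described = map { describe_token($_) } @tokens;
print "$_\n" for @described;
```

<p class=MsoPlainText>In a real loop the tokens would come from $p->get_token instead of the hand-built list.</p>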
342: <p class=MsoPlainText>* $p->unget_token($token,...)</p>
343: <p class=MsoPlainText>If you find out you have read too many tokens you can
344: push them back,</p>
345: <p class=MsoPlainText>so that they are returned the next time $p->get_token
346: is called.</p>
347: <p class=MsoPlainText>* $p->get_tag( [$tag, ...] )</p>
348: <p class=MsoPlainText>This method returns the next start or end tag (skipping
349: any other</p>
350: <p class=MsoPlainText>tokens), or undef if there are no more tags in
351: the document. If</p>
352: <p class=MsoPlainText>one or more arguments are given, then we skip tokens until
353: one of the</p>
354: <p class=MsoPlainText>specified tag types is found. For example:</p>
355: <p class=MsoPlainText> $p->get_tag("font", "/font");</p>
356: <p class=MsoPlainText>will find the next start or end tag for a font-element.</p>
357: <p class=MsoPlainText>The tag information is returned as an array reference
358: in the same form</p>
359: <p class=MsoPlainText>as for $p->get_token above, but the type code (first
360: element) is</p>
361: <p class=MsoPlainText>missing. A start tag will be returned like this:</p>
362: <p class=MsoPlainText> [$tag, $attr, $attrseq, $text]</p>
363: <p class=MsoPlainText>The tagname of end tags are prefixed with "/",
364: i.e. end tag is</p>
365: <p class=MsoPlainText>returned like this:</p>
366: <p class=MsoPlainText> ["/$tag", $text]</p>
367: <p class=MsoPlainText>* $p->get_text( [$endtag] )</p>
368: <p class=MsoPlainText>This method returns all text found at the current position.
369: It will</p>
370: <p class=MsoPlainText>return a zero length string if the next token is not text.
371: The</p>
372: <p class=MsoPlainText>optional $endtag argument specifies that any text occurring
373: before the</p>
374: <p class=MsoPlainText>given tag is to be returned. All entities are unmodified.</p>
375: <p class=MsoPlainText>The $p->{textify} attribute is a hash that defines
376: how certain tags can</p>
377: <p class=MsoPlainText>be treated as text. If the name of a start tag matches
378: a key in this</p>
379: <p class=MsoPlainText>hash then this tag is converted to text. The hash
380: value is used to</p>
381: <p class=MsoPlainText>specify which tag attribute to obtain the text from.
382: If this tag</p>
383: <p class=MsoPlainText>attribute is missing, then the upper case name of the
384: tag enclosed in</p>
385: <p class=MsoPlainText>brackets is returned, e.g. "[IMG]". The
386: hash value can also be a</p>
387: <p class=MsoPlainText>subroutine reference. In this case the routine is
388: called with the</p>
389: <p class=MsoPlainText>start tag token content as its argument and the return
390: value is treated</p>
391: <p class=MsoPlainText>as the text.</p>
392: <p class=MsoPlainText>The default $p->{textify} value is:</p>
393: <p class=MsoPlainText> {img => "alt", applet => "alt"}</p>
394: <p class=MsoPlainText>This means that <IMG> and <APPLET> tags are
395: treated as text, and that</p>
396: <p class=MsoPlainText>the text to substitute can be found in the ALT attribute.</p>
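<p class=MsoPlainText>The textify rule can be illustrated with a small standalone function. This is a hypothetical re-implementation of the behaviour described above, not the parser's own code: textify_tag is an invented name, and for simplicity the subroutine form receives the tag name and attribute hash rather than the full start tag token content.</p>

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the textify rule described above.
sub textify_tag {
    my ($textify, $tag, $attr) = @_;
    my $rule = $textify->{$tag};
    return '' unless defined $rule;                        # tag is not treated as text
    return $rule->($tag, $attr) if ref($rule) eq 'CODE';   # subroutine decides the text
    # Otherwise the rule names an attribute; fall back to "[TAG]" if it is missing.
    return defined $attr->{$rule} ? $attr->{$rule} : '[' . uc($tag) . ']';
}

my %textify = (img => 'alt', applet => 'alt');             # the documented default

my $with_alt    = textify_tag(\%textify, 'img', { src => 'a.png', alt => 'A logo' });
my $without_alt = textify_tag(\%textify, 'img', { src => 'a.png' });

# Replace the attribute name with a subroutine reference.
$textify{img} = sub { my ($tag, $attr) = @_; return "image at $attr->{src}" };
my $via_sub = textify_tag(\%textify, 'img', { src => 'a.png' });

print "$with_alt | $without_alt | $via_sub\n";
```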
397: <p class=MsoPlainText>* $p->get_trimmed_text( [$endtag] )</p>
398: <p class=MsoPlainText>Same as $p->get_text above, but will collapse any sequences
399: of white</p>
400: <p class=MsoPlainText>space to a single space character. Leading and trailing
401: white space is</p>
402: <p class=MsoPlainText>removed.</p>
403: <p class=MsoPlainText>EXAMPLES</p>
404: <p class=MsoPlainText>This example extracts all links from a document.
405: It will print one</p>
406: <p class=MsoPlainText>line for each link, containing the URL and the textual
407: description</p>
408: <p class=MsoPlainText>between the <A>...</A> tags:</p>
409: <p class=MsoPlainText> use HTML::LCParser;</p>
410: <p class=MsoPlainText> $p = HTML::LCParser->new(shift||"index.html");</p>
411: <p class=MsoPlainText> while (my $token = $p->get_tag("a"))
412: {</p>
413: <p class=MsoPlainText> my $url = $token->[1]{href}
414: || "-";</p>
415: <p class=MsoPlainText> my $text = $p->get_trimmed_text("/a");</p>
416: <p class=MsoPlainText> print "$url\t$text\n";</p>
417: <p class=MsoPlainText> }</p>
418: <p class=MsoPlainText>This example extracts the <TITLE> from the document:</p>
419: <p class=MsoPlainText> use HTML::LCParser;</p>
420: <p class=MsoPlainText> $p = HTML::LCParser->new(shift||"index.html");</p>
421: <p class=MsoPlainText> if ($p->get_tag("title")) {</p>
422: <p class=MsoPlainText> my $title = $p->get_trimmed_text;</p>
423: <p class=MsoPlainText> print "Title: $title\n";</p>
424: <p class=MsoPlainText> }</p>
425: </div>
426: <br
427: clear=ALL style='page-break-before:always;'>
428: <div class=Section2> </div>
429: </body>
430: </html>