File: [LON-CAPA] / doc / gutshtml / SessionFou1.html
Revision 1.1
Fri Jun 28 20:30:29 2002 UTC (22 years, 3 months ago) by www
Branches: MAIN
CVS tags: version_0_99_3, version_0_99_2, version_0_99_1, version_0_99_0, version_0_6_2, version_0_6, version_0_5_1, version_0_5, version_0_4, stable_2002_july, conference_2003, STABLE, HEAD

HTML version of GUTS manual. Individual files will still need cleanup.

<html>
<head>
<meta name=Title
content="Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style Files) (Guy)">
<meta http-equiv=Content-Type content="text/html; charset=macintosh">
<link rel=Edit-Time-Data href="Session%20Fou1_files/editdata.mso">
<title>Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style
Files) (Guy)</title>
<style>
</style>
</head>
<body bgcolor=#FFFFFF link=blue vlink=purple class="Normal" lang=EN-US>
<div class=Section1>
<h2>Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style
Files) (Guy)</h2>
<h3><a name="_Toc421867121">XML Files</a></h3>
All HTML / XML files are run through the lonxml handler before being served to a user. This allows us to rewrite many portions of a document and to support server-side tags. There are two ways to add new tags to the XML parsing engine: through LON-CAPA style files, or by writing Perl tag handlers for the desired tags.
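The second route, Perl tag handlers, can be pictured with a toy dispatcher. The sketch below is not LON-CAPA code: `register_tags`, `dispatch`, the `Toy::handlers` package, the made-up `<shout>` tag, and the two-argument handler signature are all hypothetical. It only illustrates the general idea of mapping a tag name to a `start_`/`end_` subroutine in a registered package, and of passing unregistered tags through unchanged.

```perl
use strict;
use warnings;

# Hypothetical registry: tag name => package that implements it.
my %tag_package;

# Record which package handles which tags (toy stand-in, not the real API).
sub register_tags {
    my ($package, @tags) = @_;
    $tag_package{$_} = $package for @tags;
    return;
}

# Call PACKAGE::start_TAG or PACKAGE::end_TAG if it exists;
# otherwise return the tag text exactly as it was read.
sub dispatch {
    my ($kind, $tag, $raw_text) = @_;
    my $package = $tag_package{$tag};
    if (defined $package) {
        my $handler = $package->can("${kind}_${tag}");
        return $handler->($tag, $raw_text) if $handler;
    }
    return $raw_text;
}

package Toy::handlers;

# Toy handlers for a made-up <shout> tag.
sub start_shout { return '<b>' }
sub end_shout   { return '</b>' }

package main;

register_tags('Toy::handlers', 'shout');

my $output = join '',
    dispatch('start', 'shout', '<shout>'),
    'hello',
    dispatch('end', 'shout', '</shout>'),
    dispatch('start', 'em', '<em>');    # unregistered tag: passed through

print "$output\n";
```

Run as a plain script, this prints `<b>hello</b><em>`: the registered tag is rewritten by its handlers, while the unregistered `<em>` comes back as it was read.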
Global Variables
* $Apache::lonxml::debug - debugging control
* @Apache::lonxml::pwd - path to the directory containing the file currently being processed
* @Apache::lonxml::outputstack, $Apache::lonxml::redirection - these two are used for capturing a subset of the output for later processing. Don't touch them directly; use &startredirection and &endredirection.
* $Apache::lonxml::import - controls whether the <import> tag actually does anything
* @Apache::lonxml::extlinks - a list of URLs that the user is allowed to look at because of the current resource (images and links)
* $Apache::lonxml::metamode - some output is turned off, because the meta target wants a specific subset; use <output> to guarantee that the contained data will be in the parsing output
* $Apache::lonxml::evaluate - controls whether run::evaluate actually dereferences variable references
* %Apache::lonxml::insertlist - data structure for edit mode; determines what tags can go into what other tags
* @Apache::lonxml::namespace - stores the list of tag namespaces used in the insertlist.tab file that are currently active; used only in edit mode
* $Apache::lonxml::registered - set to 1 once the remote has been updated to know what resource we are looking at
* $Apache::lonxml::request - current Apache request object, or undef
* $Apache::lonxml::curdepth - current depth of the overall parse. Will be a string like 2_3_1 (first tag in the third second-level tag in the second top-level tag). It is set by callsub and can be used in Perl tag implementations.
It relies upon the internal globals @Apache::lonxml::depthcounter, $Apache::lonxml::depth, and $Apache::lonxml::olddepth.
* $Apache::lonxml::prevent_entity_encode - by default the XML parser will try to re-encode any 8-bit characters into HTML entity codes. If this is set to a true value, that re-encoding is prevented.
In common usage, $Apache::lonxml::prevent_entity_encode, $Apache::lonxml::evaluate, $Apache::lonxml::metamode, and $Apache::lonxml::import should never be set to a value directly, but rather incremented when you want the effect on and decremented when you want the effect off.

Notable Perl subroutines
If not specified, these functions are in Apache::lonxml.

* xmlparse - see the XMLPARSE figure. It is also not callable from inside a tag; if one needs to restart parsing, either add a new LCParser to the parser stack using the newparser function, or call inner_xmlparse (see the xmlparse function in scripttag.pm).
* recurse - acts just like xmlparse, except that it doesn't do the style definition check; it always calls callsub.
* callsub - checks whether a Perl subroutine is defined for the current tag and calls it. Otherwise it just returns the tag as it was read in. It will also add a default editing interface unless the tag has a defined subroutine that either returns something or requests that callsub not add the editing interface.
* afterburn - called on the output of xmlparse; it can add highlights, anchors, and links to regular expression matches in the output.
* register_insert - builds the %Apache::lonxml::insertlist structure of what tags can have what other tags inside.
* whichuser - returns a list of $symb, $courseid, $domain, $name that is correct for calls to lonnet functions for this setup.
Uses form.grade_ parameters, if the user is allowed to mgr in the course.
* setup_globals - initializes all lonxml globals when xmlparse is called. If you intend to create a new target, you will likely need to tweak how the globals are set up at startup.
* init_safespace - creates holes to external functions, creates some global variables, and sets the permitted operators of the global Safe space interpreter.

Functions Tag Handlers can use
If not specified, these functions are in Apache::lonxml.

* debug - a function to call to print out debugging messages. Will only print when $Apache::lonxml::debug is set to 1.
* warning - a function to use for warning messages. The message will appear at the top of a resource only when it is viewed in construction space.
* error - a function to use for error messages. When the resource is viewed in construction space, the message appears at the top of the resource; otherwise the resource author and course instructor are messaged and the student is informed that an error has occurred.
* get_all_text - 2 args: the tag to look for (use /tag to look for an end tag) and an HTML::TokeParser reference. It will repeatedly get text from the TokeParser until the requested tag is found, and returns all of the document it pulled from the TokeParser. (See Apache::scripttag::start_script for an example of usage.)
* get_param - 4 arguments: first is a scalar string naming the argument needed, second is a reference to the parser arguments stack, third is a reference to the Safe space, and fourth is an optional "context" value. This subroutine allows a tag to get a tag argument after it has been interpolated inside the Safe space.
This should be used if the tag might use a Safe space variable reference for the tag argument. (See Apache::scripttag::start_script for an example.) This version only handles scalar variables.
* get_param_var - 4 arguments: first is a scalar string naming the argument needed, second is a reference to the parser arguments stack, third is a reference to the Safe space, and fourth is an optional "context" value. This subroutine allows a tag to get a tag argument after it has been interpolated inside the Safe space. This should be used if the tag might use a Safe space variable reference for the tag argument. (See Apache::scripttag::start_script for an example.) This version can handle list or hash variables properly.
* description - 1 argument, the token object. This will return the textual description of the current tag from the insertlist.tab file.
* whichuser - 0 arguments. This will look at the current environment setting and return the current $symb, $courseid, $udom, $uname. You should always use this function if you want to determine who the current user is, since an instructor might be trying to view a student's version of a resource.
* inner_xmlparse - 6 arguments: the target, an array reference to the current stack of tags, an array reference to the current stack of tag arguments, an array reference to the current stack of LCParsers, a reference to the current Safe space, and a reference to the hash of current style definitions.
* newparser - 3 args: first is a reference to the parser stack, second should be a reference to a scalar string containing the text the new parser should run over, third should be a scalar holding the directory path of the file the parser is parsing. (See Apache::scripttag::start_import for an example.)
* register - should be called in a file's BEGIN block. 2 arguments: a scalar string and a list of strings. This allows a file to register which tags it handles and what the namespace of those tags is. Example:
sub BEGIN {
  &Apache::lonxml::register('Apache::scripttag',('script','display'));
}
This would tell xmlparse that it can find handlers for <script> and <display> in Apache::scripttag. If one registers a tag that was already registered, the previous registration is remembered and will be restored on a deregister.
* deregister - used to remove a previously registered tag implementation. It will restore the previous registration if there was one.
* startredirection - used when a tag wants to save a portion of the document for its end tag to use, but wants the intervening document to be processed normally. (See Apache::scripttag::start_window for an example.)
* endredirection - used to stop xmlparse from hiding output. The return value is everything that xmlparse has processed since the corresponding startredirection. (See Apache::scripttag::end_window for an example.)
* Apache::run::evaluate - 3 args: first a string, second a reference to the Safe space, third a string to be evaluated before the first arg. This subroutine will do variable interpolation and simple function interpolation on the first argument. (See Apache::lonxml::inner_xmlparse for an example.)
* Apache::run::run - 2 args: first a string, second a reference to the Safe space. This passes the string into the Safe space for evaluation and then returns the result. (See Apache::scripttag::start_script for an example.)
<h3><a name="_Toc421867122">Style Files</a></h3>
<img width=432 height=255
src="Session%20Fou1_files/image002.jpg" v:shapes="_x0000_i1025">
Fig.
2.4.1 - Using a style file
Style File specific tags
* <definetag> - 2 arguments: name, the name of the new tag being defined (required; if preceded by a /, it defines an end tag); parms, the parameters of the new tag. The values of these parameters can be accessed as $parametername.
* <render> - define what the new tag does for a non-meta target
* <meta> - define what the new tag does for a meta target
* <tex> / <web> / <latexsource> - define what a new tag does for a specific non-meta target. All data inside a <render> is rendered to all targets, except where it is surrounded by one of these target-specific tags.
<img width=432 height=243
src="Session%20Fou1_files/image005.png" v:shapes="_x0000_i1026">
Fig. 2.4.2 - The parser
<h3><a name="_Toc421867123">HTML::LCParser - Alternative HTML::Parser interface</a></h3>
SYNOPSIS
 require HTML::LCParser;
 $p = HTML::LCParser->new("index.html") || die "Can't open: $!";
 while (my $token = $p->get_token) {
     #...
 }
DESCRIPTION
HTML::LCParser is an alternative interface to the HTML::Parser class. It is an HTML::PullParser subclass.
The following methods are available:
* $p = HTML::LCParser->new( $file_or_doc );
The object constructor argument is either a file name, a file handle object, or the complete document to be parsed.
If the argument is a plain scalar, then it is taken as the name of a file to be opened and parsed. If the file can't be opened for reading, then the constructor will return an undefined value and $! will tell you why it failed.
If the argument is a reference to a plain scalar, then this scalar is taken to be the literal document to parse. The value of this scalar should not be changed before all tokens have been extracted.
Otherwise the argument is taken to be some object that HTML::LCParser can read() from when it needs more data. Typically it will be a filehandle of some kind. The stream will be read() until EOF, but not closed.
The constructor also turns attr_encoded on by default.
* $p->get_token
This method will return the next token found in the HTML document, or undef at the end of the document. The token is returned as an array reference. The first element of the array will be a (mostly) single-character string denoting the type of this token: "S" for start tag, "E" for end tag, "T" for text, "C" for comment, "D" for declaration, and "PI" for processing instructions. The rest of the array is the same as the arguments passed to the corresponding HTML::Parser v2 compatible callbacks (see HTML::Parser). In summary, returned tokens look like this:
  ["S",  $tag, $attr, $attrseq, $text, $line]
  ["E",  $tag, $text, $line]
  ["T",  $text, $is_data, $line]
  ["C",  $text, $line]
  ["D",  $text, $line]
  ["PI", $token0, $text, $line]
where $attr is a hash reference, $attrseq is an array reference, and the rest are plain scalars.
* $p->unget_token($token,...)
If you find out you have read too many tokens, you can push them back so that they are returned the next time $p->get_token is called.
* $p->get_tag( [$tag, ...] )
This method returns the next start or end tag (skipping any other tokens), or undef if there are no more tags in the document. If one or more arguments are given, then we skip tokens until one of the specified tag types is found. For example:
   $p->get_tag("font", "/font");
will find the next start or end tag for a font element.
The tag information is returned as an array reference in the same form as for $p->get_token above, but the type code (first element) is missing. A start tag will be returned like this:
  [$tag, $attr, $attrseq, $text]
The tag names of end tags are prefixed with "/", i.e. an end tag is returned like this:
  ["/$tag", $text]
* $p->get_text( [$endtag] )
This method returns all text found at the current position. It will return a zero-length string if the next token is not text. The optional $endtag argument specifies that any text occurring before the given tag is to be returned. All entities are left unmodified.
The $p->{textify} attribute is a hash that defines how certain tags can be treated as text. If the name of a start tag matches a key in this hash, then this tag is converted to text. The hash value is used to specify which tag attribute to obtain the text from. If this tag attribute is missing, then the upper-case name of the tag enclosed in brackets is returned, e.g. "[IMG]". The hash value can also be a subroutine reference. In this case the routine is called with the start tag token content as its argument and the return value is treated as the text.
The default $p->{textify} value is:
  {img => "alt", applet => "alt"}
This means that <IMG> and <APPLET> tags are treated as text, and that the text to substitute can be found in the ALT attribute.
* $p->get_trimmed_text( [$endtag] )
Same as $p->get_text above, but will collapse any sequence of white space to a single space character. Leading and trailing white space is removed.
EXAMPLES
This example extracts all links from a document.
It will print one line for each link, containing the URL and the textual description between the <A>...</A> tags:
  use HTML::LCParser;
  $p = HTML::LCParser->new(shift||"index.html");
  while (my $token = $p->get_tag("a")) {
      my $url = $token->[1]{href} || "-";
      my $text = $p->get_trimmed_text("/a");
      print "$url\t$text\n";
  }
This example extracts the <TITLE> from the document:
  use HTML::LCParser;
  $p = HTML::LCParser->new(shift||"index.html");
  if ($p->get_tag("title")) {
      my $title = $p->get_trimmed_text;
      print "Title: $title\n";
  }
</div>
<div class=Section2> </div>
</body>
</html>