File:  [LON-CAPA] / doc / gutshtml / SessionFou1.html
Revision 1.2
Tue Jul 22 14:47:00 2003 UTC (20 years, 9 months ago) by bowersj2
Convert GUTs HTML to PROPER line endings.

<html>

<head>

<meta name=Title

content="Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style Files) (Guy)">

<meta http-equiv=Content-Type content="text/html; charset=macintosh">

<link rel=Edit-Time-Data href="Session%20Fou1_files/editdata.mso">

<title>Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style 

Files) (Guy)</title>

<style><!--

.MsoHeader

	{tab-stops:center 3.0in right 6.0in;

	font-size:10.0pt;

	font-family:"Times New Roman";}

.MsoPlainText

	{font-size:10.0pt;

	font-family:"Courier New";}

.Section1

	{page:Section1;}

.Section2

	{page:Section2;}

-->

</style>

</head>

<body bgcolor=#FFFFFF link=blue vlink=purple class="Normal" lang=EN-US>

<div class=Section1> 

  <h2>Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style 

    Files) (Guy)</h2>

  <h3><a name="_Toc421867121">XML Files</a></h3>

  <p><span style='color:black'>All HTML / XML files are run through the lonxml 

    handler before being served to a user. This allows us to rewrite many portions 

    of a document and to support server-side tags. There are two ways to add new 

    tags to the XML parsing engine: either through LON-CAPA style files or by 

    writing Perl tag handlers for the desired tags. </span></p>

  <p><span style='color:black'><b>Global Variables</b></span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>$Apache::lonxml::debug</i></span><span

style='color:black'> - debugging control </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>@Apache::lonxml::pwd</i></span><span

style='color:black'> - path to the directory containing the file currently being 

    processed </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>@Apache::lonxml::outputstack</i></span><span

style='color:black'> </span></p>

  <p><span style='color:black'><i>$Apache::lonxml::redirection</i></span><span

style='color:black'> - these two are used for capturing a subset of the output 

    for later processing. Don't touch them directly; use &amp;startredirection 

    and &amp;endredirection instead. </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>$Apache::lonxml::import</i></span><span

style='color:black'> - controls whether the &lt;import&gt; tag actually does anything 

    </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>@Apache::lonxml::extlinks</i></span><span

style='color:black'> - a list of URLs that the user is allowed to look at because 

    of the current resource (images, and links) </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>$Apache::lonxml::metamode</i></span><span

style='color:black'> - some output is turned off, since the meta target wants a specific 

    subset; use &lt;output&gt; to guarantee that the contained data will be in 

    the parsing output </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>$Apache::lonxml::evaluate</i></span><span

style='color:black'> - controls whether run::evaluate actually dereferences variable 

    references </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>%Apache::lonxml::insertlist</i></span><span

style='color:black'> - data structure for edit mode, determines what tags can 

    go into what other tags </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>@Apache::lonxml::namespace</i></span><span

style='color:black'> - stores the list of tag namespaces used in the insertlist.tab 

    file that are currently active, used only in edit mode. </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>$Apache::lonxml::registered</i></span><span

style='color:black'> - set to 1 once the remote has been updated to know what 

    resource we are looking at. </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>$Apache::lonxml::request</i></span><span

style='color:black'> - current Apache request object, or undef </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>$Apache::lonxml::curdepth</i></span><span

style='color:black'> - current position in the overall parse. Will be a string 

    like 2_3_1 (the first tag in the third second-level tag in the second top-level 

    tag). It gets set by callsub, and can be used in Perl tag implementations. 

    It relies upon the internal globals: <i>@Apache::lonxml::depthcounter</i></span><span

style='color:black'>, <i>$Apache::lonxml::depth</i></span><span

style='color:black'>, <i>$Apache::lonxml::olddepth</i></span><span

style='color:black'> </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>$Apache::lonxml::prevent_entity_encode</i></span><span

style='color:black'> - By default the XML parser will try to re-encode any 8-bit 

    characters into HTML entity codes. If this is set to a true value, that re-encoding 

    is prevented. </span></p>
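  <p><span style='color:black'>To make the curdepth format above concrete, here 
    is a minimal sketch of how a string such as 2_3_1 could be derived from per-level 
    counters. The names @counter, $depth, open_tag and close_tag are hypothetical 
    stand-ins, and the logic is an assumption about the general technique rather 
    than the code in lonxml.pm. </span></p>

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the parser's depth bookkeeping; the names
# and logic are illustrative, not copied from lonxml.pm.
my @counter;      # $counter[$d] = how many tags have opened at level $d
my $depth = -1;   # index of the innermost open tag, -1 when none is open

sub open_tag {
    $depth++;
    $counter[$depth]++;      # one more sibling at this level
    $#counter = $depth;      # forget counters left by an earlier sibling's children
    return join('_', @counter);
}

sub close_tag { $depth--; return; }

open_tag(); close_tag();       # first top-level tag:  "1"
open_tag();                    # second top-level tag: "2"
open_tag(); close_tag();       # "2_1"
open_tag(); close_tag();       # "2_2"
open_tag();                    # "2_3"
my $curdepth = open_tag();     # first tag inside the third second-level tag
print "$curdepth\n";           # prints 2_3_1
```

  <p><span style='color:black'>A tag handler would normally only read such a value, 
    for example to build an identifier that is unique within the document. </span></p>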

  <p><span style='color:black'>In common usage, <i>$Apache::lonxml::prevent_entity_encode</i></span><span

style='color:black'>, <i>$Apache::lonxml::evaluate</i></span><span

style='color:black'>, <i>$Apache::lonxml::metamode</i></span><span

style='color:black'>, and <i>$Apache::lonxml::import</i></span><span

style='color:black'> should never be set to a value directly, but rather incremented 

    when you want the effect on, and decremented when you want the effect off. 

    </span></p>
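  <p><span style='color:black'>The reason for incrementing rather than assigning 
    is that nested tags can then switch an effect on and off without disturbing 
    each other. A minimal sketch, assuming a hypothetical tag named mytag whose 
    handlers want metamode on while its body is parsed; the package variable is 
    declared locally only so that the sketch runs outside LON-CAPA. </span></p>

```perl
use strict;
use warnings;

# Stand-in for the real global so the sketch runs outside LON-CAPA.
{ package Apache::lonxml; our $metamode = 0; }

# Hypothetical handlers for an imaginary tag called mytag.
sub start_mytag { $Apache::lonxml::metamode++; return ''; }
sub end_mytag   { $Apache::lonxml::metamode--; return ''; }

start_mytag();     # outer tag turns the effect on   (counter is 1)
start_mytag();     # nested tag keeps it on          (counter is 2)
end_mytag();       # inner close must not turn it off (counter is 1)
my $still_on = $Apache::lonxml::metamode ? 1 : 0;
end_mytag();       # outer close turns it off again  (counter is 0)

print "inside outer tag: $still_on, after: $Apache::lonxml::metamode\n";
```

  <p><span style='color:black'>Had the inner end handler assigned 0 instead of decrementing, 
    the effect would have been switched off while the outer tag was still open. </span></p>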

  <p><span style='color:black'><b>Notable Perl subroutines</b></span></p>

  <p><span style='color:black'>If not specified these functions are in Apache::lonxml 

    </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>xmlparse</i></span><span

style='color:black'> - see the XMLPARSE figure. It is not callable from inside 

    a tag; if one needs to restart parsing, either add a new LCParser to 

    the parser stack using the newparser function, or call inner_xmlparse 

    (see the xmlparse function in scripttag.pm). </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>recurse</i></span><span

style='color:black'> - acts just like <i>xmlparse</i></span><span

style='color:black'>, except that it doesn't do the style definition check; it always 

    calls <i>callsub</i></span><span style='color:black'> </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>callsub</i></span><span

style='color:black'> - checks whether a Perl subroutine is defined for the current 

    tag and, if so, calls it. Otherwise it just returns the tag as it was read in. It also 

    adds a default editing interface unless the tag has a defined subroutine 

    that either returns something or requests that callsub not add the editing 

    interface. </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>afterburn</i></span><span

style='color:black'> - called on the output of xmlparse, it can add highlights, 

    anchors, and links to regular expression matches to the output. </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>register_insert</i></span><span

style='color:black'> - builds the %Apache::lonxml::insertlist structure of what 

    tags can have what other tags inside. </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>whichuser</i></span><span

style='color:black'> - returns a list of $symb, $courseid, $domain, $name that 

    is correct for calls to lonnet functions for this setup. Uses form.grade_ 

    parameters, if the user is allowed to mgr in the course </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>setup_globals</i></span><span

style='color:black'> - initializes all lonxml globals when xmlparse is called. 

    If you intend to create a new target you will likely need to tweak how the 

    globals are set up at startup. </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>init_safespace</i></span><span

style='color:black'> - creates holes to external functions, creates some global 

    variables, and sets the permitted operators of the global Safe space interpreter. 

    </span></p>

  <p><span style='color:black'><b>Functions Tag Handlers can use</b></span></p>

  <p><span style='color:black'>If not specified these functions are in Apache::lonxml 

    </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>debug</i></span><span

style='color:black'> - a function to call to print out debugging messages. Will 

    only print when $Apache::lonxml::debug is set to 1 </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>warning</i></span><span

style='color:black'> - a function to use for warning messages. The message will 

    appear at the top of a resource when it is viewed in construction space only. 

    </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>error</i></span><span

style='color:black'> - a function to use for error messages. The message will 

    appear at the top of a resource when it is viewed in construction space; 

    otherwise it will message the resource author and course instructor, while 

    informing the student that an error has occurred. </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>get_all_text</i></span><span

style='color:black'> - 2 args: the tag to look for (use /tag to look for an 

    end tag) and an HTML::TokeParser reference. It will repeatedly get text from 

    the TokeParser until the requested tag is found, and returns all of the 

    document it pulled from the TokeParser. (See Apache::scripttag::start_script 

    for an example of usage.) </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>get_param</i></span><span

style='color:black'> - 4 arguments, first is a scalar string naming the argument needed, 

    second is a reference to the parser arguments stack, third is a reference 

    to the Safe space, and fourth is an optional &quot;context&quot; value. This 

    subroutine allows a tag to get a tag argument, after being interpolated inside 

    the Safe space. This should be used if the tag might use a safe space variable 

    reference for the tag argument. (See Apache::scripttag::start_script for an 

    example.) This version only handles scalar variables. </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>get_param_var</i></span><span

style='color:black'> - 4 arguments, first is a scalar string naming the argument needed, 

    second is a reference to the parser arguments stack, third is a reference 

    to the Safe space, and fourth is an optional &quot;context&quot; value. This 

    subroutine allows a tag to get a tag argument, after being interpolated inside 

    the Safe space. This should be used if the tag might use a safe space variable 

    reference for the tag argument. (See Apache::scripttag::start_script for an 

    example.) This version can handle list or hash variables properly. </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>description</i></span><span

style='color:black'> - 1 argument, the token object. This will return the textual 

    description of the current tag from the insertlist.tab file. </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>whichuser</i></span><span

style='color:black'> - 0 arguments. This will take a look at the current environment 

    setting and return the current $symb, $courseid, $udom, $uname. You should 

    always use this function if you want to determine who the current user is. 

    (Since an instructor might be trying to view a student's version of a resource.) 

    </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>inner_xmlparse</i></span><span

style='color:black'> - 6 arguments: the target, an array pointer to the current 

    stack of tags, an array pointer to the current stack of tag arguments, an 

    array pointer to the current stack of LCParsers, a pointer to the current 

    Safe space, and a pointer to the hash of current style definitions </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>newparser</i></span><span

style='color:black'> - 3 args, first is a reference to the parser stack, second 

    should be a reference to a scalar string containing the text the new parser should 

    run over, third should be a scalar holding the directory path of the file the parser 

    is parsing. (See Apache::scripttag::start_import for an example.) </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>register</i></span><span

style='color:black'> - should be called in a file's BEGIN block. 2 arguments, 

    a scalar string and a list of strings. This allows a file to register which 

    tags it handles, and what the namespace of those tags is. Example: </span></p>

  <p><span style='font-family:"Courier New";color:black'>sub BEGIN {</span></p>

  <p><span style='font-family:"Courier New";color:black'>&nbsp; &amp;Apache::lonxml::register('Apache::scripttag',('script','display'));</span></p>

  <p><span style='font-family:"Courier New";color:black'>}</span></p>

  <p><span style='color:black'>This would tell xmlparse that in Apache::scripttag it 

    can find handlers for &lt;script&gt; and &lt;display&gt;. If one registers 

    a tag that was already registered, the previous registration is remembered and will 

    be restored on a deregister. </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>deregister</i></span><span

style='color:black'> - used to remove a previously registered tag implementation. 

    It will restore the previous registration if there was one. </span></p>
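  <p><span style='color:black'>The remember-and-restore behaviour of register and 
    deregister can be pictured as one stack of packages per tag name. The sketch 
    below is an assumption about the technique, not the lonxml.pm source: %registry, 
    handler_for and the package My::override are hypothetical, and the real deregister 
    may take different arguments. </span></p>

```perl
use strict;
use warnings;

# Hypothetical registry: tag name => stack of handler packages, newest last.
my %registry;

sub register {
    my ($package, @tags) = @_;
    push @{ $registry{$_} }, $package for @tags;
    return;
}

sub deregister {
    my ($package, @tags) = @_;
    for my $tag (@tags) {
        my $stack = $registry{$tag} or next;
        # Only undo the registration if this package is the active one.
        pop @$stack if @$stack and $stack->[-1] eq $package;
    }
    return;
}

# Returns the package currently responsible for a tag, or undef.
sub handler_for {
    my ($tag) = @_;
    my $stack = $registry{$tag} or return;
    return $stack->[-1];
}

register('Apache::scripttag', 'script', 'display');
register('My::override', 'display');       # hypothetical second module
my $during = handler_for('display');       # the override is active
deregister('My::override', 'display');
my $after = handler_for('display');        # the earlier registration is back
print "$during then $after\n";
```

  <p><span style='color:black'>Keeping a stack rather than a single slot is what 
    lets a module temporarily take over a tag and hand it back afterwards. </span></p>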

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>startredirection</i></span><span

style='color:black'> - used when a tag wants to save a portion of the document 

    for its end tag to use, but wants the intervening document to be normally 

    processed. (See Apache::scripttag::start_window for an example.) </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>endredirection</i></span><span

style='color:black'> - used to stop xmlparse from hiding output. The 

    return value is everything that xmlparse has processed since the corresponding 

    startredirection. (See Apache::scripttag::end_window for an example.) </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>Apache::run::evaluate</i></span><span

style='color:black'> - 3 args, first a string, second a reference to the Safe 

    space, third a string to be evaluated before the first arg. This subroutine will 

    do variable interpolation and simple function interpolations on the first 

    argument. (See Apache::lonxml::inner_xmlparse for an example.) </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <i>Apache::run::run</i></span><span

style='color:black'> - 2 args, first a string, second a reference to the Safe 

    space. This handles passing the passed string into the Safe space for evaluation 

    and then returns the result. (See Apache::scripttag::start_script for an example.)</span></p>
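  <p><span style='color:black'>As an illustration of the startredirection / endredirection 
    pair listed above, the following sketch captures output between a start tag 
    and its end tag using a stack of buffers and a nesting counter. The variable 
    names echo the globals described earlier, but emit and $page are hypothetical 
    and the logic is a guess at the general technique, not the lonxml.pm source. </span></p>

```perl
use strict;
use warnings;

# Stand-ins named after the globals described earlier in this session.
my @outputstack;        # one buffer per active redirection
my $redirection = 0;    # nesting counter
my $page = '';          # what would normally be sent to the user

# Hypothetical output routine: goes to the innermost buffer while redirected.
sub emit {
    my ($text) = @_;
    if ($redirection) { $outputstack[-1] .= $text; }
    else              { $page .= $text; }
    return;
}

sub startredirection { $redirection++; push @outputstack, ''; return; }
sub endredirection   { $redirection--; return pop @outputstack; }

emit('before ');
startredirection();                # e.g. in a start-tag handler
emit('captured body');             # processed normally, but held back
my $body = endredirection();       # e.g. in the matching end-tag handler
emit('[' . uc($body) . ']');       # the end tag decides what to output
print "$page\n";
```

  <p><span style='color:black'>Because the buffers form a stack, a tag that redirects 
    its body can itself be nested inside another redirecting tag. </span></p>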

  <h3><a name="_Toc421867122">Style Files</a></h3>

  <p><span style='color:black'> <img width=432 height=255

src="Session%20Fou1_files/image002.jpg" v:shapes="_x0000_i1025"> </span></p>

  <p><span style='font-size:14.0pt;color:black'><b>Fig. 2.4.1</b></span><span

style='font-size:14.0pt;color:black'> - Using a style file</span></p>

  <p><span style='color:black'><b>Style File specific tags</b></span></p>

  <p><span style='color:black'><b>&lt;definetag&gt;</b></span><span

style='color:black'> - 2 arguments, <i>name</i></span><span style='color:black'> 

    (required) is the name of the new tag being defined; if preceded with a /, it defines an end tag; 

    <i>parms</i></span><span style='color:black'> lists the parameters of the 

    new tag; the value of each parameter can be accessed as $parametername. </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <b>&lt;render&gt;</b></span><span

style='color:black'> - define what the new tag does for a non meta target </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <b>&lt;meta&gt;</b></span><span

style='color:black'> - define what the new tag does for a meta target </span></p>

  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

    <b>&lt;tex&gt; / &lt;web&gt; / &lt;latexsource&gt;</b></span><span style='color:black'> 

    - define what a new tag does for a specific non-meta target. All data inside 

    a &lt;render&gt; is rendered to all targets except when surrounded by a specific 

    target tag.</span><span style='font-size:16.0pt;color:black'> </span></p>

  <p class=MsoHeader> <img width=432 height=243

src="Session%20Fou1_files/image005.png" v:shapes="_x0000_i1026"> </p>

  <p><span style='font-size:14.0pt'><b>Fig. 2.4.2</b></span><span

style='font-size:14.0pt'> - The parser</span></p>

  <h3><a name="_Toc421867123">HTML::LCParser - Alternative HTML::Parser interface</a></h3>

  <p class=MsoPlainText>SYNOPSIS</p>

  <p class=MsoPlainText>&nbsp;require HTML::LCParser;</p>

  <p class=MsoPlainText>&nbsp;$p = HTML::LCParser-&gt;new(&quot;index.html&quot;) 

    || die &quot;Can't open: $!&quot;;</p>

  <p class=MsoPlainText>&nbsp;while (my $token = $p-&gt;get_token) {</p>

  <p class=MsoPlainText>&nbsp;&nbsp;&nbsp;&nbsp; #...</p>

  <p class=MsoPlainText>&nbsp;}</p>

  <p class=MsoPlainText>DESCRIPTION</p>

  <p class=MsoPlainText>HTML::LCParser is an alternative interface 

    to the</p>

  <p class=MsoPlainText>HTML::Parser class.&nbsp; It is an HTML::PullParser 

    subclass.</p>

  <p class=MsoPlainText>The following methods are available:</p>

  <p class=MsoPlainText>* $p = HTML::LCParser-&gt;new( $file_or_doc );</p>

  <p class=MsoPlainText>The object constructor argument is either a file name, 

    a file handle</p>

  <p class=MsoPlainText>object, or the complete document to be parsed.</p>

  <p class=MsoPlainText>If the argument is a plain scalar, then it is taken as 

    the name of a</p>

  <p class=MsoPlainText>file to be opened and parsed.&nbsp; If the file can't 

    be opened for</p>

  <p class=MsoPlainText>reading, then the constructor will return an undefined 

    value and $!</p>

  <p class=MsoPlainText>will tell you why it failed.</p>

  <p class=MsoPlainText>If the argument is a reference to a plain scalar, then 

    this scalar is</p>

  <p class=MsoPlainText>taken to be the literal document to parse.&nbsp; The value 

    of this</p>

  <p class=MsoPlainText>scalar should not be changed before all tokens have been 

    extracted.</p>

  <p class=MsoPlainText>Otherwise the argument is taken to be some object that 

    the</p>

  <p class=MsoPlainText>HTML::LCParser can read() from when it needs 

    more data.&nbsp; Typically</p>

  <p class=MsoPlainText>it will be a filehandle of some kind.&nbsp; The stream 

    will be read() until</p>

  <p class=MsoPlainText>EOF, but not closed.</p>

  <p class=MsoPlainText>It also will turn attr_encoded on by default.</p>

  <p class=MsoPlainText>* $p-&gt;get_token</p>

  <p class=MsoPlainText>This method will return the next token found 

    in the HTML document,</p>

  <p class=MsoPlainText>or undef at the end of the document.&nbsp; The 

    token is returned as an</p>

  <p class=MsoPlainText>array reference.&nbsp; The first element of the array 

    will be a (mostly)</p>

  <p class=MsoPlainText>single character string denoting the type of this token: 

    &quot;S&quot; for start</p>

  <p class=MsoPlainText>tag, &quot;E&quot; for end tag, &quot;T&quot; for text, 

    &quot;C&quot; for comment, &quot;D&quot; for</p>

  <p class=MsoPlainText>declaration, and &quot;PI&quot; for processing instructions.&nbsp; 

    The rest of the array</p>

  <p class=MsoPlainText>is the same as the arguments passed to the corresponding 

    HTML::Parser</p>

  <p class=MsoPlainText>v2 compatible callbacks (see the HTML::Parser documentation).&nbsp; 

    In summary, returned</p>

  <p class=MsoPlainText>tokens look like this:</p>

  <p class=MsoPlainText>&nbsp; [&quot;S&quot;,&nbsp; $tag, $attr, $attrseq, $text, 

    $line]</p>

  <p class=MsoPlainText>&nbsp; [&quot;E&quot;,&nbsp; $tag, $text, $line]</p>

  <p class=MsoPlainText>&nbsp; [&quot;T&quot;,&nbsp; $text, $is_data, $line]</p>

  <p class=MsoPlainText>&nbsp; [&quot;C&quot;,&nbsp; $text, $line]</p>

  <p class=MsoPlainText>&nbsp; [&quot;D&quot;,&nbsp; $text, $line]</p>

  <p class=MsoPlainText>&nbsp; [&quot;PI&quot;, $token0, $text, $line]</p>

  <p class=MsoPlainText>where $attr is a hash reference, $attrseq is an array 

    reference and</p>

  <p class=MsoPlainText>the rest are plain scalars.</p>
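  <p class=MsoPlainText>A short sketch of how code might dispatch on the token 
    shapes listed above. The describe_token routine is hypothetical, and the tokens 
    are built by hand (with the raw source text field left empty) so that the sketch 
    runs without HTML::LCParser installed.</p>

```perl
use strict;
use warnings;

# Summarise one token of the shapes listed above.
sub describe_token {
    my ($token) = @_;
    my ($type, @rest) = @$token;
    if ($type eq 'S') {
        my ($tag, $attr, $attrseq) = @rest;
        # $attrseq preserves the order in which attributes appeared.
        my $attrs = join(', ', map { "$_=$attr->{$_}" } @$attrseq);
        return "start $tag ($attrs)";
    }
    return "end $rest[0]"    if $type eq 'E';
    return "text '$rest[0]'" if $type eq 'T';
    return "other ($type)";
}

# Hand-built tokens; the raw source text field is left empty for brevity.
my @tokens = (
    [ 'S', 'a', { href => 'index.html' }, ['href'], '', 1 ],
    [ 'T', 'Home', 0, 1 ],
    [ 'E', 'a', '', 1 ],
);
my @summary = map { describe_token($_) } @tokens;
print "$_\n" for @summary;
```

  <p class=MsoPlainText>With a real parser object, the same routine would be called 
    on each value returned by $p-&gt;get_token inside the while loop shown in the 
    SYNOPSIS.</p>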

  <p class=MsoPlainText>* $p-&gt;unget_token($token,...)</p>

  <p class=MsoPlainText>If you find out you have read too many tokens you can 

    push them back,</p>

  <p class=MsoPlainText>so that they are returned the next time $p-&gt;get_token 

    is called.</p>

  <p class=MsoPlainText>* $p-&gt;get_tag( [$tag, ...] )</p>

  <p class=MsoPlainText>This method returns the next start or end tag (skipping 

    any other</p>

  <p class=MsoPlainText>tokens), or undef if there are no more tags in 

    the document.&nbsp; If</p>

  <p class=MsoPlainText>one or more arguments are given, then we skip tokens until 

    one of the</p>

  <p class=MsoPlainText>specified tag types is found.&nbsp; For example:</p>

  <p class=MsoPlainText>&nbsp;&nbsp; $p-&gt;get_tag(&quot;font&quot;, &quot;/font&quot;);</p>

  <p class=MsoPlainText>will find the next start or end tag for a font-element.</p>

  <p class=MsoPlainText>The tag information is returned as an array reference 

    in the same form</p>

  <p class=MsoPlainText>as for $p-&gt;get_token above, but the type code (first 

    element) is</p>

  <p class=MsoPlainText>missing. A start tag will be returned like this:</p>

  <p class=MsoPlainText>&nbsp; [$tag, $attr, $attrseq, $text]</p>

  <p class=MsoPlainText>The tagname of an end tag is prefixed with &quot;/&quot;, 

    i.e. an end tag is</p>

  <p class=MsoPlainText>returned like this:</p>

  <p class=MsoPlainText>&nbsp; [&quot;/$tag&quot;, $text]</p>

  <p class=MsoPlainText>* $p-&gt;get_text( [$endtag] )</p>

  <p class=MsoPlainText>This method returns all text found at the current position. 

    It will</p>

  <p class=MsoPlainText>return a zero length string if the next token is not text.&nbsp; 

    The</p>

  <p class=MsoPlainText>optional $endtag argument specifies that any text occurring 

    before the</p>

  <p class=MsoPlainText>given tag is to be returned. All entities are unmodified.</p>

  <p class=MsoPlainText>The $p-&gt;{textify} attribute is a hash that defines 

    how certain tags can</p>

  <p class=MsoPlainText>be treated as text.&nbsp; If the name of a start tag matches 

    a key in this</p>

  <p class=MsoPlainText>hash then this tag is converted to text.&nbsp; The hash 

    value is used to</p>

  <p class=MsoPlainText>specify which tag attribute to obtain the text from.&nbsp; 

    If this tag</p>

  <p class=MsoPlainText>attribute is missing, then the upper case name of the 

    tag enclosed in</p>

  <p class=MsoPlainText>brackets is returned, e.g. &quot;[IMG]&quot;.&nbsp; The 

    hash value can also be a</p>

  <p class=MsoPlainText>subroutine reference.&nbsp; In this case the routine is 

    called with the</p>

  <p class=MsoPlainText>start tag token content as its argument and the return 

    value is treated</p>

  <p class=MsoPlainText>as the text.</p>

  <p class=MsoPlainText>The default $p-&gt;{textify} value is:</p>

  <p class=MsoPlainText>&nbsp; {img =&gt; &quot;alt&quot;, applet =&gt; &quot;alt&quot;}</p>

  <p class=MsoPlainText>This means that &lt;IMG&gt; and &lt;APPLET&gt; tags are 

    treated as text, and that</p>

  <p class=MsoPlainText>the text to substitute can be found in the ALT attribute.</p>

  <p class=MsoPlainText>* $p-&gt;get_trimmed_text( [$endtag] )</p>

  <p class=MsoPlainText>Same as $p-&gt;get_text above, but will collapse any sequences 

    of white</p>

  <p class=MsoPlainText>space to a single space character.&nbsp; Leading and trailing 

    white space is</p>

  <p class=MsoPlainText>removed.</p>

  <p class=MsoPlainText>EXAMPLES</p>

  <p class=MsoPlainText>This example extracts all links from a document.&nbsp; 

    It will print one</p>

  <p class=MsoPlainText>line for each link, containing the URL and the textual 

    description</p>

  <p class=MsoPlainText>between the &lt;A&gt;...&lt;/A&gt; tags:</p>

  <p class=MsoPlainText>&nbsp; use HTML::LCParser;</p>

  <p class=MsoPlainText>&nbsp; $p = HTML::LCParser-&gt;new(shift||&quot;index.html&quot;);</p>

  <p class=MsoPlainText>&nbsp; while (my $token = $p-&gt;get_tag(&quot;a&quot;)) 

    {</p>

  <p class=MsoPlainText>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; my $url = $token-&gt;[1]{href} 

    || &quot;-&quot;;</p>

  <p class=MsoPlainText>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; my $text = $p-&gt;get_trimmed_text(&quot;/a&quot;);</p>

  <p class=MsoPlainText>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; print &quot;$url\t$text\n&quot;;</p>

  <p class=MsoPlainText>&nbsp; }</p>

  <p class=MsoPlainText>This example extracts the &lt;TITLE&gt; from the document:</p>

  <p class=MsoPlainText>&nbsp; use HTML::LCParser;</p>

  <p class=MsoPlainText>&nbsp; $p = HTML::LCParser-&gt;new(shift||&quot;index.html&quot;);</p>

  <p class=MsoPlainText>&nbsp; if ($p-&gt;get_tag(&quot;title&quot;)) {</p>

  <p class=MsoPlainText>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; my $title = $p-&gt;get_trimmed_text;</p>

  <p class=MsoPlainText>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; print &quot;Title: $title\n&quot;;</p>

  <p class=MsoPlainText>&nbsp; }</p>

</div>

<br

clear=ALL style='page-break-before:always;'>

<div class=Section2> </div>

</body>

</html>

