MAN page from RedHat 7.X perl-XML-Twig-1.10-8.i386.rpm

Twig

Section: User Contributed Perl Documentation (3)
Updated: perl v5.6.0

NAME

XML::Twig - A perl module for processing huge XML documents in tree mode. 

SYNOPSIS

  single-tree mode
    my $t= new XML::Twig();
    $t->parse( '<doc><para>para1</para></doc>');
    $t->print;

  chunk mode
    my $t= new XML::Twig( TwigHandlers => { section => \&flush});
    $t->parsefile( 'doc.xml');
    $t->flush;
    sub flush { $_[0]->flush; }
 

DESCRIPTION

This module provides a way to process XML documents. It is built on top of XML::Parser.

The module offers a tree interface to the document, while letting you output the parts of it that have been completely processed.

It allows minimal resource (CPU and memory) usage by building the tree only for the parts of the documents that need actual processing, through the use of the TwigRoots and TwigPrintOutsideRoots options. The finish and finish_print methods also help to improve performance.

XML::Twig tries to make simple things easy, so it does its best to take care of a lot of the (usually) annoying (but sometimes necessary) features that come with XML and XML::Parser:

Whitespaces
Whitespace that looks non-significant is discarded; this behaviour can be controlled using the KeepSpaces, KeepSpacesIn and DiscardSpacesIn options.
Encoding
You can specify that you want the output in the same encoding as the input (provided you have valid XML, which means you have to specify the encoding either in the document or when you create the Twig object) using the KeepEncoding option.
 

METHODS

 

Twigs

A twig is a subclass of XML::Parser, so all XML::Parser methods can be used on one, including parse and parsefile. setHandlers, on the other hand, cannot be used; see `the BUGS entry elsewhere in this document'.
- new
This is a class method, the constructor for XML::Twig. Options are passed as keyword value pairs. Recognized options are the same as XML::Parser, plus some XML::Twig specifics:

- TwigHandlers
This argument replaces the corresponding XML::Parser argument. It consists of a hash { gi => \&handler}. A gi (generic identifier) is just a tag name, by the way. When an element is CLOSED the corresponding handler is called with 2 arguments: the twig and the element (see `the Element entry elsewhere in this document'). The twig includes the document tree that has been built so far; the element is the complete sub-tree for the element. Text is stored in elements whose gi is #PCDATA (due to mixed content, text and sub-elements in an element, there is no way to store the text as just an attribute of the enclosing element). A special gi _all_ is used to call a function for each element. The special gi _default_ is used to call a handler for each element that does NOT have a specific handler.
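
As a rough illustration of the handler calling convention described above, the following sketch registers one specific handler and a _default_ handler. It assumes XML::Twig is installed; the sample document, the @seen array and the message strings are made up for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my @seen;   # hypothetical log of handler calls, in the order they fire

my $t= XML::Twig->new(
    TwigHandlers => {
        # called when a <title> element is CLOSED
        title     => sub { my( $twig, $elt)= @_;
                           push @seen, 'title: ' . $elt->text;
                         },
        # called for elements that have no specific handler
        _default_ => sub { my( $twig, $elt)= @_;
                           push @seen, 'default: ' . $elt->gi;
                         },
    },
);

$t->parse( '<doc><title>Intro</title><para>text</para></doc>');
print "$_\n" foreach @seen;
```

Each handler receives the twig first and the closed element second, so the element's complete sub-tree (here just its text) is available inside the handler.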
- TwigRoots
This argument lets you build the tree only for those elements you are interested in.

  Example: my $t= new XML::Twig( TwigRoots => { title => 1, subtitle => 1});
           $t->parsefile( file);
returns a twig containing a document including only title and subtitle elements, as children of the root element.

This feature is still in ALPHA mode but it is quite powerful (see benchmarks).

WARNING: TwigRoots elements should NOT be nested, that would hopelessly confuse XML::Twig ;--(
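
A small sketch of what the resulting twig looks like, assuming XML::Twig is installed. The inline document and the variable names are illustrative only; parse is used instead of parsefile so the example needs no file.

```perl
use strict;
use warnings;
use XML::Twig;

# build the tree only for title elements
my $t= XML::Twig->new( TwigRoots => { title => 1 });
$t->parse( '<doc><title>One</title><para>skipped</para><title>Two</title></doc>');

# the title elements end up as children of the root element
my @titles= map { $_->text } $t->root->children( 'title');
print join( ', ', @titles), "\n";

# the para element was never added to the tree
my $para= $t->root->first_child( 'para');
print "no para in the twig\n" unless defined $para;
```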

- TwigPrintOutsideRoots
To be used in conjunction with the TwigRoots argument. When set to a true value this will print the document outside of the TwigRoots elements.

  Example: my $t= new XML::Twig( TwigRoots => { title => 1 },
                                 TwigPrintOutsideRoots => 1,
                                 TwigHandlers => { title => \&number_title },
                               );
           $t->parsefile( file);
           { my $nb;
             sub number_title
               { my( $twig, $title)= @_;
                 $nb++;
                 $title->prefix( "$nb ");
                 $title->print;
               }
           }
This example prints the document outside of the title element, calls number_title for each title element, prints it, and then resumes printing the document. The twig is built only for the title elements.

This feature is still in ALPHA mode but it is quite powerful (see benchmarks).

- LoadDTD
If this argument is set to a true value, parse or parsefile on the twig will load the DTD information. This information can then be accessed through the twig, in a DTDHandler for example. This will load even an external DTD.

See the DTD Handling entry elsewhere in this document for more information

- DTDHandler
Sets a handler that will be called once the doctype (and the DTD) have been loaded, with 2 arguments, the twig and the DTD.

- StartTagHandlers

A hash { gi => \&handler}. Sets element handlers that are called when the element is opened (at the end of the XML::Parser Start handler). The handlers are called with 2 params: the twig and the element. The element is empty at that point, but its attributes have been created.

WARNING: StartTagHandlers are NOT called outside of TwigRoots if that argument is used.

A special gi _all_ is used to call a function for each tag, just as an XML::Parser Start handler would be.

The main use for those handlers is probably to create temporary attributes that will be used when processing the element with the normal TwigHandler.

You should also use it to change tags if you use flush. If you change the tag in a regular TwigHandler then the start tag might already have been flushed.
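
The "temporary attribute" use can be sketched as follows, assuming XML::Twig is installed. The nb attribute, the $count counter and the sample document are hypothetical. Start tag handlers fire in document order, while the regular handlers fire when elements close, so the inner section is reported first but still carries the number it was given when it opened.

```perl
use strict;
use warnings;
use XML::Twig;

my $count= 0;
my @report;

my $t= XML::Twig->new(
    # runs when <section> opens: number sections in document order
    StartTagHandlers => { section => sub { my( $twig, $elt)= @_;
                                           $elt->set_att( nb => ++$count);
                                         },
                        },
    # runs when </section> is reached: the attribute is already there
    TwigHandlers     => { section => sub { my( $twig, $elt)= @_;
                                           push @report, $elt->att( 'nb') . ': '
                                                       . $elt->first_child( 'title')->text;
                                         },
                        },
);

$t->parse( '<doc><section><title>Outer</title>'
         . '<section><title>Inner</title></section></section></doc>');
print "$_\n" foreach @report;
```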

- CharHandler

A reference to a subroutine that will be called every time PCDATA is found.

- KeepEncoding

This is a (slightly?) evil option: if the XML document is not UTF-8 encoded and you want to keep it that way, then setting KeepEncoding will use the Expat original_string method for character data, thus keeping the original encoding, as well as the original entities in the strings.

WARNING: attribute values will NOT keep their encoding (they will be converted to UTF-8).

WARNING: this option is NOT used when parsing with the non-blocking parser (parse_start, parse_more, parse_done methods).

- Id
This optional argument gives the name of an attribute that can be used as an ID in the document. Elements whose ID is known can be accessed through the elt_id method. Id defaults to 'id'. See `the BUGS entry elsewhere in this document'.
- DiscardSpaces
If this optional argument is set to a true value then spaces are discarded when they look non-significant: strings containing only spaces are discarded. This argument is set to true by default.
- KeepSpaces
If this optional argument is set to a true value then all spaces in the document are kept, and stored as PCDATA. KeepSpaces and DiscardSpaces cannot both be set.
- DiscardSpacesIn
This argument sets KeepSpaces to true but will cause the twig builder to discard spaces in the elements listed. The syntax for using this argument is: new XML::Twig( DiscardSpacesIn => [ 'elt1', 'elt2']);
- KeepSpacesIn
This argument sets DiscardSpaces to true but will cause the twig builder to keep spaces in the elements listed. The syntax for using this argument is: new XML::Twig( KeepSpacesIn => [ 'elt1', 'elt2']);
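
To make the effect of the whitespace options concrete, here is a minimal sketch (assuming XML::Twig is installed; the indented sample document and variable names are illustrative) that parses the same string with the default settings and with KeepSpaces, then counts the children of the root.

```perl
use strict;
use warnings;
use XML::Twig;

# the root contains whitespace-only strings around <para>
my $xml= "<doc>\n  <para>text</para>\n</doc>";

my $default= XML::Twig->new();                  # DiscardSpaces is the default
$default->parse( $xml);
my @default_children= $default->root->children;

my $keep= XML::Twig->new( KeepSpaces => 1);     # whitespace stored as PCDATA
$keep->parse( $xml);
my @kept_children= $keep->root->children;

print "default: ", scalar @default_children, " child(ren)\n";
print "KeepSpaces: ", scalar @kept_children, " child(ren)\n";
```

With the defaults only the para element remains; with KeepSpaces the two whitespace-only strings are kept as additional #PCDATA children.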

- root
Returns the root element of a twig
- elt_id ($id)
Returns the element whose id attribute is $id
- entity_list
Returns the entity list of a twig
- change_gi ($old_gi, $new_gi)
Performs a (very fast) global change. All elements old_gi are now new_gi. See `the BUGS entry elsewhere in this document'.
- flush OPTIONAL_FILEHANDLE OPTIONAL_OPTIONS
Flushes a twig up to (and including) the current element, then deletes all unnecessary elements from the tree that's kept in memory. flush keeps track of which elements need to be opened/closed, so if you flush from handlers you don't have to worry about anything. Just keep flushing the twig every time you're done with a sub-tree and it will come out well-formed. After the whole parsing don't forget to flush one more time to print the end of the document. The doctype and entity declarations are also printed.

Use the Update_DTD option if you have updated the (internal) DTD and/or the entity list and you want the updated DTD to be output.

   Example: $t->flush( Update_DTD => 1);
            $t->flush( \*FILE, Update_DTD => 1);
            $t->flush( \*FILE);
flush takes an optional filehandle as an argument.
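
The flush-as-you-go pattern can be sketched like this, assuming XML::Twig is installed. An in-memory filehandle is used here only so the example is self-contained; it is an assumption of the sketch rather than something this page documents, and a regular file opened for writing would be used the same way. The $output and $flushes names are hypothetical.

```perl
use strict;
use warnings;
use XML::Twig;

my $output= '';
open( my $out, '>', \$output) or die "cannot open in-memory filehandle: $!";

my $flushes= 0;
my $t= XML::Twig->new(
    TwigHandlers => {
        # done with this section: write it out and free its memory
        section => sub { my( $twig, $section)= @_;
                         $flushes++;
                         $twig->flush( $out);
                       },
    },
);

$t->parse( '<doc><section>a</section><section>b</section></doc>');
$t->flush( $out);     # one last flush to output the end of the document
close $out;

print $output, "\n";
```

Without the final flush after the parse, the closing tag of the root element would be missing from the output.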
- purge
Does the same as a flush except it does not print the twig. It just deletes all elements that have been completely parsed so far.
- print OPTIONAL_FILEHANDLE OPTIONAL_OPTIONS
Prints the whole document associated with the twig. To be used only AFTER the parse.

OPTIONAL_OPTIONS: see flush.

- sprint OPTIONAL_OPTIONS
Returns the text of the whole document associated with the twig. To be used only AFTER the parse.

OPTIONAL_OPTIONS: see flush.

- print_prolog OPTIONAL_FILEHANDLE OPTIONAL_OPTIONS
Prints the prolog (XML declaration + DTD + entity declarations) of a document.

OPTIONAL_OPTIONS: see flush.

- prolog OPTIONAL_FILEHANDLE OPTIONAL_OPTIONS
Returns the prolog (XML declaration + DTD + entity declarations) of a document.

OPTIONAL_OPTIONS: see flush.

- finish
Calls the Expat finish method. Unsets all handlers (including internal ones that set context), but expat continues parsing to the end of the document or until it finds an error. It should finish up a lot faster than with the handlers set.
- finish_print
Stops twig processing, flushes the twig and proceeds to finish printing the document as fast as possible. Use this method when modifying a document and the modification is done.
- depth
Calls Expat's depth method, which returns the depth in the tree during the parsing. This is useful when using the TwigRoots option to still get info on the actual document.
- in_element(NAME)
Calls the Expat in_element method. Returns true if NAME is equal to the name of the innermost currently opened element. If namespace processing is being used and you want to check against a name that may be in a namespace, then use the generate_ns_name method to create the NAME argument. Useful when using the TwigRoots option.
- within_element(NAME)
Calls the Expat within_element method. Returns the number of times the given name appears in the context list. If namespace processing is being used and you want to check against a name that may be in a namespace, then use the generate_ns_name method to create the NAME argument. Useful when using the TwigRoots option.
- parse(SOURCE [, OPT => OPT_VALUE [...]])
This method is inherited from XML::Parser. The SOURCE parameter should either be a string containing the whole XML document, or it should be an open IO::Handle. Constructor options to XML::Parser::Expat given as keyword-value pairs may follow the SOURCE parameter. These override, for this call, any options or attributes passed through from the XML::Parser instance.

A die call is thrown if a parse error occurs. Otherwise it will return 1 or whatever is returned from the Final handler, if one is installed. In other words, what parse may return depends on the style.

- parsestring
This is just an alias for parse for backwards compatibility.
- parsefile(FILE [, OPT => OPT_VALUE [...]])
This method is inherited from XML::Parser. Opens FILE for reading, then calls parse with the open handle. The file is closed no matter how parse returns. Returns what parse returns.
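
Since parse and parsefile die on a parse error, a caller that wants to survive bad input has to trap the exception. A minimal sketch, assuming XML::Twig is installed; the malformed sample string and the variable names are illustrative.

```perl
use strict;
use warnings;
use XML::Twig;

my $t= XML::Twig->new();

# <para> is never closed, so the parser dies inside the eval
my $ok= eval { $t->parse( '<doc><para>unclosed</doc>'); 1 };
my $error= $@;

if( $ok) { print "document parsed\n"; }
else     { print "parse failed: $error\n"; }
```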
 

Element


- new ($gi, @content)
The gi is optional (but then you can't have any content); the content can be just a string or a list of strings and elements.

 Examples: my $elt1= new XML::Twig::Elt();
           my $elt2= new XML::Twig::Elt( 'para');
           my $elt3= new XML::Twig::Elt( 'para', 'this is a para');
           my $elt4= new XML::Twig::Elt( 'para', $elt3, 'another para');
The strings are not parsed, the element is not attached to any twig.
- parse ($string, %args)
Creates an element from an XML string. The string is actually parsed as a new twig, then the root of that twig is returned. The arguments in %args are passed to the twig. As always, if the parse fails the parser will die, so use an eval if you want to trap syntax errors.
- set_gi ($gi)
Sets the gi of an element
- gi
Returns the gi of the element
- is_pcdata
Returns 1 if the element is a #PCDATA one, returns 0 otherwise.
- is_cdata
Returns 1 if the element is a #CDATA one, returns 0 otherwise.
- closed
Returns true if the element has been closed. Might be useful if you are somewhere in the tree, during the parse, and have no idea whether a parent element is completely loaded or not.
- is_pcdata
Returns true if the element is a PCDATA (if its gi is '#PCDATA')
- pcdata
Returns the text of a PCDATA element or undef
- set_pcdata ($text)
Sets the text of a PCDATA element.
- append_pcdata ($text)
Adds the text at the end of a #PCDATA element.
- is_cdata
Returns true if the element is a CDATA (if its gi is '#CDATA')
- cdata
Returns the text of a CDATA element or undef
- set_cdata ($text)
Sets the text of a CDATA element.
- append_cdata ($text)
Adds the text at the end of a #CDATA element.
- root
Returns the root of the twig containing the element
- twig
Returns the twig containing the element.
- parent ($optional_gi)
Returns the parent of the element, or the first ancestor whose gi is $gi.
- first_child ($optional_gi)
Returns the first child of the element, or the first child whose gi is $gi (i.e. the first of the element's children whose gi matches).
- last_child ($optional_gi)
Returns the last child of the element, or the last child whose gi is $gi (i.e. the last of the element's children whose gi matches).
- prev_sibling ($optional_gi)
Returns the previous sibling of the element, or the first one whose gi is $gi.
- next_sibling ($optional_gi)
Returns the next sibling of the element, or the first one whose gi is $gi.
- atts
Returns a hash ref containing the element attributes
- set_atts ({ att1 => $att1_val, att2 => $att2_val... })
Sets the element attributes with the hash supplied as argument
- del_atts
Deletes all the element attributes.
- set_att ($att, $att_value)
Sets the attribute of the element to a value
- att ($att)
Returns the attribute value
- del_att ($att)
Deletes the attribute for the element
- set_id ($id)
Sets the id attribute of the element to a value. See `the elt_id entry elsewhere in this document' to change the id attribute name.
- id
Gets the id attribute value
- del_id ($id)
Deletes the id attribute of the element and removes it from the id list for the document
- children ($optional_gi)
Returns the list of children (optionally whose gi is $gi) of the element
- ancestors ($optional_gi)
Returns the list of ancestors (optionally whose gi is $gi) of the element
- next_elt ($optional_gi)
Returns the next elt (optionally whose gi is $gi) of the element. This is defined as the next element which opens after the current element opens, which usually means the first child of the element. Counter-intuitive as it might look, this allows you to loop through the whole document by starting from the root.
- prev_elt ($optional_gi)
Returns the previous elt (optionally whose gi is $gi) of the element. This is the first element which opens before the current one. So it's usually either the last descendant of the previous sibling or simply the parent.
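
The document-order traversal that next_elt and prev_elt give you can be sketched as follows, assuming XML::Twig is installed. The sample document deliberately contains no text, so only the four elements are visited; variable names are illustrative.

```perl
use strict;
use warnings;
use XML::Twig;

my $t= XML::Twig->new();
$t->parse( '<doc><a><b/></a><c/></doc>');

# walk the whole document, starting from the root
my @order;
for( my $elt= $t->root; defined $elt; $elt= $elt->next_elt)
  { push @order, $elt->gi; }
print join( ' ', @order), "\n";

# prev_elt of <c> is the last descendant of its previous sibling <a>
my $c= $t->root->last_child( 'c');
my $before_c= $c->prev_elt->gi;
print "before c: $before_c\n";
```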
- level ($optional_gi)
Returns the depth of the element in the twig (root is 0). If the optional gi is given then only ancestors of the given type are counted.

WARNING: in a tree created using the TwigRoots option this will not return the level in the document tree: level 0 will be the document root, level 1 will be the TwigRoots elements. During the parsing (in a TwigHandler) you can use the depth method on the twig object to get the real parsing depth.

- in ($potential_parent)
Returns true if the element is in the potential_parent
- in_context ($gi, $optional_level)
Returns true if the element is included in an element whose gi is $gi, within $level levels.
- cut
Cuts the element from the tree.
- paste ($optional_position, $ref)
Pastes a (previously cut) element. The optional position element can be:

- first_child (default)
The element is pasted as the first child of the $ref element
- last_child
The element is pasted as the last child of the $ref element
- before
The element is pasted before the $ref element, as its previous sibling
- after
The element is pasted after the $ref element, as its next sibling
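
A short sketch of cut and paste with explicit positions, assuming XML::Twig is installed. The sample document, the note element and the variable names are made up for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $t= XML::Twig->new();
$t->parse( '<doc><para>first</para><para>second</para></doc>');
my $root= $t->root;

# move the first para to the end of the document
my $first= $root->first_child( 'para');
$first->cut;
$first->paste( 'last_child', $root);

# create a new element and paste it just before the moved para
my $note= XML::Twig::Elt->new( 'note', 'added');
$note->paste( 'before', $first);

my $result= $root->sprint;
print "$result\n";
```

A freshly created element is not attached to any twig, so it can be pasted directly without being cut first.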

- move ($optional_position, $ref)
Moves an element in the tree. This is just a cut then a paste; the syntax is the same as paste.
- prefix ($text)
Adds a prefix to an element. If the element is a PCDATA element the text is added to the pcdata; if the element's first_child is a PCDATA then the text is added to its pcdata; otherwise a new PCDATA element is created and pasted as the first child of the element.
- erase
Erases the element: the element is deleted and all of its children are pasted in its place.
- delete
Cuts the element and frees its memory
- DESTROY
Frees the element from memory
- start_tag
Returns the string for the start tag for the element, including the /> at the end of an empty element tag
- end_tag
Returns the string for the end tag of an element, empty for an empty one.
- print OPTIONAL_FILEHANDLE
Prints an entire element, including the tags, optionally to a FILEHANDLE
- sprint ($elt, $optional_no_enclosing_tag)
Returns the string for an entire element, including the tags. To be used with caution! If the optional second argument is true then only the string inside the element is returned (the start and end tag for $elt are not).
- text
Returns a string consisting of all the PCDATA and CDATA in an element, without the tagging
- set_text ($string)
Sets the text for the element: if the element is a PCDATA, just sets its text; otherwise cuts all the children of the element and creates a single PCDATA child for it, which holds the text
- set_content (@list_of_elt_and_strings)
Sets the content for the element, from a list of strings and elements. Cuts all the element children, then pastes the list elements, creating a PCDATA element for strings.
- insert ($gi)
Inserts an element $gi as the only child of the element; all children of the element are set as children of the new element. Returns the new element
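
Several of the element methods above (elt_id, text, set_att, prefix, insert) can be combined as in this sketch, which assumes XML::Twig is installed. The sample document, the class attribute and the span element are hypothetical.

```perl
use strict;
use warnings;
use XML::Twig;

my $t= XML::Twig->new();
$t->parse( '<doc><para id="p1">some <b>bold</b> text</para></doc>');

# look the element up through the default 'id' attribute
my $para= $t->elt_id( 'p1');

# text returns the character data without the tagging
my $text= $para->text;

$para->set_att( class => 'intro');

# the first child is PCDATA, so the prefix is added to it
$para->prefix( 'Note: ');

# wrap the whole content of the para in a new span element
my $span= $para->insert( 'span');

print "$text\n";
$para->print;
print "\n";
```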
private methods

set_parent ( $parent)

set_first_child ( $first_child)

set_last_child ( $last_child)

set_prev_sibling ( $prev_sibling)

set_next_sibling ( $next_sibling)

set_twig_current

del_twig_current

twig_current

flushed
This method should NOT be used, always flush the twig, not an element
set_flushed

del_flushed

flush

Those methods should not be used, unless of course you find some creative and interesting, not to mention useful, ways to do it.

 

Entity_list


- new
Creates an entity list
- add ($ent)
Adds an entity to an entity list.
- delete ($ent or $gi).
Deletes an entity (defined by its name or by the Entity object) from the list.
- print (OPTIONAL_FILEHANDLE)
Prints the entity list
 

Entity


- new ($name, $val, $sysid, $pubid, $ndata)
Same arguments as the Entity handler for XML::Parser
- print (OPTIONAL_FILEHANDLE)
Prints an entity declaration
- text
Returns the entity declaration text
 

EXAMPLES

See the test file in t/test[1-n].t. Additional examples can be found at http://standards.ieee.org/resources/spasystem/twig/

To figure out what flush does, call the following script with an XML file and an element name as arguments:

  use XML::Twig;

  my ($file, $elt)= @ARGV;
  my $t= new XML::Twig( TwigHandlers =>
      { $elt => sub {$_[0]->flush; print "\n[flushed here]\n";} });
  $t->parsefile( $file, ErrorContext => 2);
  $t->flush;
  print "\n";
 

NOTES

 

DTD Handling

There are 3 possibilities here:
- No DTD
No doctype, no DTD information, no entity information, the world is simple...
- Internal DTD
The XML document includes an internal DTD, and maybe entity declarations

If you use the LoadDTD option when creating the twig the DTD information and the entity declarations can be accessed.

The DTD and the entity declarations will be flush'ed (or print'ed) either as is (if they have not been modified) or as reconstructed (poorly: comments are lost, order is not kept, and due to its content this DTD should not be viewed by anyone) if they have been modified. You can also modify them directly by changing the $twig->{twig_doctype}->{internal} field (straight from XML::Parser, see the Doctype handler doc).

- External DTD
The XML document includes a reference to an external DTD, and maybe entity declarations.

If you use the LoadDTD option when creating the twig the DTD information and the entity declarations can be accessed. The entity declarations will be flush'ed (or print'ed) either as is (if they have not been modified) or as reconstructed (badly: comments are lost, order is not kept).

You can change the doctype through the $twig->set_doctype method and print the dtd through the $twig->dtd_text or $twig->dtd_print methods.

If you need to modify the entity list this is probably the easiest way to do it.

 

Flush

If you set handlers and use flush, do not forget to flush the twig one last time AFTER the parsing, or you might be missing the end of the document.

Remember that element handlers are called when the element is CLOSED, so if you have handlers for nested elements the inner handlers will be called first. This makes it, for example, trickier than it would seem to number nested clauses.

BUGS


- ID list
The ID list is NOT updated at the moment when IDs are modified or elements cut or deleted.
- change_gi
Does not work if you do:
     $twig->change_gi( $old1, $new);
     $twig->change_gi( $old2, $new);
     $twig->change_gi( $new, $even_newer);
- sanity check on XML::Parser method calls
XML::Twig should really prevent calls to some XML::Parser methods, especially the setHandlers one.
 

TODO


- multiple twigs are not well supported
A number of twig features are just global at the moment. These include the ID list and the ``gi pool'' (if you use change_gi then you change the gi for ALL twigs).

Next version will try to support these while trying not to be too hard on performance (at least when a single twig is used!).

- XML::Parser-like handlers
Sometimes it would be nice to be able to use both XML::Twig handlers and XML::Parser handlers, for example to perform generic tasks on all open tags, like adding an ID, or taking care of the autonumbering.

Next version...

 

BENCHMARKS

You can use the `benchmark_twig' file to do additional benchmarks. Please send me benchmark information for additional systems.

AUTHOR

Michel Rodriguez <m.v.rodriguez@ieee.org>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Bug reports and comments to m.v.rodriguez@ieee.org. The XML::Twig page is at http://standards.ieee.org/resources/spasystem/twig/

SEE ALSO

XML::Parser


 

This document was created by man2html, using the manual pages.
 