MAN page from Mandrake 10.X perl-XMLTV-0.5.39-1mdk.noarch.rpm


Section: User Contributed Perl Documentation (3)
Updated: 2004-01-07


XMLTV - Perl extension to read and write TV listings in XMLTV format 


  use XMLTV;
  my $data = XMLTV::parsefile('tv.xml');
  my ($encoding, $credits, $ch, $progs) = @$data;
  my $langs = [ 'en', 'fr' ];
  print 'source of listings is: ', $credits->{'source-info-name'}, "\n"
      if defined $credits->{'source-info-name'};
  foreach (values %$ch) {
      my ($text, $lang) = @{XMLTV::best_name($langs, $_->{'display-name'})};
      print "channel $_->{id} has name $text\n";
      print " language $lang\n" if defined $lang;
  }
  foreach (@$progs) {
      print "programme on channel $_->{channel} at time $_->{start}\n";
      next if not defined $_->{desc};
      foreach (@{$_->{desc}}) {
          my ($text, $lang) = @$_;
          print "has description $text\n";
          print " language $lang\n" if defined $lang;
      }
  }

The value of $data will be something a bit like:

  [ 'UTF-8',
    { 'source-info-name' => 'Ananova', 'generator-info-name' => 'XMLTV' },
    { '' => { 'display-name' => [ [ 'BBC Radio 4', 'en'  ],
                                  [ 'Radio 4',     'en'  ],
                                  [ '4',           undef ] ],
              'id' => '' },
      ... },
    [ { start => '200111121800', title => [ [ 'Simpsons', 'en' ] ],
        channel => '' },
      ... ] ]


This module provides an interface to read and write files in XMLTV format (a TV listings format defined by xmltv.dtd). In general element names in the XML correspond to hash keys in the Perl data structure. You can think of this module as a bit like XML::Simple, but specialized to the XMLTV file format.

The Perl data structure corresponding to an XMLTV file has four elements. The first gives the character encoding used for text data, typically UTF-8 or ISO-8859-1. (The encoding value could also be undef meaning 'unknown', when the library can't work out what it is.) The second element gives the attributes of the root <tv> element, which give information about the source of the TV listings. The third element holds the channels: a hash keyed by channel id (the synopsis iterates over it with values %$ch), each value being a hash corresponding to one <channel> element. The fourth element is a list of programmes, each a hash corresponding to one <programme> element. More details about the data structure are given later. The easiest way to find out what it looks like is to load some small XMLTV files and use Data::Dumper to print out the resulting structure.
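The four-element layout can be explored without any listings file at hand. The following is only a sketch: the structure is built by hand rather than returned by the module, the channel id 'example-channel' and the text values are made up, and the channels are assumed to be a hash keyed by id, as the synopsis suggests.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hand-built stand-in for what XMLTV::parsefile() might return.
# 'example-channel' and the text values are hypothetical.
my $data = [
    'UTF-8',                                   # 1: character encoding
    { 'generator-info-name' => 'XMLTV' },      # 2: attributes of <tv>
    { 'example-channel' => {                   # 3: channels, keyed by id
          id             => 'example-channel',
          'display-name' => [ [ 'Example One', 'en' ] ],
      } },
    [ { channel => 'example-channel',          # 4: programmes, in order
        start   => '200111121800',
        title   => [ [ 'Simpsons', 'en' ] ] } ],
];

my ($encoding, $credits, $channels, $programmes) = @$data;

$Data::Dumper::Sortkeys = 1;   # stable key order in the dump
$Data::Dumper::Indent   = 1;   # compact indentation
print Dumper($data);
```

Swapping the hand-built value for the result of a real parse should give a dump of the same overall shape.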


parse(document)
Takes an XMLTV document (a string) and returns the Perl data structure. It is assumed that the document is valid XMLTV; if not the routine may die() with an error (although the current implementation just warns and continues for most small errors).

The first element of the listref returned, the encoding, may vary according to the encoding of the input document, the versions of perl and "XML::Parser" installed, the configuration of the XMLTV library and other factors including, but not limited to, the phase of the moon. With luck it should always be either the encoding of the input file or UTF-8.

Attributes and elements in the XML file whose names begin with 'x-' are skipped silently. You can use these to include information which is not currently handled by the XMLTV format, or by this module.

parsefiles(filename...)
Like "parse()" but takes one or more filenames instead of a string document. The data returned is the merging of those file contents: the programmes will be concatenated in their original order, the channels just put together in arbitrary order (ordering of channels should not matter).

It is necessary that each file have the same character encoding; if not, an exception is thrown. Ideally the credits information would also be the same between all the files, since there is no obvious way to merge it - but if the credits information differs from one file to the next, one file is picked arbitrarily to provide credits and a warning is printed. If two files give differing channel definitions for the same XMLTV channel id, then one is picked arbitrarily and a warning is printed.

In the simple case, with just one file, you needn't worry about mismatching of encodings, credits or channels.

The deprecated function "parsefile()" is a wrapper allowing just one filename.

parse_callback(document, encoding_callback, credits_callback, channel_callback, programme_callback)
An alternative interface. Whereas "parse()" reads the whole document and then returns a finished data structure, with this routine you specify a subroutine to be called as each <channel> element is read and another for each <programme> element.

The first argument is the document to parse. The remaining arguments are code references, one for each part of the document.

The callback for encoding will be called once with a string giving the encoding. In present releases of this module, it is also possible for the value to be undefined meaning 'unknown', but it's hoped that future releases will always be able to figure out the encoding used.

The callback for credits will be called once with a hash reference. For channels and programmes, the appropriate function will be called zero or more times depending on how many channels / programmes are found in the file.

The four subroutines will be called in order, that is, the encoding and credits will be done before the channel handler is called and all the channels will be dealt with before the first programme handler is called.

If any of the code references is undef, nothing is called for that part of the file.

For backwards compatibility, if the value for 'encoding callback' is not a code reference but a scalar reference, then the encoding found will be stored in that scalar. Similarly if the 'credits callback' is a scalar reference, the scalar it points to will be set to point to the hash of credits. This style of interface is deprecated: new code should just use four callbacks.

For example:

    my $document = '<tv>...</tv>';

    my $encoding;
    sub encoding_cb( $ ) { $encoding = shift }

    my $credits;
    sub credits_cb( $ ) { $credits = shift }

    # The callback for each channel populates this hash.
    my %channels;
    sub channel_cb( $ ) {
        my $c = shift;
        $channels{$c->{id}} = $c;
    }

    # The callback for each programme.  We know that channels are
    # always read before programmes, so the %channels hash will be
    # fully populated.
    #
    sub programme_cb( $ ) {
        my $p = shift;
        print "got programme: $p->{title}->[0]->[0]\n";
        my $c = $channels{$p->{channel}};
        print 'channel name is: ', $c->{'display-name'}->[0]->[0], "\n";
    }

    # Let's go.
    XMLTV::parse_callback($document, \&encoding_cb, \&credits_cb,
                          \&channel_cb, \&programme_cb);
parsefiles_callback(encoding_callback, credits_callback, channel_callback, programme_callback, filenames...)
As "parse_callback()" but takes one or more filenames to open, merging their contents in the same manner as "parsefiles()". Note that the reading is still gradual - you get the channels and programmes one at a time, as they are read.

Note that the same <channel> may be present in more than one file, so the channel callback will get called more than once. It's your responsibility to weed out duplicate channel elements (since writing them out again requires that each have a unique id).

For compatibility, there is an alias "parsefile_callback()" which is the same but takes only a single filename, before the callback arguments. This is deprecated.
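One way to weed out duplicate channels is to have the channel callback remember which ids it has already seen. The sketch below does not call the module at all: it invokes a hypothetical callback by hand with made-up channel hashes, in the way two overlapping input files might trigger it, and keeps the first definition of each id.

```perl
use strict;
use warnings;

my %channels;        # first definition seen for each channel id
my @duplicates;      # ids seen more than once, for reporting

# Channel callback that keeps only the first definition of each id.
sub channel_cb {
    my $c = shift;
    if (exists $channels{ $c->{id} }) {
        push @duplicates, $c->{id};
        return;
    }
    $channels{ $c->{id} } = $c;
}

# Simulate the calls that two input files defining the same channel
# would trigger. The ids and names are hypothetical.
channel_cb({ id => 'chan-a', 'display-name' => [ [ 'A', 'en' ] ] });
channel_cb({ id => 'chan-b', 'display-name' => [ [ 'B', 'en' ] ] });
channel_cb({ id => 'chan-a', 'display-name' => [ [ 'A again', 'en' ] ] });

print "kept: ", join(', ', sort keys %channels), "\n";
print "dropped duplicates: @duplicates\n";
```

Keeping the first definition is an arbitrary choice here; a real program might prefer to compare the two definitions and warn when they differ.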

write_data(data, options...)
Takes a data structure and writes it as XML to standard output. Any extra arguments are passed on to XML::Writer's constructor, for example

    my $f = new IO::File '>out.xml'; die if not $f;
    write_data($data, OUTPUT => $f);

The encoding used for the output is given by the first element of the data.

Normally, there will be a warning for any Perl data which is not understood and cannot be written as XMLTV, such as strange keys in hashes. But as an exception, any hash key beginning with an underscore will be skipped over silently. You can store 'internal use only' data this way.

If a programme or channel hash contains a key beginning with 'debug', this key and its value will be written out as a comment inside the <programme> or <channel> element. This lets you include small debugging messages in the XML output.
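The three kinds of treatment for hash keys can be summarized in a few lines of Perl. This is an illustration of the rules as stated above, not the module's own code: the helper classify_key() and the small set of 'known' keys are hypothetical.

```perl
use strict;
use warnings;

# Keys this sketch treats as understood; a small hypothetical subset.
my %known = map { $_ => 1 } qw(channel start title desc);

# Classify a hash key according to the rules described above.
sub classify_key {
    my $key = shift;
    return 'skipped silently'   if $key =~ /^_/;
    return 'written as comment' if $key =~ /^debug/;
    return 'written as XML'     if $known{$key};
    return 'warning';
}

my %prog = (
    channel      => 'example-channel',
    start        => '200203161500',
    title        => [ [ 'News', 'en' ] ],
    _fetched_at  => 1016290800,            # internal bookkeeping
    'debug-note' => 'scraped from page 3', # would become a comment
    mood         => 'cheerful',            # not an XMLTV key
);

printf "%-12s %s\n", $_, classify_key($_) for sort keys %prog;
```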

best_name(languages, pairs [, comparator])
The XMLTV format contains many places where human-readable text is given an optional 'lang' attribute, to allow mixed languages. This is represented in Perl as a pair [ text, lang ], although the second element may be missing or undef if the language is unknown. When several alternatives for an element (such as <title>) can be given, the representation is a list of [ text, lang ] pairs. Given such a list, what is the best text to use? It depends on the user's preferred language.

This function takes a list of acceptable languages and a list of [ string, language ] pairs, and finds the best one to use. This means first finding the appropriate language and then picking the 'best' string in that language.

The best is normally defined as the first one found in a usable language, since the XMLTV format puts the most canonical versions first. But you can pass in your own comparison function, for example if you want to choose the shortest piece of text that is in an acceptable language.

The acceptable languages should be a reference to a list of language codes looking like 'ru', or like 'de_DE'. The text pairs should be a reference to a list of pairs [ string, language ]. (As a special case if this list is empty or undef, that means no text is present, and the result is undef.) The third argument if present should be a cmp-style function that compares two strings of text and returns 1 if the first argument is better, -1 if the second is better, 0 if they're equally good.

Returns: [s, l] pair, where s is the best of the strings to use and l is its language. This pair is 'live' - it is one of those from the list passed in. So you can use "best_name()" to find the best pair from a list and then modify the content of that pair.

(This routine depends on the "Lingua::Preferred" module being installed; if that module is missing then the first available language is always chosen.)


    my $langs = [ 'de', 'fr' ]; # German or French, please

    # Say we found the following under $p->{title} for a programme $p.
    my $pairs = [ [ 'La Cité des enfants perdus', 'fr' ],
                  [ 'The City of Lost Children', 'en_US' ] ];

    my $best = best_name($langs, $pairs);
    print "chose title $best->[0]\n";
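
The optional comparator can be tried out on its own before it is handed to "best_name()". The sketch below defines a hypothetical prefer_shorter() following the return convention described above (1 if the first string is better, -1 if the second is, 0 if equal) and applies it by hand to a few candidate titles, which are assumed to be in an acceptable language already.

```perl
use strict;
use warnings;

# Comparator in the style described above: 1 if the first string is
# better (here: shorter), -1 if the second is, 0 if equally good.
sub prefer_shorter {
    my ($first, $second) = @_;
    return length($second) <=> length($first);
}

# Stand-alone check of the comparator on some candidate titles.
my @titles = ('The City of Lost Children', 'Lost Children', 'City');
my $best = $titles[0];
for my $t (@titles[1 .. $#titles]) {
    $best = $t if prefer_shorter($t, $best) > 0;
}
print "shortest title: $best\n";

# With XMLTV loaded it would be passed as the third argument:
#   my $pair = XMLTV::best_name($langs, $pairs, \&prefer_shorter);
```
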
list_channel_keys(), list_programme_keys()
Some users of this module may wish to enquire at runtime about which keys a programme or channel hash can contain. The data in the hash comes from the attributes and subelements of the corresponding element in the XML. The values of attributes are simply stored as strings, while subelements are processed with a handler which may return a complex data structure. These subroutines return a hash mapping key to handler name and multiplicity. This lets you know what data types can be expected under each key. For keys which come from attributes rather than subelements, the handler is set to 'scalar', just as for subelements which give a simple string. See "DATA STRUCTURE" for details on what the different handler names mean.

It is not possible to find out which keys are mandatory and which optional, only a list of all those which might possibly be present. An example use of these routines is the tv_grep(1) program, which creates its allowed command line arguments from the names of programme subelements.
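
A program can use this kind of mapping to decide, key by key, whether to expect a list or a single value. In the sketch below the hash is written by hand from the handler tables in this page rather than obtained from "list_programme_keys()", and the [ handler, multiplicity ] pair layout of each value is an assumption; holds_list() is a hypothetical helper.

```perl
use strict;
use warnings;

# Hand-written stand-in for part of what list_programme_keys() might
# return. The [ handler, multiplicity ] pair layout is an assumption.
my %keys = (
    title    => [ 'with-lang', '+' ],
    date     => [ 'scalar',    '?' ],
    category => [ 'with-lang', '*' ],
    new      => [ 'presence',  '?' ],
);

# Decide from the multiplicity whether a key holds a list or one value.
sub holds_list {
    my $mult = shift;
    return ($mult eq '*' || $mult eq '+') ? 1 : 0;
}

for my $k (sort keys %keys) {
    my ($handler, $mult) = @{ $keys{$k} };
    printf "%-9s handler=%-10s %s\n", $k, $handler,
        holds_list($mult) ? 'list of values' : 'single value';
}
```
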

catfiles(w_args, filename...)
Concatenate several listings files, writing the output to somewhere specified by "w_args". Programmes are catenated together, channels are merged, for credits we just take the first and warn if the others differ.

The first argument is a hash reference giving information to pass to "XMLTV::Writer"'s constructor. But do not specify encoding; this will be taken from the input files. Currently "catfiles()" will fail if the input files have different encodings.

cat(data, ...)
Concatenate (and merge) listings data. Programmes are catenated together, channels are merged, for credits we just take the first and warn if the others differ (except that the 'date' of the result is the latest date of all the inputs).

Whereas "catfiles()" reads and writes files, this function takes already-parsed listings data and returns some more listings data. It is much more memory-hungry.

cat_noprogrammes(data, ...)
Like "cat()" but ignores the programme data and just returns encoding, credits and channels. This is in case, for scalability reasons, you want to handle programmes individually but still merge the smaller data.


DATA STRUCTURE

For completeness, we describe more precisely how channels and programmes are represented in Perl. Each channel is represented by a hashref corresponding to one <channel> element, and likewise each programme by a hashref corresponding to one <programme> element. The possible keys of a channel (programme) hash are the names of attributes or subelements of <channel> (<programme>).

The values for attributes are not processed in any way; an attribute fred="jim" in the XML will become a hash element with key 'fred', value 'jim'.

But for subelements, there is further processing needed to turn the XML content of a subelement into Perl data. What is done depends on what type of data is stored under that subelement. Also, if a certain element can appear several times then the hash key for that element points to a list of values rather than just one.

The conversion of a subelement's content to and from Perl data is done by a handler. The most common handler is with-lang, used for human-readable text content plus an optional 'lang' attribute. There are other handlers for other data structures in the file format. Often two subelements will share the same handler, since they hold the same type of data. The handlers defined are as follows; note that many of them will silently strip leading and trailing whitespace in element content. Look at the DTD itself for an explanation of the whole file format.

Unless specified otherwise, it is not allowed for an element expected to contain text to have empty content, nor for the text to contain newline characters.

credits
Turns a list of credits (for director, actor, writer, etc.) into a hash mapping 'role' to a list of names. The names in each role are kept in the same order.

scalar
Reads and writes a simple string as the content of the XML element.

length
Converts the content of a <length> element into a number of seconds (so <length units="minutes">5</length> would be returned as 300). On writing out again tries to convert a number of seconds to a time in minutes or hours if that would look better.

episode-num
The representation in Perl of XMLTV's odd episode numbers is as a pair of [ content, system ]. As specified by the DTD, if the system is not given in the file then 'onscreen' is assumed. Whitespace in the 'xmltv_ns' system is unimportant, so on reading it is normalized to a single space on either side of each dot.

video
The <video> section is converted to a hash. The <present> subelement corresponds to the key 'present' of this hash; 'yes' and 'no' are converted to Booleans. The same applies to <colour>. The content of the <aspect> subelement is stored under the key 'aspect'. These keys can be missing in the hash just as the subelements can be missing in the XML.

audio
This is similar to video. <present> is a Boolean value, while the content of <stereo> is stored unchanged.

previously-shown
The 'start' and 'channel' attributes are converted to keys in a hash.

presence
The content of the element is ignored: it signifies something by its very presence. So the conversion from XML to Perl is a constant true value whenever the element is found; the conversion from Perl to XML is to write out the element if true, and to write nothing if false.

subtitles
The 'type' attribute and the 'language' subelement (both optional) become keys in a hash. But see language for what to pass as the value of that element.

rating
The rating is represented as a tuple of [ rating, system, icons ]. The last element is itself a listref of structures returned by the icon handler.

star-rating
In XML this is a string 'X/Y' plus a list of icons. In Perl it is represented as a pair [ rating, icons ] similar to rating.

icon
An icon in XMLTV files is like the <img> element in HTML. It is represented in Perl as a hashref with 'src' and optionally 'width' and 'height' keys.

with-lang
In XML something like title can be either <title>Foo</title> or <title lang="en">Foo</title>. In Perl these are stored as [ 'Foo' ] and [ 'Foo', 'en' ]. For the former [ 'Foo', undef ] would also be okay.

This handler also has two modifiers which may be added to the name after '/'. /e means that empty text is allowed, and will be returned as the empty tuple [], to mean that the element is present but has no text. When writing with /e, undef will also be understood as present-but-empty. You cannot however specify a language if the text is empty.

The modifier /m means that the text is allowed to span multiple lines.

So for example with-lang/em is a handler for text with language, where the text may be empty and may contain newlines. Note that the with-lang-or-empty of earlier releases has been replaced by with-lang/e.
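
Two of the conversions described for the handlers above, the length in seconds and the whitespace normalization of 'xmltv_ns' episode numbers, are simple enough to sketch in plain Perl. These helpers are hypothetical illustrations of the described behaviour, not the module's own code, and the set of accepted units is an assumption.

```perl
use strict;
use warnings;

# Seconds per unit; this unit list is an assumption of the sketch.
my %seconds_per = (seconds => 1, minutes => 60, hours => 3600);

# Convert the content of a <length> element to a number of seconds.
sub length_to_seconds {
    my ($value, $units) = @_;
    die "unknown units '$units'\n" unless exists $seconds_per{$units};
    return $value * $seconds_per{$units};
}

# Put exactly one space on either side of each dot in an 'xmltv_ns'
# episode number.
sub normalize_xmltv_ns {
    my $s = shift;
    $s =~ s/\s*\.\s*/ . /g;
    return $s;
}

print length_to_seconds(5, 'minutes'), "\n";
print normalize_xmltv_ns('1.4 .0/2'), "\n";
```
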

Now, which handlers are used for which subelements (keys) of channels and programmes? And what is the multiplicity (should you expect a single value or a list of values)?

The following tables map subelements of <channel> and of <programme> to the handlers used to read and write them. Many elements have their own handler with the same name, and most of the others use with-lang. The third column specifies the multiplicity of the element: * (any number) will give a list of values in Perl, + (one or more) will give a nonempty list, ? (maybe one) will give a scalar, and 1 (exactly one) will give a scalar which is not undef.

Handlers for <channel>

  Element            Handler            Multiplicity
  display-name       with-lang          +
  icon               icon               *
  url                scalar             *

Handlers for <programme>

  Element            Handler            Multiplicity
  title              with-lang          +
  sub-title          with-lang          *
  desc               with-lang/m        *
  credits            credits            ?
  date               scalar             ?
  category           with-lang          *
  language           with-lang          ?
  orig-language      with-lang          ?
  length             length             ?
  icon               icon               *
  url                scalar             *
  country            with-lang          *
  episode-num        episode-num        *
  video              video              ?
  audio              audio              ?
  previously-shown   previously-shown   ?
  premiere           with-lang/em       ?
  last-chance        with-lang/em       ?
  new                presence           ?
  subtitles          subtitles          *
  rating             rating             *
  star-rating        star-rating        ?
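
To make the tables concrete, here is a hand-built programme hash with values shaped as the handler descriptions suggest. It is only a sketch: all the values are invented sample data, and only a few of the possible keys are shown.

```perl
use strict;
use warnings;

# Hand-built programme hash; every value is hypothetical sample data.
my %prog = (
    channel       => 'example-channel',                # attribute, plain string
    start         => '200203161500',                   # attribute, plain string
    title         => [ [ 'News', 'en' ] ],             # with-lang, +
    desc          => [ [ "Headlines.\nWeather.", 'en' ] ],     # with-lang/m, *
    date          => '2002',                           # scalar, ?
    credits       => { actor => [ 'A. Actor', 'B. Actor' ] },  # credits, ?
    length        => 1800,                             # length, ? (seconds)
    'episode-num' => [ [ 'Episode 12', 'onscreen' ] ], # episode-num, *
    video         => { colour => 1, aspect => '16:9' },        # video, ?
    new           => 1,                                # presence, ?
);

# Keys with multiplicity + or * hold lists; ? holds a single value.
for my $pair (@{ $prog{title} }) {
    my ($text, $lang) = @$pair;
    print "title: $text", (defined $lang ? " ($lang)" : ''), "\n";
}
print "length: ", $prog{length} / 60, " minutes\n";
print "actors: ", join(', ', @{ $prog{credits}{actor} }), "\n";
print "new programme\n" if $prog{new};
```
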

At present, no parsing or validation on dates is done because dates may be partially specified in XMLTV. For example '2001' means that the year is known but not the month, day or time of day. Maybe in the future dates will be automatically converted to and from Date::Manip objects. For now they just use the scalar handler. Similar remarks apply to URLs.
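
Since dates arrive as plain strings, a caller that needs the individual fields has to split them itself. The following sketch shows one way to cope with partially specified dates; split_partial_date() is a hypothetical helper, it assumes the digits-only layout seen in the examples in this page (year first, then month, day, hour, minute, second), and it does not attempt to handle time zone suffixes.

```perl
use strict;
use warnings;

# Split a possibly partial date such as '2001' or '200111121800' into
# named fields. Fields that are not given are left out of the result.
# Returns undef if the string does not look like such a date.
sub split_partial_date {
    my $date = shift;
    $date =~ /^(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?$/
        or return;
    my @names  = qw(year month day hour minute second);
    my @values = ($1, $2, $3, $4, $5, $6);
    my %parts;
    for my $i (0 .. $#names) {
        $parts{ $names[$i] } = $values[$i] if defined $values[$i];
    }
    return \%parts;
}

for my $d ('2001', '200111121800') {
    my $p = split_partial_date($d);
    print "$d: ", join(', ', map { "$_=$p->{$_}" } sort keys %$p), "\n";
}
```
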


When reading a file you have the choice of using "parse()" to gulp the whole file and return a data structure, or using "parse_callback()" to get the programmes one at a time, although channels and other data are still read all at once.

There is a similar choice when writing data: the "write_data()" routine prints a whole XMLTV document at once, but if you want to write an XMLTV document incrementally you can manually create an "XMLTV::Writer" object and call methods on it. Synopsis:

  use XMLTV;
  my $w = new XMLTV::Writer();
  $w->comment("Hello from XML::Writer's comment() method");
  $w->start({ 'generator-info-name' => 'Example code in pod' });
  my %ch = (id => 'test-channel', 'display-name' => [ [ 'Test', 'en' ] ]);
  $w->write_channel(\%ch);
  my %prog = (channel => 'test-channel', start => '200203161500',
              title => [ [ 'News', 'en' ] ]);
  $w->write_programme(\%prog);
  $w->end();

XMLTV::Writer inherits from XML::Writer, and provides the following extra or overridden methods:

new(), the constructor
Creates an XMLTV::Writer object and starts writing an XMLTV file, printing the DOCTYPE line. Arguments are passed on to XML::Writer's constructor, except that the 'encoding' key if present gives the XML character encoding. For example:

    my $w = new XMLTV::Writer(encoding => 'ISO-8859-1');

If encoding is not specified, XML::Writer's default is used (currently UTF-8).

start()
Write the start of the <tv> element. Parameter is a hashref which gives the attributes of this element.

write_channels()
Write several channels at once. Parameter is a reference to a hash mapping channel id to channel details. They will be written sorted by id, which is reasonable since the order of channels in an XMLTV file isn't significant.

write_channel()
Write a single channel. You can call this routine if you want, but most of the time "write_channels()" is a better interface.

write_programme()
Write details for a single programme as XML.

end()
Say you've finished writing programmes. This ends the <tv> element and the file.


Ed Avis


The file format is defined by the DTD xmltv.dtd, which is included in the xmltv package along with this module. It should be installed in your system's standard place for SGML and XML DTDs.

The xmltv package has a web page at <> which carries information about the file format and the various tools and apps which are distributed with this module.


