MAN page from Fedora 20 perl-HTML-Parser-3.71-4.fc20.x86_64.rpm


Section: User Contributed Perl Documentation (3)
Updated: 2013-03-25


HTML::Entities - Encode or decode strings with HTML entities 


 use HTML::Entities;

 $a = "V&aring;re norske tegn b&oslash;r &#230res";
 decode_entities($a);
 encode_entities($a, "\200-\377");

For example, this:

 $input = "vis-à-vis Beyoncé's naïve\npapier-mâché résumé";
 print encode_entities($input), "\n"

Prints this out:

 vis-&agrave;-vis Beyonc&eacute;'s na&iuml;ve
 papier-m&acirc;ch&eacute; r&eacute;sum&eacute;


This module deals with encoding and decoding of strings with HTML character entities. The module provides the following functions:
decode_entities( $string, ... )
This routine replaces HTML entities found in the $string with the corresponding Unicode character. Unrecognized entities are left alone.

If multiple strings are provided as arguments, they are each decoded separately and the same number of strings is returned.

If called in void context the arguments are decoded in-place.

This routine is exported by default.
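
The calling contexts described above can be sketched as follows. This is a minimal illustration; the variable names and sample strings are assumptions chosen for the example, not part of the module.

```perl
use strict;
use warnings;
use HTML::Entities;

# Scalar context: a decoded copy is returned, the input is left untouched.
# "&nosuch;" is not a known entity, so it is left alone.
my $html  = "caf&eacute; &amp; bar &nosuch;";
my $plain = decode_entities($html);

# List context: each argument is decoded separately.
my @decoded = decode_entities("&lt;b&gt;", "&quot;x&quot;");

# Void context: the argument itself is modified in place.
my $in_place = "1 &lt; 2";
decode_entities($in_place);

print "$in_place\n";    # prints "1 < 2"
print "@decoded\n";     # prints: <b> "x"
```

Note that the void-context form needs a modifiable variable; passing a string literal there is not useful, since there is nothing to receive the result.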

_decode_entities( $string, \%entity2char )
_decode_entities( $string, \%entity2char, $expand_prefix )
This replaces HTML entities in $string in place. The %entity2char hash must be provided. Named entities not found in the %entity2char hash are left alone. Numeric entities are expanded unless their value overflows.

The keys in %entity2char are the entity names to be expanded and their values are what they should expand into. The values do not have to be single-character strings. If a key has ";" as a suffix, then occurrences in $string are only expanded if properly terminated with ";". Entities without ";" will be expanded regardless of how they are terminated, for compatibility with how common browsers treat entities in the Latin-1 range.

If $expand_prefix is TRUE, then entities without a trailing ";" in %entity2char will even be expanded as a prefix of a longer unrecognized name. The longest matching name in %entity2char will be used. This is mainly present for compatibility with an MSIE misfeature.

   $string = "foo&nbspbar";
   _decode_entities($string, { nb => "@", nbsp => "\xA0" }, 1);
   print $string;  # will print "foo bar"

This routine is exported by default.
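
To illustrate the ";" suffix rule, here is a small sketch with a made-up entity table. The table contents and the sample text are hypothetical; only _decode_entities itself comes from the module.

```perl
use strict;
use warnings;
use HTML::Entities;    # _decode_entities is exported by default

# Hypothetical table: "copy;" must be terminated with ";" to match,
# while "reg" (no ";" suffix) matches however it is terminated.
my %table = (
    'copy;' => "(c)",
    'reg'   => "(R)",
);

my $text = "&copy; 2006 &copy 2006 &reg; &reg &amp; &#65;";
_decode_entities($text, \%table);

# "&copy" (unterminated) and "&amp;" (not in the table) are left alone;
# the numeric entity "&#65;" is expanded without consulting the table.
print "$text\n";    # prints "(c) 2006 &copy 2006 (R) (R) &amp; A"
```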

encode_entities( $string )
encode_entities( $string, $unsafe_chars )
This routine replaces unsafe characters in $string with their entity representation. A second argument can be given to specify which characters to consider unsafe. The unsafe characters are specified using the regular expression character class syntax (what you find within brackets in regular expressions).

The default set of characters to encode consists of control chars, high-bit chars, and the "<", "&", ">", "'" and '"' characters. But this, for example, would encode just the "<", "&", ">", and '"' characters:

  $encoded = encode_entities($input, '<>&"');

and this would encode only non-plain-ASCII characters (plus "&", which is left out of the safe ranges):

  $encoded = encode_entities($input, '^\n\x20-\x25\x27-\x7e');

This routine is exported by default.
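
The effect of the second argument can be seen in a short sketch. The sample strings and variable names are assumptions for illustration; the character classes are the ones discussed above.

```perl
use strict;
use warnings;
use HTML::Entities;

my $input = qq{<a href="x">R&D</a>};

# Default unsafe set: markup characters and quotes are all encoded.
my $default = encode_entities($input);

# Restricted set: only "<", ">" and "&" are encoded, quotes pass through.
my $minimal = encode_entities($input, '<>&');

# Negated class: everything outside the listed safe ranges is encoded.
my $high = encode_entities("caf\xE9 & <tea>", '^\n\x20-\x25\x27-\x7e');

print "$default\n";    # prints: &lt;a href=&quot;x&quot;&gt;R&amp;D&lt;/a&gt;
print "$minimal\n";    # prints: &lt;a href="x"&gt;R&amp;D&lt;/a&gt;
print "$high\n";       # prints: caf&eacute; &amp; <tea>
```

Because the second argument is interpolated into a character class, a leading "^" negates the set, exactly as it would inside brackets in a regular expression.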

encode_entities_numeric( $string )
encode_entities_numeric( $string, $unsafe_chars )
This routine works just like encode_entities, except that the replacement entities are always "&#xhexnum;" and never "&entname;". For example, encode_entities("r\xF4le") returns "r&ocirc;le", but encode_entities_numeric("r\xF4le") returns "r&#xF4;le".

This routine is not exported by default. But you can always export it with "use HTML::Entities qw(encode_entities_numeric);" or even "use HTML::Entities qw(:DEFAULT encode_entities_numeric);"

All these routines modify the string passed as the first argument, if called in a void context. In scalar and array contexts, the encoded or decoded string is returned (without changing the input string).
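
A brief sketch of the named versus numeric forms, and of the void-context behaviour, assuming the import line suggested above. The variable names are illustrative.

```perl
use strict;
use warnings;
use HTML::Entities qw(:DEFAULT encode_entities_numeric);

my $word = "r\xF4le";

# Scalar context: both calls return a copy and leave $word unchanged.
my $named   = encode_entities($word);
my $numeric = encode_entities_numeric($word);

# Void context: $buf itself is rewritten.
my $buf = "a < b";
encode_entities_numeric($buf);

print "$named\n";      # prints "r&ocirc;le"
print "$numeric\n";    # prints "r&#xF4;le"
print "$buf\n";        # prints "a &#x3C; b"
```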

If you prefer not to import these routines into your namespace, you can call them as:

  use HTML::Entities ();
  $decoded = HTML::Entities::decode($a);
  $encoded = HTML::Entities::encode($a);
  $encoded = HTML::Entities::encode_numeric($a);

The module can also export the %char2entity and the %entity2char hashes, which contain the mapping from all characters to the corresponding entities (and vice versa, respectively).
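
A minimal sketch of looking things up in the two hashes. Since keys in %entity2char may or may not carry a ";" suffix (as described for _decode_entities), the helper below tries both forms; the helper name lookup_entity is hypothetical and not part of the module.

```perl
use strict;
use warnings;
use HTML::Entities qw(%char2entity %entity2char);

# Character to entity: values are complete entities such as "&lt;".
my $lt_entity = $char2entity{'<'};

# Entity name to character. Hypothetical helper: try the bare name
# first, then the name with a ";" suffix.
sub lookup_entity {
    my ($name) = @_;
    return $entity2char{$name} // $entity2char{"$name;"};
}

my $lt_char = lookup_entity('lt');
my $missing = lookup_entity('nosuch');    # undef: no such entity name

print "$lt_entity\n";    # prints "&lt;"
print "$lt_char\n";      # prints "<"
```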


Copyright 1995-2006 Gisle Aas. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.



