MAN page from PLD 2UTF-1.22-7.i386.rpm


Section: User Manuals (1)
Updated: 27 September 2000


2UTF, fromUTF - translates legacy char-sets to/from unicode(7), decodes MIME messages


[2UTF|fromUTF] [options] [charmap_file_or_alias] <input >output


2UTF is a filter which converts legacy char-sets to unicode(7) (UCS - Universal Character Set) and the reverse if possible. It can also display char-maps, the Linux console font and a range of unicode(7) glyphs in utf-8(7) encoding. 2UTF uses the iconv(3) library, but it can still get char-maps for single-byte legacy char-sets from tables found at database, or from other similar files with a user-defined format. It can invoke external filters specified in the configuration file.
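The basic conversion can be illustrated loosely with Python's built-in codecs standing in for iconv(3). This is a sketch, not 2UTF's code: the helper names to_utf8 and from_utf8 are hypothetical, and Python's codec tables are assumed to match the legacy char-sets in question. The reverse direction only works "if possible", which shows up here as an error for characters the target char-set lacks.

```python
def to_utf8(data: bytes, charset: str) -> bytes:
    """Decode legacy single-byte text and re-encode it as UTF-8.

    Undecodable bytes become U+FFFD, the default error character
    mentioned in this manual page.
    """
    return data.decode(charset, errors="replace").encode("utf-8")


def from_utf8(data: bytes, charset: str) -> bytes:
    """Convert UTF-8 back to a legacy char-set.

    Raises UnicodeEncodeError when a character has no mapping in the
    target char-set, i.e. when the reverse conversion is not possible.
    """
    return data.decode("utf-8").encode(charset)


if __name__ == "__main__":
    latin3 = b"\xa1"  # U+0126 LATIN CAPITAL LETTER H WITH STROKE in ISO 8859-3
    utf8 = to_utf8(latin3, "iso8859_3")
    print(utf8)                          # b'\xc4\xa6'
    print(from_utf8(utf8, "iso8859_3"))  # b'\xa1'
```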

charmap_file_or_alias is the pathname, alias or filename of the file with the char-map definition. An alias or filename is converted to uppercase. Aliases specified in locale char-maps are cached in a special file if directory/file permissions allow this. If no exact match for the alias or filename is found, the *alias* or *filename* glob pattern is used (except for aliases found in the configuration file).

- (hyphen-minus) and _ (low line, spacing underscore) characters in aliases are ignored.
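The alias rules above (uppercasing, ignoring - and _, then falling back to a *alias* glob) could be sketched as follows. This is a hypothetical illustration: the function names and the in-memory list of known names are assumptions, not 2UTF's aliases cache format, and applying the glob to the normalized names is also an assumption.

```python
from fnmatch import fnmatchcase


def normalize_alias(name: str) -> str:
    """Uppercase a char-map alias and drop '-' and '_' characters."""
    return name.upper().replace("-", "").replace("_", "")


def find_charmap(name: str, known: list[str]) -> list[str]:
    """Return the exact match for *name*, or else all *name* glob matches.

    `known` is a hypothetical list of aliases or filenames.
    """
    key = normalize_alias(name)
    table = {normalize_alias(k): k for k in known}
    if key in table:
        return [table[key]]
    # No exact match: fall back to the *alias* glob pattern.
    return [orig for norm, orig in table.items() if fnmatchcase(norm, f"*{key}*")]
```

With known names ["ISO_8859-3", "CP1257", "IBM869"], the argument cp-1257 matches exactly, while 8859-3 and 869 are only found through the glob fallback.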

       When invoked without a char-map specified, a mail message is assumed on standard input. I do not provide any warranty and recommend that you back up your mail if you use 2UTF as an automatic mail filter. See the procmailrc file in the examples subdirectory. It should handle common text and multi-part types, MIME-style encoded and plain non-standard 8-bit headers. Everything is converted to utf-8(7) encoding. Messages with MIME-style pgp(1) signatures should be passed untouched.
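Decoding RFC 2047 encoded words in headers, one part of what mail-filter mode does, can be tried out with Python's standard email.header module. This swaps in Python's library for 2UTF's own MIME code and covers only headers, not message bodies; the function name header_to_utf8 is hypothetical.

```python
from email.header import decode_header, make_header


def header_to_utf8(value: str) -> bytes:
    """Decode RFC 2047 encoded words in a header value and return UTF-8 bytes."""
    # decode_header splits the value into (bytes-or-str, charset) chunks;
    # make_header reassembles them into a single Unicode header string.
    return str(make_header(decode_header(value))).encode("utf-8")
```

For example, the encoded word =?ISO-8859-1?Q?caf=E9?= comes out as the UTF-8 bytes for "café", and a header without encoded words passes through unchanged.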


--
stops option checking for the rest of the command line.
-2 --UCS-2 --ucs-2
Outdated option. Outputs (inputs if --reverse is specified) 2-byte wide characters.
-4 --UCS-4 --ucs-4
Outdated option. Outputs (inputs if --reverse is specified) 4-byte wide characters.
-8 --UTF-8 --utf-8
(default) Outputs (inputs if --reverse is specified) multi-byte utf-8(7) characters.
-cFILENAME --charmap-file=FILENAME
An alternative way to specify the filename or pathname of the char-map file.
-C --create-aliases
Rescans all available char-map files and (re)creates the aliases file. You need write permission for this file.
-d[N] --debug[=N]
Outputs debugging info to stderr. Implies --verbose. N is the debug level from 1 to 9. Default is 1.
-e --encode-headers
Re-encodes decoded RFC 2047 MIME words in headers back into encoded form. Can only be used with --iconv=only.
-f[FORMAT] --format[=FORMAT]
sscanf(3) format string for reading the char-map file. If not specified, the default format as output by 2UTF -h (used by char-maps from WG15 locales) is assumed. Aliases specified in locale char-map files are recognized. Lines beginning with % or # and lines not matching the format are ignored. For duplicated lines, the last line takes precedence. The " 0x%x 0x%X " sscanf(3) format string is always assumed for char-map files ending in .TXT, .X or .x. This corresponds to char-maps you can get from
-o --forward
(default) Converts *to* unicode(7) if invoked as 2UTF, and tries to convert *from* unicode(7) if invoked as fromUTF.
-H --HTML --html
This applies to approximations when converting from unicode(7). Special HTML characters appearing in approximations are changed to &lt; &gt; &quot; and &amp;.
-h -? --? --help -help
Prints the program's version number, default parameters, and a short usage message to the program's standard error output and exits.
-i only --iconv=only
Don't read the configuration file, don't use built-in char-map paths, and use only iconv(3) for conversion.
-i first --iconv=first
Attempt to use iconv(3) before char-map files for conversion. Internal approximations are always used when the output char-set is 'US-ASCII'.
-i last --iconv=last
Attempt to use char-map files before iconv(3) for conversion.
-l --list-charmaps
Lists char-maps and aliases currently in the aliases database, then exits. This includes only char-maps usable by 2UTF.
-p --pathnames
Prints the pathname of the configuration file, the default compiled-in directories for char-map files, the directories actually used for char-map files, and the pathname of the aliases cache.
-r --reverse
Tries to convert back *from* unicode(7) if invoked as 2UTF, and converts *to* unicode(7) if invoked as fromUTF.
-W --show-charmap
Outputs a table of char-map characters in utf-8(7) encoding. . (period) is substituted for 0x0000-0x001F and 0x007F. ? (question mark) is substituted for undefined characters.
-S --spit-glyphs
Outputs a table of characters in utf-8(7) encoding for the F000-F1FF range of the unicode(7) private use area. This corresponds to the current console font in Linux.
-S[min][-][max] --spit-glyphs=[min][-][max]
Outputs a table of characters in utf-8(7) encoding for the given range. min and max are unicode(7) hex numbers from 0 to 7FFFFFFF. min defaults to 0. max defaults to min + 511 if - is present.
-s --switch-to-UTF-8
Tries to switch to utf-8(7) mode by writing <ESC>%G to the program's standard error output. Use echo -ne '\033%@' to switch back if required. This doesn't work on all terminals.
-u[X] --unknown-char[=X]
Outdated option. Substitutes X for unknown single-byte characters and errors. If X isn't specified, the default character as output by 2UTF -h is assumed. X can be a single character or a hex (0x80), octal (0200) or decimal (128) number. This can be useful when translating to a single-byte encoding.
-v --verbose
verbose mode.
-V --version
Shows the program's version and some copyright information.
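For .TXT-style char-maps, the " 0x%x 0x%X " format reads two hex numbers per line: the legacy code and its Unicode value. The sketch below imitates that with a regular expression rather than sscanf(3), so it only roughly matches what sscanf(3) accepts; the function name is hypothetical. It applies the rules stated for -f: lines starting with % or # and lines that do not match are skipped, and a later duplicate overrides an earlier one.

```python
import re

# Rough equivalent of sscanf(3) with " 0x%x 0x%X ": two 0x-prefixed hex
# numbers separated by whitespace; anything after them is ignored.
LINE_RE = re.compile(r"\s*0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)")


def parse_charmap(lines):
    """Return {legacy_code: unicode_code} from .TXT-style char-map lines."""
    table = {}
    for line in lines:
        if line.startswith(("%", "#")):
            continue  # comment lines are ignored
        m = LINE_RE.match(line)
        if m is None:
            continue  # lines not matching the format are ignored
        # A later duplicate simply overwrites the earlier entry.
        table[int(m.group(1), 16)] = int(m.group(2), 16)
    return table
```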
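The -S range accepts values up to 7FFFFFFF, beyond the current Unicode limit of 10FFFF, which presumably relies on the original UTF-8 definition with up to six bytes per character (RFC 2279). Python's own UTF-8 codec stops at U+10FFFF, so this sketch encodes by hand. The function name is hypothetical, and nothing here describes how 2UTF lays out the printed table.

```python
def utf8_encode_31bit(cp: int) -> bytes:
    """Encode a value in 0..0x7FFFFFFF with the original (RFC 2279) UTF-8 scheme."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point out of range")
    if cp < 0x80:
        return bytes([cp])
    # (exclusive upper limit, sequence length, lead-byte marker)
    for limit, nbytes, lead in ((0x800, 2, 0xC0), (0x10000, 3, 0xE0),
                                (0x200000, 4, 0xF0), (0x4000000, 5, 0xF8),
                                (0x80000000, 6, 0xFC)):
        if cp < limit:
            break
    out = []
    # Emit continuation bytes (10xxxxxx) from the least significant end.
    for _ in range(nbytes - 1):
        out.append(0x80 | (cp & 0x3F))
        cp >>= 6
    out.append(lead | cp)  # remaining high bits go into the lead byte
    return bytes(reversed(out))
```

For example, utf8_encode_31bit(0xF000), the first glyph of the default -S range, gives b'\xef\x80\x80'.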
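The -u value forms 0x80, 0200 and 128 follow the C convention that strtol(3) uses with base 0: a 0x prefix means hex, a leading 0 means octal, and anything else is decimal. A hypothetical parser in that spirit is sketched below. The page does not say how an argument such as 7 is read (character '7' or number 7), so treating any one-character argument as a literal character is an assumption, as is rejecting values above 0xFF.

```python
def parse_unknown_char(x: str) -> int:
    """Parse a -u argument into a single-byte value.

    Assumptions: a one-character argument is the character itself;
    anything longer is a number using C prefixes (0x = hex, 0 = octal).
    """
    if len(x) == 1:
        value = ord(x)
    elif x.lower().startswith("0x"):
        value = int(x[2:], 16)
    elif x.startswith("0"):
        value = int(x[1:], 8)
    else:
        value = int(x, 10)
    if not 0 <= value <= 0xFF:
        raise ValueError("not a single-byte value: " + x)
    return value
```

All three documented forms of the example (0x80, 0200, 128) yield 128.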

       Rightmost option or alias takes precedence. Long options may be abbreviated. Short options may be grouped.
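Abbreviated long options are commonly resolved the way getopt_long(3) does it: an exact name always wins, otherwise the prefix must identify exactly one option. Whether 2UTF uses getopt_long(3) itself is not stated here; the sketch below is an assumption, and its option table is only a subset of the options documented above.

```python
# A subset of the documented long options, mapped to the short option they select.
LONG_OPTIONS = {
    "UCS-2": "-2", "ucs-2": "-2",
    "UCS-4": "-4", "ucs-4": "-4",
    "UTF-8": "-8", "utf-8": "-8",
    "reverse": "-r", "forward": "-o",
    "verbose": "-v", "version": "-V",
    "HTML": "-H", "html": "-H", "help": "-h",
}


def resolve_long_option(arg: str) -> str:
    """Map a possibly abbreviated long option (without "--") to its short form."""
    if arg in LONG_OPTIONS:
        return LONG_OPTIONS[arg]  # an exact match always wins
    matches = {short for name, short in LONG_OPTIONS.items() if name.startswith(arg)}
    if len(matches) == 1:
        return matches.pop()  # unique prefix (aliases of one option count once)
    if not matches:
        raise ValueError(f"unrecognized option '--{arg}'")
    raise ValueError(f"option '--{arg}' is ambiguous")
```

Here --rev selects --reverse, while --ver is rejected because it could mean either --verbose or --version.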

The default unicode(7) character for errors and unknown characters is 0xFFFD. Approximations can be performed when converting from Unicode to single-byte legacy char-sets: US-ASCII strings up to 4 bytes long are substituted for characters undefined in the output char-set. These strings are defined at compile time.
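The approximation step can be pictured as a lookup that runs only for characters the output char-set lacks. In this sketch the three-entry table is made up for illustration (2UTF's real strings are fixed at compile time and not listed here), the "?" fallback for characters without an approximation is an assumption, and the html_mode flag mimics -H by escaping only the substituted text.

```python
# Hypothetical approximation table: US-ASCII strings of at most 4 bytes.
APPROX = {
    "\u2264": "<=",    # LESS-THAN OR EQUAL TO
    "\u201c": '"',     # LEFT DOUBLE QUOTATION MARK
    "\u2122": "(TM)",  # TRADE MARK SIGN
}


def html_escape(s: str) -> str:
    """Replace the four characters named for -H with their HTML entities."""
    return (s.replace("&", "&amp;").replace("<", "&lt;")
             .replace(">", "&gt;").replace('"', "&quot;"))


def encode_with_approximations(text: str, charset: str, html_mode: bool = False) -> bytes:
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(charset)  # character exists in the output char-set
        except UnicodeEncodeError:
            approx = APPROX.get(ch, "?")  # assumption: '?' if no approximation
            if html_mode:
                approx = html_escape(approx)
            out += approx.encode("ascii")
    return bytes(out)
```

With Latin-1 as the target, "a≤b" becomes b"a<=b", or b"a&lt;=b" when html_mode is set, while "é" is kept as the single byte 0xE9.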


       To view an ISO_8859-3:1988 document, use:

2UTF --verbose --switch-to-UTF-8 8859-3 <document | less -r

       To translate from CP1257 to BALTIC (ISO-IR-179), use:

2UTF -2 1257 <cp1257_file | fromUTF -2 baltic >baltic_charset_file

       To call a BBS using the 869 "code page", use:

minicom -l -t linux |2UTF --switch-to-UTF-8 IBM869

       To convert everything from UTF-8 to US-ASCII:

fromUTF us-ascii <UTF-8_file

       See also


There can be a self-explanatory configuration file, 2UTF.config. It is searched for in /usr/local/etc/, /usr/etc/, /etc/ or other directories defined at compile time. The configuration file can specify directory names for char-map files and external filters for conversion to and from other legacy char-sets and encodings not supported by iconv(3).
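A lookup of 2UTF.config over the listed directories might look like the sketch below. It assumes the directories are tried in the order given and that the first readable file wins; the page does not state the priority, and the function name is hypothetical. The directory list is a parameter so that a different compile-time list can be passed in.

```python
import os

DEFAULT_DIRS = ("/usr/local/etc", "/usr/etc", "/etc")


def find_config(dirs=DEFAULT_DIRS, name="2UTF.config"):
    """Return the path of the first readable config file in *dirs*, or None."""
    for d in dirs:
        path = os.path.join(d, name)
        if os.path.isfile(path) and os.access(path, os.R_OK):
            return path
    return None
```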



2UTF(1), iconv(1), iconv(3), tcs(1), recode info page

Yudit editor and converter at

'trans' program at

On BSD systems: utf2(4), multibyte(3),

On Linux: unicode(7), utf-8(7), console_codes(4), charsets(4)

Look at char-map files. 



       Please send bug reports, comments and suggestions to:

Ricardas Cepas <> or <rchAATTWriteMe.Com>


Due to the popen(3) bug in older Linux glibc versions, non-existent commands in the configuration file are not detected. So please check the configuration file by hand.

Transformation from UTF-8 can be slow.

Characters can be lost if the char-map files used are incomplete.

Reverse transformation is not perfect.

See also the TO-DO file.

Please use at your own risk.



       This program (including this man page) is distributed under a BSD-style license (see the BSD_style_license file in the documentation directory) or the GNU General Public License V2, except the hdr.h, plan9.h and utf.c files (if used) from tcs(1) and the public domain code from the mimedecode.c file. Copyright statements should be kept unchanged.

       For the copyright information see file




This document was created by man2html, using the manual pages.