MAN page from PLD 2UTF-1.22-7.i386.rpm
2UTF
Section: User Manuals (1)
Updated: 27 September 2000
NAME
2UTF, fromUTF - translate legacy char-sets to/from unicode(7), decode MIME messages
SYNOPSIS
[2UTF|fromUTF] [options] [charmap_file_or_alias] <input >output
DESCRIPTION
2UTF is a filter which converts legacy char-sets to unicode(7) (UCS - Universal Character Set) and back where possible. It can also display char-maps, the Linux console font and a range of unicode(7) glyphs in utf-8(7) encoding. 2UTF uses the iconv(3) library, but it can still get char-maps for single-byte legacy char-sets from tables found at ftp://ftp.unicode.org/ or in the wg15-locale package database, or from other similar files with a user-defined format. It can invoke external filters specified in the configuration file.
charmap_file_or_alias is a pathname, alias or filename for the file with the char-map definition. An alias or filename is converted to uppercase. Aliases specified in locales char-maps are cached in a special file if directory/file permissions allow this. If an exact match for the alias or filename isn't found, the *alias* or *filename* glob pattern is used (except for aliases found in the configuration file).
- (hyphen-minus) and _ (low line, spacing underscore) characters in aliases are ignored.
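The alias lookup rules above can be illustrated with a minimal Python sketch. This is not the program's actual code: the function names normalize_alias and find_charmap are hypothetical, and applying the same normalization to the stored names as to the requested one is an assumption.

```python
import fnmatch

def normalize_alias(name: str) -> str:
    """Uppercase the name and drop '-' and '_', as the text describes."""
    return name.upper().replace("-", "").replace("_", "")

def find_charmap(name, known):
    """Return matching names from `known`: exact match first, else *name* glob.

    Assumption: stored names are normalized the same way as the request.
    """
    wanted = normalize_alias(name)
    table = {normalize_alias(k): k for k in known}
    if wanted in table:
        return [table[wanted]]
    # Fall back to the *alias* glob pattern mentioned in the text.
    pattern = f"*{wanted}*"
    return sorted(orig for norm, orig in table.items()
                  if fnmatch.fnmatchcase(norm, pattern))
```

With a known list of ISO_8859-3:1988, CP1257 and IBM869, the request 8859-3 has no exact match in this sketch and is found through the glob fallback, while cp-1257 matches CP1257 exactly.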
When invoked without a char-map specified, a mail message is assumed on standard input. I do not provide any warranty and recommend that you back up your mail if you use 2UTF as an automatic mail filter. See the procmailrc file in the examples subdirectory. It should handle common text and multi-part types, MIME style encoded and plain non-standard 8-bit headers. Everything is converted to utf-8(7) encoding. Messages with MIME style pgp(1) signatures should be passed untouched.
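To show what converting a MIME style encoded header to utf-8(7) involves, here is a small Python sketch using the standard email.header module. It only illustrates RFC 2047 decoding and is not how 2UTF is implemented; the function name header_to_utf8 is hypothetical.

```python
from email.header import decode_header

def header_to_utf8(raw: str) -> bytes:
    """Decode RFC 2047 encoded words in a header value and return UTF-8 bytes."""
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            # Assumption: undeclared charsets are treated as ASCII,
            # with undecodable bytes replaced by U+FFFD.
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts).encode("utf-8")
```

A header naming a charset unknown to Python raises LookupError in this sketch; a real mail filter would need a fallback for that case.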
OPTIONS
- --
- stops option checking for the rest of the command line.
- -2 --UCS-2 --ucs-2
- Outdated option. Output (input if --reverse is specified) 2-byte wide characters.
- -4 --UCS-4 --ucs-4
- Outdated option. Output (input if --reverse is specified) 4-byte wide characters.
- -8 --UTF-8 --utf-8
- (default) Output (input if --reverse is specified) multi-byte utf-8(7) characters.
- -cFILENAME --charmap-file=FILENAME
- an alternative way to specify the filename or pathname of the char-map file.
- -C --create-aliases
- rescans all available char-map files and (re)creates the aliases file. You need write permission for this file.
- -d[N] --debug[=N]
- Outputs debugging info to stderr. Implies --verbose. N is the debug level from 1 to 9. Default is 1.
- -e --encode-headers
- Re-encode previously decoded RFC 2047 MIME words in headers. Can only be used with --iconv=only.
- -f[FORMAT] --format[=FORMAT]
- sscanf(3) format string for reading the char-map file. If not specified, the default format as output by 2UTF -h (used by char-maps from WG15 locales) is assumed. Aliases specified in locale char-map files are recognized. Lines beginning with % or # and lines not matching the format are ignored. In case of duplicated lines the last line takes precedence. The " 0x%x 0x%X " sscanf(3) format string is always assumed for char-map files ending in .TXT, .X or .x. This corresponds to char-maps you can get from ftp://ftp.unicode.org.
- -o --forward
- (default) converts *to* unicode(7) if invoked as 2UTF, and tries to convert *from* unicode(7) if invoked as fromUTF.
- -H --HTML --html
- This applies to approximations when converting from unicode(7). Special HTML characters appearing after approximations are changed to &lt; &gt; &quot; and &amp;.
- -h -? --? --help -help
- Prints the program's version number, default parameters, and a short usage message to the program's standard error output and exits.
- -i only --iconv=only
- Don't read the configuration file, don't use built-in charmap paths and use only iconv(3) for conversion.
- -i first --iconv=first
- Attempt to use iconv(3) before charmap files for conversion. Internal approximations are always used when the output char-set is 'US-ASCII'.
- -i last --iconv=last
- Attempt to use charmap files before iconv(3) for conversion.
- -l --list-charmaps
- Lists char-maps and aliases currently in the aliases database, then exits. This includes only char-maps usable by 2UTF.
- -p --pathnames
- Prints pathnames for the configuration file, the default compiled-in directories for char-map files, the directories actually used for char-map files, and the aliases cache.
- -r --reverse
- tries to convert back *from* unicode(7) if invoked as 2UTF, and converts *to* unicode(7) if invoked as fromUTF.
- -W --show-charmap
- outputs a table of char-map characters in utf-8(7) encoding. . (period) is substituted for 0x0000-0x001F and 0x007F. ? (question mark) is substituted for undefined characters.
- -S --spit-glyphs
- outputs a table of characters in utf-8(7) encoding from the F000-F1FF unicode(7) private use area. This corresponds to the current console font in Linux.
- -S[min][-][max] --spit-glyphs=[min][-][max]
- outputs a table of characters in utf-8(7) encoding for the given range. min and max are unicode(7) hex numbers from 0 to 7FFFFFFF. min defaults to 0. max defaults to min + 511 if - is present.
- -s --switch-to-UTF-8
- tries to switch to utf-8(7) mode by writing <ESC>%G to the program's standard error output. Use echo -ne '\033%@' to switch back if required. This doesn't work on all terminals.
- -u[X] --unknown-char[=X]
- Outdated option. Substitute X for unknown single-byte characters and errors. If X isn't specified, the default character as output by 2UTF -h is assumed. X can be a single character or a hex (0x80), octal (0200) or decimal (128) number. This can be useful when translating to a single-byte encoding.
- -v --verbose
- verbose mode.
- -V --version
- shows program's version and some copyright information.
Rightmost option or alias takes precedence. Long options may be abbreviated. Short options may be grouped.
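The reading rules given under --format for unicode.org style tables (the " 0x%x 0x%X " format) can be sketched in Python as follows. This is only an approximation of the sscanf(3) behaviour using a regular expression, not the program's parser; the name parse_charmap is hypothetical.

```python
import re

# Approximates the " 0x%x 0x%X " sscanf(3) format: two hex numbers per line.
LINE = re.compile(r"\s*0[xX]([0-9A-Fa-f]+)\s+0[xX]([0-9A-Fa-f]+)")

def parse_charmap(text):
    """Map legacy byte values to Unicode code points."""
    table = {}
    for line in text.splitlines():
        if line.startswith(("#", "%")):
            continue              # comment lines are ignored
        m = LINE.match(line)
        if m is None:
            continue              # lines not matching the format are ignored
        # Later duplicates overwrite earlier ones: the last line wins.
        table[int(m.group(1), 16)] = int(m.group(2), 16)
    return table
```

Trailing text after the two numbers, such as the character name comments in unicode.org tables, is ignored by this sketch.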
The default unicode(7) character for errors and unknown characters is 0xFFFD. Approximations can be performed if the conversion is from Unicode to single-byte legacy char-sets. US-ASCII strings up to 4 bytes in length are substituted for characters undefined in the output char-set. These strings are defined at compile time.
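The approximation step can be pictured with the following Python sketch for a US-ASCII target. The table entries are hypothetical examples, not the strings compiled into 2UTF, and the "?" fallback for characters without an approximation is likewise an assumption.

```python
# Hypothetical approximation table; each string is at most 4 ASCII bytes.
APPROX = {
    "\u00e9": "e",      # LATIN SMALL LETTER E WITH ACUTE
    "\u2026": "...",    # HORIZONTAL ELLIPSIS
    "\u00a9": "(c)",    # COPYRIGHT SIGN
    "\u20ac": "EUR",    # EURO SIGN
}

def to_ascii(text: str, unknown: str = "?") -> str:
    """Replace characters undefined in US-ASCII by short ASCII strings."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)                        # already US-ASCII
        else:
            out.append(APPROX.get(ch, unknown))   # approximation or fallback
    return "".join(out)
```

Because one input character may become several output bytes, such a conversion cannot in general be reversed.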
EXAMPLES
To view an ISO_8859-3:1988 document use:
2UTF --verbose --switch-to-UTF-8 8859-3 <document | less -r
To translate from CP1257 to BALTIC (ISO-IR-179) use:
2UTF -2 1257 <cp1257_file | fromUTF -2 baltic >baltic_charset_file
To call a BBS using the 869 "code page" use:
minicom -l -t linux | 2UTF --switch-to-UTF-8 IBM869
To convert everything from UTF-8 to US-ASCII:
fromUTF us-ascii <UTF-8_file
See also the examples subdirectory.
FILES
There can be a self-explanatory configuration file 2UTF.config. It is searched for in /usr/local/etc/, /usr/etc/, /etc/ or other directories defined at compile time. The configuration file can specify directory names for char-map files and external filters for conversion to and from other legacy char-sets and encodings not supported by iconv(3).
SEE ALSO
2UTF(1),
iconv(1),
iconv(3),
tcs(1),
recode info page
Yudit editor and converter at http://www.yudit.org/
ftp://ftp.cnd.org/pub/ifcss.org/software/unix/convert/
'trans' program at ftp://ftp.funet.fi/pub/doc/charsets/
On BSD systems: utf2(4), multibyte(3)
On Linux: unicode(7), utf-8(7), console_codes(4), charsets(4)
Look at ftp://ftp.unicode.org/ and ftp://dkuug.dk/i18n/WG15-collection/charmaps/ for char-map files.
URL
http://x-lt.richard.eu.org/me/rch/ll.html#2UTF
BUG REPORTS
Please send bug reports, comments and suggestions to:
Ricardas Cepas <rchAATTrichard.eu.org> or <rchAATTWriteMe.Com>
BUGS
Due to the popen(3) bug in older Linux glibc versions, non-existent commands in the configuration file are not detected. So please check the configuration file by hand.
Transformation from UTF-8 can be slow.
Characters can be lost if the char-map files used are incomplete.
Reverse transformation is not perfect.
See also the TO-DO file.
Please use at your own risk only.
COPYING
This program (including this man page) is distributed under a BSD style license (see the BSD_style_license file in the documentation directory) or the GNU General Public License V2, except the hdr.h, plan9.h and utf.c files (if used) from tcs(1), and the public domain code from the mimedecode.c file. Copyright statements should be kept unchanged.
For the copyright information see the file copyright.