Section: btparse (3)
bt_format_names - formatting BibTeX names for consistent output
bt_name_format * bt_create_name_format (char * parts, boolean abbrev_first);
void bt_free_name_format (bt_name_format * format);
void bt_set_format_text (bt_name_format * format, bt_namepart part, char * pre_part, char * post_part, char * pre_token, char * post_token);
void bt_set_format_options (bt_name_format * format, bt_namepart part, boolean abbrev, bt_joinmethod join_tokens, bt_joinmethod join_part);
char * bt_format_name (bt_name * name, bt_name_format * format);
After splitting a name into its component parts (represented as a "bt_name" structure), you often want to put it back together again as a single string in a consistent way. btparse provides a very flexible way to do this, generally in two stages: first, you create a ``name format'' which describes how to put the tokens and parts of any name back together, and then you apply the format to a particular name.
The ``name format'' is encapsulated in a "bt_name_format" structure, which is created with "bt_create_name_format()". This function sets up defaults for the most common cases, so you can usually get away with calling it alone and not need to do any customization of the format. If you do need to customize the format, though, "bt_set_format_text()" and "bt_set_format_options()" provide that capability.
The format controls the following:
- which name parts are printed, and in what order (e.g. ``first von last jr'', or ``von last jr first'')
- the text that precedes and follows each part (e.g. if the first name follows the last name, you probably want a comma before the `first' part: ``Smith, John'' rather than ``Smith John'')
- the text that precedes and follows each token (e.g. if the first name is abbreviated, you may want a period after each token: ``J. R. Smith'' rather than ``J R Smith'')
- the method used to join the tokens of each part together
- the method used to join each part to the following part
All of these except the list of parts to format are kept in arrays indexed by name part: for example, the structure has a field
    char * post_token[BT_MAX_NAMEPARTS]
and "post_token[BTN_FIRST]" ("BTN_FIRST" is from the "bt_namepart" "enum") is the string to be added after each token in the first name---for example, "." if the first name is to be abbreviated in the conventional way.
Yet another "enum", "bt_joinmethod", describes the available methods for joining tokens together. Note that there are two sets of join methods in a name format: between tokens within a single part, and between the tokens of two different parts. The first allows you, for example, to change "J R Smith" (first name abbreviated with no post-token text but tokens joined by a space) to "JR Smith" (the same, but first-name tokens jammed together). The second is mainly used to ensure that ``von'' and ``last'' name-parts may be joined with a tie: "de~Roche" rather than "de Roche".
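The interplay of abbreviation, per-token text, and the within-part join can be sketched in plain C. This is not btparse code: "sketch_format_part()" is a hypothetical helper that takes the join text as a literal string instead of a "bt_joinmethod" value, and the name ``John Robert'' is made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (NOT part of btparse): format the tokens of one name
 * part into out, a buffer of outsize bytes.
 *   abbrev     - nonzero to keep only the first byte of each token
 *                (assumes single-byte characters)
 *   pre_token  - text added before each token
 *   post_token - text added after each token
 *   join       - text placed between consecutive tokens
 * Returns 0 on success, -1 if the buffer is too small. */
int sketch_format_part(const char *const *tokens, size_t ntokens,
                       int abbrev,
                       const char *pre_token, const char *post_token,
                       const char *join,
                       char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < ntokens; i++) {
        /* Number of bytes of the token itself that will be printed. */
        size_t body = abbrev ? (tokens[i][0] != '\0' ? 1 : 0)
                             : strlen(tokens[i]);
        size_t sep  = (i + 1 < ntokens) ? strlen(join) : 0;
        size_t need = strlen(pre_token) + body + strlen(post_token) + sep;

        if (used + need + 1 > outsize)   /* +1 for the terminating NUL */
            return -1;

        strcat(out, pre_token);
        strncat(out, tokens[i], body);
        strcat(out, post_token);
        if (i + 1 < ntokens)
            strcat(out, join);
        used += need;
    }
    return 0;
}
```

With the tokens "John" and "Robert" and abbreviation on, a post-token text of "." and a join of " " gives "J. R."; emptying the post-token text gives "J R"; emptying the join as well gives "JR".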
The token join methods are:
- "BTJ_MAYTIE": Insert a ``discretionary tie'' between tokens. That is, either a space or a ``tie'' is inserted, depending on context. (A ``tie,'' otherwise known as an unbreakable space, is currently hard-coded as "~"---from TeX.)
- "BTJ_SPACE": Always insert a space between tokens.
- "BTJ_FORCETIE": Always insert a ``tie'' ("~") between tokens.
- "BTJ_NOTHING": Insert nothing between tokens---just jam them together.
Tokens are joined together, and thus the choice of whether to insert a ``discretionary tie'' is made, at two places: within a part and between two parts. Naturally, this only applies when "BTJ_MAYTIE" was supplied as the token-join method; "BTJ_SPACE" and "BTJ_FORCETIE" always insert a space or a tie respectively, and "BTJ_NOTHING" always adds nothing between tokens. Within a part, a tie is added after the first token if it is less than three characters long, and before the last token. Between parts, a tie is added only if the preceding part consisted of a single token that was less than three characters long. In all other cases, spaces are inserted. (This implementation slavishly follows BibTeX.)
The format is then applied to a particular name by "bt_format_name()", which returns a new string.
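The discretionary-tie rule just described can be written down as two small decision functions. This is a sketch of the rule as stated in the text, not btparse's actual implementation: both function names are hypothetical, and length is measured in bytes on the token as given, which is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical (NOT part of btparse): separator placed after token i
 * (0-based) of a part with ntokens tokens when a discretionary tie is
 * in effect. */
const char *sketch_within_part_sep(const char *const *tokens,
                                   size_t ntokens, size_t i)
{
    if (i + 1 >= ntokens)
        return "";                 /* nothing follows the last token */
    if (i + 2 == ntokens)
        return "~";                /* tie before the last token */
    if (i == 0 && strlen(tokens[0]) < 3)
        return "~";                /* tie after a short first token */
    return " ";                    /* all other cases: plain space */
}

/* Hypothetical (NOT part of btparse): separator placed between a part and
 * the part that follows it when a discretionary tie is in effect. */
const char *sketch_between_parts_sep(const char *const *prev_tokens,
                                     size_t nprev)
{
    if (nprev == 1 && strlen(prev_tokens[0]) < 3)
        return "~";                /* single short token, e.g. "de" */
    return " ";
}
```

Under these rules a `von' part consisting only of "de" is tied to the following part ("de~Roche"), while a two-token `von' part is followed by an ordinary space.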
bt_name_format * bt_create_name_format (char * parts, boolean abbrev_first)
Creates a name format for a given set of parts, with variations for the most common forms of customization---the order of parts and whether to abbreviate the first name.
The "parts" parameter specifies which parts to include in a formatted name, as well as the order in which to format them. "parts" must be a string of four or fewer characters, each of which denotes one of the four name parts: for instance, "vljf" means to format all four parts in ``von last jr first'' order. No characters outside of the set "fvlj" are allowed, and no characters may be repeated. "abbrev_first" controls whether the `first' part will be abbreviated (i.e., only the first letter from each token will be printed).
In addition to simply setting the list of parts to format and the ``abbreviate'' flag for the first name, "bt_create_name_format()" initializes the entire format structure so as to minimize the need for further customizations:
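If you build the "parts" string at run time, you may want to check it against these constraints before handing it to the library. The following is a minimal sketch; "sketch_valid_parts()" is a hypothetical helper, not a btparse function, and the text does not say how btparse itself reacts to an invalid string.

```c
#include <string.h>

/* Hypothetical helper (NOT part of btparse): returns 1 if parts has at most
 * four characters, all drawn from "fvlj", with none repeated; 0 otherwise.
 * An empty string passes these checks; whether btparse accepts one is not
 * stated in the text. */
int sketch_valid_parts(const char *parts)
{
    int seen[256] = { 0 };
    size_t len = strlen(parts);

    if (len > 4)
        return 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char) parts[i];

        if (strchr("fvlj", c) == NULL)
            return 0;              /* character outside the allowed set */
        if (seen[c])
            return 0;              /* repeated part letter */
        seen[c] = 1;
    }
    return 1;
}
```

For example, "vljf" and "vl" pass, while "ff" (repeated letter) and "fxl" (unknown letter) are rejected.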
- The ``token join method''---what to insert between tokens of the same part---is set to "BTJ_MAYTIE" (discretionary tie) for all parts.
- The ``part join method''---what to insert after the final token of a particular part, assuming there are more parts to come---is set to "BTJ_SPACE" for the `first', `last', and `jr' parts. If the `von' part is present and immediately precedes the `last' part (which will almost always be the case), "BTJ_MAYTIE" is used to join `von' to `last'; otherwise, `von' also gets "BTJ_SPACE" for the inter-part join method.
- The abbreviation flag is set to "FALSE" for the `von', `last', and `jr' parts; for `first', the abbreviation flag is set to whatever you pass in as "abbrev_first".
- Initially, all ``surrounding text'' (pre-part, post-part, pre-token, and post-token) for all parts is set to the empty string. Then a few tweaks are done, depending on the "abbrev_first" flag and the order of parts. First, if "abbrev_first" is "TRUE", the post-token text for the first name is set to "."---this changes "J R Smith" to "J. R. Smith", which is usually the desired form. (If you don't want the periods, you'll have to set the post-token text yourself with "bt_set_format_text()".)
Then, if `jr' is present and immediately after `last' (almost always the case), the pre-part text for `jr' is set to ", ", and the inter-part join method for `last' is set to "BTJ_NOTHING". This changes "John Smith Jr" (where the space following "Smith" comes from formatting the last name with a "BTJ_SPACE" inter-part join method) to "John Smith, Jr" (where the ", " is now associated with "Jr"---that way, if there is no `jr' part, the ", " will not be printed).
Finally, if `first' is present and immediately follows either `jr' or `last' (which will usually be the case in ``last-name first'' formats), the same sort of trickery is applied: the pre-part text for `first' is set to ", ", and the part join method for the preceding part (either `jr' or `last') is set to "BTJ_NOTHING".
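The default rules above can be modelled in a few dozen lines of C. This is an illustrative model only: the "sketch_format" structure, the "SK_" enums, and both functions are hypothetical stand-ins and do not reflect the real layout of "bt_name_format". Post-part and pre-token text are left out because the defaults leave them empty, and "parts" is assumed to be already valid.

```c
#include <stddef.h>

enum sketch_part { SK_FIRST, SK_VON, SK_LAST, SK_JR, SK_NPARTS };
enum sketch_join { SK_MAYTIE, SK_SPACE, SK_FORCETIE, SK_NOTHING };

/* Hypothetical model of a name format (NOT the real bt_name_format). */
struct sketch_format {
    const char      *parts;                    /* e.g. "vljf" */
    int              abbrev[SK_NPARTS];
    enum sketch_join join_tokens[SK_NPARTS];   /* within a part */
    enum sketch_join join_part[SK_NPARTS];     /* after a part */
    const char      *pre_part[SK_NPARTS];
    const char      *post_token[SK_NPARTS];
};

/* Map a letter of the parts string to a part index, or -1 if unknown. */
int sketch_part_index(char c)
{
    switch (c) {
    case 'f': return SK_FIRST;
    case 'v': return SK_VON;
    case 'l': return SK_LAST;
    case 'j': return SK_JR;
    default:  return -1;
    }
}

/* Fill in defaults following the rules described in the text. */
void sketch_init_defaults(struct sketch_format *f, const char *parts,
                          int abbrev_first)
{
    f->parts = parts;
    for (int p = 0; p < SK_NPARTS; p++) {
        f->abbrev[p]      = 0;
        f->join_tokens[p] = SK_MAYTIE;
        f->join_part[p]   = SK_SPACE;
        f->pre_part[p]    = "";
        f->post_token[p]  = "";
    }
    f->abbrev[SK_FIRST] = abbrev_first;
    if (abbrev_first)
        f->post_token[SK_FIRST] = ".";

    /* Walk adjacent pairs of parts and apply the order-dependent tweaks. */
    for (size_t i = 1; parts[0] != '\0' && parts[i] != '\0'; i++) {
        int prev = sketch_part_index(parts[i - 1]);
        int cur  = sketch_part_index(parts[i]);

        if (prev == SK_VON && cur == SK_LAST)
            f->join_part[SK_VON] = SK_MAYTIE;          /* "de~Roche" */
        if (prev == SK_LAST && cur == SK_JR) {
            f->pre_part[SK_JR]    = ", ";              /* "Smith, Jr" */
            f->join_part[SK_LAST] = SK_NOTHING;
        }
        if (cur == SK_FIRST && (prev == SK_LAST || prev == SK_JR)) {
            f->pre_part[SK_FIRST] = ", ";              /* "Smith, John" */
            f->join_part[prev]    = SK_NOTHING;
        }
    }
}
```

Calling "sketch_init_defaults (&f, "vljf", 1)" leaves the `last' and `jr' parts with a join of nothing, gives `jr' and `first' a pre-part text of ", ", and sets the post-token text of `first' to ".".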
While all these rules are rather complicated, they mean that you are usually freed from having to do any customization of the name format. Certainly this is the case if you only need "fvlj" and "vljf" part orders, only want to abbreviate the first name, want periods after abbreviated tokens, non-breaking spaces in the ``right'' places, and commas in the conventional places.
If you want something out of the ordinary---for instance, abbreviated tokens jammed together with no punctuation, or abbreviated last names---you'll need to customize the name format a bit with "bt_set_format_text()" and "bt_set_format_options()".
void bt_free_name_format (bt_name_format * format)
Frees a name format created by "bt_create_name_format()".
void bt_set_format_text (bt_name_format * format, bt_namepart part, char * pre_part, char * post_part, char * pre_token, char * post_token)
Allows you to customize some or all of the surrounding text for a single name part. Supply "NULL" for any chunk of text that you don't want to change.
For instance, say you want a name format that will abbreviate first names, but without any punctuation after the abbreviated tokens. You could create and customize the format as follows:
    format = bt_create_name_format ("fvlj", TRUE);
    bt_set_format_text (format,
                        BTN_FIRST,       /* name-part to customize */
                        NULL, NULL,      /* pre- and post- part text */
                        NULL, "");       /* empty string for post-token */
Without the "bt_set_format_text()" call, "format" would result in names formatted like "J. R. Smith". After setting the post-token text for first names to "", this name would become "J R Smith".
void bt_set_format_options (bt_name_format * format, bt_namepart part, boolean abbrev, bt_joinmethod join_tokens, bt_joinmethod join_part)
Allows further customization of a name format: you can set the abbreviation flag and the two token-join methods. Alas, there is no mechanism for leaving a value unchanged; you must set everything with "bt_set_format_options()".
For example, let's say that just dropping periods from abbreviated tokens in the first name isn't enough; you really want to save space by jamming the abbreviated tokens together: "JR Smith" rather than "J R Smith". Assuming the two calls in the above example have been done, the following will finish the job:
    bt_set_format_options (format, BTN_FIRST,
                           TRUE,         /* keep same value for abbrev flag */
                           BTJ_NOTHING,  /* jam tokens together */
                           BTJ_SPACE);   /* space after final token of part */
Note that we unfortunately had to know (and supply) the current values for the abbreviation flag and post-part join method, even though we were only setting the intra-part join method.
char * bt_format_name (bt_name * name, bt_name_format * format)
Once a name format has been created and customized to your heart's content, you can use it to format any number of names that have been split with "bt_split_name" (see bt_split_names). Simply pass the name structure and name format structure, and a newly-allocated string containing the formatted name will be returned to you. It is your responsibility to "free()" this string.
Greg Ward <gwardAATTpython.net>