
bt_postprocess

Section: btparse (3)
Updated: 2003-10-25

NAME

bt_postprocess - post-processing of BibTeX strings, values, and entries 

SYNOPSIS

   void bt_postprocess_string (char * s,
                               ushort options);

   char * bt_postprocess_value (AST *   value,
                                ushort  options,
                                boolean replace);

   char * bt_postprocess_field (AST *   field,
                                ushort  options,
                                boolean replace);

   void bt_postprocess_entry (AST *  entry,
                              ushort options);
 

DESCRIPTION

When btparse parses a BibTeX entry, it initially stores the results in an abstract syntax tree (AST), in a form exactly mirroring the parsed data. For example, the entry

   @Article{Jones:1997a,
     AuThOr = "Bob   Jones" # and # "Jim Smith ",
     TITLE = "Feeding Habits of
              the Common Cockroach",
     JoUrNaL = j_ent,
     YEAR = 1997
   }

would parse to an AST that could be represented as follows:

   (entry,"Article")
     (key,"Jones:1997a")
     (field,"AuThOr")
       (string,"Bob   Jones")
       (macro,"and")
       (string,"Jim Smith ")
     (field,"TITLE")
       (string,"Feeding Habits of               the Common Cockroach")
     (field,"JoUrNaL")
       (macro,"j_ent")
     (field,"YEAR")
       (number,"1997")

The advantage of this form is that all the important information in the entry is readily available by traversing the tree using the functions described in bt_traversal. The obvious problem is that the data is a little too raw to be immediately useful: entry types and field names are inconsistently capitalized, strings are full of unwanted whitespace, field values are not reduced to single strings, and so forth.

All of these problems are addressed by btparse's post-processing functions, described here. Normally, you won't have to call these functions: the library does the Right Thing for you after parsing each entry, and you can customize what exactly the Right Thing is for your application. (For instance, you can tell it to expand macros, but not to concatenate substrings together.) However, it's conceivable that you might wish to move the post-processing into your own code and out of the library's control. More likely, you could have strings that come from something other than BibTeX files that you would like to have treated as BibTeX strings; for that situation, the post-processing functions are essential. Finally, you might just be curious about what exactly happens to your data after it's parsed. If so, you've come to the right place for excruciatingly detailed explanations.

FUNCTIONS

btparse offers four points of entry to its post-processing code. Of these, probably only the first and last (for processing individual strings and whole entries) will be commonly used.

Post-processing entry points

To understand why four entry points are offered, an explanation of the sample AST shown above will help. First of all, the whole entry is represented by the "(entry,"Article")" node; this node has the entry key and all its field/value pairs as children. Entry nodes are returned by "bt_parse_entry()" and "bt_parse_entry_s()" (see bt_input) as well as "bt_next_entry()" (which traverses a list of entries returned from "bt_parse_file()"; see bt_traversal). Whole entries may be post-processed with "bt_postprocess_entry()".

You may also need to post-process a single field, or just the value associated with it. (The difference is that processing the field can change the field name, e.g. to lowercase, in addition to the field value.) The "(field,"AuThOr")" node above is an example of a field sub-AST, and "(string,"Bob   Jones")" is the first node in the list of simple values representing that field's value. (Recall that a field value is, in general, a list of simple values.) Field nodes are returned by "bt_next_field()", value nodes by "bt_next_value()". The former may be passed to "bt_postprocess_field()" for post-processing, the latter to "bt_postprocess_value()".

Finally, individual strings may wander into your program from many places other than a btparse AST. For that reason, "bt_postprocess_string()" is available for post-processing arbitrary strings.

Post-processing options

All of the post-processing routines have an "options" parameter, which you can use to fine-tune the post-processing. (This is just like the per-metatype string-processing options that you can set before parsing entries; see "bt_set_stringopts()" in bt_input.) Like elsewhere in the library, "options" is a bitmap constructed by or'ing together various predefined constants. These constants and their effects are documented in ``String processing option macros'' in btparse.

bt_postprocess_string ()
   void bt_postprocess_string (char * s,
                               ushort options);

Post-processes an individual string, "s", which is modified in place. The only post-processing option that makes sense on individual strings is whether to collapse whitespace according to the BibTeX rules; thus, if "options & BTO_COLLAPSE" is false, this function has no effect. (It still makes a complete pass over the string, though; this is for future expansion.)

The exact rules for collapsing whitespace are simple: non-space whitespace characters (mainly tabs and newlines) are converted to spaces, any run of more than one space within the string is collapsed to a single space, and any leading or trailing spaces are deleted. (Ensuring that all whitespace is spaces is actually done by btparse's lexical scanner, so strings in btparse ASTs will never have whitespace apart from space. Likewise, any strings passed to "bt_postprocess_string()" should not contain non-space whitespace characters.)
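
The three rules above can be sketched in a few lines of C. This is only an illustration, not btparse's own implementation: "collapse_whitespace()" is a hypothetical stand-alone helper, and unlike the library (which leaves the conversion of tabs and newlines to its lexical scanner) it applies all three rules in a single pass.

```c
/* Hypothetical helper illustrating the collapsing rules; this is NOT
   the code btparse uses. The string is modified in place. */
void collapse_whitespace(char *s)
{
    char *src = s;          /* next character to read */
    char *dst = s;          /* next position to write */
    int pending_space = 0;  /* a whitespace run is waiting to be emitted */

    while (*src != '\0') {
        char c = *src++;
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            /* Remember the run only if some text precedes it, which
               drops leading whitespace. */
            if (dst != s)
                pending_space = 1;
        } else {
            /* Emit a single space for the whole preceding run. */
            if (pending_space) {
                *dst++ = ' ';
                pending_space = 0;
            }
            *dst++ = c;
        }
    }
    /* A run still pending at the end is trailing whitespace and is
       never written out. */
    *dst = '\0';
}
```

Applied to the strings in the sample entry, this helper turns "Bob   Jones" into "Bob Jones" and "Jim Smith " into "Jim Smith".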

bt_postprocess_value ()
   char * bt_postprocess_value (AST *   value,
                                ushort  options,
                                boolean replace);

Post-processes a single field value, which is the head of a list of simple values as returned by "bt_next_value()". All of the relevant string-processing options come into play here: conversion of numbers to strings ("BTO_CONVERT"), macro expansion ("BTO_EXPAND"), collapsing of whitespace ("BTO_COLLAPSE"), and string pasting ("BTO_PASTE"). Since pasting substrings together without first expanding macros and converting numbers would be nonsensical, attempting to do so is a fatal error.

If "replace" is true, then the list headed by "value" will be replaced by a list representing the processed value. That is, if string pasting is turned on ("options & BTO_PASTE" is true), then this list will be collapsed to a single node containing the single string that results from pasting together all the substrings. If string pasting is not on, then each node in the list will be left intact, but will have its text replaced by processed text.

If "replace" is false, then a new string will be built on the fly and returned by the function. Note that if pasting is not on in this case, you will only get the last string in the list. (It doesn't really make a lot of sense to post-process a value without pasting unless you're replacing it with the new value, though.)

Returns the string that resulted from processing the whole value, which only makes sense if pasting was on or there was only one value in the list. If a multiple-value list was processed without pasting, the last string in the list is returned (after processing).

Consider what might be done to the value of the "author" field in the above example, which is the concatenation of a string, a macro, and another string. Assume that the macro "and" expands to " and ", and that the variable "value" points to the sub-AST for this value. The original sub-AST corresponding to this value is

   (string,"Bob   Jones")
   (macro,"and")
   (string,"Jim Smith ")

To fully process this value in-place, you would call

   bt_postprocess_value (value, BTO_FULL, TRUE);

This would convert the value to a single-element list,

   (string,"Bob Jones and Jim Smith")

and return the fully-processed string "Bob Jones and Jim Smith". Note that the "and" macro has been expanded, interpolated between the two literal strings, everything pasted together, and finally whitespace collapsed. (Collapsing whitespace before concatenating the strings would be a bad idea.)

(Incidentally, "BTO_FULL" is just a macro for the combination of all possible string-processing options, currently:

   BTO_CONVERT | BTO_EXPAND | BTO_PASTE | BTO_COLLAPSE

There are two other similar shortcut macros: "BTO_MACRO" to express the special string-processing done on macro values, which is the same as "BTO_FULL" except for the absence of "BTO_COLLAPSE"; and "BTO_MINIMAL", which means no string-processing is to be done.)
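
The relationship between these shortcut macros, and the rule that pasting requires both macro expansion and number conversion, can be sketched as follows. The "OPT_" names and their bit values are placeholders invented for this illustration; real programs should use the "BTO_" constants from the btparse header, whose actual values are not assumed here. "options_are_valid()" is likewise a hypothetical helper, not a library function.

```c
/* Placeholder option bits for illustration only; these are NOT taken
   from the btparse header. */
enum {
    OPT_CONVERT  = 1 << 0,  /* convert numbers to strings */
    OPT_EXPAND   = 1 << 1,  /* expand macros */
    OPT_PASTE    = 1 << 2,  /* paste substrings together */
    OPT_COLLAPSE = 1 << 3   /* collapse whitespace */
};

/* Shortcut combinations mirroring the ones described in the text. */
#define OPT_FULL    (OPT_CONVERT | OPT_EXPAND | OPT_PASTE | OPT_COLLAPSE)
#define OPT_MACRO   (OPT_FULL & ~OPT_COLLAPSE)
#define OPT_MINIMAL 0

/* Hypothetical check: returns 1 if the combination is usable, 0 if it
   asks for pasting without both expansion and conversion. */
int options_are_valid(unsigned short options)
{
    if (options & OPT_PASTE)
        return (options & OPT_CONVERT) && (options & OPT_EXPAND);
    return 1;
}
```

Under this sketch, "OPT_FULL", "OPT_MACRO", and "OPT_MINIMAL" all pass the check, while "OPT_PASTE" on its own does not.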

Let's say you'd rather preserve the list nature of the value, while expanding macros and converting any numbers to strings. (This conversion is trivial: it just changes the type of the node from "BTAST_NUMBER" to "BTAST_STRING". ``Number'' values are always stored as a string of digits, just as they appear in the file.) This would be done with the call

   bt_postprocess_value
      (value, BTO_CONVERT|BTO_EXPAND|BTO_COLLAPSE, TRUE);

which would change the list to

   (string,"Bob Jones")
   (string,"and")
   (string,"Jim Smith")

Note that whitespace is collapsed here before any concatenation can be done; this is probably a bad idea. But you can do it if you wish. (If you get any ideas about cooking up your own value post-processing scheme by doing it in little steps like this, take a look at the source to "bt_postprocess_value()"; it should dissuade you from such a venture.)
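
To see why the order matters, here is a small stand-alone sketch that applies both orders to the three substrings of the author value (with the "and" macro already expanded to " and "). It does not use btparse at all: "squeeze()", "paste_then_collapse()", and "collapse_then_paste()" are hypothetical helpers written for this illustration, and they work on plain C strings rather than AST nodes.

```c
#include <stdio.h>
#include <string.h>

/* Collapse runs of spaces to one space and trim both ends, in place. */
static void squeeze(char *s)
{
    char *dst = s;
    int pending = 0;
    for (const char *src = s; *src != '\0'; src++) {
        if (*src == ' ') {
            pending = (dst != s);   /* ignore leading spaces */
        } else {
            if (pending) { *dst++ = ' '; pending = 0; }
            *dst++ = *src;
        }
    }
    *dst = '\0';                    /* trailing spaces are dropped */
}

/* Paste all parts first, then collapse the result once.
   outsize must be at least 1. */
void paste_then_collapse(const char *parts[], size_t n,
                         char *out, size_t outsize)
{
    out[0] = '\0';
    for (size_t i = 0; i < n; i++)
        strncat(out, parts[i], outsize - strlen(out) - 1);
    squeeze(out);
}

/* Collapse each part on its own, then paste the pieces together.
   Parts longer than 255 characters are truncated in this sketch. */
void collapse_then_paste(const char *parts[], size_t n,
                         char *out, size_t outsize)
{
    char piece[256];
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        snprintf(piece, sizeof piece, "%s", parts[i]);
        squeeze(piece);
        strncat(out, piece, outsize - strlen(out) - 1);
    }
}
```

With the parts "Bob   Jones", " and ", and "Jim Smith ", pasting first yields "Bob Jones and Jim Smith", whereas collapsing first strips the spaces around "and" and yields "Bob JonesandJim Smith".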

bt_postprocess_field ()
   char * bt_postprocess_field (AST *   field,
                                ushort  options,
                                boolean replace);

This is little more than a front-end to "bt_postprocess_value()"; the only difference is that you pass it a ``field'' AST node (e.g. the "(field,"AuThOr")" in the above example), and that it transforms the field name in addition to its value. In particular, the field name is forced to lowercase; this behaviour is (currently) not optional.

Returns the string returned by "bt_postprocess_value()".

bt_postprocess_entry ()
   void bt_postprocess_entry (AST *  entry,
                              ushort options);

Post-processes all values in an entry. If "entry" points to the AST for a ``regular'' or ``macro definition'' entry, then the values are just what you'd expect: everything on the right-hand side of a field or macro ``assignment.'' You can also post-process comment and preamble entries, though. Comment entries are essentially one big string, so only whitespace collapsing makes sense on them. Preambles may have multiple strings pasted together, so all the string-processing options apply to them. (And there's nothing to prevent you from using macros in a preamble.)

 

SEE ALSO

btparse, bt_input, bt_traversal 

AUTHOR

Greg Ward <gwardAATTpython.net>


 

