MAN page from CentOS Other analysis-pipeline-5.11.3-5.el8.x86_64.rpm


Section: Analysis Pipeline (8)
Updated: 2015-09-25


pipeline - Examine SiLK Flow, YAF, or IPFIX records as they arrive 


There are 4 possible data sources: SiLK, YAF, IPFIX, or a configuration file with all of the details.

There are 4 possible input modes. Three of them run continuously and are run as a daemon by default: a UDP socket, a TCP socket (both socket modes require --break-on-recs), and polling a directory for new files. The fourth is a finite list of files to process, which is never run as a daemon.

Allowable combinations: SiLK with directory polling or named files. YAF with UDP or TCP sockets or named files. IPFIX with UDP or TCP sockets, directory polling, or named files.
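
The allowable combinations above can be checked before pipeline is launched. The following is a minimal sketch, not part of pipeline: the helper name valid_combo and the short mode names (udp, tcp, poll, files) are assumptions made for illustration.

```shell
# Sketch only: valid_combo is a hypothetical helper, not part of pipeline.
# It returns 0 when the (data source, input mode) pair is one of the
# combinations listed above, and 1 otherwise.
valid_combo() {
    source_type=$1   # silk | yaf | ipfix
    input_mode=$2    # udp | tcp | poll | files (illustrative names)
    case "$source_type:$input_mode" in
        silk:poll|silk:files)                       return 0 ;;
        yaf:udp|yaf:tcp|yaf:files)                  return 0 ;;
        ipfix:udp|ipfix:tcp|ipfix:poll|ipfix:files) return 0 ;;
        *)                                          return 1 ;;
    esac
}
```

For example, "valid_combo silk udp" fails because SiLK input is read only from a polled directory or from named files.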

A data source configuration file contains all necessary details of both the data source and the input method.

There are 4 general input modes for pipeline, each of which can be run with snarf and without snarf.

To run pipeline when built with snarf, a snarf destination can be specified with: --snarf-destination=ENDPOINT.

To run pipeline when built without snarf, alert log files must be specified with: --alert-log-file=FILE_PATH --aux-alert-file=FILE_PATH

In the examples below, substitute the above alerting configurations in place of "ALERT CONFIGURATION OPTIONS".

To run pipeline continuously but not as a daemon:

  pipeline --configuration-file=FILE_PATH
        ALERT CONFIGURATION OPTIONS
        { --silk | --yaf | --ipfix }
        { --udp-port=NUMBER | --tcp-port=NUMBER |
            --incoming-directory=DIR_PATH --error-directory=DIR_PATH
            [--archive-directory=DIR_PATH] [--flat-archive] }
        [--break-on-recs=NUMBER]
        { [--time-is-clock] | [--time-field-name=STRING] |
          [--time-from-schema] |
          [--time-field-ent=NUMBER --time-field-id=NUMBER] }
        [--polling-interval=NUMBER] [--polling-timeout=NUMBER]
        [--country-code-file=FILE_PATH]
        [--site-config-file=FILENAME]
        --do-not-daemonize

To run pipeline over a finite list of files:

  pipeline --configuration-file=FILE_PATH
        ALERT CONFIGURATION OPTIONS
        { --silk | --yaf | --ipfix }
        --name-files
        [--break-on-recs=NUMBER]
        { [--time-is-clock] | [--time-field-name=STRING] |
          [--time-from-schema] |
          [--time-field-ent=NUMBER --time-field-id=NUMBER] }
        [--polling-interval=NUMBER] [--polling-timeout=NUMBER]
        [--country-code-file=FILE_PATH]
        [--site-config-file=FILENAME]

To run pipeline using a configuration file that specifies all data source and data input options (daemonizing can be turned off if needed):

  pipeline --configuration-file=FILE_PATH
        ALERT CONFIGURATION OPTIONS
        --data-source-configuration-file=FILE_PATH
        [--country-code-file=FILE_PATH]
        [--site-config-file=FILENAME]
        { --do-not-daemonize |
          { --log-destination=DESTINATION |
            --log-directory=DIR_PATH [--log-basename=BASENAME] |
            --log-pathname=FILE_PATH }
          [--log-level=LEVEL] [--log-sysfacility=NUMBER]
          [--pidfile=FILE_PATH] }

To run pipeline continuously as a daemon:

  pipeline --configuration-file=FILE_PATH
        ALERT CONFIGURATION OPTIONS
        { --silk | --yaf | --ipfix }
        { --udp-port=NUMBER | --tcp-port=NUMBER |
            --incoming-directory=DIR_PATH --error-directory=DIR_PATH
            [--archive-directory=DIR_PATH] [--flat-archive] }
        [--break-on-recs=NUMBER]
        { [--time-is-clock] | [--time-field-name=STRING] |
          [--time-from-schema] |
          [--time-field-ent=NUMBER --time-field-id=NUMBER] }
        [--polling-interval=NUMBER] [--polling-timeout=NUMBER]
        [--country-code-file=FILE_PATH]
        [--site-config-file=FILENAME]
        { --log-destination=DESTINATION |
          --log-directory=DIR_PATH [--log-basename=BASENAME] |
          --log-pathname=FILE_PATH }
        [--log-level=LEVEL] [--log-sysfacility=NUMBER]
        [--pidfile=FILE_PATH]

Help options:

  pipeline --configuration-file=FILE_PATH --verify-configuration
  pipeline --help
  pipeline --version


The Analysis Pipeline program, pipeline, is designed to be run over three different types of input. The first, as in version 4.x, is files of SiLK Flow records as they are processed by the SiLK packing system. The second type is data coming directly out of YAF (or super_mediator), including deep packet inspection information. The last is any raw IPFIX records.

pipeline requires a configuration file that specifies filters and evaluations. The filter blocks determine which flow records are of interest (similar to SiLK's rwfilter(1) command). The evaluation blocks can compute aggregate information over the flow records (similar to rwuniq(1)) to determine whether the flow records should generate an alert. Information on the syntax of the configuration file is available in the Analysis Pipeline Handbook.

The output that pipeline produces depends on whether support for the snarf alerting library was compiled into the pipeline binary, as described in the next subsections.

Either form of output from pipeline includes country code information. To map the IP addresses to country codes, a SiLK prefix map file, country_codes.pmap, must be available to pipeline. This file can be installed in SiLK's install tree, or its location can be specified with the SILK_COUNTRY_CODES environment variable or the --country-code-file command line switch.

Output Using Snarf

When pipeline is built with support for the snarf alerting library, the --snarf-destination switch can be used to specify where to send the alerts. The parameter to the switch takes the form "tcp://HOST:PORT", which specifies that a snarfd process is running on HOST at PORT. When --snarf-destination is not specified, pipeline uses the value in the SNARF_ALERT_DESTINATION environment variable. If it is not set, pipeline prints the alerts encoded in JSON (JavaScript Object Notation). The outputs go to the log file when running as a daemon, or to the standard output when the --name-files switch is specified.
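
The order in which the alert destination is chosen can be sketched as a small shell function. This is an illustration of the precedence described above, not code from pipeline or snarf: resolve_snarf_destination is a hypothetical helper, and the string "json" is only a placeholder meaning "print JSON alerts locally".

```shell
# Sketch only: resolve_snarf_destination is a hypothetical helper.
# $1 is the value that would be passed to --snarf-destination (may be empty).
resolve_snarf_destination() {
    cli_value=$1
    if [ -n "$cli_value" ]; then
        # The command line switch takes precedence.
        printf '%s\n' "$cli_value"
    elif [ -n "${SNARF_ALERT_DESTINATION:-}" ]; then
        # Otherwise fall back to the environment variable.
        printf '%s\n' "$SNARF_ALERT_DESTINATION"
    else
        # Placeholder: neither is set, so alerts are printed as JSON.
        printf '%s\n' "json"
    fi
}
```

The host names and ports used with this helper would be site-specific values of the form tcp://HOST:PORT.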

Legacy Output Not Using Snarf

When snarf support is not built into pipeline, the output of pipeline is a textual file in pipe-delimited ("|"-delimited) format describing which flow records raised an alert and the type of alert that was raised. The location of the output file must be specified via the --alert-log-file switch. The file is in a format that a properly configured ArcSight Log File Flexconnector can use. The file in the share/analysis-pipeline/ directory can be used to configure the ArcSight Flexconnector to process the file.

pipeline can provide additional information about the alert in a separate file, called the auxiliary alert file. To use this feature, specify the complete path to the file in the --aux-alert-file switch. This option is required.

pipeline will assume that both the alert-log-file and the aux-alert-file are under control of the logrotate(8) daemon. See the Analysis Pipeline Handbook for details.

Integrating pipeline into the SiLK Packing System

Normally pipeline is run as a daemon during SiLK's collection and packing process. pipeline runs on the flow records after they have been processed by rwflowpack(8), since pipeline may need to use the class, type, and sensor data that rwflowpack assigns to each flow record.

pipeline should get a copy of each incremental file that rwflowpack generates. There are three places that pipeline can be inserted so it will see every incremental file: rwsender(8), rwreceiver(8), and rwflowappend(8).

We describe each of these in turn. If none of these daemons are in use at your site, you must modify how rwflowpack runs, which is also described below.


To use pipeline with the rwsender in SiLK 2.2 or later, specify a --local-directory argument to rwsender, and have pipeline use that directory as its incoming-directory, for example:

 rwsender ... --local-directory=/var/silk/pipeline/incoming ...
 pipeline ... --incoming-directory=/var/silk/pipeline/incoming ...


When pipeline is running on a dedicated machine separate from the machine where rwflowpack is running, one can use a dedicated rwreceiver to receive the incremental files from an rwsender running on the machine where rwflowpack is running. In this case, the incoming-directory for pipeline will be the destination-directory for rwreceiver. For example:

 rwreceiver ... --destination-dir=/var/silk/pipeline/incoming ...
 pipeline ... --incoming-directory=/var/silk/pipeline/incoming ...

When pipeline is running on a machine where an rwreceiver (version 2.2 or newer) is already running, one can specify an additional --duplicate-destination directory to rwreceiver, and have pipeline use that directory as its incoming directory. For example:

 rwreceiver ... --duplicate-dest=/var/silk/pipeline/incoming ...
 pipeline ... --incoming-directory=/var/silk/pipeline/incoming ...


One way to use pipeline with rwflowappend is to have rwflowappend store incremental files into an archive-directory, and have pipeline process those files. However, since rwflowappend stores the incremental files in subdirectories under the archive-directory, you must specify a --post-command to rwflowappend to move (or copy) the files into another directory where pipeline can process them. For example:

 rwflowappend ... --archive-dir=/var/silk/rwflowappend/archive
       --post-command='mv %s /var/silk/pipeline/incoming' ...
 pipeline ... --incoming-directory=/var/silk/pipeline/incoming ...

Note: Newer versions of rwflowappend support a --flat-archive switch, which places the files into the root of the archive-directory. For this situation, make the archive-directory of rwflowappend the incoming-directory of pipeline:

 rwflowappend ... --archive-dir=/var/silk/pipeline/incoming
 pipeline ... --incoming-directory=/var/silk/pipeline/incoming ...

rwflowpack only

If none of the above daemons are in use at your site because rwflowpack writes files directly into the data repository, you must modify how rwflowpack runs so it uses a temporary directory that rwflowappend monitors, and you can then insert pipeline after rwflowappend has processed the files.

Assuming your current configuration for rwflowpack is:

 rwflowpack --sensor-conf=/var/silk/rwflowpack/sensor.conf
       --log-directory=/var/silk/rwflowpack/log
       --root-directory=/data

You can modify it as follows:

 rwflowpack --sensor-conf=/var/silk/rwflowpack/sensor.conf
       --log-directory=/var/silk/rwflowpack/log
       --output-mode=sending
       --incremental-dir=/var/silk/rwflowpack/incremental
       --sender-dir=/var/silk/rwflowappend/incoming

 rwflowappend --root-directory=/data
       --log-directory=/var/silk/rwflowappend/log
       --incoming-dir=/var/silk/rwflowappend/incoming
       --error-dir=/var/silk/rwflowappend/error
       --archive-dir=/var/silk/rwflowappend/archive
       --post-command='mv %s /var/silk/pipeline/incoming' ...

 pipeline --silk --incoming-directory=/var/silk/pipeline/incoming
       --error-directory=/var/silk/pipeline/error
       --log-directory=/var/silk/pipeline/log
       --configuration-file=/var/silk/pipeline/pipeline.conf

Non-daemon mode

There are two ways to run pipeline in non-daemon mode. The first is to run it in one of the continuous modes above (socket or directory polling) without making it a daemon. Use --do-not-daemonize to keep the process in the foreground.

The other way is to run pipeline over files whose names are specified on the command line. In this mode, pipeline stays in the foreground, processes the files, and exits. None of the files specified on the command line are changed in any way: they are neither moved nor deleted. To run pipeline in this mode, specify the --name-files switch and the names of the files to process.


Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

General Configuration

These switches affect general configuration of pipeline. The --configuration-file switch is required:
--configuration-file=FILE_PATH
Give the path to the configuration file that specifies the filters that determine which flow records are of interest and the evaluations that signify when an alert is to be raised. This switch is required.
--country-code-file=FILE_PATH
Use the designated country code prefix mapping file instead of the default.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the application looks for a file named silk.conf in the following directories: the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application's directory.
pipeline comes with a public suffix file provided by Mozilla. To provide pipeline with a different list, use this option to provide a file. The file must be formatted the same way as Mozilla's file. This is optional.
The whole number of minutes between the log entries in which pipeline reports statistics on records processed and memory usage. Setting this value to 0 turns off this feature. This is optional and the default value is 5 minutes.
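
The search order for the SiLK site configuration file described above can be sketched in shell. This is an illustration only: silk_conf_candidates is a hypothetical helper, and it assumes that "parallel to the application's directory" means a share/ directory that is a sibling of the directory holding the pipeline binary.

```shell
# Sketch only: silk_conf_candidates is a hypothetical helper.
# $1 is the value of --site-config-file (may be empty).
# $2 is the full path to the pipeline binary.
# Prints, one per line, the locations that would be tried in order.
silk_conf_candidates() {
    cli_value=$1
    app_path=$2
    if [ -n "$cli_value" ]; then
        printf '%s\n' "$cli_value"
        return 0
    fi
    if [ -n "${SILK_CONFIG_FILE:-}" ]; then
        printf '%s\n' "$SILK_CONFIG_FILE"
        return 0
    fi
    if [ -n "${SILK_PATH:-}" ]; then
        printf '%s\n' "$SILK_PATH/share/silk/silk.conf" "$SILK_PATH/share/silk.conf"
    fi
    # Assumption: share/ sits next to the directory that holds the binary.
    prefix=$(dirname "$(dirname "$app_path")")
    printf '%s\n' "$prefix/share/silk/silk.conf" "$prefix/share/silk.conf"
}
```

A call such as "silk_conf_candidates '' /usr/local/bin/pipeline" lists the candidate paths without checking whether they exist.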

Data Source Configuration Options

pipeline needs to know what general type of data it will be receiving: SiLK flows, YAF data, or raw IPFIX. If there are multiple data sources, a data source configuration file is required. If using a daemon config file, the data source configuration file variable is required.

If there is a single data source, the data source type can be specified on the command line. Depending on the type of data, there are different available options for receiving data.

--silk
The records are SiLK flows. The data input method options are the same as in past versions:
    --incoming-directory=DIR_PATH
        Pipeline will poll a directory forever for new flow files.
    --name-files
        The files pipeline will process are listed on
        the command line as the last group of arguments.
--yaf
The records are coming directly from a YAF sensor (or from an instance of super_mediator). The data input options are:
    --udp-port=NUMBER and --break-on-recs=NUMBER
        UDP socket to listen for YAF data on, and how many records
        to process before breaking and running evaluations.
    --tcp-port=NUMBER and --break-on-recs=NUMBER
        TCP socket to listen for YAF data on, and how many records
        to process before breaking and running evaluations.
    --name-files
        Process YAF data files listed on the command line.
--ipfix
The records are raw IPFIX records, not coming directly from YAF. The data input options are:
    --udp-port=NUMBER and --break-on-recs=NUMBER
        UDP socket to listen for IPFIX data on, and how many records
        to process before breaking and running evaluations.
    --tcp-port=NUMBER and --break-on-recs=NUMBER
        TCP socket to listen for IPFIX data on, and how many records
        to process before breaking and running evaluations.
    --name-files
        Process IPFIX data files listed on the command line.
    --incoming-directory=DIR_PATH
        Pipeline will poll a directory forever for new flow files.
--data-source-configuration-file=FILE_PATH
The data source and input options are detailed in a configuration file. The syntax for the file is described in the Analysis Pipeline Handbook.

Timing Source Configuration Options

If the primary (or only) data source is SiLK, these options are not used. For a SiLK data source, the flow end time is still used as the timing source.

Otherwise, one of these options is required to provide a timing source.

--time-is-clock
Use the system clock time as the timing source.
--time-field-name=STRING
Use the provided field name as the timing source.
--time-field-ent=NUMBER and --time-field-id=NUMBER
These must be used together, as it takes an enterprise ID and an element ID to define an information element. This element will be used as the timing source.
--time-from-schema
Use the timing source specified by the schema. If no timing source is specified by the schema(s) used, pipeline will report an error.
--break-on-recs=NUMBER
Versions 4.x only worked on SiLK files, which provided an easy way to know when to stop processing/filtering records and run evaluations. When accepting a stream of records from a socket, there is no break, so pipeline needs to know how many records to process/filter before running evaluations. Use this option to tell pipeline how many records to process. This option is required for socket connections.
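
The two requirements above (a timing source for any non-SiLK data source, and --break-on-recs for socket input) can be checked before launching pipeline. The following is a rough sketch under stated assumptions: check_timing_args is a hypothetical helper, and the short mode names udp, tcp, poll, and files are illustrative, not pipeline options.

```shell
# Sketch only: check_timing_args is a hypothetical pre-flight check.
# $1 is the data source (silk | yaf | ipfix), $2 the input mode
# (udp | tcp | poll | files); the remaining arguments are the pipeline
# switches that would be passed on the command line.
check_timing_args() {
    source_type=$1
    input_mode=$2
    shift 2
    have_break=no; have_time=no; have_ent=no; have_id=no
    for arg in "$@"; do
        case "$arg" in
            --break-on-recs=*)  have_break=yes ;;
            --time-is-clock|--time-from-schema|--time-field-name=*)
                                have_time=yes ;;
            --time-field-ent=*) have_ent=yes ;;
            --time-field-id=*)  have_id=yes ;;
        esac
    done
    # The enterprise ID and element ID only count when given together.
    if [ "$have_ent" = yes ] && [ "$have_id" = yes ]; then
        have_time=yes
    fi
    if [ "$source_type" != silk ] && [ "$have_time" = no ]; then
        echo "a timing source option is required for $source_type" >&2
        return 1
    fi
    case "$input_mode" in
        udp|tcp)
            if [ "$have_break" = no ]; then
                echo "--break-on-recs is required for socket input" >&2
                return 1
            fi ;;
    esac
    return 0
}
```

For example, "check_timing_args yaf tcp --time-is-clock" fails because a socket input mode was chosen without --break-on-recs.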

Alert Destination when Snarf is Available

When pipeline is built with support for snarf, the following switch is available. Its use is optional.
--snarf-destination=ENDPOINT
Specify where pipeline is to send alerts. The ENDPOINT has the form "tcp://HOST:PORT", which specifies that a snarfd process is running on HOST at PORT. When this switch is not specified, pipeline uses the value in the SNARF_ALERT_DESTINATION environment variable. If that variable is not set, pipeline prints the alerts locally, either to the log file (when running as a daemon), or to the standard output.

Alert Destination when Snarf is Unavailable

When pipeline is built without support for snarf, the following switches are available, and the --alert-log-file switch is required.
--alert-log-file=FILE_PATH
Specify the path to the file where pipeline will write the alert records. The full path to the log file must be specified. pipeline assumes that this file will be under control of the logrotate(8) command.
--aux-alert-file=FILE_PATH
Have pipeline provide additional information about an alert to FILE_PATH. When a record causes an alert, pipeline writes the record in textual format to the alert-log-file. Often there is additional information associated with an alert that cannot be captured in a single record; this is especially true for statistic-type alerts. The aux-alert-file is a location for pipeline to write that additional information. The FILE_PATH must be an absolute path, and pipeline assumes that this file will be under control of the logrotate(8) command.

Daemon Mode

The following switches are used when pipeline is run as a daemon. They may not be mixed with the switches related to Processing Existing Files described below. The first two switches are required, and at least one switch related to logging is required.
--incoming-directory=DIR_PATH
Watch this directory for new SiLK Flow files that are to be processed by pipeline. pipeline ignores any files in this directory whose names begin with a dot ("."). In addition, new files will only be considered when their size is constant for one polling-interval after they are first noticed.
--polling-interval=NUMBER
Sets the interval in seconds for how often pipeline checks for new files when polling a directory using --incoming-directory.
--polling-timeout=NUMBER
Sets the amount of time in seconds pipeline will wait for a new file when polling a directory using --incoming-directory.
--udp-port=NUMBER
Listen on a UDP port for YAF or IPFIX records, not SiLK records. pipeline will reestablish this connection if the sender closes the socket, unless --do-not-reestablish is used.
--tcp-port=NUMBER
Listen on a TCP port for YAF or IPFIX records, not SiLK records. pipeline will reestablish this connection if the sender closes the socket, unless --do-not-reestablish is used.
--error-directory=DIR_PATH
Store in this directory SiLK files that were NOT successfully processed by pipeline.
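
Because pipeline ignores dot-prefixed names in the incoming directory, a process that hands files to pipeline can copy each file under a temporary dot name and rename it once the copy is complete. The following is a minimal sketch of that approach, not something the pipeline distribution provides: deliver_file is a hypothetical helper, and the approach assumes the rename happens within a single directory.

```shell
# Sketch only: deliver_file is a hypothetical helper, not part of pipeline.
# $1 is the file to hand over; $2 is the directory that pipeline watches
# via --incoming-directory.
deliver_file() {
    src=$1
    incoming=$2
    name=$(basename "$src")
    # Copy under a dot-prefixed name, which pipeline ignores.
    cp "$src" "$incoming/.$name" || return 1
    # Rename inside the same directory so the file appears complete.
    mv "$incoming/.$name" "$incoming/$name"
}
```

A call would look like "deliver_file /tmp/flows.bin /var/silk/pipeline/incoming", where both paths are site-specific.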

One of the following mutually-exclusive logging-related switches is required:

--log-destination=DESTINATION
Specify the destination where logging messages are written. When DESTINATION begins with a slash "/", it is treated as a file system path and all log messages are written to that file; there is no log rotation. When DESTINATION does not begin with "/", it must be one of the following strings:
    none
        Messages are not written anywhere.
    stdout
        Messages are written to the standard output.
    stderr
        Messages are written to the standard error.
    syslog
        Messages are written using the syslog(3) facility.
    both
        Messages are written to the syslog facility and to the standard error
        (this option is not available on all platforms).
--log-directory=DIR_PATH
Use DIR_PATH as the directory where the log files are written. DIR_PATH must be a complete directory path. The log file names are built from LOG_BASENAME and YYYYMMDD, where YYYYMMDD is the current date and LOG_BASENAME is the application name or the value passed to the --log-basename switch when provided. The log files will be rotated: at midnight local time a new log will be opened and the previous day's log file will be compressed using gzip(1). (Old log files are not removed by pipeline; the administrator should use another tool to remove them.) When this switch is provided, a process-ID file (PID) will also be written in this directory unless the --pidfile switch is provided.

--log-pathname=FILE_PATH
Use FILE_PATH as the complete path to the log file. The log file will not be rotated.

The following switches are optional:

--archive-directory=DIR_PATH
Move incoming SiLK Flow files that pipeline processes successfully into the directory DIR_PATH. DIR_PATH must be a complete directory path. When this switch is not provided, the SiLK Flow files are deleted once they have been successfully processed. When the --flat-archive switch is also provided, incoming files are moved into the top of DIR_PATH; when --flat-archive is not given, each file is moved to a subdirectory based on the current local time: DIR_PATH/YEAR/MONTH/DAY/HOUR/. Removing files from the archive-directory is not the job of pipeline; the system administrator should implement a separate process to clean this directory.
--flat-archive
When archiving incoming SiLK Flow files via the --archive-directory switch, move the files into the top of the archive-directory, not into subdirectories of the archive-directory. This switch has no effect if --archive-directory is not also specified. This switch can be used to allow another process to watch for new files appearing in the archive-directory.
--polling-interval=NUM
Configure pipeline to check the incoming directory for new files every NUM seconds. The default polling interval is 15 seconds.
--log-level=LEVEL
Set the severity of messages that will be logged. The levels from most severe to least are: "emerg", "alert", "crit", "err", "warning", "notice", "info", "debug". The default is "info".
--log-sysfacility=NUMBER
Set the facility that syslog(3) uses for logging messages. This switch takes a number as an argument. The default is a value that corresponds to "LOG_USER" on the system where pipeline is running. This switch produces an error unless --log-destination=syslog is specified.
--log-basename=LOG_BASENAME
Use LOG_BASENAME in place of the application name for the files in the log directory. See the description of the --log-directory switch.
--pidfile=FILE_PATH
Set the complete path to the file in which pipeline writes its process ID (PID) when it is running as a daemon. No PID file is written when --do-not-daemonize is given. When this switch is not present, no PID file is written unless the --log-directory switch is specified, in which case the PID is written to LOGPATH/
--do-not-daemonize
Force pipeline to stay in the foreground; it does not become a daemon. Useful for debugging.
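
A cleanup or monitoring script may need to know where an archived file ends up. The following sketch computes the target directory from the --archive-directory description above. It is an illustration only: archive_target is a hypothetical helper, and the zero-padded numeric form of YEAR/MONTH/DAY/HOUR is an assumption, since the text does not state the exact format.

```shell
# Sketch only: archive_target is a hypothetical helper.
# $1 is the directory given to --archive-directory.
# $2 is "yes" when --flat-archive is also given, anything else otherwise.
archive_target() {
    dir_path=$1
    flat=$2
    if [ "$flat" = "yes" ]; then
        # Flat archive: files go into the top of the archive directory.
        printf '%s/\n' "$dir_path"
    else
        # Assumed format: zero-padded YEAR/MONTH/DAY/HOUR in local time.
        printf '%s/%s/\n' "$dir_path" "$(date +%Y/%m/%d/%H)"
    fi
}
```

The computed path should be compared against what pipeline actually creates on a given system before relying on it.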

Process Existing Files

--name-files
Cause pipeline to run its analysis over a specific set of files named on the command line. Once pipeline has processed those files, it exits. This switch cannot be mixed with the Daemon Mode and Logging and Daemon Configuration switches described above. When using files named on the command line, pipeline will not move or delete the files.

Help Options

--verify-configuration
Verify that the syntax of the configuration file is correct and then exit pipeline. If the file is incorrect or if it does not define any evaluations, an error message is printed and pipeline exits abnormally. If the file is correct, pipeline simply exits with status 0.
Print the information elements available based on the schemas that arrive. When using any data source other than SiLK flows, this feature requires data to arrive such that templates/schemas can be read and information elements made available. This option will not verify your configuration file.
Print the information elements available based on the schemas that arrive, and verify the syntax of the configuration file. When using any data source other than SiLK flows, this feature requires data to arrive such that templates/schemas can be read and information elements made available.
--help
Print the available options and exit.
--version
Print the version number and information about how the SiLK library used by pipeline was configured, then exit the application.
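
The exit status of --verify-configuration can be used to guard a deployment step. The following is a small sketch, assuming that an abnormal exit means a nonzero status: verify_config is a hypothetical wrapper, and PIPELINE_BIN is an assumed variable that lets the binary be replaced (for example, by a stub on a machine where pipeline is not installed).

```shell
# Sketch only: verify_config is a hypothetical wrapper.
# PIPELINE_BIN is an assumed override for the binary name; it defaults
# to "pipeline" as found on PATH.
verify_config() {
    conf=$1
    if "${PIPELINE_BIN:-pipeline}" --configuration-file="$conf" --verify-configuration; then
        echo "configuration OK: $conf"
    else
        echo "configuration rejected: $conf" >&2
        return 1
    fi
}
```

A deployment script could call "verify_config /var/silk/pipeline/pipeline.conf" and stop when it returns nonzero.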


SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file when that switch is not provided.
SILK_COUNTRY_CODES
This environment variable allows the user to specify the country code mapping file that pipeline will use. The value may be a complete path or a file relative to the SILK_PATH. If the variable is not specified, the code looks for a file named country_codes.pmap in the location specified by SILK_PATH.
SILK_PATH
This environment variable gives the root of the install tree. As part of its search for the SiLK site configuration file, pipeline checks for a file named silk.conf in the directories $SILK_PATH/share/silk and $SILK_PATH/share. To find the country code prefix map file, pipeline checks those same directories for a file named country_codes.pmap.
SNARF_ALERT_DESTINATION
When pipeline is built with snarf support, this environment variable specifies the location to send the alerts. The --snarf-destination switch has precedence over this variable.


silk(7), rwflowappend(8), rwflowpack(8), rwreceiver(8), rwsender(8), rwfilter(1), rwuniq(1), syslog(3), logrotate(8), Analysis Pipeline Handbook, The SiLK Installation Handbook


