[tcpdump-workers] endianness of portable BPF bytecode (DRAFT revision 3)

Denis Ovsienko denis at ovsienko.info
Sat Jun 25 08:01:52 EDT 2022


Hello list.

Below you can find the next draft revision.  It incorporates some
feedback received from Guy and Michael, reorders the TLVs and adds the
SnapLen, Netmask and EOF TLVs.  The text has also been converted to a
man page, so it could live next to pcap-savefile(5) when it is ready
enough.

One thing that does not look quite right to me in this revision is
that LinkTypeValue, for example, should perhaps not be an optional TLV
but a part of the fixed header, because the DLT is a meaningful bit of
information and could (and possibly should) be checked against the DLT
where the bytecode is applied.  (This is exactly what
https://github.com/the-tcpdump-group/libpcap/issues/211 does.)

tc-bpf(8) on Linux says: "Since libpcap does not support all Linux'
specific cBPF extensions in its compiler..."

If that's true, then the header would need another field to indicate
the cBPF dialect, so that if anybody/anything were to validate the
bytecode, the result would always be conclusive.  Could anybody give an
example of such cBPF differences, if those indeed exist?

If you see an issue with terminology or style, let me know.

----------------------------------------------------------------------
CBPF-SAVEFILE(5)          File Formats Manual         CBPF-SAVEFILE(5)

NAME
       cbpf-savefile - cBPF savefile format (DRAFT revision 3)

DESCRIPTION
       This man page discusses a file format for cBPF, which is the
       "classic" (and for a long time the only) Berkeley Packet
       Filter.  It does NOT apply to the newer eBPF variety of BPF.

       The main purpose of this file format is to store cBPF bytecode,
       most commonly compiled from a BPF filter expression using
       libpcap.  Besides that, the format allows encoding some
       information about the context in which the compilation was done.
       This meta-data can make it easier to reproduce the compilation
       later if required.

       In the following specification all integer fields are unsigned
       and big-endian.  String fields do not use the NUL character for
       termination or padding.

FILE FORMAT
       A cBPF savefile consists of a fixed-size header and a variable-
       size body as follows:

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |      0xA1     |      0xB2     |     0xC3      |     0xCB      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |      'c'      |      'B'      |     'P'       |     'F'       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |    MajorVer   |    MinorVer   |       InstructionCount=n      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |                         instruction 1                         |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |                         instruction 2                         |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       ~                                                               ~
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |                         instruction n                         |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |                   optional trailing TLV space                 |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       The first four bytes contain a fixed signature, also known as a
       magic number, to make it easy to identify the file type
       automatically.  The next four bytes contain the ASCII string
       "cBPF" to provide a hint for manual identification.

       MajorVer and MinorVer contain the major and the  minor  version
       numbers of this format respectively.  The current major version
       is 1 and the current minor version is 0.  Format  changes  that
       do not impact compatibility (e.g., new TLV types) increment the
       minor version only.  Other format changes increment  the  major
       version and reset the minor version to 0.
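
       One possible reading of this versioning rule, from the point of
       view of a reader that implements major version 1, is sketched
       below in Python.  The names SUPPORTED_MAJOR and can_read are
       made up for the illustration, and the draft does not mandate
       this exact policy.

```python
SUPPORTED_MAJOR = 1  # the major version this hypothetical reader implements


def can_read(major, minor):
    """Accept any minor revision of the supported major version."""
    # minor is deliberately ignored: per the draft, a minor version bump
    # does not impact compatibility (for example, it adds new TLV types).
    return major == SUPPORTED_MAJOR
```

       Under this assumption a 1.0 reader would accept a 1.7 file and
       skip any TLV types it does not know, but would refuse a 2.0
       file.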

       InstructionCount is the last field of the fixed header.  It
       contains the number of bytecode instructions following the
       header.  By convention, valid BPF bytecode must consist of at
       least one instruction, so in a valid cBPF savefile the value of
       this field is at least 1.
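
       As an illustration only, the 12-byte fixed header described
       above could be written and checked along these lines in Python.
       The names MAGIC, TAG, HEADER, pack_header and parse_header are
       made up for this sketch and do not belong to any existing
       library; the layout follows this draft and may still change.

```python
import struct

# Fixed header per this draft: 4-byte magic, 4-byte "cBPF" tag,
# MajorVer and MinorVer (1 byte each), InstructionCount (2 bytes).
MAGIC = bytes([0xA1, 0xB2, 0xC3, 0xCB])
TAG = b"cBPF"
HEADER = struct.Struct(">4s4sBBH")  # big-endian, 12 bytes in total


def pack_header(insn_count, major=1, minor=0):
    """Return the fixed header for a file with insn_count instructions."""
    if not 1 <= insn_count <= 0xFFFF:
        raise ValueError("InstructionCount must be in 1..65535")
    return HEADER.pack(MAGIC, TAG, major, minor, insn_count)


def parse_header(data):
    """Return (major, minor, insn_count) or raise ValueError."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    magic, tag, major, minor, count = HEADER.unpack_from(data)
    if magic != MAGIC or tag != TAG:
        raise ValueError("not a cBPF savefile")
    if count < 1:
        raise ValueError("InstructionCount must be at least 1")
    return major, minor, count
```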

       The file format thus far minimizes the overhead for software
       that only writes or reads cBPF bytecode.  If there is any data
       after the last instruction, it is the trailing TLV space, which
       mostly contains meta-data for human interpretation.  It
       contains TLVs in the format specified below.
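
       A reader that is only interested in the bytecode could locate
       it using nothing but InstructionCount.  The Python sketch below
       assumes the 12-byte header and the 8-byte instruction size
       shown in the diagrams; split_savefile is a hypothetical helper
       name, and the sketch does not verify the signature.

```python
import struct

HEADER_SIZE = 12  # magic + "cBPF" + MajorVer + MinorVer + InstructionCount
INSN_SIZE = 8     # opcode + jt + jf + k


def split_savefile(data):
    """Return (bytecode, tlv_space) as two byte strings."""
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated header")
    # InstructionCount occupies the last two bytes of the fixed header.
    (count,) = struct.unpack_from(">H", data, 10)
    end = HEADER_SIZE + count * INSN_SIZE
    if len(data) < end:
        raise ValueError("truncated bytecode")
    # Whatever follows the last instruction is the trailing TLV space.
    return data[HEADER_SIZE:end], data[end:]
```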

INSTRUCTION FORMAT
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |             opcode            |       jt      |       jf      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                               k                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       This is the traditional encoding of a cBPF instruction.  Note
       that its endianness usually depends on the machine, but in this
       format it is always big-endian.
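
       The instruction layout above maps directly onto a fixed-size
       big-endian record.  The following Python sketch shows one way
       to encode and decode it; INSN, pack_insn and unpack_insns are
       names invented for the example.

```python
import struct

# One cBPF instruction: opcode (16 bits), jt (8 bits), jf (8 bits),
# k (32 bits), all big-endian in this draft format.
INSN = struct.Struct(">HBBI")  # 8 bytes


def pack_insn(opcode, jt, jf, k):
    """Encode one instruction as 8 bytes."""
    return INSN.pack(opcode, jt, jf, k)


def unpack_insns(data, count):
    """Decode count instructions from the start of data."""
    if len(data) < count * INSN.size:
        raise ValueError("truncated instruction array")
    return [INSN.unpack_from(data, i * INSN.size) for i in range(count)]


# "ret #262144": BPF_RET | BPF_K is opcode 0x06, k is the return value.
accept_all = pack_insn(0x06, 0, 0, 262144)
```

       Here accept_all is the byte string 00 06 00 00 00 04 00 00
       regardless of the byte order of the machine that produced it.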

TLV FORMAT
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Type             |            Length=m           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       ~                          Value (m bytes)                      ~
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       All TLVs are optional.  Every TLV may appear in the same cBPF
       savefile at most once.  The Length value does not include the
       Type and Length fields.  Code points for Type and the associated
       Length constraints are defined below.
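
       The rules in the previous paragraph can be sketched as a small
       Python walker over the trailing TLV space.  TLV_HEAD, pack_tlv
       and parse_tlvs are hypothetical names, and rejecting a repeated
       Type is this sketch's interpretation of the "at most once"
       rule.

```python
import struct

TLV_HEAD = struct.Struct(">HH")  # Type and Length, 16 bits each, big-endian


def pack_tlv(tlv_type, value=b""):
    """Encode one TLV; Length counts the Value bytes only."""
    if len(value) > 0xFFFF:
        raise ValueError("Value does not fit a 16-bit Length")
    return TLV_HEAD.pack(tlv_type, len(value)) + value


def parse_tlvs(space):
    """Return {type: value} for the TLV space, or raise ValueError."""
    tlvs = {}
    offset = 0
    while offset < len(space):
        if len(space) - offset < TLV_HEAD.size:
            raise ValueError("truncated TLV header")
        tlv_type, length = TLV_HEAD.unpack_from(space, offset)
        offset += TLV_HEAD.size
        if len(space) - offset < length:
            raise ValueError("truncated TLV value")
        if tlv_type in tlvs:
            raise ValueError("TLV type %d appears more than once" % tlv_type)
        tlvs[tlv_type] = space[offset:offset + length]
        offset += length
    return tlvs
```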

   EOF TLV
       Marks the end of the TLV space (hence of the file) explicitly
       to make it clear that the file is not truncated.  If this TLV
       is present in the TLV space, it must be the last TLV.

       Type is 0, Length is 0, Value is empty.
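
       A reader could use the EOF TLV to tell a complete file from a
       possibly truncated one.  The Python sketch below is one way to
       do it; ends_with_eof is a made-up name, and treating data after
       the EOF TLV as an error is an assumption that follows from the
       "last TLV" rule above.

```python
import struct

EOF_TLV = struct.pack(">HH", 0, 0)  # Type 0, Length 0, empty Value


def ends_with_eof(space):
    """Walk the TLV space and return True if its last TLV is an EOF TLV.

    Raises ValueError if the space is truncated, if an EOF TLV has a
    non-zero Length, or if anything follows an EOF TLV.
    """
    offset = 0
    last_type = None
    while offset < len(space):
        if last_type == 0:
            raise ValueError("data after EOF TLV")
        if len(space) - offset < 4:
            raise ValueError("truncated TLV header")
        last_type, length = struct.unpack_from(">HH", space, offset)
        if last_type == 0 and length != 0:
            raise ValueError("EOF TLV must have Length 0")
        offset += 4 + length
    if offset != len(space):
        raise ValueError("truncated TLV value")
    return last_type == 0
```

       A False result does not prove truncation, because the EOF TLV
       is optional; it only means the file gives no explicit end
       marker.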

   LinkTypeValue TLV
       Records the link-layer header type value used for the
       compilation.  Usually this is either the linktype input
       argument to pcap_open_dead(3PCAP) or the dlt input argument to
       pcap_set_datalink(3PCAP).  By convention link-layer header type
       values are limited to 16 bits.

       Type is 1, Length is 2, Value contains an integer.
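
       For example, a writer might encode this TLV as in the Python
       sketch below.  pack_linktype_value_tlv is a hypothetical helper
       name; DLT_EN10MB is the libpcap value for Ethernet.

```python
import struct

LINKTYPE_VALUE_TLV_TYPE = 1
DLT_EN10MB = 1  # Ethernet, as defined by libpcap


def pack_linktype_value_tlv(dlt):
    """Encode a link-layer header type value as a LinkTypeValue TLV."""
    if not 0 <= dlt <= 0xFFFF:
        raise ValueError("link-layer header type must fit in 16 bits")
    # Type (2 bytes), Length = 2 (2 bytes), Value (2 bytes), big-endian.
    return struct.pack(">HHH", LINKTYPE_VALUE_TLV_TYPE, 2, dlt)
```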

   LinkTypeName TLV
       Records the input argument to pcap_datalink_name_to_val(3PCAP)
       if the latter was used to translate a DLT name into the DLT
       value (the same name can sometimes produce different values in
       different contexts).

       Type is 2, Length is variable, Value contains an ASCII string.

   SnapLen TLV
       Records the snapshot length used for the compilation.  Usually
       this is the snaplen input argument to pcap_open_dead() or
       pcap_set_snaplen(3PCAP).

       Type is 3, Length is 4, Value contains an integer.

   Filter TLV
       Records the filter expression that was compiled into the
       bytecode.  Usually this is the str input argument to
       pcap_compile(3PCAP).

       Type is 4, Length is variable, Value contains an ASCII string.
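
       Since String fields carry no NUL terminator, the Length alone
       delimits the expression.  A minimal Python sketch of a writer
       follows; pack_filter_tlv is a made-up name.

```python
import struct

FILTER_TLV_TYPE = 4


def pack_filter_tlv(expression):
    """Encode a filter expression as a Filter TLV."""
    # str.encode("ascii") raises UnicodeEncodeError for non-ASCII input.
    value = expression.encode("ascii")
    if len(value) > 0xFFFF:
        raise ValueError("expression does not fit a 16-bit Length")
    # No NUL terminator is appended: Length covers exactly the string.
    return struct.pack(">HH", FILTER_TLV_TYPE, len(value)) + value
```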

   OptReq TLV
       Records whether optimization was requested for the compilation.
       Usually this is the optimize input argument to pcap_compile().
       Note that some link-layer header types and filter keywords
       disable the optimization automatically in libpcap.

       Type is 5, Length is 1, Value contains 1 or 0.

   Netmask TLV
       Records the value of the netmask input argument to
       pcap_compile().

       Type is 6, Length is 4, Value contains a 32-bit IPv4 netmask.
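
       The draft does not spell out the byte layout of the netmask
       beyond "32-bit".  The Python sketch below assumes the usual
       representation with the most significant octet first, so that
       255.255.255.0 becomes FF FF FF 00; the helper names are
       invented for the example.

```python
import ipaddress
import struct

NETMASK_TLV_TYPE = 6


def pack_netmask_tlv(mask):
    """Encode a dotted-quad netmask such as "255.255.255.0"."""
    # .packed yields 4 bytes, most significant octet first (an assumption
    # about the intended layout, see the text above).
    value = ipaddress.IPv4Address(mask).packed
    return struct.pack(">HH", NETMASK_TLV_TYPE, len(value)) + value


def unpack_netmask_value(value):
    """Decode the 4-byte Value of a Netmask TLV to dotted-quad text."""
    if len(value) != 4:
        raise ValueError("Netmask TLV Length must be 4")
    return str(ipaddress.IPv4Address(value))
```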

   Comment TLV
       Records free-form text, for example, the name and version of
       the program that generated the file.

       Type is 7, Length is variable, Value contains a UTF-8 string.

   Timestamp TLV
       Records when the compilation was performed.

       Type is 8, Length is 8, Value contains a 64-bit Unix timestamp.
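
       The draft does not state the resolution of the timestamp.  The
       Python sketch below assumes whole seconds since the Unix epoch,
       stored as an unsigned big-endian integer like the other integer
       fields; both helper names are hypothetical.

```python
import struct
from datetime import datetime, timezone

TIMESTAMP_TLV_TYPE = 8


def pack_timestamp_tlv(when):
    """Encode a timezone-aware datetime as a Timestamp TLV."""
    seconds = int(when.timestamp())  # whole seconds (assumed resolution)
    if seconds < 0:
        raise ValueError("times before the Unix epoch are not representable")
    # Type (2 bytes), Length = 8 (2 bytes), Value (8 bytes), big-endian.
    return struct.pack(">HHQ", TIMESTAMP_TLV_TYPE, 8, seconds)


def unpack_timestamp_value(value):
    """Decode the 8-byte Value of a Timestamp TLV to a UTC datetime."""
    if len(value) != 8:
        raise ValueError("Timestamp TLV Length must be 8")
    (seconds,) = struct.unpack(">Q", value)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```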

SOFTWARE SUPPORT
       None at the time of this writing.

SEE ALSO
       pcap-savefile(5)

                             24 June 2022             CBPF-SAVEFILE(5)

-- 
    Denis Ovsienko

