XSV(1)


Table of Contents

1. NAME
2. SYNOPSIS
3. DESCRIPTION
4. FORMATS
5. FORMAT PARAMETERS
6. OTHER OPTIONS
7. SELECTION OF FIELDS
8. EXAMPLES
9. CAVEATS
10. LICENSE

1. NAME

xsv - manipulate CSV-like text files

2. SYNOPSIS

xsv [input-format [output-format]] [options] [fields]

3. DESCRIPTION

xsv is a utility for manipulating text files whose lines are divided into fields. This includes popular formats for textual databases like CSV (Comma-Separated Values), TSV (Tab-Separated Values), formats with other separators (like /etc/passwd), and many others.

xsv reads lines from the standard input. Each line is split into fields according to a specified file format. The fields are then written to the standard output in a possibly different format. Additionally, the fields can be re-ordered, trimmed, and otherwise manipulated.

4. FORMATS

If a single file format is selected, it is used for both the input and the output. If two file formats are given, the first applies to the input and the second to the output. If no format is given, --tsv is assumed.

-t, --tsv
Tab-separated values, or more generally fields separated by a single occurrence of a delimiter character. By default, the delimiter is the TAB character, but it can be changed by the -d option.
-c, --csv
Comma-separated values: the traditional CSV format as defined in RFC 4180. Fields are separated by a single comma. A field that contains a comma is enclosed in double quotes. A field that contains double quotes is also enclosed in double quotes, and each embedded quote is doubled. The only deviations from the RFC are that we do not put a CR at the end of a line (although we accept it on the input) and that each line can have a different number of fields.
-w, --ws
The fields are separated by an arbitrary sequence of whitespace characters (spaces, tabs and form-feeds). Leading or trailing whitespace is interpreted as an empty field (this can be overridden by --sloppy). When used for output, fields are separated by exactly one space.
-r, --regex=regex
The fields are separated by sequences of characters matching the given Perl-compatible regular expression (see pcrepattern(3) for a full description of their syntax). For example, --regex='#+' separates fields by an arbitrary number of hashes. Leading or trailing separators are interpreted as empty fields (this can be overridden by --sloppy). This format can be used only for input.
--table
An output-only format that displays the data in the form of a table. Data in each column are justified to the width of the longest item. With --grid, an ASCII-art grid is added. Note that this requires two passes over the data, so pre-formatted data are stored in a temporary file.
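The quoting rules described for --csv can be sketched in Python. This is a minimal illustration, not xsv's implementation, assuming a single-character separator and records that fit on one line; csv_quote, csv_line and csv_split are hypothetical helper names. The reader accepts a trailing CR, and the writer emits a bare LF, matching the deviations from RFC 4180 listed above.

```python
def csv_quote(field, sep=","):
    # Quote a field only when it contains the separator or a double quote.
    if sep not in field and '"' not in field:
        return field
    # Double every embedded quote, then enclose the whole field in quotes.
    return '"' + field.replace('"', '""') + '"'


def csv_line(fields, sep=","):
    # Lines end with a bare LF; no CR is written.
    return sep.join(csv_quote(f, sep) for f in fields) + "\n"


def csv_split(line, sep=","):
    line = line.rstrip("\n")
    if line.endswith("\r"):
        line = line[:-1]  # a trailing CR is accepted on input
    fields, cur = [], []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    cur.append('"')  # doubled quote stands for one literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                cur.append(ch)  # separators inside quotes are literal
        elif ch == '"':
            in_quotes = True
        elif ch == sep:
            fields.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
        i += 1
    fields.append("".join(cur))
    return fields
```

For example, csv_line(["a", "b,c", 'say "hi"']) produces a,"b,c","say ""hi""" followed by a newline, and csv_split turns that line back into the original three fields.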

5. FORMAT PARAMETERS

Each format option can be followed by parameters specific to that format:

-d, --fs=character
Use the specified character as a field separator (delimiter). Applies to --csv and --tsv.
-f, --fields=name,name,...
Assign names to fields. The names can then be used to refer to fields instead of numbers.
-h, --header
The file starts with a header line, which contains field names. It can be combined with --fields to override the names.
-q, --quiet
By default, xsv prints warnings when something suspicious happens (e.g., an unterminated quote in CSV, or an attempt to print a field that contains the separator character). If the warnings are too noisy, use --quiet to silence them.
--always-quote
When writing CSV files, quote all fields, even when quoting is not needed.
--table-sep=n
Separate table columns by n spaces. When not given, two spaces are used. Applies to --table only.
--grid
Decorate the table with an ASCII-art grid of vertical lines. The lines sit in the middle of inter-column spaces. Applies to --table only.
-s, --sloppy
Ignore separators at the beginning or at the end of a line. Otherwise, they are interpreted as empty fields. Applies to --ws and --regex.
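The effect of --sloppy on --ws and --regex can be sketched in Python. This is an illustration under stated assumptions, not xsv's implementation: the line has already had its newline removed, the pattern contains no capturing groups (Python's re.split would otherwise return the captured text as extra fields), and Python's re module stands in for PCRE, which is adequate for simple patterns such as '#+'. split_ws and split_regex are hypothetical names.

```python
import re


def split_regex(line, pattern, sloppy=False):
    # Every match of the pattern is a separator; a separator at either end
    # of the line yields an empty first or last field.
    fields = re.split(pattern, line)
    if sloppy:
        # --sloppy: drop the empty fields caused by leading/trailing separators.
        if fields and fields[0] == "":
            fields = fields[1:]
        if fields and fields[-1] == "":
            fields = fields[:-1]
    return fields


def split_ws(line, sloppy=False):
    # --ws: runs of spaces, tabs and form-feeds act as one separator.
    return split_regex(line, r"[ \t\f]+", sloppy)
```

With these definitions, split_ws("  a b\t c ") gives ['', 'a', 'b', 'c', ''], while split_ws("  a b\t c ", sloppy=True) gives ['a', 'b', 'c'].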

6. OTHER OPTIONS

Several options do not apply to the file format. Instead, they specify how the data should be transformed between the input and the output.

--trim
Delete leading and trailing spaces in each field.
--equalize
When lines contain different numbers of fields, pad the short ones with empty fields. Note that this requires two passes over the data, possibly storing the data in a temporary file in between.
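The two transformations can be sketched in Python on rows that have already been split into fields. This is a minimal illustration, not xsv's code; trim_fields and equalize are hypothetical names, and "spaces" is taken to mean only the space character. The sketch holds all rows in memory, which makes the two passes of --equalize explicit: the first finds the longest row, the second pads the others.

```python
def trim_fields(rows):
    # --trim: delete leading and trailing spaces in each field.
    return [[field.strip(" ") for field in row] for row in rows]


def equalize(rows):
    # First pass: find the largest number of fields on any line.
    width = max((len(row) for row in rows), default=0)
    # Second pass: pad shorter lines with empty fields.
    return [row + [""] * (width - len(row)) for row in rows]
```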

7. SELECTION OF FIELDS

By default, xsv copies all fields from the input to the output. Alternatively, you can specify a list of fields or field ranges to copy. Unlike cut(1), the fields are copied in the given order.

A field can be identified by its number (starting with 1), or by its name when --fields or --header is given. A field range has the form field-field; either endpoint can be omitted, in which case it refers to the first or the last field of the line, respectively.
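Field selection can be sketched in Python as follows. This is an illustration of the rules above, not xsv's implementation. The helpers parse_field and select are hypothetical, and two behaviors are assumptions made only for the sketch: field numbers past the end of a line are silently skipped, and field names containing a hyphen are not supported.

```python
def parse_field(spec, names):
    # A field is a 1-based number, or a name when names are known.
    if spec.isdigit():
        return int(spec)
    return names.index(spec) + 1  # ValueError for an unknown name


def select(fields, specs, names=()):
    n = len(fields)
    out = []
    for spec in specs:  # fields are emitted in the order given
        if "-" in spec:
            lo, hi = spec.split("-", 1)
            # An omitted endpoint means the first or last field of this line.
            start = parse_field(lo, names) if lo else 1
            end = parse_field(hi, names) if hi else n
            indices = range(start, end + 1)
        else:
            indices = [parse_field(spec, names)]
        for i in indices:
            if 1 <= i <= n:  # assumption: out-of-range fields are skipped
                out.append(fields[i - 1])
    return out
```

Applied to the /etc/passwd line root:x:0:0:root:/root:/bin/bash split on ':', select(fields, ["3", "1"]) yields ['0', 'root'], and with names ["login", "passwd", "uid", "gid", "full"], select(fields, ["uid", "login"], names) yields the same.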

8. EXAMPLES

xsv </etc/passwd -d: 3 1 prints the UID and the login name of each user.

xsv </etc/passwd -d: --tsv 3 1 prints the same, but separated by a tab character.

xsv </etc/passwd -d: -flogin,passwd,uid,gid,full --tsv uid login does the same using column names.

xsv --csv --table --grid formats CSV data read from the standard input as a table with an ASCII-art grid.

9. CAVEATS

In most cases, xsv does not assume anything about the character set: the files are treated as sequences of bytes. The only exception is the formatting of tables, which needs to account for on-screen width. In this case, the character set specified by the system locale is assumed. However, all characters are assumed to have the same width, including any combining Unicode characters.
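The width assumption can be seen in a Python sketch of the two-pass --table layout. This is not xsv's code: format_table is a hypothetical name, the grid is omitted, columns are assumed to be left-justified, and trailing spaces are stripped from each output line as a choice made for the sketch. Python's len() counts code points, so it treats every character as one column wide, just as described above.

```python
def format_table(rows, table_sep=2):
    # First pass: measure every column. len() counts each code point as one
    # column, so a combining accent is counted as a full character.
    widths = []
    for row in rows:
        for i, field in enumerate(row):
            if i == len(widths):
                widths.append(0)
            widths[i] = max(widths[i], len(field))
    # Second pass: pad each field to its column width and join the columns.
    gap = " " * table_sep
    lines = []
    for row in rows:
        cells = [field.ljust(widths[i]) for i, field in enumerate(row)]
        lines.append(gap.join(cells).rstrip())
    return "\n".join(lines)
```

For instance, "e\u0301" (e followed by a combining acute accent) is displayed as a single character but counts as two, so its column comes out one position wider than it looks on screen.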

10. LICENSE

xsv was written by Martin Mares <mj@ucw.cz>. It can be distributed and used under the terms of the GNU General Public License version 2.