Tuesday, May 18, 2010

Perl one-liner: Add a first line to all Perl files

SkyHi @ Tuesday, May 18, 2010

Example perl one liners for command line use, a summary of important perl command line arguments, and how to convert between 1-liners and full Perl scripts. This page assumes the reader has a reasonable amount of Perl experience. Consult sites like learn.perl.org and Perl Monks to learn more about Perl, or visit the #perl channel on the Freenode IRC network. Also consider the book Minimal Perl by Tim Maher, which covers one liner style Perl in great detail.

The following examples require a Unix shell, such as zsh. Windows systems will need double quotes in place of single quotes.

Perl one liners favor quick command line searching and editing. I recommend the practices outlined in Perl Best Practices when developing scripts or applications with Perl. See also Famous Perl One-Liners Explained for additional discussion of Perl one liners.

Perl Argument Overview

Arguments to perl can alter how Perl processes input. For a complete list of these invocation options, see perlrun.

  • -e specifies Perl expressions. More than one can be used, if needed. Other options should not follow this option.

    $ perl -e 'print "Hello";' -e 'print " World\n"'
    Hello World

    Under Perl 5.10 and higher, the -E option enables various features:

    $ perl -E 'say "Hello World"'
    Hello World

  • -p loops over and prints input.
  • -n loops over and does not print input.
  • -l strips newlines on input, and adds them on output. Use this option by default, unless the newlines need special handling, or for efficiency reasons.
  • Use the -ple or -nle option clusters, depending on whether input data should be printed by default or not. The expression 42 does nothing; these examples show the default behavior, and how to enable printing with -nle.

    $ echo test | perl -ple 42
    test
    $ echo test | perl -nle 42
    $ echo test | perl -nle 'print'
    test

  • -i causes perl to operate on files in-place, and optionally also backs up the files via -i.bak or another suffix. I strongly recommend previewing without the -i option before making permanent changes!
  • Never use the -ie '…' invocation, as the -i option reads the e as the backup filename suffix, not the -e as intended. Construct command lines with -e as the last argument before the expression to avoid these sorts of errors:

    # DOS to Unix text convert (example only, dos2unix much faster)
    $ perl -i -pe 's/\r//g' file

    # Legacy MacOS to Unix text convert
    $ perl -i -pe 's/\r/\n/g' file

    # Unix to DOS text convert (unix2dos much faster)
    $ perl -i -pe 's/\n/\r\n/' file

    I shun the backup filename extension to -i, such as -i.bak. Instead, I store data under version control, and thus can revert changes should a sandbox edit go awry. Version control also offers diff support to sanity check the changes made, and commits to log reasons with changes.

  • -a enables auto-split of input into the @F array.
  • Use perl -lane … when processing input into columns: easy to remember (data split into multiple lanes), and handles line breaks via the -l option. Arrays in Perl start at 0, not 1. Also note .. only handles ascending ranges (1..2 works, 2..1 produces an empty list). Use reverse 1..2 to produce a descending series.

    $ echo a b c | perl -lane 'print $F[1]'
    b
    $ echo a b c | perl -lane 'print "@F[0..1]"'
    a b
    $ echo a b c | perl -lane 'print "@F[-2,-1]"'
    b c

    Consult perlvar to learn about @F and other special variables.

    Alternatives such as cut or awk may be more efficient for parsing delimited data. Perl functions like getpwent may be better suited to parsing /etc/passwd data.

  • -F specifies the characters to split on with the -a option. Like -i it takes an argument, so should be used apart from other option sets:

    $ perl -F: -lane 'print $F[0] if !/^#/' /etc/passwd

  • -0 specifies the input record separator. More on this option later.
  • -M lets you load nifty modules such as File::Slurp or IO::All.
  • Life with CPAN covers methods to install perl modules.

  • -d enables debugging mode. For interactive debugging, run something like perl -d -e42, then enter q to leave the debugger when done. I prefer one liners or scripts to any interactive mode.

    $ perl -d -e42

    Loading DB routines from perl5db.pl version 1.28
    Editor support available.

    Enter h or `h h' for help, or `man perldebug' for more help.

    main::(-e:1): 42
    DB<1> print "Hello World"
    Hello World
    DB<2> exit
    Debugged program terminated. Use q to quit or R to restart,
    use O inhibit_exit to avoid stopping after program termination,
    h q, h R or h O to get additional info.
    DB<3> q

Example One Liners

Experiment with these to practice the expressions; otherwise, make backups in the event an expression runs amok. The examples assume a Unix Bourne compatible shell (such as zsh); other command lines may require altering the quotes around the Perl code (double quotes for Windows), or changes to support C-like shells (csh, tcsh). For more information on shell commands, see my shell tips page.

Current Filename

The special $ARGV variable holds the current filename, or - when data arrives via the standard input filehandle. See perlvar for more information on special variables like $ARGV. As -nle or -ple run code for each line of input, a special block (such as BEGIN or END) or lookup hash must be used if the filename must only be printed once.

$ wc -l /etc/passwd
36 /etc/passwd
$ perl -nle 'END { print $ARGV }' /etc/passwd
/etc/passwd

$ echo test | perl -nle 'print $ARGV'
-
$ (echo test; echo test2) | perl -nle 'print $ARGV'
-
-

$ perl -nle 'print $ARGV if !$seen{$ARGV}++' /etc/passwd /etc/shells
/etc/passwd
/etc/shells

The filename can be used when sending output to a new command that needs the original filename.

  • Include filename in output
  • If looking for data in multiple files, prefix the output with the filename, so the matches can be linked back to the source file. This example searches for unquoted Perl heredoc expressions (<<EOF instead of a more readable <<"END_USAGE"):

    $ perl -nle 'print "$ARGV:$_" if m/<<\s*[A-Z]/' `find . -type f`

    To then edit the matching files with vi, use:

    $ vi $(perl -nle 'print $ARGV if m/<<\s*[A-Z]/' $(find . -type f))

  • Sendmail Logs
  • The following example matches Sendmail queue strings followed by from=<>, and prints out the filename and queue identifier. The subsequent shell while loop searches for the queue strings in the original file with grep.

    $ perl -nle 'print "$ARGV $1" if /: (\w{14}): from=<>/' /var/log/maillog* \
    | while read filename queueid; do grep "$queueid" "$filename"; done

Strip out lines

Use perl -nle 'print if ! …' to say “print, except for the following cases.” Practical uses include omitting lines matching a regular expression, or removing the first line from a file. For more information on regular expressions in Perl, see perlretut and perlreref. Look up $. in perlvar. Operator precedence may require the use of unless instead of if !, or parenthesized expressions. See perlop for details.

$ (echo a; echo b) | perl -nle 'print if !/b/'
a
$ (echo a; echo b) | perl -nle 'print unless $. == 1'
b

A warning about the special line number variable $. and multiple files: $. is only reset when the ARGV filehandle is explicitly closed, so use close ARGV if eof to restart the count for each new file; otherwise the line count keeps increasing across the files. The following examples demonstrate this behavior by looping over the input file twice; note the use of close ARGV if eof in the second case.

$ cat input
foo
bar
zot
$ perl -nle 'print $.' input input
1
2
3
4
5
6
$ perl -nle 'print $.; close ARGV if eof' input input
1
2
3
1
2
3

Add a line to a file

Appending data to existing files is easy. So is inserting data into arbitrary locations in a file, such as prepending a new first line to a set of files. In the following case, #!/usr/bin/perl will be added as the first line of all *.pl files in the current directory.

$ perl -i -ple 'print q{#!/usr/bin/perl} if $. == 1; close ARGV if eof' *.pl

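A modified version for text that contains a single quote, using double quotes around the Perl code. Bourne-style shells generally leave $. alone inside double quotes, though an interactive shell with history expansion enabled may need the ! escaped: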

$ perl -i -ple "print q{This's rocks!} if $. == 1; close ARGV if eof" *.pl

If a recursive replace is needed, either investigate the use of the modules File::Find or IO::All, or list all the files via a Unix shell command. If filenames contain spaces, use find -print0 and xargs -0 to avoid filenames being misinterpreted by the shell.

$ perl -i -ple 'print q{#!/usr/bin/perl} if $. == 1; close ARGV if eof' \
`find . -type f -name "*.pl"`


$ find . -type f -name "*.pl" -print0 | \
xargs -0 perl -i -ple 'print q{#!/usr/bin/perl} if $. == 1; close ARGV if eof'

The following trick shows how to replace the second line of a file with some text, but only if that line is blank.

$ perl -ple '$_ = "some text" if $. == 2 and m/^$/; close ARGV if eof'

To pipe null delimited data to perl without using xargs -0, supply no argument to the -0 option to perl:

$ find . -type f -print0 | perl -0 -ne 'print "$_\n"'

To alter the last line of a file with an in-place edit, use the eof function as a test:

$ (echo one; echo two) > test
$ perl -i -ple 'tr/a-z/A-Z/ if eof' test
$ cat test
one
TWO

Home on the range

To match or skip blocks of text, use the .. operator. perlop details this operator. This example prints lines, unless blank:

$ cat input
foo



bar
$ perl -ne 'print unless /^$/../^$/' input
foo
bar

The unless statement is equivalent to if not, but differs from if ! due to the associativity and precedence rules covered in perlop: ! binds only to the first match, so the condition becomes (!/^$/) .. /^$/. A benefit of this behavior is that runs of blank lines are reduced to a single blank line.

$ perl -ne 'print if ! /^$/../^$/' input
foo

bar

Line numbers can also be used with the range operator, for instance to remove the first four lines of a file.

$ perl -nle 'print unless 1 .. 4' input
bar

To match a single line with the range operator, use 5..5.

Altering record parsing

Perl uses the -0 option to allow changing the input record separator. Use -00 to operate in paragraph mode, and -0777 to treat the file as a single line. The paragraphs file contains the -0 documentation from perlrun, used in the following example:

$ perl -00 -ne 'print if /special/' paragraphs
The special value 00 will cause Perl to slurp files in paragraph
mode. The value 0777 will cause Perl to slurp files whole because
there is no legal byte with that value.

Parsing the entire input file as a single line can be used to alter the newlines that otherwise require a range operator to deal with, as shown above. By treating an entire file as a single line, a s///g expression can eliminate runs of blank lines:

$ cat input
foo



bar
$ perl -0777 -pe 's/\n+/\n/g' input
foo
bar

Match some data with Backreferences

Use backreferences to extract matching data. If matching a single expression, such as words from the paragraphs file, use a for loop to print them all:

$ perl -nle 'print for m/\b(\S+)\b/g' paragraphs

A while loop must be used when capturing multiple backreferences per match: find the matches, then use the $1 and $2 variables to print the results. Another contrived example: find the words on either side of every "the" in a file.

$ perl -nle 'print for m/(\S+)\s+the\s+(\S+)/g' paragraphs
specifies
input
digits,
null
follow
digits.
by
null
use
hexadecimal
where
"H"
use
"-x"
$ perl -nle 'while(m/(\S+)\s+the\s+(\S+)/g){print "$1 $2"}' paragraphs
specifies input
digits, null
follow digits.
by null
use hexadecimal
where "H"
use "-x"

Custom Quoting

Shell quoting may cause problems when writing expressions on the command line. On Unix, wrap Perl expressions in single quotes to prevent unwanted shell interpolation. To use a literal single quote inside a single quoted string, the awkward '\'' syntax ends the single quoted string, includes a literal quote, then restarts the quoted string:

$ perl -le 'print "'\'' is a single quote"'
' is a single quote

Alternative: use an octal code instead; see ascii(1) for a dictionary of ASCII to octal values.

$ perl -le 'print "\047 is a single quote"'
' is a single quote

The od -bc command will display the octal codes for any data passed to it:

$ perl -le 'print "\047"' | od -bc
0000000 047 012
          '  \n
0000002
$ perl -le 'print chr for 1..250' | od -bc

Perl also allows different quoting operators; see the “Quote and Quote-like Operators” section under perlop for more information on these.

$ perl -le 'print q{single quoted: $$} . qq{ interpolated: $$}'
single quoted: $$ interpolated: 11506

Output to Multiple Files

To split output among multiple files, change where standard output points based on some test. For example, the following will split a Unix mail file inbox (in mbox format) into multiple files named inbox.1, inbox.2, and so on, incrementing the number for each message in the mailbox.

$ perl -pe 'BEGIN { $n=1 } open STDOUT, ">$ARGV.$n" and $n++ if /^From /' inbox

Recursive File Mangling

Either the shell or Perl modules can be used to alter files in subdirectories. Perl modules to use include File::Find or File::Find::Rule, among others. Relevant shell commands on Unix include find(1) and xargs(1). Replace the echo in the examples below with the perl command to run on the files. The -print0 argument to find and xargs -0 will work even if files have spaces in their name, unlike the first case.

$ echo `find . -type f`
$ find . -type f -print0 | xargs -0 echo

Fun with @INC

Search under @INC (see perlvar for more information on this array) to find installed Perl modules. This example will search most of @INC for any module names beginning with Config. The shell backticks collapse the directory list into proper arguments for find(1), and the Perl grep excludes non-existent directories and the current directory from the search.

$ find `perl -le 'print for grep {$_ ne q{.}and -d} @INC'` -name "Config*"

Make use of the related %INC hash to find the location of loaded modules on the underlying filesystem:

$ perl -MCPAN -le 'print $INC{"CPAN.pm"}'
/System/Library/Perl/5.8.6/CPAN.pm

Downloading YouTube Videos

Peteris Krumins posted an excellent discussion on downloading YouTube videos with a Perl one-liner.

Converting One Liners

One liners may be used as quick example code, or could be found in someone’s shell history. This may not be the ideal form for commonly used commands. The following section demonstrates how to convert one liners into Perl scripts.

  • Newline handling
  • The -l command line option ports easily: list it on the shebang line. Note that -l only chomps input automatically under -n or -p, so an explicit loop needs its own chomp.

    #!/usr/bin/perl -w -l
    use strict;

  • Loop over input (-pe or -ne)
  • Printing loops can be replaced with a while block that prints by default. For a non-printing loop, remove the print statement.

    #!/usr/bin/perl -w -l
    use strict;

    while (<>) {
        chomp;    # -l only chomps automatically under -n or -p
        # code from -e expressions here

        print;
    } continue {
        close ARGV if eof;
    }

    Special BEGIN or END blocks can be copied in directly, or placed before and after the while loop in the main namespace.

  • In Place Editing
  • To convert the -i in-place option, use the $^I variable (perldoc perlvar), and ensure the files to be processed are in @ARGV before looping over <>.

    # trick to expand globs in input for systems with poor shells (Windows)
    local @ARGV = map glob, @ARGV;

    local $^I = '.orig';

    while (<>) {
        # code here

        print;
    } continue { close ARGV if eof }

    Consider also the File::AtomicWrite module to help with atomic file writes.

END without END

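-nle '$c++; END { print $c }' can also be written as -nle '$c++ }{ print $c'. The trick relies on -n wrapping the code in while (<>) { … }: the }{ closes that loop early and opens a bare block that runs once after the input is exhausted. Avoid it under -p, where the implicit print ends up attached to the trailing block rather than to the loop.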


REFERENCES

http://sial.org/howto/perl/one-liner/