A Study in Bash Shell: Intermediate Shell Tools
Sifting Through Files for a String
ubuntu@ip-172-31-24-188:~$ grep echo *
grep: git: Is a directory
setup.sh: echo "Please start this script with root privileges!"
setup.sh: echo "Try again with sudo."
setup.sh: echo "This script was designed to run on Ubuntu 14.04 Trusty!"
setup.sh: echo "Do you wish to continue anyway?"
setup.sh: * ) echo "Please answer with Yes or No [y|n].";;
setup.sh: echo ""
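The "Is a directory" message just means the wildcard matched a directory, which grep skips by default. If you want to search subdirectories as well, GNU grep can recurse (a variation on the session above, not part of it):
$ grep -r echo .
and grep -d skip echo * silences the directory warning instead.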
Getting Just the Filename from a Search
ubuntu@ip-172-31-24-188:~$ grep -l echo *
grep: git: Is a directory
setup.sh
Getting a Simple True/False from a Search
$ grep -q findme bigdata.file
$ if [ $? -eq 0 ] ; then echo yes ; else echo nope ; fi
nope
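Because if tests a command's exit status directly, the same check can be written without referring to $? at all:
$ if grep -q findme bigdata.file ; then echo yes ; else echo nope ; fi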
Searching for Text While Ignoring Case
$ grep -i error logfile.msgs
Doing a Search in a Pipeline
$ some pipeline | of commands | grep 'pattern'
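A concrete instance of the same idea (the source file name here is only an illustration): compiler errors arrive on stderr, so redirect it into the pipe before grepping:
$ gcc bigprog.c 2>&1 | grep -i error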
Paring Down What the Search Finds
Say you are searching a log file for December entries:
$ grep -i dec logfile
but you find that you also get phrases like these:
...
error on Jan 01: not a decimal number
error on Feb 13: base converted to Decimal
warning on Mar 22: using only decimal numbers
error on Dec 16 : the actual message you wanted
error on Jan 01: not a decimal number
...
A quick and dirty solution in this case is to pipe the first result into a second grep and tell the second grep to ignore any instances of “decimal”:
$ grep -i dec logfile | grep -vi decimal
Searching for an SSN
123-45-6789
$ grep '[0-9]\{3\}-\{0,1\}[0-9]\{2\}-\{0,1\}[0-9]\{4\}' datafile
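With extended regular expressions the braces and the optional dash need no backslashes, so the same pattern reads more easily (assuming your grep supports -E):
$ grep -E '[0-9]{3}-?[0-9]{2}-?[0-9]{4}' datafile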
Keeping Some Output, Discarding the Rest
yang@ubuntu:~$ awk '{print $1}' one.pl
sub
...
$ awk '{print $1}' < myinput.file
or even from a pipe, like this:
$ cat myinput.file | awk '{print $1}'
Keeping Only a Portion of a Line of Output
yang@ubuntu:~$ ls -l | awk '{print $1, $NF}'
total 12
drwxrwxr-x codesearch
-rw-rw-r-- one.pl
drwxrwxr-x web
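NF holds the number of fields on the current line, so $NF is always the last field and $(NF-1) the one before it. For example, to skip the "total" line and print the last two fields (a variation, not from the session above):
$ ls -l | awk 'NR > 1 {print $(NF-1), $NF}'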
Reversing the Words on Each Line
yang@ubuntu:~$ awk '{for (i=NF; i>0; i--) {printf "%s ", $i;} printf "\n" }' < one.pl
sub
{
'sb\n'; print
}
Summing a List of Numbers
yang@ubuntu:~$ ls -l | awk '{sum += $2} END {print sum}'
24
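Note that in ls -l output $2 is the hard-link count; to sum the file sizes instead, add up the fifth field:
$ ls -l | awk '{sum += $5} END {print sum}'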
Counting String Values
#
# cookbook filename: asar.awk
#
# Count files per owner in "ls -l"-style input: lines with more than 7
# fields are actual file entries (this skips the "total" lines and the
# directory headers of ls -lR), and field 3 is the owner.
NF > 7 {
    user[$3]++
}
END {
    for (i in user)
    {
        printf "%s owns %d files\n", i, user[i]
    }
}
$ ls -lR /usr/local | awk -f asar.awk
bin owns 68 files
albing owns 1801 files
root owns 13755 files
man owns 11491 files
Showing Data As a Quick and Easy Histogram
#
# cookbook filename: hist.awk
#
# max() returns the largest count in the array; "big" is declared as a
# second parameter only to make it local to the function (an awk idiom).
function max(arr, big)
{
    big = 0;
    for (i in arr)
    {
        if (arr[i] > big) { big = arr[i]; }
    }
    return big
}
# Lines with more than 7 fields are file entries; field 3 is the owner.
NF > 7 {
    user[$3]++
}
END {
    # for scaling: the owner with the most files gets a 60-character bar
    maxm = max(user);
    for (i in user)
    {
        #printf "%s owns %d files\n", i, user[i]
        scaled = 60 * user[i] / maxm ;
        printf "%-10.10s [%8d]:", i, user[i]
        for (j = 0; j < scaled; j++) {
            printf "#";
        }
        printf "\n";
    }
}
$ ls -lR /usr/local | awk -f hist.awk
bin [ 68]:#
albing [ 1801]:#######
root [ 13755]:##################################################
man [ 11491]:##########################################
Sorting Your Output
$ sort file1.txt file2.txt myotherfile.xyz
Sorting Numbers
$ sort -n somedata
Sorting IP Addresses
$ sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n ipaddr.list
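If you have GNU sort, version sort compares the runs of digits between the dots numerically and gives the same ordering with less typing (GNU-specific):
$ sort -V ipaddr.list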
Cutting Out Parts of Your Output
yang@ubuntu:~/web/tornado$ ps -l
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 S 1000 5014 5013 0 80 0 - 5512 wait pts/2 00:00:00 bash
0 R 1000 5915 5014 0 80 0 - 2185 - pts/2 00:00:00 ps
yang@ubuntu:~/web/tornado$ ps -l | cut -c12-19
PID PP
5014 50
5916 50
5917 50
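Character positions are brittle, since ps adjusts its column widths to the data. Picking whitespace-separated fields with awk avoids that; in this ps -l format the PID and PPID are the fourth and fifth fields:
$ ps -l | awk '{print $4, $5}'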
Removing Duplicate Lines
yang@ubuntu:~$ sort -u one.pl
{
}
print 'sb\n';
sub main
sub sb
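sort -u combines sorting and de-duplication in one step; the classic two-step form is sort piped into uniq, and uniq -c additionally reports how many times each line appeared (a variation on the session above):
$ sort one.pl | uniq -c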
Compressing Files
The universally accepted Unix or Linux format is a gzipped tarball (.tar.gz), created like this:
$ tar cf tarball_name.tar directory_of_files
$ gzip tarball_name.tar
If you have GNU tar, you can use -Z for compress (don't; it is obsolete), -z for gzip (the safest choice), or -j for bzip2 (the highest compression of the three). Don't forget to use an appropriate filename; this is not automatic.
$ tar czf tarball_name.tgz directory_of_files
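The bzip2 variant works the same way in a single step (the same caveat applies: choose the extension yourself):
$ tar cjf tarball_name.tar.bz2 directory_of_files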
Uncompressing Files
File extension | List contents | Extract
---|---|---
.tar | tar tf | tar xf
.tar.gz, .tgz | tar tzf | tar xzf
.tar.bz2 | tar tjf | tar xjf
.tar.Z | tar tZf | tar xZf
.zip | unzip -l | unzip
Checking a tar Archive for Unique Directories
$ tar tf some.tar | awk -F/ '{print $1}' | sort -u
Translating Characters
$ tr ';' ',' <be.fore >af.ter
Converting Uppercase to Lowercase
$ tr 'A-Z' 'a-z' <be.fore >af.ter
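The A-Z and a-z ranges depend on the locale and character set; the POSIX character classes are the safer spelling of the same command:
$ tr '[:upper:]' '[:lower:]' <be.fore >af.ter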
Converting DOS Files to Linux Format
Use the -d option on tr to delete the character(s) in the supplied list. For example, to delete all DOS carriage returns (\r), use the command:
$ tr -d '\r' <file.dos >file.txt
Counting Lines, Words, or Characters in a File
[yuyang@mnsdev14:~/temp]$wc a.outbash
9 9 100 a.outbash
[yuyang@mnsdev14:~/temp]$wc -l a.outbash
9 a.outbash
[yuyang@mnsdev14:~/temp]$wc -w a.outbash
9 a.outbash
[yuyang@mnsdev14:~/temp]$wc -c a.outbash
100 a.outbash
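In a script you usually want just the number without the filename; feeding the file to wc on stdin drops the name from the output (shown here without its output):
$ wc -l < a.outbash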