A Study in Bash Shell: Intermediate Shell Tools
Sifting Through Files for a String
ubuntu@ip-172-31-24-188:~$ grep echo *
grep: git: Is a directory
setup.sh: echo "Please start this script with root privileges!"
setup.sh: echo "Try again with sudo."
setup.sh: echo "This script was designed to run on Ubuntu 14.04 Trusty!"
setup.sh: echo "Do you wish to continue anyway?"
setup.sh: * ) echo "Please answer with Yes or No [y|n].";;
setup.sh: echo ""
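The "Is a directory" message just means the wildcard matched a directory, which grep skips by default. If you want to search subdirectories as well, GNU grep can recurse (a variation on the session above, not part of it):
$ grep -r echo .
and grep -d skip echo * silences the directory warning instead.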
Getting Just the Filename from a Search
ubuntu@ip-172-31-24-188:~$ grep -l echo *
grep: git: Is a directory
setup.sh
Getting a Simple True/False from a Search
$ grep -q findme bigdata.file
$ if [ $? -eq 0 ] ; then echo yes ; else echo nope ; fi
nope
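Because if tests a command's exit status directly, the same check can be written without referring to $? at all:
$ if grep -q findme bigdata.file ; then echo yes ; else echo nope ; fi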
Searching for Text While Ignoring Case
$ grep -i error logfile.msgs
Doing a Search in a Pipeline
$ some pipeline | of commands | grep 'pattern'
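A concrete instance of the same idea (the source file name here is only an illustration): compiler errors arrive on stderr, so redirect it into the pipe before grepping:
$ gcc bigprog.c 2>&1 | grep -i error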
Paring Down What the Search Finds
Say you are searching a log file for December entries:
$ grep -i dec logfile
but you find that you also get phrases like these:
...
error on Jan 01: not a decimal number
error on Feb 13: base converted to Decimal
warning on Mar 22: using only decimal numbers
error on Dec 16 : the actual message you wanted
error on Jan 01: not a decimal number
...
A quick and dirty solution in this case is to pipe the first result into a second grep and tell the second grep to ignore any instances of “decimal”:
$ grep -i dec logfile | grep -vi decimal
Searching for an SSN
123-45-6789
$ grep '[0-9]\{3\}-\{0,1\}[0-9]\{2\}-\{0,1\}[0-9]\{4\}' datafile
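With extended regular expressions the braces and the optional dash need no backslashes, so the same pattern reads more easily (assuming your grep supports -E):
$ grep -E '[0-9]{3}-?[0-9]{2}-?[0-9]{4}' datafile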
Keeping Some Output, Discarding the Rest
yang@ubuntu:~$ awk '{print $1}' one.pl
sub
...
$ awk '{print $1}' < myinput.file
or even from a pipe, like this:
$ cat myinput.file | awk '{print $1}'
Keeping Only a Portion of a Line of Output
yang@ubuntu:~$ ls -l | awk '{print $1, $NF}'
total 12
drwxrwxr-x codesearch
-rw-rw-r-- one.pl
drwxrwxr-x web
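NF holds the number of fields on the current line, so $NF is always the last field and $(NF-1) the one before it. For example, to skip the "total" line and print the last two fields (a variation, not from the session above):
$ ls -l | awk 'NR > 1 {print $(NF-1), $NF}'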
Reversing the Words on Each Line
yang@ubuntu:~$ awk '{for (i=NF; i>0; i--) {printf "%s ", $i;} printf "\n" }' < one.pl
sub
{
'sb\n'; print
}
Summing a List of Numbers
yang@ubuntu:~$ ls -l | awk '{sum += $2} END {print sum}'
24
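Note that in ls -l output $2 is the hard-link count; to sum the file sizes instead, add up the fifth field:
$ ls -l | awk '{sum += $5} END {print sum}'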
Counting String Values
#
# cookbook filename: asar.awk
#
# Count files per owner in "ls -l"-style input: lines with more than 7
# fields are actual file entries (this skips the "total" lines and the
# directory headers of ls -lR), and field 3 is the owner.
NF > 7 {
    user[$3]++
}
END {
    for (i in user)
    {
        printf "%s owns %d files\n", i, user[i]
    }
}
$ ls -lR /usr/local | awk -f asar.awk
bin owns 68 files
albing owns 1801 files
root owns 13755 files
man owns 11491 files
Showing Data As a Quick and Easy Histogram
#
# cookbook filename: hist.awk
#
# max() returns the largest count in the array; "big" is declared as a
# second parameter only to make it local to the function (an awk idiom).
function max(arr, big)
{
    big = 0;
    for (i in arr)
    {
        if (arr[i] > big) { big = arr[i]; }
    }
    return big
}
# Lines with more than 7 fields are file entries; field 3 is the owner.
NF > 7 {
    user[$3]++
}
END {
    # for scaling: the owner with the most files gets a 60-character bar
    maxm = max(user);
    for (i in user)
    {
        #printf "%s owns %d files\n", i, user[i]
        scaled = 60 * user[i] / maxm ;
        printf "%-10.10s [%8d]:", i, user[i]
        for (j = 0; j < scaled; j++) {
            printf "#";
        }
        printf "\n";
    }
}
$ ls -lR /usr/local | awk -f hist.awk
bin [ 68]:#
albing [ 1801]:#######
root [ 13755]:##################################################
man [ 11491]:##########################################
Sorting Your Output
$ sort file1.txt file2.txt myotherfile.xyz
Sorting Numbers
$ sort -n somedata
Sorting IP Addresses
$ sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n ipaddr.list
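If you have GNU sort, version sort compares the runs of digits between the dots numerically and gives the same ordering with less typing (GNU-specific):
$ sort -V ipaddr.list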
Cutting Out Parts of Your Output
yang@ubuntu:~/web/tornado$ ps -l
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 S 1000 5014 5013 0 80 0 - 5512 wait pts/2 00:00:00 bash
0 R 1000 5915 5014 0 80 0 - 2185 - pts/2 00:00:00 ps
yang@ubuntu:~/web/tornado$ ps -l | cut -c12-19
PID PP
5014 50
5916 50
5917 50
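Character positions are brittle, since ps adjusts its column widths to the data. Picking whitespace-separated fields with awk avoids that; in this ps -l format the PID and PPID are the fourth and fifth fields:
$ ps -l | awk '{print $4, $5}'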
Removing Duplicate Lines
yang@ubuntu:~$ sort -u one.pl
{
}
print 'sb\n';
sub main
sub sb
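sort -u combines sorting and de-duplication in one step; the classic two-step form is sort piped into uniq, and uniq -c additionally reports how many times each line appeared (a variation on the session above):
$ sort one.pl | uniq -c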
Compressing Files
The universally accepted Unix or Linux format is a gzipped tarball (.tar.gz), created like this:
$ tar cf tarball_name.tar directory_of_files
$ gzip tarball_name.tar
If you have GNU tar, you can use -Z for compress (don't; it is obsolete), -z for gzip (the safest choice), or -j for bzip2 (the highest compression of the three). Don't forget to use an appropriate filename; this is not automatic.
$ tar czf tarball_name.tgz directory_of_files
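The bzip2 variant works the same way in a single step (the same caveat applies: choose the extension yourself):
$ tar cjf tarball_name.tar.bz2 directory_of_files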
Uncompressing Files
File extension | List contents | Extract
---|---|---
.tar | tar tf | tar xf
.tar.gz, .tgz | tar tzf | tar xzf
.tar.bz2 | tar tjf | tar xjf
.tar.Z | tar tZf | tar xZf
.zip | unzip -l | unzip
Checking a tar Archive for Unique Directories
$ tar tf some.tar | awk -F/ '{print $1}' | sort -u
Translating Characters
$ tr ';' ',' <be.fore >af.ter
Converting Uppercase to Lowercase
$ tr 'A-Z' 'a-z' <be.fore >af.ter
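The A-Z and a-z ranges depend on the locale and character set; the POSIX character classes are the safer spelling of the same command:
$ tr '[:upper:]' '[:lower:]' <be.fore >af.ter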
Converting DOS Files to Linux Format
Use the -d option on tr to delete the character(s) in the supplied list. For example, to delete all DOS carriage returns (\r), use the command:
$ tr -d '\r' <file.dos >file.txt
Counting Lines, Words, or Characters in a File
[yuyang@mnsdev14:~/temp]$wc a.outbash
9 9 100 a.outbash
[yuyang@mnsdev14:~/temp]$wc -l a.outbash
9 a.outbash
[yuyang@mnsdev14:~/temp]$wc -w a.outbash
9 a.outbash
[yuyang@mnsdev14:~/temp]$wc -c a.outbash
100 a.outbash
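In a script you usually want just the number without the filename; feeding the file to wc on stdin drops the name from the output (shown here without its output):
$ wc -l < a.outbash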