12 Useful Commands For Filtering Text for Effective File Operations in Linux

In this article, we will review a number of command line tools that act as filters in Linux. A filter is a program that reads standard input, performs an operation upon it and writes the results to standard output.

For this reason, it can be used to process information in powerful ways such as restructuring output to generate useful reports, modifying text in files and many other system administration tasks.

With that said, below are some of the useful file or text filters in Linux.

1. Awk Command

Awk is a remarkable pattern scanning and processing language, it can be used to build useful filters in Linux. You can start using it by reading through our Awk series Part 1 to Part 13.

Additionally, also read through the awk man page for more info and usage options:

$ man awk

2. Sed Command

sed is a powerful stream editor for filtering and transforming text. We’ve already written a two useful articles on sed, that you can go through it here:

The sed man page has added control options and instructions:

$ man sed

3. Grep, Egrep, Fgrep, Rgrep Commands

These filters output lines matching a given pattern. They read lines from a file or standard input, and print all matching lines by default to standard output.

Note: The main program is grep, the variations are simply the same as using specific grep options as below (and they are still being used for backward compatibility):

$ egrep = grep -E
$ fgrep = grep -F
$ rgrep = grep -r

Below are some basic grep commands:

tecmint@TecMint ~ $ grep "aaronkilik" /etc/passwd
aaronkilik:x:1001:1001::/home/aaronkilik:

tecmint@TecMint ~ $ cat /etc/passwd | grep "aronkilik"
aaronkilik:x:1001:1001::/home/aaronkilik:

4. head Command

head is used to display the first parts of a file, it outputs the first 10 lines by default. You can use the -n num flag to specify the number of lines to be displayed:

tecmint@TecMint ~ $ head /var/log/auth.log  
Jan  2 10:45:01 TecMint CRON[3383]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan  2 10:45:01 TecMint CRON[3383]: pam_unix(cron:session): session closed for user root
Jan  2 10:51:34 TecMint sudo:  tecmint : TTY=unknown ; PWD=/home/tecmint ; USER=root ; COMMAND=/usr/lib/linuxmint/mintUpdate/checkAPT.py
Jan  2 10:51:34 TecMint sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Jan  2 10:51:39 TecMint sudo: pam_unix(sudo:session): session closed for user root
Jan  2 10:55:01 TecMint CRON[4099]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan  2 10:55:01 TecMint CRON[4099]: pam_unix(cron:session): session closed for user root
Jan  2 11:05:01 TecMint CRON[4138]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan  2 11:05:01 TecMint CRON[4138]: pam_unix(cron:session): session closed for user root
Jan  2 11:09:01 TecMint CRON[4146]: pam_unix(cron:session): session opened for user root by (uid=0)

tecmint@TecMint ~ $ head  -n 5 /var/log/auth.log  
Jan  2 10:45:01 TecMint CRON[3383]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan  2 10:45:01 TecMint CRON[3383]: pam_unix(cron:session): session closed for user root
Jan  2 10:51:34 TecMint sudo:  tecmint : TTY=unknown ; PWD=/home/tecmint ; USER=root ; COMMAND=/usr/lib/linuxmint/mintUpdate/checkAPT.py
Jan  2 10:51:34 TecMint sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Jan  2 10:51:39 TecMint sudo: pam_unix(sudo:session): session closed for user root

Learn how to use head command with tail and cat commands for effective usage in Linux.

5. tail Command

tail outputs the last parts (10 lines by default) of a file. Use the -n num switch to specify the number of lines to be displayed.

The command below will output the last 5 lines of the specified file:

tecmint@TecMint ~ $ tail -n 5 /var/log/auth.log
Jan  6 13:01:27 TecMint sshd[1269]: Server listening on 0.0.0.0 port 22.
Jan  6 13:01:27 TecMint sshd[1269]: Server listening on :: port 22.
Jan  6 13:01:27 TecMint sshd[1269]: Received SIGHUP; restarting.
Jan  6 13:01:27 TecMint sshd[1269]: Server listening on 0.0.0.0 port 22.
Jan  6 13:01:27 TecMint sshd[1269]: Server listening on :: port 22.

Additionally, tail has a special option -f for watching changes in a file in real-time (especially log files).

The following command will enable you monitor changes in the specified file:

tecmint@TecMint ~ $ tail -f /var/log/auth.log
Jan  6 12:58:01 TecMint sshd[1269]: Server listening on :: port 22.
Jan  6 12:58:11 TecMint sshd[1269]: Received SIGHUP; restarting.
Jan  6 12:58:12 TecMint sshd[1269]: Server listening on 0.0.0.0 port 22.
Jan  6 12:58:12 TecMint sshd[1269]: Server listening on :: port 22.
Jan  6 13:01:27 TecMint sshd[1269]: Received SIGHUP; restarting.
Jan  6 13:01:27 TecMint sshd[1269]: Server listening on 0.0.0.0 port 22.
Jan  6 13:01:27 TecMint sshd[1269]: Server listening on :: port 22.
Jan  6 13:01:27 TecMint sshd[1269]: Received SIGHUP; restarting.
Jan  6 13:01:27 TecMint sshd[1269]: Server listening on 0.0.0.0 port 22.
Jan  6 13:01:27 TecMint sshd[1269]: Server listening on :: port 22.

Read through the tail man page for a complete list of usage options and instructions:

$ man tail

6. sort Command

sort is used to sort lines of a text file or from standard input.

Below is the content of a file named domains.list:

tecmint@TecMint ~ $ cat domains.list
tecmint.com
tecmint.com
news.tecmint.com
news.tecmint.com
linuxsay.com
linuxsay.com
windowsmint.com
windowsmint.com

You can run a simple sort command to sort the file content like so:

tecmint@TecMint ~ $ sort domains.list
linuxsay.com
linuxsay.com
news.tecmint.com
news.tecmint.com
tecmint.com
tecmint.com
windowsmint.com
windowsmint.com

You can use sort command in many ways, go through some of the useful articles on sort command as follows:

7. uniq Command

uniq command is used to report or omit repeated lines, it filters lines from standard input and writes the outcome to standard output.

After running sort on an input stream, you can remove repeated lines with uniq as in the example below.

To indicate the number of occurrences of a line, use the -c option and ignore differences in case while comparing by including the -i option:

tecmint@TecMint ~ $ cat domains.list
tecmint.com
tecmint.com
news.tecmint.com
news.tecmint.com
linuxsay.com
linuxsay.com
windowsmint.com

tecmint@TecMint ~ $ sort domains.list | uniq -c 
2 linuxsay.com
2 news.tecmint.com
2 tecmint.com
1 windowsmint.com

Read through the uniq man page for further usage info and flags:

$ man uniq

8. fmt Command

fmt simple optimal text formatter, it reformats paragraphs in specified file and prints results to the standard output.

The following is the content extracted from the file domain-list.txt:

1.tecmint.com 2.news.tecmint.com 3.linuxsay.com 4.windowsmint.com

To reformat the above content to a standard list, run the following command with -w switch is used to define the maximum line width:

tecmint@TecMint ~ $ cat domain-list.txt 
1.tecmint.com 2.news.tecmint.com 3.linuxsay.com 4.windowsmint.com

tecmint@TecMint ~ $ fmt -w 1 domain-list.txt
1.tecmint.com 
2.news.tecmint.com 
3.linuxsay.com 
4.windowsmint.com

9. pr Command

pr command converts text files or standard input for printing. For instance on Debian systems, you can list all installed packages as follows:

$ dpkg -l

To organize the list in pages and columns ready for printing, issue the following command.

tecmint@TecMint ~ $ dpkg -l | pr --columns 3 -l 20  

2017-01-06 13:19                                                  Page 1


Desired=Unknown/Install ii  adduser		ii  apg
| Status=Not/Inst/Conf- ii  adwaita-icon-theme	ii  app-install-data
|/ Err?=(none)/Reinst-r ii  adwaita-icon-theme- ii  apparmor
||/ Name		ii  alsa-base		ii  apt
+++-=================== ii  alsa-utils		ii  apt-clone
ii  accountsservice	ii  anacron		ii  apt-transport-https
ii  acl			ii  apache2		ii  apt-utils
ii  acpi-support	ii  apache2-bin		ii  apt-xapian-index
ii  acpid		ii  apache2-data	ii  aptdaemon
ii  add-apt-key		ii  apache2-utils	ii  aptdaemon-data


2017-01-06 13:19                                                  Page 2


ii  aptitude		ii  avahi-daemon	ii  bind9-host
ii  aptitude-common	ii  avahi-utils		ii  binfmt-support
ii  apturl		ii  aview		ii  binutils
ii  apturl-common	ii  banshee		ii  bison
ii  archdetect-deb	ii  baobab		ii  blt
ii  aspell		ii  base-files		ii  blueberry
ii  aspell-en		ii  base-passwd		ii  bluetooth
ii  at-spi2-core	ii  bash		ii  bluez
ii  attr		ii  bash-completion	ii  bluez-cups
ii  avahi-autoipd	ii  bc			ii  bluez-obexd

.....

The flags used here are:

--column defines number of columns created in the output.
-l specifies page length (default is 66 lines).

10. tr Command

This tool translates or deletes characters from standard input and writes results to standard output.

The syntax for using tr is as follows:

$ tr options set1 set2

Take a look at the examples below, in the first command, set1( [:upper:] ) represents the case of input characters (all upper case).

Then set2([:lower:]) represents the case in which the resultant characters will be. It’s same thing in the second example and the escape sequence \n means print output on a new line:

tecmint@TecMint ~ $ echo "WWW.TECMINT.COM" | tr [:upper:] [:lower:]
www.tecmint.com

tecmint@TecMint ~ $ echo "news.tecmint.com" | tr [:lower:] [:upper:]
NEWS.TECMINT.COM

11. more Command

more command is a useful file perusal filter created basically for certificate viewing. It shows file content in a page like format, where users can press [Enter] to view more information.

You can use it to view large files like so:

tecmint@TecMint ~ $ dmesg | more
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 4.4.0-21-generic (buildd@lgw01-21) (gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2) ) #37-Ubuntu SMP Mon Apr 18 18:33:37 UTC 2016 (Ubuntu 4.4.0-21.37-generic
 4.4.6)
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.0-21-generic root=UUID=bb29dda3-bdaa-4b39-86cf-4a6dc9634a1b ro quiet splash vt.handoff=7
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[    0.000000] x86/fpu: Using 'eager' FPU context switches.
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d3ff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009d400-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000a56affff] usable
[    0.000000] BIOS-e820: [mem 0x00000000a56b0000-0x00000000a5eaffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000a5eb0000-0x00000000aaabefff] usable
--More--

12. less Command

less is the opposite of more command above but it offers extra features and it’s a little faster with large files.

Use it in the same way as more:

tecmint@TecMint ~ $ dmesg | less
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 4.4.0-21-generic (buildd@lgw01-21) (gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2) ) #37-Ubuntu SMP Mon Apr 18 18:33:37 UTC 2016 (Ubuntu 4.4.0-21.37-generic
 4.4.6)
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.0-21-generic root=UUID=bb29dda3-bdaa-4b39-86cf-4a6dc9634a1b ro quiet splash vt.handoff=7
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[    0.000000] x86/fpu: Using 'eager' FPU context switches.
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d3ff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009d400-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000a56affff] usable
[    0.000000] BIOS-e820: [mem 0x00000000a56b0000-0x00000000a5eaffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000a5eb0000-0x00000000aaabefff] usable
:

Learn Why ‘less’ is Faster Than ‘more’ Command for effective file navigation in Linux.

That’s all for now, do let us know of any useful command line tools not mentioned here, that act as a text filters in Linux via the comment section below.

2 thoughts on “12 Useful Commands For Filtering Text for Effective File Operations in Linux”

kannu

February 4, 2021 at 2:45 pm

Your list of commands is good. I need some help.

In my interview, one question is asked which I have no answer to. The question is what is swap space? Please explain to me in the simple words if you know about this.
- Riyan
  
  August 20, 2022 at 2:45 pm
  
  Swap space in Linux is used when the amount of physical memory (RAM) is full. If the system needs more memory resources and the RAM is full, inactive pages in memory are moved to the swap space. While swap space can help machines with a small amount of RAM, it should not be considered a replacement for more RAM.