DevOps · K8s · Volleyball · Travel
Explore NY Stream

Awk & Sed Introduction and Printing Operations

— ny_wk

Awk and sed are the two text-processing power tools every Linux sysadmin should master: awk slices files into records and fields to build formatted reports, while sed is a stream editor that finds, substitutes, deletes, and rewrites text on the fly. This guide walks through both tools from first principles with corrected, copy-paste-ready examples. They are written for GNU awk and GNU sed on Linux; most also run unchanged on macOS, although BSD sed treats a few options (notably -i) differently.

If you have only ever used sed to swap one word for another, you are using a fraction of its power. The same goes for awk, which is a full programming language with variables, conditions, loops, and arithmetic. Learning awk and sed together turns one-off manual edits into repeatable, scriptable pipelines.

What Awk is and why sysadmins use it

Awk is a pattern-scanning and data-extraction language named after its three creators at Bell Labs: Alfred Aho, Peter Weinberger, and Brian Kernighan. It reads input line by line, splits each line into fields, and runs the actions you define whenever a line matches a pattern.

On many Linux distributions the awk command is gawk (GNU awk) behind a symlink; Debian and Ubuntu ship mawk by default. The awk examples in this guide use only features common to both. Awk is ideal for log analysis, CSV manipulation, and one-line reports. Key characteristics:

  • Awk treats a text file as a set of records (lines by default) and fields (words by default).
  • It supports variables, conditionals, loops, arrays, and arithmetic and string operators.
  • It reads from a file or standard input and writes to standard output, so it slots cleanly into pipes.
  • It is designed for structured text — do not point it at binary files.

Awk syntax and working model

The general form of an awk program is a series of pattern { action } rules:

  1. awk 'pattern { action }' file
  2. The pattern is a condition or regular expression that selects lines.
  3. The action (inside braces) is the statement(s) to run on matching lines; separate multiple statements with semicolons.
  4. Single quotes wrap the program so the shell does not expand $, *, or other special characters.

Either the pattern or the action is optional, but not both. With no pattern, the action runs on every line. With no action, the default action is to print the whole matching line. Note the difference between omitting the braces and writing empty braces: {} does nothing at all and suppresses the default print.
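The three cases described above can be compared side by side. This is a minimal sketch that assumes a POSIX-compatible awk; the three-line input and the variable names are made up for illustration.

```shell
# Three-line sample input (illustrative data, not one of the article's files).
input=$(printf '%s\n' 'alpha 1' 'beta 2' 'alpha 3')

# Pattern only: the default action prints each matching line.
pattern_only=$(printf '%s\n' "$input" | awk '/alpha/')

# Action only: the action runs on every line.
action_only=$(printf '%s\n' "$input" | awk '{ print $2 }')

# Pattern plus empty braces: lines match, but nothing is printed.
empty_braces=$(printf '%s\n' "$input" | awk '/alpha/ {}')

printf 'pattern only:\n%s\n\n' "$pattern_only"
printf 'action only:\n%s\n\n' "$action_only"
printf 'empty braces: [%s]\n' "$empty_braces"
```

The last line prints empty brackets, which shows that {} replaces the default print rather than falling back to it.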

A sample data file

All awk examples below use this whitespace-separated employee.txt. Each row is ID, Name, Role, Department, Salary:

100  Thomas  Manager    Sales       $5,000
200  Jason   Developer  Technology  $5,500
300  Sanjay  Sysadmin   Technology  $7,000
400  Nisha   Manager    Marketing   $9,500
500  Randy   DBA        Technology  $6,000

Awk printing operations, field by field

These are the everyday awk patterns that cover the bulk of real sysadmin work.

1. Print every line (default behaviour)

With no pattern and a bare print, awk echoes each line. print with no argument prints $0, the whole record:

  1. awk '{ print }' employee.txt

This is functionally the same as cat employee.txt, but it confirms awk is reading the file correctly before you add logic.

2. Print only the lines that match a pattern

To print rows containing either "Thomas" or "Nisha", use the logical OR operator || between two regex patterns. A common copy-paste error is to write /Thomas/ > /Nisha/; the > is a comparison, not OR, and will not do what you expect.

  1. awk '/Thomas/ || /Nisha/' employee.txt
100  Thomas  Manager    Sales       $5,000
400  Nisha   Manager    Marketing   $9,500

3. Print specific fields

Awk automatically splits each record on whitespace and stores the pieces in $1, $2, and so on; $0 is the entire line. The built-in variable NF holds the Number of Fields, so $NF is the last field. To print the name and salary columns:

  1. awk '{ print $2, $5 }' employee.txt
  2. awk '{ print $2, $NF }' employee.txt
Thomas $5,000
Jason $5,500
Sanjay $7,000
Nisha $9,500
Randy $6,000

The comma between fields inserts the output field separator (a space by default). Omit the comma — print $2 $5 — and awk concatenates the values with no space.
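To see the difference, here is a small sketch run against the first record of employee.txt. It assumes a POSIX awk; the OFS value in the third command is an arbitrary choice for illustration.

```shell
line='100  Thomas  Manager    Sales       $5,000'

# Comma: fields are joined with the output field separator (a space by default).
with_comma=$(printf '%s\n' "$line" | awk '{ print $2, $5 }')

# No comma: the two fields are concatenated.
no_comma=$(printf '%s\n' "$line" | awk '{ print $2 $5 }')

# Setting OFS changes what the comma inserts.
custom_ofs=$(printf '%s\n' "$line" | awk -v OFS=' | ' '{ print $2, $5 }')

printf '%s\n' "$with_comma"   # Thomas $5,000
printf '%s\n' "$no_comma"     # Thomas$5,000
printf '%s\n' "$custom_ofs"   # Thomas | $5,000
```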

4. BEGIN and END blocks for headers and footers

Two special patterns control setup and teardown. The BEGIN block runs once before any input is read; the END block runs once after the last line is processed. They are perfect for printing report headers, initialising counters, and printing summaries. In awk, # starts a comment.

  1. awk 'BEGIN { print "Name\tDesignation\tDept\tSalary" } { print $2"\t"$3"\t"$4"\t"$NF } END { print "----\nReport generated" }' employee.txt
Name    Designation    Dept    Salary
Thomas  Manager        Sales       $5,000
Jason   Developer      Technology  $5,500
Sanjay  Sysadmin       Technology  $7,000
Nisha   Manager        Marketing   $9,500
Randy   DBA            Technology  $6,000
----
Report generated

For perfectly aligned columns, reach for printf instead of print, for example printf "%-8s %-10s\n", $2, $3.
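A possible printf version of the report is sketched here. The column widths (8, 10, and 11) are guesses that happen to fit this data set, and the scratch directory from mktemp is only there so the example does not touch your own files.

```shell
# Recreate the sample data in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/employee.txt" <<'EOF'
100  Thomas  Manager    Sales       $5,000
200  Jason   Developer  Technology  $5,500
300  Sanjay  Sysadmin   Technology  $7,000
400  Nisha   Manager    Marketing   $9,500
500  Randy   DBA        Technology  $6,000
EOF

# %-8s left-justifies a string in an 8-character column.
# Unlike print, printf adds no newline, so the format ends with \n.
report=$(awk '
  BEGIN { printf "%-8s %-10s %-11s %s\n", "Name", "Role", "Dept", "Salary" }
        { printf "%-8s %-10s %-11s %s\n", $2, $3, $4, $NF }
' "$workdir/employee.txt")

printf '%s\n' "$report"
rm -rf "$workdir"
```

Because the widths are fixed, a value longer than its column pushes the rest of that row to the right; widen the format if your real data needs it.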

5. Numeric field comparisons

Patterns can test field values numerically. To list employees whose ID is greater than 200, compare $1 directly — awk treats it as a number in numeric context:

  1. awk '$1 > 200' employee.txt
300  Sanjay  Sysadmin   Technology  $7,000
400  Nisha   Manager    Marketing   $9,500
500  Randy   DBA        Technology  $6,000

6. Match a field against a regular expression

The ~ operator means "matches this regex" (and !~ means "does not match"). To print everyone in the Technology department, test the fourth field:

  1. awk '$4 ~ /Technology/' employee.txt
200  Jason   Developer  Technology  $5,500
300  Sanjay  Sysadmin   Technology  $7,000
500  Randy   DBA        Technology  $6,000

7. Count matching records

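Combine a pattern, a counter, and an END block to produce a tally. Awk variables start out empty and evaluate to zero in numeric context, so no BEGIN initialisation is needed. If nothing matches, count prints as an empty string; print count + 0 when you want an explicit 0: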

  1. awk '$4 ~ /Technology/ { count++ } END { print "Employees in Technology =", count }' employee.txt
Employees in Technology = 3

Swap in { sum += $1 } to total a column, or build an associative array (dept[$4]++) to group counts by department in a single pass — that is where awk leaves simple grep far behind.
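Both ideas can be sketched against employee.txt. This assumes a POSIX awk; the sort is there only because awk's for-in loop visits array keys in no guaranteed order. The salary column contains "$" and ",", so the sketch strips them before adding, which is an extra step the one-liners above do not need.

```shell
# Recreate the sample data in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/employee.txt" <<'EOF'
100  Thomas  Manager    Sales       $5,000
200  Jason   Developer  Technology  $5,500
300  Sanjay  Sysadmin   Technology  $7,000
400  Nisha   Manager    Marketing   $9,500
500  Randy   DBA        Technology  $6,000
EOF

# Group by department: dept[] is keyed by the fourth field.
dept_counts=$(awk '{ dept[$4]++ } END { for (d in dept) print d, dept[d] }' \
  "$workdir/employee.txt" | sort)

# Total the salary column after removing "$" and "," from a copy of the field.
total=$(awk '{ s = $NF; gsub(/[$,]/, "", s); sum += s }
             END { print "Total salary =", sum }' "$workdir/employee.txt")

printf '%s\n' "$dept_counts"
printf '%s\n' "$total"        # Total salary = 33000
rm -rf "$workdir"
```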

Sed: the stream editor for automated text edits

Sed reads input line by line, applies your editing commands to each line in a temporary buffer (the "pattern space"), and prints the result. Because it is non-interactive and reads from a stream, it is perfect for scripted, repeatable edits across many files. Most engineers only ever use its substitute command, but sed can delete, insert, append, transform, and filter too.

A sample data file for sed

The sed examples use this file.txt:

unix is great os. unix is opensource. unix is free os.
learn operating system.
unixlinux which one you choose.

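Important safety note: by default sed writes the result to standard output and does not modify the file. To edit the file in place, add the -i flag, and always test without it first or use -i.bak to keep a backup copy. GNU sed accepts a bare -i; the BSD sed on macOS requires a suffix argument, so pass -i '' there when you want no backup.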
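A cautious in-place edit might look like the following sketch. It works in a scratch directory created by mktemp and uses the -i.bak spelling, which both GNU and BSD sed accept; the variable names are illustrative.

```shell
# Recreate the sample file in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/file.txt" <<'EOF'
unix is great os. unix is opensource. unix is free os.
learn operating system.
unixlinux which one you choose.
EOF

# 1. Preview: no -i, so the file on disk is untouched.
sed 's/unix/linux/g' "$workdir/file.txt"

# 2. Apply in place and keep the original as file.txt.bak.
sed -i.bak 's/unix/linux/g' "$workdir/file.txt"

# 3. Confirm: the edited file has no "unix" lines left, the backup still has two.
remaining=$(grep -c 'unix' "$workdir/file.txt")
in_backup=$(grep -c 'unix' "$workdir/file.txt.bak")
echo "lines with unix: edited=$remaining backup=$in_backup"

rm -rf "$workdir"
```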

1. Substitute the first match on each line

The substitute command is s/pattern/replacement/. By default it replaces only the first occurrence on each line:

  1. sed 's/unix/linux/' file.txt
linux is great os. unix is opensource. unix is free os.
learn operating system.
linuxlinux which one you choose.

The s is the substitute operation, the / characters are delimiters, the first segment is the search regex, and the second is the replacement.

2. Replace the Nth occurrence on a line

Add a number flag to target a specific occurrence. This replaces the second "unix" on each line:

  1. sed 's/unix/linux/2' file.txt
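For reference, this sketch runs the command against the sample text held in a shell variable (an illustrative stand-in for file.txt). Lines with fewer than two matches pass through unchanged.

```shell
sample='unix is great os. unix is opensource. unix is free os.
learn operating system.
unixlinux which one you choose.'

# Only the second "unix" on each line is replaced.
second=$(printf '%s\n' "$sample" | sed 's/unix/linux/2')
printf '%s\n' "$second"
# unix is great os. linux is opensource. unix is free os.
# learn operating system.
# unixlinux which one you choose.
```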

3. Replace every occurrence with the g flag

The global flag g replaces all matches on each line:

  1. sed 's/unix/linux/g' file.txt
linux is great os. linux is opensource. linux is free os.
learn operating system.
linuxlinux which one you choose.

4. Replace from the Nth occurrence onward

Combine a number with g to replace from the Nth match to the end of the line. 3g means "from the third occurrence onward". This combination is a GNU sed extension: POSIX leaves it unspecified, and BSD sed (macOS) rejects it. The first sample line contains exactly three "unix" instances, so only the third is changed:

  1. sed 's/unix/linux/3g' file.txt
unix is great os. unix is opensource. linux is free os.
learn operating system.
unixlinux which one you choose.

5. Change the delimiter

When the pattern itself contains slashes (such as a URL), escaping every / gets ugly. Sed lets you use any character as the delimiter — just put it right after the s. These three commands are equivalent:

  1. sed 's/http:\/\//www/' file.txt
  2. sed 's_http://_www_' file.txt
  3. sed 's|http://|www|' file.txt

6. Reuse the matched text with &

In the replacement, & stands for the entire matched string. This is handy when you want to wrap or duplicate a match rather than replace it:

  1. sed 's/unix/{&}/' file.txt wraps the first "unix" in braces: {unix}.
  2. sed 's/unix/{&&}/' file.txt duplicates it: {unixunix}.

7. Capture groups with backreferences \1 to \9

Parentheses (escaped in basic regex as \( and \)) create capture groups you can reference in the replacement as \1, \2, and so on. Examples:

  • Double a word: sed 's/\(unix\)/\1\1/' file.txt turns the first "unix" into "unixunix".
  • Swap two adjacent words: sed 's/\(unix\)\(linux\)/\2\1/' file.txt turns "unixlinux" into "linuxunix".
  • Reverse the first three characters of each line: sed 's/^\(.\)\(.\)\(.\)/\3\2\1/' file.txt.

Tip: GNU sed's -E (extended regex) flag lets you drop the backslashes — sed -E 's/(unix)(linux)/\2\1/' — which is far more readable.
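The basic and extended spellings can be checked against each other. This sketch assumes a sed that supports -E and uses the third sample line as input.

```shell
line='unixlinux which one you choose.'

# Basic regex: groups need backslashes.
bre=$(printf '%s\n' "$line" | sed 's/\(unix\)\(linux\)/\2\1/')

# Extended regex: plain parentheses, same backreferences.
ere=$(printf '%s\n' "$line" | sed -E 's/(unix)(linux)/\2\1/')

# Reverse the first three characters of the line.
reversed=$(printf '%s\n' "$line" | sed 's/^\(.\)\(.\)\(.\)/\3\2\1/')

printf '%s\n' "$bre"        # linuxunix which one you choose.
printf '%s\n' "$ere"        # linuxunix which one you choose.
printf '%s\n' "$reversed"   # inuxlinux which one you choose.
```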

8. Print the changed line twice with the p flag

The p flag prints the pattern space again after substitution, so substituted lines appear twice and unchanged lines appear once:

  1. sed 's/unix/linux/p' file.txt

9. Print only the substituted lines

Pair -n (suppress automatic printing) with the p flag to show only the lines where a substitution happened — a clean way to see exactly what changed:

  1. sed -n 's/unix/linux/p' file.txt

Used alone, -n suppresses all output.
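The three combinations of -n and the p flag are easy to mix up, so here is a sketch that counts the lines each one produces. The sample text is held in a shell variable as a stand-in for file.txt.

```shell
sample='unix is great os. unix is opensource. unix is free os.
learn operating system.
unixlinux which one you choose.'

# p without -n: changed lines appear twice, unchanged lines once (2+1+2 = 5).
with_p=$(printf '%s\n' "$sample" | sed 's/unix/linux/p')

# -n with p: only the lines where a substitution happened (2 lines).
only_changed=$(printf '%s\n' "$sample" | sed -n 's/unix/linux/p')

# -n without p: nothing is printed.
silent=$(printf '%s\n' "$sample" | sed -n 's/unix/linux/')

printf '%s\n' "$with_p" | awk 'END { print "p only:", NR, "lines" }'
printf '%s\n' "$only_changed"
printf 'silent: [%s]\n' "$silent"
```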

10. Chain multiple edits

Run several substitutions in one pass with repeated -e options (cleaner than piping sed into sed):

  1. sed -e 's/unix/linux/' -e 's/os/system/' file.txt
  2. Equivalent pipe: sed 's/unix/linux/' file.txt | sed 's/os/system/'
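Running the chained command on the sample text also exposes a side effect worth knowing about: s/os/system/ matches the "os" inside "choose" on the third line. The sketch below shows that result and one possible narrower pattern; matching on the surrounding space and full stop is an assumption that fits this sample, not a general word-boundary solution.

```shell
sample='unix is great os. unix is opensource. unix is free os.
learn operating system.
unixlinux which one you choose.'

# Both edits are applied to each line, in order.
chained=$(printf '%s\n' "$sample" | sed -e 's/unix/linux/' -e 's/os/system/')
printf '%s\n' "$chained"
# linux is great system. unix is opensource. unix is free os.
# learn operating system.
# linuxlinux which one you chosysteme.

# Narrower second pattern: only " os." as a standalone word before a full stop.
narrow=$(printf '%s\n' "$sample" | sed -e 's/unix/linux/' -e 's/ os\./ system./')
printf '%s\n' "$narrow"
# linux is great system. unix is opensource. unix is free os.
# learn operating system.
# linuxlinux which one you choose.
```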

11-13. Target specific lines or matched lines

You can scope a command to a line number, a range, or lines matching a pattern (an "address"):

Command                                  What it does
sed '3 s/unix/linux/' file.txt           Substitute only on line 3.
sed '1,3 s/unix/linux/' file.txt         Substitute on lines 1 through 3.
sed '2,$ s/unix/linux/' file.txt         Substitute from line 2 to the last line ($).
sed '/linux/ s/unix/centos/' file.txt    On lines containing "linux", replace "unix" with "centos".
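A quick sketch of two of these addresses in action, using the sample text in a shell variable as a stand-in for file.txt:

```shell
sample='unix is great os. unix is opensource. unix is free os.
learn operating system.
unixlinux which one you choose.'

# Line-number address: line 1 keeps its "unix", line 3 is edited.
line3_only=$(printf '%s\n' "$sample" | sed '3 s/unix/linux/')
printf '%s\n' "$line3_only"
# unix is great os. unix is opensource. unix is free os.
# learn operating system.
# linuxlinux which one you choose.

# Regex address: only line 3 contains "linux", so only it is edited.
regex_addr=$(printf '%s\n' "$sample" | sed '/linux/ s/unix/centos/')
printf '%s\n' "$regex_addr"
# unix is great os. unix is opensource. unix is free os.
# learn operating system.
# centoslinux which one you choose.
```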

14-16. Delete, duplicate, and filter like grep

  • Delete lines: sed '2d' file.txt removes line 2; sed '5,$d' file.txt removes line 5 to the end.
  • Duplicate every line: sed 'p' file.txt prints each line twice.
  • Act like grep: sed -n '/unix/p' file.txt prints only matching lines (same as grep unix); sed -n '/unix/!p' file.txt inverts the match (same as grep -v unix), where ! negates the address.
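The grep equivalence can be checked directly. This sketch compares sed's output with grep's on the sample text; the variable names are illustrative.

```shell
sample='unix is great os. unix is opensource. unix is free os.
learn operating system.
unixlinux which one you choose.'

# Delete line 2; lines 1 and 3 remain.
deleted=$(printf '%s\n' "$sample" | sed '2d')

# sed -n '/re/p' prints the same lines as grep re.
sed_match=$(printf '%s\n' "$sample" | sed -n '/unix/p')
grep_match=$(printf '%s\n' "$sample" | grep 'unix')

# sed -n '/re/!p' prints the same lines as grep -v re.
sed_invert=$(printf '%s\n' "$sample" | sed -n '/unix/!p')
grep_invert=$(printf '%s\n' "$sample" | grep -v 'unix')

printf '%s\n' "$deleted"
[ "$sed_match" = "$grep_match" ] && echo "match output is identical"
[ "$sed_invert" = "$grep_invert" ] && echo "inverted output is identical"
```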

17-19. Append, insert, and change whole lines

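Sed can add or replace entire lines around a match. The one-line forms shown here are a GNU sed convenience; POSIX and BSD sed expect the command letter followed by a backslash, a newline, and then the text: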

  • Append after a match with a: sed '/unix/a Added after' file.txt.
  • Insert before a match with i: sed '/unix/i Added before' file.txt.
  • Change the whole matched line with c: sed '/unix/c Changed line' file.txt.
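For scripts that must also run on BSD sed, the same three commands can be written in the backslash-newline form. This sketch matches on "operating" instead of "unix" purely so that a single line is affected and the output stays short.

```shell
sample='unix is great os. unix is opensource. unix is free os.
learn operating system.
unixlinux which one you choose.'

# Append: the new line lands after the matching line.
appended=$(printf '%s\n' "$sample" | sed '/operating/a\
Added after')

# Insert: the new line lands before the matching line.
inserted=$(printf '%s\n' "$sample" | sed '/operating/i\
Added before')

# Change: the matching line is replaced entirely.
changed=$(printf '%s\n' "$sample" | sed '/operating/c\
Changed line')

printf '%s\n\n' "$appended"
printf '%s\n\n' "$inserted"
printf '%s\n' "$changed"
```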

20. Transliterate characters with y

The y command maps characters one-to-one, like tr. This uppercases every "u" and "l":

  1. sed 'y/ul/UL/' file.txt
Unix is great os. Unix is opensoUrce. Unix is free os.
Learn operating system.
UnixLinUx which one yoU choose.

Common pitfalls with awk and sed

  • Forgetting the quotes. Always single-quote awk and sed programs so the shell does not expand $ or *.
  • Assuming sed edits the file. Without -i, sed only prints to stdout. With -i, it overwrites permanently — use -i.bak until you trust the command.
  • Confusing > with logical OR. In awk, combine patterns with ||, not >.
  • Whitespace columns vs. real delimiters. Awk's default split is on runs of whitespace; for CSV use -F',' and remember quoted commas need a proper parser.
  • BRE vs. ERE. Sed's default basic regex needs \( and \) for groups, and \+ is a GNU extension; use sed -E for extended regex. Awk patterns are already ERE.
  • Greedy global replace. The g flag changes every match on a line; if you meant only the first, drop it.
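The delimiter pitfall is easy to demonstrate by printing NF under both splitting rules. The CSV rows below are hypothetical and contain no quoted commas, which is the only case -F',' handles reliably.

```shell
# Hypothetical CSV data; the last field of row 3 contains a space.
csv='id,name,department
100,Thomas,Sales
300,Sanjay,Technology Group'

# Default whitespace split: the commas are ignored, so field counts are wrong.
default_nf=$(printf '%s\n' "$csv" | awk '{ print NF }')

# Comma split: every row has three fields.
comma_nf=$(printf '%s\n' "$csv" | awk -F',' '{ print NF }')

# With the right separator, column 2 is the name (NR > 1 skips the header).
names=$(printf '%s\n' "$csv" | awk -F',' 'NR > 1 { print $2 }')

printf 'default NF: %s\n' "$(printf '%s' "$default_nf" | tr '\n' ' ')"   # 1 1 2
printf 'comma NF:   %s\n' "$(printf '%s' "$comma_nf" | tr '\n' ' ')"     # 3 3 3
printf '%s\n' "$names"
```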

Verification: confirm your edits did what you expected

  1. Dry-run first. Run sed without -i and read the output before committing.
  2. Diff before and after. sed 's/unix/linux/g' file.txt | diff file.txt - shows exactly which lines change.
  3. Count matches. grep -c unix file.txt counts matching lines, not individual matches; run grep -o unix file.txt | wc -l before and after to confirm the expected number of replacements.
  4. Validate awk field logic by printing NF and $0: awk '{ print NF": "$0 }' file.txt reveals mis-split rows.
  5. Keep a backup. Use cp file.txt file.txt.bak or sed -i.bak so you can roll back instantly.
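Put together, the checklist might look like the sketch below. It assumes a grep that supports -o (GNU, BSD, and BusyBox all do, although POSIX does not require it) and works in a scratch directory so nothing real is modified.

```shell
# Recreate the sample file in a scratch directory.
workdir=$(mktemp -d)
f="$workdir/file.txt"
cat > "$f" <<'EOF'
unix is great os. unix is opensource. unix is free os.
learn operating system.
unixlinux which one you choose.
EOF

# Count every occurrence: grep -o prints one match per line, awk counts them.
before=$(grep -o 'unix' "$f" | awk 'END { print NR }')

# Dry run into diff. diff exits 1 when the inputs differ, which is expected here.
changes=$(sed 's/unix/linux/g' "$f" | diff "$f" - || true)
printf '%s\n' "$changes"

# Apply with a backup, then count again.
sed -i.bak 's/unix/linux/g' "$f"
after=$(grep -o 'unix' "$f" | awk 'END { print NR }')

echo "occurrences of unix: before=$before after=$after"   # before=4 after=0
rm -rf "$workdir"
```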

Key Takeaways

  • Awk is a field-aware programming language for extracting columns and building reports; sed is a stream editor for find-and-replace and line edits.
  • In awk, $1..$NF are fields, $0 is the whole line, and BEGIN/END blocks handle setup and summaries.
  • In sed, s/old/new/ replaces the first match; add g for all, a number for the Nth, and -n .../p to show only changes.
  • Sed does not touch your file unless you pass -i — always dry-run first and keep a .bak.
  • Use || for OR in awk, capture groups (\1/\2) in sed, and sed -E for cleaner extended regex (awk patterns are already ERE).

Frequently Asked Questions

What is the difference between awk and sed?

Sed is a stream editor focused on line-oriented find, replace, insert, and delete operations. Awk is a full programming language that understands records and fields, so it excels at column extraction, calculations, and formatted reports. Use sed for quick text substitution and awk when you need field logic, arithmetic, or grouping.

How do I edit a file in place with sed?

Use the -i flag: sed -i 's/old/new/g' file.txt. Because this overwrites the file permanently, run it first without -i to preview, or use sed -i.bak 's/old/new/g' file.txt to automatically keep a backup named file.txt.bak.

How do I print a specific column with awk?

Awk splits each line into fields on whitespace by default. Print the second column with awk '{ print $2 }' file, or the last column with awk '{ print $NF }' file. For comma-separated files, set the field separator: awk -F',' '{ print $2 }' file.csv.

What does the g flag mean in sed?

The g (global) flag tells sed to replace every match on each line, not just the first. Without it, s/unix/linux/ changes only the first "unix" per line; with it, s/unix/linux/g changes all of them.
