<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "https://www.w3.org/TR/html4/strict.dtd"> <html lang=en-US style><!--
Page saved with SingleFile
url: https://www.grymoire.com/Unix/Awk.html
saved date: Sat Apr 09 2022 12:21:14 GMT+0530 (India Standard Time)
--><meta charset=utf-8><title>
Awk - A Tutorial and Introduction - by Bruce Barnett
</title>
<meta name=Description content="The Grymoire's AWK tutorial">
<meta name=Keywords content="awk, tutorial, unix, linux, shell scripts, regular expressions, tr">
<meta name=Author content="Bruce Barnett">
<meta property=og:title content="The Grymoire's tutorial on AWK">
<meta property=og:type content=website>
<meta property=og:url content=https://www.grymoire.com/Unix/Awk.html>
<meta property=og:description content="The Grymoire - Tutorial on the AWK program language.">
<link rel=canonical href=https://www.grymoire.com/Unix/Awk.html>
<style>body{font-family:Verdana,sans-serif;font-size:16px;line-height:1.2;background-color:#ffd980;margin:0}p{width:100%;font-size:120%;line-height:1.2}ul{padding:0;margin:0}li{display:block;position:relative;list-style-type:none;line-height:150%}li a{display:block;text-align:left;padding:0px 0px;text-decoration:none}ul ul{display:block;padding:2px 16px;left:0;top:100%}li:hover ul{display:block}ul ul ul{left:100%;padding:2px 16px;top:0}li:hover>ul{display:block;color:rgb(0,96,255);padding-bottom:5px;font-weight:bold;border-bottom-width:1px;border-bottom-style:solid;border-bottom-color:#C6EC8C}a:active{color:rgb(255,0,102);font-weight:bold}h1{border-bottom:2px solid #009}#centerDoc{position:absolute;margin-top:50px;margin-left:5%;width:inherit;padding-left:15px;border:1px solid #333;background-color:#EEE8AA}.topnav{overflow:hidden;background-color:#FFAA00}.topnav a{float:left;display:block;text-align:center;padding:14px 16px;text-decoration:none;font-size:17px}.dropdown{float:left;overflow:hidden}.dropdown .dropbtn{font-size:17px;border:none;outline:none;padding:14px 16px;background-color:inherit;font-family:inherit;margin-left:10px;margin:0}topnav a:hover,.dropdown:hover .dropbtn{background-color:#555;color:white}@media screen and (max-width:600px){.topnav a:not(:first-child),.dropdown .dropbtn{display:none}}</style>
<meta name=viewport content="width=device-width, initial-scale=1">
<link type=image/x-icon rel="shortcut icon" href=><style>.sf-hidden{display:none!important}</style><meta http-equiv=content-security-policy content="default-src 'none'; font-src 'self' data:; img-src 'self' data:; style-src 'unsafe-inline'; media-src 'self' data:; script-src 'unsafe-inline' data:;"></head>
<body>
<h1 id=Awk><a href=#toc_Awk>Awk</a></h1>
<div id=centerDoc>
<p>
<h2 id=Table_of_Contents><a href=#toc_Table_of_Contents>Table of Contents</a></h2>
Last modified: Fri Nov 27 09:56:48 2020
<p>Part of the <a href=https://www.grymoire.com/Unix/index.html title="Unix Tutorials">Unix tutorials</a>. And then there's <a href=https://grymoire.wordpress.com/ title="Grymoire's blog">My blog</a>.
<h2 id=You_can_buy_me_a_coffee.2C_please><a href=#toc_You_can_buy_me_a_coffee.2C_please>You can buy me a coffee, please</a></h2>
<p>I would appreciate it if you occasionally
<a href=https://www.buymeacoffee.com/grymoire>buy me a coffee</a>. Thanks.
<p>Click on a topic in this table to jump there. Click on the topic title to come back to the Table Of Contents.
<ul>
<li><a href=#uh-0 name=toc-uh-0>Why learn AWK?</a>
<li><a href=#uh-1 name=toc-uh-1>Basic Structure</a>
<li><a href=#uh-2 name=toc-uh-2>Executing an AWK script</a>
<li><a href=#uh-3 name=toc-uh-3>Which shell to use with AWK?</a>
<li><a href=#uh-4 name=toc-uh-4>Dynamic Variables</a>
<li><a href=#uh-5 name=toc-uh-5>The Essential Syntax of AWK</a>
<li><a href=#uh-6 name=toc-uh-6>Arithmetic Expressions</a>
<ul>
<li><a href=#uh-7 name=toc-uh-7>Unary arithmetic operators</a>
<li><a href=#uh-8 name=toc-uh-8>The Autoincrement and Autodecrement Operators</a>
<li><a href=#uh-9 name=toc-uh-9>Assignment Operators</a>
<li><a href=#uh-10 name=toc-uh-10>Conditional expressions</a>
<li><a href=#uh-11 name=toc-uh-11>Regular Expressions</a>
<li><a href=#uh-12 name=toc-uh-12>And/Or/Not</a>
</ul>
<li><a href=#uh-13 name=toc-uh-13>Summary of AWK Commands</a>
<li><a href=#uh-14 name=toc-uh-14>AWK Built-in Variables</a>
<ul>
<li><a href=#uh-15 name=toc-uh-15>FS - The Input Field Separator Variable</a>
<li><a href=#uh-16 name=toc-uh-16>OFS - The Output Field Separator Variable</a>
<li><a href=#uh-17 name=toc-uh-17>NF - The Number of Fields Variable</a>
<li><a href=#uh-18 name=toc-uh-18>NR - The Number of Records Variable</a>
<li><a href=#uh-19 name=toc-uh-19>RS - The Record Separator Variable</a>
<li><a href=#uh-20 name=toc-uh-20>ORS - The Output Record Separator Variable</a>
<li><a href=#uh-21 name=toc-uh-21>FILENAME - The Current Filename Variable</a>
</ul>
<li><a href=#uh-22 name=toc-uh-22>Associative Arrays</a>
<ul>
<li><a href=#uh-23 name=toc-uh-23>Multi-dimensional Arrays</a>
<li><a href=#uh-24 name=toc-uh-24>Example of using AWK's Associative Arrays</a>
<ul>
<li><a href=#uh-25 name=toc-uh-25>Output of the script</a>
</ul>
</ul>
<li><a href=#uh-26 name=toc-uh-26>Picture Perfect PRINTF Output</a>
<ul>
<li><a href=#uh-27 name=toc-uh-27>PRINTF - formatting output</a>
<li><a href=#uh-28 name=toc-uh-28>Escape Sequences</a>
<li><a href=#uh-29 name=toc-uh-29>Format Specifiers</a>
<li><a href=#uh-30 name=toc-uh-30>Width - specifying minimum field size</a>
<li><a href=#uh-31 name=toc-uh-31>Left Justification</a>
<li><a href=#uh-32 name=toc-uh-32>The Field Precision Value</a>
<li><a href=#uh-33 name=toc-uh-33>Explicit File output</a>
</ul>
<li><a href=#uh-33a name=toc-uh-33a>Flow Control with next and exit</a>
<li><a href=#uh-34 name=toc-uh-34>AWK Numerical Functions</a>
<ul>
<li><a href=#uh-35 name=toc-uh-35>Trigonometric Functions</a>
<li><a href=#uh-36 name=toc-uh-36>Exponents, logs and square roots</a>
<li><a href=#uh-37 name=toc-uh-37>Truncating Integers</a>
<li><a href=#uh-38 name=toc-uh-38>Random Numbers</a>
<li><a href=#uh-39 name=toc-uh-39>The Lotto script</a>
</ul>
<li><a href=#uh-40 name=toc-uh-40>String Functions</a>
<ul>
<li><a href=#uh-41 name=toc-uh-41>The Length function</a>
<li><a href=#uh-42 name=toc-uh-42>The Index Function</a>
<li><a href=#uh-43 name=toc-uh-43>The Substr function</a>
<li><a href=#uh-45 name=toc-uh-45>The Split function</a>
<li><a href=#uh-44 name=toc-uh-44>GAWK's Tolower and Toupper function</a>
<li><a href=#uh-46 name=toc-uh-46>NAWK's string functions</a>
<ul>
<li><a href=#uh-47 name=toc-uh-47>The Match function</a>
<li><a href=#uh-48 name=toc-uh-48>The System function</a>
<li><a href=#uh-49 name=toc-uh-49>The Getline function</a>
<li><a href=#uh-50 name=toc-uh-50>The systime function</a>
<li><a href=#uh-51 name=toc-uh-51>The Strftime function</a>
</ul>
</ul>
<li><a href=#uh-52 name=toc-uh-52>User Defined Functions</a>
<li><a href=#uh-53 name=toc-uh-53>AWK patterns</a>
<li><a href=#uh-54 name=toc-uh-54>Formatting AWK programs</a>
<li><a href=#uh-55 name=toc-uh-55>Environment Variables</a>
<ul>
<li><a href=#uh-56 name=toc-uh-56>ARGC - Number of arguments (NAWK/GAWK)</a>
<li><a href=#uh-57 name=toc-uh-57>ARGV - Array of arguments (NAWK/GAWK)</a>
<li><a href=#uh-58 name=toc-uh-58>ARGIND - Argument Index (GAWK only)</a>
<li><a href=#uh-59 name=toc-uh-59>FNR (NAWK/GAWK)</a>
<li><a href=#uh-60 name=toc-uh-60>OFMT (NAWK/GAWK)</a>
<li><a href=#uh-61 name=toc-uh-61>RSTART, RLENGTH and match (NAWK/GAWK)</a>
<li><a href=#uh-62 name=toc-uh-62>SUBSEP - Multi-dimensional array separator (NAWK/GAWK)</a>
<li><a href=#uh-63 name=toc-uh-63>ENVIRON - environment variables (GAWK only)</a>
<li><a href=#uh-64 name=toc-uh-64>IGNORECASE (GAWK only)</a>
<li><a href=#uh-65 name=toc-uh-65>CONVFMT - conversion format (GAWK only)</a>
<li><a href=#uh-66 name=toc-uh-66>ERRNO - system errors (GAWK only)</a>
<li><a href=#uh-67 name=toc-uh-67>FIELDWIDTHS - fixed width fields (GAWK only)</a>
</ul>
<li><a href=#uh-68 name=toc-uh-68>AWK, NAWK, GAWK, or PERL</a>
</ul>
<h1 id=Intro_to_AWK><a href=#toc_Intro_to_AWK>Intro to AWK</a></h1>
<p>Copyright 1994,1995 Bruce Barnett and General Electric Company
<p>Copyright 2001, 2004, 2013, 2014 Bruce Barnett
<p>All rights reserved
<p>You are allowed to print copies of this tutorial for your personal
use, and link to this page, but you are not allowed to make electronic
copies, or redistribute this tutorial in any form without permission.
<p>
<p> Original version written in 1994 and published in the Sun Observer
<p>
Awk is an extremely versatile programming language for working on
files. We'll teach you just enough to understand the examples in this
page, plus a smidgen.
<p> The example scripts below include an extension in the filename that indicates which interpreter runs them. Once you download a script and make it executable, you can rename it to anything you want.
<p><h2><a name=uh-0 href=#toc-uh-0>Why learn AWK?</a></h2><p>
<p>In the past I have
covered
<i>grep</i> and
<i>sed</i>. This section discusses AWK, another cornerstone
of UNIX shell programming.
There are three variations of AWK:
<dl><dd>AWK - the (very old) original from AT&T<br>
NAWK - A newer, improved version from AT&T<br>
GAWK - The Free Software Foundation's version<br>
</dl><p>Originally, I didn't plan to discuss NAWK, but several UNIX vendors
have replaced AWK with NAWK, and there are several incompatibilities
between the two. It would be cruel of me to not warn you about the
differences. So I will highlight those when I come to them.
It is important to know that all of AWK's features are in NAWK and
GAWK.
Most, if not all, of NAWK's features are in GAWK.
NAWK ships as part of Solaris. GAWK does not. However, many sites on
the Internet have the sources freely available. If you use Linux, you probably have GAWK. But in general, assume that I am talking about the classic AWK unless otherwise noted.
<p>And now there is talk about <a href=https://en.wikipedia.org/wiki/AWK#Versions_and_implementations>MAWK, TAWK, and JAWK</a>.
<p>Why is AWK so important? It is an excellent filter and report writer.
Many UNIX utilities generate rows and columns of information.
AWK is an excellent tool for processing these rows and columns,
and is easier to use than most conventional programming languages.
It can be considered to be a pseudo-C interpreter,
as it understands the same arithmetic operators as C.
AWK also has string manipulation functions, so it can search for
particular strings and modify the output. AWK also has associative arrays,
which are incredibly useful, and are a feature most computing languages
lack. Associative arrays can make a complex problem a trivial exercise.
<p>I'll try to cover the essential parts of AWK, and mention the extensions/variations.
The "new AWK," or "nawk", comes on the Sun system, and you may find it
superior to the old AWK in many ways. In particular,
it has better diagnostics, and won't print out the infamous
"bailing out near line ..." message the original AWK is prone to print.
Instead,
"nawk" prints out the line it didn't understand,
and highlights the bad parts with arrows. GAWK does this as well, and this really helps a lot.
If you find yourself needing a feature that is very difficult or impossible
to do in AWK, I suggest you either use NAWK, or GAWK, or convert your AWK script into
PERL using the
"a2p" conversion program which comes with PERL.
PERL is a marvelous language, and I use it all the time, but
I do not plan to cover PERL in these tutorials.
Having made my intention clear, I can continue with a clear conscience.
<p>Many UNIX utilities have strange names. AWK is one of those utilities.
It is not an abbreviation for <i>awk</i>ward. In fact, it is an elegant
and simple language.
The word
"AWK" is derived from the initials of the language's three developers: A. Aho,
B. W. Kernighan and P. Weinberger.
<p>
<p><h2><a name=uh-1 href=#toc-uh-1>Basic Structure</a></h2><p>
<p>The essential organization of an AWK program follows the form:
<dl><dd><i>pattern</i> { action }<br>
</dl><p>The pattern specifies when the action is performed.
Like most UNIX utilities, AWK is line oriented. That is,
the pattern specifies a test that is performed with each line read
as input. If the condition is true, then the action is taken.
The default pattern is something that matches every line.
This is the blank or null pattern. Two other important patterns
are specified by the keywords
"BEGIN" and
"END". As you might expect, these two words specify actions to be taken
before any lines are read, and after the last line is read.
The AWK program below:
<dl><dd>
<pre>BEGIN { print "START" }
{ print }
END { print "STOP" }
</pre>
</dl><p>
<p>adds one line before and one line after the input file.
This isn't very useful, but with a simple change, we can make
this into a typical AWK program:
<dl><dd>BEGIN { print "File\tOwner"}<br>
{ print $8, "\t", $3}<br>
END { print " - DONE -" }<br>
</dl><p>
<p>I'll improve the script in the next sections, but we'll call it "FileOwner".
But let's not put it into a script or file yet. I will cover that part in a bit. Hang on and follow with me so you get the flavor of AWK.
<p>The characters
"\t" indicate a tab character, so the output lines up on even boundaries.
The
"$8" and
"$3" have a meaning similar to a shell script. Instead of the eighth and third
argument, they mean the eighth and third field of the input line.
You can think of a field as a column, and the action you specify
operates on each line or row read in.
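<p>Here is a minimal sketch of that idea which you can run without depending on what your own
"ls -l" prints. The two input lines are made up, and they use a nine-column (Linux-style) layout,
so the filename is "$9" rather than "$8". The names "sample_listing" and "file_owner" are only illustrative.

```shell
#!/bin/sh
# Two made-up lines in a nine-column "ls -l" layout (hypothetical data).
sample_listing() {
    printf '%s\n' \
        '-rw-r--r-- 1 barnett staff 120 Apr  9 06:51 a.file' \
        '-rw-r--r-- 1 barnett staff 340 Apr  9 06:52 another.file'
}

# file_owner is an illustrative name: print field 9 (the name)
# and field 3 (the owner) of every input line.
file_owner() {
    awk '
BEGIN { print "File\tOwner" }
{ print $9, "\t", $3 }
END { print " - DONE -" }
'
}

sample_listing | file_owner
```

<p>The header comes out first, then one line per input line with the filename followed by the owner, and finally the " - DONE -" line.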
<p>There are two differences between AWK and a shell processing the
characters within double quotes. AWK understands special characters
that follow the
"\" character, like "t". The Bourne and C UNIX shells do not.
Also, unlike the shell (and PERL) AWK does not evaluate variables within
strings. To explain, the second line could not be written like this:
<dl><dd> {print "$8\t$3" }<br>
</dl><p>That example would print
"$8 $3". Inside the quotes, the dollar sign is not a special character.
Outside, it corresponds to a field.
What do I mean by the third and eighth field?
Consider the Solaris
"/usr/bin/ls -l" command, which has eight columns of information.
The System V version (similar to the Linux version),
"/usr/5bin/ls -l" has 9 columns.
The third column is the owner, and the eighth (or ninth) column is the name of the file.
This AWK program can be used to process the output of the
"ls -l" command, printing out the filename, then the owner, for each file.
I'll show you how.
<p>Update: On a Linux system, change "$8" to "$9".
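<p>If you want to convince yourself that a dollar sign inside an AWK string is just a character, here is a small sketch.
The function name "quoted_vs_field" and the input line are made up for the illustration.

```shell
#!/bin/sh
# quoted_vs_field is an illustrative name.
# Inside the double quotes, "$1" and "$2" are plain text to AWK;
# outside the quotes they refer to the first and second field.
quoted_vs_field() {
    awk '{ print "$1 and $2 stay as typed"; print $1, $2 }'
}

echo "hello world" | quoted_vs_field
```

<p>The first output line repeats the string exactly as written, while the second shows the two fields of the input line.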
<p>One more point about the use of a dollar sign.
In scripting languages like Perl and the various shells, a dollar sign
means the word following is the name of the variable. Awk is
different. The dollar sign means that we are referring to a field or
column in the current line. When switching between Perl and AWK you must remember that "$" has a different meaning.
So the following piece of code prints two "fields" to standard out. The first field
printed is the number "5", the second is the fifth field (or column) on the input
line.
<dl><dd>BEGIN { x=5 }<br>
{ print x, $x}<br>
</dl><p>
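<p>A quick way to try this, assuming a made-up input line with at least five words on it
(the name "show_fifth" is only for illustration):

```shell
#!/bin/sh
# show_fifth is an illustrative name.
# x holds the number 5, so "$x" means "field number 5".
show_fifth() {
    awk 'BEGIN { x = 5 }
{ print x, $x }'
}

echo "one two three four five six" | show_fifth
```

<p>With that input the script prints the number 5, followed by the word "five".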
<p><h2><a name=uh-2 href=#toc-uh-2>Executing an AWK script</a></h2>
<p> So let's start writing our first AWK script.
There are a couple of ways to do this.
<p>Assuming the first script is called
"FileOwner", the invocation would be
<dl><dd>ls -l | FileOwner<br>
</dl><p>
<p>This might generate the following if there were only
two files in the current directory:
<dl><dd>File Owner<br>
<p><br>
a.file barnett<br>
another.file barnett<br>
- DONE -<br>
</p></dl><p>There are two problems with this script.
Both problems are easy to fix, but I'll hold
off on this until I cover the basics.
<p>The script itself can be written in many ways.
I'll show both the C shell (csh/tcsh) and the Bourne/Bash/POSIX shell versions.
The C shell version would look like this:
<br>
<pre>#!/bin/csh -f
# Linux users have to change $8 to $9
awk '\
BEGIN { print "File\tOwner" } \
{ print $8, "\t", $3} \
END { print " - DONE -" } \
'
</pre>
<p>
And of course, once you create this script, you need to make this script executable by typing
<pre>chmod +x awk_example1.csh
</pre>
<br>Click here to get file: <a href=https://www.grymoire.com/Unix/Scripts/awk_example1.csh>awk_example1.csh</a><br>
<p>As you can see in the above script, each line of the AWK script must
have a backslash if it is not the last line of the script.
This is necessary as the C shell doesn't, by default, allow strings to be longer than a line.
I have a long list of complaints about using the C shell. See
<a href=https://www.grymoire.com/Unix/CshTop10.txt>Top Ten reasons not to use the C shell</a><br>
The Bourne shell (as do most shells) allows quoted strings
to span several lines:
<br><br>#!/bin/sh<br>
# Linux users have to change $8 to $9 <br>
awk '<br>
BEGIN { print "File\tOwner" } <br>
{ print $8, "\t", $3} <br>
END { print " - DONE -" } <br>
'<br>
<p>And again, once it is created, it has to be made executable:
<pre>chmod +x awk_example1.sh
</pre>
<br>Click here to get file: <a href=https://www.grymoire.com/Unix/Scripts/awk_example1.sh>awk_example1.sh</a><br>
<p>By the way, I give example scripts in the tutorial, and use an extension on the filename to indicate the type of script.
You can, of course, "install" the script in your home "bin" directory by typing
<pre>cp awk_example1.sh $HOME/bin/awk_example1
chmod +x $HOME/bin/awk_example1
</pre>
<p>
A third type of AWK script is a "native" AWK script, where you don't use the shell. You can write the commands in a file, and execute
<pre>awk -f filename
</pre>
<p>Since AWK is also an interpreter, like the shell, you can save yourself a step and make the file executable
by adding one line at the beginning of the file:
<br><br>#!/bin/awk -f<br>
BEGIN { print "File\tOwner" }<br>
{ print $8, "\t", $3}<br>
END { print " - DONE -" }<br>
<br>
<p>Then execute "chmod +x" and use this file as a new UNIX command.
<br>Click here to get file: <a href=https://www.grymoire.com/Unix/Scripts/awk_example1.awk>awk_example1.awk</a><br>
<p>Notice the
"-f" option following "#!/bin/awk"
above, which is also used in the third format where you use AWK to execute the file directly, i.e. "awk -f filename".
The "-f" option specifies the AWK file containing the instructions.
As you can see, AWK considers lines that start with a
"#" to be a comment, just like the shell. To be precise, anything from the
"#" to the end of the line is a comment (unless it's inside an AWK string).
However, I always comment my AWK scripts with the
"#" at the start of the line, for reasons I'll discuss later.
<p>Which format should you use? I prefer the last format when possible.
It's shorter and simpler. It's also easier to debug problems.
If you need to use a shell, and want to avoid using too many files, you can
combine them as we did in the first and second example.
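<p>To try the "awk -f filename" form without creating files by hand, here is a rough sketch.
It assumes the "mktemp" command is available on your system, and "run_awk_file" is just an illustrative name.
The AWK program is the START/STOP example from earlier, with a comment line added.

```shell
#!/bin/sh
# run_awk_file is an illustrative name; mktemp is assumed to be available.
run_awk_file() {
    prog=$(mktemp)
    # Write a small AWK program into the file, one line per argument.
    printf '%s\n' \
        '# a comment line, ignored by AWK' \
        'BEGIN { print "START" }' \
        '{ print }' \
        'END { print "STOP" }' > "$prog"
    # Run the program stored in the file on standard input.
    awk -f "$prog"
    rm -f "$prog"
}

printf 'one\ntwo\n' | run_awk_file
```

<p>The input lines come out unchanged, with "START" before them and "STOP" after them.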
<p><h2><a name=uh-3 href=#toc-uh-3>Which shell to use with AWK?</a></h2><p>
<p>The format of the original AWK is not free-form.
You cannot put new line breaks just anywhere.
They must go in particular locations.
To be precise, in the original AWK
you can insert a new line character
after the curly braces, and at the end of a command,
but not elsewhere.
If you wanted to break a long line into two lines at any other place,
you had to use a backslash:
<br><br>#!/bin/awk -f<br>
BEGIN { print "File\tOwner" }<br>
{ print $8, "\t", \<br>
$3} <br>
END { print " - DONE -" }<br>
<br>Click here to get file: <a href=https://www.grymoire.com/Unix/Scripts/awk_example2.awk>awk_example2.awk</a><br>
<p>The Bourne shell version would be
<br><br>#!/bin/sh<br>
awk '<br>
BEGIN { print "File\tOwner" }<br>
{ print $8, "\t", \<br>
$3}<br>
END { print "done"} <br>
'<br>
<br>Click here to get file: <a href=https://www.grymoire.com/Unix/Scripts/awk_example2.sh>awk_example2.sh</a><br>
<p>while the C shell would be
<br><br>#!/bin/csh -f<br>
awk '\<br>
BEGIN { print "File\tOwner" }\<br>
{ print $8, "\t", \\<br>
$3}\<br>
END { print "done"}\<br>
'<br>
<br>Click here to get file: <a href=https://www.grymoire.com/Unix/Scripts/awk_example2.csh>awk_example2.csh</a><br>
<p>As you can see, this demonstrates how awkward the
C shell is when enclosing an AWK script.
Not only are backslashes needed for every line, some lines need two when the old original AWK is used.
Newer AWKs are more flexible about where newlines can be added.
Many people, like me, will warn you about the C shell.
Some of the problems are subtle, and you may never see them.
Try to include an AWK or
<i>sed</i> script within a C shell script, and the back slashes will drive you crazy.
This is what convinced me to learn the Bourne shell years ago, when I
was starting out (before the Korn shell or Bash shell were available).
Even if you insist on using the C shell, you should at least learn enough of the Bourne/POSIX shell to set variables,
which by some strange coincidence is the subject
of the next section.
<p>
<h2><a name=uh-4 href=#toc-uh-4>Dynamic Variables</a></h2>
<p>Since you can make a script an AWK executable
by mentioning
"#!/bin/awk -f" on the first line, including an AWK script inside a shell script
isn't needed unless you want to either eliminate the need for an extra file,
or if you want to pass a variable to the insides of an AWK script.
Since this is a common problem, now is as good a time as any to explain the
technique.
I'll do this by showing a simple AWK program
that will only print one column. <b>NOTE: there will be a bug in the first version.</b>
The number of the column will be specified by the first argument.
The first version of the program, which we will call
"Column", looks like this:
<br><br>#!/bin/sh<br>
#NOTE - this script does not work!<br>
column="$1"<br>
awk '{print $column}'<br>
<br>Click here to get file (but be aware that it doesn't work): <a href=https://www.grymoire.com/Unix/Scripts/Column1.sh>Column1.sh</a><br>
<p>A suggested use is:
<dl><dd>ls -l | Column 3<br>
</dl><p>
<p>This would print the third column from the
<i>ls</i> command, which would be the owner of the file.
You can change this into a utility that
counts how many files are owned by each user by adding
<dl><dd>ls -l | Column 3 | uniq -c | sort -nr<br>
</dl><p>
<p><b>Only one problem: the script doesn't work.</b>
The value of the
"column" variable is not seen by AWK. Change
"awk" to
"echo" to check. You need to turn off the quoting
when the variable is seen. This can be done by
ending the quoting, and restarting it after the variable:
<br><br>#!/bin/sh<br>
column="$1"<br>
awk '{print $'"$column"'}'<br>
<br>Click here to get file: <a href=https://www.grymoire.com/Unix/Scripts/Column2.sh>Column2.sh</a><br>
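<p>If it is not obvious why the second version works, the following sketch swaps "awk" for "echo"
so you can see the program text the shell actually hands over in each case.
The function name "show_quoting" is made up for this illustration.

```shell
#!/bin/sh
# show_quoting is an illustrative name, not one of the tutorial scripts.
show_quoting() {
    column="$1"
    # Inside single quotes the shell leaves $column alone,
    # so this is the program the broken script gives to AWK:
    echo '{print $column}'
    # Ending the quotes around "$column" lets the shell substitute the value:
    echo '{print $'"$column"'}'
}

show_quoting 3
```

<p>In the broken version AWK sees its own, unset variable named "column", treats it as zero, and "$0" is the whole line.
That is why the broken script prints every column instead of complaining.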
<p>This is a very important concept, and throws experienced programmers a
curve ball.
In many computer languages, a string has a start quote, and end quote,
and the contents in between.
If you want to include a special character inside the quote, you must
prevent the
character from having the typical meaning.
In the C language, this is done by putting a backslash before the character.
In other languages, there is a special combination of characters to do this.
In the C and Bourne shell, the quote is just a switch.
It turns the interpretation mode on or off.
There is really no such concept as
"start of string" and
"end of string". The quotes toggle a switch inside the interpreter.
The quote character is not passed on to the application.
This is why there are two pairs of quotes above.
Notice there are two dollar signs. The first one is quoted,
and is seen by AWK. The second one is not quoted, so the shell
evaluates the variable, and replaces
"$column" by the value.
If you don't understand, either change
"awk" to
"echo", or change the first line to read
"#!/bin/sh -x".
<p>Some improvements are needed, however.
The Bourne shell has a mechanism to provide a value for a variable
if the value isn't set, or is set and the value is an empty string.
This is done by using the format:
<dl><dd>${<i>variable</i>:-<i>defaultvalue</i>}<br>
</dl><p>This is shown below, where the default column will be one:
<br><br>#!/bin/sh<br>
column="${1:-1}"<br>
awk '{print $'"$column"'}'<br>
<br>Click here to get file: <a href=https://www.grymoire.com/Unix/Scripts/Column3.sh>Column3.sh</a><br>
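<p>Here is a small sketch of the default value at work. I wrapped the two lines in a shell function
(the name "pick_column" is only illustrative) so that all three cases can be tried from one file.
Inside the function, "$1" is the function's first argument.

```shell
#!/bin/sh
# pick_column is an illustrative stand-in for the Column script.
pick_column() {
    column="${1:-1}"
    awk '{print $'"$column"'}'
}

echo "a b c" | pick_column 3    # argument given: third field
echo "a b c" | pick_column      # no argument: falls back to field 1
echo "a b c" | pick_column ""   # empty argument: also falls back to field 1
```

<p>The three calls print "c", "a" and "a".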
<p>We can save a line by combining these two steps:
<br><br>#!/bin/sh<br>
awk '{print $'"${1:-1}"'}'<br>
<br>Click here to get file: <a href=https://www.grymoire.com/Unix/Scripts/Column4.sh>Column4.sh</a><br>
<p>It is hard to read, but it is compact.
There is one other method that can be used.
If you execute an AWK command and include on the command line information in the following form:
<dl><dd><i>variable</i>=<i>value</i><br>
</dl><p>
<p>this variable will be set before AWK reads its input (but after any BEGIN action has run).
An example of this use would be:
<br><br>#!/bin/sh<br>
awk '{print $c}' c="${1:-1}"<br>
<br>Click here to get file: <a href=https://www.grymoire.com/Unix/Scripts/Column5.sh>Column5.sh</a><br>
<p>This last variation does not have the problems with
quoting the previous example had.
You should master the earlier example, however, because you can use it
with any script or command. The second method is special to AWK.
Modern AWKs have other options as well. See the
<a href=https://cfajohnson.com/shell/cus-faq-2.html#Q24> comp.unix.shell FAQ</a>.
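<p>One of those other options is "-v", which NAWK, GAWK and POSIX-style AWKs accept; the very old original AWK may not.
The sketch below is only an illustration (the name "pick_column_v" is made up).
It also shows one difference from the trailing <i>variable</i>=<i>value</i> form: a "-v" variable is already set inside a BEGIN action.

```shell
#!/bin/sh
# pick_column_v is an illustrative name.
pick_column_v() {
    awk -v c="${1:-1}" '{ print $c }'
}

echo "a b c" | pick_column_v 2

# A variable set with -v is already visible in BEGIN ...
awk -v c=5 'BEGIN { print "[" c "]" }'
# ... while a trailing c=5 argument is not, since it is only processed
# when AWK reaches it while walking through its file arguments.
awk 'BEGIN { print "[" c "]" }' c=5
```

<p>The first command prints "b", the second prints "[5]", and the third prints "[]" because "c" is still empty when BEGIN runs.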
<p><h2><a name=uh-5 href=#toc-uh-5>The Essential Syntax of AWK</a></h2><p>Earlier I discussed ways to start an AWK script. This section will
discuss the various grammatical elements of AWK.
<p><h2><a name=uh-6 href=#toc-uh-6>Arithmetic Expressions</a></h2><p>There are several arithmetic operators, similar to C. These are the binary
operators,
which operate on two variables:
<table border=1>
<tbody><tr><th colspan=3>AWK Table 1<br>Binary Operators</tr>
<tr><th>Operator<th>Type<th>Meaning</tr>
<tr><td>+<td>Arithmetic<td>Addition</tr>
<tr><td>-<td>Arithmetic<td>Subtraction</tr>
<tr><td>*<td>Arithmetic<td>Multiplication</tr>
<tr><td>/<td>Arithmetic<td>Division</tr>
<tr><td>%<td>Arithmetic<td>Modulo</tr>
<tr><td>&lt;space&gt;<td>String<td>Concatenation</tr>
</table>
<p>
Using variables with the value of
"7" and
"3", AWK returns the following results for each operator
when using the print command:
<table border=1>
<tbody><tr><th>Expression<th>Result</tr>
<tr><td>7+3<td>10</tr>
<tr><td>7-3<td>4</tr>
<tr><td>7*3<td>21</tr>
<tr><td>7/3<td>2.33333</tr>
<tr><td>7%3<td>1</tr>
<tr><td>7 3<td>73</tr>
</table>
<p>There are a few points to make.
The modulus operator finds the remainder after an integer divide.
The
<i>print</i> command outputs a floating point number on the divide, but an integer
for the rest.
The string concatenate operator is confusing, since it isn't even visible.
Place a space between two variables and the strings are concatenated together.
This also shows that numbers are converted automatically into strings
when needed.
Unlike C, AWK doesn't have
"types" of variables. There is one type only, and it can be a string or
number. The conversion rules are simple. A number can easily be
converted into a string. When a string is converted into a number,
AWK will do so. The string
"123" will be converted into the number 123. However, the string
"123X" will be converted into the number 0. (NAWK will behave differently,
and converts the string into integer 123, which is found in the
beginning of the string).
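<p>The table above can be reproduced with a short sketch like the following. The function name
"binary_ops" is invented for this illustration, and the last line adds one conversion example of my choosing.

```shell
#!/bin/sh
# Evaluate each binary operator from the table with a=7 and b=3.
# "binary_ops" is a hypothetical name used only for this sketch.
binary_ops() {
    awk 'BEGIN {
        a = 7; b = 3
        print a + b      # 10
        print a - b      # 4
        print a * b      # 21
        print a / b      # 2.33333 (print uses the default OFMT of %.6g)
        print a % b      # 1
        print a b        # 73 (both numbers are converted to strings)
        print "123" + 1  # 124 (a numeric-looking string becomes a number)
    }'
}

binary_ops
```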
<p><h2><a name=uh-7 href=#toc-uh-7>Unary arithmetic operators</a></h2><p>The
"+" and
"-" operators can be used before variables and numbers.
If x equals 4, then the statement:
<dl><dd>print -x;<br>
</dl><p>will print
"-4".
<p><h2><a name=uh-8 href=#toc-uh-8>The Autoincrement and Autodecrement Operators</a></h2><p>
<p>AWK also supports the
"++" and
"--" operators of C. Both increment or decrement the variables by one. The
operator can only be used with a single variable, and can be before or
after the variable. The prefix form modifies the value, and then uses
the result, while the postfix form uses the current value of the variable,
and afterwards modifies the variable. As an example, if x has the
value of 3, then the AWK statement
<dl><dd>print x++, " ", ++x;<br>
</dl><p>
<p>would print the numbers 3 and 5. These operators are also assignment
operators, and can be used by themselves on a line:
<dl><dd>x++;<br>
--y;<br>
</dl><p>
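<p>To see the prefix and postfix forms one at a time, here is a small sketch. It uses separate statements so the result does not depend on the order in which the arguments to
<i>print</i> are evaluated. The function name
"incr_demo" is made up.

```shell
#!/bin/sh
# Show the difference between postfix and prefix increment.
# "incr_demo" is a hypothetical name used only for this sketch.
incr_demo() {
    awk 'BEGIN {
        x = 3
        y = x++    # postfix: y gets the old value 3, then x becomes 4
        z = ++x    # prefix: x becomes 5 first, then z gets 5
        print y, z, x
    }'
}

incr_demo    # prints: 3 5 5
```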
<p><h2><a name=uh-9 href=#toc-uh-9>Assignment Operators</a></h2><p>Variables can be assigned new values with the assignment operators.
You know about
"++" and
"--". The other assignment statement is simply:
<dl><dd><i>variable</i> = <i>arithmetic_expression</i><br>
</dl><p>
<p>Certain operators have precedence over others;
parentheses can be used to control grouping.
The statement
<dl><dd>x=1+2*3 4;<br>
</dl><p>
<p>is the same as
<dl><dd>x = (1 + (2 * 3)) "4";<br>
</dl><p>
<p>Both set x to the string
"74".
<p>Notice spaces can be added for readability.
AWK, like C, has special assignment operators, which combine
a calculation with an assignment. Instead of saying
<dl><dd>x=x+2;<br>
</dl><p>
<p>you can more concisely say:
<dl><dd>x+=2;<br>
</dl><p>
<p>The complete list follows:
<table border=1>
<tbody><tr><th colspan=2>AWK Table 2<br>Assignment Operators</tr>
<tr><th>Operator<th>Meaning</tr>
<tr><td>+=<td>Add result to variable</tr>
<tr><td>-=<td>Subtract result from variable</tr>
<tr><td>*=<td>Multiply variable by result</tr>
<tr><td>/=<td>Divide variable by result</tr>
<tr><td>%=<td>Apply modulo to variable</tr>
</table>
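<p>As a quick check of the precedence rule and of each operator in Table 2, here is a sketch that applies them in turn to one variable. The function name
"assign_demo" and the particular numbers are arbitrary choices for illustration.

```shell
#!/bin/sh
# Apply every assignment operator from Table 2 to the same variable.
# "assign_demo" is a hypothetical name used only for this sketch.
assign_demo() {
    awk 'BEGIN {
        x = 1 + 2 * 3 4     # multiply, then add, then concatenate with 4
        print x             # 74
        x += 2; print x     # 76
        x -= 6; print x     # 70
        x *= 2; print x     # 140
        x /= 7; print x     # 20
        x %= 6; print x     # 2
    }'
}

assign_demo
```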
<p><h2><a name=uh-10 href=#toc-uh-10>Conditional expressions</a></h2><p>The second type of expression in AWK is the conditional expression.
This is used for certain tests, like the
<i>if</i> or
<i>while</i>. Boolean conditions evaluate to true or false. In AWK, there is a
definite difference between a boolean condition, and an arithmetic
expression. You cannot convert a boolean condition to an integer or
string. You can, however, use an arithmetic expression as a
conditional expression. A value of 0 is false, while anything else is
true. Undefined variables have the value of 0.
Unlike AWK, NAWK lets you use booleans as integers.
<p>Arithmetic values can also be converted into boolean conditions by
using relational operators:
<table border=1>
<tbody><tr><th colspan=2>AWK Table 3<br>Relational Operators</tr>
<tr><th>Operator<th>Meaning</tr>
<tr><td>==<td>Is equal to</tr>
<tr><td>!=<td>Is not equal to</tr>
<tr><td>&gt;<td>Is greater than</tr>
<tr><td>&gt;=<td>Is greater than or equal to</tr>
<tr><td>&lt;<td>Is less than</tr>
<tr><td>&lt;=<td>Is less than or equal to</tr>
</table>
<p>These operators are the same as the C operators.
They can be used to compare numbers or strings.
With respect to strings, lower case letters are greater than upper
case letters.
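<p>The difference between comparing numbers and comparing strings can be sketched as follows. Printing the result of a comparison as 0 or 1 relies on the NAWK-style behavior mentioned above, and I set LC_ALL=C on the assumption that your AWK might otherwise order letters according to the locale. The function name
"compare_demo" is made up.

```shell
#!/bin/sh
# Compare the same digits first as numbers, then as strings.
# "compare_demo" is a hypothetical name used only for this sketch.
compare_demo() {
    LC_ALL=C awk 'BEGIN {
        print (10 < 9)        # numeric comparison: false, prints 0
        print ("10" < "9")    # string comparison: "1" sorts before "9", prints 1
        print ("a" > "A")     # lower case sorts after upper case, prints 1
    }'
}

compare_demo
```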
<p><h2><a name=uh-11 href=#toc-uh-11>Regular Expressions</a></h2><p>Two operators are used to compare strings to regular expressions:
<table border=1>
<tbody><tr><th colspan=2>AWK Table 4<br>Regular Expression Operators</tr>
<tr><th>Operator<th>Meaning</tr>
<tr><td>~<td>Matches</tr>
<tr><td>!~<td>Doesn't match</tr>
</table>
<p>The order matters here. The regular expression must be
enclosed by slashes, and comes after the operator.
AWK supports extended regular expressions, so the following are
examples of valid tests:
<dl><dd>word !~ /START/<br>
lawrence_welk ~ /(one|two|three)/<br>
</dl><p>
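<p>Here is a small sketch that runs both tests against the first field of each line. The function name
"match_demo" and the three sample lines are invented for illustration.

```shell
#!/bin/sh
# Test the first field of each line against the two patterns shown above.
# "match_demo" is a hypothetical name used only for this sketch.
match_demo() {
    awk '{
        if ($1 ~ /(one|two|three)/) print $1, "matches"
        if ($1 !~ /START/)          print $1, "has no START"
    }'
}

printf 'and-a-one\nSTART\nfour\n' | match_demo
# prints:
#   and-a-one matches
#   and-a-one has no START
#   four has no START
```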
<p><h2><a name=uh-12 href=#toc-uh-12>And/Or/Not</a></h2><p>There are two boolean operators that can be used with conditional expressions.
That is, you can combine two conditional expressions with the
"and" or
"or" operators:
"&&" and
"||". There is also the unary not operator:
"!".
<p><h2><a name=uh-13 href=#toc-uh-13>Summary of AWK Commands</a></h2><p>There are only a few commands in AWK.
The list and syntax follows:
<dl><dd>if ( <i>conditional</i> ) <i>statement</i> [ else <i>statement</i> ]<br>
while ( <i>conditional</i> ) <i>statement</i><br>
for ( <i>expression</i> ; <i>conditional</i> ; <i>expression</i> ) <i>statement</i><br>
for ( <i>variable</i> in <i>array</i> ) <i>statement</i><br>
break<br>
continue<br>
{ [ <i>statement</i> ] ...}<br>
<i>variable</i>=<i>expression</i><br>
print [ <i>expression-list</i> ] [ > <i>expression</i> ]<br>
printf <i>format</i> [ , <i>expression-list</i> ] [ > <i>expression</i> ]<br>
next <br>
exit<br>
</dl><p>
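<p>As a sketch of how the
<i>if</i> command combines with the boolean operators described earlier, the following classifies each number it reads. The function name
"logic_demo" and the range of 10 to 20 are arbitrary choices for illustration.

```shell
#!/bin/sh
# Classify each input number using &&, || and the unary ! operator.
# "logic_demo" is a hypothetical name used only for this sketch.
logic_demo() {
    awk '{
        if ($1 >= 10 && $1 <= 20)  print $1, "is in range"
        if ($1 < 10 || $1 > 20)    print $1, "is out of range"
        if (!($1 % 2))             print $1, "is even"
    }'
}

printf '4\n15\n30\n' | logic_demo
# prints:
#   4 is out of range
#   4 is even
#   15 is in range
#   30 is out of range
#   30 is even
```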
<p>At this point, you can use AWK as a language for simple calculations.
If you wanted to calculate something, and not read any lines for
input,
you could use the
<i>BEGIN</i> keyword discussed earlier, combined with an
<i>exit</i> command:
<pre>#!/bin/awk -f
BEGIN {
# Print the squares from 1 to 10 the first way
i=1;
while (i <= 10) {
# print, not printf: printf would use only the first string as its format
print "The square of ", i, " is ", i*i;
i = i+1;
}
# do it again, using more concise code
for (i=1; i <= 10; i++) {
print "The square of ", i, " is ", i*i;
}
# now end
exit;
}
</pre>
<p>
<br>Click here to get file: <a href=https://www.grymoire.com/Unix/Scripts/awk_print_squares.awk>awk_print_squares.awk</a><br>
The following asks for a number, and then squares it:
<pre>#!/bin/awk -f
BEGIN {
print "type a number";
}
{
print "The square of ", $1, " is ", $1*$1;
print "type another number";
}
END {
print "Done"
}
</pre>
<p>
<br>Click here to get file: <a href=https://www.grymoire.com/Unix/Scripts/awk_ask_for_square.awk>awk_ask_for_square.awk</a><br>
<p>The above isn't a good filter, because it asks for input each time.
If you pipe the output of another program into it, you would generate
a lot of meaningless prompts.
<p>Here is a filter that you should find useful. It counts lines, totals
up the numbers in the first column, and calculates the average.
Pipe
"wc -c *" into it, and it will count files, and tell you the average number of
characters per file, as well as the total characters and the number of files.
(With more than one file, wc also prints a final
"total" line, which this simple script counts like any other line.)
<pre>#!/bin/awk -f
BEGIN {
# How many lines
lines=0;
total=0;
}
{
# this code is executed once for each line
# increase the number of files
lines++;
# increase the total size, which is field #1
total+=$1;
}
END {
# end, now output the total
print lines " lines read";
print "total is ", total;
if (lines > 0 ) {
print "average is ", total/lines;
} else {
print "average is 0";
}
}
</pre>
<p>
<br>Click here to get file: <a href=https://www.grymoire.com/Unix/Scripts/average.awk>average.awk</a><br>
<p>You can pipe the output of
"ls -s" into this filter to count the number of files, the total size, and the
average size. There is a slight problem with this script, as it
includes the output of
"ls" that reports the total.
This causes the number of files to be off by one.
Changing
<dl><dd>lines++;<br>
</dl><p>to
<dl><dd>if ($1 != "total" ) lines++;<br>
</dl><p>
<p>will fix this problem.
Note the code which prevents a divide by zero.
This is common in well-written scripts.
I also initialize the variables to zero. This is not necessary, but it
is a good habit.
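<p>Putting the fix together, a sketch of the corrected filter might look like this. It is wrapped in a shell function named
"average" (my choice of name), and it is fed simulated
"ls -s" output so the result does not depend on a real directory.

```shell
#!/bin/sh
# average: count lines, total the first column, and print the average,
# without counting the "total" summary line that "ls -s" prints.
average() {
    awk '
    BEGIN { lines = 0; total = 0 }
    {
        # the word "total" converts to 0, so only the line count needs the test
        if ($1 != "total") lines++
        total += $1
    }
    END {
        print lines " lines read"
        print "total is", total
        if (lines > 0) {
            print "average is", total / lines
        } else {
            print "average is 0"
        }
    }'
}

# Hypothetical sample input in the shape of "ls -s" output.
printf 'total 12\n4 a.txt\n8 b.txt\n' | average
# prints:
#   2 lines read
#   total is 12
#   average is 6
```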
<p><h2><a name=uh-14 href=#toc-uh-14>AWK Built-in Variables</a></h2><p>
<p>I have mentioned
two kinds of variables: positional and user defined.
A user defined variable is one you create. A positional variable is
not a special variable, but a function triggered by the dollar sign.
Therefore
<dl><dd> print $1;<br>
</dl><p>
<p>and
<dl><dd> X=1;<br>
print $X;<br>
</dl><p>
<p>do the same thing: print the first field on the line.
There are two more points about positional variables that are very useful.
The variable
"$0" refers to the entire line that AWK reads in.
That is, if you had eight fields in a line,
<dl><dd> print $0;<br>
</dl><p>
<p>is similar to
<dl><dd> print $1, $2, $3, $4, $5, $6, $7, $8<br>
</dl><p>
<p>This will change the spacing between the fields; otherwise, they
behave the same.
You can modify positional variables.
The following commands
<dl><dd> $2="";<br>
print;<br>
</dl><p>
<p>delete the contents of the second field.
If you had four fields, and wanted to print out the second and fourth
field, there are two ways. This is the first:
<dl><dd><pre>#!/bin/awk -f
{
$1="";
$3="";
print;
}
</pre>
</dl><p>
<p>and the second
<dl><dd><p><pre>#!/bin/awk -f
{
print $2, $4;
}
</pre>
</dl><p>
<p>These perform similarly, but not identically.
The number of spaces between the values varies.
There are two reasons for this.
The actual number of fields does not change.
Setting a positional variable to an empty string
does not delete the variable. It's still there, but its contents have been
deleted. The other reason is the way AWK outputs
the entire line. There is an output field separator that specifies what character
to put between the fields on output.
The first example outputs four fields, while the second outputs two.
In between each pair of fields is a space.
This is easier to explain if the characters between fields
could be modified to be made more visible.
Well, it can. AWK provides special variables for just that purpose.
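<p>In the meantime, one rough way to see the difference is to pipe the output through
<i>tr</i> and turn every space into an underscore. This is only a sketch; the function names
"blank_fields" and
"pick_fields" are invented for illustration.

```shell
#!/bin/sh
# Both helpers turn spaces into underscores so the spacing can be seen.
# The function names are hypothetical, chosen only for this sketch.
blank_fields() {   # empty fields 1 and 3, then print the whole line
    awk '{ $1 = ""; $3 = ""; print }' | tr ' ' '_'
}
pick_fields() {    # print only fields 2 and 4
    awk '{ print $2, $4 }' | tr ' ' '_'
}

printf 'a b c d\n' | blank_fields    # prints: _b__d
printf 'a b c d\n' | pick_fields     # prints: b_d
```

<p>The first form still prints four fields, two of them empty, so three separators appear. The second prints two fields and one separator.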
<p><h2><a name=uh-15 href=#toc-uh-15>FS - The Input Field Separator Variable</a></h2><p>AWK can be used to parse many system administration files.
However, many of these files do not have whitespace as a separator.
As an example, the password file uses colons.
You can easily change the field separator character to be a colon
using the
"-F" command line option.
The following command will print out accounts that don't have passwords:
<dl><dd>awk -F: '{if ($2 == "") print $1 ": no password!"}' &lt;/etc/passwd<br>
</dl><p>
<p>There is a way to do this without the command line option.
The variable
"FS" can be set like any variable, and has the same function
as the
"-F" command line option. The following is a
script that has the same function as the one above.
<pre>#!/bin/awk -f
BEGIN {
FS=":";
}
{
if ( $2 == "" ) {
print $1 ": no password!";
}
}
</pre>
<p>
<br>Click here to get file: <a href=https://www.grymoire.com/Unix/Scripts/awk_nopasswd.awk>awk_nopasswd.awk</a><br>
<p>The second form can be used to create a UNIX utility, which I will name
"chkpasswd", and execute like this:
<dl><dd>chkpasswd &lt;/etc/passwd<br>
</dl><p>
<p>The command
"chkpasswd -F:" cannot be used, because AWK
will never see this argument. All interpreter scripts
accept one and only one argument, which is immediately after
the
"#!/bin/awk" string. In this case, the single argument is
"-f". Another difference between the command line option and the internal variable
is the ability to set the input field separator to be more than one character.
If you specify
<dl><dd>FS=": ";<br>
</dl><p>
<p>then AWK
will split a line into fields wherever it sees those two characters,
in that exact order.
You cannot do this on the command line with older versions of AWK.
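<p>A short sketch of a two-character separator set inside the script follows. The function name
"split_demo" and the sample lines are made up for illustration.

```shell
#!/bin/sh
# Split each line wherever a colon is immediately followed by a space.
# "split_demo" is a hypothetical name used only for this sketch.
split_demo() {
    awk 'BEGIN { FS = ": " } { print $2 }'
}

printf 'name: Bruce: Barnett\nkey:value: x\n' | split_demo
# prints:
#   Bruce
#   x
```

<p>In the second sample line, the colon in
"key:value" is not followed by a space, so it does not split the field.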
<p>There is a third advantage the internal variable has over
the command line option:
you can change the field separator character
as many times as you want while reading a file.
Well, at most once for each line.
You can even change it depending on the line you read.
Suppose you had the following file
which contains the numbers 1 through 7 in three
different formats. Lines 4 through 6
have colon separated fields, while the others are separated by spaces.
<dl><dd>ONE 1 I<br>
TWO 2 II<br>
#START<br>
THREE:3:III<br>
FOUR:4:IV<br>
FIVE:5:V<br>
#STOP<br>
SIX 6 VI<br>
SEVEN 7 VII<br>
</dl><p>
<p>The AWK program can easily switch between these formats:
<pre>#!/bin/awk -f
{
if ($1 == "#START") {
FS=":";
} else if ($1 == "#STOP") {
FS=" ";
} else {
#print the Roman number in column 3
print $3
}
}
</pre>
<p>
<br>Click here to get file: <a href=https://www.grymoire.com/Unix/Scripts/awk_example3.awk>awk_example3.awk</a><br>
<p>Note the field separator variable retains its value until it
is explicitly changed.
You don't have to reset it for each line.
Sounds simple, right? However, I have a trick question for you.
What happens if you change the field separator while reading a line?
That is, suppose you had the following line
<dl><dd>One Two:Three:4 Five<br>
</dl><p>
<p>and you executed the following script:
<dl><dd><pre>#!/bin/awk -f
{
print $2
FS=":"
print $2
}
</pre>
</dl><p>
<p>What would be printed?
"Three" or
"Two:Three:4"? Well, the script would print out
"Two:Three:4" twice. However, if you deleted the first print statement,
it would print out
"Three" once!
I thought this was very strange at first, but after pulling out some
hair, kicking the desk, and yelling at myself and everyone who had
anything to do with the development of UNIX, it is intuitively obvious.
You just have to be thinking like a professional programmer to realize
it is intuitive.
I shall explain, and prevent you from
causing yourself physical harm.
<p>If you change the field separator
<b>before</b> you read the line, the change
<b>affects</b> what you read.
If you change it
<b>after</b> you read the line, it will
<b>not</b> redefine the variables.
You wouldn't want a variable to change on you
as a side-effect of another action. A programming language
with hidden side effects is broken, and should not be trusted.
AWK allows you to redefine the field separator either before or after
you read the line, and does the right thing each time.
Once you read the variable, the variable will not change unless you
change it.
Bravo!
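<p>To watch this happen, here is a sketch that feeds the same line to the two-print script twice. The first copy is split using spaces. The second copy is read after the separator has been changed, so it is split on colons. The function name
"fs_timing" is made up for illustration.

```shell
#!/bin/sh
# Change FS in the middle of processing a line and see when it takes effect.
# "fs_timing" is a hypothetical name used only for this sketch.
fs_timing() {
    awk '{
        print $2      # split with the separator in effect when the line was read
        FS = ":"      # applies to lines read after this point
        print $2      # unchanged: the current line is not split again
    }'
}

printf 'One Two:Three:4 Five\nOne Two:Three:4 Five\n' | fs_timing
# prints:
#   Two:Three:4
#   Two:Three:4
#   Three
#   Three
```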
<p>To illustrate this further, here is another version of the
previous code that changes the field separator dynamically.
In this case, AWK does it by examining field
"$0", which is the entire line.
When the line contains a colon, the field separator is a colon,
otherwise, it is a space. Here is a version that worked with older versions of awk:
<pre>#!/bin/awk -f
{
if ( $0 ~ /:/ ) {
FS=":";
} else {
FS=" ";
}
#print the third field, whatever format
print $3
}