\documentclass[
12pt,
paper=a4,
]{scrartcl} % ->
% -> KOMA.
% http://mirrors.ctan.org/macros/latex/contrib/koma-script/doc/scrguien.pdf
\newcommand{\thisdocversion}{\detokenize{2023_06_23_0}}
\input{preamble}
\input{mymacros}
\newcommand{\studentname}{Juan Antonio Pedreira Martos}
\title{NUMA11/FMNN01 -- Notes}
\author{\studentname}
% \date{September 5, 2021}
\date{\footnotesize{Version \texttt{\thisdocversion}}}
% Monkey patching the vertical space...
\addtokomafont{author}{\vspace*{-0.7em}}
\addtokomafont{date}{\vspace*{-0.3em}}
\addtokomafont{title}{\vspace*{-0.5em}} % Affects all
\addtokomafont{titlehead}{\vspace*{-6em}}
\begin{document}
% \makeatletter
% \texttt{\meaning}
% \makeatother
\maketitle
\vspace*{-2.5em}
% \vspace*{-0.5em}
These are my notes for the course ``NUMA11/FMNN01 Numerical Linear Algebra'' taught at Lund University in Autumn 2021 by the professors Claus Führer and Philipp Birken.
\addsec{Preliminaries}
Standard disclaimer: all mistakes/typos are mine. I am not responsible if this document causes any problem and/or scares your cat. Some parts are intentionally not mathematically rigorous for the sake of space and clarity.
I appreciate any constructive criticism/corrections/suggestions.
Conventions: Vectors (1D) are treated as single-column matrices (2D) by default and are always typeset in boldface: $\bm v$. To denote a row vector, use transpose ($\transp$) or Hermitian transpose ($^*$) of a vector.
Acronyms: RHS/LHS means Right-/Left-Hand Side of an equation. ``iff'' means ``if and only if''.
\addsec{References}
\begin{itemize}
\item {} [Lecture slides here].
\item Trefethen, L.\@ N., Bau, D.\@ (1997). \emph{Numerical Linear Algebra}. 1st edition. SIAM. ISBN: 9780898713619.
\item Golub, G.\@ H., van Loan, C.\@ F.\@ (2013). \emph{Matrix Computations}. 4th edition. JHU~Press. ISBN: 9781421407944.
\item {} [TODO: finish list of references / use bibtex or biblatex]
\end{itemize}
\newpage
\addsec{Basic Linear Algebra notions}
Assumed known concepts: vector space, scalar, vector, dimension, linear combination, span of vectors, basis, collinear vectors, linear in/dependence, vector subspace.%, positive / negative (semi)definite.
\begin{description}
\item[Matrix:] Sequence of numbers laid out in a ``table'' $A$ of size ${m\times n}$, that is, $m$ rows (horizontal vectors, ordered top-to-bottom) by $n$ columns (vertical vectors, ordered left-to-right); $m\cdot n$ entries in total.
These numbers (scalars) can be real $\RR$ or complex $\CC$ (or in any field in general...).
Individual entries are denoted by $a_{ij}$ or (to avoid confusion) $(A)_{i,j}$.
Columns of a matrix $A$ are denoted $\bm a_i$ and rows are denoted $ \bm a_{i,:}\transp \in \RR^{1\times n}$.
The entries $a_{ii}$ ($1 \le i\le \min(m,n)$) form the diagonal of the matrix $A$.
\item[Linear mapping:] a.k.a.\@ linear operator, vector space homomorphism.
Function $f$ between real (or complex) vector spaces \[\quad f: \RR^n \to \RR^m ;\quad \bm x \mapsto f(\bm x) = \bm y; \quad \bm x \in \RR^n, \, \bm y \in \RR^m .\]
Must meet linearity requirements:
\begin{itemize}
\item Mapping of the sum is the sum of the mappings: $\bm a + \bm b \mapsto f(\bm a) + f(\bm b)$.
\item Scaling of the input maps to the scaling of output (homogeneity): $\alpha\bm a \mapsto \alpha f(\bm a)$; this includes the case $\bm 0\mapsto \bm 0$.
\end{itemize}
[Pedantic remark: the input or output spaces of a real linear mapping $f$ may not actually be $\RR^m$ or $\RR^n$, but at least they must be \emph{isomorphic} to them; also, we always assume finite-dimensional vector spaces.]
Any output vector $\bm y$ is a linear combination of $n$ vectors (in $\RR^m$) called \emph{columns}; the $n$ coefficients of the linear combination are directly given by the coordinates of the input vector in a given basis.
If we select fixed bases for both the input and output spaces, then this operator is uniquely determined by a matrix that relates the coordinates between spaces and vice~versa: \[
A \in\RR^{m\times n} \leftrightarrow A\bm x = f(\bm x) = \bm y
\]
In this case (i.e.\@ given the operator and bases), we can talk about the ``matrix operator'' and ``action of the matrix'' on a vector. And we use then the term ``matrix'' loosely to refer both to the ``linear operator'' and the associated ``table of numbers'' in a given basis. Also the \emph{columns} correspond to the columns of the matrix, of course.
We distinguish between \emph{basis}-independent/-invariant properties of a mapping/matrix (e.g.\@ rank, eigenvalues, etc.) and \emph{basis}-dependent ones (e.g.\@ entries of $A$, eigenvectors).
Usually, unless otherwise stated, everything is implicitly referred to the ``canonical'' basis (vectors $\bm e_i = (..., 0 , 1, 0, ...)$). The list of coordinates of any vector in the canonical basis is considered the ``absolute'' representation of that vector (tuple or list of numbers, called ``components'' of the vector).
Moreover, when $m=n$ we usually assume that the input basis is the same as the output basis.
IMPORTANT: the same linear operation on vectors can be described by different matrices (i.e.\@ different ``tables of numbers'') if the basis is changed (typical example: diagonalization).
\item[Square matrix:] Matrix where $m=n$; for any associated mapping, the input and output spaces are the same; the mapping is called \emph{endomorphism}.
$n$ is sometimes called the \emph{order} of the matrix $A$.
Matrices that are not square are called rectangular.
\item[Diagonal matrix:] Matrix where all entries outside the diagonal are zero. (Not necessarily square). Note that this property \emph{does} depend on the basis.
\item[Identity matrix:] Square diagonal matrix $I$, with all ones in the diagonal. When the input and output bases are the same, the associated operator is the identity (input and output vectors are the same).
\item[Zero matrix:] Matrix with all zeros $0\in\RR^{m\times n}$. Zero operator (every input is mapped to the zero vector), absorbing element (multiplying by a zero matrix always results in a zero matrix of the suitable size); being the zero matrix is a basis-independent property.
\item[Transpose of a matrix:] (a.k.a.\@ transposed matrix) Matrix $A\transp$ that results by exchanging the rows and columns of the ``table'' (like a 2D symmetry with respect to the diagonal).
\[
A\in\RR^{m\times n} \implies A\transp\in \RR^{n\times m},\,\,
(A\transp)_{ij} = a_{ji}
\]
For any associated mapping, the input and output spaces are also exchanged as a consequence.
\item[Symmetric matrix:] Matrix $A$ such that $A = A\transp$ (necessarily square).
Main idea: the action performed by the (linear combination of the) columns is the same as the action performed by the rows.
\item[Hermitian transposed matrix:] matrix $A^* = \mathrm{conj}(A\transp)$; it is the matrix that results from transposition and complex conjugation of $A$.
If $A$ is real, then $A\transp = A^{*}$.
Many results on real symmetric matrices can be generalized by just replacing $A\transp$ with $A^*$.
\item[Hermitian matrix:] Matrix $A$ such that $A = A^*$ (necessarily square).
If $A$ is real symmetric then it is also Hermitian.
Property: the diagonal entries of a Hermitian matrix must be real.
\item[Inner/dot product:]
$\displaystyle\bm x^* \bm y =\sum_{i=1}^n \bar x_i\, y_i$; if vectors are real: $\displaystyle\bm x\transp \bm y =\sum_{i=1}^n x_i\, y_i$.
Note: this is the ``standard'' inner product (with Gram matrix $G=I$).
\item[Euclidean norm:] (induced by the standard inner product) \[\|\bm x\| = \normtwo{\bm x}= \sqrt{\bm x^* \bm x} = \sqrt{\sum_i |x_i|^2}.\]
This is the norm used ``by default'' for vectors. It is the same as the 2-norm ($p$-norm with $p=2$) (more info in the norm section).
\item[Angle between two vectors:]{} Geometric angle $\alpha_{\bm x,\bm y}$, between nonzero vector pairs $\bm x,\bm y$.
Defining property:
\[\bm x^* \bm y = \normtwo{\bm x}\,\normtwo{\bm y}\,\cos \alpha_{\bm x,\bm y}\]
\item[Unit/Normalized vector:] $\bm x$ is unit vector when $\|\bm x\|=1$.
\item[Matrix-by-vector product:] Application of the matrix operator $A$ on a given vector:
\[
A\in\RR^{m\times n},\,\bm x\in \RR^n;\quad A\bm x \in \RR^m
\]
Can be expressed both as linear combination of columns and as dot products with the rows:
\begin{align*}
A\bm x &=
(\bm a_1| \cdots | \bm a_n)\, \bm x=
\bm a_1 x_1 + \cdots + \bm a_n x_n=
\sum_{j=1}^n \bm a_j x_j
\\
A\bm x &=
\begin{pNiceMatrix}[hlines]
\bm a_{1,:}\transp
\\
\vdots
\\
\bm a_{m,:}\transp
\end{pNiceMatrix}
\, \bm x
=
\begin{pNiceMatrix}[hlines]
\bm a_{1,:}\transp \bm x
\\
\vdots
\\
\bm a_{m,:}\transp \bm x
\end{pNiceMatrix}
\end{align*}
Each entry is the result of a dot product with a row:
\[
(A\bm x)_i =
\bm a_{i,:}\transp \bm x
=
\sum_{k=1}^n a_{ik}\, x_k,
\quad
i \in \{1,..., m\}
\]
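For concreteness, a small worked example with arbitrarily chosen numbers, showing both the column view and the row (dot-product) view:
\[
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
\begin{pmatrix} 5 \\ 6 \end{pmatrix}
= 5\begin{pmatrix} 1 \\ 3 \end{pmatrix} + 6\begin{pmatrix} 2 \\ 4 \end{pmatrix}
= \begin{pmatrix} 1\cdot 5 + 2\cdot 6 \\ 3\cdot 5 + 4\cdot 6 \end{pmatrix}
= \begin{pmatrix} 17 \\ 39 \end{pmatrix}
\]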
\item[Matrix-by-matrix product:] Composition of two matrix operations in order.
Intermediate output and input spaces have to match, meaning that: the product $AB$ is defined for $A\in\RR^{m\times n}$, $B\in\RR^{p\times q}$ only when $n=p$; then, the shape of the product matrix $AB$ is $m\times q$.
As a function composition, it has the associative property: $(AB)C = A(BC) = ABC$; parentheses are not needed, because grouping order is not ambiguous.
It can be expressed in terms of matrix-by-vector operations:
\begin{align*}
A B =
\begin{pNiceMatrix}[vlines]
A\, \bm b_1
& \cdots &
A\, \bm b_q
\end{pNiceMatrix}
&=
\begin{pNiceMatrix}[vlines]
(\bm a_1| ... | \bm a_n)\, \bm b_1
& \cdots &
(\bm a_1| ... | \bm a_n)\, \bm b_q
\end{pNiceMatrix}
\\[10pt]
&=
\begin{pNiceMatrix}[vlines]
\sum_{k=1}^n \bm a_k b_{k1}
& \cdots &
\sum_{k=1}^n \bm a_k b_{kq}
\end{pNiceMatrix}
\end{align*}
\begin{align*}
A B =
\begin{pNiceMatrix}[hlines]
\bm a_{1:}\transp
\\ \vdots \\
\bm a_{m:}\transp
\end{pNiceMatrix}
\begin{pNiceMatrix}[vlines]
\bm b_{1}
& \cdots &
\bm b_{q}
\end{pNiceMatrix}
&=
\begin{pNiceMatrix}[hlines,vlines]
\bm a_{1,:}\transp \bm b_{1}
& \cdots &
\bm a_{1,:}\transp \bm b_{q}
\\
\vdots & \ddots & \vdots
\\
\bm a_{m,:}\transp \bm b_{1}
& \cdots &
\bm a_{m,:}\transp \bm b_{q}
\end{pNiceMatrix}
\end{align*}
Each matrix entry can be expressed as a row-by-column dot product:
\[
(AB)_{ij}= \bm a_{i,:}\transp \bm b_j =
\sum_{k=1}^n a_{i,k} \, b_{k,j}
\]
Finally, $AB$ can be also expressed as a sum of column-by-row outer products:
\[
AB =
\begin{pNiceMatrix}[vlines]
\bm a_{1}
& \cdots &
\bm a_{n}
\end{pNiceMatrix}
\begin{pNiceMatrix}[hlines]
\bm b_{1:}\transp
\\ \vdots \\
\bm b_{n:}\transp
\end{pNiceMatrix}
=
\sum_{k=1}^{n} \bm a_k \, \bm b_{k:}\transp
\]
[TODO: I don't like the notation $\bm a_{i,:}\transp$, find something better...]
[TODO: alternative approach using Einstein notation (by understanding this notation, all previous interpretations become obvious).
See: \url{https://ajcr.net/Basic-guide-to-einsum/}
]
Properties of transposed of a product:
\begin{itemize}
\item $(AB)\transp= B\transp A\transp$
\item $(A_1A_2...A_{N-1}A_N)\transp =
A_N\transp A_{N-1}\transp...A_2 \transp A_1 \transp$
\item same with $(^*)$ instead of $(\transp)$
\end{itemize}
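A small sanity check of the outer-product view, with two arbitrarily chosen $2\times 2$ matrices:
\[
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
\begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}
=
\begin{pmatrix} 1 \\ 3 \end{pmatrix}\begin{pmatrix} 5 & 6 \end{pmatrix}
+
\begin{pmatrix} 2 \\ 4 \end{pmatrix}\begin{pmatrix} 7 & 8 \end{pmatrix}
=
\begin{pmatrix} 5 & 6 \\ 15 & 18 \end{pmatrix}
+
\begin{pmatrix} 14 & 16 \\ 28 & 32 \end{pmatrix}
=
\begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}
\]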
\item[Permutation matrix:] Square matrix with exactly one $1$ per column (hence, per row), the rest of entries are zeros. It is a ``permutation'' of the identity matrix.
It is used to permute (reorder) the rows or columns of a target matrix $A$:
\begin{itemize}
\item Pre-multiplication permutes the rows of $A$: $PA$.
\item Post-multiplication permutes the columns of $A$: $AP$.
\end{itemize}
Property: permutations are orthogonal ($P^{-1}=P\transp$).
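For example (an arbitrarily chosen $3\times 3$ permutation exchanging indices 1 and 2):
\[
P = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix},\qquad
PA \text{ swaps rows 1 and 2 of } A,\qquad
AP \text{ swaps columns 1 and 2 of } A,\qquad
P\transp P = I .
\]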
\item[Range of a matrix:] $\range(A)$ is the vector space spanned by the columns of $A\in\RR^{m\times n}$. It is a subspace of $\RR^m$. It does not depend on the basis (but the resulting coordinates are dependent).
\item[Rank of a matrix:] Dimension of the range of a matrix, that is, the number of independent columns.
Property: $\rank(A)=\rank(A\transp)$ (row and column ranks are equal).
Property: $\rank(A) \le \min(m, n)$.
The number $\min(m, n) - \rank(A)$ is called \emph{rank deficiency}.
\item[Full rank matrix:] Matrix with the highest possible rank, i.e.\@ $\min(m,n)$; the rank deficiency is zero. It is called full rank by columns if all columns are independent (i.e.\@ rank $n$, which only happens for ``tall'' matrices, $m\ge n$), and full rank by rows if all rows are independent (i.e.\@ rank $m$, ``fat'' matrices, $m\le n$).
If $A$ is square ($m=n$): it is full rank by columns iff it is full rank by rows.
\item[Null space of a linear mapping/matrix:] (a.k.a.\@ kernel).
$\Ker(A) = \{\bm x\in\RR^n : A\bm x = \bm 0\}$: the set of all input vectors that are mapped to the zero vector. It is a subspace of the input space $\RR^n$.
\item[Nullity of a matrix:]
Defined as: $\mathrm{Nullity}(A) = \dim(\Ker(A))$.
Note that the \emph{rank deficiency} is a different concept, but the two coincide when $m\ge n$ (including square matrices).
\item[Rank–nullity theorem:]
\[
\rank(A) + \mathrm{Nullity}(A)
= n
\]
($n$ is the number of columns, i.e.\@ the dimension of the input space).
This is a dimension decomposition:
\[
\dim(\range(A)) + \dim(\Ker(A))
= n = \dim(\RR^n)
\]
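A small example with arbitrarily chosen numbers: for
\[
A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}
\]
the second column is twice the first, so $\rank(A)=1$, $\Ker(A)=\sspan\{(2,-1)\transp\}$, $\mathrm{Nullity}(A)=1$, and indeed $1+1=2=n$.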
\item[Inverse matrix:] $A^{-1}\in\RR^{n\times n}$ is the matrix associated with the inverse operator of $A$ (same size); that is, it allows us to recover ``all'' possible inputs $\bm x$ from the output $\bm y=A\bm x$ by just applying it: $A^{-1}\bm y = A^{-1} A\bm x = \bm x$; also $A A^{-1} = A^{-1} A = I$.
$A^{-1}$ exists iff $A$ is full rank (then $A$ is called \emph{invertible}). If $A^{-1}$ exists it is unique. Invertibility of $A$ is a basis-invariant property.
Inverse of product property: $(AB)^{-1} = B^{-1}A^{-1}$, given $A$ and $B$ invertible.
\item[Singular matrix:] Square matrix that has no inverse (basis-invariant property).
The operator is non-bijective: for a given output $\bm y$ there may be zero (non-surjective) or multiple (non-injective) corresponding inputs $\bm x$.
An invertible matrix is also called regular.
\item[Property:] multiplication by invertible matrix preserves rank (and deficiency).
In particular: for $B$ invertible ($A$ might not be square)
\[\boxed{\rank(A) = \rank(AB) = \rank(BA)}\]
Proof:
\[
\bm x\in \Ker(A) \implies A\bm x=\bm 0 \implies B A \bm x = \bm 0 \implies \bm x\in \Ker(BA)
\]
Conversely, since $B$ is invertible, $\Ker(B) = \{\bm 0\}$:
\[ B\bm y = \bm 0 \implies \bm y = \bm 0 \]
Then:
\[\bm x \in \Ker(BA) \implies BA\bm x = \bm 0 \implies B\underbrace{(A\bm x)}_{\bm y} = \bm 0 \implies (A\bm x) = \bm 0 \implies \bm x \in \Ker(A) \]
[Shorter proof: $B A\bm x = \bm 0 \implies B^{-1}B A\bm x = B^{-1} \bm 0 \implies A\bm x = \bm 0$]
So the nullities are the same and, by the rank--nullity theorem, $\rank(A) = \rank(BA)$.
The same argument applied to the rows, i.e.\@ to $(AB)\transp = B\transp A\transp$ with $B\transp$ invertible, gives $\rank(A) = \rank(AB)$.
\item[Coordinates of a vector in a basis:] list of scalar coefficients used to (uniquely) represent a given vector as the linear combination of the vectors of a given basis.
\item[Change of basis:] change of representation as coordinates of a vector from a given basis into another.
An invertible (square) matrix uniquely represents a change of basis: if $A\bm x = \bm b$ with $A$ square has only one solution, we can interpret $\bm b$ as a vector (assume canonical basis for simplicity), and then $\bm x$ contains the coordinates in the basis defined by the columns of $A$.
\item[Similar matrices:] Two square matrices $A,\,B$ are called similar when there exists an invertible matrix $P$ such that $B= P^{-1}\, A\, P$.
Similarity defines an IMPORTANT equivalence relation: two similar matrices correspond to the same square mapping (endomorphism) under different bases, with $P$ being the change-of-basis matrix. The pre- and post-multiplication correspond to the forward/inverse change of basis to/from the output/input spaces.
[Recall that we assume that for square matrices, the input and output spaces are referred to the same basis.]
When two square matrices are similar they share:
\begin{itemize}
\item the rank
\item the characteristic polynomial $\chi_A(\lambda)$
\item the eigenvalues $\lambda_i$
\item the eigenvectors (note though, that the coordinate representation changes as the basis is changed).
\item the eigenvalue multiplicities: $\mu_a(\lambda_i)$ and $\mu_g(\lambda_i)$
\item the trace and the determinant
\item in general, all the properties that are basis-invariant
\item if $A$ is Hermitian (/real symmetric) then $B$ is not in general also Hermitian, but this is guaranteed when $P$ is unitary (/orthogonal).
\end{itemize}
\item[System of linear equations:] $A\bm x = \bm b$, $A\in \RR^{m\times n}$.
The system may have either: one solution, infinitely many solutions (they form a subspace) or no solutions (when $\bm b \notin \range(A)$).
These cases can be characterized by the ranks of the matrices $A$ and $(A|\bm b)$:
\begin{itemize}
\item NO SOLUTIONS iff $\rank(A) \neq \rank(A|\bm b) = \rank(A) + 1$.
The system is called incompatible.
Reason: $\bm b$ does not lie in the range space of $A$ because it is not a linear combination of the columns of $A$, i.e.\@ $\bm b$ is linearly independent with them, so the resulting rank gets incremented by one.
\item INFINITE SOLUTIONS iff $\rank(A) = \rank(A|\bm b) < n$ (where $n$ is the number of columns in $A$).
Reason: $\bm b$ lies in the range space of $A$, and, at the same time, the columns of $A$ are linearly dependent, this implies that, for the vectors in the range space, there are multiple ways of being expressed as a linear combination of the columns of $A$. The vector $\bm b$ is therefore ambiguously representable.
In particular, two solutions $\bm x_1$ and $\bm x_2$ with $A\bm x_1=A\bm x_2=\bm b$ are related by the nullspace of $A$ because: $A(\bm x_2 -\bm x_1) = A \bm x_\Delta = \bm 0$, so the set of all possible solutions is $\{\bm x_1 + \bm x_\Delta : \, \forall \bm x_\Delta \in \Ker(A) \}$.
Then, there are $\dim(\Ker(A))=\mathrm{Nullity}(A)$ ``degrees of freedom''.
\item Exactly ONE SOLUTION iff $\rank(A) = \rank(A|\bm b) =
n$.
Reason: there is exactly one linear combination of the columns of $A$ that results in $\bm b$; this is because, by the rank-nullity theorem, $\mathrm{Nullity}(A)=0$.
If the matrix $A$ is square ($m=n$) it is invertible, and we can ensure in this case that the columns of $A$ form a basis of $\RR^n$ and thus all equations have one solution regardless of $\bm b$.
\end{itemize}
We can always convert a system with at least one solution...
\begin{itemize}
\item ...into a system with NO solutions by adding a contradictory equation, e.g.\@ $0=1$ or copy one equation and change the RHS, etc.
\item ...into a system with exactly the same solutions by adding a redundant equation, e.g.\@ $0=0$, repeating one of the previous equations, etc.
\item ...into a system with infinite solutions (adds one more ``degree of freedom''), by adding a new unused variable ($A$ has a new column of zeros).
\end{itemize}
Regardless of the ranks and number of solutions:
\begin{itemize}
\item if $m>n$, the system is called \emph{overdetermined} (used in least squares fitting).
\item if $n>m$, it is called \emph{underdetermined}.
\item and if $n=m$, it is called \emph{exactly determined}.
\end{itemize}
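A minimal illustration of the three cases, with arbitrarily chosen numbers (here $m=n=2$):
\[
A = \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix}:\qquad
A\bm x = \begin{pmatrix}1\\3\end{pmatrix} \text{ has no solution;}\qquad
A\bm x = \begin{pmatrix}1\\2\end{pmatrix} \text{ has solutions } \begin{pmatrix}1\\0\end{pmatrix} + t\begin{pmatrix}1\\-1\end{pmatrix},\ t\in\RR;
\]
\[
\text{whereas } \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\bm x = \bm b \text{ has exactly one solution for every } \bm b\in\RR^2 .
\]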
\item[Eigen-stuff:] an ``eigenvalue'' is a scalar $\lambda$ associated with a square linear operator $A$, such that there exists a nonzero vector $\bm v$, called eigenvector, with \[
\boxed{A\bm v = \lambda\bm v}.
\]
That is: input and output are collinear/parallel/proportional.
Note: all nonzero vectors collinear with $\bm v$ are eigenvectors (with the same $\lambda$).
Eigenvalues of $A$ are basis-invariant. There are $n$ eigenvalues (counted with algebraic multiplicity, $n$ being the matrix order), but some of them may be complex for real $A$ (prototypical example: rotations).
Each distinct eigenvalue $\lambda_i$ defines a subspace of associated eigenvectors. This set of vectors (+the zero vector) form a vector space called eigenspace (specifically, the $\lambda_i$-eigenspace). The dimension of each eigenspace is called geometric multiplicity $\mu_g(\lambda_i)$ (property of operator, basis-invariant).
We can consider two cases:
\begin{itemize}
\item $\lambda_i\neq 0$, then, the action of $A$ does not change the direction of the input eigenvector $\bm v$.
\item $\lambda_i=0$, then, the eigenspace is the nullspace; the input eigenvector is ``squished'' into the vector $\bm 0$. The $0$-eigenspace is precisely $\Ker(A)$.
\end{itemize}
Properties:
\begin{itemize}
\item $A$ and $A\transp$ have same eigenvalues; the eigenvectors are different in general.
\item $A$ is singular iff $\lambda=0$ is an eigenvalue. The eigenspace is $\Ker(A)$.
\item $A$ and $A^{-1}$ (if exists) have inverse eigenvalues. The eigenvectors for each $\lambda_i$, $\lambda_i^{-1}$ are the same.
\item $\alpha A$ has eigenvalues $\alpha\lambda_i$. The eigenvectors are the same as for $A$ and $\lambda_i$.
\item $A^k$ has eigenvalues $\lambda_i^k$. Note that, for example, if $1$ and $-1$ are eigenvalues of $A$, they are merged into 1 for $A^2$. The eigenvectors are the same.
\item If $p(x)$ is a polynomial, evaluating $p(A)$ results in a matrix that has eigenvalues $p(\lambda_i)$. The eigenvectors are the same.
\item $(A-\mu I)$ has eigenvalues $(\lambda_i - \mu)$. The eigenvectors are the same.
\item If $A=A^*$ (Hermitian or symmetric real), all eigenvalues are real. (Also, see later, they are orthogonally diagonalizable).
\item If $A$ is Hermitian and (semi)-def.\@ positive/negative, all eigenvalues are (zero or) positive/negative.
\item If $A$ has $n$ distinct eigenvalues, it is necessarily diagonalizable (because each eigenspace has at least dimension 1).
\item (Because $\det(AB)=\det(A)\det(B)$.) The product of all eigenvalues of $A$ times the product of all eigenvalues of $B$ equals the product of all eigenvalues of $AB$.
\item $AB$ has eigenvalue $0$ iff $A$ or $B$ (or both) has eigenvalue $0$.
\item In general, we cannot deduce the eigenvalues of $AB$ just from the eigenvalues of $A$ and $B$. Same for $A+B$.
\end{itemize}
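A small concrete example, with an arbitrarily chosen symmetric matrix:
\[
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}:\qquad
A\begin{pmatrix}1\\1\end{pmatrix} = 3\begin{pmatrix}1\\1\end{pmatrix},\qquad
A\begin{pmatrix}1\\-1\end{pmatrix} = 1\cdot\begin{pmatrix}1\\-1\end{pmatrix},
\]
so the eigenvalues are $\lambda_1 = 3$ and $\lambda_2 = 1$, with orthogonal eigenvectors (as expected for a real symmetric matrix).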
\item[Trace of matrix:]{} $\trace(A)$ is the sum of the elements of the diagonal. If the matrix is square, it is equal to the sum of eigenvalues.
\item[Determinant:] weird operator $\det(\cdot)$ defined on square matrices (and linear mappings).
Difficult to compute (most efficient algorithms compute the eigenvalues first).
We mostly care only about it for the following properties:
\begin{itemize}
\item it is uniquely defined for any square matrix
\item it is basis-independent.
\item it is zero iff the matrix is singular
\item it is equal to the product of all eigenvalues
\item the determinant of the matrix product is the product of determinants
\item the determinant of the inverse is the inverse of the determinant
\end{itemize}
\item[Characteristic polynomial:] defined for a square matrix $A$, as \[\chi_A(\lambda) = {\det}(\lambda I - A),\] which is a polynomial on $\lambda$. $\lambda=\lambda_i$ is a root of $\chi_A(\lambda)$ iff it is eigenvalue of $A$. It is a basis-invariant property of the operator (endomorphism, as it is square).
Property: the constant term of $\chi_A(\lambda)$ is $\chi_A(0) = \det(-A) = (-1)^n \det(A)$.
The multiplicity of a particular root/eigenvalue $\lambda_i$ in the polynomial $\chi_A(\lambda)$ is called arithmetic multiplicity $\mu_a(\lambda_i)$.
By the fundamental theorem of algebra: \[\sum_i \mu_a(\lambda_i) =n.\]
Also: $\mu_g(\lambda_i) \le \mu_a(\lambda_i)$.
Property: a square matrix is diagonalizable iff $\mu_g(\lambda_i) = \mu_a(\lambda_i)$.
Note: some books consider a real matrix non-diagonalizable when it has complex eigenvalues (those eigenvalues are not considered ``valid''); in such case this analysis becomes more complicated.
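For the same arbitrarily chosen $2\times 2$ matrix as in the eigenvalue example above:
\[
\chi_A(\lambda) = \det\!\left(\lambda I - \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}\right)
= (\lambda - 2)^2 - 1 = \lambda^2 - 4\lambda + 3 = (\lambda-3)(\lambda-1),
\]
whose roots are the eigenvalues $3$ and $1$; the constant term is $3 = (-1)^2\det(A)$, and $\mu_a(\lambda_i)=\mu_g(\lambda_i)=1$ for both eigenvalues.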
\newpage
\item[Cayley-Hamilton theorem:] a square matrix is always a ``root'' of its characteristic polynomial (generalized to allow plugging in matrices instead of scalars $\lambda\in\RR$):\[
\chi_A(A) = 0 \in \RR^{n\times n}
\]
\item[Normal matrix:] square matrix that commutes with its transpose (Hermitian transpose in the complex case):
\[AA^* =A^*A\]
The following matrix types are normal: orthogonal (unitary), Hermitian (incl.\@ real symmetric; this also includes all (semi)definite types).
Property: the only normal matrices with all eigenvalues on the unit circle are the unitary ones.
\item[Orthogonal vectors:] two nonzero vectors $\bm x$, $\bm y$ are orthogonal when $\bm x\transp \bm y = 0$.
This is denoted as: $\bm x \perp \bm y$.
Two orthogonal vectors are linearly independent. A set of vectors is orthogonal if they are pairwise orthogonal (therefore, they form the basis of a subspace).
If also normalized (unit $2$-norm), they are called orthonormal.
\item[Orthogonal matrix:] real square matrix $Q$ such that $Q\transp = Q^{-1}$. A matrix $Q$ is orthogonal iff all columns (and rows) form an orthonormal basis.
\item[Unitary matrix:] Square matrix with $Q^*=Q^{-1}$ (Hermitian transpose). Generalization of orthogonal: if $Q$ is unitary and real, it is orthogonal.
Multiplication by orthogonal/unitary preserves inner product (and therefore angles and norm-2 lengths):
\begin{gather*}
(Q\bm x)^*(Q\bm y) = \bm x^*\bm y
\\
\normtwo{Q\bm x} =
\sqrt{(Q\bm x)^*(Q\bm x)} =
\sqrt{\bm x^*\bm x} =
\normtwo{\bm x}
\\
\normfrob{Q\bm x} =
\normfrob{\bm x}
\\[4pt]
|{\det}(Q)| =1 \qquad \text{(${\det}(Q)=\pm 1$, if $Q$ is real (orthogonal))}
\end{gather*}
Orthogonal/Unitary matrices are normal (commute with (Hermitian) transpose).
All eigenvalues $\lambda_i$ lie on the unit circle (these are the only normal matrices with this property). Note: even the real 2D rotations have complex eigenvalues!
Proof: let $Q$ be unitary and $\bm v$ be an eigenvector of $Q$ with eigenvalue $\lambda$, then
\[
Q\bm v = \lambda \bm v\implies (Q\bm v)^* Q\bm v = (\lambda \bm v)^* \lambda \bm v
\implies \bm v^* \underbrace{Q^* Q}_{=I} \bm v = |\lambda|^2\, \bm v^* \bm v
\implies \norm{\bm v}^2 = |\lambda|^2 \norm{\bm v}^2
\]
\[
\implies \boxed{|\lambda| = 1} \qquad (\text{since } \bm v \neq \bm 0)
\]
All orthogonal (unitary) transformations are isometries in $\RR^n$ ($\CC^n$).
Particular cases of orthogonal matrices (in $\RR$):
\begin{itemize}
\item Reflection matrix (reflection through a hyperplane): has one eigenvalue $\lambda_1=-1$, the rest $\lambda_i=1$.
\item Proper rotation matrix: for 2D, two complex conjugates in unit circle; for 3D, also the eigenvalue $\lambda_3=1$ (eigenvector is the axis of rotation); for higher dims, complicated (Special Orthogonal group, $\mathrm{SO}(n)$), but always $\det(Q)=+1$.
\item Improper rotation matrix: composition of rotations and reflections that do not result in a proper rotation; $\det(Q)=-1$.
\item Planar rotation: it is a proper rotation for which the matrix in some orthonormal basis is as follows:
\begingroup
\newcommand{\snth}{s_\theta}
\newcommand{\csth}{c_\theta}
\NiceMatrixOptions{code-for-first-row = \scriptstyle,code-for-first-col = \scriptstyle }
\setcounter{MaxMatrixCols}{12}
\newcommand{\blue}{\color{blue}}
\[
R_\theta(f) = \begin{pNiceMatrix}[last-row,last-col,nullify-dots,xdots/line-style={dashed,blue}]
1& & & \Vdots & & & & \Vdots \\
& \Ddots[line-style=standard] \\
& & 1 \\
\Cdots[color=blue,line-style=dashed]& & & \blue \csth &
\Cdots & & & \blue {\text{-}\snth} & & & \Cdots & \blue \leftarrow p \\
& & & & 1 \\
& & &\Vdots & & \Ddots[line-style=standard] & & \Vdots \\
& & & & & & 1 \\
\Cdots & & & \blue \snth & \Cdots & & \Cdots & \blue \csth & & & \Cdots & \blue \leftarrow q \\
& & & & & & & & 1 \\
& & & & & & & & & \Ddots[line-style=standard] \\
& & & \Vdots & & & & \Vdots & & & 1 \\
& & & \blue \overset{\uparrow}{p} & & & & \blue \overset{\uparrow}{q} \\
\end{pNiceMatrix}\]
\[\snth = \sin\theta,\,\csth = \cos\theta\]
\endgroup
where $\theta$ is the rotation angle (counterclockwise), and $p$,~$q$ are the coordinate indices corresponding to a 2D plane.
Every proper rotation can be written as a composition (matrix product) of planar rotations [TODO: find source of this claim...].
\end{itemize}
\item[Semi-orthogonal/semi-unitary matrix:] matrix $\hat Q \in \RR^{m\times n}$ whose columns are orthonormal. Note that, by definition, $m \ge n$ (there cannot be more than $m$ orthonormal vectors in $\RR^m$).
It has the property: $\hat Q\transp \hat Q = I$.
Note that, unless $m=n$, we don't have the property $\hat Q \hat Q\transp \overset{!}{=} I$ (that would be fully orthogonal). (We see later that $\hat Q \hat Q\transp$ is a projection matrix).
Left multiplication by a semi-orthogonal $\hat Q$ preserves the 2-norm:
\[
\normtwo{\hat Q \bm x}=
\sqrt{(\hat Q \bm x)\transp (\hat Q \bm x)}=
\sqrt{\bm x\transp \hat Q\transp \hat Q \bm x}=
\sqrt{\bm x\transp \bm x}=
\normtwo{\bm x}
\]
But multiplication by $\hat Q\transp$ does NOT preserve the 2-norm in general:
\[
\normtwo{\hat Q\transp \bm x}^2 =
\bm x\transp \hat Q \hat Q\transp \bm x \le \normtwo{\bm x}^2,
\]
with equality iff $\bm x\in\range(\hat Q)$ (as we will see, $\hat Q\hat Q\transp$ is the orthogonal projection onto $\range(\hat Q)$).
\item[Orthogonal components of a vector:]
With the orthonormal set of vectors (basis of a subspace of $\RR^m$), forming the columns of the semi-orthogonal matrix $\hat Q$ \[
\hat Q=
\begin{pNiceMatrix}[vlines]
\bm q_1 & \bm q_2 & \cdots &\bm q_n
\end{pNiceMatrix}
\in \RR^{m\times n}, \, m\ge n
,\] and a given vector $\bm u \in \sspan(\{\hat{\bm q_i}\}_{1\le i\le n} ) = \range(\hat Q)$, we can obtain the $i$-th coordinate on this basis by just using the scalar product:
\[
(\bm u)_{i,(\hat Q)} = \bm q_i\transp \bm u
\]
For a general vector $\bm v$, we obtain the ``residual component'' vector $\bm r$ (or simply ``residual''), with respect to $\hat Q$:
\[
\bm r = \bm v -
\underbrace{\left(
(\bm q_1^* \bm v)\, \bm q_1 + \cdots+
(\bm q_n^* \bm v)\, \bm q_n
\right)}_\text{projection of $\bm v$ on $\sspan(\hat Q)$}
\]
It can be proven that $\bm r$ is either zero (when $\bm v\in \sspan(\hat Q)$) or orthogonal to $\sspan(\hat Q)$ (when $\bm v\notin \sspan(\hat Q)$); in both cases we have: \[\bm q_i^* \bm r = 0, \, \forall i\]
Then we can write:
\[
\bm v = \bm r + \sum_{i=1}^{n}(\bm q_i \bm q_i^*)\,\bm v
\]
If $\bm v \notin \sspan(\hat Q)$ this is a decomposition into $n+1$ orthogonal components.
Otherwise ($\bm v \in \sspan(\hat Q)$), $\bm r$ is zero; e.g.\@ if $m=n$, then the columns of $\hat Q$ form a basis of the whole space $\RR^n$, so $\bm r$ must be zero.
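A minimal sketch with arbitrarily chosen data: take $\hat Q = (\bm e_1 | \bm e_2) \in \RR^{3\times 2}$ (the first two canonical vectors) and $\bm v = (1, 2, 3)\transp$. Then
\[
\sum_{i=1}^{2}(\bm q_i \bm q_i^*)\,\bm v = (1, 2, 0)\transp, \qquad
\bm r = \bm v - (1,2,0)\transp = (0,0,3)\transp,
\]
and indeed $\bm q_1^*\bm r = \bm q_2^*\bm r = 0$: the vector $\bm v$ splits into a component inside $\range(\hat Q)$ plus an orthogonal residual.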
\item[Orthogonal complement of a set of vectors:]
Vector subspace formed by all vectors orthogonal to all vectors in a set $S$.
Denoted by: $S^\perp$.
Property: if $S$ is a subspace, then \[(S^\perp)^\perp = S.\]
And also, when $S\subseteq \RR^n$ is subspace, then
\[ S \oplus S^\perp = \RR^n.\]
[TODO: explain the meaning of a direct sum.]
\item[Diagonalization:]
(a.k.a.\@ eigendecomposition) decomposition of a given matrix $A$ by using a \emph{similar} diagonal matrix $\Lambda$, that is:
\[\boxed{A = P\,\Lambda\,P^{-1}}.\] $\Lambda$ actually contains the eigenvalues ($\lambda_i$ repeated $\mu_a(\lambda_i)$ times), and the columns of $P$ are lin.\@ indep.\@ eigenvectors (they form a basis).
When such a decomposition is possible, that is, if $n$ lin.\@ independent eigenvectors can be found (hence $P$ square can be built), then $A$ is diagonalizable, a.k.a.\@ non-defective.
$A$ is diagonalizable iff all $\mu_a(\lambda_i)=\mu_g(\lambda_i)$, or iff (equivalent condition) $\sum_i\mu_g(\lambda_i)=n$.
(Note: here we assumed the convention that a complex eigenvalue is considered valid for real matrices)
In the case of defective matrices, this can be generalized by replacing $\Lambda$ diagonal with an ``almost-diagonal'' version $J$ called Jordan canonical form; $J$ is block diagonal, where each block $J_i$ has eigenvalues on the diagonal and ones on the first upper diagonal (or lower, depending on convention). If $A$ is diagonalizable, then $\Lambda=J$ is valid in the diagonalization.
Property: a matrix is \emph{orthogonally diagonalizable} ($A=Q\Lambda Q\transp$, with $Q\transp=Q^{-1}$) iff it is normal. If it is also real symmetric/Hermitian then all eigenvalues are real (in fact: for a normal matrix, the eigenvalues are real iff it is Hermitian).
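For the symmetric $2\times 2$ matrix used in the eigenvalue example above (arbitrarily chosen), the orthogonal diagonalization is
\[
\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}
=
\underbrace{\frac{1}{\sqrt 2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}}_{Q}
\underbrace{\begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}}_{\Lambda}
\underbrace{\frac{1}{\sqrt 2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}}_{Q\transp},
\]
where the columns of $Q$ are the normalized eigenvectors and $Q\transp = Q^{-1}$.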
\end{description}
\newpage
\addsec{Norms}
\begin{description}
\item[Vector norm axiomatic definition:]\phantom{ }
An operator on vectors $\norm{\cdot} : \RR^m \to \RR$ is called a norm when the following conditions are met:
\begin{itemize}
\item $\norm{a\bm x} = |a| \norm{\bm x}$ (homogeneity)
\item $\norm{\bm x + \bm y} \le \norm{\bm x} + \norm{\bm y}$ (triangle inequality)
\item $\norm{\bm x} = 0 \iff \bm x = \bm 0$
(when only this one fails, we have a seminorm)
\end{itemize}
\item[Vector norm properties:]\phantom{ }
\begin{itemize}
\item $\norm{\bm x} \ge 0$ (positive definiteness, sometimes as an axiom definition, but not necessary; also valid for seminorms)
\item $
|\norm{\bm x} - \norm{\bm y} | \le \norm{\bm x - \bm y}
\quad \text{(follows from triangle inequality)}
$
\end{itemize}
\item[Vector $p$-norms:]\phantom{ }
\begin{align*}
\norm{\bm x}_p =& \left(\sum_i |x_i|^p\right)^{1/p}
\quad \text{($p$-norm)}
\\
\norm{\bm x}_1 =& \sum_i |x_i|
\\
\normtwo{\bm x} =& \left(\sum_i |x_i|^2\right)^{1/2}
= \sqrt{\bm x ^* \bm x}
\\
\norminf{\bm x} =& \max_{1\le i\le m}|x_i|
\end{align*}
$\norminf{\bm x}$ can be proven to be the asymptotic case of $p$-norm for $p\to\infty$.
Note: not all vector norms are $p$-norms.
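For example, with the arbitrarily chosen vector $\bm x = (3, -4)\transp$:
\[
\normone{\bm x} = 7, \qquad \normtwo{\bm x} = \sqrt{3^2 + 4^2} = 5, \qquad \norminf{\bm x} = 4,
\]
consistent with the general ordering $\norminf{\bm x} \le \normtwo{\bm x} \le \normone{\bm x}$ (see the equivalence item below).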
\item[Weighted norm:]\phantom{ }
\[
\norm{\bm x}_W = \norm{W\bm x}, \text{ for $W$ invertible (usually diagonal)}
\]
\item[H\"older's inequality:]
Relates scalar product and vector $p$-norms:
\[
|\bm x\transp \bm y| \le \norm{\bm x}_p\norm{\bm y}_q, \quad \text{ with }\frac{1}{p}+\frac{1}{q} = 1, \quad p, q\in [1,\infty]
\]
\item[Cauchy-Schwarz inequality:]
H\"older's for $p=q=2$.
\[
|\bm x\transp \bm y| \le \normtwo{\bm x}\normtwo{\bm y}
\]
Also recall: $
\bm x\transp \bm y =
\normtwo{\bm x}\normtwo{\bm y} \cos \alpha_{\bm x,\bm y}$
\newpage
\item[Vector norms are equivalent:] (this is valid for finite-dimensional vector spaces)
\[
c_1 \norm{\bm x}_\alpha \le
\norm{\bm x}_\beta \le
c_2 \norm{\bm x}_\alpha
\]
For some constants $c_1, c_2 > 0$ that do not depend on $\bm x$ (but may depend on $n$).
Some particular cases:
\begin{align*}
\normtwo{\bm x} &\le \normone{\bm x} \le \sqrt{n}\normtwo{\bm x}
\\
\norminf{\bm x} &\le \normtwo{\bm x} \le \sqrt{n}\norminf{\bm x}
\\
\norminf{\bm x} &\le \normone{\bm x} \le n\norminf{\bm x}
\end{align*}
\item[Spectral radius:] defined for square matrices as the largest eigenvalue (in abs.\@ value).
\[\rho(A) = \max_i |\lambda_i(A)| \]
\item[Matrix norms:]
The vector norm axiomatic definition can be directly generalized into matrices (just by replacing vectors with matrices).
Matrix norms are also all equivalent.
\item[``Entry-wise'' matrix norms:]
Norms that result from the use of a vector norm on the ``vectorized'' or ``flattened'' matrix $\bm v_A = \mathrm{vec}(A)$ (1D vector with $m\cdot n$ entries, which may be traversed column-wise, row-wise, or in any arbitrary order; this order is irrelevant for most vector norms ``of interest'').
The usual example of ``Entry-wise'' matrix norm is the Frobenius norm (using vector 2-norm).
\item[Induced matrix norms:]
Given two vector norms $\norm{\cdot}_\alpha$ and $\norm{\cdot}_\beta$ that can be applied to the output and input spaces (resp.) of a matrix $A$, we define the induced matrix norm of $A$ as:
\[
\norm{A}_{(\alpha,\beta)}
=
\sup_{\bm x\neq \bm 0}
\frac{\norm{A\bm x}_\alpha}{\norm{\bm x}_\beta}
=
\max_{\norm{\bm x}_{\beta}=1}
{\norm{A\bm x}_\alpha}
\]
Pedantic remark: $\sup$ is not really needed and $\max$ is enough, because the set of values of the quotient is compact, regardless of the fact that $\RR^n-\{\bm 0\}$ is not compact.
Shorthand notation: $\norm{A}_{(\alpha,\alpha)}
=\norm{A}_{\alpha}$
For $Q$ orthogonal: $\norm{Q}_{2} = 1$, (because $Q\bm x$ preserves the 2-norm of $\bm x$).
Special cases:
\begin{itemize}
\item $\displaystyle \norm{A}_1 = \norm{A}_{(1,1)} = \max_j \sum_i |A_{ij}|$
\,\,\,
(largest column, sum over rows)
\item $\displaystyle \norm{A}_\infty = \norm{A}_{(\infty,\infty)} = \max_i \sum_j |A_{ij}|$
\,\,\,
(largest row, sum over columns)
\item $\displaystyle \norm{A}_2 = \norm{A}_{(2,2)} = \sqrt{\rho(A\transp A)}= \sqrt{\rho(A A\transp)} = \sigma_1$
Largest singular value of $A$ (see SVD), a.k.a. spectral norm.
\end{itemize}
\vspace{0.5em}
Property for any induced matrix norm $\alpha$: \[\rho(A) \le \norm{A}_\alpha\]
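A quick numerical check of the column-/row-sum formulas and of $\rho(A)\le\norm{A}_\alpha$, with an arbitrarily chosen matrix:
\[
A = \begin{pmatrix} 1 & 2 \\ -3 & 4 \end{pmatrix}:\qquad
\norm{A}_1 = \max(1+3,\, 2+4) = 6, \qquad
\norm{A}_\infty = \max(1+2,\, 3+4) = 7,
\]
while the eigenvalues satisfy $\lambda^2 - 5\lambda + 10 = 0$, so $\rho(A) = \sqrt{10} \approx 3.2$, indeed below both induced norms.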
\newpage
\item[Frobenius norm:]
It is the ``entry-wise'' 2-norm.
\[
\normfrob{A}
= \sqrt{\sum_i\sum_j |A_{ij}|^2}
= \sqrt{\trace(A\transp\, A)}
= \sqrt{\trace(A\, A\transp)}
\]
(See section on properties of $A\transp A$ and $AA\transp$)
And also (because the squared singular values $\sigma_i^2$ are the eigenvalues of $A\transp A$):
\[
\normfrob{A}
= \sqrt{\sum_i\sigma_i^2}
\]
(See section on SVD)
\item[Property:] for $Q_1$ and $Q_2$ orthogonal (of suitable sizes), the 2-norm and Frobenius norm are preserved:
\[\normtwo{Q_1A} = \normtwo{A} = \normtwo{AQ_2} \]
\[\normfrob{Q_1A} = \normfrob{A} = \normfrob{AQ_2} \]
\item[Submultiplicative norm:] Matrix norm $\norm{\cdot}$ with
\[
\norm{AB}\le \norm{A}\,\norm{B}
\]
All induced matrix norms (including $p$-norms) and the Frobenius norm are sub-multiplicative.
Example of NON-sub-multiplicative matrix norm: $\norm{A} = \max_{i,j} |a_{ij}|$ (this is the entry-wise $\infty$-norm).
\item[Property:]
If $\rho(A)<1$, then there exists an induced matrix norm $\norm{\cdot}$ with $\norm{A} < 1$
(in fact, for any $\varepsilon>0$ one can construct an induced norm with $\norm{A}\le \rho(A)+\varepsilon$; note that for a \emph{fixed} norm, $\rho(A)<1$ does not imply $\norm{A}<1$).
\item[Property:]
\[
\lim_{k\to\infty}A^k = 0 \iff \rho(A) < 1
\]
% Exercise!
%Proof: choose any induced matrix norm $\norm{\cdot}$.
%Suppose $\rho(A)<1$, then $\norm{A} < 1$, then as $\norm{A^k}\le \norm{A}^k$, we have $\norm{A^k} \to 0$ and therefore $A^k \to 0$
\item[Property:]
For any submultiplicative matrix norm:
\[
\lim_{k\to\infty}\norm{A^k}^{1/k} = \rho(A)
\]
\item[Compact set:]
In the usual case of a set in $\RR^n$ (or $\CC^n$), compact means ``closed and bounded''. (In the more general case, it means some topological abstract nonsense involving \emph{covers} and \emph{finite subcovers}).
\item[``Compactness argument'':] (a.k.a.\@ \emph{extreme value theorem})
``A continuous function attains its maximum (and minimum) value in a compact set''.
\end{description}
\newpage
\addsec{Properties of \texorpdfstring{$A\transp A$ and $AA\transp$}{ATA and AAT}}
What follows is important for the least squares methods and the Moore-Penrose pseudoinverse calculation.
More info in textbook [\footnote{\emph{{\'A}lgebra lineal y geometr{\'\i}a vectorial}, Borobia, A.\@ and Estrada. B.\@ UNED (Sanz y Torres), ISBN: 978-84-15550-85-3.}] Proposition 8.41 (I'm sorry, it is in Spanish...).
The matrix $A\transp A$ is sometimes called Gram matrix or Gramian matrix\footnote{See \url{http://www.seas.ucla.edu/~vandenbe/133A/lectures/inverses.pdf}.}.
Let $A\in \RR^{m\times n}$, then it holds:
\begin{itemize}
\item $A\transp A$ has size $n \times n$ and $AA\transp$ has size $m \times m$.
\item {} $A\transp A$ is symmetric square ($n\times n$) and is ``smaller'' than $A$ when $m>n$
(e.g.\@ $A\bm x=\bm b$ is an overdetermined system).
$AA\transp$ is symmetric square ($m\times m$) and is ``smaller'' than $A$ when $n>m$.
\item {} As they are symmetric, both matrices are also \emph{normal}.
\item {}$(A\transp A)$ and $(A A\transp )$ are symmetric positive semidefinite matrices (SPSD), therefore:
\begin{itemize}
\item All of the eigenvalues are nonnegative (can be zero).
\item They admit the Cholesky decomposition (in the SPD case, it is also unique).
%\item They admit the ``square root decomposition'': $M = B B$, with $B$ Hermitian (or real symmetric) and positive definite. Such $B$ (with those properties) is unique.
\end{itemize}
\item Each entry $(A\transp A)_{ij}$ is the inner product of two columns of $A$ (for $AA\transp$, those are rows of $A$).
This clearly explains why, for a square $Q$ with $m=n$ orthonormal columns, we get $ Q\transp Q=QQ\transp=I$; that is, $Q\transp = Q^{-1}$ and $Q$ is orthogonal.
Note that $Q$ here is square; if it was $\hat Q$ rectangular (i.e.\@ \emph{semi}-orthogonal, with orthonormal columns; requires $m\ge n$), then the product $\hat Q\transp \hat Q = I$ is never commutative (the resulting matrix has a different size!); in fact, $\hat{Q}\hat{Q}\transp$ is NOT full rank (nonetheless, it is still SPSD).
As we will see later, $\hat{Q}\hat{Q}\transp$ is actually a projection matrix onto $\range(\hat Q)$.
\item The trace is the sum of all elements of $A$ squared (square of Frob norm):
\[ \trace(A\transp A) = \trace(A A\transp) = \sum_i\sum_j |a_{ij}|^2= \normfrob{A}^2\]
\vspace{-0.55cm}
\item $\rank(A) = \rank(A\transp A) = \rank(AA\transp )$.
\item $\Ker(A) = \Ker(A\transp A)$.
\item $\Ker(A\transp) = \Ker(AA\transp )$.
\item If $m < n$, $A\transp A$ cannot be invertible.
\item In the case $m\ge n$, $(A\transp A)$ is invertible iff $A$ is full (column) rank.
\item $(A\transp A)$ and $(A A \transp)$ share the same nonzero eigenvalues.
\item Being symmetric, by the spectral theorem, $A\transp A$ admits orthogonal diagonalization (and with real eigenvalues): \[A\transp A = V\Lambda V\transp.\] And also $A A \transp = U\Lambda^{\prime} U\transp$, where $\Lambda$ and $\Lambda^{\prime}$ contain the same nonzero elements.
\item The largest singular value of $A$ is $\sigma_1 = \normtwo{A} = \sqrt{\rho(A\transp A)}$ (see SVD section).
\end{itemize}
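A tiny example (arbitrarily chosen tall matrix) illustrating the shared nonzero eigenvalues and the rank property:
\[
A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 0 \end{pmatrix}:\qquad
A\transp A = \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix},\qquad
A A\transp = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\]
both SPSD with the same nonzero eigenvalues $\{1, 4\}$, and $\rank(A)=\rank(A\transp A)=\rank(AA\transp)=2$.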
\addsec{Singular Value Decomposition (SVD)}
Best explanation ever: \url{https://www.youtube.com/watch?v=rYz83XPxiZo}
Also, worth reading:
\begin{itemize}
\item \url{https://gregorygundersen.com/blog/2018/12/10/svd/}
\item \url{https://gregorygundersen.com/blog/2018/12/20/svd-proof/}
\end{itemize}
\vspace{0.5em}
Consider a general matrix $A \in \RR^{m\times n}$ (possibly rank-deficient and/or rectangular, i.e.\@ not square).
[TODO: add geometric definition (ellipsoid axis and radius)]
We are looking for two sets of orthonormal vectors $\bm u_i \in \RR^{m}$ and $\bm v_i \in \RR^n$, such that:
\[A\bm v_i = \sigma_i \bm u_i\,, \quad \sigma_i \ge 0\]
And the $\sigma_i$ are ordered in descending order:
\[ \sigma_1 \ge \sigma_2 \ge \cdots \ge 0 \]
\begin{itemize}
\item $\bm u_i$ are called left singular vectors, they are in the image (output space).
\item $\bm v_i$ are called right singular vectors, they are in the domain (input space).
\item $\sigma_i \ge 0$ are called singular values.
\end{itemize}
This could be seen as a generalization of the definition of eigenvalue/eigenvector to non-square matrices; that is, for an eigenvector we would have $\bm u_i= \bm v_i$, but note that this is not a proper generalization because an eigenvalue can be negative or even complex (the SVD has all $\sigma_i$ real and nonnegative) and the eigenvectors may not form an orthogonal basis (the SVD has all $\bm u_i$ orthonormal and all $\bm v_i$ orthonormal). Moreover (proven later) an SVD always exists, regardless of whether a square $A$ is diagonalizable or not.
We can find at most $N=\min(m,n)$ such independent vectors; the first $r=\rank(A)$ of them have $\sigma_i\neq 0$ and the last $N-r$ have $\sigma_i= 0$:
\[
\left\{
\begin{array}{lclcl}
A \bm v_1 &=& \sigma_1 \, \bm u_1 \\
A \bm v_2 &=& \sigma_2 \, \bm u_2 \\
&\vdots \\
A \bm v_r &=& \sigma_r \, \bm u_r \\
A \bm v_{r+1} &=& \bm 0 &=& 0 \,\bm u_{r+1} \\
&\vdots \\
A \bm v_N &=& \bm 0 &=& 0 \,\bm u_{N} \\
\end{array}
\right.
\]
\newpage
Now, from these equations:
\begin{itemize}
\item The last $\bm v_i$ (for $i>r$) must form an orthonormal basis of $\Ker(A)$.
\item The first $\bm u_i$ (for $i\le r$) must form an orthonormal basis of $\range(A)$.
\item The last $\bm u_i$ (for $i>r$) can be chosen arbitrarily, but in such a way that all $\bm u_i$ end up being orthonormal. Besides [TODO: check this], in the case that $n\ge m$, these last $\bm u_i$ span the subspace $(\range(A))^\perp$ (orthogonal complement of the range).
\end{itemize}
This can be rewritten in matrix notation instead of vector-by-vector:
\[
A \,
\begin{pNiceMatrix}[vlines]
\bm v_1 & \cdots&\bm v_r& \cdots & \bm v_N \\
\end{pNiceMatrix}
=
\begin{pNiceMatrix}[vlines]
\bm u_1 & \cdots&\bm u_r & \cdots & \bm u_N \\
\end{pNiceMatrix}
\begin{pNiceMatrix}
\sigma_1 \\
& \sigma_2 \\
& & \ddots \\
& & & \sigma_r \\
& & & & \ddots \\
& & & & & \sigma_N
\end{pNiceMatrix}
\]
\[
\implies
A \hat V = \hat U \hat\Sigma
\implies
\boxed{A = \hat U \hat \Sigma \hat V\transp}
\]
With $\hat{U}\in \RR^{m\times N}$, $\hat{V}\in \RR^{n\times N}$, $\hat\Sigma\in\RR^{N\times N}$.
$\hat U$ and $\hat V$ are \emph{semi}-orthogonal.
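A minimal worked example with an arbitrarily chosen diagonal matrix containing a negative entry (so it is not its own SVD):
\[
A = \begin{pmatrix} 3 & 0 \\ 0 & -2 \end{pmatrix}
= \underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}}_{\hat U}
\underbrace{\begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}}_{\hat\Sigma}
\underbrace{\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}}_{\hat V\transp},
\]
so $\sigma_1 = 3 \ge \sigma_2 = 2 > 0$; the sign of the negative entry is absorbed into $\hat V$ so that all singular values stay nonnegative.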