From f979e38bbb74f903d52d7bc7055a7e69693dc346 Mon Sep 17 00:00:00 2001
From: Leandro Martinez <lmartine@unicamp.br>
Date: Wed, 15 Mar 2023 16:52:53 -0300
Subject: [PATCH] new mutability note

---
 Project.toml          |   2 +-
 docs/make.jl          |   2 +-
 docs/src/immutable.md | 245 ++++++++++++++++++++++++------------------
 3 files changed, 145 insertions(+), 104 deletions(-)

diff --git a/Project.toml b/Project.toml
index afa6b84..0ccbf90 100644
--- a/Project.toml
+++ b/Project.toml
@@ -1,7 +1,7 @@
 name = "JuliaNotes"
 uuid = "bb23d089-5b45-4deb-921c-94082d129500"
 authors = ["Leandro Martinez <leandro@iqm.unicamp.br> and contributors"]
-version = "1.0.19-DEV"
+version = "1.0.19"
 
 [deps]
 Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
diff --git a/docs/make.jl b/docs/make.jl
index e9d98f9..2a3db16 100644
--- a/docs/make.jl
+++ b/docs/make.jl
@@ -19,7 +19,7 @@ makedocs(
         "Anonymous functions" => "anonymous.md",
         #"Union splitting" => "splitting.md",
         "Tracking allocations" => "memory.md",
-        "Immutable variables" => "immutable.md",
+        "Mutability" => "immutable.md",
         "ERROR: No method..." => "nomethod.md",
         "Vector{Int} <: Vector{Real} is false???" => "typevariance.md",
         "Figures and LaTeX" => "figures.md",
diff --git a/docs/src/immutable.md b/docs/src/immutable.md
index c612f95..36bc86b 100644
--- a/docs/src/immutable.md
+++ b/docs/src/immutable.md
@@ -1,9 +1,29 @@
-# Immutable, mutable and allocations 
+# Immutable and mutable variables
 
-In Julia there are mutable and immutable values (variables), and this distinction and implications are not always obvious depending on the previous programming experience. 
+## Immutability and mutability
+
+In Julia there are mutable and immutable values (variables), and this distinction and implications are not always obvious depending on one's previous programming experience. 
 
 For instance, having past experience with Fortran, this distinction was a novelty for me. In Fortran one declares every variable, and at the moment of declaring one has the impresion of having reserved a place in the memory for that variable, even with scalar one does that. Along the Fortran code, when a different value is assigned to the label assigned to the variable, the value changes, and that can be quite simply interpreted as if the value stored in the memory position which was reserved for that variable changed. 
 
+From a Fortran user perspective, this result, for example, is quite astonishing:
+```julia-repl
+julia> function add_one(x)
+           x = x + 1
+           return x
+       end;
+
+julia> x = 1
+1
+
+julia> add_one(x)
+2
+
+julia> x
+1
+```
+That codes appears to be mutating the value associated to the variable `x` but, nevertheless, the value of `x` is the outer scope does not change. 
+
 In Julia (and many other higher-level languages, in particular), one has to understand the difference between mutable and imutable values. The reason, from a user perspective, derives from many features of high level languages. For instance, consider the code:
 ```julia-repl
 julia> x = 1
@@ -22,31 +42,26 @@ The number `1` is an immutable value. That is, we cannot convert it into somethi
 x = 1
 x = 2
 ```
-does not mean mutating the variable `x`, but simply assigning the label `x` to a different immutable value. The implications of this are important. For instance, consider the function `f` and its application below:
-```julia-repl
-julia> function f(x)
-           x = x + 1
-           return x
-       end;
+does not mean mutating the variable `x`, but simply assigning the label `x` to a different immutable value. 
 
-julia> x = 1
-1
-
-julia> f(x)
-2
+The implications of this are important, as was shown in the `add_one` function above.
+From the structure of the function, one would be tempted to interpret that `x` was mutated inside it, and thus that the value of `x` after the application of the function would have changed. It does not, though, and this can be confusing. In this case, the point is that what was passed to the function was the *value* `1`, an immutable value, and within `add_one` initially the label `x` was assigned to it. 
 
-julia> x
-1
+Next, in the line 
+```julia
+x = x + 1
 ```
+the label `x` was reassigned, to the result of the value of the input `x` (`1`), plus one, and the *value* `2` was returned. The label `x` of the outer scope was simply unchanged, and continued to be assigned to the value `1`. 
 
-From the structure of the function, one would be tempted to interpret that `x` was mutated inside it, and thus that the value of `x` after the application of the function would have changed. It does not, though, and this can be confusing. In this case, the point is that what was passed to the function was the *value* `1`, an immutable value, and within `f` initially the label `x` was assigned to it. 
-
-Next, the label `x` was reassined, to the result of the value of the input `x` (`1`), plus one, and the *value* `2` was returned. The label `x` of the outer scope was simply unchanged, and continued to be assigned to the value `1`. 
+If we wanted to assign the label `x` of the outer scope of the function to the output of `add_one`, we would need to do that explicitly:
+```julia
+x = add_one(x)
+```
 
-This is quite different from the following code snipet:
+Those considerations become initially more confusing when one sees the following code snippet:
 ```julia-repl
-julia> function g(x)
-           x[1] = 0
+julia> function add_one(x::Vector{Int})
+           x[1] = x[1] + 1 
            return x
        end;
 
@@ -55,124 +70,125 @@ julia> x = [1,2]
  1
  2
 
-julia> g(x)
+julia> add_one(x)
 2-element Vector{Int64}:
- 0
+ 2
  2
 
 julia> x
 2-element Vector{Int64}:
- 0
+ 2
  2
 ```
-The most fundamental difference here is that `x`, as a vector, is now a *mutable* value. That has an implication on how it is generally stored in the memory: it has an address. What is passed to the function is the address of the vector in memory. Inside `g` the vector is mutated, and the array is returned, meaning that its address in memory is returned. One can effectivelly assign a new label to the returned value:
+The most fundamental difference here is that `x`, as a `Vector{Int}`, that is, a vector of integers. That vector is *mutable*. That has an implication on how it is generally stored in the memory: it has an address. What is passed to the function is the address of the vector in memory. Inside `add_one` the vector is mutated, and the array is returned, meaning that its address in memory is returned. One can effectivelly assign a new label to the returned value:
 ```julia-repl
 julia> x = [1,2];
 
-julia> y = g(x);
+julia> y = add_one(x);
 
 julia> y
 2-element Vector{Int64}:
- 0
+ 2
  2
 
 julia> x
 2-element Vector{Int64}:
- 0
+ 2
  2
 
 julia> y === x
 true
 ```
-with the last line confirming that `x` and `y` are just two different labels assigned to the exact same object. 
-
-
-
-
-
-
-
-
+And the last line, comparing `y` and `x` with `===` confirms that these two labels are assigned to the exact same object.
 
+## Heap and Stack memory
 
+The distinction between mutable and mutable objects have important distinctions when it comes to performance. These distinctions are associated to the assumptions that the compiler can make about the behavior of these variables and, thus, about how they can be stored in the memory.
 
+In particular, there are two major ways in which values (numbers, objects, arrays), can be stored in memory: in the *heap* or in the *stack*. Very roughly, the *heap* is the most flexible form of storing objects, and probably the one that maps more clearly into our naive intuitions about how memory works. In the *heap* the objects have addresses, which point to the actual positions in the memory where the objects are located. For example, 
+```julia
+x = [1,2]
+```
+is an array with that (normally) has an address in the *heap* memory, pointing to where the array starts. When we mutate one element of this array, we explicitly change the bit content at the physical memory position where the array is stored. This is fine, but it is **slow**, there are too many indirections in this process.
 
+In the *stack* memory, the objects, and values, have a more loose sense. The stack, although physically the same, is a bunch of continuous memory reserved to be used by a program in an efficient manner. In particular, Fortran codes that do not use manual memory allocation will reserve all the memory required for the code in the *stack*. A continuous block of memory is reserved, and thus the accesses to the memory is fast, except that it brings some limitations in the use of the available computer memory (*stack overflow* errors are pretty common in old Fortran code).
 
+The stack memory can be used very efficiently, in particular because the processor can directly access the values without the indirection associated to the address of those values. A function can write data to the *stack* memory quickly, at the end of the last used position (piling the data), use the memory, and finally just release the memory by informing the system the limits of that memory block used. In summary, this is fast, and as much as possible one would want that the *stack* model of memory is used for critical operations instead of heap memory. 
 
+## Relation to mutability and immutability
 
-The mutable vs. immutable thing, from the perspective of a previous Fortran user: In Fortran everything seems to have its place in memory, as I said, and everything seems to be mutable, although that might not be true in practice. Thus, there is an abstraction layer there that must be overcome. I hope what I say in what follows is not too wrong. 
+This properties of memory management become somewhat linked to the mutability and immutability of variables in Julia. This is not a strict relation (as we will see), but it is rule of thumb that is important in the design of fast Julia code: 
 
-One learns in this process that values can "exist" in different ways. A variable may occupy a place in one type of memory, the type of memory that we understood that existed, which is called the "heap". The variables in the heap have an address to the position in memory they occupy and, thus, the value that they assume can be changed, by modifying the content of that position in the memory. This is where mutable variables are stored.
+- In general, new *mutable* values are stored in the *heap*, and new *immutable* values are stored in the *stack*. 
 
-In Fortran, from a user perspective, everything seems to be in the "heap" (although that might not be true, the compiler will decide that), in such a way that one can program as if every variable had an address in memory and its value can be modified by modifying the content of that position in the memory. Thus, everything seems to be mutable in Fortran.  Additionally, labels are assigned forever to the same positions in memory.
+One reason for this behavior is that mutable values, in particular most arrays, can change in size. If they can change in size, significant memory rearrangements cannot be ruled out by the compiler, and thus the array storage starts with an address (a pointer), which can be adjusted if the content of the array has to be moved from one physical position in the memory to another position. For that the program has to request a memory space to the system, which takes time. This is all *heap*  memory flexibility in action.
 
-Now we learn that some variables might exist in other types of memory, the "stack" and (I guess) the processor cache. These types of memory are much faster than the "heap" to work with, and if a variable can be assigned to these places your code will be faster. However, the values occupying these types of memory positions do not have an address in the usual sense. You cannot change the value associated to that position in memory because that value in that position in memory is not persistent, that is, it will be discarded as soon as possible. Even if the value will be used later, it might be that it is copied somewhere else in the stack without our knowledge if that results to be the best strategy for performance. We do not control where these values are stored and, then, we cannot assign different values for these variables, because this actually does not make sense, they are only values stored somewhere in a fast access memory.
+On the other side, immutable values have a fixed memory size, and thus the compiler can reserve a block of memory for the value in the *stack*, use it, and release it when the lifetime of the object is over. Still more flexibility and optimizations are possible since the specific reserved block can be discarded in favor of a new *stack* allocation if that turns out to be the most efficient strategy for the specific set of operations in hand. 
 
-Thus we learn that in a loop like:
+For example, in our `add_one` function, slightly modified here for clarifying the argument:
 ```julia
-s = 0.
-for i in 1:3
-   x = 2*i
-   s = s + x
+function add_one(x::Int)
+    y = x + 1
+    return y
 end
 ```
+there are two values involved. The immutable input value of `x`, and the also immutable value of `y` resulting from adding one to the input value.
 
-`x` might not be allocated in memory at all. It might occupy a temporary place in the fast stack memory or, even, only in the processor cache. In general we don't know what is going on with `x`, and we should not care about that, the people that implemented the compilers are much smarter than us and implemented most likely the best choice. Perhaps it will be stored in the slow "heap" memory, with an address, particularly if it was a huge array instead of a simple scalar, but it doesn't mater. (in this case probably it is just inlined, but the idea is the same)
-
-A Fortran user is surprised that a loop like that does not *allocate* anything. We learn that everything has its place in memory, even the counter of the loop, so that code should at least allocate some values.  Yet, now we discover that these allocations "do not count", because are fast allocations in these non-addressed types of memory.
-
-But we have to learn that for the compiler have freedom to choose what to do with `x`, the content of `x` cannot change. Thus, it must be immutable. In the loop above, it doesn't even make sense calling `x` the *same* variable at each loop iteration. It is just a temporary value assigned to some fast memory position that will be used and discarded. 
-
-Therefore, if we write
+Naively, one would could think that `y` requires a new position in memory, with a new address. Yet, this value of known bit size can be stored in the *stack* (in such a simple example the memory used can be even a processor cache, which is even faster), making the operation must faster than if we explicitly required a new memory place with an address bound. We can emulate requiring a new memory address for the `y` value with this code:
 ```julia
-x = 1 
-x = 2
+function add_one_allocate(x)
+    y = [ x + 1 ]
+    return y
+end
 ```
-the two `x` are probably just two completely independent values stored in these fast memories. Both the "first x" and the "second x" are immutable. Actually what is immutable is the Integer values 1 and 2, and `x` is only a temporary label to one or other of this values. The first `x` will be discarded when convenient without we knowing where it was stored, if it was stored at all. 
+In this example, `y` will be returned as a `Vector{Int}`, with a single element equal to `x + 1`. Let us see how these functions perform:
+```julia-repl
+julia> @btime add_one(1)
+  0.880 ns (0 allocations: 0 bytes)
+2
 
-Yet, if we write
-```julia
-x = Vector{Int}(undef,1)
-x[1] = 1
-x[1] = 2
+julia> @btime add_one_allocate(1)
+  15.060 ns (1 allocation: 64 bytes)
+1-element Vector{Int64}:
+ 2
 ```
-we are assuming that for some reason you want to access the same position in memory repeatedly, and this must be stored in the slower heap memory, where things have real addresses. This `x` is a mutable object, you can actually change the content associated with the position it occupies in memory explicitly. 
+Not only creating `y` in the second case "counts" as an allocation (because it is a *heap* allocation), but the code is much, much slower. This is "correct", and necessary here, since the compiler cannot know if `y` will be resized thereafter, such that it has to create a *heap* address to the object and request system memory for that. It cannot guarantee that `y` will fit the *stack*, which it can for immutable value resulting from the first function.  
 
-Later we learn that vectors can also be immutable (why not? If a number can be stored in these fast memories, why not a bunch of numbers?). And we can use StaticArrays where small vectors behave the same as any other immutable value, like a number. This means that:
+Thus, it is a general rule-of-thumb that working with immutable values is faster than with mutable ones, particularly in what concerns creating those values in intermediate computation states, where the values can be eventually discarded.
 
-```julia
+## Static arrays 
+
+The main property of immutable values is their fixed size. Thus, it is possible to perform fast computations with arrays if they are also fixed in size. The `StaticArrays` package brings this feature to Julia.
+
+In many senses, a static array is no different from any other specific type of number, `Int`, `Float64`, for examples. Its representation in memory is a continuous block of memory of fixed size, except that it may contain more "numbers" (or other values). 
+
+Static arrays allow programming patterns like this:
+```julia-repl
 julia> function f()
-         s = 0.
-         for i in 1:1000
-           x = SVector{3,Float64}(i, sqrt(i), i^2)
-           for j in 1:3
-             s = s + x[j]
+           s = 0.
+           for i in 1:1000
+               x = SVector{3,Float64}(i, sqrt(i), i^2)
+               for j in 1:3
+                   s = s + x[j]
+               end
            end
-         end
-         s
-       end
-f (generic function with 1 method)
+           s
+       end;
 
-julia> f()
+julia> @btime f()
+  2.601 μs (0 allocations: 0 bytes)
 3.343550974558874e8
-
-julia> @allocated f()
-0
-
 ```
-
-Wait, that function that generates 1000 vectors of dimension 3 does not *allocate* anything? Yet it doesn't, because these static arrays are immutable, so they only exist in the fast memory positions which are temporary. Knowing this allows a bunch of code optimizations which are very cool, and a very pretty syntax if you are dealing with particle simulations. For instance, you can do:
-
-```julia
+Wait, that function that generates 1000 vectors of dimension 3 does not *allocate* anything? Yet it doesn't, because these static arrays have fixed size, so they only exist in the fast memory positions which are temporary. Knowing this allows a bunch of code optimizations which are very cool, and a very pretty syntax if you are dealing with particle simulations. For instance, you can do:
+```julia-repl
 julia> x = [ SVector{3,Float64}(1,1,1) for i in 1:3 ]; # positions of 3 particles
 
 julia> function update_positions!(x)
-         for i in 1:length(x)
-           y = 2*x[i] # whatever needed
-           x[i] = y 
-         end
-       end
-update_positions! (generic function with 1 method)
+           for i in eachindex(x)
+               y = 2*x[i] # whatever needed
+               x[i] = y 
+           end
+       end;
 
 julia> update_positions!(x)
 
@@ -185,27 +201,53 @@ julia> x
 julia> @allocated update_positions!(x)
 0
 ```
+Thus, even if we need to create temporary intermediate arrays, this can be done quickly, without *heap* allocations, and with a syntax that resembles the arithmetics of vectors very naturally. As an additional advantage, the function above functions just as well if the input `x` array is an array of scalars or static vectors of any other dimension. 
 
-The `update_positions!` function is mutating the *elements* of `x` (`x` is mutable), but the elements of `x` are themselves immutable static vectors. This is, the line `y = 2*x[i]` is just creating a new static vector, and `x[i] = y` is not actually modifying the values of the positions in memory of the elements of `x[i]` as we would think (or it might be, that is just not your problem), and all that does not involve any access to the slow *heap*  memory of the computer. Thus you can deal with a vector as fast you can deal with a scalar. 
+Since it is the fixed size that allows these optimizations, can't we have *mutable* arrays, with fixed size, that get also stack-allocated? In fact, we can, whenever the compiler can prove that no *heap* memory operation is required in the lifetime of the objects. Indeed, the `StaticArrays` package implements *mutable* arrays with static size, the `MVector`s, which can be used, for example, with this pattern:
+```julia-repl
+julia> function update_positions!(x)
+           for i in eachindex(x)
+               y = MVector(x[i])
+               y[1] = 2.0
+               x[i] = SVector(y)
+           end
+       end;
 
-The possibilities to improve the performance of a numerical code increase. I have been able to write faster codes in Julia than in Fortran now, but that depends on some adaptation with that new way of thinking and with the new possibilities involved.
+julia> x = [ SVector{3,Float64}(1,1,1) for i in 1:3 ]; # positions of 3 particles
 
-Just to add one thing. There is nothing mysterious about StaticArrays.  They are convenient immutable structures, which you could have defined yourself, with the same allocation results:
+julia> update_positions!(x)
 
-```julia
+julia> x
+3-element Vector{SVector{3, Float64}}:
+ [2.0, 1.0, 1.0]
+ [2.0, 1.0, 1.0]
+ [2.0, 1.0, 1.0]
+
+julia> @btime update_positions!($x)
+  2.628 ns (0 allocations: 0 bytes)
+```
+
+What we did there is to copy the static arrays of `x` into a mutable static array of fixed size `y`, in `y = MVector(x[i])`. Next, we mutated the first component of `y`, and finally updated the value of `x[i]` with the mutated array, converted back to a static vector. The compiler could prove, there, that none of the mutable values *escaped* the scope of the function, and could optimize the code such that only *stack* memory used. This resulted in fast and non-allocating code.
+
+The possibilities to improve the performance of a numerical code increase.
+
+## Immutable structs
+
+There is nothing mysterious about StaticArrays. They are just convenient immutable structures, which you could have defined yourself, with the same allocation results:
+
+```julia-repl
 julia> struct P
-         x :: Float64
-         y :: Float64
-         z :: Float64
+           x::Float64
+           y::Float64
+           z::Float64
        end
 
 julia> function update_positions!(x)
-         for i in 1:length(x)
-           y = P( 2*x[i].x, 2*x[i].y, 2*x[i].z )
-           x[i] = y   
-         end
-       end
-update_positions! (generic function with 1 method)
+           for i in eachindex(x)
+               y = P( 2*x[i].x, 2*x[i].y, 2*x[i].z )
+               x[i] = y   
+           end
+       end;
 
 julia> x = [ P(1.0,1.0,1.0) for i in 1:100 ];
 
@@ -213,7 +255,6 @@ julia> update_positions!(x);
 
 julia> @allocated update_positions!(x)
 0
-
 ```
 
 Further information: