Why automatic memory management?
Garbage collection
Three techniques:
Mark and sweep
Stop and copy
Reference counting
Storage management is still a hard problem in modern programming
C and C++ programs have many storage bugs
forgetting to free unused memory
dereferencing a dangling pointer
overwriting parts of a data structure by accident
and so on (such bugs can be serious security problems)
Storage bugs are difficult to find; a bug can lead to a visible effect far away in time and program text from the source
Some storage bugs can be prevented in a strongly typed language; for example, array bounds checking rules out accesses past the end of an array
Can types prevent errors in programs with manual allocation and deallocation of memory?
If you want type safety, then you must use automatic memory management
This is an old problem: studied since the 1950s for Lisp
There are several well-known techniques for performing completely automatic memory management
Until relatively recently (Java), they were unpopular outside the Lisp family of languages
When an object that takes memory space is created, unused space is automatically allocated
After a while there is no more unused space
Some space is occupied by objects that will never be used again (dead objects)
This space can be freed to be reused later
How can we tell whether an object will “never be used again”?
In general, it is impossible to determine this (the problem is undecidable)
We will have to use a heuristic to find many, but not all, objects that will never be used again
Observation: a program can use only the objects that it can find
Java example:
String s = new String("Hello");
s = new String("Goodbye");
// the original "Hello" string is unreachable
An object \(x\) is reachable if and only if:
A local variable (or register) contains a pointer to \(x\), or
Another reachable object \(y\) contains a pointer to \(x\)
All reachable objects can be found by starting from local variables and following pointers transitively
An unreachable object can never be referred to by the program; these objects are called garbage
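The reachability definition above can be sketched as a small graph traversal. This is a minimal illustration, not any particular collector: the function name `reachable`, the dictionary encoding of the heap, and the object names are all assumptions made for the example.

```python
def reachable(roots, pointers):
    """Return the set of objects reachable from the roots.

    roots: names of objects held in local variables or registers.
    pointers: dict mapping each object name to the names it points to.
    """
    seen = set()
    todo = list(roots)
    while todo:
        obj = todo.pop()
        if obj in seen:
            continue                      # already visited, skip
        seen.add(obj)
        todo.extend(pointers.get(obj, ()))
    return seen


# Hypothetical heap: a -> c -> e, while b and d only point at each other.
heap = {"a": ["c"], "b": ["d"], "c": ["e"], "d": ["b"], "e": []}
live = reachable(["a"], heap)
garbage = set(heap) - live
```

Here `b` and `d` point at each other, yet neither can be found from a root, so both count as garbage.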
Consider the program:
x <- new A;
y <- new B;
x <- y;
if true then x <- new C else x.m() fi;
After x <- y (assuming y becomes dead there):
A is not reachable anymore
B is reachable (through x)
B is not garbage and is not collected
B is never going to be used
At run-time we have two mappings:
The environment \(E\) maps variable identifiers to locations
The store \(S\) maps locations to values
Proposed garbage collector
for each location l in domain(S)
  let can_reach = false
  for each (v, l2) in E
    if l = l2 then can_reach = true
    for each l3 in v // v is X(..., ai = li, ...)
      if l = l3 then can_reach = true
  if not can_reach then reclaim_location(l)
Could we use the proposed Cool Garbage Collector in real life?
How long would it take?
How much space would it take?
Are we forgetting anything?
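One way to explore these questions is to run the proposed collector on a tiny store. The sketch below follows one possible reading of the pseudocode, in which the attribute loop inspects the object that each variable points to. The encoding of E and S as Python dictionaries and the name `proposed_collect` are assumptions made for the example.

```python
def proposed_collect(E, S):
    """Return the locations the proposed collector would reclaim.

    E: dict mapping variable names to locations.
    S: dict mapping locations to values. An object value is modelled as a
       dict from attribute names to locations (an assumption of this sketch);
       anything else is a plain value.
    """
    reclaimed = []
    for l in S:                               # every location in the store
        can_reach = False
        for _var, l2 in E.items():            # every variable binding
            if l == l2:
                can_reach = True
            v = S[l2]
            if isinstance(v, dict):           # v is X(..., ai = li, ...)
                for l3 in v.values():
                    if l == l3:
                        can_reach = True
        if not can_reach:
            reclaimed.append(l)
    return reclaimed


# x -> location 1 -> location 2 -> location 3; location 4 is unreferenced.
E = {"x": 1}
S = {1: {"next": 2}, 2: {"next": 3}, 3: 42, 4: 7}
reclaimed = proposed_collect(E, S)
```

Under this reading the collector looks only one level past each variable, so it reclaims location 3 even though the program can still reach it through x. The nested loops also make the running time proportional to the size of the store times the size of the environment.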
In Cool, local variables are easy to find
The stack is more complex
If we know the layout of a stack frame then we can find the pointers (objects) in it
Many things may look legitimate and reachable but will turn out not to be.
How can we figure this out systematically?
Start tracing from local variables and the stack
Note that B and D are not reachable from local variables or the stack
Thus we can reuse their storage
The mark phase:
let todo = { all roots }
while todo is not empty
  pick v in todo
  remove v from todo
  if mark(v) = 0 then
    mark(v) <- 1
    let v1, ..., vn be the pointers contained in v
    add v1, ..., vn to todo
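The marking loop above can be sketched as follows. The `Obj` class, with its `mark` bit and `pointers` list, is a hypothetical stand-in for a heap object, and the roots are passed in explicitly rather than discovered from a stack.

```python
class Obj:
    """Hypothetical heap object: a mark bit plus outgoing pointers."""

    def __init__(self, name):
        self.name = name
        self.mark = 0
        self.pointers = []


def mark_phase(roots):
    todo = list(roots)
    while todo:
        v = todo.pop()                 # pick v in todo and remove it
        if v.mark == 0:
            v.mark = 1
            todo.extend(v.pointers)    # add the pointers contained in v


a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.pointers = [c]
c.pointers = [a]      # a cycle is harmless: marked objects are skipped
b.pointers = [d]
mark_phase([a])
marked = [o.name for o in (a, b, c, d) if o.mark == 1]
```

Checking the mark bit before following pointers is what guarantees termination on cyclic data.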
The sweep phase:
p <- bottom of the heap
while p < top of the heap
  if mark(p) = 1 then
    mark(p) <- 0
  else
    add block p...(p + sizeof(p) - 1) to free list
  p <- p + sizeof(p)
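The sweep loop above can be sketched over a toy heap. Modelling the heap as a dictionary from start address to a block record, with contiguous blocks starting at address 0, is an assumption made for the example, as is the name `sweep_phase`.

```python
def sweep_phase(heap, heap_top):
    """Walk the heap linearly, resetting marks and collecting free blocks.

    heap: dict mapping a block's start address to {"size": n, "mark": 0 or 1};
          blocks are assumed contiguous, starting at address 0.
    Returns the free list as (first_address, last_address) pairs.
    """
    free_list = []
    p = 0                                  # bottom of the heap
    while p < heap_top:
        block = heap[p]
        if block["mark"] == 1:
            block["mark"] = 0              # reset for the next collection
        else:
            free_list.append((p, p + block["size"] - 1))
        p += block["size"]                 # step to the next block
    return free_list


heap = {
    0: {"size": 4, "mark": 1},
    4: {"size": 2, "mark": 0},
    6: {"size": 3, "mark": 1},
    9: {"size": 1, "mark": 0},
}
free_list = sweep_phase(heap, heap_top=10)
```

Every block is visited whether or not it is live, so the cost of the sweep is proportional to the size of the heap rather than to the amount of live data.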
While conceptually simple, this algorithm has a number of tricky details
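The mark phase uses the todo list as an auxiliary data structure for the reachability analysis, yet the collector runs exactly when memory is exhausted, so the list cannot be allowed to take much extra space
The free list avoids this problem because it is stored in the free objects themselves
We still have the issue of how to implement the traversal without using extra space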
Stop and copy (scan and alloc are pointers into the new space):
while scan not equal to alloc
  let O be the object at scan pointer
  for each pointer p contained in O
    find O' that p points to
    if O' is without a forwarding pointer
      copy O' to new space (update alloc pointer)
      set first word of O' to point to the new copy
      change p to point to the new copy of O'
    else
      set p in O equal to the forwarding pointer
  increment scan pointer to the next object
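The copying loop above can be sketched with a Python list standing in for the new space. In this sketch the scan pointer is a list index and the alloc pointer is the length of the list. The `Obj` class, its `forward` slot (standing in for the first word of the old object), and the helper `copy_object` are assumptions made for the example.

```python
class Obj:
    """Hypothetical heap object with pointer fields and a forwarding slot."""

    def __init__(self, name, fields=None):
        self.name = name
        self.fields = fields if fields is not None else []
        self.forward = None          # set once the object has been copied


def copy_object(old, new_space):
    """Copy old into new space and leave a forwarding pointer behind."""
    new = Obj(old.name, list(old.fields))   # fields still point to old space
    new_space.append(new)                   # appending advances alloc
    old.forward = new
    return new


def stop_and_copy(roots):
    new_space = []
    new_roots = []
    for r in roots:                         # first copy what the roots point to
        if r.forward is None:
            copy_object(r, new_space)
        new_roots.append(r.forward)
    scan = 0
    while scan != len(new_space):           # scan not equal to alloc
        o = new_space[scan]
        for i, target in enumerate(o.fields):
            if target.forward is None:
                copy_object(target, new_space)
            o.fields[i] = target.forward    # redirect p to the new copy
        scan += 1                           # next object in new space
    return new_roots, new_space


a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.fields = [b, c]
b.fields = [c]
c.fields = [a]
d.fields = [a]          # d points into live data but nothing points to d
new_roots, new_space = stop_and_copy([a])
names = [o.name for o in new_space]
```

Only reachable objects are ever touched, and the objects between scan and alloc play the role of the todo list, so no extra space is needed for the traversal.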
Rather than wait for memory to be exhausted, try to collect an object when there are no more pointers to it
Each assignment operation has to manipulate the reference count
new returns an object with a reference count of 1
If x points to an object, then let rc(x) refer to the object's reference count
Every assignment x <- y must be changed to:
rc(y) <- rc(y) + 1
rc(x) <- rc(x) - 1
if rc(x) equals 0 then mark x as free
x <- y
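The bookkeeping required on each assignment can be sketched as follows. Variables live in a dictionary `env`, and the names `Obj`, `new`, `assign`, and the `freed` flag are assumptions made for the example. The sketch increments the count of the new target before decrementing the old one, so that an assignment of the form x <- x does not free the object.

```python
class Obj:
    """Hypothetical heap object carrying its own reference count."""

    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.freed = False


def new(name):
    """Create an object with a reference count of 1."""
    o = Obj(name)
    o.rc = 1
    return o


def assign(env, x, y):
    """Perform x <- y on variables in env, maintaining reference counts."""
    new_target = env.get(y)
    old_target = env.get(x)
    if new_target is not None:
        new_target.rc += 1                 # one more pointer to y's object
    if old_target is not None:
        old_target.rc -= 1                 # x no longer points to its object
        if old_target.rc == 0:
            old_target.freed = True        # mark the object as free
    env[x] = new_target


env = {"x": new("A"), "y": new("B")}
a, b = env["x"], env["y"]
assign(env, "x", "y")                      # x <- y
```

After the assignment the A object has no remaining pointers and is marked free immediately, without waiting for memory to run out.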
Automatic memory management avoids some serious storage bugs
Garbage collection is going to be around for a while