Pointers in C++ strict mode

Strict mode for C++
Early draft proposal

Pointers in strict mode

Introduction

This is a very early draft of a proposal for a memory-safe "strict mode" for the next major revision of C++. The goal of "strict mode" is to provide "memory safety", preventing code from damaging objects in memory via bad pointers, without imposing excessive overhead, adding garbage collection, or changing the C++ programming model unnecessarily.

Memory safety issues revolve around pointers. Pointers in strict mode come in a number of generic types, representing the various ways in which pointers are commonly used.

Generic types of pointers in strict mode

Syntax for a smart pointer template library is suggested below. This syntax is primarily for discussion; we are not proposing a specific template library, only requirements for safe ones.

Description	Syntax	Assignment semantics	Can be NULL?	Reference count update?	Safe?
*Pointers and references*
Strong reference	strong_ref<T>	Normal	No	Yes	Yes
Strong pointer	strong_ptr<T>	Normal	Yes	Yes	Yes
Weak pointer	weak_ptr<T>	Normal	Yes	Yes (weak count)	Yes
Temporary reference	auto T&	Only from object of larger scope.	No	No	Yes
Temporary pointer	auto T*	Only from object of larger scope.	Yes	No	Yes
*Iterators*
Strong iterator	strong_iterator<T>	Normal	No	Yes	Yes
Temporary iterator	iterator<T>	Only from object of larger scope.	No	No	Yes
*Compatibility with legacy code*
Old-style reference	T&	Normal	Yes	No	No
Old-style pointer	T*	Normal	Yes	No	No

The suggested naming and syntax are for discussion, and can probably be improved upon.

Strong pointers and references

Strong pointers are reference-counted smart pointers, similar to "smart pointers" commonly used with C++. When the last strong pointer to an object goes away, the object's destructor is called and the object is deallocated. The destructor must be called when the last strong pointer goes away, not at some later time. This implies a reference-counted implementation, rather than a garbage collector.

Smart pointers have a good history in C++. The main problems are overhead, exception safety, thread safety, and compatibility with legacy code. Some of those issues are dealt with below.

Consistent with existing C++ semantics, strong pointers can be null; strong references cannot.

Use of new must be encapsulated by the smart pointer library. We use the name strong_new for that encapsulated form here.
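As a rough illustration of the deterministic destruction required above, the following sketch uses std::shared_ptr and std::make_shared from standard C++ as stand-ins for strong_ptr and strong_new. That substitution is an assumption made for illustration only; this proposal does not specify an implementation. The Tracked type, its counter, and the function name are hypothetical.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type that counts how many instances are currently alive.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Live-object counts seen at three points in the pointers' lives.
struct Observed { int both; int one; int none; };

Observed strong_pointer_lifetime() {
    Observed seen{};
    // std::make_shared is an assumed stand-in for strong_new.
    std::shared_ptr<Tracked> p = std::make_shared<Tracked>();
    std::shared_ptr<Tracked> q = p;      // second strong pointer
    assert(q.use_count() == 2);
    seen.both = Tracked::live;
    p.reset();                           // one strong pointer remains
    seen.one = Tracked::live;
    q.reset();                           // last one gone: destructor runs now
    seen.none = Tracked::live;
    return seen;
}
```

The object stays alive while either pointer refers to it and is destroyed at the moment the last one is reset, not at some later collection point.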

Weak pointers

Weak pointers can't keep an object around. They're most useful as back pointers in linked structures. Weak pointers are too weak to cause memory leaks through circular linking.

class a {
    strong_ptr<b> ownedobj;
};

class b {
private:
    weak_ptr<a> ownerobj;
public:
    b(strong_ptr<a> owner); // constructor
};

This model of weak pointer semantics follows that of a recent addition to Perl 5. Weak pointers were added to Perl because Perl is prone to circular linking memory leaks. It's quite different from Java's weak pointers, which are tied to a garbage collection/finalizer model.

Weak pointers must be converted to strong pointers before accessing the object pointed to. Implicit conversions may be used, in which case dereferencing a weak pointer implies the construction of a strong pointer with the scope of the expression.

With this model, it's still possible to create circular linking memory leaks using strong pointers. But it's not necessary to do so. This is the same compromise Perl 5 now lives with. It is possible to detect potential circular loops by global static analysis, which is a feature we might see in CASE tools. Such a general analysis is beyond the scope of most compilers.
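The owner and back-pointer arrangement can be sketched with std::shared_ptr and std::weak_ptr from standard C++, used here as assumed stand-ins for strong_ptr and weak_ptr. The conversion to a strong pointer is written explicitly with lock() rather than implicitly. The type and function names are hypothetical.

```cpp
#include <cassert>
#include <memory>

struct Owned;

struct Owner {
    std::shared_ptr<Owned> owned;        // strong: Owner keeps Owned alive
};

struct Owned {
    std::weak_ptr<Owner> owner;          // weak back pointer: no ownership
    explicit Owned(const std::shared_ptr<Owner>& o) : owner(o) {}

    // The weak pointer must become a strong pointer before use.
    // lock() yields a null strong pointer if the owner is already gone.
    bool owner_alive() const {
        std::shared_ptr<Owner> strong = owner.lock();
        return strong != nullptr;
    }
};

// Builds the pair, then lets the only strong pointer to the Owner go away.
std::shared_ptr<Owned> make_pair_and_drop_owner() {
    std::shared_ptr<Owner> a = std::make_shared<Owner>();
    a->owned = std::make_shared<Owned>(a);
    assert(a.use_count() == 1);          // the back pointer adds no strong count
    assert(a->owned->owner_alive());
    return a->owned;                     // 'a' is destroyed on return
}
```

Because the back pointer is weak, the Owner is destroyed as soon as its last strong pointer goes away. Had the back pointer been strong, the two objects would have kept each other alive.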

There are no weak references, because references aren't supposed to become null.

Temporary references and pointers

Temporary pointers are our solution to the smart pointer overhead problem. A temporary pointer or reference can be created from a strong pointer, but the temporary must have a scope that cannot outlive the strong pointer from which it was created. The mechanism for this uses the keyword auto.

void fn(auto typname* p)
{
    // computationally intensive operations can now
    // be performed using p without reference count overhead.
}

Taking a temporary reference, pointer, or iterator from a strong pointer or iterator locks the strong pointer or iterator against changes for the life of the temporary. (More discussion to be provided.)
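The cost argument can be illustrated in today's C++ by passing a plain pointer obtained from a std::shared_ptr, which stands in here for a temporary pointer taken from a strong pointer. This is only an approximation: standard C++ does not enforce the scope rule or the locking described above, so the safety shown here rests on the caller keeping the owner alive. All names are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Stand-in for "void fn(auto T* p)": the pointer lives only for the call,
// and no reference count is touched inside the function.
long sum_squares(const std::vector<int>* p) {
    long total = 0;
    for (int v : *p) total += static_cast<long>(v) * v;
    return total;
}

struct Result { long sum; long count_before; long count_after; };

Result call_with_temporary() {
    std::shared_ptr<std::vector<int>> owner =
        std::make_shared<std::vector<int>>(std::vector<int>{1, 2, 3});
    Result r{};
    r.count_before = owner.use_count();
    r.sum = sum_squares(owner.get());    // the temporary cannot outlive 'owner'
    r.count_after = owner.use_count();
    return r;
}
```

The reference count is the same before and after the call, which is the overhead saving that temporary pointers are meant to provide.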

Iterators

Iterators are one of the most useful objects to enter C++ in recent years. They encapsulate pointer-like semantics in a way that is checkable, and STL implementations have been written that perform such run-time checks. "Strict mode" allows arithmetic on iterators, where it can be checked, but not on pointers, where it can't.

Iterators typically have a short life, so the most common case is the temporary iterator. The new restrictions of explicit "auto" objects apply; the iterator must have scope such that it cannot outlive its collection.

For performance and future optimization, there is a restriction that temporary iterators can only be bound to a single collection during their short life, and must be initialized at their declaration. This eliminates the need to carry a pointer to the collection as part of the iterator, except for systems which do run-time checking without compiler support. Iterator checking can usually be optimized by hoisting the checks to the top of the loop and subsuming them into the loop termination check. For common loop forms, this eliminates all checking overhead. This is the optimization that makes high-performance inner loops with checking possible.
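The hoisting optimization can be written out by hand as a sketch. Both functions below sum the elements in the index range [first, last). The first checks every access; the second performs one range test before the loop, so the loop's own termination test is the only remaining check. The function names are hypothetical, and a strict mode compiler would be expected to perform this transformation itself.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Checked on every iteration: at() tests the index each time.
long sum_checked(const std::vector<int>& v, std::size_t first, std::size_t last) {
    long total = 0;
    for (std::size_t i = first; i < last; ++i) total += v.at(i);
    return total;
}

// Hoisted: one range test before the loop, then unchecked access.
long sum_hoisted(const std::vector<int>& v, std::size_t first, std::size_t last) {
    if (first < last && last > v.size())
        throw std::out_of_range("sum_hoisted: range exceeds collection");
    long total = 0;
    for (std::size_t i = first; i < last; ++i) total += v[i];
    return total;
}
```

For valid ranges the two functions return the same result, and for invalid ranges both throw std::out_of_range.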

If an iterator with a longer life is needed, strong_iterator is available. Strong iterators update reference counts (on the collection, not its elements) so that the collection can't go away while an iterator points to it. Strong iterators are intended for the unusual case where an iterator is stored in some object. Strong iterators must contain a pointer to the collection.
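One way to picture a strong iterator is as a shared-ownership pointer to the collection plus a position. The sketch below assumes std::shared_ptr as the strong pointer and std::vector as the collection; the StrongIterator and Bookmark names are hypothetical and are not the proposed strong_iterator<T> interface.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical strong iterator: shares ownership of the collection
// and remembers a position in it.
template <typename T>
class StrongIterator {
public:
    StrongIterator(std::shared_ptr<std::vector<T>> c, std::size_t pos)
        : coll_(std::move(c)), pos_(pos) {}
    // Checked dereference: throws std::out_of_range for a bad position.
    T& operator*() const { return coll_->at(pos_); }
    StrongIterator& operator++() { ++pos_; return *this; }
private:
    std::shared_ptr<std::vector<T>> coll_;  // counts the collection, not its elements
    std::size_t pos_;
};

// An object that stores an iterator, the unusual case described above.
struct Bookmark {
    StrongIterator<int> where;
};

Bookmark make_bookmark() {
    auto data = std::make_shared<std::vector<int>>(std::vector<int>{10, 20, 30});
    // 'data' goes out of scope on return; the iterator keeps the vector alive.
    return Bookmark{StrongIterator<int>(data, 1)};
}
```

The stored iterator remains usable after the function that created the collection has returned, because the iterator itself holds a counted reference to the collection.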

When items are inserted into or removed from a collection, some iterators can become invalid, which can potentially create a dangling pointer situation. It's inefficient to have reference counts for every element of a collection, which is the Perl solution to this problem. Two rules handle this case.

Attaching a temporary iterator to a collection locks it against changes. Attempts to add or remove elements then throw exceptions.
Attaching a strong iterator to a collection does not prevent changes. Changes which invalidate the iterator must be detected at run time.
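The two rules can be sketched with a small collection class. The lock counter, the version counter, and every name below are assumptions for illustration; the proposal leaves the locking mechanism to be specified. For brevity the strong iterator here holds a plain reference to the collection rather than a counted one, and it conservatively treats every insertion as invalidating.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical collection illustrating both rules.
class LockedVector {
private:
    std::vector<int> items_;
    int locks_ = 0;               // number of attached temporary iterators
    unsigned long version_ = 0;   // bumped on every change

public:
    // Rule 1: a temporary iterator locks the collection for its lifetime.
    class TempIter {
    public:
        explicit TempIter(LockedVector& c) : c_(c) { ++c_.locks_; }
        ~TempIter() { --c_.locks_; }
        TempIter(const TempIter&) = delete;
        TempIter& operator=(const TempIter&) = delete;
        int& at(std::size_t i) { return c_.items_.at(i); }
    private:
        LockedVector& c_;
    };

    // Rule 2: a strong iterator does not lock. It records the collection's
    // version and checks it at run time on each access.
    class StrongIter {
    public:
        StrongIter(LockedVector& c, std::size_t pos)
            : c_(c), pos_(pos), version_(c.version_) {}
        int& get() {
            if (version_ != c_.version_)
                throw std::runtime_error("iterator invalidated by a change");
            return c_.items_.at(pos_);
        }
    private:
        LockedVector& c_;
        std::size_t pos_;
        unsigned long version_;
    };

    void push_back(int v) {
        if (locks_ > 0)
            throw std::logic_error("collection is locked by a temporary iterator");
        items_.push_back(v);
        ++version_;
    }
    std::size_t size() const { return items_.size(); }
};
```

An insertion attempted while a TempIter exists throws and leaves the collection unchanged. An insertion made while only a StrongIter exists succeeds, and the stale iterator throws on its next access.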

A temporary pointer or temporary reference can be obtained from any iterator. This is the usual way to reference collection elements, and involves no extra overhead or syntax for temporary iterators. There's some overhead associated with taking a temporary from a strong iterator, because that "locks" the collection against changes. (More discussion to be provided.)

Exceptions for errors

All errors detected at run time by the "strict mode" system throw exceptions. The form of the exception has yet to be defined.

In support of future compiler optimizations, a strict mode exception can be thrown as soon as it becomes inevitable. Thus, if at entry to a loop it is inevitable that a subscript error will occur during the loop, an exception can be thrown at loop entry. This allows optimizing compilers to hoist subscript checks out of loops. Here "inevitable" means "inevitable in the absence of other exceptions". Note that code which, in normal operation, always exits a loop via an exception may encounter false alarms.
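The observable effect of throwing early can be sketched with two versions of the same loop. The first checks each subscript as it goes, so it writes some elements before failing. The second detects at loop entry that a subscript error is inevitable and throws before writing anything. The function names are hypothetical, and std::out_of_range is used only because the form of the strict mode exception is not yet defined.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Per-iteration checking: elements before the bad subscript are
// written before the error is reported.
void fill_checked(std::vector<int>& v, std::size_t n, int value) {
    for (std::size_t i = 0; i < n; ++i) v.at(i) = value;
}

// Hoisted check: once n exceeds the size the error is inevitable,
// so it is reported at loop entry and nothing is written.
void fill_hoisted(std::vector<int>& v, std::size_t n, int value) {
    if (n > v.size()) throw std::out_of_range("fill_hoisted: n exceeds size");
    for (std::size_t i = 0; i < n; ++i) v[i] = value;
}
```

The false alarm case follows from the same difference: if the loop body would normally leave the loop by throwing some other exception before reaching the bad subscript, the hoisted version reports a subscript error that the per-iteration version would never have reached.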

Implementation

Later implementations, ones which optimize out redundant checks, can provide significant performance improvements without loss of safety. The goal is to get the checking overhead down to around 10%, or about two months of Moore's Law. Some British work on optimizing Pascal subscript checks in the 1980s (need reference) indicates that this is possible.

Conversion of existing code to strict mode

Existing code can potentially recompile without change in strict mode, if it's written in a modern STL-oriented style. Realistically, not much code will run without change. But much code may run with minor changes. With the proposed "future built-in syntax", C-style pointers and references become strong pointers and references. Iterators become temporary iterators. STL collections continue to work. This covers the most common cases in modern code. Old-style arrays and pointer arithmetic have to be fixed. The most time-consuming part of conversion will probably be fixing all the old-style array declarations.

Because the default pointer is a strong pointer, conversion may result in a program with excessive reference-count updating. The use of temporary pointers can reduce reference counting overhead substantially. In time, we may see compilers that optimize out unnecessary reference count updates, but that's probably some time off.

Outstanding issues

The following problems with this proposal are known.

Thread-safe reference count updates can be inefficient on some systems.
Iterator locking needs to be fully described.
Placement syntax for new needs to be considered.

July 8, 2001