C++ Creator Bjarne Stroustrup Answers Our Top Five C++ Questions

Mariel Frank and Sonny Li, authors of Codecademy’s Learn C++ course, recently got a chance to interview with Dr. Bjarne Stroustrup, the creator of C++. 

As part of the interview, he answered the highest voted C++ questions on Stack Overflow. While the whole interview is worth a read, Codecademy has generously allowed us to republish the Q&A portion of the interview. 

If you ever wondered if an answer on Stack Overflow was definitive, here’s about the closest you’ll get to certain (though we expect somebody to disagree). 

🥇 Why is processing a sorted array faster than processing an unsorted array?

Why is processing a sorted array faster than processing an unsorted array?
Note: This question is the #1 highest voted question on Stack Overflow of all time. 😱

That sounds like an interview question. Is it true? How would you know? It is a bad idea to answer questions about efficiency without first doing some measurements, so it is important to know how to measure.

So, I tried with a vector of a million integers and got:

Already sorted    32995 milliseconds
Shuffled          125944 milliseconds

Already sorted    18610 milliseconds
Shuffled          133304 milliseconds

Already sorted    17942 milliseconds
Shuffled          107858 milliseconds

I ran that a few times to be sure. Yes, the phenomenon is real. My key code was:

void run(vector<int>& v, const string& label)
{
    auto t0 = system_clock::now();
    sort(v.begin(), v.end());
    auto t1 = system_clock::now();
    cout << label 
         << duration_cast<microseconds>(t1 — t0).count() 
         << " milliseconds\n";
}

void tst()
{
    vector<int> v(1'000'000);
    iota(v.begin(), v.end(), 0);
    run(v, "already sorted ");
    std::shuffle(v.begin(), v.end(), std::mt19937{ std::random_device{}() });
    run(v, "shuffled    ");
}

At least the phenomenon is real with this compiler, standard library, and optimizer settings. Different implementations can and do give different answers. In fact, someone did do a more systematic study (a quick web search will find it) and most implementations show that effect.

One reason is branch prediction: the key operation in the sort algorithm is “if(v[i] < pivot]) …” or equivalent. For a sorted sequence that test is always true whereas, for a random sequence, the branch chosen varies randomly.

Another reason is that when the vector is already sorted, we never need to move elements to their correct position. The effect of these little details is the factor of five or six that we saw.

Quicksort (and sorting in general) is a complex study that has attracted some of the greatest minds of computer science. A good sort function is a result of both choosing a good algorithm and paying attention to hardware performance in its implementation.

If you want to write efficient code, you need to know a bit about machine architecture.

——————————————————————————————————————-
Pardon the interruption. Just here to say, the Stack Overflow Podcast is back! Come check us out and hear an interview with our new CEO.

🥈 What is the --> operator in C++?

What-is-the------operator-in-C---

That’s an old trick question. There is no --> operator in C++. Consider:

if (p-->m == 0) f(p);

It certainly looks as if there is a —-> operator and by suitable declaring p and m, you can even get that to compile and run:

int p = 2;
int m = 0;
if (p-->m == 0) f(p);

That means: see if p-- is greater than m (it is), and then compare the result (true) to 0. Well true != 0, so the result is false and f() is not called. In other words:

if ((p--) > m == 0) f(p);

Please don’t waste too much time on such questions. They have been popular for befuddling novices since before C++ was invented.

🥉 The Definitive C++ Book Guide and List

The-Definitive-C---Book-Guide-and-List

Unfortunately, there is no definite C++ book list. There can’t be one. Not everyone needs the same information, not everyone has the same background, and C++ best practices are evolving.

I did a search of the web and found a bewildering set of suggestions. Many were seriously outdated and some were bad from the start. A novice looking for a good book without guidance will be very confused!

You do need a book because the techniques that make C++ effective are not easily picked up from a few blogs on specific topics—and of course blogs also suffer from mistakes, being dated, and poor explanations. Often, they also focus on advanced new stuff and ignore the essential fundamentals.

I recommend my Programming: Principles and Practice Using C++ (2nd Edition) for people just beginning to learn to program, and A Tour of C++ (2nd Edition) for people who are already programmers and need to know about modern C++. People with a strong mathematical background can start with Peter Gottschling‘s Discovering Modern C++: An Intensive Course for Scientists, Engineers, and Programmers.

Once you start using C++ for real, you need a set of guidelines to distinguish what can be done and what is good practice. For that, I recommend the C++ Core Guidelines on GitHub.

For good brief explanations of individual language features and standard-library functions, I recommend www.cppreference.com.

#4. What are the differences between a pointer variable and a reference variable in C++?

What-are-the-differences-between-a-pointer-variable-and-a-reference-variable-in-C---
To learn more about references and pointers, check out Learn C++. 😊

Both are represented in memory as a machine address. The difference is in their use.

To initialize a pointer, you give it the address of an object:

int x = 7;
int* p1 = &x;
int* p2 = new int{9};

To read and write through a pointer, we use the dereference operator (*):

*p1 = 9;       // write through p1
int y = *p2;   // read through p2

When we assign one pointer to another, they will both point to the same object:

p1 = p2;       // now p1 p2 both point to the int with the value 9
*p2 = 99;      // write 99 through p2
int z = *p1;   // reading through p1, z becomes 99 (not 9)

Note that a pointer can point to different objects during its lifetime. That’s a major difference from references. A reference is bound to an object when it is created and cannot be made to refer to another.

For references, dereferencing is implicit. You initialize a reference with an object and the reference takes its address.

int x = 7;
int& r1 = x;
int& r2 = *new int{9};

Operator new returns a pointer, so I had to dereference it before assigning using it to initialize the reference.

To read and write through a reference, we just use the reference’s name (no explicit dereferencing):

r1 = 9;        // write through r1
int y = r2;    // read through r2

When we assign one reference to another, the value referred to will be copied:

r1 = r2;       // now p1 and p2 both have the value 9
r1 = 99;       // write 99 through r1
int z = r2;    // reading through r2, z becomes 9 (not 99)

Both references and pointers are frequently used as function arguments:

void f(int* p)

{
    if (p == nullptr) return;
    // ...
}

void g(int& r)
{
    // ...
}

int x = 7;
f(&x);
g(x);

A pointer can be the nullptr, so we have to consider whether it points to anything. A reference can be assumed to refer to something.

#5. How do I iterate over the words of a string?

How-do-I-iterate-over-the-words-of-a-string-

Use a stringstream, but how do you define “a word”? Consider “Mary had a little lamb.” Is the last word “lamb” or “lamb.”? If there is no punctuation, it is easy:

vector<string> split(const string& s)
{
    stringstream ss(s);
    vector<string> words;
    for (string w; ss>>w; ) words.push_back(w);
    return words;
}

auto words = split("here is a simple example");   // five words
for (auto& w : words) cout << w << '\n';

or just:

for (auto& w : split("here is a simple example")) cout << w << '\n';

By default, the >> operator skips whitespace. If we want arbitrary sets of delimiters, things get a bit more messy:

template<typename Delim>
string get_word(istream& ss, Delim d)
{
    string word;
    for (char ch; ss.get(ch); )    // skip delimiters
        if (!d(ch)) {
            word.push_back(ch);
            break;
        }
    for (char ch; ss.get(ch); )    // collect word
        if (!d(ch))
            word.push_back(ch);
        else
            break;
    return word;
}

The d is an operation telling whether a character is a delimiter and I return "" (the empty string) to indicate that there wasn’t a word to return.

vector<string> split(const string& s, const string& delim)
{
    stringstream ss(s);
    auto del = [&](char ch) { for (auto x : delim) if (x == ch) return true; return false; };

    vector<string> words;
    for (string w; (w = get_word(ss, del))!= ""; ) words.push_back(w);
    return words;
}

auto words = split("Now! Here is something different; or is it? ", "!.,;? ");
for (auto& w : words) cout << w << '\n';

If you have the C++20 Range library, you don’t have to write something like this yourself but can use a split_view.


Bjarne Stroustrup is a Technical Fellow and Managing Director at Morgan Stanley in New York City and a Visiting Professor at Columbia University. He’s also the creator of C++.

For more information on C++20: https://isocpp.org.



Done reading this awesome post? We have something fun for ya. The Stack Overflow podcast is back! Come check it out or listen below.


Author

Sonny Li
Senior Curriculum Developer, Codecademy

Related Articles

Comments

  1. Benjamin Gruenbaum says:

    Thanks for this blog post, good reading and appreciated. Just mentioning in the C++ book list that Bjarne’s own book (The C++ Programming Language) is excellent and that it captures its philosophy very well so definitely also recommended.

  2. C++ is a powerful language. Mostly because if it’s lower level access.[see copycat-coders-create-vulnerable-apps](https://meta.stackexchange.com/questions/334811/stack-overflow-made-the-bbc-news-copycat-coders-create-vulnerable-apps)

  3. I find it amusing that in the first question, Bjarne mentioned “with this compiler, standard library, and optimizer settings.” But not “and hardware”.

    There are microcontrollers that have no branch-mispredict penalty, just a fixed cost for taken vs. not-taken. But even in-order high-performance cores like you’d find in an ARM tablet / netbook (or anything you’d use to interactively compiler/run on) do branch prediction for the front-end. And all modern x86 CPUs are out-of-order and heavily dependent on their sophisticated branch-prediction. But taking a black-box approach to the effect like he’s aiming to, there’s no reason to assume that hardware is irrelevant.

    It’s a very common mistake that people forget that low-level performance effects can depend on the CPU microarchitecture, not just software. Include your CPU type (e.g. Skylake) or model number in your SO questions, people.

    Also, the test code seems to be timing *sorting* a sorted vs. random array. Of course that’s slower, it’s doing more work with most sort algos. The SO question is doing “processing” that has the same amount of total work, just grouping all the if-body-runs cases together or not.

    I guess Bjarne just looked at the question title, which is basically fine for a novelty interview. But I had been hoping from the title of this post that he had actually posted proper SO answers on SO.

    His answers to the other questions were better, though. Thanks for doing this interview.

  4. Jon Matthiews says:

    Can someone link the study referred to by: “In fact, someone did do a more systematic study (a quick web search will find it)”?

    I thought that kind of thing was against the answer rules 😉

  5. Damn html tags (how come there’s no markdown in these comments?) – I meant to write:

    Is it a typo, than one does `duration_castduration_cast<microseconds>`, and then writes `” milliseconds\n”;`?

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.