Memcpy Vs Assignment Performance Cycle

Summary:

While they won't produce incorrect results when used appropriately, the mem* family of functions is obsolete. In particular, std::copy, std::move, std::fill, and std::equal[1] provide type-safe, more general, and equally efficient interfaces to perform the same tasks as std::memcpy, std::memmove, std::memset, std::memcmp, and their wide variants.

This article explores why the standard algorithms are superior and covers one pseudo-exception to the rule.

Background:What are these functions?

For those who don't know, the mem* family of functions are basically a way to take advantage of special hardware instructions to quickly copy, set, or compare a range of values. Here are their signatures:

std::memset is probably the easiest to explain:

The net result of running this code is that all of the elements of the array 'a' are initialized to 0. We have 'blasted the bytes' of everything from the first byte of that array to the last with the value '0'. memset sets each individual byte with our specified value (0).

The motivation? If we don't call memset here, the array 'a' holds garbage values - we're not sure what it contains.

The phrase 'blasting bytes' will show up several times in this article. It refers to using special, blazingly fast, hardware instructions that operate over multiple pieces of data, treating the data as nothing more than raw bytes. This is actually the appeal of the *mem functions: speed.

std::memcpy 'blasts bytes' from one location to another, copying the data. std::memmove does the same thing as memcpy, but can be used when the source and destination are the same array. std::memcmp quickly checks if two arrays contain the same byte content.

Ok - now that we understand what the mem* functions are, let's explore why the standard algorithms deprecate these functions we inherited from C:

Reason #1:The standard algorithms are type-safe

For the remainder of the article, we will restrict our discussion to memset, but our arguments apply to all of the functions.

Memset has an interesting function signature. We pass it a 'void*' as the destination - we can overwrite the bytes to anything. If the values you put in your destination don't fit the domain or don't make sense, oh well; it overwrites them anyway. In this manner, std::memset is as dangerous as a reinterpret_cast.

Meet std::fill and std::fill_n:

We'll talk about std::fill_n rather than std::fill, since it more closely resembles std::memset's signature, but std::fill and std::fill_n do roughly the same thing. We give it an iterator to start at, we tell it how many elements we will visit, and we pass it a value.

Some thoughts:
1) We can still pass in pointers, just like before. A pointer is actually a RandomAccessIterator, the most general kind of iterator.
2) 'count' refers to the number of elements, not the number of bytes. We've raised the level of abstraction! How many ints are in my array 'a'. How many strings are in my vector? The iterator's ++ operation will get us from one to the next safely thanks to the type information.
3) Because we're working with elements, not bytes, we pass in a value of type T. As long as you can assign something of type T to the dereferenced value of the 'first' iterator, this will compile and work.

We get type safety because of the second and third thoughts: We can only access one 'element' of our range at a time, and we can only set the value of it as something of the appropriate type, not arbitrary bytes. We're not reinterpreting the bytes.

Reason #2:The standard algorithms are more general

We can do more with std::fill's function signature.

1) We're not limited to pointers anymore; we can do a std::fill on a linked list and the call site will look exactly the same as a call to do the same thing on a vector or an array. That's awesome! Less mental overhead.

2) We're not limited to setting every byte to the same value anymore; we can set values of a range equal to the value of an entire struct:

Both the string and the double get copied into each of 's elements, safely.

3) We're not limited by what types we can set or copy. If you come from std::memset land, you may now be wondering: "Hey, wait! You can't call std::memset on a std::string!" and you'd be right! You can't. The reason you can't std::memcpy one string into another, is because memcpy simply 'blasts the bytes', with reckless abandon to what they represent. std::string has a pointer to the memory it allocated, if you std::memcpy it, you copy that exact pointer address too. Now you have two strings that both think they own that memory, and so your program will eventually crash.

std::fill is more general; it works safely with any type that supports copying.

Reason #3:The standard algorithms are equally efficient

Here's the real killer - you might be skeptical after reading the last paragraph from reason 2 (because std::fill handles a more general case), but std::fill and std::memset are actually equally efficient.

At compile-time the compiler can determine whether or not a type is what the standard calls 'trivially copyable'. In other words: the compiler can decide if it would be safe to simply 'blast the bytes' to make a copy, or, if like std::string, the class defines a non-trivial copy or move operation and a semantic (rather than bytewise) copy is necessary. There are a few other requirements as well, but there are standard type traits for all of them.

Given the ability to make that distinction at compile-time, the compiler can choose to do one of two things via overloading:
1) If it's safe to 'blast the bytes' - go ahead and do so. The code can invoke std::memset as an implementation detail of std::fill.
2) If it's not safe to 'blast the bytes', the compiler can instead loop over our range and manually invoke the copy constructor.

This means that std::fill is just as fast as std::memset in all the cases that std::memset is good for, but it will also work in cases where std::memset can't.

For the non-believers, I whipped up an implementation of std::copy that is optimized with std::memcpy whenever it is safe to do so:
Check it out, though be warned: it's a tough read.

You might still be wondering whether your standard library implementers actually make this optimization. I'm here to tell you that they, in fact, do. In particular, I learned this by watching a video by Stephan T Lavavej, Microsoft's standard library maintainer. And here's proof that this optimization is shipping to you in g++.

Lastly, a pseudo-exception:

While the standard algorithms are best suited in most cases, there's one situation where the mem* family is your only option, and you'll find it littered throughout C code.

If you've got a struct, say an 'addrinfo' class from the UNIX sockets API, and you need/want to zero it out before you start working with it, it's common practice to just call:

This zeros out whatever happens to be in the struct. You don't need to know how big it is or what's inside, this zeros everything. std::fill is insufficient here, because we would need an element with the right value to copy into this, and we don't have that.

The key idea here: C doesn't have constructors, which are what should really be zeroing these things out, so instead programmers hack around the language and have to remember to use memset so that struct internals aren't filled with garbage. Yay. *crying*

Conclusions:

Leave the mem* family of functions in the past. They're relics. Stick with the standard algorithms. They work in more cases, they're type-safe, and they are just as fast whenever it'd be possible to call the equivalent mem* function.

The only 'reasonable' case to fall back on the old functions is when you're working with nasty code from the past that doesn't present a clean interface.

Footnotes:
[1] std::memcmp is actually most similar to std::lexicographical compare, in that its return value indicates not only if the two input items are equal, but in the alternative, which is lexicographically first. I'm leaving std::equal at the top because it is more likely to see common use.

Posted by markisaa

Problem

There is a structure type defined as below:

If we want to assign type variable to , we usually have below 3 ways:

Consider above ways, most of programmer won’t use way #1, since it’s so stupid ways compare to other twos, only if we are defining an structure assignment function. So, what’s the difference between way #2 and way #3? And what’s the pitfall of the structure assignment once there is array or pointer member existed? Coming sections maybe helpful for your understanding.

The difference between ‘=’ straight assignment and memcpy

The notation is not only more concise, but also shorter and leaves more optimization opportunities to the compiler. The semantic meaning of = is an assignment, while just copies memory. That’s a huge difference in readability as well, although does the same in this case.

Copying by straight assignment is probably best, since it’s shorter, easier to read, and has a higher level of abstraction. Instead of saying (to the human reader of the code) “copy these bits from here to there”, and requiring the reader to think about the size argument to the copy, you’re just doing a straight assignment (“copy this value from here to here”). There can be no hesitation about whether or not the size is correct.

Consider that, above source code also has pitfall about the pointer alias, it will lead dangling pointer problem (It will be introduced below section). If we use straight structure assignment ‘=’ in C++, we can consider to overload the function, that can dissolve the problem, and the structure assignment usage does not need to do any changes, but structure memcpy does not have such opportunity.

The pitfall of structure assignment:

Beware though, that copying structs that contain pointers to heap-allocated memory can be a bit dangerous, since by doing so you’re aliasing the pointer, and typically making it ambiguous who owns the pointer after the copying operation.

If the structures are of compatible types, yes, you can, with something like:

} The only thing you need to be aware of is that this is a shallow copy. In other words, if you have a pointing to a specific string, both structures will point to the same string.

And changing the contents of one of those string fields (the data that the points to, not the itself) will change the other as well. For these situations a “deep copy” is really the only choice, and that needs to go in a function. If you want a easy copy without having to manually do each field but with the added bonus of non-shallow string copies, use strdup:

This will copy the entire contents of the structure, then deep-copy the string, effectively giving a separate string to each structure. And, if your C implementation doesn’t have a strdup (it’s not part of the ISO standard), you have to allocate new memory for pointer member, and copy the data to memory address.

Example of trap:

Below diagram illustrates above source memory layout, if there is a pointer field member, either the straight assignment or , that will be alias of pointer to point same address. For example, and both points to address of . Once one of them free the pointed address, it will cause another pointer as dangling pointer. It’s dangerous!!

Conclusion

  • Recommend use straight assignment ‘=’ instead of memcpy.
  • If structure has pointer or array member, please consider the pointer alias problem, it will lead dangling pointer once incorrect use. Better way is implement structure assignment function in C, and overload the operator= function in C++.

Reference:

Posted by Eric Zhangprogramming

« Retrospective of my 2012Team Gathering Game »

0 thoughts on “Memcpy Vs Assignment Performance Cycle”

    -->

Leave a Comment

Your email address will not be published. Required fields are marked *