This seems to be a common issue, so let’s talk about the different string types
in the Rust programming language. In this post I’m going to explain the
organization of Rust’s string types with String and str as
examples, then get into the lesser-used string types (CString, CStr, OsString,
OsStr, PathBuf, and Path) and how the
Cow container can make working with
Rust strings easier.
The most important thing to understand is that string types in Rust come in
pairs, which I’ll call the “owned” variety and “slice” variety (the term
“variety” is being used here to mean roughly “set of types”, to make clear that
“owned” and “slice” are not actual Rust types, but a notion of organization for
Rust’s string types). The “owned” variety of strings (String, CString,
OsString, and PathBuf) have
ownership over their contents (hence the name!), and can grow or shrink. The
“slice” variety of strings (str, CStr, OsStr, and
Path) are views into some collection of characters. There’s also the
Cow wrapper type, which can make working with the two varieties of
strings easier while retaining good performance characteristics.
“Owned” vs. “Slice” varieties of Strings
String and str are, by far, the most common string types
in Rust, so I will use them to illustrate the difference between the two
varieties of string types.
String (the “owned” variety of string
type) is a wrapper for a heap-allocated buffer of UTF-8–encoded Unicode
code points. str (the “slice” variety of string type) is a buffer of
UTF-8–encoded Unicode code points that may be on the heap or in the
program memory itself.
When you create a string literal in Rust, it is by default of type
&str, which is a reference to a str buffer. A &str can refer to one of two
places, depending on how that buffer was created.
“Slice” on the Heap
If the reference is taken from a
String, it will be a reference to
the contents of the
String’s internal buffer, which is on the heap.
Handing out these references is a common pattern for effectively using “owned”
variety strings in Rust. For convenience’s sake, all references to “owned”
variety strings coerce to references to their “slice” variety equivalent. That
is, a &String coerces to a &str. It is considered good practice to use the latter
type for function parameters taking a reference to a string.
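Here is a minimal sketch of that coercion. The shout function is a made-up helper for illustration; the point is that a parameter of type &str accepts a borrowed String, a literal, and a sub-slice alike.

```rust
// Hypothetical helper: takes the "slice" variety so every kind of caller can use it.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let owned: String = String::from("hello");

    // &String coerces to &str at the call site.
    let from_owned = shout(&owned);
    // A string literal is already a &str.
    let from_literal = shout("hello");
    // A sub-slice of the String's heap buffer works too.
    let from_slice = shout(&owned[..2]);

    println!("{} {} {}", from_owned, from_literal, from_slice);
}
```

Had shout taken &String instead, the last two calls would need an extra allocation just to satisfy the signature.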
“Slice” in Program Memory
If the buffer is a string literal, with or without an explicit lifetime, then
the buffer will be located in program memory. In fact, all string literals have
'static lifetime by default, meaning they can be safely referenced from
anywhere in the program. Note though that having a buffer with a 'static
lifetime is not the same as declaring a
static variable. A buffer with a
'static lifetime may be safely referenced from anywhere, but must still be
passed to functions explicitly; it is not globally visible. A variable must be
declared with the
static keyword to be globally visible.
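A small sketch of the difference, with names (GREETING, literal, from_global) invented for illustration: the literal has a 'static lifetime but must be handed around explicitly, while the static item can be named from any function.

```rust
// A global declared with the `static` keyword: visible by name everywhere in this module.
static GREETING: &str = "hello from a static";

// Returns a literal. The buffer lives in the program binary, so the
// reference is valid for the 'static lifetime.
fn literal() -> &'static str {
    "hello from a literal"
}

// Reads the global by name; nothing has to be passed in.
fn from_global() -> &'static str {
    GREETING
}

fn main() {
    // The literal's buffer outlives everything, but `local` is still just a
    // local variable: other functions only see it if we pass it to them.
    let local: &'static str = literal();
    println!("{}", local);
    println!("{}", from_global());
}
```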
There is some nuance here though. When returning a
&str from a function without writing out any lifetimes, the
lifetime of the reference must be tied to either a single input lifetime, or
to self in a method taking either &self or
&mut self. In the first case, the lifetime
of the returned reference will be the lifetime of the single input reference.
In the second case, the lifetime will be the lifetime of the reference to
self. If your function does not fit either of these cases,
rustc will return a
“missing lifetime specifier” error.
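To make those rules concrete, here is a sketch of my own (the names first_word, Person, and longer are made up for illustration). The first two signatures fit the elision rules. The third is left commented out because it has two input references and no self, so rustc rejects it.

```rust
// One input reference: the returned &str borrows from `s`.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

struct Person {
    name: String,
}

impl Person {
    // Method taking &self: the returned &str borrows from `self`.
    fn name(&self) -> &str {
        &self.name
    }
}

// Two input references and no `self`: elision cannot choose between them,
// so uncommenting this produces a "missing lifetime specifier" error.
// fn longer(a: &str, b: &str) -> &str {
//     if a.len() >= b.len() { a } else { b }
// }

fn main() {
    let person = Person { name: String::from("Ada Lovelace") };
    println!("{}", first_word(person.name()));
}
```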
You can avoid the lifetime elision problems entirely by explicitly annotating the lifetimes in the function signature.
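Here are a few ways such annotations can look. This is a sketch with hypothetical function names, not an exhaustive list: tie the result to all inputs, tie it to just one input, or return a 'static reference when there are no inputs at all.

```rust
// Tie the result to both inputs with one named lifetime.
fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

// Tie the result to only the first input; `greeting` may be shorter-lived.
fn strip_greeting<'a>(message: &'a str, greeting: &str) -> &'a str {
    message.strip_prefix(greeting).unwrap_or(message)
}

// No inputs at all: the only sensible lifetime for a literal is 'static.
fn default_name() -> &'static str {
    "anonymous"
}

fn main() {
    println!("{}", longer("short", "longer one"));
    println!("{}", strip_greeting("hi bob", "hi "));
    println!("{}", default_name());
}
```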
So what do you do if you want to return a string from a function without
giving it a
'static lifetime, or want to return a string whose contents
aren’t known at compile time? You use the “owned” variety of string type, in
this case String.
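As a quick sketch (greeting is a made-up function), returning a String hands ownership of a freshly built buffer to the caller, so no lifetime has to appear in the signature:

```rust
// The contents depend on `name`, which is only known at run time,
// so the function builds and returns an owned String.
fn greeting(name: &str) -> String {
    format!("Hello, {}!", name)
}

fn main() {
    let message = greeting("Ferris");
    // The caller owns `message` and can keep it for as long as it likes.
    println!("{}", message);
}
```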
Why “Owned” Strings Exist
The “owned” varieties of strings are allocated on the heap, and do not suffer from the same limitations as the “slice” varieties. They may be freely moved around without safety issues, handing out slices of their internal buffer as needed. The “slice” varieties do not have the same guarantees, and so can’t be used as freely.
If you want to avoid dealing with ownership and borrowing for strings, you
can just turn every “slice” variety into an “owned” variety via the
to_owned() method (provided by the ToOwned trait,
which all “slice” varieties of string types implement), and only work with the
“owned” varieties, but doing so would incur performance penalties when you
unnecessarily allocate string buffers on the heap, and would be considered poor
Rust style. Instead, it’s best to develop an understanding of when the two
varieties of strings are needed.
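A rough illustration of the trade-off, using two made-up helpers: count_words only needs to read the text, so it borrows; remember needs to keep the text around, so allocating with to_owned() is justified there.

```rust
// Only reads the text: borrowing is enough, no allocation needed.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

// Stores the text for later: here an owned copy is actually required.
fn remember(storage: &mut Vec<String>, text: &str) {
    storage.push(text.to_owned());
}

fn main() {
    let mut storage = Vec::new();
    let line = "the quick brown fox";

    println!("{} words", count_words(line));
    remember(&mut storage, line);
    println!("{} stored", storage.len());
}
```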
Another important thing to note is that because the “owned” varieties of strings abstract away the underlying buffer, they can grow or shrink, possibly allocating a new underlying buffer and copying their contents to this new buffer. The “slice” varieties of strings cannot be resized, as they may not even be on the heap.
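A short sketch of that resizing, with a hypothetical join_parts helper. Exactly when the buffer is reallocated is up to the standard library, so the code only prints the capacity rather than relying on specific values.

```rust
// Builds one owned String out of several slices, growing as it goes.
fn join_parts(parts: &[&str]) -> String {
    let mut out = String::new();
    for part in parts {
        // May reallocate and copy if the current buffer is too small.
        out.push_str(part);
    }
    out
}

fn main() {
    let mut owned = join_parts(&["own", "ed ", "string"]);
    println!("len {} capacity {}", owned.len(), owned.capacity());

    // Shrinking the contents, then releasing unused buffer space.
    owned.truncate(5);
    owned.shrink_to_fit();
    println!("{:?}", owned);

    // A slice is only a view: there is no push_str on &str.
    let view: &str = &owned;
    println!("{}", view.len());
}
```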
The “slice” variety strings can only be accessed via what’s called a “fat pointer.” This is because slices are “dynamically-sized types,” meaning their size is not known at compile time and the data itself does not record its own length. They are simply some collection of contiguous memory. A “fat pointer” to a slice stores both a pointer to the memory in question and the length of the data stored at that memory location. This is all handled automatically by Rust, but it means that the “slice” variety of strings are interacted with via references, rather than being handled directly. For more detail about dynamically-sized types, check out “The Rustonomicon,” which covers them in detail.
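You can observe the fat pointer directly by comparing sizes. This is a small sketch of my own: a reference to a sized type is one machine word, while a &str is two (pointer plus length).

```rust
use std::mem::size_of;

fn main() {
    // A "thin" pointer to a sized type: one machine word.
    println!("&u8:  {} bytes", size_of::<&u8>());
    // A "fat" pointer to a str: pointer plus length, two machine words.
    println!("&str: {} bytes", size_of::<&str>());

    // The two halves of the fat pointer are available separately.
    let s: &str = "hello";
    println!("data at {:p}, length {}", s.as_ptr(), s.len());
}
```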
The String Type Pairs
The pairs of string types are differentiated from each other by the guarantees provided by the underlying buffer.
- String and str are guaranteed to be valid UTF-8 encoded Unicode strings. If you’re wondering why UTF-8 is the standard encoding for Rust strings, check out the Rust FAQ’s answer to that question.
- CString and CStr are guaranteed to be compatible with C strings, and are usually used in FFI code.
- OsString and OsStr are guaranteed to be platform-native strings (that is, they can hold any string the current platform can produce). Rust strings convert into them cheaply, and they convert back into String and str types when their contents are valid UTF-8. They are usually used when interacting directly with the operating system.
- PathBuf and Path are wrappers around OsString and OsStr that provide convenient methods for operating on paths according to the rules of the current system. They are usually used when interacting with system paths.
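Here is a brief tour of the four pairs, as a sketch using only standard library calls. The example values ("hello", "dir/notes.txt") are arbitrary.

```rust
use std::ffi::{CStr, CString, OsStr, OsString};
use std::path::{Path, PathBuf};

fn main() {
    // UTF-8 pair.
    let owned = String::from("hello");
    let slice: &str = &owned;
    assert_eq!(slice.len(), 5);

    // C-compatible pair: NUL-terminated, with no interior NUL bytes.
    let c_owned = CString::new("hello").expect("input has no interior NUL bytes");
    let c_slice: &CStr = c_owned.as_c_str();
    assert_eq!(c_slice.to_bytes_with_nul(), &b"hello\0"[..]);

    // Platform-native pair: converting back to &str can fail, hence the Option.
    let os_owned = OsString::from("hello");
    let os_slice: &OsStr = os_owned.as_os_str();
    assert_eq!(os_slice.to_str(), Some("hello"));

    // Path pair: path-aware methods on top of OsString/OsStr.
    let path_owned = PathBuf::from("dir/notes.txt");
    let path_slice: &Path = path_owned.as_path();
    assert_eq!(path_slice.extension(), Some(OsStr::new("txt")));
}
```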
There are times when working with strings in Rust that you want to cleanly
abstract over the two varieties of string types. Maybe you want a container that
may hold either “owned” variety or “slice” variety strings, or you want to use
slices except for cases where an “owned” variety is absolutely necessary. In
these situations, use Cow.
Cow (which stands for “Clone on Write”) is a container that can take in
a “slice” variety of string type, and only convert that “slice” variety into an
“owned” variety when absolutely necessary (when something attempts to write to
the slice). This has the advantage of reducing the complexity of reasoning about
ownership, while also helping keep unnecessary allocations down. It is not a
panacea, but it can be very helpful sometimes.
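A minimal sketch of Cow in action. The expand_tabs helper is hypothetical: it returns the input untouched (borrowed) when there is nothing to change, and only allocates when a tab actually has to be replaced.

```rust
use std::borrow::Cow;

// Hypothetical helper: replaces tabs with four spaces, allocating only if needed.
fn expand_tabs(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        Cow::Owned(input.replace('\t', "    "))
    } else {
        Cow::Borrowed(input)
    }
}

fn main() {
    // No tabs: still a view of the original slice, no allocation.
    let untouched = expand_tabs("no tabs here");
    println!("borrowed: {}", matches!(untouched, Cow::Borrowed(_)));

    // Contains a tab: an owned String was created.
    let changed = expand_tabs("a\tb");
    println!("owned: {}", matches!(changed, Cow::Owned(_)));

    // Writing through a borrowed Cow clones it into an owned String first.
    let mut text: Cow<'_, str> = Cow::Borrowed("abc");
    text.to_mut().push('d');
    println!("{}", text);
}
```

Callers that only read the result can treat both cases the same way, since Cow<str> dereferences to str.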
That was a relatively quick explanation of string types in Rust. Hopefully that helped to clarify a bit of why the different string types exist, and why they are useful. As a final reference, here is a table of the different types:
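| Guarantees | “Slice” variety | “Owned” variety |
|---|---|---|
| Valid UTF-8 | str | String |
| C-compatible | CStr | CString |
| Platform-native | OsStr | OsString |
| System path | Path | PathBuf |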
- This post has been updated based on corrections from CryZe92 and Ilogiq on the Rust subreddit. Thanks to both of them for the help!
- Replaced the word “sort” with “variety” based on feedback from rkangel on Hacker News.
- Clarified the wording of the unicode explanation, based on feedback from nothrabannosir, also on Hacker News.
- Fixed a typo based on feedback from respeccing on Twitter.
- Reworded the transition into the explanation of “owned” string types, based on feedback from bjzaba from the Rust subreddit.
- Replaced “three” with “two,” fixing an omission from earlier changes.