Structs and Unions
May 30, 2016C has a variety of built in data
types. But let's pick just two to
mess around with: a long long int and a char.
A long long int (or just long long) is specified to be at least 64 bits
in size. That means that if we call sizeof on it, we should get at least 8
bytes, since 8 * 8 = 64.
#include <stdio.h>
int main() {
long long x;
printf("%lu", sizeof(x));
return 0;
}
Sure enough, this outputs:
8
A char is, by its name, one 'character'... but this is a bit misleading. In
ye olden days, a character was never more than one byte long, since the ascii
standard specified only a very small set of
symbols, and they could all be accomodated in a single byte. Nowadays, a more standard
encoding is Unicode, in which
many thousands of symbols can be represented by strings of bytes of somewhat
arbitrary length. It is no longer accurate, then, to say that a char, in the
sense of "character", is only one byte long, but the naming convention persists
in C, and the char type is always one byte long. It would
be drastically more accurate to call that type a byte, but here we are.
#include <stdio.h>
int main() {
char x;
printf("%lu", sizeof(x));
return 0;
}
yields:
1
Structs
A struct is a way for the programmer to bundle types together in one
structure. Let's say we wanted to represent a point in two dimensional space,
for example. A point is fully articulated when we have both an x value and a y
value. A Point struct might look like this:
struct Point {
int x;
int y;
};
We coud then use struct Point to declare a point variable just like any other type.
struct Point x;
How big is x, now, do you think? a struct Point contains two ints. How big is an int?
#include <stdio.h>
int main() {
int x;
printf("%lu", sizeof(x));
return 0;
}
4
Looks like it is four bytes long. A struct Point, then, should be 8 bytes long, right?
#include <stdio.h>
struct Point {
int x;
int y;
};
int main() {
struct Point x;
printf("%lu", sizeof(x));
return 0;
}
And indeed it is.
8
What about a point in 5 dimensional space? We would need 5 ints to fully
specify such a point, right?
#include <stdio.h>
struct Point {
int x;
int y;
int z;
int a;
int b;
};
int main() {
struct Point x;
printf("%lu", sizeof(x));
return 0;
}
If you're guessing this is 20 bytes long, well pat yourself on the back!
20
Normally, when we declare a var, we can simply assign a value to it like this:
int x;
x = 5;
Or even initialize it at the time of declaration:
int x = 6;
For structs, we can do the same type of thing! Let's look at that 2d point again.
struct Point {
int x;
int y;
};
Those x and y values are referred to as members of that struct. To access
them for reading or writing, we can use dot notation.
struct Point mypoint;
mypoint.x = 2;
mypoint.y = 5;
Now, mypoint is fully initialized, and equal to the point (2, 5) in regular
notation. You could do this in any order.
What about initializing the struct at the time of declaration, in one line? That looks like this, with an inlined static array:
struct Point mypoint = { 2, 5 };
If you simply list the member values in order like that, it will work, but you can also specify which one is which by being explicit:
struct Point mypoint = { .y = 5, .x = 2 };
Let's go back to those two types from the beginning. What if we wanted a
struct that contained one of each?
struct Thingy {
char letter;
long long number;
};
Who knows why we would need that, but whatever. The size of this struct in
memory is going to be equal to the sum of its members' sizes. So a char (1
byte) plus a long long (8 bytes), so 9 bytes.
#include <stdio.h>
struct Thingy {
char letter;
long long number;
};
int main() {
struct Thingy x;
printf("%lu", sizeof(x));
return 0;
}
Outputs:
16
Wait huh? This actually surprised me when I was writing this post! Because a
long long is 8 bytes long, the char member must be offset by the same
distance for optimizations! The 'true size' of the letter member is still
just one byte, but the space the member must take up in memory is now 8 bytes.
I do not fully understand this yet, but it looks like this
document explains the historical
context and reasoning. Interesting! There appears to be some black
magic
that will force the compiler to do that without padding, but in the
absence of a very good reason, this seems unneccesary.
Unions
union types look and behave an awful lot like structs do, syntactically, but
there is a very important difference. Whereas structs are a collection of
members that are assembled together in memory side by side, unions can only ever contain
one of their members at any one time. As such, a union type will be the size of its
largest member, in order to accomodate the biggest thing it will ever need to hold.
union Thingy {
int x;
int y;
};
Thingy looks like it has two ints inside of it. If this were a struct, it
would, and it would need to be as big as two ints to hold both of them! But
since this is a union, it will only ever need to hold one or the other.
int main() {
union Thingy myunion;
printf("%lu", sizeof(myunion));
return 0;
}
Will output: 4! In fact, so will this:
union Thingy {
int x;
int y;
int a;
int b;
int c;
int d;
int e;
int f;
};
int main() {
union Thingy myunion;
printf("%lu", sizeof(myunion));
return 0;
}
You get the idea!
Let's look at a weird thing about unions!
#include <stdio.h>
union Thingy {
int x;
int y;
};
int main() {
union Thingy myunion = { 156 };
printf("%i\n", myunion.x);
printf("%i\n", myunion.y);
return 0;
}
Outputs:
156
156
Because myunion.x and myunion.y are referencing the exact same memory
space. Because they are both ints, this is fine!
How about this one?
union Thingy {
char letter;
long long number;
};
int main() {
union Thingy myunion;
printf("%lu", sizeof(myunion));
return 0;
}
No surprise this time! Because a long long is by far the largest member of
this union, an instance of this type will have a size that is the same as the
long long, which is 8.
8
I will leave you with this bit of weirdness.
#include <stdio.h>
union Thingy {
char letter;
long long number;
};
int main() {
union Thingy myunion;
myunion = (union Thingy){ .letter = '!' };
printf("%c\n", myunion.letter); // !
printf("%lli\n", myunion.number); // 33
myunion = (union Thingy){ .number = 33 };
printf("%c\n", myunion.letter); // !
printf("%lli\n", myunion.number); // 33
return 0;
}
All of these are just different ways to organize and get at the exact same data. In every case above, the same memory is being assigned the same values, but we're accessing it and interpreting it in different ways.