C is a small and relatively simple language when compared to C++, Java and C#.
Still, there’s much more under the C hood than meets the eye!
Writing correct C takes lots of practice and programming experience. But once you master the C language, you open infinite possibilities.
Mastering C is like having computer superpowers. It’s the native language of your operating system, the language your other working language is written in!
Being able to write correct C can cut your hardware and hosting costs, and it yields fast, lean programs.
In this article I’ve compiled a set of C tips and tricks that may help newbies avoid some common pitfalls and, maybe, impress their friends with some cool C foo. It’s a mix of C gotchas, interview questions, trick questions and general tips I’ve compiled over the years.
I hope you learn something new from this post!
You may have read C code that looks like this:
if (3 == three) {
    // do something
}
Newbies are often puzzled by seeing the literal on the left. Why didn’t the coder write three == 3 instead?
First of all, keep in mind the symmetric property holds for equality in the C language.
So 3 == three is equivalent to three == 3.
Why, then, do experienced programmers place the literal to the left? Well, it’s an old trick meant to avoid this:
if (three = 3) {
    // do something
}
Notice anything wrong? This is a very common error due to distraction or mistyping. Here we have an assignment, not a comparison. Variable three is assigned the value 3, and the assignment expression itself evaluates to 3. Since 3 is nonzero, the condition is always true.
By placing the literal on the left, this code wouldn’t compile:
if (3 = three) {
    // do something
}
We get:
error: lvalue required as left operand of assignment
So, to catch this kind of error at compilation, experienced programmers write the literal first.
As you may know, C arrays of type object can be addressed by pointers into a contiguous region of memory, where each pointer increment moves sizeof(object) bytes higher into the address space.
C arrays and pointers are pretty close together (but not exactly the same).
One interesting consequence of the array/pointer relationship is how the square bracket syntax works. The following is how I learned array-to-pointer conversion back in the day:
char a[4] = {'a','b','c','d'};
if (a[2] == *(a + 2)) {
    printf("Matched!\n");
}
Which tells us that the bracket syntax is simply a shortcut for pointer arithmetic. So we get that …
a[b] == *(a + b)
… holds for any array a and offset b.
But is pointer addition commutative, like ordinary addition? Yes, it is!
For any array a and offset b, the following holds:
*(a + b) == *(b + a)
Then, if a[b] is simply syntax sugar for *(a + b), does inverting the array index and pointer work? Again, it does! Try it and see!
#include <stdio.h>
int main(void) {
    char y[32] = {1,2,3,4,5,6,7,8,9};
    // size_t matches the type of sizeof(y), avoiding a signed/unsigned comparison
    for (size_t i = 0; i < sizeof(y); i++) {
        if (i[y] == y[i]) {
            printf("y[%zu] == %zu[y]\n", i, i);
        }
    }
    return 0;
}
This yields some interesting possibilities, especially for obfuscated C contests and trick questions. Inverting the array and index isn’t very intuitive but it works. For example:
printf("Second argument: %s\n", 2[argv]);
Looks kinda weird, but it’s perfectly valid code which prints the program’s second command line parameter (third item on the command line).
realloc Memory Leak
Here’s a fairly common pattern:
char *buf = calloc(1, 256);
/* Do stuff which requires the array to be resized ... */
buf = realloc(buf, 1024);
The above example looks alright (note that, for brevity, I didn’t check the calloc and realloc return values). But there’s a problem!
What if realloc fails? The malloc(3) man page, which also covers realloc, says:
The malloc() and calloc() functions return a pointer to the allocated memory, which is suitably aligned for any built-in type. On error,
these functions return NULL. NULL may also be returned by a successful call to malloc() with a size of zero, or by a successful call to
calloc() with nmemb or size equal to zero.
realloc follows the same convention: on failure it returns NULL and leaves the original block allocated. If we assign that NULL straight to buf, we overwrite the only pointer to the original block, and that memory is leaked.
To avoid this, use a temporary pointer, check the return value and only then assign it to buf. Like so:
char *buf = calloc(1, 256);
/* Do stuff which requires the array to be resized ... */
char *buf2 = realloc(buf, 1024);
if (buf2 == NULL) {
    // handle allocation error
    free(buf); // free the original buffer memory
    return; // abort this subroutine
}
// allocation was successful, copy the pointer
buf = buf2;
An extra pointer is really cheap so, in the unlikely case realloc fails, the above provides a safety net for your buf pointer. This might save the day in mission critical code, especially when programming memory-limited embedded devices.
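The temporary-pointer pattern can also be wrapped in a small helper so that callers cannot forget it. The following is only a minimal sketch: the name grow_buffer and its return convention are my own illustration, not a standard function. Unlike the snippet above, it leaves the original buffer allocated on failure, so the caller decides whether to free it or keep using it.

```c
#include <stdlib.h>

/* Hypothetical helper: resize *buf to new_size bytes.
 * Returns 0 on success, -1 on failure.
 * On failure *buf is left untouched and is still valid. */
int grow_buffer(char **buf, size_t new_size)
{
    /* realloc with size 0 is implementation-defined, so this sketch rejects it */
    if (new_size == 0) {
        return -1;
    }
    char *tmp = realloc(*buf, new_size);
    if (tmp == NULL) {
        return -1;      /* the original block is still owned by the caller */
    }
    *buf = tmp;         /* overwrite the pointer only after success */
    return 0;
}
```

A caller would then write something like: if (grow_buffer(&buf, 1024) != 0) { free(buf); return; }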
As you know, C arrays are simply contiguous regions of memory addressed through pointers. A string is actually just a pointer into a buffer ending in the NUL byte (0x00). A C array of N structs of type X is simply a chunk of memory of size N * sizeof(struct X).
Since this is the case, it makes sense to start counting from zero. The first object in an array has zero offset from the start of the memory region. For example, in a buffer buf, to address the first item, you’d do (0 + buf), which is just buf again. That’s why C arrays are zero-based.
But what if you wanted a 1-based array, which starts counting at 1 like hoooomans do? Maybe you’re writing a compiler for R or Lua and would like your arrays to be 1-based?
Turns out there’s a simple trick for this, and I didn’t come up with it. This example is copied verbatim from Numerical Recipes in C, pg. 18:
float b[4],*bb;
bb = b - 1;
Since you’ve pointed bb one element behind the start of the b array, element zero of b is now located at bb[1], making bb a 1-based array, like in R, Lua and other languages.
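To simulate a 1-based array in C, simply point your buffer one element behind the start of the array. Then when you say fetch element 1, it’ll fetch element 0 automagically. Just remember not to 0-address this array or you might get a core dump! Also keep in mind that the C standard treats merely forming a pointer before the start of an array as undefined behavior, so this trick relies on how common platforms happen to behave.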
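A variant that stays within the array bounds is to allocate one extra element and simply leave slot 0 unused. This is a minimal sketch of that idea; one_based_alloc is a hypothetical name and is not taken from Numerical Recipes.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate storage for n floats addressed as v[1]..v[n].
 * Slot v[0] exists but is unused, so no pointer ever leaves the allocation.
 * Returns NULL on failure; release the result with free(). */
float *one_based_alloc(size_t n)
{
    if (n == SIZE_MAX) {
        return NULL;    /* n + 1 would wrap around to 0 */
    }
    return calloc(n + 1, sizeof(float));
}
```

The cost is one wasted element per array, in exchange for code that never computes an out-of-bounds pointer.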
This is a simple commenting trick extracted from Wikibooks. When testing code and commenting out functions you don’t need right now, add an extra /* before the closing */.
/*
void EventLoop();
void EventLoop();
/**/
The comment section ignores the extra /*, so it doesn’t affect the comment at all. And when you remove the opening /*, the trailing /**/ becomes a complete, empty comment, so the closing delimiter is automagically matched.
As always, the preprocessor is your friend here – and it probably accommodates better structured debug blocks, allowing you to “uncomment” various sections of code by switching just one flag:
#define DEBUG 0
#if DEBUG
DebugSomething();
#endif
// ... later in code
#if DEBUG
DebugSomethingElse();
#endif
By setting DEBUG to 0 in a central header, you compile out all those debug sections at once.
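The same flag can also drive a logging macro, so that individual call sites need no #if at all. The following is a minimal sketch, assuming a C99 compiler; DEBUG_PRINT and sum_with_trace are illustrative names, not part of any library.

```c
#include <stddef.h>
#include <stdio.h>

#ifndef DEBUG
#define DEBUG 0   /* set to 1, or build with -DDEBUG=1, to enable tracing */
#endif

/* Prints to stderr only when DEBUG is nonzero. The do/while(0) wrapper makes
 * the macro behave like a single statement, e.g. inside an unbraced if/else.
 * Call it with a format string plus at least one argument. */
#define DEBUG_PRINT(fmt, ...) \
    do { if (DEBUG) fprintf(stderr, fmt, __VA_ARGS__); } while (0)

/* Example function that traces its progress when DEBUG is enabled. */
int sum_with_trace(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
        DEBUG_PRINT("i=%zu total=%d\n", i, total);
    }
    return total;
}
```

Because the check is a plain if on a constant, the compiler still type-checks the arguments when DEBUG is 0 and can then discard the dead call.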
C gives you a lot of freedom when manipulating memory. In fact it gives you all the tools to shoot your own foot if you wish to. A C programmer can do a lot, but can’t do everything.
For instance, you cannot modify a string literal. C string literals really are immutable.
Example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(){
    char *str = "this is my string here";
    printf("my string = '%s'\n", str);
    // this generates a core dump
    for (int i=0;i<strlen(str);i++) {
        *(str + i) = *(str + i) + 1;
    }
    // this also generates a core dump
    str[0] = 'T';
    printf("modified string = '%s'\n", str);
    return EXIT_SUCCESS;
}
Run the above code and you’ll get a Segmentation fault (core dumped) error under Linux, but results may vary since this is undefined behavior.
Since it’s undefined behavior, compilers don’t have to offer you a contract restricting what they do in such cases. But usually, compilers will place your string literals in the read-only section of the program executable.
For example, the above program, when compiled to x86_64 assembly using gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0, yields this preamble:
.file "str.c"
.text
.section .rodata
.LC0:
.string "this is my string here"
.LC1:
.string "my string = '%s'\n"
.LC2:
.string "modified string = '%s'\n"
.text
.globl main
.type main, @function
main:
[...program continues ...]
Tip: To compile C code to assembly, use the -S switch with gcc, e.g. gcc -S str.c (upper case S).
As you can see, the literal strings are stored in a section called rodata. As you may have guessed from its name, this is a read-only memory section.
Dynamically generated strings are mutable, though. If you {m,c}alloc your memory at runtime, then you can freely modify it.
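To make the failing program above work, copy the literal into writable memory first. The sketch below does that with malloc; shifted_copy is a name I made up for illustration. Declaring char str[] = "..." instead of char *str would also work, because that initializes a writable array from the literal.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of s with every character
 * incremented by one, mirroring the loop in the example above.
 * The caller owns the result and must free() it. Returns NULL on failure. */
char *shifted_copy(const char *s)
{
    size_t len = strlen(s);
    char *copy = malloc(len + 1);      /* +1 for the terminating NUL */
    if (copy == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < len; i++) {
        copy[i] = (char)(s[i] + 1);    /* writable: this memory is ours */
    }
    copy[len] = '\0';
    return copy;
}
```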
Beginning C programmers may be surprised to find out that an array and the address of that array point to the same place.
Normally, the address of a pointer is different from the address stored in it. This is natural, since the pointer is a variable like any other.
C programmers tend to think of arrays and pointers as the same thing, but there’s a subtle difference. When you take the address of a pointer, you usually get something completely different from what it points to. Not so with arrays!
Here’s an example:
#include <stdio.h>
int main() {
    char y[32];
    char *yp = y;
    // cast to void * so the differently typed pointers can be compared
    if ((void *)y == (void *)&y) {
        printf("Array addresses match.\n");
    } else {
        printf("Nope. Array address is different.\n");
    }
    if ((void *)yp == (void *)&yp) {
        printf("Pointer addresses match.\n");
    } else {
        printf("Nope. Pointer address is different.\n");
    }
    return 0;
}
What does this program output? You may be surprised. Try the example and see!
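One way to see the subtle difference between the two is pointer arithmetic: y decays to a pointer to its first element, while &y is a pointer to the whole array, so adding 1 moves each of them by a different amount. A small sketch of that; the function names are hypothetical.

```c
#include <stddef.h>

/* Bytes advanced by (y + 1): one element of the array. */
size_t element_step(void)
{
    char y[32];
    return (size_t)((char *)(y + 1) - (char *)y);
}

/* Bytes advanced by (&y + 1): the whole 32-byte array. */
size_t array_step(void)
{
    char y[32];
    return (size_t)((char *)(&y + 1) - (char *)y);
}
```

Here element_step() returns 1 and array_step() returns 32, the size of the array.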
This is a classic newbie mistake. In C, when a signed integer is compared with an unsigned integer of the same or greater rank, the signed operand is converted to unsigned. This can yield some unexpected results.
For instance:
#include <stdio.h>
// 2020-09-04
// testing the signed integer to unsigned promotion gotcha
int main () {
    int a = -1;
    unsigned int b = 1;
    if (a < sizeof(int)) {
        printf("true\n");
    } else {
        printf("false\n");
    }
    if (a < b) {
        printf("true\n");
    } else {
        printf("false\n");
    }
    return 0;
}
You may be surprised to find out this prints false in both cases.
This is a case of implicit arithmetic conversion, which is arguably C’s biggest can of worms. Here’s the relevant rule from the C11 standard:
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
So the -1 in the above example is implicitly converted to an unsigned value. The conversion wraps modulo 2^N for an N-bit unsigned type, so -1 becomes the largest value that type can hold: a huge size_t in the first comparison and UINT_MAX in the second.
Adding the following two lines to the above program:
unsigned int c = (unsigned int)a;
printf("unsigned value %u\n", c);
Yields this:
unsigned value 4294967295
Which is the large value the signed integer a = -1 is converted to in the comparison with b, assuming a 32-bit unsigned int.
Note that gcc -Wall will not catch this. You need to use the -Wextra or -Wsign-compare flags to get a warning about signed/unsigned comparison:
warning: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’ [-Wsign-compare]
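One way to sidestep the conversion is to test the sign before comparing. A minimal sketch of that; int_less_than_size is a hypothetical helper, not a standard function.

```c
#include <stddef.h>

/* Returns 1 if a < b in the mathematical sense, else 0.
 * A negative a is smaller than any size_t, so handle it before the cast. */
int int_less_than_size(int a, size_t b)
{
    if (a < 0) {
        return 1;
    }
    return (size_t)a < b;   /* a is non-negative here, so the cast keeps its value */
}
```

With this helper, int_less_than_size(-1, sizeof(int)) returns 1, matching the intuitive reading of -1 < sizeof(int).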
Numerical Recipes in C, Second Edition (1992)
Language Gotchas (C section)
SO: What is your favorite C programming trick?