Okay, but what if you're dealing with a rather high-dimensional tensor? In some domains that happens, and you usually don't want to sacrifice performance when it does.
You can do increasingly elaborate pointer arithmetic instead, but to me that seems worse, not better.
Me: building a fluent interface framework...
I already support a WrapperOf<T, T, T, T>
User: Can I have a WrapperOf<T, T, T, T, T> because I'm doing something weird?
Me: *sigh* god-damnit. You're right but I still hate it.
I've been a four-star programmer a few times. Imagine a blocked symmetric matrix where the rows and columns are indexed by triples (u,v,w). The entries are zero whenever u != u' or v != v', and because of symmetry you only store entries with w <= w'. But the range of v depends on the value of u, and the range of w on the value of v. So you do
double ****mat = calloc (UMAX, sizeof (*mat));
for (int u = 0; u < UMAX; ++u) {
    mat[u] = calloc (u + 1, sizeof (**mat));
    for (int v = 0; v <= u; ++v) {
        mat[u][v] = calloc (v + 1, sizeof (***mat));
        for (int w = 0; w <= v; ++w) {
            mat[u][v][w] = calloc (w + 1, sizeof (****mat));
            for (int ww = 0; ww <= w; ++ww)
                mat[u][v][w][ww] = some_function (u, v, w, ww);
        }
    }
}
and weep a little. In reality this gets optimized a bit by allocating a single chunk of memory and carving it up into the pointer and data arrays, so everything sits reasonably close together in memory.
If you do it with fixed-size arrays you can get multiple dimensions out of a plain int*. Lots of index arithmetic needed, though. Probably still faster than n levels of indirection.