Interview question - About arrays

I had an interview for a job, and one question they asked was how to do something in a language I’m comfortable with.

So I told them what language would they like for me to give them a solution, I said I can do it in Java, PHP, C/C++, …, and they said that it’s not a problem, they understand everything, doesn’t mater which language it is written in, if you want to know the difference read bellow in the PHP section.

The job I was applying was for a scripting language Perl (the developers that interviewed me had experience with Perl), so in this case they said I can do it in PHP because its a scripting language too.

They wanted me to remove an item from an array, and to tell them how would I do it in the said language.

The problem is that they don’t understand the representation of array in PHP is completely different than any other language, but we will get into that in a little while.

Before we proceed lets see a little about the array data structure, and the internal workings of the array.

Array Data Structure

Arrays is a basic data structure representing a group of similar elements, accessed by an index, it can be stored effectively and accessed quite fast to the all of the elements.

An array doesn’t have an overhead per element. and any element of the array can be accessed at O(1) time by it’s index.

On the other hand, the array structure is not dynamic. Most programming languages provide a way to allocate arrays with arbitrary size (dynamically allocated arrays), the problem with that is when the space is used up, a new array will have to be generated and the old data copied into it.

The insertion and deletion of an element in an array requires to shift O(n) elements on average.

Static and dynamically allocated arrays

There are two types of arrays, and both of them are implemented differently.

Static array has a constant size and they are defined and present any time during runtime of the application.

But the dynamic array is created during runtime and may be deleted, when it’s not needed any more, and they can be quite big, even bigger than the amount of physical memory, even though they are dynamic they can’t be resized. But you can expand it by:

  • Creating a new array of bigger size
  • Copying data from old one to new one
  • Free the memory of the old one
Dynamic arrays

The problem when working with an array data structure is that its size cannot be changed during program run.

Internal representation

So how can we do this then ?

The application allocates memory and divides it into two parts, one part contains the data and the other one contains the free space.

In the beginning all space is free space. As you add items to the array, it will fill up, and when it has no space it will expand by creating a new larger array and copying the old contents to the new location.

  • storage: dynamically allocated space to store data
  • capacity: size of the storage
  • size: real size of the data

If you are worried about speed don’t be because copying the arrays is done on the hardware level and can be done very efficiently.

Dynamic Array

PHP

First of all, the arrays in PHP are not like traditional arrays, and everyone who comes to PHP or goes from PHP to other languages, are confused because arrays in PHP don’t work as expected, especially for people coming from Javascript, Java, Perl, C/C++, C#.

If you look at the syntax it is similar to other languages, so let’s take a look at a simple implementation of an array in PHP.

<?php
$element = array();
for ($i = 7; $i >= 0; --$i) {
  $element[$i] = $i;
}

print implode(' ', $element);

You would expect the following to be outputted:

0 1 2 3 4 5 6 7

But in reality, it will output

7 6 5 4 3 2 1 0

Why? Because it’s not an array, it’s an ordered map, which is something I pointed out to the interviewers, so I proceeded to explain to them how this works, as I was asked to do so in this language, so I should explain properly how this is handled in the language in question, and was met with skepticism, and doubt.

<?php
$element = array();
for ($i = 7; $i >= 0; --$i) {
  $element[$i] = $i;
}

unset($element[4]);

$element[4] = 4;

print implode(' ', $element);

In this case the result would be:

7 6 5 3 2 1 0 4

So what actually happened, because this is not an array and it’s an ordered map the result is not what you would expect.

If you actually want the keys to be sorted you will need to use ksort function.

The real index ordering is inaccessible to us, if you actually write this in JavaScript, Python, Ruby, Perl, Java, or C# you will get:

0 1 2 3 4 5 6 7

So the reason why is this happening is because this is an ordered map, but lets actually see what goes under the hood in PHP.

Basically everything in PHP is a HASH TABLE, not only the implementation of PHP arrays, but also they are used to store object properties, methods, functions, variables, … (everything)

And because the hash table is so fundamental to PHP, you should know what it actually is, before proceeding with PHP, at least that is what I think.

Any variable in PHP is defined by zval, and it’s value by zval_value, lets see the data structures.

These are taken from the PHP source code.

PHP_5_5/Zend/zend_hash.h#53

typedef struct _hashtable {
	uint nTableSize;
	uint nTableMask;
	uint nNumOfElements;
	ulong nNextFreeElement;
	Bucket *pInternalPointer;	/* Used for element traversal */
	Bucket *pListHead;
	Bucket *pListTail;
	Bucket **arBuckets;
	dtor_func_t pDestructor;
	zend_bool persistent;
	unsigned char nApplyCount;
	zend_bool bApplyProtection;
#if ZEND_DEBUG
	int inconsistent;
#endif
} HashTable;

PHP_5_5/Zend/zend.h#_zval_struct

struct _zval_struct {
    /* Variable information */
    zvalue_value value;     /* value */
    zend_uint refcount__gc;
    zend_uchar type;    /* active type */
    zend_uchar is_ref__gc;
};

PHP_5_5/Zend/zend.h#_zvalue_value

typedef union _zvalue_value {
    long lval;                  /* long value */
    double dval;                /* double value */
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;              /* hash table value */
    zend_object_value obj;
} zvalue_value;

Just a simple explanation, for those of you that don’t know C, you see the union there, that basically means that this is not actually a structure, but a single type.

In C, variables are just labels for raw memory address.

As you can see in the data structures above, we keep track of what primitive type is the value.

I won’t delve deep into explaining this, it will take quite a while to do so, and maybe I will do so in a future article.

If we actually have this code

<?php
$a = 1;
$b =& $a;
unset($a); 

In this case $a will be unset, but $b will still remain the same with the value 1.

Now let’s see this too

<?php
$a = 1;
$b = &$a;
$a = 2;
echo "$a = $b";

In the previous case when you unset, the value actually remains in $b, but in this case, if we change $a, $b will also change.

Well we strayed quite a bit from arrays, so let’s finish it here for PHP.

Java

In the heap (not on the stack), the array is a block of memory made of all the elements together, each element in the arrays is identified by a zero based index [0,1,2,3,….].

There are ways of putting objects and arrays onto the stack instead, it’s called Escape analysis, you can read some info on this link

The Java language does not offer any way to explicitly allocate an object on the stack, but this fact doesn’t prevent JVMs from still using stack allocation where appropriate. JVMs can use a technique called escape analysis, by which they can tell that certain objects remain confined to a single thread for their entire lifetime, and that lifetime is bounded by the lifetime of a given stack frame. Such objects can be safely allocated on the stack instead of the heap. Even better, for small objects, the JVM can optimize away the allocation entirely and simply hoist the object’s fields into registers.

The length of an array in Java cannot be changed if created with new, let’s take a look at this example.

type arr[]; // declare the array
arr = new type[10]; // will initialize the array to 10 elements, and store the pointer

type can be a primitive or an object.

Lets see the previous example in PHP, with primitives without using Collections, or Lists.

public class Test {

    public static void main(String args[]) {
        
		int[] arr = new int[8];
		
		for(int index = arr.length - 1; index >= 0; index--) {
			arr[index] = index;
		}
		
		for(int index = 0; index < arr.length; index++) {
			System.out.print(arr[index]);
		}    
    }

}

The output will be

0 1 2 3 4 5 6 7

Even though we inserted them in reverse the order is still correct.

If you want to remove an array, you need to follow create a new array and copy it into the old one, while removing the element you don’t want.

You can use System.copyarray to do so

C

Arrays are allocated as a fixed number of contiguous elements, and there is no way to actually remove the memory used by an individual element in the array, but the elements can be shifted to fill in the hole made by the removal of the element.

As you know from above that statical arrays can not be resized, but dynamically allocated ones can be resized with realloc(), and what this will do is move the array to a new memory location, that can accomodate the whole array.

Javascript

In javascript the arrays can be sparse arrays and dense array. Javascript arrays are dynamic. So any implementation has to account for variable length of the array.

The Javascript Array wraps a simple C array allocated with a fixed number of elements. (a flat array) The elements in the C array are resized everytime access to a location beyond the current maximum length is required. Hence insertion could be expensive.

When the runtime detects that the array is Sparse, it is implemented in a similar way to an object. So instead of maintaining a contiguous array, a key/value map is built. In case of an array since the indexes are integers, this map can be efficiently implemented using a balanced Binary Search Tree or a similar structure. The performance of a sparse array is likely similar to the performance of an object.

The arrays are initialized with Flat arrays and are then later replaced with Sparse array map if necessary.

Even with the above general ideas, there is still so much leeway for browsers to tweak their implementations – the heuristic for deciding if the array is sparse/dense, internal data structures used etc.

The key lessons though, from understanding these differences are:

  1. Never use an object, when you need an array
  2. Never initialize arrays backwards (this will force create sparse array which is less efficient than dense arrays)

Lets look at some examples

  • Sparse arrays

When you define the length of the array, you are not actually allocating the entries up front, but you are just allocates a slot for all the entries.

So if we do this

var arr = new Array(10);
arr.forEach(function(v, k) {
 console.log(v, k);
});

The array is actually

[undefined × 10]

The keys are allocated but with no value, and the for cycle will actually not output anything.

  • Dense arrays
var arr = Array.apply(null, Array(10));
arr.forEach(function(v, k) {
 console.log(v, k);
});

The array now is

[undefined, undefined, undefined, undefined, undefined, undefined, undefined, undefined, undefined, undefined]

And as you can see now the array is now defined.

One thing that we can use with this is

Array.apply(null, Array(10)).map(function (x,i) { return i })

And this will create for you an array of items

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Lets see the example from up, and compare it with PHP.

arr = [];
for(i=7; i >= 0; i--) {
   arr[i]=i;
} 
console.log(arr);

And the output will be

0 1 2 3 4 5 6 7

Ending

Well I think it’s enough to describe some of the intricacies of arrays between programing languages, we can probably go into more detail, but it should be sufficient.

Well this turned out to be quite long.

If you made it to the end, leave a comment, or your thoughts on this.