Computer programs at their core perform three primary functions: they accept data from an input source, process and transform that data, and then send the result in some format to an output destination. In this chapter, we will discuss how a program processes and transforms data.
There will be points in this chapter where we come close to discussing how compilers and interpreters work; we will keep the discussion as high level as possible to avoid that potential rabbit hole. You do not need to know how they work to understand this chapter, and it is not something you will be required to understand until much later on.
Variables and Data Types
Variables
To process the data that the program receives, you will often need to store it while you work on it. Storing data within the program is done using variables. A variable is a custom name that you give to a section of memory; using that name, you can read from and write to the memory location as needed.
Assigning Variables
Giving a value to a variable is usually called assignment (or assigning). Variable assignment is achieved using a syntax that looks similar to this:
name = "Jane Doe"
In this example, you have created a variable called name and given it the value "Jane Doe". The variable on the left-hand side of the = is assigned the value of whatever is on the right-hand side of the =. We can more precisely define the syntax as <variable_name> = <expression>.
You may have noticed that this definition says <expression> rather than <value> for the right-hand side of the =. Instead of only being able to assign pre-defined values to variables, we can assign any expression that can be evaluated (we will discuss this later in this chapter when we discuss evaluation, but for now, consider evaluation to be the same as 'working out').
In the previous syntax example, the computer will loosely translate the syntax to:
- Create a memory location that can store "Jane Doe"
- Put the value "Jane Doe" in that memory location
- Remember that the variable name links to this memory location
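As a minimal sketch of the syntax above, here it is in Python (chosen only for illustration; the variable names total and greeting_length are hypothetical and not part of the original example). The right-hand side can be a plain value or any expression that the language can work out:

```python
# Assign a pre-defined value: <variable_name> = <expression>
name = "Jane Doe"

# The right-hand side can also be an expression that is worked out first.
total = 2 + 3                 # evaluated to 5, then stored
greeting_length = len(name)   # evaluated to 8, then stored

print(name, total, greeting_length)
```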
Accessing Variables
To make use of the variable we have just created, we must be able to read from it. When a variable appears in an <expression>, the computer interprets this as a command to substitute the value of the variable at that point in the expression. Let's look at an example:
name = "Jane Doe"
userName = name
If we isolate the last line, we can see that it follows the previously defined syntax of <variable_name> = <expression>. Combining that with our knowledge that a variable appearing in an expression is replaced with its value, we can determine that the computer takes the following steps:
- Create a memory location that can store "Jane Doe"
- Put the value "Jane Doe" in that memory location
- Remember that the variable name links to this memory location
- Replace name on line 2 with the data stored in the location called name
- Create a memory location that can store the "Jane Doe" we have just retrieved from name
- Put the value "Jane Doe" in that memory location
- Remember that the variable userName links to this new memory location
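A minimal Python sketch of the same two lines follows (user_name is simply the Python-style spelling of userName). As an aside that goes slightly beyond the steps above, Python happens to let both names share one stored string rather than copying it, but the behaviour you can observe is the same: changing name afterwards does not change user_name.

```python
name = "Jane Doe"
user_name = name      # name is replaced by its value, which is then assigned

# Re-assigning name later links it to a different value;
# user_name still holds what it was given.
name = "John Smith"

print(user_name)
```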
Combining Assignment and Access
To further illustrate how assignment evaluates the expression on the right-hand side of the =, we will look at a slightly more advanced example:
firstName = "Jane"
lastName = "Doe"
name = firstName + " " + lastName
In this example, the computer will follow these steps:
- Create a memory location that can store "Jane"
- Put the value "Jane" in that memory location
- Remember that the variable firstName links to this memory location
- Create a memory location that can store "Doe"
- Put the value "Doe" in that memory location
- Remember that the variable lastName links to this memory location
- Replace the variable firstName in the expression on line 3 with "Jane"
- Replace the variable lastName in the expression on line 3 with "Doe"
- Evaluate the expression on the right-hand side of the = on line 3 to combine the strings, resulting in "Jane Doe"
- Create a memory location that can store "Jane Doe"
- Put the value "Jane Doe" in that memory location
- Remember that the variable name links to this memory location
As you can see, this example saves data, accesses it, combines it and then saves the new combined value to a new variable. This process of accessing variables and then creating new ones from the result of an expression is one of the core processes a program performs.
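The steps above can be sketched in Python as follows. The intermediate variable substituted is hypothetical and exists only to make the substitution step visible; a real program would not normally write it out.

```python
first_name = "Jane"
last_name = "Doe"

# What the expression looks like once the variables are replaced by their values.
substituted = "Jane" + " " + "Doe"

# The actual assignment: evaluate the right-hand side, then store the result.
name = first_name + " " + last_name

print(name)
```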
Data Types
When we store data in a computer, we are storing a combination of 1s and 0s which, both to an outside observer and to our program, has very little meaning without additional context. For a program to use the data in a meaningful way, it must first understand what the data represents. This context is usually provided through data types.
We use data types to describe what the data represents and, more importantly, in what format it is stored. Data types come in many different forms, and as you progress, we will start creating custom type definitions, but there are a few primitive types that nearly all programming languages use:
- Strings: Any form of text, of any size
- Character: Any single character that might appear in a String
- Integers: Any whole number
- Floating Points: Any number that has a decimal place
- Boolean: Either "True" or "False"
- Arrays: Used when you want to store a collection of data; for example, this might be the test results of students in a classroom.
- Objects: These are more complex and are used to combine several other data types into a single variable; for example, you might use this to store all the information about a student in a Student type.
This list is not exhaustive. There are many other data types that may be considered primitive depending on the language you are using, but many of them are extensions of these types that can store more or less precise data. Many languages, for example, have a primitive type called double, which behaves very much like a float; the primary difference is that it is twice the size of a float in memory, which allows it to hold roughly double the number of significant digits.
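To make these types concrete, here is a small sketch in Python. The Student class and all the values are hypothetical. Note two Python-specific details: it has no separate character type (a single character is just a string of length one), and its float is already a double-precision number.

```python
from dataclasses import dataclass

@dataclass
class Student:
    # An object type that combines several other data types.
    name: str
    age: int
    results: list

text = "Jane Doe"            # string
initial = "J"                # a character (a length-one string in Python)
age = 21                     # integer
average = 72.5               # floating point
enrolled = True              # boolean
results = [68, 75, 74.5]     # array (called a list in Python)
student = Student(text, age, results)   # object

for value in (text, initial, age, average, enrolled, results, student):
    print(type(value).__name__, value)
```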
Computers use data types in many different ways; one of the most notable uses is during a process called type analysis.
Type Analysis
Type analysis takes place whenever we attempt to use two values together, for example, in an addition operation. Many rules govern which operations can be used with which types and how those types will interact. Some programming languages perform this analysis before running the program, whereas others perform it at runtime. The analysis is also performed when you assign to a variable: if the variable has been defined with a type and you attempt to assign a value of a different type to it, you will most likely get an error.
Numerics
Let's start with the numerical data types, as they seem the easiest. It seems logical that performing any operation on two numerical types should result in a data type that represents the highest precision available. For example, if you have the expression <integer> + <float> you might expect the result to be a <float>. This is true of many languages, but in some others the result will always be the same type as the first value, so in our example the result would be an <integer>. This is one of the many quirks that all languages exhibit in some way or another, and it is something to be aware of should you not get the result you expect.
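As one concrete data point, here is how Python (used here only as an illustration) handles mixed numeric types. It follows the first behaviour described above, widening the result to a float; other languages may differ.

```python
whole = 3          # integer
fraction = 0.5     # float

mixed = whole + fraction      # integer + float
print(type(mixed).__name__, mixed)

# Two integers stay an integer under addition...
print(type(whole + 2).__name__)
# ...but Python's / operator always produces a float, even for two integers.
print(type(6 / 3).__name__)
# Converting back to an integer must be requested explicitly and drops the fraction.
print(int(mixed))
```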
Strings
As with numerical types, you might expect the operations around string types to be relatively simple. In some ways, they are the most easily defined, as combining a <string> with another value will always result in a <string>. Defining what a <string> can be used with is more complicated and will depend heavily on the programming language you have elected to use. Some languages will attempt to coerce all non-string types to a <string> variant, whereas others will throw an error.
An example of coercion would be when you have the following expression:
x = 20 + " Apples"
In this example, the computer might see that you are attempting to combine an <integer> and a <string> and attempt to coerce the <integer> into a <string> representation of the number. A similar process might occur for many different combinations of data types with strings, and some languages allow this process to occur on non-primitive types.
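Python is an example of a language that refuses to coerce here, so it makes a useful sketch of both outcomes: the mixed expression raises an error, and the conversion has to be written explicitly. The variable names are hypothetical.

```python
count = 20

try:
    x = count + " Apples"        # integer + string: Python will not coerce
except TypeError as error:
    print("Rejected:", error)

# Doing the conversion ourselves gives the result coercion would have produced.
x = str(count) + " Apples"
print(x)
```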
Booleans, Arrays, Objects and more complex types
In many programming languages, there are few, if any, built-in operations that can be performed on these more complex data types, as the behaviour could be considered "undefined" and is therefore left to the programmer to implement.
Null
Null values are an interesting result of the way many programming languages are designed. In short, a null value represents a variable that contains nothing; you cannot perform any operations with it, and attempting to do so will often result in errors. The most common way to end up with a null value is to have a variable that you have not yet initialised, for example:
String name;
We have declared a variable called name, but we have not assigned a value to it yet; therefore, it has the value of null. We can also assign a variable the value of null if we want to; this might be useful in situations where you have tried to compute a value and the result is undefined but does not throw an error. Be aware that many people, including me, consider this bad practice for the most part and will avoid it. There are also a few languages that do not support null; these are called null/void safe languages. You can find out more about them here.
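Python's version of null is called None, and it can illustrate both points: operating on it fails, and a function can hand it back when there is no meaningful result. The function safe_divide below is hypothetical, and returning None like this is exactly the debatable practice mentioned above.

```python
def safe_divide(numerator, denominator):
    # Hypothetical helper: returns None instead of raising when the result is undefined.
    if denominator == 0:
        return None
    return numerator / denominator

name = None                      # a variable that contains nothing

try:
    greeting = "Hello " + name   # operating on None fails
except TypeError as error:
    print("Cannot use a null value:", error)

result = safe_divide(10, 0)
if result is None:               # callers must remember to check
    print("No result")
else:
    print(result)
```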
Flow Control
Controlling the flow of data through our program is essential if we want to perform any complex logic. All forms of this control are presented as logic statements that are either true or false, and based on that result, the program either does or does not do something. Before we can begin discussing the various control methods, we must first discuss how a program evaluates a logic statement. I will not demonstrate the various control structures with complicated examples here; once you understand the format, they will be much more intuitive when we start to use them in later examples.
Evaluation
We have previously touched upon evaluation and how it is used when assigning variables. The evaluation process is the same wherever it is used: it takes a series of values, variables and operators and uses them to compute a final value for the expression. When performing this evaluation, the computer will obey all the data type rules and other operational rules (such as the order of operations in mathematical expressions).
When an expression is evaluated for use in a control statement, the result must be coerced into a boolean type. Coercion to booleans can differ significantly between programming languages, and if you're not getting the result you expect, this may be the cause. You are far better off handling the logic that reduces complex data types to booleans yourself. For example, some languages consider an empty string to be False when coerced to a boolean; it is far safer, though, to write the expression as string.length > 0.
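Python is one of the languages that treats an empty string as False, so it can serve as a sketch of the difference between relying on coercion and spelling the test out. In Python the explicit form of string.length > 0 is len(text) > 0; the function names are hypothetical.

```python
def has_text_implicit(text):
    # Relies on the language coercing the string to a boolean.
    return bool(text)

def has_text_explicit(text):
    # States the intended rule directly.
    return len(text) > 0

for sample in ["", "Jane", " "]:
    print(repr(sample), has_text_implicit(sample), has_text_explicit(sample))
```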
Equality
You will often need to check whether two variables hold the same value. For this, most languages have an equality operator, ==. This can be read as 'is equal to' and results in a boolean. The behaviour of this operator can differ significantly depending on your language of choice, and it is worth understanding how your language uses it.
For example, in JavaScript == checks that two values are equal, but in the process it will coerce types as necessary to compare them. To prevent this behaviour, JavaScript provides a === operator, which acts as a 'type-safe' equality operation that compares without coercing.
Another common issue with the equality operator that many people run into arises when comparing strings. Take the following example:
"Jane Doe" == "Jane Doe"
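In JavaScript and many other languages this evaluates to True. In Java, however, == on strings checks whether both sides are the same object rather than whether they contain the same characters. Two identical literals written like this share one object, so this exact line still gives True, but a string built while the program runs (from user input, for example) can evaluate to False against a literal with exactly the same text. To correctly compare strings in Java, you need to use the .equals method on strings, like this: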
"Jane Doe".equals("Jane Doe");
Being aware of issues like this will save you many hours of debugging and frustration. I will point out any potential issues like this in examples we work through as we go along.
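The distinction between 'same value' and 'same object' can be sketched in Python, where == compares values and a separate operator, is, compares identity (roughly the role == plays for strings in Java). Lists are used here rather than strings because Python may reuse identical string objects behind the scenes; the variable names are hypothetical.

```python
first = ["Jane", "Doe"]
second = ["Jane", "Doe"]    # same contents, but a separate object
alias = first               # a second name for the same object

print(first == second)      # compares contents
print(first is second)      # compares identity
print(first is alias)

# Python's == does not coerce between strings and numbers.
print("1" == 1)
```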
Switches
Using the boolean evaluations we have discussed, we can now combine them with switches to perform conditional logic. Conditional logic in programming appears in three main varieties:
if (booleanExpression) {
    // Perform if true
}

if (booleanExpression) {
    // Perform if true
} else {
    // Perform if false
}

if (booleanExpression) {
    // Perform if true
} else if (booleanExpression2) {
    // Perform if booleanExpression is false but booleanExpression2 is true
} else {
    // Perform if both are false
}
All these forms follow a similar premise: if x is true, do y. Realistically, the first one is the only switch you need, as the others are merely there to help you write cleaner code. Compare the following two versions:
if (booleanExpression) {
    // Perform if true
} else {
    // Perform if false
}

if (booleanExpression) {
    // Perform if true
}
if (not booleanExpression) {
    // Perform if booleanExpression false
}
Both of these examples perform the same logic, but the first one is much cleaner. Using any combination of these switches will allow you to perform very complex logic operations based on any expression that you can evaluate as a boolean.
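For a runnable version of the three varieties, here is a sketch in Python, which spells else if as elif and uses indentation instead of braces. The functions describe_temperature and describe_with_plain_ifs, and their thresholds, are hypothetical.

```python
def describe_temperature(celsius):
    if celsius < 0:
        return "freezing"       # first expression is true
    elif celsius < 20:
        return "mild"           # first is false, second is true
    else:
        return "warm"           # both are false

def describe_with_plain_ifs(celsius):
    # The same logic using only the first variety; it works but is harder to read.
    result = "warm"
    if celsius < 0:
        result = "freezing"
    if not (celsius < 0) and celsius < 20:
        result = "mild"
    return result

for reading in (-5, 12, 30):
    print(reading, describe_temperature(reading), describe_with_plain_ifs(reading))
```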
Loops
There are two primary kinds of loops in programming: the while loop and the for loop. The while loop takes a similar form to that of our switches:
while (booleanExpression is true) {
    // Perform action
}
As long as the booleanExpression continues to evaluate to True, anything within the loop will be repeated. This can easily lead to infinite loops if care is not taken over how the loop exits, though this behaviour is useful in some instances, such as running a game loop.
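A small Python sketch of a while loop with a clear exit follows. The names remaining and steps are hypothetical; the important part is that something inside the loop moves the condition towards False.

```python
remaining = 3
steps = []

while remaining > 0:
    steps.append(remaining)
    remaining = remaining - 1   # without this line the loop would never end

print(steps)
```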
The for loop appears to be much more complex on the surface, but much like the if-else statement, it is mostly there to improve code cleanliness.
for (i = 0; i < 10; i += 1) {
    // Do something 10 times
}
In this example, we can see that the for loop makes it much clearer that we are doing something a set number of times, as opposed to looping until told to stop. As I mentioned, this is still a while loop underneath and can be written as such:
i = 0
while (i < 10) {
    // Do something 10 times
    i = i + 1
}
You can use either as you see fit. There will be instances where one option or the other makes the most sense for readability, and the more you use them, the better you will understand which to choose.
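To check that the two forms really are interchangeable, here is a sketch in Python, where the counting for loop is written with range. Both functions are hypothetical and simply record the value of i on each pass.

```python
def count_with_for():
    seen = []
    for i in range(10):        # i takes the values 0 to 9
        seen.append(i)
    return seen

def count_with_while():
    seen = []
    i = 0
    while i < 10:
        seen.append(i)         # the loop body
        i = i + 1              # the step that the for loop performs for us
    return seen

print(count_with_for() == count_with_while())
```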