›INDEX
Last Updated:

History

Code was written by humans as machine level instructions. As programs got more complicated the assembly language was created and it was the job of the assembler to covert assembly language into machine language.

Assemblers eventually were augmented with elaborate "macro expansion" facilities to permit programmers to define parameterized abbreviations for common sequence of instructions.

During this time, programming continued to be a machine-centered enterprise: each different kind of computer had to be programmed in its own assembly language.

As computers evolved, programmers wanted a language that did not require to be rewritten on each machine. In the mid-1950s, FORTRAN was created as first high-level programming language

The late 1960s and early 1970s saw the revolution of "structured-programming" with the go-to statement introduced, which then gave way to while loops, switch-case statements, and similar high level concepts.

In 1980s the nested block structure of languages like Algol, Pascal, and Ada, began to give way to object-oriented programming languages like Smalltalk.

Introduction

Imperative Vs Declarative Programming

Imperative languages describe the steps to solve a problem. Existing hardware is imperative by nature. Therefore it's easier to map imperative languages onto machine code. There tends to be a tension between optimality and readability. It's easier to make subtle mistakes in imperative and hence are more error-prone (opinion). The program achieve the goal by using side-effects.

Declarative languages describe the goal of the problem. Some problems are easier to express declaratively. Due to this nature, programs are often to reason about and make it easier to parallelize. They often include an evaluation engine or virtual machine at runtime. Therefore, they are often performance penalties.

Side Effects

Side effect simple take about the change to a state rather than just processing. Here is an example to illustrate this point:

int GLOBAL_VAR = 10;

void update_variable() {
  GLOBAL_VAR++
}

int updated_variable(int a) {
  return a+1;
}

The first function update_variable updates a global variable and hence changing the state of the program. The second function updated_variable takes in a variable and just returns the next value. The first has side effects and the second does not.

Any change to system state is considered a side effect. Examples of this can include sorting an array in place, update the contents of a variable (a = a + 1), etc.

The Art of Language Design

Functional Languages

Functional languages employ a computational model based on the recursive definition of functions. They take their inspiration from lambda calculus, a formal computational model developed by Alonzo Church.

The fundamental concept of functional languages is that they process data by composing functions. Functions that take in inputs and produce outputs. Such functions DO NOT modify "state". There isn't a concept of state in functional programming languages. That is, given the same parameters, the function always returns the same outputs.

Dataflow Languages

Dataflow languages model computation as the flow of information (tokens) among primitive functional nodes. They provide an inherently parallel model: nodes are triggered by the arrival of input tokens, and can operate concurrently.

Logic/Constraint-based Languages

Logic or constraint-based languages are inspired by predicate logic. The model computation as an attempt to find values that satisfy certain specified relationships. You provide the language information and then you can query the language about the data.

von Neumann Languages

von Neumann languages are structured around computation that work by modification of variables. These languages are based on expressions that have values that influence subsequent computation via the side effect of changing the value of memory.

Object-oriented Languages

Object-oriented languages are closely related to the von Neumann languages, but have more structured and distributed model of both memory and computation. It creates the construct of objects and methods (function) associated with those objects.

Scripting Languages

Scripting languages are focused on glueing together different components in a simple and easy way. Languages like bash, zsh were created to control shells and allow calling multiple programs in a chain to achieve a goal. PHP and JavaScript were created to control web features and make it more dynamic. Python, Ruby, and Perl are more general purpose - with the goal of facilitating rapid prototyping and expressiveness over computational speed.

Compilation and Interpretation

A complier translates the high-level source program into machine level instructions which can directly be run on the computer.

An interpreter executes the high-level source program line by line or region by region in a virtual machine.

In general, interpretation leads to more flexibility and dynamically inserting code and the compiled version has better execution time.

When a program depends on the existence of a library, a linker is used to link the library to the compiled version of a program.

Some compilers generate assembly rather than machine code to facilitate debugging. Assembly code tends to be more human readable than machine code.

Compilers for C include a preprocessor step which removes comments and expands macros. The preprocessor can able be instructed to remove portions of the code itself based on the situation.

A lot of compilers are written in the languages that they compile. The Ada compiler is written in Ada and the C compiler is written in C. This is done using something called bootstrapping. One starts with a simple implementation - often an interpreter - and uses it to build progressively more sophisticated versions.

In some cases, a programming system may intentionally delay computation until the last possible moment. Languages such as Prolog and Lisp that invoke the compiler on the fly to translate newly created source code into machine language.

"In a more general context, suppose we were building one of the first compilers for a new programming language. Assuming we have a C compiler on our target system, we might start by writing, in a simple subset of C, a compiler for an equally simple subset of our new programming language. Once this compiler was working, we could hand-translate the C code into (the subset of) our new language, and then run the new source through the compiler itself. After that, we could repeatedly extend the compiler to accept a larger subset of the new programming language, bootstrap it again, and use the extended language to implement an even larger subset. “Self-hosting” implementations of this sort are actually quite common." (Taken directly from the book, 1.4).

The Java definition defines a machine-independent form known as bytecode. The first Java implementation were based on bytecode interpreters, the JVM (Java Virtual Machine), but modern implementations obtain significantly better performance with a just-in-time (JIT) compilation that translates bytecode into machine code just before execution.

java model for compilation and interpretation

On some early machines (before 1980s), the assembly language instruction set is not actually hardware, but in fact runs on an interpreter. The interpreter is written in low-level instructions called microcode (or firmware), which is stored in read-only memory and executed by hardware.

Overview of The Compilation Process

A typical compiler processes the input in a series of well-defined phases. Each phase discovers information of use to later phases, or transforms the program into a form that is more useful to the subsequent phase.

The first few phases serve to figure out the meaning of the source program. This is sometimes called the front end of the compiler. The last few phases serve to construct an equivalent target program. They are sometimes called back end of the compiler.

An interpreter shares the compiler's front-end structure, but "executed" (interprets) the intermediate form directly.

compiler process chart

interpreter process chart

Lexical And Syntax Analysis

The first phase reads the characters of an input source code and groups them into tokens such as int, main, {, etc. These are the smallest meaningful units of the program. This scanning process is also known as lexical analysis.

The task of the scanner is to reduce the size of the input by converting a sequence of characters to a sequence of tokens. It also typically removes comments and tags tokens with line and column numbers, to make it easier to generate good diagnostics in later phases.

Names, Scopes, And Bindings

Names

A name is a mnemonic character string used to represent something else.

Names act as a form of abstraction for memory addresses. When we code something like value = 10, we actually just stored the value 10 at some memory address.

The complete set of bindings of names in effect at a given point in a program is known as the current referencing environment.

Bindings Time

A binding is mapping from a thing to its name. Binding time is the time at which a binding is created or any implementation decision is made.

  • Language design time: Control-flow constructs, the set of primitive types, the available constructors, etc are chosen when the language is designed.

  • Language implementation time: Most language manuals leave a variety of issues to the discretion of the language implementer. This can be things such as the precision of the fundamental types, the coupling of I/O to the operating system's notion of files, etc.

  • Program writing time: Programmers choose algorithms, data structures, and names.

  • Compile time: Compilers choose the mapping of high-level constructs to machine code, including the layout of statically defined data in memory.

  • Link time: The linker chooses the overall layout of the modules, resolves inter-module references, and finalizes the binding between objects in different modules.

  • Load time: The operating system loads the program into memory, choosing machine addresses for objects. Modern operating systems use virtual addresses chosen at link time, while physical addresses can change at run time.

  • Run time: Covers the entire execution span, including variable bindings, program start-up, module entry, elaboration time, subroutine call time, block entry time, and expression evaluation/statement execution.

Object Lifetimes

The terms static and dynamic are generally used to refer to things bound before run time and at run time, respectively.

Compilers tend to be more efficient at binding variables because they can make earlier decisions. For example, a compiler analyzes the syntax and global variables and arranges them for optimal access. It also can fix variable offsets in the stack for function calls. Whereas the interpreter has to dynamically bind global and local variables, sometimes for each time a function is called.

The time between the creation and the destruction of a name-to-object binding is called the binding's lifetime. The time between creation and destruction of an object is the object's lifetime.

  1. Static objects are given an absolute address that is retained throughout the program's execution.

  2. Stack objects are allocated and deallocated in last-in, first-out order, usually in conjunction with subroutine calls and returns.

  3. Heap objects may be allocated and deallocated at arbitrary times. They require a more general (and expensive) storage management algorithm.

Static Allocation

Global variables, the instructions that constitute a program's machine code, numeric, and string values are examples of statically allocated objects.

Logically speaking, local variables are created when their subroutine is called, and destroyed when it returns. If the subroutine is called repeatedly, each invocation is said to create and destroy a separate instance of each local variable.

However, this is not always the case and it is up to the language implementation to create and destroy these local objects. Before Fortran 90 recursion was not allowed since Fortran could only have one invocation of a function at any given time.

Languages usually require named constants to have a value that can be determined at compile time so that they can be statically allocated. Such constants and literals are called manifest constants or compile-time constants. These constants can always be static allocations, even if they are local to a recursive subroutine.

Languages such as C and Ada have constants that are simply variables that cannot be changed after elaboration (initialization) time. Their values, though unchanging, can sometimes depends on other values that are not known until run time. Such elaboration-time constants, when local to a recursive subroutine, must be allocated on the stack. C# distinguishes between compile-time and elaboration-time constants using the const and readonly keywords, respectively.

Stack-Based Allocation

If a language permits recursive, static allocation of local variables is no longer an option, since the number of instances of a variable that may need to exist at the same time is conceptually unbounded.

Each instance of the subroutine at run time has its own frame on the stack, containing arguments, return values, local variables, temporaries, etc.

While the location of the stack frame cannot be predicted at compile time, the offsets of objects within a frame usually can be statically determined.

Stack frame visualized

Heap-Based Allocation

A heap is a region of storage in which sub-blocks can be allocated and deallocated at arbitrary times. Heaps are required for the dynamically allocated pieced of linked data structures, and for objects such as fully general character strings, lists, and sets, whose size may change as a result of an assignment statement or other update options.

The methods used for heap allocation is a balance between space and time efficiency of the heap. Space concerns are further divided into internal and external fragmentation of the heap. Internal fragmentation occurs when we allocate more spaced than required to maintain a chunk size. External fragmentation occurs when we allocate storage in a way such that the holes in the memory are in such a way that new memory of a particular size cannot be allocated.

For a more detailed explanation of internal and external fragmentation, read notes from CSCI 3120 Operating Systems.

There are multiple mechanisms to manage heap memory, these include:

  • Linked list of free list
  • Dynamic pool lists using buddy system ($2^k$)
  • Dynamic pool list using Fibonacci heaps.

Garbage Collection

Heap allocation is trigged by some specific operation in a program such as creating an object or calling malloc and so on. Deallocation is explicit in some languages such as C, C++, and Rust but they can also be freed automatically when the region is no longer reachable from the program. The latter method uses something called a garbage collector which regularly checks and frees unreachable objects.

The arguments in favor of explicit deallocation include implementation simplicity and execution speed. Even naive implementation of garbage collection add significant complexity to the implementation of the language.

However, manual deallocation errors are among the most common and costly bugs in real-world programs. If an object is deallocated too soon, the program may follow a dangling reference, accessing memory that may be used by another object. If an object is not deallocated at the end of its lifetime, then the program may leak memory, eventually running out of heap space. A garbage collector solves these issues and reduces programming complexity and memory management for programmers.

Rust has a unique solution to this problem where it does not have similar memory deallocation issue even though it has explicit allocation and deallocation. It uses a concept of ownership and borrowing to identify the lifetime of an object.

Go/Golang is a complied language that has a garbage collector that is highly optimized. Here is a comparison between go and rust: source.

Scope

The textual region of the program in which a binding is active is its scope. In most modern languages, the scope of a binding is determined statically at compile time. This is purely set using simple textual rules and does not require any run-time operations. C is statically scope, however, some languages may be dynamically scoped were their bindings depend on the flow of execution at run time.

At any given point in the program's execution, the set of active bindings is called the current referencing environment.

Statically scoped variables are available globally. In C, the static keyword creates a local variable that is allocated globally.

Interesting Fact: In C, the official specification requires that statically declared variables are initialized to 0. This implies static int count would create a variable count with the value 0.

void count_calls() {
    static int count = 0;
    count++;
    printf("count_calls() called %d times.\n", count);
}

Nested Subroutines

The ability to nest subroutines inside each other, introduced in Algol 60, is a feature of many languages such as Ada, ML, Common Lisp, Python, and a few more. Other languages, including C and its descendants, allow classes or other scopes to nest.

In the Algol-family of languages, the Algol-style nesting rules give rise to the closest-nested scope rules where a name introduced in a declaration is only known to the scope in which it is declared, and in each internally nested scope, unless it is hidden by another declaration of the same name in one or more nested scopes.

To find the object corresponding to a given use of a name, we look for a declaration with that name in the current, innermost scope. If there is non, it defines the active binding for the name. Otherwise, we look for a declaration in the immediately surrounding scope. If no declaration is found at any level, then the program is in error.

A name-to-object binding that is hidden by a nested declaration of the same name is said to have a hole in its scope. Some languages provide qualifiers or scope resolution operators to access names that have been rebound.

If a name is in the local scope then we can use a displacement from the base register to find the variable. To find variable in different stack frames we simple store a static link in each frame that points to the "parent" frame.

static chains

To find a variable or parameter declared in $j$ subroutine scopes outward, target code at run time can dereference the static chain $j$ times and apply the appropriate offset.

Declaration Order

Suppose an object $x$ is declared somewhere within block block $B$, can $x$ be used in the section of $B$ before it has been declared? After all, $x$ is defined inside of the scope of block $B$.

Several early languages required that variables be declared before they are used. There are special mechanisms to accommodate recursive types and subroutines, but in general, a forward reference (attempt to use name before declaration) is a static semantic error.

But since a declaration is for the entire block, there are some interesting errors that occur:

{
  int X = 10;
  {
    int M = X;
    int X = 20; // local declaration hides outer X
  }
}

The above code in Pascal would lead to a semantic error since X is accessed before it has been declared for that scope. The X that is used to initialize M is the X within the inner scope.

In Ada, C, C++, or Java, no semantic errors would be reported. The declaration of $M$ would refer to the first (outer) declaration of $X$. The scope of a variable is not whole-block but rather from the point of declaration till the end of the block.

JavaScript uses a concept called hoisting so that variables can be declared in any part of the block are moved to the top of the block. This way, something like this is valid in JavaScript:

x = 5; // Assign 5 to x

elem = document.getElementById("demo"); // Find an element
elem.innerHTML = x;                     // Display x in the element

var x; // Declare x

Declarations And Definitions

Recursive types and subroutines introduce a problem for languages that require names to be declared before they can be used. C and C++ handle the problem by distinguishing between the declaration of an object and its definition.

A declaration introduces a name and indicates its scope, but may omit certain implementation details. A definition describes the object in sufficient detail for the compiler to determine its implementation.

Examples:

struct manager;   // declaration

struct employee {
  struct manager *boss;
  struct employee *next_employee;
};

struct manager {   // definition
  struct employee *first_employee;
};

And for a function definition:

void list_tail(follow_set fs);   // declaration only

void list(follow_set fs) {
  switch (input_token) {
    case id: match(id); list_tail(fs);
    // ...
  }
}

void list_tail(follow_set fs) {  // definition
  switch (input_token) {
    case comma: match(comma); list(fs);
  }
}

The initial declaration of manager only needs to introduce a name since pointers are generally the same size. The compiler can determine the implementation of employee without knowing any details about manager. On the other hand, the initial declaration of list_tail must include the return type and the parameter list. This way, the compiler can verify that the call in list is correct.

Predefined Objects

Declared in an outermost scope that is outside the program's global scope. This can include things like addition, multiplication, etc. This permits the redefinition of such names.

Dynamic Scope

The binding for a given name depends on the order of execution - it's the most recently encountered during execution but not yet destroyed by exiting its scope.

n : integer

procedure first()
    n := 1

procedure second()
    n : integer
    first()

n := 2

if read_integer() > 0
    second()
else
    first()

In the above code snippet the n inside of the first procedure refers to the global n in the else block and refers to the n defined in the second procedure in the if true block. This is because, n was redefined by the second procedure.

Modules

A way to group multiple names together. Breaking down larger projects into modules makes it easier to reason about code.

Modules enable information hiding:

  • Modules provide simple interfaces
  • Reduce risk of naming conflicts
  • Reduce coupling
  • Reduce cognitive load
  • Increase code re-use.

In C, you cannot share static variables among multiple subroutines.

C++ introduces namespaces to help solve similar problems.

#include <time.h>

namespace rand_mod {
  unsigned int seed = time(0);
  const unsigned int a = 48271;
  const unsigned int m = 0x7fffff;

  void set_seed(unsigned int s) {
    seed = s;
  }

  unsigned int rand_int() {
    return seed = (a * seed) % m;
  }
}

Imports and Exports

Modules use imports and exports to expose and include modules features. Exports restrict the names that a module exposes, and possibly the manner in which those names can be used in a client module. Imports identify the names exported by an external module that a client module wishes to include in its scope.

Modules into which names must be explicitly imported are called closed scopes.

Modules which do not require imports are called open scopes.

Object-Oriented Extension

Object-oriented programming is a natural extension of modules, they allow us to encapsulate data and methods. They have the advantage of remembering state on a per-object basis.

Referencing Environments

  • Static scoping: The referencing environment depends on the lexical nesting of the code blocks where names are declared. This can be determined at compile time.

  • Dynamic scoping: The referencing environment depends on the order in which declarations are encountered at run time.

Deep vs Shallow Binding

A deep binding refers to the referencing environment that exists when some function f was passed applies to f. (The moment f was created).

A shallow binding refers to the referencing environment when f is invoked applies to f. (The moment f was called).

Deep binding is implemented by creating a representation of the referencing environment and bundling it with a reference to the subroutine. This bundle is referred to as a closure.

Unlimited Extent

Subroutines in a functional language have first-class status. They can be passed as a parameter, returned from a subroutine, assigned to a variable, etc.

A reference to a subroutine may outlive the scope which the subroutine is declared.

; Function takes a param x and returns a function
; such that f(y) return x+y
(define plus-x
  (lambda (x)
    (lambda (y)
      (+ x y)
    )
  )
)

(let ((f (plus-x 2)))
  (f 3)
) ; return 5

The 2 was on the stack as a parameter and then has been popped off the stack. But f remembers 2 even though x is no longer on the stack. The local variable x has unlimited extent. It may not be garbage collected until no references to the value of plus-x 2 remain.

Object Closures

  • Global (top-level) functions, when passed, use the global referencing environment.

  • Nested functions, on the other hand, require non-trivial referencing environments.

By encapsulating a function in an object, we can include the function's context in that object.

Lambda Expressions

A lambda expression is an anonymous function.

  • Scheme: (lambda (i j ) (if (> i j) i j))
  • C#: (int i, int j) => i > j ? i : j
  • OCaml: fun i j -> if i > j then i else j
  • Ruby: -> (i, j) { i > j ? i : j}

C++ adopted lambda expression in C++ 11. However, lacking a garbage collector, the notion of unlimited extent is unworkable.

Suppose we want a lambda expression to print values less than some variable k. There are two options:

By value:

[=](int e){ if (e < k) cout << e << " "; }

In this case, the lambda expression is evaluated at runtime and k is just stored as a value inside of the lambda expression. Copying a large data structure can be expensive. External updates to those variables will not be reflected in future calls to the closure. But this allows closures to be used even when copied variables leave scope.

Or by reference:

[&](int e){ if (e < k) cout << e << " "; }

This can lead to undefined behaviour if the referenced variable leaves scope. External updates to references variables will be reflected in future calls. References are just pointers and are less expensive to copy.

Control Flow

Constructs

  • Sequencing: instructions should be executed in a specified order.

  • Selection: at run-time, some choices between alternative execution paths need to be made.

  • Iteration: some sequence of instructions needs to be executed repeatedly, either a pre-determined number of times, or until some condition is true or false.

  • Subroutines: Encapsulate a potentially complex sequence of instructions so that it can be treated as a unit.

  • Recursion: An expression that is defined in terms of itself. Usually implemented as one or more self-referential subroutines.

  • Concurrency: Two or more parts of a program are to be executed simultaneously.

  • Exception Handling: Handling unexpected run-time conditions.

  • Non-determinism: Ordering among instructions is deliberately unspecified.

Expressions

Literals, named variables, named constants, operators or functions applied to one or more arguments.

  • A prefix expression: -x, Math.sin(Math.PI), (+ 1 2)

  • A infix expression: a + b

  • A postfix expression: a b + or PI sin

Lisp and Scheme use prefix Cambridge Polish notation exclusively (prefix expressions).

Forth uses postfix notation and special stack handling; Simple expressions are pushed onto the stack and functions and operators pop their arguments off the stack, and push their return values on.

a b c + * -> a * (b + c)

Infix notation leads to potential ambiguity: a + b * c could either be (a + b) * c or a + (b * c). This is solved by operator precedence. Sequence of equal-precedence operators group to the left or the right (associativity).

Value Model

Languages like C employ a value model of variables. Variables are treated as containers for values. Locations appear on the left side of the assignment operator, and values on the right. Thus referred to as l-values and r-values respectively.

Consider: f(a)[2] = 12;

Here the function returns a pointer and we're setting the index 2 to the value 12.

Reference Model

Languages like Clu employ a reference model where variables are not container for values but a named reference to the value.

In languages where values can change in-place, the difference between the value model and the reference model play a strong role.

Example:

class A:
  def __init__(self):
    self.m = 10

a = A()
b = a
a.m = 20
print(b.m)  # 20

Update Operations

Consider the following situation:

A[index_fn(i)] = A[index_fn(i)] + 1;

If index_fn has a side-effect, multiple calls may return multiple values.

Therefore, something like this makes it more clear:

A[index_fn(i)] += 1;

Prefix operations perform the operation first where as postfix first evaluates the variable and then increments it.

int x = 5;
int y, z;

y = x++; // y = 5, x = 6
z = ++x; // z = 7, x = 7

The evaluation order where side effects are presents may be problematic:

fib[++i] = fib[i] + fib[i-1]

Depending on which side is evaluated first is going to change the operation that is performed in the above snippet. Evaluation order can be situation dependent. A compiler may choose evaluation order to take advantage of certain registers or instructions, in an effort to improve performance.

Such statements are classified as undefined behaviour in C. This includes statements where there are multiple increment or decrement operators such as:

x = i++ + i-- + i++

Initialization

Accidental use of an uninitialized variable is a very common programming error.

  • An easy way to prevent such errors is to ensure that every variable receives a value when it is first declared.

  • Some languages initialize variables to zero by default.

  • Others use dynamic checks to establish that variables are assigned values before they are used.

  • Languages like Java and C# define a notion of definite assignment, where the compiler establishes that every path between declaration and use assigns a value to the variable.

Unstructured Programming

Older imperative languages like Fortran and Basic used labels to identify statements. Branching and looping used GOTO label and GOSUB label to direct program flow.

10 X = X + 1
20 PRINT X
30 GOTO 10

This is a holdover from machine and assembly language. goto is still a part of C. It is very easy to accidentally create an infinite loop using goto. Java reserves goto as a keyword, but does not permit its use.

Structured Programming

In structured programming languages like Algol, and virtually every programming language since, goto is replaced by constructs such as if-else and switch-case for selection and for, while, do-while, repeat-until for iteration.

Recursion

Every iterative procedure can be replaced with recursion:

while (cond) {
  // statements
}

// same as

void p() {
  if (cond) {
    // statements
    p()
  }
}

The converse however, is not true. Not every recursive can be made iterative in a "simple" way - without using additional data structures.

The recursive example can be translated by the compiler back into a loop; using a technique called tail-recursion elimination.

Exception

An exception is an unexpected or unusual condition that occurs during execution that cannot be easily handled locally.

Some ways to handle exceptions:

  • Return a special value, eg: -1 for find indexed function.
  • Return an explicit status value to the caller, through an extra argument or a return tuple.
  • Caller passes an error handling closure to the callee.

Example of return special value as error (C):

int find_element(int element, int length, int *array) {
  for (int i=0; i<length; i++) {
    if (array[i] == element) {
      return i;
    }
  }
  return -1
}

int main() {
  int a[] = {1, 2, 3};
  if (find_element(4, 3, a) == -1) {
    // element not found
  }
}

An example of an explicit status return (Go):

func findElement(element int, array []int) (int, error) {
    for i, value := range array {
        if value == element {
            return i, nil
        }
    }
    return -1, errors.New("element not found")
}

func main() {
    a := []int{1, 2, 3}
    if _, err := findElement(4, a); err != nil {
        // element not found
    }
}

Exception Handling

Originally appeared in PL/I where we have the following structure:

ON condition
  statement

The nested statement isn't executed immediately, instead it is remembered and only invoked when the exceptional condition is encountered. Dynamic binding of exceptions to handles is confusing and error prone.

To handle requirement of a particular exception in different ways based on the situation the exception occurs in, we have lexically bound exception handlers:

try {
  if (something_unexpcted) {
    throw my_error("oops!");
  }
} catch (my_error e) {
  // handle exception
}

Here the exception handler is lexically bound to the try block, as opposed to dynamically bound at runtime as in PL/I.

Exceptions have types, and those types may include one or more fields to contain debugging information and messaging. If no matching handler is bound to the current try block, the exception is raised again to the current subroutine's caller. The exception propagates up the call stack until a matching handler is found, or we reach the top of the stack (which normally terminates the program).

To handle exceptions, we may need to unwind any stack frames belonging to subroutines from which the exception escapes. Other resources may also need to be released. Here the finally block is executed regardless of whether the try block completes normally, or terminates prematurely as the result of an exception.

try:
  my_stream = open("foo.txt", "r")
  for line in my_stream:
    # Other code
finally:
  my_stream.close()

Type Systems

  • A denotational view of types is where a type is a set of values, or domain.

  • A structural notion is where a type is either primitive, or composite such as classes, structs, arrays.

  • An abstract notion of type is where a type is an interface consisting of a set of operations with well-defined and mutually consistent semantics.

Structures

A structure or struct is a collection of named fields with types.

struct element_t {
  char symbol[3];
  int atomic_number;
  double melting_point;
  bool is_metal;
}

The memory layout of the struct may be stored in different ways based on the language.

Memory Layout

Aligned

struct element_t {
  char symbol[3];
  int atomic_number;
  double melting_point;
  bool is_metal;
}

The aligned or fixed order way to organize the struct in memory would be to arrange them in the order of declaration while aligning them to a particular memory boundary.

fixed ordered struct

This kind of alignment is done to optimize memory access since most computer access more than one byte in one read operation.

Therefore, in the particular case above, our struct takes up 24 bytes even though our struct only requires 16 bytes (in C on a 64 bit machine).

Here is the code that was used to test this:

#include <stdio.h>
#include <stdbool.h>

struct element_t {
  char symbol[3];
  int atomic_number;
  double melting_point;
  bool is_metal;
};

int main() {
  size_t sum_size = sizeof(char [3]) + sizeof(int) + sizeof(double) + sizeof(bool);
  size_t real_size = sizeof(struct element_t);
  printf("sum of size of element: %ld\n", sum_size);
  printf("size of struct: %ld\n", real_size);
}

This leads to faster memory access and preserves the order in which elements are declared but lead to inefficient use of space. This also guarantees the order of the memory layout so if manual memory manipulations are performed then the order is predictable.

We can manually re-order the elements to minimize the wasted space. C aligns elements in multiples of the largest previous element.

So if we use something like this, where the elements are arranged in increasing order of size used, we can get a more optimized layout:

struct element_t {
  bool is_metal;
  char symbol[3];
  int atomic_number;
  double melting_point;
};

Our struct now only takes up 16 bytes of space and all those 16 bytes are used for the struct.

Packed

In the packed method of organizing memory, we optimize for space while still preserving the order of the struct.

struct element_t {
  char symbol[3];
  int atomic_number;
  double melting_point;
  bool is_metal;
}

The above struct would then be arranged as follows:

packed memory layout for structs

But this causes element access to be much slower since a lot of operations have to be performed to read a single element if it is not aligned right. This still guarantees the memory layout.

Optimized Aligned

The compiler optimizes the struct to minimize space while still maintaining the alignment of the elements. This way, the space and speed are optimized. However, this may cause the struct to be reordered, hence leading to a non-guaranteed memory layout.

struct element_t {
  char symbol[3];
  int atomic_number;
  double melting_point;
  bool is_metal;
}

aligned optimized memory organization for struct

Rust is a language that employs this technique and hence specifies that the memory layout is undefined.

This also leads memory layout may be compiler or architecture dependent.

Arrays

Collections of objects of a single type (typically) occupying a contiguous region of memory.

  • Static Arrays: Type and shape are known at compile time.
  • Heap Arrays: Type and dimensionality may be known at compile time and shape known at run time.

Dead Reckoning

For a one-dimensional array at memory location A whose elements are of type T, we can calculate the address of an element at index n using the following calculation:

A + (n * sizeof(T))

Where A is the base address of the array and sizeof(T) is the array's stride, or the width of each element.

Dope Vectors

A dope vector stores metadata about the memory layout of a data object (like an array). This would include the base address, stride and the sizes of each dimension.

This way the array can then grow if necessary by replacing the metadata in the dope vector.

Scheme

Scheme is a function programming language. Almost everything in scheme is a function. Functional languages are mostly pure, in the sense that they avoid side-effects and updating variables.

Development Environment

We can use DrRacket as an environment to run scheme. We begin files with a #lang directive that identifies the programming language a file is written in. We start the file with #lang scheme to indicate the language scheme.

DrRacket Website

Hello World

Here's how you'd write a hello world program in scheme.

#lang scheme

(display "Hello, World!")

Note that #lang scheme is required if we're using a racket interpreter.

For something more complex, here is a implementation of the Fibonacci sequence:

#lang scheme

(define (fibonacci n)
    (if (<= n 2)
        1
        (+ (fibonacci (- n 1)) (fibonacci (- n 2)))
    )
)

(define (displayFibonacciTill n)
    (if (<= n 0)
        0
        (begin
            (displayFibonacciTill (- n 1))
            (display 
                (string-append 
                (number->string n)
                " "
                (number->string (fibonacci n))
                "\n")
            )
        )
    )
)

(displayFibonacciTill 40)

Language Features

Identifiers

These identifiers include keywords, variable, and symbols. These maybe formed from letters, numbers, and certain special characters:

?!.+-*/<=>:$%^&_~@

Identifiers cannot start with the @ symbol and normally cannot start with any character that can start a number, i.e., a digit, decimal point (.). Exceptions are +, -, and ..., which are valid identifiers.

Valid identifiers include: hi, Hello, n, x, x3, x+2, ?$&*!!!.

Atomic Values

The following are all atomic values:

  • Boolean values such as #t and #f
  • Numbers:
  • Integers: -2, -1, 0, 1, 2
  • Real: 0.5, 1.333, 97.423, pi
  • Rationals: 1/2, 4/3, 9/10, 22/7
  • Complex: 3+4i
  • Strings: "Hello World!"

Cells or Pairs

Cells or pairs are a structured data type consisting of two parts named car, and cdr. For example, '(1 . 2) is a pair literal with car of 1 and cdr of 2.

The leading single quote indicates a pair literal (as opposed to an expression). Cells can be combined to produce lists.

Proper and Improper Lists

Cells can be combined to produce lists. '(a b c) is a pair literal with car a and cdr '(b c).

  • The value of '() is known as a nil or null is the empty list.
  • The value '(a) is a pair literal with car a and cdr '().

A proper list is either the empty list '() or a pair whose cdr is a proper list. Any other pair is an improper list.

  • '(), '(a), and '(a b c) are all proper lists.
  • '(1 . 2) is not since its cdr, 2, is not a proper list.

Note that '(1 . 2) is not the same as '(1 2). The first is an improper list while the second is a proper list.

Non-literal Expressions

All non-literal expressions in Scheme have the same overall form:

(operator a1 a2 ... an)

Where:

  • operator is an operator or function identifier
  • a1 a2 ... an are the operands/arguments to that operator or function
  • Arguments can be literals or non-literal expressions themselves.

The above statement would be the same as the following C-like statement:

operator(a1, a2, ..., an)

Boolean Operators

not, and, and or are the boolean operations. Note that all expressions in scheme that is not explicitly #f is treated as #t. That is, 0 has a "truthy" value of #t, 1 has a value of #t, etc. Only the #f has the value of false.

  • (not a) returns #t if its argument a is #f, it returns #f otherwise.

  • (and a1 a2 ... an) returns an if all its arguments a1 ... an-1 are not false, #f otherwise.

  • (or a1 a2 ... an) returns the first ai that is not false. Or #f iff all ai are false.

Note that both and or or exhibit short-circuit evaluation. Only as many ai will be evaluated as necessary.

Arithmetic Operators

  • (- a) returns the negation of its argument a.
  • (- a1 a2 a3 ... an) returns a1 - a2 - a3 - ... - an.
  • (/ a) returns the reciprocal of its argument a.
  • (/ a1 a2 ... an) returns a1 / a2 / a3 / ... / an.
  • Addition + and multiplication * work the obvious way.

Numeric Comparison Operators

  • (= a1 a2 ... an) returns #t if a1 = a2 = ... = an

    • otherwise #f
    • (= 1 1+0i 3/3) is #t
  • (< a1 a2 ... an) returns #t if a1 < a2 < ... < an

    • otherwise #f
    • (< 1 2 2.5 pi 16/4) is #t
  • (> ...), (>= ...), (<= ...) work similarly.

Basic Numeric Predicates

Predicates are functions that return a boolean value.

  • (number? a) returns #t iff a is a number.
  • (natural? a) returns #t iff a is a natural number.
  • (integer? a) returns #t iff a is an integer.
  • (rational? a) returns #t iff a is a rational number.
  • (real? a) returns #t iff a is a real number.
  • (complex? a) returns #t iff a is a complex number.
  • (exact? a) returns #t iff a is an exact value.
  • (inexact? a) returns #t iff a is an inexact value.

Basic Pair Functions

These are functions that work on pairs or lists.

  • (car a) returns the car of a
  • (cdr a) returns the cdr of a
  • (cons ar dr) returns the pair '(ar . dr)
  • (list a1 a2 .. an) returns the proper list '(a1 a2 .. an)
  • (list? a) returns #t iff a is a proper list
  • (null? a) returns #t iff a is the empty list '()

Conditionals

If Else

(if condition if-true if-false)

This evaluates the given condition and returns if-true if the condition evaluates to anything but #f. It returns if-false if the condition evaluates to #f.

Example:

(display 
    (if (= n 1)
        2
        4
    )
)

This if-statement returns 2 if n is equal to 1 and returns 4 otherwise.

Switch/Cond

(cond (condition_1 return_1) (condition_2 return_2))

This evaluates each condition_i in turn. When a condition_i evaluates to #t the corresponding return_i value is returned. Conditions after the succeeding one are not evaluated. Only the corresponding return value is evaluated. Returns nothing if no condition evaluates to #t. Can use #t as a condition to implement a default.

Example:

(display (cond
    ((= n 1) 2)
    ((= n 2) 4)
    (#t 0)
))

;; Or similarly
(display (cond
    ((= n 1) 2)
    ((= n 2) 4)
    (else 0)
))

The above cond statement returns 1 if n=1 and 4 if n=2. If neither of those evaluates to #t, the last statement is run and the statement returns 0.

Racket returns #<void> if no condition is matched.

Local Binding With Let

(let ((name1 expr1) (name2 expr2) ...) expression)

This evaluates expression with the given names bound to the corresponding expressions. Note that these variables must be independent of each other. expr2 cannot reference name1.

There is also let* which is the same as let except that variables may reference predecessors.

Lambda Expressions

(lambda (name1 name2 .. namen) expression)

Evaluates to an anonymous function which returns the value of expression when argument names name1, name2, ..., namen are bound to corresponding argument values.

Examples:

(lambda (x) (+ x 2))

The above creates an anonymous function with one argument named x which returns the value of x plus two.

Global Binding with Define

(define name value)

This creates a global binding of the given name to the given value.

Here are some examples:

(define e 2.7182818)

; generally define used to create new functions
(define plus-2 (lambda (x) (+ x 2)))

; there is a shorthand for convenience
(define (plus-2 x) (+ x 2))

Unit Testing

We use a library called rackunit: https://docs.racket-lang.org/rackunit/.

(require rackunit)

; Check if expression equals to #t
(check-true expression)

; Check if expression evaluates to #f
(check-false expression)

; Check if expression does not evaluate to #f
(cehck-not-false expression)

; Verifies that (equal? actual expected) evaluates to #t
(check-equal? actual expected)

Recursion

Iteration structures like for or while are problematic in functional languages like Scheme since they involve updating loop variables.

All loops can be replaced with recursion.

void func(n) {
    for (int i = 0; i<n; i++) {
        // do something
    }
}

The above can be converted to something like this:

(define (func n) (
    (if (> n 1)
        (begin 
            ; do something
            (func (- n 1))
        )
    )
))

This is called tail-recursion which is when the function calls itself as the very last thing the function does.

Challenge With Recursion

Here is an example of the factorial function in scheme:

(define (factorial n) (
    (if (< n 2)
        1
        (* n (factorial (- n 1)))
    )
))

This is a fairly typical definition of the factorial function. But it isn't tail-recursion since the multiplication step can only occur after the recursive call. The challenge here is that for even moderate values of $n$, we will overflow the stack (each recursive call is going to create a new stack frame).

Tail-recursion is handle uniquely in that it doesn't create a new stack frame but rather replaces the stack frame with the new stack frame, at least in DrRacket.

Named Let

A named let in Scheme is a let expression that behaves like a local function definition.

The main advantage of a named let is that it creates a function that can be called recursively from inside the expression.

(let name ((var1 value1) (var2 value2) ...) 
    expression
)

When a named let is first encountered, it behaves like a normal let but it also defines the local function (define (name var1 var2 ...) expression) and this function can be called recursively from inside expression.

We used named let to define a local function fact that takes two arguments: m which identifies which factorial we are on, and p its factorial. This way of defining the function allows us to use tail-recursion.

(define (factorial n)
    (let fact ((m 0) (p 1))
        (if (= n m)
            p
            (fact (+ m 1) (* p (+ m 1)))
        )
    )
)

List Patterns

In imperative programming languages we use some common list transformation patterns on lists. There are three common functions that are commonly used.

  • map: The function applies a function to every element of a list.
  • filter: The function builds a sub-list containing elements that satisfy a predicate.
  • reduce: Accumulates the elements in a list into a single result based on an operation.

map and filter are built-in functions in scheme. reduce can be imported from the library srfi/1 using the expression (require srfi/1).

Here is an implementation of the functions with an example of the use-case:

(define (map func lst)
  (if (null? lst)
      '()
      (cons (func (car lst)) (map func (cdr lst)))))

(define (filter pred lst)
  (cond ((null? lst) '())
        ((pred (car lst)) (cons (car lst) (filter pred (cdr lst))))
        (else (filter pred (cdr lst)))))

(define (reduce func init lst)
  (if (null? lst)
      init
      (reduce func (func init (car lst)) (cdr lst))))

(define (even? x)
  (= (remainder x 2) 0))

(define (square x)
  (* x x))

;; Function to sum the squares of all even numbers
(define (square-sum-evens numbers)
  (reduce +
         0
         (map square
              (filter even? numbers))))

;; Example usage
(square-sum-evens '(1 2 3 4 5 6))

Apply

The apply function allows us to apply a function to a list of arguments.

(apply + (range 1 10 1))

Zip

Sometimes we have two or more lists where the $i^{th}$ element of each list corresponds with one another. The zip function accepts $n$ lists of length $l$ and return $l$ lists of length $n$ comprised of $i^{th}$ element of each incoming list.

Like reduce, the zip function is part of the srfi/1 library.

;; Helper function to add elements of a pair
(define (sum-pair pair)
  (apply + pair))

;; Using zip and map to add corresponding elements of two lists
(define (sum-pairs list1 list2)
  (map sum-pair (zip list1 list2)))

;; Example usage
(sum-pairs '(1 2 3) '(4 5 6))  ; Expected output: '(5 7 9)

Examples Of Scheme

This repository contains examples of Scheme and Prolog code examples: GitHub Repository.

The following are some examples from that repository.

Lukas Numbers

#lang scheme


(define (lukas n)
  (cond ((= n 0) 2)
        ((= n 1) 1)
        (#t (let next-iteration ((m 2) (lnum1 (lukas 0)) (lnum2 (lukas 1)))
          (let ((lm (+ lnum1 lnum2)))
            (if (= n m)
              lm
              (next-iteration (+ m 1) lnum2 lm)
            )
          )
        ))
  )
)

(display (lukas 30))

Sequence Generators

#lang scheme

; Calculate the next term in the sequence given the
; current terms and the coefficients.
; next = term[0] * coefficients[0] + term[1] * coefficients[1] ...
; Assumes length of terms and coeffs are the same. 
(define (calculate-next-term terms coeffs)
  (if (equal? terms '())
    0
    (+ 
      (* (car terms) (car coeffs))
      (calculate-next-term (cdr terms) (cdr coeffs))
    )
  )
)

; Get a function to generate terms of a linear recurrence relation
; defined on a set of initial terms and coefficients. 
; Assumptions:
;     - initial-terms and coefficients match in length
;     - the sequence starts from 0. i.e., 
;       the last element of initial-terms is the 0th term.
(define (sequence-generator initial-terms coefficients)
  (lambda (n) 
    (let ((len_terms (length initial-terms)))
      (if (< n len_terms)
        (list-ref initial-terms n)
        (let function ((index len_terms) (terms initial-terms))
          (let ((next-term (calculate-next-term terms coefficients)))
            (if (= n index)
              next-term
              (function (+ index 1) (append (cdr terms) (list next-term)))
            )
          )
        )
      )
    )
  )
)


; A function to get the nth term in the padovan series 
; padovan(0) = 0; padovan(3) = 2; padovan(10) = 12
(define padovan (sequence-generator '(1 1 1) '(1 1 0)))


(require rackunit)

; Fibonacci Sequence Test
(define fibonacci_test (sequence-generator '(0 1) '(1 1)))
(check-equal? (fibonacci_test 0) 0)
(check-equal? (fibonacci_test 1) 1)
(check-equal? (fibonacci_test 5) 5)
(check-equal? (fibonacci_test 10) 55)

; Padovan Sequence Test
(define padovan_test (sequence-generator '(1 1 1) '(1 1 0)))
(check-equal? (padovan_test 0) 1)
(check-equal? (padovan_test 1) 1)
(check-equal? (padovan_test 2) 1)
(check-equal? (padovan_test 3) 2)
(check-equal? (padovan_test 4) 2)
(check-equal? (padovan_test 5) 3)
(check-equal? (padovan_test 6) 4)
(check-equal? (padovan_test 10) 12)

(define pad1000 95947899754891883718198635265406591795729388343961013326205746660648433358757497113284765865744376976832226705511219598880)

(check-equal? (padovan_test 25) 816)
(check-equal? (padovan_test 40) 55405)
(check-equal? (padovan_test 100) 1177482265857)
(check-equal? (padovan_test 1000) pad1000)

; Lucas Numbers Test
(define lucas_test (sequence-generator '(2 1) '(1 1)))
(check-equal? (lucas_test 0) 2)
(check-equal? (lucas_test 1) 1)
(check-equal? (lucas_test 2) 3)
(check-equal? (lucas_test 3) 4)
(check-equal? (lucas_test 4) 7)
(check-equal? (lucas_test 4) 7)
(check-equal? (lucas_test 39) 141422324)

Prolog

Prolog is a logic programming language. Such as language uses facts and rules to determine the results of queries.

Swish Online Interpreter

Atoms

The fundamental pieces of Prolog are atoms. Atoms are mere tokens and represent whatever the programmer intends. They are constructed using:

  • String of letters, digits, underscores starting with a lowecase letter.
  • Strings of special characters. (Some strings of special characters have predefined meanings)
  • Strings of characters enclosed in single quotes.

The atom socrates only has meaning because the programmer has decided that it does. It could refer to the Greek philosopher or a cat.

Compound Terms

A compound term consists of a name (an atom) and one or more arguments (which are arbitrary Prolog objects).

Query Resolution

Query are resolved by Prolog using backtracking. It uses a depth-first search from the top to the bottom to find facts and rules that can be unified with the current goal.

Terms

Prolog treats terms like sq(2, 3) as a tree:

term sq as tree

It even treats expressions as trees 2 * 4 + 7:

expression as tree

The principle functor of a term is the name and arity of the root node in the term's tree representation.

The principle functor of sq(2, 3) is sq/2 and the principle functor of 2 * 4 + 7 is +/2.

Precedence And Associativity

Each operator has a precedence between 1 and 1200 and an associativity:

  • For prefix operators fx or fy.
  • For infix operators xfx, xfy, yfx, or yfy
  • For postfix operators: xf or yf

In all cases f represents the operator x and y follow the following rules:

  • x represents an argument that must have STRICTLY lower precedence.
  • y represents an argument that must have lower or equal precedence.
Precedence Associativity Symbol
1200 xfx :-
1200 fx :- ?
1100 xfy ;
1000 xfy ,
700 xfx list below
500 yfx + -
500 fx + -
400 yfx * / //
200 xfx **
200 xfy ^

List for 700 xfx: = =:= < =< == =\= > >= \= is.

For the expression: 2 * 4 + 7: The atoms 2, 4, and 7 all have precedence 0. The * operator has precedence 400 and the + operator has precedence 500.

operator precedence and associativity tree example

The * cannot be the principle functor since the child node must have lower precedence. This has one exception with regards to parenthesis: ().

2 * (4 + 7): To achieve this interpretation, we would need to wrap part of the expression in parenthesis. This reduces the precedence of the parenthesized expression to zero.

parenthesis tree

Custom Operators

Prolog permits the declaration of custom operators:

?- op(600, xfx, has).

robert has information

The above statement is equivalent to has(robert, information).

custom operator prolog

Enjoy the notes on this website? Consider supporting me in this adventure in you preferred way: Support me.