C Syntax

Preprocessor Directives

Code at the beginning of a file that starts with a #. This is run before compilation.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_COLS 20 /* max # of columns to process */
#define MAX_INPUT 1000

#define defines a constant value, #include includes another file’s declarations.

Main Function

  • Returns an int, takes no arguments (so void)
int main(void){
    /* Some code */
    return 0;
}

Prototypes

  • The first function “declaration”
  • No body
  • Placed at the top to allow the compiler to do proper type checking

Basic Concepts

Environments

Translation Environment

  • Source code get compiled into object code
  • Which then gets linked with other object code by the linker to form a single executable

Execution Environment

  • The code is then excuted by loading it into memory (an os is not neccessary but usually used), setting it up, running it, and terminating it by returning main’s error code.

Lexical Rules

Trigraphs

  • A 3-sequence of a characters that may not be easily typeable

Comments

  • Begin with /* and end with */
  • Can be multiple lines, by not nested
    • If they’re multiple lines, they get everything on those lines

Declaration

  • A declaration causes memory to be reserved for the variable
  • There is no extra instantiation
  • If variables are uninitialized they’ll just have garbage values

Expression and assignment

  • Assignment returns the value the variable was assigned
  • An expression can be a simple statement, its just not useful unless it has a side effect

Side Effect

  • A change in program state

Booleans in C

  • There is no boolean type, 0 is false, non-zero is true

Arithmetic Conversion

  • Both char and short types are promoted to int types when doing math
  • If arithmetic operations are performed with different int types, the smaller type is promoted to be the higher type

Standard Library

stdio

printf

  • printf takes in a character string with 0 or more format specifiers, if any are present, it also takes in that many other values after the string
  • They contain escape sequences, format specifiers, and normal characters
  • Format specifiers include
    • %d for integers
    • %f for floats
    • %c for characters
    • %u for unsigned int decimal
    • %x for unsigned int hex
    • %o for unsigned int octal
  • Padding can be done by adding the number of spaces to pad by before the specifier ("%4d")
    • You can fill with 0s instead by adding a 0 beforehand ("%04d")

scanf

  • scanf takes in a character string with 0 or more format specifiers, if any are present, it also takes in that many other variable pointers after the string
  • It takes in format specifiers and reads each one into its corresponding variable
  • If too few variables are put in, scanf will wait
  • It too many variables are put in, they will remain in the input
  • If the last data in the input has been read, scanf returns a special EOF value
  • Format specifiers include
    • %d for integers
    • %f for floats
    • %c for characters

feof

  • feof(stdin) will be true if the previous try to read didn’t work because there was no more data
  • You must try to read something for feof to work

assert

  • If assert() argument is false, it will kill the program

stddef

Data

Integers

Values

short <= int <= long

TypeSignednessMax StartMin End
intsigned-3276732767
unsigned intunsigned065535
short intunsigned-3276732767
unsigned short intsigned065535
long intsigned-2147836472147483647
unsigned long intunsigned04294967295
charNA0127
signed charsigned-127127
unsigned charunsigned0255

Literals

  • Decimals
  • In octal when starting with a 0
  • In hex when starting with an 0x
  • Characters literals in single quotes

Promotion

Enumerated Type

Values

  • Enums are stored as ints
  • The numbers after the name are optional
enum Jar_Type { CUP=8, PINT=16, QUART=32, HALF_GALLON=64, GALLON=128 };

Floating-Point

  • Have a decimal point 3.141
  • Have an exponent 1E10
  • Or both 6.023e23

Pointers

  • Every memory location has an address
  • Pointer is another name for address
  • Pointer Variable is a variable whose value is an address of a moeory location
  • There are no point constants because we can’t predict where memory addresses will be
  • To declare a pointer, use the indirection operator before the variable
    • The indirection operation gets the value at an address, so therefore the variable stores a pointer to an address with a value of the given type
    • int *a is an in pointer
    • If you declare multiple on the same line, you need a * for each one
    • When initializing it on the same line, the value you give goes to the variable itself, not the pointer

String

  • C has no string type, but there is a string literal
  • A string is a sequence of characters terminated by a NUL byte
    • A sequence of zero characters is valid

Arrays

Declaration

  • Like declaring any variable, but with square brackets after the name with a capacity value: int values[20]
  • Array sizes must be constants at compile time, so literals or symbolic constants

Initialization

  • You can’t assign variables to each other or compare entire arrays
  • You can initialize an array when you declare it by setting it equal to values in curly braces: int values[20] = {1,2,3}
    • This sets the first 3 items to 1,2, and 3 respectively and fills the others with 0
    • If you pass fewer, it fills the rest with 0s, but you need at least 1

Behavior

  • Array parameters don’t take sizes (at least for one-dimensional arrays)
  • Arrays are passed by value but act like they’re passed by reference, because of Pointers
  • Arrays don’t keep track of their length
  • Accessing an invalid index won’t be an error itself but can cause one by accessing or changing memory its not supposed to

Typedef

Scope

Block Scope

File Scope

Shadowing

  • The variable with the narrowest scope will shadow the other(s)

Linkage

  • Determines if multiple instances of the same identifier refer to the same thing or not

No Linkage

  • All occurrences are different

Internal Linkage

  • All occurrences in a given file are the same

External Linkage

  • All occurrences are the same

Changing Linkage

  • extern will change a Block Scope variable from no linkage to external
  • static will change a File Scope variable from external linkage to internal

Storage

  • Determines the lifetime of the memory of the variable while the program is executing
  • It’s still accessible only from within its Scope, but controls when the value’s memory gets destroyed

const

Structures

Unions

**

Operators

Unary

Negation

  • -
  • flips the sign

Increment/Decrement

  • ++ / --
  • Increments/Decrements by 1
  • Before the expression computes before substituing the expression
  • After expression changes the variable after substitution, so the existing value is used.

Size

  • sizeof
  • Gets the size in size_t of a given value

Comma

  • expr1,exp2
  • Evaluates its first operand and then the right one, producing the right’s output as its own

Indirection

  • *
  • Gets the value at the given address

Address

  • &
  • Gets the address a given value is stored at

Logical

  • Both && and || use short circuit evaluation

Bitshifts

Left Shift

  • value << n
  • shifts value’s bits left by n positions
    • Rightmost bits get 0, leftmost bits get discarded

Right Shift

  • value >> n
  • shifts value’s bits right by n positions
    • If unsigned, the leftmost bits get 0
    • If signed, whether the leftmost bits get 0 or 1 is implementation dependent

Bitwise

  • And (&): Ands two bits
  • Or (|): Ors two bits
  • Xor (^): Xors two bits
  • Complement (~): Negates 1 bit

Masking

  • We can construct masks to select only certain bits from multiple of them

rvalues and lvalues

  • rvalues can appear on the right hand side of an assignment
  • lvalues can appear on the left hand side of an assignment

Evaluation

  • Order of evaluation goes first by precedence, then associativity, and finally unspecified as long as &&, ||, (), ?:, and , work

Precedence

  • Order of given operators

Associativity

  • Whether to operate left to right or left to right

Pointers

  • A pointer is a variable that holds the address of something in memory
  • Pointers can be rvalues and lvalues but addresses can only be rvalues
  • Pointer declarations must be initialized
  • Functions can return a pointer but you must ensure the variable will not be destroyed after the function leaves

NULL Pointer

  • Special values that pointer can have that doesn’t point anywhere
  • Defined in stddef

Make

  • make autobuilds programs by using separate compilation to efficiently only build what’s needed using a user defined dependency hierarchy
  • A Makefile has multiple rules
    • A rule consits of a target, a dependency list, and a set of actions
    • The target is the name/identifier of the rule and it can specifically be built with make target
    • The dependency list follows the target and a colon and an update to any of these dependency files will cause the target’s action(s) to run
    • Each action line follows a target line and starts with a tab character. Each action line is run when the target needs to be rebuilt
  • A dependency will cause the target to be rebuilt if its newer than the target
  • An error in a given action line will cause the action lines following it to not run unless the action line starts with a - which will allow actions lines following to continue to run even if the current one fails
  • Comments start with a # and lines can be broken onto following ones with a \
  • Makefiles provide macros that represent a predefined value defined at the start of the makefile and repeatedly used within it
    • They’re defined with name=value and used with $(name)
    • CC is the name for the c compiler, and CFLAGS is the name for options to use with the compiler
      • Use these to allow for ease of addition and reading
      • Also you don’t need CFLAGS when linking
  • Makefiles often have targets that aren’t true files, like all and clean, which are phony targets

Dynamic Memory Allocation

  • Most data in C is stored on the stack, but sometimes we want to store dynamic variables whose size we can’t predict at compile time
  • Dynamic memory allocation allows us to request memory to use while the program is running

malloc

  • malloc allocates space of the given size and returns a pointer to it, without intializing the memory
  • returns NULL if error

free

  • free releases the memory at the pointer
  • the pointer must point to the start of dynamically allocated memory
  • you must free all memory dynamically allocated if you want to avoid leaks
  • free does nothing other than release memory
  • free(NULL) does nothing

calloc

  • calloc allocates spaces to store num things of each size and initializes all the memory to 0
  • returns NULL if an error

realloc

  • reallocates dynamically allocated memory at ptr with new_size
  • Copies space over to the new space if it can’t be done in the same current space

Structures & Pointers

  • Structures can be self-referential by containing a pointer to its structure tag
  • Structures can contain each other by using a partial struct declaration (just struct name), declaring the other struct, and then fully defining the partially defined struct
  • Function pointers dont need to be deferenced to be called, nor do they need to be set to the address of the function name (the function name itself suffices)

Strings

  • Strings are character arrays with a terminating NUL byte (which does not count for its length)
  • strlen(char const *string) returns the length of the string without its terminating NUL byte
    • it returns an unsigned int so a combination of its return values can never be negative
  • Unrestricted functions
    • strcpy(char *dst, char const *src) copies the string from src into dst
    • strcat(char *dst, char const *src) copies the string in src to the end of dst
    • strcmp(char const *s1, char const *s2) compares two strings lexicographically
    • These functions assume a NUL byte exists and will run until it is found, which may not always be the case
    • An equivalent version of each of the functions exists (strncpy, strncat, strncmp) that take an extra length parameter and only go up until there
    • strchr(char const *str, int ch) searches for a character from the left
    • strrchr(char const *str, int char) searches for a character from the right
    • strpbrk(char const *str, char const *group) searches for a group of characters from the left
    • strstr(char const *s1, char const *s2) searches for a substring

Command Line Arguments

  • main(int argc, char *argv[]) is a valid declaration for main
    • argc is the number of arguments when called, including the name
    • argv is that list, with the first being the name, and argv[argc] being NULL
    • It’s also possible for main to take a third parameter char *envp[], which is an array of strings in the form KEYWORD=VALUE for all environment variables

The Preprocessor

  • #define defines a constant and the value to replace it with
  • Double underscored symbols are already predefined
  • #define also defines a macro, which is the same thing but it takes parameters
    • When defining a macro, wrap the result and uses of the parameters in parentheses
  • Macros do complete textual replacement, so statements with side effects may get executed more than once
  • Macro substitution is not done in function calls
  • Macros can invoke each other
  • Conditional compilation
    • #if cond will only include the code if cond is true
      • you need an #endif, and optionally an #elif and a #else if wanted
    • defined(NAME) checks if the following symbol is defined
    • Nested inclusion
      • Give each headerfile its own symbol, which will only include it once no matter who many times #include is done
    • #undef NAME undefines name
    • -D to the compiler will define a symbol

Process Control

  • Kernel: The layer of the OS that is always running and serves as the hardware interface
  • Not all programs have privileges, so if they want to do important things they must ask the OS to do it for them through a system call
  • Process Table: stores info about each process, page table stores info about each process, and file table stores info abotu files
  • Process can be in the states: new, ready, waiting, running, terminated
  • Signals: A message to a process to notify it of en event
    • The kill() system call sends messages, not always SIGKILL
  • Including <err.h> gives you functions that properly print errors
    • err() prints a given error string and the library explanations and quits
    • errx() prints a given error string and quits
  • fork()
    • Creates a new process and returns a pid_t, which is 0 for the child, >0 (child pid) for the parent, and -1 for an error
    • Most things are inherited
  • wait()
    • Reaps the next finishing direct child process from the process table
    • Passing in a pointer to an int will give you a status
    • Using macros like WIFEXITED and WEXITSTATUS
  • exec()
    • Runs the given programs, and never returns, and the new program replaces the process
    • If error, returns -1

:ID: 43947200-ddd3-4794-b144-db11f2fa7dc4

System-Level I/O

  • File Descriptors
    • A process-level object that is associated with a given file
    • STDIN is fd 0, STDOUT is fd 1, and STDERR is fd 2
  • open(const char *filename, int flags, int mode (optional))
    • Opens a file, returning a file descriptor

I/O

Assembly

Registers

  • 4 bytes
  • Names start with a $
  • General purpose registers are $t0 through $t9
  • Using a register in parentheses, like ($t0), means use the address that the register stores
    • Putting a constant before the opening parentheses offsets the address stored by the given amount
  • Register spilling is when we have too many values to keep only in registers

System Calls

  • Asks the OS to do something
  • Done in MIPS by loading a syscall code in $v0, arguments into $a0 through $a3, and then the syscall instruction

Syscall Codes

  • exit
    • code: 10
    • arguments: none
    • return value: none
  • print_int
    • code: 1
    • arguments: $a0 - the integer to print
    • return value: none
  • print_char
    • code: 11
    • arguments: $a0 - address of string to print
    • return value: none
  • read_int
    • code: 5
    • arguments: none
    • return value: $v0 - the integer read
  • read_char
    • code: 12
    • arguments: none
    • return value; $a0 - the character read

Directives

  • They tell the assembler something about how to assemble the program
  • .text says that what comes after goes into the text segment
  • .data says that what comes after goes into the data segment
  • .word says stores what follows on the current line to successive memory locations as ints
  • .asciiz says store the contents of the following double-quoted string as a null terminated string

Instructions

  • A command to run that consists of 3-4 letter code, and then 2-3 registers
    • The first register is always where to put the result
    • The third operand can be a literal

Labels

  • A human-readable alias for a memory address of an instruction/data item

Functions in Assembly

  • Store how to get back, jump there, jump back, reset

Stack Frame

  • A set of data that stores information about a called function
  • $fp points to the first word in the current stack frame
  • $sp points the first free location just past the stack
  • When growing the stack, make sure to leave space for local variables + 8 for $fp and $sp
  • Make sure to reload registers that you need after a function call
  • Prologue

        sub    $sp, $sp, 8
        sw     $ra, 8($sp)
        sw     $fp, 4($sp)
        add    $fp, $sp, 8
        
  • Epilogue

        lw     $ra, 8($sp)
        lw     $fp, 4($sp)
        add    $sp, $sp, 8
        jr     $ra
        
  • Parameters + Return Value + Local Variables

        main:   li      $sp, 0x7ffffffc # init sp
                li      $t0, 1
                sw      $t0, ($sp)      # put the argument on the stack
                sub     $sp, $sp, 4     # grow the stack
                jal     f               # call the function
                add     $sp, $sp, 4     # pop the arg off
                mov     $t1, $v0        # get the return value
                mov     $a0, $t0        # print it
                mov     $v0, 1
                syscall
                mov     $v0, 10         # exit
                syscall
        f:      sub    $sp, $sp, 12     # grow the stack
                sw     $ra, 12($sp)     # store the ra
                sw     $fp, 8($sp)      # store the fp
                add    $fp, $sp, 12     # set the new fp
                lw     $t0, 4($fp)      # load the argument
                add    $t0, $t0, 1
                sw     $t0, 4($sp)      # store it as a local variable
                mov    $v0, $t0         # set the return value
                lw     $ra, 8($sp)      # get the return address
                lw     $fp, 4($sp)      # get the old frame pointer
                add    $sp, $sp, 12     # decrement the stack
                jr     $ra              # go back
        

Concurrency

  • If it can be active at multiple places, it’s concurrent
  • Threads are part of the same program and can share data
  • To use pthreads in C, use -l with gcc

Threads

  • Threads share heap memory, global and static memory, files, and a virtual address space but have their own thread id, runtime stack, and other important registers (stack pointer, etc.)
    • This means they share global variables, static local variables, and dynamically allocated data

int pthread_create(pthread_t *tid, pthread_attr_t *attr, void *(*func)(void *), void *arg)

  • tid is a pointer that will be filled in with an id
  • attr is NULL
  • func is the function to execute
  • arg passes an argument
  • 0 on success, nonzero on erro

pthread_t pthread_self(void)

  • Get’s own thread id

void pthread_exit(void *retval)

  • Terminates the calling thread, which returns the retval

int pthread_cancel(pthread_t tid)

  • Terminates the thread with the given tid
  • Returns 0 on success, nonzero on error

int pthread_join(pthread_t tid, void **retval)

  • Reaps the given thread and frees its memory usage
  • Blocks until termination (if already terminated, instant)
  • Returns 0 on success, nonzero on error
    • If success, *retval gets the return value

Synchronization

  • Force concurrent operations to happen in some relative order
  • Code that modifies a shared variable is a critical section

Atomicity

  • When something starts running by one thread and is not interrupted until it finishes

Semaphore

  • A special integer counter
  • If counter is 0, threads must wait
  • If counter is nonzero, threads process and counter is decremented
  • Once a thread is done, it tells the semaphore, which then increments the counter
  • If a counter becomes nonzero while threads are waiting on it to do so, one of the threads will be picked to run
  • int sem_init(sem_t *sem, 0, unsigned int value)

    • Intializes the semaphore given to value
    • 0 on success, nonzero on error
  • Wait

    • The P operation
    • Blocks if counter is 0
    • Returns if counter is positive
    • Atomic operation
    • int sem_wait(sem_t *sem)
      • 0 on success, nonzero on error
  • Post

    • The V operation
    • Increments the counter
    • Unblocks a random waiting thread if threads are waiting
    • Atomic operation
    • int sem_post(sem_t *sem)
      • 0 on success, nonzero on error

Mutual Exclusion

  • Only allow a certain number of threads to execute a block of code

Condition Synchronization

  • Control the relative order of actions performed by different threads based on some condition
  • sem_wait and sem_post are usually in different threads