# A Free Online Textbook Introducing Computer Architecture Topics

#### **Tia Newhall**

Swarthmore College, Swarthmore, PA, USA

#### Suzanne J. Matthews

U.S. Military Academy, West Point, NY, USA

#### Kevin C. Webb

Swarthmore College, Swarthmore, PA, USA



#### diveintosystems.org

The opinions expressed in this presentation are solely those of the authors and do not necessarily reflect those of the U.S. Military Academy, the DoD, or the U.S. Army.

## Dive into Systems:

Free, online textbook introducing systems, architecture & parallel computing, available at diveintosystems.org. Anyone with internet access can use our book!

Book cover: *Dive into Systems: A Gentle Introduction to Computer Systems*, by Suzanne J. Matthews, Tia Newhall, and Kevin C. Webb

#### Also new low-cost print version

- Published by No Starch Press
- For readers who want a print version
- Will always also remain free online!

No Starch Press, September 2022

Screenshot: the online book at diveintosystems.org/book/C5-Arch/von.html, showing Section 5.2, "The von Neumann Architecture," with the table of contents in the sidebar and a figure of the five units (processing, control, memory, input, output) connected by buses.

## Why a free online textbook?

#### Selfish: We couldn't find a "right fit" textbook for our courses

Intro to a broad range of systems, architecture, and parallel topics at the intro sequence level (assumes only a CS1 background)

#### Altruistic: Create Useful Resource to Share Widely

- Free (cost not a barrier to access)
- Online (easy to access and update)
- Useful resource for lots of different uses
  - "Mix and match" content easily
  - Primary text: intro. systems, computer organization, C programming, ...
  - Supplemental text: Arch, OS, Compilers, P&D, DB, ...



Source: The Economist

## **Content Overview**

#### Three Main Themes:

- 1. How a computer runs a program
- 2. How systems costs affect program performance (Memory Hierarchy, other)
- 3. How to leverage the power of parallel computing

#### Main Architecture Content

- Chapter 5 on Computer Architecture
- Chapter 11 on Memory Hierarchy and Caching
- Binary Representation & Arithmetic
- Some Parallel Architecture Coverage: Chapters 5, 11, 14, 15
- HW-OS interface: TLB, VM, interrupts, user/kernel level

Figure: book topic areas, including C Intro, C Depth, C Debugging, Code Optimization, Operating Systems, Shared Memory Parallel, and Other Parallel

<u>Coming soon</u>: Using Unix Appendix

## Von Neumann Architecture and Computer Architecture History



How it executes instructions:

- Fetch-Decode-Execute-StoreResult
- PC and IR
- Instruction: opcode & operands


Images: Lerner Books, sciencemuseum.org, IEEE Spectrum, Encyclopedia Britannica

## CPU Architecture: How Computer Runs a Program

Build a simple CPU from the bottom up, starting with 1-bit circuits built from logic gates

1. Create truth table for operation

| A | B | A == B |
|---|---|--------|
| 0 | 0 | 1      |
| 0 | 1 | 0      |
| 1 | 0 | 0      |
| 1 | 1 | 1      |

2. Write an expression for each row with output 1 using AND, OR, NOT, then combine the rows with OR: (NOT(A) AND NOT(B)) OR (A AND B)
3. Translate the expression into a sequence of logic gates from inputs to output
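The three steps can be checked in a few lines of code. This is a minimal sketch, not code from the book: the gate helpers `AND`, `OR`, `NOT` and the function name `equals_1bit` are names chosen here for illustration.

```python
from itertools import product

# Logic gates modeled as functions on single bits (0 or 1).
def AND(a, b):
    return a & b

def OR(a, b):
    return a | b

def NOT(a):
    return 1 - a

def equals_1bit(a, b):
    """Step 3: the expression (NOT(A) AND NOT(B)) OR (A AND B) as gates."""
    return OR(AND(NOT(a), NOT(b)), AND(a, b))

# Step 1: the truth table for A == B, keyed by (A, B).
truth_table = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1}

# Compare the gate-level circuit against every row of the truth table.
for a, b in product((0, 1), repeat=2):
    print(a, b, equals_1bit(a, b), truth_table[(a, b)])
```

Each printed row should show the circuit output and the truth-table output agreeing.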





## Abstraction and building up complexity

The 1-bit version of a circuit is a building block for multi-bit versions, which in turn can be building blocks for larger units, ...
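One way to picture this layering is a multi-bit adder chained together from copies of a 1-bit adder. The sketch below is an illustration chosen here, not the book's circuit: the ripple-carry design and the names `full_adder`, `ripple_carry_add`, `to_bits`, and `from_bits` are all assumptions.

```python
def full_adder(a, b, carry_in):
    """1-bit full adder expressed with XOR, AND, and OR on single bits."""
    partial = a ^ b
    total = partial ^ carry_in
    carry_out = (a & b) | (partial & carry_in)
    return total, carry_out

def ripple_carry_add(x_bits, y_bits):
    """Add two equal-length bit lists (least significant bit first)
    by feeding each 1-bit adder's carry-out into the next adder."""
    carry = 0
    result = []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    return result, carry

def to_bits(value, width):
    """Convert an integer to a list of bits, least significant first."""
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits):
    """Convert a list of bits (least significant first) to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))

# A 4-bit adder made from four 1-bit adders: 5 + 6.
sum_bits, carry_out = ripple_carry_add(to_bits(5, 4), to_bits(6, 4))
print(from_bits(sum_bits), carry_out)
```

The same `full_adder` could be reused unchanged for an 8-bit or 32-bit adder, which is the point of the abstraction.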



## Build up large functional units

From simple arithmetic/logic, control, and storage circuits
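As a rough illustration of combining arithmetic/logic circuits with a control circuit, the sketch below models a tiny ALU in which every operation computes its result and a multiplexer selects one. The 4-bit width, the opcode numbering, and the names `mux4` and `alu` are assumptions made for this example, not the book's design.

```python
WIDTH = 4                      # assumed operand width in bits
MASK = (1 << WIDTH) - 1

def mux4(inputs, select):
    """Control circuit: a 4-way multiplexer; a 2-bit select picks one input."""
    return inputs[select & 0b11]

def alu(op, a, b):
    """Tiny ALU: each arithmetic/logic circuit produces a result,
    and the multiplexer chooses which one becomes the output."""
    results = [
        a & b,            # op 0: bitwise AND
        a | b,            # op 1: bitwise OR
        (a + b) & MASK,   # op 2: ADD, truncated to WIDTH bits
        int(a == b),      # op 3: equality test
    ]
    return mux4(results, op)

print(alu(2, 0b0101, 0b0110))
```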



#### Put it all together:

Clock-driven execution with the IR and PC: step through the 4 stages of execution (for instructions whose operands are all registers)





A. Issue read request to memory using the memory address in PC.



B. Store instruction data in IR and increment PC.
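The stages can be sketched as a loop over a toy machine. This is a simplified model under stated assumptions: instructions are represented as hypothetical `(opcode, dest, src1, src2)` tuples with register operands only, and the register names and the `run` function are made up for illustration.

```python
def run(memory, registers, steps):
    """Step a toy CPU through Fetch, Decode, Execute, Store
    for `steps` instructions. All operands are registers."""
    pc = 0       # program counter: memory address of the next instruction
    ir = None    # instruction register: the instruction being executed
    operations = {
        "ADD": lambda x, y: x + y,
        "SUB": lambda x, y: x - y,
        "AND": lambda x, y: x & y,
    }
    for _ in range(steps):
        # Fetch: read memory at the address in PC, store it in IR, increment PC.
        ir = memory[pc]
        pc += 1
        # Decode: split the instruction into its opcode and operands.
        opcode, dest, src1, src2 = ir
        # Execute: apply the operation to the source register values.
        result = operations[opcode](registers[src1], registers[src2])
        # Store: write the result to the destination register.
        registers[dest] = result
    return pc, ir

# Hypothetical two-instruction program held in memory.
memory = [("ADD", "r2", "r0", "r1"), ("SUB", "r3", "r2", "r0")]
registers = {"r0": 10, "r1": 30, "r2": 0, "r3": 0}
pc, ir = run(memory, registers, steps=2)
print(pc, registers)
```

A real processor performs these stages in hardware on clock edges; the loop only mirrors their order.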









## Parallel Architecture

- In more detail
  - Pipelining
  - Multicore





- Also a high-level overview of others (Chapters 5, 14, 15)
  - ILP, Superscalar, Vector Processors
  - Hardware Multithreading
  - Accelerators, GPU as example
  - Flynn's Taxonomy
  - Moore's Law, Power Wall
  - Performance metrics






## Book Development: History of community help

External reviews of every chapter by experts in our field (mostly faculty):

- Volunteers, multiple for each chapter
- Strengthened content and presentation
- Helped ensure broad applicability of our textbook

2019-20: Early Adopters Program: Beta Version of our textbook (SIGCSE'20)

- Required textbook at 19 different institutions
- Small stipend (\$100) to faculty from SIGCSE Special Projects Grant
- Feedback on its use in different courses
- Helped further refine topic coverage and presentation

People are eager to volunteer for a resource that fills a need and is free online

#### Book Use

We know of ~50 different institutions using it in their courses

19 Early Adopter Institutions (2019-20)

- Most as primary textbook in intro to computer systems or computer organization courses
- Some using in Architecture, OS, Parallel Computing as primary or supplementary textbook

Figure: 2020 Survey of Early Adopters, types of courses using *Dive into Systems* as a required text



## Our Current Effort (NSF funded)

- Primary: Adding Interactive Content to Dive into Systems
  - Online format: ideal for adding other types of content to promote student learning
  - Develop interactive exercises for book chapters
  - Also adding videos of worked examples/solutions
- Secondary: Developing Instructor Portal Content
  - exercises, programming/lab assignments, links to example curricula using Dive into Systems, …

## Adding Interactive Exercises

- Seeking Exercise Developers from larger CS community
  - Use the expertise and help from our larger community!
  - Diversity of uses/ideas/school type/participants
  - NSF funding to provide stipends (\$1,000) to some, also volunteers
  - Groups develop interactive exercises for book chapters
- Students at our institutions
  - Develop tools, implement exercises in Runestone
- 4-year plan for topic groups:
  - Year 1: 2022-23: C programming, Assembly Programming
  - Year 2: 2023-24: Binary, Memory Hierarchy & Caching
  - Year 3: 2024-25: OS, Shared Memory Parallel Computing
  - Year 4: 2025-26: Architecture

All contributors acknowledged for their work!

#### **Our Current Interactive Tool Development**

#### **Tool Demos**

ASM Visualizer: assembly code tracing

"Ask me another" new functionality added to Runestone\*

\*Runestone (by Brad Miller) is the main tool we are using to build, and provide an interface to, our interactive exercises

#### **ASM Visualizer**

1. Type in assembly code & submit

Welcome! You are using ASMVisualizer in function mode. In this mode you can write multiple functions to be called by our\_main. Please type your assembly code below and click submit.

| Assembly code                                   |
|-------------------------------------------------|
| .text                                           |
| .globl our_main                                 |
| .type our_main, @function                       |
|                                                 |
| our_main:                                       |
| push %rbp                                       |
| mov %rsp, %rbp                                  |
|                                                 |
| # Add your code for the our_main function here: |
| mov \$10, %rax                                  |
| add \$30, %rax                                  |
|                                                 |
| pop %rbp                                        |
| retq                                            |
| .size our_main, .-our_main                      |

2. Trace its execution: next/prev show registers, stack, and instructions

Instructions (Step 4 of 6; controls: First, Prev, Next, Last):

| Status          | Line | Address  | Instruction                                     |
|-----------------|------|----------|-------------------------------------------------|
|                 | 6    | 0x401117 | push %rbp                                       |
|                 | 7    | 0x401118 | mov %rsp, %rbp                                  |
|                 | 9    |          | # Add your code for the our_main function here: |
| just executed   | 10   | 0x40111B | mov \$10, %rax                                  |
| next to execute | 11   | 0x401122 | add \$30, %rax                                  |
|                 | 13   | 0x401126 | pop %rbp                                        |
|                 | 14   | 0x401127 | retq                                            |

Stack Content:

| Pointer  | Address      | Value        |
|----------|--------------|--------------|
| RSP, RBP | 0x1FFF000200 | 0x1FFF000210 |
|          | 0x1FFF000208 | 0x401110     |
|          | 0x1FFF000210 | 0x401130     |

Register Contents:

| Register | Value        |
|----------|--------------|
| RAX      | 0xA          |
| RSP      | 0x1FFF000200 |
| RBP      | 0x1FFF000200 |
| RFLAG    | 0x44         |

- Cache Organization: 2-Way Set Associative
- Address Length: 8 bits
- address: 0b10011010
- tag: 2, index: 4, offset: 2
- block size (in bytes) = 4
- number of lines = 32
- number of sets = 16
- Buttons: Generate an Address, Check me
- Feedback: Good job!

Activity: 2 Cache System (test_caching_info)

- Cache Organization: 2-Way Set Associative
- Address Length: 8 bits
- block size: 8, total number of lines: 8
- Usage: Select a range of bits, and then click its corresponding button below.
- address: 0b 1 1 1 1 1 0 0
- Your current tag bits: 3, your current index bits: 2, your current offset bits: 3
- Buttons: Set to Tag, Set to Index, Set to Offset, Reset selection, Generate an Address, Check me
- Feedback: Correct. Good job!

#### "Ask me another" question like the current one

cache organization, size and address bits
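The field sizes in these exercises follow from the cache parameters. The sketch below is one way to compute them, assuming power-of-two block sizes and set counts; the function names `address_fields` and `split_address` are hypothetical and not part of Runestone or the book's tools.

```python
def address_fields(address_bits, block_size, num_lines, ways):
    """Return (tag_bits, index_bits, offset_bits) for a cache.
    Assumes block_size and the number of sets are powers of two."""
    num_sets = num_lines // ways
    offset_bits = block_size.bit_length() - 1   # log2(block_size)
    index_bits = num_sets.bit_length() - 1      # log2(num_sets)
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

def split_address(address, address_bits, block_size, num_lines, ways):
    """Split an address into its (tag, index, offset) values."""
    _, index_bits, offset_bits = address_fields(
        address_bits, block_size, num_lines, ways)
    offset = address & ((1 << offset_bits) - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset

# 2-way set associative, 8-bit addresses, 4-byte blocks, 32 lines.
print(address_fields(8, 4, 32, 2))
# 2-way set associative, 8-bit addresses, 8-byte blocks, 8 lines.
print(address_fields(8, 8, 8, 2))
# Field values for the address 0b10011010 in the first configuration.
print(split_address(0b10011010, 8, 4, 32, 2))
```

The first two calls reproduce the bit counts shown in the exercises: 2/4/2 and 3/2/3 for tag/index/offset.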

Trace through sequence of addresses, answer questions about effects on cache: direct mapped or 2-way set associative

Cache Table Info:

- Direct-Mapped
- 8-bit Address
- Block Size: 4
- Number of rows: 8

Cache contents:

| Index | V | D | Tag |
|-------|---|---|-----|
| 0     | 0 | 0 |     |
| 1     | 0 | 0 | 111 |
| 2     | 0 | 0 |     |
| 3     | 0 | 0 |     |
| 4     | 0 | 0 | 110 |
| 5     | 0 | 0 |     |
| 6     | 1 | 0 | 101 |
| 7     | 1 | 0 | 010 |

Memory references (the last row is not yet answered):

| Ref | Address  | R/W | Hit or Miss | Index | V | D | Tag |
|-----|----------|-----|-------------|-------|---|---|-----|
| 0   | 10111000 | R   | Miss        | 6     | 1 | 0 | 101 |
| 1   | 01011100 | R   | Miss        | 7     | 1 | 0 | 010 |
| 2   | 10111000 | W   |             |       |   |   |     |

Buttons: Generate another, Check me. Feedback: Correct. Good job!

Activity: 4 Cache Table (test_caching_table)
## Interested in Participating?

- Join the *Dive into Systems* <u>mailing list</u> (linked from diveintosystems.org)
- Look for announcements posted to the SIGCSE mailing list
- Sign up now: <u>https://forms.gle/sHUnEsjSVWLptrMo8</u>
  - Link also available as a QR code (right).
  - We will send emails with yearly deadlines
- Timeline:
  - Year 2: 2023-24: Binary Representation, Memory Hierarchy and Caching
  - Year 3: 2024-25: OS, Shared Memory Parallel Computing
  - Year 4: 2025-26: Architecture



Do you use our book? Please let us know!

# Thank you!

Questions?



Interested in participating in our new effort? https://forms.gle/sHUnEsjSVWLptrMo8

Read our book or join our mailing list: diveintosystems.org