Use small, cache line aligned data types; such that objects of those types
won’t spread across multiple cache lines or even worse, go outside of cache.
Temporal locality. Make sure hot data are processed by hot code. Examples
include merging loops that touch the same array into one (called Loop Fusion).
Let’s start from the simplest case. I have two machines. How do I make them
communicate with one another? By communicate, I mean, sending a text message,
an email, an image, or a video from one to the other.
Throughout the history of computer networking there has been multiple ways to
achieve this, but in this post I focus on the modern, industry standard way. The
bare minimum I need are: two Ethernet cables, one for each computer, and a
Ethernet switch. The switch needs at least two ports, and an end of the each
Ethernet cable needs to be plugged in to one of these ports.
A text message is represented by the operating system of machine A as a sequence
of ASCII codes, each of which is four bytes. A byte is eight bits, with a value
of either 0 or 1. A bit, after everything, is what the operating system uses to
talk to hardware components, whether it’s saving to memory, disk, or
transmitting over network. An email is a structured text, which fundamentally
are ASCII codes. An image, is a matrix of tuples, each tuple being represented
by three numbers. Numbers, as ASCII letters, are transmitted as bits. A video,
after some even more complex decoding, is essentially a sequence of bits as
well.
To transmit a bit out, the operating system puts the bit to a buffer of the
Network Interface Card (aka NIC, in our case, the Ethernet card on machine A).
The NIC generates a electric vibration, which is carried over by the Ethernet
cable to the port it connects to of the switch. The switch, a simple integrated
circuit board, gets that vibration and relays it to the other port, which
machine B connects to. Through that other Ethernet cable, machine B’s NIC
receives the vibration, and tells the operating system that a bit of a certain
value has been received. Of course this is an overly simplified version of
what’s really happening (for example the NIC ususally sends more than one bit at
a time, rather it sends a packet or a frame), but it is the principle of how
things are transmitted at the physical layer.
More Machines - MAC Addresses
You may have noticed that a switch typically has 5 or more ports. Some
enterprise switches may have 20+ ports. In fact, often times more than two
machines are connected to the same switch. Let’s assume we add machine C to mix.
How would that change the way machine A sends a packet to machine B?
Machine A’s NIC still generates a sequence of vibrations which are carried over
to the switch. When the switch gets the vibrations (the packet) though, how can
it be sure to rely them to the port where machine B is connected and not the
port machine C is connected? If switch was replaced by a Ethernet Hub, the hub
would have sent the packet to ALL connected ports. This is called packet
repeating or broadcast. This can be very inefficient, insecure, and causes
unnecessary congestion in the hub. A switch, on the other hand, has a smart
trick to make sure packets are relayed to and only to the right port. The trick
is called MAC addressing.
Each NIC connected to the same switch has a unique MAC address. It is a 48-bit
number. While in reality MAC addresses can be updated using software by a system
administrator, for the purpose of this post we will assume that NIC manufactures
take the right cautions to make sure each NIC gets a universally unique MAC
address. When a NIC is connected to a port of a switch, the switch learns its
MAC address. When machine A wants to send a packet to machine B, it includes
machine B’s NIC’s MAC address in the packet’s header, as the packet’s
destination. When the switch sees such a packet, it recognises the destination
address, and relays to the port that machine B is connected to. In the same way,
machine A, B and C, can send packets to one another and the switch will be the
reliable mail man that makes sure they are delivered correctly.
Networks, Routers and IP Addresses
The three machines have formed a small network, in the sense that any one in
the network could talk to any other one. If I get two more computers, I can plug
them into the same switch and I will have a network of five hosts. Now, if I
get another machine, or two more machines, how can I add them to the network? I
don’t have a spare port on the switch any more. Apparently I am going to need
another switch, which can connect five more machines. Let’s assume I now have
two fully occupied switches and ten machines like the following, how could
machine A on switch 1 speak to machine AA on switch 2? Since machine AA is not
connected to switch 1, when switch 1 gets the packet from machine A it doesn’t
know where to send it to. The reserve direction is true too.
[Image of two switches, with a question mark between them]
Sounds like we need something in between them that is willing to forward
packets from one end to the other. And such thing exists as routers. A router
is a device that has two or more NICs, and forwards packets from one to another
according to its routing table. Now let’s add a router with two NICs to the
diagram. One of its NIC connects to a spare port of switch 1, the other connects
to switch 2. To switch 1 and 2, the two NICs are no different than the other
NICs which belong to machines.
With this setup, machine A needs to make some change in the way it sends packets
out too. Though it still wants to send a packet to machine AA, it puts the
router’s NIC1’s MAC address in its packet to switch 1. Seeing that, switch 1
relays it to the router’s NIC1. Getting that, the router looks at its routing
table, and forwards the packet out through its NIC2, which is connected to
switch 2. Before forwarding the packet though, the router replaces the
destination MAC address with the MAC address of machine AA so that when switch 2
gets the packet it will relays it to machine AA. This is how machines behind two
switches communicate with each other - through a router!
IP address. There are some important details we left out in the previous
example. For example, how does machine A know to put the router’s MAC address as
destination? When the router gets the packet the there is nothing that says
about machine AA, how does it know it’s targeted for AA? Sounds like we need
some more metadata in the packet. And that is IP address. In fact, IP
addresses (and IP address ranges) are the key used in a routing table.
First all, every machine on a network is configured with its router’s IP
address. Every machine also has its own routing table. After some complex
network probing process that we will leave out deliberately for keeping this post
brief and focused, machine A understands, from its routing table, that to reach
machine AA’s IP, it needs to send the packet to the router. In the packet it
sends out, it will set the destination MAC address as the router’s NIC1’s, but
the destination IP address as machine AA’s. When the router gets the packet,
it recognizes that IP address, looks it up in its routing table, and finds the
MAC address of machine AA. This is possible because the router is also connected
to switch 2. It replaces the destination MAC address with machine AA’s, sends it
off to switch 2, which subsequently relays it to machine AA.
Now we can see how an IP address in addition to a MAC address is necessary in a
network packet. An IP address based communication crosses the boundary of a
switch.
I am trying to build up my Spanish vocabulary by reading books. Every week I
learn about 200 new words. I’ve been using Quizlet in the past to put new words
into study sets which are basically sets of flash cards. With 200 new words
every week it’s almost impossible to type them in by hand. I am a software
engineer, so naturally I seek a programmatic way, to scale up. Here in this
post I write about the solution I have come up with, and as you will see shortly
it is pretty effective.
Children’s books
My Spanish vocabulary is limited to reading children’s books. Every two weeks I
borrow a new book from the library. In my first round I try to go through the
entire book before stopping for new words, while circling them out throughout
the process. At the end, I typically get about 200 new words that I need to
study.
Translate
Now that I have circled out 200 words from the book, I create a Google
Spreadsheet, and start typing each word in the vertical column. Once I am done
typing, I copy the column, and paste it to the column next to it. I select the
entire second column, and click “Add-ons” > “Translate My Sheet” > “Start a new
translation”. In the pop-out window, I select the source and target languages,
and select “Override cells”. Once I click the “Translate” button, a magic
sequence happens and all the words in the second column will be replaced by
their English translations.
Flashcards
So that saved me a lot of time typing and translating each word using Google
Translate or SpanisDict, but having it in a spreadsheet isn’t as handy as in the
flashcards that I like a lot from Quizlet and TinyCards. I read online that
Quizlet does have an API that can be used to create cards programmatically, but
for some reason they’ve paused the API since December 2018. So I needed to
find an alternative.
Anki
A number of search results pointed to this open sourced flash card application
that lets you bulk create flash cards from a CSV file. It has a desktop version
(Mac, Windows, Linux) and a mobile version (iPhone,
Android).
Better yet, you could sync the cards you’ve created on your desktop to your
mobile app - exactly what I wanted, via a free web service called AnkiWeb. You
do have to sign up for an account and acknowledge some privacy agreements, but
if you are only uploading third-grade Spanish words you probably don’t care too
much. It’s clearly stated on the website that it is run by the money the Anki
developers get from the paid iPhone app so they may start charging in the future
for the sync service if that revenue couldn’t sustain the cost of running the
service. I like their honesty.
Anyway, I installed the Mac version on my iMac, downloaded the Google
Spreadsheet I just made as a CSV file (the Google Spreadsheet UI lets you do
that), and “imported” it into the Anki app as a new set. As can be seen from the
following screenshot, it took almost no time to load a 200 word set.
Sync
Now that I’ve created a new deck, how do I get it on my Android app? Well, I
first need to upload the new deck through the Mac app to AnkiWeb.
If I go to the web UI of AnkiWeb now, I can see that the new deck has just been
uploaded. I can browse cards and study right from the web UI if I wanted. But I
could also log in to AnkiWeb on my AnkiDroid app, and in a few seconds I will
see the same deck there too. It’s that easy.
Summary
To summarize, I read a book, circling out the new words while reading it. When I
finish reading, I go to my computer, and type those new words in a Google
Spreadsheet. With the help of the Google Translate “add-on”, I get the
translations of every word in less than a minute. I download the spreadsheet as
a CSV file, import it into the Anki app on my iMac. That will create a new deck
with all the word pairs automatically. After syncing with AnkiWeb, I go to my
AnkiDroid app and do the sync there, I get to see the same deck. Ready to study!
An open source implementation of the legacy bios. Developed for and used by
QEMU. Can not run directly on hardware. But if coreboot is used as the firmware,
then SeaBios can run as a payload of coreboot.
EFI
An upgrade from BIOS. Full name: Extensible Firmware Interface.
UEFI
Unified EFI. A unitified version of EFI. Promoted by Microsoft. Windows
releases after September 2017 has stopped support for “legacy” BIOS, which is
the traditional BIOS before EFI.