wonderfly Blog

Demonstrative Pronouns

Definition

This, that, these, those. For example, “I like this”, “I want that”.

In Spanish, demonstrative pronouns look identical to demonstrative adjectives, except that the masculine and feminine pronouns carry an accent mark. For example, “This car” is Este coche while “I like this (car)” is Me gusta éste.

Note also that there are three neuter demonstrative pronouns, which roughly correspond to the English “it”. Full table:

              Masculine   Feminine    Neuter
this          éste        ésta        esto
these         éstos       éstas       -
that          ése         ésa         eso
those         ésos        ésas        -
that (far)    aquél       aquélla     aquello
those (far)   aquéllos    aquéllas    -

Note that the neuter ones don’t carry an accent mark.

UTLK - CHAPTER 1 Introduction

Kernel Architecture

Monolithic vs. microkernels. Like most Unix kernels, Linux is monolithic: the whole kernel is a single binary running in one address space. The major advantage of a monolithic kernel over a microkernel is performance, because it avoids the message passing between processes that a microkernel relies on. That said, microkernels have a few legitimate advantages of their own:

  • Modularized design. Kernel components are broken down into smaller binaries, and the communication between them requires well-designed interfaces.
  • Portability. Because the platform-dependent portion is reduced to a minimum (only the main kernel), it’s easier to port a microkernel to a different platform.
  • Better memory usage. Unneeded kernel code isn’t loaded into memory, unlike a monolithic kernel, which is always mapped in its entirety.

To achieve some of the above advantages, Linux offers modules:

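  • Modules force good modular design. The interface between a module and the main kernel must be well defined.
  • Platform independence. A module does not depend on a fixed hardware platform.
  • Frugal memory usage. Tiny systems like embedded devices can choose not to load all modules.
  • No performance penalty. Once loaded, a module’s code runs in the kernel’s address space just like statically linked code, so calling it requires no message passing. Loading and unloading a module is also much cheaper than creating and destroying a process, as in the microkernel model.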

An Overview of the Unix Filesystem

At the heart of the Unix operating system is its file system.

Files and Directories

A file is a sequence of bytes. A file system is a user level abstraction of the data stored on physical devices like a hard disk, because user processes don’t interact with devices directly.

A directory contains information about the files and directories beneath it.

A file can have multiple names, in the form of links. A file is only deleted when its number of links decreases to zero. Links have two major limitations:

  • Links to directories are not allowed. This prevents cycles in the directory tree.
  • Links cannot cross filesystem boundaries. Several filesystems may be mounted at the same time, for example from two hard drives, or from two partitions on the same drive.

Soft links (symbolic links) overcome these limitations. The name distinguishes them from regular, or hard, links. Soft links can point to directories and can cross filesystem boundaries. To create a soft link:

ln -s <target> <link>
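You can see the difference between hard and soft links with a small C sketch that uses the POSIX link(), symlink() and lstat() calls. The helper names (make_links, same_inode, is_symlink) are hypothetical. A hard link should share its target’s inode, while a soft link is a separate small file with its own inode.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create an empty file at target, then a hard link and a soft link to it.
   Returns 0 on success, -1 if any step fails. */
static int make_links(const char *target, const char *hard, const char *soft) {
    int fd = open(target, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) return -1;
    close(fd);
    if (link(target, hard) != 0) return -1;     /* same as: ln target hard */
    if (symlink(target, soft) != 0) return -1;  /* same as: ln -s target soft */
    return 0;
}

/* 1 if a and b name the same inode on the same device. lstat() does not
   follow symlinks, so a soft link is compared as the file it really is. */
static int same_inode(const char *a, const char *b) {
    struct stat sa, sb;
    if (lstat(a, &sa) != 0 || lstat(b, &sb) != 0) return 0;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* 1 if path is itself a symbolic link. */
static int is_symlink(const char *path) {
    struct stat st;
    return lstat(path, &st) == 0 && S_ISLNK(st.st_mode);
}
```

After make_links("/tmp/a", "/tmp/b", "/tmp/c") (hypothetical paths), same_inode("/tmp/a", "/tmp/b") should be true, same_inode("/tmp/a", "/tmp/c") false, and is_symlink("/tmp/c") true.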

File Types

  • Regular files, directories, symlinks
  • Block device files, character device files, pipes, sockets
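The inode’s mode bits record the file type, and POSIX provides S_IS* macros to test it. Below is a minimal sketch that assumes a POSIX system. file_type_char is a hypothetical helper that returns the same letter ls -l prints in its first column.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Map path to the one-letter type shown by `ls -l`, or '?' on error.
   lstat() is used so that a symlink reports itself, not its target. */
static char file_type_char(const char *path) {
    struct stat st;
    if (lstat(path, &st) != 0) return '?';
    if (S_ISREG(st.st_mode))  return '-';  /* regular file */
    if (S_ISDIR(st.st_mode))  return 'd';  /* directory */
    if (S_ISLNK(st.st_mode))  return 'l';  /* symlink */
    if (S_ISBLK(st.st_mode))  return 'b';  /* block device */
    if (S_ISCHR(st.st_mode))  return 'c';  /* character device */
    if (S_ISFIFO(st.st_mode)) return 'p';  /* pipe (FIFO) */
    if (S_ISSOCK(st.st_mode)) return 's';  /* socket */
    return '?';
}
```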

File Descriptor and Inode

Unix distinguishes between the contents of a file and its metadata. The former is purely a sequence of bytes. The latter is a data structure that holds information such as the file type, owner, access rights, size, number of links, and timestamps (last access, last modification, last inode change). This data structure is called the inode, and each file has its own inode.
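stat() returns a copy of a file’s inode fields without reading the file’s contents, which makes the split between content and metadata visible. This is a sketch that assumes POSIX. print_inode is a hypothetical helper, and the selection of fields is only an example. Note that ctime is the time of the last inode change, not a creation time.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Print a few inode fields of path. stat() reads only the metadata,
   never the file's bytes. Returns 0 on success, -1 on error. */
static int print_inode(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0) return -1;
    printf("%s: inode=%lu links=%lu size=%lld uid=%lu gid=%lu perms=%03o\n",
           path, (unsigned long)st.st_ino, (unsigned long)st.st_nlink,
           (long long)st.st_size, (unsigned long)st.st_uid,
           (unsigned long)st.st_gid, (unsigned)(st.st_mode & 0777));
    printf("  atime=%lld mtime=%lld ctime=%lld\n", (long long)st.st_atime,
           (long long)st.st_mtime, (long long)st.st_ctime);
    return 0;
}
```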

Access Rights and File Mode

Owner, group, other, read, write, execute, suid, sgid, sticky. Note: the “sticky” bit on an executable tells the kernel not to release its code block from memory even when the process that executes it has terminated. It has been deprecated.

File-Handling System Calls

Opening a file

  fd = open(path, flag, mode)

The return value, fd, is the File Descriptor: an index into the process’s table of open files. Each entry points to an open file object, which holds the current offset (the file pointer), pointers to the kernel functions the process may invoke on the file, and so on.

Accessing an opened file

read() and write() transfer bytes starting at the file pointer and advance it; lseek() moves the file pointer.

Closing a file

close.
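Here is a minimal C sketch that puts open(), lseek(), read() and close() together to read a slice of a file. It assumes a POSIX system. read_at is a hypothetical helper, and the comments mark where the file pointer in the open file object moves.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to n bytes of path starting at byte offset off.
   Returns the number of bytes read (0 at end of file) or -1 on error. */
static ssize_t read_at(const char *path, off_t off, char *buf, size_t n) {
    int fd = open(path, O_RDONLY);        /* fd indexes this process's open files */
    if (fd < 0) return -1;
    ssize_t got = -1;
    if (lseek(fd, off, SEEK_SET) == off)  /* move the file pointer */
        got = read(fd, buf, n);           /* read from it, advancing it by got */
    close(fd);                            /* release the descriptor */
    return got;
}
```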

Renaming and deleting a file

These operations actually act on the directory that contains the file. To rename a file, use rename(oldpath, newpath). To delete a file, use unlink(pathname). As the name suggests, this decreases the file’s link count by one. The file itself is not deleted until its link count drops to zero.
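You can watch the link count directly through the st_nlink field that stat() returns. This sketch assumes a POSIX filesystem that supports hard links. nlink_of and unlink_walkthrough are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Link count stored in path's inode, or -1 if no such name exists. */
static long nlink_of(const char *path) {
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_nlink : -1;
}

/* Create a, hard-link it as b, then unlink both names, recording the
   link count after each step in counts[0..2]. Returns 0 on success. */
static int unlink_walkthrough(const char *a, const char *b, long counts[3]) {
    int fd = open(a, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) return -1;
    close(fd);
    if (link(a, b) != 0) return -1;
    counts[0] = nlink_of(a);   /* 2: two names, one inode */
    if (unlink(a) != 0) return -1;
    counts[1] = nlink_of(b);   /* 1: the file survives under b */
    if (unlink(b) != 0) return -1;
    counts[2] = nlink_of(b);   /* -1: last name gone, file deleted */
    return 0;
}
```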

An Overview of Unix Kernels

A quick overview of almost all aspects of a Unix kernel.

The Process/Kernel Model

CPUs have multiple execution states, two of which are “Kernel Mode” and “User Mode”. In Unix, user processes run in “User Mode”, while kernel threads run in “Kernel Mode”. Most of the time, some user process is running on the CPU while the kernel is largely idle, apart from a few lightweight “kernel threads”. The main body of the kernel runs only in a few cases:

  • When a user process explicitly requests a kernel service through a system call.
  • When an exception happens while a user process is executing, e.g., a page fault.
  • When a hardware device raises an interrupt to notify the CPU of an event.

Process Implementation

The kernel isn’t a process but a process manager. Implementing processes comes down to pausing and resuming them. To pause a running process, the kernel saves its state into a data structure, the process descriptor. To resume it, the kernel loads that state back into the CPU, so the process continues as if it had never stopped. The saved and restored state includes:

  • The program counter (PC) and stack pointer (SP) registers
  • The general purpose registers
  • The floating point registers
  • The memory management registers used to keep track of memory accessed by the process

Reentrant Kernels

All Unix kernels are reentrant. This means a kernel execution path can be suspended while another takes over the CPU, and then resumed when the latter finishes, so several processes can be executing in Kernel Mode at the same time.

Process Address Space

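Every process runs in its private address space. In User Mode, it uses its own private code, data and stack areas. When it switches into Kernel Mode, it addresses the kernel’s code and data and uses a separate private kernel stack.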

There are a couple of exceptions where the same memory region may be shared among processes:

  • The code section of a commonly used program, such as an editor, or of a shared library is loaded into memory once and shared among all processes that use it. The kernel usually does this automatically, without users noticing.
  • Two user processes can explicitly request that a region of memory be shared. This is a form of Inter-Process Communication (IPC), usually referred to as “shared memory”.
  • User processes can use the mmap() system call to map a file, or a block device (which is essentially also a file), into their address space, usually so that it can be read or written like ordinary memory. The mapped file can be shared with other processes.
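Here is a minimal sketch of memory sharing through mmap(), assuming Linux or a BSD. For brevity it uses an anonymous MAP_SHARED mapping (not backed by a file) that is inherited across fork(), rather than a mapped file. shared_page_demo is a hypothetical name. Because the mapping is MAP_SHARED, the child’s write lands in the same page the parent reads. With MAP_PRIVATE, the child would get its own copy instead.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent and child share one page: the child's write is visible to the
   parent. Returns the value the parent reads back, or -1 on error. */
static int shared_page_demo(int value) {
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) return -1;
    *shared = 0;
    pid_t pid = fork();
    if (pid < 0) { munmap(shared, sizeof *shared); return -1; }
    if (pid == 0) {                /* child: write into the shared page */
        *shared = value;
        _exit(0);
    }
    waitpid(pid, NULL, 0);         /* make sure the child has written */
    int seen = *shared;
    munmap(shared, sizeof *shared);
    return seen;
}
```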

Signals and Interprocess Communication

Signals are a way for the kernel to notify user processes of system events. These may be asynchronous events, such as a terminal interrupt (SIGINT), or synchronous events, such as a failed memory access (SIGSEGV). POSIX defines about 20 signals, two of which are user-definable. A process can declare how it reacts to a signal: either ignore it or run a handler asynchronously. If it declares nothing, the kernel takes a default action that depends on the type of the signal.
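This sketch installs a handler with POSIX sigaction() and then sends a signal to the process itself with raise(). It assumes a POSIX system. SIGUSR1 is one of the two user-definable signals, and on_usr1 and signal_demo are hypothetical names. Setting sa_handler to SIG_IGN instead would make the process ignore the signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_usr1 = 0;

/* Runs asynchronously; only sets a flag, which is async-signal-safe. */
static void on_usr1(int signo) {
    (void)signo;
    got_usr1 = 1;
}

/* Install the handler for SIGUSR1, send the signal to ourselves,
   and report whether the handler ran (1), or -1 on error. */
static int signal_demo(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;
    raise(SIGUSR1);            /* POSIX: the handler finishes before raise() returns */
    return got_usr1;
}
```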

Signals let the kernel notify user processes. Unix also lets user processes exchange information with one another, through Inter-Process Communication, or IPC. The three main implementations are shared memory, semaphores, and message queues, which the kernel implements as IPC resources. Like files, IPC resources are persistent: they outlive the processes that created them, so user processes must explicitly destroy them when they are no longer needed.

Process Management

Note the distinction between a process and the program it executes. A process is created with the fork() system call and terminated with _exit(), while exec() loads a new program into an existing process (typically a freshly forked one).
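Here is the fork/exec/wait pattern that shells use, as a minimal POSIX C sketch. run_program is a hypothetical helper, and it assumes the program name can be found on PATH. Exit status 127 for a failed exec is a convention borrowed from shells.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* fork() a child, exec() program in it, and wait for it to finish.
   Returns the child's exit status, or -1 if fork/wait fails. */
static int run_program(const char *program) {
    pid_t pid = fork();                          /* new process, same program */
    if (pid < 0) return -1;
    if (pid == 0) {
        execlp(program, program, (char *)NULL);  /* replace the program */
        _exit(127);                              /* reached only if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```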

Zombie processes

When a process terminates, it enters the zombie state until its parent calls the wait4() system call (or a similar one such as wait() or waitpid()). The kernel then releases the resources still held by the child process.

If the parent process terminates without calling wait4() on its children, the kernel makes the init process their parent. Init periodically calls wait4() on all its children, which reaps those zombies.
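On Linux, you can observe a zombie in /proc/<pid>/stat, where the third field is the process state ('Z' for zombie). The sketch below is Linux-specific. It uses waitid() with WNOWAIT to wait for the child’s exit without reaping it, and then reaps it with waitpid() rather than wait4(). proc_state and zombie_demo are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Linux-specific: state letter from /proc/<pid>/stat, or '?' if absent. */
static char proc_state(pid_t pid) {
    char path[64], line[1024];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f) return '?';
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    if (!ok) return '?';
    char *rp = strrchr(line, ')');   /* comm may contain spaces: skip past ')' */
    return (rp && rp[1] == ' ') ? rp[2] : '?';
}

/* Fork a child that exits at once; report its state before and after reaping. */
static void zombie_demo(char *before, char *after) {
    *before = *after = '?';
    pid_t pid = fork();
    if (pid < 0) return;
    if (pid == 0) _exit(0);          /* child: terminate immediately */
    siginfo_t info;
    /* WNOWAIT: wait for the exit but leave the child unreaped */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == 0)
        *before = proc_state(pid);   /* expected 'Z' */
    waitpid(pid, NULL, 0);           /* reap: the kernel frees the child */
    *after = proc_state(pid);        /* expected '?': /proc entry is gone */
}
```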

Process groups and login sessions

A process group, or job, is a convenient way to manage multiple processes that represent a single “job” abstraction. For example, the pipeline $ ls | sort | more runs as a single job made up of three processes.

A login session is a higher-level container that holds all processes, or process groups, started in the same shell login session. Only one process group can be in the foreground at a time, while any number can run in the background. The shell built-in commands bg and fg move a process group between the background and the foreground.

Memory Management

Memory management (MM) is the most complex activity in a Unix kernel; a third of this book is devoted to it. The essence of MM is to share a fixed amount of physical memory (page frames) efficiently among a large number of concurrent processes.

Virtual memory

The concept of virtual memory makes sharing easier: each process thinks it is given a large, contiguous, private memory chunk to operate on. With the help of a dedicated hardware unit, the Memory Management Unit or MMU, whose sole purpose is to translate virtual memory addresses to physical memory addresses, virtual memory management can be very efficient.

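A major problem that virtual memory must solve is fragmentation. Ideally, a memory allocation request should fail only when there are not enough free page frames left. However, the kernel is often forced to use physically contiguous memory. A request can therefore fail even when enough memory is free in total, because that memory is not available as one contiguous chunk.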

Random access memory usage

Editor’s note: I personally think this section can be better titled as “RAM partition”.

Modern Unix operating systems partition the entire physical memory space into two areas. The first is a few megabytes reserved for the kernel code and static data structures (the kernel image). The second is the rest, which is used:

  • To satisfy kernel requests for dynamic memory, such as buffers and file descriptors.
  • To satisfy memory allocation requests from user processes, e.g., malloc().
  • As a cache for disk contents or other buffered devices.

Kernel Memory Allocator

The Kernel Memory Allocator (KMA) is the subsystem that satisfies memory allocation requests from all parts of the system. A good implementation must be fast, make efficient use of memory, minimize fragmentation, and cooperate with the other memory management subsystems. Several algorithms exist; the Linux kernel uses a slab allocator on top of a buddy system.
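Here is a toy C sketch of the buddy system underneath the slab allocator, working over a hypothetical pool of 16 pages. It is not the Linux implementation, which keeps per-order free lists of page frames. It only shows the two key ideas. First, requests are rounded to a power-of-two number of pages and satisfied by splitting larger blocks. Second, a freed block is merged with its buddy whenever that buddy is also free. A block’s buddy is the block whose starting page index differs only in bit number order.

```c
#include <stdbool.h>

#define MAX_ORDER 4                   /* hypothetical pool: 2^4 = 16 pages */
#define NPAGES (1 << MAX_ORDER)

/* free_block[k][i] is true when a free block of 2^k pages starts at page i. */
static bool free_block[MAX_ORDER + 1][NPAGES];

static void buddy_init(void) {
    for (int k = 0; k <= MAX_ORDER; k++)
        for (int i = 0; i < NPAGES; i++) free_block[k][i] = false;
    free_block[MAX_ORDER][0] = true;  /* one free block spanning the pool */
}

/* Allocate 2^order contiguous pages; returns the first page index or -1. */
static int buddy_alloc(int order) {
    for (int k = order; k <= MAX_ORDER; k++) {
        for (int i = 0; i < NPAGES; i += 1 << k) {
            if (!free_block[k][i]) continue;
            free_block[k][i] = false;
            for (int j = k; j > order; ) {   /* split, keeping the lower half */
                j--;
                free_block[j][i + (1 << j)] = true;  /* upper half is now free */
            }
            return i;
        }
    }
    return -1;
}

/* Free the block of 2^order pages at page i, merging with free buddies. */
static void buddy_free(int i, int order) {
    while (order < MAX_ORDER) {
        int buddy = i ^ (1 << order);        /* buddies differ in one bit */
        if (!free_block[order][buddy]) break;
        free_block[order][buddy] = false;
        if (buddy < i) i = buddy;            /* merged block starts lower */
        order++;
    }
    free_block[order][i] = true;
}
```

Freeing every allocated block, in any order, coalesces the pool back into the single 16-page block.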

Process virtual address space handling

An address space is the set of addresses a process is allowed to use. The kernel organizes it as a list of memory regions, such as the code region, the initialized and uninitialized data regions, shared memory regions, and the heap. User processes can enlarge their address space, for example by calling malloc() or brk().

Copy-on-write. As mentioned earlier, a process is created by forking its parent. To avoid duplicating pages up front, the parent’s page frames are shared with the child and marked read-only for both processes. When either process first tries to write to one of these pages, the kernel copies it into a new page frame, which the writer then uses as its own private copy.
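Copy-on-write itself happens inside the kernel’s page tables and is invisible from user space, but its observable effect is easy to show: after fork(), a write by the child does not affect the parent. This is a minimal POSIX sketch, and cow_demo is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's value after the child has modified its own copy;
   the child's view is reported through *child_saw. */
static int cow_demo(int *child_saw) {
    int value = 1;
    pid_t pid = fork();        /* child gets a copy-on-write view of our memory */
    if (pid < 0) return -1;
    if (pid == 0) {
        value = 2;             /* the child's write goes to its private copy */
        _exit(value);          /* report the child's view via its exit status */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    *child_saw = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return value;              /* the parent's copy is untouched: still 1 */
}
```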

Caching

Disks are very slow compared with RAM, and a process commonly reads data that was read or written earlier by another process that no longer exists. Caching disk contents in RAM saves many slow disk reads.

The sync() system call explicitly flushes “dirty” pages (cache pages whose contents differ from their on-disk copies) to disk.

Device Drivers

Last but not least come the device drivers. Linux keeps a clean separation between device driver code and the rest of the kernel through a well-defined interface. This interface lets the kernel access ALL drivers in a uniform way. It also lets device manufacturers add support for new devices without knowing the kernel source code, only the interface specification.

MOS - Security

9 Security

9.3.4 Trusted Systems

“Is it possible to build a secure operating system?” “Theoretically yes, but there are reasons why modern OSes aren’t secure”:

  • Users are unwilling to throw away their current insecure systems. For example, there is actually a “secure OS” from Microsoft, but it is never advertised, and people would not be happy to hear that their Windows systems will be replaced.
  • Features are the enemy of security, yet system designers keep adding features they believe users will like. For example, email used to be plain ASCII text, which posed no security threat. Then email developers added rich content such as Word documents, which can carry viruses written as macros. Web pages followed the same path: they used to be static HTML files, which were safe, until dynamic content such as applets and JavaScript appeared. Since then, security problems have surfaced one after another.

9.3.5 Trusted Computing Base

At the heart of a “trusted system” is a minimal Trusted Computing Base (TCB). If the TCB is working to specification, it is believed that the system security cannot be compromised, no matter what else goes wrong.

The TCB typically consists of most of the hardware, a portion of the operating system, and most or all user programs that have superuser power (binaries with the SETUID bit set, for instance). MINIX 3 does a great job of minimizing the TCB: its kernel is only a few thousand lines of code, whereas the Linux kernel has millions of lines, if not tens of millions. Everything that doesn’t belong in the TCB, such as printer and video card drivers, is moved to user space.

9.3.6 Formal Models of Secure Systems

Some research uses formal verification methods to settle the following question. Given a set of authorized states of a system and a list of protection commands (which grant or revoke permissions), can the system ever reach an unauthorized state?

9.3.7 Multilevel Security

DAC vs. MAC

Discretionary Access Control (DAC) vs. Mandatory Access Control (MAC). The former lets the owners of resources decide who else may access the resources they own. In the latter, an organizational administrator centrally mandates who has access to what, which enforces stronger information security. MAC is common in military, corporate, and hospital settings.

Multilevel

The term “multilevel” refers to a design with multiple security levels on a system, where users and processes at each level have restricted access to the system’s resources. Two notable models of multilevel security are the Bell-LaPadula model and the Biba model.

The Bell-LaPadula model was designed for military use, where the core concern is confidentiality. A general may have access to most information, including everything a lieutenant knows, but must not pass everything he knows on to the lieutenant. Applied to an operating system, this means a process can read files at an equal or lower level, and can write only to files at an equal or higher level. This is why the model is often summarized as “read down, write up”.

The Bell-LaPadula model ensures that information never flows from a higher level to a lower one. That is great for the military, but applied to a corporation it can break data integrity. For example, it would allow an engineer to write to the CEO’s company OKRs or the CFO’s financial reports. For such settings there is a model with exactly the opposite rules, the Biba model, which protects integrity rather than secrecy.
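Both models reduce to simple comparisons between the level of the subject (a process) and the level of the object (a file). The sketch below uses hypothetical levels; only the directions of the comparisons come from the models.

```c
#include <stdbool.h>

/* Hypothetical security levels: a higher value means more sensitive. */
enum level { UNCLASSIFIED = 0, CONFIDENTIAL = 1, SECRET = 2, TOP_SECRET = 3 };

/* Bell-LaPadula (confidentiality): read down, write up. */
static bool blp_can_read(enum level subject, enum level object)  { return subject >= object; }
static bool blp_can_write(enum level subject, enum level object) { return subject <= object; }

/* Biba (integrity): the exact opposite, read up, write down. */
static bool biba_can_read(enum level subject, enum level object)  { return subject <= object; }
static bool biba_can_write(enum level subject, enum level object) { return subject >= object; }
```

Under Bell-LaPadula, a SECRET process may read a CONFIDENTIAL file but not write to it. Under Biba, it is the other way around.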

9.3.8 Covert Channels

Even a system that is mathematically proven secure can still leak information through covert channels. The leaker and the receiver typically establish a covert channel through a protocol they agree on, using something non-obvious rather than the usual IPC or RPC mechanisms. For example, they could agree that the leaker signals a 1 bit by computing for a certain interval before responding, and a 0 bit by responding promptly. The receiver can then reconstruct a bit stream from the leaker’s response times to its requests. Other possible signals include paging rates and the status of a lock file, and combining several signals increases the bandwidth significantly.
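Here is a toy version of the timing channel that simulates both parties in one process for brevity. The leaker sleeps (standing in for computing) to signal a 1 and answers at once for a 0, and the receiver times each answer. The 30 ms delay and 15 ms threshold are arbitrary assumptions, and simulate_channel and the other names are hypothetical. In a real channel the leaker and receiver would be separate processes, and the receiver would see only response times.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

#define SLOW_NS      30000000L  /* assumed: a 1 bit is signaled by a 30 ms delay */
#define THRESHOLD_NS 15000000L  /* assumed: receiver's cut-off between fast and slow */

/* Nanoseconds between two CLOCK_MONOTONIC readings. */
static long elapsed_ns(struct timespec a, struct timespec b) {
    return (long)(b.tv_sec - a.tv_sec) * 1000000000L + (b.tv_nsec - a.tv_nsec);
}

/* Leaker: answer slowly for a 1 bit, promptly for a 0 bit.
   nanosleep() stands in for "computing for a certain interval". */
static void leaker_respond(int bit) {
    if (bit) {
        struct timespec d = { 0, SLOW_NS };
        nanosleep(&d, NULL);
    }
}

/* Receiver: send 8 "requests", time each response, and rebuild the byte.
   The secret is passed in only because both sides share one process here. */
static unsigned simulate_channel(unsigned secret) {
    unsigned out = 0;
    for (int i = 7; i >= 0; i--) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        leaker_respond((secret >> i) & 1);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        out = (out << 1) | (elapsed_ns(t0, t1) > THRESHOLD_NS ? 1u : 0u);
    }
    return out;
}
```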

In practice, just finding all the covert channels is challenging, let alone blocking them.

Steganography

Steganography is a slightly different kind of covert channel. A classic example is to modify the lowest bit of each of the three RGB bytes (one per color) of every pixel in an image file. With the help of simple compression, one could embed as much as 700KB of secret data in a 1024x768 image, without anybody noticing a difference in the resulting image. Many websites use steganography to insert hidden watermarks into their images, so that theft and reuse on other web pages can be detected.

Example: https://www.cs.vu.nl/~ast/books/mos2/zebras.html
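Here is a minimal sketch of least-significant-bit embedding. It assumes the image is already available as a raw array of RGB bytes, so no image file parsing is involved. The function names are hypothetical. At one bit per color value, a 1024x768 image offers 1024 × 768 × 3 = 2,359,296 bits, or 294,912 bytes, of raw capacity. Reaching 700KB therefore depends on the compression mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hide len secret bytes in the lowest bit of each cover byte (one bit per
   R, G or B value), most significant bit first. cover needs 8 * len bytes. */
static void lsb_embed(uint8_t *cover, const uint8_t *secret, size_t len) {
    for (size_t i = 0; i < len * 8; i++) {
        uint8_t bit = (secret[i / 8] >> (7 - i % 8)) & 1;
        cover[i] = (uint8_t)((cover[i] & 0xFE) | bit);  /* changes a value by at most 1 */
    }
}

/* Recover len secret bytes from the lowest bits of cover. */
static void lsb_extract(const uint8_t *cover, uint8_t *secret, size_t len) {
    for (size_t i = 0; i < len; i++) secret[i] = 0;
    for (size_t i = 0; i < len * 8; i++)
        secret[i / 8] = (uint8_t)((secret[i / 8] << 1) | (cover[i] & 1));
}
```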

9.4 Authentication

Most authentication methods are based on one or more of three general principles:

  1. Something the user knows
  2. Something the user has
  3. Something the user is

9.4.1 Authentication Using Passwords

It’s good practice to always prompt for a password, even when the username doesn’t exist, so that an attacker cannot tell which usernames are valid. And if you have a laptop, take the time to set a BIOS password. Otherwise anybody could take it, change its boot sequence, bypass your system login, and eventually steal your data.

How Crackers Break In

  • Usernames and passwords are easy to guess: 80% of them are guessable
  • A classic break-in sequence:
    • Construct a dictionary of common usernames and passwords
    • Choose a domain name, e.g., foobar.edu
    • Run a DNS query to get its network address (upper bits of IP)
    • Run a ping to probe all hosts that are live in that network
    • Run telnet to log in, using the usernames and passwords from the dictionary
  • As you can see, the above is very easy to automate and is what computers are good at
  • Once in, the cracker typically installs a packet sniffer that examines incoming and outgoing packets for certain patterns (for example, a password sent to a bank’s site)

UNIX Password Security

Unix used to have a world-readable file (/etc/passwd) containing all the usernames and passwords, which was clearly insecure. An improvement was to store an encrypted version of each password, but this could be worked around because the encryption algorithm was well known. With the usual dictionary of usernames and passwords, a cracker could, at their leisure, encrypt all the passwords with the same algorithm and compare them one by one with the entries in /etc/passwd. Any match means a compromised user login.

Morris and Thompson came up with a technique that defeats this attack, or at least makes it far more expensive. The idea is to assign a random number, called a salt, to each password, and encrypt the salt and the password together. Now imagine the same situation, where the cracker has access to /etc/passwd: their usual dictionary has to account for the added salts. Because the salt is assigned randomly, for every candidate password they would have to enumerate all possible salt values, encrypt each concatenation, and compare the results against the entries in /etc/passwd. This multiplies the work by 2^n, where n is the number of bits in the salt; for UNIX, n is 12.
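Here is a toy illustration of salting in C. The hash is 64-bit FNV-1a, chosen only because it is short. It is neither a password hash nor the original UNIX crypt(). salted_hash and check_password are hypothetical names. The point is that the same password yields different stored values under different salts. A precomputed dictionary would therefore need one entry per salt value, which is 2^12 = 4096 entries per password for a 12-bit salt.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy 64-bit FNV-1a hash. NOT suitable for passwords: it only stands in
   for the one-way function to show what the salt does. */
static uint64_t toy_hash(const char *s) {
    uint64_t h = 14695981039346656037ULL;  /* FNV offset basis */
    for (; *s; s++) {
        h ^= (uint8_t)*s;
        h *= 1099511628211ULL;             /* FNV prime */
    }
    return h;
}

/* Hash a 12-bit salt and the password together. The salt itself is stored
   in the clear next to the hash, as in /etc/passwd. Long passwords are
   truncated by the fixed buffer in this toy. */
static uint64_t salted_hash(unsigned salt, const char *password) {
    char buf[256];
    snprintf(buf, sizeof buf, "%03x:%s", salt & 0xFFFu, password);
    return toy_hash(buf);
}

/* Login check: recompute with the stored salt and compare. */
static int check_password(unsigned salt, uint64_t stored, const char *attempt) {
    return salted_hash(salt, attempt) == stored;
}
```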

COMPASS 2018 - Communicate, Connect and Career

I just came back from COMPASS 2018, the first career conference for Asians in Silicon Valley, with some great takeaways. The event was organized by Leap.ai and sponsored by big names like Google and DiDi, as well as small startups like Anki. A number of high-profile Silicon Valley figures, including SVPs, VPs and directors from Google and Facebook, gave talks or sat on panels. It was a great opportunity to hear them tell their stories of career development, and I came back from the eight-hour event feeling stuffed.

My biggest takeaways were these. It is important to understand how your work fits into your company’s roadmap. You need to know how to “manage” your manager. Managers look for people who can lead (people whom others listen to and look up to). And it is possible for somebody who wasn’t born in the States to speak perfectly good English (which was an eye-opener for me). Still, I am going to write down my full notes, because there was so much good advice from experienced people.

Edit: Came across this great summary from one of the event organizers: https://medium.com/leap-ai/10-takeaways-on-achieving-career-success-from-senior-leaders-in-silicon-valley-fd6efebaac4c

Sridhar Ramaswamy

Sridhar Ramaswamy, SVP of Ads, Commerce and SPI at Google, gave the opening keynote. He reflected on his own professional experience, and shared how to stand out at work:

  • Passion. It takes 10,000 hours to be an expert in most fields, and that is ten hours a day for almost three years. You need to be passionate about what you do for that long.
  • Breadth. You want to have expertise, but don’t lock yourself up in your small cell. A great product needs expertise in many areas, and at least understanding other related fields will help your growth a lot.
  • Growth. To grow, you need to understand the context around you, and make yourself helpful to other aspects of the product.
  • Belief. Managers like people who believe in what they do as a team. But most importantly, you need to be happy about what you do.
  • Manager expectation. Opportunities favor those who show motivation, who drive things, and who show their willingness to learn.
  • Being a manager. If you find yourself the smartest person on your team, you are most certainly not a good manager. Empower people.
  • Speaking. You may not like it, but to lead you need to communicate and motivate others.
  • People relationship. You need to be a good peer.

Denise Peck

Denise Peck, former VP at Cisco, talked about how to ACE your career as an Asian, by showing examples of her own success story getting into Stanford business school, switching careers from marketing into IT, and so on.

  • Myth: Asians are most likely to be hired, but least likely to be promoted.
  • Leadership mentality ACE: Aspire to more, Commit to growth, and Engage.
  • Imagining yourself in the role you want helps you achieve it.
  • Consistency. You may be watched when you least expect it.
  • Demonstrate that you can work as a leader.
  • Take tough career risks -> growth, rewards.
  • Engage others, and build influence skills.
  • Soft skills are even more important when you move up.
  • Like me, she immigrated from China, and she speaks surprisingly good English

Yanbing Li

Yanbing Li, SVP and General Manager at VMware, told her story, from surviving a crisis at their parent company (EMC) to taking on a leadership role at their grandparent company (Dell). Like Denise, she speaks like a native speaker, even though, like me, she didn’t come to the US until after college. I didn’t take many notes during her talk, but I came away with a sense of how important it is to be a good storyteller for one’s career.

Jia Jiang

This was arguably one of the best parts of the event. Jia Jiang, born and raised in Beijing, came to the States for grad school and worked as an engineer in his first job, like many Chinese do, but he was something special. Instead of spending his life climbing the ladder at a Silicon Valley company, he did something very original that made him popular on the Internet. He now makes a living from public speaking and writing, in English, which I see very few Chinese immigrants do. His talk was about how to overcome the fear of rejection, which is a common pain for many Asians and is also what made him famous. It was a great half hour of wisdom, with no shortage of his unique humor.

  • His TED talk on 100 days of rejections: https://www.ted.com/talks/jia_jiang_what_i_learned_from_100_days_of_rejection
  • In the end, rejection is a numbers game, and an opinion. If you keep asking for the same thing enough times, you will eventually get a yes. It’s about how many times you are willing to ask. And for the same request, you may get very different responses from different people. That shows that rejection is not necessarily about you, but about the other person. It’s an opinion.

Panel: Influencing up and managing up

This was our first panel. It was about how to manage your relationship with your boss, and how to make them believe that you deserve a raise. Panelists were Charles Fan from MemVerge, Chandu Thota from Google, Dawei Feng from Intel, and Eric Rosenblum from Tsingyuan Ventures.

  • Manage your manager. Prepare an agenda for 1:1s, and understand your manager’s needs
  • Framework to help: Align purpose (on what you will do), Getting access (to information or resource through your manager), and Agreeing on value (potential impact, rewards etc.)
  • Dynamic environment. You need to constantly remind yourself of that reality.
  • Managers’ expectations. First level: you get your assignments done; second: you do things that save your manager some time; and third: you create something that your manager had never thought possible but benefits your team greatly.

Panel: Taking initiatives

  • Understand risks and rewards. Take the right, balanced initiatives.
  • Small things count. Inefficiencies in current systems, processes that help your peers, …
  • Create a concrete career objective. Yes, we are all told that interest should drive everything, but when it doesn’t, you need extra motivation.
  • When you are not clear what initiatives to take, you are probably missing the big picture. Find a mentor higher up and they can give you more information.

Panel: AI

Panelists talked about trending technologies in AI, and touched on the threat AI poses to people’s jobs.

  • Fact is, there are more job openings than unemployed people. What people need is skills.
  • Displacement, not replacement. AI is not replacing human beings, but humans need to evolve into jobs that technology can’t do, just as we always have throughout history. Continuous learning and a good sense of trending technologies are key.
  • Creativity is hard to replace.

Panel: Women in tech

This was a very special panel and one of the most profound ones I found. Four great women leaders from Google, Facebook and Pinterest shared their stories and gave advice on many good questions from the audience.

  • Focus on growing skills and not external motivation. For every perf cycle, so to speak, you should set some goals for the skills you want to grow, and not worry about whether you will get promoted at the end of it or not. The former is more sustainable and will help your long term career more.
  • Work life balance. Work with your spouse. They are the most important teammate/partner of your life. Spread the work, support each other, and you both will have opportunity to have a successful career as well as a happy family life.

Panel: Choose the correct opportunity

This panel focused on the important decision of changing roles or jobs. It had some great insights.

  • First and foremost, ask yourself, why do you want to move on? Answer that before you start looking.
  • Where should I go for my next job? A good candidate is a company that is about to IPO or has just gone public, for two reasons. First, it is new but reasonably established, so your job won’t go away in a few years. Second, many early employees will be cashing out, which leaves many seats to fill.
  • Coffee budget. Set aside some change, and most importantly, time, to have coffee with others and learn from them.
  • Where you work determines your impact. Environment » individual.
  • Don’t be afraid of chaos. It is the norm, not the exception. I was shocked to learn that this fear is a thing for most Chinese.

Interrogative Pronouns

Definition

To initiate a question. English equivalents: “who”, “whom”, “whose”, “what”, and “which”.

Which? ¿Qué? ¿Cuál? ¿Cuáles?

Both Qué and Cuál can be used to mean the English word “which”, and knowing when to use each can be confusing. Here are a couple of rules to help tell them apart:

  • Generally speaking, Qué precedes a noun and Cuál precedes a verb.
  • If you want a definition, use Qué.
  • If there are many possible answers, and you want to know which one, use Cuál.

Examples (followed by a noun vs. a verb/preposition):

¿Qué vestido prefieres? / Which dress do you prefer?
¿Cuál prefieres? / Which (one) do you prefer?
¿Cuál de los vestidos prefieres? / Which of the dresses do you prefer?
¿Cuáles prefieres? / Which ones do you prefer?

Examples (definition v.s. one of many possibilities):

¿Qué es tu nombre? / What is your name? (asking for the definition of “name”)
Mi nombre es la palabra que la gente usa cuando me llama. / My name is the word people use to call me.
¿Cuál es tu nombre? / What is your name? (which one of all the names is yours)
Mi nombre es Pedro. / My name is Pedro.