A Walk through beautiful Vienna
District 1: Systems Design (Senacor Office)
- Design Steps:
Senacor Office
- Scope the problem: build what interviewer wants: what, who, where, whenJapanese Embassy
- Make reasonable assumptionsVotivkirche
- Draw major componentsFancy Spar
- Identify key issues (bottlenecks)U5 Site
- Redesign key issuesTown Hall
- Concepts: Horizontal scaling: adding additional resourcesBurgtheater
- Vertical scaling: propping up existing resources, easierParliament Cupola
- Load Balancing: distribute server load so it doesn’t crash by cloningLaw Palace
- Denormalization: adding redundant info to speed up reads (because joints are slow)VOGA Club
- NoSQL: does not support jointsParliament
- Sharding: splitting data across multiple machinesVolksgarten
- vertical: by function, e.g. one table for profiles, one for messagesHofburg
- key-based: some part of data, m mod N (N: number of servers)Kanzleramt
- directory-based: maintain a look-up tableHofburg
- Caching: provide simple key-value pairing (query & result) between application, and data storeHofreiterschule
- Async processing: for slow operations (like top forum posts, loading DNN)Raiba @ Graben
- Network metrics:Full Square
- bandwidth: max amount of data in tReal Square
- latency: time from start to endTime to Raiba
- throughput: actually amount of data in tMeinl
- Map/reduce: map takes data & returns key: value, reduce takes key:value and reduces it e.g., key: sum(values)Brunnen
- Considerations: Failure: plan for test casesGreen Habsburghouse
- Availability: % system is operationalDo & Co
- Reliability: % that systems is operationalStephansdom
- Read-heavy: consider cachingTower queue of Dom
- Write-heavy: consider queuing writesSecurity Peeps at Dom
- Security: plan for test cases
District 2: Object-Oriented Design (Salztor-Bridge)
Salztorbruecke
- Approach: Handle ambiguity: who is going to use system & how, when, what, where, whyRLB
- Define core objects: mostly nounseforce
- Analyze relationships: 1-to-1, 1-to-many, many-to-manyIBM
- Identify actions: mostly verbs of objectUniqa
- Design Patterns: Singleton: ensures that a class has only one instance (e.g. restaurant)Sternwarte
- Factory: offers interface for creating instance of a class with subclass deciding which class to instantiate (e.g. pipeline)
District 2: API Calls (RDG)
coffee
- Get: retrieve resourcemy office
- Post: create new resourceSusanne
- Put: edit / updateDietmar
- Delete: remove resourceHarald & Seb
- Object orientation: Difference of state & functionMarcus Presich
- Everything is an object in Python
District 3: Sorting & Searching Algorithms (Stadtpark)
Enter
- Bubble Sort: swap element “previous” and “current” if current < previous, continueGo to Statue
- Selection Sort: find smallest element, and swap with first elementLeave to U4
- Merge Sort: divide arrays in half, sort halves, merge backGerman Music Hall
- Quick Sort: pick “random” element, divide array so that smaller numbers left “random”, and great numbers rightTrump Hotel
- Radix Sort: for ints, group digits of numbersIce-Skating Rink
- BST: in sorted array, compare x to mid, left if lower, right if higher
District 3: Stacks and Queues (Landstrasse )
S7
- Stack: dishes - : LIFO - pop (get first item), push (add item to top)Billa
- Queue: queue - : FIFO - add (add last item), remove (delete first item)
District 4: LinkedList, Stack and Queue (Belvedere)
Belevedere
- LinkedList: represents sequence of nodes (singly or doubly linked)Russian Monument
- class Node: val, next;French Embassy
- class Node runner technique: Have two runners, one much faster
District 5: Math & Logic (Paul’s home)
Paul's old flat
- Prime numbers: iterate from 2 to sqrt(n)Pilgramgasse
- Probability of A&B: P(A and B) = P(B given A) * P(A) (Buenos Aires Airport)Unemployment Center
- Probability of A or B: P(A) + P(B) - : P (A and B)Naschmarkt Foods
- Independence: one happening tells you nothing about the otherNaschmarkt Restaurants
- Mutual exclusivity: if one happens, the other cannot happen
District 6: Recursion (Chili & Tofu)
Chili & Tofu
- Recursion: solutions are build off smaller solutionsKaro flat up
- Bottom-up: solve problems for simple caseKaro flat down
- Top-down: divide problem for case N into subproblemsGumpendorfer Stop
- Half-half: divide data in half, binary searchAIDS Hilfe
- Dynamic programming: cache repeated calls from recursive algos
District 7: Trees, Heaps and Graphs (Ring)
Ring
- Tree - data structure composed of nodesVolkstheater
- Every tree has a root nodeStiftsgasse
- With zero or more child nodesSiebenstern
- Tree cannot contain cyclesNeubaugasse
- A binary tree’s nodes have each up to two children, ternary trees have threeZieglergase
- A node is a leaf if it has no children-
Kaiserstrasse
- Binary search tree: has all left descendants <= n < all right descendants Felzl
- Balanched vs. unbalanced: red-black vs AVL trees.Billa
- Complete binary tree: every level (except perhaps last) is fully filledDenn's
- Full binary tree: no nodes have only one child (0 or 2)-
Prosi
- Perfect binary tree: full & complete Burggasse
: Traversal: in-order : left > current node > right branch-> Heiligenstadt
pre-order: current node before children-> Schoenbrunn
post-order: root node is last visited
Polizei
: Min-Heap: complete binary tree & each node is smaller than its children -> root is minimum55/3
: Graphs: Collection of nodes with edgesBell
: Directed or undirectedEntrance
: Acyclic graph has no circleDoor (to trashes)
: Adjacency list: every vertex (node) stores its adjacent verticesTrash
: Adjacency matrix: NxN bool matrixSteps
: Graph search: DFS: start at root, and explore each node completely > visit every nodePostal boxes
: BFS: start at root, and explore each neighbor completely > find shortest pathBell 2
: Bidirectional search: using to BFS to find the shortest path
District 8: Testing (Fethi)
Fethi
- Interviewer looks for:Felzl
- Test cases: Normal, extreme, null/illegal, strangeHummel Cafe
- Big picture: Amazon’s priority is not to charge twiceHammerling-Park
- Piece integration: Google sheets integration with mailVolkskundemuseum
- Organization: Break into categories-
Altes AKH
- Practicality: Create a testing plan Tunnel
- Real world object - : Paperclip:- Who will use it? Teacher
- What are the use cases? exams of students
- What are the usage bands? temperature, 30 pages
- Stress / failure condition? 60 + pages
-
How to perform testing? Can’t sit 5 years
Polizei Fuhrmann
- Software - : Parental Control:- Who will use it? Children, parents
- What are the use cases? blocking porn, gambling
- What are the usage bands? Block more
- Stress / failure condition? Child has access
-
How to perform testing? Black or Whitebox, break into components
Theater Josefstadt
- Troubleshooting:- Understand scenario: how long, which version, consistently, error report
- Problem breakdown: break into testable steps
- Create specific tests: think of walking a customer through
District 10: Arrays & Strings (Train Station)
Bahnhof
- Hash tables: key mod 13Erste Bank
- Array list: size is defined upon creationTennis Arsenal
- Stringbuilder: creates resizable array of small strings
District 13: O-Notation (Grig’s flat)
Elevator
- Runtime efforts: logN: binary searchDoor
- NlogNLiving room
- NKitchen
- N^2Toilet
- 2^N, recursion if f(n-1) + f(n-2)Children
- O(branches ^ depth)Hallway
Cost: big O (upper limit)Grigs room
- big Omega (lower limit)Bathroom
- big Theta (bands)Police in 13th
- Rules: Drop constant: O(2N) > O(N)Billa
- Drop non-dominant terms: O(N^2 + N) > O(N^2)Spar
- Multipart algos: Add runtimes: for a in a, for b in b > O(A+B) > O(N)Kebab
- Multiply runtimes: for a (for b in b) in a > O(A * B) > O(N^2)
District 14: Pull Requests (Haralds Home)
Entrance
- Fork projectKitchen
- Clone locallyEating
- Create a branchCouch
- Make changesGrass
- Push back to repoOffice
- Click ‘compare & pull request’
District 19: Threads & Locks
U4 Heiligenstadt
- Threads: in a process share same memory spaceBakery
- Sync: restrict access to shared resourcesBilla
- Locks: sync access to data by associating resources with a lockRBI
- Deadlock: one threat waits for object lock to be released while others do the sameRSG Main
- conditions: mutual exclusion: there is limited access to one resourcePosch New
- hold & wait: processing holding resources can request morePosch Old
- no preemption: one process cannot function without the otherMensa
- circular wait: two ore more processes form a circular chain
Additional Concepts
- Difference between RDD and DataFrame
-
RDD (resilient distributed data set): each record is an independent entity/object row-wise -
Data frame (DF): perform transformations on columns column-wise
-
- Spark:
- Analytics factory: gains capacity by scaling horizontally
- Instructed by a master, executors perform the work
- Cluster manager plans the capacity when receiving the driver programme
- Lazy evaluation: Spark only works when it needs to report results .show()