Database Indexes: What do they do? | Systems Design Interview: 0 to 1 with Google Software Engineer

TLDR;

This video discusses the internals of databases and how data is organised on disk to maximise performance. It covers the importance of data persistence on hard drives, the impact of data proximity on read speeds, and the trade-offs between read and write speeds when organising data. The video introduces the concept of database indexes as a means to improve read speeds at the cost of write speeds, setting the stage for a deeper look into index implementations in the next video.

Data should be stored persistently on hard drives for durability.
Proximity of data on disk affects read speeds; closer data is accessed faster.
Database indexes can improve read speeds but may reduce write speeds.

Introduction [0:00]

The video starts with an apology for technical issues in the previous recording. The speaker transitions into the main topic: the internal workings of a database and how data is organised on disk to optimise performance. The aim is to explore how to arrange data for fast reading and writing while ensuring persistence.

Hard Drive Basics [1:01]

The speaker outlines three key facts about data storage: data persistence, storage on a hard drive, and the hard drive's role in a database system. The physical characteristics of a hard drive are explained, emphasising that the closer data is on the disk, the faster it can be accessed due to the mechanical movement of the read/write arm. The data is stored as an array of bytes, and accessing related data that is physically close together improves speed.

Database Table Example [2:47]

To illustrate database operations, the speaker uses an example of a website storing names and shoe sizes. The video explains that finding rows based on a specific name requires checking each row individually, resulting in O(n) time complexity for both reading and writing. The speaker then introduces an alternative approach where edits are made by adding new rows instead of directly modifying existing ones.

Trade-offs in Database Design [5:13]

The video explores a design modification where, instead of editing a row directly, a new row is added with the updated information. This makes write operations O(1) because new data is always written at the end of the table. However, this increases the number of rows, worsening read times, although reads remain O(n). The speaker argues that for many large websites, optimising read speeds is more critical than optimising write speeds.

The Need for Indexes [7:24]

The speaker highlights that O(n) read times are unacceptable for large databases with millions of rows. The need for a better way to organise data is emphasised, leading to the introduction of database indexes. Indexes should facilitate finding rows with specific key values and performing range queries more efficiently than linear scanning.

Database Indexes Explained [9:31]

The video defines a database index as a mechanism that increases read speeds at the expense of write speeds for a specific key. Indexes are typically created on one field at a time, although multiple indexes can exist for different fields or combinations of fields. The next video will explore a specific index implementation, the hash index, including its benefits and drawbacks.