Database Fundamentals

Essential database concepts, SQL, and data modeling basics

Overview

A database is an organized collection of structured data stored electronically in a computer system. Database management systems (DBMS) provide interfaces for storing, retrieving, and managing data efficiently while ensuring data integrity, security, and concurrent access.

Database Types

Relational Databases (RDBMS)

Relational databases organize data into tables with rows and columns, using structured query language (SQL) for data manipulation.

Characteristics:

Fixed schema with predefined structure
ACID compliance (Atomicity, Consistency, Isolation, Durability)
Strong consistency guarantees
Relationships between tables via foreign keys

Popular Systems:

PostgreSQL
MySQL
Oracle Database
Microsoft SQL Server
SQLite

NoSQL Databases

NoSQL databases provide flexible schemas and are optimized for specific data models and access patterns.

Document Stores:

Store data as documents (usually JSON/BSON)
Examples: MongoDB, CouchDB, Amazon DocumentDB

Key-Value Stores:

Simple key-value pair storage
Examples: Redis, Amazon DynamoDB, etcd

Column-Family Stores:

Store data in column families
Examples: Apache Cassandra, HBase, Amazon Keyspaces

Graph Databases:

Optimize for relationships between data
Examples: Neo4j, Amazon Neptune, ArangoDB

Core Concepts

Tables and Relations

In relational databases, data is organized into tables:

Table: Collection of related data entries
Row (Record): Single data entry
Column (Attribute): Property of data
Primary Key: Unique identifier for each row
Foreign Key: Reference to primary key in another table

ACID Properties

Atomicity: Transactions complete fully or not at all
Consistency: Data remains valid according to defined rules
Isolation: Concurrent transactions don’t interfere
Durability: Committed data persists through system failures

CAP Theorem

Distributed systems can guarantee only two of:

Consistency: All nodes see the same data
Availability: System remains operational
Partition Tolerance: System continues during network failures

SQL Fundamentals

Data Definition Language (DDL)

-- Create table
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Alter table
ALTER TABLE customers ADD COLUMN phone VARCHAR(20);

-- Drop table
DROP TABLE customers;

Data Manipulation Language (DML)

-- Insert data
INSERT INTO customers (name, email) 
VALUES ('John Doe', 'john@example.com');

-- Select data
SELECT * FROM customers WHERE email LIKE '%@example.com';

-- Update data
UPDATE customers SET phone = '123-456-7890' WHERE id = 1;

-- Delete data
DELETE FROM customers WHERE created_at < '2023-01-01';

Joins

-- Inner join
SELECT orders.id, customers.name, orders.total
FROM orders
INNER JOIN customers ON orders.customer_id = customers.id;

-- Left join
SELECT customers.name, COUNT(orders.id) as order_count
FROM customers
LEFT JOIN orders ON customers.id = orders.customer_id
GROUP BY customers.id;

Indexing

Indexes improve query performance by creating data structures for faster lookups.

Index Types

B-Tree: Default, good for range queries
Hash: Fast equality comparisons
Bitmap: Efficient for low-cardinality columns
Full-text: Text search optimization

-- Create index
CREATE INDEX idx_email ON customers(email);

-- Composite index
CREATE INDEX idx_name_email ON customers(name, email);

-- Unique index
CREATE UNIQUE INDEX idx_unique_email ON customers(email);

Transactions

Transactions group multiple operations into atomic units:

BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

-- Commit if successful
COMMIT;

-- Or rollback on error
ROLLBACK;

Normalization

Database normalization reduces redundancy and improves data integrity:

Normal Forms

1NF: Atomic values, no repeating groups
2NF: 1NF + no partial dependencies
3NF: 2NF + no transitive dependencies
BCNF: 3NF + every determinant is a candidate key

Example Normalization

-- Denormalized
CREATE TABLE orders (
    id INTEGER,
    customer_name VARCHAR(100),
    customer_email VARCHAR(255),
    product_name VARCHAR(100),
    product_price DECIMAL(10,2)
);

-- Normalized
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(255)
);

CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100),
    price DECIMAL(10,2)
);

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER
);

Query Optimization

Explain Plans

EXPLAIN SELECT * FROM orders WHERE customer_id = 123;

Optimization Techniques

Use appropriate indexes
Avoid SELECT *
Limit result sets
Use prepared statements
Denormalize for read-heavy workloads
Partition large tables

NoSQL Data Modeling

Document Model (MongoDB)

{
    "_id": ObjectId("..."),
    "name": "John Doe",
    "email": "john@example.com",
    "orders": [
        {
            "id": 1,
            "items": ["product1", "product2"],
            "total": 99.99
        }
    ]
}

Key-Value Model (Redis)

SET user:1000:name "John Doe"
SET user:1000:email "john@example.com"
HSET user:1000:settings theme "dark" language "en"