This is a collection of notes covering database fundamentals, specific database systems, and design principles.
Database Fundamentals
Types of Databases
- Relational Databases (RDBMS)
- Tables (relations) with rows (records)
- Structured Query Language (SQL)
- ACID properties
- Examples: PostgreSQL, MySQL, Oracle, SQL Server
- NoSQL Databases
- Document stores (MongoDB, CouchDB)
- Key-value stores (Redis, DynamoDB)
- Column family (Cassandra, HBase)
- Graph databases (Neo4j, Amazon Neptune)
- NewSQL Databases
- Combine SQL with NoSQL scalability
- Examples: Google Spanner, CockroachDB
ACID Properties
Relational databases guarantee ACID properties:
- Atomic: Transactions are all-or-nothing operations
- Consistency: Database remains in a consistent state
- Isolation: Multiple transactions can run concurrently but execute as if run sequentially
- Durability: Committed changes are permanent on disk
Database Design Principles
Normalization
- 1NF: Eliminate repeating groups
- 2NF: Eliminate partial dependencies
- 3NF: Eliminate transitive dependencies
- BCNF: Boyce-Codd Normal Form
Indexing
Indexes are auxiliary data structures that act like a table of contents, ordered for efficient searching and sorting.
Types of Indexes:
- Primary indexes (clustered)
- Secondary indexes (non-clustered)
- Composite indexes
- Partial indexes
- Expression indexes
Trade-offs:
- Faster queries vs. slower writes
- Storage overhead
- Maintenance cost
PostgreSQL
PostgreSQL is an advanced open-source relational database with extensive features.
Core Concepts
- Table: Named collection of rows
- Row: Record with the same set of named columns
- Column: Specific data type with fixed order
- Database: Collection of tables
- Cluster: Collection of databases managed by a single server instance
Data Types
Standard SQL Types
-- Numeric types
int, smallint, bigint
real, double precision
numeric(precision, scale)
-- Character types
char(N)
varchar(N)
text
-- Date/Time types
date
time
timestamp
timestamptz
interval
PostgreSQL-Specific Types
-- Arrays
integer[]
text[]
-- JSON
json
jsonb
-- Network types
inet
cidr
macaddr
-- UUID
uuid
-- Full-text search
tsvector
tsquery
Database Operations
Database Management
# Create database
createdb mydb
# Drop database
dropdb mydb
# Connect to database
psql -U username -d database_name
Table Operations
-- Create table
CREATE TABLE weather (
city varchar(80),
temp_lo int,
temp_hi int,
prcp real,
date date
);
-- Drop table
DROP TABLE weather;
-- Alter table
ALTER TABLE weather ADD COLUMN humidity int;
ALTER TABLE weather DROP COLUMN humidity;
Data Manipulation
-- Insert data
INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
-- Bulk load from file
COPY weather FROM '/path/to/weather.txt';
-- Update data
UPDATE weather
SET temp_hi = 55
WHERE city = 'San Francisco' AND date = '1994-11-27';
-- Delete data
DELETE FROM weather WHERE city = 'San Francisco';
Querying Data
Basic Queries
-- Select all columns
SELECT * FROM weather;
-- Select specific columns
SELECT city, temp_lo, temp_hi, prcp, date FROM weather;
-- Calculated columns
SELECT city, (temp_hi + temp_lo) / 2 AS temp_avg, date FROM weather;
-- Filtering
SELECT * FROM weather
WHERE city = 'San Francisco' AND prcp > 0.0;
-- Sorting
SELECT * FROM weather ORDER BY city;
SELECT * FROM weather ORDER BY city, temp_lo;
-- Distinct values
SELECT DISTINCT city FROM weather ORDER BY city;
Joins
-- Inner join
SELECT *
FROM weather
INNER JOIN cities ON (weather.city = cities.name);
-- Left join
SELECT w.city, w.temp_lo, c.location
FROM weather w
LEFT JOIN cities c ON (w.city = c.name);
-- Self join
SELECT w1.city, w1.date, w2.date
FROM weather w1, weather w2
WHERE w1.temp_lo < w2.temp_lo AND w1.city = w2.city;
Aggregate Functions
-- Basic aggregates
SELECT max(temp_lo) FROM weather;
SELECT min(temp_hi) FROM weather;
SELECT avg(temp_lo) FROM weather;
SELECT count(*) FROM weather;
-- Group by
SELECT city, max(temp_lo)
FROM weather
GROUP BY city;
-- Having clause
SELECT city, max(temp_lo)
FROM weather
GROUP BY city
HAVING max(temp_lo) < 40;
Advanced PostgreSQL Features
Window Functions
SELECT city, temp_lo,
avg(temp_lo) OVER (PARTITION BY city) as city_avg,
row_number() OVER (ORDER BY temp_lo DESC) as rank
FROM weather;
Common Table Expressions (CTEs)
WITH regional_sales AS (
SELECT region, SUM(sales_amount) as total_sales
FROM sales_data
GROUP BY region
)
SELECT region, total_sales
FROM regional_sales
WHERE total_sales > 1000000;
JSON Operations
-- JSON column operations
SELECT data->>'name' as name, data->'age' as age
FROM users
WHERE data->>'city' = 'San Francisco';
-- JSON aggregation
SELECT json_agg(json_build_object('name', name, 'age', age))
FROM users;
PostgreSQL Administration
psql Commands
# Connect to database
psql -U postgres
# List databases
\l
# List roles/users
\du
# Connect to database
\c database_name
# List tables
\dt
# Describe table
\d table_name
# Show table data
\x # Toggle expanded display
Performance Tuning
-- Analyze query performance
EXPLAIN ANALYZE SELECT * FROM large_table WHERE indexed_column = 'value';
-- Create indexes
CREATE INDEX idx_weather_city ON weather(city);
CREATE INDEX idx_weather_date_temp ON weather(date, temp_lo);
-- Vacuum and analyze
VACUUM ANALYZE weather;
MongoDB
MongoDB is a document-oriented NoSQL database that stores data in flexible, JSON-like documents.
Core Concepts
- Document: JSON-like record (BSON format)
- Collection: Group of related documents (like a table)
- Database: Container for collections
- Field: Key-value pair within a document
Connection and Basic Operations
Connection String
mongodb://[username:password@]host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database][?options]]
# Examples
mongodb://localhost/people
mongodb://user:pass@cluster.mongodb.net/myapp
Database Operations
// Show current database
db
// Switch database
use mydb
// Show collections
show collections
// Show databases
show dbs
Document Operations (CRUD)
Create (Insert)
// Insert single document
mo = { name: 'mo', age: 30, city: 'Calgary' }
db.people.insert(mo)
// Insert multiple documents
db.people.insertMany([
{ name: 'alice', age: 25, city: 'Vancouver' },
{ name: 'bob', age: 35, city: 'Toronto' }
])
// Insert with specific _id
db.people.insert({ _id: 'user001', name: 'charlie', age: 28 })
Read (Query)
// Find all documents
db.people.find()
// Find with criteria
db.people.find({ name: 'mo' })
db.people.find({ age: { $gt: 25 } })
// Find one document
db.people.findOne({ name: 'alice' })
// Projection (select specific fields)
db.people.find({}, { name: 1, age: 1, _id: 0 })
// Limit results
db.people.find().limit(5)
// Sort results
db.people.find().sort({ age: 1 }) // 1 = ascending, -1 = descending
// Complex query
db.people.find({
age: { $gte: 25, $lt: 40 },
city: { $in: ['Calgary', 'Vancouver'] }
}).sort({ age: -1 })
Update
// Update single document
db.people.updateOne(
{ name: 'mo' },
{ $set: { age: 31, city: 'Edmonton' } }
)
// Update multiple documents
db.people.updateMany(
{ city: 'Calgary' },
{ $inc: { age: 1 } }
)
// Upsert (insert if not exists)
db.people.updateOne(
{ name: 'david' },
{ $set: { age: 29, city: 'Montreal' } },
{ upsert: true }
)
// Replace document
db.people.replaceOne(
{ name: 'mo' },
{ name: 'mo', age: 32, city: 'Calgary', occupation: 'developer' }
)
Delete
// Delete single document
db.people.deleteOne({ name: 'bob' })
// Delete multiple documents
db.people.deleteMany({ age: { $lt: 25 } })
// Delete all documents in collection
db.people.deleteMany({})
Advanced MongoDB Operations
Aggregation Pipeline
// Group and calculate
db.sales.aggregate([
{ $match: { status: 'completed' } },
{ $group: {
_id: '$product',
total: { $sum: '$amount' },
count: { $sum: 1 }
}},
{ $sort: { total: -1 } }
])
// Unwind arrays
db.orders.aggregate([
{ $unwind: '$items' },
{ $group: {
_id: '$items.product',
totalQuantity: { $sum: '$items.quantity' }
}}
])
Indexing
// Create index
db.people.createIndex({ name: 1 })
db.people.createIndex({ age: 1, city: 1 }) // Compound index
// List indexes
db.people.getIndexes()
// Drop index
db.people.dropIndex({ name: 1 })
// Text index for search
db.articles.createIndex({ title: 'text', content: 'text' })
db.articles.find({ $text: { $search: 'mongodb tutorial' } })
MongoDB Configuration
.mongorc.js
Create a .mongorc.js file for custom functions:
// Custom helper functions
function findByAge(minAge, maxAge) {
return db.people.find({
age: { $gte: minAge, $lte: maxAge }
}).pretty()
}
function countByCity() {
return db.people.aggregate([
{ $group: { _id: '$city', count: { $sum: 1 } } },
{ $sort: { count: -1 } }
])
}
Database Design Patterns
Relational Patterns
One-to-Many Relationship
-- Users table
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50) UNIQUE,
email VARCHAR(100)
);
-- Posts table
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
title VARCHAR(200),
content TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Many-to-Many Relationship
-- Junction table for users and roles
CREATE TABLE user_roles (
user_id INTEGER REFERENCES users(id),
role_id INTEGER REFERENCES roles(id),
PRIMARY KEY (user_id, role_id)
);
NoSQL Patterns
Embedded Documents
// User with embedded address
{
_id: ObjectId("..."),
name: "John Doe",
email: "john@example.com",
address: {
street: "123 Main St",
city: "Calgary",
province: "AB",
postal_code: "T2P 1A1"
}
}
Referenced Documents
// Separate collections with references
// Users collection
{
_id: ObjectId("user123"),
name: "John Doe",
email: "john@example.com"
}
// Posts collection
{
_id: ObjectId("post456"),
user_id: ObjectId("user123"),
title: "My Blog Post",
content: "...",
created_at: ISODate("2023-01-01")
}
Database Performance Optimization
Query Optimization
PostgreSQL
-- Use EXPLAIN to analyze queries
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.order_date > '2023-01-01';
-- Create appropriate indexes
CREATE INDEX idx_orders_date ON orders(order_date);
CREATE INDEX idx_orders_customer ON orders(customer_id);
MongoDB
// Use explain() for query analysis
db.orders.find({ order_date: { $gt: ISODate("2023-01-01") } })
.explain("executionStats")
// Create supporting indexes
db.orders.createIndex({ order_date: 1 })
db.orders.createIndex({ customer_id: 1, order_date: 1 })
Database Monitoring
Key Metrics
- Query execution time
- Index usage
- Connection pool status
- Disk I/O
- Memory usage
- Lock contention
PostgreSQL Monitoring
-- Active queries
SELECT pid, query, state, query_start
FROM pg_stat_activity
WHERE state = 'active';
-- Index usage
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
-- Table statistics
SELECT schemaname, tablename, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_user_tables;
Best Practices
Security
- Authentication and Authorization
- Use strong passwords
- Implement role-based access control
- Principle of least privilege
- Data Protection
- Encrypt data at rest and in transit
- Regular backups
- Input validation and parameterized queries
- Network Security
- Firewall configuration
- VPN or private networks
- SSL/TLS connections
Performance
- Query Optimization
- Use appropriate indexes
- Avoid N+1 queries
- Optimize JOIN operations
- Use query analysis tools
- Schema Design
- Normalize appropriately
- Consider denormalization for read-heavy workloads
- Use appropriate data types
- Connection Management
- Use connection pooling
- Monitor connection limits
- Implement proper timeout settings
Backup and Recovery
- Backup Strategies
- Regular automated backups
- Test backup restoration
- Multiple backup locations
- Point-in-time recovery capability
- High Availability
- Master-slave replication
- Cluster configuration
- Failover procedures
- Load balancing
This guide provides a comprehensive foundation for working with database systems, covering both relational and NoSQL approaches with practical examples and best practices.