🗄️ Database — Complete Basic Notes¶

Target Audience: Non-technical students who want to understand what databases are, why they exist, and how they work — explained in plain, everyday language.

1. The Three Core Concepts: Data, Database, and DBMS¶

Most beginners confuse these three terms and use them interchangeably. They are related but they are not the same thing. Understanding the difference between them is the very first step.

1.1 Data¶

Data is a collection of raw, unprocessed facts about any object, person, event, or thing.

Data on its own has no meaning. It is just facts.
It may or may not be relevant to you.
Data can be anything: numbers, text, images, dates, symbols.

Examples of raw data:

25, "Alice", "01-01-2000", 98.6, "Rawalpindi", true, 7734

These numbers and words alone tell you nothing useful. They are just raw facts floating in isolation.

Think of data like individual puzzle pieces scattered on a table. Each piece exists, but it does not show you the full picture yet.

1.2 Information¶

Information is data that has been processed, organized, or given context so that it becomes meaningful and relevant.

Information is always relevant to a specific question or purpose.
It is derived from data by adding context.
Not all data becomes information — only the data that is relevant to your query.

Example: Suppose a hospital database has the following raw data:

"Alice", 25, "01-01-2000", "Rawalpindi", 98.6, "Flu", "Dr. Ahmed", "2024-03-15"

Now, if the doctor asks: "What is Alice's age and diagnosis?" - Information (relevant): Alice is 25 years old and was diagnosed with Flu on 2024-03-15. - Data (not relevant to this query): Her city (Rawalpindi) and body temperature (98.6) are still data — they exist but are not part of the answer to this specific question.

Think of information like a completed puzzle section — you have selected and arranged only the relevant pieces to reveal a meaningful picture.

1.3 The Relationship: Data → Information¶

Raw Facts (Data)
       |
       |  processing + context + relevance
       ↓
Meaningful Output (Information)

	Data	Information
Definition	Raw, unprocessed facts	Processed, meaningful, relevant data
Meaning	Has no meaning on its own	Has clear meaning and context
Relevance	May or may not be relevant	Always relevant to a specific purpose
Example	`25`, `"Alice"`, `"Flu"`	"Alice, aged 25, was diagnosed with Flu"
Dependency	Independent	Derived from data

1.4 Database (DB)¶

A database is a systematic, organized collection of related data stored so it can be easily accessed, managed, searched, and updated.

The keyword is systematic — data is not just dumped randomly; it is structured with a purpose.
A database stores data in a way that allows meaningful information to be retrieved from it.

Real-life examples: - Your phone's contact list is a tiny database (name, number, email stored together). - A school's student records system is a database. - Netflix uses a database to store every movie, user account, and watch history. - Amazon uses a database to store every product, price, and order.

Think of a database like a well-organized library — thousands of books (data) are arranged systematically so you can find what you need (information) quickly.

1.5 DBMS — Database Management System¶

A DBMS (Database Management System) is a software application that enables users to access, create, manage, and manipulate a database.

You never interact with the raw data files directly.
The DBMS sits between the user/application and the stored data, acting as an intermediary.
It handles security, queries, updates, backups, and multi-user access.

         You / Application
               |
               |  (sends a request, e.g., "Find all orders from today")
               ↓
   +---------------------------+
   |          DBMS             |  <-- This is the SOFTWARE (e.g., MySQL, PostgreSQL)
   |  (processes the request)  |
   +---------------------------+
               |
               ↓
   Raw Data stored on Disk / Memory
               |
               ↓
   Result returned to You / Application

What a DBMS does: - Allows users to create databases and tables - Allows users to read, insert, update, delete data - Enforces security (who can access what) - Manages concurrent access (multiple users at once) - Handles backups and recovery

Examples of DBMS Software¶

DBMS Software	Type	Used By
MySQL	Relational	WordPress, web apps, startups
PostgreSQL	Relational	Large-scale apps, analytics
Oracle DB	Relational	Banks, large enterprises
Microsoft SQL Server	Relational	Microsoft ecosystem, corporate apps
MongoDB	NoSQL (Document)	Real-time apps, content platforms
Redis	NoSQL (Key-Value)	Caching, sessions, leaderboards
Cassandra	NoSQL (Column)	IoT, time-series, social media feeds
Neo4j	NoSQL (Graph)	Social networks, fraud detection

Common Confusion: Many beginners think MySQL or MongoDB is the database itself. It is not. MySQL is the DBMS — the software engine. The actual database is the collection of data it manages. MySQL is the tool; your data is the database.

2. Why Do We Need Databases?¶

Without a database, data would be stored in loose files (like plain text or Excel sheets), which causes problems:

Problem (Without DB)	Solution (With DB)
Hard to find specific data	Fast search and query
Data gets duplicated	Data is stored once, referenced everywhere
No security	Role-based access control
Hard to update	Update in one place, reflects everywhere
No multi-user support	Multiple users can access simultaneously

3. Key Terminology¶

Term	Plain English Meaning	Example
Data	Raw facts and figures	`25`, `"Alice"`, `"alice@email.com"`
Information	Relevant, processed data with meaning	"Alice is 25 years old"
Database (DB)	Systematic collection of related data	A school's entire student record system
DBMS	Software that manages the database	MySQL, PostgreSQL, MongoDB
Table	A grid of rows and columns (like Excel)	A `Students` table
Row / Record / Tuple	One complete entry in a table	One student's full details
Column / Field / Attribute	A single category of data	Student's `Name`, `Age`, `Grade`
Primary Key	A unique ID for every row	Student ID: `S001`
Foreign Key	A reference to another table's primary key	`CourseID` inside a `Students` table
Query	A request/question you ask the database	"Show me all students above 18"
Schema	The blueprint/structure of the database	Defining what columns a table has
Index	A shortcut for faster searching	Like a book index — helps find data quickly
Tuple	Another name for a complete row/record	The entire row: `(S001, Alice, 20, C01)`
Attribute	Another name for a column/field	`Age`, `Name`, `ClassID`

4. Relational vs. Non-Relational Databases (In Depth)¶

This is one of the most important distinctions in database technology. Let's go deep.

4.1 Relational Databases (SQL Databases)¶

A relational database stores data in structured tables made of rows and columns. The tables are related to each other through keys.

Key Characteristics¶

Feature	Detail
Structure	Data stored in tables (rows and columns)
Row	Called a Record or Tuple — one complete entry
Column	Called a Field or Attribute — one category of data
Schema	Pre-defined and fixed — you must define the structure before inserting data
All fields must be filled	Every row must provide a value for every column (or explicitly mark it NULL)
Primary Key	Every table must have a Primary Key — a unique identifier for each row
Relationships	Tables are related to each other using Foreign Keys
Query Language	SQL (Structured Query Language) — used across ALL relational databases
Scaling	Primarily Vertical Scaling (scale up: get a bigger, more powerful server)
Horizontal Scaling	Possible but very difficult and complex to implement
Use Case	OLTP — Online Transaction Processing (banking, orders, bookings)

What is OLTP?¶

OLTP (Online Transaction Processing) means the database handles a very large number of short, real-time transactions — like: - A customer placing an order on Amazon - A student registering for a course - A bank processing a money transfer

Relational databases are perfectly built for this because every transaction must be accurate, complete, and reliable.

Example — A School Relational Database¶

Students Table: | StudentID (PK) | Name | Age | ClassID (FK) | |----------------|-------|-----|--------------| | S001 | Alice | 20 | C01 | | S002 | Bob | 22 | C02 |

Classes Table: | ClassID (PK) | ClassName | |--------------|-------------| | C01 | Mathematics | | C02 | Physics |

Each row is a Tuple (e.g., the entire row S001, Alice, 20, C01 is one tuple)
Each column is an Attribute (e.g., Name, Age)
StudentID is the Primary Key of the Students table
ClassID in the Students table is a Foreign Key referencing the Classes table
The two tables are related — that's why it's called a relational database

Relational Database Examples & AWS Services¶

The following are the major relational database engines. Amazon offers most of them under a single managed cloud service:

Database Engine	Notes
MySQL	Most popular open-source relational DB
PostgreSQL	Advanced open-source, highly reliable
MariaDB	Fork of MySQL, open-source
Oracle DB	Enterprise-grade, used in banking
IBM DB2	Enterprise use, large corporations
Microsoft SQL Server	Microsoft ecosystem, corporate apps

AWS Cloud Services for Relational Databases:

AWS Service	What It Is
Amazon RDS	Managed service that runs MySQL, PostgreSQL, MariaDB, Oracle, IBM DB2, and Microsoft SQL Server — all six engines under one umbrella
Amazon Aurora	AWS's own cloud-native relational DB engine, compatible with MySQL and PostgreSQL, but faster and more scalable

RDS = Relational Database Service. It does not create its own engine — it manages the existing six engines for you in the cloud so you don't have to set up servers manually.

4.2 Non-Relational Databases (NoSQL Databases)¶

NoSQL stands for "Not Only SQL". These databases do not use fixed tables. They store data in flexible, varied formats suited for modern, high-speed applications.

Key Characteristics¶

Feature	Detail
Structure	No fixed, pre-defined structure (no rigid schema)
Schema	Schema-less — you can add new fields anytime without altering the whole database
Flexibility	Each record/document can have different fields from others
Tables	Tables/collections are generally independent of each other (no joins)
Scaling	Designed for Horizontal Scaling (add more servers instead of a bigger server)
Speed	Faster reads/writes because of independence and simpler structure
Hardware Cost	Lower — because you add many cheap servers instead of one expensive server
Query Language	No universal language — each NoSQL database has its own query method

Why No Universal Query Language?¶

In relational databases, SQL works for all of them because all of them share the same table structure. In NoSQL, each type stores data differently (documents, graphs, key-value pairs, columns) — so each one has its own way of querying data:

NoSQL Database	Its Own Query Method
MongoDB	MongoDB Query Language (MQL) using JSON-like syntax
Redis	Redis commands (`GET`, `SET`, `HGET`, etc.)
Cassandra	CQL — Cassandra Query Language (SQL-like but not SQL)
Neo4j	Cypher Query Language
DynamoDB	AWS SDK / PartiQL

4.3 Relational vs. Non-Relational — Side-by-Side¶

Feature	Relational (SQL)	Non-Relational (NoSQL)
Data Format	Tables (rows & columns)	Documents, key-value, graphs, columns
Schema	Fixed, pre-defined (rigid)	Flexible, schema-less
Row name	Tuple	Document / Record / Node (varies)
Column name	Attribute / Field	Field / Key (varies)
Relationships	Tables are related via Foreign Keys	Collections/tables are independent
All fields required	Yes (or explicitly NULL)	No — each entry can have different fields
Primary Key	Required in every table	Optional (varies by DB)
Query Language	SQL (same for all)	Each DB has its own language
Scaling	Vertical (harder to scale out)	Horizontal (easy to scale out)
Best for	OLTP, financial systems, ERP	Real-time apps, big data, social media
Examples	MySQL, PostgreSQL, Oracle	MongoDB, Redis, Cassandra, Neo4j

5. NoSQL Database Types — Deep Dive¶

NoSQL is not one single type of database. It is a family of four major types, each designed for a different kind of data.

5.1 Columnar Database (Column-Family Database)¶

How it works: Instead of storing data row by row (like relational databases), columnar databases store data column by column. This means when you search or analyze data, you only read the columns you need — not entire rows. This makes it dramatically faster for analytical queries.

Structure:

Row Key  |  Name    |  Age  |  City
---------|----------|-------|----------
R001     |  Alice   |  20   |  Karachi
R002     |  Bob     |  22   |  Lahore

In a columnar DB, the Age column is stored together physically on disk: [20, 22, ...]. So if you want the average age of all users, it reads just that column — not every row.

Why it is fast: - You read only the columns relevant to your query - Excellent compression (similar values stored together) - Designed to handle millions to billions of rows

Real-world use cases: - IoT sensor data (millions of readings per second) - Time-series data (stock prices over time) - Large-scale analytics (clickstream data, user behavior tracking) - Social media feeds with massive write throughput

Examples: Apache Cassandra, Apache HBase

AWS Equivalent: Amazon Keyspaces (Managed Cassandra-compatible service)

5.2 Document Database¶

How it works: Data is stored as documents — typically in JSON or BSON format. Each document is a self-contained unit that can store rich, nested, complex data including arrays and sub-objects. Unlike relational tables, documents in the same collection can have completely different fields.

Structure (JSON document):

{
  "userID": "U001",
  "name": "Alice",
  "email": "alice@example.com",
  "address": {
    "city": "Rawalpindi",
    "country": "Pakistan"
  },
  "orders": ["ORD001", "ORD002", "ORD005"]
}

Another document in the same collection can have completely different fields:

{
  "userID": "U002",
  "name": "Bob",
  "phone": "0300-1234567"
}

Why it is useful: - Stores long, complex, nested data naturally - No need to define a fixed schema upfront - Adding a new field to one document does not break others - Great for content that evolves over time

Real-world use cases: - User profiles on social media (each user may have different fields) - Product catalogs (a phone has different attributes than a shirt) - Blog posts, articles, CMS systems - E-commerce catalogs with varying product types

Examples: MongoDB, CouchDB, Firebase Firestore

AWS Equivalent: Amazon DocumentDB (MongoDB-compatible managed service)

5.3 Key-Value Database¶

How it works: This is the simplest type of NoSQL database. Data is stored as pairs of key → value, just like a dictionary or a locker system. You give a key, you get the value. That's it.

Structure:

Key                  | Value
---------------------|----------------------------------
"user:101"           | "Alice"
"session:abc123"     | "{logged_in: true, role: admin}"
"cart:U001"          | "[item1, item2, item3]"
"page:homepage"      | "<html>...cached HTML...</html>"

Why it is fast: - Lookup by key is O(1) — almost instant, regardless of data size - No complex queries, no joins, no schema overhead - Often stored entirely in memory (RAM) for maximum speed

Real-world use cases: - Session management (store logged-in user's session data) - Caching (store frequently accessed pages/results so the app doesn't recompute them) - Leaderboards (real-time game scores) - Shopping cart (temporary cart data before checkout) - Feature flags (on/off switches for app features)

Examples: Redis, Memcached

AWS Equivalents: - Amazon ElastiCache (managed Redis / Memcached) - Amazon DynamoDB (also a key-value store, but more powerful — supports document model too)

5.4 Graph Database¶

How it works: A graph database stores data as a network of nodes and edges: - Node = an entity (a person, a city, a product, a webpage) - Edge = a relationship between two nodes (FRIENDS_WITH, LOCATED_IN, BOUGHT, LINKS_TO) - Properties = attributes attached to nodes or edges

Structure:

(Alice) --[FRIENDS_WITH]--> (Bob)
(Bob)   --[LIVES_IN]------> (Lahore)
(Alice) --[BOUGHT]---------> (iPhone 15)
(BBC)   --[REPORTED_ON]----> (Climate Change)
(Climate Change) --[AFFECTS]--> (Pakistan)

Why it is powerful: - Relationships between data are first-class citizens — stored explicitly, not computed via joins - Traversing relationships ("who are Alice's friends' friends?") is extremely fast - Naturally models complex, interconnected real-world data

Real-world use cases: - Social networks (Facebook's friend recommendations, LinkedIn's "People You May Know") - Fraud detection (finding suspicious patterns of transactions between accounts) - Knowledge graphs (BBC uses a graph database to link news stories, topics, people, and places — e.g., a weather report linking regions, events, and impacts) - Recommendation engines ("Users who bought X also bought Y") - Network and IT infrastructure mapping (how servers connect to each other) - Drug discovery (relationships between molecules, diseases, and proteins)

Examples: Neo4j, Amazon Neptune, ArangoDB

AWS Equivalent: Amazon Neptune (fully managed graph database supporting both Property Graph and RDF)

5.5 NoSQL Types Summary¶

Type	How Data is Stored	Query Style	Real-World Example	AWS Service
Columnar	By column (not row)	CQL / column-scan	IoT sensor data, analytics	Amazon Keyspaces
Document	JSON/BSON documents	MQL (MongoDB-style)	User profiles, product catalogs	Amazon DocumentDB
Key-Value	Key → Value pairs	GET/SET commands	Sessions, caching, carts	Amazon ElastiCache / DynamoDB
Graph	Nodes + Edges (relationships)	Cypher / Gremlin	Social networks, fraud detection	Amazon Neptune

6. What is SQL?¶

SQL (Structured Query Language) is the standard language used to communicate with all relational databases. It is not a programming language — it is a query language designed specifically for data.

Common SQL Commands (CRUD)¶

Command	What it does	Example
`SELECT`	Read/fetch data	`SELECT * FROM Students;`
`INSERT`	Add new data	`INSERT INTO Students VALUES (...)`
`UPDATE`	Modify existing data	`UPDATE Students SET Age=21 WHERE ID='S001'`
`DELETE`	Remove data	`DELETE FROM Students WHERE ID='S001'`
`CREATE TABLE`	Create a new table	`CREATE TABLE Students (...)`
`DROP TABLE`	Delete a table permanently	`DROP TABLE Students;`

These four core operations — Create, Read, Update, Delete — are called CRUD and are fundamental to every database application.

7. ACID Properties (Data Reliability)¶

For a database to be reliable and trustworthy (especially for banking, healthcare), it must follow ACID rules:

Property	Meaning	Simple Example
Atomicity	All steps of a transaction succeed, or none do	Bank transfer: debit AND credit must both happen
Consistency	Data always moves from one valid state to another	Balance can't go negative (if rule says so)
Isolation	Two transactions don't interfere with each other	Two people booking the last seat simultaneously
Durability	Once saved, data stays saved even after crashes	After a power cut, your transfer is still recorded

8. Database vs. Spreadsheet¶

	Spreadsheet (Excel)	Database (MySQL, etc.)
Best for	Small, personal data	Large, multi-user data
Multi-user	Limited	Yes, thousands simultaneously
Relationships	Manual (copy-paste)	Built-in (foreign keys)
Speed	Slow with large data	Fast even with millions of rows
Security	File-level only	Row/column level permissions
Automation	Macros (limited)	Full programmatic access

9. Real-World Database Examples¶

Application	Database Used For
Facebook / Instagram	Storing user profiles, posts, likes, comments
YouTube	Video metadata, comments, watch history
Uber	Drivers, riders, trips, real-time locations
WhatsApp	Messages, contacts, media metadata
Online Banking	Accounts, transactions, balances
Hospital System	Patient records, prescriptions, appointments
E-commerce (Amazon)	Products, inventory, orders, customers
BBC News	Knowledge graph linking stories, people, places, topics

10. Summary of Key Concepts¶

Data = raw, unprocessed facts about anything; may or may not be relevant.
Information = data that has been given context and is relevant to a specific purpose.
Database = a systematic, organized collection of related data.
DBMS = the software (e.g., MySQL, MongoDB) that lets users access and manipulate the database.
Relational databases use fixed-schema tables, SQL, tuples (rows), attributes (columns), primary keys, and foreign keys; best for OLTP and structured data.
Non-relational (NoSQL) databases are schema-less, horizontally scalable, and each has its own query language.
The four NoSQL types are: Columnar, Document, Key-Value, and Graph — each solves a different problem.
AWS provides managed cloud services for every database type: RDS/Aurora (relational), Keyspaces (columnar), DocumentDB (document), ElastiCache/DynamoDB (key-value), Neptune (graph).

💡 Pro Tip for Beginners: Start by learning MySQL or PostgreSQL (relational) as they teach the core concepts clearly. Once comfortable, explore MongoDB (NoSQL document) to understand the difference. Then look at Redis to appreciate how simple and fast key-value stores are.