🗄️ Database — Complete Basic Notes

Target Audience: Non-technical students who want to understand what databases are, why they exist, and how they work — explained in plain, everyday language.


1. The Three Core Concepts: Data, Database, and DBMS

Most beginners confuse these three terms and use them interchangeably. They are related but they are not the same thing. Understanding the difference between them is the very first step.


1.1 Data

Data is a collection of raw, unprocessed facts about any object, person, event, or thing.

  • Data on its own has no meaning. It is just facts.
  • It may or may not be relevant to you.
  • Data can be anything: numbers, text, images, dates, symbols.

Examples of raw data:

25, "Alice", "01-01-2000", 98.6, "Rawalpindi", true, 7734
These numbers and words alone tell you nothing useful. They are just raw facts floating in isolation.

Think of data like individual puzzle pieces scattered on a table. Each piece exists, but it does not show you the full picture yet.


1.2 Information

Information is data that has been processed, organized, or given context so that it becomes meaningful and relevant.

  • Information is always relevant to a specific question or purpose.
  • It is derived from data by adding context.
  • Not all data becomes information — only the data that is relevant to your query.

Example: Suppose a hospital database has the following raw data:

"Alice", 25, "01-01-2000", "Rawalpindi", 98.6, "Flu", "Dr. Ahmed", "2024-03-15"

Now, if the doctor asks: "What is Alice's age and diagnosis?" - Information (relevant): Alice is 25 years old and was diagnosed with Flu on 2024-03-15. - Data (not relevant to this query): Her city (Rawalpindi) and body temperature (98.6) are still data — they exist but are not part of the answer to this specific question.

Think of information like a completed puzzle section — you have selected and arranged only the relevant pieces to reveal a meaningful picture.


1.3 The Relationship: Data → Information

Raw Facts (Data)
       |
       |  processing + context + relevance
Meaningful Output (Information)
Data Information
Definition Raw, unprocessed facts Processed, meaningful, relevant data
Meaning Has no meaning on its own Has clear meaning and context
Relevance May or may not be relevant Always relevant to a specific purpose
Example 25, "Alice", "Flu" "Alice, aged 25, was diagnosed with Flu"
Dependency Independent Derived from data

1.4 Database (DB)

A database is a systematic, organized collection of related data stored so it can be easily accessed, managed, searched, and updated.

  • The keyword is systematic — data is not just dumped randomly; it is structured with a purpose.
  • A database stores data in a way that allows meaningful information to be retrieved from it.

Real-life examples: - Your phone's contact list is a tiny database (name, number, email stored together). - A school's student records system is a database. - Netflix uses a database to store every movie, user account, and watch history. - Amazon uses a database to store every product, price, and order.

Think of a database like a well-organized library — thousands of books (data) are arranged systematically so you can find what you need (information) quickly.


1.5 DBMS — Database Management System

A DBMS (Database Management System) is a software application that enables users to access, create, manage, and manipulate a database.

  • You never interact with the raw data files directly.
  • The DBMS sits between the user/application and the stored data, acting as an intermediary.
  • It handles security, queries, updates, backups, and multi-user access.
         You / Application
               |
               |  (sends a request, e.g., "Find all orders from today")
   +---------------------------+
   |          DBMS             |  <-- This is the SOFTWARE (e.g., MySQL, PostgreSQL)
   |  (processes the request)  |
   +---------------------------+
               |
   Raw Data stored on Disk / Memory
               |
   Result returned to You / Application

What a DBMS does: - Allows users to create databases and tables - Allows users to read, insert, update, delete data - Enforces security (who can access what) - Manages concurrent access (multiple users at once) - Handles backups and recovery

Examples of DBMS Software

DBMS Software Type Used By
MySQL Relational WordPress, web apps, startups
PostgreSQL Relational Large-scale apps, analytics
Oracle DB Relational Banks, large enterprises
Microsoft SQL Server Relational Microsoft ecosystem, corporate apps
MongoDB NoSQL (Document) Real-time apps, content platforms
Redis NoSQL (Key-Value) Caching, sessions, leaderboards
Cassandra NoSQL (Column) IoT, time-series, social media feeds
Neo4j NoSQL (Graph) Social networks, fraud detection

Common Confusion: Many beginners think MySQL or MongoDB is the database itself. It is not. MySQL is the DBMS — the software engine. The actual database is the collection of data it manages. MySQL is the tool; your data is the database.


2. Why Do We Need Databases?

Without a database, data would be stored in loose files (like plain text or Excel sheets), which causes problems:

Problem (Without DB) Solution (With DB)
Hard to find specific data Fast search and query
Data gets duplicated Data is stored once, referenced everywhere
No security Role-based access control
Hard to update Update in one place, reflects everywhere
No multi-user support Multiple users can access simultaneously

3. Key Terminology

Term Plain English Meaning Example
Data Raw facts and figures 25, "Alice", "alice@email.com"
Information Relevant, processed data with meaning "Alice is 25 years old"
Database (DB) Systematic collection of related data A school's entire student record system
DBMS Software that manages the database MySQL, PostgreSQL, MongoDB
Table A grid of rows and columns (like Excel) A Students table
Row / Record / Tuple One complete entry in a table One student's full details
Column / Field / Attribute A single category of data Student's Name, Age, Grade
Primary Key A unique ID for every row Student ID: S001
Foreign Key A reference to another table's primary key CourseID inside a Students table
Query A request/question you ask the database "Show me all students above 18"
Schema The blueprint/structure of the database Defining what columns a table has
Index A shortcut for faster searching Like a book index — helps find data quickly
Tuple Another name for a complete row/record The entire row: (S001, Alice, 20, C01)
Attribute Another name for a column/field Age, Name, ClassID

4. Relational vs. Non-Relational Databases (In Depth)

This is one of the most important distinctions in database technology. Let's go deep.


4.1 Relational Databases (SQL Databases)

A relational database stores data in structured tables made of rows and columns. The tables are related to each other through keys.

Key Characteristics

Feature Detail
Structure Data stored in tables (rows and columns)
Row Called a Record or Tuple — one complete entry
Column Called a Field or Attribute — one category of data
Schema Pre-defined and fixed — you must define the structure before inserting data
All fields must be filled Every row must provide a value for every column (or explicitly mark it NULL)
Primary Key Every table must have a Primary Key — a unique identifier for each row
Relationships Tables are related to each other using Foreign Keys
Query Language SQL (Structured Query Language) — used across ALL relational databases
Scaling Primarily Vertical Scaling (scale up: get a bigger, more powerful server)
Horizontal Scaling Possible but very difficult and complex to implement
Use Case OLTP — Online Transaction Processing (banking, orders, bookings)

What is OLTP?

OLTP (Online Transaction Processing) means the database handles a very large number of short, real-time transactions — like: - A customer placing an order on Amazon - A student registering for a course - A bank processing a money transfer

Relational databases are perfectly built for this because every transaction must be accurate, complete, and reliable.

Example — A School Relational Database

Students Table: | StudentID (PK) | Name | Age | ClassID (FK) | |----------------|-------|-----|--------------| | S001 | Alice | 20 | C01 | | S002 | Bob | 22 | C02 |

Classes Table: | ClassID (PK) | ClassName | |--------------|-------------| | C01 | Mathematics | | C02 | Physics |

  • Each row is a Tuple (e.g., the entire row S001, Alice, 20, C01 is one tuple)
  • Each column is an Attribute (e.g., Name, Age)
  • StudentID is the Primary Key of the Students table
  • ClassID in the Students table is a Foreign Key referencing the Classes table
  • The two tables are related — that's why it's called a relational database

Relational Database Examples & AWS Services

The following are the major relational database engines. Amazon offers most of them under a single managed cloud service:

Database Engine Notes
MySQL Most popular open-source relational DB
PostgreSQL Advanced open-source, highly reliable
MariaDB Fork of MySQL, open-source
Oracle DB Enterprise-grade, used in banking
IBM DB2 Enterprise use, large corporations
Microsoft SQL Server Microsoft ecosystem, corporate apps

AWS Cloud Services for Relational Databases:

AWS Service What It Is
Amazon RDS Managed service that runs MySQL, PostgreSQL, MariaDB, Oracle, IBM DB2, and Microsoft SQL Server — all six engines under one umbrella
Amazon Aurora AWS's own cloud-native relational DB engine, compatible with MySQL and PostgreSQL, but faster and more scalable

RDS = Relational Database Service. It does not create its own engine — it manages the existing six engines for you in the cloud so you don't have to set up servers manually.


4.2 Non-Relational Databases (NoSQL Databases)

NoSQL stands for "Not Only SQL". These databases do not use fixed tables. They store data in flexible, varied formats suited for modern, high-speed applications.

Key Characteristics

Feature Detail
Structure No fixed, pre-defined structure (no rigid schema)
Schema Schema-less — you can add new fields anytime without altering the whole database
Flexibility Each record/document can have different fields from others
Tables Tables/collections are generally independent of each other (no joins)
Scaling Designed for Horizontal Scaling (add more servers instead of a bigger server)
Speed Faster reads/writes because of independence and simpler structure
Hardware Cost Lower — because you add many cheap servers instead of one expensive server
Query Language No universal language — each NoSQL database has its own query method

Why No Universal Query Language?

In relational databases, SQL works for all of them because all of them share the same table structure. In NoSQL, each type stores data differently (documents, graphs, key-value pairs, columns) — so each one has its own way of querying data:

NoSQL Database Its Own Query Method
MongoDB MongoDB Query Language (MQL) using JSON-like syntax
Redis Redis commands (GET, SET, HGET, etc.)
Cassandra CQL — Cassandra Query Language (SQL-like but not SQL)
Neo4j Cypher Query Language
DynamoDB AWS SDK / PartiQL

4.3 Relational vs. Non-Relational — Side-by-Side

Feature Relational (SQL) Non-Relational (NoSQL)
Data Format Tables (rows & columns) Documents, key-value, graphs, columns
Schema Fixed, pre-defined (rigid) Flexible, schema-less
Row name Tuple Document / Record / Node (varies)
Column name Attribute / Field Field / Key (varies)
Relationships Tables are related via Foreign Keys Collections/tables are independent
All fields required Yes (or explicitly NULL) No — each entry can have different fields
Primary Key Required in every table Optional (varies by DB)
Query Language SQL (same for all) Each DB has its own language
Scaling Vertical (harder to scale out) Horizontal (easy to scale out)
Best for OLTP, financial systems, ERP Real-time apps, big data, social media
Examples MySQL, PostgreSQL, Oracle MongoDB, Redis, Cassandra, Neo4j

5. NoSQL Database Types — Deep Dive

NoSQL is not one single type of database. It is a family of four major types, each designed for a different kind of data.


5.1 Columnar Database (Column-Family Database)

How it works: Instead of storing data row by row (like relational databases), columnar databases store data column by column. This means when you search or analyze data, you only read the columns you need — not entire rows. This makes it dramatically faster for analytical queries.

Structure:

Row Key  |  Name    |  Age  |  City
---------|----------|-------|----------
R001     |  Alice   |  20   |  Karachi
R002     |  Bob     |  22   |  Lahore
In a columnar DB, the Age column is stored together physically on disk: [20, 22, ...]. So if you want the average age of all users, it reads just that column — not every row.

Why it is fast: - You read only the columns relevant to your query - Excellent compression (similar values stored together) - Designed to handle millions to billions of rows

Real-world use cases: - IoT sensor data (millions of readings per second) - Time-series data (stock prices over time) - Large-scale analytics (clickstream data, user behavior tracking) - Social media feeds with massive write throughput

Examples: Apache Cassandra, Apache HBase

AWS Equivalent: Amazon Keyspaces (Managed Cassandra-compatible service)


5.2 Document Database

How it works: Data is stored as documents — typically in JSON or BSON format. Each document is a self-contained unit that can store rich, nested, complex data including arrays and sub-objects. Unlike relational tables, documents in the same collection can have completely different fields.

Structure (JSON document):

{
  "userID": "U001",
  "name": "Alice",
  "email": "alice@example.com",
  "address": {
    "city": "Rawalpindi",
    "country": "Pakistan"
  },
  "orders": ["ORD001", "ORD002", "ORD005"]
}

Another document in the same collection can have completely different fields:

{
  "userID": "U002",
  "name": "Bob",
  "phone": "0300-1234567"
}

Why it is useful: - Stores long, complex, nested data naturally - No need to define a fixed schema upfront - Adding a new field to one document does not break others - Great for content that evolves over time

Real-world use cases: - User profiles on social media (each user may have different fields) - Product catalogs (a phone has different attributes than a shirt) - Blog posts, articles, CMS systems - E-commerce catalogs with varying product types

Examples: MongoDB, CouchDB, Firebase Firestore

AWS Equivalent: Amazon DocumentDB (MongoDB-compatible managed service)


5.3 Key-Value Database

How it works: This is the simplest type of NoSQL database. Data is stored as pairs of key → value, just like a dictionary or a locker system. You give a key, you get the value. That's it.

Structure:

Key                  | Value
---------------------|----------------------------------
"user:101"           | "Alice"
"session:abc123"     | "{logged_in: true, role: admin}"
"cart:U001"          | "[item1, item2, item3]"
"page:homepage"      | "<html>...cached HTML...</html>"

Why it is fast: - Lookup by key is O(1) — almost instant, regardless of data size - No complex queries, no joins, no schema overhead - Often stored entirely in memory (RAM) for maximum speed

Real-world use cases: - Session management (store logged-in user's session data) - Caching (store frequently accessed pages/results so the app doesn't recompute them) - Leaderboards (real-time game scores) - Shopping cart (temporary cart data before checkout) - Feature flags (on/off switches for app features)

Examples: Redis, Memcached

AWS Equivalents: - Amazon ElastiCache (managed Redis / Memcached) - Amazon DynamoDB (also a key-value store, but more powerful — supports document model too)


5.4 Graph Database

How it works: A graph database stores data as a network of nodes and edges: - Node = an entity (a person, a city, a product, a webpage) - Edge = a relationship between two nodes (FRIENDS_WITH, LOCATED_IN, BOUGHT, LINKS_TO) - Properties = attributes attached to nodes or edges

Structure:

(Alice) --[FRIENDS_WITH]--> (Bob)
(Bob)   --[LIVES_IN]------> (Lahore)
(Alice) --[BOUGHT]---------> (iPhone 15)
(BBC)   --[REPORTED_ON]----> (Climate Change)
(Climate Change) --[AFFECTS]--> (Pakistan)

Why it is powerful: - Relationships between data are first-class citizens — stored explicitly, not computed via joins - Traversing relationships ("who are Alice's friends' friends?") is extremely fast - Naturally models complex, interconnected real-world data

Real-world use cases: - Social networks (Facebook's friend recommendations, LinkedIn's "People You May Know") - Fraud detection (finding suspicious patterns of transactions between accounts) - Knowledge graphs (BBC uses a graph database to link news stories, topics, people, and places — e.g., a weather report linking regions, events, and impacts) - Recommendation engines ("Users who bought X also bought Y") - Network and IT infrastructure mapping (how servers connect to each other) - Drug discovery (relationships between molecules, diseases, and proteins)

Examples: Neo4j, Amazon Neptune, ArangoDB

AWS Equivalent: Amazon Neptune (fully managed graph database supporting both Property Graph and RDF)


5.5 NoSQL Types Summary

Type How Data is Stored Query Style Real-World Example AWS Service
Columnar By column (not row) CQL / column-scan IoT sensor data, analytics Amazon Keyspaces
Document JSON/BSON documents MQL (MongoDB-style) User profiles, product catalogs Amazon DocumentDB
Key-Value Key → Value pairs GET/SET commands Sessions, caching, carts Amazon ElastiCache / DynamoDB
Graph Nodes + Edges (relationships) Cypher / Gremlin Social networks, fraud detection Amazon Neptune

6. What is SQL?

SQL (Structured Query Language) is the standard language used to communicate with all relational databases. It is not a programming language — it is a query language designed specifically for data.

Common SQL Commands (CRUD)

Command What it does Example
SELECT Read/fetch data SELECT * FROM Students;
INSERT Add new data INSERT INTO Students VALUES (...)
UPDATE Modify existing data UPDATE Students SET Age=21 WHERE ID='S001'
DELETE Remove data DELETE FROM Students WHERE ID='S001'
CREATE TABLE Create a new table CREATE TABLE Students (...)
DROP TABLE Delete a table permanently DROP TABLE Students;

These four core operations — Create, Read, Update, Delete — are called CRUD and are fundamental to every database application.


7. ACID Properties (Data Reliability)

For a database to be reliable and trustworthy (especially for banking, healthcare), it must follow ACID rules:

Property Meaning Simple Example
**A**tomicity All steps of a transaction succeed, or none do Bank transfer: debit AND credit must both happen
**C**onsistency Data always moves from one valid state to another Balance can't go negative (if rule says so)
**I**solation Two transactions don't interfere with each other Two people booking the last seat simultaneously
**D**urability Once saved, data stays saved even after crashes After a power cut, your transfer is still recorded

8. Database vs. Spreadsheet

Spreadsheet (Excel) Database (MySQL, etc.)
Best for Small, personal data Large, multi-user data
Multi-user Limited Yes, thousands simultaneously
Relationships Manual (copy-paste) Built-in (foreign keys)
Speed Slow with large data Fast even with millions of rows
Security File-level only Row/column level permissions
Automation Macros (limited) Full programmatic access

9. Real-World Database Examples

Application Database Used For
Facebook / Instagram Storing user profiles, posts, likes, comments
YouTube Video metadata, comments, watch history
Uber Drivers, riders, trips, real-time locations
WhatsApp Messages, contacts, media metadata
Online Banking Accounts, transactions, balances
Hospital System Patient records, prescriptions, appointments
E-commerce (Amazon) Products, inventory, orders, customers
BBC News Knowledge graph linking stories, people, places, topics

10. Summary of Key Concepts

  • Data = raw, unprocessed facts about anything; may or may not be relevant.
  • Information = data that has been given context and is relevant to a specific purpose.
  • Database = a systematic, organized collection of related data.
  • DBMS = the software (e.g., MySQL, MongoDB) that lets users access and manipulate the database.
  • Relational databases use fixed-schema tables, SQL, tuples (rows), attributes (columns), primary keys, and foreign keys; best for OLTP and structured data.
  • Non-relational (NoSQL) databases are schema-less, horizontally scalable, and each has its own query language.
  • The four NoSQL types are: Columnar, Document, Key-Value, and Graph — each solves a different problem.
  • AWS provides managed cloud services for every database type: RDS/Aurora (relational), Keyspaces (columnar), DocumentDB (document), ElastiCache/DynamoDB (key-value), Neptune (graph).

💡 Pro Tip for Beginners: Start by learning MySQL or PostgreSQL (relational) as they teach the core concepts clearly. Once comfortable, explore MongoDB (NoSQL document) to understand the difference. Then look at Redis to appreciate how simple and fast key-value stores are.