🗄️ Database — Complete Basic Notes¶
Target Audience: Non-technical students who want to understand what databases are, why they exist, and how they work — explained in plain, everyday language.
1. The Three Core Concepts: Data, Database, and DBMS¶
Most beginners confuse these three terms and use them interchangeably. They are related but they are not the same thing. Understanding the difference between them is the very first step.
1.1 Data¶
Data is a collection of raw, unprocessed facts about any object, person, event, or thing.
- Data on its own has no meaning. It is just facts.
- It may or may not be relevant to you.
- Data can be anything: numbers, text, images, dates, symbols.
Examples of raw data:
25, "Alice", "01-01-2000", 98.6, "Rawalpindi", true, 7734
Think of data like individual puzzle pieces scattered on a table. Each piece exists, but it does not show you the full picture yet.
1.2 Information¶
Information is data that has been processed, organized, or given context so that it becomes meaningful and relevant.
- Information is always relevant to a specific question or purpose.
- It is derived from data by adding context.
- Not all data becomes information — only the data that is relevant to your query.
Example: Suppose a hospital database has the following raw data:
"Alice", 25, "01-01-2000", "Rawalpindi", 98.6, "Flu", "Dr. Ahmed", "2024-03-15"
Now, if the doctor asks: "What is Alice's age and diagnosis?" - Information (relevant): Alice is 25 years old and was diagnosed with Flu on 2024-03-15. - Data (not relevant to this query): Her city (Rawalpindi) and body temperature (98.6) are still data — they exist but are not part of the answer to this specific question.
Think of information like a completed puzzle section — you have selected and arranged only the relevant pieces to reveal a meaningful picture.
1.3 The Relationship: Data → Information¶
Raw Facts (Data)
|
| processing + context + relevance
↓
Meaningful Output (Information)
| Data | Information | |
|---|---|---|
| Definition | Raw, unprocessed facts | Processed, meaningful, relevant data |
| Meaning | Has no meaning on its own | Has clear meaning and context |
| Relevance | May or may not be relevant | Always relevant to a specific purpose |
| Example | 25, "Alice", "Flu" | "Alice, aged 25, was diagnosed with Flu" |
| Dependency | Independent | Derived from data |
1.4 Database (DB)¶
A database is a systematic, organized collection of related data stored so it can be easily accessed, managed, searched, and updated.
- The keyword is systematic — data is not just dumped randomly; it is structured with a purpose.
- A database stores data in a way that allows meaningful information to be retrieved from it.
Real-life examples: - Your phone's contact list is a tiny database (name, number, email stored together). - A school's student records system is a database. - Netflix uses a database to store every movie, user account, and watch history. - Amazon uses a database to store every product, price, and order.
Think of a database like a well-organized library — thousands of books (data) are arranged systematically so you can find what you need (information) quickly.
1.5 DBMS — Database Management System¶
A DBMS (Database Management System) is a software application that enables users to access, create, manage, and manipulate a database.
- You never interact with the raw data files directly.
- The DBMS sits between the user/application and the stored data, acting as an intermediary.
- It handles security, queries, updates, backups, and multi-user access.
You / Application
|
| (sends a request, e.g., "Find all orders from today")
↓
+---------------------------+
| DBMS | <-- This is the SOFTWARE (e.g., MySQL, PostgreSQL)
| (processes the request) |
+---------------------------+
|
↓
Raw Data stored on Disk / Memory
|
↓
Result returned to You / Application
What a DBMS does: - Allows users to create databases and tables - Allows users to read, insert, update, delete data - Enforces security (who can access what) - Manages concurrent access (multiple users at once) - Handles backups and recovery
Examples of DBMS Software¶
| DBMS Software | Type | Used By |
|---|---|---|
| MySQL | Relational | WordPress, web apps, startups |
| PostgreSQL | Relational | Large-scale apps, analytics |
| Oracle DB | Relational | Banks, large enterprises |
| Microsoft SQL Server | Relational | Microsoft ecosystem, corporate apps |
| MongoDB | NoSQL (Document) | Real-time apps, content platforms |
| Redis | NoSQL (Key-Value) | Caching, sessions, leaderboards |
| Cassandra | NoSQL (Column) | IoT, time-series, social media feeds |
| Neo4j | NoSQL (Graph) | Social networks, fraud detection |
Common Confusion: Many beginners think MySQL or MongoDB is the database itself. It is not. MySQL is the DBMS — the software engine. The actual database is the collection of data it manages. MySQL is the tool; your data is the database.
2. Why Do We Need Databases?¶
Without a database, data would be stored in loose files (like plain text or Excel sheets), which causes problems:
| Problem (Without DB) | Solution (With DB) |
|---|---|
| Hard to find specific data | Fast search and query |
| Data gets duplicated | Data is stored once, referenced everywhere |
| No security | Role-based access control |
| Hard to update | Update in one place, reflects everywhere |
| No multi-user support | Multiple users can access simultaneously |
3. Key Terminology¶
| Term | Plain English Meaning | Example |
|---|---|---|
| Data | Raw facts and figures | 25, "Alice", "alice@email.com" |
| Information | Relevant, processed data with meaning | "Alice is 25 years old" |
| Database (DB) | Systematic collection of related data | A school's entire student record system |
| DBMS | Software that manages the database | MySQL, PostgreSQL, MongoDB |
| Table | A grid of rows and columns (like Excel) | A Students table |
| Row / Record / Tuple | One complete entry in a table | One student's full details |
| Column / Field / Attribute | A single category of data | Student's Name, Age, Grade |
| Primary Key | A unique ID for every row | Student ID: S001 |
| Foreign Key | A reference to another table's primary key | CourseID inside a Students table |
| Query | A request/question you ask the database | "Show me all students above 18" |
| Schema | The blueprint/structure of the database | Defining what columns a table has |
| Index | A shortcut for faster searching | Like a book index — helps find data quickly |
| Tuple | Another name for a complete row/record | The entire row: (S001, Alice, 20, C01) |
| Attribute | Another name for a column/field | Age, Name, ClassID |
4. Relational vs. Non-Relational Databases (In Depth)¶
This is one of the most important distinctions in database technology. Let's go deep.
4.1 Relational Databases (SQL Databases)¶
A relational database stores data in structured tables made of rows and columns. The tables are related to each other through keys.
Key Characteristics¶
| Feature | Detail |
|---|---|
| Structure | Data stored in tables (rows and columns) |
| Row | Called a Record or Tuple — one complete entry |
| Column | Called a Field or Attribute — one category of data |
| Schema | Pre-defined and fixed — you must define the structure before inserting data |
| All fields must be filled | Every row must provide a value for every column (or explicitly mark it NULL) |
| Primary Key | Every table must have a Primary Key — a unique identifier for each row |
| Relationships | Tables are related to each other using Foreign Keys |
| Query Language | SQL (Structured Query Language) — used across ALL relational databases |
| Scaling | Primarily Vertical Scaling (scale up: get a bigger, more powerful server) |
| Horizontal Scaling | Possible but very difficult and complex to implement |
| Use Case | OLTP — Online Transaction Processing (banking, orders, bookings) |
What is OLTP?¶
OLTP (Online Transaction Processing) means the database handles a very large number of short, real-time transactions — like: - A customer placing an order on Amazon - A student registering for a course - A bank processing a money transfer
Relational databases are perfectly built for this because every transaction must be accurate, complete, and reliable.
Example — A School Relational Database¶
Students Table: | StudentID (PK) | Name | Age | ClassID (FK) | |----------------|-------|-----|--------------| | S001 | Alice | 20 | C01 | | S002 | Bob | 22 | C02 |
Classes Table: | ClassID (PK) | ClassName | |--------------|-------------| | C01 | Mathematics | | C02 | Physics |
- Each row is a Tuple (e.g., the entire row
S001, Alice, 20, C01is one tuple) - Each column is an Attribute (e.g.,
Name,Age) StudentIDis the Primary Key of the Students tableClassIDin the Students table is a Foreign Key referencing the Classes table- The two tables are related — that's why it's called a relational database
Relational Database Examples & AWS Services¶
The following are the major relational database engines. Amazon offers most of them under a single managed cloud service:
| Database Engine | Notes |
|---|---|
| MySQL | Most popular open-source relational DB |
| PostgreSQL | Advanced open-source, highly reliable |
| MariaDB | Fork of MySQL, open-source |
| Oracle DB | Enterprise-grade, used in banking |
| IBM DB2 | Enterprise use, large corporations |
| Microsoft SQL Server | Microsoft ecosystem, corporate apps |
AWS Cloud Services for Relational Databases:
| AWS Service | What It Is |
|---|---|
| Amazon RDS | Managed service that runs MySQL, PostgreSQL, MariaDB, Oracle, IBM DB2, and Microsoft SQL Server — all six engines under one umbrella |
| Amazon Aurora | AWS's own cloud-native relational DB engine, compatible with MySQL and PostgreSQL, but faster and more scalable |
RDS = Relational Database Service. It does not create its own engine — it manages the existing six engines for you in the cloud so you don't have to set up servers manually.
4.2 Non-Relational Databases (NoSQL Databases)¶
NoSQL stands for "Not Only SQL". These databases do not use fixed tables. They store data in flexible, varied formats suited for modern, high-speed applications.
Key Characteristics¶
| Feature | Detail |
|---|---|
| Structure | No fixed, pre-defined structure (no rigid schema) |
| Schema | Schema-less — you can add new fields anytime without altering the whole database |
| Flexibility | Each record/document can have different fields from others |
| Tables | Tables/collections are generally independent of each other (no joins) |
| Scaling | Designed for Horizontal Scaling (add more servers instead of a bigger server) |
| Speed | Faster reads/writes because of independence and simpler structure |
| Hardware Cost | Lower — because you add many cheap servers instead of one expensive server |
| Query Language | No universal language — each NoSQL database has its own query method |
Why No Universal Query Language?¶
In relational databases, SQL works for all of them because all of them share the same table structure. In NoSQL, each type stores data differently (documents, graphs, key-value pairs, columns) — so each one has its own way of querying data:
| NoSQL Database | Its Own Query Method |
|---|---|
| MongoDB | MongoDB Query Language (MQL) using JSON-like syntax |
| Redis | Redis commands (GET, SET, HGET, etc.) |
| Cassandra | CQL — Cassandra Query Language (SQL-like but not SQL) |
| Neo4j | Cypher Query Language |
| DynamoDB | AWS SDK / PartiQL |
4.3 Relational vs. Non-Relational — Side-by-Side¶
| Feature | Relational (SQL) | Non-Relational (NoSQL) |
|---|---|---|
| Data Format | Tables (rows & columns) | Documents, key-value, graphs, columns |
| Schema | Fixed, pre-defined (rigid) | Flexible, schema-less |
| Row name | Tuple | Document / Record / Node (varies) |
| Column name | Attribute / Field | Field / Key (varies) |
| Relationships | Tables are related via Foreign Keys | Collections/tables are independent |
| All fields required | Yes (or explicitly NULL) | No — each entry can have different fields |
| Primary Key | Required in every table | Optional (varies by DB) |
| Query Language | SQL (same for all) | Each DB has its own language |
| Scaling | Vertical (harder to scale out) | Horizontal (easy to scale out) |
| Best for | OLTP, financial systems, ERP | Real-time apps, big data, social media |
| Examples | MySQL, PostgreSQL, Oracle | MongoDB, Redis, Cassandra, Neo4j |
5. NoSQL Database Types — Deep Dive¶
NoSQL is not one single type of database. It is a family of four major types, each designed for a different kind of data.
5.1 Columnar Database (Column-Family Database)¶
How it works: Instead of storing data row by row (like relational databases), columnar databases store data column by column. This means when you search or analyze data, you only read the columns you need — not entire rows. This makes it dramatically faster for analytical queries.
Structure:
Row Key | Name | Age | City
---------|----------|-------|----------
R001 | Alice | 20 | Karachi
R002 | Bob | 22 | Lahore
Age column is stored together physically on disk: [20, 22, ...]. So if you want the average age of all users, it reads just that column — not every row. Why it is fast: - You read only the columns relevant to your query - Excellent compression (similar values stored together) - Designed to handle millions to billions of rows
Real-world use cases: - IoT sensor data (millions of readings per second) - Time-series data (stock prices over time) - Large-scale analytics (clickstream data, user behavior tracking) - Social media feeds with massive write throughput
Examples: Apache Cassandra, Apache HBase
AWS Equivalent: Amazon Keyspaces (Managed Cassandra-compatible service)
5.2 Document Database¶
How it works: Data is stored as documents — typically in JSON or BSON format. Each document is a self-contained unit that can store rich, nested, complex data including arrays and sub-objects. Unlike relational tables, documents in the same collection can have completely different fields.
Structure (JSON document):
{
"userID": "U001",
"name": "Alice",
"email": "alice@example.com",
"address": {
"city": "Rawalpindi",
"country": "Pakistan"
},
"orders": ["ORD001", "ORD002", "ORD005"]
}
Another document in the same collection can have completely different fields:
{
"userID": "U002",
"name": "Bob",
"phone": "0300-1234567"
}
Why it is useful: - Stores long, complex, nested data naturally - No need to define a fixed schema upfront - Adding a new field to one document does not break others - Great for content that evolves over time
Real-world use cases: - User profiles on social media (each user may have different fields) - Product catalogs (a phone has different attributes than a shirt) - Blog posts, articles, CMS systems - E-commerce catalogs with varying product types
Examples: MongoDB, CouchDB, Firebase Firestore
AWS Equivalent: Amazon DocumentDB (MongoDB-compatible managed service)
5.3 Key-Value Database¶
How it works: This is the simplest type of NoSQL database. Data is stored as pairs of key → value, just like a dictionary or a locker system. You give a key, you get the value. That's it.
Structure:
Key | Value
---------------------|----------------------------------
"user:101" | "Alice"
"session:abc123" | "{logged_in: true, role: admin}"
"cart:U001" | "[item1, item2, item3]"
"page:homepage" | "<html>...cached HTML...</html>"
Why it is fast: - Lookup by key is O(1) — almost instant, regardless of data size - No complex queries, no joins, no schema overhead - Often stored entirely in memory (RAM) for maximum speed
Real-world use cases: - Session management (store logged-in user's session data) - Caching (store frequently accessed pages/results so the app doesn't recompute them) - Leaderboards (real-time game scores) - Shopping cart (temporary cart data before checkout) - Feature flags (on/off switches for app features)
Examples: Redis, Memcached
AWS Equivalents: - Amazon ElastiCache (managed Redis / Memcached) - Amazon DynamoDB (also a key-value store, but more powerful — supports document model too)
5.4 Graph Database¶
How it works: A graph database stores data as a network of nodes and edges: - Node = an entity (a person, a city, a product, a webpage) - Edge = a relationship between two nodes (FRIENDS_WITH, LOCATED_IN, BOUGHT, LINKS_TO) - Properties = attributes attached to nodes or edges
Structure:
(Alice) --[FRIENDS_WITH]--> (Bob)
(Bob) --[LIVES_IN]------> (Lahore)
(Alice) --[BOUGHT]---------> (iPhone 15)
(BBC) --[REPORTED_ON]----> (Climate Change)
(Climate Change) --[AFFECTS]--> (Pakistan)
Why it is powerful: - Relationships between data are first-class citizens — stored explicitly, not computed via joins - Traversing relationships ("who are Alice's friends' friends?") is extremely fast - Naturally models complex, interconnected real-world data
Real-world use cases: - Social networks (Facebook's friend recommendations, LinkedIn's "People You May Know") - Fraud detection (finding suspicious patterns of transactions between accounts) - Knowledge graphs (BBC uses a graph database to link news stories, topics, people, and places — e.g., a weather report linking regions, events, and impacts) - Recommendation engines ("Users who bought X also bought Y") - Network and IT infrastructure mapping (how servers connect to each other) - Drug discovery (relationships between molecules, diseases, and proteins)
Examples: Neo4j, Amazon Neptune, ArangoDB
AWS Equivalent: Amazon Neptune (fully managed graph database supporting both Property Graph and RDF)
5.5 NoSQL Types Summary¶
| Type | How Data is Stored | Query Style | Real-World Example | AWS Service |
|---|---|---|---|---|
| Columnar | By column (not row) | CQL / column-scan | IoT sensor data, analytics | Amazon Keyspaces |
| Document | JSON/BSON documents | MQL (MongoDB-style) | User profiles, product catalogs | Amazon DocumentDB |
| Key-Value | Key → Value pairs | GET/SET commands | Sessions, caching, carts | Amazon ElastiCache / DynamoDB |
| Graph | Nodes + Edges (relationships) | Cypher / Gremlin | Social networks, fraud detection | Amazon Neptune |
6. What is SQL?¶
SQL (Structured Query Language) is the standard language used to communicate with all relational databases. It is not a programming language — it is a query language designed specifically for data.
Common SQL Commands (CRUD)¶
| Command | What it does | Example |
|---|---|---|
SELECT | Read/fetch data | SELECT * FROM Students; |
INSERT | Add new data | INSERT INTO Students VALUES (...) |
UPDATE | Modify existing data | UPDATE Students SET Age=21 WHERE ID='S001' |
DELETE | Remove data | DELETE FROM Students WHERE ID='S001' |
CREATE TABLE | Create a new table | CREATE TABLE Students (...) |
DROP TABLE | Delete a table permanently | DROP TABLE Students; |
These four core operations — Create, Read, Update, Delete — are called CRUD and are fundamental to every database application.
7. ACID Properties (Data Reliability)¶
For a database to be reliable and trustworthy (especially for banking, healthcare), it must follow ACID rules:
| Property | Meaning | Simple Example |
|---|---|---|
| **A**tomicity | All steps of a transaction succeed, or none do | Bank transfer: debit AND credit must both happen |
| **C**onsistency | Data always moves from one valid state to another | Balance can't go negative (if rule says so) |
| **I**solation | Two transactions don't interfere with each other | Two people booking the last seat simultaneously |
| **D**urability | Once saved, data stays saved even after crashes | After a power cut, your transfer is still recorded |
8. Database vs. Spreadsheet¶
| Spreadsheet (Excel) | Database (MySQL, etc.) | |
|---|---|---|
| Best for | Small, personal data | Large, multi-user data |
| Multi-user | Limited | Yes, thousands simultaneously |
| Relationships | Manual (copy-paste) | Built-in (foreign keys) |
| Speed | Slow with large data | Fast even with millions of rows |
| Security | File-level only | Row/column level permissions |
| Automation | Macros (limited) | Full programmatic access |
9. Real-World Database Examples¶
| Application | Database Used For |
|---|---|
| Facebook / Instagram | Storing user profiles, posts, likes, comments |
| YouTube | Video metadata, comments, watch history |
| Uber | Drivers, riders, trips, real-time locations |
| Messages, contacts, media metadata | |
| Online Banking | Accounts, transactions, balances |
| Hospital System | Patient records, prescriptions, appointments |
| E-commerce (Amazon) | Products, inventory, orders, customers |
| BBC News | Knowledge graph linking stories, people, places, topics |
10. Summary of Key Concepts¶
- Data = raw, unprocessed facts about anything; may or may not be relevant.
- Information = data that has been given context and is relevant to a specific purpose.
- Database = a systematic, organized collection of related data.
- DBMS = the software (e.g., MySQL, MongoDB) that lets users access and manipulate the database.
- Relational databases use fixed-schema tables, SQL, tuples (rows), attributes (columns), primary keys, and foreign keys; best for OLTP and structured data.
- Non-relational (NoSQL) databases are schema-less, horizontally scalable, and each has its own query language.
- The four NoSQL types are: Columnar, Document, Key-Value, and Graph — each solves a different problem.
- AWS provides managed cloud services for every database type: RDS/Aurora (relational), Keyspaces (columnar), DocumentDB (document), ElastiCache/DynamoDB (key-value), Neptune (graph).
💡 Pro Tip for Beginners: Start by learning MySQL or PostgreSQL (relational) as they teach the core concepts clearly. Once comfortable, explore MongoDB (NoSQL document) to understand the difference. Then look at Redis to appreciate how simple and fast key-value stores are.