AWS Systems Manager (SSM)
1. What is AWS Systems Manager?
AWS Systems Manager is an operations management service — a unified interface for viewing operational data, automating tasks, managing configuration, and maintaining compliance across your EC2 fleet and on-premises infrastructure — without needing SSH or RDP access to individual instances.
Without SSM:
SSH into each server to run patches → manual, slow, audit-less
Push config to 500 servers → script it yourself → error-prone
"Run this command on all prod servers" → SSH loop → risky + no central log
Secrets in .env files on servers → insecure
With SSM:
Patch Manager: define patch baseline → SSM patches all instances on schedule
Run Command: run shell script on 500 instances in parallel → full output log
Session Manager: browser-based terminal → no SSH, no bastion, no open ports
Parameter Store: centralized, encrypted config + secrets → app pulls at runtime
Automation: define multi-step runbook → execute on schedule or on event trigger
Inventory: see all software, services, network config across entire fleet
2. SSM Agent ⭐
Prerequisite: SSM Agent must be installed and running on managed nodes
Pre-installed on: Amazon Linux 2, Amazon Linux 2023, Amazon Linux (2017.09+),
Windows Server 2008+, Ubuntu 16.04+, RHEL 7+, SUSE 12+
Install manually (older/other OS):
curl https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm -o ssm-agent.rpm
sudo yum install -y ssm-agent.rpm
sudo systemctl enable amazon-ssm-agent && sudo systemctl start amazon-ssm-agent
EC2 IAM requirements:
Attach managed policy: AmazonSSMManagedInstanceCore
→ Allows: SSM API, S3 (for commands/logs), CloudWatch Logs
On-premises servers:
Install SSM Agent → register with SSM as "hybrid" managed node
→ Same SSM capabilities as EC2 instances (patching, commands, session manager)
→ Appears in SSM Fleet Manager alongside EC2 instances
Network requirements:
EC2 → needs access to SSM endpoints (public or VPC endpoint):
ssm.region.amazonaws.com
ec2messages.region.amazonaws.com
ssmmessages.region.amazonaws.com
For private subnets: create 3 VPC Interface Endpoints
→ No internet required → fully private fleet management
3. Session Manager ⭐
Browser or CLI-based terminal access to instances WITHOUT:
❌ SSH (port 22 closed — never needs to be open)
❌ Bastion hosts
❌ Key pairs (no PEM files)
❌ VPN
❌ Direct network access
How it works:
SSM Agent on instance maintains outbound connection to SSM service
You: AWS console → Systems Manager → Session Manager → Start Session
→ Browser terminal opens → full shell on instance
Or: aws ssm start-session --target i-0123456789abcdef0
Logging: [docs.aws.amazon](https://docs.aws.amazon.com/systems-manager/latest/userguide/patch-manager-aws-runpatchbaseline.html)
All session activity (keystrokes + output) can be logged to:
S3 (full session output)
CloudWatch Logs (searchable)
Audit: every session: who, which instance, when, what commands → CloudTrail
IAM control:
iam:StartSession → controls who can start sessions
Condition: ssm:resourceTag/Environment = prod
→ Only ops team can start sessions on prod instances
Port forwarding via Session Manager:
aws ssm start-session \
--target i-0123456789abcdef0 \
--document-name AWS-StartPortForwardingSession \
--parameters '{"portNumber":["3306"],"localPortNumber":["13306"]}'
→ Local port 13306 → tunnels to instance's port 3306 (MySQL)
→ No SSH needed, no SG change, fully audited
Best practice: [docs.aws.amazon](https://docs.aws.amazon.com/systems-manager/latest/userguide/patch-manager-aws-runpatchbaseline.html)
Remove all inbound SG rules for SSH (port 22) and RDP (3389)
Use Session Manager exclusively → zero attack surface
Enable CloudTrail + S3 session logging for full audit trail
4. Run Command ⭐
Execute commands on one or many managed instances simultaneously,
without opening any ports or using SSH:
Core concepts:
Command Document: pre-built or custom script/command template
AWS-RunShellScript: run bash/shell commands on Linux
AWS-RunPowerShellScript: run PowerShell on Windows
AWS-RunPatchBaseline: run patching (used by Patch Manager) [docs.aws.amazon](https://docs.aws.amazon.com/systems-manager/latest/userguide/patch-manager-aws-runpatchbaseline.html)
AWS-ConfigureAWSPackage: install/uninstall AWS packages
AWS-UpdateSSMAgent: update SSM Agent on instances
Target selection: [docs.aws.amazon](https://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-param-runcommand.html)
By instance ID(s): --instance-ids i-111, i-222
By tag(s): {"Key":"Environment","Values":["production"]}
By resource group: all instances in an AWS Resource Group
→ Works on EC2 + on-premises managed nodes
Concurrency controls: [docs.aws.amazon](https://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-param-runcommand.html)
Max concurrency: run on 50 instances at a time (or 20%)
Error threshold: stop if 5 failures occur
→ Roll out command safely to large fleets
Example: apply hotfix to all prod servers:
aws ssm send-command \
--document-name "AWS-RunShellScript" \
--parameters '{"commands":["yum update -y openssl"]}' \
--targets '[{"Key":"tag:Environment","Values":["production"]}]' \
--max-concurrency "20%" \
--max-errors "5"
Output:
Stored in S3 or CloudWatch Logs
View per-instance: success/failure + stdout/stderr
CloudTrail: records who ran what command on which instances
5. Patch Manager ⭐
Automate OS + application patching across your fleet:
Key concepts:
Patch Baseline:
Defines WHICH patches to install (by severity, classification, product)
AWS-provided defaults: AmazonLinux2023DefaultPatchBaseline, etc.
Custom baselines: define your own auto-approval rules
Example baseline rule:
Auto-approve: Critical severity, after 0-day delay
Auto-approve: Important severity, after 7-day delay
Reject: specific CVEs you've tested and excluded
Patch Group:
Tag instances with "Patch Group" key = "value"
Associate a patch baseline to a patch group
Example:
Tag: Patch Group = ProdServers → use strict baseline (only critical + 7-day delay)
Tag: Patch Group = DevServers → use lenient baseline (all severity, 0-day delay)
Maintenance Window:
Schedule when patching runs: "every Sunday 2 AM–4 AM"
Tasks registered: Run Command with AWS-RunPatchBaseline document [docs.aws.amazon](https://docs.aws.amazon.com/systems-manager/latest/userguide/patch-manager-aws-runpatchbaseline.html)
Rate control: 20% of fleet at a time → rolling patch deployment
Patch compliance reporting:
After patching: SSM reports each instance as "Compliant" or "Non-Compliant"
View in: SSM Compliance dashboard OR AWS Config OR Security Hub
Scan mode vs Install mode: [docs.aws.amazon](https://docs.aws.amazon.com/systems-manager/latest/userguide/patch-manager-aws-runpatchbaseline.html)
Scan: check what patches are missing → report only (no install)
Install: apply patches per baseline → reboot if required
AWS-RunPatchBaseline document supports both via "Operation" parameter
6. Parameter Store ⭐
Centralized storage for configuration data and secrets:
Parameter types:
String: plain text value
StringList: comma-separated list of values
SecureString: encrypted with KMS (symmetric CMK or aws/ssm)
Tiers:
Standard:
Free (up to 10,000 parameters per account/region)
Max value size: 4 KB
No parameter policies (TTL/expiry)
Advanced:
$0.05/parameter/month (above 10,000 standard are free)
Max value size: 8 KB
Parameter policies: TTL (auto-expire old secrets), notifications
Hierarchy:
/myapp/prod/database/password
/myapp/prod/database/username
/myapp/prod/api/stripe_key
/myapp/dev/database/password
GetParametersByPath: /myapp/prod/ → returns all parameters under prod
→ Application gets all config in ONE API call
Integration with Run Command: [docs.aws.amazon](https://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-param-runcommand.html)
Parameter value referenced in command: {{ssm:/myapp/prod/db-password}}
→ SSM resolves parameter → injects into command → never visible in logs
IAM control:
ssm:GetParameter by path or name
ssm:PutParameter for specific paths
KMS: kms:Decrypt required for SecureString parameters
Parameter Store vs Secrets Manager:
Parameter Store: free for standard, config + secrets, no built-in rotation
Secrets Manager: $0.40/secret/month, built-in rotation, cross-account easier
Rule: static config → Parameter Store; DB passwords needing rotation → Secrets Manager
7. SSM Automation ⭐
Execute multi-step operational runbooks:
Pre-built: AWS-StartEC2Instance, AWS-StopEC2Instance, AWS-CreateImage,
AWS-PatchInstanceWithRollback, AWS-DisablePublicAccessForSecurityGroup
Custom: write your own YAML runbook with sequential or parallel steps
Automation document steps:
aws:executeAwsApi → call any AWS API
aws:runCommand → run Run Command on instances
aws:invokeLambdaFunction → trigger Lambda
aws:changeInstanceState → start/stop/terminate instances
aws:createSnapshot → create EBS snapshot
aws:waitForAwsResourceProperty → wait until condition met (e.g., instance running)
aws:branch → conditional branching
Example: pre-patch AMI creation runbook:
Step 1: aws:createImage → snapshot current instance
Step 2: aws:runCommand → run AWS-RunPatchBaseline
Step 3: aws:waitForAwsResourceProperty → wait for patches complete
Step 4: aws:createImage → create patched AMI
Step 5: aws:changeInstanceState → start instance
Execution targets:
Single resource: automate one EC2 instance
Rate-controlled: apply to fleet (10% at a time)
Triggered by: AWS Config remediation, EventBridge rule, Lambda, manual
Integrations:
AWS Config → non-compliant resource → trigger SSM Automation [aws.amazon](https://aws.amazon.com/blogs/mt/manage-custom-aws-config-rules-with-remediation-using-conformance-packs/)
EventBridge → scheduled trigger → Automation runs weekly
CloudWatch Alarm → threshold exceeded → Automation scales resources
8. Inventory
Collect and view metadata from managed instances:
Data collected automatically:
Installed applications (name, version, publisher)
Running services
Network configuration (IPs, MAC, DNS)
Windows updates
AWS components (SSM Agent version, etc.)
Custom inventory (you define JSON schema + data)
Files and registries (specify paths to track)
Query with: [docs.aws.amazon](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_introduction.html)
SSM Inventory dashboard → filter by any attribute
AWS Config Advanced Query → SQL across entire fleet
Example: "Show all instances running Apache version < 2.4.50"
Aggregation with Resource Data Sync:
All inventory data → single S3 bucket → query with Athena
→ Build custom dashboards, compliance reports, asset management
9. Maintenance Windows
Define when SSM tasks run (patching, backups, scripts):
Schedule: cron or rate expression
cron(0 2 ? * SUN *) → every Sunday at 2 AM
rate(7 days)
Duration: max hours window can run
Stop initiating: stop starting new tasks N hours before window closes
Allow unregistered targets: run on instances not matching current targets
Register tasks:
Run Command task: run AWS-RunPatchBaseline on patch group
Automation task: run custom runbook
Lambda task: invoke Lambda function
Step Functions: start state machine execution
Rate control: [docs.aws.amazon](https://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-param-runcommand.html)
10 instances at a time → rolling deployment
Error threshold: stop if 3 failures → prevent cascading issues
10. SSM Features Summary
| Capability | Purpose | Key Use Case |
| Session Manager | Browser terminal, no SSH | Secure access, no bastion hosts |
| Run Command | Run scripts fleet-wide | Hotfixes, config changes, ad-hoc tasks |
| Patch Manager | OS patch automation | Weekly patching with compliance reporting |
| Parameter Store | Config + secrets storage | App config, SecureString for secrets |
| Automation | Multi-step runbooks | Pre-approved operational procedures |
| Inventory | Fleet metadata collection | Asset management, compliance queries |
| Maintenance Windows | Scheduled task execution | Controlled patching windows |
| Compliance | Patch + config compliance | Dashboard of compliant vs non-compliant nodes |
| Fleet Manager | Single-pane EC2 + on-prem view | View all managed nodes in one console |
| Distributor | Package deployment | Deploy/update software packages at scale |
11. Common Mistakes
| ❌ Wrong | ✅ Correct |
| Session Manager requires port 22 open | Session Manager requires zero open inbound ports — uses outbound HTTPS only |
| SSM works without IAM role on EC2 | EC2 needs AmazonSSMManagedInstanceCore policy attached to instance role |
| Patch Manager patches immediately on baseline change | Patches apply only during Maintenance Windows or explicit Run Command |
| Parameter Store standard tier has TTL/expiry | TTL/expiry parameter policies only available in Advanced tier |
| SSM Automation requires Lambda for all custom steps | Automation documents can call AWS APIs directly (aws:executeAwsApi) without Lambda |
| Run Command has no rollback mechanism | SSM Automation supports rollback steps — Run Command alone does not |
| Session logs are automatic | Session logging to S3/CloudWatch must be explicitly configured in Session Manager preferences |
| Parameter Store SecureString uses default encryption | SecureString uses aws/ssm KMS key by default — specify your CMK for extra control |
12. Interview Questions Checklist