AWS Certified Solutions Architect — Associate (SAA-C03)
A documentation-first study guide. AWS writes the exam from its own documentation, so reading the docs is the highest-leverage thing you can do. This guide is a curated index into the canonical references, FAQs, and a selection of whitepapers — organised around the four exam domains, not around services.
Maps to the published AWS Certified Solutions Architect — Associate (SAA-C03) exam guide. Domain weights and task statements are quoted from that PDF.
About the exam
Current exam code: SAA-C03 (released August 2022). No C04 announcement as of April 2026.
Format: 65 questions (50 scored + 15 unscored) · 130 minutes · $150 USD · scaled score 100–1000, pass at 720.
The four domains:
- Domain 1 — Design Secure Architectures — 30%
- Domain 2 — Design Resilient Architectures — 26%
- Domain 3 — Design High-Performing Architectures — 24%
- Domain 4 — Design Cost-Optimized Architectures — 20%
Primary official sources (bookmark these):
- Official SAA-C03 certification page
- SAA-C03 Exam Guide (PDF)
- Official sample questions (PDF)
- Exam Readiness: Solutions Architect – Associate (free on Skill Builder)
Whitepapers worth reviewing:
These can be long (the Well-Architected Framework in particular), so don’t go too far down the rabbit hole with them if you want to make quick progress with exam study.
- AWS Well-Architected Framework — the lens through which every SAA question is framed. Four of its six pillars (Security, Reliability, Performance Efficiency, Cost Optimization) map almost one-to-one onto the four exam domains.
- Overview of Amazon Web Services — a structured tour of the service catalogue.
- AWS Fault Isolation Boundaries — the cleanest articulation of how AZs, regions, and partitions actually behave under failure.
- Disaster Recovery of Workloads on AWS — the source for the four DR pattern names the exam uses.
- AWS Security Incident Response Guide — for Domain 1.
Priority tiers: The published domain weights (30/26/24/20) tell you how the exam is balanced across the four domains, but they don’t tell you that within each domain a handful of services account for most of the questions. Every section in this guide carries a tier badge based on triangulating the AWS exam guide, the experience reports of recent test takers, and the patterns that appear in the practice-exam community:
- ★★★ Core Heavily tested. Multiple questions will lean on this. Spend hours, not minutes — if you don’t know it well, you fail.
- ★★ Important Reliably tested, usually one or two questions. Read every linked page in the section, do the FAQ, understand the comparison points. A few hours per topic.
- ★ Light Known to appear, but typically as one distinguishing question or as wrong-answer distractors. Skim the docs, learn the one-line distinction, move on. Twenty minutes to an hour.
For an 8–12 week prep cycle the rough split that the data supports is about 60% of your time on Core topics, 30% on Important, and 10% on Light. The biggest single concentration of questions across the whole exam is the cluster around VPC + Security Groups + S3 + EC2 + IAM + ELB + RDS + DynamoDB + Lambda + CloudFront — know those ten cold and you have the foundation of a pass.
How to use this guide:
- Each section opens with a one-paragraph summary explaining what to focus on, then has up to three link sections: Core docs (user/developer guides — the canonical reference), FAQ (exam writers love edge cases from FAQs — do not skip), and Deeper reading (whitepapers, blog posts, re:Post articles).
- If a link 404s, AWS has reorganised the docs. Search the page title to find the new location — the content almost always still exists.
- Read every FAQ for every Important and Core service. They are short, dense, and disproportionately tested.
- The What’s New feed is worth a weekly scan in the last month before your exam, but remember: the SAA-C03 exam lags new launches by ~12 months. Don’t memorise yesterday’s announcement.
Part I — Domain 1: Design Secure Architectures (30%)
The largest domain by weight. The exam treats security as a design constraint that shapes every other choice — least privilege, encryption everywhere, defence in depth, and audit trails are the recurring themes.
Chapter 1 — Identity and access management
Maps to Task Statement 1.1 — Design secure access to AWS resources
Knowledge of:
- Access controls and management across multiple accounts
- AWS federated access and identity services (for example, AWS IAM Identity Center, AWS IAM)
- AWS global infrastructure (for example, Availability Zones, AWS Regions)
- AWS security best practices (for example, the principle of least privilege)
- The AWS shared responsibility model
Skills in:
- Applying AWS security best practices to IAM users and root users
- Designing a flexible authorization model that includes IAM users, groups, roles, and policies
- Designing a role-based access control strategy
- Designing a security strategy for multiple AWS accounts
- Determining the appropriate use of resource policies for AWS services
- Determining when to federate a directory service with IAM roles
1.1 IAM core ★★★ Core
The bedrock of every security question. Know identity-based vs resource-based policies, the explicit deny → explicit allow → implicit deny evaluation order, users vs groups vs roles, and MFA basics. Expect 3–5 questions to lean on this.
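The evaluation order can be sketched in a few lines of Python. This is a deliberately simplified model, not the real IAM engine: it ignores conditions, resources, SCPs, permissions boundaries, and session policies, and the `evaluate` function and the statement shape are illustrative assumptions.

```python
from fnmatch import fnmatchcase


def evaluate(statements, action):
    """Simplified IAM decision: explicit deny, then explicit allow, then implicit deny."""
    matching = [
        s for s in statements
        if any(fnmatchcase(action, pattern) for pattern in s["Action"])
    ]
    # 1. Any matching explicit Deny wins, no matter how many Allows match.
    if any(s["Effect"] == "Deny" for s in matching):
        return "explicit deny"
    # 2. Otherwise a matching explicit Allow grants access.
    if any(s["Effect"] == "Allow" for s in matching):
        return "allow"
    # 3. Nothing matched, so the request falls through to the implicit deny.
    return "implicit deny"


# Hypothetical statements pooled from every policy attached to one principal.
statements = [
    {"Effect": "Allow", "Action": ["s3:*"]},
    {"Effect": "Deny", "Action": ["s3:DeleteObject"]},
]

for action in ("s3:GetObject", "s3:DeleteObject", "ec2:RunInstances"):
    print(action, "->", evaluate(statements, action))
```

The point to take into the exam is that the broad `s3:*` Allow cannot rescue `s3:DeleteObject` once any Deny matches it.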
Core docs
- What is IAM?
- IAM identities — users, groups, and roles
- Policies and permissions
- Identity-based vs resource-based policies
- Policy evaluation logic — explicit deny, explicit allow, implicit deny order
- IAM security best practices
- Multi-factor authentication
FAQ
1.2 IAM Identity Center and federation ★★ Important
Tested as the distinction between workforce identity from an external IdP → AWS (Identity Center / SAML / OIDC) versus application end-user identity (Cognito user pools and identity pools). The exam likes the boundary; deep configuration is rare.
Core docs
- What is IAM Identity Center? (formerly AWS SSO)
- Getting started with Identity Center
- Identity providers and federation
- SAML 2.0 federation
- OIDC federation
- Amazon Cognito user pools and identity pools — for application end-user identity, not workforce
FAQ
1.3 STS and role assumption ★★★ Core
Cross-account access patterns are an exam favourite. Know AssumeRole, the trust-policy + permissions-policy split, and especially the External ID confused-deputy mitigation — it appears regularly.
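As a rough illustration of the trust-policy half of that split, the sketch below builds a trust policy that requires an External ID and runs a toy check against it. The account ID, the External ID value, and both helper functions are placeholders invented for this example; the permissions policy (what the role can do once assumed) is a separate document and is not shown.

```python
import json


def build_trust_policy(trusted_account_id, external_id):
    """Trust policy: who may assume the role, and under which condition."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def trust_allows(policy, caller_account_id, supplied_external_id):
    """Toy check of the trust policy only; real STS evaluation does much more."""
    caller = f"arn:aws:iam::{caller_account_id}:root"
    for statement in policy["Statement"]:
        expected = statement["Condition"]["StringEquals"]["sts:ExternalId"]
        if (
            statement["Effect"] == "Allow"
            and statement["Principal"]["AWS"] == caller
            and supplied_external_id == expected
        ):
            return True
    return False


# Placeholder account ID and External ID, for illustration only.
policy = build_trust_policy("111122223333", "example-external-id")
print(json.dumps(policy, indent=2))
print(trust_allows(policy, "111122223333", "example-external-id"))
print(trust_allows(policy, "111122223333", "wrong-id"))
```

The confused-deputy mitigation is the `Condition` block: a third party that serves many customers must present the External ID agreed with this customer, so it cannot be tricked into assuming this role on behalf of someone else.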
Core docs
- Temporary security credentials
- AWS STS API reference — AssumeRole, AssumeRoleWithWebIdentity, GetSessionToken
- Roles terms and concepts
- External ID for cross-account access — confused-deputy mitigation; tested often
- Switching to an IAM role (CLI)
1.4 Service Control Policies (SCPs) and Organizations ★★ Important
Know what an SCP can and can’t do (it caps permissions, doesn’t grant them; doesn’t apply to the management account). OUs and inheritance show up in scenarios about restricting actions across many accounts.
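One way to picture "caps, doesn't grant" is as a set intersection. The sketch below is a mental model only: real SCPs are policy documents with Allow and Deny statements, and the `effective_actions` helper and the action sets are made up for illustration.

```python
def effective_actions(identity_allows, scp_allows, is_management_account=False):
    """Actions a principal can actually use once the SCP cap is applied."""
    if is_management_account:
        # SCPs do not apply to the management account.
        return set(identity_allows)
    # An SCP never grants anything; it only filters what identity policies allow.
    return set(identity_allows) & set(scp_allows)


identity_allows = {"s3:GetObject", "ec2:RunInstances"}
scp_allows = {"s3:GetObject", "s3:PutObject"}

print(sorted(effective_actions(identity_allows, scp_allows)))
print(sorted(effective_actions(identity_allows, scp_allows, is_management_account=True)))
```

Note that `s3:PutObject` never appears in the result: the SCP permits it, but no identity policy grants it, and an SCP on its own grants nothing.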
Core docs
- What is AWS Organizations?
- Service control policies (SCPs)
- SCP evaluation — what an SCP can and cannot do
- Organizational units (OUs)
- AWS Control Tower — landing-zone automation atop Organizations
FAQ
1.5 Permissions boundaries and access analysis ★ Light
Rare at SAA level. Know that permissions boundaries cap the maximum permissions an IAM entity can be given (used to delegate IAM creation safely), and that Access Analyzer can generate least-privilege policies from CloudTrail. 20 minutes is enough.
Core docs
- Permissions boundaries for IAM entities
- IAM Access Analyzer — identifies external access and unused permissions
- Access Analyzer policy generation — produces least-privilege policies from CloudTrail data
- Troubleshooting “Access Denied” — useful mental model for the policy stack
Deeper reading
- Organizing Your AWS Environment Using Multiple Accounts — the canonical multi-account whitepaper
- IAM policy types — how and when to use them
Chapter 2 — Securing data at rest and in transit
Maps to Task Statements 1.2 and 1.3 — Design secure workloads and applications; Determine appropriate data security controls
Knowledge of:
- Application configuration and credentials security
- AWS service endpoints
- Control ports, protocols, and network traffic on AWS
- Secure application access
- Security services with appropriate use cases
- Threat vectors external to AWS
Skills in:
- Designing VPC architectures with security components
- Determining network segmentation strategies
- Integrating AWS services to secure applications
- Securing external network connections to and from the AWS Cloud
- Designing encryption strategies for data at rest and data in transit
- Determining the appropriate use of data access controls
2.1 AWS KMS ★★★ Core
Encryption is a recurring theme across the exam. Know symmetric vs asymmetric, key policies (the only resource policy you can’t bypass with an identity policy alone), grants, multi-region keys, and the services that integrate. CloudHSM is the answer when “FIPS 140-2 Level 3” or “single-tenant” appears.
Core docs
- What is AWS KMS?
- KMS concepts — keys, aliases, grants, key policies
- Symmetric vs asymmetric KMS keys
- Key policies — the only resource policy you can’t replace with an identity policy alone
- Grants — short-lived programmatic permission to use a key
- Multi-Region keys
- Importing key material (BYOK)
- AWS services that integrate with KMS
- AWS CloudHSM — when FIPS 140-2 Level 3 or single-tenant HSM is required
FAQ
2.2 Secrets Manager and Parameter Store ★★ Important
The choice between Secrets Manager and Parameter Store is recurring. Secrets Manager rotates automatically (built-in for the RDS family, Lambda for everything else); Parameter Store is cheaper, doesn’t rotate, and stores config plus references to secrets. Larger than 4 KB → advanced parameters.
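The decision above can be written down as a small helper. This is a study aid under stated assumptions rather than official guidance: the `choose_store` function is hypothetical, and it only encodes the rotation rule and the 4 KB / 8 KB size limits mentioned in this section.

```python
STANDARD_LIMIT_BYTES = 4 * 1024   # standard parameter size limit
ADVANCED_LIMIT_BYTES = 8 * 1024   # advanced parameter size limit


def choose_store(needs_rotation, size_bytes):
    """Pick a store from the two signals this section highlights."""
    if needs_rotation:
        # Automatic rotation is the Secrets Manager differentiator.
        return "Secrets Manager"
    if size_bytes <= STANDARD_LIMIT_BYTES:
        return "Parameter Store (standard)"
    if size_bytes <= ADVANCED_LIMIT_BYTES:
        return "Parameter Store (advanced)"
    return "too large for Parameter Store"


print(choose_store(needs_rotation=True, size_bytes=200))
print(choose_store(needs_rotation=False, size_bytes=1024))
print(choose_store(needs_rotation=False, size_bytes=6 * 1024))
```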
Core docs
- What is Secrets Manager?
- Rotating secrets — built-in for RDS, Aurora, Redshift, DocumentDB; Lambda for everything else
- Systems Manager Parameter Store
- Advanced parameters (8 KB, parameter policies) vs standard parameters
- Secrets Manager vs Parameter Store — when to use which
FAQ
2.3 ACM and TLS ★★ Important
The exam loves the rule that CloudFront distributions need certs in us-east-1 — memorise it. Also know DNS validation for automation and AWS Private CA for internal certificates.
Core docs
- AWS Certificate Manager overview
- Services that integrate with ACM — ALB, NLB, CloudFront, API Gateway, App Runner
- DNS validation (vs email validation; use DNS for automation)
- AWS Private CA — for internal certificates
- Region requirement: CloudFront certs must be in us-east-1 — exam favourite
2.4 Encryption across services ★★★ Core
Know the S3 encryption types (SSE-S3, SSE-KMS, SSE-C), default bucket encryption (now on automatically for all buckets), Bucket Keys (which cut KMS request cost), and the fact that EBS / EFS / RDS / DynamoDB all encrypt at rest with KMS. Pattern: “highly sensitive data” usually wants SSE-KMS with a customer-managed key.
Core docs
- S3 encryption — SSE-S3, SSE-KMS, DSSE-KMS, SSE-C
- S3 default bucket encryption (now on by default for all buckets)
- S3 Bucket Keys — reduces KMS request cost dramatically for SSE-KMS
- EBS encryption
- EFS encryption at rest and in transit
- RDS encryption at rest
- DynamoDB encryption at rest
- Redshift encryption
Deeper reading
2.5 Macie ★ Light
Almost always tested as a single “which service detects PII or credit-card numbers in S3?” question — the answer is Macie. Read the one-paragraph “What is Macie” page and the FAQ, learn the Macie / Inspector / GuardDuty distinction, and move on. ~30 minutes.
Core docs
- What is Amazon Macie?
- Sensitive data discovery jobs
- Managed data identifiers (PII, PHI, financial)
FAQ
Chapter 3 — Network security
Maps to Task Statement 1.2 — Design secure workloads and applications
Knowledge of:
- Application configuration and credentials security
- AWS service endpoints
- Control ports, protocols, and network traffic on AWS
- Secure application access
- Security services with appropriate use cases
- Threat vectors external to AWS
Skills in:
- Designing VPC architectures with security components (for example, security groups, route tables, network ACLs, NAT gateways)
- Determining network segmentation strategies (for example, using public subnets and private subnets)
- Integrating AWS services to secure applications (for example, AWS Shield, AWS WAF, IAM Identity Center, AWS Secrets Manager)
- Securing external network connections to and from the AWS Cloud (for example, VPN, AWS Direct Connect)
3.1 Security Groups and NACLs ★★★ Core
Stateful (SGs) vs stateless (NACLs) is fundamental. Know that SGs reference other SGs as a source (the canonical multi-tier pattern), and the NACL ephemeral-port trap that the exam loves to test.
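The ephemeral-port trap is easier to remember once you have seen it fail. The sketch below is a toy model with hypothetical helper names: rules are reduced to lists of allowed port ranges, whereas real NACLs use numbered allow/deny rules and real security groups also match on protocol and source.

```python
def allows(port_ranges, port):
    """True if the port falls inside any allowed (low, high) range."""
    return any(low <= port <= high for low, high in port_ranges)


def round_trip_through_nacl(inbound, outbound, server_port, client_port):
    """Stateless: the request and the response are checked independently."""
    request_ok = allows(inbound, server_port)     # request targets the server port
    response_ok = allows(outbound, client_port)   # response targets the client's ephemeral port
    return request_ok and response_ok


def round_trip_through_sg(inbound, server_port):
    """Stateful: if the request is allowed in, the response is allowed out automatically."""
    return allows(inbound, server_port)


https_only = [(443, 443)]
ephemeral = [(1024, 65535)]
client_port = 50000  # arbitrary ephemeral port picked by the client

print(round_trip_through_sg(https_only, 443))
print(round_trip_through_nacl(https_only, https_only, 443, client_port))
print(round_trip_through_nacl(https_only, ephemeral, 443, client_port))
```

The middle case is the trap: inbound 443 is open, but the response leaves towards port 50000, which the outbound NACL rule does not cover.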
Core docs
- Security groups — stateful, default-deny inbound, default-allow outbound
- Network ACLs — stateless, evaluated by rule number, applied at subnet level
- Security group rules — referencing other SGs as a source/destination is the core pattern
- NACL recommended rules — the ephemeral-port trap
3.2 VPC endpoints and endpoint policies ★★★ Core
Gateway endpoints (S3 and DynamoDB) are free and route-table based; interface endpoints (PrivateLink) cost per hour + per GB. “Without traversing the internet / NAT Gateway” phrasing always points at endpoints. Endpoint policies further restrict what’s reachable through them.
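A common companion to an endpoint is a bucket policy that denies any request not arriving through it. The sketch below assembles such a policy as a Python dictionary; the bucket name, the endpoint ID, and the `vpce_only_bucket_policy` helper are placeholders, while `aws:SourceVpce` is the condition key AWS documents for this purpose.

```python
import json


def vpce_only_bucket_policy(bucket_name, vpc_endpoint_id):
    """Deny every S3 action unless the request came through the given VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAccessOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                "Condition": {
                    "StringNotEquals": {"aws:SourceVpce": vpc_endpoint_id}
                },
            }
        ],
    }


# Placeholder bucket name and endpoint ID, for illustration only.
policy = vpce_only_bucket_policy("example-bucket", "vpce-1a2b3c4d")
print(json.dumps(policy, indent=2))
```

Be aware that a blanket Deny like this also blocks console and other access from outside the endpoint, so treat it as a pattern to recognise rather than something to paste in unreviewed.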
Core docs
- VPC endpoints overview
- Gateway endpoints (S3, DynamoDB) — free, route-table based
- Interface endpoints (PrivateLink) — ENI-based, hourly + per-GB cost
- VPC endpoint policies
- S3 bucket policies that require a specific VPC endpoint
3.3 AWS WAF ★★ Important
Know the resources WAF can attach to: CloudFront, ALB, API Gateway, AppSync, Cognito, App Runner, Verified Access — not NLB. Rate-based rules for DDoS-style abuse; Managed Rules for OWASP Top 10.
Core docs
- AWS WAF developer guide
- Rule statements — IP, geo, regex, size, SQLi, XSS, rate-based
- AWS Managed Rules — Core rule set, known bad inputs, IP reputation
- Supported resources — CloudFront, ALB, API Gateway, AppSync, Cognito, App Runner, Verified Access
FAQ
3.4 AWS Shield ★ Light
Standard is free and on by default; Advanced is paid and adds access to the Shield Response Team (SRT, formerly the DDoS Response Team), cost protection, and tighter integration with WAF. Usually one question max — recognise the distinction and move on.
Core docs
FAQ
3.5 Network Firewall and Firewall Manager ★ Light
More SCS / ANS territory than SAA. At SAA level know that Network Firewall exists for stateful packet inspection in VPCs and that Firewall Manager centralises WAF / Shield / SG / Network-Firewall policies across an Org. Skim and move on.
Core docs
- What is AWS Network Firewall?
- Deployment architectures — distributed vs centralised inspection VPC
- AWS Firewall Manager — central WAF / Shield Advanced / Network Firewall / SG policy management across an Org
Chapter 4 — Compute access and threat detection
Maps to Task Statement 1.2 — Design secure workloads and applications
Knowledge of:
- Application configuration and credentials security
- AWS service endpoints
- Control ports, protocols, and network traffic on AWS
- Secure application access
- Security services with appropriate use cases (for example, Amazon Cognito, Amazon GuardDuty, Amazon Macie)
- Threat vectors external to AWS (for example, DDoS, SQL injection)
Skills in:
- Designing VPC architectures with security components
- Determining network segmentation strategies
- Integrating AWS services to secure applications
- Securing external network connections to and from the AWS Cloud
4.1 EC2 access without SSH ★★ Important
Session Manager is the modern answer to “connect to EC2 without inbound 22”: no bastion, no key pairs, and an audit trail (session starts are recorded in CloudTrail; session activity can be logged to S3 or CloudWatch Logs). Pattern: “administer EC2 without exposing SSH” → SSM Session Manager.
Core docs
- AWS Systems Manager Session Manager — no inbound SSH, no bastion, fully audited
- EC2 Instance Connect — short-lived SSH keys via IAM
- SSM Fleet Manager
- Default Host Management Configuration — instance profile-free SSM onboarding
4.2 IAM roles for compute ★★★ Core
Critical pattern across the whole exam. EC2 instance profiles, ECS task roles, Lambda execution roles, EKS IRSA / Pod Identity. Never embed credentials in code; always use a role. IMDSv2 protects against SSRF — require it on every instance.
Core docs
- IAM roles for EC2 (instance profile)
- IAM roles for ECS tasks
- IAM roles for service accounts (EKS, IRSA)
- EKS Pod Identity — newer, simpler alternative to IRSA
- Lambda execution roles
- IMDSv2 — protects against SSRF; require it on every instance
4.3 Inspector ★ Light
Tested as one option in “which service finds OS / package / network vulnerabilities on EC2 / ECR images / Lambda?” — the answer is Inspector. Skim the “What is” page and the FAQ.
Core docs
- What is Amazon Inspector?
- Resource scanning — EC2, ECR images, Lambda
- Findings — package and network reachability
FAQ
4.4 GuardDuty ★★ Important
Threat detection from VPC Flow Logs, DNS logs, CloudTrail, EKS audit logs, S3, RDS login events. Pattern: “detect compromised EC2 / unusual API calls / port-scan activity” → GuardDuty. Worth knowing the data sources it consumes.
Core docs
- What is GuardDuty?
- Data sources — VPC Flow Logs, DNS logs, CloudTrail, S3, EKS audit, RDS login events, Lambda, EBS malware scanning
- Finding types
- GuardDuty in an Organization
FAQ
Chapter 5 — Auditing, compliance, and visibility
Maps to Task Statements 1.1 and 1.3 — Design secure access; Determine appropriate data security controls
Knowledge of:
- Data access and governance
- Data recovery and data backups
- Data retention and data archiving
- Data encryption options (for example, client-side, server-side)
Skills in:
- Aligning AWS technologies to meet compliance requirements
- Designing encryption strategies for data at rest and data in transit
- Designing secure access to AWS resources using encryption keys
- Designing the implementation of data-retention policies
- Determining the appropriate use of data access controls
5.1 CloudTrail ★★ Important
Management vs data vs Insights events is the canonical exam distinction. Organization trails span every account; log-file integrity validation matters for compliance scenarios. Data events (S3 object-level, Lambda invocations) are off by default and billable.
Core docs
- What is CloudTrail?
- Management events vs data events vs Insights events
- Organization trails — single trail spanning every account
- Log file integrity validation
- CloudTrail Lake — managed event store with SQL query
5.2 AWS Config ★★ Important
“Continuously evaluates whether resources match a desired configuration.” The answer for “audit configuration drift”, “detect non-compliant resources”, or “auto-remediate via SSM Automation”. Conformance packs map to compliance frameworks (PCI, HIPAA, NIST).
Core docs
- What is AWS Config?
- AWS Config managed rules
- Conformance packs — prepackaged compliance frameworks (PCI, HIPAA, NIST)
- Auto-remediation via SSM Automation
- Aggregators — multi-account, multi-region rollup
5.3 Security Hub ★ Light
Aggregates findings from GuardDuty, Inspector, Macie, and partner tools into a single dashboard against standards (CIS, PCI, NIST). Tested as “which service gives a single pane of glass for security findings?” Skim the docs.
Core docs
- What is Security Hub?
- Security standards — AWS Foundational Security Best Practices, CIS, PCI DSS, NIST
- AWS Security Finding Format (ASFF) — common schema for findings from GuardDuty, Inspector, Macie, partners
FAQ
5.4 Audit Manager and Detective ★ Light
Audit Manager: collects evidence and maps it to controls for audit reports. Detective: graph-based investigation across CloudTrail, VPC Flow Logs, and GuardDuty findings. Both rare — recognise the one-line description and move on.
Core docs
- AWS Audit Manager — collects evidence, maps to controls, produces audit reports
- Amazon Detective — graph-based investigation across CloudTrail, VPC Flow Logs, GuardDuty findings
Deeper reading
Part II — Domain 2: Design Resilient Architectures (26%)
Resilience on AWS is the discipline of designing for the failure modes the platform actually has — single instances die, AZs partition, regions occasionally have bad days, and dependencies always fail. The exam tests whether you can pick the right pattern for a stated RTO/RPO and budget.
Chapter 6 — Availability Zone, Region, and DR foundations
Maps to Task Statements 2.1 and 2.2 — Design scalable and loosely coupled architectures; Design highly available and/or fault-tolerant architectures
Knowledge of:
- API creation and management (for example, Amazon API Gateway, REST APIs)
- AWS managed services with appropriate use cases (for example, AWS Transfer Family, Amazon SQS, Secrets Manager)
- Caching strategies
- Design principles for microservices
- Event-driven architectures
- Horizontal and vertical scaling
- How to appropriately use edge accelerators (for example, CDN)
- How to migrate applications into containers
- Load balancing concepts (for example, Application Load Balancer)
- Multi-tier architectures
- Queuing and messaging concepts (for example, publish/subscribe)
- Serverless technologies and patterns (for example, AWS Fargate, Lambda)
- Storage types with associated characteristics (for example, object, file, block)
- Container orchestration (for example, Amazon ECS, Amazon EKS)
- When to use read replicas
- Workflow orchestration (for example, AWS Step Functions)
- AWS global infrastructure (for example, Availability Zones, Regions, Amazon Route 53)
- Disaster recovery strategies
- High-availability design patterns
Skills in:
- Designing event-driven, microservices, and/or multi-tier architectures based on requirements
- Determining scaling strategies for components used in an architecture design
- Determining the AWS services required to achieve loose coupling based on requirements
- Determining when to use containers
- Determining when to use serverless technologies and patterns
- Recommending appropriate compute, storage, networking, and database technologies based on requirements
- Using purpose-built AWS services for workloads
- Determining automation strategies to ensure infrastructure integrity
- Determining the AWS services required to achieve high availability based on business requirements
- Determining the data retention strategy to ensure backups are present when needed
- Designing a disaster recovery strategy based on business requirements
- Designing a multi-Region architecture based on business requirements
- Designing a scalable and highly available architecture based on business requirements
6.1 AZs, Regions, and edge locations ★★ Important
Foundational. AZs are the primary fault-isolation boundary; two AZs is the minimum for any “highly available” answer. Outposts / Local Zones / Wavelength appear as “low latency to on-prem / metro / 5G” scenarios.
Core docs
- Regions, Availability Zones, and Local Zones
- Global infrastructure overview
- AWS Wavelength — 5G edge
- AWS Outposts — AWS hardware on-prem
- Local Zones — sub-region edge presence in major metros
Deeper reading
- AWS Fault Isolation Boundaries — why AZs are the primary unit of fault isolation, and what regions and partitions add on top
- Builder’s Library — Static stability using Availability Zones
6.2 Disaster recovery strategies ★★★ Core
The four-pattern taxonomy — backup & restore, pilot light, warm standby, multi-site active/active — is required vocabulary. Multiple questions will use these names directly. Know the RTO / RPO trade-off and the cost ordering.
Core docs
- DR options in the cloud — the canonical four-pattern taxonomy:
- Backup & restore — high RTO/RPO (hours), lowest cost.
- Pilot light — minimal infra warm in DR region, scaled out on failover.
- Warm standby — scaled-down full stack in DR region, scaled up on failover.
- Multi-site active/active — full capacity in both regions, near-zero RTO/RPO.
- Disaster Recovery of Workloads on AWS (whitepaper)
- Well-Architected Reliability Pillar
- Route 53 health checks and DNS failover — the usual failover trigger
- Route 53 Application Recovery Controller — readiness checks and routing controls
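Exam scenarios usually hand you an RTO target and ask for the cheapest pattern that meets it. The sketch below encodes that reasoning. The RTO figures are rough, assumed orders of magnitude chosen for illustration (they are not quoted from AWS), and `cheapest_pattern` is a hypothetical helper.

```python
# Ordered cheapest first. RTO values are illustrative assumptions, in minutes.
PATTERNS = [
    ("backup & restore", 24 * 60),        # hours
    ("pilot light", 60),                  # tens of minutes
    ("warm standby", 10),                 # minutes
    ("multi-site active/active", 1),      # near zero
]


def cheapest_pattern(target_rto_minutes):
    """Return the cheapest pattern whose indicative RTO meets the target."""
    for name, indicative_rto in PATTERNS:
        if indicative_rto <= target_rto_minutes:
            return name
    # Nothing cheaper is fast enough, so fall back to the most capable pattern.
    return PATTERNS[-1][0]


for target in (48 * 60, 120, 15, 0.5):
    print(target, "->", cheapest_pattern(target))
```

Walking the list from cheapest to most expensive mirrors how the questions are worded: "most cost-effective solution that meets the RTO" means stop at the first pattern that is fast enough.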
Chapter 7 — Compute resilience
Maps to Task Statement 2.2 — Design highly available and/or fault-tolerant architectures
Knowledge of:
- AWS global infrastructure (for example, Availability Zones, Regions, Amazon Route 53)
- AWS managed services with appropriate use cases (for example, Amazon Comprehend, Amazon Polly)
- Basic networking concepts (for example, route tables)
- Disaster recovery strategies (for example, backup and restore, pilot light, warm standby, multi-site active-active)
- Distributed design patterns
- Failover strategies
- Immutable infrastructure
- Load balancing concepts (for example, Application Load Balancer)
- Proxy concepts (for example, Amazon RDS Proxy)
- Service quotas and throttling (for example, how to configure the service quotas for a workload in a standby environment)
- Storage options and characteristics (for example, durability, replication)
- Workload visibility (for example, AWS X-Ray)
Skills in:
- Determining automation strategies to ensure infrastructure integrity
- Determining the AWS services required to achieve high availability based on business requirements
- Determining the data retention strategy to ensure backups are present when needed
- Designing a disaster recovery strategy based on business requirements
- Designing a multi-Region architecture based on business requirements
- Designing a scalable and highly available architecture based on business requirements
7.1 Auto Scaling ★★★ Core
Cornerstone of the resilience domain. Know launch templates (not legacy launch configurations), target-tracking scaling policies, mixed instance groups (On-Demand + Spot), and lifecycle hooks. Predictive scaling appears for known traffic patterns.
Core docs
- EC2 Auto Scaling
- Scaling policies — target tracking, step, simple, scheduled, predictive
- Launch templates (use these — launch configurations are deprecated)
- Mixed instance groups — combine On-Demand and Spot
- Lifecycle hooks
- Warm pools — pre-initialised instances for rapid scale-out
- Application Auto Scaling — for ECS, DynamoDB, Aurora, etc.
FAQ
7.2 Elastic Load Balancing ★★★ Core
ALB vs NLB is one of the most-tested decisions on the exam. ALB for HTTP(S) and host/path routing; NLB for TCP/UDP, static IPs, and source-IP preservation; Gateway Load Balancer (GWLB) for security-appliance insertion. Cross-zone behaviour differs by default (on for ALB, off for NLB).
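The decision can be condensed into a keyword lookup. This is a memorisation aid built only from the cues in this section; the `choose_load_balancer` function and its parameter names are hypothetical.

```python
def choose_load_balancer(protocol, needs_static_ip=False, appliance_insertion=False):
    """Map the scenario cues from this section to a load balancer type."""
    protocol = protocol.upper()
    if appliance_insertion:
        # Inline firewalls or IDS/IPS appliances point at Gateway Load Balancer.
        return "Gateway Load Balancer"
    if protocol in {"TCP", "UDP"} or needs_static_ip:
        # Layer 4 traffic or a static-IP requirement points at NLB.
        return "Network Load Balancer"
    if protocol in {"HTTP", "HTTPS"}:
        # Layer 7 routing on host or path points at ALB.
        return "Application Load Balancer"
    raise ValueError(f"no rule for protocol {protocol!r}")


print(choose_load_balancer("https"))
print(choose_load_balancer("udp"))
print(choose_load_balancer("https", needs_static_ip=True))
print(choose_load_balancer("tcp", appliance_insertion=True))
```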
Core docs
- Elastic Load Balancing — what is it?
- Load balancer comparison — ALB vs NLB vs GWLB vs CLB
- Application Load Balancer — Layer 7, host/path-based routing, native HTTPS termination
- Network Load Balancer — Layer 4, static IPs, ultra-low latency, source IP preservation
- Gateway Load Balancer — security-appliance insertion via GENEVE
- Target groups, health checks, deregistration delay
- Sticky sessions
- Cross-zone load balancing — on by default for ALB, off by default for NLB
FAQ
7.3 Container resilience — ECS and EKS ★★ Important
Fargate vs EC2 launch type is the recurring decision: Fargate when “no servers to manage”, EC2 when cost control or GPU is required. Service auto-scaling, deployment circuit breakers, and capacity providers appear in scenario questions.
Core docs
- What is Amazon ECS?
- EC2 launch type vs Fargate
- ECS service definition — desired count, deployment config, placement strategies
- ECS service auto-scaling
- What is Amazon EKS?
- EKS managed node groups
- EKS on Fargate
- Amazon ECR
FAQ
7.4 Lambda concurrency and resilience ★★ Important
Reserved concurrency caps a function (and isolates it); provisioned concurrency eliminates cold starts; both are paid in different ways. Async invocation auto-retries twice and supports DLQs. SnapStart for Java / Python / .NET cold-start mitigation.
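The async retry behaviour is worth simulating once. The sketch below is a local toy, not the Lambda service: `invoke_async` is a hypothetical helper, the dead-letter queue is a plain list, and the delays Lambda inserts between retries are left out.

```python
def invoke_async(handler, event, dead_letter_queue, max_retries=2):
    """Run the handler once plus up to max_retries retries; park the event in the DLQ on total failure."""
    attempts = 0
    for _ in range(1 + max_retries):
        attempts += 1
        try:
            handler(event)
            return attempts  # success: stop retrying
        except Exception:
            continue  # failed attempt: retry if any attempts remain
    dead_letter_queue.append(event)
    return attempts


def always_fails(event):
    raise RuntimeError("simulated handler error")


dlq = []
attempts = invoke_async(always_fails, {"id": 1}, dlq)
print(attempts, dlq)
```

"Retries twice" therefore means three attempts in total before the event lands in the DLQ.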
Core docs
- What is AWS Lambda?
- Concurrency, reserved concurrency, provisioned concurrency
- Lambda in a VPC — Hyperplane ENIs, no more cold-start tax
- Asynchronous invocation, retries, DLQs
- Lambda SnapStart — sub-second cold starts for Java/Python/.NET
Chapter 8 — Storage resilience
Maps to Task Statement 2.2 — Design highly available and/or fault-tolerant architectures
Knowledge of:
- AWS global infrastructure (for example, Availability Zones, Regions, Amazon Route 53)
- Disaster recovery strategies (for example, backup and restore, pilot light, warm standby, multi-site active-active)
- Distributed design patterns
- Failover strategies
- Immutable infrastructure
- Storage options and characteristics (for example, durability, replication)
Skills in:
- Determining automation strategies to ensure infrastructure integrity
- Determining the AWS services required to achieve high availability based on business requirements
- Determining the data retention strategy to ensure backups are present when needed
- Designing a disaster recovery strategy based on business requirements
8.1 S3 — versioning, replication, Object Lock ★★★ Core
Heavily tested. Versioning + MFA Delete; SRR (same region) for compliance copies; CRR (cross-region) for DR; Replication Time Control for the 15-minute SLA; Object Lock (Governance vs Compliance) for WORM, ransomware, and regulatory scenarios.
Core docs
- What is Amazon S3?
- S3 versioning
- S3 replication — SRR (same-region) and CRR (cross-region)
- Replication Time Control (RTC) — 15-minute SLA, billable
- S3 Object Lock — WORM, retention modes (Governance vs Compliance)
- Multi-Region Access Points — global endpoint with failover
- Lifecycle rules
8.2 EBS snapshots and recovery ★★ Important
Incremental, S3-backed, region-scoped (use Copy for cross-region). Fast Snapshot Restore eliminates lazy-loading; Snapshot Archive cuts long-term cost ~75%. Multi-Attach is io1/io2 single-AZ only and needs a cluster filesystem.
Core docs
- Amazon EBS snapshots — incremental, S3-backed
- Fast Snapshot Restore (FSR)
- Multi-volume crash-consistent snapshots
- EBS Multi-Attach — io1/io2 only, single-AZ, requires cluster-aware filesystem
- Data Lifecycle Manager — automated snapshot/AMI lifecycle
8.3 EFS resilience ★★ Important
Regional (multi-AZ) vs One Zone storage classes; lifecycle (Standard → IA → Archive) for cost. Tested as “shared POSIX filesystem across many EC2 instances” with a perf-mode or throughput-mode twist.
Core docs
- What is Amazon EFS?
- Regional vs One Zone storage classes
- EFS replication
- EFS lifecycle management — Standard ↔ IA ↔ Archive
FAQ
8.4 FSx ★★ Important
Four flavours, each with a clear “when”. Windows: AD-joined SMB. Lustre: HPC and ML training. ONTAP: multiprotocol SMB + NFS + iSCSI with SnapMirror. OpenZFS: high-perf NFS with snapshots and clones. Recognise the keyword that points at each.
Core docs
- FSx for Windows File Server — SMB, AD-joined, multi-AZ
- FSx for Lustre — HPC, scratch and persistent
- FSx for NetApp ONTAP — multiprotocol (NFS, SMB, iSCSI), SnapMirror
- FSx for OpenZFS — high-perf NFS, snapshots, clones
FAQ
8.5 Storage Gateway ★★ Important
Hybrid scenario answer for “extend on-prem to S3”. File Gateway (NFS/SMB to S3), Volume Gateway (iSCSI; cached or stored), Tape Gateway (VTL backed by S3 + Glacier). DataSync is the answer for one-time bulk transfers.
Core docs
- What is Storage Gateway?
- S3 File Gateway — NFS/SMB to S3
- FSx File Gateway
- Volume Gateway — iSCSI cached/stored volumes backed by S3, with point-in-time backups as EBS snapshots
- Tape Gateway — virtual tape library backed by S3 + Glacier
Chapter 9 — Database resilience
Maps to Task Statement 2.2 — Design highly available and/or fault-tolerant architectures
Knowledge of:
- AWS global infrastructure (for example, Availability Zones, Regions)
- Disaster recovery strategies (for example, backup and restore, pilot light, warm standby, multi-site active-active)
- Distributed design patterns
- Failover strategies
- Proxy concepts (for example, Amazon RDS Proxy)
- Storage options and characteristics (for example, durability, replication)
Skills in:
- Determining automation strategies to ensure infrastructure integrity
- Determining the AWS services required to achieve high availability based on business requirements
- Determining the data retention strategy to ensure backups are present when needed
- Designing a disaster recovery strategy based on business requirements
- Designing a multi-Region architecture based on business requirements
9.1 RDS Multi-AZ and read replicas ★★★ Core
Multi-AZ is HA (synchronous standby, automatic failover, no read traffic on the standby). Read replicas are async, scale reads, can cross regions, can be promoted. Multi-AZ DB clusters add two readable standbys. RDS Proxy is the answer for “serverless connection pooling”.
Core docs
- What is Amazon RDS?
- Multi-AZ deployments — synchronous standby, automatic failover, no read traffic on standby
- Multi-AZ DB clusters — semi-sync, two readable standbys
- Read replicas — async, can be promoted, can cross regions
- Automated backups and point-in-time recovery
- RDS Proxy — connection pooling, faster failover
FAQ
9.2 Aurora resilience and Global Database ★★★ Core
Aurora’s storage is six-way replicated across three AZs out of the box. Global Database does sub-second cross-region replication with RTO under one minute — the answer for “global app with low-RTO DR”. Up to 15 replicas share storage with the writer.
Core docs
- Aurora overview
- Aurora high availability — 6 copies across 3 AZs, self-healing storage
- Aurora Global Database — sub-second cross-region replication, RTO < 1 min
- Aurora Serverless v2 — autoscales by ACU; can auto-pause at 0 ACU on supported engine versions
- Aurora Replicas — up to 15, share storage with primary
9.3 DynamoDB Global Tables and PITR ★★★ Core
Global Tables are multi-region, multi-active, eventually consistent — the canonical answer for “multi-region active-active key-value store”. PITR restores to any second in the last 35 days. Streams + Lambda for change-data-capture.
Core docs
- What is DynamoDB?
- Global Tables — multi-region, multi-active, eventually consistent
- Point-in-time recovery (PITR) — restore to any second in last 35 days
- On-demand backup and restore
- DynamoDB Streams
FAQ
9.4 ElastiCache resilience ★★ Important
Redis with cluster mode + Multi-AZ adds replication and automatic failover; Memcached has neither (sharded but no replication or persistence). MemoryDB is the answer when you need durable Redis as a primary store, not just a cache.
Core docs
- ElastiCache for Redis — clustering, replication, Multi-AZ with auto failover
- ElastiCache for Memcached — sharded, no replication, no persistence
- MemoryDB for Redis — durable Redis with multi-AZ transactional log
Chapter 10 — Decoupling and event-driven design
Maps to Task Statement 2.1 — Design scalable and loosely coupled architectures
Knowledge of:
- API creation and management (for example, Amazon API Gateway, REST APIs)
- AWS managed services with appropriate use cases
- Caching strategies
- Design principles for microservices
- Event-driven architectures
- Horizontal and vertical scaling
- Queuing and messaging concepts (for example, publish/subscribe)
- Serverless technologies and patterns (for example, AWS Fargate, Lambda)
- Workflow orchestration (for example, AWS Step Functions)
Skills in:
- Designing event-driven, microservices, and/or multi-tier architectures based on requirements
- Determining scaling strategies for components used in an architecture design
- Determining the AWS services required to achieve loose coupling based on requirements
- Determining when to use serverless technologies and patterns
10.1 Amazon SQS ★★★ Core
The default decoupling answer. Standard (best-effort ordering, at-least-once) vs FIFO (strict ordering + exactly-once within a message group). Visibility timeout, DLQs, long polling, and scaling by queue depth (an ASG driven by CloudWatch queue-depth metrics, with SQS as the buffer) appear repeatedly.
Core docs
- What is Amazon SQS?
- Standard vs FIFO queues
- Visibility timeout
- Dead-letter queues
- Long polling vs short polling
- FIFO deduplication and message groups
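Scaling by queue depth usually comes down to a "backlog per instance" calculation: work out how many queued messages one worker can absorb within the latency you can tolerate, then divide the queue depth by that figure. The sketch below shows the arithmetic only. It assumes the queue depth comes from the CloudWatch metric `ApproximateNumberOfMessagesVisible`; the function names, the `min_size` / `max_size` bounds, and the sample numbers are hypothetical.

```python
import math


def acceptable_backlog_per_instance(max_latency_s: float, avg_processing_s: float) -> float:
    """Messages one instance can hold in its backlog and still meet the latency target."""
    return max_latency_s / avg_processing_s


def desired_instances(
    queue_depth: int,
    max_latency_s: float,
    avg_processing_s: float,
    min_size: int = 1,   # hypothetical ASG bounds
    max_size: int = 20,
) -> int:
    """Instance count needed to drain the queue within the latency target, clamped to ASG bounds."""
    per_instance = acceptable_backlog_per_instance(max_latency_s, avg_processing_s)
    needed = math.ceil(queue_depth / per_instance)
    return max(min_size, min(max_size, needed))


# 450 visible messages, 60 s tolerated latency, 2 s per message
# -> 30 messages per instance -> 15 instances.
print(desired_instances(450, 60, 2))
```

In a real deployment this ratio would typically be published as a custom metric and used in a target tracking policy rather than computed by hand.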
10.2 Amazon SNS ★★ Important
Pub/sub. Fan-out (SNS → multiple SQS) is the canonical pattern. Filter policies cut consumer-side filtering. FIFO topics pair with FIFO queues for end-to-end ordering.
Core docs
- What is Amazon SNS?
- Fan-out pattern — SNS → multiple SQS subscribers
- Message filtering policies
- FIFO topics — pair with SQS FIFO queues for end-to-end ordering
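To make the fan-out plus filter-policy pattern concrete, here is a minimal in-memory sketch. It models only exact string matching on message attributes; real SNS filter policies support more operators than this. The queue names and attribute values are hypothetical.

```python
def matches(filter_policy: dict[str, list[str]], attributes: dict[str, str]) -> bool:
    """A message matches when every key in the policy has an allowed value on the message.

    An empty policy matches everything, like a subscription with no filter.
    """
    return all(attributes.get(key) in allowed for key, allowed in filter_policy.items())


def fan_out(subscriptions: dict[str, dict[str, list[str]]], attributes: dict[str, str]) -> list[str]:
    """Return the subscribers (for example SQS queues) that would receive the message."""
    return [name for name, policy in subscriptions.items() if matches(policy, attributes)]


# Hypothetical topic with three SQS subscribers.
SUBSCRIPTIONS = {
    "billing-queue": {"event_type": ["order_placed", "order_refunded"]},
    "shipping-queue": {"event_type": ["order_placed"]},
    "audit-queue": {},  # no filter policy: receives every message
}

print(fan_out(SUBSCRIPTIONS, {"event_type": "order_refunded"}))
```

The point of the pattern is that the shipping consumer never sees refund events, so it needs no filtering code of its own.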
10.3 Amazon EventBridge ★★ Important
Event-driven glue. Default bus for AWS service events, custom buses for app events, partner buses for SaaS. Rules + targets; archive / replay for audit and reprocessing. Scheduler replaces CloudWatch cron at scale.
Core docs
- What is EventBridge?
- Default, custom, and partner event buses
- Rules and event patterns
- Archive and replay
- EventBridge Scheduler — replaces CloudWatch cron rules at scale
- EventBridge Pipes — point-to-point integrations with optional filter/transform/enrich
10.4 AWS Step Functions ★ Light
Lighter at SAA than you might expect. Know Standard vs Express (exactly-once long-running vs at-least-once high-volume short workflows) and the basic service-integration patterns. Skim the FAQ.
Core docs
- What is Step Functions?
- Standard vs Express workflows
- Error handling, retries, and catch
- Service integration patterns — request/response, run-job (.sync), wait-for-callback (.waitForTaskToken)
10.5 Kinesis Data Streams (resilience aspects) ★★ Important
Resilience aspects: durable, ordered, replayable up to 365 days. Enhanced fan-out for low-latency consumers. Tested as “replay required” (KDS) vs “just deliver to S3” (Firehose) vs “message bus” (SQS / SNS) decision questions.
Core docs
- Kinesis Data Streams — shards, retention up to 365 days, replay
- Enhanced fan-out
- On-demand vs provisioned capacity
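For provisioned capacity, shard count is driven by whichever per-shard limit you hit first. The sketch below assumes the standard provisioned-mode limits (1 MB/s or 1,000 records/s written per shard, 2 MB/s read per shard for shared-throughput consumers); `shards_needed` is a hypothetical helper and the traffic figures are made up.

```python
import math

# Per-shard limits for provisioned mode with shared-throughput consumers.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_per_s: float, records_per_s: float, read_mb_per_s: float) -> int:
    """Smallest shard count that satisfies all three per-shard limits."""
    return max(
        1,
        math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD),
        math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_s / READ_MB_PER_SHARD),
    )


# Many small records: the record-rate limit dominates, not the byte rate.
print(shards_needed(write_mb_per_s=0.5, records_per_s=4500, read_mb_per_s=1))
```

Enhanced fan-out changes the read side of this calculation, because each registered consumer gets its own read throughput per shard instead of sharing it.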
Chapter 11 — Backup and disaster recovery
Maps to Task Statement 2.2 — Design highly available and/or fault-tolerant architectures
Knowledge of:
- Disaster recovery strategies (for example, backup and restore, pilot light, warm standby, multi-site active-active)
- Distributed design patterns
- Failover strategies
- Immutable infrastructure
- Storage options and characteristics (for example, durability, replication)
Skills in:
- Determining automation strategies to ensure infrastructure integrity
- Determining the data retention strategy to ensure backups are present when needed
- Designing a disaster recovery strategy based on business requirements
- Designing a multi-Region architecture based on business requirements
11.1 AWS Backup ★★ Important
Centralised backup across many services with a single policy. Cross-account / cross-region copies for DR. Vault Lock provides WORM for backups (ransomware mitigation). Pattern: “compliance-driven backup across the org” → AWS Backup.
Core docs
- What is AWS Backup?
- Supported resources — EBS, EFS, RDS, Aurora, DynamoDB, FSx, Storage Gateway, S3, Neptune, DocumentDB
- Cross-account and cross-region backup copies
- Backup Vault Lock — WORM for backups, ransomware mitigation
- AWS Backup for DR (audit-ready, cross-region, cross-account)
11.2 Elastic Disaster Recovery ★ Light
Block-level continuous replication for sub-minute RPO; the answer for “lift-and-shift DR for VMs / on-prem servers”. Light at SAA level — recognise the one-liner and move on.
Core docs
- What is AWS Elastic Disaster Recovery (AWS DRS)?
- Recovery workflow — block-level continuous replication, sub-minute RPO
FAQ
Deeper reading
Part III — Domain 3: Design High-Performing Architectures (24%)
The performance domain rewards knowing the available options and their trade-offs at each layer of the stack — compute family selection, the right storage tier, the right database engine, and the right network primitive. The exam likes “you have requirement X under constraint Y, which combination of services?” — the trick is reading both X and Y carefully.
Chapter 12 — Choosing compute
Maps to Task Statement 3.2 — Design high-performing and elastic compute solutions
Knowledge of:
- AWS compute services with appropriate use cases (for example, AWS Batch, Amazon EMR, Fargate)
- Distributed computing concepts supported by AWS global infrastructure and edge services
- Queuing and messaging concepts (for example, publish/subscribe)
- Scalability capabilities with appropriate use cases (for example, Amazon EC2 Auto Scaling, AWS Auto Scaling)
- Serverless technologies with appropriate use cases (for example, Lambda, Fargate)
- The orchestration of containers (for example, Amazon ECS, Amazon EKS)
Skills in:
- Decoupling workloads so that components can scale independently
- Identifying metrics and conditions to perform scaling actions
- Selecting the appropriate compute options and features (for example, EC2 instance types) to meet business requirements
- Selecting the appropriate resource type and size (for example, the amount of Lambda memory) to meet business requirements
12.1 EC2 instance families and Graviton ★★★ Core
Know the family letters cold: M (general), C (compute), R (memory), X / u (high-mem), I / D / H (storage), P / G / Inf / Trn (accelerated), T (burstable). Graviton (ARM) gives ~20% cost saving on broad workloads. T-class CPU credits and unlimited mode appear.
Core docs
- EC2 instance types overview
- Instance type comparison page — keep the families straight: M (general), C (compute), R (memory), X/u (high memory), I/D/H (storage), P/G/Inf/Trn (accelerated), T (burstable)
- AWS Graviton — ARM-based, ~20% cheaper, ~40% better price-performance for many workloads
- Burstable (T-class) instances — CPU credits and unlimited mode
- Instance purchasing options — covered in detail in Domain 4
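As a drill for the family letters, the following sketch maps an instance type name to the category used in this section. It covers only the families listed above and returns "unknown" for anything else; `family_of` is a hypothetical helper, and the parsing rule (leading letters before the first digit or hyphen) is a simplification.

```python
import re

# Family prefix -> category, using the groupings from this section.
FAMILY_PURPOSE = {
    "m": "general purpose",
    "c": "compute optimized",
    "r": "memory optimized",
    "x": "high memory",
    "u": "high memory",
    "i": "storage optimized",
    "d": "storage optimized",
    "h": "storage optimized",
    "p": "accelerated computing",
    "g": "accelerated computing",
    "inf": "accelerated computing",
    "trn": "accelerated computing",
    "t": "burstable",
}


def family_of(instance_type: str) -> str:
    """Classify an instance type such as 'm7g.large' by its leading family letters."""
    prefix = instance_type.split(".")[0].lower()
    match = re.match(r"[a-z]+", prefix)
    letters = match.group(0) if match else ""
    return FAMILY_PURPOSE.get(letters, "unknown")


for name in ("m7g.large", "inf2.xlarge", "t3.micro", "u-6tb1.112xlarge"):
    print(name, "->", family_of(name))
```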
12.2 Containers — ECS Fargate vs EC2 ★★ Important
Recurring decision: Fargate when “no servers to manage / minimum operational overhead”, EC2 when “cost control / GPU / custom AMI / Spot diversity”. Capacity providers (Fargate, Fargate Spot, ASG) and task placement strategies show up in cost questions.
Core docs
- ECS launch types — Fargate for serverless containers, EC2 for cost control / GPU / custom AMIs
- ECS task definitions
- Task placement strategies — binpack, spread, random
- Capacity providers — Fargate, Fargate Spot, ASG-backed
12.3 Lambda performance ★★ Important
Memory tunes both speed and cost (CPU is allocated proportionally; tune up until total cost stops dropping). Provisioned concurrency = no cold starts but always-on cost. SnapStart for Java / Python / .NET. Graviton runtimes are cheaper.
Core docs
- Memory and CPU — CPU is allocated proportionally to memory; tuning memory tunes both speed and cost
- Provisioned concurrency — eliminates cold starts at the cost of always-on capacity
- Lambda SnapStart — sub-second cold starts for Java, Python, .NET
- Runtime selection — Graviton for ~20% cost savings
- Lambda + RDS Proxy — connection pooling for burst-y serverless
12.4 Batch and HPC ★ Light
Recognise that Batch is the managed answer for “queued jobs across Spot and On-Demand”, ParallelCluster for HPC, EFA for OS-bypass MPI. Rare on SAA — skim and move on.
Core docs
- AWS Batch — managed job scheduling, Fargate or EC2 (incl. Spot)
- AWS ParallelCluster — open-source cluster orchestrator for HPC
- HPC on AWS — EFA, cluster placement groups, FSx for Lustre, Spot
Chapter 13 — High-performing storage
Maps to Task Statement 3.1 — Determine high-performing and/or scalable storage solutions
Knowledge of:
- Hybrid storage solutions to meet business requirements
- Storage services with appropriate use cases (for example, Amazon S3, Amazon EFS, Amazon EBS)
- Storage types with associated characteristics (for example, object, file, block)
Skills in:
- Determining storage services and configurations that meet performance demands
- Determining storage services that can scale to accommodate future needs
13.1 S3 storage classes and Transfer Acceleration ★★★ Core
The most-tested storage decision on the exam. Standard / Intelligent-Tiering / Standard-IA / One Zone-IA / Glacier (Instant / Flexible / Deep Archive) / Express One Zone — know the access-pattern + retrieval-cost trade-off cold. Transfer Acceleration uses CloudFront edges for fast uploads from far away.
Core docs
- S3 storage classes — Standard, Intelligent-Tiering, Standard-IA, One Zone-IA, Glacier Instant Retrieval, Glacier Flexible Retrieval, Glacier Deep Archive, Express One Zone
- Performance design patterns — parallelism, request rates, multipart upload
- S3 Transfer Acceleration — uploads via CloudFront edge locations
- Multipart upload
- S3 Access Points
FAQ
13.2 EBS volume types ★★★ Core
gp3 is the new default; gp2 is legacy. io2 Block Express for SAN-class IOPS. st1 for streaming and big-data; sc1 for cold. Multi-Attach is io1/io2 only. Elastic Volumes for online type / size / IOPS changes without downtime.
Core docs
- EBS volume types — gp3 (default general-purpose), gp2, io2 Block Express (SAN-class IOPS), io1, st1 (throughput HDD), sc1 (cold HDD)
- I/O characteristics and monitoring
- Elastic Volumes — change type, size, IOPS without downtime
| Volume type | Use case | Max IOPS | Max throughput |
|---|---|---|---|
| gp3 (SSD) | General-purpose default | 16,000 | 1,000 MiB/s |
| io2 Block Express (SSD) | I/O-intensive databases | 256,000 | 4,000 MiB/s |
| st1 (HDD) | Streaming, big data, log processing | 500 | 500 MiB/s |
| sc1 (HDD) | Cold, infrequently accessed | 250 | 250 MiB/s |
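The SSD rows of the table can be turned into a simple selection rule: take gp3 if the requirement fits inside its ceiling, otherwise move up to io2 Block Express. The sketch below uses the limits from the table as given; `pick_ssd` is a hypothetical helper and it ignores price, durability, and volume-size constraints.

```python
from typing import NamedTuple, Optional


class VolumeType(NamedTuple):
    name: str
    max_iops: int
    max_throughput_mib: int


# SSD types in order of preference, with the limits from the table above.
SSD_TYPES = [
    VolumeType("gp3", 16_000, 1_000),
    VolumeType("io2 Block Express", 256_000, 4_000),
]


def pick_ssd(required_iops: int, required_throughput_mib: int) -> Optional[str]:
    """Return the first SSD type whose limits cover the requirement, or None if none does."""
    for volume in SSD_TYPES:
        if required_iops <= volume.max_iops and required_throughput_mib <= volume.max_throughput_mib:
            return volume.name
    return None


print(pick_ssd(3_000, 125))     # fits within gp3
print(pick_ssd(64_000, 1_000))  # exceeds gp3 IOPS, needs io2 Block Express
```

A `None` result means a single volume cannot meet the requirement, which on the exam usually points at striping volumes or at instance store.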
13.3 Instance Store ★★ Important
Ephemeral local NVMe — data is lost on stop or terminate. The answer for “cache / scratch / replicated DB shard” where you accept the data-loss model. Cheaper than EBS for the IOPS and throughput you get.
Core docs
- Instance Store volumes — ephemeral local NVMe; data lost on stop/terminate
- When to choose Instance Store — caches, scratch space, replicated databases
13.4 FSx for Lustre and HPC patterns ★ Light
Niche. Sub-ms latency, hundreds of GB/s throughput; scratch vs persistent file systems; can lazy-load from S3 and write back. Mostly an HPC topic; rare on SAA.
Core docs
- FSx for Lustre — sub-ms latency, hundreds of GB/s throughput
- Scratch vs persistent file systems
- Linking Lustre to S3 — lazy-load datasets, write results back
Chapter 14 — High-performing databases
Maps to Task Statement 3.3 — Determine high-performing database solutions
Knowledge of:
- AWS global infrastructure (for example, Availability Zones, Regions)
- Caching strategies and services (for example, Amazon ElastiCache)
- Data access patterns (for example, read-intensive compared with write-intensive)
- Database capacity planning (for example, capacity units, instance types, Provisioned IOPS)
- Database connections and proxies
- Database engines with appropriate use cases (for example, heterogeneous migrations, homogeneous migrations)
- Database replication (for example, read replicas)
- Database types and services (for example, serverless, relational compared with non-relational, in-memory)
Skills in:
- Configuring read replicas to meet business requirements
- Designing database architectures
- Determining an appropriate database engine (for example, MySQL compared with PostgreSQL)
- Determining an appropriate database type (for example, Amazon Aurora, Amazon DynamoDB)
- Integrating caching to meet business requirements
14.1 Choosing the right database ★★★ Core
The exam’s single most-tested database task. Relational (RDS / Aurora) for SQL with joins; key-value (DynamoDB) for ms latency at any scale; document (DocumentDB); in-memory (ElastiCache, MemoryDB); graph (Neptune); time-series (Timestream); ledger (QLDB). Read the qualifier — “flexible schema”, “sub-millisecond”, “graph traversal” — and pick.
Core docs
- AWS database services overview — relational, key-value, document, in-memory, graph, time-series, ledger
- Choosing a database (whitepaper section)
- SQL → NoSQL decision framework
14.2 RDS and Aurora performance ★★ Important
Performance Insights, RDS Proxy for connection pooling, Aurora replicas (up to 15) for read scale. Aurora Serverless v2 autoscales at 0.5 ACU granularity. gp3 storage by default.
Core docs
- RDS storage types — gp3, io1/io2, magnetic
- RDS Performance Insights
- RDS Proxy — managed connection pooling
- Aurora Replicas — up to 15, < 100 ms lag
- Aurora Serverless v2 — fine-grained autoscaling
14.3 DynamoDB capacity, partitioning, DAX ★★★ Core
On-demand vs provisioned (with auto-scaling) is a recurring decision. Hot-partition design is heavily tested — choose a high-cardinality partition key. DAX is the in-memory cache for microsecond reads. GSI vs LSI distinction matters.
Core docs
- On-demand vs provisioned capacity
- Partition key design and hot partitions
- Local and global secondary indexes
- DynamoDB Accelerator (DAX) — write-through cache, microsecond reads
- DynamoDB Streams + Lambda triggers
- Core concepts — RCU/WCU sizing
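RCU/WCU sizing is plain arithmetic once you know the unit sizes: one RCU covers one strongly consistent read per second of an item up to 4 KB (an eventually consistent read costs half), and one WCU covers one write per second of an item up to 1 KB. The sketch below applies those rules for provisioned mode; it ignores transactional operations, and the function names are hypothetical.

```python
import math

READ_UNIT_KB = 4   # one RCU = one strongly consistent read/s of up to 4 KB
WRITE_UNIT_KB = 1  # one WCU = one write/s of up to 1 KB


def read_capacity_units(reads_per_s: int, item_kb: float, strongly_consistent: bool = True) -> int:
    """RCUs needed; item size is rounded up to the next 4 KB boundary per read."""
    units_per_read = math.ceil(item_kb / READ_UNIT_KB)
    total = reads_per_s * units_per_read
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(writes_per_s: int, item_kb: float) -> int:
    """WCUs needed; item size is rounded up to the next 1 KB boundary per write."""
    return writes_per_s * math.ceil(item_kb / WRITE_UNIT_KB)


# 100 reads/s of 6 KB items: each read rounds up to 8 KB, so 2 RCUs per read.
print(read_capacity_units(100, 6))                             # 200
print(read_capacity_units(100, 6, strongly_consistent=False))  # 100
print(write_capacity_units(50, 2.5))                           # 150
```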
14.4 ElastiCache for caching workloads ★★ Important
Lazy loading vs write-through; TTL strategy; Redis (replication, persistence, cluster mode) vs Memcached (sharded, simple, no persistence). Pattern: “reduce read load on RDS / DynamoDB” → ElastiCache (or DAX for DDB).
Core docs
- Caching strategies — lazy loading, write-through, TTL
- Cluster mode disabled vs enabled (Redis)
- MemoryDB — when you need durability and Redis API in one product
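The difference between lazy loading and write-through is easiest to see side by side. The sketch below uses plain dictionaries in place of ElastiCache and the database, so it illustrates the control flow only; `CachedStore` and its injected `clock` parameter are hypothetical, chosen so the TTL behaviour can be exercised without sleeping.

```python
import time


class CachedStore:
    """In-memory illustration of lazy loading, write-through, and TTL expiry."""

    def __init__(self, database: dict, ttl_seconds: float, clock=time.monotonic):
        self.database = database  # stands in for RDS / DynamoDB
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}           # key -> (value, expires_at); stands in for ElastiCache
        self.db_reads = 0

    def get(self, key):
        """Lazy loading: the cache is filled only when a read misses or the entry expired."""
        entry = self.cache.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]
        self.db_reads += 1
        value = self.database[key]
        self.cache[key] = (value, self.clock() + self.ttl)
        return value

    def put(self, key, value):
        """Write-through: every write updates the database and the cache together."""
        self.database[key] = value
        self.cache[key] = (value, self.clock() + self.ttl)
```

Lazy loading keeps the cache small but pays a miss penalty on first read; write-through keeps reads warm but caches data that may never be read. The TTL bounds staleness in both cases.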
Chapter 15 — High-performing networking
Maps to Task Statement 3.4 — Determine high-performing and/or scalable network architectures
Knowledge of:
- Edge networking services with appropriate use cases (for example, Amazon CloudFront, AWS Global Accelerator)
- How to design network architecture (for example, subnet tiers, routing, IP addressing)
- Load balancing concepts (for example, Application Load Balancer)
- Network connection options (for example, AWS VPN, Direct Connect, AWS PrivateLink)
Skills in:
- Creating a network topology for various architectures (for example, global, hybrid, multi-tier)
- Determining network configurations that can scale to accommodate future needs
- Determining the appropriate placement of resources to meet business requirements
- Selecting the appropriate load balancing strategy
15.1 Placement groups, ENA, EFA ★★ Important
Cluster (low-latency same-rack), partition (large distributed systems with rack-aware fault isolation), spread (fewer instances, max isolation). EFA is for HPC / MPI (OS-bypass) — rare. ENA enhanced networking is on by default on current-gen instances.
Core docs
- Placement groups — cluster (low latency), partition (large distributed systems), spread (small fault-tolerant clusters)
- Enhanced networking — ENA, up to 100 Gbps
- Elastic Fabric Adapter (EFA) — OS-bypass, MPI, HPC and ML training
15.2 EBS-optimised instances ★ Light
Default-on for all current-gen instances; mostly background knowledge. Won’t be a primary topic — recognise the term and move on.
Core docs
- EBS-optimised instances — dedicated EBS bandwidth; default on for current-gen instances
15.3 CloudFront and Global Accelerator ★★★ Core
One of the most-tested decision pairs. CloudFront for HTTP(S) caching at the edge; Global Accelerator for static anycast IPs in front of TCP / UDP (and fast cross-region failover for non-HTTP traffic). They compose.
Core docs
- What is CloudFront?
- How CloudFront delivers content
- Origin groups and failover
- CloudFront Functions vs Lambda@Edge
- What is Global Accelerator?
- CloudFront vs Global Accelerator — when to use which
15.4 Direct Connect for performance ★★ Important
Hybrid backbone. Dedicated vs hosted; private / public / transit VIFs. MACsec on dedicated lines for L2 encryption; IPsec VPN over a public VIF for L3. HA designs use multiple connections in multiple locations.
Core docs
- What is Direct Connect?
- Dedicated vs hosted connections
- Virtual interfaces — public, private, transit
- Encrypting Direct Connect — MACsec on dedicated, IPsec VPN over public VIF
- High-availability designs — multiple connections, multiple locations
15.5 PrivateLink and VPC endpoints ★★★ Core
Overlaps with section 3.2 — the exam asks the same question from both security and performance angles. “Expose a service privately to other VPCs / accounts” → PrivateLink. “Reach S3 / DynamoDB privately” → gateway endpoints (free).
Core docs
- What is AWS PrivateLink?
- Sharing your service via PrivateLink
- AWS services that support interface endpoints
Chapter 16 — Data, analytics, and streaming
Maps to Task Statements 3.1 and 3.2 — Determine high-performing and/or scalable storage solutions; Design high-performing and elastic compute solutions
Knowledge of:
- Hybrid storage solutions to meet business requirements
- Storage services with appropriate use cases (for example, Amazon S3, Amazon EFS, Amazon EBS)
- Storage types with associated characteristics (for example, object, file, block)
- AWS compute services with appropriate use cases (for example, AWS Batch, Amazon EMR, Fargate)
- Distributed computing concepts supported by AWS global infrastructure and edge services
- Queuing and messaging concepts (for example, publish/subscribe)
Skills in:
- Determining storage services and configurations that meet performance demands
- Determining storage services that can scale to accommodate future needs
- Decoupling workloads so that components can scale independently
- Selecting the appropriate compute options to meet business requirements
16.1 Kinesis Data Streams and Firehose ★★ Important
Streams: durable, replayable, custom consumers, sharded. Firehose: managed delivery to S3 / Redshift / OpenSearch / Splunk / HTTP, no replay, near-real-time. Pattern: “I just want this in S3” → Firehose. “I need to replay” → Streams.
Core docs
- Kinesis Data Streams — durable, ordered, replayable shard-based stream
- Amazon Data Firehose (formerly Kinesis Firehose) — managed delivery to S3, Redshift, OpenSearch, Splunk, HTTP endpoints
- Streams vs Firehose — Streams for replayable, custom-consumer pipelines; Firehose for “I just want this in S3 / Redshift / OpenSearch”
16.2 Amazon MSK (Managed Streaming for Kafka) ★ Light
Managed Kafka. Rare at SAA; tested as “I have an existing Kafka workload, what’s the AWS-native answer?” MSK Serverless for autoscaling. Skim and move on.
Core docs
- What is Amazon MSK?
- MSK Serverless — pay per throughput, autoscaling
16.3 Athena and Glue ★★ Important
Athena for serverless SQL on S3 — the answer when “no infra” and “occasional query” coincide. Partitioning and columnar formats (Parquet/ORC) cut the data scanned per query, which lowers cost because Athena bills by data scanned. Glue is the managed ETL + Data Catalog. Lake Formation adds fine-grained access control.
Core docs
- What is Athena? — serverless SQL on S3
- Performance — partition, compress, columnar (Parquet/ORC)
- What is AWS Glue? — ETL, Data Catalog, crawlers
- AWS Lake Formation — fine-grained access control over the data lake
16.4 EMR ★ Light
Managed Hadoop / Spark / Presto / HBase / Flink. Rare. Recognise it for “large-scale distributed processing on a managed cluster” and move on. Spot use is common.
Core docs
- What is Amazon EMR? — Hadoop, Spark, Presto, HBase, Flink
- Instance purchasing options for EMR — heavy Spot use is common
- EMR Serverless
16.5 Redshift ★★ Important
Columnar MPP data warehouse. Tested mostly as “is this the right tool?” rather than configuration depth. Spectrum queries S3 directly without loading. Serverless variant exists.
Core docs
- What is Amazon Redshift? — columnar MPP data warehouse
- Redshift Serverless
- Redshift Spectrum — query S3 directly, no load
- Workload management
16.6 OpenSearch Service ★ Light
Search and log analytics. Recognise it for “log search and observability” or “full-text search”. Multi-AZ with standby for HA. Rare beyond the basic “which service” question.
Core docs
- What is Amazon OpenSearch Service? — search, log analytics, observability
- OpenSearch Serverless
- Multi-AZ with standby
Deeper reading
- Well-Architected Performance Efficiency Pillar
- Storage services overview (whitepaper section)
Part IV — Domain 4: Design Cost-Optimized Architectures (20%)
The cost domain is mostly about knowing the levers each service exposes, the pricing units that drive bills (compute hours, GB-month, GB-out, request count), and the few high-leverage practices — Savings Plans, S3 Intelligent-Tiering, lifecycle policies, VPC endpoints — that consistently move the needle.
Chapter 17 — Cost-optimised compute
Maps to Task Statement 4.2 — Design cost-optimized compute solutions
Knowledge of:
- AWS cost management service features (for example, cost allocation tags, multi-account billing)
- AWS cost management tools with appropriate use cases (for example, Cost Explorer, AWS Budgets, AWS Cost and Usage Report)
- AWS global infrastructure (for example, Availability Zones, Regions)
- AWS purchasing options (for example, Spot Instances, Reserved Instances, Savings Plans)
- Distributed compute strategies (for example, edge processing)
- Hybrid compute options (for example, AWS Outposts, AWS Snowball Edge)
- Instance types, families, and sizes (for example, memory optimized, compute optimized, virtualization)
- Optimization of compute utilization (for example, containers, serverless computing, microservices)
- Scaling strategies (for example, auto scaling, hibernation)
Skills in:
- Determining an appropriate load balancing strategy (for example, Application Load Balancer [Layer 7] compared with Network Load Balancer [Layer 4] compared with Gateway Load Balancer)
- Determining appropriate scaling methods and strategies for elastic workloads
- Determining cost-effective AWS compute services with appropriate use cases
- Determining the required availability for different classes of workloads
- Selecting the appropriate instance family for a workload
- Selecting the appropriate instance size for a workload
17.1 Pricing models — On-Demand, Reserved, Savings Plans, Spot ★★★ Core
Cornerstone of the cost domain. Know the discount tiers, commitment terms, and which model fits which workload pattern. Compute Savings Plans cover EC2 / Fargate / Lambda; EC2 Instance SP is deeper but family-locked. Spot for fault-tolerant workloads.
Core docs
- EC2 instance purchasing options — overview of all five
- Savings Plans — Compute (most flexible), EC2 Instance (deepest discount), SageMaker
- Reserved Instances — Standard vs Convertible, Regional vs Zonal scope
- Spot Instances — up to 90% off On-Demand, two-minute interruption notice
- Dedicated Hosts — for BYOL licences (Windows, SQL Server, Oracle)
| Model | Max discount vs On-Demand | Commitment | Best for |
|---|---|---|---|
| On-Demand | 0% | None | Spiky / unknown workloads |
| Compute Savings Plans | up to ~66% | 1 or 3 years, $/hr | Steady compute across EC2 / Fargate / Lambda |
| EC2 Instance Savings Plans | up to ~72% | 1 or 3 years, $/hr in family + region | Steady EC2 in a known family |
| Reserved Instances | up to ~72% | 1 or 3 years, instance attributes | Legacy commitments; new workloads should use Savings Plans |
| Spot | up to 90% | None | Fault-tolerant, flexible workloads — Batch, EMR, ASG, Fargate Spot |
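A commitment is billed whether or not the capacity is used, so it only beats On-Demand above a break-even utilisation. With a discount d, the committed rate is (1 - d) of the On-Demand rate for every hour, while On-Demand costs the full rate only for the hours used, so the break-even point is a utilisation of 1 - d. The sketch below shows that arithmetic; the discounts in the table are maxima, so treat the figures as illustrative, and the function names are hypothetical.

```python
def break_even_utilisation(discount: float) -> float:
    """Fraction of hours the workload must run for a commitment to beat On-Demand.

    discount is a fraction, e.g. 0.66 for a 66% discount.
    """
    return 1.0 - discount


def cheaper_option(expected_utilisation: float, discount: float) -> str:
    """Compare paying On-Demand for the hours used against committing for all hours."""
    if expected_utilisation > break_even_utilisation(discount):
        return "commit"
    return "on-demand"


# At a 66% discount the commitment pays off once the workload runs more than ~34% of the time.
print(round(break_even_utilisation(0.66), 2))
print(cheaper_option(expected_utilisation=0.5, discount=0.66))
print(cheaper_option(expected_utilisation=0.2, discount=0.66))
```

This is why steady baselines belong on Savings Plans while short-lived or rarely running workloads stay on On-Demand or Spot.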
17.2 Right-sizing and Compute Optimizer ★★ Important
Compute Optimizer’s ML recommendations cover EC2, ASGs, EBS, Lambda, ECS Fargate. Tested as “most cost-effective without sacrificing performance” — usually points at Compute Optimizer or Cost Explorer right-sizing.
Core docs
- AWS Compute Optimizer — ML-based right-sizing for EC2, ASGs, EBS, Lambda, ECS Fargate
- Cost Explorer right-sizing recommendations
- Trusted Advisor — cost optimization checks
17.3 Auto Scaling for cost ★★ Important
Mixed instance policies blend On-Demand and Spot inside one ASG; capacity-optimized allocation maximises Spot survival. Pattern: “most cost-effective batch / web tier” often wants Spot via ASG mixed instances.
Core docs
- Mixed instance policies — combine On-Demand + Spot inside one ASG
- Spot in Auto Scaling — capacity-optimized allocation strategy
- EC2 Fleet and Spot Fleet
17.4 Lambda economics ★ Light
Pricing is request count + GB-second of execution. Memory tuning often reduces total cost (faster execution outpaces per-ms cost growth). Light topic — one or two questions max.
Core docs
- Lambda pricing — request count + GB-second of execution
- Memory tuning — allocating more memory often reduces total cost because duration drops faster than per-ms cost rises
- Lambda Power Tuning (operator guide)
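The memory-tuning claim is easy to check with the pricing formula (request count plus GB-seconds). In the sketch below the default prices are illustrative assumptions, not a quote from the pricing page, the free tier is ignored, and the durations at each memory size are invented to show the effect.

```python
def lambda_monthly_cost(
    invocations: int,
    avg_duration_ms: float,
    memory_mb: int,
    price_per_million_requests: float = 0.20,   # assumed illustrative price
    price_per_gb_second: float = 0.0000166667,  # assumed illustrative price
) -> float:
    """Monthly cost = request charge + GB-seconds of execution."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    request_cost = invocations / 1_000_000 * price_per_million_requests
    return request_cost + gb_seconds * price_per_gb_second


# Hypothetical measurements: doubling memory cuts duration from 400 ms to 150 ms.
small = lambda_monthly_cost(1_000_000, avg_duration_ms=400, memory_mb=512)
large = lambda_monthly_cost(1_000_000, avg_duration_ms=150, memory_mb=1024)
print(f"512 MB: ${small:.2f}  1024 MB: ${large:.2f}")
```

The larger setting wins here because duration fell by more than half while the per-millisecond price only doubled. If duration stops improving, more memory just costs more.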
Chapter 18 — Cost-optimised storage
Maps to Task Statement 4.1 — Design cost-optimized storage solutions
Knowledge of:
- Access options (for example, an S3 bucket with Requester Pays object storage)
- AWS cost management service features (for example, cost allocation tags, multi-account billing)
- AWS cost management tools with appropriate use cases (for example, Cost Explorer, AWS Budgets, AWS Cost and Usage Report)
- AWS storage services with appropriate use cases (for example, Amazon FSx, Amazon EFS, Amazon S3, Amazon EBS)
- Backup strategies
- Block storage options (for example, hard disk drive [HDD] volume types, solid state drive [SSD] volume types)
- Data lifecycles
- Hybrid storage options (for example, DataSync, Transfer Family, Storage Gateway)
- Storage access patterns
- Storage tiering (for example, cold tiering for object storage)
- Storage types with associated characteristics (for example, object, file, block)
Skills in:
- Designing appropriate storage strategies (for example, batch uploads to Amazon S3 compared with individual uploads)
- Determining the correct storage size for a workload
- Determining the lowest cost method of transferring data for a workload to AWS storage
- Determining when storage auto scaling is required
- Managing S3 object lifecycles
- Selecting the appropriate backup and/or archival solution
- Selecting the appropriate service for data migration to storage services
- Selecting the appropriate storage tier
- Selecting the correct data lifecycle for storage
- Selecting the most cost-effective storage service for a workload
18.1 S3 storage classes and Intelligent-Tiering ★★★ Core
Re-tested from the cost angle: Intelligent-Tiering is the default “unknown access pattern” answer because it has no retrieval fees. One Zone-IA costs ~20% less than Standard-IA but keeps data in a single AZ, so objects are lost if that AZ is destroyed. Glacier tiers — Instant / Flexible / Deep Archive — by retrieval-time tolerance.
Core docs
- S3 storage classes overview
- S3 Intelligent-Tiering — automatic movement across Frequent / Infrequent / Archive Instant / Archive / Deep Archive Access tiers, no retrieval fees
- S3 pricing page — for the pricing-unit details (storage, requests, transfer, retrieval)
18.2 S3 lifecycle policies ★★★ Core
Heavily tested. Transition rules (minimum object size, minimum days in source class) and expiration rules; interactions with versioning (current vs non-current versions) and Object Lock matter. Pattern: “after 30 / 90 / 365 days move to …” → lifecycle.
Core docs
- Lifecycle management
- Transition constraints — minimum object size, minimum days in source class
- Lifecycle interactions with versioning, replication, Object Lock
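A lifecycle rule is a list of (age, target class) transitions plus an optional expiration. The sketch below evaluates such a rule for an object of a given age and checks one transition constraint: objects must spend at least 30 days in Standard before moving to Standard-IA or One Zone-IA. It is a simplified model; minimum object size, versioning, and Object Lock are not modelled. The storage class strings follow the S3 API names, and the function names are hypothetical.

```python
from typing import Optional

MIN_DAYS_BEFORE_IA = 30
IA_CLASSES = {"STANDARD_IA", "ONEZONE_IA"}


def validate_transitions(transitions: list[tuple[int, str]]) -> list[str]:
    """Return a message for each transition that moves objects to an IA class too early."""
    problems = []
    for days, storage_class in transitions:
        if storage_class in IA_CLASSES and days < MIN_DAYS_BEFORE_IA:
            problems.append(
                f"{storage_class} at day {days}: needs at least {MIN_DAYS_BEFORE_IA} days in Standard"
            )
    return problems


def storage_class_at(
    age_days: int,
    transitions: list[tuple[int, str]],
    expire_after_days: Optional[int] = None,
) -> Optional[str]:
    """Storage class of an object at the given age, or None once it has expired."""
    if expire_after_days is not None and age_days >= expire_after_days:
        return None
    current = "STANDARD"
    for days, storage_class in sorted(transitions):
        if age_days >= days:
            current = storage_class
    return current


RULE = [(30, "STANDARD_IA"), (90, "GLACIER")]
for age in (10, 45, 120, 400):
    print(age, storage_class_at(age, RULE, expire_after_days=365))
```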
18.3 EBS cost levers ★★ Important
gp3 vs gp2 — ~20% cheaper at equal performance. Snapshot Archive cuts long-term cost ~75%. DLM automates snapshot deletion.
Core docs
- gp3 vs gp2 — gp3 decouples IOPS/throughput from capacity, ~20% cheaper at equal performance
- EBS Snapshot Archive — 75% cheaper for long-term snapshot retention
- Data Lifecycle Manager — automated snapshot deletion to control cost
- Recycle Bin for snapshots and AMIs
18.4 Storage Gateway and on-prem caching ★★ Important
DataSync for recurring or one-time transfers under ~100 TB; Snow Family for bulk offline (Snowball Edge for petabyte-scale). Storage Gateway for ongoing on-prem ↔ S3 caching.
Core docs
- Storage Gateway types — File, Volume (cached or stored), Tape
- AWS DataSync — bulk one-time and recurring transfers; pricing per GB transferred
- AWS Snow Family — Snowcone, Snowball Edge, Snowmobile (deprecated) for petabyte-scale offline transfer
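The online-versus-offline decision usually rests on one calculation: how long the transfer would take over the available link. The sketch below estimates that in days. It assumes decimal terabytes and a sustained link utilisation of 80%, both of which are assumptions to adjust; `transfer_days` is a hypothetical helper.

```python
def transfer_days(data_tb: float, link_mbps: float, utilisation: float = 0.8) -> float:
    """Days needed to move data_tb terabytes over a link of link_mbps megabits per second.

    utilisation is the assumed fraction of the link that the transfer can sustain.
    """
    bits = data_tb * 8 * 10**12                   # decimal TB -> bits
    effective_bps = link_mbps * 10**6 * utilisation
    return bits / effective_bps / 86_400          # seconds -> days


# 100 TB over a 1 Gbps link at 80% utilisation takes roughly 11.6 days.
print(round(transfer_days(100, 1000), 1))
# 1 PB over the same link takes more than 100 days, which is Snow Family territory.
print(round(transfer_days(1000, 1000), 1))
```

If the estimate is measured in weeks or months, an offline device is normally the cheaper and faster answer; if it is days, an online transfer with DataSync is reasonable.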
Chapter 19 — Cost-optimised databases
Maps to Task Statement 4.3 — Design cost-optimized database solutions
Knowledge of:
- AWS cost management service features (for example, cost allocation tags, multi-account billing)
- AWS cost management tools with appropriate use cases (for example, Cost Explorer, AWS Budgets, AWS Cost and Usage Report)
- Caching strategies
- Data retention policies
- Database capacity planning (for example, capacity units)
- Database connections and proxies
- Database engines with appropriate use cases (for example, heterogeneous migrations, homogeneous migrations)
- Database replication (for example, read replicas)
- Database types and services (for example, relational compared with non-relational, Aurora, DynamoDB)
Skills in:
- Designing appropriate backup and retention policies (for example, snapshot frequency)
- Determining an appropriate database engine (for example, MySQL compared with PostgreSQL)
- Determining cost-effective AWS database services with appropriate use cases (for example, DynamoDB compared with Amazon RDS, serverless)
- Determining cost-effective AWS database types (for example, time series format, columnar format)
- Migrating database schemas and data to different locations and/or different database engines
19.1 Aurora Serverless v2 ★★ Important
Autoscales by ACU; the answer for “unpredictable / spiky relational workload, minimum operational overhead”. Older material says only v1 scales to zero; since late 2024, v2 can also auto-pause to 0 ACUs on supported engine versions, and v1 reached end of life in March 2025.
Core docs
- Aurora Serverless v2 — autoscales in 0.5 ACU increments; minimum capacity can be set to 0 ACUs (auto-pause) on supported engine versions
- Aurora Serverless v1 — the original scale-to-zero option, with slower scaling response and older engine versions only; end of life March 2025
19.2 DynamoDB on-demand vs provisioned ★★ Important
On-demand for unpredictable workloads (no capacity planning, higher unit cost); provisioned + auto-scaling for predictable. Reserved capacity discounts provisioned. The Standard-IA table class is ~60% cheaper for storage, with higher request prices.
Core docs
- Capacity modes — on-demand for unpredictable, provisioned (with auto-scaling) for predictable
- Reserved capacity — discount on provisioned RCU/WCU
- TTL — free deletion of expired items
- Standard vs Standard-IA table class — IA is ~60% cheaper for storage, with higher request prices
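The on-demand versus provisioned trade-off comes down to utilisation. The sketch below computes the break-even point, assuming one capacity unit serves one request per second (3,600 per hour) and ignoring auto-scaling and reserved capacity. The prices passed in are round placeholder numbers, not AWS rates; substitute the figures from the pricing page for your region.

```python
SECONDS_PER_HOUR = 3_600

def break_even_utilisation(price_per_unit_hour: float,
                           price_per_million_requests: float) -> float:
    """Fraction of provisioned capacity that must be used for provisioned to win.

    Above this utilisation provisioned mode is cheaper; below it, on-demand is.
    """
    # What on-demand would charge for the requests one fully used unit serves in an hour
    on_demand_full_hour = SECONDS_PER_HOUR * price_per_million_requests / 1_000_000
    return price_per_unit_hour / on_demand_full_hour

# Placeholder prices (assumptions for illustration only)
u = break_even_utilisation(price_per_unit_hour=0.0009,
                           price_per_million_requests=1.0)
print(f"Provisioned is cheaper above {u:.0%} average utilisation")
```

A spiky workload that averages well below the break-even fraction of its peak is the case where on-demand's higher unit price still costs less overall.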
19.3 RDS RIs and read replica patterns ★★ Important
RIs scope by engine + instance class + region. Stopping a DB instance pauses compute charges while storage is still billed (RDS restarts it automatically after 7 days). Read replicas can be cheaper than scaling the primary up.
Core docs
- RDS Reserved Instances — engine + instance class + region scope
- Stopping a DB instance temporarily — pay only for storage; auto-restart after 7 days
- Read replicas to offload primary — cheaper than scaling primary up
Chapter 20 — Cost-optimised networking
Maps to Task Statement 4.4 — Design cost-optimized network architectures
Knowledge of:
- AWS cost management service features (for example, cost allocation tags, multi-account billing)
- AWS cost management tools with appropriate use cases (for example, Cost Explorer, AWS Budgets, AWS Cost and Usage Report)
- Load balancing concepts (for example, Application Load Balancer)
- NAT gateways (for example, NAT gateway costs compared with NAT instance costs)
- Network connectivity (for example, private lines, dedicated lines, VPNs)
- Network routing, topology, and peering (for example, AWS Transit Gateway, VPC peering)
- Network services with appropriate use cases (for example, DNS)
Skills in:
- Configuring appropriate NAT gateway types for a network (for example, a single shared NAT gateway compared with NAT gateways for each Availability Zone)
- Configuring appropriate network connections (for example, Direct Connect compared with VPN compared with internet)
- Configuring appropriate network routes to minimize network transfer costs (for example, Region to Region, Availability Zone to Availability Zone, private to public, Global Accelerator, VPC endpoints)
- Determining strategic needs for content delivery networks (CDNs) and edge caching
- Reviewing existing workloads for network optimizations
- Selecting an appropriate throttling strategy
20.1 Data transfer pricing — the bills people don’t see coming ★★★ Core
Know the rules of thumb cold: same-AZ private = free; cross-AZ = $0.01/GB each way; egress to internet ~$0.09/GB; a NAT Gateway adds $0.045/GB processing on top of egress; cross-region = $0.02–0.09/GB. AWS origin → CloudFront is free.
Core docs
- Overview of data transfer costs for common architectures — required reading; the canonical map of when bytes cost
- EC2 data transfer pricing
- VPC peering — same-region peering has no per-GB charge for traffic within an AZ; cross-AZ traffic is charged
Headline rules of thumb:
- Within an AZ, between resources using private IPs: free.
- Between AZs in the same region: $0.01/GB each direction.
- To the internet from EC2: $0.09/GB for the first 10 TB/month (region dependent), with the first 100 GB per month free.
- Through a NAT Gateway: $0.045/GB processing fee on top of normal egress.
- Between regions: $0.02–$0.09/GB depending on region pair.
- Out via CloudFront: a separate, generally lower, regional rate.
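The rules of thumb above can be turned into a rough monthly estimate. This is a sketch using the headline per-GB figures from the list; it ignores the free allowance, volume tiers, regional variation, and the NAT Gateway hourly charge, and the function name is an illustrative assumption.

```python
CROSS_AZ_PER_GB_EACH_WAY = 0.01
INTERNET_EGRESS_PER_GB = 0.09
NAT_PROCESSING_PER_GB = 0.045

def monthly_transfer_cost(cross_az_gb: float,
                          internet_gb: float,
                          via_nat: bool = False) -> float:
    """Estimate a month's data transfer bill in USD from headline rates."""
    # Cross-AZ traffic is billed in both directions (out of one AZ, into the other)
    cross_az = cross_az_gb * CROSS_AZ_PER_GB_EACH_WAY * 2
    # NAT Gateway processing is added on top of normal internet egress
    per_gb = INTERNET_EGRESS_PER_GB + (NAT_PROCESSING_PER_GB if via_nat else 0.0)
    return cross_az + internet_gb * per_gb

direct = monthly_transfer_cost(cross_az_gb=5_000, internet_gb=2_000)
through_nat = monthly_transfer_cost(cross_az_gb=5_000, internet_gb=2_000, via_nat=True)
print(f"direct: ${direct:.2f}  via NAT: ${through_nat:.2f}")
```

Running the numbers this way shows why chatty cross-AZ services and NAT-routed egress are the two line items that tend to surprise people.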
20.2 VPC endpoints to avoid NAT and IGW data charges ★★★ Core
Single biggest “cost-optimised network” pattern on the exam. Gateway endpoints (S3, DynamoDB) are free and bypass NAT entirely. Pattern: “cut NAT Gateway data-processing fees” → gateway endpoints.
Core docs
- Gateway endpoints (S3 and DynamoDB) — free; route-table based; biggest cost win for any VPC that hits S3 a lot through a NAT Gateway
- Interface endpoints (PrivateLink) — hourly charge per ENI per AZ + per-GB; usually worth it for high-volume API traffic that would otherwise traverse a NAT Gateway
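A quick calculation shows why endpoints are the standard answer here. The sketch below uses the $0.045/GB NAT processing figure quoted in section 20.1; the interface endpoint prices ($0.01 per hour per AZ and $0.01 per GB) and the 730-hour month are assumptions for illustration, so check the PrivateLink pricing page for your region.

```python
HOURS_PER_MONTH = 730            # common approximation (assumption)
NAT_PROCESSING_PER_GB = 0.045    # from the rules of thumb in section 20.1

def nat_processing_cost(gb: float) -> float:
    """NAT Gateway data-processing fee that a gateway endpoint removes entirely."""
    return gb * NAT_PROCESSING_PER_GB

def interface_endpoint_cost(gb: float, azs: int,
                            hourly_per_az: float = 0.01,   # assumed price
                            per_gb: float = 0.01) -> float:  # assumed price
    """Monthly cost of one interface endpoint deployed in `azs` AZs."""
    return azs * hourly_per_az * HOURS_PER_MONTH + gb * per_gb

def interface_break_even_gb(azs: int,
                            hourly_per_az: float = 0.01,
                            per_gb: float = 0.01) -> float:
    """Monthly GB above which an interface endpoint beats NAT processing fees."""
    return azs * hourly_per_az * HOURS_PER_MONTH / (NAT_PROCESSING_PER_GB - per_gb)

print(f"10 TB to S3 via NAT: ${nat_processing_cost(10_000):.2f} vs $0 via gateway endpoint")
print(f"Interface endpoint in 2 AZs breaks even near {interface_break_even_gb(2):.0f} GB/month")
```

The NAT Gateway's hourly charge is left out because it stays on the bill unless the endpoint lets you remove the NAT Gateway altogether.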
20.3 CloudFront and edge caching for cost ★★ Important
Egress from CloudFront is generally cheaper than direct from origin, and transfer from an AWS origin to CloudFront is free. The Security Savings Bundle gives a discount in exchange for a monthly commitment.
Core docs
- CloudFront pricing — data transfer out from CloudFront is generally cheaper than direct from origin, and transfer from an AWS origin to CloudFront is free
- CloudFront Security Savings Bundle — discount in exchange for a monthly commitment
- CloudFront in front of S3 — cache hits avoid S3 GET requests and S3 egress, and the distribution adds a security layer
20.4 Direct Connect vs VPN economics ★★ Important
Core docs
- Direct Connect pricing — port-hour + data-out per GB (cheaper than internet egress at scale)
- Site-to-Site VPN pricing — per-tunnel-hour + standard data-out
- When VPN is enough — small steady traffic, occasional high traffic, no SLA needs
Chapter 21 — Cost management and governance
Maps to Task Statements 4.1–4.4 — Cross-cutting cost optimization
Knowledge of:
- AWS cost management service features (for example, cost allocation tags, multi-account billing)
- AWS cost management tools with appropriate use cases (for example, Cost Explorer, AWS Budgets, AWS Cost and Usage Report)
- AWS Organizations consolidated billing and cost allocation
Skills in:
- Designing and implementing cost allocation tags
- Reviewing existing workloads for cost optimizations
- Using AWS cost management tools to identify cost savings opportunities
- Determining the appropriate AWS purchasing options (for example, Savings Plans, Reserved Instances)
21.1 Cost Explorer and Cost & Usage Reports ★★ Important
Cost Explorer for visualisation, filtering, forecasting. CUR for hourly line-level data into S3, queryable from Athena or QuickSight. Cost Anomaly Detection for ML-based alerts on unusual spend.
Core docs
- AWS Cost Explorer — visualisation, filtering, forecasting
- Cost and Usage Reports (CUR) — hourly line-level data delivered to S3, queryable via Athena/QuickSight
- Cost Anomaly Detection — ML-based alerts on unusual spend
21.2 AWS Budgets ★★ Important
Cost / usage / RI-SP coverage and utilisation budgets, with SNS alerts. Budget Actions can auto-stop EC2 or apply restrictive IAM / SCP when a threshold is crossed — appears in “enforce a hard cap” questions.
Core docs
- AWS Budgets — cost, usage, RI/Savings Plans coverage and utilisation, with SNS or chatbot alerts
- Budget Actions — auto-apply IAM/SCP, or stop EC2 or RDS instances, when a threshold is crossed
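For a sense of what a cost budget with an SNS alert looks like in practice, the sketch below builds the request as a plain dictionary in the shape boto3's Budgets `create_budget` call accepts. The budget name, the $100 limit, the 80% threshold, the account ID, and the topic ARN are all placeholder values.

```python
import json

def build_cost_budget_request(account_id: str, limit_usd: str, sns_topic_arn: str) -> dict:
    """Return keyword arguments for a monthly cost budget with one alert."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "monthly-cost-cap",  # hypothetical name
            "BudgetLimit": {"Amount": limit_usd, "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                # Alert when actual spend exceeds 80% of the limit
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": 80.0,
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [
                    {"SubscriptionType": "SNS", "Address": sns_topic_arn}
                ],
            }
        ],
    }

request = build_cost_budget_request(
    account_id="111122223333",  # placeholder account ID
    limit_usd="100",
    sns_topic_arn="arn:aws:sns:us-east-1:111122223333:budget-alerts",  # placeholder ARN
)
print(json.dumps(request, indent=2))
```

With boto3 and credentials available, the dictionary could be unpacked into the call as `boto3.client("budgets").create_budget(**request)`. An alert only notifies; enforcing a cap is the job of a Budget Action.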
21.3 Tagging strategy ★ Light
Cost-allocation tags must be activated in the billing console before they appear in CUR. Tag policies in Organizations enforce a tagging schema. Tested rarely — a single “how do I split costs by team?” question.
Core docs
- Tagging best practices (whitepaper)
- Cost allocation tags — must be activated in the billing console before they appear in CUR
- Tag policies — enforce a tagging schema across an Organization
21.4 Trusted Advisor ★★ Important
Checks span cost optimization, performance, security, fault tolerance, and service limits (the current console also lists an operational excellence category). Tested as “which service surfaces idle resources / underutilised RIs / unused EIPs?” → Trusted Advisor cost checks.
Core docs
- AWS Trusted Advisor — check categories: cost optimization, performance, security, fault tolerance, service limits, operational excellence
- Trusted Advisor check reference
21.5 AWS Organizations consolidated billing ★ Light
Volume-tier discounts and RI / Savings Plans sharing across accounts in an Org. Recognise the one-line description and move on — depth here is exam noise.
Core docs
- Consolidated billing — volume-tier and RI/Savings Plans sharing across accounts in an Organization
- How Savings Plans apply across accounts
Deeper reading
- Well-Architected Cost Optimization Pillar
- Laying the Foundation — Setting Up Your Environment for Cost Optimization (whitepaper)
- AWS Cloud Financial Management blog
Study tips
Schedule the exam before you feel ready. The deadline produces the focus. Two weeks out, sit a full-length practice exam under timed conditions; the gap between your practice score and the pass mark tells you where to spend the remaining time.
SAA-C03 questions are scenario-based and verbose. Read the question once for context, then read the answers, then re-read the question with the answers in mind. Half the time the wrong answers are eliminated by a single qualifier — “highly available”, “least operational overhead”, “most cost-effective”, “minimal application changes” — that’s easy to miss on the first pass.
If two answers are technically correct, the right one is the one that aligns with the qualifier. “Highly available” rules out single-AZ; “least operational overhead” favours managed services over self-managed; “most cost-effective” favours Spot, Savings Plans, S3 Intelligent-Tiering, and serverless over provisioned capacity.
Read the FAQ for every service in the exam guide. They are short, dense, and disproportionately tested. The Well-Architected Framework is the design lens behind every question — when in doubt, pick the answer closest to a Well-Architected best practice.
Good luck.