AWS Certified Solutions Architect — Associate (SAA-C03)

A documentation-first study guide. AWS writes the exam from its own documentation, so reading the docs is the highest-leverage thing you can do. This guide is a curated index into the canonical references, FAQs, and a selection of whitepapers — organised around the four exam domains, not around services.

Maps to the published AWS Certified Solutions Architect — Associate (SAA-C03) exam guide. Domain weights and task statements are quoted from that PDF.

About the exam

Current exam code: SAA-C03 (released August 2022). No C04 announcement as of July 2026.

Format: 65 questions (50 scored + 15 unscored) · 130 minutes · $150 USD · scaled score 100–1000, pass at 720.

The four domains:

Domain 1 — Design Secure Architectures — 30%
Domain 2 — Design Resilient Architectures — 26%
Domain 3 — Design High-Performing Architectures — 24%
Domain 4 — Design Cost-Optimized Architectures — 20%

Primary official sources (bookmark these):

Whitepapers worth reviewing:

These can be pretty long (particularly the Well-Architected Framework), so don't go too far down the rabbit hole with them if you want to make quick progress with your exam study.

AWS Well-Architected Framework — the lens through which every SAA question is framed. Four of its six pillars (Security, Reliability, Performance Efficiency, and Cost Optimization) map almost one-to-one onto the four exam domains.
Overview of Amazon Web Services — a structured tour of the service catalogue.
AWS Fault Isolation Boundaries — the cleanest articulation of how AZs, regions, and partitions actually behave under failure.
Disaster Recovery of Workloads on AWS — the source for the four DR pattern names the exam uses.
Well-Architected Security Pillar — Incident response — for Domain 1.

Priority tiers: The published domain weights (30/26/24/20) tell you how the exam is balanced across the four domains, but they don't tell you that within each domain a handful of services account for most of the questions. Every section in this guide carries a tier badge based on triangulating the AWS exam guide, the experience reports of recent test takers, and the patterns that appear in the practice-exam community:

★★★ Core Heavily tested. Multiple questions will lean on this. Spend hours, not minutes — if you don't know it well, you fail.
★★ Important Reliably tested, usually one or two questions. Read every linked page in the section, do the FAQ, understand the comparison points. A few hours per topic.
★ Light Known to appear, but typically as one distinguishing question or as wrong-answer distractors. Skim the docs, learn the one-line distinction, move on. Twenty minutes to an hour.

For an 8–12 week prep cycle the rough split that the data supports is about 60% of your time on Core topics, 30% on Important, and 10% on Light. The biggest single concentration of questions across the whole exam is the cluster around VPC + Security Groups + S3 + EC2 + IAM + ELB + RDS + DynamoDB + Lambda + CloudFront — know those ten cold and you have the foundation of a pass.

How to use this guide:

Each section opens with a one-paragraph summary explaining what to focus on, then has up to three link sections: Core docs (user/developer guides — the canonical reference), FAQ (exam writers love edge cases from FAQs — do not skip), and Deeper reading (whitepapers, blog posts, re:Post articles).
If a link 404s, AWS has reorganised the docs. Search the page title to find the new location — the content almost always still exists.
Read every FAQ for every Important and Core service. They are short, dense, and disproportionately tested.
The What's New feed is worth a weekly scan in the last month before your exam, but remember: the SAA-C03 exam lags new launches by ~12 months. Don't memorise yesterday's announcement.

Part I — Domain 1: Design Secure Architectures (30%)

The largest domain by weight. The exam treats security as a design constraint that shapes every other choice — least privilege, encryption everywhere, defence in depth, and audit trails are the recurring themes.

Chapter 1 — Identity and access management

Maps to Task Statement 1.1 — Design secure access to AWS resources

Knowledge of:

Access controls and management across multiple accounts
AWS federated access and identity services (for example, AWS IAM Identity Center, AWS IAM)
AWS global infrastructure (for example, Availability Zones, AWS Regions)
AWS security best practices (for example, the principle of least privilege)
The AWS shared responsibility model

Skills in:

Applying AWS security best practices to IAM users and root users
Designing a flexible authorization model that includes IAM users, groups, roles, and policies
Designing a role-based access control strategy
Designing a security strategy for multiple AWS accounts
Determining the appropriate use of resource policies for AWS services
Determining when to federate a directory service with IAM roles

1.1 IAM core ★★★ Core

The bedrock of every security question. Know identity-based vs resource-based policies, the explicit deny → explicit allow → implicit deny evaluation order, users vs groups vs roles, and MFA basics. Expect 3–5 questions to lean on this.
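The evaluation order above can be sketched in a few lines. This is a simplified model, assuming a single account and plain Allow/Deny statements only (no SCPs, permissions boundaries, session policies, or conditions). `Statement` and `evaluate` are hypothetical names for illustration, and matching here is case-sensitive wildcard matching, which is looser than IAM's real rules.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple


class Statement(NamedTuple):
    effect: str    # "Allow" or "Deny"
    action: str    # may contain "*" wildcards, e.g. "s3:*"
    resource: str  # may contain "*" wildcards


def evaluate(statements: list[Statement], action: str, resource: str) -> str:
    """Return the decision for one request against a flat list of statements."""
    matching = [
        s for s in statements
        if fnmatchcase(action, s.action) and fnmatchcase(resource, s.resource)
    ]
    # 1. Any matching explicit deny wins, regardless of allows.
    if any(s.effect == "Deny" for s in matching):
        return "ExplicitDeny"
    # 2. Otherwise a matching explicit allow grants access.
    if any(s.effect == "Allow" for s in matching):
        return "Allow"
    # 3. Nothing matched: the request is denied by default.
    return "ImplicitDeny"


policies = [
    Statement("Allow", "s3:*", "arn:aws:s3:::reports/*"),
    Statement("Deny", "s3:DeleteObject", "arn:aws:s3:::reports/*"),
]
print(evaluate(policies, "s3:GetObject", "arn:aws:s3:::reports/q1.csv"))
print(evaluate(policies, "s3:DeleteObject", "arn:aws:s3:::reports/q1.csv"))
print(evaluate(policies, "ec2:StartInstances", "arn:aws:ec2:::instance/i-0"))
```

The three calls exercise the three outcomes in order: an allow, an explicit deny that overrides the broader allow, and an implicit deny for an action no statement mentions.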

Core docs

What is IAM?
IAM identities — users, groups, and roles
Policies and permissions
Identity-based vs resource-based policies
Policy evaluation logic — explicit deny, explicit allow, implicit deny order
IAM security best practices
Multi-factor authentication

FAQ

IAM FAQs

1.2 IAM Identity Center and federation ★★ Important

Tested as the distinction between workforce identity from an external IdP → AWS (Identity Center / SAML / OIDC) versus application end-user identity (Cognito user pools and identity pools). The exam likes the boundary; deep configuration is rare.

Core docs

What is IAM Identity Center? (formerly AWS SSO)
Getting started with Identity Center
Identity providers and federation
SAML 2.0 federation
OIDC federation
Amazon Cognito user pools and identity pools — for application end-user identity, not workforce

1.3 STS and role assumption ★★★ Core

Cross-account access patterns are an exam favourite. Know AssumeRole, the trust-policy + permissions-policy split, and especially the External ID confused-deputy mitigation — it appears regularly.
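As a rough illustration of the trust-policy half of that split, the sketch below builds a role trust policy that only lets a named account assume the role when it supplies the agreed External ID. The account ID and External ID are placeholder values, and `trust_policy` is a hypothetical helper, not an AWS API.

```python
import json


def trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Build a trust policy for a role assumed by a third-party account."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Who may call sts:AssumeRole on this role.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The caller must also present the shared External ID,
                # which is what defeats the confused-deputy scenario.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values for illustration only.
print(json.dumps(trust_policy("111122223333", "example-external-id"), indent=2))
```

What the role can do once assumed lives in a separate permissions policy attached to the role; the trust policy only answers "who may assume it, and under what condition".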

Core docs

Temporary security credentials
AWS STS API reference — AssumeRole, AssumeRoleWithWebIdentity, GetSessionToken
Roles terms and concepts
External ID for cross-account access — confused-deputy mitigation; tested often
Switching to an IAM role (CLI)

1.4 Service Control Policies (SCPs) and Organizations ★★ Important

Know what an SCP can and can't do (it caps permissions, doesn't grant them; doesn't apply to the management account). OUs and inheritance show up in scenarios about restricting actions across many accounts.
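The "caps, doesn't grant" behaviour can be pictured as a set intersection. The sketch below is a deliberately simplified model, assuming policies reduce to flat sets of allowed action names (no explicit denies, conditions, or OU inheritance); `effective_actions` is a hypothetical name.

```python
def effective_actions(
    identity_allows: set[str],
    scp_allows: set[str],
    is_management_account: bool = False,
) -> set[str]:
    """Actions a principal can actually perform under an SCP."""
    if is_management_account:
        # SCPs do not apply to the management account.
        return set(identity_allows)
    # An SCP never adds permissions: the result is at most what IAM granted.
    return identity_allows & scp_allows


iam = {"s3:GetObject", "ec2:RunInstances"}
scp = {"s3:GetObject", "s3:PutObject"}
print(effective_actions(iam, scp))
print(effective_actions(iam, scp, is_management_account=True))
```

Note that `s3:PutObject` never appears in the result: the SCP permits it, but no identity policy granted it.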

Core docs

What is AWS Organizations?
Service control policies (SCPs)
SCP evaluation — what an SCP can and cannot do
Organizational units (OUs)
AWS Control Tower — landing-zone automation atop Organizations

FAQ

AWS Organizations FAQs

1.5 Permissions boundaries and access analysis ★ Light

Rare at SAA level. Know that permissions boundaries cap the maximum permissions an IAM entity can be given (used to delegate IAM creation safely), and that Access Analyzer can generate least-privilege policies from CloudTrail. 20 minutes is enough.

Core docs

Permissions boundaries for IAM entities
IAM Access Analyzer — identifies external access and unused permissions
Access Analyzer policy generation — produces least-privilege policies from CloudTrail data
Troubleshooting "Access Denied" — useful mental model for the policy stack

Deeper reading

Organizing Your AWS Environment Using Multiple Accounts — the canonical multi-account whitepaper
IAM policy types — how and when to use them

Chapter 2 — Securing data at rest and in transit

Maps to Task Statements 1.2 and 1.3 — Design secure workloads and applications; Determine appropriate data security controls

Knowledge of:

Application configuration and credentials security
AWS service endpoints
Control ports, protocols, and network traffic on AWS
Secure application access
Security services with appropriate use cases (for example, Amazon Cognito, Amazon GuardDuty, Amazon Macie)
Threat vectors external to AWS (for example, DDoS, SQL injection)
Data access and governance
Data recovery
Data retention and classification
Encryption and appropriate key management

Skills in:

Designing VPC architectures with security components (for example, security groups, route tables, network ACLs, NAT gateways)
Determining network segmentation strategies (for example, using public subnets and private subnets)
Integrating AWS services to secure applications (for example, AWS Shield, AWS WAF, IAM Identity Center, AWS Secrets Manager)
Securing external network connections to and from the AWS Cloud (for example, VPN, AWS Direct Connect)
Aligning AWS technologies to meet compliance requirements
Encrypting data at rest (for example, AWS KMS)
Encrypting data in transit (for example, ACM using TLS)
Implementing access policies for encryption keys
Implementing data backups and replications
Implementing policies for data access, lifecycle, and protection
Rotating encryption keys and renewing certificates

2.1 AWS KMS ★★★ Core

Encryption is a recurring theme across the exam. Know symmetric vs asymmetric, key policies (the only resource policy you can't bypass with an identity policy alone), grants, multi-region keys, and the services that integrate. CloudHSM is the answer when "FIPS 140-3 Level 3" or "single-tenant" appears.

Core docs

What is AWS KMS?
KMS concepts — keys, aliases, grants, key policies
Symmetric vs asymmetric KMS keys
Key policies — the only resource policy you can't replace with an identity policy alone
Grants — short-lived programmatic permission to use a key
Multi-Region keys
Importing key material (BYOK)
AWS services that integrate with KMS
AWS CloudHSM — when FIPS 140-3 Level 3 or single-tenant HSM is required

2.2 Secrets Manager and Parameter Store ★★ Important

The choice between Secrets Manager and Parameter Store is a recurring question. Secrets Manager rotates automatically (built-in for the RDS family, Lambda for everything else); Parameter Store is cheaper, doesn't rotate, and stores config plus references to secrets. Values larger than 4 KB need advanced parameters (up to 8 KB).
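The decision can be written down as a small rule of thumb. This is only a sketch of the exam-level heuristic: `pick_store` is a hypothetical helper, and it considers just rotation and value size (4 KB standard and 8 KB advanced parameter limits), ignoring cost, cross-account sharing, and other real-world factors.

```python
STANDARD_LIMIT_BYTES = 4 * 1024   # standard parameter value limit
ADVANCED_LIMIT_BYTES = 8 * 1024   # advanced parameter value limit


def pick_store(value: str, needs_rotation: bool) -> str:
    """Suggest a store for a value based on rotation need and size."""
    if needs_rotation:
        # Automatic rotation is the Secrets Manager differentiator.
        return "Secrets Manager"
    size = len(value.encode("utf-8"))
    if size <= STANDARD_LIMIT_BYTES:
        return "Parameter Store (standard)"
    if size <= ADVANCED_LIMIT_BYTES:
        return "Parameter Store (advanced)"
    raise ValueError("value exceeds the 8 KB advanced-parameter limit")


print(pick_store("db-password", needs_rotation=True))
print(pick_store("feature_flag=on", needs_rotation=False))
print(pick_store("x" * 5000, needs_rotation=False))
```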

Core docs

What is Secrets Manager?
Rotating secrets — built-in for RDS, Aurora, Redshift, DocumentDB; Lambda for everything else
Systems Manager Parameter Store
Advanced parameters (8 KB, parameter policies) vs standard parameters
Secrets Manager vs Parameter Store — when to use which

FAQ

Secrets Manager FAQs

2.3 ACM and TLS ★★ Important

The exam loves the rule that CloudFront distributions need certs in us-east-1 — memorise it. Also know DNS validation for automation and AWS Private CA for internal certificates.

Core docs

AWS Certificate Manager overview
Services that integrate with ACM — ALB, NLB, CloudFront, API Gateway, App Runner
DNS validation (vs email validation; use DNS for automation)
AWS Private CA — for internal certificates
Region requirement: CloudFront certs must be in us-east-1 — exam favourite

2.4 Encryption across services ★★★ Core

Know the S3 encryption types (SSE-S3, SSE-KMS, SSE-C), default bucket encryption (now on automatically for all buckets), Bucket Keys (which cut KMS request cost), and that EBS / EFS / RDS / DynamoDB all encrypt at rest with KMS. Pattern: "highly sensitive data" usually wants SSE-KMS with a customer-managed key.
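For the "SSE-KMS with a customer-managed key" pattern, a default-encryption configuration looks roughly like the structure below. The sketch only builds the configuration as a Python dict; the key ARN is a placeholder and `default_encryption_config` is a hypothetical helper.

```python
import json


def default_encryption_config(kms_key_arn: str) -> dict:
    """Bucket default encryption: SSE-KMS with a specific key and Bucket Keys on."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                # Bucket Keys reduce the number of requests S3 makes to KMS.
                "BucketKeyEnabled": True,
            }
        ]
    }


# Placeholder ARN for illustration only.
config = default_encryption_config(
    "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
)
print(json.dumps(config, indent=2))
```

With boto3 installed and credentials configured, a dict of this shape is what the S3 client's `put_bucket_encryption` call accepts as its `ServerSideEncryptionConfiguration` argument.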

Core docs

S3 encryption — SSE-S3, SSE-KMS, DSSE-KMS, SSE-C
S3 default bucket encryption (now on by default for all buckets)
S3 Bucket Keys — reduces KMS request cost dramatically for SSE-KMS
EBS encryption
EFS encryption at rest and in transit
RDS encryption at rest
DynamoDB encryption at rest
Redshift encryption

Deeper reading

Well-Architected Security Pillar — Data protection

2.5 Macie ★ Light

Almost always tested as a single "which service detects PII or credit-card numbers in S3?" question — the answer is Macie. Read the one-paragraph "What is Macie" page and the FAQ, learn the Macie / Inspector / GuardDuty distinction, and move on. ~30 minutes.

Core docs

What is Amazon Macie?
Sensitive data discovery jobs
Managed data identifiers (PII, PHI, financial)

FAQ

Macie FAQs

Chapter 3 — Network security

Maps to Task Statement 1.2 — Design secure workloads and applications

Knowledge of:

Application configuration and credentials security
AWS service endpoints
Control ports, protocols, and network traffic on AWS
Secure application access
Security services with appropriate use cases
Threat vectors external to AWS

Skills in:

Designing VPC architectures with security components (for example, security groups, route tables, network ACLs, NAT gateways)
Determining network segmentation strategies (for example, using public subnets and private subnets)
Integrating AWS services to secure applications (for example, AWS Shield, AWS WAF, IAM Identity Center, AWS Secrets Manager)
Securing external network connections to and from the AWS Cloud (for example, VPN, AWS Direct Connect)

3.1 Security Groups and NACLs ★★★ Core

Stateful (SGs) vs stateless (NACLs) is fundamental. Know that SGs reference other SGs as a source (the canonical multi-tier pattern), and the NACL ephemeral-port trap that the exam loves to test.
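The ephemeral-port trap is easier to see in a toy model. The sketch below assumes allow-only rules expressed as port ranges and ignores protocols, CIDRs, and rule numbering; `Rule`, `sg_round_trip`, and `nacl_round_trip` are hypothetical names.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    low: int   # first allowed port
    high: int  # last allowed port (inclusive)


def allowed(rules: list[Rule], port: int) -> bool:
    return any(r.low <= port <= r.high for r in rules)


def sg_round_trip(inbound: list[Rule], server_port: int) -> bool:
    # Stateful: if the request is allowed in, the reply is allowed out automatically.
    return allowed(inbound, server_port)


def nacl_round_trip(
    inbound: list[Rule], outbound: list[Rule], server_port: int, client_port: int
) -> bool:
    # Stateless: the reply is evaluated separately, and it leaves towards the
    # client's ephemeral source port, not towards port 443.
    return allowed(inbound, server_port) and allowed(outbound, client_port)


https_only = [Rule(443, 443)]
ephemeral = [Rule(1024, 65535)]

print(sg_round_trip(https_only, 443))
print(nacl_round_trip(https_only, https_only, 443, client_port=50000))
print(nacl_round_trip(https_only, ephemeral, 443, client_port=50000))
```

The second call fails even though "443 is open both ways", which is exactly the mistake the exam scenarios are built around; the third call succeeds once the outbound rule covers the ephemeral range.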

Core docs

Security groups — stateful, default-deny inbound, default-allow outbound
Network ACLs — stateless, evaluated by rule number, applied at subnet level
Security group rules — referencing other SGs as a source/destination is the core pattern
NACL recommended rules — the ephemeral-port trap

3.2 VPC endpoints and endpoint policies ★★★ Core

Gateway endpoints (S3 and DynamoDB) are free and route-table based; interface endpoints (PrivateLink) cost per hour + per GB. "Without traversing the internet / NAT Gateway" phrasing always points at endpoints. Endpoint policies further restrict what's reachable through them.
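One common way to enforce "only through the endpoint" from the S3 side is a bucket policy that denies any request not arriving via a specific VPC endpoint. The sketch below builds such a policy as a dict; the bucket name and endpoint ID are placeholders and `vpce_only_bucket_policy` is a hypothetical helper.

```python
import json


def vpce_only_bucket_policy(bucket: str, vpc_endpoint_id: str) -> dict:
    """Deny all S3 actions on the bucket unless the request came via the endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnlessFromVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # aws:SourceVpce is set when the request traverses a VPC endpoint.
                "Condition": {
                    "StringNotEquals": {"aws:SourceVpce": vpc_endpoint_id}
                },
            }
        ],
    }


# Placeholder names for illustration only.
policy = vpce_only_bucket_policy("example-bucket", "vpce-1a2b3c4d")
print(json.dumps(policy, indent=2))
```

Because this is an explicit deny, it also blocks console and administrator access from outside the VPC, so it is worth testing on a non-critical bucket first.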

Core docs

VPC endpoints overview
Gateway endpoints (S3, DynamoDB) — free, route-table based
Interface endpoints (PrivateLink) — ENI-based, hourly + per-GB cost
VPC endpoint policies
S3 bucket policies that require a specific VPC endpoint

3.3 AWS WAF ★★ Important

Know the resources WAF can attach to: CloudFront, ALB, API Gateway, AppSync, Cognito, App Runner, Verified Access — not NLB. Rate-based rules for DDoS-style abuse; Managed Rules for OWASP Top 10.

Core docs

AWS WAF developer guide
Rule statements — IP, geo, regex, size, SQLi, XSS, rate-based
AWS Managed Rules — Core rule set, known bad inputs, IP reputation
Supported resources — CloudFront, ALB, API Gateway, AppSync, Cognito, App Runner, Verified Access

FAQ

AWS WAF FAQs

3.4 AWS Shield ★ Light

Standard is free and on by default; Advanced is paid and adds access to the Shield Response Team (SRT, formerly the DDoS Response Team), cost protection, and tighter integration with WAF. Usually one question max — recognise the distinction and move on.

FAQ

AWS Shield FAQs

3.5 Network Firewall and Firewall Manager ★ Light

More SCS / ANS territory than SAA. At SAA level know that Network Firewall exists for stateful packet inspection in VPCs and that Firewall Manager centralises WAF / Shield / SG / Network-Firewall policies across an Org. Skim and move on.

Core docs

What is AWS Network Firewall?
Deployment architectures — distributed vs centralised inspection VPC
AWS Firewall Manager — central WAF / Shield Advanced / Network Firewall / SG policy management across an Org

Chapter 4 — Compute access and threat detection

Maps to Task Statement 1.2 — Design secure workloads and applications

Knowledge of:

Application configuration and credentials security
AWS service endpoints
Control ports, protocols, and network traffic on AWS
Secure application access
Security services with appropriate use cases (for example, Amazon Cognito, Amazon GuardDuty, Amazon Macie)
Threat vectors external to AWS (for example, DDoS, SQL injection)

Skills in:

Designing VPC architectures with security components
Determining network segmentation strategies
Integrating AWS services to secure applications
Securing external network connections to and from the AWS Cloud

4.1 EC2 access without SSH ★★ Important

Session Manager is the modern answer to "connect to EC2 without inbound 22" — no bastion, no key pairs, full audit trail in CloudTrail. Pattern: "administer EC2 without exposing SSH" → SSM Session Manager.

Core docs

AWS Systems Manager Session Manager — no inbound SSH, no bastion, fully audited
EC2 Instance Connect — short-lived SSH keys via IAM
SSM Fleet Manager
Default Host Management Configuration — instance profile-free SSM onboarding

4.2 IAM roles for compute ★★★ Core

Critical pattern across the whole exam. EC2 instance profiles, ECS task roles, Lambda execution roles, EKS IRSA / Pod Identity. Never embed credentials in code; always use a role. IMDSv2 protects against SSRF — require it on every instance.
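IMDSv2 is a two-step, session-oriented flow: first a PUT to obtain a token, then a GET that presents it. The sketch below uses only the standard library; the helper names are hypothetical, and `fetch_role_name` only works when run on an EC2 instance, so it is defined but not called here.

```python
import urllib.request

IMDS = "http://169.254.169.254"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1: PUT request that asks IMDS for a session token."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path: str, token: str) -> urllib.request.Request:
    """Step 2: GET request that presents the token from step 1."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_role_name(timeout: float = 2.0) -> str:
    """Return the name of the attached instance-profile role (EC2 only)."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    req = metadata_request("iam/security-credentials/", token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode().strip()


print(token_request().get_method(), token_request().full_url)
```

A typical SSRF bug lets an attacker trigger a simple GET but not a PUT with a custom header, which is why requiring the token step closes off the classic credential-theft path.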

Core docs

IAM roles for EC2 (instance profile)
IAM roles for ECS tasks
IAM roles for service accounts (EKS, IRSA)
EKS Pod Identity — newer, simpler alternative to IRSA
Lambda execution roles
IMDSv2 — protects against SSRF; require it on every instance

4.3 Inspector ★ Light

Tested as one option in "which service finds OS / package / network vulnerabilities on EC2 / ECR images / Lambda?" — the answer is Inspector. Skim the "What is" page and the FAQ.

Core docs

What is Amazon Inspector?
Resource scanning — EC2, ECR images, Lambda
Findings — package and network reachability

FAQ

Inspector FAQs

4.4 GuardDuty ★★ Important

Threat detection from VPC Flow Logs, DNS logs, CloudTrail, EKS audit logs, S3, RDS login events. Pattern: "detect compromised EC2 / unusual API calls / port-scan activity" → GuardDuty. Worth knowing the data sources it consumes.

Core docs

What is GuardDuty?
Data sources — VPC Flow Logs, DNS logs, CloudTrail, S3, EKS audit, RDS login events, Lambda, EBS malware scanning
Finding types
GuardDuty in an Organization

FAQ

GuardDuty FAQs

Chapter 5 — Auditing, compliance, and visibility

Maps to Task Statement 1.3 — Determine appropriate data security controls

Knowledge of:

Data access and governance
Data recovery
Data retention and classification
Encryption and appropriate key management

Skills in:

Aligning AWS technologies to meet compliance requirements
Encrypting data at rest (for example, AWS KMS)
Encrypting data in transit (for example, ACM using TLS)
Implementing access policies for encryption keys
Implementing data backups and replications
Implementing policies for data access, lifecycle, and protection
Rotating encryption keys and renewing certificates

5.1 CloudTrail ★★ Important

Management vs data vs Insights events is the canonical exam distinction. Organization trails span every account; log-file integrity validation matters for compliance scenarios. Data events (S3 object-level, Lambda invocations) are off by default and billable.

Core docs

What is CloudTrail?
Management events vs data events vs Insights events
Organization trails — single trail spanning every account
Log file integrity validation
CloudTrail Lake — managed event store with SQL query

5.2 AWS Config ★★ Important

"Continuously evaluates whether resources match a desired configuration." The answer for "audit configuration drift", "detect non-compliant resources", or "auto-remediate via SSM Automation". Conformance packs map to compliance frameworks (PCI, HIPAA, NIST).

Core docs

What is AWS Config?
AWS Config managed rules
Conformance packs — prepackaged compliance frameworks (PCI, HIPAA, NIST)
Auto-remediation via SSM Automation
Aggregators — multi-account, multi-region rollup

5.3 Security Hub ★ Light

Aggregates findings from GuardDuty, Inspector, Macie, and partner tools into a single dashboard against standards (CIS, PCI, NIST). Tested as "which service gives a single pane of glass for security findings?" Skim the docs.

Core docs

What is Security Hub?
Security standards — AWS Foundational Security Best Practices, CIS, PCI DSS, NIST
AWS Security Finding Format (ASFF) — common schema for findings from GuardDuty, Inspector, Macie, partners

FAQ

Security Hub FAQs

5.4 Audit Manager and Detective ★ Light

Audit Manager: collects evidence and maps it to controls for audit reports. Detective: graph-based investigation across CloudTrail, VPC Flow Logs, and GuardDuty findings. Both rare — recognise the one-line description and move on.

Core docs

AWS Audit Manager — collects evidence, maps to controls, produces audit reports
Amazon Detective — graph-based investigation across CloudTrail, VPC Flow Logs, GuardDuty findings

Part II — Domain 2: Design Resilient Architectures (26%)

Resilience on AWS is the discipline of designing for the failure modes the platform actually has — single instances die, AZs partition, regions occasionally have bad days, and dependencies always fail. The exam tests whether you can pick the right pattern for a stated RTO/RPO and budget.

Chapter 6 — Availability Zone, Region, and DR foundations

Maps to Task Statements 2.1 and 2.2 — Design scalable and loosely coupled architectures; Design highly available and/or fault-tolerant architectures

Knowledge of:

API creation and management (for example, Amazon API Gateway, REST APIs)
AWS managed services with appropriate use cases (for example, AWS Transfer Family, Amazon SQS, Secrets Manager)
Caching strategies
Design principles for microservices
Event-driven architectures
Horizontal and vertical scaling
How to appropriately use edge accelerators (for example, CDN)
How to migrate applications into containers
Load balancing concepts (for example, Application Load Balancer)
Multi-tier architectures
Queuing and messaging concepts (for example, publish/subscribe)
Serverless technologies and patterns (for example, AWS Fargate, Lambda)
Storage types with associated characteristics (for example, object, file, block)
Container orchestration (for example, Amazon ECS, Amazon EKS)
When to use read replicas
Workflow orchestration (for example, AWS Step Functions)
AWS global infrastructure (for example, Availability Zones, Regions, Amazon Route 53)
Disaster recovery strategies

Skills in:

Designing event-driven, microservices, and/or multi-tier architectures based on requirements
Determining scaling strategies for components used in an architecture design
Determining the AWS services required to achieve loose coupling based on requirements
Determining when to use containers
Determining when to use serverless technologies and patterns
Recommending appropriate compute, storage, networking, and database technologies based on requirements
Using purpose-built AWS services for workloads
Determining automation strategies to ensure infrastructure integrity
Determining the AWS services required to provide a highly available and/or fault-tolerant architecture across AWS Regions or Availability Zones
Identifying metrics based on business requirements to deliver a highly available solution
Implementing designs to mitigate single points of failure
Implementing strategies to ensure the durability and availability of data (for example, backups)
Selecting an appropriate DR strategy to meet business requirements
Using AWS services that improve the reliability of legacy applications and applications not built for the cloud (for example, when application changes are not possible)
Using purpose-built AWS services for workloads

6.1 AZs, Regions, and edge locations ★★ Important

Foundational. AZs are the primary fault-isolation boundary; two AZs is the minimum for any "highly available" answer. Outposts / Local Zones / Wavelength appear as "low latency to on-prem / metro / 5G" scenarios.

Core docs

Regions, Availability Zones, and Local Zones
Global infrastructure overview
AWS Wavelength — 5G edge
AWS Outposts — AWS hardware on-prem
Local Zones — sub-region edge presence in major metros

Deeper reading

AWS Fault Isolation Boundaries — why AZs are the primary unit of fault isolation, and what regions and partitions add on top
Builder's Library — Static stability using Availability Zones

6.2 Disaster recovery strategies ★★★ Core

The four-pattern taxonomy — backup & restore, pilot light, warm standby, multi-site active/active — is required vocabulary. Multiple questions will use these names directly. Know the RTO / RPO trade-off and the cost ordering.

Core docs

DR options in the cloud — the canonical four-pattern taxonomy:
- Backup & restore — high RTO/RPO (hours), lowest cost.
- Pilot light — minimal infra warm in DR region, scaled out on failover.
- Warm standby — scaled-down full stack in DR region, scaled up on failover.
- Multi-site active/active — full capacity in both regions, near-zero RTO/RPO.
Disaster Recovery of Workloads on AWS (whitepaper)
Well-Architected Reliability Pillar
Route 53 health checks and DNS failover — the usual failover trigger
Route 53 Application Recovery Controller — readiness checks and routing controls
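The exam-style reasoning for the four patterns above is "pick the cheapest strategy that still meets the stated RTO". The sketch below encodes that rule. The RTO figures are illustrative placeholders chosen for this example, not numbers from AWS, and `cheapest_strategy` is a hypothetical helper.

```python
# Ordered cheapest first. The "typical RTO" minutes are assumptions made up
# for illustration; real values depend entirely on the workload.
STRATEGIES = [
    ("backup & restore", 24 * 60),
    ("pilot light", 60),
    ("warm standby", 10),
    ("multi-site active/active", 0),
]


def cheapest_strategy(required_rto_minutes: float) -> str:
    """Return the lowest-cost pattern whose assumed RTO meets the requirement."""
    for name, typical_rto in STRATEGIES:
        if typical_rto <= required_rto_minutes:
            return name
    # Nothing cheaper qualifies: fall back to the most capable pattern.
    return STRATEGIES[-1][0]


for rto in (48 * 60, 120, 15, 1):
    print(f"RTO {rto} min -> {cheapest_strategy(rto)}")
```

The same walk down the list works for RPO; in a real question, the tighter of the two requirements decides the answer.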

Chapter 7 — Compute resilience

Maps to Task Statement 2.2 — Design highly available and/or fault-tolerant architectures

Knowledge of:

AWS global infrastructure (for example, Availability Zones, Regions, Amazon Route 53)
AWS managed services with appropriate use cases (for example, Amazon Comprehend, Amazon Polly)
Basic networking concepts (for example, route tables)
Disaster recovery strategies (for example, backup and restore, pilot light, warm standby, multi-site active-active)
Distributed design patterns
Failover strategies
Immutable infrastructure
Load balancing concepts (for example, Application Load Balancer)
Proxy concepts (for example, Amazon RDS Proxy)
Service quotas and throttling (for example, how to configure the service quotas for a workload in a standby environment)
Storage options and characteristics (for example, durability, replication)
Workload visibility (for example, AWS X-Ray)

Skills in:

Determining automation strategies to ensure infrastructure integrity
Determining the AWS services required to provide a highly available and/or fault-tolerant architecture across AWS Regions or Availability Zones
Identifying metrics based on business requirements to deliver a highly available solution
Implementing designs to mitigate single points of failure
Implementing strategies to ensure the durability and availability of data (for example, backups)
Selecting an appropriate DR strategy to meet business requirements
Using AWS services that improve the reliability of legacy applications and applications not built for the cloud (for example, when application changes are not possible)
Using purpose-built AWS services for workloads

7.1 Auto Scaling ★★★ Core

Cornerstone of the resilience domain. Know launch templates (not legacy launch configurations), target-tracking scaling policies, mixed instance groups (On-Demand + Spot), and lifecycle hooks. Predictive scaling appears for known traffic patterns.
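To build intuition for target tracking, it helps to think of it as proportional control: if the metric is 50% above target, capacity should grow by roughly 50%. The sketch below is a simplified model under that assumption, not the service's actual algorithm (which also involves CloudWatch alarms, cooldowns, and warm-up); `target_tracking_desired` is a hypothetical name.

```python
import math


def target_tracking_desired(
    current_capacity: int,
    metric_value: float,
    target_value: float,
    min_size: int,
    max_size: int,
) -> int:
    """Proportional estimate of desired capacity, clamped to the group limits."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    # Scale capacity by how far the metric is from its target, rounding up
    # so the group errs on the side of availability.
    proposed = math.ceil(current_capacity * metric_value / target_value)
    return max(min_size, min(max_size, proposed))


# 4 instances averaging 75% CPU against a 50% target.
print(target_tracking_desired(4, 75.0, 50.0, min_size=2, max_size=10))
```

The clamp at the end reflects the fact that no scaling policy can take a group outside its configured minimum and maximum size.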

Core docs

EC2 Auto Scaling
Scaling policies — target tracking, step, simple, scheduled, predictive
Launch templates (use these — launch configurations are deprecated)
Mixed instance groups — combine On-Demand and Spot
Lifecycle hooks
Warm pools — pre-initialised instances for rapid scale-out
Application Auto Scaling — for ECS, DynamoDB, Aurora, etc.

FAQ

EC2 Auto Scaling FAQs

7.2 Elastic Load Balancing ★★★ Core

ALB vs NLB is one of the most-tested decisions on the exam. ALB for HTTP(S) and host/path routing; NLB for TCP/UDP, static IPs, and source-IP preservation; GLB for security-appliance insertion. Cross-zone behaviour differs by default (on for ALB, off for NLB).

Core docs

Elastic Load Balancing — what is it?
Load balancer comparison — ALB vs NLB vs GLB vs CLB
Application Load Balancer — Layer 7, host/path-based routing, native HTTPS termination
Network Load Balancer — Layer 4, static IPs, ultra-low latency, source IP preservation
Gateway Load Balancer — security-appliance insertion via GENEVE
Target groups, health checks, deregistration delay
Sticky sessions
Cross-zone load balancing — on by default for ALB, off by default for NLB

FAQ

ELB FAQs

7.3 Container resilience — ECS and EKS ★★ Important

Fargate vs EC2 launch type is the recurring decision: Fargate when "no servers to manage", EC2 when cost control or GPU is required. Service auto-scaling, deployment circuit breakers, and capacity providers appear in scenario questions.

Core docs

What is Amazon ECS?
EC2 launch type vs Fargate
ECS service definition — desired count, deployment config, placement strategies
ECS service auto-scaling
What is Amazon EKS?
EKS managed node groups
EKS on Fargate
Amazon ECR

7.4 Lambda concurrency and resilience ★★ Important

Reserved concurrency caps a function (and isolates it) at no extra charge; provisioned concurrency eliminates cold starts and is billed for as long as it is configured. Async invocation auto-retries twice and supports DLQs. SnapStart for Java / Python / .NET cold-start mitigation.
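The async retry behaviour (one initial attempt plus two retries, then the dead-letter queue) can be simulated locally. This is a toy model with no delays between attempts and a plain list standing in for the DLQ; `invoke_async` is a hypothetical name, not a Lambda API.

```python
def invoke_async(handler, event, max_retries: int = 2):
    """Run handler up to 1 + max_retries times; park the event in a DLQ on failure.

    Returns (result, attempts, dead_letter_queue).
    """
    dead_letter_queue = []
    attempts = 0
    for _ in range(1 + max_retries):
        attempts += 1
        try:
            return handler(event), attempts, dead_letter_queue
        except Exception:
            # A failed attempt is retried until the retry budget is used up.
            continue
    # All attempts failed: the event goes to the dead-letter queue.
    dead_letter_queue.append(event)
    return None, attempts, dead_letter_queue


def always_fails(event):
    raise RuntimeError("downstream unavailable")


result, attempts, dlq = invoke_async(always_fails, {"order_id": 42})
print(result, attempts, dlq)
```

Because the same event can be delivered up to three times, handlers behind async invocation should be idempotent.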

Core docs

What is AWS Lambda?
Concurrency, reserved concurrency, provisioned concurrency
Lambda in a VPC — Hyperplane ENIs, no more cold-start tax
Asynchronous invocation, retries, DLQs
Lambda SnapStart — sub-second cold starts for Java/Python/.NET

Chapter 8 — Storage resilience

Maps to Task Statement 2.2 — Design highly available and/or fault-tolerant architectures

Knowledge of:

AWS global infrastructure (for example, Availability Zones, Regions, Amazon Route 53)
Disaster recovery strategies (for example, backup and restore, pilot light, warm standby, multi-site active-active)
Distributed design patterns
Failover strategies
Immutable infrastructure
Storage options and characteristics (for example, durability, replication)

Skills in:

Determining automation strategies to ensure infrastructure integrity
Determining the AWS services required to provide a highly available and/or fault-tolerant architecture across AWS Regions or Availability Zones
Identifying metrics based on business requirements to deliver a highly available solution
Implementing designs to mitigate single points of failure
Implementing strategies to ensure the durability and availability of data (for example, backups)
Selecting an appropriate DR strategy to meet business requirements
Using AWS services that improve the reliability of legacy applications and applications not built for the cloud (for example, when application changes are not possible)
Using purpose-built AWS services for workloads

8.1 S3 — versioning, replication, Object Lock ★★★ Core

Heavily tested. Versioning + MFA Delete; SRR (same region) for compliance copies; CRR (cross-region) for DR; Replication Time Control for the 15-minute SLA; Object Lock (Governance vs Compliance) for WORM, ransomware, and regulatory scenarios.

Core docs

What is Amazon S3?
S3 versioning
S3 replication — SRR (same-region) and CRR (cross-region)
Replication Time Control (RTC) — 15-minute SLA, billable
S3 Object Lock — WORM, retention modes (Governance vs Compliance)
Multi-Region Access Points — global endpoint with failover
Lifecycle rules

8.2 EBS snapshots and recovery ★★ Important

Incremental, S3-backed, region-scoped (use Copy for cross-region). Fast Snapshot Restore eliminates lazy-loading; Snapshot Archive cuts long-term cost ~75%. Multi-Attach is io1/io2 single-AZ only and needs a cluster filesystem.

Core docs

Amazon EBS snapshots — incremental, S3-backed
Fast Snapshot Restore (FSR)
Multi-volume crash-consistent snapshots
EBS Multi-Attach — io1/io2 only, single-AZ, requires cluster-aware filesystem
Data Lifecycle Manager — automated snapshot/AMI lifecycle

8.3 EFS resilience ★★ Important

Regional (multi-AZ) vs One Zone storage classes; lifecycle (Standard → IA → Archive) for cost. Tested as "shared POSIX filesystem across many EC2 instances" with a perf-mode or throughput-mode twist.

Core docs

What is Amazon EFS?
Regional vs One Zone storage classes
EFS replication
EFS lifecycle management — Standard ↔ IA ↔ Archive

FAQ

EFS FAQs

8.4 FSx ★★ Important

Four flavours, each with a clear "when". Windows: AD-joined SMB. Lustre: HPC and ML training. ONTAP: multiprotocol SMB + NFS + iSCSI with SnapMirror. OpenZFS: high-perf NFS with snapshots and clones. Recognise the keyword that points at each.

Core docs

FSx for Windows File Server — SMB, AD-joined, multi-AZ
FSx for Lustre — HPC, scratch and persistent
FSx for NetApp ONTAP — multiprotocol (NFS, SMB, iSCSI), SnapMirror
FSx for OpenZFS — high-perf NFS, snapshots, clones

FAQ

FSx for Windows File Server FAQs

8.5 Storage Gateway ★★ Important

Hybrid scenario answer for "extend on-prem to S3". File Gateway (NFS/SMB to S3), Volume Gateway (iSCSI; cached or stored), Tape Gateway (VTL backed by S3 + Glacier). DataSync is the answer for one-time bulk transfers.

Core docs

S3 File Gateway — NFS/SMB to S3
FSx File Gateway
Volume Gateway — iSCSI cached/stored volumes backed by S3 + EBS snapshots
Tape Gateway — virtual tape library backed by S3 + Glacier

Chapter 9 — Database resilience

Maps to Task Statement 2.2 — Design highly available and/or fault-tolerant architectures

Knowledge of:

AWS global infrastructure (for example, Availability Zones, Regions)
Disaster recovery strategies (for example, backup and restore, pilot light, warm standby, multi-site active-active)
Distributed design patterns
Failover strategies
Proxy concepts (for example, Amazon RDS Proxy)
Storage options and characteristics (for example, durability, replication)

Skills in:

Determining automation strategies to ensure infrastructure integrity
Determining the AWS services required to provide a highly available and/or fault-tolerant architecture across AWS Regions or Availability Zones
Identifying metrics based on business requirements to deliver a highly available solution
Implementing designs to mitigate single points of failure
Implementing strategies to ensure the durability and availability of data (for example, backups)
Selecting an appropriate DR strategy to meet business requirements
Using AWS services that improve the reliability of legacy applications and applications not built for the cloud (for example, when application changes are not possible)
Using purpose-built AWS services for workloads

9.1 RDS Multi-AZ and read replicas ★★★ Core

Multi-AZ is HA (synchronous standby, automatic failover, no read traffic on the standby). Read replicas are async, scale reads, can cross regions, can be promoted. Multi-AZ DB clusters add two readable standbys. RDS Proxy is the answer for "serverless connection pooling".

Core docs

What is Amazon RDS?
Multi-AZ deployments — synchronous standby, automatic failover, no read traffic on standby
Multi-AZ DB clusters — semi-sync, two readable standbys
Read replicas — async, can be promoted, can cross regions
Automated backups and point-in-time recovery
RDS Proxy — connection pooling, faster failover

FAQ

RDS FAQs

9.2 Aurora resilience and Global Database ★★★ Core

Aurora's storage is six-way replicated across three AZs out of the box. Global Database does sub-second cross-region replication with RTO under one minute — the answer for "global app with low-RTO DR". Up to 15 replicas share storage with the writer.

Core docs

Aurora overview
Aurora high availability — 6 copies across 3 AZs, self-healing storage
Aurora Global Database — sub-second cross-region replication, RTO < 1 min
Aurora Serverless v2 — autoscales by ACU, no idle pause
Aurora Replicas — up to 15, share storage with primary

9.3 DynamoDB Global Tables and PITR ★★★ Core

Global Tables are multi-region, multi-active, eventually consistent — the canonical answer for "multi-region active-active key-value store". PITR restores to any second in the last 35 days. Streams + Lambda for change-data-capture.

Core docs

What is DynamoDB?
Global Tables — multi-region, multi-active, eventually consistent
Point-in-time recovery (PITR) — restore to any second in last 35 days
On-demand backup and restore
DynamoDB Streams

FAQ

DynamoDB FAQs

9.4 ElastiCache resilience ★★ Important

ElastiCache (Redis OSS / Valkey) with cluster mode + Multi-AZ adds replication and automatic failover; Memcached has neither (sharded but no replication or persistence). MemoryDB is the answer when you need a durable in-memory database as a primary store, not just a cache.

Core docs

Amazon ElastiCache (Redis OSS / Valkey) — clustering, replication, Multi-AZ with automatic failover; Memcached is sharded with no replication or persistence
Amazon MemoryDB — durable in-memory database with a multi-AZ transactional log

Chapter 10 — Decoupling and event-driven design

Maps to Task Statement 2.1 — Design scalable and loosely coupled architectures

Knowledge of:

API creation and management (for example, Amazon API Gateway, REST APIs)
AWS managed services with appropriate use cases
Caching strategies
Design principles for microservices
Event-driven architectures
Horizontal and vertical scaling
Queuing and messaging concepts (for example, publish/subscribe)
Serverless technologies and patterns (for example, AWS Fargate, Lambda)
Workflow orchestration (for example, AWS Step Functions)

Skills in:

Designing event-driven, microservices, and/or multi-tier architectures based on requirements
Determining scaling strategies for components used in an architecture design
Determining the AWS services required to achieve loose coupling based on requirements
Determining when to use serverless technologies and patterns

10.1 Amazon SQS ★★★ Core

The default decoupling answer. Standard (best-effort ordering, at-least-once delivery) vs FIFO (strict ordering within a message group, exactly-once processing). Visibility timeout, DLQs, long polling, and scaling by queue depth (an ASG scaling on a CloudWatch queue-depth metric, with SQS as the buffer) appear repeatedly.
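The queue-depth scaling idea can be sketched as plain arithmetic. This is a minimal illustration, assuming a "backlog per instance" target: the function names, the size limits, and the example numbers are hypothetical, and no AWS API is called.

```python
import math


def acceptable_backlog(max_latency_seconds: float, seconds_per_message: float) -> int:
    """Messages one worker may have queued while still meeting the latency target."""
    return max(1, int(max_latency_seconds // seconds_per_message))


def desired_instances(visible_messages: int,
                      backlog_per_instance: int,
                      min_size: int = 1,
                      max_size: int = 20) -> int:
    """Worker count that keeps the backlog per instance at or below the target.

    min_size and max_size stand in for the Auto Scaling group limits
    (hypothetical defaults chosen for this sketch).
    """
    if backlog_per_instance <= 0:
        raise ValueError("backlog_per_instance must be positive")
    needed = math.ceil(visible_messages / backlog_per_instance)
    return max(min_size, min(max_size, needed))


# Example: each message takes 2 s to process and may wait at most 60 s.
target = acceptable_backlog(60, 2)          # 30 messages per worker
print(desired_instances(450, target))       # 15 workers for a 450-message backlog
```

In a real deployment the message count would come from the queue's visible-message metric and the result would drive a scaling policy; the sketch only shows why queue depth, not CPU, is the natural scaling signal for a buffered worker tier.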

Core docs

10.2 Amazon SNS ★★ Important

Pub/sub. Fan-out (SNS → multiple SQS) is the canonical pattern. Filter policies cut consumer-side filtering. FIFO topics pair with FIFO queues for end-to-end ordering.

Core docs

What is Amazon SNS?
Fan-out pattern — SNS → multiple SQS subscribers
Message filtering policies
FIFO topics — pair with SQS FIFO queues for end-to-end ordering
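To make the filter-policy idea concrete, here is a deliberately simplified sketch of attribute matching. It assumes exact string matching only; real SNS filter policies also support operators such as prefix and numeric matching, which are not modelled here. The attribute names and values are hypothetical.

```python
def matches(filter_policy: dict, attributes: dict) -> bool:
    """Return True when every policy key is present with an allowed value.

    Simplified model: each policy value is a list of allowed exact strings.
    """
    return all(attributes.get(key) in allowed
               for key, allowed in filter_policy.items())


# Hypothetical subscription that only wants order events from the EU.
policy = {"event_type": ["order_placed", "order_cancelled"], "region": ["eu"]}

print(matches(policy, {"event_type": "order_placed", "region": "eu"}))   # True
print(matches(policy, {"event_type": "order_placed", "region": "us"}))   # False
print(matches(policy, {"event_type": "order_placed"}))                   # False
```

The point for the exam is where the filtering happens: the topic drops non-matching messages before delivery, so each subscribed queue only receives (and only pays to process) what it asked for.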

10.3 Amazon EventBridge ★★ Important

Event-driven glue. Default bus for AWS service events, custom buses for app events, partner buses for SaaS. Rules + targets; archive / replay for audit and reprocessing. Scheduler replaces CloudWatch cron at scale.

Core docs

What is EventBridge?
Default, custom, and partner event buses
Rules and event patterns
Archive and replay
EventBridge Scheduler — replaces CloudWatch cron rules at scale
EventBridge Pipes — point-to-point integrations with optional filter/transform/enrich

10.4 AWS Step Functions ★ Light

Lighter at SAA than you might expect. Know Standard vs Express (exactly-once long-running vs at-least-once high-volume short workflows) and the basic service-integration patterns. Skim the FAQ.

Core docs

What is Step Functions?
Standard vs Express workflows
Error handling, retries, and catch
Service integration patterns — request/response, run-job (.sync), wait-for-callback (.waitForTaskToken)
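The retry behaviour is easier to remember with numbers. The sketch below shows a retrier written with the Amazon States Language field names (ErrorEquals, IntervalSeconds, MaxAttempts, BackoffRate) and a small helper that computes the wait before each retry. The helper and the example values are illustrative assumptions, and jitter or maximum-delay settings are not modelled.

```python
def retry_delays(interval_seconds: float, max_attempts: int, backoff_rate: float) -> list:
    """Wait (in seconds) before each retry: interval * backoff_rate ** n."""
    return [interval_seconds * backoff_rate ** attempt
            for attempt in range(max_attempts)]


# A retrier as it might appear inside a Task state (values are hypothetical).
retrier = {
    "ErrorEquals": ["States.TaskFailed"],
    "IntervalSeconds": 2,
    "MaxAttempts": 3,
    "BackoffRate": 2.0,
}

delays = retry_delays(retrier["IntervalSeconds"],
                      retrier["MaxAttempts"],
                      retrier["BackoffRate"])
print(delays)        # [2.0, 4.0, 8.0]
print(sum(delays))   # 14.0 seconds of waiting before a Catch clause would take over
```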

10.5 Kinesis Data Streams (resilience aspects) ★★ Important

Resilience aspects: durable, ordered, replayable up to 365 days. Enhanced fan-out for low-latency consumers. Tested as "replay required" (KDS) vs "just deliver to S3" (Firehose) vs "message bus" (SQS / SNS) decision questions.

Core docs

Kinesis Data Streams — shards, retention up to 365 days, replay
Enhanced fan-out
On-demand vs provisioned capacity
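For provisioned capacity, shard count follows from per-shard limits. A rough sizing sketch, assuming the commonly documented limits of 1 MB/s or 1,000 records/s written per shard and 2 MB/s read per shard for shared-throughput consumers; the function name and example workload are hypothetical.

```python
import math

WRITE_MB_PER_SHARD = 1.0        # ingress limit per shard
WRITE_RECORDS_PER_SHARD = 1000  # records per second per shard
READ_MB_PER_SHARD = 2.0         # shared egress limit per shard


def shards_needed(ingress_mb_per_s: float,
                  records_per_s: float,
                  egress_mb_per_s: float) -> int:
    """Smallest shard count that satisfies all three per-shard limits."""
    return max(
        1,
        math.ceil(ingress_mb_per_s / WRITE_MB_PER_SHARD),
        math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(egress_mb_per_s / READ_MB_PER_SHARD),
    )


print(shards_needed(5, 3000, 8))   # 5: ingress bandwidth is the binding limit
print(shards_needed(1, 500, 6))    # 3: shared read throughput is the binding limit
```

The second case is the one enhanced fan-out addresses: when several consumers share the read limit, giving each consumer dedicated throughput avoids adding shards purely for egress.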

Chapter 11 — Backup and disaster recovery

Maps to Task Statement 2.2 — Design highly available and/or fault-tolerant architectures

Knowledge of:

Disaster recovery strategies (for example, backup and restore, pilot light, warm standby, multi-site active-active)
Distributed design patterns
Failover strategies
Immutable infrastructure
Storage options and characteristics (for example, durability, replication)

Skills in:

Determining automation strategies to ensure infrastructure integrity
Determining the AWS services required to provide a highly available and/or fault-tolerant architecture across AWS Regions or Availability Zones
Identifying metrics based on business requirements to deliver a highly available solution
Implementing designs to mitigate single points of failure
Implementing strategies to ensure the durability and availability of data (for example, backups)
Selecting an appropriate DR strategy to meet business requirements
Using AWS services that improve the reliability of legacy applications and applications not built for the cloud (for example, when application changes are not possible)
Using purpose-built AWS services for workloads

11.1 AWS Backup ★★ Important

Centralised backup across many services with a single policy. Cross-account / cross-region copies for DR. Vault Lock provides WORM for backups (ransomware mitigation). Pattern: "compliance-driven backup across the org" → AWS Backup.

Core docs

What is AWS Backup?
Supported resources — EBS, EFS, RDS, Aurora, DynamoDB, FSx, Storage Gateway, S3, Neptune, DocumentDB
Cross-account and cross-region backup copies
Backup Vault Lock — WORM for backups, ransomware mitigation
AWS Backup for DR (audit-ready, cross-region, cross-account)

11.2 Elastic Disaster Recovery ★ Light

Block-level continuous replication for sub-minute RPO; the answer for "lift-and-shift DR for VMs / on-prem servers". Light at SAA level — recognise the one-liner and move on.

Core docs

What is AWS Elastic Disaster Recovery (AWS DRS)? — block-level continuous replication, sub-minute RPO

FAQ

AWS Elastic Disaster Recovery FAQs

Deeper reading

Disaster Recovery of Workloads on AWS

Part III — Domain 3: Design High-Performing Architectures (24%)

The performance domain rewards knowing the available options and their trade-offs at each layer of the stack — compute family selection, the right storage tier, the right database engine, and the right network primitive. The exam likes "you have requirement X under constraint Y, which combination of services?" — the trick is reading both X and Y carefully.

Chapter 12 — Choosing compute

Maps to Task Statement 3.2 — Design high-performing and elastic compute solutions

Knowledge of:

AWS compute services with appropriate use cases (for example, AWS Batch, Amazon EMR, Fargate)
Distributed computing concepts supported by AWS global infrastructure and edge services
Queuing and messaging concepts (for example, publish/subscribe)
Scalability capabilities with appropriate use cases (for example, Amazon EC2 Auto Scaling, AWS Auto Scaling)
Serverless technologies and patterns (for example, Lambda, Fargate)
The orchestration of containers (for example, Amazon ECS, Amazon EKS)

Skills in:

Decoupling workloads so that components can scale independently
Identifying metrics and conditions to perform scaling actions
Selecting the appropriate compute options and features (for example, EC2 instance types) to meet business requirements
Selecting the appropriate resource type and size (for example, the amount of Lambda memory) to meet business requirements

12.1 EC2 instance families and Graviton ★★★ Core

Know the family letters cold: M (general), C (compute), R (memory), X / u (high-mem), I / D / H (storage), P / G / Inf / Trn (accelerated), T (burstable). Graviton (ARM) gives ~20% cost saving on broad workloads. T-class CPU credits and unlimited mode appear.

Core docs

EC2 instance types overview
Instance type comparison page — keep the families straight: M (general), C (compute), R (memory), X/u (high memory), I/D/H (storage), P/G/Inf/Trn (accelerated), T (burstable)
AWS Graviton — ARM-based, ~20% cheaper, ~40% better price-performance for many workloads
Burstable (T-class) instances — CPU credits and unlimited mode
Instance purchasing options — covered in detail in Domain 4
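CPU credits are simple bookkeeping: one credit is one vCPU running at 100% for one minute. The sketch below works through that arithmetic. The earn rate, vCPU count, and starting balance are hypothetical inputs (look up the real figures for a given instance size), and unlimited mode is not modelled.

```python
def net_credits_per_hour(vcpus: int,
                         avg_utilisation: float,
                         credits_earned_per_hour: float) -> float:
    """Credits gained (positive) or drained (negative) in one hour."""
    spent = vcpus * avg_utilisation * 60   # one credit = one vCPU-minute at 100%
    return credits_earned_per_hour - spent


def hours_until_empty(balance: float,
                      vcpus: int,
                      avg_utilisation: float,
                      credits_earned_per_hour: float):
    """Hours a burst can last, or None if the balance is not draining."""
    net = net_credits_per_hour(vcpus, avg_utilisation, credits_earned_per_hour)
    if net >= 0:
        return None
    return balance / -net


# Hypothetical instance: 2 vCPUs earning 12 credits per hour, 144 credits banked.
print(hours_until_empty(144, 2, 0.5, 12))    # 3.0 hours at 50% utilisation
print(hours_until_empty(144, 2, 0.05, 12))   # None: below baseline, credits accumulate
```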

12.2 Containers — ECS Fargate vs EC2 ★★ Important

Recurring decision: Fargate when "no servers to manage / minimum operational overhead", EC2 when "cost control / GPU / custom AMI / Spot diversity". Capacity providers (Fargate, Fargate Spot, ASG) and task placement strategies show up in cost questions.

Core docs

ECS launch types — Fargate for serverless containers, EC2 for cost control / GPU / custom AMIs
ECS task definitions
Task placement strategies — binpack, spread, random
Capacity providers — Fargate, Fargate Spot, ASG-backed

12.3 Lambda performance ★★ Important

Memory tunes both speed and cost (CPU is allocated proportionally; tune up until total cost stops dropping). Provisioned concurrency = no cold starts but always-on cost. SnapStart for Java / Python / .NET. Graviton runtimes are cheaper.

Core docs

Memory and CPU — CPU is allocated proportionally to memory; tuning memory tunes both speed and cost
Provisioned concurrency — eliminates cold starts at the cost of always-on capacity
Lambda SnapStart — sub-second cold starts for Java, Python, .NET
Runtime selection — Graviton for ~20% cost savings
Lambda + RDS Proxy — connection pooling for burst-y serverless

12.4 Batch and HPC ★ Light

Recognise that Batch is the managed answer for "queued jobs across Spot and On-Demand", ParallelCluster for HPC, EFA for OS-bypass MPI. Rare on SAA — skim and move on.

Core docs

AWS Batch — managed job scheduling, Fargate or EC2 (incl. Spot)
AWS ParallelCluster — open-source cluster orchestrator for HPC
HPC on AWS — EFA, cluster placement groups, FSx for Lustre, Spot

Chapter 13 — High-performing storage

Maps to Task Statement 3.1 — Determine high-performing and/or scalable storage solutions

Knowledge of:

Hybrid storage solutions to meet business requirements
Storage services with appropriate use cases (for example, Amazon S3, Amazon EFS, Amazon EBS)
Storage types with associated characteristics (for example, object, file, block)

Skills in:

Determining storage services and configurations that meet performance demands
Determining storage services that can scale to accommodate future needs

13.1 S3 storage classes and Transfer Acceleration ★★★ Core

The most-tested storage decision on the exam. Standard / Intelligent-Tiering / Standard-IA / One Zone-IA / Glacier (Instant / Flexible / Deep Archive) / Express One Zone — know the access-pattern + retrieval-cost trade-off cold. Transfer Acceleration uses CloudFront edges for fast uploads from far away.

Core docs

S3 storage classes — Standard, Intelligent-Tiering, Standard-IA, One Zone-IA, Glacier Instant Retrieval, Glacier Flexible Retrieval, Glacier Deep Archive, Express One Zone
Performance design patterns — parallelism, request rates, multipart upload
S3 Transfer Acceleration — uploads via CloudFront edge locations
Multipart upload
S3 Access Points

FAQ

S3 FAQs

13.2 EBS volume types ★★★ Core

gp3 is the new default; gp2 is legacy. io2 Block Express for SAN-class IOPS. st1 for streaming and big-data; sc1 for cold. Multi-Attach is io1/io2 only. Elastic Volumes for online type / size / IOPS changes without downtime.

Core docs

EBS volume types — gp3 (default general-purpose), gp2, io2 Block Express (SAN-class IOPS), io1, st1 (throughput HDD), sc1 (cold HDD)
I/O characteristics and monitoring
Elastic Volumes — change type, size, IOPS without downtime

| Volume type | Use case | Max IOPS | Max throughput |
| --- | --- | --- | --- |
| gp3 (SSD) | General-purpose default | 16,000 | 1,000 MiB/s |
| io2 Block Express (SSD) | I/O-intensive databases | 256,000 | 4,000 MiB/s |
| st1 (HDD) | Streaming, big data, log processing | 500 | 500 MiB/s |
| sc1 (HDD) | Cold, infrequently accessed | 250 | 250 MiB/s |
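Why gp3 replaced gp2 as the default is clearest in numbers. The sketch below assumes the documented baselines: gp2 delivers 3 IOPS per GiB (minimum 100, maximum 16,000), while gp3 starts at 3,000 IOPS regardless of size. Function names are hypothetical, and gp2 burst credits and extra provisioned gp3 IOPS are not modelled.

```python
GP3_BASELINE_IOPS = 3_000   # included with every gp3 volume, independent of size


def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, clamped to the 100..16,000 range."""
    return min(16_000, max(100, 3 * size_gib))


def gp3_has_higher_baseline(size_gib: int) -> bool:
    """True when an unmodified gp3 volume out-performs gp2 at the same size."""
    return GP3_BASELINE_IOPS > gp2_baseline_iops(size_gib)


for size in (20, 500, 1_000, 10_000):
    print(size, gp2_baseline_iops(size), gp3_has_higher_baseline(size))
```

Under these assumptions gp2 only matches the gp3 baseline at 1,000 GiB, which is why "grow the gp2 volume to get more IOPS" is usually the wrong answer when gp3 is on offer.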

13.3 Instance Store ★★ Important

Ephemeral local NVMe — data is lost on stop or terminate. The answer for "cache / scratch / replicated DB shard" where you accept the data-loss model. Cheaper than EBS for the IOPS and throughput you get.

Core docs

Instance Store volumes — ephemeral local NVMe; data lost on stop/terminate
When to choose Instance Store — caches, scratch space, replicated databases

13.4 FSx for Lustre and HPC patterns ★ Light

Niche. Sub-ms latency, hundreds of GB/s throughput; scratch vs persistent file systems; can lazy-load from S3 and write back. Mostly an HPC topic; rare on SAA.

Core docs

FSx for Lustre — sub-ms latency, hundreds of GB/s throughput
Scratch vs persistent file systems
Linking Lustre to S3 — lazy-load datasets, write results back

Chapter 14 — High-performing databases

Maps to Task Statement 3.3 — Determine high-performing database solutions

Knowledge of:

AWS global infrastructure (for example, Availability Zones, Regions)
Caching strategies and services (for example, Amazon ElastiCache)
Data access patterns (for example, read-intensive compared with write-intensive)
Database capacity planning (for example, capacity units, instance types, Provisioned IOPS)
Database connections and proxies
Database engines with appropriate use cases (for example, heterogeneous migrations, homogeneous migrations)
Database replication (for example, read replicas)
Database types and services (for example, serverless, relational compared with non-relational, in-memory)

Skills in:

Configuring read replicas to meet business requirements
Designing database architectures
Determining an appropriate database engine (for example, MySQL compared with PostgreSQL)
Determining an appropriate database type (for example, Amazon Aurora, Amazon DynamoDB)
Integrating caching to meet business requirements

14.1 Choosing the right database ★★★ Core

The exam's single most-tested database task. Relational (RDS / Aurora) for SQL with joins; key-value (DynamoDB) for ms latency at any scale; document (DocumentDB); in-memory (ElastiCache, MemoryDB); graph (Neptune); time-series (Timestream); ledger (QLDB — note: AWS ended Amazon QLDB support on 31 July 2025, though it remains in the SAA-C03 in-scope services list). Read the qualifier — "flexible schema", "sub-millisecond", "graph traversal" — and pick.

Core docs

AWS database services overview — relational, key-value, document, in-memory, graph, time-series, ledger
Choosing a database (whitepaper section)
SQL → NoSQL decision framework

14.2 RDS and Aurora performance ★★ Important

Performance Insights, RDS Proxy for connection pooling, Aurora replicas (up to 15) for read scale. Aurora Serverless v2 autoscales at 0.5 ACU granularity. gp3 storage by default.

Core docs

RDS storage types — gp3, io1/io2, magnetic
RDS Performance Insights
RDS Proxy — managed connection pooling
Aurora Replicas — up to 15, < 100 ms lag
Aurora Serverless v2 — fine-grained autoscaling

14.3 DynamoDB capacity, partitioning, DAX ★★★ Core

On-demand vs provisioned (with auto-scaling) is a recurring decision. Hot-partition design is heavily tested — choose a high-cardinality partition key. DAX is the in-memory cache for microsecond reads. GSI vs LSI distinction matters.

Core docs

On-demand vs provisioned capacity
Partition key design and hot partitions
Local and global secondary indexes
DynamoDB Accelerator (DAX) — write-through cache, microsecond reads
DynamoDB Streams + Lambda triggers
Core concepts — RCU/WCU sizing
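RCU/WCU sizing questions reduce to two rounding rules. A minimal sketch, assuming the standard definitions: one RCU is one strongly consistent read per second of an item up to 4 KB (eventually consistent reads cost half), and one WCU is one write per second of an item up to 1 KB. Transactional requests are not modelled and the function names are hypothetical.

```python
import math


def read_capacity_units(reads_per_second: int,
                        item_kb: float,
                        strongly_consistent: bool = True) -> int:
    """RCUs for a steady read rate; item size rounds up to the next 4 KB."""
    units_per_read = math.ceil(item_kb / 4)
    rcu = reads_per_second * units_per_read
    return rcu if strongly_consistent else math.ceil(rcu / 2)


def write_capacity_units(writes_per_second: int, item_kb: float) -> int:
    """WCUs for a steady write rate; item size rounds up to the next 1 KB."""
    return writes_per_second * math.ceil(item_kb)


print(read_capacity_units(100, 6))                             # 200
print(read_capacity_units(100, 6, strongly_consistent=False))  # 100
print(write_capacity_units(50, 2.5))                           # 150
```

Note how a 6 KB item costs two read units, not one and a half: the rounding happens per item, which is why trimming item size below a 4 KB boundary can halve read cost.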

14.4 ElastiCache for caching workloads ★★ Important

Lazy loading vs write-through; TTL strategy; Redis (replication, persistence, cluster mode) vs Memcached (sharded, simple, no persistence). Pattern: "reduce read load on RDS / DynamoDB" → ElastiCache (or DAX for DDB).

Core docs

Caching strategies — lazy loading, write-through, TTL
Cluster mode disabled vs enabled
Amazon MemoryDB — when you need durability and the Redis OSS / Valkey API in one product
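The difference between lazy loading and write-through fits in a few lines. In this sketch plain dictionaries stand in for both the cache and the database, the class names are hypothetical, and TTL handling is left out to keep the two strategies easy to compare.

```python
class LazyLoadingCache:
    """Populate the cache only when a read misses (cache-aside)."""

    def __init__(self, database: dict):
        self.database = database   # stand-in for RDS / DynamoDB
        self.cache = {}            # stand-in for ElastiCache
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.misses += 1           # a miss costs a trip to the database
        value = self.database[key]
        self.cache[key] = value
        return value


class WriteThroughCache(LazyLoadingCache):
    """Additionally update the cache on every write, so reads never see stale data."""

    def put(self, key, value):
        self.database[key] = value
        self.cache[key] = value


lazy = LazyLoadingCache({"user:1": "Ada"})
lazy.get("user:1")
lazy.get("user:1")
print(lazy.misses)          # 1: only the first read reached the database

eager = WriteThroughCache({})
eager.put("user:2", "Grace")
eager.get("user:2")
print(eager.misses)         # 0: the write already warmed the cache
```

Lazy loading caches only what is read but can serve stale values after a database update; write-through keeps the cache fresh at the cost of caching data that may never be read, which is where a TTL earns its place.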

Chapter 15 — High-performing networking

Maps to Task Statement 3.4 — Determine high-performing and/or scalable network architectures

Knowledge of:

Edge networking services with appropriate use cases (for example, Amazon CloudFront, AWS Global Accelerator)
How to design network architecture (for example, subnet tiers, routing, IP addressing)
Load balancing concepts (for example, Application Load Balancer)
Network connection options (for example, AWS VPN, Direct Connect, AWS PrivateLink)

Skills in:

Creating a network topology for various architectures (for example, global, hybrid, multi-tier)
Determining network configurations that can scale to accommodate future needs
Determining the appropriate placement of resources to meet business requirements
Selecting the appropriate load balancing strategy

15.1 Placement groups, ENA, EFA ★★ Important

Cluster (low-latency same-rack), partition (large distributed systems with rack-aware fault isolation), spread (fewer instances, max isolation). EFA is for HPC / MPI (OS-bypass) — rare. ENA enhanced networking is on by default on current-gen instances.

Core docs

Placement groups — cluster (low latency), partition (large distributed systems), spread (small fault-tolerant clusters)
Enhanced networking — ENA, up to 100 Gbps
Elastic Fabric Adapter (EFA) — OS-bypass, MPI, HPC and ML training

15.2 EBS-optimised instances ★ Light

Default-on for all current-gen instances; mostly background knowledge. Won't be a primary topic — recognise the term and move on.

Core docs

EBS-optimised instances — dedicated EBS bandwidth; default on for current-gen instances

15.3 CloudFront and Global Accelerator ★★★ Core

One of the most-tested decision pairs. CloudFront for HTTP(S) caching at the edge; Global Accelerator for static anycast IPs in front of TCP / UDP (and fast cross-region failover for non-HTTP traffic). They compose.

Core docs

15.4 Direct Connect for performance ★★ Important

Hybrid backbone. Dedicated vs hosted; private / public / transit VIFs. MACsec on dedicated lines for L2 encryption; IPsec VPN over a public VIF for L3. HA designs use multiple connections in multiple locations.

Core docs

What is Direct Connect?
Dedicated vs hosted connections
Virtual interfaces — public, private, transit
Encrypting Direct Connect — MACsec on dedicated, IPsec VPN over public VIF
High-availability designs — multiple connections, multiple locations

15.5 PrivateLink and VPC endpoints ★★★ Core

Overlaps with section 3.2 — the exam asks the same question from both security and performance angles. "Expose a service privately to other VPCs / accounts" → PrivateLink. "Reach S3 / DynamoDB privately" → gateway endpoints (free).

Core docs

Chapter 16 — Data, analytics, and streaming

Maps to Task Statement 3.5 — Determine high-performing data ingestion and transformation solutions

Knowledge of:

Data analytics and visualization services with appropriate use cases (for example, Amazon Athena, AWS Lake Formation, Amazon QuickSight)
Data ingestion patterns (for example, frequency)
Data transfer services with appropriate use cases (for example, AWS DataSync, AWS Storage Gateway)
Data transformation services with appropriate use cases (for example, AWS Glue)
Secure access to ingestion access points
Sizes and speeds needed to meet business requirements
Streaming data services with appropriate use cases (for example, Amazon Kinesis)

Skills in:

Building and securing data lakes
Designing data streaming architectures
Designing data transfer solutions
Implementing visualization strategies
Selecting appropriate compute options for data processing (for example, Amazon EMR)
Selecting appropriate configurations for ingestion
Transforming data between formats (for example, .csv to .parquet)

16.1 Kinesis Data Streams and Firehose ★★ Important

Streams: durable, replayable, custom consumers, sharded. Firehose: managed delivery to S3 / Redshift / OpenSearch / Splunk / HTTP, no replay, near-real-time. Pattern: "I just want this in S3" → Firehose. "I need to replay" → Streams.

Core docs

Kinesis Data Streams — durable, ordered, replayable shard-based stream
Amazon Data Firehose (formerly Kinesis Firehose) — managed delivery to S3, Redshift, OpenSearch, Splunk, HTTP endpoints
Streams vs Firehose — Streams for replayable, custom-consumer pipelines; Firehose for "I just want this in S3 / Redshift / OpenSearch"

16.2 Amazon MSK (Managed Streaming for Kafka) ★ Light

Managed Kafka. Rare at SAA; tested as "I have an existing Kafka workload, what's the AWS-native answer?" MSK Serverless for autoscaling. Skim and move on.

Core docs

What is Amazon MSK?
MSK Serverless — pay per throughput, autoscaling

16.3 Athena and Glue ★★ Important

Athena for serverless SQL on S3 — the answer when "no infra" and "occasional query" coincide. Partitioning + Parquet/ORC dramatically lowers cost. Glue is the managed ETL + Data Catalog. Lake Formation adds fine-grained access control.

Core docs

What is Athena? — serverless SQL on S3
Performance — partition, compress, columnar (Parquet/ORC)
What is AWS Glue? — ETL, Data Catalog, crawlers
AWS Lake Formation — fine-grained access control over the data lake
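Athena bills by data scanned, so the saving from partitioning and columnar formats is multiplicative. A back-of-the-envelope sketch: the price per TB, the fractions, and the function name are all assumptions for illustration; check the pricing page for the current rate in your region.

```python
def athena_query_cost(total_tb: float,
                      partition_fraction: float = 1.0,
                      column_fraction: float = 1.0,
                      compression_ratio: float = 1.0,
                      price_per_tb: float = 5.0) -> float:
    """Estimated cost of one query.

    partition_fraction: share of partitions the WHERE clause touches
    column_fraction:    share of columns read (columnar formats only)
    compression_ratio:  compressed size divided by raw size
    price_per_tb:       assumed price per TB scanned (illustrative)
    """
    scanned_tb = total_tb * partition_fraction * column_fraction * compression_ratio
    return scanned_tb * price_per_tb


raw_csv = athena_query_cost(10)
tuned = athena_query_cost(10, partition_fraction=0.1,
                          column_fraction=0.2, compression_ratio=0.5)
print(raw_csv, tuned)   # the tuned layout scans 1% of the data
```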

16.4 EMR ★ Light

Managed Hadoop / Spark / Presto / HBase / Flink. Rare. Recognise it for "large-scale distributed processing on a managed cluster" and move on. Spot use is common.

Core docs

What is Amazon EMR? — Hadoop, Spark, Presto, HBase, Flink
Instance purchasing options for EMR — heavy Spot use is common
EMR Serverless

16.5 Redshift ★★ Important

Columnar MPP data warehouse. Tested mostly as "is this the right tool?" rather than configuration depth. Spectrum queries S3 directly without loading. Serverless variant exists.

Core docs

What is Amazon Redshift? — columnar MPP data warehouse
Redshift Serverless
Redshift Spectrum — query S3 directly, no load
Workload management

16.6 OpenSearch Service ★ Light

Search and log analytics. Recognise it for "log search and observability" or "full-text search". Multi-AZ with standby for HA. Rare beyond the basic "which service" question.

Core docs

What is Amazon OpenSearch Service? — search, log analytics, observability
OpenSearch Serverless
Multi-AZ with standby

Deeper reading

Well-Architected Performance Efficiency Pillar
Storage services overview (whitepaper section)

Part IV — Domain 4: Design Cost-Optimized Architectures (20%)

The cost domain is mostly about knowing the levers each service exposes, the pricing units that drive bills (compute hours, GB-month, GB-out, request count), and the few high-leverage practices — Savings Plans, S3 Intelligent-Tiering, lifecycle policies, VPC endpoints — that consistently move the needle.

Chapter 17 — Cost-optimised compute

Maps to Task Statement 4.2 — Design cost-optimized compute solutions

Knowledge of:

AWS cost management service features (for example, cost allocation tags, multi-account billing)
AWS cost management tools with appropriate use cases (for example, Cost Explorer, AWS Budgets, AWS Cost and Usage Report)
AWS global infrastructure (for example, Availability Zones, Regions)
AWS purchasing options (for example, Spot Instances, Reserved Instances, Savings Plans)
Distributed compute strategies (for example, edge processing)
Hybrid compute options (for example, AWS Outposts, AWS Snowball Edge)
Instance types, families, and sizes (for example, memory optimized, compute optimized, virtualization)
Optimization of compute utilization (for example, containers, serverless computing, microservices)
Scaling strategies (for example, auto scaling, hibernation)

Skills in:

Determining an appropriate load balancing strategy (for example, Application Load Balancer [Layer 7] compared with Network Load Balancer [Layer 4] compared with Gateway Load Balancer)
Determining appropriate scaling methods and strategies for elastic workloads
Determining cost-effective AWS compute services with appropriate use cases
Determining the required availability for different classes of workloads
Selecting the appropriate instance family for a workload
Selecting the appropriate instance size for a workload

17.1 Pricing models — On-Demand, Reserved, Savings Plans, Spot ★★★ Core

Cornerstone of the cost domain. Know the discount tiers, commitment terms, and which model fits which workload pattern. Compute Savings Plans cover EC2 / Fargate / Lambda; EC2 Instance SP is deeper but family-locked. Spot for fault-tolerant workloads.

Core docs

EC2 instance purchasing options — overview of all five
Savings Plans — Compute (most flexible), EC2 Instance (deepest discount), SageMaker
Reserved Instances — Standard vs Convertible, Regional vs Zonal scope
Spot Instances — up to 90% off On-Demand, two-minute interruption notice
Dedicated Hosts — for BYOL licences (Windows, SQL Server, Oracle)

| Model | Discount | Commit | Best for |
| --- | --- | --- | --- |
| On-Demand | 0% | None | Spiky / unknown workloads |
| Compute Savings Plans | ~66% | 1 or 3 years, $/hr | Steady compute across EC2 / Fargate / Lambda |
| EC2 Instance Savings Plans | ~72% | 1 or 3 years, $/hr in family + region | Steady EC2 in a known family |
| Reserved Instances | ~72% | 1 or 3 years, instance attributes | Legacy commitments; new workloads should use Savings Plans |
| Spot | up to 90% | None | Fault-tolerant, flexible workloads: Batch, EMR, ASG, Fargate Spot |
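One way to read the discount column is as a break-even utilisation. The sketch below assumes a simplified model: a commitment is paid for every hour at the discounted rate, while On-Demand is paid only for hours actually used. The discounts are the approximate maximums from the table, and the function names are hypothetical.

```python
def breakeven_utilisation(discount: float) -> float:
    """Share of hours a workload must run before a commitment pays off."""
    return 1.0 - discount


def cheaper_option(discount: float, utilisation: float) -> str:
    """Compare paying (1 - discount) for all hours with full price for used hours."""
    return "commit" if utilisation > breakeven_utilisation(discount) else "on-demand"


print(round(breakeven_utilisation(0.66), 2))   # 0.34
print(cheaper_option(0.66, 0.50))              # commit
print(cheaper_option(0.66, 0.20))              # on-demand
```

This is why the table pairs commitments with "steady" workloads: the deeper the discount, the lower the utilisation needed to justify it, but a workload that runs only occasionally still belongs on On-Demand or Spot.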

17.2 Right-sizing and Compute Optimizer ★★ Important

Compute Optimizer's ML recommendations cover EC2, ASGs, EBS, Lambda, ECS Fargate. Tested as "most cost-effective without sacrificing performance" — usually points at Compute Optimizer or Cost Explorer right-sizing.

Core docs

AWS Compute Optimizer — ML-based right-sizing for EC2, ASGs, EBS, Lambda, ECS Fargate
Cost Explorer right-sizing recommendations
Trusted Advisor — cost optimization checks

17.3 Auto Scaling for cost ★★ Important

Mixed instance policies blend On-Demand and Spot inside one ASG; capacity-optimized allocation maximises Spot survival. Pattern: "most cost-effective batch / web tier" often wants Spot via ASG mixed instances.

Core docs

Mixed instance policies — combine On-Demand + Spot inside one ASG
Spot in Auto Scaling — capacity-optimized allocation strategy
EC2 Fleet and Spot Fleet

17.4 Lambda economics ★ Light

Pricing is request count + GB-second of execution. Memory tuning often reduces total cost (faster execution outpaces per-ms cost growth). Light topic — one or two questions max.

Core docs

Lambda pricing — request count + GB-second of execution
Memory tuning — allocating more memory often reduces total cost because duration drops faster than per-ms cost rises
Lambda Power Tuning (operator guide)
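The memory-tuning claim can be checked with the pricing formula itself. This sketch assumes illustrative unit prices (passed as parameters, not authoritative), ignores the free tier, and uses hypothetical durations for the same function at two memory settings.

```python
def monthly_cost(invocations: int,
                 avg_duration_ms: float,
                 memory_mb: int,
                 price_per_million_requests: float = 0.20,
                 price_per_gb_second: float = 0.0000166667) -> float:
    """Request charge plus compute charge (GB-seconds); prices are assumptions."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    request_charge = invocations / 1_000_000 * price_per_million_requests
    return request_charge + gb_seconds * price_per_gb_second


# Hypothetical measurements: doubling memory cuts duration from 800 ms to 300 ms.
small = monthly_cost(1_000_000, avg_duration_ms=800, memory_mb=512)
large = monthly_cost(1_000_000, avg_duration_ms=300, memory_mb=1024)
print(small > large)   # True: the faster run more than pays for the extra memory
```

The break-even is when duration falls in exact proportion to the memory increase; anything better than that (common for CPU-bound functions, since CPU scales with memory) lowers the bill.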

Chapter 18 — Cost-optimised storage

Maps to Task Statement 4.1 — Design cost-optimized storage solutions

Knowledge of:

Access options (for example, an S3 bucket with Requester Pays object storage)
AWS cost management service features (for example, cost allocation tags, multi-account billing)
AWS cost management tools with appropriate use cases (for example, Cost Explorer, AWS Budgets, AWS Cost and Usage Report)
AWS storage services with appropriate use cases (for example, Amazon FSx, Amazon EFS, Amazon S3, Amazon EBS)
Backup strategies
Block storage options (for example, hard disk drive [HDD] volume types, solid state drive [SSD] volume types)
Data lifecycles
Hybrid storage options (for example, DataSync, Transfer Family, Storage Gateway)
Storage access patterns
Storage tiering (for example, cold tiering for object storage)
Storage types with associated characteristics (for example, object, file, block)

Skills in:

Designing appropriate storage strategies (for example, batch uploads to Amazon S3 compared with individual uploads)
Determining the correct storage size for a workload
Determining the lowest cost method of transferring data for a workload to AWS storage
Determining when storage auto scaling is required
Managing S3 object lifecycles
Selecting the appropriate backup and/or archival solution
Selecting the appropriate service for data migration to storage services
Selecting the appropriate storage tier
Selecting the correct data lifecycle for storage
Selecting the most cost-effective storage service for a workload

18.1 S3 storage classes and Intelligent-Tiering ★★★ Core

Re-tested from the cost angle: Intelligent-Tiering is the default "unknown access pattern" answer because it has no retrieval fees. One Zone-IA costs ~20% less than Standard-IA but keeps data in a single AZ, so the data does not survive the loss of that AZ. Choose among the Glacier tiers (Instant / Flexible / Deep Archive) by retrieval-time tolerance.

Core docs

S3 storage classes overview
S3 Intelligent-Tiering — automatic movement across Frequent / Infrequent / Archive Instant / Archive / Deep Archive Access tiers, no retrieval fees
S3 pricing page — for the pricing-unit details (storage, requests, transfer, retrieval)
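
The "unknown pattern, otherwise pick by retrieval tolerance" heuristic can be written down as a small function. This is a study mnemonic only, assuming the data is already known to be archival whenever the access pattern is known; the hour thresholds are rough assumptions based on standard retrieval times (hours for Flexible Retrieval, up to about 12 hours for Deep Archive). The return values are the S3 API storage class names.

```python
def pick_storage_class(access_pattern_known: bool, max_retrieval_wait_hours: float) -> str:
    """Map an exam-style scenario to an S3 storage class name (heuristic only)."""
    if not access_pattern_known:
        # No retrieval fees and automatic tiering: the safe default.
        return "INTELLIGENT_TIERING"
    if max_retrieval_wait_hours >= 12:
        return "DEEP_ARCHIVE"   # cheapest storage, slowest retrieval
    if max_retrieval_wait_hours >= 5:
        return "GLACIER"        # Glacier Flexible Retrieval
    return "GLACIER_IR"         # Glacier Instant Retrieval, millisecond access


print(pick_storage_class(access_pattern_known=False, max_retrieval_wait_hours=0))
print(pick_storage_class(access_pattern_known=True, max_retrieval_wait_hours=24))
```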

18.2 S3 lifecycle policies ★★★ Core

Heavily tested. Know the transition rules (objects under 128 KB are not transitioned by default; objects must spend 30 days in Standard before moving to Standard-IA or One Zone-IA) and the expiration rules, plus how lifecycle interacts with versioning (current vs non-current versions) and Object Lock. Pattern: "after 30 / 90 / 365 days move to ..." → lifecycle.

Core docs

Lifecycle management
Transition constraints — minimum object size, minimum days in source class
Lifecycle interactions with versioning, replication, Object Lock
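
A "30 / 90 / 365 days" rule looks roughly like this as a lifecycle configuration. The sketch builds the dictionary in the shape accepted by boto3's `put_bucket_lifecycle_configuration`, without calling AWS. The rule ID, prefix, bucket name, and day counts are hypothetical, and `check_transition_order` is a local sanity check written for this example, not an AWS API.

```python
import json


def build_lifecycle_configuration(prefix: str) -> dict:
    """Return a lifecycle configuration for objects under the given prefix."""
    return {
        "Rules": [
            {
                "ID": "tier-then-expire",          # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                # Current versions: Standard -> Standard-IA -> Glacier Flexible Retrieval.
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
                # Non-current versions (versioned buckets) follow their own clock.
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": 30, "StorageClass": "GLACIER"}
                ],
                "NoncurrentVersionExpiration": {"NoncurrentDays": 180},
            }
        ]
    }


def check_transition_order(rule: dict) -> None:
    """Local sanity check: transition days increase and expiration comes last."""
    days = [t["Days"] for t in rule["Transitions"]]
    if days != sorted(set(days)):
        raise ValueError("transition days must be strictly increasing")
    if rule["Expiration"]["Days"] <= days[-1]:
        raise ValueError("expiration must come after the last transition")


config = build_lifecycle_configuration("logs/")
check_transition_order(config["Rules"][0])
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```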

18.3 EBS cost levers ★★ Important

gp3 vs gp2 — ~20% cheaper at equal performance. Snapshot Archive cuts long-term cost ~75%. DLM automates snapshot deletion.

Core docs

gp3 vs gp2 — gp3 decouples IOPS/throughput from capacity, ~20% cheaper at equal performance
EBS Snapshot Archive — 75% cheaper for long-term snapshot retention
Data Lifecycle Manager — automated snapshot deletion to control cost
Recycle Bin for snapshots and AMIs
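
A rough comparison of gp2 and gp3 monthly cost. All prices are assumed us-east-1 list prices in USD and may be out of date; the gp2 IOPS formula (3 IOPS per GB, minimum 100, maximum 16,000) and the gp3 baseline (3,000 IOPS, 125 MB/s included) are the published volume characteristics.

```python
# Assumed prices; verify on the EBS pricing page before relying on them.
GP2_PER_GB = 0.10
GP3_PER_GB = 0.08
GP3_INCLUDED_IOPS = 3000
GP3_INCLUDED_MBPS = 125
GP3_PER_EXTRA_IOPS = 0.005
GP3_PER_EXTRA_MBPS = 0.04


def gp2_baseline_iops(size_gb: int) -> int:
    """gp2 ties IOPS to capacity: 3 per GB, floor 100, ceiling 16,000."""
    return max(100, min(16_000, 3 * size_gb))


def gp2_monthly(size_gb: int) -> float:
    return size_gb * GP2_PER_GB


def gp3_monthly(size_gb: int, iops: int = 3000, throughput_mbps: int = 125) -> float:
    """gp3 bills capacity, plus only the IOPS and throughput above the baseline."""
    extra_iops = max(0, iops - GP3_INCLUDED_IOPS)
    extra_mbps = max(0, throughput_mbps - GP3_INCLUDED_MBPS)
    return (size_gb * GP3_PER_GB
            + extra_iops * GP3_PER_EXTRA_IOPS
            + extra_mbps * GP3_PER_EXTRA_MBPS)


size = 500
print(f"gp2 {size} GB: ${gp2_monthly(size):.2f} with {gp2_baseline_iops(size)} baseline IOPS")
print(f"gp3 {size} GB: ${gp3_monthly(size):.2f} with {GP3_INCLUDED_IOPS} baseline IOPS")
```

Because gp3 decouples performance from capacity, a small volume no longer has to be over-sized just to reach a target IOPS figure.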

18.4 Storage Gateway and on-prem caching ★★ Important

DataSync for recurring or one-time online transfers (rule of thumb: under ~100 TB); Snow Family for bulk offline transfer (Snowball Edge for petabyte-scale). Storage Gateway for ongoing on-prem ↔ S3 caching.

Core docs

Storage Gateway types — File, Volume (cached or stored), Tape
AWS DataSync — bulk one-time and recurring transfers; pricing per GB transferred
AWS Snowball Edge — the remaining orderable Snow device for petabyte-scale offline transfer (Snowcone and Snowmobile have been discontinued)
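
Whether an online transfer is feasible usually comes down to simple arithmetic on data size and link speed. A minimal sketch, assuming decimal terabytes and that only 80% of the link is usable; the function names and the deadline parameter are illustrative.

```python
def transfer_days(data_tb: float, link_mbps: float, utilisation: float = 0.8) -> float:
    """Days needed to move data_tb terabytes over a link of link_mbps megabits/s."""
    bits = data_tb * 1e12 * 8                      # decimal terabytes to bits
    usable_bits_per_second = link_mbps * 1e6 * utilisation
    return bits / usable_bits_per_second / 86_400  # seconds to days


def suggest_transfer(data_tb: float, link_mbps: float, deadline_days: float) -> str:
    """Return 'online' if the link can meet the deadline, otherwise 'offline'."""
    return "online" if transfer_days(data_tb, link_mbps) <= deadline_days else "offline"


print(f"100 TB over 1 Gbps: {transfer_days(100, 1000):.1f} days")
print(suggest_transfer(data_tb=10, link_mbps=1000, deadline_days=7))
print(suggest_transfer(data_tb=500, link_mbps=100, deadline_days=30))
```

"Online" here points towards DataSync and "offline" towards Snowball Edge; the exam usually gives the link speed and deadline so this calculation can be done in your head.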

Chapter 19 — Cost-optimised databases

Maps to Task Statement 4.3 — Design cost-optimized database solutions

Knowledge of:

AWS cost management service features (for example, cost allocation tags, multi-account billing)
AWS cost management tools with appropriate use cases (for example, Cost Explorer, AWS Budgets, AWS Cost and Usage Report)
Caching strategies
Data retention policies
Database capacity planning (for example, capacity units)
Database connections and proxies
Database engines with appropriate use cases (for example, heterogeneous migrations, homogeneous migrations)
Database replication (for example, read replicas)
Database types and services (for example, relational compared with non-relational, Aurora, DynamoDB)

Skills in:

Designing appropriate backup and retention policies (for example, snapshot frequency)
Determining an appropriate database engine (for example, MySQL compared with PostgreSQL)
Determining cost-effective AWS database services with appropriate use cases (for example, DynamoDB compared with Amazon RDS, serverless)
Determining cost-effective AWS database types (for example, time series format, columnar format)
Migrating database schemas and data to different locations and/or different database engines

19.1 Aurora Serverless v2 ★★ Important

Autoscales by ACU; the answer for "unpredictable / spiky relational workload, minimum operational overhead". v2 now supports scaling to zero (automatic pause), fine-grained 0.5-ACU steps, and current engine versions. Aurora Serverless v1 reached end of life on 31 December 2024 — design new workloads on v2.

Core docs

Aurora Serverless v2 — autoscales in 0.5 ACU increments and can scale to zero (automatic pause); supports current Aurora engine versions

19.2 DynamoDB on-demand vs provisioned ★★ Important

On-demand for unpredictable workloads (no capacity planning, higher unit cost); provisioned + auto-scaling for predictable. Reserved capacity discounts provisioned. The Standard-IA table class offers ~60% cheaper storage with higher request prices.

Core docs

Capacity modes — on-demand for unpredictable, provisioned (with auto-scaling) for predictable
Reserved capacity — discount on provisioned RCU/WCU
TTL — free deletion of expired items
Standard vs Standard-IA table class — IA is ~60% cheaper storage with higher request prices
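
The on-demand versus provisioned choice reduces to a utilisation break-even. A sketch under stated assumptions: one capacity unit serves one request per second (3,600 per hour), and the prices passed in are placeholders to be replaced with current figures from the DynamoDB pricing page.

```python
def break_even_utilisation(unit_hour_price: float, price_per_million_requests: float) -> float:
    """Fraction of provisioned capacity that must be used to match on-demand cost."""
    # Cost of one million requests if every provisioned unit is fully used.
    cost_per_million_at_full_use = unit_hour_price / 3600 * 1_000_000
    return cost_per_million_at_full_use / price_per_million_requests


def provisioned_monthly(units: int, unit_hour_price: float, hours: int = 730) -> float:
    return units * unit_hour_price * hours


def on_demand_monthly(requests: int, price_per_million_requests: float) -> float:
    return requests / 1_000_000 * price_per_million_requests


# Placeholder write prices, for illustration only.
WCU_HOUR = 0.00065
ON_DEMAND_PER_MILLION_WRITES = 0.625

ratio = break_even_utilisation(WCU_HOUR, ON_DEMAND_PER_MILLION_WRITES)
print(f"Provisioned is cheaper above roughly {ratio:.0%} average utilisation")
```

Steady traffic keeps utilisation high and favours provisioned capacity; spiky traffic leaves provisioned units idle and favours on-demand.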

19.3 RDS RIs and read replica patterns ★★ Important

RIs scope by engine + instance class + region. Stopping a DB instance pauses instance charges while storage charges continue; RDS restarts it automatically after 7 days. Read replicas can be cheaper than scaling the primary up.

Core docs

RDS Reserved Instances — engine + instance class + region scope
Stopping a DB instance temporarily — pay only for storage; auto-restart after 7 days
Read replicas to offload primary — cheaper than scaling primary up

Chapter 20 — Cost-optimised networking

Maps to Task Statement 4.4 — Design cost-optimized network architectures

Knowledge of:

AWS cost management service features (for example, cost allocation tags, multi-account billing)
AWS cost management tools with appropriate use cases (for example, Cost Explorer, AWS Budgets, AWS Cost and Usage Report)
Load balancing concepts (for example, Application Load Balancer)
NAT gateways (for example, NAT gateway costs compared with NAT instance costs)
Network connectivity (for example, private lines, dedicated lines, VPNs)
Network routing, topology, and peering (for example, AWS Transit Gateway, VPC peering)
Network services with appropriate use cases (for example, DNS)

Skills in:

Configuring appropriate NAT gateway types for a network (for example, a single shared NAT gateway compared with NAT gateways for each Availability Zone)
Configuring appropriate network connections (for example, Direct Connect compared with VPN compared with internet)
Configuring appropriate network routes to minimize network transfer costs (for example, Region to Region, Availability Zone to Availability Zone, private to public, Global Accelerator, VPC endpoints)
Determining strategic needs for content delivery networks (CDNs) and edge caching
Reviewing existing workloads for network optimizations
Selecting an appropriate throttling strategy

20.1 Data transfer pricing — the bills people don't see coming ★★★ Core

Know the rules of thumb cold: same-AZ private = free; cross-AZ = $0.01/GB each way; egress to internet ~$0.09/GB; through NAT GW adds $0.045/GB processing on top of egress; cross-region = $0.02–0.09/GB. Data transfer from an AWS origin to CloudFront is free.

Core docs

Overview of data transfer costs for common architectures — required reading; the canonical map of when bytes cost
EC2 data transfer pricing
VPC peering — same-region peering has no per-GB charge for traffic within an AZ; cross-AZ traffic is charged

Headline rules of thumb:

Within an AZ, between resources using private IPs: free.
Between AZs in the same region: $0.01/GB each direction.
To the internet from EC2: $0.09/GB for the first 10 TB/month (region dependent), after a free allowance of 100 GB/month.
Through a NAT Gateway: $0.045/GB processing fee on top of normal egress.
Between regions: $0.02–$0.09/GB depending on region pair.
Out via CloudFront: a separate, generally lower, regional rate.
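
The rules of thumb above can be turned into a small estimator. The rates are copied from the list; the path names and function are made up for this sketch, and real bills depend on region and tiering.

```python
# USD per GB, taken from the headline rules of thumb.
RATES = {
    "same_az_private": 0.00,
    "cross_az": 0.01,         # charged in each direction
    "internet_egress": 0.09,
    "cross_region": 0.02,     # low end of the 0.02 to 0.09 range
}
NAT_PROCESSING = 0.045        # added on top of whatever the traffic otherwise costs


def monthly_cost(gb: float, path: str, via_nat: bool = False) -> float:
    """Estimate monthly data transfer cost for gb gigabytes on the given path."""
    per_gb = RATES[path]
    if path == "cross_az":
        per_gb *= 2           # the sender pays on the way out, the receiver on the way in
    if via_nat:
        per_gb += NAT_PROCESSING
    return gb * per_gb


print(f"10 TB to the internet via NAT: ${monthly_cost(10_000, 'internet_egress', via_nat=True):,.2f}")
print(f"1 TB between AZs:              ${monthly_cost(1_000, 'cross_az'):,.2f}")
```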

20.2 VPC endpoints to avoid NAT and IGW data charges ★★★ Core

Single biggest "cost-optimised network" pattern on the exam. Gateway endpoints (S3, DynamoDB) are free and bypass NAT entirely. Pattern: "cut NAT Gateway data-processing fees" → gateway endpoints.

Core docs

Gateway endpoints (S3 and DynamoDB) — free; route-table based; biggest cost win for any VPC that hits S3 a lot through a NAT Gateway
Interface endpoints (PrivateLink) — hourly charge per ENI per AZ + per-GB; usually worth it for high-volume API traffic that would otherwise traverse a NAT Gateway
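
When an interface endpoint pays for itself can be estimated with a break-even calculation. The NAT processing rate is the $0.045/GB figure from 20.1; the interface endpoint rates ($0.01 per AZ-hour and $0.01 per GB) are assumptions for illustration. The NAT Gateway's own hourly charge is left out on the assumption that the NAT Gateway is still needed for other traffic.

```python
NAT_PROCESSING_PER_GB = 0.045   # from the rules of thumb in 20.1
ENDPOINT_PER_AZ_HOUR = 0.01     # assumed interface endpoint hourly rate
ENDPOINT_PER_GB = 0.01          # assumed interface endpoint data rate
HOURS_PER_MONTH = 730


def nat_processing_monthly(gb: float) -> float:
    return gb * NAT_PROCESSING_PER_GB


def interface_endpoint_monthly(gb: float, azs: int) -> float:
    return azs * HOURS_PER_MONTH * ENDPOINT_PER_AZ_HOUR + gb * ENDPOINT_PER_GB


def break_even_gb(azs: int) -> float:
    """Monthly volume above which the interface endpoint is cheaper than NAT processing."""
    fixed = azs * HOURS_PER_MONTH * ENDPOINT_PER_AZ_HOUR
    return fixed / (NAT_PROCESSING_PER_GB - ENDPOINT_PER_GB)


print(f"Break-even with 2 AZs: about {break_even_gb(2):.0f} GB/month")
```

Gateway endpoints need no such calculation: with no hourly or per-GB charge, they win at any volume of S3 or DynamoDB traffic.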

20.3 CloudFront and edge caching for cost ★★ Important

Egress from CloudFront is generally cheaper than direct from origin, and data transfer from an AWS origin to CloudFront is free. The Security Savings Bundle gives a discount in exchange for a monthly commitment.

Core docs

CloudFront pricing — data transfer out from CloudFront is generally cheaper than direct from origin, and data transfer from an AWS origin to CloudFront is free
CloudFront Security Savings Bundle — discount in exchange for a monthly commitment
CloudFront in front of S3 — cache hits avoid direct-to-S3 GET and egress costs, and the distribution adds a security layer

20.4 Direct Connect vs VPN economics ★★ Important

DX has a port-hour cost + cheaper data egress (worth it at scale); VPN is billed per connection-hour with standard egress (right for low or spiky traffic). Pattern: "sustained > 1 Gbps to on-prem" → DX; "occasional admin traffic" → VPN.

Core docs

Direct Connect pricing — port-hour + data-out per GB (cheaper than internet egress at scale)
Site-to-Site VPN pricing — per connection-hour + standard data-out
When VPN is enough — small steady traffic, occasional high traffic, no SLA needs
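
The "worth it at scale" claim can be checked with a quick comparison. All four prices below are assumptions for illustration (a 1 Gbps dedicated port, US rates), and the sketch ignores cross-connect and carrier fees, which are often the larger part of a real Direct Connect bill.

```python
HOURS_PER_MONTH = 730

# Assumed prices in USD; verify against the current pricing pages.
DX_PORT_HOUR = 0.30
DX_EGRESS_PER_GB = 0.02
VPN_CONNECTION_HOUR = 0.05
INTERNET_EGRESS_PER_GB = 0.09


def dx_monthly(gb_out: float) -> float:
    return HOURS_PER_MONTH * DX_PORT_HOUR + gb_out * DX_EGRESS_PER_GB


def vpn_monthly(gb_out: float) -> float:
    return HOURS_PER_MONTH * VPN_CONNECTION_HOUR + gb_out * INTERNET_EGRESS_PER_GB


def cheaper_option(gb_out: float) -> str:
    return "Direct Connect" if dx_monthly(gb_out) < vpn_monthly(gb_out) else "VPN"


for gb in (100, 1_000, 10_000):
    print(f"{gb:>6} GB/month out: {cheaper_option(gb)}")
```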

Chapter 21 — Cost management and governance

Cross-cutting cost management (editorial, not a separate exam task statement). This chapter draws on the cost-management knowledge and skills that recur across all four Domain 4 task statements: the cost-management service features (cost allocation tags, multi-account billing) and tools (Cost Explorer, AWS Budgets, AWS Cost and Usage Report) that every cost question assumes, plus Organizations consolidated billing, purchasing-option selection (Savings Plans, Reserved Instances), and reviewing existing workloads for savings opportunities.

21.1 Cost Explorer and Cost & Usage Reports ★★ Important

Cost Explorer for visualisation, filtering, forecasting. CUR for line-item data at up to hourly granularity, delivered to S3 and queryable from Athena or QuickSight. Cost Anomaly Detection for ML-based alerts on unusual spend.

Core docs

AWS Cost Explorer — visualisation, filtering, forecasting
Cost and Usage Reports (CUR) — line-item data at up to hourly granularity, delivered to S3, queryable via Athena/QuickSight
Cost Anomaly Detection — ML-based alerts on unusual spend

21.2 AWS Budgets ★★ Important

Cost / usage / RI-SP coverage and utilisation budgets, with SNS alerts. Budget Actions can stop EC2 or RDS instances, or attach a restrictive IAM policy or SCP, when a threshold is crossed; this appears in "enforce a hard cap" questions.

Core docs

AWS Budgets — cost, usage, RI/Savings Plans coverage and utilisation, with SNS or chatbot alerts
Budget Actions — auto-apply an IAM policy or SCP, or stop EC2 or RDS instances, when a threshold is crossed
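
A cost budget with an SNS alert can be expressed as a request for boto3's Budgets `create_budget` call. The sketch only builds the request dictionary and does not call AWS; the account ID, budget name, limit, and topic ARN are hypothetical placeholders.

```python
def build_budget_request(account_id: str, name: str, monthly_limit_usd: int,
                         alert_percent: float, sns_topic_arn: str) -> dict:
    """Build keyword arguments for a monthly cost budget with one SNS alert."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": name,
            "BudgetLimit": {"Amount": str(monthly_limit_usd), "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                "Notification": {
                    "NotificationType": "ACTUAL",          # or "FORECASTED"
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": alert_percent,
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [
                    {"SubscriptionType": "SNS", "Address": sns_topic_arn}
                ],
            }
        ],
    }


request = build_budget_request(
    account_id="111122223333",                      # placeholder account ID
    name="team-a-monthly",                          # hypothetical budget name
    monthly_limit_usd=500,
    alert_percent=80.0,
    sns_topic_arn="arn:aws:sns:us-east-1:111122223333:budget-alerts",
)
print(request["Budget"]["BudgetLimit"])

# With boto3 installed and credentials configured:
#   boto3.client("budgets").create_budget(**request)
```

An alert like this only notifies; enforcing a cap requires a Budget Action attached to the same budget.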

21.3 Tagging strategy ★ Light

Cost-allocation tags must be activated in the billing console before they appear in CUR. Tag policies in Organizations enforce a tagging schema. Tested rarely — a single "how do I split costs by team?" question.

Core docs

Tagging best practices (whitepaper)
Cost allocation tags — must be activated in the billing console before they appear in CUR
Tag policies — enforce a tagging schema across an Organization

21.4 Trusted Advisor ★★ Important

Check categories: cost optimisation, performance, security, fault tolerance, service limits, and (a later addition) operational excellence. Tested as "which service surfaces idle resources / underutilised RIs / unused EIPs?" → Trusted Advisor cost checks.

Core docs

AWS Trusted Advisor — checks grouped by category (cost optimisation, performance, security, fault tolerance, service limits, operational excellence)
Trusted Advisor check reference

21.5 AWS Organizations consolidated billing ★ Light

Volume-tier discounts and RI / Savings Plans sharing across accounts in an Org. Recognise the one-line description and move on — depth here is exam noise.

Core docs

Consolidated billing — volume-tier and RI/Savings Plans sharing across accounts in an Organization
How Savings Plans apply across accounts

Deeper reading

Study tips

Schedule the exam before you feel ready. The deadline produces the focus. Two weeks out, sit a full-length practice exam under timed conditions; the gap between your practice score and the pass mark tells you where to spend the remaining time.

SAA-C03 questions are scenario-based and verbose. Read the question once for context, then read the answers, then re-read the question with the answers in mind. Half the time the wrong answers are eliminated by a single qualifier — "highly available", "least operational overhead", "most cost-effective", "minimal application changes" — that's easy to miss on the first pass.

If two answers are technically correct, the right one is the one that aligns with the qualifier. "Highly available" rules out single-AZ; "least operational overhead" favours managed services over self-managed; "most cost-effective" favours Spot, Savings Plans, S3 Intelligent-Tiering, and serverless over provisioned capacity.

Read the FAQ for every service in the exam guide. They are short, dense, and disproportionately tested. The Well-Architected Framework is the design lens behind every question, so when in doubt, pick the answer closest to a Well-Architected best practice.

Good luck.