Notes from video course focused on the Certified Solutions Architect Associate exam.
NEXT: 287 (Section 25)
To see which services are available in a given region, visit: https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/
The user you initially sign in to AWS with is the root user; from then on, use it only when absolutely necessary.
Groups can only contain users (not other groups).
Best practice is that all users are members of at least one group, though this is not strictly necessary.
Create an admin user, add to a new group "admin", and give that group the "AdministratorAccess" policy.
The download CSV available after creating includes the credentials created.
Can create an alias for the account, which then comes with a custom login URL for that account.
Account Settings: this is where you set up security policies, like requiring MFA and password restrictions.
Setting it up per user is from the user drop-down for your account.
CLI keys are under IAM, then select your user. Don't ever create CLI credentials using the root account.
CloudShell: alternative to the CLI which works from the web console.
Roles are permissions for AWS services to run on your behalf.
IAM Credentials Report (account-level): shows things like when the password was last changed, whether MFA is enabled, etc. for all users in the account.
IAM Access Report (user-level): under IAM Users after selecting an individual user. Allows you to see when permissions were last used (good audit for what to remove).
IAM Conditions: restrict policies based upon conditions, like deny IP ranges.
- can also use for something like restricting storage to only EU regions
- can configure it based upon tags
- can force MFA for certain actions
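The conditions listed above could be written into a policy document roughly like this. It is a minimal sketch that builds the JSON in Python: the Sids, the IP range, and the region list are made-up examples, while aws:SourceIp, aws:RequestedRegion, and aws:MultiFactorAuthPresent are standard global condition keys.

```python
import json

# Illustrative policy only: Sids, CIDR range, and regions are assumptions.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Deny everything when the caller is outside an allowed IP range.
            "Sid": "DenyOutsideAllowedIp",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"NotIpAddress": {"aws:SourceIp": ["203.0.113.0/24"]}},
        },
        {   # Deny S3 requests made to non-EU regions.
            "Sid": "DenyS3OutsideEu",
            "Effect": "Deny",
            "Action": "s3:*",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:RequestedRegion": ["eu-west-1", "eu-central-1"]}
            },
        },
        {   # Require MFA before stopping or terminating instances.
            "Sid": "RequireMfaForStopTerminate",
            "Effect": "Deny",
            "Action": ["ec2:StopInstances", "ec2:TerminateInstances"],
            "Resource": "*",
            "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}},
        },
    ],
}

print(json.dumps(policy, indent=2))
```

Tag-based conditions follow the same shape, with a tag condition key in place of the keys shown here.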
IAM Roles vs Resource Based Policy
- when assuming a role, you give up old permissions and take on new ones
- with policy, you don't give up your role permissions
IAM Permission Boundaries
- allow you to set maximum permissions an IAM entity can get
- supported for users and roles, not groups
- Similar to organization SCP, prevents giving access in identity-based perms
- use cases:
- delegate responsibilities to non-administrators, like creating users
- allow developers to self-assign policies without allowing privilege escalation
- useful for restricting one specific user
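One way to picture a permission boundary is as a set intersection: an action is usable only if both the identity-based policy and the boundary allow it. The sketch below is a simplified model with hypothetical action sets; it ignores explicit denies, SCPs, resource-based policies, and wildcards.

```python
def effective_permissions(identity_policy, boundary):
    """Simplified model: allowed only if BOTH the identity-based policy
    and the permissions boundary allow the action."""
    return set(identity_policy) & set(boundary)

# Hypothetical action sets for illustration.
identity_policy = {"s3:GetObject", "iam:CreateUser", "ec2:RunInstances"}
boundary = {"s3:GetObject", "s3:PutObject", "ec2:RunInstances"}

allowed = effective_permissions(identity_policy, boundary)
print(sorted(allowed))
```

Here iam:CreateUser is granted by the identity policy but falls outside the boundary, so it is not effective; s3:PutObject is inside the boundary but was never granted, so it is not effective either.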
EC2 includes EBS, ELB, and ASG as well.
EC2 User Data - automated tasks that run once on first start-up. Runs as the root user.
(t2.micro is part of free tier up to 750 hours/month) [BTW, 31 * 24 = 744]
AMI (Amazon Machine Image)
When you launch an image from the console, it will indicate which images are "free tier" available. The simple linux one is.
When you create an instance, you may need to create a key pair. Use RSA. It's what you'll use to connect to the instance from SSH.
If you tag with "Name", that will be used in the listing in the console.
Public IPv4 will change each time you stop and start an instance.
Instance Types: https://aws.amazon.com/ec2/instance-types/ m5.2xlarge -> m: instance class; 5: generation (improves over time); 2xlarge: size within instance class
- General Purpose ("t" or "m"): good for web servers and code repositories.
- Compute Optimized ("c"): for high performance computing (batch loads, media transcoding, machine learning, gaming server, etc.) [minecraft?]
- Memory Optimized ("r" or "x"): high performance in memory, like DBs and caches. ["r" for RAM]
- Storage Optimized ("i" or "d"): high sequential read|write to local storage. OLTP, DBs, cache, data warehousing, file systems.
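The naming scheme described above (class, generation, size) can be pulled apart with a small parser. This is a sketch under the assumption that names look like "m5.2xlarge" or "c6gn.xlarge"; the function name and the returned keys are made up here, and unusual names may not match the pattern.

```python
import re

# class letters, generation digits, optional attribute letters, then ".size"
INSTANCE_TYPE = re.compile(r"([a-z]+)(\d+)([a-z-]*)\.([a-z0-9]+)")

def parse_instance_type(name):
    """Split an instance type name into class, generation, attributes, size."""
    m = INSTANCE_TYPE.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized instance type: {name!r}")
    family, generation, attributes, size = m.groups()
    return {
        "class": family,
        "generation": int(generation),
        "attributes": attributes,
        "size": size,
    }

print(parse_instance_type("m5.2xlarge"))
```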
https://instances.vantage.sh/ for comparing instance costs and specs. (ec2instances.info)
Security Groups: the firewall for your instance, e.g., only open port 22 to my home IP address to prevent others from accessing.
- locked down to a region / VPC combination.
- lives outside instance, so it doesn't even see traffic that's not allowed.
- if you timeout talking to instance, it's probably a Security Group issue. Connection refused, on the other hand, means your application is getting the request and not responding as desired.
- by default, all inbound blocked, all outbound open.
Best Practice: Maintain a separate security group just for SSH access.
Important ports to remember: 22 SSH/SFTP, 21 FTP, 80 HTTP, 443 HTTPS, 3389 RDP (MSWin remote access)
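The "all inbound blocked unless a rule allows it" behaviour can be modelled in a few lines. This is only an illustration of the idea, not how AWS evaluates rules internally; the rule list and IP addresses are hypothetical examples.

```python
import ipaddress

# Hypothetical inbound rules: (port, allowed CIDR). Anything not listed is blocked.
INBOUND_RULES = [
    (22, "198.51.100.7/32"),  # SSH only from one home IP (example address)
    (80, "0.0.0.0/0"),        # HTTP from anywhere
    (443, "0.0.0.0/0"),       # HTTPS from anywhere
]

def inbound_allowed(source_ip, port, rules=INBOUND_RULES):
    """Return True if any rule allows this source IP on this port."""
    addr = ipaddress.ip_address(source_ip)
    return any(port == p and addr in ipaddress.ip_network(cidr) for p, cidr in rules)

print(inbound_allowed("198.51.100.7", 22))  # home IP on SSH
print(inbound_allowed("198.51.100.8", 22))  # someone else on SSH
```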
To SSH, grab the public IPv4 address from the console. Grab the pem file as well.
chmod 0400 <path-to.pem>
(otherwise will get security warning)
ssh -i <path-to.pem> ec2-user@<ip-addr>
EC2 Instance Connect allows you to get a shell using the web console. Still requires port 22 to be open.
Don't ever add your AWS credentials to an instance, e.g., if you wanted to SSH in and run aws-cli commands. Should be using IAM Roles instead.
EC2 purchase options:
- On-demand. Billed per second. Highest cost, but no up-front payment. For short-term and un-interrupted workloads where you cannot predict app behavior.
- Reserved, minimum 1 year. Includes "reserved instance" and "convertible reserved instances", the latter allows for flexible instances. Up to 75% cheaper than on-demand, bigger discount for 3-year commitment. Reserves specific instance type. Best for steady-state (like DB). Convertible allows you to change instance types within committed time-frame.
- Spot Instances: like on-demand, but less reliable. Cheapest, but the price changes over time, and you can lose the instance if the spot price goes over your max price. Only use for resilient workloads, like batch jobs, distributed workloads, and jobs with flexible start/end times. 2-minute grace period. You have to cancel spot requests before terminating instances.
- Dedicated Host: reserves an entire physical server. This is for compliance requirements and server-bound license restrictions. More expensive; available on-demand or reserved for a 1- or 3-year term.
- Dedicated Instances: run on hardware dedicated to your account, but may share that hardware with other instances in the same account.
Spot Fleet: can include spot and on-demand, configured to minimize cost to address compute needs.
Elastic IP: a fixed public IPv4 address that you can use, limited to 5 per account by default. Needed because, otherwise, the public IP changes each time the instance is stopped and started. Have to pay for this.
- Best practice, avoid using this. Better to register DNS and use the random IP AWS provides. Or load balancer.
Placement Group: gives control over the EC2 instance placement strategy. This is a configuration concept and does not cost anything in and of itself.
- Cluster: low-latency group in single AZ. 10 Gbps bandwidth between instances.
- Spread: spread instances across underlying hardware, max 7 per AZ. Use for critical applications. Reduced risk of simultaneous hardware failure.
- Partition: spread across partitions (racks) within AZ, can be 100s per group. (Hadoop, Kafka, Cassandra, etc.) Instances can get access to partition meta-data.
Elastic Network Interface [ENI]: logical component in a VPC that represents a virtual network card. Bound to a specific AZ. Gives control over private IPv4s.
- Can be created and added to instances on the fly.
- Can be used for a quick network failover.
EC2 Hibernate: EBS is kept intact for the next start. RAM state is preserved (up to 150GB), so instance boot is much faster.
- Only supported by certain types ("c", "m", and "r")
- Root volume must be encrypted EBS.
- Only for on-demand and reserved instances.
- Can only hibernate up to 60 days.
- In the example, uptime is relative to the initial start time, not the restart.
EC2 Nitro: new underlying platform for EC2.
- Better performance (max 64k IOPS, vs 32k on non-Nitro)
- Includes most newer instance types (5s or up in many cases) [IOPS I/O operations per second]
vCPU: represents the number of threads * number of cores allocated in a VM.
- Can customize if you need more RAM but not the default number of CPUs (to help maintain costs)
- Or, if you want just 1 thread per core for high-performance computing (HPC) workloads.
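The vCPU arithmetic above is just a multiplication; a tiny sketch, with the core counts chosen as examples rather than taken from any particular instance type:

```python
def vcpus(cores, threads_per_core=2):
    """vCPU count = number of cores x threads per core."""
    return cores * threads_per_core

print(vcpus(4))                      # 4 cores x 2 threads -> 8
print(vcpus(4, threads_per_core=1))  # same cores, 1 thread each -> 4
```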
Capacity Reservations: have a manual or planned end date, but are not tied to a 1-3 year commitment. Guarantee that capacity is available when you need it, but carry no billing discount on their own (combine with Reserved Instances for a discount).
EBS volume: like a virtual USB stick. Can only be mounted to one instance in an AZ at a time, but can attach multiple EBSs to a single instance.
- Free tier includes 30GB of storage per month (SSD or magnetic).
- Uses network to communicate, so will be some latency versus straight to disk.
- Have to use snapshots to move across AZs.
- Have to provision GB and IOPS in advance.
Delete on Termination. Enabled by default on root, disabled for others, but can change.
Snapshots. Not necessary to detach before creating, but good idea.
Types:
- "gp": General purpose SSD
- "io": highest-performance SSD
- "st": low-cost HDD for frequent access, throughput-intensive workflows
- "sc": lowest-cost HDD for less frequent access. (only "gp" and "io" can be used as boot volumes)
Can adjust IOPS and storage capacity independently with these types.
Use Provisioned IOPS if you need more than 16,000 IOPS. io2 can get up to 64k IOPS.
io1/io2 support EBS Multi-Attach, all with read/write access. Good for high availability (like Teradata). Applications have to manage concurrent writes. Have to use cluster-aware file system.
Encryption: leverages KMS (AES-256)
- Copying an unencrypted volume does allow for encryption.
EC2 Instance Store: high-performance disk attached to instance.
- Better I/O performance, but still ephemeral.
EFS (Elastic File System): works across AZs. Scalable, but about 3x cost of "gp". Pay per use.
- Uses NFSv4.1 protocol for mounting. Only works with linux systems.
- Have to use Security Groups to access, using an "NFS" rule.
Performance mode (set at creation time)
- General purpose: latency-sensitive, like web server or CMS
- Max I/O: higher latency, but highly parallel (big data, media processing)
Throughput mode: bursting or provisioned modes
Storage tiers (life-cycle, move after n days)
- standard: for frequently accessed files
- EFS-IA (infrequent access): lower price to store, but cost to retrieve
Have to install amazon-efs-utils to mount on an instance.
AMIs are built for a specific region, though they can be copied. They give faster boot times, since most necessary dependencies can be pre-packaged.
There are vendors that sell AMIs too.
ELB (Elastic Load Balancing): managed load balancer, so AWS takes care of upgrades, maintenance, etc. Can be public or private.
Instance has to respond to port and route for health checks, responding with 200 if OK.
Types
- CLB: Classic Load Balancer, deprecated. (doesn't support IPv6)
- ALB: Application Load Balancer, supports HTTP, HTTPS, WebSocket.
- NLB: Network Load Balancer, supports TCP, TLS, UDP.
- GWLB: Gateway Load Balancer, layer 3 (network layer), IP Protocol.
Security: ELB will be open to 80/443, but instance will only be open to ELB.
ALB. Layer 7 load balancer, for HTTP traffic. Can redirect from HTTP to HTTPS at the load balancer.
- Can route based upon URL path, URL hostname (a.company.com vs. b.company.com), and even by query parameters.
- Great for micro services and container-based (Docker, ECS)
- Target Groups:
- EC2 instances, including Auto Scaling Groups
- ECS tasks
- Lambda functions, translated to JSON events
- private IP addresses (e.g., on-prem).
- Get a fixed DNS name like XXX.region.elb.amazonaws.com
- Servers don't see the IP of the client directly; it's in the X-Forwarded-For header.
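Behind an ALB, application code has to read the client address from the X-Forwarded-For header rather than from the socket. A minimal sketch, assuming the headers arrive as a plain dict with this exact key casing (real frameworks usually offer case-insensitive access); the function name is hypothetical. The left-most entry is taken as the client, which is only trustworthy when all traffic reaches the instance through the load balancer.

```python
def client_ip(headers, peer_ip):
    """Return the original client IP for a request that came through a load balancer.

    X-Forwarded-For is a comma-separated list; the left-most entry is the
    address reported for the original client. Falls back to the peer address
    when the header is missing.
    """
    forwarded = headers.get("X-Forwarded-For", "")
    first = forwarded.split(",")[0].strip()
    return first or peer_ip

print(client_ip({"X-Forwarded-For": "203.0.113.9, 10.0.0.5"}, "10.0.0.5"))
```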
NLB. Layer 4, TCP/UDP traffic. Better latency than ALB (~100ms versus ~400ms)
- Has one static IP per AZ, and supports assigning Elastic IP.
- Not included in free tier
- Target Groups:
- EC2 instances
- private IP addresses
- An ALB (like if you want the fixed IP feature)
- Security groups for instances have to accept traffic from anywhere, because this balancer is just forwarding on the request with no header modifications.
Gateway Load Balancer: to manage 3rd party appliances via AWS.
- Works at level of IP packets. GENEVE protocol on port 6081.
Sticky Sessions / Session Affinity: when client is redirected back to same instance.
- uses cookies to track, application-based or duration-based.
- Cannot use "AWSALB~" in cookie name
- works with ALB (and Classic)
- Edit the attributes on the Target Group.
Cross-Zone Load Balancing, will send evenly across instances, even if the number of instances is not even between zones.
- Without this, the balance would be even between zones, but then would be imbalanced after that if instances spread that way.
- Always on for ALB, and you don't pay for inter AZ data
- With NLB, have to pay to enable.
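The effect of cross-zone load balancing is easiest to see with numbers. The sketch below computes the share of total traffic each instance receives; the AZ names and instance counts are hypothetical.

```python
def traffic_share(instances_per_az, cross_zone):
    """Return the fraction of total traffic each instance in an AZ receives."""
    total = sum(instances_per_az.values())
    zones = len(instances_per_az)
    shares = {}
    for az, count in instances_per_az.items():
        if cross_zone:
            shares[az] = 1 / total            # spread evenly over all instances
        else:
            shares[az] = (1 / zones) / count  # even per zone, then per instance
    return shares

layout = {"az-a": 2, "az-b": 8}  # hypothetical, deliberately uneven
print(traffic_share(layout, cross_zone=True))   # {'az-a': 0.1, 'az-b': 0.1}
print(traffic_share(layout, cross_zone=False))  # {'az-a': 0.25, 'az-b': 0.0625}
```

Without cross-zone balancing, each of the two instances in az-a carries four times the load of an instance in az-b.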
SSL: Secure Sockets Layer
TLS: Transport Layer Security
Although most are using TLS now, it is colloquially still referred to as "SSL".
ACM: AWS Certificate Manager (where the X.509 certs are managed)
Typical that SSL used between user and load balancer, then HTTP over private VPC to the individual instances.
SNI: Server Name Indication, clients can specify the hostname they reach
- Addresses how to load multiple certs on a single server.
- Only works with ALB and NLB (and CloudFront)
Deregistration Delay (Connection Draining) - allowing time for in-flight request to complete before deregistering an instance.
Auto Scaling Group (ASG):
- Can scale based on CloudWatch alarms
- Can even push a custom metric into CloudWatch using the PutMetricData API
- "Launch Template" is the newer approach to configuring.
- ASGs are free, just paying for instances.
Dynamic Scaling Policies
- Target Tracking, easiest to set up, like targeting 40% CPU on all instances
- Simple / Step Scaling, like adding or removing instances when a CPU threshold is crossed.
- Scheduled Actions, because you know when load spikes.
- Predictive Scaling, so AWS uses historical load to guess when to scale. Driven by machine learning.
Good metrics:
- CPU averages across instances.
- Request counts per target
- Average network in / out (when network bound)
Scaling Cooldown, default 300 seconds, so no autoscaling will happen during this time to allow for things to settle.
- use a ready-to-use AMI to reduce the cooldown time needed
Default termination policy
- first, select AZ with most instances
- then, delete one with oldest launch config
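The two steps of the default termination policy, as summarized above, can be sketched like this. It is a simplified model of the rule as stated in these notes, not the full AWS behaviour; the instance records and the "launch_config_version" field (lower means older) are hypothetical.

```python
from collections import Counter

def pick_instance_to_terminate(instances):
    """instances: list of dicts with 'id', 'az', and 'launch_config_version'."""
    # Step 1: find the AZ with the most instances.
    per_az = Counter(i["az"] for i in instances)
    busiest_az = max(per_az, key=per_az.get)
    # Step 2: within that AZ, pick the instance with the oldest launch config.
    candidates = [i for i in instances if i["az"] == busiest_az]
    return min(candidates, key=lambda i: i["launch_config_version"])["id"]

instances = [
    {"id": "i-a1", "az": "az-a", "launch_config_version": 2},
    {"id": "i-a2", "az": "az-a", "launch_config_version": 1},
    {"id": "i-a3", "az": "az-a", "launch_config_version": 2},
    {"id": "i-b1", "az": "az-b", "launch_config_version": 1},
    {"id": "i-b2", "az": "az-b", "launch_config_version": 1},
]
print(pick_instance_to_terminate(instances))  # i-a2
```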
Lifecycle Hooks, can set additional measures of when an instance is ready or terminating.
RDS engines include PostgreSQL, MySQL, MariaDB, Oracle, Microsoft SQL Server, and Aurora
RDS is a managed service, so automated provisioning and OS patching. But, cannot SSH into them. Storage backed by EBS. 20GB minimum.
Daily full backups, and transaction logs backed up every 5 minutes. 7-day retention default, but up to 35 days.
DB snapshots are manually triggered by user and can be kept as long as you want.
Scales automatically when running out of space. Have to set a "Maximum Storage Threshold" to prevent this from getting out of hand.
Up to 5 read replicas, in same AZ, cross AZs, or cross regions. Async, eventually consistent replication. Can be promoted to main if needed to stand as independent DB.
- If replica is in the same region, you don't pay network fee for data transfer, since it's a managed service. (Across regions costs $$$)
Multi AZ (disaster recovery). Synchronous replication to a stand-by in another AZ, with a single DNS name that automatically fails over to the stand-by in case of failure of the master. Not useful for scaling.
- No manual intervention required for this to work.
Changing from single to multi-AZ is a 0-downtime operation, using a snapshot.
Can use a t2.micro with gp2 SSD to stay in free tier.
The estimated monthly cost will be listed when creating.
Security - Encryption
- At rest encryption: Can set at creation time, but if not set for master cannot be applied to read replicas.
- In-flight requires a certificate to encrypt data during communication.
To enforce:
- Postgres -> rds.force_ssl=1 in AWS RDS Console Parameter Groups
- MySQL -> GRANT USAGE ON *.* TO 'mysqluser'@'%' REQUIRE SSL;
- Can copy an unencrypted snapshot into an encrypted one. Restoring this would enable encryption if not set initially.
Should only expose DB within private subnet, using security groups.
Can use IAM-based auth to login for PostgreSQL and MySQL. Uses token with a 15-minute lifespan.
- Allows you to use IAM instead of DB users for security.
Aurora: compatible with PostgreSQL and MySQL. Optimized, so 5x performance over MySQL on RDS and 3x performance over PostgreSQL on RDS.
- Automatically grows, starting at 10GB and up to 128TB.
- Up to 15 read replicas (vs. 5 RDS), with faster replication and failover.
- 20% more expensive
Stores 6 copies of your data across 3 AZs. Needs 4 of 6 to write, 3 of 6 to read. Self-healing with peer-to-peer replication.
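The quorum numbers above translate directly into what survives a failure. A small sketch, with the constant and function names chosen here for illustration:

```python
TOTAL_COPIES = 6   # 2 copies in each of 3 AZs
WRITE_QUORUM = 4   # copies needed to accept a write
READ_QUORUM = 3    # copies needed to serve a read

def can_write(healthy_copies):
    return healthy_copies >= WRITE_QUORUM

def can_read(healthy_copies):
    return healthy_copies >= READ_QUORUM

# Losing a whole AZ removes 2 copies and leaves 4: reads and writes continue.
print(can_write(TOTAL_COPIES - 2), can_read(TOTAL_COPIES - 2))  # True True
# Losing an AZ plus one more copy leaves 3: reads continue, writes do not.
print(can_write(TOTAL_COPIES - 3), can_read(TOTAL_COPIES - 3))  # False True
```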
Only one instance ("master") takes writes, with <30s failover.
Can set up auto-scaling for read replicas. Load balance is at connection level (not statement)
Backtrack: restores data at any point in time without using backups.
Security is same as RDS
Can define a custom endpoint which accesses just some of the read replicas, for example if some of them are beefier machines. Generally, the default reader endpoint is not used together with custom endpoints; instead, each use case gets its own custom endpoint.
Serverless: when you have infrequent, intermittent, or unpredictable workloads. No capacity planning needed. You pay per second (and can be cheaper as a result).
Multi-Master: when you want HA (high availability) with immediate failover. Every node does R/W.
Global: multi-region (up to 5 secondary, read-only). Up to 16 replicas per region.
- RTO (Recovery Time Objective) of <1min to fail over to another region.
Machine Learning: Aurora has integration with SageMaker (any ML model) and Comprehend (for sentiment analysis)
ElastiCache: managed Redis or Memcached. Helps take load off DBs with heavy reads.
- Using ElastiCache does involve heavy code changes, since you are responsible for managing hitting the cache before hitting the DB and cache invalidation.
- Useful for storing user sessions in stateless apps.
Redis: multi-AZ with auto-failover, read replicas to scale reads. Has data durability (with AOF persistence). Backup and restore features.
Memcached: multi-node for partitioning data (sharding). No replication, persistence, backup, or restore.
Security: Does not support IAM, so policies are only used for AWS API-level security.
- Redis AUTH available as extra level of security.
- Memcached supports SASL
Patterns:
- Lazy Loading: read data is cached and can become stale
- Write Through: adds or updates when written to DB
- Session Store: temporary, TTL features
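The first two patterns can be contrasted with a plain dict standing in for the cache. This is an illustrative sketch only: FakeDb, get_lazy, and put_write_through are made-up names, and a real setup would talk to Redis or Memcached and an actual database.

```python
class FakeDb:
    """Stand-in for a real database; counts reads so cache hits are visible."""
    def __init__(self):
        self.rows = {}
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.rows.get(key)

    def write(self, key, value):
        self.rows[key] = value

def get_lazy(cache, db, key):
    """Lazy loading: check the cache first, fall back to the DB on a miss."""
    if key in cache:
        return cache[key]
    value = db.read(key)
    cache[key] = value
    return value

def put_write_through(cache, db, key, value):
    """Write through: update the DB and the cache together."""
    db.write(key, value)
    cache[key] = value

db, cache = FakeDb(), {}
db.write("user:1", "alice")
get_lazy(cache, db, "user:1")            # miss: reads the DB, fills the cache
get_lazy(cache, db, "user:1")            # hit: served from the cache
db.write("user:1", "bob")                # DB changed without touching the cache
stale = get_lazy(cache, db, "user:1")    # lazy loading returns the old value
put_write_through(cache, db, "user:1", "carol")
fresh = get_lazy(cache, db, "user:1")    # write-through keeps the cache current
print(db.reads, stale, fresh)
```

The stale read is the trade-off of lazy loading; write-through avoids it at the cost of a cache write on every DB write.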
Use cases
- Redis: classic is gaming leaderboard, using sorted sets to guarantee uniqueness and ordering
RDS Databases ports:
- PostgreSQL: 5432
- MySQL/MariaDB: 3306
- Oracle RDS: 1521
- MSSQL Server: 1433
DNS:
- Domain Registrar: Amazon Route 53, GoDaddy, etc.
- DNS Records: A, AAAA, CNAME
- Zone File: contains DNS records
- Name Server: resolves DNS queries (authoritative or non-authoritative)
- Top Level Domain (TLD): .com, .gov, .org, ...
- Second Level Domain (SLD): amazon.com, google.com, ...
- Root DNS Server, managed by ICANN (routes to .com)
- TLD DNS Server, managed by IANA (branch of ICANN) (i.e., where in .com is the address?)
- SLD DNS Server, managed by domain registrar (e.g., Amazon Registrar)
Route 53 allows you to update the DNS records. "Authoritative DNS"
- Also a Domain Registrar
- Can check the health of your resources
- only AWS service with 100% SLA
- "53" because that's the traditional DNS port
Record Types:
- A: maps to IPv4
- AAAA: maps to IPv6
- CNAME: hostname to other hostname, target must be A/AAAA.
- Cannot create to top node (Zone Apex). e.g., can't do "example.com" only "www.example.com"
- NS: Name Servers for the Hosted Zone, controls how traffic routed to domain.
Hosted Zone: container for records defining subdomain routes.
- Public Hosted Zones, open internet access to subdomain.
- Private Hosted Zones, route limited to VPCs ("private.example.internal")
- $0.50/month per hosted zone
TTL: on DNS record. Longer has risk of going stale, but shorter is $$$
- works for every record except Alias
- If you are planning a change, can set a low TTL for a day or so, let clients adjust, then set back to a high TTL once change is complete.
CNAME vs Alias
- CNAME doesn't work for root domain, Alias can
- Aliases are free of charge and have native health check
- Aliases always A/AAAA, and cannot set TTL
- Alias automatically changes resource IP address based upon AWS allocations
Alias targets:
- include ELB, CloudFront Distributions, API Gateway, Elastic Beanstalk, S3 Websites, VPC Interface Endpoints, Global Accelerator, Route 53 records in the same hosted zone
- Cannot use Alias for an EC2 DNS name
Routing Policies:
- Support for Simple, Weighted, Failover, Latency based, Geolocation, Multi-Value Answer, Geoproximity (using Route 53 Traffic Flow)
- Simple: single resource. If multiple values in record, random is chosen (client gets the list and chooses one)
- Cannot be associated to a Health Check
- Weighted: control % of requests going to each resource, allows Health Checks.
- Set weight to 0 to pull instance out of balancer
- Latency: should end up being the closest instance, allows Health Checks.
- You have to indicate the region, AWS doesn't currently inspect this
- Geolocation: based upon user location.
- Should have default in case there's no match.
- Can be used for localization too
- Geoproximity: based upon users and resources, based upon 'bias' (-99 to 99)
- Can specify latitude and longitude for non-AWS resources
- Multi-Value: up to 8 healthy records per query, up to client to select one.
- Not a substitute for ELB
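For the Weighted policy above, the share of requests each record receives is its weight divided by the sum of all weights. A minimal sketch with hypothetical record names and weights; the all-zero case is not modelled here.

```python
def weighted_shares(weights):
    """Map each record name to its expected fraction of requests."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all weights are zero; this case is not modelled in the sketch")
    return {name: weight / total for name, weight in weights.items()}

# Hypothetical records: 'old' is pulled out of rotation with weight 0.
shares = weighted_shares({"blue": 70, "green": 30, "old": 0})
print(shares)  # {'blue': 0.7, 'green': 0.3, 'old': 0.0}
```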
Health Checks: mainly for public resources
- Types:
- monitor a single endpoint
- 15 global checkers, default threshold of 3 OK is healthy (>18%)
- Can set interval, 30 sec default, setting lower costs more money
- 2xx and 3xx responses are healthy
- Can parse text in first 5120 bytes of response to set pass/fail
- have to open to Route 53 servers (documented online)
- monitor other health checks (Calculated Health Checks)
- combine checks with OR, AND, or NOT
- monitor up to 256 child health checks
- monitor CloudWatch Alarms (e.g., throttles in DynamoDB, alarms on RDS)
- Can be used to monitor private (behind VPC) resources with alarms
Failover: Active-Passive
- have to use a Health Check to failover from primary to secondary
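Putting the health-check threshold and Active-Passive failover together, as a rough model of the behaviour described in these notes: an endpoint counts as healthy when more than 18% of the checkers report OK, and the secondary is returned only when the primary is unhealthy. The function names and hostnames are hypothetical.

```python
HEALTHY_FRACTION = 0.18  # from the notes: healthy when >18% of checkers report OK

def endpoint_healthy(checker_results):
    """checker_results: one boolean per health checker (True = OK)."""
    ok = sum(1 for result in checker_results if result)
    return ok / len(checker_results) > HEALTHY_FRACTION

def failover_target(primary_results, primary, secondary):
    """Active-Passive: answer with the primary unless its health check fails."""
    return primary if endpoint_healthy(primary_results) else secondary

three_ok = [True] * 3 + [False] * 12   # 3 of 15 checkers OK = 20%
two_ok = [True] * 2 + [False] * 13     # 2 of 15 checkers OK, about 13%
print(failover_target(three_ok, "primary.example.com", "secondary.example.com"))
print(failover_target(two_ok, "primary.example.com", "secondary.example.com"))
```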
Traffic Flow: Visual editor for configuration, especially helpful for approaches like Geoproximity. Includes a map to show how it will affect flow.
- $50/month just to create.
Domain registrars: Will need to update Nameservers with the registrar to point at the Hosted Zone details from AWS.
Well Architected Application, 5 pillars: costs, performance, reliability, security, operational excellence.
Stateless Web App: WhatIsTheTime.com
- Don't need a DB, can scale vertically and horizontally
Stateful Web App: MyClothes.com
- Need to maintain a shopping cart. Options:
- Stickiness in ELB?
- Stateful cookies (containing shopping cart)? (heavier payloads up to 4KB, security risks)
- session_id and store cart in ElastiCache (or DynamoDB)
- Storing user info in a DB (RDS)
Shared Storage: MyWordPress.com
- Using EFS to share common storage between redundant instances (vs. EBS)
- Could use Aurora for Multi-AZ
Instantiating Quickly
- EC2 Instances:
- Golden AMI: all OS dependencies installed ahead of time for faster startup
- Bootstrap using User Data (for dynamic config)
- Hybrid of these two is Elastic Beanstalk
- RDS and EBS: use snapshots to speed up
Elastic Beanstalk: developer-centric view of AWS deployment. Bundles configuration of multiple services, so you pay for those services when they launch.
Supports a lot of language environments, as well as docker and custom.
Components:
- Application: collection of Beanstalk components (environments, version, config, ...)
- Application Version: an iteration of your app
- Environment: Collection of AWS resources. Can have multiple (dev, test, prod)
- Tiers: Web Server Environment Tier (ELB-centered) & Worker Environment Tier (SQS-centered)
S3: allows storage of objects (files) into "buckets" (directories). "key" is full path to file in bucket, made up of "prefix" and "object name"
Bucket names must be globally unique. Defined at region level. But S3 console is global.
Naming conventions: No uppercase, no underscore, 3-63 characters long, not an IP, start with lowercase letter or number
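The naming rules listed above can be checked mechanically. This sketch covers only the rules in these notes (the official rules have a few more, such as how a name may end); the function name is made up.

```python
import ipaddress
import re

# Starts with a lowercase letter or digit; 3-63 characters in total;
# only lowercase letters, digits, dots, and hyphens (so no uppercase, no underscore).
BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{2,62}")

def is_valid_bucket_name(name):
    if not BUCKET_NAME.fullmatch(name):
        return False
    try:
        ipaddress.ip_address(name)  # names formatted like an IP are rejected
        return False
    except ValueError:
        return True

for candidate in ["my-bucket-01", "My-Bucket", "my_bucket", "ab", "192.168.1.1"]:
    print(candidate, is_valid_bucket_name(candidate))
```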
Max object size: 5TB. Can only upload 5GB at a time, so have to use multi-part upload for anything larger.
Versioning: overwrites existing files, but keeps history.
- Version ID will be null if uploaded when versioning not enabled.
- When you delete, it only adds a marker, and you can see older versions when "List versions" is enabled.
- if you delete from the versioned view, it will be permanent
Encryption:
- SSE-S3: encrypts S3 objects using keys handled and managed by AWS (AES-256)
- must set header "x-amz-server-side-encryption":"AES256"
- SSE-KMS: leverage AWS Key Management Service
- advantage of user control over keys and audit trail
- must set header "x-amz-server-side-encryption":"aws:kms"
- "CMK": Customer Master Key
- SSE-C: manage your own keys
- Amazon doesn't store keys, so have to use HTTPS to transfer, including data key in header.
- Client Side Encryption
- There is "S3 Encryption Client" from AWS that can be used.
Security:
- User based (IAM policies), in JSON
- can block public access through these settings
- Resource based: bucket policies, Object Access Control List (ACL) or bucket ACL (less common)
- access is granted if IAM permissions OR resource permissions allow it, and nothing explicitly denies it.
- Can log and audit and store in another S3 bucket or access via CloudTrail
- Can enable MFA for deletion.
- Can have pre-signed URLs to resources with a limited time of access
IAM: Difference between bucket and bucket/*
- list actions are on bucket level, like arn:aws:s3:::blah
- get/put/delete are on objects level, like arn:aws:s3:::blah/*
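A policy that needs both kinds of action therefore carries two statements, one per ARN form. A minimal sketch built in Python, reusing the bucket name "blah" from above; the particular choice of actions is illustrative.

```python
import json

BUCKET = "blah"  # bucket name used in the notes

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Bucket-level action: the resource is the bucket itself.
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {   # Object-level actions: the resource is everything inside the bucket.
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
    ],
}

print(json.dumps(policy, indent=2))
```

Attaching s3:ListBucket to the /* resource (or s3:GetObject to the bare bucket ARN) is a common reason a policy appears to have no effect.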
S3 Websites: host static sites from <bucket-name>.s3-website.<region>.amazonaws.com
- will get 403s if policy doesn't expose to public
CORS [Cross-Origin Resource Sharing]
- origin is a scheme (protocol), host (domain), and port (e.g., https://www.example.com)
- browser-based security to prevent access across origins
- CORS Headers: "Access-Control-Allow-Origin" and "Access-Control-Allow-Methods"
- These are defined on the secondary resource, allowing the primary resource to access.
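Whether a request is cross-origin comes down to comparing the (scheme, host, port) triple of the page with that of the resource. A small sketch using only the standard library; the default-port table and function names are assumptions made for the example.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return the (scheme, host, port) triple that defines an origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def is_cross_origin(page_url, resource_url):
    return origin(page_url) != origin(resource_url)

print(origin("https://www.example.com/index.html"))
print(is_cross_origin("https://www.example.com/a", "https://www.example.com:443/b"))
print(is_cross_origin("https://www.example.com/a", "https://other.example.com/img.png"))
```

Only in the cross-origin case does the browser look for the CORS headers on the response from the second resource.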
Consistency Model
- Strong consistency as of December 2020
- After a successful PUT or DELETE, subsequent read or list request reflects changes. "read after write consistency" and "list consistency"
Can create in-line policies attached to a role, but not recommended.
Can use UI to create policies, and there is an AWS Policy Generator tool as well.
AWS Policy Simulator: Tool online for testing your policies
EC2 Instance Metadata: Allows EC2 instances to learn about themselves without using an IAM role!
- http://169.254.169.254/latest/meta-data/ (internal to AWS from instance, trailing "/" important) [also /user-data/ and /dynamic/ at same base]
- You can get IAM Role from meta data, but not the IAM Policy
- "Metadata": info about the EC2 instance, vs.
- "Userdata": launch script of the EC2 instance
- Don't need an IAM role to access this information.
SDK: The CLI uses the Python SDK (boto3).
- If you don't select a region, SDK defaults to "us-east-1"
- Some things have to be done through the SDK, not available in the console
MFA-Delete: can only be enabled via CLI and by root account
- need to create an API token to do this, BUT not recommended to use those creds for anything else.
- Best to deactivate and delete after use.
aws configure --profile root-access-<you-decide>
aws s3api put-bucket-versioning --bucket <bucket-name> \
--versioning-configuration Status=Enabled,MFADelete=Enabled \
--mfa "<arn-of-mfa-device> <mfa-code>" \
--profile root-access-<you-decide>
^^ That "mfa-code" is the actual 6-digit code right now.
- To disable, same command but "MFADelete=Disabled"
Encryption: few ways including two different policies (DenyIncorrectEncryptionHeader and DenyUnEncryptedObjectUploads)
- Can set default encryption in S3
- Bucket Policies are evaluated before "default encryption"
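The two policy statements named above are usually written as a pair of Deny statements on s3:PutObject. A sketch for the SSE-S3 case, built in Python; the bucket name "my-bucket" is a placeholder, and for SSE-KMS the expected header value would be "aws:kms" instead of "AES256".

```python
import json

BUCKET = "my-bucket"  # placeholder bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Reject uploads whose encryption header is not AES256.
            "Sid": "DenyIncorrectEncryptionHeader",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            "Condition": {
                "StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}
            },
        },
        {   # Reject uploads that carry no encryption header at all.
            "Sid": "DenyUnEncryptedObjectUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
        },
    ],
}

print(json.dumps(policy, indent=2))
```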
Access Logs: allows logging of all interaction with S3.
- WARN: Never set logging bucket to be one you are monitoring.
S3 Replication (CRR & SRR)
- have to have versioning enabled to use
- Cross Region Replication (CRR) use cases: compliance, lower latency access, replication across accounts
- Same Region Replication (SRR) use cases: log aggregation, replication between prod and test accounts.
- Buckets can be in different accounts
- copying is asynchronous
- Must have proper IAM permissions to S3
- only NEW objects are replicated, not retroactive
- optional how to handle deletions
- version ID is the same between replications
- No chaining of replication
Pre-signed URLs
- for downloads can use SDK or CLI, but uploads must use SDK
- default expiration is 1 hour, change with --expires-in [seconds]
- URL will inherit the permissions of the role used to create URL, so can include GET / PUT
aws s3 presign s3://<bucket>/<key> --region <region> --expires-in 300
If there are issues, may have to do
aws configure set default.s3.signature_version s3v4
Storage Classes:
- Standard: High durability (11 9s)
- Standard-Infrequent Access (IA): for anything less frequently accessed, but requires rapid access when needed. Lower cost vs Standard
- Good for disaster recovery and backups
- minimum storage is 30 days
- minimum size (charge) is for 128KB
- One Zone-Infrequent Access: about 20% cheaper than IA, but 99.5% availability
- Good for storing secondary backups or data that can be recreated (like thumbnails).
- Intelligent Tiering: small fee to have AWS move data between Standard and IA based upon access patterns
- Glacier: Ideal for data needing long-term retention (think 10 years), alternative to on-prem magnetic tape
- $0.004/GB + retrieval cost
- called "archive" not objects, and up to 40TB each, and "vaults" not buckets
- Retrieval options: Expedited (1–5 min), Standard (3–5 hr), Bulk (5–12 hr)
- minimum storage duration is 90 days
- minimum size (charge) is for 40KB
- Glacier Deep Archive:
- Retrieval options: Standard (12 hr), Bulk (48 hr)
- minimum storage duration is 180 days
- (There's also a Reduced Redundancy Storage which is deprecated)
Lifecycle Rules: Allows transition and expiration actions
- e.g., move to IA after 60 days, Glacier after 6 months, delete after x days
- Can use to remove older "versions"
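The example rule above could be expressed as a lifecycle configuration like the following. The dict follows the shape the S3 API uses for lifecycle rules; the rule ID, the 365-day expiration, and the 30-day cleanup of old versions are assumptions, since the notes leave those values open, and 6 months is approximated as 180 days.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire",       # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},          # empty prefix: applies to all objects
            "Transitions": [
                {"Days": 60, "StorageClass": "STANDARD_IA"},
                {"Days": 180, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},       # assumed value for "x days"
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},  # assumed value
        }
    ]
}

def transitions_in_order(rule):
    """Sanity check: transitions happen in increasing day order, before expiration."""
    days = [t["Days"] for t in rule["Transitions"]]
    return days == sorted(days) and rule["Expiration"]["Days"] > days[-1]

print(transitions_in_order(lifecycle_configuration["Rules"][0]))  # True
```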
Storage Class Analytics: Daily report, takes 24–48 hrs to start, for transitioning between Standard and IA (but not One-Zone or Glacier)
- Good first step for improving Lifecycle Rules
Baseline performance:
- 100-200ms latency
- at least 3500 PUT/COPY/POST/DELETE per second (per prefix)
- at least 5500 GET/HEAD per second (per prefix)
- "prefix" -> anything in the key up to the filename
KMS Limitation: Using SSE-KMS can hit up against KMS limits
- Uses GenerateDataKey and Decrypt APIs when accessing, so counts against quota (can be 5500, 10000, or 30000 req/s depending upon region)
- Have to use Service Quotas Console to go over those limits
- Have to use Service Quotas Console to go over those limits
Multi-Part upload: parallelizes uploads to speed them up
- recommended for > 100MB
- required for > 5GB
S3 Transfer Acceleration: moves files to an edge location to speed up transfers
- e.g., in the U.S. you upload to a U.S. edge location, and the transfer to an Australian bucket is done inside AWS for you.
Byte-Range Fetches: for speeding up GETs (and better resilience)
- pulls byte chunks of source file
- can also be used to only retrieve header
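Both multi-part uploads and byte-range fetches rely on cutting an object into contiguous chunks. The sketch below generates the HTTP Range header values for a download; the function name is made up and the sizes in the example are tiny on purpose.

```python
def byte_ranges(total_size, chunk_size):
    """Return HTTP Range header values covering bytes 0..total_size-1."""
    ranges = []
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1  # Range ends are inclusive
        ranges.append(f"bytes={start}-{end}")
    return ranges

print(byte_ranges(10, 4))  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```

Each range can be fetched in parallel and retried on its own, which is where the speed-up and the better resilience come from; requesting only the first range retrieves just the start of the file.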
S3 Select & Glacier Select: use SQL queries for server side filtering to retrieve less data.
- Can only filter by rows and columns, but pushes filtering server-side for speed up
Event Notifications:
- options: SNS topic (to email, SMS, or HTTP endpoints), SQS queue, Lambda function
- typically delivered within seconds, but can take longer than 1 min
- Need versioning enabled to ensure notifications on all writes
Requester Pays: Owner pays for storage, but requester pays for networking costs
- Requester must be logged into AWS.
- Good for sharing cost of large datasets
Glacier Vault Lock: to adopt a WORM (Write Once Read Many) model
- A file can only be written once, helpful for compliance and data retention.
S3 Object Lock: WORM
- have to designate time to block deletion
- Retention: 1) Retention Period (fixed period), 2) Legal Hold (no expiry date)
- Modes: 1) Governance (block overwrite/deletion except with special permissions) 2) Compliance (even root cannot overwrite/delete!)
Serverless query service to perform analytics against S3 objects
- Uses standard SQL
- Supports CSV, JSON, ORC, Avro, and Parquet (on Presto)
- ~$5/TB data scanned, so compressed and columnar data saves $
- Can connect to QuickSight for reporting
- Use cases: BI, analytics, analyzing VPC Flow Logs, ELB logs, CloudTrail
- search online for "Athena Query Access Logs" for SQL examples to set up Athena
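Since pricing is per TB scanned, a quick sketch shows why columnar or compressed data is cheaper to query. The ~$5/TB figure is from the notes; treating 1 TB as 1024**4 bytes and the "only 10% of the bytes are scanned" ratio are assumptions for illustration.

```python
PRICE_PER_TB = 5.0      # USD per TB scanned, approximate figure from the notes
BYTES_PER_TB = 1024**4  # assumption: binary terabytes

def scan_cost(bytes_scanned):
    """Approximate query cost in USD for the given number of bytes scanned."""
    return PRICE_PER_TB * bytes_scanned / BYTES_PER_TB

raw_csv_bytes = 2 * BYTES_PER_TB        # query scans a full 2 TB CSV dataset
columnar_bytes = raw_csv_bytes // 10    # hypothetical: Parquet scan touches ~10%

print(round(scan_cost(raw_csv_bytes), 2))    # 10.0
print(round(scan_cost(columnar_bytes), 2))   # 1.0
```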
Content Delivery Network (CDN), to improve read performance cached at the edge. At least 216 points of presence globally (edge locations).
- includes DDoS protection, integration with Shield, AWS Web Application Firewall
Origins:
- S3 bucket, includes enhanced security with Origin Access Identity (OAI)
- Can also be used as an ingress for uploading S3 files
- Custom Origin (HTTP), like ALB, instance, S3 as website, any HTTP service.
Acts as a cache for requests. Passes along entire request, including headers and query strings.
With ALB or EC2 instances, the origin must be publicly available for CloudFront to access them. There's a list of the CloudFront IPs that you can search for to add to security groups.
Geo Restriction: can use whitelist or blacklist, to prevent banned countries from accessing (like for copyright laws, etc.)
CloudFront vs S3 Cross Region Replication
- CloudFront: files are cached for a TTL, great for static content that must be available everywhere.
- S3 CRR: must be set up in each region you want it, changes are near real-time, read-only, great for dynamic content that needs to be available at low cost in a few regions.
Signed URL / Signed Cookies
- to restrict access to content, such as premium or paid access.
- Needs a policy with:
- URL expiration, can be minutes or years
- IP ranges to accept from
- Trusted signers (which AWS accounts can create the signed URLs)
- Signed URL: one per file
- Signed Cookies: allow access to multiple files.
- Your application has to use the SDK to generate the Signed URL, then return to the client to use for access.
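A sketch of the policy document described above, in the JSON shape CloudFront uses for custom policies. The URL, expiry, and CIDR are made-up values. Signing the policy with the private key is a separate step that is not shown here.

```python
import json

def custom_policy(resource_url, expires_epoch, source_cidr):
    """Build a CloudFront custom policy limiting access by expiry time and client IP."""
    return {
        "Statement": [
            {
                "Resource": resource_url,
                "Condition": {
                    "DateLessThan": {"AWS:EpochTime": expires_epoch},  # URL expiration
                    "IpAddress": {"AWS:SourceIp": source_cidr},        # accepted IP range
                },
            }
        ]
    }

# Hypothetical distribution URL, expiry timestamp, and documentation CIDR block
policy = custom_policy(
    "https://d111111abcdef8.cloudfront.net/premium/*", 1767225600, "192.0.2.0/24"
)
# CloudFront expects the policy serialized without whitespace before signing.
print(json.dumps(policy, separators=(",", ":")))
```

A wildcard in `Resource` is what lets one signed cookie policy cover multiple files, while a signed URL is issued per file.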
CloudFront Signed URL vs S3 Pre-Signed URL
- CloudFront allows access to path no matter the origin; account wide key-pair; can filter by IP, path, date, expiration; and leverages caching
- S3 uses the creator's IAM access for the URL, and has a limited lifetime
Price Classes:
- All: best performance
- 200: most regions, but excludes most expensive regions (North America, Europe, Asia, and Africa, NOT South America or Australia)
- 100: only the least expensive regions (North America & Europe)
Multiple Origin: can set different behaviors based upon path (e.g., /images/* vs /api/*).
Origin Groups: To increase high availability, with primary and failover origins.
Field Level Encryption: additional layer along with HTTPS
- Information encrypted at the edge close to the user, using asymmetric encryption
- Usage: designate set of fields in POST to encrypt (like Credit Card info)
- Final web server then has access to key to decrypt.
Unicast vs Anycast IP
- Unicast: one server holds the IP
- Anycast: multiple servers have same IP address and client routed to closest
Leverages AWS internal network to route to your application
- Creates two IP addresses for your app
- works with Elastic IP, EC2, ALB, NLB, public or private
- No issue with client cache because IP doesn't change regardless of location
- Includes health checks with <1min failover
- Easy to whitelist with only 2 IPs
- DDoS protection with AWS Shield
vs CloudFront
- proxying packets at edge
- good for non-HTTP use cases, like gaming (UDP), IoT (MQTT), or VoIP
- good for HTTP cases that require a static IP address
Highly secure, portable devices to collect and process data at the edge, migrating data in and out of AWS.
- Data migration: Snowcone, Snowball Edge, Snowmobile
- Edge computing: Snowcone, Snowball Edge
These are offline devices that AWS sends to you for migrating massive amounts of data. AWS puts the data in S3 for you. Rubric: if it would take more than a week to transfer data, use a Snowball device.
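The one-week rubric can be checked with simple arithmetic. A sketch, assuming decimal terabytes and a link that is fully available for the transfer; the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400

def transfer_days(data_tb, link_mbps):
    """Days needed to push data_tb terabytes over a link of link_mbps megabits/sec."""
    bits = data_tb * 1e12 * 8                 # decimal TB -> bits
    return bits / (link_mbps * 1e6) / SECONDS_PER_DAY

def use_snowball(data_tb, link_mbps):
    """Apply the rubric from the notes: more than a week of transfer -> Snowball."""
    return transfer_days(data_tb, link_mbps) > 7

print(round(transfer_days(100, 1000), 2))   # 100 TB over 1 Gbps: about 9.26 days
print(use_snowball(100, 1000))              # True
print(use_snowball(10, 1000))               # False
```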
Snowball Edge: physical data transport for TBs or PBs of data.
- Storage Optimized: 80TB of HDD capacity
- Compute Optimized: 42TB of HDD capacity
- Can cluster up to 15 together
Snowcone: smaller device than Edge, more rugged. 8TBs of storage.
- Better if the Edge device is too large.
- Can send back to AWS or connect to AWS DataSync.
Snowmobile: is an actual truck. Exabytes of data (1 EB = 1000 PB = 1000000 TB)
- each snowmobile is 100 PB of capacity
- recommended for > 10 PB
Edge computing: processing data when it is away from the internet, like a truck, ship, or in a mine. Assumes limited or no internet access.
- Edge devices can run EC2 instances or Lambdas directly (AWS IoT Greengrass)
AWS OpsHub: GUI you can install to manage these Snow devices
Snowball cannot import into Glacier directly, so would have to use a Lifecycle Policy to transfer.
Launch 3rd party high-performance file systems on AWS, like Lustre, Windows File Server, or NetApp ONTAP. Fully managed.
Windows: supports SMB protocols and NTFS. Active Directory integration.
- This is what you would use for shared Windows FS (EFS is POSIX)
Lustre: "Linux" + "cluster", parallel distributed file system for large-scale computing.
- Machine learning or High Performance Computing (HPC)
- e.g., Video processing, financial modeling, electronic design automation
- Seamless integration with S3
File System deployment options:
- Scratch: temporary, data is not replicated. But high bursts (6x faster)
- good for short-term processing to optimize costs
- Persistent: long-term storage, replicated in AZ, replace failed files in minutes
- usage: long-term processing, sensitive data
For hybrid cloud solutions, some in AWS some on-prem. Bridge storage to S3. Allows for import into EBS, S3, or Glacier. This is something installed in our corporate data center to smooth communication to AWS.
Three types:
- File Gateway: using NFS and SMB protocol. Can be integrated with Active Directory locally.
- Can get native access for Windows File Server.
- Includes local cache for frequently accessed data.
- Volume Gateway: block storage using iSCSI protocol
- Cached volumes: low latency access to most recent data
- Stored volumes: entire dataset on premise with scheduled backups to S3
- Tape Gateway: uses Virtual Tape Library (VTL) to back up to S3, iSCSI protocol.
You can also order a hardware appliance through the console.
Fully-managed service for file transfers into and out of S3 or EFS using FTP. Can integrate with existing authentication systems (AD, LDAP, Okta, Cognito, ...)
Producers send messages to a Queue, Consumers poll for messages.
Fully managed. One of AWS's oldest systems. Unlimited throughput and unlimited number of messages in a queue.
- Default retention: 4 days, maximum of 14 days
- Low latency, < 10ms on publish and receive
- Limit of 256KB per message
- Can have duplicate messages, "at least once" delivery
- Can have out of order messages, "best effort ordering"
Producer uses SendMessage API of SDK
- message is persisted until consumer deletes it
Consumer
- can poll for up to 10 messages at a time.
- use the DeleteMessage API to remove processed messages
- perfect use case for ASG to scale consumer instances to handle queue
- CloudWatch Metric: Queue Length = ApproximateNumberOfMessages
- So CloudWatch Alarm can trigger scaling on queue length
Security
- Encryption:
- In-flight using HTTPS API
- At-rest using KMS keys
- Client-side encryption if client does the work.
- IAM policies to regulate API
- SQS Access Policies for cross-account access
Access Policy
- Cross Account Access: different AWS account polling
- Publish S3 Event Notifications to SQS Queue needs an explicit policy
Message Visibility Timeout: after message polled by consumer, it becomes invisible to other consumers.
- default timeout of 30 seconds
- consumer can call ChangeMessageVisibility API to extend the lease if taking more than the limit
- if the window is missed, can get duplicate message handling.
- There is a receive count to see how many times a message was received by some consumer.
Dead Letter Queues (DLQ): useful for debugging
- able to set threshold on how many times a message can be received without getting handled, MaximumReceives.
- Set high retention to be able to check these messages.
- configured against another queue to send to.
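The interplay of visibility timeout, receive count, and the dead letter queue can be modelled in a few lines. This is a toy in-memory sketch to illustrate the behaviour described above, not the SQS API; the class and method names are made up, and time is passed in explicitly so the example is deterministic.

```python
class ToyQueue:
    """In-memory model of visibility timeout and DLQ redrive (illustration only)."""

    def __init__(self, visibility_timeout=30, max_receives=3):
        self.visibility_timeout = visibility_timeout
        self.max_receives = max_receives      # plays the role of MaximumReceives
        self.messages = {}                    # id -> [body, visible_at, receive_count]
        self.dead_letters = []                # bodies moved to the dead letter queue

    def send(self, msg_id, body, now=0):
        self.messages[msg_id] = [body, now, 0]

    def receive(self, now):
        """Return (id, body) of one visible message, or None if nothing is visible."""
        for msg_id, entry in list(self.messages.items()):
            body, visible_at, count = entry
            if visible_at > now:
                continue                      # still leased by another consumer
            if count >= self.max_receives:
                self.dead_letters.append(body)    # received too often without a delete
                del self.messages[msg_id]
                continue
            entry[1] = now + self.visibility_timeout  # hide from other consumers
            entry[2] = count + 1
            return msg_id, body
        return None

    def delete(self, msg_id):
        """Called by the consumer after successful processing."""
        self.messages.pop(msg_id, None)

q = ToyQueue(visibility_timeout=30, max_receives=2)
q.send("m1", "hello")
first = q.receive(now=0)     # delivered, hidden until t=30
hidden = q.receive(now=10)   # None: inside the visibility window
second = q.receive(now=31)   # timeout passed without a delete -> delivered again
third = q.receive(now=62)    # receive count exhausted -> moved to the DLQ
print(first, hidden, second, third, q.dead_letters)
```

The second delivery is the duplicate handling the notes warn about: the consumer neither deleted the message nor extended its lease in time.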
Delay Queue: up to 15 minute delay to avoid being handled right away
- Default is 0 seconds, can set at queue level
- Can explicitly send using DelaySeconds parameter
Long Polling: consumer waiting for messages when there are none available
- up to 20 seconds (preferred)
- decreases number of calls to API, while increasing efficiency and reducing latency of application
- wait time between 1–20 seconds
- consumer can also set with WaitTimeSeconds
Request-Response Systems:
- system where there are separate request and response queues to separate producers and consumers from each other. Need correlation ID to map responses.
- There is a "SQS Temporary Queue Client" (Java-based) that AWS provides for this pattern. It uses virtual queues to be more cost-effective.
ASG: set up custom metric of Queue Length / Number of Instances
- need second alarm for reducing number of instances
- nothing provided out of the box from AWS for this
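A sketch of the custom metric above and how a target value for it could translate into a desired instance count. The target backlog of 100 messages per instance and the min/max sizes are hypothetical; in practice the metric would be pushed to CloudWatch and alarms would drive the scaling.

```python
import math

def backlog_per_instance(queue_length, instances):
    """Custom metric from the notes: Queue Length / Number of Instances."""
    return queue_length / max(instances, 1)   # avoid dividing by zero

def desired_instances(queue_length, target_backlog, min_size=1, max_size=10):
    """Instances needed to keep the backlog per instance at or below the target."""
    wanted = math.ceil(queue_length / target_backlog)
    return max(min_size, min(max_size, wanted))

print(backlog_per_instance(1500, 5))                 # 300.0
print(desired_instances(250, target_backlog=100))    # 3
print(desired_instances(1500, target_backlog=100))   # capped at max_size: 10
print(desired_instances(0, target_backlog=100))      # floor at min_size: 1
```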
Limited throughput: 300 msg/s without batching, 3000 msg/s with batching.
Exactly-once send capability, and guarantee that messages are processed in order.
Naming: has to end in ".fifo"
Allows for content-based deduplication within window
Pub/Sub system. Max 10,000,000 subscriptions/topic. 100,000 topic limit
"Event producer" only sends to one SNS topic.
"Event receivers" (subscribers) can have as many as we want to listen for notifications.
Each subscriber gets all messages sent to a topic (although filtering is available)
Possible subscribers: SQS, HTTP (with retries), Lambda, email, SMS, Mobile
SNS integrates with a LOT of AWS services.
Also "Direct Publish" for mobile SDKs, to publish to a platform endpoint
Fan Out: SNS + SQS
- Push once in SNS, all SQS subscribers receive.
- fully decoupled, no data loss, and SQS provides data persistence.
- And can add new SQS subscribers over time.
- Example: S3 events to multiple queues.
- Only allowed to have one S3 event rule per event type and prefix
SNS FIFO Topic is available (same throughput limitations as SQS)
- Only SQS can subscribe (unlike Standard).
Message Filtering: JSON policy on topic subscriptions
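A simplified sketch of how a subscription filter policy decides whether a message is delivered. It handles only exact string matching on message attributes; the real service supports more operators than this. The attribute names and values are hypothetical.

```python
def matches(filter_policy, attributes):
    """True if, for every policy key, the message attribute is one of the allowed values."""
    return all(
        attributes.get(key) in allowed
        for key, allowed in filter_policy.items()
    )

# Hypothetical policy on one subscription: only EU order events
policy = {"event_type": ["order_placed", "order_cancelled"], "store": ["eu"]}

print(matches(policy, {"event_type": "order_placed", "store": "eu"}))   # True
print(matches(policy, {"event_type": "order_placed", "store": "us"}))   # False
print(matches(policy, {"store": "eu"}))                                 # False: key missing
```

Subscribers without a filter policy still receive every message sent to the topic.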
To collect, process, and analyze streaming data in real time
- Kinesis Data Streams: capture, process, and store data streams
- Kinesis Data Firehose: load data streams into AWS data stores
- Kinesis Data Analytics: analyze data streams with SQL or Apache Flink
- Kinesis Video Streams: capture, process, or store video streams
Data Streams
- Divided into shards, 1 MB/sec or 1000 msg/sec per shard
- Producers push Records into stream with Partition Key and Data Blob up to 1MB
- Consumer also receives Record with Sequence number
- Can have fan-out with multiple consumers
- Shared 2 MB/sec per shard for all consumers (pull)
- or, Enhanced, 2 MB/sec per shard per consumer (push) (more $)
- Retention of 1 day (default) up to 365 days
- Allows for replaying of data.
- Immutability: once data entered into stream, cannot be deleted
- Producers: SDK, Kinesis Producer Library (KPL), Kinesis Agent
- Consumers: SDK, Kinesis Client Library (KCL), Lambda, Firehose, Analytics
- Use the partition key to ensure ordering in Kinesis
- With SQS would use a Group ID for similar partitioning
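A sketch of why the partition key preserves ordering: Kinesis hashes the key with MD5 into a 128-bit number, and each shard owns a range of that hash space, so the same key always lands on the same shard. The even split of the hash space across shards below is a simplifying assumption, and the key names are made up.

```python
import hashlib

HASH_SPACE = 2**128   # MD5 produces a 128-bit value

def shard_for(partition_key, shard_count):
    """Map a partition key to a shard index, assuming evenly split hash key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    return hashed * shard_count // HASH_SPACE

# All records for one (hypothetical) truck go to one shard, so they stay in order.
for key in ["truck-1", "truck-2", "truck-1"]:
    print(key, "->", shard_for(key, shard_count=4))
```

A "hot" partition key therefore concentrates traffic on a single shard and its 1 MB/sec limit.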
Data Firehose
- Reads records up to 1MB at a time
- Can have Lambda function for any transformations
- Then does batch writes into target data store (so, near real-time service)
- Near Real Time: 60 sec latency for non-full batches
- Or, minimum 32 MB of data at time
- Destinations: S3, Redshift (via COPY through S3), ElasticSearch
- Also 3rd parties: Datadog, Splunk, New Relic, mongoDB
- or custom via HTTP endpoint
- Able to send all or just failed data to S3 backup bucket
- Fully managed, so pay for data going through.
Data Analytics
- Sources & Sinks: Kinesis Data Streams or Kinesis Data Firehose
- Use SQL statements to process and redirect real-time data
- Fully managed, pay for consumption rate
- Use cases: Time-series analytics, real-time dashboard or metrics
Managed Apache ActiveMQ
Allows you to use existing protocols (SNS/SQS are proprietary)
- MQTT, AMQP, STOMP, Openwire, WSS
Use case: migrating existing applications into the cloud without rewriting your queueing.
Doesn't "scale" as much as SQS/SNS, runs on dedicated machine, but can run with HA failover.
- failover using EFS within region but across AZ.
For launching Docker containers on AWS. You must provision and maintain the infrastructure (EC2 instances), but AWS takes care of starting/stopping containers. Integrates with ALB.
Launch Types:
- Amazon EC2 Launch Type: ECS cluster can cross AZ within region, in a single ASG. ECS Agent runs on each instance.
- Fargate: Launch Docker containers but you do NOT provision the infrastructure. (i.e., simpler) ... "serverless" offering
- AWS calculates based upon CPU/RAM that you need.
- Launches "Task" inside cluster, using ENI for network interface. Each takes an IP, so have to have enough dedicated in the cluster.
IAM Roles for ECS Tasks
- EC2 Instance Profile: used by the ECS agent, to make API calls to ECS service, send container logs to CloudWatch, pull Docker image from ECR, and reference Secrets Manager or SSM Parameter Store.
- ECS Task Role: specific to each task for what your app/service needs.
- defined in the task definition, unique by task type.
EFS can be used to share data between instances/tasks.
ECS Services & Tasks
- Common with ALB to have each task get assigned a random port that then ties into the load balancer. "dynamic port mapping"
- Security: must allow any port from the ALB security group.
- Fargate is different, since each task gets own IP from ENI, so same port is exposed by task.
Event Bridge: build a rule to run an ECS task, which handles launching task for you.
ECS Scaling Examples
- CloudWatch Metric based upon ECS Service CPU Usage, trigger CloudWatch Alarm
- Can also increase instances with Capacity Providers (i.e., not Fargate)
- Could also configure to something like SQS queue length, if workers in ECS
ECS Rolling Updates
- can control how many tasks are started and stopped and in which order
- Use minimum and maximum healthy percent to determine rate
Store, manage, and deploy containers, paying for what you use.
Fully integrated with ECS & IAM, backed by S3
- Can do vuln scanning and other actions on these images
- Integration with CodeBuild
Kubernetes is an open-source system for automatic deployment, scaling, and management of containerized (usually Docker) applications.
- Alternative to ECS, similar goal but different API
- Supports both EC2 and Fargate modes, like ECS
- Kubernetes is cloud-agnostic
- Nodes (EC2 instances) host the "Pods", and exist inside an ASG
"Serverless" now means that you don't manage/provision the server. Idea was pioneered by AWS Lambdas, though, meaning Function as a Service (FaaS).
Includes: AWS Lambda, DynamoDB, Cognito, API Gateway, S3, SNS, SQS, Kinesis Data Firehose, Aurora Serverless, Step Functions, Fargate
Virtual functions, limited by time (up to 15 min), that run on demand, with automated scaling
Free tier: 1,000,000 Lambda requests and 400,000 GB-seconds of compute time per month
Can increase RAM up to 10GB (which also improves CPU and network)
Languages: Node.js, Python, Java, .NET Core, Golang, C# / Powershell, Ruby
- Custom Runtime API (community supported) to run other languages
- Lambda Container Image: implementing Lambda Runtime API
- ECS / Fargate preferred for arbitrary Docker images
Limitations:
- Execution: 128MB–10GB
- 4KB of environment variables
- 512MB disk capacity in "function container" under /tmp
- Concurrent executions: 1000 (but can request more)
- Deployment: 50MB max (compressed ZIP)
- 250MB uncompressed
- Can use /tmp to load other files during startup
Lambda@Edge: deployed alongside CDN using CloudFront
- Use lambdas to change request or response, four types:
- viewer request: between user and CloudFront
- origin request: between CloudFront and origin
- origin response
- viewer response
- ... so could avoid hitting origin by using viewer request and response
- Use cases: security and privacy, dynamic web app at edge, SEO, intelligently route across origins and data centers, bot mitigation, real-time image transformation, A/B testing, authentication and authorization, user prioritization, tracking and analytics
NoSQL DB, Fully managed, highly available with replication across multiple AZs
- millions of requests/sec, trillions of rows, 100s of TB of storage
- low latency on retrieval
- enables event-driven programming with DynamoDB Streams
DB is made of tables, primary key must be decided at creation time, infinite number of rows.
- each item has attributes that can be added over time and can be null
- max item size 400KB
Data types:
- Scalar types: string, number, binary, boolean, null
- Document types: list, map
- Set types: String set, number set, binary set
Primary key can be composed of 1–2 columns. Partition Key (required) and Sort Key (optional)
Read/Write Capacity Modes
- Provisioned (default): specify read/writes per second, so plan ahead
- pay for provisioned Read Capacity Units (RCU) and Write Capacity Units (WCU)
- Can add auto-scaling based upon RCU and WCU target utilization
- On-Demand Mode
- scales based upon reads/writes, so pay for what you use
- about 2–3x more expensive, but great for unpredictable workloads
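For provisioned mode, a sketch of the capacity arithmetic. The unit definitions are not in the notes and are stated here as assumptions from the DynamoDB documentation: one RCU is one strongly consistent read per second of an item up to 4KB (eventually consistent reads cost half), and one WCU is one write per second of an item up to 1KB.

```python
import math

def read_capacity_units(reads_per_sec, item_kb, strongly_consistent=True):
    """RCUs needed; item size is rounded up to the next 4KB block."""
    units = reads_per_sec * math.ceil(item_kb / 4)
    return units if strongly_consistent else math.ceil(units / 2)

def write_capacity_units(writes_per_sec, item_kb):
    """WCUs needed; item size is rounded up to the next 1KB block."""
    return writes_per_sec * math.ceil(item_kb)

print(read_capacity_units(10, 6))                              # 10 * 2 blocks = 20
print(read_capacity_units(10, 6, strongly_consistent=False))   # half of that: 10
print(write_capacity_units(5, 2.5))                            # 5 * 3 blocks = 15
```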
DynamoDB Accelerator (DAX)
- Seamless in-memory cache for DynamoDB
- Helps solve read congestion by caching
- microsecond latency for cached data
- Doesn't require any changes to application logic
- 5 minutes TTL (default)
- vs. ElastiCache, EC would be better for storing calculated or aggregated data
DynamoDB Streams
- ordered stream of item-level modifications (create/update/delete)
- can send to Kinesis Data Streams, Lambda, or Kinesis Client Library apps
- Data retention up to 24 hours
- Use cases: react to changes in real-time (like welcome e-mails), analytics, insert into derived tables, insert into ElasticSearch, cross-region replication
DynamoDB Global Tables
- Two-way replication across regions, for low latency across them
- "Active-Active" replication, can read and write to table from any region
- Have to enable DynamoDB Streams as a prerequisite
Time To Live (TTL)
- automatically delete items after an expiry timestamp (built in function)
- You explicitly add an attribute to your items that it interacts with
Indexes
- Global Secondary Indexes (GSI) and Local Secondary Indexes (LSI)
- Allow for queries on attributes that are not the primary key
Transactions
- allow a guarantee that you write to two tables or neither.
Publicly available REST endpoint for triggering serverless, proxying request to Lambda.
- supports WebSocket Protocol
- can build in API keys and request throttling
- Swagger / Open API import support
- Can transform and validate requests at the Gateway
- Can cache API responses
- Maximum (and default) timeout is 29 seconds
Can expose any AWS service through the API Gateway.
One use case is to use API Gateway in front of HTTP endpoints to handle rate limiting.
Endpoint Types:
- Edge-Optimized (default): for global clients, routing through CloudFront Edge
- API Gateway still lives in only one region
- Regional: for clients in the same region
- Private: only accessible from VPC using a resource policy
Security:
- IAM Permissions: good for access within your own infrastructure
- Leverages "Sig v4" capability with IAM creds in headers
- no additional costs, but limited to folks with IAM permissions
- Lambda Authorizer: uses Lambda to validate token in header
- Can cache result of authentication (for up to 1 hour)
- use with OAuth, SAML, other 3rd party auth
- Lambda must return IAM policy for user
- Cognito User Pools: API Gateway automatically verifies with Cognito
- Cognito only helps with authentication, not authorization
- up to your backend to handle authorization
- pools can be backed by Facebook, Google, etc. logins
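A sketch of the Lambda Authorizer option: a token-based authorizer receives the token and the method ARN, and must return a principal plus an IAM policy document. The token check, principal ID, and ARN below are placeholders; a real authorizer would validate an OAuth/SAML/third-party token instead.

```python
def authorizer_response(principal_id, effect, method_arn):
    """Build the response shape API Gateway expects from a Lambda authorizer."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,          # "Allow" or "Deny"
                    "Resource": method_arn,
                }
            ],
        },
    }

def handler(event, context):
    """Hypothetical token authorizer: the comparison is a stand-in for real validation."""
    token = event.get("authorizationToken", "")
    effect = "Allow" if token == "allow-me" else "Deny"
    return authorizer_response("user-123", effect, event["methodArn"])

# Made-up event resembling what API Gateway sends to a token authorizer
event = {
    "authorizationToken": "allow-me",
    "methodArn": "arn:aws:execute-api:us-east-1:123456789012:abc123/prod/GET/items",
}
print(handler(event, None)["policyDocument"]["Statement"][0]["Effect"])   # Allow
```

The returned policy is what API Gateway can cache, so repeated calls with the same token skip the Lambda invocation.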
When we want to give users an identity to interact with our application
Cognito User Pools (CUP):
- Sign-in functionality for app users
- Allows for simple login, like username/password, incl. 2FA
- Sends back JSON Web Tokens (JWT)
Cognito Identity Pools (Federated Identity):
- Provide AWS credentials to users so they can access resources directly
- Get temporary credentials back from pool with temporary, predefined IAM policy
- Example use: temporary access to write to S3
Cognito Sync (deprecated and replaced by AppSync):
- synchronize data from device to Cognito
Federation options: SAML 2.0, Custom Identity Broker, Web Identity Federation with or without Amazon Cognito, Single Sign On, Non-SAML with AWS Microsoft AD
- With federation, no need to create IAM users, that's managed outside of AWS
SAML 2.0 Federation: to integrate with Active Directory / ADFS with AWS
- provides access to AWS Console or CLI
- no need to create an IAM user for each employee
- this is considered "old way" of doing things, Amazon SSO is newer/simpler
Custom Identity Broker Application
- use only if identity provider is not compatible with SAML 2.0
- Use AssumeRole or GetFederationToken APIs
- your custom broker is the one talking to STS in this case
Web Identity Federation, AssumeRoleWithWebIdentity
- Not recommended by AWS (use Cognito instead)
Framework for developing and deploying serverless applications
- Configuration is all in YAML
- Also allows for running locally for testing/debugging
- integrates with CodeDeploy
Based on PostgreSQL, but not used for OLTP. Designed for OLAP (OnLine Analytical Processing).
- Can store PB of data.
- Massively Parallel Query Execution (MPP)
- Integrated with BI tools, like Quicksight or Tableau
- 1–128 nodes, each up to 128TB of space
- Leader node: for query planning and results aggregation
- Compute node: for performing the queries and send results to leader
- AWS claims 10x performance vs other data warehouses
Redshift Spectrum: run queries directly against S3 without loading.
- query is distributed among 1000s of Spectrum nodes (separate from compute nodes)
Redshift Enhanced VPC Routing: COPY / UNLOAD commands go through VPC instead of public internet.
No Multi-AZ mode, need to use snapshots (stored in S3) for recovery.
- snapshots are incremental (i.e., diffs)
- Can restore snapshots into a new cluster
- Can configure to automatically copy snapshots to another Region
- Can automate, e.g., every 8 hours, every 5GB, or on schedule
- Can set retention (TTL), otherwise remain until manually deleted.
Load data:
- Kinesis Data Firehose (through S3 copy)
- S3 bucket using COPY command (with or without VPC Routing)
- Push from EC2 using JDBC driver (be sure to write in batches)
vs Athena: Redshift has faster queries, joins, and aggregations by using indexes.
Managed extract, transform, and load (ETL) service
- use case: preparing and transforming for analytics
Glue Data Catalog: metadata about all your AWS datasets
- Glue Data Crawlers: crawl your RDS, S3, DynamoDB, or JDBC-compliant data stores
- Includes table names, fields, and data types
- Used by Glue Jobs (ETL), Athena, Redshift Spectrum, and EMR
Fully managed graph DB.
- Can use clustering to improve performance.
Common to use as a complement to other DBs for searching and indexing. vs. DynamoDB, can search any field, even partial matches.
- built-in integrations with Kinesis Data Firehose, AWS IoT, and CloudWatch Logs
- multi-AZ, clustering, reported to scale to PB of data
ELK stack: Comes with Kibana (visualization) and Logstash (log ingestion)
Supports Cognito for security.
Metrics for every service in AWS. Metrics belong to namespaces, have timestamps. Dimension is an attribute of a metric, up to ten dimensions per metric.
Detailed monitoring: You can pay extra to get metrics more often, which can be helpful if you need to scale faster in an ASG. (default every 5 minutes)
- Free tier allows 10 detailed monitoring metrics
PutMetricData: API call to push custom metrics
- memory (RAM) usage, disk space, number of logged in users
- EC2 memory usage is not a default metric, has to be pushed from inside instance as a custom metric.
- Up to you to provide useful dimensions, like instance ID or environment name
- Metric resolution (with StorageResolution API parameter) can be
- standard: 1 minute
- high resolution: 1/5/10/30 seconds
- You can push metrics with timestamps as old as two weeks and up to two hours into future, so ensure that instance has time set correctly.
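The accepted timestamp window can be expressed as a simple check, which is why a wrong instance clock can cause custom metrics to be rejected. The helper name is made up; the two-week and two-hour bounds are from the notes.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(weeks=2)      # oldest timestamp accepted
MAX_FUTURE = timedelta(hours=2)   # furthest into the future accepted

def timestamp_accepted(ts, now):
    """True if a metric timestamp falls inside the window described in the notes."""
    return now - MAX_AGE <= ts <= now + MAX_FUTURE

now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)   # fixed "now" for the example
print(timestamp_accepted(now - timedelta(days=13), now))   # True
print(timestamp_accepted(now - timedelta(days=15), now))   # False: too old
print(timestamp_accepted(now + timedelta(hours=3), now))   # False: clock runs ahead
```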
Dashboards: There are plenty of built-in, but can also create custom ones.
Logs
- groups: arbitrary name, usually of your application
- stream: an instance within the application / log files/ containers
- can define expiration policy (storage costs $)
- can send to S3, Kinesis Data Streams, Kinesis Data Firehose, Lambda, and ElasticSearch
- SDK, CloudWatch Logs Agent for sending custom logs
Metric Filters and Insights
- can filter on different things, like specific IP or "ERROR"
- can be used to trigger alarms
- Log Insights can be used to query logs, or add them to dashboard
- Can aggregate across accounts or regions
S3 Export: isn't real-time, can take up to 12 hours, via CreateExportTask
- use Log Subscriptions instead for real-time
CloudWatch Agent: for pushing logs from EC2 instance
- by default, no logs are coming from EC2
- can set up agent to capture logs, provided IAM permissions
- "Logs Agent" is an older version only sending logs
- "Unified Agent" is newer and also includes system-level metrics
CloudWatch Alarms
- States: OK, INSUFFICIENT_DATA, ALARM
- Targets:
- Stop, terminate, reboot, or recover an EC2 instance
- Recovery: same IPs, metadata, and placement group
- Trigger Auto Scaling action
- SNS
- Can set alarm state from CLI to test.
aws cloudwatch set-alarm-state ...
CloudWatch Events -> displaced by CloudWatch EventBridge
- default event bus is generated by AWS services
- same thing, but name shifting to EB over time
Builds on CloudWatch Events (same API and endpoints), but extends with more buses
Default event bus: generated by AWS, same as with CW Events
Partner event bus: Can receive events from other SaaS providers, e.g., Zendesk, DataDog, Segment, Auth0, ...
- They can send events into your system in real time
Custom event bus: for your own applications, so that some apps can react to other apps
Schema Registry: EventBridge can analyze events in bus and infer the schema
- can be versioned
- allows you to generate code for your app knowing in advance the event bus data
Provides governance, compliance, and audit for your AWS account. Enabled by default.
- History of events/API calls made within your AWS account.
- Can separate read and write management events
Can put logs into CloudWatch Logs or S3, if needed for more than 90 days.
Data Events: high volume operations are not logged by default
- includes things like reading from S3, Lambda function activity, etc.
Insights: has to be enabled, but will detect unusual activity in your account.
- analyzes write events, and will generate EventBridge event
Help with auditing and compliance of AWS resources, recording configuration changes over time.
- e.g., checking for unrestricted SSH access, public access on S3 buckets
- AWS provides over 75 config rules to help monitor common scenarios
- Does not prevent the actions, just confirm compliance with rules
- no free tier
Can configure remediation of non-compliant resources with SSM Automation Documents
- e.g., Could automatically revoke API keys after 90 days
- Can wire up to Lambda function
Notifications to EventBridge or SNS.
Grant limited and temporary access to AWS resources. Valid for 15 min up to 1 hour
- AssumeRole: within own account or Cross Account Access to perform action for target account.
- AssumeRoleWithSAML: for users logged in via SAML
- AssumeRoleWithWebIdentity: return creds for users logged in with an identity provider (Facebook Login, Google Login, OIDC compatible)
- AWS recommends against this, prefer Cognito instead
- GetSessionToken: for MFA
Microsoft Active Directory
- AD objects are organized in "trees", a group of trees is a "forest"
- it is a DB of objects: User Accounts, Computers, Printers, File Shares, ...
AWS Managed Microsoft AD: create your own AD in AWS
- establish "trust" connections with on-prem AD
- users exist in both ADs
AD Connector: Direct Gateway (proxy) to redirect to on-prem AD
- Users are managed in the on-prem AD
Simple AD: AD-compatible managed directory on AWS
- Cannot be joined with on-prem AD
Global service to manage multiple AWS accounts.
Main account is the "master" account, you cannot change it. Other accounts are "member" accounts.
- member accounts can only be part of one organization
This is about consolidating your billing across accounts. Single payment method, and pricing benefits from aggregated usage (volume discount).
May want to use this for separating departments as cost centers, based upon regulatory restrictions, or for better resource isolation (e.g., VPCs)
- This approach makes it harder for distinct VPCs to accidentally talk to each other.
- Can still centralize CloudTrail and CloudWatch
- Allows for cross-account roles for Admin purposes
Organization Units (OU): up to you how to organize: by business unit, by environment, by project, etc.
Service Control Policies (SCP): allow or deny list of IAM actions, applied at OU or Account level, but don't apply to management account.
- applies to all users and roles (including root) of account
- But cannot restrict service-linked roles for cross-org integration
- Allows have to be explicit, nothing enabled by default
- Use cases: restrict access to certain services or enforce PCI compliance by disabling services
- Hierarchy, so if a parent is denied something, it cannot be explicitly granted to a sub-account in the hierarchy.
- Migrating accounts: have to leave first org, then get invited to new org, since only allowed to be member of one org at a time.
- to migrate root, have to migrate all users first, delete OU, then migrate
Allows you to share AWS resources you own with other AWS accounts, to avoid resource duplication.
- Can share VPC subnets, but cannot share security groups or default VPC
- participants cannot view, modify, delete resources that belong to other participants
- Can share: AWS Transit Gateway, Route53 Resolver Rules, License Manager Config
- Can access other services in the subnet using the private IP, and referencing the security groups
Centrally manage access to multiple accounts and 3rd-party business applications
- e.g., Dropbox, Office365, Slack
- integrates with AWS Organizations, SAML 2.0 markup, and Active Directory
vs AssumeRoleWithSAML: assume role would be required for each login/service
Mostly when documentation in AWS talks about "encryption", it is referring to KMS. Fully managed by AWS.
- Never store secrets in plain text. Encrypted secrets can be stored in code/environment variables.
- Keys are bound to a single region
Customer Master Keys (CMK) types:
- Symmetric (AES-256 keys), you never get access to the Key unencrypted
- Asymmetric (RSA & ECC key pairs), public keys for encryption, private key for decryption
- cannot get access to private key unencrypted
- use case: encryption outside of AWS by users who cannot call KMS API
CMK cost:
- AWS Managed Service Default CMK: free
- User Keys created in KMS: $1/month
- User Keys imported (256-bit symmetric), $1/month
- Then you pay for each API call to KMS (~ $0.03/10000 calls)
KMS can only encrypt up to 4KB of data per call
- Have to use envelope encryption if larger
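A minimal sketch of the envelope encryption idea: the large payload is encrypted locally with a fresh data key, and only that small data key is encrypted under the master key. In AWS the master key never leaves KMS and the data key would typically come from the GenerateDataKey API; here everything is local. The `_keystream_xor` helper is a toy stand-in for AES written with the standard library only, and is an assumption for illustration, not something to use for real data.

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). Stand-in for AES; NOT for real use."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    # zip stops at len(data), so any extra keystream bytes are ignored.
    return bytes(a ^ b for a, b in zip(data, stream))


def envelope_encrypt(master_key: bytes, plaintext: bytes) -> dict:
    data_key = secrets.token_bytes(32)   # fresh data key per object
    data_nonce = secrets.token_bytes(16)
    key_nonce = secrets.token_bytes(16)
    return {
        # The large payload is encrypted locally with the data key.
        "ciphertext": _keystream_xor(data_key, data_nonce, plaintext),
        "data_nonce": data_nonce,
        # Only the 32-byte data key is encrypted under the master key
        # (the step KMS would perform, well under the 4KB limit).
        "encrypted_data_key": _keystream_xor(master_key, key_nonce, data_key),
        "key_nonce": key_nonce,
    }


def envelope_decrypt(master_key: bytes, envelope: dict) -> bytes:
    # First recover the data key, then use it to decrypt the payload.
    data_key = _keystream_xor(master_key, envelope["key_nonce"], envelope["encrypted_data_key"])
    return _keystream_xor(data_key, envelope["data_nonce"], envelope["ciphertext"])


master_key = secrets.token_bytes(32)  # hypothetical stand-in for a key held inside KMS
payload = b"x" * 10_000               # larger than the 4KB per-call limit
envelope = envelope_encrypt(master_key, payload)
restored = envelope_decrypt(master_key, envelope)
print(restored == payload)
```

The encrypted data key is stored next to the ciphertext, so decryption only needs one small call against the master key.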
To give access to KMS:
- Key Policy allows the user
- IAM Policy allows the API calls
Key Policies
- cannot control access without them (unlike other Policies)
- Default KMS Key Policy:
- Gives complete access to the key to the root principal (= the entire AWS account)
- named with "aws/"
- Custom policy:
- define users, roles that can access
- define who can administer
- useful for cross-account access
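To make the two kinds of key policy above concrete, here is a sketch that builds a policy document with the default root statement plus one custom statement for named principals. The `build_key_policy` helper, the account IDs, and the role name are placeholders assumed for illustration; the set of actions in the second statement is one plausible choice for "use but not administer", not a prescribed list.

```python
import json


def build_key_policy(owner_account_id: str, user_arns: list) -> dict:
    """Return a key policy document: default root statement plus an optional usage statement."""
    statements = [
        {
            # Default statement: the account root principal gets full access,
            # which lets IAM policies in the account grant access to the key.
            "Sid": "EnableIAMUserPermissions",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{owner_account_id}:root"},
            "Action": "kms:*",
            "Resource": "*",
        }
    ]
    if user_arns:
        statements.append(
            {
                # Custom statement: named principals (possibly in another account)
                # may use the key but not administer it.
                "Sid": "AllowKeyUse",
                "Effect": "Allow",
                "Principal": {"AWS": list(user_arns)},
                "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey*", "kms:DescribeKey"],
                "Resource": "*",
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


# Placeholder account IDs and role name, for illustration only.
policy = build_key_policy("111122223333", ["arn:aws:iam::444455556666:role/ExampleAppRole"])
print(json.dumps(policy, indent=2))
```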
Copy snapshots across accounts:
- create snapshot (encrypted with your CMK)
- attach a KMS Key Policy to authorize cross-account access
- Share the encrypted snapshot
- (target) create copy of snapshot, encrypt with own KMS key
- create volume from snapshot in target
Key Rotation
- can only configure for customer-managed CMK (not AWS managed)
- When enabled, happens every 1 year
- previous key kept in order to decrypt older data
- new key has the same CMK ID, only the backing key is changed
Manual Rotation
- new key will have a new CMK ID
- better to use aliases in this case to hide change of key from apps
- ideal rotation for CMK not eligible for auto-rotation, like asymmetric CMK
Securely store secrets (e.g., for configuration). Serverless.
- version tracking of configuration/secrets
- management using path & IAM
- integrated with CloudFormation
Can store with hierarchy, e.g., /department/app/dev/db-url
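A small sketch of why the hierarchy is useful: everything for one app and environment can be fetched (or permissioned) by path prefix. The function below is a local stand-in that mimics the spirit of the GetParametersByPath API against a plain dict; no AWS call is made, and the parameter names and values are made up for illustration.

```python
def get_parameters_by_path(store: dict, path: str) -> dict:
    """Return every parameter stored under `path`, at any depth (local stand-in only)."""
    prefix = path.rstrip("/") + "/"
    return {name: value for name, value in store.items() if name.startswith(prefix)}


# Hypothetical parameters following the /department/app/env/name layout.
store = {
    "/department/app/dev/db-url": "dev.example.internal:5432",
    "/department/app/dev/db-user": "dev_user",
    "/department/app/prod/db-url": "prod.example.internal:5432",
    "/department/other-app/dev/db-url": "other.example.internal:5432",
}

dev_params = get_parameters_by_path(store, "/department/app/dev")
print(sorted(dev_params))
# ['/department/app/dev/db-url', '/department/app/dev/db-user']
```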
Tiers:
- Standard, free, up to 10,000 parameters up to 4KB each
- no parameter policies available
- no charge for standard throughput; higher throughput costs $0.05/10000 tx
- high throughput = > 1000 tx/sec
- Advanced, charge, up to 100,000 parameters up to 8KB each
- parameter policies available
- includes TTL to force update/delete
- can include notifications of pending deadlines.
- charge regardless of throughput
Newer than SSM. Has capability to force rotation of secrets every X days.
- can automate generation of secrets on rotation using Lambda
- Lambda has to have permission to create/rotate the keys
- integrates with RDS, and that's its primary focus
- does support key-value pairs as well, like for API keys
- no free tier, costs $0.40/secret/month and $0.05/10000 API calls
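As a worked example of the pricing above, a rough monthly estimate using the two rates from these notes. The function name and the sample numbers are hypothetical, and actual prices may differ by region or change over time.

```python
def secrets_manager_monthly_cost(num_secrets: int, api_calls: int) -> float:
    """Estimate monthly cost from the rates quoted in these notes."""
    price_per_secret = 0.40      # USD per secret per month
    price_per_10k_calls = 0.05   # USD per 10,000 API calls
    total = num_secrets * price_per_secret + (api_calls / 10_000) * price_per_10k_calls
    return round(total, 2)


# 20 secrets and 1,000,000 API calls: 20 * 0.40 + 100 * 0.05
print(secrets_manager_monthly_cost(20, 1_000_000))  # 13.0
```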
AWS provisions dedicated encryption hardware, and you manage your own keys entirely.
- HSM device is tamper resistant, FIPS 140-2 Level 3 compliance (KMS is Level 2)
- Have to use CloudHSM Client Software to manage
- Redshift supports CloudHSM for database encryption
- Good option to use with SSE-C encryption
- IAM is only for CRUD of the HSM Cluster, but not managing keys/users
- Support for HM across regions
- Supports cryptographic acceleration
DDoS protection.
Standard:
- free for all customers
- protects against attacks like SYN/UDP Floods, reflection attacks, other layer 3/4 attacks
Advanced:
- $3000/month/organization
- protects against more sophisticated attacks
- Also 24/7 access to the DDoS Response Team (DRT)
- protects against higher costs associated with DDoS usage spikes
Protects web applications from common layer 7 exploits
- (layer 7 = HTTP, layer 4 = TCP)
- Can be deployed on ALB, API Gateway, or CloudFront
- define a Web ACL (Web Access Control List)
- rules can include IP addresses, headers, body, URI strings
- can protect against SQLi and XSS
- can add size constraints (like on uploads)
- can geo-match (block countries)
- rate-based rules for DDoS protection
- cost about $5/month
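The rate-based rule mentioned above can be pictured as a per-IP sliding window counter. This is only a sketch of the idea, not how WAF is implemented: the `RateBasedRule` class is hypothetical, and the limit and window length are illustrative parameters chosen to keep the example small.

```python
from collections import defaultdict, deque


class RateBasedRule:
    """Block an IP once it exceeds `limit` requests within `window_seconds` (illustrative)."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        return len(hits) <= self.limit


rule = RateBasedRule(limit=3, window_seconds=300)
# Four quick requests from one IP, then one more after the window has passed.
results = [rule.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3, 400)]
print(results)  # [True, True, True, False, True]
```

Each IP is tracked separately, so one noisy client does not affect requests from others.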
Firewall Manager: tool for managing WAFs
- can also manage Shield Advanced from here, as well as security groups for EC2 and ENI resources in VPC
Intelligent Threat discovery to protect AWS account, using machine learning to detect anomalies
- input includes CloudTrail, VPC Flow, and DNS logs
- set up CloudWatch Event rule to SNS or Lambda
- has dedicated "finding" to protect against CryptoCurrency attacks
Automated security assessments for EC2 instances
- analyze running OS against known vulns
- analyze against unintended network accessibility
- AWS Inspector Agent has to be installed on EC2 instance
- possible to trigger SNS
Fully managed data security and privacy service, using machine learning and pattern matching to protect sensitive data (like PII).
- e.g., analyze S3 buckets for potential PII then notify
AWS's responsibility: security of the cloud
- protecting infrastructure (hardware, software, facilities, networking)
- managed services
Customer responsibility: security in the cloud
- keeping EC2 instances up to date, firewall, network config, IAM
- encrypting app data
Shared responsibility:
- patch management, configuration management, awareness & training