AWS Certified Solutions Architect Professional

1. AWS And General IT Knowledge

Working with VPC and DNS

  1. to keep the default DNS in a DHCP option set, leave the value "AmazonProvidedDNS"
  2. with the internal Amazon DNS you cannot resolve DNS names from on-premises

Storage

  1. S3: max object size is 5 TB; max size for a single PUT upload is 5 GB
  2. Glacier: max 40 TB per archive; max single upload is 4 GB
  3. EBS: you can encrypt during a snapshot copy even if the source volume is not encrypted

RDS

  1. Read replicas use MySQL native asynchronous replication
  2. If the master instance fails, a read replica can be used as a failover BUT only to read data
  3. MySQL can run read replicas in other regions; in that case always use the SSL/TLS option.

still to review these replication notes

Migrate from AWS to ON PREM
  1. AWS - configure as replication source
  2. AWS - mysqldump
  3. ON PREM - mysql restore
  4. ON PREM - sync replication with AWS
  5. ON PREM - stop replication
  6. AWS - stop instance
Migrate from ON PREM to AWS
  1. ON PREM - set read only
  2. ON PREM - mysqldump
  3. ON PREM - set write again
  4. AWS RDS - do a mysql restore
  5. AWS RDS - activate replication (see the sketch below)
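
A minimal sketch of the ON PREM to AWS steps with the MySQL client tools (hostnames, credentials, and binlog coordinates are placeholders; mysql.rds_set_external_master and mysql.rds_start_replication are the helper procedures RDS provides for external replication):

 # ON PREM: consistent dump while the DB is read only; --master-data=2 records
 # the binlog file/position needed to start replication afterwards
 mysqldump --single-transaction --routines --triggers --master-data=2 mydb > mydb.sql

 # restore the dump into the RDS instance
 mysql -h mydb.xxxxxxxx.us-east-1.rds.amazonaws.com -u admin -p mydb < mydb.sql

 # on RDS: point replication at the on-prem master (file/position come from the dump) and start it
 mysql -h mydb.xxxxxxxx.us-east-1.rds.amazonaws.com -u admin -p \
   -e "CALL mysql.rds_set_external_master('onprem.example.com', 3306, 'repl_user', 'repl_pass', 'mysql-bin.000001', 107, 0);"
 mysql -h mydb.xxxxxxxx.us-east-1.rds.amazonaws.com -u admin -p \
   -e "CALL mysql.rds_start_replication;"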

Disaster Recovery

Time:

  • RTO Recovery TIME Objective
  • RPO Recovery POINT Objective

Methods:

  • Pilot Light
  • Warm Standby

Storage Gateway

There are 3 ways to use it:

  1. Gateway-cached volumes— You can store your primary data in Amazon S3 and retain your frequently accessed data locally (it is storage, not a backup system)
  2. Gateway-stored volumes— In the event that you need low-latency access to your entire data set, you can configure your gateway to store your primary data locally, and asynchronously back up point-in-time snapshots of this data to Amazon S3
  3. Gateway-virtual tape library (gateway-VTL) — With gateway-VTL, you can have an almost limitless collection of virtual tapes. You can store each virtual tape in a virtual tape library (VTL) backed by Amazon S3 or a virtual tape shelf (VTS) backed by Amazon Glacier

Elastic Load Balancer , change policy

With these commands it is possible to send the requester's IP over a TCP listener from the load balancer to the EC2 machine where nginx is installed (Proxy Protocol):

 aws elb create-load-balancer-policy --load-balancer-name ophy-pipp-ElasticL-1KUZ4IS2YGYY0 --policy-name linuxacademy-protocol-policy --policy-type-name ProxyProtocolPolicyType --policy-attributes AttributeName=ProxyProtocol,AttributeValue=true

aws elb describe-load-balancer-policies --load-balancer-name ophy-pipp-ElasticL-1KUZ4IS2YGYY0
{
    "PolicyDescriptions": [
        {
            "PolicyAttributeDescriptions": [
                {
                    "AttributeName": "ProxyProtocol",
                    "AttributeValue": "true"
                }
            ],
            "PolicyName": "linuxacademy-protocol-policy",
            "PolicyTypeName": "ProxyProtocolPolicyType"
        }
    ]
}

aws elb set-load-balancer-policies-for-backend-server --load-balancer-name ophy-pipp-ElasticL-1KUZ4IS2YGYY0 --instance-port 80 --policy-names linuxacademy-protocol-policy

Then also change the main nginx.conf so the requester's IP is logged in the main log file:
 # inside the server block:
 server {
 listen 80 proxy_protocol;
 listen [::]:80 proxy_protocol;
 set_real_ip_from 10.0.0.0/16;
 real_ip_header proxy_protocol;
 server_name _;
 root /usr/share/nginx/html;

 # inside the http block:
 http {
 log_format main '$proxy_protocol_addr - $remote_user [$time_local] "$request" '
 '$status $body_bytes_sent "$http_referer" '
 '"$http_user_agent" "$http_x_forwarded_for"';

After that, if you tail the log you will see the requester's public IP instead of the ELB private IP 10.0.2.65:

 10.0.2.65 - - [10/Jul/2017:10:09:47 -0400] "GET /favicon.ico HTTP/1.1" 404 3650 "http://ophy-pipp-elasticl-1kuz4is2ygyy0-559468751.us-east-1.elb.amazonaws.com/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36" "-"
10.0.2.65 - - [10/Jul/2017:10:29:21 -0400] "PROXY TCP4 123.125.71.56 10.0.2.65 15682 80" 400 173 "-" "-" "-"

2. Enterprise Account Management

Budgets

  • Budgets are used to track how close your current costs are to exceeding the set “budget” for a given billing period.
  • Budgets are updated every 24 hours
  • Budgets do not show refunds
  • Budgets can work with SNS/CloudWatch billing alerts to send notifications (see the CLI sketch below)
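
A minimal sketch of a CloudWatch billing alarm wired to SNS (billing metrics only exist in us-east-1; the threshold and topic ARN are placeholders):

 # alarm when estimated charges exceed $100 (requires billing alerts enabled in the account)
 aws cloudwatch put-metric-alarm --region us-east-1 \
   --alarm-name billing-over-100 \
   --namespace AWS/Billing --metric-name EstimatedCharges \
   --dimensions Name=Currency,Value=USD \
   --statistic Maximum --period 21600 --evaluation-periods 1 \
   --threshold 100 --comparison-operator GreaterThanThreshold \
   --alarm-actions arn:aws:sns:us-east-1:123456789012:billing-topic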

Temporary Access Using Roles and STS (Security Token Service)

  • The endpoint is https://sts.amazonaws.com
  • Temporary credentials require the “token” as well as the access key and secret access key in order to make API calls
  • you can view the temporary credentials from within an instance using the following command (a usage sketch follows):
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/role-name
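
The response contains AccessKeyId, SecretAccessKey, and Token. A sketch of using them by hand (the values are placeholders; normally the SDK/CLI reads them from the metadata service automatically):

 export AWS_ACCESS_KEY_ID=ASIAEXAMPLE          # from AccessKeyId
 export AWS_SECRET_ACCESS_KEY=examplesecret    # from SecretAccessKey
 export AWS_SESSION_TOKEN=exampletoken         # from Token, required for temporary credentials
 aws sts get-caller-identity                   # verify which role the credentials belong to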

Federated Access Using SAML

still not so clear

  • If your Identity Provider does not support SAML 2.0, you need to write your own custom identity broker application

Web Identity Federation

  • Lets users sign in using a third-party identity provider like Amazon, Facebook, Google, or any OpenID Connect compatible provider.
  • You can allow the authenticated user to call STS and obtain temporary credentials for a role

3. CloudTrail

  • CloudTrail trails are configured on a per-region basis, and a trail can include global services
  • CloudTrail log files from different regions can be sent to the same S3 bucket
  • CloudTrail can integrate into SNS, CloudWatch, and CloudWatch logs to send notifications when specific API events occur
  • Limit and control access to CloudTrail and CloudTrail logs

4. KMS

  • Customer Master Key (CMK) – A logical key that represents the top of a customer's key hierarchy
  • If another key is not specified, the CMK is used by default to encrypt the resources.
  • CMK settings cannot be modified

If key rotation is enabled for a specific CMK (see the CLI sketch after this list):

  • KMS will create a new version of the backing key for each rotation
  • KMS will automatically use the latest version of the backing key to perform data encryption.
  • To decrypt data, KMS determines which version of the backing key (the old or the new) the data was encrypted with and automatically decrypts it with that correct key.
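
A sketch of turning rotation on and checking it (the key id is a placeholder):

 aws kms enable-key-rotation --key-id 1234abcd-12ab-34cd-56ef-1234567890ab
 aws kms get-key-rotation-status --key-id 1234abcd-12ab-34cd-56ef-1234567890ab
 # returns {"KeyRotationEnabled": true} once rotation is on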

Features:

  • Quorum-based access - No single Amazon employee can gain access to a customer's master keys.
  • Regional Independence - AWS provides regional independence for customer data, in other words the key usage is isolated within an AWS region.

5. Kinesis

  • preserves the data for 24 hours by default (retention can be increased)
  • Can stream from as little as a few megabytes to several terabytes per hour

6. Amazon EC2 And Design Patterns

  • Not all AZs support EBS-optimized instance types; check which ones do before migrating

Architecting For Performance: Burstable CPU Credits:

Burstable instances are perfect for workloads that do not use the full CPU often but occasionally need to burst.

  • T2 instance types have “burstable” CPU performance
  • Each instance has a “baseline” performance but can “burst” to greater CPU usage if credits allow
  • One CPU “credit” is equal to one vCPU running at 100% utilization for one minute
  • One CPU “credit” is equal to one vCPU running at 50% utilization for two minutes, etc
  • “Credits” are accrued when the instance uses LESS than its baseline performance

Architecting For Performance: Storage

  • General Purpose (SSD): not disk intensive; 1 GiB–16 TiB, 160 MiB/s, baseline performance of 3 IOPS/GiB with burstable “credits”
  • Provisioned IOPS (SSD): production and database workloads; 4 GiB–16 TiB, 320 MiB/s, up to 20,000 IOPS per volume
  • Magnetic: infrequent access; 1 GiB–1 TiB, 40–90 MiB/s, 100 IOPS, burstable to hundreds of IOPS

Increasing Performance With RAID Configurations

With RAID 0 you get the sum of whatever throughput you provision on the attached EBS volumes: striping together two 20,000 IOPS volumes in RAID 0 results in 40,000 IOPS.

Problem: after striping 8-10 EBS volumes together, your bottleneck becomes instance bandwidth. How can you get more throughput?
Solution: use instance-store backed instances and stripe the attached ephemeral storage devices for several hundred thousand IOPS, depending on instance size (see the mdadm sketch below)
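
A minimal mdadm sketch of the striping, assuming two volumes attached as /dev/xvdf and /dev/xvdg (device names vary by instance):

 sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdf /dev/xvdg
 sudo mkfs.ext4 /dev/md0                             # filesystem on top of the RAID 0 array
 sudo mkdir -p /data && sudo mount /dev/md0 /data    # IOPS now add up across both volumes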

To avoid losing the ephemeral data, use DRBD asynchronous replication to a second AZ

HPC On AWS

Placement group
  • A placement group is a logical grouping of instances within a single Availability Zone. When using a placement group the application can take advantage of a low-latency, 10 Gbps network.
  • An already running instance cannot be added to a placement group
  • Use the same instance type to help ensure the instances are located as close as possible. AWS groups physical hardware based on instance type.
  • If you receive a capacity error when launching an instance in a placement group, stop and restart the instances in the placement group, and then try the launch again.
  • Auto Scaling can be used to launch instances in placement groups based on CloudWatch metrics (see the CLI sketch below)
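
A sketch of creating a placement group and launching into it (AMI id and instance type are placeholders):

 aws ec2 create-placement-group --group-name hpc-group --strategy cluster
 aws ec2 run-instances --image-id ami-12345678 --count 2 \
   --instance-type c4.8xlarge --placement GroupName=hpc-group
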
SR-IOV (Enhanced Networking)
  • Single Root I/O Virtualization creates enhanced networking abilities on instances, which results in higher packets-per-second performance, lower latency, and reduced jitter (jitter = noise on the wire)
  • Supported instance types: C3, C4, D2, I2, M4, R3 (notice GPU instances are not listed!)
  • Supports only HVM virtualization. Amazon Linux has it on by default; to enable it elsewhere the ixgbevf kernel module is required (see the sketch below)
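
A sketch of checking and enabling enhanced networking (the instance must be stopped before changing the attribute; the instance id is a placeholder):

 modinfo ixgbevf                                   # on the instance: is the module available?
 aws ec2 describe-instance-attribute --attribute sriovNetSupport \
   --instance-id i-1234567890abcdef0               # check the current setting
 aws ec2 modify-instance-attribute --sriov-net-support simple \
   --instance-id i-1234567890abcdef0               # turn enhanced networking on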

DDoS Mitigation Strategies

CloudFront:
  • CloudFront has built-in abilities to absorb and deter DDoS attacks while still serving traffic to legitimate users. This is done as part of the CloudFront service and requires no additional configuration.
  • CloudFront can scale to handle any increase in traffic which helps absorb attacks
  • CloudFront uses filtering techniques to ensure that only valid TCP connections and HTTP requests are successful in passing through the edge locations
  • Helps mitigate UDP and SYN flood DDoS attacks

Network monitoring

  • Promiscuous mode is not allowed: the hypervisor has it disabled, so it will not deliver to an instance any traffic that is not specifically addressed to that instance.
  • Place an IDS inside of your cluster and have your EC2 instances send “copies” of the traffic to it for “monitoring” only.
  • Place IDS software on the EC2 instances that deliver your primary “front end” application
  • The first 4 and the last IP address of a given subnet are not available because AWS reserves them for networking purposes.

7. Direct Connect

  • Can only communicate with internal IP addresses inside of EC2
  • Cannot access public IP addresses as Direct Connect is NOT an internet provider
  • Create multiple private VIFs (Virtual Interfaces) to multiple VPCs at a time

Public Virtual Interfaces: use the Direct Connect connection to reach public AWS endpoints for any AWS service, such as DynamoDB or Amazon S3

  • Requires public CIDR block range
  • Traffic is still consistent, as it is sent over your dedicated network to the Direct Connect partner at the partner's connection to AWS

An AWS Direct Connect location provides access to the AWS region it is associated with. It does not provide access to other AWS regions.
What if you're creating a multi-region design and need a more reliable network connection?

  • Create a public virtual interface to the remote region's public endpoints and run a VPN over the public virtual interface to protect the data

8. Amazon ElastiCache

Caching Strategies:
  • Lazy Loading: check the cache; on a miss, read from the DB and write the result into the cache (see the sketch after this list)
  • Write Through: every time there is a write, the app writes twice, to the DB and to the cache
  • A TTL can be applied to both lazy loading and write through to manage cache resources.
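
A minimal lazy-loading sketch with redis-cli (the ElastiCache endpoint and the query_database helper are hypothetical):

 #!/bin/bash
 ENDPOINT=my-cluster.xxxxxx.use1.cache.amazonaws.com
 KEY="user:42"
 VALUE=$(redis-cli -h "$ENDPOINT" GET "$KEY")             # 1. check the cache
 if [ -z "$VALUE" ]; then
     VALUE=$(query_database "$KEY")                       # 2. miss: read from the DB
     redis-cli -h "$ENDPOINT" SET "$KEY" "$VALUE" EX 300  # 3. fill the cache with a 5-minute TTL
 fi
 echo "$VALUE"
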
Memcache
  • Does not have backup abilities
  • Scales by adding more nodes to the cluster
  • Every node in the cluster is the same instance type
  • Memcached supports auto discovery, client programs automatically identify all nodes in a cache cluster
  • Improve fault tolerance by locating nodes in multiple availability zones
  • Memcached is a great solution for storing “session” state in applications: this makes web servers stateless, which allows for easy scaling
Redis

used for:

  • small data sets that can be stored in memory
  • frequently changing data
  • persistence
  • automatic failover

scaling:

  • to scale writes you need to increase the node size
  • supports clusters of read replicas to scale reads
  • to increase the size you need to take a snapshot and create a new instance, or add a node and seed it from the original

Redis supports backups with snapshots (Memcached does not), but you cannot copy the snapshots to another region

9. Redshift

Intro:

  • Fully managed petabyte scale data warehouse
  • runs in a single AZ
  • continuous backup to S3; if there is a failure the system will automatically replace the failed nodes

How it works:

  • Redshift distributes the query from the “leader” node in parallel across all the cluster’s compute nodes.
  • The compute nodes work together to execute the queries and return the data back to the leader node which then organizes the results and sends it back to the client requesting the data from the cluster.

Scaling:

  • change the instance type (this also influences the storage available)
  • when you add a node the system redistributes the data across the nodes.

Change the node type:

  • all connections are terminated, the cluster is restarted in read-only mode, and any uncompleted transactions are rolled back
  • a new cluster is started using the original one as its source
  • the endpoint changes

Cost:

  • storage is included in the cluster
  • spot instances are not allowed
  • reserved instances are allowed

Backup:

  • you can take manual and automatic snapshots for backup
  • you can copy the snapshots to another region (see the CLI sketch below)
  • the snapshot also contains some Redshift configuration
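
A sketch of the snapshot commands (cluster, snapshot, and region names are placeholders):

 # manual snapshot
 aws redshift create-cluster-snapshot --cluster-identifier my-cluster \
   --snapshot-identifier my-cluster-snap-1
 # automatically copy snapshots to another region, retaining them for 7 days
 aws redshift enable-snapshot-copy --cluster-identifier my-cluster \
   --destination-region us-west-2 --retention-period 7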

10. CloudFront

Key concepts:

  • it can store dynamic content; configure it so that when the dynamic content changes it doesn't stay cached
  • it can be used to stream media
  • it caches the last response until the TTL expires, the TTL is set to 0, or the object is invalidated
  • CloudFront allows you to respond back with custom error message/pages. I.E 404 not found page.
HTTP Methods
  • No caching: DELETE, PATCH, POST, PUT
  • Caching: GET, HEAD, OPTIONS

When there is no caching CloudFront acts only as a proxy; for the DELETE method CloudFront deletes the object from the origin but not from the cache, so you need to invalidate the cache.

Dynamic Content With CloudFront
  1. put the whole website behind the CDN
  2. define 2 origins: 1 for static content (S3), 1 for dynamic content (EC2)
  3. use TTL = 0 for dynamic content
  4. the CDN works as a proxy for both upload and download; this improves speed even for dynamic content because it uses Amazon's internal network
  5. you can also put an on-premises resource behind the CDN

How a dynamic content request works:

  • enable query string forwarding
  • set TTL = 0. When this is set the CDN makes a request to the origin with an "If-Modified-Since" header; if the object hasn't changed, the cached copy is used.

Other options:

  • device detection: analyzes the User-Agent header and sends a different response per device
  • geo targeting: sends different content based on the user's country
Reporting with CloudFront
  • there are access logs for CloudFront origins, and they can be analyzed with EMR
  • these access logs are stored in an S3 bucket
CloudFront Security
  • Signed URLs with expiry dates (see the CLI sketch after this list)
  • Signed Cookies: more flexible than signed URLs; they can validate each request to CloudFront
  • Geo Restriction : by country
  • Forcing SSL
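
A sketch of generating a signed URL with an expiry date via the CLI (the distribution domain, key pair id, and private key file are placeholders):

 aws cloudfront sign \
   --url https://d111111abcdef8.cloudfront.net/private/video.mp4 \
   --key-pair-id APKAEXAMPLE \
   --private-key file://cf-private-key.pem \
   --date-less-than 2017-12-31
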
Performance
  • increase TTL
  • use the proxy for upload as already explained
  • if there are simultaneous requests for the same object, CloudFront waits for the first request to complete so it can serve the same result to the others.
Video Streaming

Different Type:

  • on demand (web cloudfront distribution)
  • pre-recorded media (MP4 Adobe streaming over RTMP; streaming only, no download)
  • live streaming (Wowza media server, which can run on an EC2 instance; use the web CDN distribution, not the RTMP option)

11. Amazon Elastic Transcoder

  • to start a conversion you submit a job (see the CLI sketch after this list)
  • a single job can create up to 30 output video files
  • jobs go into a pipeline and are processed in submission order; pipelines can be paused
  • notification integrated with SNS
  • each pipeline has an input/output bucket , a role, and a bucket for the thumbnails
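
A sketch of submitting a job to a pipeline (the pipeline id is a placeholder; the preset id is Amazon's generic 720p system preset):

 aws elastictranscoder create-job \
   --pipeline-id 1111111111111-abcde1 \
   --input '{"Key":"input/video.mp4"}' \
   --outputs '[{"Key":"output/video-720p.mp4","PresetId":"1351620000001-000010"}]'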

12. AWS Data Pipeline (it is a data orchestrator)

some possible actions:

  • moving and transforming data from and to: DynamoDB, RDS, S3, EMR (with which you can also process data using Hadoop streaming), Redshift
  • you can set preconditions to start, for example when some logs arrive in a bucket
  • you can use many form of schedule
  • you can backup dynamodb tables
  • you can run SQL queries against Redshift data and store the results in another resource.

Supported Databases:

  • JDBC databases
  • RDS databases
  • Redshift databases

Preconditions:

  • check DynamoDB table existence
  • check S3 key and prefix
  • run linux bash commands

Resources that run behind the scenes to perform the actions:

  • ec2 resources
  • EMR cluster

For this reason you can use spot and reserved instances to reduce the cost

13. RDS

Overview + Security

Encryption of data at rest:
  • encrypts: the underlying storage, snapshots, backups, logs, read replicas. You can only enable encryption at creation time
  • key: it cannot be changed after creation; for cross-region replicas and snapshot copies you need a KMS key in the destination region.
  • works on: mysql, oracle, sql server, postgresql, mariadb
Transparent Data Encryption TDE:
  • encrypts data before it is written to disk and decrypts it when read from disk
  • It works only on Oracle and SQL Server
SSL Encrypted Connection endpoint
  • it is used from the client/app to the RDS instance
  • a new SSL certificate is created when the instance is created (see the connection sketch below)
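
A sketch of an SSL connection from the MySQL client (the endpoint is a placeholder; the CA bundle URL is the one AWS publishes for RDS):

 wget https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem
 mysql -h mydb.xxxxxxxx.us-east-1.rds.amazonaws.com -u admin -p \
   --ssl-ca=rds-combined-ca-bundle.pem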

MySQL and MariaDB

Read Replica
  • data can be replicated asynchronously using the MySQL native features from the master to any slave in any region
  • improves disaster recovery times
  • helps with data migration
  • allows reads to scale (writes always go to the master)
  • it is still a best practice to use a caching layer in front of the master
keep in mind that
  1. Replication for MySQL works only with version 5.6.13 or later
  2. Multi-AZ failover uses synchronous replication

Oracle

  • RDS supported editions: SE, EE, SE One
  • RDS not supported: RAC , it is a cluster with shared cache architecture
  • You can run Oracle RAC on EC2, using a placement group and an internal VPN to work around the required multicast feature. Use Data Guard to extend the high availability of the cluster.
  • Backup: for Oracle RDS use snapshot backups; for EC2 and on premises you can use RMAN with S3 as the backup target.

MSSQL

  • Supports Multi-AZ
  • Does not support read replicas; you can scale by changing the instance size
Import data from on prem MSSQL to RDS MSSQL

on prem:

  • turn off all the applications
  • disable key constraints and take a backup
  • export the tables to flat files using SQL Server Management Studio

on RDS:

  • create the empty tables
  • import the flat files

14. CloudSearch

  • it is Apache Solr as a service
  • it can also be used for searching DynamoDB data
  • automatic scaling, or scale manually if you know a big load is coming
  • Multi-AZ available
  • behind the scenes there are EC2 machines

15. EMR Elastic MapReduce

  • it is an Apache Hadoop cluster as a service
  • HDFS is the file system shared between the nodes of the cluster
  • MapReduce is the programming model for the queries, written in Java
  • Hive is a component that uses a SQL-like query language
  • Pig is used to write MapReduce programs
  • EMR is used to find "trends" in big amounts of data
Components
  • Master node: manages the data distribution to the slaves
  • Core node: stores data in HDFS; it is managed by the master node
  • Task node: no HDFS, only runs tasks and sends data to the core nodes; it is managed by the master node
  • EMRFS: can be used instead of HDFS to store data in S3

In some cases (tests, or data that doesn't need to persist) it is possible to use spot instances in the EMR cluster

16. OpsWorks Deployment And Concepts

  • a layer is used to group the different Chef recipes to apply
  • an instance can belong to different layers
  • Lifecycle events: setup, configure, deploy, undeploy, shutdown

Deployments:

  • you can use rolling deployments or blue-green deployments with OpsWorks

17. SQS Message Priority

  • You can have 2 different queues: low priority and high priority
  • The high-priority queue has more instances assigned than the low-priority one, to process the important jobs quickly (see the polling sketch below)
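
A sketch of a worker that drains the high-priority queue before touching the low-priority one (queue URLs are placeholders):

 #!/bin/bash
 HIGH=https://sqs.us-east-1.amazonaws.com/123456789012/jobs-high
 LOW=https://sqs.us-east-1.amazonaws.com/123456789012/jobs-low
 # poll high priority first; fall back to low priority only when high is empty
 MSG=$(aws sqs receive-message --queue-url "$HIGH" --max-number-of-messages 1)
 if [ -z "$MSG" ]; then
     MSG=$(aws sqs receive-message --queue-url "$LOW" --max-number-of-messages 1)
 fi
 echo "$MSG"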

18. DynamoDB

DynamoDB Use Cases

still to understand

DynamoDB Secondary Indexes

still to understand

DynamoDB Multi-Region

  • To migrate to a secondary region as part of a daily backup, you can use Data Pipeline to copy the data into a DynamoDB table in another region
  • If you have a multi-region solution you can use "DynamoDB Streams": an ordered record of the modifications made to a table, exposed as a stream (Kinesis-style). In this case you can keep a consistent DB in multiple regions.