Scenario-Based Questions

Your company has built an internal scrum tool for running all of your scrum ceremonies. Usage is predictably high between 9 and 10 AM Monday through Friday, and also 1 to 2 PM on Thursday and Friday. Which feature of Auto Scaling will most easily prepare your system to handle the load?

  • Target tracking could work, but you would need to invest time determining the correct metric to track (for example, CPU, memory, or load balancer requests).
  • Manual scaling requires that someone change the configuration to scale up and scale down every day.
  • Over-provisioning to cope with peak demand defeats the purpose of elastically scaling your compute.
  • For situations where your traffic is very predictable, the easiest way to scale with demand is to create scheduled scaling actions, as in the sketch below.
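A minimal boto3 sketch of scheduled scaling actions, assuming a hypothetical Auto Scaling group named scrum-tool-asg and illustrative capacity numbers (Recurrence is a cron expression in UTC, so adjust for your timezone):

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Scale out shortly before the 9-10 AM weekday peak.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="scrum-tool-asg",
    ScheduledActionName="weekday-morning-scale-out",
    Recurrence="50 8 * * 1-5",   # 8:50 AM, Monday-Friday
    MinSize=4,
    MaxSize=12,
    DesiredCapacity=8,
)

# Scale back in once the ceremonies are over.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="scrum-tool-asg",
    ScheduledActionName="weekday-morning-scale-in",
    Recurrence="10 10 * * 1-5",  # 10:10 AM, Monday-Friday
    MinSize=1,
    MaxSize=12,
    DesiredCapacity=2,
)
```

A second pair of actions with Recurrence set for Thursday and Friday afternoons would cover the 1 to 2 PM peak the same way.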

A product manager walks into your office and advises that the simple single-node MySQL RDS instance that has been used for a pilot needs to be upgraded for production. She also advises that they may need to alter the size of the instance once they see how many people use it during peak periods. The key concern is that there cannot be any outage of more than a few seconds during the go-live period. What recommendations would you give for scaling the RDS instance?

  • Convert the RDS instance to a Multi-AZ implementation (sketched below).
  • Consider replacing it with Aurora before go-live.
  • There are two issues to address in this question: minimizing outages, whether due to required maintenance or unplanned failures, and the possibility of needing to scale up or down. Read replicas can help with high read loads but are not intended as a solution to system outages.
  • Multi-AZ implementations increase availability because, in the event of an instance outage, the standby instance in another Availability Zone picks up the load with minimal delay.
  • Aurora provides the same capability with potentially higher availability and faster response.
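A minimal sketch of the Multi-AZ conversion with boto3, assuming a hypothetical instance identifier pilot-mysql:

```python
import boto3

rds = boto3.client("rds")

# Promote the pilot instance to Multi-AZ. ApplyImmediately converts it
# right away rather than waiting for the next maintenance window; the
# conversion happens without taking the primary offline.
rds.modify_db_instance(
    DBInstanceIdentifier="pilot-mysql",
    MultiAZ=True,
    ApplyImmediately=True,
)
```
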
You work for a major news network in Europe. They have just released a new mobile app that allows users to post photos of newsworthy events in real time. Your organization expects this app to grow very quickly, essentially doubling its user base each month. The app uses S3 to store the images, and you are expecting sudden and sizable increases in traffic to S3 when a major news event takes place. You need to keep your storage costs to a minimum, and you are happy to temporarily lose access to up to 0.1% of uploads per year. With these factors in mind, which storage class would you choose to keep costs as low as possible?

The key drivers here are availability and cost, so an awareness of cost is necessary to answer this. Glacier cannot be considered, as it is not intended for direct access. S3 Standard has an availability of 99.99%, S3 Standard-IA has an availability of 99.9%, while S3 One Zone-IA has only 99.5%. So S3 Standard-IA is preferred.
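To illustrate, a hedged boto3 sketch of uploading directly into Standard-IA (the bucket name and key are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Store a newly uploaded image in S3 Standard-IA from the start,
# rather than paying Standard rates and transitioning later.
with open("photo-001.jpg", "rb") as f:
    s3.put_object(
        Bucket="news-app-uploads",
        Key="events/2023/photo-001.jpg",
        Body=f,
        StorageClass="STANDARD_IA",
    )
```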

You work for a manufacturing company that operates a hybrid infrastructure, with systems located both in a local data center and in AWS, connected via AWS Direct Connect. Currently, all on-premises servers are backed up to a local NAS, but your CTO wants you to decide on the best way to store copies of these backups in AWS. He has asked you to propose a solution which will provide access to the files within milliseconds should they be needed, but at the same time minimizes cost. As these files will be copies of backups stored on-premises, availability is not as critical as durability, but both are important. What would be your solution?

S3 Standard-IA provides rapid access to files and is resilient against events that impact an entire Availability Zone, while offering the same 11 9's of durability as all other storage classes. The trade-off is availability: it is designed for 99.9% availability over a given year, as opposed to the 99.99% that S3 Standard offers. However, in this brief, as cost is more important than availability, S3 Standard-IA is the logical choice.
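If the backup copies first land in S3 Standard, a lifecycle rule can move them to Standard-IA automatically. A sketch, assuming a hypothetical bucket and prefix (objects must be at least 30 days old before transitioning to Standard-IA):

```python
import boto3

s3 = boto3.client("s3")

# Transition backup copies to Standard-IA after the minimum 30 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="onprem-backup-copies",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "backups-to-standard-ia",
                "Status": "Enabled",
                "Filter": {"Prefix": "backups/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"}
                ],
            }
        ]
    },
)
```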

You manage a high-performance site that collects scientific data using a bespoke protocol over TCP port 1414. The data comes in at high speed and is distributed to an Auto Scaling group of EC2 instances spread over three Availability Zones. Which type of AWS load balancer would best meet this requirement?

The Network Load Balancer is specifically designed for high-performance traffic that is not conventional web traffic. The Classic Load Balancer might also do the job, but would not offer the same performance.
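A rough boto3 sketch of the NLB setup, with hypothetical subnet and VPC IDs; the target group speaks plain TCP on port 1414:

```python
import boto3

elbv2 = boto3.client("elbv2")

# Create a Network Load Balancer spanning the three AZ subnets.
nlb = elbv2.create_load_balancer(
    Name="scientific-data-nlb",
    Type="network",
    Scheme="internet-facing",
    Subnets=["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"],
)

# A TCP target group for the bespoke protocol on port 1414.
tg = elbv2.create_target_group(
    Name="port-1414-targets",
    Protocol="TCP",
    Port=1414,
    VpcId="vpc-0123456789abcdef0",
    TargetType="instance",
)

# Listen on TCP 1414 and forward to the target group.
elbv2.create_listener(
    LoadBalancerArn=nlb["LoadBalancers"][0]["LoadBalancerArn"],
    Protocol="TCP",
    Port=1414,
    DefaultActions=[{
        "Type": "forward",
        "TargetGroupArn": tg["TargetGroups"][0]["TargetGroupArn"],
    }],
)
```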

You have a website with three distinct services (mysite.co/accounts, mysite.co/sales, and mysite.co/support), each hosted by a different web server Auto Scaling group. You need to use advanced routing to send requests to specific web servers, based on configured rules. Which of the following AWS services should you use?

The Application Load Balancer has the functionality to distinguish traffic for different targets (mysite.co/accounts, mysite.co/sales, mysite.co/support) and distribute traffic based on rules for target group, condition, and priority.
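A hedged sketch of the path-based rules, assuming placeholder listener and target group ARNs:

```python
import boto3

elbv2 = boto3.client("elbv2")

# Route each path pattern to its own target group. The listener and
# target group ARNs below are hypothetical placeholders.
rules = [
    ("/accounts*", "arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/accounts-tg/abc111"),
    ("/sales*",    "arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/sales-tg/abc222"),
    ("/support*",  "arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/support-tg/abc333"),
]

for priority, (path, tg_arn) in enumerate(rules, start=1):
    elbv2.create_rule(
        ListenerArn="arn:aws:elasticloadbalancing:eu-west-1:123456789012:listener/app/mysite/abc000/def000",
        Priority=priority,
        Conditions=[{"Field": "path-pattern", "Values": [path]}],
        Actions=[{"Type": "forward", "TargetGroupArn": tg_arn}],
    )
```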

Following an unplanned outage, you have been called into a planning meeting. You are asked what can be done to reduce the risk of bad deployments and single points of failure in your AWS resources. Which solutions can be used to mitigate these problems?
  • Cross-zone load balancing reduces the need to maintain equivalent numbers of instances in each enabled Availability Zone, and improves your application's ability to handle the loss of one or more instances (see the sketch after this list).
  • Although the methods vary, you can place multiple Auto Scaling or target groups behind ELBs.
  • Using Route 53 in combination with ELBs is a good pattern to distribute traffic regionally as well as across Availability Zones.
  • The purpose of a canary deployment is to reduce the risk of deploying a new version that impacts the workload.
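As referenced above, a short sketch of enabling cross-zone load balancing. It is always on for Application Load Balancers; for Network Load Balancers it is opt-in via an attribute (the ARN is a placeholder):

```python
import boto3

elbv2 = boto3.client("elbv2")

# Spread traffic evenly across registered targets in all enabled AZs,
# not just the AZ that received the connection.
elbv2.modify_load_balancer_attributes(
    LoadBalancerArn="arn:aws:elasticloadbalancing:eu-west-1:123456789012:loadbalancer/net/my-nlb/abc123",
    Attributes=[
        {"Key": "load_balancing.cross_zone.enabled", "Value": "true"},
    ],
)
```
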
You need to use an object-based storage solution to store your critical, non-replaceable data in a cost-effective way. This data will be frequently updated and will need some form of version control enabled on it. Which S3 storage solution should you use?
  • From the question we can identify that the data is non-replaceable (all S3 classes are at 11 9's of durability now, except for RRS).
  • The data is frequently updated (classes outside of S3 Standard and S3 Intelligent-Tiering have extra charges for frequently accessed data).
  • It must be cost-effective (S3 Standard is more cost-effective than S3 Intelligent-Tiering if the data is updated frequently).
  • Version control must be an available feature (S3 has versioning as a feature). All of these items combined make S3 Standard the best option for the available information; enabling versioning is sketched below.
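The versioning sketch referenced above, assuming a hypothetical bucket name:

```python
import boto3

s3 = boto3.client("s3")

# Keep every prior version of an object whenever it is overwritten
# or deleted.
s3.put_bucket_versioning(
    Bucket="critical-data-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)
```
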
You work for a large software company in Seattle. They have their production environment provisioned on AWS inside a custom VPC. The VPC contains both a public and a private subnet. The company tests their applications on custom EC2 instances inside the private subnet. There are approximately 500 instances, and they communicate to the outside world via a proxy server. At 3 AM every night, the EC2 instances pull down OS updates, which are usually 150 MB or so. They then apply these updates and reboot; if the software has not downloaded within half an hour, then the update will attempt to download the following day. You notice that a number of EC2 instances are continually failing to download the updates in the allotted time. What could be the reason for the failure?

Network throughput is the obvious bottleneck. You are not told in this question whether the proxy server is in the public or private subnet. If it is in a public subnet, the proxy server's instance size itself may not be large enough to cope with the current network throughput. If the proxy server is in the private subnet, then it must be using a NAT instance or NAT gateway to communicate out to the Internet. If it is a NAT instance, this may also be inadequately provisioned in terms of size. You should therefore increase the size of the proxy server and/or the NAT solution.

You are a solutions architect working for a biotech company that is pioneering research in immunotherapy. They have developed a new cancer treatment that may be able to cure up to 94% of cancers. They store their research data on S3. However, an intern recently deleted some critical files accidentally. You have been asked to prevent this from happening in the future. Which solutions can be used to prevent accidental data loss?

To prevent or mitigate future accidental deletions, consider the following features:
  • Enable versioning to keep historical versions of an object and MFA delete to require multifactor authentication (MFA) when deleting an object version.
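A hedged sketch of both settings, assuming a hypothetical bucket. MFA Delete can only be enabled by the root account, and the MFA argument is the device serial (ARN) and a current token code separated by a space (both placeholders here):

```python
import boto3

s3 = boto3.client("s3")

# Versioning keeps deleted objects recoverable; MFA Delete requires a
# hardware/virtual MFA code to permanently remove an object version.
s3.put_bucket_versioning(
    Bucket="research-data-bucket",
    VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
    MFA="arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456",
)
```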
You are a systems administrator and you need to monitor the health of your production environment. You decide to do this using CloudWatch. However, you notice that you cannot see the health of every important metric in the default dashboard. When monitoring the health of your EC2 instances, for which metric would you need to design a custom CloudWatch metric?

Remember that under the shared responsibility model, AWS can see the instance, but not inside the instance to tell how it is doing. AWS can see that you have memory, but not how much of the memory is being used. In the case of CPU, AWS can see how much CPU you are using, but cannot see what you are using it for. So you would need to design a custom metric for memory usage.
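A minimal sketch of publishing such a custom metric with boto3; in practice an agent on the instance gathers the real reading, and the 62.5 here is illustrative:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Push a memory-utilization reading into a custom namespace, tagged
# with the (hypothetical) instance ID as a dimension.
cloudwatch.put_metric_data(
    Namespace="Custom/EC2",
    MetricData=[{
        "MetricName": "MemoryUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": "i-0abc123def456789"}],
        "Value": 62.5,
        "Unit": "Percent",
    }],
)
```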

You have been engaged as a consultant by a company that generates utility bills and publishes them online. PDF images are generated, then stored on a high performance RDS instance. Customers view invoices once per month. Recently, the number of customers has increased threefold, and the wait time necessary to view invoices has increased unacceptably. The CTO is unwilling to alter the codebase more than necessary this quarter, but needs to return performance to an acceptable level before the end of month print run. Which of the solutions would you feel comfortable proposing to the CTO and GM?

Read replicas are often a great way to offload read queries from your database. Vertical scaling is another option, but the decision must ensure the new instance size is actually the best solution.
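A minimal read replica sketch, assuming a hypothetical source instance named invoices-db:

```python
import boto3

rds = boto3.client("rds")

# Serve invoice-viewing queries from a replica so the primary is not
# saturated by read traffic.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="invoices-db-replica-1",
    SourceDBInstanceIdentifier="invoices-db",
)
```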

You're building out a single-region application in us-west-2. However, disaster recovery is a strong consideration, and you need to build the application so that if us-west-2 becomes unavailable, you can fail over to us-west-1. Your application relies exclusively on pre-built AMIs, and has specific launch permissions, custom tags, and security group rules. In order to run your application leveraging those AMIs in your backup region, which process would you follow?

Copy the AMI from us-west-2 to us-west-1. After the copy operation is complete, apply launch permissions, user-defined tags, and security group configurations.
AWS does not copy launch permissions, user-defined tags, or security group rules from the source AMI to the new AMI. After the copy operation is complete, you can apply launch permissions, user-defined tags, and security group configurations to the new AMI.
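A hedged boto3 sketch of the whole sequence, run from the destination region with placeholder IDs:

```python
import boto3

# Run the copy from the destination (backup) region.
ec2_west1 = boto3.client("ec2", region_name="us-west-1")

copy = ec2_west1.copy_image(
    Name="app-ami-dr-copy",
    SourceImageId="ami-0123456789abcdef0",  # hypothetical source AMI
    SourceRegion="us-west-2",
)
new_ami = copy["ImageId"]

# Wait until the copy is usable before modifying it.
ec2_west1.get_waiter("image_available").wait(ImageIds=[new_ami])

# Launch permissions and tags are NOT copied, so re-apply them.
ec2_west1.modify_image_attribute(
    ImageId=new_ami,
    LaunchPermission={"Add": [{"UserId": "123456789012"}]},
)
ec2_west1.create_tags(
    Resources=[new_ami],
    Tags=[{"Key": "Environment", "Value": "dr"}],
)
```

Security group rules live on the VPC side rather than on the AMI, so they must also be recreated in us-west-1.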

You work for a genomics company that is developing a cure for motor neuron disease by using advanced gene therapies. As a part of their research, they take extremely large data sets (usually in the terabytes) and analyze these data sets using Elastic Map Reduce. In order to keep costs low, they run the analysis for only a few hours in the early hours of the morning, using spot instances for the task nodes. The core nodes are on-demand instances. Lately, however, the EMR jobs have been failing due to spot instances being unexpectedly terminated. What is recommended to have the best experience in terms of availability using the Spot service?

Capacity Rebalancing helps you maintain workload availability by proactively augmenting your fleet with a new Spot Instance before a running Spot Instance receives the two-minute Spot Instance interruption notice. When Capacity Rebalancing is enabled, Auto Scaling or Spot Fleet attempts to proactively replace Spot Instances that have received a rebalance recommendation, providing the opportunity to rebalance your workload to new Spot Instances that are not at elevated risk of interruption. Capacity Rebalancing complements the capacity optimized allocation strategy (which is designed to help find the most optimal spare capacity) and the mixed instances policy (which is designed to enhance availability by deploying instances across multiple instance types running in multiple Availability Zones). Reference: Best practices for EC2 Spot.

Allocation strategies in Auto Scaling groups help you to provision your target capacity without the need to manually look for the Spot Instance pools with spare capacity. AWS recommends using the capacity optimized strategy because this strategy automatically provisions instances from the most-available Spot Instance pools. You can also take advantage of the capacity optimized allocation strategy in Spot Fleet. Because your Spot Instance capacity is sourced from pools with optimal capacity, this decreases the possibility that your Spot Instances are reclaimed. Reference: Best practices for EC2 Spot.
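A minimal sketch of enabling Capacity Rebalancing on an existing group ("emr-task-asg" is a hypothetical name); the capacity-optimized allocation strategy itself is configured in the group's mixed instances policy:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Proactively replace Spot Instances that receive a rebalance
# recommendation, before the two-minute interruption notice arrives.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="emr-task-asg",
    CapacityRebalance=True,
)
```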

Your company provides an online image recognition service and uses SQS to decouple system components. Your EC2 instances poll the image queue as often as possible to keep end-to-end throughput as high as possible, but you realize that all this polling is resulting in both a large number of CPU cycles and skyrocketing costs. How can you reduce cost without compromising service?

SQS long polling doesn't return a response until a message arrives in the queue, reducing your overall cost over time. Short polling WILL return empty responses.
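A short sketch of both ways to enable long polling, with a placeholder queue URL:

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/image-queue"  # placeholder

# Make long polling the queue default: receive calls wait up to
# 20 seconds for a message instead of returning empty immediately.
sqs.set_queue_attributes(
    QueueUrl=queue_url,
    Attributes={"ReceiveMessageWaitTimeSeconds": "20"},
)

# Or opt in per request:
messages = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=20)
```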

Your company has a policy of encrypting all data at rest. You host your production environment on EC2 in a bespoke VPC. Attached to your EC2 instances are multiple EBS volumes, and you must ensure this data is encrypted. Which options will allow you to do this?
  • Encrypt your data inside your application, before storing it on EBS.
  • Use third party volume encryption tools.
  • Encrypt the data using native encryption tools available in the operating system.
Encrypting Amazon EBS volumes attached to Windows instances can be done using BitLocker or the Encrypting File System (EFS), as well as open-source applications like TrueCrypt. Some common block-level open-source encryption solutions for Linux are Loop-AES, dm-crypt (with or without LUKS), and TrueCrypt.

Your company has hired a young and enthusiastic accountant. After reviewing the AWS documentation and usage graphs, he announces that you are wasting vast amounts of money running your Windows servers for a full hour instead of spinning them up only when they are needed and down again as soon as they are idle for 1 minute. He cites the AWS claim that you only pay for what you use, and that as a senior engineer, you should be more conscious of wasting company money. How do you respond?

You acknowledge that Windows instances are billed in per-second increments, with a minimum of 1 minute. However, you explain that storage charges are incurred even if the instance sits idle. Taking into account productivity losses, stopping and restarting instances may actually result in additional costs. As such, your solution is fine as it now stands.

You are working in the media industry, and you have created a web application where users will be able to upload photos they create to your website. This web application must be able to call the S3 API in order to function. Where should you store your API credentials while maintaining the maximum level of security?

Don't save your API credentials. Instead, create a role in IAM and assign this role to an EC2 instance when you first create it.
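A minimal sketch, assuming an instance profile named s3-upload-role already wraps the IAM role (the AMI ID is a placeholder):

```python
import boto3

ec2 = boto3.client("ec2")

# Launch the web-app instance with an instance profile. The SDK on the
# instance then obtains temporary credentials automatically, so no API
# keys are ever written to disk.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    IamInstanceProfile={"Name": "s3-upload-role"},
)
```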

You are a solutions architect working for a large anti-virus company, and your job is to secure your company's production AWS environment. A new policy dictates that a particular public-facing subnet needs to allow RDP on port 3389 at the custom network ACL layer. You create an inbound rule allowing traffic to port 3389 at the ACL level. However, users complain that they still cannot connect. Which of the following may represent the root cause of the connectivity issue?

Network Access Control Lists are stateless, so rules must be created for both inbound and outbound traffic.
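A sketch of the paired rules with boto3, using a placeholder ACL ID and CIDR. Inbound and outbound entries have separate rule-number sequences, and the outbound entry must cover the ephemeral port range for return traffic:

```python
import boto3

ec2 = boto3.client("ec2")
acl_id = "acl-0123456789abcdef0"  # hypothetical custom network ACL

# Inbound: allow RDP from the admin network (placeholder CIDR).
ec2.create_network_acl_entry(
    NetworkAclId=acl_id,
    RuleNumber=100,
    Protocol="6",            # TCP
    RuleAction="allow",
    Egress=False,
    CidrBlock="203.0.113.0/24",
    PortRange={"From": 3389, "To": 3389},
)

# Outbound: NACLs are stateless, so return traffic on the ephemeral
# port range must be allowed explicitly.
ec2.create_network_acl_entry(
    NetworkAclId=acl_id,
    RuleNumber=100,
    Protocol="6",
    RuleAction="allow",
    Egress=True,
    CidrBlock="203.0.113.0/24",
    PortRange={"From": 1024, "To": 65535},
)
```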

You run a meme creation website that stores the original images in S3 and each meme's metadata in DynamoDB. You need to decide upon a low-cost storage option for the memes, which won't be accessed on a regular basis, but require rapid access when needed. If a meme object is unavailable or lost, a Lambda function will automatically recreate it but at a $10 licensing cost per creation. There is a very large number of files. Which storage solution should you use to store the memes in the most cost-effective way?

The storage savings between S3 Standard-IA and S3 One Zone-IA are about $0.0025 per GB-month, which is small compared to the $10 licensing cost if many files are lost. The durability of S3 Standard-IA and S3 One Zone-IA is the same (99.999999999%), but there is far more risk of high re-creation costs if the data sits in a single zone. S3 Standard-IA guards against that possibility, as the worked comparison below illustrates.
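An illustrative back-of-envelope comparison; every number except the ~$0.0025 per GB-month price gap and the $10 licensing fee is a made-up assumption:

```python
# Hypothetical meme library: one million files averaging 2 MB each.
files = 1_000_000
avg_size_gb = 0.002

# Monthly saving from choosing One Zone-IA over Standard-IA.
monthly_saving = files * avg_size_gb * 0.0025          # about $5/month

# Cost of re-creating even 0.01% of the library after a zone loss.
loss_fraction = 0.0001
recreation_cost = files * loss_fraction * 10           # $1,000

print(f"One Zone-IA saves ${monthly_saving:.2f}/month")
print(f"A single small loss event costs ${recreation_cost:,.2f}")
```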

A single m4.large NAT instance inside a VPC supports a company of 100 people. This NAT instance allows individual EC2 instances in private subnets to communicate out to the internet without being directly accessible via the internet. As the company has grown over the last year, they are finding that the additional traffic through the NAT instance is causing serious performance degradation. What might you do to solve this problem?

The network bandwidth of the NAT instance depends on the bandwidth of the instance type. m4.xlarge instances deliver high network performance, whereas m4.large instances have moderate network performance. Hence, increasing the instance size of the NAT instance would solve the performance degradation issue. References: 1. Amazon EC2 Instance Types, 2. Comparison of NAT instances and NAT gateways.
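A hedged sketch of the resize, with a placeholder instance ID; the type can only be changed while the instance is stopped, so plan for a brief NAT outage:

```python
import boto3

ec2 = boto3.client("ec2")
nat_id = "i-0abc123def456789"  # hypothetical NAT instance ID

# Stop, resize, and restart the NAT instance.
ec2.stop_instances(InstanceIds=[nat_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[nat_id])

ec2.modify_instance_attribute(
    InstanceId=nat_id,
    InstanceType={"Value": "m4.xlarge"},
)
ec2.start_instances(InstanceIds=[nat_id])
```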

Your company is looking for an inexpensive solution for offsite backups that you can easily recover to your local data center. You need low-latency access to your entire dataset. Which AWS Storage Gateway configuration would you use to achieve both of these ends?

A volume gateway provides cloud-backed storage volumes that you can mount as Internet Small Computer System Interface (iSCSI) devices from your on-premises application servers. The gateway supports stored volumes if you need low-latency access to your entire dataset. You can configure your on-premises gateway to store all your data locally, then asynchronously back up point-in-time snapshots of this data to Amazon S3. This configuration provides durable and inexpensive offsite backups that you can recover to your local data center or Amazon Elastic Compute Cloud (Amazon EC2). For example, if you need replacement capacity for disaster recovery, you can recover the backups to Amazon EC2.

You have been asked by your employer to create an identical copy of your production environment in another Region for disaster recovery purposes. Which AWS resources would you NOT need to recreate, because they are available universally across the console?

Route 53 configurations are available universally across the AWS management console and do not need to be recreated in a different region.

Identity Access Management Roles are available universally across the AWS management console and do not need to be recreated in a different region.

A client is concerned that someone other than approved administrators is trying to gain access to the Linux web app instances in their VPC. She asks what sort of network access logging can be added. What might you recommend?

VPC Flow Logs is a feature that enables you to capture information about the IP traffic going to and from network interfaces in your VPC. Flow log data can be published to Amazon CloudWatch Logs or Amazon S3. After you've created a flow log, you can retrieve and view its data in the chosen destination. You can create a flow log for a VPC, a subnet, or a network interface. If you create a flow log for a subnet or VPC, each network interface in that subnet or VPC is monitored.
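A minimal sketch of creating a VPC-level flow log that publishes to CloudWatch Logs (the VPC ID, log group, and IAM role ARN are placeholders):

```python
import boto3

ec2 = boto3.client("ec2")

# Capture all traffic for the VPC; every network interface in the VPC
# is then monitored automatically.
ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],
    ResourceType="VPC",
    TrafficType="ALL",
    LogDestinationType="cloud-watch-logs",
    LogGroupName="vpc-flow-logs",
    DeliverLogsPermissionArn="arn:aws:iam::123456789012:role/flow-logs-role",
)
```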

You are a solutions architect working for a construction company. Your company is migrating their production estate to AWS, and you are in the process of setting up access to the AWS console using Identity Access Management (IAM). You have created 15 users for your system administrators. What further steps do you need to take to enable your system administrators to get access to the AWS console in a secure fashion?

You should generate a password for each administrator user and give these passwords to your system administrators. You should then have each user set up multi-factor authentication once they have been able to log in to the console. You cannot use the secret access key and access key id to log in to the AWS console; rather, these credentials are used to call Amazon APIs.
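A sketch of granting one administrator console access, with placeholder values; PasswordResetRequired forces a change at first sign-in:

```python
import boto3

iam = boto3.client("iam")

# Give an existing IAM user a console password. MFA would then be
# enrolled by the user (or enabled on their behalf) after first login.
iam.create_login_profile(
    UserName="sysadmin-01",
    Password="Temp#Password123",
    PasswordResetRequired=True,
)
```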

The Customer Experience manager comes to see you about some odd behavior in the ticketing system: messages presented to the support team are not arriving in the order in which they were generated, and occasionally the team receives a duplicate copy of a message. You know that this is due to the way the underlying SQS standard queue service is being used to manage messages. What are the correct explanations?


When a consumer receives and processes a message from a queue, the message remains in the queue. Amazon SQS doesn't automatically delete the message. To prevent other consumers from processing the message again, Amazon SQS sets a visibility timeout, a period of time during which Amazon SQS prevents other consumers from receiving and processing the message. The visibility timeout begins when Amazon SQS returns a message. During this time, the consumer processes and deletes the message. However, if the consumer fails before deleting the message and your system doesn't call the DeleteMessage action for that message before the visibility timeout expires, the message becomes visible to other consumers and the message is received again. If a message must be received only once, your consumer should delete it within the duration of the visibility timeout.

Standard queues support at-least-once message delivery. However, occasionally (because of the highly distributed architecture that allows nearly unlimited throughput), more than one copy of a message might be delivered out of order.
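A minimal consumer sketch showing the receive-process-delete contract; handle_ticket and the queue URL are hypothetical:

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/tickets"  # placeholder


def handle_ticket(body: str) -> None:
    """Hypothetical processing step for a support ticket message."""
    print("processing:", body)


# Long-poll for a message, process it, then delete it explicitly --
# SQS never deletes a received message on its own.
resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1,
                           WaitTimeSeconds=20)

for msg in resp.get("Messages", []):
    handle_ticket(msg["Body"])
    # Delete within the visibility timeout, or the message becomes
    # visible again and will be delivered a second time.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```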

You are a solutions architect working for a busy media company with offices in Japan and the United States. Your production environment is hosted both in US-EAST-1 and AP-NORTHEAST-1. Your European users have been connecting to the production environment in Japan, and are seeing the site in Japanese rather than in English. You need to ensure that they view the English language version. Which of the routing policies could help you achieve this?

The aim is to direct sessions to the host that will provide the correct language. Geolocation is the best option because it is based on national borders. Geoproximity routing is another option where the decision can be based on distance. While latency-based routing will usually direct the client to the correct host, connectivity issues with the US Regions might direct traffic to AP. In this case, the word "ensure" is operative: users MUST connect to the English-language site. Watch the wording in the exam: a requirement may be presented very casually in the wording of the question. However, understanding that requirement is mandatory if you're going to arrive at the correct answer.
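A hedged sketch of a geolocation record routing the Europe continent to the English-language endpoint; the hosted zone ID, domain, and IP are placeholders, and a default record should also exist for locations that match no rule:

```python
import boto3

route53 = boto3.client("route53")

# Send European users to the US-EAST-1 (English) endpoint.
route53.change_resource_record_sets(
    HostedZoneId="Z0123456789ABCDEFGHIJ",
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "www.example-media.com",
                "Type": "A",
                "SetIdentifier": "europe-to-english",
                "GeoLocation": {"ContinentCode": "EU"},
                "TTL": 60,
                "ResourceRecords": [{"Value": "198.51.100.10"}],
            },
        }],
    },
)
```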

You are a solutions architect at a large digital media company. The company has decided that they want to operate within the Japanese region, and they need a bucket called "testbucket" set up immediately for testing purposes. You log in to the AWS console and try to create this bucket in the Japanese region. However, you are told that the bucket name is already taken. What should you do to resolve this?

Bucket names are global, not regional. This is a popular bucket name and is already taken. You must choose another bucket name.

You are a consultant planning to deploy DynamoDB across three AZs. Your lead DBA is concerned about data consistency. Which of the following do you advise the lead DBA to do?

When you request a strongly consistent read, DynamoDB returns a response with the most up-to-date data, reflecting the updates from all prior write operations that were successful. However, this consistency comes with some disadvantages: the read might not be available if there is a network delay or outage, it has higher latency than eventually consistent reads, it is not supported on global secondary indexes, and it uses more throughput capacity than eventually consistent reads.
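A minimal sketch of requesting a strongly consistent read, assuming a hypothetical customers table:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("customers")  # hypothetical table name

# ConsistentRead=True reflects all successful prior writes, at the
# cost of higher latency and double the read capacity consumed.
resp = table.get_item(
    Key={"customer_id": "C-1001"},
    ConsistentRead=True,
)
item = resp.get("Item")
```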

You work for a popular media outlet about to release a story that is expected to go viral. During load testing on the website, you discover that there is read contention on the database tier of your application. Your RDS instance consists of a MySQL database on an extra large instance. Which of the following approaches would be best to further scale this instance to meet the anticipated increase in traffic your viral story will generate?

You should consider using ElastiCache and RDS read replicas. Scaling up may also resolve the contention, but it may be more expensive than offloading the read activity to a cache or read replicas. RDS Multi-AZ is for resilience only.
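A minimal ElastiCache sketch with placeholder values; read replicas would be created as in the earlier RDS example:

```python
import boto3

elasticache = boto3.client("elasticache")

# Stand up a small Redis cluster to absorb the repeated reads the
# viral story will generate.
elasticache.create_cache_cluster(
    CacheClusterId="story-cache",
    Engine="redis",
    CacheNodeType="cache.r6g.large",
    NumCacheNodes=1,
)
```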

A user of your website makes an HTTP request to access a static resource on your server. The request is automatically redirected to the nearest CloudFront server. For some reason, the requested resource does not exist on the CloudFront server. Which of the following is true?

CloudFront checks its cache for the requested files. If the files are in the cache, CloudFront returns them to the user. If the files are not in the cache, it does the following: a) CloudFront compares the request with the specifications in your distribution and forwards the request for the files to your origin server for the corresponding file type—for example, to your Amazon S3 bucket for image files and to your HTTP server for HTML files. b) The origin servers send the files back to the edge location. c) As soon as the first byte arrives from the origin, CloudFront begins to forward the files to the user. CloudFront also adds the files to the cache in the edge location for the next time someone requests those files. Reference: How CloudFront delivers content to your users.

At the monthly product meeting, one of the Product Owners proposes an idea to address an immediate shortcoming of the product system: storing a copy of the customer price schedule in the customer record in the database. You know that you can store large text or binary objects in DynamoDB. You give a tentative OK to do a Minimum Viable Product test, but stipulate that it must comply with the size limitation on the attribute Name and Value. Which is the correct limitation?

The attribute Name and Value combined must not exceed 400 KB.
DynamoDB allows for the storage of large text and binary objects, but there is a limit of 400 KB per item.
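An illustrative guard that approximates item size before writing; DynamoDB counts attribute names plus values, and this UTF-8 byte sum is a rough stand-in for its exact accounting:

```python
# Rough pre-flight check against DynamoDB's 400 KB item limit.
LIMIT = 400 * 1024


def item_size_bytes(item: dict) -> int:
    """Approximate item size: attribute names plus stringified values."""
    return sum(len(k.encode("utf-8")) + len(str(v).encode("utf-8"))
               for k, v in item.items())


record = {"customer_id": "C-1001", "price_schedule": "...large text..."}  # placeholder
if item_size_bytes(record) > LIMIT:
    raise ValueError("Item exceeds the 400 KB DynamoDB limit; "
                     "store the schedule in S3 and keep a pointer instead.")
```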

