Cloud Support Engineer Questions
aka. Site Reliability Engineer (SRE) Interview Preparation
General
1. What is a Cloud Support Engineer?
First, it's a customer-facing support role in the cloud.
- Apply troubleshooting techniques to provide solutions to customers' needs.
- Help customers use Big Data services and Machine Learning solutions.
- Communicate with customers and resolve their issues.
2. Why AWS and Amazon?
- Amazon is a top internet company with a strong focus on customer experience, and there are a lot of talented people at Amazon creating amazing products that make people's lives better.
- Amazon Web Services is an online platform that provides scalable and cost-effective cloud computing solutions, offering a number of services such as Storage, Computing, Networking, CDNs, and so on.
- AWS has about a 40% market share, making it the king of cloud computing service providers.
- One of the leadership principles is Learn and Be Curious. I'm always seeking ways to learn new things and improve myself, and at AWS I can learn and use groundbreaking technologies.
- Another is Ownership: I never say "that's not my job." I have faced a number of issues and problems, fixed many bugs, and learned a wide range of skills, from Machine Learning and front end to back end, and from Python to Java. I have configured databases, servers, and containers such as Docker; deployed web applications; set up load-balancing servers; and performed load tests with JMeter.
Network
1. What is TCP/IP?
TCP/IP is a suite of protocols (IP, TCP, UDP, and others) that defines how two or more devices can communicate with each other over a network.
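As a minimal sketch of what this looks like in practice, here is a loopback TCP echo exchange in Python (the port is chosen by the OS; the payload is arbitrary):

```python
import socket
import threading

def echo_server(server_sock):
    """Accept one connection and echo back whatever the client sends."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)

# Bind a TCP socket to an ephemeral port on the loopback interface.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=echo_server, args=(server,), daemon=True).start()

# The client speaks TCP (transport layer) carried over IP (network layer).
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)
print(reply)  # b'hello'
```

The socket API hides the lower layers: the application only sees a reliable byte stream.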
2. What is a MAC address?
A MAC (Media Access Control) address is a unique identification number used to identify an individual device's network interface on the network.
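For example, Python's standard library can report the 48-bit MAC of one of the host's interfaces (note that `uuid.getnode()` may fall back to a random number if no hardware address is found):

```python
import uuid

node = uuid.getnode()  # the MAC address as a 48-bit integer
# Format as six colon-separated octets, most significant first.
mac = ":".join(f"{(node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
print(mac)  # e.g. "3c:22:fb:aa:bb:cc"
```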
3. What is an IP address?
An Internet Protocol address (IP address) is a numerical label assigned to each device connected to a computer network that uses the Internet Protocol for communication. An IP address serves two main functions: host or network interface identification, and location addressing.
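Those two functions can be seen with Python's `ipaddress` module (the address and prefix here are arbitrary examples):

```python
import ipaddress

# An interface address combines host identification with network location.
iface = ipaddress.ip_interface("192.168.1.10/24")

print(iface.ip)       # 192.168.1.10   — identifies the host's interface
print(iface.network)  # 192.168.1.0/24 — locates it within a network
print(iface.ip in iface.network)  # True
```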
4. Explain the OSI model. What layers are there? What is each layer responsible for?
- Application: user end (HTTP is here)
- Presentation: establishes context between application-layer entities (encryption is here)
- Session: establishes, manages, and terminates the connections
- Transport: transfers variable-length data sequences from a source to a destination host (TCP & UDP are here)
- Network: transfers datagrams from one network to another (IP is here)
- Data link: provides a link between two directly connected nodes (MAC is here)
- Physical: the electrical and physical specifications of the data connection (bits are here)
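The list above can be kept as a small review table in Python (layer numbers and examples as commonly taught):

```python
# The seven OSI layers, top (7) to bottom (1), with the examples above.
OSI_LAYERS = {
    7: ("Application", "HTTP"),
    6: ("Presentation", "encryption, e.g. TLS"),
    5: ("Session", "connection management"),
    4: ("Transport", "TCP & UDP"),
    3: ("Network", "IP"),
    2: ("Data link", "MAC"),
    1: ("Physical", "bits on the wire"),
}

for number in sorted(OSI_LAYERS, reverse=True):
    name, example = OSI_LAYERS[number]
    print(f"L{number} {name}: {example}")
```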
Hadoop
1. How is data stored in HDFS reliably, and how is the data recovered if a node fails in a Hadoop cluster hosting HDFS?
- Reliability is achieved by replication, and it is recommended to have at least 3 replicas for HDFS blocks with rack awareness.
- HDFS's placement policy is to put one replica on one node in the local rack, another on a node in a different (remote) rack, and the last on a different node in the same remote rack.
- This policy cuts the inter-rack write traffic, which generally improves write performance.
- The purpose of a rack-aware replica placement policy is to improve data reliability, availability, and network bandwidth utilization.
- When a node fails, it is marked as dead because of the lack of heartbeats. The NameNode then re-replicates the blocks that lived on the dead node to other DataNodes, using the surviving replicas, until the replication factor is restored.
- Each DataNode sends a Heartbeat message to the NameNode periodically.
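The default placement policy described above can be sketched as a toy simulation (real HDFS also weighs disk space, load, and more; the rack and node names are made up):

```python
import random

def place_replicas(racks, local_rack, rng=random):
    """Pick 3 replica nodes following HDFS's default rack-aware policy:
    one node in the local rack, plus two different nodes in one remote rack."""
    first = rng.choice(racks[local_rack])
    remote_rack = rng.choice([r for r in racks if r != local_rack])
    second, third = rng.sample(racks[remote_rack], 2)
    return [(local_rack, first), (remote_rack, second), (remote_rack, third)]

racks = {
    "rack1": ["node1", "node2", "node3"],
    "rack2": ["node4", "node5", "node6"],
    "rack3": ["node7", "node8", "node9"],
}
replicas = place_replicas(racks, local_rack="rack1")
print(replicas)  # e.g. [('rack1', 'node2'), ('rack3', 'node7'), ('rack3', 'node9')]
```

Only one of the three replicas crosses racks at write time, which is why inter-rack write traffic is reduced.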
2. How do you find the container number of the ApplicationMaster?
It's always the very first container of the application attempt, so its container ID typically ends in _000001.
3. In MapReduce jobs, what are the primary resource bottlenecks?
- memory
- i/o
- network
4. What are the three different stages of a reducer?
- shuffle
- sort
- reduce (the actual reducer code)
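The three stages can be sketched in plain Python (a toy word count; `mapper_outputs` stands in for the key/value pairs emitted by hypothetical map tasks):

```python
from collections import defaultdict

# Key/value pairs emitted by (hypothetical) map tasks.
mapper_outputs = [("cat", 1), ("dog", 1), ("cat", 1), ("ant", 1), ("cat", 1)]

# 1. Shuffle: pull each key's values together into groups.
groups = defaultdict(list)
for key, value in mapper_outputs:
    groups[key].append(value)

# 2. Sort: present the keys to the reducer in sorted order.
sorted_keys = sorted(groups)

# 3. Reduce: run the user's reduce code on each (key, values) group.
counts = {key: sum(groups[key]) for key in sorted_keys}
print(counts)  # {'ant': 1, 'cat': 3, 'dog': 1}
```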
5. What kinds of containers can a NodeManager create? Or: can you describe the architecture of YARN?
- application master
- mapper
- reducer
- executor (e.g. when running Spark on YARN)
6. What is the benefit of using the Capacity Scheduler over the Fair Scheduler in Hadoop?
The Capacity Scheduler guarantees a minimum amount of resources for each queue at all times. There is an added benefit that the cluster's excess capacity can be used by other queues when it is available. This provides elasticity for the users in a cost-effective manner.
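A minimal sketch of this in `capacity-scheduler.xml` (queue names `prod`/`dev` and the percentages are made-up examples): each queue gets a guaranteed capacity, while `maximum-capacity` lets `dev` elastically borrow idle cluster resources beyond its share.

```xml
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <!-- Guaranteed minimum shares (must sum to 100). -->
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>40</value>
  </property>
  <!-- Elasticity: dev may grow into idle capacity up to 75% of the cluster. -->
  <property>
    <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
    <value>75</value>
  </property>
</configuration>
```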