Cloud computing architecture consists of many layers which help it to be more organized and can be managed from one place. The layers are as follows:
1. Cloud controller or CLC is the top most level in the hirerachy which is used to manage the virtualized resources like servers, network and storage with the user APIs.
2. Walrus is used for the storage and act as a storage controller to manage the demands of the users. It maintains a scalable approach to control the virtual machine images and user data.
3. Cluster Controller or CC is used to control all the virtual machines for executions the virtual machines are stored on the nodes and manages the virtual networking between Virtual machines and external users.
4. Storage Controller or SC provides a storage area in block form that are dynamically attached by Virtual machines.
5. Node Controller or NC is at the lowest level and provides the functionality of a hypervisor that controls the VMs activities, which includes execution, management and termination of many instances.
In a mapreduce job the master pings each worker periodically. In case a worker does not respond to that system then the system is marked as failed. Even completed tasks are rescheduled because the output was stored in a in a local disk of a worker which failed. Hence mapreduce is able to handle large-scale failures easily by simply restarting a task. The master node always saves itself at checkpoints and in case of any failure it simply restarts from that checkpoint.
MapReduce is a software framework that was created by Google. It`s prime focus was to aid in distributed computing, specifically large sets of data on a group of many computers. The frameworks took its inspiration from the map and reduce functions from functional programming.
Amazon S3 provides uploading of large files and retrieve small offsets for end-to-end transfer data rates. The large file gets stored into small files that are smaller in size. Amazon S3 stores multiple of files together in a bundle or in a compressed form, For example in .gzip or .gz format and then convert them into Amazon S3 objects. The files get uploaded on the Amazon server by the use of FTP or another protocol and then retrieved through the HTTP GET request. The request includes the defined parameters like URL, offset (byte-range) and size (length).