1. Zero infrastructure investment: Cloud architecture lets users build large-scale systems without buying the hardware, machines, routers, backup systems and other components themselves. This reduces the startup cost of the business.
2. Just-in-time infrastructure: Infrastructure must scale as demand rises. Building on a cloud architecture and developing the application in the cloud with dynamic capacity management makes this possible, because capacity is added when it is needed rather than purchased in advance.
3. More efficient resource utilization: Cloud architecture lets users use hardware and other resources more efficiently. This works because applications request resources only when they need them and relinquish them as soon as they are done (on demand).
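The just-in-time and on-demand ideas above can be illustrated with a minimal capacity-planning sketch in Python. It assumes a simple, hypothetical rule: run just enough instances to serve the current request rate, clamped between a minimum and a maximum. The names plan_capacity and CapacityPlan and the per-instance capacity figure are illustrative only and are not part of any cloud provider's API.

```python
import math
from dataclasses import dataclass


@dataclass
class CapacityPlan:
    """Hypothetical result of one capacity-management decision."""
    current: int
    desired: int

    @property
    def action(self) -> str:
        # Scale out when demand needs more instances, scale in when it needs fewer.
        if self.desired > self.current:
            return f"request {self.desired - self.current} instance(s)"
        if self.desired < self.current:
            return f"release {self.current - self.desired} instance(s)"
        return "no change"


def plan_capacity(current_instances: int,
                  requests_per_second: float,
                  capacity_per_instance: float,
                  min_instances: int = 1,
                  max_instances: int = 100) -> CapacityPlan:
    """Size the fleet to the current demand, clamped to [min_instances, max_instances]."""
    needed = math.ceil(requests_per_second / capacity_per_instance)
    desired = max(min_instances, min(max_instances, needed))
    return CapacityPlan(current=current_instances, desired=desired)


if __name__ == "__main__":
    # Demand rises: 950 req/s at an assumed 100 req/s per instance needs 10 instances.
    print(plan_capacity(4, 950, 100).action)   # request 6 instance(s)
    # Demand falls: 120 req/s needs only 2 instances, so idle capacity is relinquished.
    print(plan_capacity(10, 120, 100).action)  # release 8 instance(s)
```

In a real deployment the decision would be handed to the provider's scaling service; the point here is only that capacity follows measured demand instead of being bought up front.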
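In a MapReduce job, the master pings each worker periodically. If a worker does not respond within a certain amount of time, the master marks that worker as failed, and any task in progress on it is rescheduled on another worker. Completed map tasks on the failed worker are rescheduled as well, because their output was stored on the local disk of the failed machine and is no longer reachable (completed reduce tasks are not, since their output is already in the global file system). Because recovery consists of simply re-executing tasks, MapReduce can tolerate large-scale worker failures. The master can also periodically write checkpoints of its own state, so that if it fails, a new master can be started from the last checkpoint.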
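The master's failure-detection step can be sketched in Python as follows. This is a single-process illustration, not Google's implementation: the 10-second PING_TIMEOUT, the Task and Master classes, and the record_ping_reply and check_workers methods are hypothetical. Resetting a task to "idle" stands for making it eligible to be scheduled on another worker.

```python
from dataclasses import dataclass, field
from typing import Optional

PING_TIMEOUT = 10.0  # assumed: seconds without a ping reply before a worker is declared failed


@dataclass
class Task:
    task_id: str
    kind: str                     # "map" or "reduce"
    state: str = "idle"           # "idle", "in_progress" or "completed"
    worker: Optional[str] = None  # worker the task is assigned to, if any


@dataclass
class Master:
    tasks: list[Task]
    last_reply: dict[str, float] = field(default_factory=dict)

    def record_ping_reply(self, worker: str, now: float) -> None:
        self.last_reply[worker] = now

    def check_workers(self, now: float) -> list[str]:
        """Mark silent workers as failed and reset their affected tasks to idle."""
        failed = [w for w, t in self.last_reply.items() if now - t > PING_TIMEOUT]
        for w in failed:
            del self.last_reply[w]
            for task in self.tasks:
                if task.worker != w:
                    continue
                # In-progress tasks are always redone. Completed map tasks are redone
                # too, because their output lived on the failed worker's local disk;
                # completed reduce output is already in the global file system.
                if task.state == "in_progress" or (
                        task.state == "completed" and task.kind == "map"):
                    task.state, task.worker = "idle", None
        return failed


if __name__ == "__main__":
    tasks = [Task("m1", "map", "completed", "w1"),
             Task("r1", "reduce", "completed", "w1"),
             Task("m2", "map", "in_progress", "w2")]
    master = Master(tasks)
    master.record_ping_reply("w1", now=0.0)
    master.record_ping_reply("w2", now=8.0)
    print(master.check_workers(now=15.0))  # ['w1']
    print([(t.task_id, t.state) for t in tasks])
    # [('m1', 'idle'), ('r1', 'completed'), ('m2', 'in_progress')]
```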
MapReduce is a software framework created by Google. Its primary focus was to support distributed computing, specifically processing large data sets on clusters of many computers. The framework took its inspiration from the map and reduce functions of functional programming.
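To show where the two names come from, here is a single-process word-count sketch in Python. It is only an illustration of the programming model, not Google's framework: the shuffle step that a real cluster performs over the network is simulated here with a sort followed by itertools.groupby, and the function names are hypothetical.

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter


def map_phase(document: str) -> list[tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in one input."""
    return [(word.lower(), 1) for word in document.split()]


def reduce_phase(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce: fold all values for one key into a single result."""
    return word, reduce(lambda a, b: a + b, counts, 0)


def word_count(documents: list[str]) -> dict[str, int]:
    # Map each input independently (in a cluster these calls run on different machines).
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    # Shuffle: bring all intermediate pairs with the same key together.
    pairs.sort(key=itemgetter(0))
    grouped = groupby(pairs, key=itemgetter(0))
    # Reduce each key's group of values.
    return dict(reduce_phase(word, [c for _, c in group]) for word, group in grouped)


if __name__ == "__main__":
    docs = ["the quick brown fox", "The lazy dog", "the end"]
    print(word_count(docs))
    # {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because each map call depends only on its own input and each reduce call only on one key's values, both phases can be spread across many machines, which is what the framework exploits.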
Amazon S3 supports uploading large files and then retrieving small portions of them by offset, which improves end-to-end data transfer rates. A large file is stored as several smaller files, and multiple files are stored together in a bundle or in compressed form (for example, in gzip (.gz) format) and then converted into Amazon S3 objects. The files are uploaded to the Amazon server using FTP or another protocol and later retrieved with HTTP GET requests. Each request specifies parameters such as the URL, the offset (byte range) and the size (length).
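A retrieval request of this kind can be sketched with Python's standard urllib.request module, which sends the offset and length as an HTTP Range header. The object URL below is hypothetical, and the helper names byte_range_header, ranged_get_request and fetch_slice are illustrative; only the Range header format (an inclusive start-end byte span) is standard HTTP.

```python
import urllib.request


def byte_range_header(offset: int, length: int) -> str:
    """HTTP Range value for `length` bytes starting at `offset` (the end byte is inclusive)."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"bytes={offset}-{offset + length - 1}"


def ranged_get_request(url: str, offset: int, length: int) -> urllib.request.Request:
    """Build (but do not send) a GET request for one slice of a large object."""
    return urllib.request.Request(
        url, method="GET", headers={"Range": byte_range_header(offset, length)}
    )


def fetch_slice(url: str, offset: int, length: int) -> bytes:
    """Send the request; a server that honours ranges replies 206 Partial Content."""
    with urllib.request.urlopen(ranged_get_request(url, offset, length)) as resp:
        return resp.read()


if __name__ == "__main__":
    # Hypothetical object URL; actually fetching it needs network access and read permission.
    req = ranged_get_request("https://example-bucket.s3.amazonaws.com/logs/big.gz", 1024, 512)
    print(req.get_method(), req.full_url, req.get_header("Range"))
    # GET https://example-bucket.s3.amazonaws.com/logs/big.gz bytes=1024-1535
```

A range taken from the middle of a single gzip stream usually cannot be decompressed on its own, so this access pattern works best when each small file in the bundle is compressed as a separate gzip member and its offset and length are recorded.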