In a MapReduce job, the master pings each worker periodically. If a worker does not respond within a certain amount of time, the master marks that worker as failed. Any tasks in progress on that worker are rescheduled, and even completed map tasks are rescheduled, because their output was stored on the local disk of the failed worker and is no longer accessible. Hence MapReduce can handle large-scale worker failures simply by re-executing tasks. The master node periodically saves its state at checkpoints, and in case of a master failure it can be restarted from the most recent checkpoint.
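The failure handling described above can be sketched in a few lines of Python. This is only an illustration, not the actual MapReduce implementation: the names Task, find_failed_workers and reschedule, and the 10-second timeout, are assumptions made for the example. Completed reduce tasks are left alone here on the assumption that their output already sits in a shared file system rather than on the worker's local disk.

```python
from dataclasses import dataclass
from typing import Optional

HEARTBEAT_TIMEOUT = 10.0  # seconds; an assumed value for illustration


@dataclass
class Task:
    task_id: str
    kind: str                      # "map" or "reduce"
    state: str = "idle"            # "idle", "in_progress" or "completed"
    worker: Optional[str] = None   # worker the task is assigned to


def find_failed_workers(last_seen, now, timeout=HEARTBEAT_TIMEOUT):
    """Return the workers whose last ping reply is older than the timeout."""
    return {w for w, seen in last_seen.items() if now - seen > timeout}


def reschedule(tasks, failed):
    """Reset tasks lost with failed workers so they can be assigned again."""
    rescheduled = []
    for task in tasks:
        if task.worker not in failed:
            continue
        # Completed map output lived on the failed worker's local disk.
        lost_map_output = task.kind == "map" and task.state == "completed"
        if task.state == "in_progress" or lost_map_output:
            task.state = "idle"
            task.worker = None
            rescheduled.append(task.task_id)
    return rescheduled
```

A master loop would call find_failed_workers after each round of pings and pass the result to reschedule, then hand the idle tasks to healthy workers.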
MapReduce is a software framework created by Google. Its primary purpose is to support distributed computing, specifically the processing of large data sets across a cluster of many computers. The framework takes its inspiration from the map and reduce functions of functional programming.
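To make the map and reduce idea concrete, here is a minimal single-machine sketch of the classic word count example. It only imitates the programming model; a real framework would run the map and reduce calls on many machines. The function names (map_words, shuffle, reduce_counts, word_count) are chosen for this illustration.

```python
from collections import defaultdict


def map_words(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce step: combine all values emitted for one word."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of documents."""
    pairs = (pair for doc in documents for pair in map_words(doc))
    return dict(reduce_counts(w, c) for w, c in shuffle(pairs).items())
```

For example, word_count(["the cat", "the dog"]) counts "the" twice and "cat" and "dog" once each.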
Amazon S3 supports uploading large files and retrieving small byte ranges (offsets) from them, which improves end-to-end data transfer rates. A large file can be split into smaller files. Multiple files can also be bundled together or compressed, for example in .gzip or .gz format, and then stored as Amazon S3 objects. The files are uploaded to the Amazon server using FTP or another protocol and then retrieved with an HTTP GET request. The request specifies parameters such as the URL, the offset (byte range), and the size (length).
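As a rough illustration of such a request, the sketch below turns an offset and a length into a standard HTTP Range header and attaches it to a GET request using Python's urllib. The request is only built, not sent. The helper names are made up for this example, and any URL passed in is the caller's own; expressing the byte range as a Range header is the usual HTTP convention and is assumed here.

```python
import urllib.request


def byte_range_header(offset, length):
    """Turn an offset and a length into an HTTP Range header value."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive, so the last byte is offset + length - 1.
    return f"bytes={offset}-{offset + length - 1}"


def build_range_request(url, offset, length):
    """Build (but do not send) a GET request for part of an object."""
    headers = {"Range": byte_range_header(offset, length)}
    return urllib.request.Request(url, headers=headers, method="GET")
```

Passing the built request to urllib.request.urlopen would send it; a server that supports range requests normally answers with status 206 (Partial Content) and only the requested bytes.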