How does MapReduce work?
We briefly introduce the notion of MapReduce design patterns.
So far you have learned the fundamentals of MapReduce.
It has its own runtime, rather than being built on top of MapReduce.
We're also planning to revise a lot of the MapReduce APIs.
An alternative to MapReduce, Spark is a data-processing engine.
The YARN-based architecture is not constrained to MapReduce.
Google's original application of MapReduce was the indexing of the World Wide Web.
The next step is to study complex file formats that are better suited to MapReduce, such as Avro and SequenceFile.
Xml: it contains the configuration settings for Hadoop Core, such as I/O settings that are common to HDFS and MapReduce.
So when Google published its GFS and MapReduce papers, Yahoo was probably the first company to pay attention to them.
The data in Hadoop HDFS is stored in a distributed manner, and MapReduce is responsible for the parallel processing of that data.
The Hadoop Distributed File System (HDFS) uses the MapReduce framework to process vast amounts of data, which can take minutes or hours.
Bottom line: Spark is the Swiss Army knife of data processing, while Hadoop MapReduce is the commando knife of batch processing.
Basically, MapReduce is a set of functions that are very useful for organizing and computing data across multiple databases.
Each reduce instance can write records to an output file, forming part of the "answer" to a MapReduce computation.
MapReduce programs are parallel in nature and are therefore very useful for performing large-scale data analysis using multiple machines in a cluster (a minimal word-count sketch follows this list).
A typical Hadoop usage pattern involves three stages: loading data into HDFS, running MapReduce operations, and retrieving the results from HDFS.
The disadvantage is that MapReduce (or any third-party software) lacks support for reading files generated by Protocol Buffers serialization.
Chukwa is built on top of the Hadoop Distributed File System (HDFS) and the MapReduce framework, and inherits Hadoop's scalability and robustness.
Netflix uses Amazon's Elastic MapReduce distribution of Hadoop and has developed its own Hadoop Platform as a Service, which it calls Genie.
However, TaskTracker and JobTracker have been replaced in the second version of MapReduce by NodeManager and ResourceManager/ApplicationMaster, respectively.
When considering how to compress data that will be processed by MapReduce, it is important to understand whether the compression format supports splitting.
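The examples above describe the map and reduce phases only in prose. The following is a minimal word-count sketch in Java that makes the model concrete: the mapper emits (word, 1) pairs in parallel over the input splits, and each reduce instance sums the counts for one key and writes a record to its output file, forming part of the overall answer. It follows the shape of the classic Hadoop MapReduce example; the class names and input/output paths are illustrative placeholders, not taken from the examples above.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // The mapper runs in parallel over input splits and emits (word, 1) pairs.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Each reduce instance sums the counts for one key and writes a record to its
    // output file, which forms part of the "answer" of the computation.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output paths are placeholders; in the typical usage pattern above,
        // the input already sits in HDFS and the results are retrieved from HDFS afterwards.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Registering the reducer class as a combiner, as done here, is a common optimization that pre-aggregates map output locally before the shuffle.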
Adaptive MapReduce: an IBM Research solution for speeding up the execution of small MapReduce jobs by changing how MapReduce tasks are handled.
Hence, the differences between Apache Spark and Hadoop MapReduce show that Apache Spark is a much more advanced cluster computing engine than MapReduce (see the Spark sketch after these examples).
Amazon Elastic MapReduce: a web service for processing large data sets using a hosted Hadoop framework running on EC2 and the Simple Storage Service.
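To make the Spark comparison above more concrete, here is a sketch of the same word count expressed with Spark's Java RDD API (assuming Spark 2.x or later); the class name and paths are hypothetical, and the snippet is only meant to show how the explicit map and reduce classes collapse into a chain of transformations.

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class SparkWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("spark word count");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Read the input (for example from HDFS), split lines into words,
            // and count the words with a single chain of transformations.
            JavaRDD<String> lines = sc.textFile(args[0]);
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum);
            counts.saveAsTextFile(args[1]);
        }
    }
}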