MLlib contains many algorithms and utilities.
This section provides a brief description of both Spark and MLlib.
MLlib is Spark's machine learning (ML) library.
Scikit-learn and Spark MLlib are machine learning frameworks.
MLlib can easily be plugged into Hadoop workflows.
In this post, we are going to develop an algorithm in Java using Spark MLlib.
MLlib is Apache Spark's machine learning library.
It can be easily bundled with Apache Spark, MLlib, HBase, Elasticsearch, and Spray.
The MLlib package contains the original API built on top of RDDs.
This application uses a basic algorithm to introduce the concept of machine learning based on Spark MLlib.
PCA can be executed using popular machine learning (ML) libraries such as MLlib, which runs on Apache Spark.
MLlib comes with a number of machine learning algorithms that can be used to learn from and make predictions on data.
If you're looking to do machine learning and predictive modeling, would Mahout or MLlib suit your purposes better?
Spark MLlib handles machine learning models used for transforming datasets, which are represented as RDDs or DataFrames.
Learn how to develop an algorithm with Java and Spark MLlib that can detect fraud based on a dataset with seven million records.
Spark MLlib supplies pretty much anything you would want in the way of basic machine learning, feature selection, pipelines, and persistence.
Part of the Apache Spark project, MLlib is a machine learning library that promises performance 100 times faster than MapReduce.
Apache Mahout (a machine learning library for Hadoop) has already turned away from MapReduce and joined forces on Spark MLlib.
Back in April, Xiangrui, then the manager of the Spark MLlib group, contacted me via LinkedIn to ask whether I was interested in a position on the Spark MLlib team.
What is interesting is the decline in usage of Hadoop open source tools, of MLlib, and of other free analytics and data mining tools.
Spark's MLlib can use a Hadoop-based data source, for example the Hadoop Distributed File System (HDFS) or HBase, as well as local files.
Architects can build on the popularity of libraries such as MLlib and TensorFlow to create predictive analytics applications using these tools.
It is built on Apache Spark, MLlib, and HBase and was even ranked on GitHub as the most popular Apache Spark-based machine learning product.
As already mentioned, many professional data scientists choose to use open source machine learning tools, such as TensorFlow, Apache Spark's MLlib, or Caffe.
The Spark ecosystem includes MLlib (machine learning library), which constantly accelerates and improves data processes like classification, regression, clustering, and more.
Spark MLlib includes a framework for creating machine learning pipelines, allowing for easy implementation of feature extraction, selection, and transformation on any structured dataset.
These libraries currently include Spark SQL, Spark Streaming, MLlib (for machine learning), and GraphX, each of which is further detailed in this article.
As a framework built around in-memory data processing, Apache Spark MLlib features a library of algorithms with a focus on clustering, collaborative filtering, classification, and regression.