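Course Information
Course name: Hadoop Developer (CCDH) Certification
Public classes and customized classes
Start date: 2025-05-10
Course Description
【Course Overview】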
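As a core big data technology, Hadoop gives enterprises a highly scalable, highly redundant, fault-tolerant, and cost-effective "data-driven" solution. To address the current widespread shortage of engineers skilled in large-scale data technologies, Cloudera introduced a certification aimed at developers: Cloudera Certified Developer for Apache Hadoop (CCDH). In the CCDH training course at Qinglan Consulting (青蓝咨询), you will learn: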
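* Hadoop core concepts
* How HDFS and MapReduce work
* How to develop MapReduce applications
* How to unit test MapReduce applications
* How to use MapReduce combiners, partitioners, and the distributed cache
* How to develop and debug MapReduce applications
* How to implement input and output in MapReduce applications
* Common MapReduce algorithms
* How to join data sets with MapReduce
* How to integrate Hadoop into an organization's existing computing environment
* How to use Mahout for machine learning
* How to use Hive and Pig to develop data analysis applications rapidly
* How to use Oozie to create and manage workflows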
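【Target Audience】
Business managers, CIOs, CTOs, government information department officials, project (development) managers, consultants, IT managers, IT consultants, IT support specialists, systems engineers, data center administrators, cloud computing administrators, anyone looking to move into cloud computing, and software developers who need to use Apache Hadoop to build powerful data analysis applications.
Participants should have programming experience, ideally with a background in Java. No prior Hadoop knowledge or experience is required.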
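【Course Content】
Understand how MapReduce and HDFS fit together to provide a powerful, scalable system.
Learn to write programs against the Hadoop API and master the basic skills needed to write more interesting data processing jobs.
Learn how to deploy Hadoop on data center servers or on Amazon EC2, and how to use Hadoop to extend existing systems.
Learn how to load different types of data into Hadoop for further analysis, and how to import data from existing databases with Sqoop.
Learn how to use Hive, including importing data, creating tables, and running queries.
Learn best practices that make MapReduce programs easier to debug, along with local testing tools and techniques for debugging at scale.
Gain an in-depth understanding of the Hadoop API, including custom data types and file formats, direct HDFS access, partitioning of intermediate data, and other tools such as DistributedCache.
Gain an in-depth understanding of graph algorithms, including PageRank. Learn strategies for performing joins efficiently, and compare techniques for different data models.
Learn how to optimize MapReduce programs for better performance.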
Module | Content |
The Motivation for Hadoop | Problems with Traditional Large-Scale Systems; Introducing Hadoop; Hadoopable Problems |
Hadoop: Basic Concepts and HDFS | The Hadoop Project and Hadoop Components; The Hadoop Distributed File System |
Introduction to MapReduce V2 | MapReduce Overview; Example: WordCount; Mappers; Reducers |
Hadoop Clusters and the Hadoop Ecosystem | Hadoop Cluster Overview; Hadoop Jobs and Tasks; Other Hadoop Ecosystem Components |
Writing a MapReduce Program in Java | Basic MapReduce API Concepts; Writing MapReduce Drivers, Mappers, and Reducers in Java; Speeding Up Hadoop Development by Using Eclipse; Differences Between the Old and New MapReduce APIs |
Writing a MapReduce Program Using Streaming | Writing Mappers and Reducers with the Streaming API |
Unit Testing MapReduce Programs | Unit Testing; The JUnit and MRUnit Testing Frameworks; Writing Unit Tests with MRUnit; Running Unit Tests |
Delving Deeper into the Hadoop API | Using the ToolRunner Class; Setting Up and Tearing Down Mappers and Reducers; Decreasing the Amount of Intermediate Data with Combiners; Accessing HDFS Programmatically; Using the Distributed Cache; Using the Hadoop API’s Library of Mappers, Reducers, and Partitioners |
Practical Development Tips and Techniques | Strategies for Debugging MapReduce Code; Testing MapReduce Code Locally by Using LocalJobRunner; Writing and Viewing Log Files; Retrieving Job Information with Counters; Reusing Objects; Creating Map-Only MapReduce Jobs |
Partitioners and Reducers | How Partitioners and Reducers Work Together; Determining the Optimal Number of Reducers for a Job; Writing Custom Partitioners |
Data Input and Output | Creating Custom Writable and WritableComparable Implementations; Saving Binary Data Using SequenceFile and Avro Data Files; Issues to Consider When Using File Compression; Implementing Custom InputFormats and OutputFormats |
Common MapReduce Algorithms | Sorting and Searching Large Data Sets; Indexing Data; Computing Term Frequency-Inverse Document Frequency (TF-IDF); Calculating Word Co-Occurrence; Performing Secondary Sort |
Joining Data Sets in MapReduce Jobs | Writing a Map-Side Join; Writing a Reduce-Side Join |
Integrating Hadoop into the Enterprise Workflow | Integrating Hadoop into an Existing Enterprise; Loading Data from an RDBMS into HDFS by Using Sqoop; Managing Real-Time Data Using Flume; Accessing HDFS from Legacy Systems with FuseDFS and HttpFS |
An Introduction to Hive, Impala, and Pig | The Motivation for Hive, Impala, and Pig; Hive Overview; Impala Overview; Pig Overview; Choosing Between Hive, Impala, and Pig |
An Introduction to Oozie | Introduction to Oozie; Creating Oozie Workflows |
Conclusion | Conclusion |
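Note: The actual start date may be adjusted as circumstances require. Please follow the official public account of Qinglan Consulting (青蓝咨询) for updates, or contact a course advisor.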
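【Contact Qinglan Consulting (青蓝咨询)】
Address: Room 309, 3rd Floor, Block B, TCL Building, No. 06 Gaoxin South 1st Road, Nanshan District, Shenzhen (bus stop: Dachong; metro: Line 1, Gaoxinyuan Station, Exit C)
Postal code: 518057
Phone: 0755-86950769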