Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration (ISBN 9780470635179)


Book details

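  • ISBN: 9780470635179
  • Authors: Matt Casters, Roland Bouman, Jos van Dongen
  • Publisher: not listed
  • Publication date: 2010-09
  • Pages: 720
  • Price: 298.10
  • Paper: offset paper
  • Binding: paperback
  • Trim size: large 32mo
  • Language: unknown
  • Series: none
  • Tags: none
  • Douban rating: none available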
  • Last updated: 2025-01-20 00:50:29

Description:

A complete guide to Pentaho Kettle, the Pentaho Data Integration toolset for ETL

This practical book is a complete guide to installing, configuring, and managing Pentaho Kettle. If you’re a database administrator or developer, you’ll first get up to speed on Kettle basics and how to apply Kettle to create ETL solutions, before progressing to specialized concepts such as clustering, extensibility, and data vault models. Learn how to design and build every phase of an ETL solution.

  • Shows developers and database administrators how to use the open-source Pentaho Kettle for enterprise-level ETL processes (Extracting, Transforming, and Loading data)
  • Assumes no prior knowledge of Kettle or ETL, and brings beginners thoroughly up to speed at their own pace
  • Explains how to get Kettle solutions up and running, then follows the 34 ETL subsystems model, as created by the Kimball Group, to explore the entire ETL lifecycle, including all aspects of data warehousing with Kettle
  • Goes beyond routine tasks to explore how to extend Kettle and scale Kettle solutions using a distributed “cloud”

Get the most out of Pentaho Kettle and your data warehousing with this detailed guide, from simple single-table data migration to complex multisystem clustered data integration tasks.

From the Back Cover

The ultimate resource on building and deploying data integration solutions with Kettle

Kettle is a scalable and extensible open source ETL and data integration tool that lets you extract data from databases, flat and XML files, web services, ERP systems, and OLAP cubes. It provides over 120 built-in transformation steps to validate, cleanse, and conform data, as well as numerous options to load data into data warehouses and many other targets. Kettle is a comprehensive, low-cost alternative to traditional data integration tools like Informatica PowerCenter, IBM InfoSphere DataStage, and BusinessObjects Data Integrator.

This book explains in detail how to use Kettle to create, test, and deploy your own ETL and data integration solutions. You'll learn to use Kettle's programs to create transformations and jobs, use version control, audit data, and schedule your ETL solution. Then you'll progress to more advanced concepts such as clustering and cloud computing, real-time data integration, loading a Data Vault model, and extending Kettle by building your own plugins. In addition, you'll find hands-on examples and case studies that show exactly how to put Kettle's features into practice.

  • Explore the components of the Kettle ETL toolset
  • Discover how to install and configure Kettle and connect it to various data sources and targets
  • Design and build every aspect of an ETL solution using Kettle
  • Learn how to load a data warehouse with Kettle
  • Understand the steps for deploying and scheduling ETL solutions
  • Gain the skills to integrate Kettle with third-party products
  • Learn to extend Kettle and build your own plugins
  • Use clustering and cloud computing to scale and improve the performance of your Kettle ETL solutions
  • Find out how to use Kettle for real-time data integration


Table of contents:

Introduction

Part I: Getting Started

Chapter 1: ETL Primer
  OLTP versus Data Warehousing
  What Is ETL?
  The Evolution of ETL Solutions
  ETL Building Blocks
  ETL, ELT, and EII
  ELT
  EII: Virtual Data Integration
  Data Integration Challenges
  Methodology: Agile BI
  ETL Design
  Data Acquisition
  Beware of Spreadsheets
  Design for Failure
  Change Data Capture
  Data Quality
  Data Profiling
  Data Validation
  ETL Tool Requirements
  Connectivity
  Platform Independence
  Scalability
  Design Flexibility
  Reuse
  Extensibility
  Data Transformations
  Testing and Debugging
  Lineage and Impact Analysis
  Logging and Auditing
  Summary

Chapter 2: Kettle Concepts
  Design Principles
  The Building Blocks of Kettle Design
  Transformations
  Steps
  Transformation Hops
  Parallelism
  Rows of Data
  Data Conversion
  Jobs
  Job Entries
  Job Hops
  Multiple Paths and Backtracking
  Parallel Execution
  Job Entry Results
  Transformation or Job Metadata
  Database Connections
  Special Options
  The Power of the Relational Database
  Connections and Transactions
  Database Clustering
  Tools and Utilities
  Repositories
  Virtual File Systems
  Parameters and Variables
  Defining Variables
  Named Parameters
  Using Variables
  Visual Programming
  Getting Started
  Creating New Steps
  Putting It All Together
  Summary

Chapter 3: Installation and Configuration
  Kettle Software Overview
  Integrated Development Environment: Spoon
  Command-Line Launchers: Kitchen and Pan
  Job Server: Carte
  Encr.bat and encr.sh
  Installation
  Java Environment
  Installing Java Manually
  Using Your Linux Package Management System
  Installing Kettle
  Versions and Releases
  Archive Names and Formats
  Downloading and Uncompressing
  Running Kettle Programs
  Creating a Shortcut Icon or Launcher for Spoon
  Configuration
  Configuration Files and the .kettle Directory
  The Kettle Shell Scripts
  General Structure of the Startup Scripts
  Adding an Entry to the Classpath
  Changing the Maximum Heap Size
  Managing JDBC Drivers
  Summary

Chapter 4: An Example ETL Solution: Sakila
  Sakila
  The Sakila Sample Database
  DVD Rental Business Process
  Sakila Database Schema Diagram
  Sakila Database Subject Areas
  General Design Considerations
  Installing the Sakila Sample Database
  The Rental Star Schema
  Rental Star Schema Diagram
  Rental Fact Table
  Dimension Tables
  Keys and Change Data Capture
  Installing the Rental Star Schema
  Prerequisites and Some Basic Spoon Skills
  Setting Up the ETL Solution
  Creating Database Accounts
  Working with Spoon
  Opening Transformation and Job Files
  Opening the Step's Configuration Dialog
  Examining Streams
  Running Jobs and Transformations
  The Sample ETL Solution
  Static, Generated Dimensions
  Loading the dim-date Dimension Table
  Loading the dim-time Dimension Table
  Recurring Load
  The load-rentals Job
  The load-dim-staff Transformation
  Database Connections
  The load-dim-customer Transformation
  The load-dim-store Transformation
  The fetch-address Subtransformation
  The load-dim-actor Transformation
  The load-dim-film Transformation
  The load-fact-rental Transformation
  Summary

Part II: ETL

Chapter 5: ETL Subsystems
  Introduction to the 34 Subsystems
  Extraction
  Subsystems 1-3: Data Profiling, Change Data Capture, and Extraction
  Cleaning and Conforming Data
  Subsystem 4: Data Cleaning and Quality Screen Handler System
  Subsystem 5: Error Event Handler
  Subsystem 6: Audit Dimension Assembler
  Subsystem 7: Deduplication System
  Subsystem 8: Data Conformer
  Data Delivery
  Subsystem 9: Slowly Changing Dimension Processor
  Subsystem 10: Surrogate Key Creation System
  Subsystem 11: Hierarchy Dimension Builder
  Subsystem 12: Special Dimension Builder
  Subsystem 13: Fact Table Loader
  Subsystem 14: Surrogate Key Pipeline
  Subsystem 15: Multi-Valued Dimension Bridge Table Builder
  Subsystem 16: Late-Arriving Data Handler
  Subsystem 17: Dimension Manager System
  Subsystem 18: Fact Table Provider System
  Subsystem 19: Aggregate Builder
  Subsystem 20: Multidimensional (OLAP) Cube Builder
  Subsystem 21: Data Integration Manager
  Managing the ETL Environment
  Summary

Chapter 6: Data Extraction
  Kettle Data Extraction Overview
  File-Based Extraction
  Working with Text Files
  Working with XML Files
  Special File Types
  Database-Based Extraction
  Web-Based Extraction
  Text-Based Web Extraction
  HTTP Client
  Using SOAP
  Stream-Based and Real-Time Extraction
  Working with ERP and CRM Systems
  ERP Challenges
  Kettle ERP Plugins
  Working with SAP Data
  ERP and CDC Issues
  Data Profiling
  Using eobjects.org DataCleaner
  Adding Profile Tasks
  Adding Database Connections
  Doing an Initial Profile
  Working with Regular Expressions
  Profiling and Exploring Results
  Validating and Comparing Data
  Using a Dictionary for Column Dependency Checks
  Alternative Solutions
  Text Profiling with Kettle
  CDC: Change Data Capture
  Source Data-Based CDC
  Trigger-Based CDC
  Snapshot-Based CDC
  Log-Based CDC
  Which CDC Alternative Should You Choose?
  Delivering Data
  Summary

Chapter 7: Cleansing and Conforming
  Data Cleansing
  Data-Cleansing Steps
  Using Reference Tables
  Conforming Data Using Lookup Tables
  Conforming Data Using Reference Tables
  Data Validation
  Applying Validation Rules
  Validating Dependency Constraints
  Error Handling
  Handling Process Errors
  Transformation Errors
  Handling Data (Validation) Errors
  Auditing Data and Process Quality
  Deduplicating Data
  Handling Exact Duplicates
  The Problem of Non-Exact Duplicates
  Building Deduplication Transforms
  Step 1: Fuzzy Match
  Step 2: Select Suspects
  Step 3: Lookup Validation Value
  Step 4: Filter Duplicates
  Scripting
  Formula
  JavaScript
  User-Defined Java Expressions
  Regular Expressions
  Summary

Chapter 8: Handling Dimension Tables
  Managing Keys
  Managing Business Keys
  Keys in the Source System
  Keys in the Data Warehouse
  Business Keys
  Storing Business Keys
  Looking Up Keys with Kettle
  Generating Surrogate Keys
  The "Add sequence" Step
  Working with auto-increment or IDENTITY Columns
  Keys for Slowly Changing Dimensions
  Loading Dimension Tables
  Snowflaked Dimension Tables
  Top-Down Level-Wise Loading
  Sakila Snowflake Example
  Sample Transformation
  Database Lookup Configuration
  Sample Job
  Star Schema Dimension Tables
  Denormalization
  Denormalizing to 1NF with the "Database lookup" Step
  Change Data Capture
  Slowly Changing Dimensions
  Types of Slowly Changing Dimensions
  Type 1 Slowly Changing Dimensions
  The Insert / Update Step
  Type 2 Slowly Changing Dimensions
  The "Dimension lookup / update" Step
  Other Types of Slowly Changing Dimensions
  Type 3 Slowly Changing Dimensions
  Hybrid Slowly Changing Dimensions
  More Dimensions
  Generated Dimensions
  Date and Time Dimensions
  Generated Mini-Dimensions
  Junk Dimensions
  Recursive Hierarchies
  Summary

Chapter 9: Loading Fact Tables
  Loading in Bulk
  STDIN and FIFO
  Kettle Bulk Loaders
  MySQL Bulk Loading
  LucidDB Bulk Loader
  Oracle Bulk Loader
  PostgreSQL Bulk Loader
  Table Output Step
  General Bulk Load Considerations
  Dimension Lookups
  Maintaining Referential Integrity
  The Surrogate Key Pipeline
  Using In-Memory Lookups
  Stream Lookups
  Late-Arriving Data
  Late-Arriving Facts
  Late-Arriving Dimensions
  Fact Table Handling
  Periodic and Accumulating Snapshots
  Introducing State-Oriented Fact Tables
  Loading Periodic Snapshots
  Loading Accumulating Snapshots
  Loading State-Oriented Fact Tables
  Loading Aggregate Tables
  Summary

Chapter 10: Working with OLAP Data
  OLAP Benefits and Challenges
  OLAP Storage Types
  Positioning OLAP
  Kettle OLAP Options
  Working with Mondrian
  Working with XML/A Servers
  Working with Palo
  Setting Up the Palo Connection
  Palo Architecture
  Reading Palo Data
  Writing Palo Data
  Summary

Part III: Management and Deployment

Chapter 11: ETL Development Lifecycle
  Solution Design
  Best and Bad Practices
  Data Mapping
  Naming and Commentary Conventions
  Common Pitfalls
  ETL Flow Design
  Reusability and Maintainability
  Agile Development
  Testing and Debugging
  Test Activities
  ETL Testing
  Test Data Requirements
  Testing for Completeness
  Testing Data Transformations
  Test Automation and Continuous Integration
  Upgrade Tests
  Debugging
  Documenting the Solution
  Why Isn't There Any Documentation?
  Myth 1: My Software Is Self-Explanatory
  Myth 2: Documentation Is Always Outdated
  Myth 3: Who Reads Documentation Anyway?
  Kettle Documentation Features
  Generating Documentation
  Summary

Chapter 12: Scheduling and Monitoring
  Scheduling
  Operating System-Level Scheduling
  Executing Kettle Jobs and Transformations from the Command Line
  UNIX-Based Systems: cron
  Windows: The at utility and the Task Scheduler
  Using Pentaho's Built-in Scheduler
  Creating an Action Sequence to Run Kettle Jobs and Transformations
  Kettle Transformations in Action Sequences
  Creating and Maintaining Schedules with the Administration Console
  Attaching an Action Sequence to a Schedule
  Monitoring
  Logging
  Inspecting the Log
  Logging Levels
  Writing Custom Messages to the Log
  E-mail Notifications...


About the authors:

Matt Casters is the founder of Kettle and works as Chief of Data Integration at Pentaho, where he leads Kettle software development. Roland Bouman is an application developer focusing on open source web technology, databases, and business intelligence. Jos van Dongen is an independent business intelligence consultant and well-known author, analyst, and presenter.






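Book ratings

  • Plot: 8 points
  • Characterization: 6 points
  • Thematic depth: 7 points
  • Writing style: 4 points
  • Use of language: 7 points
  • Fluency of prose: 5 points
  • Conveyance of ideas: 4 points
  • Depth of knowledge: 5 points
  • Breadth of knowledge: 8 points
  • Practicality: 5 points
  • Chapter division: 6 points
  • Structure and layout: 5 points
  • Novelty and originality: 9 points
  • Emotional resonance: 5 points
  • Engagement: 7 points
  • Real-world relevance: 5 points
  • Immersion: 9 points
  • Factual accuracy: 9 points
  • Cultural contribution: 6 points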


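Site ratings

  • Book variety: 8 points
  • Completeness of book information: 5 points
  • Site update speed: 5 points
  • Ease of use: 7 points
  • Book clarity: 5 points
  • Book format compatibility: 3 points
  • Presence of ads: 3 points
  • Loading speed: 3 points
  • Security: 6 points
  • Stability: 4 points
  • Search: 6 points
  • Download convenience: 5 points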


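Download feedback tags

  • Complete book (653+)
  • Complete content (96+)
  • Many positive reviews (197+)
  • Many books (258+)
  • Purchased (61+)
  • Average (93+)
  • No missing pages (103+)
  • Perfect layout (236+)
  • No extra pages (616+)
  • No watermark (559+)
  • Simple (91+)
  • Classic (94+)
  • One-star positive review (460+)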

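Download reviews

  • User 隗***杉: ( 2025-01-13 17:19:39 )

    Pretty good, and a nice read! I support it! Go download it!

  • User 马***偲: ( 2025-01-14 16:40:50 )

    Good, very good, extremely good, incomparably good, the best ever

  • User 利***巧: ( 2024-12-23 01:44:54 )

    Negative review. This one is paid.

  • User 寇***音: ( 2025-01-13 07:18:12 )

    Good, really quite practical!

  • User 詹***萍: ( 2025-01-01 09:32:04 )

    Positive review. This is the site I always choose for downloading books.

  • User 邱***洋: ( 2025-01-16 23:30:14 )

    Not bad, it supports many formats.

  • User 师***怀: ( 2025-01-15 03:06:11 )

    It's good, but it would be better if it were free to download.

  • User 仰***兰: ( 2024-12-22 08:13:50 )

    Love it! Great!! Highly recommended!

  • User 游***钰: ( 2024-12-26 05:53:39 )

    You only find out how good it is once you use it. Recommended! So useful.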

