Join Optimizations in Pig

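Hash Join (single MapReduce job)

    Users = load 'users' as (name, age);
    Pages = load 'pages' as (user, url);
    Jnd = join Users by name, Pages by user;

Figure: Map 1 reads Pages block n and tags each record with its input number and join key, (1, user); Map 2 reads Users block m and emits (2, name). The shuffle routes records by join key, so each reducer receives every record for its keys: Reducer 1 gets (1, fred) (2, fred) (2, fred) and Reducer 2 gets (1, jane) (2, jane) (2, jane), and each reducer joins the records it receives.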
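To make the hash join's data flow concrete, here is a minimal single-process Python sketch. It is not Pig's implementation: the sample records, the function names (`map_phase`, `shuffle`, `reduce_phase`, `hash_join`) and the use of Python's built-in `hash` as the partitioner are all assumptions made for illustration. Following the figure, Pages records carry tag 1 and Users records carry tag 2.

```python
from collections import defaultdict

# Hypothetical sample data: Users(name, age) and Pages(user, url).
users = [("fred", 25), ("jane", 30)]
pages = [("fred", "a.com"), ("jane", "b.com"), ("fred", "c.com")]

PAGES_TAG, USERS_TAG = 1, 2  # input tags, numbered as in the figure


def map_phase(records, tag):
    """Map side: emit (join_key, (tag, record)); the join key is field 0 here."""
    for rec in records:
        yield rec[0], (tag, rec)


def shuffle(pairs, num_reducers):
    """Partition by a hash of the join key, then group values by key per reducer."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[hash(key) % num_reducers][key].append(value)
    return partitions


def reduce_phase(groups):
    """Reduce side: for each key, pair every Users record with every Pages record."""
    for values in groups.values():
        users_side = [rec for tag, rec in values if tag == USERS_TAG]
        pages_side = [rec for tag, rec in values if tag == PAGES_TAG]
        for u in users_side:
            for p in pages_side:
                yield u + p  # output schema: (name, age, user, url)


def hash_join(users, pages, num_reducers=2):
    pairs = list(map_phase(pages, PAGES_TAG)) + list(map_phase(users, USERS_TAG))
    result = []
    for groups in shuffle(pairs, num_reducers):
        result.extend(reduce_phase(groups))
    return result


# Sorted because reducer assignment depends on Python's per-process string hashing.
print(sorted(hash_join(users, pages)))
# → [('fred', 25, 'fred', 'a.com'), ('fred', 25, 'fred', 'c.com'), ('jane', 30, 'jane', 'b.com')]
```

Both inputs pass through the shuffle here, which is what makes this the general-purpose but most expensive of the three strategies.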

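Replicated Join (single map-only job)

    Users = load 'users' as (name, age);
    Pages = load 'pages' as (user, url);
    Jnd = join Pages by user, Users by name using 'replicated';

Figure: Pages and Users both span the keys aaron … zach. Every map task holds a full in-memory copy of Users (aaron … zach) and streams one block of Pages: Map 1 reads the block aaron … amr, and Map 2 reads the block amy … barb. Because each map can finish its share of the join locally, no reduce phase is needed; the relations listed after the first one (here Users) must be small enough to fit in memory.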
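The replicated join can be sketched in Python as a hash-table probe inside each map task. This is a hypothetical single-process simulation, not Pig's code: the sample data, the block split, and the helper names (`build_replica`, `map_task`, `replicated_join`) are assumptions chosen to mirror the figure.

```python
from collections import defaultdict

# Hypothetical sample data: Users(name, age) is small, Pages(user, url) is large.
users = [("aaron", 31), ("amy", 27), ("barb", 45)]
# Pages split into blocks; each block would be processed by its own map task.
pages_blocks = [
    [("aaron", "x.com"), ("amr", "y.com")],  # Map 1: aaron ... amr
    [("amy", "z.com"), ("barb", "w.com")],   # Map 2: amy ... barb
]


def build_replica(small):
    """Load the small relation into an in-memory hash table keyed by join key.
    Every map task would hold its own copy of this table."""
    table = defaultdict(list)
    for rec in small:
        table[rec[0]].append(rec)
    return table


def map_task(block, replica):
    """Stream one block of the large relation and probe the in-memory table.
    Matches are emitted directly: there is no shuffle and no reduce phase."""
    for page in block:
        for user in replica.get(page[0], []):
            yield page + user  # output schema: (user, url, name, age)


def replicated_join(pages_blocks, users):
    replica = build_replica(users)
    out = []
    for block in pages_blocks:
        out.extend(map_task(block, replica))
    return out


print(replicated_join(pages_blocks, users))
# → [('aaron', 'x.com', 'aaron', 31), ('amy', 'z.com', 'amy', 27), ('barb', 'w.com', 'barb', 45)]
```

The Pages record for `amr` has no matching user, so it is dropped, as in any inner join. The trade-off is memory: every map task pays for a full copy of the replicated relation.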

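Merge Join (two map-only jobs)

    Users = load 'users' as (name, age);
    Pages = load 'pages' as (user, url);
    Jnd = join Pages by user, Users by name using 'merge';

Figure: Pages and Users are both sorted on the join key (aaron … zach). Map 1 reads the Pages block aaron … amr and starts reading Users at aaron; Map 2 reads the Pages block amy … barb and starts reading Users at amy. The first map-only job samples Users to build an index of keys and their positions in the file; the second job uses that index so each map task can seek to the right place in Users and merge the two sorted streams.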
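The two jobs of the merge join can be sketched in Python as an index-building step followed by per-block sort-merges. This is an illustrative simulation under stated assumptions, not Pig's implementation: the sample data, the sampling rate `every`, and the helpers (`build_index`, `seek`, `map_task`, `merge_join`) are hypothetical, and list positions stand in for the file offsets a real index would record.

```python
import bisect

# Hypothetical sample data; both inputs are sorted on the join key.
users_sorted = [("aaron", 31), ("amy", 27), ("barb", 45)]
pages_blocks = [
    [("aaron", "x.com"), ("amr", "y.com")],  # Map 1: aaron ... amr
    [("amy", "z.com"), ("barb", "w.com")],   # Map 2: amy ... barb
]


def build_index(right_sorted, every=2):
    """Job 1 (sketch): sample every `every`-th record of the sorted right input,
    keeping (key, position). List positions stand in for file offsets."""
    return [(right_sorted[i][0], i) for i in range(0, len(right_sorted), every)]


def seek(index, key):
    """Return a position at or before the first right-side record whose key
    is >= `key`: the position of the last sampled key strictly below `key`."""
    keys = [k for k, _ in index]
    pos = bisect.bisect_left(keys, key) - 1
    return index[pos][1] if pos >= 0 else 0


def map_task(block, right_sorted, index):
    """Job 2 (sketch): merge one sorted left block with the sorted right input."""
    out = []
    j = seek(index, block[0][0]) if block else 0
    for left in block:
        key = left[0]
        # Skip right-side records with smaller keys.
        while j < len(right_sorted) and right_sorted[j][0] < key:
            j += 1
        # Emit all equal keys without advancing j, since the next
        # left record may share this key.
        k = j
        while k < len(right_sorted) and right_sorted[k][0] == key:
            out.append(left + right_sorted[k])
            k += 1
    return out


def merge_join(pages_blocks, users_sorted):
    index = build_index(users_sorted)
    out = []
    for block in pages_blocks:
        out.extend(map_task(block, users_sorted, index))
    return out


print(merge_join(pages_blocks, users_sorted))
# → [('aaron', 'x.com', 'aaron', 31), ('amy', 'z.com', 'amy', 27), ('barb', 'w.com', 'barb', 45)]
```

Seeking to the last sampled key strictly below the block's first key guarantees the scan cannot skip a match, at the cost of reading a few extra records. Unlike the replicated join, neither input has to fit in memory; the price is that both must already be sorted.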