Differential Processing for Data-analytic Projects with Parallel and Distributed Processing (データ解析型プロジェクトにおける差分処理を考慮した並列分散処理)

Reviewed
Lin Li, Hirotaka Hokazono, Yuichi Hattori, Sozo Inoue
Proceedings of the Multimedia, Distributed, Cooperative, and Mobile Symposium (DICOMO2012)
7 pages
2012-07-04
Ishikawa, Japan
http://www.dicomo.org/2012/
In this paper, we propose a distributed parallel processing system for data-analytic projects. The system manages dependencies among data and analytic programs, and when data or programs are updated, it re-executes only the updated programs and the programs that depend on them. A data analyst can specify the dependencies and the parts that require distributed parallel processing with Hadoop Streaming; only the updated and dependent parts are then processed, with parallel or sequential execution selected flexibly. The specification can also express multiple executions of the same program on different data as a single simple statement, while the dependencies of each execution are checked separately.
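
The differential re-execution idea in the abstract can be illustrated with a minimal sketch. This is not the paper's specification language or implementation: the `deps` mapping format, the function name `rebuild_order`, and all file names are hypothetical, and the sketch only decides which outputs are stale and in what order to rebuild them. It does not launch Hadoop Streaming or choose between parallel and sequential execution.

```python
from collections import defaultdict, deque


def rebuild_order(deps, updated):
    """Return the outputs that must be regenerated, in a valid execution order.

    deps    -- maps each output to the program and data it is built from
               (hypothetical format, assumed for this sketch).
    updated -- set of programs or data files that have changed.
    """
    # Invert the graph: for each input, record which outputs consume it.
    consumers = defaultdict(set)
    for output, inputs in deps.items():
        for src in inputs:
            consumers[src].add(output)

    # Breadth-first walk from the updated items marks everything downstream.
    stale = set()
    queue = deque(updated)
    while queue:
        item = queue.popleft()
        for output in consumers[item]:
            if output not in stale:
                stale.add(output)
                queue.append(output)

    # Depth-first topological sort restricted to the stale outputs,
    # so each output is rebuilt only after its stale inputs.
    order = []
    visited = set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for src in deps.get(node, []):
            if src in stale:
                visit(src)
        order.append(node)

    for node in sorted(stale):
        visit(node)
    return order


# Hypothetical project: the same program (stats.py) runs on two data sets,
# and each run is tracked as a separate output with its own dependencies.
deps = {
    "clean.csv": ["clean.py", "raw.csv"],
    "stats.csv": ["stats.py", "clean.csv"],
    "plot.png": ["plot.py", "stats.csv"],
    "report_b.csv": ["stats.py", "other.csv"],
}
```

With this example project, `rebuild_order(deps, {"stats.py"})` returns `["stats.csv", "plot.png", "report_b.csv"]`: both runs of the changed program and the plot downstream of one of them are rebuilt, while `clean.csv` is left untouched.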
