Tune the solution. At this stage, if the solution you have created is working correctly and the results are valuable, you should decide whether you will repeat it in the future; perhaps with new data you collect over time. If so, you should tune the solution by reviewing the log files it creates, the processing techniques you use, and the implementation of the queries to ensure that they are executing in the most efficient way. It’s possible to fine tune big data solutions to improve performance, reduce network load, and minimize the processing time by adjusting some parameters of the query and the execution platform, or by compressing the data that is transferred over the network.

Visualize and analyze the results. Once you are satisfied that the solution is working correctly and efficiently, you can plan and implement the analysis and visualization approach you require. This may be loading the data directly into an application such as Microsoft Excel, or exporting it into a database or enterprise BI system for further analysis, reporting, charting, and more. For more details, see Consuming and visualizing data from HDInsight.

Automate and manage the solution. At this point it will be clear if the solution should become part of your organization’s business management infrastructure, complementing the other sources of information that you use to plan and monitor business performance and strategy. If this is the case, you should consider how you might automate and manage some or all of the solution to provide predictable behavior, and perhaps so that it is executed on a schedule. For more details, see Building end-to-end solutions using HDInsight.

Note that, in many ways, data analysis is an iterative process; and you should take this approach when building a big data batch processing solution. In particular, given the large volumes of data and correspondingly long processing times typically involved in big data analysis, it can be useful to start by implementing a proof of concept iteration in which a small subset of the source data is used to validate the processing steps and results before proceeding with a full analysis. This enables you to test your big data processing design on a small cluster, or even on a single-node on-premises cluster, before scaling out to accommodate production level data volumes.

It’s easy to run queries that extract data, but it’s vitally important that you make every effort to validate the results before using them as the basis for business decisions. If possible you should try to cross reference the results with other sources of similar information.

Is big data the right solution?

The first step in evaluating and implementing any business policy, whether it’s related to computer hardware, software, replacement office furniture, or the contract for cleaning the windows, is to determine the results that you hope to achieve.
Deciding whether to adopt a Hadoop-based big data batch processing approach is no different.