interpreters for data sources that are batch processed. The serving layer queries are written in Apache Hive and must support multiple sessions. Unique GUIDs are used across the data sources, which allow the data analysts to use Spark SQL.

The data sources in the batch layer share a common storage container. The following data sources are used:

* Hive for sales data
* Apache HBase for operations data
* HBase for logistics data by using a single region server

End of repeated scenario.

The business analysts report that they experience performance issues when they run the monitoring queries.

You troubleshoot the performance issues and discover that the intermediate tables generated when the analysts run the queries cause pressure for the Java Virtual Machine (JVM) garbage collection per job.

Which configuration settings should you modify to alleviate the performance issues?

A. spark.sql.inMemoryColumnarStorage.batchSize
B. spark.sql.broadcastTimeout
C. spark.sql.files.openCostInBytes
D. spark.sql.shuffle.partitions

Answer: D

13. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

You are implementing a batch processing solution by using Azure HDInsight.

You plan to import 300 TB of data.

You plan to use one job that has many concurrent tasks to import the data in memory.

You need to maximize the amount of concurrent tasks for the job.

What should you do?
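As a side note on the Spark SQL properties listed as answer options above: such properties are commonly passed to a job as `--conf key=value` arguments to `spark-submit`. The following is a minimal sketch of assembling such a command line, not a tuning recommendation. The script name `monitoring_queries.py`, the helper functions, and the numeric values are hypothetical placeholders chosen for illustration.

```python
import shlex

# Property names come from the answer options above.
# The values are illustrative placeholders only (assumptions, not advice).
EXAMPLE_SETTINGS = {
    "spark.sql.shuffle.partitions": 400,
    "spark.sql.inMemoryColumnarStorage.batchSize": 10000,
}


def build_conf_args(settings):
    """Turn a mapping of Spark properties into spark-submit --conf arguments."""
    args = []
    # Sort keys so the generated command line is deterministic.
    for key, value in sorted(settings.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


def build_command(script, settings):
    """Assemble a spark-submit command for a (hypothetical) job script."""
    return ["spark-submit", *build_conf_args(settings), script]


if __name__ == "__main__":
    command = build_command("monitoring_queries.py", EXAMPLE_SETTINGS)
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(command))
```

The sketch only builds and prints the command; it does not launch Spark, so it can be checked without a cluster.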