Posts

Showing posts from October, 2016

Tableau integration with sparkSQL and basic data analysis with Tableau

Steps for Tableau integration with sparkSQL and basic data analysis:
================================================

1. Start the Spark SQL Thrift server on the NameNode [sparkSql server node]:
   /opt/spark/sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001
2. Download and install Tableau 10 from https://www.tableau.com/products [14-day trial version].
3. Download and install the Tableau driver for Spark SQL: https://downloads.tableau.com/drivers/mac/TableauDrivers.dmg
4. Open Tableau and connect to Spark SQL.
5. Provide the server as the NameNode IP and the port as 10001 [as in step 1 above].
6. Select Type as ‘SparkThriftServer’.
7. Select Authentication as ‘Username and password’.
8. Provide the username as ‘hive’ [the same as in hive-site.xml].
9. Provide the password as ‘hive@123’ [the same as in hive-site.xml].
10. Search for and select the database name in the ‘Select Schema’ dropdown. [This is the same parquet db that the sparkJobs created] ...
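Before opening Tableau, it can help to confirm that the Thrift server port from the first step is reachable from your machine. The snippet below is only a rough sketch of such a check: the function name `thrift_port_open` and the host `namenode.example.com` are made up for illustration, and a successful TCP connection only shows that something is listening on the port, not that the username and password will be accepted.

```python
import socket


def thrift_port_open(host: str, port: int = 10001, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    The default port 10001 matches hive.server2.thrift.port in the start command above.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage (the host below is a placeholder, replace it with the NameNode IP):
#   if not thrift_port_open("namenode.example.com"):
#       print("Thrift server is not reachable; check that start-thriftserver.sh is running")
```

If the check fails, the usual suspects are a Thrift server that was not started, a different port in the start command, or a firewall between your machine and the NameNode.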

How to connect SQL workbench to SparkSQL

Steps to set up SQL Workbench for accessing spark-sql databases:

1. Start sparkSql on the NameNode:
   /opt/spark/bin/spark-sql --verbose --master yarn --driver-memory 5G --executor-memory 5G --executor-cores 2 --num-executors 5
2. Download SQL Workbench (for macOS, from http://www.sql-workbench.net/Workbench-Build117-MacJava7.tgz), extract the downloaded tgz file, and launch SQLWorkbenchJ.
3. Copy the jar /opt/spark/lib/spark-assembly-1.2.1-hadoop2.4.0.jar [or the equivalent for your Hadoop version] from the NameNode (spark-sql server).
4. In SQL Workbench, go to File -> Manage Drivers.
5. Click the 'Create new entry' button in the top left corner.
6. Provide a driver name such as spark-sql_driver.
7. In the Library section, select the jar (needed for the JDBC driver) copied from the NameNode in step 3 above.
8. In the Classname section, click the 'Search' button. From the pop-up window, select the driver 'org.apache.hive.jdbc.HiveDriver' and click 'Ok'.
From...
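The excerpt is cut off before the connection profile is filled in. As a hedged illustration only: the driver class 'org.apache.hive.jdbc.HiveDriver' accepts URLs of the form jdbc:hive2://host:port/database, so a small helper like the one below could be used to assemble the URL to paste into SQL Workbench. The helper name `hive_jdbc_url`, the example IP, and the port 10001 (borrowed from the Thrift server setup in the Tableau post) are assumptions, not values taken from this post.

```python
def hive_jdbc_url(host: str, port: int, database: str = "default") -> str:
    """Build a HiveServer2-style JDBC URL for org.apache.hive.jdbc.HiveDriver."""
    return f"jdbc:hive2://{host}:{port}/{database}"


# Example usage with placeholder values (substitute your NameNode IP and port):
#   hive_jdbc_url("10.0.0.5", 10001)
#   builds "jdbc:hive2://10.0.0.5:10001/default"
```

Keeping the host, port, and database as separate arguments makes it easy to reuse the same settings that were entered in other clients.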