How to access and use Spark SQL via PySpark in Spark 1.5?
To access Spark SQL in Spark 1.5, follow these steps:

1. Import SparkContext, SparkConf, and HiveContext (the col import is optional here; it is useful for DataFrame column expressions but is not used in the snippets below):

    from pyspark import SparkContext, SparkConf
    from pyspark.sql import HiveContext
    from pyspark.sql.functions import col

2. Set the application name and configuration (the executor PYTHONPATH setting is mandatory only if you are running your code in yarn-client mode):

    appName = "SqlPyspark"
    conf = SparkConf().setAppName(appName)
    conf.setExecutorEnv('PYTHONPATH', '/opt/spark/python:/opt/spark/python/lib/py4j-0.8.2.1-src.zip')

3. Create the Spark and Hive contexts:

    sc = SparkContext(conf=conf)
    hc = HiveContext(sc)

4. Now use the Hive context to access databases and perform any operations:

    hc.sql("show databases")

5. If you wish to combine all of the above in a Python file, run the following command to access and operate on Spark SQL (a complete sketch follows this list):

    /opt/spark/bin/spark-submit --master yarn --deploy-mode client --py-files [Other py files if any]...
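For reference, here is a minimal end-to-end sketch combining the steps above into a single script. The file name sql_pyspark.py and the call to show() are illustrative assumptions, not part of the original instructions:

    # sql_pyspark.py -- hypothetical file name for this example
    from pyspark import SparkContext, SparkConf
    from pyspark.sql import HiveContext

    # Application name and configuration; the executor PYTHONPATH setting
    # is needed only when running in yarn-client mode
    appName = "SqlPyspark"
    conf = SparkConf().setAppName(appName)
    conf.setExecutorEnv('PYTHONPATH', '/opt/spark/python:/opt/spark/python/lib/py4j-0.8.2.1-src.zip')

    # Create the Spark and Hive contexts
    sc = SparkContext(conf=conf)
    hc = HiveContext(sc)

    # sql() returns a DataFrame; call show() to print the result
    hc.sql("show databases").show()

    # Shut down the Spark context cleanly
    sc.stop()

Assuming that file name, the script could then be submitted with, for example:

    /opt/spark/bin/spark-submit --master yarn --deploy-mode client sql_pyspark.py

with --py-files added only when the job depends on additional Python files.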