Thursday, July 27, 2017

Connect Apache Spark in MapR to Netezza



Steps:

  1. Download the spark-netezza package com.ibm.SparkTC:spark-netezza_2.10:0.1.1 from https://mvnrepository.com/artifact/com.ibm.SparkTC/spark-netezza_2.10/0.1.1 and copy it to <spark home>/bin on the MapR node.
  2. Copy nzjdbc3.jar from the Netezza client install to the MapR node (copied to <spark home>/bin).
  3. Set --driver-class-path and --jars when launching spark-shell.
  4. Create a sample program to load data from a Netezza table into a Spark DataFrame (a consolidated standalone sketch is at the end of this post).
Run Command:
(Here spark home is "/opt/mapr/spark/spark-2.0.1")

cd <spark home>/bin

bash-4.2$ ./spark-shell --packages com.ibm.SparkTC:spark-netezza_2.10:0.1.1 --driver-class-path /opt/mapr/spark/spark-2.0.1/bin/nzjdbc3.jar --jars /opt/mapr/spark/spark-2.0.1/bin/nzjdbc3.jar


 
scala>

Connect to the Netezza DB


 
Set connection parameters:

scala> val nzoptions = Map("url" -> "jdbc:netezza://<netezza server name>:5480/<db name>",
     |   "user" -> "<user name>",
     |   "password" -> "<password>",
     |   "dbtable" -> "<Table name>",
     |   "numPartitions" -> "8")

 



Load table data into a DataFrame:

scala> val Testdf = spark.sqlContext.read.format("com.ibm.spark.netezza").options(nzoptions).load()


 
Print the schema of the table:

scala> Testdf.printSchema
root
 |-- FNAME: string (nullable = true)
 |-- LANAME: string (nullable = true)


Show table content:

scala> Testdf.show
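
Standalone program (step 4):

The same flow can be packaged as a standalone Spark application instead of being typed into spark-shell. Below is a minimal sketch of such a program; the object name NetezzaLoadExample is hypothetical, and the server, database, credentials, and table name are the same placeholders used above.

import org.apache.spark.sql.SparkSession

object NetezzaLoadExample {
  def main(args: Array[String]): Unit = {
    // spark-shell creates this session automatically as "spark"; a standalone program builds its own
    val spark = SparkSession.builder().appName("NetezzaLoadExample").getOrCreate()

    // Same connection options as the spark-shell session above (placeholders to be filled in)
    val nzoptions = Map(
      "url"           -> "jdbc:netezza://<netezza server name>:5480/<db name>",
      "user"          -> "<user name>",
      "password"      -> "<password>",
      "dbtable"       -> "<Table name>",
      "numPartitions" -> "8")

    // Load the Netezza table into a DataFrame through the spark-netezza data source
    val Testdf = spark.sqlContext.read.format("com.ibm.spark.netezza").options(nzoptions).load()

    Testdf.printSchema
    Testdf.show()

    spark.stop()
  }
}

Submit it with the same classpath settings used for spark-shell, for example:

./spark-submit --class NetezzaLoadExample --packages com.ibm.SparkTC:spark-netezza_2.10:0.1.1 --driver-class-path <spark home>/bin/nzjdbc3.jar --jars <spark home>/bin/nzjdbc3.jar <application jar>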