Steps:
- Copy the package com.ibm.SparkTC:spark-netezza_2.10:0.1.1 to <spark home>/bin on the MapR node: https://mvnrepository.com/artifact/com.ibm.SparkTC/spark-netezza_2.10/0.1.1
- Copy nzjdbc3.jar from the Netezza client install to the MapR node (copied here to <spark home>/bin)
- Set --driver-class-path and --jars when launching spark-shell
- Create a sample JDBC program to load data from a Netezza table into a Spark DataFrame
Run Command:
(Here spark home is "/opt/mapr/spark/spark-2.0.1")
cd <spark home>/bin
bash-4.2$ ./spark-shell \
  --packages com.ibm.SparkTC:spark-netezza_2.10:0.1.1 \
  --driver-class-path /opt/mapr/spark/spark-2.0.1/bin/nzjdbc3.jar \
  --jars /opt/mapr/spark/spark-2.0.1/bin/nzjdbc3.jar
scala>
Connect to the Netezza DB
Set connection parameters:
scala> val nzoptions = Map("url" -> "jdbc:netezza://<netezza server name>:5480/<db name>",
     |   "user" -> "<user name>",
     |   "password" -> "<password>",
     |   "dbtable" -> "<Table name>",
     |   "numPartitions" -> "8")
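The connection parameters above can also be assembled by a small helper, which keeps the URL format in one place and avoids typing the password into the shell history. This is only a sketch: the object name NetezzaConfig, its method, and the NZ_PASSWORD environment variable are hypothetical names chosen for illustration, while the option keys and the default port 5480 are the ones used in the map above.

```scala
// Hypothetical helper (not part of spark-netezza): builds the same
// options map shown above from individual connection settings.
object NetezzaConfig {
  def options(
      host: String,
      database: String,
      user: String,
      password: String,
      table: String,
      numPartitions: Int = 8,   // same value as in the example above
      port: Int = 5480          // port used in the JDBC URL above
  ): Map[String, String] = {
    require(numPartitions > 0, "numPartitions must be positive")
    Map(
      "url"           -> s"jdbc:netezza://$host:$port/$database",
      "user"          -> user,
      "password"      -> password,
      "dbtable"       -> table,
      // The options map is String -> String, so the count is passed as text.
      "numPartitions" -> numPartitions.toString
    )
  }
}
```

In spark-shell this could be used as val nzoptions = NetezzaConfig.options("<netezza server name>", "<db name>", "<user name>", sys.env.getOrElse("NZ_PASSWORD", ""), "<Table name>"), assuming NZ_PASSWORD was exported before the shell was started.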
Load table data to dataframe:
scala> val Testdf = spark.sqlContext.read.format("com.ibm.spark.netezza").options(nzoptions).load()
Print Schema of the Table:
scala> Testdf.printSchema
root
 |-- FNAME: string (nullable = true)
 |-- LANAME: string (nullable = true)
Show Table content:
scala> Testdf.show