Networking, Python, BigData and Linux

Important points on Apache spark

- May 11, 2016

Spark basics: Spark is well suited to iterative programs such as machine learning (ML).

Scala, the functional programming language used with Spark, offers immutability, lazy evaluation (a value is computed only when it is first needed) and type inference. Scala code runs on top of the JVM. val declares an immutable value and var a mutable variable; an immutable value cannot be changed in place, but it can be mapped to a new value. Spark-shell is interactive.

An RDD is a large distributed collection of data with these properties: immutable, distributed, lazily evaluated, type-inferred, resilient (fault-tolerant) and cacheable. Because an RDD is immutable, it can safely be cached and distributed.

Spark remembers every transformation applied to an RDD. A transformation does not run anything by itself; work happens only when an action is called. Because RDDs are immutable, each transformation produces a new RDD instead of changing the old one, so there can be multiple copies of the data (a drawback).

Spark supports interactive queries, unifies real-time and batch processing, and makes good use of resources (multi-core CPUs, network speed, disk). Velocity is as important as volume, and real-time processing is as important as batch processing. Existing MapReduce is tightly coupled to its API. Spark makes use of Hadoop distributed storage....
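
The points about transformations being remembered but not executed until an action runs can be illustrated with a small pure-Python toy. This is only a sketch of the idea and not Spark's real API: ToyRDD, double and the variable names are hypothetical, and the data lives in a single process rather than being distributed.

```python
# Toy model of an immutable, lazily evaluated collection (NOT Spark's API).
class ToyRDD:
    def __init__(self, source, lineage=()):
        self._source = tuple(source)    # immutable snapshot of the input data
        self._lineage = tuple(lineage)  # recorded transformations, not yet run

    def map(self, fn):
        # Transformation: return a NEW ToyRDD with a longer lineage; nothing runs.
        return ToyRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only recorded, never applied here.
        return ToyRDD(self._source, self._lineage + (("filter", pred),))

    def collect(self):
        # Action: replay the recorded lineage over the source data.
        data = iter(self._source)
        for kind, fn in self._lineage:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return list(data)


calls = []  # records every element that double() actually processes

def double(x):
    calls.append(x)
    return x * 2

numbers = ToyRDD([1, 2, 3, 4])
doubled = numbers.map(double)            # recorded only
big = doubled.filter(lambda x: x > 4)    # recorded only
calls_before_action = len(calls)         # still 0: no work has been done
result = big.collect()                   # the action triggers the computation
```

Here numbers is never modified: map and filter each return a new object that carries the list of steps needed to rebuild its contents from the source, which is the same idea that lets a fault-tolerant collection recompute lost data.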