Showing posts from July, 2018

Automatically Create Timelions/Visualizations and Dashboards in Kibana 6.0+ with Python

While using Metricbeat with Elasticsearch and Kibana for performance-metrics analysis, it is really tedious to create visualisations and dashboards by hand in Kibana. It is much nicer to automate the work with Python using the Kibana REST APIs. Here is my rough automation in Python: https://github.com/Indu-sharma/timelion-dashboard-kibana-python
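For a flavour of the idea, here is a minimal sketch (not necessarily how the linked repo implements it) that creates a simple Markdown visualization through Kibana's saved objects REST API with the requests library. The endpoint and payload shape assume a recent Kibana 6.x release, and the kbn-xsrf header is mandatory for write calls:

import json
import requests

KIBANA = "http://localhost:5601"  # assumed local Kibana 6.x instance
HEADERS = {"kbn-xsrf": "true", "Content-Type": "application/json"}

def create_markdown_visualization(viz_id, title, markdown_text):
    # visState is stored as a JSON string inside the saved object.
    vis_state = {
        "title": title,
        "type": "markdown",
        "params": {"markdown": markdown_text},
        "aggs": [],
    }
    payload = {
        "attributes": {
            "title": title,
            "visState": json.dumps(vis_state),
            "uiStateJSON": "{}",
            "kibanaSavedObjectMeta": {"searchSourceJSON": "{}"},
        }
    }
    resp = requests.post(
        KIBANA + "/api/saved_objects/visualization/" + viz_id,
        headers=HEADERS,
        data=json.dumps(payload),
    )
    resp.raise_for_status()
    return resp.json()

print(create_markdown_visualization("perf-notes", "Perf notes", "# CPU looks fine"))

Dashboards can be created the same way, by posting a dashboard saved object whose panelsJSON string references the visualization IDs.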

PySpark: Cheat-sheet

https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PySpark_Cheat_Sheet_Python.pdf
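For quick reference, here are a couple of the basic operations the cheat sheet covers, as a small runnable example (the data and column names are made up for illustration):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start (or reuse) a local SparkSession.
spark = SparkSession.builder.appName("cheatsheet-demo").getOrCreate()

# Build a tiny DataFrame in place of reading a real dataset.
df = spark.createDataFrame(
    [("node-1", 0.72), ("node-2", 0.35), ("node-1", 0.91)],
    ["host", "cpu_pct"],
)

# Typical cheat-sheet operations: filter, group, aggregate, show.
(df.filter(F.col("cpu_pct") > 0.5)
   .groupBy("host")
   .agg(F.avg("cpu_pct").alias("avg_cpu"))
   .show())

spark.stop()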

Performance Monitoring Tools for Big Data

Performance monitoring and sizing at scale in the big data ecosystem is a real challenge. Here are a few of the tools to use:

1. Metricbeat: Run Metricbeat on each of the cluster nodes and visualise the stats with Elasticsearch/Kibana: https://www.elastic.co/guide/en/beats/metricbeat/current/index.html It works well for many components such as Docker, Kubernetes, KVM, Elasticsearch, Kafka, Logstash and more (see the query sketch after this list for consuming the data outside Kibana).

2. Dr. Elephant: Mainly for performance monitoring and tuning of Hadoop clusters and Spark jobs: https://github.com/linkedin/dr-elephant

3. ElasticHQ / Rally: Monitor Elasticsearch indexing and query performance at scale with ElasticHQ: http://www.elastichq.org/index.html Rally for sizing ES: https://www.elastic.co/blog/announcing-rally-benchmarking-for-elasticsearch

4. Sparklens from Qubole: For profiling and sizing of Spark jobs alone, Sparklens from Qubole is a good choice too: https://github.com/qubole/sparklens

5. Linux OS tools: You ca
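As promised above, a rough sketch of pulling Metricbeat data straight out of Elasticsearch with Python, in case you want the numbers outside Kibana. The field names follow Metricbeat's system module in 6.x; the endpoint and time window are assumptions for a local setup:

import json
import requests

ES = "http://localhost:9200"  # assumed local Elasticsearch instance

# Average CPU usage per host over the last 15 minutes.
query = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-15m"}}},
    "aggs": {
        "per_host": {
            "terms": {"field": "beat.hostname"},
            "aggs": {"avg_cpu": {"avg": {"field": "system.cpu.total.pct"}}},
        }
    },
}

resp = requests.post(
    ES + "/metricbeat-*/_search",
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
)
resp.raise_for_status()

for bucket in resp.json()["aggregations"]["per_host"]["buckets"]:
    print(bucket["key"], bucket["avg_cpu"]["value"])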

Elasticsearch: Cheat-sheet

# Elasticsearch Cheatsheet - an overview of commonly used Elasticsearch API commands

# cat paths
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}

# Important Things
bin/elasticsearch                                               # Start Elastic instance
curl -X GET 'http://localhost:9200/?pretty=true'                # View instance metadata
curl -X POST 'http://localhost:9200/_shutdown'                  # Shutdown Elastic instance
curl -X GET 'http://localhost:9200/_cat?pretty=true'            # List all admin methods
curl -X GET 'http://localhost:9200/_cat/indices?pretty=true'
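If you would rather call these endpoints from Python than from curl, a tiny wrapper around the _cat APIs (assuming a local instance on port 9200) could look like this:

import requests

ES = "http://localhost:9200"  # assumed local Elasticsearch instance

def cat(endpoint, **params):
    # Fetch a _cat endpoint as plain text, e.g. cat("indices").
    params.setdefault("v", "true")  # include column headers
    resp = requests.get(ES + "/_cat/" + endpoint, params=params)
    resp.raise_for_status()
    return resp.text

print(cat("health"))
print(cat("indices", h="index,docs.count,store.size"))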

Elasticsearch: How to Do Performance Testing?

Step 1: First, measure indexing performance by ingesting data with the following settings applied:

"refresh_interval": "-1"
"number_of_replicas": 0
"merge.scheduler.max_thread_count": 1
"translog.flush_threshold_size": "1024mb"
"translog.durability": "async"
"thread_pool.bulk.queue_size": 1000
"bootstrap.memory_lock": true

Step 2: Scale the data, the Elastic nodes and the JVM heap memory, ingest data and measure the indexing performance.

At T1: curl -XGET 'http://localhost:9200/<index>/_stats/indexing?pretty=true' | grep -Ei 'index_total|index_time_in_millis'
At T2: curl -XGET 'http://localhost:9200/<index>/_stats/indexing?pretty=true' | grep -Ei 'index_total|index_time_in_millis'

Indexing rate = 1000 * (index_total(at T2) - index_total(at T1)) / (index_time_in_millis(at T2) - index_time_in_millis(at T1))

Step 3: Use benchmarking tools such as Rally https://esrall
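The indexing-rate arithmetic from Step 2 is easy to script. Here is a rough sketch that samples the cluster-wide primary shard stats twice and applies the same formula (the five-minute window and localhost endpoint are just assumptions):

import time
import requests

ES = "http://localhost:9200"  # assumed local Elasticsearch instance

def indexing_stats():
    # Return (index_total, index_time_in_millis) for all primary shards.
    resp = requests.get(ES + "/_stats/indexing")
    resp.raise_for_status()
    idx = resp.json()["_all"]["primaries"]["indexing"]
    return idx["index_total"], idx["index_time_in_millis"]

t1_total, t1_millis = indexing_stats()
time.sleep(300)  # measurement window: 5 minutes here
t2_total, t2_millis = indexing_stats()

# Docs indexed per second of indexing time, as in Step 2.
rate = 1000.0 * (t2_total - t1_total) / (t2_millis - t1_millis)
print("indexing rate: %.1f docs/s" % rate)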