Posts

Showing posts from 2017

Basic Workflows/Algorithms for Machine Learning in Text Processing

Web Extraction:
    Apache Tika (or any custom crawler)
Concept Extraction:
    Sentence Detection (maxent-3.0.0.jar, opennlp-tools-1.5.3.jar)
    NER (Named Entity Recognition)
    Concept Extraction (snowball-stemmer-1.3.0.581.1.jar)
    Multi phrase to list phrase
Concept Filtering:
    Zipf Filtering
    ChiSq Filtering
    Low frequency Filtering
    Signal Filtering
String Indexer:
    StringIndexer (spark-mllib_2.10-2.1.0.jar)
Hashing TF:
    HashingTF (spark-mllib_2.10-2.1.0.jar)
    IDF (spark-mllib_2.10-2.1.0.jar)
Classifier for Supervised Classification Algorithms: [Train & Predict]
    . Naive Bayes
    . Support Vector Machines (C Value, Gamma Value & Kernel)
    . Decision Trees (Entropy, Min-Samples-Split, Impurity, Information Gain)
    . K Nearest Neighbors
    . Random Forest...
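The HashingTF and IDF steps listed above can be illustrated with a small plain-Python sketch. It is not the Spark implementation: the hash function (zlib.crc32), the vector size and the sample documents are assumptions chosen only to keep the example self-contained. The IDF weight uses the smoothed form log((m + 1) / (df + 1)), where m is the number of documents and df is the number of documents that hit a given bucket.

```python
import math
import zlib

NUM_FEATURES = 16  # hypothetical vector size, kept tiny for illustration


def hashing_tf(tokens, num_features=NUM_FEATURES):
    """Map tokens to a fixed-size term-frequency vector (the hashing trick)."""
    vec = [0.0] * num_features
    for tok in tokens:
        # crc32 is a stand-in hash used here for determinism (an assumption,
        # not the hash that Spark uses internally).
        idx = zlib.crc32(tok.encode("utf-8")) % num_features
        vec[idx] += 1.0
    return vec


def idf_weights(tf_vectors):
    """Compute one smoothed IDF weight per bucket: log((m + 1) / (df + 1))."""
    m = len(tf_vectors)
    num_features = len(tf_vectors[0])
    weights = []
    for j in range(num_features):
        # df = number of documents with a non-zero count in bucket j
        df = sum(1 for vec in tf_vectors if vec[j] > 0)
        weights.append(math.log((m + 1) / (df + 1)))
    return weights


def tf_idf(tf_vectors):
    """Scale every term-frequency vector by the per-bucket IDF weights."""
    weights = idf_weights(tf_vectors)
    return [[tf * w for tf, w in zip(vec, weights)] for vec in tf_vectors]


# Hypothetical, already tokenized documents.
docs = [
    ["spark", "mllib", "text"],
    ["spark", "kafka", "stream"],
    ["text", "filter"],
]
tf_vectors = [hashing_tf(d) for d in docs]
tfidf_vectors = tf_idf(tf_vectors)
```

Buckets that appear in every document get a weight of zero, so very common terms contribute nothing to the final vectors, while rare terms are weighted up.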

Kafka, Python and Spark streaming


TCPDUMP :: Capture N pcap files for N Hours

tcpdump -G 300 -W 10 -w 'BR_tcpdump_%Y%m%d-%H%M%S.pcap' -i eth0:0 'port 2055' -Z admin ; tar -cvzf tcpdump.tar.gz BR_tcpdump_*

cd /tmp/; tcpdump -G 300 -W 12 -w 'BR_tcpdump_%Y%m%d-%H%M%S.pcap' -i eth0:0 'port 2055' -Z admin -z gzip & disown
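When -G (rotation interval in seconds) and -W (file count) are used together, tcpdump stops after writing that many files, so the total capture time is roughly the product of the two values. The small Python sketch below works through that arithmetic for the commands above; the function names are hypothetical helpers, not part of tcpdump.

```python
import math


def capture_seconds(rotate_seconds, file_count):
    """Approximate total capture time for `tcpdump -G rotate_seconds -W file_count`."""
    return rotate_seconds * file_count


def files_for_hours(hours, rotate_seconds=300):
    """Value to pass to -W so that the capture covers the requested number of hours."""
    return math.ceil(hours * 3600 / rotate_seconds)


# -G 300 -W 10: ten 5-minute files, 3000 seconds (50 minutes) in total
first_run = capture_seconds(300, 10)

# -G 300 -W 12: twelve 5-minute files, 3600 seconds (1 hour) in total
second_run = capture_seconds(300, 12)

# For a 2-hour capture with 5-minute files, -W would need to be 24
two_hours = files_for_hours(2)
```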

Where did all my memory go on my Linux box?

1. Check the used memory with the free command:

    [admin@BIZ-UPGRADE-NN1 ~]# free -g
                 total       used       free     shared    buffers     cached
    Mem:            44         17         26          0          0          8
    -/+ buffers/cache:          8         35
    Swap:           20          0         19

2. Check the top 20 processes consuming memory:

    ps -eo pmem,pcpu,vsize,pid,cmd | sort -k 1 -nr | head -20    => Sort by resident memory
    ps -eo pmem,pcpu,vsize,pid,cmd | sort -k 3 -nr | head -20    => Sort by virtual memory
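The sort step of the ps pipeline can also be done in Python, which helps when the output has already been saved to a file or collected from several hosts. The sketch below assumes the same column order as above (pmem, pcpu, vsize, pid, cmd); the function name and the sample rows are hypothetical.

```python
def top_processes(ps_output, column, limit=20):
    """Return the `limit` lines with the largest numeric value in `column` (1-based)."""
    rows = []
    for line in ps_output.strip().splitlines():
        # Split into at most 5 fields so the command keeps its own spaces.
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue
        try:
            key = float(fields[column - 1])
        except ValueError:
            continue  # skip the header line (%MEM %CPU VSZ PID CMD)
        rows.append((key, line))
    rows.sort(key=lambda row: row[0], reverse=True)
    return [line for _, line in rows[:limit]]


# Hypothetical sample in the format of `ps -eo pmem,pcpu,vsize,pid,cmd`.
SAMPLE = """
%MEM %CPU    VSZ   PID CMD
 1.2  0.5 204800   101 /usr/bin/python app.py
12.5  3.0 901200   202 java -Xmx4g -jar server.jar
 0.3  0.1 999999   303 sshd: admin
"""

by_resident = top_processes(SAMPLE, column=1, limit=2)  # like sort -k 1 -nr
by_virtual = top_processes(SAMPLE, column=3, limit=2)   # like sort -k 3 -nr
```

Column 1 (pmem) is the share of physical memory a process holds, while column 3 (vsize) is its virtual size, so the two orderings can differ noticeably.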

MP-BGP peering Sample network topology and configurations

Let's consider a simple topology diagram as shown below:

Configuration of R1:
===============
hostname R1
!
ip cef
!
!
interface Loopback0
 ip address 10.10.10.10 255.255.255.0
!
interface Serial1/0
 ip address 20.16.10.1 255.255.255.0
 mpls ip
 clock rate 2000000
!
router bgp 5500
 no synchronization
 bgp router-id 10.10.10.10
 bgp log-neighbor-changes
 redistribute connected
 neighbor 20.16.10.2 remote-as 6500
 neighbor 20.16.10.2 soft-reconfiguration inbound
 no auto-summary
 !
 address-family vpnv4
  neighbor 20.16.10.2 activate
  neighbor 20.16.10.2 send-community both
!--- Sends the community attribute to a BGP neighbor.
 exit-address-family
!
!
end

Configuration of R2:
===============
!
hostname R2
!
ip cef
!
ip vrf WAN
 rd 2020:1
 route-target export 2020:1
 route-target import 2020:1
!
interface Loopback0
 ip vrf forwar...

How to access & use Spark SQL via PySpark in Spark 1.5?

To access Spark SQL in Spark 1.5, follow these steps:

1. Import the Spark context and Hive context:

    from pyspark import SparkContext, SparkConf
    from pyspark.sql import HiveContext
    from pyspark.sql.functions import col

2. Set the application name and configuration [this is mandatory only if you are running your code in yarn-client mode]:

    appName = "SqlPyspark"
    conf = SparkConf().setAppName(appName)
    conf.setExecutorEnv('PYTHONPATH', '/opt/spark/python:/opt/spark/python/lib/py4j-0.8.2.1-src.zip')

3. Create the Spark and Hive contexts:

    sc = SparkContext(conf=conf)
    hc = HiveContext(sc)

4. Now use the Hive context to access the database and perform any operations:

    hc.sql("show databases")

5. If you wish to combine all of the above in a Python file, run the following command to access/operate on Spark SQL:

    /opt/spark/bin/spark-submit --master yarn  --deploy-mode client  --py-files [Other py files if any]...

How to find the Wi-Fi password on Windows/Linux/macOS:

1. Windows OS:
    Open the command terminal (cmd).
    Run the following command:
        netsh wlan show profile name=[wifi-name] key=clear
    Now, under the Security settings section, the value of 'Key Content' is the Wi-Fi password.

2. macOS:
    Open the terminal and run the following command:
        security find-generic-password -wa [wifi-name]
    The result displayed in plain text is the Wi-Fi password.

3. Linux OS:
    Open the terminal and run the following command:
        cat /etc/NetworkManager/system-connections/[wifi-name] | grep psk=
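On Linux, the NetworkManager connection file is an INI-style keyfile, so the psk= line can also be read with Python's configparser instead of grep. The sketch below is a minimal example under that assumption; the function name and the sample keyfile contents are hypothetical, and reading the real file normally requires root privileges.

```python
import configparser


def extract_psk(keyfile_text):
    """Return the first psk value found in NetworkManager keyfile text, or None."""
    parser = configparser.ConfigParser(
        interpolation=None,  # passwords may contain '%'
        strict=False,        # tolerate repeated keys
        delimiters=("=",),   # keyfiles use key=value only
    )
    parser.read_string(keyfile_text)
    # Scan every section rather than assuming a particular section name.
    for section in parser.sections():
        if parser.has_option(section, "psk"):
            return parser.get(section, "psk")
    return None


# Hypothetical keyfile contents, for illustration only.
SAMPLE_KEYFILE = """
[connection]
id=example-wifi
type=wifi

[wifi-security]
key-mgmt=wpa-psk
psk=not-a-real-password
"""

password = extract_psk(SAMPLE_KEYFILE)
```

To use it on a real system, read the file under /etc/NetworkManager/system-connections/ as root and pass its text to extract_psk.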