Friday, 26 July 2019

Top 100 Splunk Interview questions and answers

In this article, we will see important Splunk Interview Questions and Answers.

Here are the top 100 interview questions and answers on Splunk

What is Splunk and its uses?
Splunk is software used for monitoring, searching, and analyzing machine data in real time. The data source can be a web application, sensors, devices, or user-created data.

What are the components of Splunk?
The components of Splunk are:
(a) Search head - GUI for searching
(b) Forwarder - forwards data to the indexer
(c) Indexer - indexes machine data
(d) Deployment server - manages Splunk components in a distributed environment

Briefly describe how Splunk works?
Splunk works by collecting, parsing, indexing, and analyzing data. The forwarder collects data from the source and forwards it to the indexer. The search head then searches, visualizes, and analyzes the data stored in the indexer.

What is Splunk forwarder?
A Splunk forwarder is used to forward data to the indexer.

What are the advantages of Splunk forwarder?
A Splunk forwarder can throttle bandwidth and provides an encrypted SSL connection for transferring data from the forwarder to the indexer.

What are the types of forwarder?
There are two types:
Universal Forwarder
Heavy Weight Forwarder

What is Universal Forwarder?
A universal forwarder is a Splunk agent installed on a non-Splunk system to gather data locally. It cannot parse or index data.

What is Heavy Weight Forwarder?
A heavy weight forwarder is a full instance of Splunk with advanced functionality. It works as a remote collector, an intermediate forwarder, and a data filter.

What is the advantage of Splunk over other similar tools?
Splunk is a single integrated tool for machine data. It covers everything from IT operations and machine log analysis to business intelligence. There are other tools on the market, but Splunk provides end-to-end data operations in one product. You might need 3-4 separate tools to do what Splunk does as a single piece of software.

What are the configuration files of Splunk?
props.conf
indexes.conf
inputs.conf
transforms.conf
server.conf

What are the different licenses in Splunk?
There are 6 types of licenses in Splunk:
Enterprise license
Free license
Forwarder license
Beta license
Licenses for search heads 
Licenses for cluster members

What is the limitation of free license in Splunk?
The free license does not support authentication, scheduled searches, distributed search, forwarding over TCP/HTTP, or deployment management.

What is the use of License Master in Splunk?
The License Master controls how much data we can index in a day. For example, with a 200 GB license we can index only 200 GB of data per day. So the license should cover the maximum daily data volume we expect to receive.
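The arithmetic behind the 200 GB example can be sketched in a few lines of Python. This is only an illustration of the daily quota idea: the function name and the per-source volumes are made up for the example and are not part of any Splunk API.

```python
def license_usage(daily_quota_gb, indexed_gb_by_source):
    """Return (total indexed GB, remaining GB, whether the quota is exceeded)."""
    total = sum(indexed_gb_by_source.values())
    remaining = daily_quota_gb - total
    return total, remaining, total > daily_quota_gb


# Hypothetical volumes indexed today, in GB, checked against a 200 GB/day license.
today = {"web_logs": 120, "firewall": 60, "app_logs": 35}
total, remaining, exceeded = license_usage(200, today)
print(total, remaining, exceeded)
```

Here the three sources together exceed the daily quota, which is the situation the license should be sized to avoid.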

Suppose due to some reason License Master is unreachable, will the indexing stop?
If the License Master is unreachable, indexing will not stop: data will continue to be indexed. Searching, however, will stop. You will get a warning in the web UI or on the search head that you have exceeded the indexing volume.

What is the use of DB Connect in Splunk?
It's a plugin that connects Splunk to generic SQL databases and integrates with them.

What is the command for boot-start enable and disable?
To enable Splunk boot-start, the command is:
$SPLUNK_HOME/bin/splunk enable boot-start
To disable Splunk boot-start, the command is:
$SPLUNK_HOME/bin/splunk disable boot-start

What is summary index in Splunk?
Summary indexes are used to boost reporting efficiency. They let users generate reports even after processing a huge volume of machine data.

What are the different types of Summary Index?
There are two types:
Default Summary Index - Used by Splunk Enterprise by default when no other summary index is specified.
Additional Summary Index - Used to enable running a variety of reports.

What are the default fields for events?
The five default fields are:
source
host
sourcetype
index
timestamp

How can we restart Splunk?
Splunk can be restarted from Splunk Web. The steps are:
1. Go to System and navigate to Server Controls.
2. Click Restart Splunk.

How to search multiple IPs in Splunk?
Using lookup tables, we can search for multiple IP addresses.

What is the most efficient way to filter events in Splunk?
The most efficient way to filter events in Splunk is by time / duration.

How can we reset Splunk password?
To reset the password, you need access to the file system of the server where Splunk is running. Then perform the following steps:
Move the $SPLUNK_HOME/etc/passwd file to $SPLUNK_HOME/etc/passwd.bak
Restart Splunk and log in with the default username and password, i.e. admin/changeme.
Reset the password and merge the new password file with the backup file.

What is sourcetype?
Sourcetype is a default field in Splunk. It identifies the format of the data and so indicates its origin. For example, .evt files originate from the Event Viewer. Incoming data can be classified by service, system, format, and character code. Common source types include apache_error, websphere_core, and cisco_syslog. Splunk uses the sourcetype to process incoming data and break it into separate events.

How to use two sourcetypes in Splunk?
Here is a use case showing how to search two sourcetypes against a lookup file:
sourcetype=X OR sourcetype=Y | lookup country.csv
Using this search, events from sourcetypes X and Y can be matched against a lookup file.

What is KV store in Splunk?
KV stands for key-value. The KV store lets you store and retrieve data inside Splunk. It can be used for the following:
(a) Managing a job queue.
(b) Storing metadata provided by the user.
(c) Analysing the workflow.
(d) Storing the user application state required for handling a UI session.
(e) Storing the results of search queries in Splunk.
(f) Maintaining a list of environment assets and checkpoint data.

What is deployer in Splunk?
A deployer is used to distribute apps and configuration information to the members of a search head cluster. The set of configuration details, such as updates, that the deployer sends is called the configuration bundle. The deployer also shares this bundle when a new member joins the cluster. It handles the basic app configurations and user configurations.
However, it cannot restore the latest state to the members of the cluster.

Which roles can create data models in Splunk?
Users with the admin or power role can create data models. Other users can create data models only if they have write access to the application. Role-based permissions determine whether a user can edit or view them.

When to use auto_high_volume in Splunk?
auto_high_volume is a value of the maxDataSize setting in indexes.conf and is used for very high-volume indexes. A high-volume index is typically one that receives over 10GB of data per day.

What are the Splunk alternatives?
Logstash
Loggly
LogLogic
Sumo Logic

How to restart Splunk webserver and daemon?
To restart the web server: splunk restart splunkweb
To restart the daemon: splunk restart splunkd

How to clear Splunk search history?
We need to delete searches.log from this path:
$SPLUNK_HOME/var/log/splunk/searches.log

What is fishbucket in Splunk?
It's a directory or index at the default location /opt/splunk/var/lib/splunk. It contains seek pointers and CRCs for the files you are indexing, so splunkd can tell whether it has already read them. We can access it through the GUI by searching for "index=_thefishbucket".

Which commands are used in the reporting results category?
  • top
  • rare
  • stats
  • chart
  • timechart

What is the use of stats command?
The stats command reports data in tabular format. Multiple fields can be used to build the table.

What is the use of chart command?
As the name indicates, chart is used to display data as a bar, line, or area graph. It takes 2 fields to display the chart.

What is the use of timechart?
Timechart is used to display data on a timeline. It takes just 1 field, as the other field is the time field by default.
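To make the difference between stats and timechart concrete, here is a rough Python sketch of what the two commands compute. The event list and the function names are invented for illustration; real SPL runs inside Splunk and offers far more options.

```python
from collections import Counter

# Hypothetical events: a time in seconds and an HTTP status field.
events = [
    {"time": 0, "status": "200"},
    {"time": 30, "status": "404"},
    {"time": 70, "status": "200"},
    {"time": 95, "status": "200"},
]


def stats_count_by(events, field):
    """Roughly `stats count by <field>`: one row per distinct field value."""
    return Counter(e[field] for e in events)


def timechart_count(events, span):
    """Roughly `timechart span=<span> count`: one row per time bucket."""
    return Counter((e["time"] // span) * span for e in events)


print(dict(stats_count_by(events, "status")))
print(dict(timechart_count(events, 60)))
```

The first function groups by a field you choose, while the second always groups by time, which is why timechart needs only one field from you.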

How to disable the Splunk boot start?
$SPLUNK_HOME/bin/splunk disable boot-start

How to disable the Splunk launch message?
We can disable the Splunk launch message by setting this value in splunk-launch.conf:
OFFENSIVE=Less

What is difference between Splunk app and Splunk Add-on?
A Splunk app has a GUI for configuration, whereas a Splunk add-on doesn't have one (command line only).

In Splunk cluster, how to offline a peer?
Using the command splunk offline, we can take a peer offline.

What are the different categories in SPL command?
SPL commands fall into five major categories:
Sorting Results; Filtering Results; Grouping Results; Filtering, Modifying and Adding Fields; and Reporting Results.

How to specify minimum free disk space in Splunk?
Using the following command we can set the minimum free disk space (in MB):
/opt/splunk/bin/splunk set minfreemb 20000
It requires a restart, so:
/opt/splunk/bin/splunk restart

Do you know what is SOS in context of Splunk?
Yes, SOS stands for Splunk on Splunk. It's a Splunk app that provides a graphical view of Splunk performance and issues.

What does the lookup command do?
It looks at a value in the event, finds the matching rows in a lookup table, and adds the fields from those rows to the event.
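The behaviour described above can be sketched in plain Python. This is not how Splunk implements lookups; the CSV content, field names, and the apply_lookup helper are all hypothetical and only show the idea of enriching events from a table.

```python
import csv
import io

# Hypothetical lookup table: maps a status code to a description.
LOOKUP_CSV = """status,description
200,OK
404,Not Found
"""


def apply_lookup(events, lookup_rows, match_field):
    """Add the fields of the matching lookup row to each event."""
    table = {row[match_field]: row for row in lookup_rows}
    enriched = []
    for event in events:
        row = table.get(event.get(match_field), {})
        # Event fields are copied; fields from the matching row are added.
        enriched.append({**event, **row})
    return enriched


rows = list(csv.DictReader(io.StringIO(LOOKUP_CSV)))
events = [{"host": "web1", "status": "404"}, {"host": "web2", "status": "500"}]
for event in apply_lookup(events, rows, "status"):
    print(event)
```

The event with status 404 gains a description field, while the event with status 500 has no matching row and is passed through unchanged.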

What does the inputlookup command do?
It returns the whole lookup table as the search results.

What does the outputlookup command do?
It writes the current search results to a lookup table on disk.

What does the sort command do in Splunk?
As the name suggests, it sorts the search results by the specified fields.
Here is the syntax:
sort [<count>] <sort-by-clause>... [desc]

What is the transaction command and how does it work?
The transaction command is helpful in two specific scenarios:
1. When a unique ID (from one or more fields) alone is not enough to distinguish between two transactions. This is the case when the identifier is reused, for example web sessions identified by cookie or client IP. Here, time spans or pauses are also used to segment the data into transactions. In other cases where an identifier is reused, say in DHCP logs, a particular message may identify the beginning or end of a transaction.
2. When it is desirable to see the raw text of the events combined, rather than an analysis of the constituent fields of the events.
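The case where an identifier is reused and pauses are used to split the events can be sketched as follows. This is a simplified Python illustration of the idea behind the command's maxpause option, not Splunk's implementation; the event fields and the group_transactions helper are assumptions made for the example.

```python
def group_transactions(events, id_field, max_pause):
    """Group events by id_field. A gap longer than max_pause seconds between
    consecutive events with the same id starts a new transaction."""
    open_tx = {}   # id -> transaction currently being built
    finished = []  # transactions closed because of a long pause
    for event in sorted(events, key=lambda e: e["time"]):
        key = event[id_field]
        current = open_tx.get(key)
        if current and event["time"] - current[-1]["time"] > max_pause:
            finished.append(current)
            current = None
        if current is None:
            current = []
            open_tx[key] = current
        current.append(event)
    finished.extend(open_tx.values())
    return finished


# Hypothetical web events: the same client IP returns after a long pause.
events = [
    {"time": 0, "clientip": "10.0.0.1"},
    {"time": 5, "clientip": "10.0.0.2"},
    {"time": 20, "clientip": "10.0.0.1"},
    {"time": 1000, "clientip": "10.0.0.1"},
]
transactions = group_transactions(events, "clientip", max_pause=300)
print(len(transactions))
```

With a 300 second pause limit, the events from 10.0.0.1 end up in two separate transactions even though the identifier is the same.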

How To Troubleshoot Splunk Performance Issues?
We can start here: first, I would check splunkd.log for any errors. If all is fine, I would check for server or VM performance issues (CPU, memory, storage I/O, etc.). Lastly, I would install Splunk on Splunk, which provides a GUI where we can check for performance issues.

How to create a new app from template in Splunk?
Go to the directory /opt/splunk/bin and run:
./splunk create app New_App -template app1

What Is Dispatch Directory?
$SPLUNK_HOME/var/run/splunk/dispatch contains a directory for each search that is running or has completed. For example, a directory named 1434308973.367 will contain a CSV file of its search results, a search.log with details about the search execution, and other artifacts. With the defaults (which you can override in limits.conf), these directories are deleted 10 minutes after the search completes, unless the user saves the search results, in which case they are deleted after 7 days.

What Is Difference Between Search Head Pooling And Search Head Clustering?
Both are features provided by Splunk for high availability of the search head in case any one search head goes down. Search head clustering is the newer feature, and search head pooling will be removed in upcoming versions. A search head cluster is managed by a captain, which controls the other cluster members. Search head clustering is more reliable and efficient than search head pooling.

What is null queue in Splunk?
The null queue is used to discard all unwanted data.

List the different search modes supported in Splunk.
There are three modes:
  • Fast mode
  • Smart mode
  • Verbose mode

What is btool in Splunk?
Splunk btool is a command line tool for troubleshooting configuration file issues. It is also used to see which values your Splunk Enterprise installation is actually using in the existing environment.

How to use btool?
Command: /opt/splunk/bin/splunk btool inputs list

How to roll back your Splunk configuration bundle to the last version?
Here is the command:
/opt/splunk/bin/splunk rollback cluster-bundle

How to change port in Splunk?
/opt/splunk/bin/splunk set web-port <port_number>

What Is Map-reduce Algorithm?
The map-reduce algorithm is inspired by the map and reduce functions of functional programming and is used for batch-based, large-scale parallel processing.
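A tiny single-process word count shows the shape of the algorithm. This is a teaching sketch only: a real map-reduce system runs the map and reduce phases in parallel across many machines, and the function names and sample lines here are made up.

```python
from collections import defaultdict
from functools import reduce


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single result."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in grouped.items()}


lines = ["error timeout", "error disk full", "warning disk"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can be split across workers, which is what makes the approach suitable for large batches.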

Where are the Splunk default configuration files located?
The default configuration files are located in $SPLUNK_HOME/etc/system/default

What is lookup command used for?
Lookup command is used for referencing fields from an external csv file that matches fields in your event data.

How Splunk Avoids Duplicate Indexing Of Logs?
Splunk keeps track of indexed files in a directory called the fishbucket, which contains seek pointers and CRCs for the indexed files. This way it can check whether a file has already been indexed and avoid indexing it twice.
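The idea of seek pointers and CRCs can be sketched with a toy tracker in Python. This is a simplified model under stated assumptions, not Splunk's actual fishbucket format: the class name, the crc_length parameter, and keeping state in a dictionary are all choices made for the example.

```python
import zlib


class FishBucket:
    """Toy tracker: remembers a CRC of the file head and how far it was read."""

    def __init__(self, crc_length=256):
        self.crc_length = crc_length  # number of leading bytes to checksum (assumed)
        self.seen = {}  # head CRC -> seek pointer (bytes already read)

    def new_data(self, content: bytes) -> bytes:
        """Return only the part of the file that has not been read before."""
        crc = zlib.crc32(content[: self.crc_length])
        offset = self.seen.get(crc, 0)
        self.seen[crc] = len(content)
        return content[offset:]


# A short crc_length is used so the tiny sample file has a stable head.
bucket = FishBucket(crc_length=6)
print(bucket.new_data(b"line1\n"))          # first read: everything is new
print(bucket.new_data(b"line1\nline2\n"))   # file grew: only the new line
print(bucket.new_data(b"line1\nline2\n"))   # unchanged: nothing to index
```

The CRC identifies a file that was seen before, and the seek pointer records where reading stopped, so only appended data is picked up on the next pass.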

That's all for Splunk Interview questions with answers. If you have any questions, please mention in comments. Thanks!!!


Saturday, 13 July 2019

Introduction to Docker Compose

In this post, we will see how to manage the lifecycle of multiple containers using docker-compose. Let us first understand what Docker Compose is.

Introduction

Docker Compose is a tool provided by Docker to manage multiple containers on a single host. If you want to create, start, stop, remove, scale up, or scale down multiple containers with a single command, Docker Compose comes in handy. It manages the container lifecycle within a single host only. If your containerized application is hosted on multiple nodes in non-clustered mode, you need to run another copy of docker-compose on each host. To manage containers across multiple hosts from a single point, I would suggest looking at Docker Swarm or Kubernetes instead.

What can Docker Compose do for you?

Docker Compose can automate your container deployment, redeployment, and undeployment. It is not a tool solely for creating Docker images (docker build is used for that).

Installation of Docker Compose
If you are using Windows or Mac, Docker Compose is already installed, as it comes with Docker Toolbox. On Linux, we need to install Docker Compose first.

YAML Configuration file
Docker Compose uses a configuration file, docker-compose.yml, in which we write YAML to manage the container lifecycle. Here is a simple example of docker-compose.yml with explanations.

[Figure: Docker Compose example]

Let me explain it further.
version: Indicates the Compose file format version
services: Tells docker-compose that what follows is the list of services to be containerized
<service-name>: Name of the service, for reference purposes
build: Path to the build context (the directory containing the Dockerfile) from which the image used to create the container is built
ports: Port mapping from host to container
volumes: Volume mapping from host to container
image: Name of the image used to create the container
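To show how these keys fit together, here is a small Python sketch that renders a minimal docker-compose.yml as text. The service names (web, redis), paths, ports, and the render_compose helper are hypothetical, and the emitter handles only the simple key shapes described above.

```python
def render_compose(services, version="3"):
    """Render a minimal docker-compose.yml from a dict of service specs."""
    lines = [f'version: "{version}"', "services:"]
    for name, spec in services.items():
        lines.append(f"  {name}:")
        for key, value in spec.items():
            if isinstance(value, list):
                # List values such as ports and volumes become YAML sequences.
                lines.append(f"    {key}:")
                for item in value:
                    lines.append(f'      - "{item}"')
            else:
                lines.append(f"    {key}: {value}")
    return "\n".join(lines) + "\n"


# Hypothetical two-service application: one built locally, one from an image.
EXAMPLE = {
    "web": {"build": ".", "ports": ["5000:5000"], "volumes": [".:/code"]},
    "redis": {"image": "redis:alpine"},
}

compose_text = render_compose(EXAMPLE)
print(compose_text)
```

Saving the printed text as docker-compose.yml gives a file with one service built from a local Dockerfile and one created from an existing image.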

Docker compose commands:

So you now have docker-compose.yml and want to manage the lifecycle of containers through it. We can create, start, stop, destroy, and scale containers using the same docker-compose.yml file.
Here is the list of commands:
  • docker-compose up - Creates the containers (if required) and runs them. Use the -d option to run them in the background.
  • docker-compose down - The opposite of up: it stops all the containers and also removes them.
  • docker-compose start - Starts the containers. Please note that if a container does not exist, it will not create a new one. It only starts the stopped containers of the services listed in docker-compose.yml.
  • docker-compose stop - Stops the running containers. It goes through each service mentioned in docker-compose.yml and tries to stop its started containers.
  • docker-compose rm - Deletes the stopped containers. Use it with -f to delete them without asking for confirmation.
  • docker-compose scale - Sets the number of containers per service. We can both scale up and scale down the number of containers per service using the same command.
  • docker-compose exec - Runs a command inside a container. You need to pass the service name along with the command. For example: docker-compose exec <service_name> ls /home
  • docker-compose pause - Pauses the services. Unlike stop, a paused container is suspended and, when resumed, continues from the point where it was paused. Stopping a service kills the container's running process, so it starts from scratch. For example, suppose your application prints two lines. If you pause after the first line is printed, unpause continues from there and prints the second line. If you stop after the first line and start again, it prints both lines again.
  • docker-compose unpause - Resumes the paused services.
  • docker-compose port - Prints the public port for a port binding.
  • docker-compose build - Builds or rebuilds services.
  • docker-compose bundle - Generates a Docker bundle from the Compose file.
  • docker-compose config - Validates the Compose file.
  • docker-compose create - Creates the services.
  • docker-compose images - Lists the Docker images.
  • docker-compose logs - Shows the container output.
  • docker-compose top - Displays the running processes.
  • docker-compose version - Displays the version of Docker Compose installed on the host.
  • docker-compose help - Gets help on a command.
That's all for this short introduction to Docker Compose. If you are using Ansible, you can manage the container lifecycle using the docker_compose module. For any query, please mention it in the comments section. Thanks!!