Data analyst interviews at Indian companies in 2026 typically test SQL skills, Python basics, analytical thinking, and business acumen. Unlike software engineer interviews, they rarely test data structures and algorithms (DSA), but your ability to tell a story with data is scrutinized closely.
Our take: The data analyst interview is less about knowing every SQL function and more about showing you can think like an analyst. The case study rounds separate good from great — the candidates who nail them approach problems structurally, communicate assumptions clearly, and always relate back to business impact.
Technical Interview Questions
1. SQL and Databases
Question: Find the second highest salary from the Employees table
Expected approaches:
- Using subquery with MAX
- Using LIMIT with OFFSET
- Using DENSE_RANK()
Sample answer:
SELECT MAX(salary) FROM Employees WHERE salary NOT IN (SELECT MAX(salary) FROM Employees);
Or using DENSE_RANK:
SELECT DISTINCT salary FROM (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank FROM Employees) AS ranked WHERE salary_rank = 2;
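The LIMIT with OFFSET approach listed above is easy to get wrong when the top salary is tied. Here is a minimal sketch using Python's built-in sqlite3 module; the table contents are made-up illustration data, and the MAX variant uses a strict comparison instead of NOT IN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("Asha", 90000), ("Ravi", 90000), ("Meera", 75000), ("Kiran", 60000)],
)

# DISTINCT collapses the tie at the top, so OFFSET 1 skips one salary value
# rather than one employee row.
limit_offset = conn.execute(
    "SELECT DISTINCT salary FROM Employees ORDER BY salary DESC LIMIT 1 OFFSET 1"
).fetchone()[0]

# Subquery variant: the largest salary strictly below the overall maximum.
max_subquery = conn.execute(
    "SELECT MAX(salary) FROM Employees "
    "WHERE salary < (SELECT MAX(salary) FROM Employees)"
).fetchone()[0]

print(limit_offset, max_subquery)
```

Without DISTINCT, the LIMIT/OFFSET query would return the tied top salary again, which is a common follow-up question from interviewers.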
Question: Write a query to find duplicate rows in a table
Expected approach:
- Use COUNT() with GROUP BY and HAVING
- Identify columns that define "duplicate"
Sample answer:
SELECT column1, column2, COUNT(*)
FROM table_name
GROUP BY column1, column2
HAVING COUNT(*) > 1;
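If the interviewer asks for the same check in pandas, a rough equivalent looks like the sketch below. The column names mirror the placeholder names in the SQL above and the data is invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "column1": ["a", "a", "b", "c", "c", "c"],
    "column2": [1, 1, 2, 3, 3, 4],
})

# The columns that define what counts as a "duplicate"
key_columns = ["column1", "column2"]

# Equivalent of GROUP BY ... HAVING COUNT(*) > 1
counts = df.groupby(key_columns).size().reset_index(name="n")
duplicate_keys = counts[counts["n"] > 1]

# keep=False flags every member of a duplicated group, not just the repeats
duplicate_rows = df[df.duplicated(subset=key_columns, keep=False)]

print(duplicate_keys)
print(duplicate_rows)
```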
2. Statistics and Probability
Question: Explain p-value
Expected answer: "The p-value is the probability of observing results at least as extreme as the ones observed, assuming the null hypothesis is true. A low p-value (typically < 0.05) means the data would be unlikely under the null hypothesis, so we reject it. It is not the probability that the null hypothesis is true."
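A small worked example can make the definition concrete. The sketch below computes an exact two-sided p-value for a coin-flip experiment using only the standard library; the function name and the 60-heads-in-100-flips scenario are illustrative assumptions, not part of any standard question:

```python
from math import comb

def two_sided_binomial_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value under the null hypothesis of a fair coin."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    # Count every outcome at least as far from the expected value as observed
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_gap
    )
    return extreme_outcomes / 2 ** flips

p_value = two_sided_binomial_p_value(heads=60, flips=100)
print(f"p-value: {p_value:.4f}")
```

The key phrase to say out loud is "at least as far from what the null predicts": the sum covers both tails, not just the observed count.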
Question: What's the difference between Type I and Type II errors?
Answer:
- Type I error: False positive (rejecting true null hypothesis)
- Type II error: False negative (failing to reject false null hypothesis)
3. Data Analysis and Visualization
Question: How would you analyze sales data to identify trends?
Expected approach:
- Clean and preprocess data
- Exploratory data analysis (EDA)
- Time series analysis
- Segmentation analysis
- Visualization using tools like Tableau, Power BI, or Python libraries
Sample answer: "First, I'd ensure data quality by handling missing values and outliers. Then, I'd perform EDA to understand distributions and correlations. For trend analysis, I'd use time series decomposition to identify seasonal patterns. I'd segment data by product, region, and customer demographics to uncover deeper insights, then visualize findings using appropriate charts."
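As a rough illustration of the trend and segmentation steps, the sketch below aggregates to monthly totals, smooths them with a rolling mean, and splits by region. It is a simple rolling average rather than a full time series decomposition, and the column names and figures are invented:

```python
import pandas as pd

sales = pd.DataFrame({
    "date": pd.to_datetime([
        "2025-01-05", "2025-01-20", "2025-02-10", "2025-02-25",
        "2025-03-03", "2025-03-18", "2025-04-07", "2025-04-22",
    ]),
    "region": ["North", "South"] * 4,
    "amount": [100, 120, 130, 110, 150, 130, 160, 140],
})

# Bucket each order into its calendar month
month = sales["date"].dt.to_period("M")

# Trend: monthly totals, smoothed with a 3-month rolling mean
monthly_total = sales.groupby(month)["amount"].sum()
rolling_3m = monthly_total.rolling(window=3).mean()

# Segmentation: the same monthly totals, one column per region
by_region = sales.groupby([month, "region"])["amount"].sum().unstack("region")

print(monthly_total)
print(rolling_3m)
print(by_region)
```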
Coding Challenges
1. Data Cleaning in Python
Problem: Clean a dataset with missing values, outliers, and inconsistent formatting.
Expected steps:
- Handle missing values (imputation, deletion)
- Detect and handle outliers
- Standardize data formats
- Normalize or scale numeric features
- Validate data quality
Sample code structure:
import pandas as pd
import numpy as np

def clean_data(df):
    # Handle missing values by forward-filling (returns a new DataFrame,
    # so the caller's data is not modified in place)
    df = df.ffill()
    # Detect outliers using the IQR rule on a numeric column
    Q1 = df['column'].quantile(0.25)
    Q3 = df['column'].quantile(0.75)
    IQR = Q3 - Q1
    # Keep only rows within 1.5 * IQR of the quartiles
    df = df[~((df['column'] < (Q1 - 1.5 * IQR)) | (df['column'] > (Q3 + 1.5 * IQR)))]
    # Standardize formats; unparseable dates become NaT
    df['date'] = pd.to_datetime(df['date'], errors='coerce')
    return df
2. A/B Test Analysis
Problem: Analyze results of an A/B test to determine if the new feature significantly improves conversion rates.
Expected approach:
- Check data distribution and sample size
- Perform hypothesis testing (two-proportion z-test, or the equivalent chi-square test on the 2×2 table)
- Calculate confidence intervals
- Determine statistical significance
- Consider practical significance
Sample code structure:
from statsmodels.stats.proportion import proportions_ztest
# Assuming we have control and treatment groups
count_control = 500 # conversions in control
count_treatment = 600 # conversions in treatment
n_control = 10000 # total in control
n_treatment = 10000 # total in treatment
z_score, p_value = proportions_ztest([count_treatment, count_control],
                                     [n_treatment, n_control])
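The "calculate confidence intervals" step can be sketched with the standard library alone. This is a simple Wald interval for the difference in conversion rates, reusing the counts from the example above; diff_in_proportions_ci is a hypothetical helper name, not a statsmodels function:

```python
from math import sqrt
from statistics import NormalDist

def diff_in_proportions_ci(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald interval for (rate_b - rate_a) using the unpooled standard error."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    std_err = sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = rate_b - rate_a
    return diff - z * std_err, diff + z * std_err

# Control: 500 / 10,000 conversions; treatment: 600 / 10,000
lower, upper = diff_in_proportions_ci(500, 10_000, 600, 10_000)
print(f"95% CI for the lift: [{lower:.4f}, {upper:.4f}]")
```

If the interval excludes zero, the result is statistically significant at that confidence level. Whether the lower bound is large enough to matter for the business is the practical significance question.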
Behavioral Questions
1. Tell me about a time you had to analyze complex data and present findings to non-technical stakeholders
Use STAR method:
- Situation: Describe the project context
- Task: What needed to be accomplished?
- Action: What steps did you take?
- Result: What was the outcome?
Sample answer: "Situation: I was tasked with analyzing customer churn data for a telecom company.
Task: I needed to identify key factors driving churn and present recommendations to the marketing team (non-technical).
Action: I performed cohort analysis, built a churn prediction model, and identified top reasons for churn. I created visualizations using Tableau to make the data accessible. I then prepared a presentation focusing on business implications rather than technical details.
Result: The marketing team implemented two of my recommendations, resulting in a 15% reduction in churn over the next quarter."
2. Describe a situation where you had to meet a tight deadline
What they're looking for: Time management, prioritization, stress handling.
Sample answer: "In my previous role, we had to analyze Q4 sales data and present insights to the CEO within 48 hours.
I broke down the task: data extraction (1 hour), cleaning (2 hours), analysis (3 hours), visualization (2 hours), report writing (1 hour). I prioritized the most critical metrics and used automated scripts for data cleaning. I also asked a colleague to help with visualization. We delivered the report on time, and the CEO used our insights for the annual strategy meeting."
3. How do you handle conflicting priorities from different stakeholders?
What they're looking for: Communication, negotiation, prioritization skills.
Sample answer: "I believe in transparent communication and data-driven decision-making. When stakeholders have conflicting priorities, I:
- Organize a meeting to understand each stakeholder's goals and constraints
- Analyze the data and present objective criteria for prioritization
- Facilitate a discussion to find common ground
- Document the decision-making process and communicate it clearly
In my last role, the sales and product teams had conflicting priorities. I created a prioritization framework based on revenue impact and implementation effort, which helped us make objective decisions."
SQL and Database Questions
1. Complex Query Writing
Question: Write a query to find customers who have made purchases in the last 30 days but haven't in the last 7 days.
Sample answer:
-- The most recent order inside the 30-day window must be older than 7 days
SELECT o.customer_id
FROM orders o
WHERE o.order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 30 DAY)
GROUP BY o.customer_id
HAVING MAX(o.order_date) < DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY);
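A quick way to sanity-check this kind of "did X but not Y" query is to run it against a few hand-built rows. The sketch below uses sqlite3 and passes the cutoff dates as parameters, because SQLite does not support the MySQL-style DATE_SUB syntax. The three customers are invented test cases, and the query requires that a customer's latest order within the 30-day window be older than 7 days:

```python
import sqlite3
from datetime import date, timedelta

def days_ago(n: int) -> str:
    """ISO date string for n days before today (ISO strings sort by date)."""
    return (date.today() - timedelta(days=n)).isoformat()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, days_ago(20)),  # bought 20 days ago only: should be returned
        (2, days_ago(20)),  # bought 20 days ago ...
        (2, days_ago(3)),   # ... and again 3 days ago: should be excluded
        (3, days_ago(45)),  # outside the 30-day window: should be excluded
    ],
)

rows = conn.execute(
    """
    SELECT customer_id
    FROM orders
    WHERE order_date >= ?
    GROUP BY customer_id
    HAVING MAX(order_date) < ?
    """,
    (days_ago(30), days_ago(7)),
).fetchall()

lapsed_customers = [row[0] for row in rows]
print(lapsed_customers)
```

Customer 2 is the case to watch: a query that only filters on the date range, without excluding recent buyers, would wrongly include them.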
2. Database Design
Question: Design a database schema for a social media platform.
Key entities:
- Users (user_id, name, email, created_at)
- Posts (post_id, user_id, content, timestamp)
- Comments (comment_id, post_id, user_id, content, timestamp)
- Likes (user_id, post_id, comment_id, timestamp)
- Friends (user_id1, user_id2, status)
Relationships:
- One-to-many: Users → Posts, Users → Comments, Posts → Comments
- Many-to-many: Users ↔ Users (friendships), Users ↔ Posts (likes)
Considerations:
- Indexing for performance
- Normalization vs denormalization
- Handling large volumes of data
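One possible way to turn the entity list into DDL is sketched below, run through sqlite3 so it can be checked locally. The constraints, the status values, and the index names are assumptions added for illustration; a real design would depend on the database engine and the access patterns discussed in the interview:

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    email      TEXT NOT NULL UNIQUE,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE posts (
    post_id   INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    content   TEXT NOT NULL,
    timestamp TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE comments (
    comment_id INTEGER PRIMARY KEY,
    post_id    INTEGER NOT NULL REFERENCES posts(post_id),
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    content    TEXT NOT NULL,
    timestamp  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE likes (
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    post_id    INTEGER REFERENCES posts(post_id),
    comment_id INTEGER REFERENCES comments(comment_id),
    timestamp  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    -- a like targets exactly one of: a post or a comment
    CHECK ((post_id IS NULL) <> (comment_id IS NULL))
);
CREATE TABLE friends (
    user_id1 INTEGER NOT NULL REFERENCES users(user_id),
    user_id2 INTEGER NOT NULL REFERENCES users(user_id),
    status   TEXT NOT NULL CHECK (status IN ('pending', 'accepted')),
    PRIMARY KEY (user_id1, user_id2),
    CHECK (user_id1 < user_id2)  -- store each friendship pair only once
);
-- Indexes for the most common lookups: a user's posts, a post's comments
CREATE INDEX idx_posts_user ON posts(user_id);
CREATE INDEX idx_comments_post ON comments(post_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
table_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(table_names)
```

Ordering the pair in friends (user_id1 < user_id2) avoids storing the same friendship twice, at the cost of slightly more complex lookups. That trade-off is worth mentioning when you present the schema.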
Statistics and Probability Questions
1. Explain confidence intervals
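Expected answer: "A confidence interval gives a range of plausible values for the true population parameter, based on our sample. The confidence level (usually 95%) describes the procedure: if we repeated the sampling many times, about 95% of the intervals built this way would contain the true value. It quantifies the uncertainty in our sample estimate."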
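The "confidence" in a confidence interval is easiest to see by simulation. The sketch below draws many samples from a population with a known mean, builds a 95% interval from each, and counts how often the interval contains the true mean. The population parameters, sample size, and seed are arbitrary choices for illustration, and the interval uses a normal approximation:

```python
import random
from statistics import NormalDist, mean, stdev

def coverage_of_95_ci(true_mean=100.0, true_sd=15.0, n=50, trials=2000, seed=42):
    """Fraction of simulated samples whose 95% interval contains true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)  # critical value for a two-sided 95% interval
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        margin = z * stdev(sample) / n ** 0.5
        if abs(mean(sample) - true_mean) <= margin:
            hits += 1
    return hits / trials

coverage = coverage_of_95_ci()
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The share should land close to 0.95, which is what the confidence level promises about the procedure rather than about any single interval.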
2. How would you determine sample size for an A/B test?
Expected approach:
- Base on expected effect size (minimum detectable difference)
- Set significance level (usually 5%)
- Set statistical power (usually 80%)
- Use power analysis formulas or calculators
Sample answer: "I'd use power analysis considering: baseline conversion rate, minimum detectable effect, significance level (α=0.05), and power (1-β=0.8). For example, if baseline conversion is 10% and we want to detect a 2 percentage point absolute increase (10% to 12%) with a two-sided test, we'd need about 3,800 users per variant."
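The figure in the sample answer can be reproduced with the usual normal-approximation formula for comparing two proportions. This is a sketch with a hypothetical helper name; dedicated power calculators may use slightly different variance assumptions and return slightly different numbers:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, mde_abs, alpha=0.05, power=0.8):
    """Users needed per variant for a two-sided test on two proportions."""
    p1, p2 = baseline, baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / mde_abs ** 2)

# Baseline 10%, detect an absolute lift of 2 percentage points
n_per_variant = sample_size_per_variant(baseline=0.10, mde_abs=0.02)
print(n_per_variant)
```

Halving the minimum detectable effect roughly quadruples the required sample, because the effect size is squared in the denominator.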
Data Visualization and Communication
1. How do you decide which visualization to use?
Expected answer:
- Trends over time: Line chart
- Comparisons: Bar chart
- Distributions: Histogram, box plot
- Relationships: Scatter plot, heatmap
- Composition: Pie chart, stacked bar chart
Key principle: Choose the simplest visualization that effectively communicates the insight.
2. Describe a time you presented complex data to non-technical stakeholders
Use STAR method:
Situation: "I analyzed website user behavior data and found that 40% of users dropped off at the checkout page."
Task: "I needed to present these findings to the marketing and design teams to justify a redesign."
Action: "I created a simplified dashboard with three key visualizations: user flow heatmap, drop-off point analysis, and A/B test results. I focused on business impact rather than technical details."
Result: "The team understood the problem and approved the redesign. After implementation, checkout completion increased by 15%."
Case Study Questions
1. "How many gas stations are there in the United States?"
Approach:
- Estimate number of cars (e.g., a population of roughly 300 million × 85% = 255 million cars)
- Estimate stations per car (1 per 1000 cars = 255,000 stations)
- Adjust for rural vs urban distribution
- Consider other vehicle types (trucks, motorcycles)
Expected: Demonstrate structured thinking, make reasonable assumptions, calculate step-by-step.
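Writing the estimate as a tiny function makes the assumptions explicit and easy to vary, which is what interviewers want to hear you do. The inputs below are the same rough assumptions as in the approach above, not real statistics:

```python
def estimate_gas_stations(base_count=300_000_000, share_pct=85, cars_per_station=1_000):
    """Back-of-the-envelope estimate; every input is an assumption."""
    cars = base_count * share_pct // 100
    stations = cars // cars_per_station
    return cars, stations

cars, stations = estimate_gas_stations()
print(cars, stations)

# Sensitivity check: what if each station serves twice as many cars?
_, stations_low = estimate_gas_stations(cars_per_station=2_000)
print(stations_low)
```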
2. "How would you analyze the impact of a new feature on user engagement?"
Approach:
- Define metrics (DAU, MAU, session duration, etc.)
- Collect data before and after feature launch
- Use A/B testing if possible
- Control for external factors
- Statistical analysis (t-tests, regression)
- Present findings with visualizations
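For the statistical analysis step, a permutation test is one option that needs no distributional assumptions and no extra libraries; it is shown here in place of the t-test mentioned above. The session-duration numbers are invented, and permutation_p_value is a hypothetical helper:

```python
import random
from statistics import mean

def permutation_p_value(control, treatment, n_permutations=5000, seed=0):
    """One-sided p-value: how often a random relabeling shows a lift >= observed."""
    rng = random.Random(seed)
    observed = mean(treatment) - mean(control)
    pooled = list(control) + list(treatment)
    n_control = len(control)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        lift = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if lift >= observed:
            at_least_as_large += 1
    # Add-one correction so the p-value is never reported as exactly zero
    return (at_least_as_large + 1) / (n_permutations + 1)

# Hypothetical session durations in minutes, without and with the feature
control_minutes = [12, 15, 11, 14, 13, 12, 16, 10, 13, 14]
treatment_minutes = [15, 18, 14, 17, 16, 15, 19, 13, 17, 16]

p_value = permutation_p_value(control_minutes, treatment_minutes)
print(f"p-value: {p_value:.4f}")
```

This only works as causal evidence if users were randomly assigned. With a simple before/after comparison, external factors such as seasonality can produce the same lift.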
Technical Tools and Platforms
1. Python for Data Analysis
Question: How would you handle missing values in a dataset?
Answer:
- Analyze missingness pattern (MCAR, MAR, MNAR)
- Options: deletion, imputation (mean/median/mode), model-based imputation
- Consider the impact on analysis results
Sample code:
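import pandas as pd
import numpy as np

# Illustrative sample data (hypothetical column names and values)
df = pd.DataFrame({
    'numeric_column': [10.0, np.nan, 30.0],
    'categorical_column': ['a', None, 'a'],
})

# Check missing values per column
print(df.isnull().sum())

# Impute on a copy: median for numeric, most frequent value for categorical
df_filled = df.copy()
df_filled['numeric_column'] = df_filled['numeric_column'].fillna(df_filled['numeric_column'].median())
df_filled['categorical_column'] = df_filled['categorical_column'].fillna(df_filled['categorical_column'].mode()[0])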
2. SQL Optimization
Question: How would you optimize a slow-running query?
Answer:
- Check execution plan for bottlenecks
- Add indexes on filtered/joined columns
- Avoid SELECT *, use specific columns
- Reduce joins where possible
- Consider query restructuring
- Update statistics
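The first two points can be demonstrated end to end with SQLite's EXPLAIN QUERY PLAN. The table, data, and index name below are invented for illustration, and other databases expose execution plans through their own commands and output formats:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

def query_plan(sql: str) -> str:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT order_id, amount FROM orders WHERE customer_id = 42"

# Before indexing, the filter column has no index to use
plan_before = query_plan(query)

# Add an index on the filtered column and check the plan again
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
plan_after = query_plan(query)

print("before:", plan_before)
print("after: ", plan_after)
```

Comparing the two plans before and after a change is the habit to show: it turns "I'd add an index" into a claim you have verified.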
Conclusion
Data analyst interviews test a combination of technical skills, business acumen, and communication abilities. The key to success:
- Master fundamentals (SQL, statistics, data visualization)
- Practice coding regularly on platforms like LeetCode (SQL and Python)
- Understand system design principles for data pipelines
- Prepare behavioral answers using STAR method
- Know your tools (languages, databases, visualization tools)
- Stay updated with industry trends
Remember: interviews are a two-way street. You're also evaluating the company. Ask thoughtful questions and ensure the role aligns with your career goals.