Data Analyst Interview Questions and Answers - Best Answers & Tips (2026)

Interview prep is often treated like a last-minute checklist, but one unclear answer can undo months of good experience. Preparing for data analyst interview questions matters because your answers show interviewers your judgment, not just your resume. A data analyst interview is less about knowing every SQL function and more about showing that you can think like an analyst. Case study rounds separate good candidates from great ones: the candidates who do well approach problems in a structured way, state their assumptions clearly, and always tie their analysis back to business impact.

Technical Interview Questions

1. SQL and Databases

Question: Find the second-highest salary in the Employees table

Expected approaches: a subquery with MAX, LIMIT with OFFSET, or DENSE_RANK.

Sample answer:

SELECT MAX(salary) AS second_highest_salary FROM Employees WHERE salary < (SELECT MAX(salary) FROM Employees);

Or using DENSE_RANK:

SELECT DISTINCT salary FROM (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank FROM Employees) AS ranked WHERE salary_rank = 2;
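To check that the three approaches agree, here is a small sketch that runs them against an in-memory SQLite database from Python. The table contents are made up, and SQLite (3.25 or later, for window functions) stands in for whatever database the interviewer has in mind. The sample data deliberately ties two employees at the top salary, which is why the LIMIT/OFFSET version needs DISTINCT.

```python
import sqlite3

# Hypothetical sample data: ties at both the top and the second-highest salary
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("Asha", 90000), ("Eli", 90000), ("Ben", 75000), ("Chen", 75000), ("Dina", 60000)],
)

queries = {
    # Approach 1: the largest salary strictly below the maximum
    "subquery": """
        SELECT MAX(salary) FROM Employees
        WHERE salary < (SELECT MAX(salary) FROM Employees)""",
    # Approach 2: skip the top distinct salary and take the next one
    "limit_offset": """
        SELECT DISTINCT salary FROM Employees
        ORDER BY salary DESC LIMIT 1 OFFSET 1""",
    # Approach 3: DENSE_RANK gives tied salaries the same rank, with no gaps
    "dense_rank": """
        SELECT DISTINCT salary FROM (
            SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
            FROM Employees
        ) AS ranked
        WHERE salary_rank = 2""",
}

results = {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}
print(results)
```

Without DISTINCT, the LIMIT/OFFSET query would return the tied top salary (90000) instead of the second-highest one.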

Question: Write a query to find duplicate rows in a table

Expected approach: Use COUNT with GROUP BY and HAVING, grouping by the columns that define a duplicate.

Sample answer:

SELECT column1, column2, COUNT(*) 
FROM table_name 
GROUP BY column1, column2 
HAVING COUNT(*) > 1;
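Interviewers often follow up by asking for the duplicate rows themselves rather than just the counts. The sketch below uses assumed data in SQLite: (email, signup_date) is taken to define a duplicate, the table and column names are hypothetical, and a window COUNT(*) keeps each repeated row along with its id.

```python
import sqlite3

# Hypothetical table where (email, signup_date) defines a duplicate
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signups (id INTEGER PRIMARY KEY, email TEXT, signup_date TEXT)")
conn.executemany(
    "INSERT INTO signups (email, signup_date) VALUES (?, ?)",
    [("a@x.com", "2026-01-05"), ("b@x.com", "2026-01-06"),
     ("a@x.com", "2026-01-05"), ("c@x.com", "2026-01-07"),
     ("a@x.com", "2026-01-05")],
)

# Which key combinations repeat, and how often
dup_groups = conn.execute("""
    SELECT email, signup_date, COUNT(*) AS n
    FROM signups
    GROUP BY email, signup_date
    HAVING COUNT(*) > 1""").fetchall()

# Every individual row that belongs to a duplicated group, via a window count
dup_rows = conn.execute("""
    SELECT id FROM (
        SELECT id, COUNT(*) OVER (PARTITION BY email, signup_date) AS n
        FROM signups
    )
    WHERE n > 1
    ORDER BY id""").fetchall()

print(dup_groups)
print(dup_rows)
```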

2. Statistics and Probability

Question: Explain p-value

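Expected answer: The p-value is the probability of observing results at least as extreme as the ones observed, assuming the null hypothesis is true. A p-value below a significance level chosen in advance (typically 0.05) means the observed data would be unlikely if the null hypothesis were true, so you reject the null hypothesis. It is not the probability that the null hypothesis is true.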
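A worked example makes the definition concrete. This sketch computes an exact two-sided p-value for a hypothetical coin-flip experiment (60 heads in 100 flips, an illustrative assumption) using only the Python standard library. It sums the probability of every outcome that is at least as extreme, meaning no more likely, than the observed one under the null hypothesis of a fair coin.

```python
from math import comb

def two_sided_binomial_p(k, n, p=0.5):
    """Exact two-sided p-value for k successes in n trials under H0: rate = p.

    Adds up the probabilities of all outcomes no more likely than the observed one.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so floating-point noise does not drop tied outcomes
    return sum(q for q in probs if q <= observed * (1 + 1e-7))

# Hypothetical experiment: 60 heads in 100 flips of a coin assumed to be fair
p_value = two_sided_binomial_p(60, 100)
print(round(p_value, 4))
```

The result is roughly 0.057, so at a 0.05 significance level you would not reject the null hypothesis, even though 60 heads out of 100 may look lopsided.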

Question: What's the difference between Type I and Type II errors?

Answer: A Type I error is a false positive, meaning you reject a true null hypothesis. A Type II error is a false negative, meaning you fail to reject a false null hypothesis.
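A quick simulation shows both error types as long-run rates. The sketch below runs many hypothetical experiments with normally distributed data and a known standard deviation of 1 (an assumption that keeps the z-test simple). When there is no true difference, the rejection rate approximates the Type I error rate, alpha; when a true difference of 0.2 exists, the share of experiments that fail to reject approximates the Type II error rate, beta. The function name is hypothetical.

```python
import random
from statistics import NormalDist, fmean

def rejection_rate(true_diff, n=100, sims=2000, alpha=0.05, seed=42):
    """Share of simulated experiments in which a two-sided z-test rejects H0: no difference.

    Each group has n observations drawn from a normal distribution with sd = 1.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    se = (2 / n) ** 0.5                           # sd of the difference in means
    rejections = 0
    for _ in range(sims):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(true_diff, 1.0) for _ in range(n)]
        z = (fmean(b) - fmean(a)) / se
        rejections += abs(z) > z_crit
    return rejections / sims

type_1_rate = rejection_rate(true_diff=0.0)      # H0 true: every rejection is a false positive
type_2_rate = 1 - rejection_rate(true_diff=0.2)  # H0 false: every non-rejection is a false negative
print(type_1_rate, type_2_rate)
```

With these settings the Type I rate should land near 0.05 and the Type II rate near 0.7, a reminder that small samples often miss real effects (low power, which is 1 - beta).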

3. Data Analysis and Visualization

Question: How would you analyze sales data to identify trends?

Expected approach: Clean and preprocess the data, perform exploratory data analysis, apply time series analysis, segment the data, and visualize findings using tools like Tableau, Power BI, or Python libraries.

Sample answer: "First, I'd ensure data quality by handling missing values and outliers. Then, I'd perform EDA to understand distributions and correlations. For trend analysis, I'd use time series decomposition to identify seasonal patterns. I'd segment data by product, region, and customer demographics to uncover deeper insights, then visualize findings using appropriate charts."
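Time series decomposition can sound abstract in an interview, so it helps to know what it computes. This sketch implements classical additive decomposition in plain Python on made-up quarterly sales that contain a linear trend plus a fixed seasonal pattern. In practice a library routine such as statsmodels' seasonal_decompose would do this; the function here is a hypothetical illustration that handles even periods only.

```python
def decompose_additive(values, period):
    """Classical additive decomposition: value = trend + seasonal + residual.

    Returns (trend, seasonal). Trend is a centered 2 x period moving average
    (None where the window does not fit); seasonal holds one index per position
    in the cycle, centered so the indices sum to zero.
    """
    if period % 2:
        raise ValueError("this sketch only handles even periods")
    half = period // 2
    trend = [None] * len(values)
    for t in range(half, len(values) - half):
        window = values[t - half : t + half + 1]  # period + 1 points
        # The two end points fall in the same season, so each gets half weight
        trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period

    # Average the detrended values for each position in the cycle
    buckets = [[] for _ in range(period)]
    for t, tr in enumerate(trend):
        if tr is not None:
            buckets[t % period].append(values[t] - tr)
    seasonal = [sum(b) / len(b) for b in buckets]
    mean_seasonal = sum(seasonal) / period
    return trend, [s - mean_seasonal for s in seasonal]

# Made-up quarterly sales: +5 per quarter trend plus a fixed seasonal pattern
pattern = [10, -5, -15, 10]
sales = [100 + 5 * t + pattern[t % 4] for t in range(12)]
trend, seasonal = decompose_additive(sales, period=4)
print(seasonal)     # should recover the built-in pattern
print(trend[2:10])  # should follow 100 + 5t
```

Because the data was built without noise, the residual (sales minus trend minus seasonal) is zero here; with real data, the residual is where you look for anomalies.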

Coding Challenges

1. Data Cleaning in Python

Problem: Clean a dataset with missing values, outliers, and inconsistent formatting.

Expected steps: Handle missing values through imputation or deletion, detect and handle outliers, standardize data formats, normalize or scale the data, and validate data quality.

Sample code structure:

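import pandas as pd

def clean_data(df, value_col='column', date_col='date'):
    # 'column' and 'date' are placeholder column names
    df = df.copy()  # work on a copy so the caller's DataFrame is unchanged

    # Standardize formats: unparseable dates become NaT
    df[date_col] = pd.to_datetime(df[date_col], errors='coerce')

    # Handle missing values with forward fill (fillna(method='ffill') is deprecated)
    df = df.ffill()

    # Detect and drop outliers outside 1.5 * IQR
    q1 = df[value_col].quantile(0.25)
    q3 = df[value_col].quantile(0.75)
    iqr = q3 - q1
    df = df[df[value_col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

    return df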
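To see what the 1.5 × IQR rule does to actual numbers, here is a standard-library sketch on made-up daily order counts. The function name iqr_outliers is hypothetical. statistics.quantiles with method="inclusive" interpolates linearly between sorted values, which should match the default behavior of the pandas quantile call in the cleaning code.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k x IQR rule."""
    # method="inclusive" interpolates linearly between sorted data points
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return lower, upper, [v for v in values if v < lower or v > upper]

# Hypothetical daily order counts with one suspicious spike
orders = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
lower, upper, outliers = iqr_outliers(orders)
print(lower, upper, outliers)
```

Only the spike of 102 falls outside the fences (9.375 to 16.375). Whether to drop it, cap it, or investigate it is a judgment call that depends on whether it is an error or a real event.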

2. A/B Test Analysis

Problem: Analyze results of an A/B test to determine if the new feature significantly improves conversion rates.

Expected approach: Check the data distribution and sample size, test the difference in conversion rates with a two-proportion z-test (equivalent to a chi-square test on the 2x2 table), calculate confidence intervals, determine statistical significance, and consider practical significance.

Sample code structure:

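from statsmodels.stats.proportion import proportions_ztest, confint_proportions_2indep

# Assuming we have control and treatment groups (example counts)
count_control = 500  # conversions in control
count_treatment = 600  # conversions in treatment
n_control = 10000  # total users in control
n_treatment = 10000  # total users in treatment

# Two-sided two-proportion z-test: does the treatment rate differ from control?
z_score, p_value = proportions_ztest([count_treatment, count_control],
                                     [n_treatment, n_control])

# 95% confidence interval for the difference in rates (treatment minus control)
ci_low, ci_high = confint_proportions_2indep(count_treatment, n_treatment,
                                             count_control, n_control)

print(f"z = {z_score:.2f}, p = {p_value:.4f}, 95% CI for the lift: [{ci_low:.4f}, {ci_high:.4f}]")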
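To understand what proportions_ztest computes, here is a hand-rolled sketch of the pooled two-proportion z-test using only the standard library and the same example counts. It also computes the Pearson chi-square statistic on the 2x2 table of conversions and non-conversions; without a continuity correction that statistic equals z squared, which is why the two tests give the same p-value. Both function names are hypothetical.

```python
from math import erfc, sqrt

def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test, two-sided. Returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common conversion rate if H0 is true
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

def chi_square_2x2(x1, n1, x2, n2):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    chi2 = 0.0
    for row, row_total in zip(observed, (n1, n2)):
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / total
            chi2 += (obs - expected) ** 2 / expected
    return chi2

z, p = two_proportion_ztest(600, 10000, 500, 10000)
chi2 = chi_square_2x2(600, 10000, 500, 10000)
print(f"z = {z:.3f}, p = {p:.4f}, chi2 = {chi2:.3f}, z^2 = {z * z:.3f}")
```

With these counts z is about 3.1 and p is well below 0.01, so the lift from 5% to 6% is statistically significant; whether a 1-percentage-point lift is worth shipping is a question of practical significance.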

Behavioral Questions

1. Tell me about a time you had to analyze complex data and present findings to non-technical stakeholders

Use the STAR method: describe the project context (Situation), what needed to be accomplished (Task), the steps you took (Action), and the outcome (Result).

Sample answer: "Situation: I was tasked with analyzing customer churn data for a telecom company.

Task: I needed to identify key factors driving churn and present recommendations to the marketing team, who were non-technical.

Action: I performed cohort analysis, built a churn prediction model, and identified top reasons for churn. I created visualizations using Tableau to make the data accessible. I then prepared a presentation focusing on business implications rather than technical details.

Result: The marketing team implemented two of my recommendations, resulting in a 15% reduction in churn over the next quarter."

2. Describe a situation where you had to meet a tight deadline

What they're looking for: Time management, prioritization, stress handling.

Sample answer: "In my previous role, we had to analyze Q4 sales data and present insights to the CEO within 48 hours. I broke the work into data extraction (1 hour), cleaning (2 hours), analysis (3 hours), visualization (2 hours), and report writing (1 hour). I prioritized the most critical metrics and used automated scripts for data cleaning. I also asked a colleague to help with visualization. We delivered the report on time, and the CEO used our insights in the annual strategy meeting."

3. How do you handle conflicting priorities from different stakeholders?

What they're looking for: Communication, negotiation, prioritization skills.

Sample answer: "I believe in transparent communication and data-driven decision-making. When stakeholders have conflicting priorities, I organize a meeting to understand each stakeholder's goals and constraints, analyze the data and present objective criteria for prioritization, facilitate a discussion to find common ground, and document the decision-making process and communicate it clearly. In my last role, the sales and product teams had conflicting priorities. I created a prioritization framework based on revenue impact and implementation effort, which helped us make objective decisions."

SQL and Database Questions

1. Complex Query Writing

Question: Write a query to find customers who made purchases in the last 30 days but none in the last 7 days.

Sample answer:

SELECT o.customer_id
FROM orders o
WHERE o.order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 30 DAY)
GROUP BY o.customer_id
HAVING MAX(o.order_date) < DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY);
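A common mistake here is filtering individual orders by a date range, which still returns customers who also bought in the last 7 days. This sketch runs a row-level filter and a per-customer GROUP BY/HAVING version against made-up data in SQLite. SQLite's date(:today, '-30 days') stands in for MySQL's DATE_SUB, and a fixed, hypothetical "today" keeps the output reproducible.

```python
import sqlite3

# Hypothetical orders; a fixed "today" makes the example reproducible
today = "2026-01-31"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2026-01-11"),  # bought 20 days ago only: should qualify
        (2, "2026-01-11"),  # bought 20 days ago...
        (2, "2026-01-28"),  # ...and 3 days ago: should be excluded
        (3, "2025-12-22"),  # bought 40 days ago: outside the 30-day window
    ],
)

# Row-level filter: keeps customer 2 because one of their orders is 8 to 30 days old
naive = conn.execute("""
    SELECT DISTINCT customer_id FROM orders
    WHERE order_date >= date(:today, '-30 days')
      AND order_date <  date(:today, '-7 days')
    ORDER BY customer_id""", {"today": today}).fetchall()

# Per-customer check: the latest order in the 30-day window must be older than 7 days
fixed = conn.execute("""
    SELECT customer_id FROM orders
    WHERE order_date >= date(:today, '-30 days')
    GROUP BY customer_id
    HAVING MAX(order_date) < date(:today, '-7 days')
    ORDER BY customer_id""", {"today": today}).fetchall()

print(naive, fixed)
```

An equivalent alternative is a NOT EXISTS subquery that rules out any order in the last 7 days; the GROUP BY version simply avoids a second pass over the table.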

2. Database Design

Question: Design a database schema for a social media platform.

Key entities: Users with user_id, name, email, and created_at; Posts with post_id, user_id, content, and timestamp; Comments with comment_id, post_id, user_id, content, and timestamp; Likes with user_id, post_id, comment_id, and timestamp; and Friends with user_id1, user_id2, and status.

Relationships: Users to Posts, Users to Comments, and Posts to Comments are one-to-many relationships. Likes form a many-to-many relationship between Users and Posts (or Comments), and Friends is a many-to-many relationship between Users and other Users.

Considerations: Indexing for performance, normalization versus denormalization, and handling large volumes of data.
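One way to make the design concrete is to write the DDL. The sketch below is a hypothetical schema run in SQLite from Python. Column types, the status values, the created_at column (standing in for the timestamp column), and the CHECK constraints are assumptions, not a required design. Storing each friendship once with user_id1 < user_id2 avoids duplicate pairs, and the CHECK on likes enforces that a like points at either a post or a comment.

```python
import sqlite3

# Hypothetical schema: types, status values, and constraints are illustrative assumptions
SCHEMA = """
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    email      TEXT NOT NULL UNIQUE,
    created_at TEXT NOT NULL
);
CREATE TABLE posts (
    post_id    INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    content    TEXT,
    created_at TEXT NOT NULL
);
CREATE TABLE comments (
    comment_id INTEGER PRIMARY KEY,
    post_id    INTEGER NOT NULL REFERENCES posts(post_id),
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    content    TEXT,
    created_at TEXT NOT NULL
);
-- A like targets exactly one of a post or a comment
CREATE TABLE likes (
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    post_id    INTEGER REFERENCES posts(post_id),
    comment_id INTEGER REFERENCES comments(comment_id),
    created_at TEXT NOT NULL,
    CHECK ((post_id IS NULL) <> (comment_id IS NULL))
);
-- Each friendship is stored once by requiring user_id1 < user_id2
CREATE TABLE friends (
    user_id1 INTEGER NOT NULL REFERENCES users(user_id),
    user_id2 INTEGER NOT NULL REFERENCES users(user_id),
    status   TEXT NOT NULL CHECK (status IN ('pending', 'accepted', 'blocked')),
    PRIMARY KEY (user_id1, user_id2),
    CHECK (user_id1 < user_id2)
);
-- Index the columns used by common lookups, such as a user's posts by date
CREATE INDEX idx_posts_user ON posts(user_id, created_at);
CREATE INDEX idx_comments_post ON comments(post_id);
CREATE INDEX idx_likes_post ON likes(post_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)

# A like with neither post_id nor comment_id violates the CHECK constraint
conn.execute("INSERT INTO users VALUES (1, 'Ana', 'ana@example.com', '2026-01-01')")
try:
    conn.execute("INSERT INTO likes (user_id, created_at) VALUES (1, '2026-01-02')")
    like_rejected = False
except sqlite3.IntegrityError:
    like_rejected = True
print(like_rejected)
```

At large scale, read-heavy paths such as like counts are often denormalized into a counter column or cache, which is the normalization-versus-denormalization trade-off mentioned above.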

Statistics and Probability Questions

1. Explain confidence intervals

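Expected answer: A confidence interval is a range of plausible values for a population parameter, built at a stated confidence level, usually 95%. The 95% describes the procedure: if you repeated the sampling many times, about 95% of the intervals constructed this way would contain the true parameter. It quantifies the uncertainty in a sample estimate.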
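As a worked example, this sketch computes a 95% normal-approximation (Wald) confidence interval for a conversion rate using only the standard library; the counts and the function name are hypothetical. The Wald interval is the simplest textbook form and can be inaccurate for small samples or rates near 0 or 1, where a Wilson interval is usually preferred.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical: 500 conversions out of 10,000 visitors
low, high = proportion_ci(500, 10_000)
print(f"{low:.4f} to {high:.4f}")
```

Here the interval runs from roughly 4.6% to 5.4% around the observed 5%.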

2. How would you determine sample size for an A/B test?

Expected approach: Base it on the minimum detectable effect, the significance level (usually 5%), and the statistical power (usually 80%), then use a power-analysis formula or calculator. For example, if the baseline conversion rate is 10% and you want to detect a 2-percentage-point absolute increase (to 12%), you would need about 3,800 users per variant.
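The sample size formula behind that kind of estimate fits in a few lines. This sketch uses the normal approximation with unpooled variances for a two-sided test; the function name is hypothetical, and calculators that use a pooled variance or a continuity correction will give slightly different numbers.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(p_baseline, p_target, alpha=0.05, power=0.80):
    """Users needed per variant for a two-sided two-proportion test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    z_beta = NormalDist().inv_cdf(power)           # about 0.84
    variance = p_baseline * (1 - p_baseline) + p_target * (1 - p_target)
    effect = p_target - p_baseline
    return ceil((z_alpha + z_beta) ** 2 * variance / effect**2)

# Baseline 10% conversion, aiming to detect an increase to 12%
n = sample_size_per_variant(0.10, 0.12)
print(n)
```

For these inputs it returns about 3,840 users per variant. Halving the detectable effect roughly quadruples the required sample, because the effect enters the formula squared.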

Data Visualization and Communication

1. How do you decide which visualization to use?

Expected answer: Use a line chart for trends over time, a bar chart for comparisons, a histogram or box plot for distributions, a scatter plot or heatmap for relationships, and a pie chart or stacked bar chart for composition. The key principle is to choose the simplest visualization that effectively communicates the insight.

2. Describe a time you presented complex data to non-technical stakeholders

Use the STAR method:

Situation: "I analyzed website user behavior data and found that 40% of users dropped off at the checkout page."

Task: "I needed to present these findings to the marketing and design teams to justify a redesign."

Action: "I created a simplified dashboard with three key visualizations: a user flow heatmap, drop-off point analysis, and A/B test results. I focused on business impact rather than technical details."

Result: "The team understood the problem and approved the redesign. After implementation, checkout completion increased by 15%."

Case Study Questions

1. "How many gas stations are there in the United States?"

Approach: Start with roughly 300 million people and assume about 0.85 cars per person (85%), which gives about 255 million cars. Assume one station serves about 1,000 cars, which gives about 255,000 stations. Then adjust for rural versus urban density and for other vehicles such as trucks and motorcycles. The goal is to demonstrate structured thinking: make reasonable assumptions, state them, and calculate step by step.
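Writing the estimate as code (or on a whiteboard) keeps each assumption visible and easy to challenge. The numbers below are the illustrative assumptions from this approach, not real statistics, and the sensitivity loop shows how strongly the answer depends on the stations-per-car guess.

```python
# Each input is a labeled assumption, so an interviewer can challenge one at a time
population = 300_000_000   # rough US population used in the estimate
cars_per_100_people = 85   # assume about 0.85 cars per person
cars_per_station = 1_000   # assume one station serves about 1,000 cars

cars = population * cars_per_100_people // 100
stations = cars // cars_per_station
print(f"{cars:,} cars -> about {stations:,} stations")

# Sensitivity check: how the answer moves if the stations assumption is off
for per_station in (500, 1_000, 2_000):
    print(f"{per_station:>5} cars per station -> {cars // per_station:,} stations")
```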

2. "How would you analyze the impact of a new feature on user engagement?"

Approach: Define metrics such as DAU, MAU, and session duration. Collect data before and after the feature launch, and use A/B testing if possible. Control for external factors such as seasonality or concurrent marketing campaigns. Apply statistical analysis using t-tests or regression, then present findings with visualizations.
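For a metric like session duration, the comparison step might use Welch's t-test, which does not assume equal variances in the two groups. This standard-library sketch computes the t statistic and the Welch-Satterthwaite degrees of freedom on hypothetical data; to get a p-value you could pass the same two lists to scipy.stats.ttest_ind with equal_var=False. It assumes the groups were randomly assigned, as in an A/B test; a plain before/after comparison would still need controls for external factors.

```python
from math import sqrt
from statistics import fmean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic (mean B minus mean A) and Welch-Satterthwaite degrees of freedom."""
    va = variance(sample_a) / len(sample_a)  # squared standard error of mean A
    vb = variance(sample_b) / len(sample_b)  # squared standard error of mean B
    t = (fmean(sample_b) - fmean(sample_a)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(sample_a) - 1) + vb**2 / (len(sample_b) - 1))
    return t, df

# Hypothetical session durations in minutes, without and with the feature
control = [5.1, 6.3, 4.8, 7.0, 5.5, 6.1, 5.9, 4.7]
treatment = [6.2, 7.4, 5.9, 8.1, 6.8, 7.0, 6.5, 7.7]
t, df = welch_t(control, treatment)
print(f"t = {t:.2f}, df = {df:.1f}")
```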

Technical Tools and Platforms

1. Python for Data Analysis

Question: How would you handle missing values in a dataset?

Answer: Analyze the missingness pattern to determine whether the data is missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). Options include deletion, simple imputation with the mean, median, or mode, and model-based imputation. Check how each choice changes your analysis results before committing to one.

Sample code:

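import pandas as pd

# Assumes df is an existing DataFrame; the column names below are placeholders
# Check missing values per column
print(df.isnull().sum())

# Handle missing values on a copy so the original data stays intact
df_filled = df.copy()
# Median for numeric columns (robust to outliers)
df_filled['numeric_column'] = df_filled['numeric_column'].fillna(df_filled['numeric_column'].median())
# Mode (most frequent value) for categorical columns
df_filled['categorical_column'] = df_filled['categorical_column'].fillna(df_filled['categorical_column'].mode()[0])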
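Before choosing an imputation method, a quick check is whether missingness depends on another column. This sketch, with hypothetical rows and a hypothetical helper, compares the share of missing income values across plans in plain Python; in pandas the same check is df['income'].isna().groupby(df['plan']).mean(). Very different rates are evidence against MCAR, but no check on the observed data alone can rule out MNAR.

```python
from collections import defaultdict

def missing_rate_by_group(rows, group_key, value_key):
    """Share of rows with a missing value_key, split by group_key."""
    counts = defaultdict(lambda: [0, 0])  # group -> [missing, total]
    for row in rows:
        tally = counts[row[group_key]]
        tally[0] += row.get(value_key) is None
        tally[1] += 1
    return {group: missing / total for group, (missing, total) in counts.items()}

# Hypothetical survey rows: income is often blank for free-plan users
rows = [
    {"plan": "free", "income": None}, {"plan": "free", "income": None},
    {"plan": "free", "income": 42000}, {"plan": "free", "income": None},
    {"plan": "paid", "income": 65000}, {"plan": "paid", "income": 58000},
    {"plan": "paid", "income": None}, {"plan": "paid", "income": 71000},
]
rates = missing_rate_by_group(rows, "plan", "income")
print(rates)
```

If the rates differ this much, filling every blank with one overall median would bias income toward paid-plan users; imputing within each plan is one simple alternative.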

2. SQL Optimization

Question: How would you optimize a slow-running query?

Answer: Check the execution plan for bottlenecks such as full table scans. Add indexes on columns used in filters and joins. Avoid SELECT * and select only the columns you need. Remove unnecessary joins, consider restructuring the query (for example, filtering rows early), and keep table statistics up to date.
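Reading an execution plan is easier to explain with an example. This sketch uses SQLite's EXPLAIN QUERY PLAN from Python on a made-up orders table to show a full scan turning into an index search once an index exists on the filtered column. The exact wording of the plan text varies between SQLite versions, and MySQL and PostgreSQL expose similar information through EXPLAIN with different output formats.

```python
import sqlite3

# Made-up orders table: 10,000 rows spread over 500 customers
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 500, float(i)) for i in range(10_000)],
)

query = "SELECT order_id, amount FROM orders WHERE customer_id = ?"

def plan(sql):
    """Return the detail column (the last one) of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (42,))]

before = plan(query)  # full table scan, e.g. 'SCAN orders'
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
after = plan(query)   # index lookup, e.g. 'SEARCH orders USING INDEX idx_orders_customer (customer_id=?)'
print(before)
print(after)
```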

Conclusion

Data analyst interviews test a combination of technical skills, business acumen, and communication abilities. The key to success is mastering fundamentals including SQL, statistics, and data visualization; practicing coding regularly on platforms like LeetCode for both SQL and Python; understanding system design principles for data pipelines; preparing behavioral answers using the STAR method; knowing your tools including languages, databases, and visualization tools; and staying updated with industry trends. Remember that interviews are a two-way street, and you are also evaluating the company. Ask thoughtful questions and ensure the role aligns with your career goals.

Need help with specific data analyst interview questions? Check out our guides on SQL interview questions, statistics for data science, and case study interviews.

Your Move

Record three answers using the STAR method: situation, task, action, result.
Replay each answer and check whether it is specific, concise, and tied to the role you want.
Rewrite the weakest answer, then practice it once more without reading notes.
