20 SQL Queries Every Data Analyst Must Know With Indian Business Examples

Knowing SELECT and WHERE is just the entry fee. What separates a junior analyst from someone who can answer real business questions fast is fluency with the 20 SQL patterns that appear again and again in actual dashboards, reports, and ad-hoc requests. This post covers those patterns — not theoretical — with scenarios drawn from Kozhikode textile distributors, Kochi IT company HR data, and e-commerce funnel analytics that Indian analysts actually encounter.

1–5: Window Functions You Cannot Avoid

1. RANK and DENSE_RANK for Leaderboards

A Kozhikode textile distributor wants to rank their stockists by quarterly fabric purchases. Ties happen when two stockists buy exactly the same volume — which the regional manager wants to see as a shared rank, not skipped numbers.

SELECT
  stockist_name,
  district,
  quarterly_purchase_value,
  RANK() OVER (ORDER BY quarterly_purchase_value DESC) AS rank_with_gaps,
  DENSE_RANK() OVER (ORDER BY quarterly_purchase_value DESC) AS rank_no_gaps
FROM stockist_quarterly_summary
WHERE fy_quarter = 'Q3-FY2026';

Use DENSE_RANK when the business context is "position on a leaderboard". Use RANK when you need to know exactly how many peers outperformed a given row.

2. ROW_NUMBER for Deduplication

An e-commerce order table often has duplicate entries from webhook retries. To keep only one row per order ID — the most recent one:

WITH ranked_orders AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY order_id
      ORDER BY created_at DESC
    ) AS rn
  FROM raw_orders
)
SELECT * FROM ranked_orders WHERE rn = 1;

3. LAG and LEAD for Period-over-Period Comparison

A Kochi IT company wants to compare each month's headcount against the previous month to flag departments with unusual attrition:

SELECT
  dept_name,
  report_month,
  headcount,
  LAG(headcount) OVER (PARTITION BY dept_name ORDER BY report_month) AS prev_month_headcount,
  headcount - LAG(headcount) OVER (PARTITION BY dept_name ORDER BY report_month) AS net_change
FROM monthly_headcount_snapshot
ORDER BY dept_name, report_month;

LEAD does the same in the opposite direction — useful for calculating how many days until the next event, like the next purchase by a customer.

4. Running Totals with SUM OVER

Cumulative revenue by day for an e-commerce order funnel report:

SELECT
  order_date,
  daily_revenue,
  SUM(daily_revenue) OVER (ORDER BY order_date ROWS UNBOUNDED PRECEDING) AS cumulative_revenue
FROM daily_revenue_summary
WHERE order_date BETWEEN '2026-04-01' AND '2026-06-30';

5. Moving Averages for Smoothing Noisy Data

A 7-day rolling average of daily orders removes weekend dips from the trend line:

SELECT
  order_date,
  daily_orders,
  AVG(daily_orders) OVER (
    ORDER BY order_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS rolling_7day_avg
FROM daily_order_counts;

6–9: CTEs and Subqueries

6. Basic CTE for Readable Multi-Step Logic

CTEs make complex queries readable by naming intermediate steps. For an employee performance analysis at a Kochi IT company with multiple sub-calculations:

WITH dept_avg AS (
  SELECT
    dept_id,
    AVG(performance_score) AS avg_score
  FROM employee_reviews
  WHERE review_year = 2026
  GROUP BY dept_id
),
above_avg_employees AS (
  SELECT e.emp_name, e.dept_id, e.performance_score, d.avg_score
  FROM employee_reviews e
  JOIN dept_avg d ON e.dept_id = d.dept_id
  WHERE e.performance_score > d.avg_score AND e.review_year = 2026
)
SELECT * FROM above_avg_employees ORDER BY dept_id, performance_score DESC;

7. Recursive CTE for Hierarchy Traversal

An org chart query — find all direct and indirect reports under a given manager:

WITH RECURSIVE org_tree AS (
  -- Base: the manager themselves
  SELECT emp_id, emp_name, manager_id, 0 AS depth
  FROM employees
  WHERE emp_id = 1042  -- Target manager's emp_id

  UNION ALL

  -- Recursive: their reports, and reports of reports
  SELECT e.emp_id, e.emp_name, e.manager_id, ot.depth + 1
  FROM employees e
  INNER JOIN org_tree ot ON e.manager_id = ot.emp_id
)
SELECT * FROM org_tree ORDER BY depth, emp_name;

8. Subquery in WHERE for Existence Checks

Find customers who placed an order this quarter but not last quarter — useful for "returning after lapse" campaign lists:

SELECT DISTINCT customer_id, customer_name
FROM orders
WHERE order_date BETWEEN '2026-04-01' AND '2026-06-30'
  AND customer_id NOT IN (
    SELECT DISTINCT customer_id
    FROM orders
    WHERE order_date BETWEEN '2026-01-01' AND '2026-03-31'
  );

9. Correlated Subquery for Row-by-Row Logic

For each product in a Kozhikode distributor's catalogue, find the most recent purchase price paid:

SELECT
  p.product_id,
  p.product_name,
  (
    SELECT purchase_price
    FROM purchase_history ph
    WHERE ph.product_id = p.product_id
    ORDER BY purchase_date DESC
    LIMIT 1
  ) AS latest_purchase_price
FROM products p;

10–13: CASE WHEN for Segmentation

10. Customer Value Segmentation

Classify customers as High, Medium, or Low value based on 12-month spend — a common input for CRM campaign targeting:

SELECT
  customer_id,
  customer_name,
  annual_spend,
  CASE
    WHEN annual_spend >= 500000 THEN 'High Value'
    WHEN annual_spend >= 100000 THEN 'Medium Value'
    ELSE 'Low Value'
  END AS customer_segment
FROM customer_annual_summary
WHERE summary_year = 2026;

11. CASE WHEN in Aggregation (Conditional Count)

Count orders by status in a single row per customer — instead of joining multiple filtered aggregations:

SELECT
  customer_id,
  COUNT(*) AS total_orders,
  SUM(CASE WHEN status = 'delivered' THEN 1 ELSE 0 END) AS delivered,
  SUM(CASE WHEN status = 'returned' THEN 1 ELSE 0 END) AS returned,
  SUM(CASE WHEN status = 'cancelled' THEN 1 ELSE 0 END) AS cancelled
FROM orders
WHERE order_date >= '2026-01-01'
GROUP BY customer_id;

12. Indian Fiscal Year Grouping

Indian businesses report on an April–March fiscal year. This query correctly assigns each transaction to its fiscal year:

SELECT
  CASE
    WHEN MONTH(transaction_date) >= 4
    THEN YEAR(transaction_date)
    ELSE YEAR(transaction_date) - 1
  END AS fiscal_year,
  SUM(amount) AS total_revenue
FROM transactions
GROUP BY
  CASE
    WHEN MONTH(transaction_date) >= 4
    THEN YEAR(transaction_date)
    ELSE YEAR(transaction_date) - 1
  END
ORDER BY fiscal_year;

A January 2026 transaction gets fiscal_year = 2025 (FY2025-26), while an April 2026 transaction gets fiscal_year = 2026 (FY2026-27). This aligns with how Indian CA firms and finance teams label fiscal years.

13. Quarter within Indian Fiscal Year

SELECT
  CASE
    WHEN MONTH(order_date) IN (4,5,6)   THEN 'Q1'
    WHEN MONTH(order_date) IN (7,8,9)   THEN 'Q2'
    WHEN MONTH(order_date) IN (10,11,12) THEN 'Q3'
    ELSE 'Q4'
  END AS fiscal_quarter,
  COUNT(*) AS order_count,
  SUM(order_value) AS total_value
FROM orders
WHERE YEAR(order_date) IN (2025, 2026)
GROUP BY fiscal_quarter
ORDER BY MIN(order_date);

14–17: Joins, Self-Joins, and Set Operations

14. Self-Join for Employee–Manager Pairs

SELECT
  e.emp_name AS employee,
  m.emp_name AS manager,
  e.dept_name
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id
ORDER BY e.dept_name, e.emp_name;

15. Finding Gaps with OUTER JOIN

Identify which days in a date range had zero orders — useful for spotting payment gateway outages at Indian e-commerce companies:

WITH date_series AS (
  SELECT DATE('2026-04-01') + INTERVAL seq DAY AS dt
  FROM (SELECT 0 seq UNION SELECT 1 UNION SELECT 2 /* ... to 89 */) nums
  WHERE DATE('2026-04-01') + INTERVAL seq DAY <= '2026-06-30'
)
SELECT ds.dt, COALESCE(COUNT(o.order_id), 0) AS orders
FROM date_series ds
LEFT JOIN orders o ON DATE(o.created_at) = ds.dt
GROUP BY ds.dt
HAVING orders = 0
ORDER BY ds.dt;

16. UNION vs UNION ALL for Combining Data Sources

Combine offline and online sales into a single analysis table:

SELECT order_id, customer_id, order_date, order_value, 'online' AS channel
FROM online_orders
WHERE order_date >= '2026-04-01'

UNION ALL  -- Use UNION if you need deduplication; UNION ALL is faster

SELECT bill_no, customer_id, bill_date, bill_value, 'offline' AS channel
FROM pos_transactions
WHERE bill_date >= '2026-04-01'
ORDER BY order_date;

17. Subquery vs JOIN Performance

A subquery in WHERE often performs differently from a JOIN to the same data. For large tables in MySQL, prefer EXISTS over IN with subqueries:

-- Often faster on large tables than IN (subquery)
SELECT c.customer_id, c.customer_name
FROM customers c
WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.customer_id = c.customer_id
    AND o.order_date >= '2026-01-01'
);

18–20: Advanced Aggregation and Query Optimisation

18. GROUP BY with ROLLUP for Subtotals

Sales analysis by region and district with automatic subtotals — useful for MIS reports sent to Kerala-based FMCG distributors:

SELECT
  COALESCE(region, 'ALL REGIONS') AS region,
  COALESCE(district, 'ALL DISTRICTS') AS district,
  SUM(sales_value) AS total_sales
FROM distributor_sales
WHERE fy_year = 2026
GROUP BY ROLLUP(region, district)
ORDER BY region, district;

ROLLUP produces subtotal rows automatically — you get a row for each district, a row for each region total, and a grand total row, all in a single query.

19. PIVOT Pattern Using Conditional Aggregation

Turn rows into columns — monthly revenue by product category, formatted for pivot-style reporting:

SELECT
  product_category,
  SUM(CASE WHEN MONTH(order_date) = 4  THEN order_value ELSE 0 END) AS Apr,
  SUM(CASE WHEN MONTH(order_date) = 5  THEN order_value ELSE 0 END) AS May,
  SUM(CASE WHEN MONTH(order_date) = 6  THEN order_value ELSE 0 END) AS Jun,
  SUM(CASE WHEN MONTH(order_date) = 7  THEN order_value ELSE 0 END) AS Jul,
  SUM(CASE WHEN MONTH(order_date) = 8  THEN order_value ELSE 0 END) AS Aug,
  SUM(CASE WHEN MONTH(order_date) = 9  THEN order_value ELSE 0 END) AS Sep
FROM orders
WHERE order_date BETWEEN '2026-04-01' AND '2026-09-30'
GROUP BY product_category;

20. EXPLAIN PLAN for Query Optimisation

When a dashboard query is slow, the first step is understanding what the database engine is actually doing:

EXPLAIN SELECT
  c.customer_id,
  c.customer_name,
  SUM(o.order_value) AS lifetime_value
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2025-01-01'
GROUP BY c.customer_id, c.customer_name
HAVING SUM(o.order_value) > 50000;

Look for "Full Table Scan" or "Using temporary; Using filesort" in the EXPLAIN output — these indicate missing indexes. For the query above, an index on orders(customer_id, order_date) will typically reduce execution time by an order of magnitude on tables with millions of rows.

An e-commerce funnel analysis at a Kochi company that takes 40 seconds to run often drops to under 2 seconds after adding a composite index on the join and filter columns. Always check EXPLAIN before adding an index — sometimes the query plan reveals a structural issue that an index alone cannot fix.

Frequently Asked Questions

What is the difference between RANK and DENSE_RANK in SQL?

RANK assigns the same rank to tied rows and then skips numbers — two rows tied at rank 2 are both ranked 2, and the next row is ranked 4. DENSE_RANK also assigns the same rank to ties but does not skip numbers — the next row after two tied rank-2 rows receives rank 3. For sales leaderboards where multiple representatives might hit the same revenue, DENSE_RANK gives cleaner results because rank numbers stay consecutive.

How do I calculate Indian fiscal year (April–March) groupings in SQL?

Use a CASE WHEN to shift months: when the month is April (4) or later, use the calendar year as the fiscal year; otherwise subtract 1. A January 2026 transaction belongs to FY2025, while an April 2026 transaction belongs to FY2026. Wrap this in a CTE for reuse across multiple aggregations in the same query.

When should I use a CTE instead of a subquery in SQL?

Use a CTE when you need to reference the same intermediate result more than once, when the logic has more than two levels of nesting, or when you want to make the query readable for the next analyst who inherits it. Use a subquery for simple one-off filters where creating a named reference adds no clarity. In PostgreSQL and BigQuery, the query optimiser treats them differently — profile both if performance matters on large datasets.