Hello World

Be Happy!

Greatest N Per Group


> why i get ramdom payment_no when i use max(payment_create)?

> You get random payment_no because of how GROUP BY works in SQL.

What happens:
sql
SELECT user_no, payment_no, MAX(payment_date)
FROM bt_payment
GROUP BY user_no


Step by step:
1. GROUP BY user_no creates groups of rows
2. MAX(payment_date) calculates the maximum date per group
3. payment_no - SQL doesn't know which payment_no to pick from the group

Example data:
user_no | payment_no | payment_date
123     | P001       | 2024-01-01
123     | P002       | 2024-02-01  ← MAX date
123     | P003       | 2024-01-15


After GROUP BY user_no:
• Group has 3 rows for user_no=123
• MAX(payment_date) = 2024-02-01 ✅
• payment_no = ??? (P001, P002, or P003) - SQL picks randomly ❌

Why random?
• SQL knows the MAX date is 2024-02-01
• But SQL doesn't automatically know you want P002 (the payment_no from that
same row)
• SQL just picks any payment_no from the group

The disconnect:
• MAX(payment_date) comes from row with P002
• payment_no might come from row with P001 or P003
• They're from different rows in the same group

This is why you need window functions or correlated subqueries to get the
payment_no that actually corresponds to the MAX date row.


Solve
-- Example data:
-- user_no | payment_no | payment_date
-- 123     | P001       | 2024-01-01
-- 123     | P002       | 2024-02-01  ← We want this row (latest date)
-- 123     | P003       | 2024-01-15

-- ❌ WRONG - This gives random payment_no
SELECT user_no, payment_no, MAX(payment_date)
FROM payment 
WHERE user_no = 123
GROUP BY user_no;
-- Result: user_no=123, payment_no=P001 (random!), MAX(payment_date)=2024-02-01

-- ✅ Solution 1: Window Function (MySQL 8.0+)
SELECT user_no, payment_no, payment_date
FROM (
  SELECT user_no, payment_no, payment_date,
    ROW_NUMBER() OVER (PARTITION BY user_no ORDER BY payment_date DESC) as rn
  FROM payment
  WHERE user_no = 123
) ranked
WHERE rn = 1;
-- Result: user_no=123, payment_no=P002, payment_date=2024-02-01 ✅

-- ✅ Solution 2: Correlated Subquery
SELECT user_no, payment_no, payment_date
FROM payment p1
WHERE user_no = 123
  AND payment_date = (
    SELECT MAX(payment_date)
    FROM payment p2
    WHERE p2.user_no = p1.user_no
  );
-- Result: user_no=123, payment_no=P002, payment_date=2024-02-01 ✅

-- ✅ Solution 3: Self JOIN (your approach fixed)
SELECT p1.user_no, p1.payment_no, p1.payment_date
FROM payment p1
LEFT JOIN payment p2 ON p1.user_no = p2.user_no 
  AND p2.payment_date > p1.payment_date
WHERE p2.payment_no IS NULL
  AND p1.user_no = 123;
-- Result: user_no=123, payment_no=P002, payment_date=2024-02-01 ✅

-- How Solution 3 works step by step:
-- Step 1: p1 gets P001, p2 finds P002,P003 (dates > 2024-01-01) → p2 NOT NULL
-- Step 2: p1 gets P002, p2 finds nothing (no dates > 2024-02-01) → p2 IS NULL ✅
-- Step 3: p1 gets P003, p2 finds P002 (date > 2024-01-15) → p2 NOT NULL
-- Only Step 2 passes the WHERE p2.payment_no IS NULL condition

#sql (1) #greatest-n-per-group (1) #database-design-pattern (1)
List