Duplicate rows can occur for different reasons and there are a few different approaches that can be taken to delete them from a table. However, the state of the duplicate rows will determine what approach can be used to remove them.
I have created a table with stock info in it to use for demoing how each delete method works.
DELETE TOP(X) from a table
One of the most comm occurrence of duplicate rows in a table is when a record was accidentally inserted twice.
In this example ticker XYZ has two rows for 11/3.
This duplicate row can be deleted using the TOP clause.
DELETE TOP(1) FROM dbo.Stockhistory WHERE Ticker = 'XYZ' AND MarketDate = '2022-11-03'
SQL will delete the one row for ‘XYZ’ on Nov 3rd.
The limitation with this approach is you must exactly which stock as the date of the duplicate row.
DELETE with CROSS APPLY
A different approach would be to use cross apply with delete.
Using the same example from above.
DELETE TOP (1)
FROM [H]
FROM dbo.Stockhistory AS [H]
CROSS APPLY
(
SELECT Ticker FROM dbo.Stockhistory AS [B]
WHERE H.Ticker = B.Ticker
GROUP BY Ticker
HAVING COUNT(*) > 1
) AS [A]
This would have the same affect as above. With this method you don’t need to specify the ticker or date.
However, this method will only remove one duplicate row at a time.
DELETE with window functions
The above methods work when you have just one duplicate. A different approach will be needed when there are multiple duplicate records on multiple dates.
For example:
- Ticker XYX has 2 rows on 11/3
- Ticker DEF has 3 rows on 11/4
- Ticker JKM has 4 rows on 11/5
To clear all of these duplicates from the table. You will need to use a window function. You can do this using either the ROW_NUMBER or RANK() function:
DELETE FROM [A]
FROM
(
SELECT ROW_NUMBER() OVER( PARTITION BY Ticker ORDER BY TICKER ) AS [RowInstance], *
FROM dbo.Stockhistory
) AS [A]
WHERE A.[RowInstance] > 1
All the duplicate rows across all dates for all ticker entries have been deleted.