Background:
I have a table with millions of rows which I want to manage better.
On every file update, I insert the new dataset, flip the previous row from active=1 to active=0, and set the new row to active=1.
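In SQL terms, the current flow is roughly this (a sketch only; the table name `datasets` and columns `customer_id`, `file_date`, `payload` are placeholders, not my real schema):

```sql
-- Deactivate the previously active dataset for this customer...
UPDATE datasets
SET active = 0
WHERE customer_id = 123 AND active = 1;

-- ...then insert the freshly received dataset as the active one
INSERT INTO datasets (customer_id, file_date, payload, active)
VALUES (123, '2024-01-31', '...', 1);
```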
Problem is that a file can have issues and be resent, meaning that the same day's dataset can be duplicated in the database (which hasn't been an issue until now).
Current Problem:
We're now extending functionality and trying to select datasets for the last day of the month… BUT, due to the structure, there are many, many duplicates.
Finding and deleting duplicates is a process I would prefer to avoid… So…
My proposed solution:
Create a unique constraint on (customer_id, file_date) (obviously this is a challenge because of all the existing duplicates)
So…
I want to replicate the table structure and then add the unique constraint
Then, select all rows from the old table and use INSERT INTO … SELECT * … ON DUPLICATE KEY UPDATE …
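For MySQL, what I have in mind is something like the following (a sketch with assumed names: `datasets`, an auto-increment PK `id`, and a `payload` column standing in for the real data columns):

```sql
-- Clone the structure, then add the new unique constraint
CREATE TABLE datasets_new LIKE datasets;
ALTER TABLE datasets_new
  ADD UNIQUE KEY uq_customer_file (customer_id, file_date);

-- Copy everything across; rows that collide on (customer_id, file_date)
-- collapse onto one row, with the last row processed winning the update
INSERT INTO datasets_new (id, customer_id, file_date, payload, active)
SELECT id, customer_id, file_date, payload, active
FROM datasets
ON DUPLICATE KEY UPDATE
  payload = VALUES(payload),
  active  = VALUES(active);
```

This only does what I want if the rows arrive at the new table oldest-first, which is exactly what my question below is about.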
Question:
Does a SELECT * return rows in the order they were inserted?
i.e. how can I be certain that only newer rows will update older rows?
OR – is there a better way?