/    Sign up×
Community /Pin to ProfileBookmark

Please help with making this SQL Query

Hello!

**In pseudo-SQL it would be like this:**
“GROUP all items in `products_table` by similar `name`. Then, EVERY FIRST element in each GROUP is given the ‘1’ in the field `leader`“.

Here is an example:
**Stage 1**
`
+—-+———+——–+
| id | name | leader |
+—-+———+——–+
| 1 | Foo | 0 |
+—-+———+——–+
| 2 | Baz | 0 |
+—-+———+——–+
| 3 | Bar | 0 |
+—-+———+——–+
| 4 | Baz | 0 |
+—-+———+——–+
| 5 | Foo | 0 |
+—-+———+——–+
| 6 | Bar | 0 |
+—-+———+——–+
| 7 | Baz | 0 |
+—-+———+——–+
| 8 | Bar | 0 |
+—-+———+——–+
| 9 | Bar | 0 |
`

**Stage 2**
`
+—-+———+——–+
| id | name | leader |
+—-+———+——–+

| 1 | Foo | 0 |
+—-+———+——–+
| 5 | Foo | 0 |
+—-+———+——–+

| 2 | Baz | 0 |
+—-+———+——–+
| 4 | Baz | 0 |
+—-+———+——–+
| 7 | Baz | 0 |
+—-+———+——–+

| 3 | Bar | 0 |
+—-+———+——–+
| 6 | Bar | 0 |
+—-+———+——–+
| 8 | Bar | 0 |
+—-+———+——–+
| 9 | Bar | 0 |
`

**Stage 3**
`
`

+—-+———+——–+
| id | name | leader |
+—-+———+——–+
| 1 | Foo | 1 |
+—-+———+——–+
| 5 | Foo | 0 |
+—-+———+——–+

| 2 | Baz | 1 |
+—-+———+——–+
| 4 | Baz | 0 |
+—-+———+——–+
| 7 | Baz | 0 |
+—-+———+——–+

| 3 | Bar | 1 |
+—-+———+——–+
| 6 | Bar | 0 |
+—-+———+——–+
| 8 | Bar | 0 |
+—-+———+——–+
| 9 | Bar | 0 |
`

Is it possible to do this without iterating over each row in the table and finding similar rows by their `name`? There are 1 500 000 rows in there and it will be quite slow. I wish this could be done in no more than about 30 minutes.

Thank you !

to post a comment
PHP

10 Comments(s)

Copy linkTweet thisAlerts:
@ginerjmSep 12.2021 — Didn't we go thru this a couple of weeks ago???
Copy linkTweet thisAlerts:
@weareandreiauthorSep 12.2021 — @ginerjm#1636898 Sorry for repeating the question. I thought that I didn't explain the question well enough last time. And hence the replies were very confusing and in the end, I still didn't find a way to solve this problem.
Copy linkTweet thisAlerts:
@ginerjmSep 12.2021 — Refresh my memory. These 'similar' names - are they IN FACT the same person? Or just 'similar' names?

If you have a person with a multiple ids what does it mean? Doesn't make sense to me.
Copy linkTweet thisAlerts:
@ginerjmSep 12.2021 — You know what - I don't want to re-learn this problem all over again. Good luck.
Copy linkTweet thisAlerts:
@weareandreiauthorSep 12.2021 — The name belongs to the product, not to a person. And a product can have the same name but still different characteristics. For example same shoes, different size. I need a leader because I don't want to see all the shoe sizes as different products. I want to see only the first element of this group, no matter what size.

I know this database design is a little sloppy, but I don't have an opportunity to change it right now.

So think about it like this: I have a database with shoes. All have different sizes but someone made a bad design and now all of this is on the same table. Luckily (I made such an example here), I can define similar shoes by a similar name and nothing else matters. So when the online shop will load its products to show them to the user I don't want it to show 10-20 similar products but just with different sizes. Figuring out who is the 'main shoe row' and who is just the 'size' in the moment of the query would take too long. So I want to make a function that will 'prepare' the database by assigning 1 at leader column where shoes is 'main' and 0 where the shoe is just indication the size (altough the records look similar but again that is just the bad design of the database).

Hope this helps now 😄
Copy linkTweet thisAlerts:
@ginerjmSep 12.2021 — Right - products. So you have a product that has multiple stock items. If the product sizes all have common attributes and the only difference is the size itself it sounds like that info should be in a table of its own so that you can manage your products as a whole, including the leader record. The fact that you have mutliple records and want to distinguish one of a set from all of the others tells me you need to alter your db design.

Think about this. You will have multiple records for shoe A with multiple ids(?) and multiple sizes (hence the multiple records). The product goes thru a name change. That means you have multiple records that have to be edited which is not right. Either only have one place with the product name in it or retain all of the current records and start a whole new set of the current record style with the new name on them.

Without knowing everything my first thought would be with a product change you would create a new id with the new product's attributes on it and then start tracking the new size records and sales using that new id. If necessary you can create a product-links table to match up the old product id with the new product id(s) so that you can have an overall picture of all of these similar products.
Copy linkTweet thisAlerts:
@sibertSep 12.2021 — > @weareandrei#1636902 Hope this helps

Not very much. Please create a SQLFiddle so it is easier to follow. And more clear the expected output.

Like this (You can use this as a template): https://www.db-fiddle.com/f/eaQG8H4yqY9hnQBZjzJgz/44
Copy linkTweet thisAlerts:
@weareandreiauthorSep 12.2021 — @ginerjm#1636903 so it sounds to me that you are telling me there is no way to do this in SQL as simple as I want it? And the only way to make everything work is to change database design?
Copy linkTweet thisAlerts:
@ginerjmSep 12.2021 — Not at all. I am suggesting that you needs a redesign with idea you are discussing. If you want to do it that way though then go ahead, knowing that you may have problems in the future.

You could write a query that picks out the product name duplicates with the lowest id and use that output as a the input to a 2nd query that would use that lowest id to do the 'leader' posting.

Something like

select min(id) as first_id, name group by name where 1

and use that output to build a set of "values()" for the update query.
Copy linkTweet thisAlerts:
@ginerjmSep 13.2021 — Just a thought as I was reviewing this topic: You should really get used to using meaningful (yet short) names for your table column names. Just plain 'id' or 'name' is not sufficient when you are reading code and you come across a field name that gives you no id what it is for. Plus - name is never a good choice since it could be a reserved word. Same with 'date' or 'time' and probably a few others.
×

Success!

Help @weareandrei spread the word by sharing this article on Twitter...

Tweet This
Sign in
Forgot password?
Sign in with TwitchSign in with GithubCreate Account
about: ({
version: 0.1.9 BETA 4.23,
whats_new: community page,
up_next: more Davinci•003 tasks,
coming_soon: events calendar,
social: @webDeveloperHQ
});

legal: ({
terms: of use,
privacy: policy
});
changelog: (
version: 0.1.9,
notes: added community page

version: 0.1.8,
notes: added Davinci•003

version: 0.1.7,
notes: upvote answers to bounties

version: 0.1.6,
notes: article editor refresh
)...
recent_tips: (
tipper: @Yussuf4331,
tipped: article
amount: 1000 SATS,

tipper: @darkwebsites540,
tipped: article
amount: 10 SATS,

tipper: @Samric24,
tipped: article
amount: 1000 SATS,
)...