Click to See Complete Forum and Search --> : Selecting a set of values as a field... or something like that


FeelLikeANut
01-15-2008, 11:48 AM
I'm playing with a program for a school library. Here's a simplified version of my setup.students
-----------------------------
| student_id | student_name |
|---------------------------|
| | |
-----------------------------

books
------------------------
| book_id | book_title |
|----------------------|
| | |
------------------------

students_books
------------------------
| student_id | book_id |
|----------------------|
| | |
------------------------A student can borrow any number of books, and a book can be loaned to any number of students, hence the students_books cross reference table. But now I need to select data to fill this student data structure...student = {
id: ...,
name: ...,
books: [
...,
...
]
}So I need to select the student information (simple enough) and each book they've borrowed. This is what I've tried.SELECT
students.student_id,
student_name,
book_title
FROM
students
LEFT OUTER JOIN students_books ON
students.student_id = students_books.student_id
LEFT OUTER JOIN books ON
students_books.book_id = books.book_id
WHERE
students.student_id = ?But I get a separate result for each book, which actually makes sense from what I know of SQL. Right now I'm thinking I'll need two separate select statements. But is there some clever way to perform a single select to get what I need? Perhaps something with group by or union that I don't know how to use well?

chazzy
01-15-2008, 07:14 PM
it's really a design preference.

some things to consider.

where will the performance bottle neck be? When querying multiple times or adding in the logic to separate it from a single query (hint: the answer that most development frameworks picks the latter as the performance issue)

it's also a question of how to provide the books. Do you need to always have the books or can you hold off and only sometimes need the books?

FeelLikeANut
01-15-2008, 08:07 PM
where will the performance bottle neck be?I'm not sure which of the two options will be a bottleneck. I'll probably run test suites to check for correctness and measure performance, but for now I want to at least have the choice to run the query both ways.

The only way I know how to accomplish this right now is to query the student information, then use the results from the first query to run a second query getting the books for the student, and then combine the information in the main program code.

What I'm trying to work out is a -- maybe better, we'll see -- purely SQL alternative, where I get the results without needing the main program to re-manipulate the data.

FeelLikeANut
01-16-2008, 04:56 PM
chazzy doesn't love me anymore. :p

jk. Though I am wondering if I haven't gotten another response because the problem is a pain or if it's because I didn't make much sense in my last post.

If it's because I wasn't clear, let me know and I'll try again. Thanks!

chazzy
01-16-2008, 08:18 PM
YOur response made 0 sense to me, sorry :-(

FeelLikeANut
01-16-2008, 10:21 PM
That's OK. I figured. So here's what I got. I can doSELECT student_id, student_name
FROM students
WHERE student_id = 123to get$student_result =
array(
array(
'student_id' => 123,
'student_name' => 'John Doe'
)
)and then doSELECT book_id, book_title
FROM books JOIN students_books ON books.student_id = 123to get$books_results =
array(
array(
'book_id' => 975,
'book_title' => 'Some Grand Story'
),
array(
'book_id' => 573,
'book_title' => 'Some Other Story'
)
)and combine the results$student_result[0]['books'] = $books_resultsto getarray(
array(
'student_id' => 123,
'student_name' => 'John Doe',
'books' => array(
array(
'book_id' => 975,
'book_title' => 'Some Grand Story'
),
array(
'book_id' => 573,
'book_title' => 'Some Other Story'
),
)
)
)And what I'm wondering is if I can select the same data -- the student information plus the book records belonging to that student -- in a single select statement rather than programmically combining the results of two selects.

chazzy
01-16-2008, 10:46 PM
I think you missed the hint I gave you in the first post.

Without knowing how complex your software is, I'd like to assume that you have separate DAO's for retrieving data that return objects. My thought would be to let the DAO for students handle students and the DAO for books handle books.

I guess what you're showing me is PHP (it looks like PHP to me anyways). IN java we use terms Lazy and Eager to determine how an object is loaded - lazy means that we only load it when called upon, and eager preloads an object. when you deal with objects with relationships (such as in students and books) then we run into a dilemma - students relate to books on a many to many relationship, and books relate to students in the same manner (regardless to how your app behaves, your data model says this). If we eagerly load all books and all students at run time, we'll run in to an endless loop (load a student, load all of the books for that student, then load all of the students for that book, then the rules say load books for that student... you get the point).

Using 2 queries instead of one won't kill the performance of your application.

FeelLikeANut
01-17-2008, 09:03 AM
I'm not actually using Java, nor am I using the DAO pattern, at least not in the traditional sense. I've embraced some of the UML terminology, making a distinction between actors and objects. I have a Student object, which is just a data container, and I have an Archivist, which is an actor. I can create objects and then pass them to my archivist to file away. And when I ask the archivist for something, it gets all the data it needs to recreate the state of the object. So I have one class that has full access to the database, and the rest of my program stores, and gets stored objects, through that class's API.

So I hope that helps you understand my program a bit better. And now back to the question. I've already written the two queries and combined the results programmically, so now trying to get it down to just one query is more to satisfy my own curiosity than it is about performance. I just want to see if it can be done. But I haven't been able to figure it out, so I need help.

chazzy
01-17-2008, 07:33 PM
I could have sworn you had this at some point earlier in the thread...


select
columns
from
students s
left join
students_books sb
on s.student_id = sb.student_id
inner join
books b
on b.book_id = sb.book_id;

FeelLikeANut
01-18-2008, 03:35 PM
Well I did have that earlier. But this is the kind of result it produces:array(
array(
'student_id' => 123,
'student_name' => 'John Doe',
'book_id' => 975,
'book_title' => 'Some Grand Story'
),
array(
'student_id' => 123,
'student_name' => 'John Doe',
'book_id' => 573,
'book_title' => 'Some Other Story'
)
)The student information is duplicated for each book result.

chazzy
01-19-2008, 08:54 AM
that is correct. if all you do is dump the data from the query in to the array that is what you will get.

SQL is a query language, it helps you find things. If you want to find every student and every book, this is one way, it requires one query. Another approach requires 2 queries. SQL has no concept of "first row"