What is an optimal object-oriented program with an underlying database?
Consider a very simple message board system consisting of a user and amessages table in the database. The user table would look like the following:
+----+-------------+ | id | username | +----+-------------+ | 1 | John Doe | | 2 | Jane Doe | | 3 | Richard Roe | +----+-------------+
and the messages would look like the following:
+----+-------------+---------+--------------------------------------------------------------------------+ | id | in_reply_to | user_id | message | +----+-------------+---------+--------------------------------------------------------------------------+ | 1 | 0 | 2 | What is an optimal object-oriented program with an underlaying database? | | 2 | 0 | 1 | What does cohesion mean? | | 3 | 1 | 3 | Donno, check ostehamster.dk | | 4 | 2 | 1 | Try http://en.wikipedia.org/wiki/Cohesion_%28computer_science%29 | +----+-------------+---------+--------------------------------------------------------------------------+
Now, if we want to show the message with id 1 and all answers, together with the name of the user posting it, it is very simple
SELECT u.username, m.message FROM messages m LEFT JOIN user u ON u.id = m.user_id WHERE m.id = 1 OR m.in_reply_to = 1 ORDER BY m.id ASC;
This results in :
+-------------+--------------------------------------------------------------------------+ | username | message | +-------------+--------------------------------------------------------------------------+ | Jane Doe | What is an optimal object-oriented program with an underlaying database? | | Richard Roe | Donno, check ostehamster.dk | +-------------+--------------------------------------------------------------------------+
Very simple!
Object-oriented
Now, if we want to make this object-oriented with high cohesion, a possibility for inheritance and all the other buzzwords, we need a User and a MessageBoard class.
class User { int user_id; std::string username; public: User(int user_id); std::string get_username(void); int get_user_id(void); private: void load_user(int user_id); }
class MessageBoard { int msg_id; std::string[] messages; public: MessageBoard(int msg_id); std_string print(void); private: void load_messageboard(int msg_id); }
Actually, it might be more correct to have a Message class too. The Message class would handle each unique message, and the MessageBoard would then link a number of Message objects together. However, to make it more simple, we only use the MessageBoard class.
If the User and MessageBoard should have high cohesion, the classes should be wrappers for the underlying tables, and the above SQL query with the LEFT JOIN cannot be used. It should be possible for the User developers to change the database design and the internal functionality of the User class, without the MessageBoard developers needing to make any changes, as long as the User API does not change.
The User class could have a function returning the "LEFT JOIN" statement, which could then be put into the query made by the MessageBoard class. But in my opinion this will result in lower cohesion, because users and messages could be stored in different databases, hosts and even different formats.
If the tables can not be joined in any way, the print function could be implemented like:
void print() { conn = mysql_init(NULL); if (!mysql_real_connect(conn, server, user, password, database, 0, NULL, 0)) { mysql_query(conn, "SELECT user_id, message FROM messages m WHERE m.id = 1 OR m.in_reply_to = 1 ORDER BY m.id ASC;") res = mysql_use_result(conn); while ((row = mysql_fetch_row(res)) != NULL) { User user = new User(atoi(row[1])); printf("User: %s, Message: %s\n", user.get_username(), row[2]); } mysql_free_result(res); } mysql_close(conn); }
However, for each line in the message board this will make a new instance of the User class, which again will make a connection and a query to the database. For a message board it will not be a lot of calls, but in bigger and more complex systems it can be a serious bottleneck.
Caching objects
This could be solved by having a User cache, like this static function:
static User* get_cached_user(int user_id) { static std::vector< User* > cache; vector< User* > ::iterator it; /* Is the user chached? */ for ( it=cache.begin() ; it < cache.end(); it++ ) { if( (*it).get_user_id() == user_id) { return (*it); } } /* Generate and cache user */ cache.push_back(new User(user_id)); return cache.back(); }
This will decrease the number of queries to the database. However it might still result in a lot of calls to the database compared to the "LEFT JOIN" query.
The last option I can think of is to make a static function in the User class called
User*[] get_multiple_user(int[] ids)
The function takes an array of user ids, fetches all user data from the database in one query, and then generates an array of User objects pointers. This caching method will work like the one above, but all users are found using one database call, and thereby it decreases the total number of database queries to two.
But what is the right solution?
Making “backtrace” in MySQL with “recursive” call
Having items mapped in a tree structure is often used in database design, using "child" and "parent" ids or "nodes" and "leaf nodes". Often when presenting data, it is nice to make a "backtrace" by showing all nodes from the leaf to the root of the tree.
In this case I have a system with a lot of users, each of which is mapped to a group. The groups are defined as a tree structure, but because of the roles of the system, the leaf node can never be more than three nodes away from the root. I need to show a list of, say all users not logged in the last 2 months, and the list should show in which groups the users is located.
I could find all users, get the group id from each users and with a recursive function in the application, find the parent group id of the group, get this group from the database, find the parent group id of the group, get this group from the database..... This would require a lot of database activity. It is also possible to make nested sets or stored procedures, but if the depth of the three is relative low (and known), I believe the below is simpler.
The table of groups is defined as:
CREATE TABLE `groups` ( `gid` INT NOT NULL AUTO_INCREMENT PRIMARY KEY , `pgid` INT NOT NULL , `title` VARCHAR( 100 ) NOT NULL ) ENGINE = MYISAM ;
With these groups:
mysql> select * from groups; +-----+------+---------+ | gid | pgid | title | +-----+------+---------+ | 1 | 0 | Level 1 | | 2 | 1 | Level 2 | | 3 | 2 | Level 3 | | 4 | 3 | Level 4 | | 5 | 4 | Level 5 | +-----+------+---------+ 5 rows in set (0.00 sec)
With the below query, it is possible to make the backtrace using joins. Line 6 will find the record holding the parent group of the selected group as the temporary record bt1. Line 7 will then use the parent group id from bt1 to find this group as bt2, and so on.
1 2 3 4 5 6 7 8 9 | SELECT groups.gid, groups.title, bt1.gid AS 'gid-1', bt1.title AS 'title-1', bt2.gid AS 'gid-2', bt2.title AS 'title-2', bt3.gid AS 'gid-3', bt3.title AS 'title-3' FROM groups LEFT JOIN groups AS bt1 ON bt1.gid = groups.pgid LEFT JOIN groups AS bt2 ON bt2.gid = bt1.pgid LEFT JOIN groups AS bt3 ON bt3.gid = bt2.pgid WHERE groups.gid IN (1,2,3,4,5); |
The result is
+-----+---------+-------+---------+-------+---------+-------+---------+ | gid | title | gid-1 | title-1 | gid-2 | title-2 | gid-3 | title-3 | +-----+---------+-------+---------+-------+---------+-------+---------+ | 1 | Level 1 | NULL | NULL | NULL | NULL | NULL | NULL | | 2 | Level 2 | 1 | Level 1 | NULL | NULL | NULL | NULL | | 3 | Level 3 | 2 | Level 2 | 1 | Level 1 | NULL | NULL | | 4 | Level 4 | 3 | Level 3 | 2 | Level 2 | 1 | Level 1 | | 5 | Level 5 | 4 | Level 4 | 3 | Level 3 | 2 | Level 2 | +-----+---------+-------+---------+-------+---------+-------+---------+ 5 rows in set (0.00 sec)
This is not really a recursive call as the title says, but the result is the same, though much faster :)
Search and replace in MySQL
Howto replace foo with bar in the column col_name of all records of table table_name, where col_name has foo in it.
UPDATE table_name
SET col_name = (REPLACE (col_name, 'foo', 'bar'))
WHERE col_name LIKE '%foo%';
