Challenge, sharing, optimization and little green men
Date: 19 Aug 2002 20:01:48 -0700
Message-ID: <47684b13.0208191901.4c19892e_at_posting.google.com>
Hi,
Here is an idea about sharing, storing, and organizing a type of data.
I wish we could review it and optimize it for huge volume. Even by breaking everything and using a totaly different approach ! :)
Btw it is not straightforward, (this is probably its most serious problem) so take your time to read the description and to understand what was not explicitly described or explained.
Comments, even not constructives are more than welcome ! :)
You can see the database model at
http://www.siteavie.com/dbpattern/pattern.jpg
Florent Ramiere
This pattern is about sharing, storing, and organizing a type of data.
This pattern is using 5 tables:
one to store the data (DATA)
one to link a data to one or more categories (ie DATAFOLDERDATA)
one to store recursive categories of data (ie DATAFOLDER)
one to the store the shared status of a category (ie
SHAREDDATAFOLDER)
one to cache shared data (ie CACHEDSHAREDDATA)
DATA
A data is linked directly to one or more category. Most of the time the data is linked to only one category. So we have a field in the data table that is used to link data to the category.
(DATA.DATAFOLDER_ID=DATAFOLDER.DATAFOLDER_ID)
We save also in the data table the number of categories this data is linked to.
So if this value is superior to 1, then we make the join betwee datadatafolder and datadatafolder.
CATEGORY
A category (a.k.a. folder) is recursive, it can contain one or more
categories.
We save for each folder these informations:
NBELEMENTS: the number of elements linked to this category
FOLDER_ID: the parent folder id
DEPTH: the actual depth of this category in the whole hierarchy
NBSUBFOLDER: the number of sub categories
MAXSUBDEPTH: maximum sub-depth starting from this category
This datas are stored in order to ease the client side: no data browsing is needed as every information about the hierarchy is already computed.
SHARING
Each table of this pattern is linked to a user that owns the data. Its folders, and its associated datas can be shared to other users.
Most of the time the users will not share their datas, for lack of GUI
;) or simply for lack of usage.
So we need to lower at the maximum the overhead of sharing management.
A folder contains the following field:
ISSHARED
NBSHAREELEMENT
isshared is used to quickly know the status of the folder, shared or
not.
If it is, we make joins. If not: we do not.
Nb share elements will save the overhead of looking through the
sharedatafolder table for the ... number of elements.
When a folder is shared, we fill the SHAREDDATAFOLDER table in order
to link a user to this folder.
At the same time we do cache the elements of this folder in the
CACHEDSHAREDDATA table.
Linking a data to the user requesting the share.
When a folder is shared, we update its ISSHARED field. We update the ISSHARED field of the shared datas too.
SHARING RIGHTS
we save the sharing rights in of the folders in the DATADATAFOLDER table,
here are the fields:
CANADD
CANDELETE
CANUPDATE
CANADDELEMENT
CANUPDATEELEMENT
CANDELETEELEMENT
SHARING NOTIFICATION
When a user wants to share a folder, it has the possibility to specify which event he wants to be notified. the notifications can be triggered on the following events:
add : a sub folder is added delete : a sub folder is deleted update : this folder is updated add element : a data has been added to this folder delete element : a data from this folder has been deleted update element : a data from this folder has been updated connect : a user has now attached to this folder disconnect : a user is no more attached to this folder
For the data in itself, it is simpler, the events are:
udpate : this data has been updated delete : this data has been deleted
The triggered events are stored in a specific table to ease its
retreival.
These datas should have a TTL in order to avoid incrise of its size.
(tables are SHAREEVENT and SHAREEVENTTYPE)
GARBAGE COLLECTION
Garbage collection is needed when a data is shared: Example:
PETER is sharing is files
three friends share PETER's files
PETER unsubscribes the service
PETER's friens cannot access anymore the files.
Same thing for the folders.
So we need to do some "reference counting", when a data is linked we
increment its NBREFERENCE field. When a data is unlinked we decrement
its NBREFERENCE field.
When this fiels is equal to 0. Then nobody is linked to it. And we can
destroy it.
We cannot cascade delete the folders, as DATA may be linked individually.
STATISTICS
We save for each table (if it has some meanings...)
NBACCESS : number of access of this information NBSHAREDACCESS : number of shared access of this information NBUPDATE : number of update of this information LASTACCESS : the last access of the owner on this data LASTSHAREACCESS : the last access of non owner on this data CREATIONDATE : the creation date DELETIONDATE : the deletion date (if flagged as deleted)
These datas are not meant to be reliable, and real time acurate, most
of the time, worker thread will store them, and dump all theses
updates during
the night as batch updates.
The only real overhead I do see is the size of the database, but these
datas
are very usefull for the production/maintenance teams.
USER SUPPORT
Each table has a ISDELETED flag.
When a user deletes something, we set ISDELETED to true, and we forget about this element (and its sub elements if any)
IE we NEVER delete anything from our database when the user requests it. We just flag the datas as deleted.
This is important for the customer support, if a user unsuscribed or deleted important data mistakingly, the support team will be able to recover its datas.
Of course, a Daemon would remove every data deleted marked elements periodically after a period of time. Received on Tue Aug 20 2002 - 05:01:48 CEST