Re: Lazy summer project: build a basketball app.

From: <kvnkrkptrck_at_gmail.com>
Date: 30 Jun 2006 08:56:16 -0700
Message-ID: <1151682976.631329.88180_at_p79g2000cwp.googlegroups.com>


Bob Badour wrote:
> kvnkrkptrck_at_gmail.com wrote:
>
> > Bob Badour wrote:
> >
> >>kvnkrkptrck_at_gmail.com wrote:
> >>
> >>
> >>>All, it's been a long time since I've done a from scratch database
> >>>design. What follows is a shameless appeal for anyone with some
> >>>database design expertise, a few extra minutes, and desire to help out
> >>>- to offer any criticisms/advice they see fit. I'll conclude the
> >>>opening paragraph with an apology to those without said expertise, few
> >>>extra minutes, or desire to help; for the time wasted on reading the
> >>>opening paragraph (and a suggestion to read no further/waste no further
> >>>time).
> >>>
> >>>My dad is a huge D3 college basketball fan, and during the season, he
> >>>attempts to keep daily tabs on the entire division (hundreds of teams).
> >>> Last year, he approached me for help. He'd been using dozens of Excel
> >>>sheets to track the games, and was having tremendous difficulties
> >>>keeping up. Data entry was a nightmare - all games needed to be
> >>>modified in two places (once for each team). Due to lack of data
> >>>integrity, his sheets were riddled with errors/typos. And reporting
> >>>was even worse - he spent hours each weekend manually sorting sheets,
> >>>summing functions, etc., in order to answer such simple questions as
> >>>"Which team has the best record", and "Next weekend, are there any
> >>>undefeated teams playing each other".
> >>>
> >>>Anyway, at the time, I helped him out by writing some ugly Excel macros
> >>>for data synchronization (allowing him to enter a score once and have
> >>>it posted in both places) - but it was conditioned on his promise to
> >>>let me design a relational database for the next season.
> >>>
> >>>So here I sit - it's been a long time since I've done a from-scratch
> >>>relational design, but it's a slow afternoon, so I figured I'd start
> >>>some ER planning.
> >>>
> >>>I start with a Team entity, with each team having a division attribute
> >>>(1, 2, or 3), a conference attribute, a college name, a shorthand name
> >>>/ abbreviation, and a city/state (pk will be division, conference,
> >>>college name). I can hand this entity over to my dad right now, and he
> >>>can start populating it. But I'm getting a bit stuck on the next step.
> >>>
> >>>
> >>>At first, I thought, I'll need a simple Game relation. But I think
> >>>this will need to be a super relation of sorts - because certain
> >>>attributes of tournament games are different than non-tournament games
> >>>(e.g. existence of a home/away team; existence of the name of the
> >>>tournament; a tournament round number; etc.).
> >>>
> >>>Another odd characteristic of the Game relation is that I can't come up
> >>>with a decent natural key while retaining symmetry of opponents (team A
> >>>vs. team B == team B vs. team A).
> >>>If I recall correctly, in a game between some team1 and team2, both
> >>>(Game Date, Team1) and (Game Date, Team2) are viable candidates - no
> >>>team will ever play two games on the same day. But I'd have to break
> >>>the symmetry in order to get a natural key (I think).
> >>>
> >>>So, I'm now leaning towards introducing an artificial Game entity with
> >>>a surrogate Game_id, e.g. <Game_id(pk), scheduled date>, and also
> >>>introducing a Opponent relation <Game ID (pk / fk on Game), Team (pk /
> >>>fk on Team), Points scored, ...>. I'd have to enforce some
> >>>application-level constraints (e.g. each game_id must exist exactly two
> >>>times in Opponent (if at all)), but it does keep a nice symmetry.
> >>>
> >>>Anyway, I've got ample time to think this one through - and my thought
> >>>is "if I build this right the first time, I can design some front-ends
> >>>for data loading, some views for data extraction, etc..., and my dad
> >>>might have a nice toy to play with for years to come".
> >>>
> >>>So, having said that - I'm open to any design advice or criticisms
> >>>anyone may have to offer. My best idea right now:
> >>>Team <division, conference, college, a shorthand name, city, state>
> >>>Game <Game_id (pk), Date>
> >>>TourneyGame <Game_id (pk - fk on Game.Game_id), Tournament_name,
> >>>round_number>
> >>>NonTourneyGame <Game_id (pk - fk on Game.Game_id)>
> >>>TourneyOpponent <Game_id (pk - fk on TourneyGame.game_id), Team_fk
> >>>(pk), Points_Scored>
> >>>NonTourneyOppenent <Game_id (pk - fk on NonTourneyGame.game_id),
> >>>Team_fk (pk), Home_Away_Ind, Points_scored>
> >>
> >>What about forfeits? Or does college basketball not have forfeits?
> >>
> >
> > I don't think forfeits are an issue - although games can certainly be
> > cancelled or rescheduled due to inclemental weather (one more argument,
> > I suppose, for avoiding Game_Date as part of a key - if a game between
> > A and B changes from the 1st to the 2nd of Sep, is it a different game?
> > I think not.). I'll ask to be sure.
> >
> >
> >>Do all teams have a home court? If so, when a tournament game is played
> >>at the home court of one of the teams, is that team not a home team? Is
> >>a team ever considered the home team when playing at a different court?
> >>
> >>Does college basketball have anything like a pre-season or exhibition game?
> >>
> >
> > In his spreadsheet, he used "A", "H" or "T" for this column... I don't
> > see a single instance of "AT" or "HT", so my guess is that he'd
> > consider a team as neither home nor away, even if it was playing in a
> > tournament hosted on its own court. I'll ask him, though, if he'd like
> > the ability to specify a team in a tournament as "home" if it is
> > hosting the tournament and away otherwise.
>
> I would not be so hasty to assume the spreadsheet reflects what your
> father wants to enter. It could be he set that up for entry because he
> did not have an easy way to derive it. If you can derive it, you will
> save him having to enter it.
>
>
> >>What are the requirements your father has?
> >
> > The requirements are inferred from the types of facts he included on
> > his spreadsheets from last season. My idea was to build a basic model,
> > asking questions as I go along. But my "high-order" goal is to deliver
> > something that lets him track the same facts he'd tracked last season,
> > but taking advantage of a relational foundation so I can quickly and
> > easily build on it data entry forms, queries, reports, etc.
> >
> > I wion't get into how long it took me to write the stupid error-prone
> > score synchronization Excel macro, but all I could think was, "This
> > should *not* be this hard"...
>
> Perhaps you should question your father on the requirements he tried to
> meet with his spreadsheet before making all kinds of tradeoffs and
> compromises. You might be able to compromise less and deliver more.

In other words, put the horse before the carriage. Check. I'll actually be visiting him this weekend, so I suppose I'll plan to bring a notebook. My current understanding of the situation is that the spreadsheets haved served as a fact base to help him 1) make insightful comments in discussions with other D3 fans (online or other local enthusiasts)
2) identify "interesting" upcoming match-ups (between undefeated teams, or between two conference leaders, etc.) 3) identify upcoming local matchups (which he may go to see in person) 4) monitor teams that he has some "link" to (knows the coach, knows someone who went there, used to coach there, etc.)

Each year, he has made his spreadsheets a little more elaborate than the prior year - but last year was the first time he attempted to track the entire division. I'm not sure if there is more information about games/schools/etc. that he'd care to track, but that will be a big part of what I ask him over the weekend. I know that he obtains most of his schedule information from schools via their websites, about 2 - 4 weeks before the season, in a variety of formats, and enters them in manually (using one spreadsheet per conference)... he is able to eliminate duplicates within a conference, but all interconference games are tracked in two places. These schedules, I believe, either list a specific opponent (and exact date, home/away, etc.), or the name of a tournament (and the start date of the tournament). I think he entered tournament games with a "null" opponent, and then filled in opponents after the fact (as well as inserting the second and third round games).

"Conference games" may just be inferred from the fact that the two teams are in the same conference.... I'll have to ask. I'll also ask if he would like to have special tournament tracking flexibility - at the very least, it's an interesting challenge on its own right to represent a tournament in a way that captures: "Who played who in which round".

Druing the season, I believe there is a website that does a fair job of collecting the scores (d3hoops.com or some such), and he gets the majority of scores by cutting/pasting scores from there into his spreadsheet. As far as I know, the site only posts teams, scores, and overtimes; but I can check on whether more information is available. Once I have a data model in place, I can write a little parse/load script to allow for easier data loading.

Anyway, good pointers, I'll report back when I get some better spec's (based on questions like, "is there any information you'd like to store, but don't"; "how do you want to handle data entry? if by loading a file, how should missing info, conflicting info, duplicate entries, typos, etc be handled". Of course, a lot of what he'll tell me will probably have more to do with front end, but I'll try to get as much feel for the underlying data models and constraints as I can. Received on Fri Jun 30 2006 - 17:56:16 CEST

Original text of this message