I actually cheated a bit here and used the more extensive data from
the HTML pages, which meant I had to download 495 pages. This only
took a few minutes (I didn't time it exactly).
The Perl script to process the pages and put them in an SQLite db also
took a few minutes.
I agree that this won't scale well: the time (and even space) required
for 6M games might be excessive.
I've now written a Perl script that parses the XML data: http://github.com/barrycarter/bcapps/bl ... 2sqlite.pl
The resulting db is: http://ccgames.db.94y.info/
and includes only one player's games, since I'm waiting for the API
upgrade before downloading data for all games.
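For anyone curious what the parse-and-load step looks like, here's a rough sketch in Python (my actual script is Perl, and the element and attribute names below are invented for illustration; the real XML dump almost certainly differs):

```python
# Hypothetical XML -> SQLite loader. Assumes a layout like
# <games><game id="..." white="..." black="..." result="..."/></games>,
# which is NOT necessarily what the real dump looks like.
import sqlite3
import xml.etree.ElementTree as ET

SAMPLE_XML = """
<games>
  <game id="1" white="alice" black="bob" result="1-0"/>
  <game id="2" white="carol" black="dave" result="0-1"/>
</games>
"""

def load_games(xml_text, db_path=":memory:"):
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS games
                    (id INTEGER PRIMARY KEY,
                     white TEXT, black TEXT, result TEXT)""")
    root = ET.fromstring(xml_text)
    for g in root.iter("game"):
        # INSERT OR REPLACE makes re-running the import idempotent
        conn.execute("INSERT OR REPLACE INTO games VALUES (?, ?, ?, ?)",
                     (int(g.get("id")), g.get("white"),
                      g.get("black"), g.get("result")))
    conn.commit()
    return conn

conn = load_games(SAMPLE_XML)
print(conn.execute("SELECT COUNT(*) FROM games").fetchone()[0])  # -> 2
```

Nothing fancy: one table, one insert per game, commit at the end. For millions of games you'd want to batch inserts inside a transaction and use a streaming parser (e.g. iterparse) instead of loading the whole document.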
I haven't really played with this much. If someone's interested, take a
look at the schema: http://schema.ccgames.db.94y.info/
I need to tweak my script to include log items, but things like start
time aren't in the XML dump, so they can't be in the database.