Getting solid data for sports articles

Discussion in 'Online Journalism' started by prhodes, Mar 5, 2012.

  1. prhodes

    prhodes New Member

    Any 'data journalists' out there for sports? I've had a pretty tough time finding good data: downloadable and properly formatted. I know I could pay for it, but I don't have the cash, so I'm looking for free data, and huge amounts of it.

    If you can get a solid amount of data, you can run various statistics on it, make a graph or two, and write something interesting based on what the graph indicates. But it's hard to find good data that is freely available.

    I've used the Baseball Databank before, and it's great. All tables are normalized and can be loaded into a MySQL installation. But this is the rare, rare exception. Most sources are CSV (comma-separated values) tables that are unlinked, and many of those are incomplete. Now, I don't mind transforming the data into something more readily useful; that's part of the job. But often I'm having trouble just finding complete data sets.
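
    For anyone wondering what "transforming" unlinked CSV tables might look like, here is a minimal sketch using only Python's standard library. It loads two CSV tables into an in-memory SQLite database and joins them on a shared key. The table names, column names, and rows are made-up placeholders, not taken from any real data set, and a real download would be read from files rather than inline strings.

```python
import csv
import io
import sqlite3

# Hypothetical stand-ins for two downloaded CSV files.
# Column names and rows are invented purely for illustration.
PLAYERS_CSV = """player_id,name
p1,Player One
p2,Player Two
"""

SALARIES_CSV = """player_id,year,salary
p1,2011,500000
p2,2011,750000
p1,2012,600000
"""


def load_csv(conn, table, text):
    """Create `table` from CSV text (first row is the header) and insert every row."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    # Table and column names come from our own trusted constants here;
    # do not build SQL this way from untrusted input.
    conn.execute(f"CREATE TABLE {table} ({columns})")
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)


def build_db():
    """Load both CSV tables into one in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load_csv(conn, "players", PLAYERS_CSV)
    load_csv(conn, "salaries", SALARIES_CSV)
    return conn


def total_salary_by_player(conn):
    """Link the two tables on player_id and total each player's salary."""
    query = """
        SELECT p.name, SUM(CAST(s.salary AS INTEGER)) AS total
        FROM players AS p
        JOIN salaries AS s ON s.player_id = p.player_id
        GROUP BY p.player_id, p.name
        ORDER BY total DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for name, total in total_salary_by_player(build_db()):
        print(name, total)
```

    SQLite is used here only because it ships with Python; the same join would work against a MySQL installation once the tables are loaded.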

    Any suggestions?
     
  2. buckweaver

    buckweaver Active Member

    What kind of data are you looking for?
     
  3. spikechiquet

    spikechiquet Well-Known Member

    You need data?
    <img src="http://images.wikia.com/memoryalpha/en/images/1/13/Data,_2364.jpg">

    <img src="http://www.cedmagic.com/featured/goonies/data-then.jpg">
     
  4. prhodes

    prhodes New Member

    Anything sports related. For example, there was a free baseball databank that I was able to use for a few different articles (one of which did an in-depth analysis of the Yankees and MLB payrolls). It has tons of great info - salaries, offensive stats, defensive stats - for every player in a MySQL database.

    I'd like to find something similar for all sports - both NCAA and Pro. The NBA would be a great starting place, as well as the NFL. Sometimes you don't know what you are going to write about until you analyze the data and come up with various graphs and charts. When you are surprised at something, often that can make a great topic for an article.
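
    One rough way to look for that kind of surprise is to flag values that sit far from the rest of the group. The sketch below assumes a simple z-score rule with an arbitrary threshold; the function name, the threshold, and the sample figures are all hypothetical and chosen only to show the idea.

```python
from statistics import mean, pstdev


def flag_outliers(values_by_label, threshold=1.5):
    """Return labels whose value is more than `threshold` standard
    deviations from the mean of all values (a simple z-score rule)."""
    values = list(values_by_label.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every value is identical, so nothing stands out.
        return []
    return sorted(
        label
        for label, value in values_by_label.items()
        if abs(value - mu) / sigma > threshold
    )


if __name__ == "__main__":
    # Made-up payroll figures (in millions), not real data.
    payrolls = {
        "Team A": 50,
        "Team B": 55,
        "Team C": 60,
        "Team D": 52,
        "Team E": 200,
    }
    print(flag_outliers(payrolls))
```

    A flagged label is only a starting point for reporting, since a single extreme value can also be a data-entry error.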
     
  5. prhodes

    prhodes New Member

    LOL! I only wish I had this guy around to crunch my numbers for me.
     
  6. buckweaver

    buckweaver Active Member

    I believe most of the baseball data you're going to find is CSV-based (although if you look hard enough, you can find people who have converted some of it to MySQL). Here are the best sources, for my money:

    Lahman Baseball Database: http://www.seanlahman.com/baseball-archive/statistics
    Baseball Databank: http://www.baseball-databank.org/
    Lee Sinins' Complete Encyclopedia: http://www.baseball-encyclopedia.com/

    You can also download plenty of data in CSV form from Baseball-Reference.com. Just look for the CSV links on, well, just about any page.

    Retrosheet.org has a ton of data in its game event files, though if you don't have much experience with their naming conventions or MS-DOS, it can be difficult to learn how to extract it. Here's a primer: http://www.retrosheet.org/datause.txt

    FanGraphs.com has tons of advanced metrics in datasets, all available for download. Just look for the link that says "Extract data" to download the CSV file on whichever leaderboard you want.

    Mike Fast has instructions (older, but still useful) on how to download complete Pitch F/X data into a MySQL database here: http://fastballs.wordpress.com/2007/08/23/how-to-build-a-pitch-database/

    ***

    I'm not much help on other sports (as anyone here could tell you. :D)

    But I do know that all the Sports-Reference sites (for NBA, NFL, NHL) now have a basic Play Index enabled, and you can run a lot of fun queries off that feature. They're not as well-developed (yet) as the Play Index at Baseball-Reference.com, which is Sean Forman's bread and butter, but it's a nice start.

    http://www.hockey-reference.com/play-index/
    http://www.basketball-reference.com/play-index/
    http://www.pro-football-reference.com/play-index/
     
  7. prhodes

    prhodes New Member

    Thanks for the links - greatly appreciated!

    BTW, I found a great source of NCAA Basketball data, just in time for March Madness. It's at http://www.hoopstournament.net. I'm messing around with it right now in hopes of producing an article for my website next week.

    I wonder if anybody has anything they'd like to see?
     