Where to Now?

From US Go Wiki

With the ratings system more or less active, it's worth thinking about where to go now. Brainstorming, random thoughts and discussion here.

  1. Various analyses of the ratings graph structure.
    • Jonathan Bresler: I have lots of ideas regarding this. The first is Find Path: find a shortest path through the ratings graph from one player to another. For example, Yu-Chia Wu (AGA ID 18000) is one of our newest members. From Yu-Chia Wu to Chuck Robbins (AGA ID 4), using only tournaments played during the last twelve months, there is a path of five tournaments, all of them NOVA tournaments.
      • Andy Okun: A fun, but unimportant, variation on this would be automatically calculating each member's Shusaku number, based on a few anchor members' numbers that can be reasonably documented. This would be good because it could be based on tournament play ... a high standard. (Pro games only would be the highest standard, tournament games next, documented games including online play next, games including teaching and simuls next, then any game at all. We could do the tournament one easily.)
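The Find Path idea above is a standard breadth-first search over the ratings graph, with players as nodes and an edge wherever two players met in a tournament. A minimal sketch, using made-up AGA IDs and tournament names for the game records:

```python
from collections import deque

def find_path(edges, start, goal):
    """Breadth-first search for a shortest path between two players.

    edges: iterable of (player_a, player_b, tournament) tuples, one per
    game record. Returns the list of players on a shortest path, or
    None if the two players are not connected.
    """
    adj = {}
    for a, b, _tourney in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back up the parent chain
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical game records: (AGA ID, AGA ID, tournament)
games = [(18000, 501, "NOVA-2010-01"), (501, 777, "NOVA-2010-02"),
         (777, 4, "NOVA-2010-03")]
print(find_path(games, 18000, 4))  # [18000, 501, 777, 4]
```

The same graph, with pro-game or tournament-game edges only, would also serve for the Shusaku-number calculation.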
  2. Online form for ratings submission
      • As anxious as I am for rank certification, this would actually be a really useful thing to do up front. What is involved in this? I know nothing about XML. Is validation tricky?
    • Either require the AGA's XML tournament format, or let program authors provide a validating XML Schema and XSLT transformation instructions.
    • Allows tournaments to be validated prior to submission, and can provide immediate feedback in the form of provisional ratings.
      • Gurujeet Khalsa: Does the EGF already have a form for this that we can adapt? The USCF has a nice feature with an online form where an individual can enter their results and opponents and get back an 'unofficial' rating. That would be a nice feature as well.
      • pwaldron: The form won't be the major issue--data formatting will. Owing to Paul Matthews' poor documentation of the AGA's data format, there are many variants in the wild. I'm not sure if we want to try to support any funny variants, or simply force strictness and let people complain to the authors of the tournament pairing programs.
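As a rough illustration of the kind of pre-submission validation discussed above, here is a sketch that checks a tournament file for well-formedness and required fields. The element and attribute names are hypothetical stand-ins; a real validator would work from the published AGA XML Schema (and would have to decide what to do with the variant formats mentioned above):

```python
import xml.etree.ElementTree as ET

REQUIRED_GAME_ATTRS = {"white", "black", "result"}  # hypothetical schema

def validate_tournament(xml_text):
    """Return a list of validation errors for a tournament file
    (empty list means the file passed these checks)."""
    errors = []
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != "tournament":
        errors.append(f"unexpected root element <{root.tag}>")
    for i, game in enumerate(root.iter("game"), start=1):
        missing = REQUIRED_GAME_ATTRS - set(game.attrib)
        if missing:
            errors.append(f"game {i}: missing {sorted(missing)}")
    return errors

sample = '<tournament><game white="4" black="18000" result="W"/></tournament>'
print(validate_tournament(sample))  # []
```

An online form could run exactly this kind of check and hand the error list straight back to the submitter.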
  3. Rank certificates
    1. Need leadership decisions
      1. Will rank certificates be conservative or given away without much statistical credibility?
      2. Will certificates be a money-making effort, or given away freely?
      3. Delivered electronically (i.e., automatically sent out after each ratings run) or physically (requires clerical support)?
    2. Need to find a program coordinator to take care of integrating ratings side and certificate generation/delivery side. If system runs automatically, this will be a time-limited position.
    3. Need certificate designs
    4. If system is to be run automatically, need a programmer to take care of everything not ratings related
      1. Database queries
      2. Automatic generation of certificate and email out
      3. Requires player email addresses, which means membership database must be moved online
    5. If system is to be run with clerical support, need to find a volunteer
    6. Ratings system to be coded (Phil)
      1. pwaldron: Initial work was done over Christmas 2009. Given my experiences with the AGA's interest in projects waning over time I don't want to put in additional effort until more of the issues on this list are addressed.
    • Andy Okun: What is the first thing you need from us on this?
    • Steve C: changed the order and enumerated the list into what I think is the correct order
  4. Evaluating online tournament games
    • Need volunteer to compile data from existing sources before deciding what else is required.
      • Gurujeet Khalsa: I think there are real benefits to rating some online games (and not just tournaments). For the rating system, more games mean less uncertainty in the rating, and rapidly improving players can see their rating keep pace rather than showing up at a tournament several stones stronger than their rating. Down here at the 8k level that is a big deal. It would be bad if online play further eroded tournament attendance, so I'd favor something like a required ratio of rated online games to tournament games. We also need to do something to ensure that online games are played against a variety of players. The EGF rating system assigns weights to different levels of tournaments, with full weight given to serious tournaments with a decent time limit, less weight to faster time limits, and less weight still to club matches. We could do something similar with online games, rating them but not giving them as much weight as tournament games.
      • pwaldron: The biggest question is whether games played online are statistically the same as games played face-to-face. Based on a limited sample size, it looks like upsets happen online far more often than they should, even in 'serious' tournaments.
      • Gurujeet Khalsa: It's an interesting question. I don't play on KGS much, and sometimes when I have, I've played less seriously than in a tournament (e.g. trying a new attack just to see what happens, or just not concentrating as much). For whatever reason, it is my AGA rating that I take seriously. However, if I were playing an AGA-rated game online, with a tournament time limit of at least 45 min, and knew the other player by at least name and AGA ID as opposed to a semi-anonymous handle, I believe I would treat it just as seriously as a tournament game. I don't know if that would be generally true of others. A weakness of rating systems is that they do best with about 5-10 games per update period (at least those are the estimates I've read). I live in the DC area, which is one of the more active ones for tournaments, and it is difficult to get in that many rated games. If online games were treated as seriously as tournament games, it could substantially increase the volume of rated games and lead to better ratings overall. Even if there is more 'noise' (upsets) in online results, statistically it might still be better to have the higher volume.
      • pwaldron: It's a question that we'll be collecting data for. There are a few 'serious' online AGA tournaments to pull results from and analyse. Online games may turn out to be suitable for rating, they may not be, or they could be partitioned into a second, parallel rating system. Regardless, the first issue is collecting online tournament results where player identities are known and the tournament is 'serious'. There's a policy and philosophical question too: if a player can get rated games sitting in their living room in front of a computer, does it adversely affect attendance at face-to-face events?
        • Andy Okun: The policy question is the one that motivates me about this. Face-to-face events should be the primary concern. They are the different, cool, serious thing we offer. But in the long run, does it affect attendance at face-to-face events if the majority of new players get used to the idea of being "rated" -- and seriously so -- via online servers without ever needing to play face-to-face? I believe it has, though I have no data to back that up. I think the standard we have should be "serious tournament games, mostly face-to-face", but -- unless the data indicate that online games simply aren't appropriate -- we should be pitching AGA ratings as a natural next step or upgrade from the KGS or CyberOro or other online experience.
        • Gurujeet Khalsa: I think we are all agreed that it would be a bad idea if this further eroded tournament play. That's why I'd favor it with a restriction like a limit on the ratio of online to face-to-face games. That might actually encourage some players to get out and play more in tournaments. The quality of the ratings depends not just on the seriousness of the games played, but also on having enough volume to be reasonably predictive. It would be interesting to see a distribution of games played per year by player. I suspect that there will be a large tail of players with few games whose ratings are rather dubious.
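The EGF-style weighting idea discussed above can be sketched as a per-game weight applied to the update step. The weight values, K-factor, category names, and the Elo-style logistic model here are all illustrative, not the AGA's actual algorithm:

```python
# Hypothetical per-category weights, in the spirit of the EGF scheme:
# full weight for slow face-to-face tournament games, reduced weight
# for faster time limits and online games.
WEIGHTS = {"tournament": 1.0, "fast": 0.75, "online": 0.5}

def expected_score(r_player, r_opponent, scale=400.0):
    """Logistic expected score, Elo-style (a stand-in for the AGA model)."""
    return 1.0 / (1.0 + 10.0 ** ((r_opponent - r_player) / scale))

def update(rating, opponent, won, category, k=24.0):
    """Single-game update where the K-factor is scaled by the game weight."""
    w = WEIGHTS[category]
    return rating + k * w * ((1.0 if won else 0.0) - expected_score(rating, opponent))
```

With these numbers a win against an evenly rated opponent moves a rating by 12 points in a tournament but only 6 points online, so online play fills in data between tournaments without dominating the rating.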
  5. Ranking computer programs
    • Easy enough to include them in the ratings system
    • Make sure computer ratings are updated from current human ratings, but that human ratings are never affected by games against computers
    • Should a program's rating carry over to different versions/hardware?
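The one-way constraint above (computers rated from games against humans, but humans unaffected) amounts to a guard in the update loop. A minimal sketch with an illustrative Elo-style update, not the AGA's actual algorithm:

```python
def update_pair(ratings, white, black, white_won, is_computer, k=24.0):
    """Update both ratings after a game, but never let a game against a
    computer move a human's rating. `is_computer` maps player id -> bool.
    Names and the Elo-style model are illustrative only."""
    pre = dict(ratings)  # expectations computed from pre-game ratings

    def expected(a, b):
        return 1.0 / (1.0 + 10.0 ** ((pre[b] - pre[a]) / 400.0))

    for player, opponent, won in ((white, black, white_won),
                                  (black, white, not white_won)):
        # Skip the update only when a human played a computer.
        if is_computer[player] or not is_computer[opponent]:
            ratings[player] += k * (float(won) - expected(player, opponent))
```

Here a bot's rating tracks its results normally, while the human side of each bot game is simply ignored for rating purposes.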
  6. Evaluating alternate rating algorithms?
    • Improving current system
      • Gurujeet Khalsa: I'd like to see both transparency on how sigmas are calculated and possibly a better means of doing so. There is a popular belief that the current system can be gamed somewhat by self-promoting to get a larger sigma. This leads to undesirable behavior and somewhat arbitrary rules like no self-promotions of a single rank.
      • pwaldron: The sigma calculation is fairly straightforward: sigmas are the main diagonal elements of the covariance matrix. Once the system is locked, I'll publish the documentation of the system; the code itself is GPL'd. The rule about self-promotion of a single rank is a consequence of people being overoptimistic about their ratings. The US Open clamped down some time ago after it was discovered that half the players in its 1 dan band were actually rated as 1 kyu, and none of those self-promoted players managed to post a positive score. As far as I know there is no official policy--local TDs have simply followed the lead of the US Open. The self-promotion rules have never been documented, but here are the ones that I have implemented:
        • Playing at a rating difference of less than 1.0 of last published rating: player is seeded according to their last published rating
        • Playing at a rating difference between 1.0-3.0 of last published rating combined with scoring at least one win at new rank: sigma/seed altered to provide for quicker rating response. Note that this also increases the consequences for losing after a self-promotion.
        • Playing at a rating difference greater than 3.0 from last published rating combined with scoring at least one win at new rank: reseed.
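The three rules above can be written down directly. The thresholds come straight from pwaldron's list; the function name and return codes are illustrative, and the fallback for a large self-promotion with no win at the new rank is an assumption (the rules as listed don't cover that case):

```python
def seed_entry(entered_rating, last_rating, won_at_new_rank):
    """Classify a self-promotion per the rules listed above.

    Returns "use_published" (seed at last published rating),
    "widen_sigma" (quicker rating response), or "reseed".
    """
    diff = abs(entered_rating - last_rating)
    if diff < 1.0:
        return "use_published"
    if diff <= 3.0 and won_at_new_rank:
        return "widen_sigma"
    if diff > 3.0 and won_at_new_rank:
        return "reseed"
    # Assumed fallback: a self-promotion with no win at the new rank
    # falls back to the last published rating.
    return "use_published"
```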
    • Glicko
      • Patent free
      • Any meaningful improvement to current system? How to integrate handicap/komi?
      • Gurujeet Khalsa: If we have successfully reimplemented the current system, I'm not sure what advantage Glicko would offer, as it is conceptually quite similar but less complex to implement. Still, it would not be difficult to test it and compare results.
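For comparison purposes, the core Glicko-1 rating-period update is short. This sketch follows Glickman's published formulas and omits the RD inflation step for inactive periods; the example at the end is the worked example from Glickman's paper (a 1500-rated player with RD 200 updates to roughly 1464 with RD about 151):

```python
import math

Q = math.log(10) / 400.0

def _g(rd):
    """Attenuation factor for an opponent's rating uncertainty."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q * Q * rd * rd / math.pi ** 2)

def glicko_update(r, rd, games):
    """One Glicko-1 rating-period update.

    games: list of (opponent_rating, opponent_rd, score) with score 1 or 0.
    Returns the updated (rating, rd).
    """
    d2_inv = 0.0   # accumulates 1/d^2
    delta = 0.0    # accumulates g * (score - expected)
    for rj, rdj, s in games:
        g = _g(rdj)
        e = 1.0 / (1.0 + 10.0 ** (-g * (r - rj) / 400.0))
        d2_inv += Q * Q * g * g * e * (1.0 - e)
        delta += g * (s - e)
    denom = 1.0 / (rd * rd) + d2_inv
    return r + (Q / denom) * delta, math.sqrt(1.0 / denom)

# Worked example from Glickman's Glicko paper
r2, rd2 = glicko_update(1500.0, 200.0,
                        [(1400.0, 30.0, 1), (1550.0, 100.0, 0), (1700.0, 300.0, 0)])
```

Note how the RD drops after a period with games, which is the transparency-of-sigma property raised above: the uncertainty update is an explicit closed-form expression.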
    • Whole History Methods
      • Some interest expressed here by outside parties
    • EGF Rating System
    • Trueskill
      • Patented algorithm, but still potentially an interesting collaboration with Microsoft
      • Gurujeet Khalsa: Thore Graepel, the developer of TrueSkill at Microsoft Research, has expressed an interest in collaborating with the AGA. He is a dan-level go player as well. TrueSkill takes a Glicko-like system and extends it to team play, so that the relative contributions of team members can be separately evaluated. It was used in Age of Empires for handicaps, with the handicap being treated as a virtual team member. He's interested in applying that to handicaps in go, and also in the problem of mapping ratings to kyu/dan ranks. Allan has been asked to arrange official AGA representation with Thore to see if it would be of interest to the AGA.
  7. Version/source control for programming efforts
    • Jonathan Bresler: all of the AGAGD code and scripts are under RCS source control on a server at my house. A CVS or Subversion repository would be much better, accessible to all members of the project.
    • Steve C: Have asked Phil Waldron to investigate Git as an alternative to SVN or CVS. We will install one on usgo.org when a suitable repository is found.
    • pwaldron: I don't think git is suitable. The main issue is that we need a version control system that provides source code access when my computer is offline (usually the case). If my computer has the active development branch then people won't be able to download or sync their working copies. svn on usgo.org would be my choice.
    • Steve Colburn: I will install SVN on the AGA server soon.
    • Steve C: SVN installed on 4/18/2010. Please contact me if you need access to the svn repo.
  8. Need some reliable method of backups
    • Gurujeet Khalsa: What is needed? Software? Methodology? Storage? I can set up a private ftp directory if that would help. Chris Garlock and I exchange e-journal files that way. Not sure what programming tools are in use but I can also get Visual Studio inexpensively and donate it if that would be of use.
    • Steve Colburn: I will consult with Chuck R on his backup policy for the usgo.org server. I believe it is currently replicated between two datacenters. Could something be implemented within the ratings system to back up the current code (.tar/.tar.gz) and provide an MD5 checksum? Set up rsync to an offsite server.
    • Jonathan Bresler: We have a shell script that creates backups and names each backup file with the date and time so that it is unique, for example 2010-03-08-19:51:33.sql.gz. If we add a line to the crontab to fire off the script periodically, then we could use rsync to copy the backup files to an offsite server.
    • Steve C: Chuck replicates all AGA data between two servers. I would feel better if there were a backup of the important ratings data (ratings and AGAGD) to an offsite server where physical media is kept for offsite backup. I wouldn't mind taking on this role.
    • Steve C: Can we create a script to run a backup every two weeks and email me when it completes? I will add it to the cron list and burn a copy for off-site storage.
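A minimal sketch of the backup pieces discussed above: a unique timestamped filename like the one the shell script produces, plus in-memory gzip packaging with an MD5 digest that could be emailed out for verification. The prefix and function names are illustrative; cron and rsync would still handle scheduling and the offsite copy:

```python
import gzip
import hashlib
from datetime import datetime

def backup_name(prefix="ratings", when=None):
    """Build a unique timestamped backup filename, e.g.
    ratings-2010-03-08-19:51:33.sql.gz (the prefix is illustrative)."""
    when = when or datetime.now()
    return f"{prefix}-{when.strftime('%Y-%m-%d-%H:%M:%S')}.sql.gz"

def package(sql_text):
    """Gzip a SQL dump in memory; return (bytes, MD5 hex digest) so the
    digest can be mailed alongside the file for later verification."""
    data = gzip.compress(sql_text.encode("utf-8"))
    return data, hashlib.md5(data).hexdigest()
```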
  9. Release of rating code
    • Code packaged; licensed under GPL v3 or later
    • Still need to create test tournament data to provide complete system for people