These are peer reviews that I have written, with identifying details removed to maintain the confidentiality of the submission process.
Paper [xxx] - [xxx]
Reviewer 2 - Reid Priedhorsky
Overall rating: 5 (scale is 1..5; 5 is best)

Overall Rating: 5 (Definite accept: I would argue strongly for accepting this paper.)
Expertise: Knowledgeable

The Review

*** Significance of contribution --

Authors identify the factors that are associated with [xxx] in Wikipedia. This is significant because it will help develop a better understanding of the [xxx] of large CALV communities.

*** Relevant previous work --

Looks good. No self-references, kudos. [xxx - citation] may also be relevant.

*** Validity of work presented --

Analysis seems sound.

*** Originality of work --

Good. Follow-up work should include a careful review of previous Wikipedia [xxx] work, and comparisons with [xxx] in other communities.

*** Other comments and suggestions --

I would be interested in a discussion of the consequences of an [xxx]Bot. What would be the benefits? These should be more clearly articulated and justified. What would be the downsides? Would it lead to laziness on the part of evaluators, who simply trust [xxx]Bot rather than making their own evaluations? Would it lead to gaming the system (e.g., members of the American Congress routinely vote with or against their party on essentially meaningless procedural votes in order to raise or lower their "party unity" statistics, depending on political expediency)?

[xxx] on column [xxx] of page [xxx] is a confusing forward reference for those not familiar with Wikipedia [xxx], and the clarification in the first paragraph of the next column is unclearly written. Suggest "[xxx]".

I'm not sure "[xxx]" is the right [xxx] reference to put in the title, because [xxx] is continually growing, so [xxx] seems like a bad metaphor (it implies a [xxx]). However, I really like the [xxx] reference in the title -- perhaps you can think up a better one. Also consider [xxx] from the movie [xxx].

Suggest use of "[xxx]" as shorthand for "[xxx]".

The strange letter casing of the acronym "[xxx]" is consistent with usage within Wikipedia, but I find it jarring and hard to read. Suggest that the authors carefully consider whether the readability benefit of "[xxx]" is worth the inconsistency with Wikipedia. (More bluntly: I think Wikipedia made a stupid decision. Follow cautiously.)

Are there simple [xxx] techniques that could be included to make your predictor more effective?

"Any [xxx]" of [xxx] is too strong. How do you know you've captured _all_ [xxx]? Suggest "common [xxx]" or similar language.

I would like more discussion on how the written Wikipedia guidelines differ from [xxx].
Paper [xxx] - [xxx]
Reviewer 1 - Reid Priedhorsky
Overall rating: 4 (scale is 1..5; 5 is best)

Overall Rating: 4 (Probably accept: I would argue for accepting this paper.)
Contribution: 4 (Very Good: A solid new contribution that appeals to a substantial segment of the [xxx] community.)
Expertise: Expert

Additional Information about Expertise (hidden from author):

I led the team which published Priedhorsky et al., "Creating, Destroying, and Restoring Value in Wikipedia", GROUP 2007. I do not have the expertise to evaluate the statistical methods employed by the authors.

Summary

This submission identifies inter-editor [xxx] factors that increase article quality. Specifically, adding more editors to an article increased the article's quality only if they [xxx], and [xxx] was more effective for young articles. The authors distinguish between [xxx] and [xxx]. [xxx] was helpful in many-editor articles, but [xxx] was not.

The Review

Summary:

* This is a solid, well-executed study expressed in a somewhat shaky way. I believe it should be accepted, but the authors need to make several revisions.

Strengths:

* This work discovers what kinds of [xxx] work and what kinds do not, which can help wiki practitioners structure their communities more effectively, reducing the "magical thinking" that still often surrounds wiki success among practitioners.

* Authors point out that the simple [xxx] usually employed in [xxx] approaches is not how wikis work (as one might naively expect); they aggregate in more complex ways. This may seem obvious but is worth pointing out.

* The basic structure of the experiment seems sound (with the exception of the statistics, which I'm not really qualified to comment on).

* That [xxx] correlate well to [xxx] is a small but interesting and useful result. This should have exposure beyond being buried on Page [xxx].

Weaknesses:

* Layout of the article is sloppy. For example, the column tops on Page [xxx] do not line up, there is a large blank space on Page [xxx], and figures are on wildly different pages from their first references (and sometimes these first references are out of order).

* First sentence of the Introduction ("[xxx]") nearly duplicates Wikipedia's [xxx]. Authors need to either quote Wikipedia directly or come up with something distinct.

* Regarding the [xxx] paragraph on [xxx]:
  1. Authors use the number of edits on the discussion page as a proxy for quantity of [xxx]. This proxy choice needs to be explained and justified. It may be the best available, but I'm a little skeptical it's a good one.
  2. What about automated edits (by "bots")? These happen on discussion pages too. Authors need to explain and justify how these edits were considered (even if the answer is "not considered", since that requires even more justification).
  3. (Incidentally, authors may wish to call them "talk pages" rather than [xxx] to match Wikipedia usage.)

* Regarding the [xxx] paragraph on Page [xxx] ("[xxx]"): Authors need to quantify the distribution of editors on edits. I would expect that most articles would have only a handful of active editors (i.e., closer to [xxx]).

* Authors consider only the _number_ of edits, not the size or persistence of edits.

* How were page views measured? This needs to be explained.

* Authors need to better summarize the Gini coefficient and give a cite. I still have no idea what it is. (See the sketch at the end of this review.)

* Clarify how the [xxx] samples relate to the numbers given in Table [xxx].

* I do not think screenshots are the best way to present article structure (Figs. [xxx]).
  I suggest the authors rebuild these figures, working from the screenshots. In particular, it would be useful to indicate somehow what exactly has changed between the [xxx] parts of Fig. [xxx].

* Figure [xxx] is kind of strange. I can understand the appeal of wanting to show how many edits were happening, but [xxx] is almost totally obscured. This graph needs to be revised for clarity.

* Table [xxx] is hard to read. Suggestions:
  1. Left-align the items in the vertical key column.
  2. Exchange the vertical key column with the three descriptive stats columns. As-is, it's very hard to tell which row label a number in the matrix corresponds to, since one has to follow the row left through three extra unrelated numbers.
  3. Highlight "interesting" numbers in the matrix. What am I looking for here?

* I had a hard time following Table [xxx]. Perhaps the items called out in the text should be highlighted somehow.

* Figures [xxx] (which are out of order, BTW) need units on the Y axis. Isn't this a predicted change in assessment levels?

* Figures [xxx]: Log_2 on the X axis is fine, but don't make the reader compute exponents. Label using the plain numbers, not their logs (e.g., 32, not 5); labeling in base-10-round numbers may be better as well.

* All figures except for [xxx] were very fuzzy in my printout. Were these (vector) figures converted to bitmaps before inclusion in the article? Don't do that.

* A good citation regarding Wikipedia vandalism is Priedhorsky et al., GROUP 2007. This is more up to date and comprehensive than Viegas.
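For reference: the Gini coefficient mentioned above measures how unequally a quantity is distributed (0 = perfectly even; values approaching 1 = concentrated in one person). A minimal sketch of the standard computation, in Python -- my own illustration with hypothetical edit counts, not code or data from the submission:

    def gini(values):
        """Gini coefficient of a list of nonnegative values,
        e.g., edit counts per editor."""
        xs = sorted(values)
        n = len(xs)
        total = sum(xs)
        if n == 0 or total == 0:
            return 0.0
        # Standard formula over ascending-sorted values with 1-based
        # ranks i: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
        weighted = sum(i * x for i, x in enumerate(xs, start=1))
        return 2.0 * weighted / (n * total) - (n + 1.0) / n

    print(gini([10, 0, 0, 0]))  # 0.75: edits concentrated in one editor
    print(gini([5, 5, 5, 5]))   # 0.0: edits spread perfectly evenly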
Paper [xxx] - [xxx]
Reviewer 1 - Reid Priedhorsky
Overall rating: 3 (scale is 1..5; 5 is best)

Overall Rating: 3 (Borderline: could go either way.)
Expertise: Knowledgeable

Summary and contribution (entered before [xxx], and uneditable thereafter)

This paper contains a brilliant but perhaps obvious idea ([xxx]) contained within an interesting idea ([xxx]) which seems promising, but I am not convinced that it is better than alternatives. This work was validated using a user study. The [xxx] technology is implemented, while [xxx] is not.

The Review (entered before [xxx], and uneditable thereafter)

Strengths of the paper:

* This paper covers a very interesting and important problem. This reviewer's web [xxx] will be sadder for the time being because he now realizes that he really wants elegant, consistent [xxx] (please feel free to take this term if you like it) of [xxx] of the type proposed in this paper; i.e., he didn't know what he was missing. I hope strongly that the authors will continue their work in this area.

* Good use of figures.

* I really like the method of having a human designer build [xxx] and then using these to guide the design of automated methods.

Questions:

* How was the [xxx] chosen? Why was there just one?

* Were the experiments conducted on a fast network connection? Did this affect subjects' processing times (i.e., in real-world web browsing, loading a page can take a while -- was this taken into account)?

Weaknesses:

* I feel that the experiment did not explore the design space well. In particular, there seem to be any number of dimensions in [xxx] design, some possibilities being: [xxx - bullet list removed] Obviously, no experiment could explore all of the possibilities. But the paper does not give any sense that a design space with many dimensions exists, nor how the dimensions explored in the experiment were chosen.

* In particular, the [xxx] within the [xxx] seemed excessively brief and not very useful. In all three of the samples in Figure [xxx], [xxx] text was redundant with [xxx] text, and in two this resulted in (apparently) more-useful text being pushed out of the frame.

* I think that in measuring screen real estate, shape needs to be considered in addition to area. For example, while [xxx] = [xxx], these rectangles are very differently shaped and affect screen layout differently. At least, the choice to consider only area needs to be discussed and justified. (See the hypothetical illustration at the end of this review.)

* One thing that [xxx] (particularly the [xxx] triples found in [xxx]) afford is a deeper second reading (i.e., one can then [xxx] and then [xxx]). [xxx] don't afford this. I wonder if this accounts for users [xxx] in the [xxx] condition. This should be discussed.

* I worry that automatic extraction of [xxx] is not as reliable as the authors claim. The works cited in support of this claim ([xxx]) cover [xxx], which is a related but different problem than [xxx]. The results may still be comparable, but this needs to be clearly stated and justified in the related work section. As is, I'm not convinced they are.

* Accuracy of [xxx] should be compared to accuracy of [xxx].

* What are the "fundamental differences" between this work and [xxx] noted on page [xxx]? You need at least a sentence or two here; don't leave this until later.

* The paper needs to be more closely proofread. In particular, the citation [xxx] on page [xxx] should be [xxx]. And why are odd pages numbered but even ones not? The last line of Figure [xxx]'s caption is partly missing.
* Figure [xxx] is illegible in my black-and-white printout (done using xpdf). It shows up with a black background, which makes the black text hard to read. :)

* The writing seemed a little loose. I think you could tighten it to [xxx] pages without any substantive cuts, and this would make it a stronger paper.

* When giving pixel dimensions, use a real multiplication symbol rather than the letter x.

* Careful with the use of the word "significant". I suggest it be used _only_ when statistical significance is meant. In particular, the use in the [xxx] paragraph of page [xxx] is unwarranted.

Additional review comments entered after [xxx] (the start of the rebuttal period)

After reading the rebuttal, I am leaving my assessment at 3. I believe my review remains fundamentally sound. In particular, the weaknesses of the paper that I thought were most problematic, namely:

- [xxx] design space was not explored or even enumerated
- nonutility/redundancy of [xxx] not addressed
- shape vs. area not addressed

were not addressed in the rebuttal.
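To make the shape-vs.-area point (and the multiplication-symbol point) concrete with hypothetical numbers -- the submission's actual dimensions are redacted above -- here is a one-line LaTeX illustration: equal areas can hide very different shapes, and \times gives the proper symbol.

    % Hypothetical dimensions, for illustration only: equal area,
    % very different shapes and layout impact. Note \times, not "x".
    $400 \times 100 = 200 \times 200 = 40{,}000~\mathrm{px}^2$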
Paper [xxx] - [xxx]
Reviewer 1 - Reid Priedhorsky
Overall rating: 2 (scale is 1..5; 5 is best)

Contribution Type Specific Rating: 2.0 - Disagree
Overall Rating: 2.0 - Possibly Reject: The submission is weak and probably shouldn't be accepted, but there is some chance it should get in.
Expertise: Passing Knowledge

Contribution to [xxx]

This paper presents an analysis of the social structure of a group of players (a "[xxx]") in a massively multiplayer online role-playing game, specifically [xxx].

The Review

SUMMARY

This paper explores a new and very interesting social network -- that of the MMORPG [xxx]. However, the methodology is weak, and as such I believe publication at [xxx] to be premature. I strongly encourage the authors to continue this research. It's an interesting area, and there's good work here; it's just not ready.

STRENGTHS

- Interesting community to study. It seems qualitatively different from both offline communities and other (non-game) online communities.

- Addressing this community through the lens of social networks is an interesting and valuable approach, because it allows comparison with other social networks.

WEAKNESSES

- The major flaw is that the study is too small. The authors study only one [xxx]; that is just not enough.

- No comparisons drawn with other online social networks (e.g., Facebook, Wikipedia); there is a robust literature in this area, and the authors need to show how their work compares.

- How were the [xxx] hours of observation done? How were they distributed? How many different people did them?

- What are the details about what [xxx] members knew about the experiment?

- It is unclear how the [xxx] coder contributed to the interaction coding process.

- It is unclear how the directionality of links is computed. At first glance, it seems that any interaction from A to B builds a link from A to B, implying that a conversation would build links from both A to B and B to A. However, this seems inconsistent with the results, since I would expect it to lead to most or nearly all links being symmetric. Is a link built only if conversation is _initiated_? (See the sketch at the end of this review.)

- The paper ends abruptly in the middle of a sentence.

Areas for Improvement

This box focuses on presentation issues; substantive weaknesses are explained above. Overall, this paper needs to have its writing polished and some other presentation issues corrected before it is up to the level generally found in published [xxx] work. These criticisms are orthogonal to the substantive criticisms above and the overall score I gave. I will give some examples of potential improvements, but this list is not exhaustive.

- Spacing and capitalization in the references are often incorrect.

- The word "very" is used in a few places where it is not needed.

- References to the 1st and 2nd studies appear before the two studies are explained.

- Too much precision is consistently reported. I'd buy 3 significant figures in the reported numbers, but definitely not 5.

- "[xxx]" is inconsistently capitalized.

- Some apostrophes in the chat examples are backwards.

- Give the players aliases which are less generic than [xxx], [xxx], etc.; this will make the chats easier to read.

- Too many brackets in the chat transcripts; e.g., grammar corrections and inserting missing pronouns are not necessary unless the text is truly opaque.
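To make the directionality question concrete, here is a minimal sketch in Python of the two link-building rules I can imagine -- my own illustration with a hypothetical interaction log, not the authors' method:

    # Hypothetical (speaker, addressee) interaction log.
    interactions = [("A", "B"), ("B", "A"), ("A", "C")]

    # Rule 1: every interaction from X to Y adds a directed link X -> Y.
    # A back-and-forth conversation then yields links in both directions,
    # so most links would come out symmetric.
    links_every = {(src, dst) for src, dst in interactions}

    # Rule 2: a link is built only by the conversation's initiator; here,
    # assume the first interaction within a pair marks the initiator.
    links_initiator = set()
    seen_pairs = set()
    for src, dst in interactions:
        pair = frozenset((src, dst))
        if pair not in seen_pairs:
            seen_pairs.add(pair)
            links_initiator.add((src, dst))

    print(links_every)      # includes both A->B and B->A (symmetric)
    print(links_initiator)  # only A->B and A->C (asymmetric)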