
Reid Priedhorsky – Sample Reviews

These are peer reviews that I have written, with identifying details removed to maintain the confidentiality of the submission process.

Excellent paper
Good paper
Mediocre paper
Bad paper

Excellent paper

Paper  [xxx] - [xxx]
Reviewer 2 - Reid Priedhorsky

Overall rating:  5  (scale is 1..5; 5 is best)

Overall Rating

   5  (Definite accept: I would argue strongly for accepting this paper.)

Expertise

   Knowledgeable

The Review

   *** Significance of contribution --

   Authors identify the factors that are associated with [xxx] in Wikipedia.
   This is significant because it will help develop a better understanding of
   the [xxx] of large CALV communities.

   *** Relevant previous work --

   Looks good. No self-references, kudos.

   [xxx - citation] may also be relevant.

   *** Validity of work presented --

   Analysis seems sound.

   *** Originality of work --

   Good. Followup work should include a careful review of previous Wikipedia
   [xxx] work, and comparisons with [xxx] in other communities.

   *** Other comments and suggestions --

   I would be interested in a discussion of the consequences of an [xxx]Bot.
   What would be the benefits? These should be more clearly articulated and
   justified. What would be the downsides? Would it lead to laziness on the
   part of evaluators, who simply trust [xxx]Bot rather than making their own
   evaluations? Would it lead to gaming the system? (For example, members of
   the U.S. Congress routinely vote with or against their party on essentially
   meaningless procedural votes in order to raise or lower their "party unity"
   statistics, depending on political expediency.)

   [xxx] on column [xxx] of page [xxx] is a confusing forward reference
   for those not familiar with Wikipedia [xxx], and the clarification in
   the first paragraph of the next column is unclearly written. Suggest
   "[xxx]".

   I'm not sure "[xxx]" is the right [xxx] reference to put in the title,
   because [xxx] is continually growing, so [xxx] seems like a bad metaphor
   (it implies a [xxx]). However, I really like the [xxx] reference in the
   title -- perhaps you can think up a better one. Also consider [xxx] from
   the movie [xxx].

   Suggest use of "[xxx]" as shorthand for "[xxx]".

   Strange capitalization of acronym "[xxx]" is consistent with use within Wikipedia,
   but I find it jarring and hard to read. Suggest that authors carefully
   consider whether the readability benefit of "[xxx]" is worth the
   inconsistency with Wikipedia. (More bluntly: I think Wikipedia made a
   stupid decision. Follow cautiously.)

   Are there simple [xxx] techniques that could be included to make your
   predictor more effective?

   "Any [xxx]" of [xxx] is too strong. How do you know you've captured _all_
   [xxx]? Suggest "common [xxx]" or similar language.

   I would like more discussion on how the written Wikipedia guidelines
   differ from [xxx].

Good paper

Paper [xxx] - [xxx]
Reviewer 1 - Reid Priedhorsky

Overall rating:  4  (scale is 1..5; 5 is best)

Overall Rating

   4  (Probably accept: I would argue for accepting this paper.)

Contribution

   4 (Very Good: A solid new contribution that appeals to a substantial
   segment of the [xxx] community.)

Expertise

   Expert

Additional Information about Expertise.  (hidden from author)

   I led the team which published Priedhorsky et al., "Creating, Destroying,
   and Restoring Value in Wikipedia", GROUP 2007.

   I do not have the expertise to evaluate the statistical methods employed
   by the authors.

Summary

   This submission identifies inter-editor [xxx] factors that increase article
   quality. Specifically, adding more editors to an article increased the
   article's quality only if they [xxx], and [xxx] was more effective for
   young articles. The authors distinguish between [xxx] and [xxx]. [xxx] was
   helpful in many-editor articles, but [xxx] was not.

The Review

   Summary:

   * This is a solid, well-executed study expressed in a somewhat shaky way.
   I believe it should be accepted, but the authors need to make several
   revisions.

   Strengths:

   * This work discovers what kinds of [xxx] work and what kinds do not, which
   can help wiki practitioners structure their communities more effectively,
   reducing the "magical thinking" that still often surrounds wiki success
   among practitioners.

   * Authors point out that the simple [xxx] usually employed in [xxx]
   approaches is not how wikis work (as one might naively expect); they
   aggregate in more complex ways. This may seem obvious but is worth pointing
   out.

   * The basic structure of the experiment seems sound (with the exception
   of the statistics, which I'm not really qualified to comment on).

   * That [xxx] correlate well with [xxx] is a small but interesting and useful
   result. This should have exposure beyond being buried on Page [xxx].

   Weaknesses:

   * Layout of article is sloppy. For example, the column tops on Page [xxx]
   do not line up, there is a large blank space on Page [xxx], and figures are
   on wildly different pages from their first references (and sometimes these
   first references are out of order).

   * First sentence of Introduction ("[xxx]") nearly duplicates Wikipedia's
   [xxx]. Authors need to either quote Wikipedia directly or come up with
   something distinct.

   * Reg. [xxx] paragraph on [xxx]:
     1. Authors use number of edits on discussion page as a proxy for
   quantity of [xxx]. This proxy choice needs to be
   explained and justified. It may be the best available but I'm a little
   skeptical it's a good one.
     2. What about automated edits (by "bots")? These happen on discussion
   pages too. Authors need to explain and justify how these edits were
   considered (even if the answer is "not considered", since that requires
   even more justification).
     3. (Incidentally, authors may wish to call them "talk pages" rather
   than [xxx] to match Wikipedia usage.)

   * Reg. [xxx] paragraph on Page [xxx] ("[xxx]"): Authors need to quantify
   the distribution of editors on edits. I would expect that most articles
   would have only a handful of active editors (i.e., closer to [xxx]).

   * Authors consider only _number_ of edits, not size or persistence of
   edits.

   * How were page views measured? This needs to be explained.

   * Authors need to better summarize the Gini coefficient and cite a source.
   I still have no idea what it is.

   * Clarify how the [xxx] samples relate to the numbers given in
   Table [xxx].

   * I do not think screenshots are the best way to present article structure
   (Figs [xxx]). I suggest the authors rebuild these figures, working from the
   screenshots. In particular, it would be useful to indicate somehow what
   exactly has changed between the [xxx] parts of Fig [xxx].

   * Figure [xxx] is kind of strange. I can understand the appeal of wanting
   to show how many edits were happening, but [xxx] is almost totally
   obscured. This graph needs to be revised for clarity.

   * Table [xxx] is hard to read. Suggestions:
     1. Left-align the items in the vertical key column.
     2. Exchange the vertical key column with the three descriptive stats
   columns. As-is, it's very hard to tell which row label a number in the
   matrix corresponds to since one has to follow the row left through three
   extra unrelated numbers.
     3. Highlight "interesting" numbers in the matrix. What am I looking for
   here?

   * I had a hard time following Table [xxx]. Perhaps the items called out in
   the text should be highlighted somehow.

   * Figures [xxx] (which are out of order, BTW) need units on the Y
   axis. Isn't this a predicted change in assessment levels?

   * Figures [xxx]: Log_2 on the X-axis is fine, but don't make the reader
   compute exponents. Label using the plain numbers, not their logs (e.g. 32,
   not 5); labeling in base-10-round numbers may be better as well.

   * All figures except for [xxx] were very fuzzy in my printout. Were these
   (vector) figures converted to bitmaps before inclusion in the article?
   Don't do that.

   * A good citation regarding Wikipedia vandalism is Priedhorsky et al.,
   GROUP 2007. This is more up to date and comprehensive than Viegas.
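
   As an illustration of the axis-labeling suggestion above (plain numbers
   on a Log_2 axis, e.g. 32 rather than 5), here is a minimal sketch in
   Python. The function name log2_tick_labels and the choice of one tick per
   power of two are assumptions made for illustration; nothing here is taken
   from the submission.

```python
import math


def log2_tick_labels(max_value):
    """Return (position, label) pairs for an axis plotted on a log2 scale.

    position is the exponent where the tick is drawn; label is the plain
    number the reader should see, so nobody has to compute 2**position.
    Hypothetical helper: one tick per power of two, from 1 up to max_value.
    """
    top = int(math.floor(math.log2(max_value)))
    return [(k, str(2 ** k)) for k in range(top + 1)]


# Example: ticks for data whose largest X value is 64.
for position, label in log2_tick_labels(64):
    print(position, label)
```

   The positions stay in log space, so the plot itself is unchanged; only
   the text printed at each tick differs.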

Mediocre paper

Paper  [xxx] - [xxx]
Reviewer 1 - Reid Priedhorsky

Overall rating:  3  (scale is 1..5; 5 is best)

Overall Rating

   3  (Borderline: could go either way.)

Expertise

   Knowledgeable

Summary and contribution (entered before [xxx], and uneditable thereafter)

   This paper contains a brilliant but perhaps obvious idea ([xxx]) contained
   within an interesting idea ([xxx]) which seems promising, but I am not
   convinced that it is better than alternatives.

   This work was validated using a user study. The [xxx] technology is
   implemented, while [xxx] is not.

The Review (entered before [xxx], and uneditable thereafter)

   Strengths of the paper:

   * This paper covers a very interesting and important problem. This
   reviewer's web [xxx] will be sadder for the time being because he now
   realizes that he really wants elegant, consistent [xxx] (please feel free
   to take this term if you like it) of [xxx] of the type proposed in this
   paper; i.e., he didn't know what he was missing. I hope strongly that the
   authors will continue their work in this area.

   * Good use of figures.

   * I really like the method of having a human designer build [xxx]
   and then using these to guide the design of automated methods.

   Questions:

   * How was the [xxx] chosen? Why was there just one?

   * Were the experiments conducted on a fast network connection? Did this
   affect subjects' processing times (i.e., in real-world web browsing,
   loading a page can take a while -- was this taken into account)?

   Weaknesses:

   * I feel that the experiment did not explore the design space well. In
   particular, there seem to be any number of dimensions in [xxx]
   design, some possibilities being:

     [xxx - bullet list removed]

   Obviously, no experiment could explore all of the possibilities. But the
   paper does not give any sense that a design space with many dimensions
   exists nor how the dimensions explored in the experiment were chosen.

   * In particular, the [xxx] within the [xxx] seemed excessively brief and
   not very useful. In all three of the samples in Figure [xxx], [xxx] text
   was redundant with [xxx] text, and in two this resulted in (apparently)
   more-useful text being pushed out of the frame.

   * I think that in measuring screen real estate, shape needs to be
   considered in addition to area. For example, while [xxx] = [xxx],
   these rectangles are very differently shaped and affect screen layout
   differently. At least, the choice to consider only area needs to be
   discussed and justified.

   * One thing that [xxx] (particularly the [xxx] triples found in [xxx])
   afford is a deeper second reading (i.e., one can then [xxx] and then
   [xxx]). [xxx] don't afford this. I wonder if this accounts for users [xxx]
   in the [xxx] condition. This should be discussed.

   * I worry that automatic extraction of [xxx] is not as reliable as the
   authors claim. The works cited in support of this claim ([xxx]) cover
   [xxx], which is a related but different problem than [xxx]. The results may
   still be comparable but this needs to be clearly stated and justified in
   the related work section. As is, I'm not convinced they are.

   * Accuracy of [xxx] should be compared to accuracy of [xxx].

   * What are the "fundamental differences" between this work and [xxx] noted
   on page [xxx]? You need at least a sentence or two here; don't leave this
   until later.

   * The paper needs to be more closely proofread. In particular the citation
   [xxx] on page [xxx] should be [xxx]. And why are odd pages numbered but
   even ones not? The last line of Figure [xxx]'s caption is partly missing.

   * Figure [xxx] is illegible in my black-n-white printout (done using xpdf).
   It shows up with a black background, which makes the black text hard to
   read. :)

   * The writing seemed a little loose. I think you could tighten it to [xxx]
   pages without any substantive cuts, and this would make it a stronger
   paper.

   * When giving pixel dimensions, use a real multiplication symbol rather
   than the letter x.

   * Careful with the use of the word "significant". I suggest it be used
   _only_ when statistical significance is meant. In particular the use in the
   [xxx] paragraph of page [xxx] is unwarranted.

Additional review comments entered after [xxx] (the start of rebuttal period)

   After reading the rebuttal, I am leaving my assessment at 3. I believe my
   review remains fundamentally sound.

   In particular, the weaknesses of the paper that I thought were most
   problematic, namely:

     - [xxx] design space was not explored or even enumerated
     - nonutility/redundancy of [xxx] not addressed
     - shape vs. area not addressed

   were not addressed in the rebuttal.

Bad paper

Paper  [xxx] - [xxx]
Reviewer 1 - Reid Priedhorsky

Overall rating:  2  (scale is 1..5; 5 is best)

Contribution Type Specific Rating

   2.0 - Disagree

Overall Rating

   2.0 - Possibly Reject: The submission is weak and probably shouldn't be
   accepted, but there is some chance it should get in.

Expertise

   Passing Knowledge

Contribution to [xxx]

   This paper presents an analysis of the social structure of a group of
   players (a "[xxx]") in a massively multiplayer online role-playing game,
   specifically [xxx].

The Review

   SUMMARY

   This paper explores a new and very interesting social network -- that of
   the MMORPG [xxx]. However, the methodology is weak, and as such I believe
   publication at [xxx] to be premature.

   I strongly encourage the authors to continue this research. It's an
   interesting area, and there's good work here; it's just not ready.

   STRENGTHS

   - Interesting community to study. It seems qualitatively different from
   both offline communities and other (non-game) online communities.

   - Addressing this community through the lens of social networks is an
   interesting and valuable approach, because it allows comparison with
   other social networks.

   WEAKNESSES

   - The major flaw is that the study is too small. The authors study only
   one [xxx]; that is just not enough.

   - No comparisons drawn with other online social networks (e.g. Facebook,
   Wikipedia); there is a robust literature in this area and the authors need
   to show how their work compares.

   - How were the [xxx] hours of observation done? How were they distributed?
   How many different people did them?

   - What are the details about what [xxx] members knew about the
   experiment?

   - It is unclear how the [xxx] coder contributed to the interaction coding
   process.

   - It is unclear how the directionality of links is computed. At first
   glance, it seems that any interaction from A to B builds a link from A to
   B, implying that a conversation would build links from both A to B and B
   to A. However, this seems inconsistent with the results since I would
   expect it to lead to most or nearly all links being symmetric. Is a link
   built only if conversation is _initiated_?

   - The paper ends abruptly in the middle of a sentence.

Areas for Improvement

   This box focuses on presentation issues; substantive weaknesses are
   explained above.

   Overall, this paper needs to have its writing polished and some other
   presentation issues corrected before it is up to the level generally
   found in published [xxx] work. These criticisms are orthogonal to the
   substantive criticisms above and the overall score I gave. I will give
   some examples of potential improvements, but this list is not exhaustive.

   - Spacing and capitalization in the references is often incorrect.

   - The word "very" is used in a few places when it is not needed.

   - References to 1st and 2nd studies appear before the two studies are
   explained.

   - Too much precision is consistently reported. I'd buy 3 significant
   figures in the reported numbers, but definitely not 5.

   - "[xxx]" is inconsistently capitalized.

   - Some apostrophes in the chat examples are backwards.

   - Give the players aliases which are less generic than [xxx], [xxx],
   etc.; this will make the chats easier to read.

   - Too many brackets in chat transcripts; e.g., grammar corrections and
   inserting missing pronouns are not necessary unless the text is truly
   opaque.
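
   To illustrate the precision point above, here is a minimal sketch of
   rounding reported values to 3 significant figures. round_sig is a
   hypothetical helper, and the sample inputs are made up for illustration
   rather than drawn from the submission.

```python
import math


def round_sig(x, sig=3):
    """Round x to `sig` significant figures (hypothetical helper)."""
    if x == 0:
        return 0.0
    # Position of the leading digit: 0 for 1..9, -1 for 0.1..0.9, and so on.
    exponent = math.floor(math.log10(abs(x)))
    # Keep `sig` digits counting from the leading one.
    return round(x, sig - 1 - exponent)


# Made-up sample values reported with too many digits.
for value in (0.123456, 12345.678):
    print(value, "->", round_sig(value))
```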
