These are peer reviews that I have written, with identifying details removed to maintain the confidentiality of the submission process.
Paper [xxx] - [xxx]
Reviewer 2 - Reid Priedhorsky
Overall rating: 5 (scale is 1..5; 5 is best)
Overall Rating
5 (Definite accept: I would argue strongly for accepting this paper.)
Expertise
Knowledgeable
The Review
*** Significance of contribution --
Authors identify the factors that are associated with [xxx] in Wikipedia.
This is significant because it will help develop a better understanding of
the [xxx] of large CALV communities.
*** Relevant previous work --
Looks good. No self-references, kudos. [xxx - citation] may also be
relevant.
*** Validity of work presented --
Analysis seems sound.
*** Originality of work --
Good. Followup work should include a careful review of previous Wikipedia
[xxx] work, and comparisons with [xxx] in other communities.
*** Other comments and suggestions --
* I would be interested in a discussion of the consequences of an [xxx]Bot.
What would be the benefits? These should be more clearly articulated and
justified. What would be the downsides? Would it lead to laziness on the
part of evaluators, who simply trust [xxx]Bot rather than making their own
evaluations? Would it lead to gaming the system? (E.g., members of the U.S.
Congress routinely vote with or against their party on essentially
meaningless procedural votes in order to raise or lower their "party unity"
statistics, depending on political expediencies.)
* [xxx] on column [xxx] of page [xxx] is a confusing forward reference for
those not familiar with Wikipedia [xxx], and the clarification in the first
paragraph of the next column is unclearly written. Suggest "[xxx]".
* I'm not sure "[xxx]" is the right [xxx] reference to put in the title,
because [xxx] is continually growing, so [xxx] seems like a bad metaphor (it
implies a [xxx]). However, I really like the [xxx] reference in the title --
perhaps you can think up a better one. Also consider [xxx] from the movie
[xxx].
* Suggest use of "[xxx]" as shorthand for "[xxx]".
* Strange case of acronym "[xxx]" is consistent with use within Wikipedia,
but I find it jarring and hard to read. Suggest that authors carefully
consider whether the readability benefit of "[xxx]" is worth the
inconsistency with Wikipedia. (More bluntly: I think Wikipedia made a stupid
decision. Follow cautiously.)
* Are there simple [xxx] techniques that could be included to make your
predictor more effective?
* "Any [xxx]" of [xxx] is too strong. How do you know you've captured _all_
[xxx]? Suggest "common [xxx]" or similar language.
* I would like more discussion on how the written Wikipedia guidelines
differ from [xxx].
Paper [xxx] - [xxx]
Reviewer 1 - Reid Priedhorsky
Overall rating: 4 (scale is 1..5; 5 is best)
Overall Rating
4 (Probably accept: I would argue for accepting this paper.)
Contribution
4 (Very Good: A solid new contribution that appeals to a substantial
segment of the [xxx] community.)
Expertise
Expert
Additional Information about Expertise. (hidden from author)
I led the team which published Priedhorsky et al., "Creating, Destroying,
and Restoring Value in Wikipedia", GROUP 2007.
I do not have the expertise to evaluate the statistical methods employed
by the authors.
Summary
This submission identifies inter-editor [xxx] factors that increase article
quality. Specifically, adding more editors to an article increased the
article's quality only if they [xxx], and [xxx] was more effective for
young articles. The authors distinguish between [xxx] and [xxx]. [xxx] was
helpful in many-editor articles, but [xxx] was not.
The Review
Summary:
* This is a solid, well-executed study expressed in a somewhat shaky way.
I believe it should be accepted, but the authors need to make several
revisions.
Strengths:
* This work discovers what kinds of [xxx] work and what kinds do not, which
can help wiki practitioners structure their communities more effectively,
reducing the "magical thinking" that still often surrounds wiki success
among practitioners.
* Authors point out that the simple [xxx] usually employed in [xxx]
approaches is not how wikis work (as one might naively expect); they
aggregate in more complex ways. This may seem obvious but is worth pointing
out.
* The basic structure of the experiment seems sound (with the exception
of the statistics, which I'm not really qualified to comment on).
* That [xxx] correlate well to [xxx] is a small but interesting and useful
result. This should have exposure beyond being buried on Page [xxx].
Weaknesses:
* Layout of article is sloppy. For example, the column tops on Page [xxx]
do not line up, there is a large blank space on Page [xxx], and figures are
on wildly different pages from their first references (and sometimes these
first references are out of order).
* First sentence of Introduction ("[xxx]") nearly duplicates Wikipedia's
[xxx]. Authors need to either quote Wikipedia directly or come up with
something distinct.
* Reg. [xxx] paragraph on [xxx]:
1. Authors use number of edits on discussion page as a proxy for
quantity of [xxx]. This proxy choice needs to be
explained and justified. It may be the best available but I'm a little
skeptical it's a good one.
2. What about automated edits (by "bots")? These happen on discussion
pages too. Authors need to explain and justify how these edits were
considered (even if the answer is "not considered", since that requires
even more justification).
3. (Incidentally, authors may wish to call them "talk pages" rather
than [xxx] to match Wikipedia usage.)
* Reg. [xxx] paragraph on Page [xxx] ("[xxx]"): Authors need to quantify
the distribution of editors on edits. I would expect that most articles
would have only a handful of active editors (i.e., closer to [xxx]).
* Authors consider only _number_ of edits, not size or persistence of
edits.
* How were page views measured? This needs to be explained.
* Authors need to better summarize the Gini coefficient and give a cite.
I still have no idea what it is.
* Clarify how the [xxx] samples relate to the numbers given in
Table [xxx].
* I do not think screenshots are the best way to present article structure
(Figs [xxx]). I suggest the authors rebuild these figures, working from the
screenshots. In particular, it would be useful to indicate somehow what
exactly has changed between the [xxx] parts of Fig [xxx].
* Figure [xxx] is kind of strange. I can understand the appeal of wanting
to show how many edits were happening, but [xxx] is almost totally
obscured. This graph needs to be revised for clarity.
* Table [xxx] is hard to read. Suggestions:
1. Left-align the items in the vertical key column.
2. Exchange the vertical key column with the three descriptive stats
columns. As-is, it's very hard to tell which row label a number in the
matrix corresponds to since one has to follow the row left through three
extra unrelated numbers.
3. Highlight "interesting" numbers in the matrix. What am I looking for
here?
* I had a hard time following Table [xxx]. Perhaps the items called out in
the text should be highlighted somehow.
* Figures [xxx] (which are out of order, BTW) need units on the Y
axis. Isn't this a predicted change in assessment levels?
* Figures [xxx]: Log_2 on the X-axis is fine, but don't make the reader
compute exponents. Label using the plain numbers, not their logs (e.g. 32,
not 5); labeling in base-10-round numbers may be better as well.
* All figures except for [xxx] were very fuzzy in my printout. Were these
(vector) figures converted to bitmaps before inclusion in the article?
Don't do that.
* A good citation regarding Wikipedia vandalism is Priedhorsky et al.,
GROUP 2007. This is more up to date and comprehensive than Viegas.
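To make the axis-labeling suggestion above concrete, here is a minimal sketch (not the authors' plotting code). It assumes ticks are placed at log2 positions and only the labels change; the function names `log2_ticks` and `round_number_ticks` are hypothetical, chosen for illustration.

```python
import math


def log2_ticks(x_min, x_max):
    """Return (position, label) pairs for a log2-scaled axis.

    The position is the exponent (where the tick sits on the log2 axis);
    the label is the plain number the reader sees, e.g. position 5 -> "32".
    """
    if x_min <= 0 or x_max < x_min:
        raise ValueError("need 0 < x_min <= x_max")
    lo = math.ceil(math.log2(x_min))
    hi = math.floor(math.log2(x_max))
    return [(k, str(2 ** k)) for k in range(lo, hi + 1)]


def round_number_ticks(values):
    """Alternative: label base-10-round numbers, placed at their log2 positions."""
    return [(math.log2(v), str(v)) for v in values]


# Hypothetical axis range, for illustration only.
ticks = log2_ticks(1, 64)
round_ticks = round_number_ticks([1, 10, 100])
```

Either list of pairs can be handed to whatever plotting tool is in use as tick positions and tick labels, so the reader sees 32 rather than 5.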
Paper [xxx] - [xxx]
Reviewer 1 - Reid Priedhorsky
Overall rating: 3 (scale is 1..5; 5 is best)
Overall Rating
3 (Borderline: could go either way.)
Expertise
Knowledgeable
Summary and contribution (entered before [xxx], and uneditable thereafter)
This paper contains a brilliant but perhaps obvious idea ([xxx]) contained
within an interesting idea ([xxx]) which seems promising, but I am not
convinced that it is better than alternatives.
This work was validated using a user study. The [xxx] technology is
implemented, while [xxx] is not.
The Review (entered before [xxx], and uneditable thereafter)
Strengths of the paper:
* This paper covers a very interesting and important problem. This
reviewer's web [xxx] will be sadder for the time being because he now
realizes that he really wants elegant, consistent [xxx] (please feel free
to take this term if you like it) of [xxx] of the type proposed in this
paper; i.e., he didn't know what he was missing. I hope strongly that the
authors will continue their work in this area.
* Good use of figures.
* I really like the method of having a human designer build [xxx]
and then using these to guide the design of automated methods.
Questions:
* How was the [xxx] chosen? Why was there just one?
* Were the experiments conducted on a fast network connection? Did this
affect subjects' processing times (i.e., in real-world web browsing,
loading a page can take a while -- was this taken into account)?
Weaknesses:
* I feel that the experiment did not explore the design space well. In
particular, there seem to be any number of dimensions in [xxx]
design, some possibilities being:
[xxx - bullet list removed]
Obviously, no experiment could explore all of the possibilities. But the
paper does not give any sense that a design space with many dimensions
exists nor how the dimensions explored in the experiment were chosen.
* In particular, the [xxx] within the [xxx] seemed excessively brief and
not very useful. In all three of the samples in Figure [xxx], [xxx] text
was redundant with [xxx] text, and in two this resulted in (apparently)
more-useful text being pushed out of the frame.
* I think that in measuring screen real estate, shape needs to be
considered in addition to area. For example, while [xxx] = [xxx],
these rectangles are very differently shaped and affect screen layout
differently. At least, the choice to consider only area needs to be
discussed and justified.
* One thing that [xxx] (particularly the [xxx] triples found in [xxx])
afford is a deeper second reading (i.e., one can then [xxx] and then
[xxx]). [xxx] don't afford this. I wonder if this accounts for users [xxx]
in the [xxx] condition. This should be discussed.
* I worry that automatic extraction of [xxx] is not as reliable as the
authors claim. The works cited in support of this claim ([xxx]) cover
[xxx], which is a related but different problem than [xxx]. The results may
still be comparable but this needs to be clearly stated and justified in
the related work section. As is, I'm not convinced they are.
* Accuracy of [xxx] should be compared to accuracy of [xxx].
* What are the "fundamental differences" between this work and [xxx] noted
on page [xxx]? You need at least a sentence or two here; don't leave this
until later.
* The paper needs to be more closely proofread. In particular the citation
[xxx] on page [xxx] should be [xxx]. And why are odd pages numbered but
even ones not? The last line of Figure [xxx]'s caption is partly missing.
* Figure [xxx] is illegible in my black-n-white printout (done using xpdf).
It shows up with a black background, which makes the black text hard to
read. :)
* The writing seemed a little loose. I think you could tighten it to [xxx]
pages without any substantive cuts, and this would make it a stronger
paper.
* When giving pixel dimensions, use a real multiplication symbol rather
than the letter x.
* Careful with the use of the word "significant". I suggest it be used
_only_ when statistical significance is meant. In particular the use in the
[xxx] paragraph of page [xxx] is unwarranted.
Additional review comments entered after [xxx] (the start of rebuttal period)
After reading the rebuttal, I am leaving my assessment at 3. I believe my
review remains fundamentally sound.
In particular, the weaknesses of the paper that I thought were most
problematic, namely:
- [xxx] design space was not explored or even enumerated
- nonutility/redundancy of [xxx] not addressed
- shape vs. area not addressed
were not addressed in the rebuttal.
Paper [xxx] - [xxx]
Reviewer 1 - Reid Priedhorsky
Overall rating: 2 (scale is 1..5; 5 is best)
Contribution Type Specific Rating
2.0 - Disagree
Overall Rating
2.0 - Possibly Reject: The submission is weak and probably shouldn't be
accepted, but there is some chance it should get in.
Expertise
Passing Knowledge
Contribution to [xxx]
This paper presents an analysis of the social structure of a group of
players (a "[xxx]") in a massively multiplayer online role-playing game,
specifically [xxx].
The Review
SUMMARY
This paper explores a new and very interesting social network -- that of
the MMORPG [xxx]. However, the methodology is weak, and as such I believe
publication at [xxx] to be premature.
I strongly encourage the authors to continue this research. It's an
interesting area, and there's good work here; it's just not ready.
STRENGTHS
- Interesting community to study. It seems qualitatively different from
both offline communities and other (non-game) online communities.
- Addressing this community through the lens of social networks is an
interesting and valuable approach, because it allows comparison with other
social networks.
WEAKNESSES
- The major flaw is that the study is too small. The authors study only one
[xxx]; that is just not enough.
- No comparisons drawn with other online social networks (e.g. Facebook,
Wikipedia); there is robust literature in this area and the authors need to
show how their work compares.
- How were the [xxx] hours of observation done? How were they distributed?
How many different people did them?
- What are the details about what [xxx] members knew about the experiment?
- It is unclear how the [xxx] coder contributed to the interaction coding
process.
- It is unclear how the directionality of links is computed. At first
glance, it seems that any interaction from A to B builds a link from A to B,
implying that a conversation would build links from both A to B and B to A.
However, this seems inconsistent with the results since I would expect it to
lead to most or nearly all links being symmetric. Is a link built only if
conversation is _initiated_?
- The paper ends abruptly in the middle of a sentence.
Areas for Improvement
This box focuses on presentation issues; substantive weaknesses are
explained above.
Overall, this paper needs to have its writing polished and some other
presentation issues corrected before it is up to the level generally found
in published [xxx] work. These criticisms are orthogonal to the substantive
criticisms above and the overall score I gave.
I will give some examples of potential improvements, but this list is not
exhaustive.
- Spacing and capitalization in the references are often incorrect.
- The word "very" is used in a few places when it is not needed.
- References to 1st and 2nd studies appear before the two studies are
explained.
- Too much precision is consistently reported. I'd buy 3 significant figures
in the reported numbers, but definitely not 5.
- "[xxx]" is inconsistently capitalized.
- Some apostrophes in the chat examples are backwards.
- Give the players aliases which are less generic than [xxx], [xxx], etc.;
this will make the chats easier to read.
- Too many brackets in chat transcripts; e.g., grammar corrections and
inserting missing pronouns are not necessary unless the text is truly
opaque.
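Returning to the link-directionality question raised under WEAKNESSES, the two readings can be compared with a small sketch. This is not the authors' coding scheme: the function names, the toy chat log, and the definition of "initiated" (the first message ever exchanged between a pair) are all assumptions made for illustration.

```python
def build_links(messages, initiated_only=False):
    """Build directed links from chronological (sender, recipient) messages.

    initiated_only=False: every message from A to B creates a link A -> B.
    initiated_only=True: only the first message between a pair creates a
    link (an assumed, simplified definition of "initiating" a conversation).
    """
    links = set()
    seen_pairs = set()
    for sender, recipient in messages:
        pair = frozenset((sender, recipient))
        if initiated_only and pair in seen_pairs:
            continue  # a reply or later message; no new link under this rule
        seen_pairs.add(pair)
        links.add((sender, recipient))
    return links


def symmetric_fraction(links):
    """Fraction of directed links whose reverse link also exists."""
    if not links:
        return 0.0
    return sum((b, a) in links for a, b in links) / len(links)


# Hypothetical chat log: A opens a conversation with B, C opens one with A.
chat = [("A", "B"), ("B", "A"), ("A", "B"), ("C", "A"), ("A", "C")]

any_interaction = symmetric_fraction(build_links(chat))
initiation_only = symmetric_fraction(build_links(chat, initiated_only=True))
```

On this toy log, the "any interaction" rule makes every link symmetric, while the "initiation only" rule makes none symmetric. That gap is why the paper needs to state which rule was used.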