Noshir Contractor Presents at Midwest Private Equity Conference

Socnets 101: The Interconnection of People Through a Network
Recent advances in digital technologies invite consideration of organizing as a process that is accomplished by flexible, adaptive, and ad hoc networks. A central challenge spurred by these developments is that the nature of how we create, maintain, and dissolve our knowledge networks has changed radically. Using examples from his research in a wide range of activities such as disaster response, Communities of Practice at Procter & Gamble, public health, and massively multiplayer online games, Noshir Contractor will present a framework that can be used to help us leverage – Discover, Diagnose, and Design – our 21st century knowledge networks.

“The Midwest Private Equity Conference typically brings together over 150 middle market practitioners to facilitate deal flow. Along with being a forum for networking, the Conference includes updates on regulatory and legislative issues impacting middle market funds and panel discussions on how to make your funds function more effectively with advice on fundraising, management of funds, and deal trends…”

For more information: http://www.nasbic.org/page/MWPEC


SONIC Fudan Collaboration

The Chinese Marketing Research Center at Fudan University and the Science of Networks in Communities (SONIC) Laboratory at Northwestern University have signed a strategic cooperation agreement to promote academic collaboration in research and education. The two research centers will work with Shanda Games to expand the Virtual Worlds Exploratorium (VWE) research to many Chinese online games. Professor Noshir Contractor also received the title of Honorary Professor from Fudan University on September 13th, 2010.


Google, Bing & searching searches

While real news has been busy with important events, recent geek headlines have been dominated by a spectacularly public feud between search megalith Google and Microsoft’s relatively young competitor, Bing.

Of course, competitors are naturally suspicious of one another. Corporate sabotage is as old as corporations themselves. But, according to Google Fellow Amit Singhal, Google grew particularly suspicious of Bing in the summer of 2010.

Early in the summer, someone could Google for “torsorophy” and Google would suggest that the user search for “tarsorrhaphy” instead — the name of a rare eye surgery. Meanwhile, Bing remained incapable of making this correction, and would deliver its users results that matched the literal string “torsorophy.”

That changed later in the summer. Suddenly, a Bing search for “torsorophy” (the misspelled term) began returning Google’s first result for “tarsorrhaphy” (the correctly spelled term) without offering any spelling correction to the user.
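(As an aside, “did you mean” features of this sort are commonly built on edit distance over terms mined from query logs. The Python sketch below is a minimal illustration only: the vocabulary, frequencies, and distance threshold are invented for this example, and Google’s production spelling system is far more sophisticated.)

def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# Hypothetical query-log frequencies: how often each term is actually searched.
VOCABULARY = {"tarsorrhaphy": 900, "tarsal": 5000, "atrophy": 20000}

def suggest(query, max_dist=4):
    """Return the most frequently searched term within max_dist edits, if any."""
    dist, neg_freq, term = min((edit_distance(query, t), -freq, t)
                               for t, freq in VOCABULARY.items())
    return term if dist <= max_dist else None

print(suggest("torsorophy"))  # -> 'tarsorrhaphy'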

From Singhal’s blog:

[Image: Google’s search results for “torsorophy,” showing a spelling suggestion and results associated with the corrected term.]
[Image: Bing’s search results for “torsorophy” (which began appearing after Google’s in late summer 2010), showing results for the correctly spelled term without any spelling suggestion.]

“Torsorophy” is a rare search term. Intuitively, it seems improbable that two independently designed search algorithms would come up with the same answer for such an uncommon query. For Singhal, Bing’s change suggested that Bing was directly copying Google’s search results.

So Singhal decided to set up a sting operation (or, in his words, “an experiment”):

We created about 100 “synthetic queries”—queries that you would never expect a user to type, such as “hiybbprqag.” As a one-time experiment, for each synthetic query we inserted as Google’s top result a unique (real) webpage which had nothing to do with the query.

In this case, [hiybbprqag] returned a seating chart for the Wiltern Theater in Los Angeles. The term “juegosdeben1ogrande” returned a page for hip hop fashion accessories.

[T]here was absolutely no reason for any search engine to return that webpage for that synthetic query. You can think of the synthetic queries with inserted results as the search engine equivalent of marked bills in a bank. […] We asked these engineers to enter the synthetic queries into the search box on the Google home page, and click on the results, i.e., the results we inserted.

Within a couple of weeks, Bing started matching Google’s planted results. Singhal concluded that Bing must be using some means to “send data to Bing on what people search for on Google and the Google search results they click.”
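In code, the logic of the sting is simple. Everything below is hypothetical: the search() helper stands in for issuing a live query and scraping the top result URLs, which is not a real API either company exposes.

import random
import string

def make_synthetic_query(length=10):
    """Generate a nonsense string no real user would ever type."""
    return "".join(random.choices(string.ascii_lowercase, k=length))

# Plant ~100 honeypots: nonsense query -> an unrelated (real) page inserted
# as Google's top result, like a theater seating chart.
honeypots = {make_synthetic_query(): f"https://example.com/page{i}"
             for i in range(100)}

def count_leaks(search):
    """Count how many planted results later surface in a rival's results.

    `search(engine, query)` is a hypothetical helper returning a list of
    result URLs for the given engine.
    """
    leaks = 0
    for query, planted_url in honeypots.items():
        if planted_url in search("bing", query):
            leaks += 1  # a marked bill turned up in the other bank
    return leaks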

The VP of Bing, Harry Shum, quickly fired back a public response:

We use over 1,000 different signals and features in our ranking algorithm. A small piece of that is clickstream data we get from some of our customers, who opt-in to sharing anonymous data as they navigate the web in order to help us improve the experience for all users.

(For the record, my personal research indicates that Bing enables its “opt-in” feature by default. It would be more accurate for Shum to say that Bing learns from customers who fail to opt out of Bing’s clickstream.)

At a recent “Future of Search” event, Shum clarified extemporaneously:

It’s not like we actually copy anything. It’s really about, we learn from the customers — who actually willingly opt-in to share their data with us. Just like Google does. Just like other search engines do. It’s where we actually learn from the customers, from what kind of queries they type — we have query logs — what kind of clicks they do. And let’s not forget that the reason search worked, the reason web worked, is really about collective intelligence.
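To make the clickstream idea concrete, here is a toy sketch of how (query, clicked URL) records from opted-in users could serve as one ranking signal among many. The records and the 0.1 weight are invented for illustration; by Shum’s account, Bing blends over 1,000 such signals.

from collections import defaultdict

# Hypothetical records from opted-in toolbar users: each says that someone
# typed `query` (possibly on Google) and ultimately clicked `url`.
clickstream = [
    ("torsorophy", "https://en.wikipedia.org/wiki/Tarsorrhaphy"),
    ("torsorophy", "https://en.wikipedia.org/wiki/Tarsorrhaphy"),
    ("torsorophy", "https://example.com/unrelated-page"),
]

click_counts = defaultdict(int)
for query, url in clickstream:
    click_counts[(query, url)] += 1

def score(query, url, content_score):
    """Blend a content-based relevance score with the clickstream signal.

    The 0.1 weight is arbitrary; a real engine tunes weights like this
    across hundreds of features.
    """
    return content_score + 0.1 * click_counts[(query, url)]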

The confusing aspect of this row is that neither Google nor Bing seems to be lying. Instead, Google is calling Bing’s practice cheating, while Bing feels that seeing what its customers find on other search engines — and using that data to tailor its own results — is fair game.

[Image: Google’s PageRank analysis of a small network of links. For most search terms, these networks are many orders of magnitude more complex than the one in this diagram.]

So Google and Bing’s feud is a lot more complex than Bing’s copying search results from Google. First, Bing gets its information from users who “opt in” to share the searches they make in Bing’s toolbar — a toolbar that can search numerous search engines, Google included.

Second, Bing didn’t recreate, hack, or steal Google’s algorithm. That would be intellectual property theft. Instead, Bing treated Google’s algorithm the same way any normal user would: as a black box (some input goes in, some output comes out). Bing called upon its users to find this mysterious algorithm’s output, and then used the harvested Google output to inform Bing’s own results.

But Google’s patented algorithm (and the many algorithms that support it, like Google’s spelling correction algorithm) is a really big deal in the search engine world. The PageRank algorithm (pictured above) works by tracking enormous networks of links, then using these data to construct a new network: one of complex probabilities that try to answer the question, “Which page are you probably trying to find?”
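For the curious, the core idea of PageRank (a “random surfer” hopping along links and occasionally teleporting to a random page) fits in a few lines. This is the textbook power-iteration version on a toy graph, not Google’s production implementation:

def pagerank(links, d=0.85, iters=50):
    """Textbook PageRank by power iteration over a dict of out-links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new_rank = {p: (1 - d) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank everywhere
                for p in pages:
                    new_rank[p] += d * rank[page] / n
            else:
                for target in outlinks:
                    new_rank[target] += d * rank[page] / len(outlinks)
        rank = new_rank
    return rank

toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(pagerank(toy_web))  # "c", the most linked-to page, ranks highest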

The tempting analogy here — and the analogy Google would like us to use — is one where a student peeks over at his classmate’s paper during an exam when he doesn’t know an answer. When Bing is sure it has an answer, it may be less likely to look over at Google’s blue book. But when someone searches Bing for something uncommon, like “torsorophy” or “juegosdeben1ogrande,” Bing’s algorithm is capable of looking at Google’s answers and letting those answers inform its own.

The question is not whether Bing copied results from Google. Both sides assert that, in one way or another, Google’s results worked their way into Bing’s. The question is whether this flavor of copying is fair game, or whether it unfairly piggybacks on Google’s hard work.

The cheating analogy expresses a clear opinion on who’s wrong and who’s right in this mess. It frames Bing as the dumb jock cheating off the smart kid’s test (and anyone who cares about this debate enough to read this far is likely to identify with the smart kid). But it doesn’t capture the full subtlety of what exactly has been going on between Google’s search results and Bing’s.

Consider Dogpile. Dogpile is a “meta-search” engine. It compiles the results of several search engines (including Google and Bing), seeing where they agree and aggregating results unique to each engine. Essentially, it searches searches.
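Dogpile’s exact blending recipe is proprietary, but a simple rank-fusion scheme such as a Borda count captures the flavor of searching searches: each engine’s ranking casts votes, and results the engines agree on float upward. A minimal sketch, with invented inputs:

def fuse(rankings, top_k=10):
    """Borda-count fusion: position i in a ranking of n results earns n - i points."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for i, url in enumerate(ranking):
            scores[url] = scores.get(url, 0) + (n - i)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy example: two engines mostly agree on the top results.
google_results = ["u1", "u2", "u3"]
bing_results = ["u2", "u1", "u4"]
print(fuse([google_results, bing_results]))  # u1 and u2 outrank u3 and u4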

And Dogpile doesn’t try to hide its aggregate searching: if you Google for Dogpile, you’ll see:

[Image: Google’s search listing for Dogpile.]

So why has Bing gotten into trouble while Dogpile — the original meta-search engine — has avoided the negative press? Both Dogpile and Bing use Google’s output to inform their final output. And, at the end of the day, Dogpile “cheats” off of Google much more directly than Bing does (Dogpile queries Google directly instead of using its users as an intermediary).

In 2007, Dogpile published a study touting the benefits of searching searches.

Of course, unlike Dogpile, Bing didn’t credit Google as a source in compiling its search results. But let’s pretend that Bing decides to do what Dogpile does. Let’s pretend that, tomorrow, Bing will start crediting the search engines from which it collects data. Let’s say that Bing will continue to combine meta-search data with the numerous other factors it considers, but when it spits out the results, it includes a note about how it effectively meta-searches certain external engines. Would Google’s beef disappear? Would Bing, like Dogpile, be safe from criticism?

Consider this analogy: Bing, like the rest of us, is a Google user. And like the rest of us, Bing doesn’t actually care how Google arrived at its answers. It’s just curious what answers Google can give it. It uses Google’s output as one of many inputs into its own algorithm. Bing’s black box, like Google’s, uses some public tools (unprotected sites, databases, and link repositories) and some private tools (the sum total of its many algorithms) to create search results for its user. The difference is that Bing uses one public tool in creating search results that Google doesn’t use: the results of other search engines.

And Google Search is a public tool, supported by sponsors in the form of advertisements. Anyone can Google a query and receive results, free of charge. Unlike an exam, a Google search is available for everyone to cheat off of — including other search engines. So why should one particular public tool be off-limits to the designers of search engines? If search engines can freely search public sites, indices, and databases, why can’t search engines freely search searches?

Here’s a more appropriate schoolroom analogy: Google and Bing are two students on opposite sides of a classroom, each writing the answers to the same test on opposing chalkboards. While Google is busy tabulating its results in isolation, Bing doesn’t consider its answer complete until it’s turned around to see what Google got.

Some internet users (including this one) may sense sleaziness in Bing’s failure to credit Google for contributing to its end product. But surely it’s Bing’s lack of citation, not its so-called cheating, for which the designers of the search tool are to blame.

After all, Bing’s search algorithm isn’t doing anything different from what your normal Google user does every day: querying an opaque system and using that system’s output to inform decision-making. Should I be crediting Google every time its algorithm is indirectly responsible for my turning a profit? If so, I owe them a solid percentage of my wages — I found SONIC Lab through a Google search.


Discover Text Software Training: Unlock the Power of Text

Friday, February 4, 9-11 a.m.

Location: Frances Searle Building, Room 1.459 (SONIC Conference Room)

A PhD-holding political scientist, Stu Shulman knows the importance of easy-to-use, powerful text-analytic software. As founder of a technology startup (http://texifter.com) and the QDAP labs (http://www.umass.edu/qdap), Stu’s work advances text mining and natural language processing research. His software trainings link these worlds via straightforward, easy-to-understand explanations of software features that can be tailored to all experience levels and project types.

Dr. Stuart W. Shulman is founder & CEO of Texifter, LLC and an Assistant Professor of Political Science at the University of Massachusetts Amherst. He is the founding Director of the Qualitative Data Analysis Program (QDAP) at the University of Pittsburgh and at UMass Amherst, as well as Associate Director of the National Center for Digital Government.


Stuart Shulman SONIC Speaker Series

Stuart Shulman presents “Measuring Validity in Annotation” on Friday, February 4, 2011.

Tools for reviewing, coding, and retrieving text found in qualitative data analysis packages carry with them no particular attributes for ensuring the reliability or validity of the recorded observations. Based on more than 10 years of multidisciplinary experience doing qualitative research, this presentation guides researchers through aspects of coder validity and reliability.

Dr. Stuart W. Shulman is founder & CEO of Texifter, LLC and an Assistant Professor of Political Science at the University of Massachusetts Amherst. He is the founding Director of the Qualitative Data Analysis Program (QDAP) at the University of Pittsburgh and at UMass Amherst, as well as Associate Director of the National Center for Digital Government.


Local Model of Scientific Collaboration in a Developing Nation

A recent region-specific study examined the scientific collaboration of Iran-based physicists across multiple disciplines. After analyzing the network’s basic properties (betweenness, diameter, etc.), the authors compare this local model to the global model of scientific collaboration to measure statistically how a developing region such as the Middle East can contribute to the global scientific process. What becomes striking is how the local model unexpectedly differs from the global one. For example, despite the fact that the diameter of Iran’s physicist network is much smaller (the result of a low diversity of information and disciplines), the physicists are reluctant to collaborate.


Collaboration between SONIC and CFES

On Monday, January 24, 2011, SONIC Lab Director Professor Noshir Contractor and Dr. Jan van Dijk of the Center for e-Government Studies (CFES) signed a memorandum of understanding.

The aim of the collaboration is to combine the research expertise of CFES and SONIC Lab to advance our understanding of the networked government. Examples of networks within the context of government include:

  • Networks of collaborating governmental agencies
  • Communication networks between citizens, businesses, and governments
  • Intermediary networks (roles of intermediaries in the stakeholder/government relationship)
  • Open data networks

Understanding the complexities of the networked government is difficult, and research in this domain is scarce. This shortage of research is magnified by the rise of social media (Web 2.0). We lack theories that explain and anticipate the transformation and impact of the networked government at the individual, organizational, and societal levels. The cooperation between CFES and SONIC will seek to advance our knowledge in this field.

The mission of the SONIC/CFES collaboration is the following:

The SONIC/CFES collaboration combines social network theories, methods, and tools with knowledge from the e-Government domain to understand and meet the needs of the networked government.

The research groups will exchange knowledge, collaborate on funding of projects for (internationally comparative) research, and coauthor publications.

The website for CFES: http://www.utwente.nl/ibr/cfes/
The website for SONIC: http://sonic.northwestern.edu/


ICA and Sunbelt Conference paper acceptances

We have had papers accepted at the International Communication Association (ICA) and INSNA Sunbelt conferences for research related to team assembly and collaboration on Wikipedia articles about breaking news topics.

Keegan, B., Gergle, D., & Contractor, N. (2011). “A Multi-theoretical, Multi-level Model of High Tempo Collaboration in an Online Community.” INSNA Sunbelt XXXI, Tampa, FL.

Keegan, B. (2011). “Breaking News, Breaking Planes, and Breaking Hearts: Psycholinguistics and Sensemaking in Collaborative Accounts of Catastrophe.” International Communication Association, Boston, MA.

My ICA paper was nominated as a Best Student Paper for the Communication and Technology Division.


Gold farming conference acceptances

The VWO gold farming team has had several papers accepted for presentation at upcoming conferences.

Keegan, B., Ahmad, M., Williams, D., Srivastava, J., & Contractor, N. (2011). “Mapping Gold Farming Back to Offline Clandestine Organizations: Methodological, Theoretical, and Ethical Challenges.” Game Behind the Video Game, Rutgers University, New Brunswick, NJ.

Keegan, B., Ahmad, M., Williams, D., Srivastava, J., & Contractor, N. (2011). “Using ERGMs to Map Online Clandestine Behavior to Offline Criminal Activity.” INSNA Sunbelt XXXI, Tampa, FL.

See this and other VWO-related information at the project website.
