A NetLogo model of question-answering
What is it?
This is a question-answering model. It simulates the scenario where a user poses a question to his social network. Will the friends and online social networks available be able to answer his question? In the model, this depends on who the user has access to face-to-face, how knowledgeable they are about the user's topic, the size and diversity of the online social networks, how difficult the question is to answer, and how knowledgeable the user is about the topic. In practice, many other factors would also matter, such as the user's relationship with people in the network and the last time they interacted; these are not modeled here.
How it works
When you hit setup, you get a random scenario. Each scenario has a user (circled) with either a "hard" or "easy" question. Easy questions are well-formed, fairly straightforward queries such as: What is the local DMV's phone number? Hard questions are ill-formed, open-ended queries that tend to require a good bit more reformulation and iteration to solve, such as: What are the latest regulations on enlisting army personnel with GED credentials?
A user has two types of peers in his world: friends who he can talk to face-to-face (shown in the black part of the world) and digital friends who exist in online social networks (shown in the yellow part of the world). Friends in these different networks are treated slightly differently in the model.
Real-life friends have an attribute of knowledge that is supposed to be related to their expertise on the user's topic. When they have high knowledge, they provide a correct answer to the user 85% of the time (when knowledge = 9) or 70% of the time (when knowledge = 8). By only providing an answer part of the time, the model controls for incorrect responses. This does not take into consideration friends' availability, interest in the topic, relationship with the user, etc.
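The reply rule for real-life friends can be sketched roughly as follows. This is an illustrative Python sketch, not the model's NetLogo source. The names `REPLY_PROBABILITY`, `friend_replies`, and `count_friend_replies` are hypothetical, and treating any knowledge score other than 8 or 9 as "no reply" is an assumption, since the description only covers those two values.

```python
import random

# Probabilities stated in the description for high-knowledge friends.
# ASSUMPTION: any other knowledge score yields no reply; the description
# does not say what happens for other scores.
REPLY_PROBABILITY = {9: 0.85, 8: 0.70}


def friend_replies(knowledge, rng=random):
    """Return True if a friend with this knowledge score gives a correct answer."""
    p = REPLY_PROBABILITY.get(knowledge, 0.0)
    return rng.random() < p


def count_friend_replies(knowledge_scores, rng=random):
    """Count how many friends in the list reply on one trial."""
    return sum(friend_replies(k, rng) for k in knowledge_scores)
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which is convenient when comparing network configurations.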
Social networks have an attribute of diversity, but the number of social networks available to the user is also important. When 2 or more social networks exist, the diversity scores are converted into large (projected) social network sizes. When fewer than 2 social networks exist, diversity scores are converted into slightly smaller social network sizes. Ultimately, the number of people projected to be in these combined social networks will determine how many replies the user receives.
Replies from all individuals and networks are tallied, and the sum total is used to determine whether the question actually gets answered. The two types of questions require different numbers of replies to be answered with confidence. Easy questions need two corroborating replies to be answered. (We assume that all replies are on-target.) Hard questions need either 3 or 5 replies, depending on the user's expertise on the topic. The user has an attribute of knowledge too (not depicted visually in the model); when the user is knowledgeable about the topic of his question (knowledge >= 8), only 3 replies are needed to answer it. Otherwise, 5 replies are needed.
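The answering rule above amounts to a small threshold check. The following Python sketch is only an illustration of that rule as described; the function names `replies_needed` and `is_answered` and the string labels "easy" and "hard" are hypothetical and are not taken from the NetLogo source.

```python
def replies_needed(question_type, user_knowledge):
    """Number of replies required before a question counts as answered."""
    if question_type == "easy":
        return 2
    if question_type == "hard":
        # A knowledgeable user (knowledge >= 8) needs fewer replies.
        return 3 if user_knowledge >= 8 else 5
    raise ValueError("question_type must be 'easy' or 'hard'")


def is_answered(question_type, user_knowledge, total_replies):
    """True if the tallied replies meet the threshold for this question."""
    return total_replies >= replies_needed(question_type, user_knowledge)
```

For example, a hard question posed by a user with knowledge 6 and 4 total replies stays unanswered, while the same 4 replies would be enough for a user with knowledge 8.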
On each trial a different random scenario is created and the social networks are polled to provide responses. For each question in a trial, the ultimate outcome is whether the question was, indeed, answered. This is plotted in the histogram: the green bar represents answered questions, the red bar unanswered questions. Since the point of the model is to see which network configurations are best suited for question-answering, many trials are run to see how different networks respond. The relevant variables are saved to a text file, which can be analyzed offline. (This text file is not saved when the model is embedded on a website.)
How to use it
To view different random configurations of the model, hit setup. Setup only generates a new random social network layout.
To see if a question gets answered, hit the run-n-times button. The slider control box under the run button specifies how many trials you want to run. By default this is set to 100 to help reveal patterns across multiple trials (max = 250).
Things to notice
Hard and easy questions are posed with equal probability, approximately alternating between trials. Easy questions get answered 96% of the time. Hard questions also get answered quite frequently: 86% of the time.
There are 2 plots: one shows the total number of replies received per trial; the other shows whether the question was subsequently answered.
Real-life friends tend to supply the most answers in this model. Increasing the max number of friends should therefore also increase the number of answered questions.
Extending the model
This is just a model. Question-answering and social information seeking are much more complicated in practice. Therefore a natural extension of this model would be to include some additional factors: relationship between the user and his real-life friends; last time they interacted; friends' availability; etc. Right now, the average number of replies from friends is 5.2. This seems unusually high, especially when compared to the average number of replies from social networks (2.2). I would expect larger social networks to produce replies more reliably than individual face-to-face contacts alone. This may be an area to work on within the model.
Downloads and source files
View/download model file (as presented here): question-answering-noTextFile.nlogo
View/download model file, with an exportable results file: question-answering.nlogo
Credits and references
Brynn M. Evans, March 19, 2009