Tags: albert model, erds, eters, existing network, gin, google, mcsweeney, network generation, plugin components, project proposal, random network, random networks, randomization, regularity lemma, section 3, syr edu, syracuse university, useful tools, watts, world networks,
Random Network Plugin
Google Summer of Code 2008 Project Proposal
Patrick J. McSweeney
pjmcswee@syr.edu
Patrick J. McSweeney
1 Introduction
My name is Patrick J. McSweeney and I am a third year PhD candidate in com-
puter science at Syracuse University. For my Google Summer of Code proposal
I have chosen to expand upon Idea #9 which outlines the construction of a plu-
gin to generate random networks. Random networks are useful tools in analyzing
real world networks. According to the Szemer´di regularity lemma, any property
e
that a random network has implies that almost all networks have said property.
Therefore, by comparing a real world network to random networks we can detect
meaningful properties. The rest of this article is organized as follows: section 2
outlines my extended proposal, section 3 discusses development issues and section
4 discusses my interests and strengths for this project.
2 Plugin Components
The plugin will have two main modes: random network generation and network
randomization. In random network generation mode the plugin will create a ran-
dom network from scratch using user-supplied parameters. In network random-
ization mode the plugin will alter an existing network based on user-supplied pa-
rameters.
2.1 Random Network Generation Mode
The plugin I envision will be capable of constructing random networks based on
three different models: the Erds-R´nyi model, the Watts-Strogatz model, and the
o e
Barab´si-Albert model. The user will be able to select which model to use and
a
based on that choice, will be prompted for other needed parameters.
ˇ Erds-R´nyi model: This random network generator takes two param-
o e
eters: (n, p). Where n is the number of nodes and p is the independent
probability of each edge. A random network constructed using this model
has on average p n(n-1) edges in an undirected network, or pn2 in a di-
2
rected network. The user will provide the (n, p) pair to generate the random
network.
ˇ Watts-Strogatz Model: The Watts-Stogatz model generates random net-
works with higher clustering coefficients. The model takes a parameter
which acts to interpolate between a regular lattice and a Erds-R´nyi model
o e
created network. The procedure randomizes the regular lattice with node-
degree K by changing edges with probability .
ˇ Barab´si-Albert model: This model uses preferential attachment to gen-
a
erate scale-free random network. Initially a core of a few nodes is needed.
The algorithm continuously adds a single node to the network, one at a time.
The probability of a new node u being connected to an existing node v in
a network which currently has L edges is: degree(v) . Thus we have a posi-
2|L|
tive feedback cycle, where nodes that have high degrees get larger degrees
and nodes with small degrees stay small. The user will need to provide the
number of nodes in the network, and possibly some information about the
initial network core.
2.2 Network Randomization
In this mode the user will open an existing network, and select a model for ran-
domizing that network out of several possible operations. Here I suggest some
ways in which we can randomize an existing network.
1. Keep degree for each node the same, but randomize its neighbors. We
can also allow parameter-ization of this by allowing the user to enter in
a probability P . When P = 1 every edge is randomized, when P = 0
nothing is changed.
2. Randomize the existing network, preserving the degree of connectivity in
the original network. This idea is captured by the paper Random Lifts of
Graphs by A. Amit et al. The paper defines how to lift a base graph to a
random graph. The paper also states the theorem: Let G be a connected
simple graph with minimal degree > 3. Then almost all -lifts of G are
-connected.
3. Shuffle edge weights on an existing network.
4. Assign random values to edge weights. The user will be able to specify a
range [a, b] for the random values.
2
2.3 Interface
Currently I am envisioning a simple interface. At a minimum features that the
interface should offer:
1. Importing a network for randomization
2. Exporting SIF file for randomly generated network or result of network ran-
domization
3. Task specific dialog boxes, which will get required parameters from the user
4. In some situations the users may want to re-evaluate a particular random
network. Along with algorithm specific parameters being set by the user
based on mode, users will also be able to supply a random seed used to start
the random number generator used to create the networks. This feature will
also be very useful for testing purposes.
3 Time Line & Development Issues
According to Google's website we have 7 weeks of prep-time and 13 weeks to
implement this project. I have broken down this time into five phases, described
below.
1. Phase I April 14 - May 5
The first several weeks I will spend my time getting familiar with the Cy-
toscape source code. I should comment that during this time I will be still
involved with my responsibilities on campus with my research and a course
that I am teaching. I will only be able to commit limited hours during this
time. I hope to get to know my mentors, and others involved with Cy-
toscape, in these first weeks along with begin delving into Cytoscape source
code.
Tasks:
ˇ Become familiar with mentors and resources
ˇ Step through plugin tutorials
2. Phase II May 5 - May 26
By May 5th I will be finished with Spring semester responsibilities and will
be able to commit more time to the project. I view this time as the most
crucial period of the project timeline. A solid understanding of Cytoscape
data structures is absolutely necessary for the successful completion of this
project.
3
Tasks:
ˇ Explore Cytoscape code
ˇ Explore existing plugin functionality
3. Phase III May 26 - June 16
These three weeks will be spent outlining the code at a high level. I view
this phase as necessary in order to produce quality code. Planning should
reduce the complexity of the code along with avoiding problems later on.
Tasks:
ˇ Construct class Diagrams
ˇ Construct software architecture
ˇ High level program structure design
4. Phase IV June 16 - July 28
This phase is where the majority of the coding work shall be done. Here I
will be doing the actual Java coding and debugging for the project.
Tasks:
ˇ Implement code outlined from Phase III
ˇ Fill in details from Phase III
ˇ Check program correctness (debugging)
ˇ Work on front end for user input
5. Phase V July 28 - August 18
By this time I plan on being finished with the main programming effort. In
these last three weeks I will be putting final touches on the project.
Tasks:
ˇ Conduct Testing
ˇ Write Documentation
ˇ Final touches
3.1 Mile Stones
To ensure timely completion of the project I have outlined several mile stone
deadlines to help move progress along. You will notice that there are relatively
fewer milestones in the first few weeks, as this is time I will be spending getting
to know Cytoscape code and understanding how plugins work.
4
1. 6-16: High level design with class structures and main algorithms outlined.
2. 6-30: Erds-R´nyi Model Completed.
o e
3. 7-7: Watts-Strogatz Model Completed.
4. 7-14: Barab´si-Albert Model Completed.
a
5. 7-21: Randomization of Existing networks completed.
6. 8-4: Beta Version completed.
7. 8-18: Final Version completed with documentation.
3.2 Hurdles
I predict that the most significant hurdle will be learning the existing code in the
initial phase of the project. For this reason I have devoted the first several weeks
to exploring the Cytoscape source code and plugins. Implementing the algorithms
described above will only be possible once I am able to understand the data struc-
tures and and algorithms that deal with network creation and population.
3.3 Developmental Methodologies
My general implementation methodology will be an incremental approach. The
three random network generators along with randomizing an existing network
provide a clean way to partition the development stages, as each of the four main
sections is fairly independent. The common base that the main components would
share will be the interface and some utility functions used across components.
While testing is listed as something that occurs towards the end, I mean this
only to refer to formal testing of the components. The components will also be
tested strenuously, but less formally, during development in phase IV.
4 My Qualifications
While my experience with open source projects has been limited to a usage role, I
have been involved with several class projects and internships which has exposed
me to a great deal of programming theory and methodology. My dissertation
research has me programming and running experiments almost every day.
Academics: As I have stated earlier, I am in my third year of PhD studies at
Syracuse University majoring in Computer Science. I have completed my course
work with a GPA of 3.97. I received my masters before entering the PhD pro-
gram in 2004 from Syracuse University. For more details my CV can be found at
http://web.ecs.syr.edu/pjmcswee/cv.pdf .
5
Languages: My preferred language is java for two reasons. First, I work on
several different machines I like the ability to move my code around as java is
platform independent (although java versions do sometimes become bothersome).
Secondly I also appreciate Java standardized data structures.
Work Experience: I have been a teaching assistant at Syracuse University for
the last four years. This semester I am instructing CIS 586 Design of Operat-
ing Systems which is an introductory course into the design of operating systems,
predominately comprised of graduate students. My experience both as a teaching
assistant and as an instructor have enforced the importance of organization and
people skills. I have interned for three different companies: Osmose Inc, Telephon-
ics and Lockheed Martin. (For more info see my CV). These experiences have
emphasized the necessity for clear requirements and disciplined coding.
Research: My own research is geared towards community detection within com-
plex networks. The last two years have been spent immersing myself in graph
theory and publications in the field of complex networks. Most recently my ad-
visors and I submitted a paper titled Discovering Community Structure Through
Self-Organizing Nodes which uses a swarm approach to find the community detec-
tion in undirected social networks to the World Congress on Social Simulation.
In the course of my research I have already implemented several algorithms
which generate random networks using Java. The models I have implemented
include the Erds-R´nyi model and the Barab´si-Albert model.
o e a
Motivation: Part of my research currently calls for an investigation into the un-
derlying nature of communities in complex networks and what kind of information
is carried with modularized networks. An in-depth investigation into random net-
works is the next logical step in our research. Furthermore, the majority of our
research has been targeted at complex social networks. I would view this project as
a way to gain a better understanding of how biologists are using complex networks
as compared to other domains.
6