Gephi
Open-source network analysis and visualization software
URL
https://gephi.org (0.10.1 as of May 2025)
Description
Gephi is a free, open-source tool for network visualization and analysis, widely used to explore and represent relationships in large datasets, such as social networks, links between documents, or web structures. Gephi allows users to create customizable network graphs, compute metrics (such as centrality and clustering), and identify patterns within complex datasets. The tool supports importing various data formats (such as CSV and GEXF) and offers plugins for advanced functionality. In journalism and open-source research, it can help analyze data visually and reveal hidden connections, for example when examining online misinformation networks.
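As a minimal sketch of preparing data for Gephi's CSV import, the standard-library Python below turns a list of hypothetical person-to-organization relationships into a node table and an edge table. It assumes that Gephi's spreadsheet importer recognizes the headers `Id` and `Label` for nodes and `Source`, `Target`, and `Type` for edges, and that it imports any other column (here the hypothetical `kind` and `role`) as an attribute. Check the import dialog of your Gephi version before relying on this.

```python
import csv
import io

# Hypothetical example data: (person, organization, role) relationships.
RELATIONSHIPS = [
    ("Alice", "Acme Holdings", "director"),
    ("Bob", "Acme Holdings", "shareholder"),
    ("Bob", "Zenith Trust", "director"),
]


def to_gephi_tables(relationships):
    """Return (nodes_csv, edges_csv) strings shaped for a spreadsheet import.

    Nodes get Id/Label columns plus a 'kind' attribute; edges get
    Source/Target/Type columns plus a 'role' attribute.
    """
    nodes = {}  # node id -> kind, in first-seen order
    for person, org, _ in relationships:
        nodes.setdefault(person, "person")
        nodes.setdefault(org, "organization")

    node_buf = io.StringIO()
    writer = csv.writer(node_buf, lineterminator="\n")
    writer.writerow(["Id", "Label", "kind"])
    for name, kind in nodes.items():
        writer.writerow([name, name, kind])

    edge_buf = io.StringIO()
    writer = csv.writer(edge_buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "role"])
    for person, org, role in relationships:
        writer.writerow([person, org, "Undirected", role])

    return node_buf.getvalue(), edge_buf.getvalue()


nodes_csv, edges_csv = to_gephi_tables(RELATIONSHIPS)
print(nodes_csv)
print(edges_csv)
```

Using names as IDs keeps the sketch short. In real data, stable unique identifiers are safer, because two different people can share a name.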
Gephi can create detailed, interactive, and visually compelling network graphs. This visual appeal helps highlight complex relationships within data, making it easier for journalists to uncover hidden links between entities such as individuals, companies, or groups. This is particularly valuable for investigative stories, where a clear visual representation of connections can be crucial for audience understanding.

Core Social Network Analysis Metrics
Gephi includes built-in support for computing key metrics that help identify important nodes in a network. Three core metrics commonly used are degree centrality, betweenness centrality, and closeness centrality:
Degree Centrality measures how many direct connections (edges) a node has. A node with a high degree centrality has many links to others, making it well-connected. It’s essentially a count of immediate neighbors.
What it indicates: Nodes with higher degree centrality can be influencers or hubs that directly reach many others.
Example: In a Twitter network, a user with connections to many others (through follows or mentions) would have high degree centrality.
Betweenness Centrality measures how often a node lies on the shortest paths between other nodes. A node with high betweenness centrality acts as a critical broker or bridge in the network.
What it indicates: Such nodes connect different clusters or sections of the graph. They may not have the most connections, but because they sit on the paths that link others, they can control the flow of information or resources, acting as gatekeepers or intermediaries. The higher the betweenness, the greater this brokerage role.
Example: In a criminal network, a person who links two otherwise separate groups (even with only a few connections themselves) likely has high betweenness – remove that person and the network might fragment.
Closeness Centrality measures how “close” a node is to all others in the network. It is typically defined as the reciprocal of the total shortest-path distance from that node to all other nodes, often normalized so that it equals the reciprocal of the average distance. A node with high closeness centrality can reach all others quickly (in few hops, on average).
What it indicates: This can identify nodes that are centrally positioned overall (not in a geographical sense, but in network topology). Such nodes could quickly disseminate information to the entire network.
Example: In a social network, someone at the “center” of the friend-of-friend graph (even if they aren’t connected to everyone directly) will have a high closeness score, meaning they are on average a short distance from anyone in the network.
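To make the three definitions concrete, the following standard-library Python sketch computes them on a small hypothetical graph: two triangles joined only through a broker node `X`. Degree is a neighbor count. Closeness is the number of reachable nodes divided by the sum of shortest-path distances, computed by breadth-first search, which for a connected graph equals (n - 1) divided by the total distance. Betweenness uses Brandes' algorithm for unweighted, undirected graphs and is reported as raw pair counts. Gephi may normalize its values differently, so absolute numbers can differ from Gephi's output even when the ranking agrees.

```python
from collections import deque

# Hypothetical toy network: two triangles joined only through node "X".
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "X"),
         ("X", "D"), ("D", "E"), ("D", "F"), ("E", "F")]


def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_distances(graph, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def degree_centrality(graph):
    return {v: len(nbrs) for v, nbrs in graph.items()}


def closeness_centrality(graph):
    """Reachable nodes divided by total distance, i.e. 1 / average distance."""
    result = {}
    for v in graph:
        dist = bfs_distances(graph, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(graph):
    """Brandes' algorithm for unweighted, undirected graphs (raw counts)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        order = []                       # nodes in order of discovery
        preds = {v: [] for v in graph}   # predecessors on shortest paths
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):        # back-propagate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}  # each pair was counted twice


def count_components(graph, removed=()):
    """Connected components after deleting the nodes in `removed`."""
    seen, count = set(removed), 0
    for start in graph:
        if start not in seen:
            count += 1
            seen.add(start)
            queue = deque([start])
            while queue:
                for w in graph[queue.popleft()]:
                    if w not in seen:
                        seen.add(w)
                        queue.append(w)
    return count


graph = build_graph(EDGES)
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
betweenness = betweenness_centrality(graph)
for node in sorted(graph):
    print(node, degree[node], round(closeness[node], 3), betweenness[node])
print("components without X:", count_components(graph, removed={"X"}))
```

Here `X` has only two connections, the same number as the peripheral nodes. However, it has the highest betweenness, because nine of the fifteen node pairs that do not include `X` route through it. It also has the highest closeness (0.6). Removing it splits the graph into two components, which matches the criminal-network example above.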
After running the statistical analysis functions, results can be used to visually style the graph (e.g., sizing nodes by centrality values). In sum, Gephi visualizes networks and quantifies network structure with built-in measures of centrality (degree, betweenness, closeness, etc.), which can be helpful for investigative analysis.
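The following minimal sketch shows the sizing step. It assumes a linear mapping similar to the one Gephi's Appearance panel applies when node size is ranked by a metric column, where each value is rescaled between a minimum and a maximum size. The default sizes (10 and 50) and the sample betweenness values are illustrative assumptions, not Gephi's own settings.

```python
def scale_sizes(values, min_size=10.0, max_size=50.0):
    """Linearly map metric values onto node sizes in [min_size, max_size]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # All nodes score the same: give them all the smallest size.
        return {node: min_size for node in values}
    span = max_size - min_size
    return {node: min_size + (v - lo) / (hi - lo) * span for node, v in values.items()}


# Illustrative betweenness scores, e.g. from an earlier analysis step.
betweenness = {"A": 0.0, "C": 8.0, "X": 9.0}
sizes = scale_sizes(betweenness)
print(sizes)
```

With heavily skewed metrics, one extreme node can dwarf all the others. A common workaround is to take the square root or logarithm of the values before scaling.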
Gephi in Investigative Journalism
Social network analysis has been used to investigate political influence through campaign contributions, social media manipulation (e.g., election interference via coordinated accounts), and even tracking of criminal or extremist networks. Gephi's network analysis features allow journalists to trace these relationships systematically. Noteworthy examples of the use of Gephi in high-profile cases include:
Panama Papers: The ICIJ’s Panama Papers investigation (2016) involved analyzing a massive trove of offshore financial records. Reporters used network analysis tools, including Gephi, to visualize and explore the web of offshore entities and connections. By converting people and companies into “nodes” and their relationships (e.g., directorships or client links) into “edges,” Gephi helped journalists uncover hidden connections in the data. The case is often cited to show how graph visualization enabled the team to trace complex ownership networks and find key intermediaries in the offshore schemes. (Note: ICIJ also used graph databases like Neo4j and a web interface, but Gephi was used for certain analyses and for producing visualization graphics.)
9/11 Terrorist Network Analysis: Shortly after the 2001 attacks, analyst Valdis Krebs mapped the connections between the hijackers and their associates to show how they were interlinked. Krebs’s paper “Mapping Networks of Terrorist Cells” (2002) demonstrated that even though no single terrorist was connected to all the others, the network had focal points (connectors). This analysis pre-dated Gephi (Krebs used the SNA tools available at the time), but it is precisely the kind of investigation Gephi excels at today. Modern journalists and researchers have replicated such network mapping using Gephi to illustrate terrorist cell structures and identify key influencers. Brant Houston (a University of Illinois journalism professor) points to Krebs’s 9/11 network mapping as a tutorial example for anyone learning social network analysis.
They Rule Project: They Rule (2004–2005) is an investigative data visualization project by artist Josh On that mapped the interlocking directorates of major U.S. corporations. It provided an interactive web interface for exploring how board members overlap between companies, revealing tight networks of corporate governance. They Rule was not built with Gephi (it was a custom web app), but it is often cited alongside network-journalism examples for its visualization of power networks. The project showed, for instance, that 87 of the top 100 US companies shared board directors, concentrating power within a small elite. An investigative journalist could reproduce a similar analysis in Gephi by importing board membership data and visualizing those connections. So although it is not a Gephi case per se, it is a relevant example of network visualization in journalism.
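The board-membership analysis described for They Rule amounts to projecting a two-mode network (directors linked to companies) onto a one-mode network of companies. The standard-library sketch below illustrates this with hypothetical board seats. Two companies are linked when they share a director, and the edge weight counts how many directors they share. The result is printed as an edge table with a `Weight` column, on the assumption that Gephi's CSV import reads that column as the edge weight.

```python
from collections import Counter
from itertools import combinations

# Hypothetical (director, company) board seats.
BOARD_SEATS = [
    ("Director 1", "Company A"),
    ("Director 1", "Company B"),
    ("Director 2", "Company B"),
    ("Director 2", "Company C"),
    ("Director 3", "Company A"),
    ("Director 3", "Company B"),
]


def interlocks(board_seats):
    """Company-company edges weighted by the number of shared directors."""
    boards = {}
    for director, company in board_seats:
        boards.setdefault(director, set()).add(company)
    weights = Counter()
    for companies in boards.values():
        # Every pair of boards this director sits on forms one interlock.
        for a, b in combinations(sorted(companies), 2):
            weights[(a, b)] += 1
    return dict(weights)


edges = interlocks(BOARD_SEATS)
print("Source,Target,Type,Weight")
for (a, b), w in sorted(edges.items()):
    print(f"{a},{b},Undirected,{w}")
```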
Cost
Free and open source
Level of difficulty
Learning Curve
Due to its extensive features, Gephi has a moderate learning curve. Still, beginners can start with basic tutorials and sample datasets to understand the interface and critical functions like layouts, filters, and metrics. A good strategy is to focus on one feature at a time: experiment with layouts to arrange nodes, use filters to simplify complex networks, and apply basic metrics like centrality to interpret relationships. As they become comfortable, users can explore plugins and advanced features like time-based visualizations for more tailored analyses.
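Filters are easiest to understand through an example. The plain-Python sketch below mimics two kinds of topology filter. To our knowledge, Gephi's Filters panel offers both, under names such as Degree Range and K-core. A degree filter keeps nodes whose degree in the full graph meets a threshold. A k-core filter repeatedly prunes nodes until every remaining node has at least k neighbors inside the subgraph. The graph is hypothetical, and the code illustrates the idea rather than reproducing Gephi's implementation.

```python
# Hypothetical network: hub H with four neighbors, plus a short tail D-E.
EDGES = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B"), ("D", "E")]


def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree_filter(graph, min_degree):
    """Keep nodes with degree >= min_degree (measured once, on the full graph)."""
    keep = {v for v, nbrs in graph.items() if len(nbrs) >= min_degree}
    return {v: graph[v] & keep for v in keep}


def k_core(graph, k):
    """Repeatedly drop nodes with fewer than k neighbors in what remains."""
    sub = {v: set(nbrs) for v, nbrs in graph.items()}  # copy; input untouched
    while True:
        weak = [v for v, nbrs in sub.items() if len(nbrs) < k]
        if not weak:
            return sub
        for v in weak:
            for w in sub.pop(v):
                sub[w].discard(v)


graph = build_graph(EDGES)
print(sorted(degree_filter(graph, 2)))
print(sorted(k_core(graph, 2)))
```

On this graph, the degree filter keeps D because D has two neighbors in the full graph. The 2-core drops D once its only other neighbor, E, has been pruned. This difference matters when you choose a filter for simplifying a dense network.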
Gephi has an active user community that can provide help and share tips. In recent years, the main hub for questions and support has been the Gephi Facebook Group, which effectively replaced the older official Gephi Forum. The forum still exists, but its activity declined around 2018–2019, and new questions are directed to the Facebook group. Gephi’s developers and power users also monitor the GitHub issue tracker.
Requirements
No account is needed, but Java installation is required.
Limitations
Gephi runs on most modern computers, but its memory and processing requirements grow with graph size. It can be unintuitive for beginners, and certain advanced functions may require plugins or scripting knowledge.
Ethical Considerations
Using Gephi to visualize networks from sensitive or personal data requires ethical handling, particularly regarding privacy and consent, and careful interpretation to avoid misrepresenting the connections shown.
Data integrity is crucial for users of Gephi, as the accuracy and reliability of network visualizations depend directly on the quality of input data. For investigative journalism, any insights or patterns revealed through Gephi's analysis are only as trustworthy as the data provided. Poor data quality — such as incomplete records, unverified sources, or outdated information — can lead to misleading visualizations that misrepresent relationships or inflate the importance of specific network nodes. To ensure meaningful results, Gephi users must verify data sources, validate accuracy, and cross-check information before visualizing it. Maintaining high data integrity not only strengthens the credibility of the analysis but also allows for responsible storytelling, helping to prevent the spread of misinformation and ensuring that network insights are grounded in factual, well-vetted data.
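As a minimal sketch of such pre-import checks, the standard-library Python below flags three edge problems: edges that point to unknown nodes, self-loops, and duplicate edges, which would inflate degree counts. It also groups entity names that differ only in case, punctuation, or spacing. The normalization rule is a deliberately crude assumption. It can merge genuinely different entities and miss real variants, so flagged groups still need human verification.

```python
import re
from collections import Counter


def normalize_name(name):
    """Lowercase, drop punctuation, and collapse whitespace (crude heuristic)."""
    return " ".join(re.sub(r"[^\w\s]", "", name.lower()).split())


def name_variants(names):
    """Group raw names that collapse to the same normalized form."""
    groups = {}
    for name in names:
        groups.setdefault(normalize_name(name), set()).add(name)
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


def check_edges(edges, node_ids):
    """Flag dangling, self-looping, and duplicated undirected edges."""
    report = {"dangling": [], "self_loops": [], "duplicates": []}
    counts = Counter()
    for source, target in edges:
        if source not in node_ids or target not in node_ids:
            report["dangling"].append((source, target))
        if source == target:
            report["self_loops"].append((source, target))
        counts[tuple(sorted((source, target)))] += 1  # undirected: a-b == b-a
    report["duplicates"] = [pair for pair, n in counts.items() if n > 1]
    return report


# Hypothetical inputs.
variants = name_variants(["Acme Ltd.", "ACME Ltd", "Zenith Trust"])
report = check_edges([("a", "b"), ("b", "a"), ("a", "a"), ("a", "z")], {"a", "b", "c"})
print(variants)
print(report)
```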
Guides
Official Wiki
https://github.com/gephi/gephi/wiki
Complete Beginners
Levallois, C. (2017, January 20). Simple Gephi Project from A to Z. https://seinecle.github.io/gephi-tutorials/generated-html/simple-project-from-a-to-z-en.html
General / Advanced / Multi-Language
Levallois, C. (2024, November 27). Gephi Tutorials. https://seinecle.github.io/gephi-tutorials/
Grandjean, M. (2024). Gephi. Retrieved November 30, 2024, from https://www.martingrandjean.ch/gephi/ (tutorials, including 30 Gephi examples)
Videos
Grandjean, M. (2022, September 21). GEPHI - Introduction to Network Analysis and Visualization (Tutorial) [Video recording]. https://www.youtube.com/watch?v=GXtbL8avpik
Journalism-Specific
Global Investigative Journalism Network (Director). (2023, September 30). GIJC23—Using Social Network Analysis for Investigations [Video recording]. https://www.youtube.com/watch?v=-D8E8JY86b4
Books
Cherven, K. (2015). Mastering Gephi Network Visualization. Packt Pub Ltd.
Gephi Cookbook. (n.d.). Packt. Retrieved November 10, 2024, from https://www.packtpub.com/en-us/product/gephi-cookbook-9781783987405?type=print
Barabási, A.-L. (2016). Network Science. http://networksciencebook.com/ (this is EXCELLENT!)
Open Datasets
Datasets. GitHub. Retrieved November 30, 2024, from https://github.com/gephi/gephi/wiki/Datasets
ASNR - Animal Network Data. Retrieved November 30, 2024, from https://bansallab.github.io/asnr/data.html (ASNR aims to assemble and provide a comprehensive index of real-world animal interaction data sets across all taxa. Only high-value peer-reviewed data.)
Comparison with similar software
NodeXL: NodeXL is an add-in for Microsoft Excel that provides network analysis and visualization within a spreadsheet interface. It is Windows-only (because it hooks into Excel) and comes in a free “NodeXL Basic” version and a paid Pro version. Users import edge lists into Excel, and NodeXL generates graphs from those tables. This approach makes data easy to edit; for example, you can use Excel formulas to compute node attributes. Brant Houston explained that because NodeXL is integrated with Excel, it is very simple for beginners who are comfortable with spreadsheets. It suits quick analysis of small to medium-sized networks but may struggle with large graphs. Its advanced visualization customization and real-time manipulation are also more limited than Gephi’s. NodeXL offers a gentler learning curve and even has built-in data importers for social media (in the Pro version), but it lacks Gephi’s visual polish and plugin extensibility. (Some workflows use NodeXL to gather or preprocess data and then use Gephi to fine-tune the visualization.)
Palladio: Palladio is a web-based network visualization tool developed at Stanford’s Humanities + Design lab. It runs entirely in the browser, with no installation required, and is geared toward historians and humanists exploring complex historical datasets. Palladio is described as a “simple but powerful exploratory data visualization tool” that focuses on ease of use. You can upload spreadsheet data (nodes and links) and interactively create network views, maps, and timelines. It is well suited to visualizing a dataset quickly and finding patterns without coding. However, Palladio has notable limitations. Because it runs in the browser and is meant for lightweight use, it can become slow or unstable with very large datasets. It has not seen active development in several years, but it is still used in digital humanities classrooms to introduce network analysis before students move to more comprehensive tools. Compared to Gephi, Palladio is less feature-rich: it doesn’t compute advanced network metrics or offer extensive styling options.
PyVis: PyVis is a Python library for interactive network visualization. It generates network graphs in Python and outputs them as an HTML page, using the JavaScript library vis.js under the hood. Essentially, PyVis is a wrapper that brings the interactivity of vis.js to Python users, so you can script the creation of a network visualization and then view it in a web browser. PyVis is not a GUI tool; it requires writing Python code. It works well with Jupyter notebooks: you can create a Network object, add nodes and edges, and then display an interactive network within the notebook or export it to an HTML file. The result is a web-based visualization where you can pan, zoom, and click on nodes for details. PyVis offers flexibility for developers, since tasks can be automated and integrated into Python data analysis pipelines, but it is less user-friendly for non-coders. Because rendering happens in the browser, very large networks can hit memory or speed limits, as with any web-based visualization. Gephi, which renders with OpenGL, may handle larger networks better. PyVis also doesn’t compute SNA metrics itself. Instead, you would use a Python library such as NetworkX for the analysis and PyVis purely for visualization. Because PyVis can produce interactive visuals with a few lines of code, it complements Gephi: Gephi for point-and-click exploration, and PyVis for scripted, shareable interactive diagrams.
Neo4j (with Datashare Plugin): Neo4j is fundamentally different from the above – it’s a graph database rather than a dedicated visualization tool. It's optimized for storing and querying graph data (nodes and relationships) and managing very large, complex networks. It allows the user to run complex queries (using its query language Cypher) to find patterns, shortest paths, sub-networks, etc., in the data. In practice, one might use Neo4j to crunch the data (find communities, run graph algorithms, handle millions of records), then use a visualization front-end (like Gephi, or Neo4j’s own Bloom and Browser interfaces, or Linkurious) to visualize the result. Neo4j does come with basic visualization: the Neo4j Browser GUI can display query results as a node-link diagram, but these are not as customizable as Gephi’s visualizations. A key difference: Gephi works on static data you load into it (good for snapshot analysis and visual exploration), whereas Neo4j is a continuously running database that can be updated and queried in real-time (good for dynamic or very large datasets where you need to sift through data systematically). In short, Neo4j vs Gephi is not an either-or; they often complement each other. Gephi is for visual interactive analysis, Neo4j is for data storage and algorithmic analysis. Also of note: Neo4j is not purely open-source in all its editions (the Community edition is open-source, and enterprise features are commercial), whereas Gephi is fully open-source. For an investigator, choosing Neo4j would depend on needing to handle huge networks or integrate the graph with other systems; choosing Gephi would be about interactive exploration and presentation-quality visuals.
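Whatever tool produces the data (a database query, a script, or a spreadsheet), handing it to Gephi usually means writing a file that Gephi can open. The following minimal standard-library sketch writes a GEXF 1.2 file. The node IDs, labels, and the `score` attribute are hypothetical. The output covers only a small subset of GEXF (one numeric node attribute and undirected edges), so treat it as a starting point and consult the GEXF specification for anything more.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"


def to_gexf(nodes, edges, attr_title):
    """Build a minimal GEXF 1.2 document as a string.

    nodes: dict mapping node id -> (label, numeric attribute value)
    edges: list of (source id, target id) pairs, treated as undirected
    """
    root = ET.Element("gexf", xmlns=GEXF_NS, version="1.2")
    graph = ET.SubElement(root, "graph", defaultedgetype="undirected")
    # Declare a single numeric node attribute with id "0".
    attributes = ET.SubElement(graph, "attributes", {"class": "node"})
    ET.SubElement(attributes, "attribute", id="0", title=attr_title, type="double")
    nodes_el = ET.SubElement(graph, "nodes")
    for node_id, (label, value) in nodes.items():
        node_el = ET.SubElement(nodes_el, "node", id=node_id, label=label)
        values = ET.SubElement(node_el, "attvalues")
        ET.SubElement(values, "attvalue", {"for": "0", "value": str(value)})
    edges_el = ET.SubElement(graph, "edges")
    for i, (source, target) in enumerate(edges):
        ET.SubElement(edges_el, "edge", id=str(i), source=source, target=target)
    return ET.tostring(root, encoding="unicode")


# Hypothetical rows, e.g. exported from a graph database query.
NODES = {"p1": ("Person 1", 0.5), "c1": ("Company 1", 1.0), "c2": ("Company 2", 0.0)}
EDGES = [("p1", "c1"), ("c1", "c2")]

xml_text = to_gexf(NODES, EDGES, "score")
print(xml_text)
```

Save the printed output with a `.gexf` extension and open it through Gephi's File menu. The `score` column should then be available for ranking and filtering like any other node attribute.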
Tool provider
Gephi Consortium (open-source community; CTO: Mathieu Bastian)
Advertising Trackers
Martin Sona