Gephi
Open-source network analysis and visualization software
Current version: 0.10.1 (as of May 2025)
Gephi is a free, open-source tool for network visualization and analysis, widely used to explore and represent relationships in large datasets, such as social networks, links between documents, or web structures. Gephi allows users to create customizable network graphs, analyze metrics (like centrality and clustering), and identify patterns within complex datasets. The tool supports importing various data formats (CSV, GEXF) and offers plugins for advanced functionality. It can be used in journalism and open source research to visually analyze and reveal hidden connections in data, such as by examining online misinformation networks.
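For the CSV route, Gephi's spreadsheet import expects a node table with an Id column (plus an optional Label) and an edge table with Source and Target columns (plus optional Type and Weight). The sketch below writes both tables with Python's standard csv module. The people, links, weights, and file names are hypothetical placeholders, and it is worth confirming the column options in the import wizard of your Gephi version.

```python
import csv
from pathlib import Path

# Hypothetical example data: who is linked to whom, with a weight
# (e.g., number of shared documents). Replace with your own verified records.
people = {"a": "Alice", "b": "Bob", "c": "Carol"}
links = [("a", "b", 3), ("b", "c", 1)]


def write_gephi_tables(nodes, edges, out_dir="."):
    """Write nodes.csv and edges.csv using Gephi-style column headers."""
    out = Path(out_dir)
    with open(out / "nodes.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "Label"])
        for node_id, label in nodes.items():
            writer.writerow([node_id, label])
    with open(out / "edges.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target", "Type", "Weight"])
        for source, target, weight in edges:
            # Source/Target must match Id values in nodes.csv
            writer.writerow([source, target, "Undirected", weight])


write_gephi_tables(people, links)
```

Importing the node table first and then appending the edge table to the same workspace (for example via File > Import spreadsheet) keeps the labels attached to the nodes that the edges refer to.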
Gephi can create visually appealing, interactive network graphs. This visual appeal helps highlight complex relationships within data, making it easier for journalists to uncover hidden links between entities such as individuals, companies, or groups. The capability is particularly valuable for investigative stories, where a clear visual representation of connections can be crucial for audience understanding.
Gephi includes built-in support for computing key metrics that help identify important nodes in a network. Three core metrics commonly used are degree centrality, betweenness centrality, and closeness centrality:
Degree centrality
What it indicates: Nodes with higher degree centrality can be influencers or hubs that directly reach many others.
Example: In a Twitter network, a user with connections to many others (through follows or mentions) would have high degree centrality.
Betweenness centrality
What it indicates: Nodes with high betweenness connect different clusters or sections of the graph. They may not have the most connections, but because they sit on the paths that link others, they can control the flow of information or resources, acting as gatekeepers or intermediaries. The higher the betweenness, the greater the brokerage role.
Example: In a criminal network, a person who links two otherwise separate groups (even with only a few connections of their own) likely has high betweenness; remove that person and the network might fragment.
Closeness centrality
What it indicates: High closeness identifies nodes that are centrally positioned overall (in network topology, not in a geographical sense). Such nodes could quickly disseminate information to the entire network.
Example: In a social network, someone at the “center” of the friend-of-friend graph (even if they aren’t connected to everyone directly) will have a high closeness score, meaning they are on average a short distance from everyone else in the network.
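To make the three measures concrete, the sketch below computes them in plain Python on a small hypothetical network: two tight triangles joined through a single broker node, D. Betweenness uses Brandes' algorithm, and closeness is normalized as (reachable nodes - 1) divided by the total shortest-path distance, which is one common convention. Gephi's own normalization options may give different absolute numbers, so compare rankings rather than raw values. All function names are illustrative and not part of any Gephi API.

```python
from collections import deque

# Hypothetical toy network: triangle A-B-C and triangle E-F-G, joined via D
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("E", "G"), ("F", "G")]


def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_distances(graph, source):
    """Shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def degree_centrality(graph):
    """Number of immediate neighbors of each node."""
    return {v: len(nbrs) for v, nbrs in graph.items()}


def closeness_centrality(graph):
    """(reachable nodes - 1) / total distance; 0.0 for isolated nodes."""
    result = {}
    for v in graph:
        dist = bfs_distances(graph, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(graph):
    """Brandes' algorithm for unweighted, undirected graphs."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        order = []                       # nodes in order of discovery
        preds = {v: [] for v in graph}   # predecessors on shortest paths
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint
    return {v: b / 2 for v, b in bc.items()}


def count_components(graph, removed=frozenset()):
    """Connected components left after deleting the nodes in `removed`."""
    seen, count = set(), 0
    for start in graph:
        if start in removed or start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in graph[v]:
                if w not in removed and w not in seen:
                    seen.add(w)
                    stack.append(w)
    return count


graph = build_graph(EDGES)
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
betweenness = betweenness_centrality(graph)
for node in sorted(graph):
    print(node, degree[node], round(closeness[node], 3), betweenness[node])
print("components:", count_components(graph), "->", count_components(graph, {"D"}))
```

On this toy network, D has only two connections (fewer than C or E), yet it has the highest betweenness, because all 9 shortest paths between the two triangles pass through it, and the highest closeness (0.6). Removing D splits the graph into two components, which is the fragmentation effect described in the criminal-network example.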
After running the statistical analysis functions, results can be used to visually style the graph (e.g., sizing nodes by centrality values). In sum, Gephi visualizes networks and quantifies network structure with built-in measures of centrality (degree, betweenness, closeness, etc.), which can be helpful for investigative analysis.
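Sizing nodes by a metric is, at its simplest, a linear rescaling of each value into a chosen size range, which is roughly what Gephi's Appearance panel does when you rank node size by an attribute. The sketch below assumes a hypothetical 10 to 50 size range and sample betweenness values. Gephi may also offer non-linear interpolation, so treat this as an illustration of the idea rather than Gephi's exact implementation.

```python
def scale_to_sizes(values, min_size=10.0, max_size=50.0):
    """Linearly map each node's metric value onto [min_size, max_size]."""
    low, high = min(values.values()), max(values.values())
    if high == low:
        # Every node has the same value, so give them all the midpoint size
        midpoint = (min_size + max_size) / 2
        return {node: midpoint for node in values}
    span = max_size - min_size
    return {
        node: min_size + (value - low) / (high - low) * span
        for node, value in values.items()
    }


# Hypothetical betweenness scores for three nodes
sample = {"A": 0.0, "C": 8.0, "D": 9.0}
sizes = scale_to_sizes(sample)
print(sizes)
```

The node with the lowest score gets the minimum size and the node with the highest score gets the maximum, so the visual emphasis is relative to the dataset rather than to any absolute threshold.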
No account is needed, but Java installation is required.
Using Gephi to visualize networks from sensitive or personal data requires ethical handling, particularly regarding privacy and consent, and careful interpretation to avoid misrepresenting the connections shown.
Data integrity is crucial for users of Gephi, as the accuracy and reliability of network visualizations depend directly on the quality of input data. For investigative journalism, any insights or patterns revealed through Gephi's analysis are only as trustworthy as the data provided. Poor data quality — such as incomplete records, unverified sources, or outdated information — can lead to misleading visualizations that misrepresent relationships or inflate the importance of specific network nodes. To ensure meaningful results, Gephi users must verify data sources, validate accuracy, and cross-check information before visualizing it. Maintaining high data integrity not only strengthens the credibility of the analysis but also allows for responsible storytelling, helping to prevent the spread of misinformation and ensuring that network insights are grounded in factual, well-vetted data.
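Some integrity problems can be caught mechanically before import. The sketch below flags edges whose endpoints are missing from the node list, self-loops, and duplicate links (which may be merged or counted differently depending on Gephi's import settings). The function name and sample records are hypothetical. These checks only catch structural errors; whether a relationship is actually true still has to be verified through reporting and source cross-checking.

```python
from collections import Counter


def check_edge_list(node_ids, edges):
    """Flag structural problems in an edge list before importing it.

    node_ids: set of known node identifiers
    edges: list of (source, target) pairs, treated as undirected
    """
    problems = {"unknown_endpoints": [], "self_loops": [], "duplicates": []}
    for source, target in edges:
        # An endpoint missing from the node table may be a typo or an unvetted entity
        if source not in node_ids or target not in node_ids:
            problems["unknown_endpoints"].append((source, target))
        if source == target:
            problems["self_loops"].append((source, target))
    # (a, b) and (b, a) describe the same undirected link
    counts = Counter(tuple(sorted(edge)) for edge in edges)
    problems["duplicates"] = sorted(pair for pair, n in counts.items() if n > 1)
    return problems


# Hypothetical records
known = {"alice", "bob", "carol"}
edges = [("alice", "bob"), ("bob", "alice"), ("carol", "carol"), ("bob", "dave")]
report = check_edge_list(known, edges)
for issue, items in report.items():
    print(issue, items)
```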
Cherven, K. (2015). Mastering Gephi Network Visualization. Packt Pub Ltd.
Gephi Consortium (open-source community, CTO: Mathieu Bastian)
Martin Sona
Degree centrality measures how many direct connections (edges) a node has. A node with high degree centrality has many links to others, making it well connected. It is essentially a count of immediate neighbors.
Betweenness centrality measures how often a node lies on the shortest paths between other pairs of nodes. In other words, a node with high betweenness centrality is a critical broker or bridge in the network.
Closeness centrality measures how “close” a node is to all others in the network, typically defined as the reciprocal of the sum of shortest-path distances from that node to all other nodes. A node with high closeness centrality can reach all others quickly (in few hops on average).
Plugins & Experimental Metrics: Gephi’s plugin repository may offer additional statistical measures or variants (for example, advanced community detection algorithms). If you need specialized metrics, check the Gephi Plugin Center.
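For reference, the three centrality measures described above have standard textbook definitions for a graph with n nodes, where N(v) is the set of neighbors of v, d(v,u) is the shortest-path distance between v and u, σ_st is the number of shortest paths from s to t, and σ_st(v) is the number of those paths that pass through v. Gephi may report normalized or rescaled versions of these quantities.

```latex
\begin{aligned}
C_D(v) &= \deg(v) = |N(v)| \\
C_B(v) &= \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \\
C_C(v) &= \frac{1}{\sum_{u \neq v} d(v,u)}
  \qquad \text{(often normalized as } \frac{n-1}{\sum_{u \neq v} d(v,u)} \text{)}
\end{aligned}
```

For undirected graphs, each unordered pair {s, t} is usually counted once in the betweenness sum. As a worked example of closeness, a node whose distances to the six other nodes of a seven-node network are 1, 1, 2, 2, 2, and 2 has a total distance of 10, so its normalized closeness is 6/10 = 0.6.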
Social network analysis has been used in a range of investigations, and Gephi's network analysis features allow journalists to trace such relationships systematically. Noteworthy high-profile examples of network analysis, not all of which used Gephi itself, include:
Panama Papers: The ICIJ’s Panama Papers investigation (2016) involved analyzing a massive trove of offshore financial records. Reporters used network analysis tools, including Gephi. By converting people and companies into “nodes” and their relationships (e.g., directorships or client links) into “edges,” Gephi helped journalists uncover hidden connections in the data. One cited account of the case shows how graph visualization enabled the team to trace complex ownership networks and find key intermediaries in the offshore schemes. (Note: ICIJ also used graph databases like Neo4j, but Gephi was used for certain analyses and for producing visualization graphics.)
9/11 Terrorist Network Analysis: Shortly after the 2001 attacks, analyst Valdis Krebs mapped the network of the hijackers to show how they were interlinked. Krebs’s paper “Mapping Networks of Terrorist Cells” (2002) demonstrated that even though no single terrorist was connected to all others, there were focal points (connectors) in the network. This analysis pre-dated Gephi (Krebs used the SNA tools available at the time), but it is precisely the kind of investigation Gephi excels at today: modern journalists and researchers have replicated such network mapping with Gephi to illustrate terrorist cell structures and identify key influencers. Brant Houston (University of Illinois journalism professor) has cited this case as a tutorial example for anyone learning social network analysis.
They Rule (2004–2005) is an investigative data visualization project by artist Josh On that mapped the interlocking directorates of major U.S. corporations. It provided an interactive web interface for exploring how corporate board members overlap between companies, revealing tight networks of corporate governance. While They Rule wasn’t built with Gephi (it was a custom web app), it has been cited as an example of network journalism for its visualization of power networks. The project showed, for instance, how overlapping board seats concentrate power within a small elite. An investigative journalist could achieve a similar analysis in Gephi by importing board membership data and visualizing those connections. So while not a Gephi case per se, it is a relevant example of network visualization in journalism.
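The They Rule idea of linking companies through shared directors can be sketched as a projection of a two-mode (director to company) network onto a one-mode company network, where each edge weight counts the directors two companies share. The membership records below are hypothetical, and this illustrates the technique rather than how They Rule itself was built. The output uses Gephi-style Source, Target, Type, and Weight columns so it can be imported as an edge table.

```python
import csv
from collections import Counter
from itertools import combinations

# Hypothetical (director, company) board-membership records
MEMBERSHIPS = [
    ("Director 1", "Company A"),
    ("Director 1", "Company B"),
    ("Director 2", "Company A"),
    ("Director 2", "Company B"),
    ("Director 3", "Company B"),
    ("Director 3", "Company C"),
]


def company_interlocks(memberships):
    """Project director-company records onto weighted company-company links.

    Two companies are linked if they share at least one director; the
    weight is the number of directors they share.
    """
    boards = {}
    for director, company in memberships:
        # A set ignores repeated records of the same seat
        boards.setdefault(director, set()).add(company)
    weights = Counter()
    for companies in boards.values():
        for a, b in combinations(sorted(companies), 2):
            weights[(a, b)] += 1
    return weights


interlocks = company_interlocks(MEMBERSHIPS)

# Write a Gephi-style edge table for import
with open("interlocks.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (a, b), weight in sorted(interlocks.items()):
        writer.writerow([a, b, "Undirected", weight])
```

Here Company A and Company B share two directors (weight 2), while Company B and Company C share one. In Gephi, edge thickness can then be mapped to Weight to make the strongest interlocks stand out.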
Due to its extensive features, Gephi has a moderate learning curve. Still, beginners can start with basic tutorials and sample datasets to understand the interface and core functions such as layouts, filters, and metrics. A good strategy is to focus on one feature at a time: experiment with layouts to arrange nodes, use filters to simplify complex networks, and apply basic metrics like centrality to interpret relationships. As they become comfortable, users can explore plugins and advanced features for more tailored analyses.
Gephi has an active user community that can provide help and share tips. In recent years the primary hub has been the Gephi Facebook group, which serves as the main place to ask questions and get support and has effectively replaced the older official forum. (The legacy forum still exists, but as of 2018–2019 it saw declining activity, and new questions are directed to the Facebook group.) Gephi’s developers and power users also monitor the project’s other community channels.
Gephi runs on most modern computers, but its computing requirements grow with the size of the network being analyzed. It can be less intuitive for beginners, and certain advanced functions may require plugins or scripting knowledge.
Levallois, C. (2017, January 20). Simple Gephi Project from A to Z.
Levallois, C. (2024, November 27). Gephi Tutorials.
Grandjean, M. (2024). Gephi. Retrieved November 30, 2024 (tutorials, including 30 Gephi examples).
Martin Grandjean. (2022, September 21). GEPHI - Introduction to Network Analysis and Visualization (Tutorial) [Video recording].
Global Investigative Journalism Network (Director). (2023, September 30). GIJC23—Using Social Network Analysis for Investigations [Video recording].
Gephi Cookbook. (n.d.). Packt. Retrieved November 10, 2024.
Barabási, A.-L. (2016). Network Science. (this is EXCELLENT!)
Datasets. (n.d.). GitHub. Retrieved November 30, 2024.
ASNR - Animal Network Data. (n.d.). Retrieved November 30, 2024. (ASNR aims to assemble and provide a comprehensive index of real-world animal interaction datasets across all taxa, including only high-value, peer-reviewed data.)
NodeXL is an add-in for Microsoft Excel that provides network analysis and visualization within a spreadsheet interface. It is Windows-only (as it hooks into Excel) and comes in a free “NodeXL Basic” version and a paid Pro version. Because the network data lives in Excel worksheets, it is simple to edit (you can leverage Excel formulas, etc., for node attributes). Brant Houston explained that NodeXL’s integration with Excel makes it very simple for beginners who are comfortable with spreadsheets. It is suitable for quick analysis of small to medium-sized networks but may struggle with large graphs, and its advanced visualization customization and real-time manipulation are more limited than Gephi’s. NodeXL offers a gentler learning curve and even has built-in data importers for social media (in the Pro version), but it lacks Gephi’s visual polish and plugin extensibility. (One might use NodeXL to gather or preprocess data and then use Gephi to fine-tune the visualization.)
Palladio is a web-based network visualization tool developed at Stanford’s Humanities + Design lab. It runs entirely in the browser (no installation required) and is geared towards historians and humanists exploring complex historical datasets, with a focus on ease of use. You can upload spreadsheet data (nodes and links) and interactively create network views, maps, and timelines, which makes it great for quickly visualizing a dataset and finding patterns without coding. However, Palladio has notable limitations: since it runs in the browser and is meant for lightweight use, it can become slow or unstable with very large datasets. It still works well in digital humanities classrooms for introducing network analysis before moving to more comprehensive tools. Compared to Gephi, Palladio is less feature-rich: it doesn’t compute advanced network metrics or offer extensive styling options.
PyVis is a Python library for interactive network visualization. It lets you generate network graphs in Python and output them as an HTML page, using the JavaScript library vis.js under the hood. Essentially, PyVis is a wrapper that brings the interactivity of vis.js to Python users: you script the creation of a network visualization and then view it in a web browser. PyVis is not a GUI tool; it requires writing Python code. It works well with Jupyter notebooks: you can create a Network object, add nodes and edges, and then display an interactive network within the notebook or export it to an HTML file. The result is a web-based visualization where you can pan, zoom, and click on nodes for details. PyVis offers flexibility for developers (since you can automate tasks and integrate with data analysis pipelines in Python), but it is less user-friendly for non-coders. It also depends on the browser for rendering, so extremely large networks may be hard to handle, as with any web-based visualization. Gephi may handle larger networks better (it renders with OpenGL), whereas PyVis/vis.js running in a browser can hit memory or speed limits on huge graphs. PyVis itself doesn’t compute SNA metrics; you would use a Python library such as NetworkX for the analysis and PyVis purely for visualization. This makes it a complementary tool: Gephi for point-and-click exploration and PyVis for scripted, shareable interactive diagrams.
Neo4j (with Datashare Plugin): Neo4j is fundamentally different from the tools above: it is a graph database rather than a dedicated visualization tool. It is optimized for storing and querying graph data (nodes and relationships) and for managing very large, complex networks. It lets the user run complex queries (using its query language, Cypher) to find patterns, shortest paths, sub-networks, etc., in the data. In practice, one might use Neo4j to crunch the data (find communities, run graph algorithms, handle millions of records) and then use a visualization front end (such as Gephi, or Neo4j’s own Bloom and Browser interfaces) to visualize the result. Neo4j does come with basic visualization: the Neo4j Browser can display query results as a node-link diagram, but these are not as customizable as Gephi’s visualizations. A key difference is that Gephi works on static data you load into it (good for snapshot analysis and visual exploration), whereas Neo4j is a continuously running database that can be updated and queried in real time (good for dynamic or very large datasets that you need to sift through systematically). In short, Neo4j and Gephi are not an either-or choice; they often complement each other, with Gephi for interactive visual analysis and Neo4j for data storage and algorithmic analysis. Also of note: Neo4j is not open-source in all its editions (the Community Edition is open-source, while its commercial editions, such as the Enterprise Edition, are not), whereas Gephi is fully open-source. For an investigator, choosing Neo4j would depend on needing to handle huge networks or integrate the graph with other systems; choosing Gephi would be about interactive exploration and presentation-quality visuals.