Gephi
Open-source network analysis and visualization software
URL
https://gephi.org (0.10.1 as of October 2025)
Description
Gephi is a free, open-source tool for network visualization and analysis, widely used to explore and represent relationships in large datasets, such as social networks, links between documents, or web structures. Gephi lets users create customizable network graphs, compute metrics (such as centrality and clustering), and identify patterns within complex datasets. The tool imports various data formats (for example, CSV and GEXF) and offers plugins for advanced functionality. In journalism and open-source research, it can be used to analyze data visually and reveal hidden connections, for example when examining online misinformation networks.
Gephi can create detailed, interactive, and visually compelling network graphs. These visuals make complex relationships in the data easier to see, helping journalists uncover hidden links between entities such as individuals, companies, or groups. This is particularly valuable for investigative stories, where a clear visual representation of connections can be crucial for audience understanding.

Core Social Network Analysis Metrics
Gephi includes built-in support for computing key metrics that help identify important nodes in a network. Three core metrics commonly used are degree centrality, betweenness centrality, and closeness centrality:
Degree Centrality measures how many direct connections (edges) a node has. A node with a high degree centrality has many links to others, making it well-connected. It’s essentially a count of immediate neighbors.
What it indicates: Nodes with higher degree centrality can be influencers or hubs that directly reach many others.
Example: In a Twitter network, a user with connections to many others (through follows or mentions) would have high degree centrality.
Betweenness Centrality measures how often a node lies on the shortest paths between other nodes. A node with high betweenness centrality acts as a broker or bridge in the network.
What it indicates: Such nodes connect different clusters or sections of the graph. They may not have the most connections, but because they sit on the paths that link others, they can control the flow of information or resources, acting as gatekeepers or intermediaries. The higher the betweenness, the greater the brokerage role.
Example: In a criminal network, a person who links two otherwise separate groups (even with only a few connections themselves) likely has high betweenness – remove that person and the network might fragment.
Closeness Centrality measures how “close” a node is to all others in the network, typically defined as the reciprocal of the total shortest-path distance from that node to all other nodes. A node with high closeness centrality can reach all others quickly (in few hops, on average).
What it indicates: This can identify nodes that are centrally positioned overall (not in a geographical sense, but in network topology). Such nodes could quickly disseminate information to the entire network.
Example: In a social network, someone at the “center” of the friend-of-friend graph (even if they aren’t connected to everyone directly) will have a high closeness score, meaning they are on average a short distance from anyone in the network.
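The three definitions above can be written compactly for an undirected graph G = (V, E). These are standard textbook forms, given as a reference rather than as Gephi's exact implementation; Gephi, like most tools, may report normalized variants (for example, closeness multiplied by n − 1, where n is the number of nodes), so its numbers can differ, especially on disconnected graphs.

```latex
\begin{aligned}
C_D(v) &= \deg(v) \\
C_B(v) &= \sum_{\{s,t\} \subseteq V \setminus \{v\}} \frac{\sigma_{st}(v)}{\sigma_{st}} \\
C_C(v) &= \frac{1}{\sum_{u \in V \setminus \{v\}} d(v,u)}
\end{aligned}
```

Here deg(v) is the number of edges at v, σ_st is the number of shortest paths between s and t, σ_st(v) is the number of those paths passing through v (pairs with no connecting path contribute nothing), and d(v, u) is the shortest-path distance in hops.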
After running these statistics from Gephi's Statistics panel, the results can be used in the Appearance panel to style the graph (e.g., sizing nodes by centrality values). In short, Gephi both visualizes networks and quantifies their structure with built-in centrality measures (degree, betweenness, closeness, and others), which can help investigative analysis.
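To make the three metrics concrete, here is a small, dependency-free Python sketch (not Gephi's own code) that computes degree, closeness, and betweenness for a toy graph, then maps betweenness to node sizes, roughly as a size ranking in Gephi's Appearance panel would. The graph, the helper names, and the 10–50 size range are hypothetical. Betweenness uses Brandes' algorithm; closeness uses the (reachable nodes) / (sum of distances) convention, which may not match the normalization Gephi reports.

```python
from collections import deque

# Hypothetical toy network: two tight triangles joined only through node "X".
EDGES = [
    ("A", "B"), ("B", "C"), ("A", "C"),  # cluster 1
    ("D", "E"), ("E", "F"), ("D", "F"),  # cluster 2
    ("C", "X"), ("X", "D"),              # X is the only bridge
]


def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Degree centrality: number of direct neighbours."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def closeness(adj):
    """Closeness: reachable nodes divided by the sum of hop distances (BFS)."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest s-v paths
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # nodes come off in order of decreasing distance from s
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair {s, t} was counted once from s and once from t.
    return {v: b / 2 for v, b in bc.items()}


def scale_sizes(values, min_size=10.0, max_size=50.0):
    """Linear min-max mapping of a metric onto node sizes."""
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1.0
    return {v: min_size + (x - lo) / span * (max_size - min_size)
            for v, x in values.items()}


if __name__ == "__main__":
    adj = build_graph(EDGES)
    deg, bet, clo = degree(adj), betweenness(adj), closeness(adj)
    sizes = scale_sizes(bet)
    for node in sorted(adj):
        print(f"{node}: degree={deg[node]} betweenness={bet[node]:.1f} "
              f"closeness={clo[node]:.3f} size={sizes[node]:.1f}")
```

Running it shows the pattern described above: X has only two connections, fewer than C or D, yet it has the highest betweenness and closeness because every path between the two clusters passes through it.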
Gephi in Investigative Journalism
Social network analysis has been used to investigate political influence through campaign contributions, social media manipulation (e.g., election interference via coordinated accounts), and criminal or extremist networks. Gephi's network analysis features allow journalists to trace such relationships systematically. Noteworthy examples of network analysis in high-profile cases, with and without Gephi, include:
Panama Papers: The ICIJ’s Panama Papers investigation (2016) involved analyzing a massive trove of offshore financial records. Reporters used network analysis tools, including Gephi, to visualize and explore the web of offshore entities and connections. By treating people and companies as “nodes” and their relationships (e.g., directorships or client links) as “edges,” Gephi helped journalists uncover hidden connections in the data. The case has been cited as an example of how graph visualization enabled the team to trace complex ownership networks and identify key intermediaries in offshore schemes. (Note: ICIJ also used graph databases such as Neo4j and a web interface; Gephi was used for certain analyses and for producing visualization graphics.)
9/11 Terrorist Network Analysis: Shortly after the 2001 attacks, analyst Valdis Krebs mapped the connections between the hijackers and their associates to show how they were interlinked. Krebs’s paper “Mapping Networks of Terrorist Cells” (2002) demonstrated that even though no single terrorist was connected to all the others, the network had focal points (connectors). The analysis predates Gephi (Krebs used the SNA tools available at the time), but it is precisely the kind of investigation Gephi excels at today, and journalists and researchers have since replicated such mapping in Gephi to illustrate terrorist cell structures and identify key influencers. Brant Houston (University of Illinois journalism professor) points to Krebs’s 9/11 network mapping as a tutorial example for anyone learning social network analysis.
They Rule Project: They Rule (2004–2005) is an investigative data visualization project by artist Josh On that mapped the interlocking directorates of major U.S. corporations. Its interactive web interface let users explore how board members overlap between companies, revealing tight networks of corporate governance. The project showed, for instance, that 87 of the top 100 US companies shared board directors, concentrating power within a small elite. They Rule was not built with Gephi (it was a custom web app), but it is often cited alongside network journalism examples for its visualization of power networks, and an investigative journalist could reproduce a similar analysis in Gephi by importing board membership data and visualizing those connections.
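The board-membership analysis suggested above amounts to a bipartite projection: directors and companies form a two-mode network, and linking companies that share a director yields the interlock graph. A minimal sketch, assuming a hypothetical list of (director, company) seats; the function names are illustrative, and the resulting weighted company pairs could then be saved as a Source/Target/Weight edge table for Gephi.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample of (director, company) board seats.
BOARD_SEATS = [
    ("Director 1", "Company A"),
    ("Director 1", "Company B"),
    ("Director 2", "Company B"),
    ("Director 2", "Company C"),
    ("Director 3", "Company A"),
    ("Director 3", "Company B"),
    ("Director 4", "Company D"),
]


def interlocks(seats):
    """Project director-company seats onto company-company edges.

    Two companies are linked once per director who sits on both boards,
    so the edge weight is the number of shared directors.
    """
    boards = defaultdict(set)
    for director, company in seats:
        boards[director].add(company)
    weights = defaultdict(int)
    for companies in boards.values():
        for a, b in combinations(sorted(companies), 2):
            weights[(a, b)] += 1
    return dict(weights)


def share_interlocked(seats):
    """Fraction of companies that share at least one director with another."""
    companies = {company for _, company in seats}
    linked = {c for pair in interlocks(seats) for c in pair}
    return len(linked) / len(companies)


if __name__ == "__main__":
    for (a, b), weight in sorted(interlocks(BOARD_SEATS).items()):
        print(f"{a} -- {b}: {weight} shared director(s)")
    print(f"Interlocked share: {share_interlocked(BOARD_SEATS):.0%}")
```

In this toy sample, Company A and Company B share two directors, Company B and Company C share one, and Company D stays isolated, so three of the four companies are interlocked.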
Cost
Free (open source).
Level of difficulty
Moderate (see Learning Curve).
Learning Curve
Because of its extensive features, Gephi has a moderate learning curve. Beginners can start with basic tutorials and sample datasets to learn the interface and core functions such as layouts, filters, and metrics. A good strategy is to focus on one feature at a time: experiment with layouts to arrange nodes, use filters to simplify complex networks, and apply basic metrics such as centrality to interpret relationships. As they gain confidence, users can explore plugins and advanced features such as time-based visualizations for more tailored analyses.
Gephi has an active user community that can provide help and share tips. In recent years the main place to ask questions and get support has been the Gephi Facebook Group, which effectively replaced the older official forum: the legacy Gephi Forum still exists, but its activity declined around 2018–2019 and new questions are directed to the Facebook group. Gephi’s developers and power users also monitor the GitHub issue tracker.
Requirements
Platforms: Windows 10/11, macOS (Intel & Apple Silicon), Linux (desktop). Supported systems
Java runtime: Bundled since 0.9.3; you don’t install Java separately. Docs → Troubleshooting
Install methods: Windows/macOS installers from the official site (Gephi Desktop); Linux: official Snap package or Flathub package. Snapcraft, Flathub
Auth/tokens: None (local desktop app).
Supported modules/features:
Importers: CSV (nodes/edges), GEXF, GraphML, GDF, Pajek NET, GraphViz DOT, UCINET DL, Netdraw VNA, spreadsheets. FAQ: Supported formats, CSV import doc, GraphML import doc
Layouts: ForceAtlas2 (the force-directed layout developed by the Gephi team), Fruchterman‑Reingold, Yifan Hu. Quickstart (ForceAtlas2), ForceAtlas2 paper
Metrics/Statistics: Degree, betweenness/closeness (computed by the Network Diameter statistic), modularity/community detection, average path length, clustering coefficient. FAQ: Betweenness via Network Diameter, Toolkit Javadoc (Modularity)
Filtering/Queries: Attribute and topology filters; interactive selection. Quickstart
Export: PNG/SVG/PDF images; GEXF and other graph files; multi‑export since 0.10.0. 0.10.0 highlights
Plugins: Install from Tools → Plugins (Plugin Center); compatibility is indicated by Gephi version. Plugin Center, Plugin quick start
Programmatic use: Gephi Toolkit 0.10.0 (released 2023‑03‑08). Toolkit releases
Optional dependencies: A GPU and driver with stable OpenGL support improve interactivity; outdated or virtualized graphics can cause rendering issues.
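Because Gephi does not collect data itself, a network usually has to be assembled elsewhere and saved in one of the importable formats listed above. The sketch below writes the same hypothetical network twice: as node and edge CSV tables using the column names Gephi's spreadsheet importer recognizes (Id and Label for nodes; Source, Target, Type, and Weight for edges), and as a minimal GEXF 1.2 file, which is also useful for Gephi Lite since it lacks CSV import. The sample data, the file names, and the extra Kind column are assumptions; check the official CSV import documentation for the full column conventions.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample network: (id, label, kind) nodes, (source, target, weight) edges.
NODES = [
    ("p1", "Person A", "person"),
    ("p2", "Person B", "person"),
    ("c1", "Company X", "company"),
]
EDGES = [
    ("p1", "c1", 1),
    ("p2", "c1", 2),
]
GEXF_NS = "http://www.gexf.net/1.2draft"


def csv_tables(nodes, edges):
    """Return (nodes_csv, edges_csv) text laid out for Gephi's spreadsheet importer."""
    nodes_buf, edges_buf = io.StringIO(), io.StringIO()
    node_writer = csv.writer(nodes_buf)
    node_writer.writerow(["Id", "Label", "Kind"])  # extra columns such as Kind become attributes
    node_writer.writerows(nodes)
    edge_writer = csv.writer(edges_buf)
    edge_writer.writerow(["Source", "Target", "Type", "Weight"])
    for source, target, weight in edges:
        edge_writer.writerow([source, target, "Undirected", weight])
    return nodes_buf.getvalue(), edges_buf.getvalue()


def to_gexf(nodes, edges):
    """Build a minimal undirected GEXF 1.2 document (node attributes omitted for brevity)."""
    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(root, "graph", {"defaultedgetype": "undirected"})
    node_list = ET.SubElement(graph, "nodes")
    for node_id, label, _kind in nodes:
        ET.SubElement(node_list, "node", {"id": node_id, "label": label})
    edge_list = ET.SubElement(graph, "edges")
    for i, (source, target, weight) in enumerate(edges):
        ET.SubElement(edge_list, "edge", {
            "id": str(i), "source": source, "target": target, "weight": str(weight),
        })
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    nodes_csv, edges_csv = csv_tables(NODES, EDGES)
    for path, text in (("nodes.csv", nodes_csv), ("edges.csv", edges_csv)):
        # newline="" keeps the csv module's own line endings intact.
        with open(path, "w", newline="", encoding="utf-8") as fh:
            fh.write(text)
    with open("network.gexf", "w", encoding="utf-8") as fh:
        fh.write('<?xml version="1.0" encoding="UTF-8"?>\n' + to_gexf(NODES, EDGES))
```

The CSV pair keeps the Kind attribute, which can later drive node colors; the GEXF version drops it only to stay short, since GEXF attributes need their own declarations.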
Limitations
Scale & performance: Interactivity can degrade on very large graphs, so careful styling and filtering are often needed. (Example: a user report of a graph with ~384k nodes and 9.4M edges where the UI became nearly unresponsive.) GitHub issue
Updates: Only patch versions auto‑update; major updates require manual download. Gephi 0.10.0 announcement
Plugins: Not all community plugins are updated for 0.10.x; check the Plugin Center’s compatibility tags before installing. Plugin Center
No built‑in data collection: You must build your network (APIs, exports, scraping) before importing into Gephi. Quickstart
Gephi Lite differences: Gephi Lite (web) is a separate application and currently lacks CSV import (GraphML/GEXF only); it is useful for quick viewing rather than full desktop analysis. Gephi Lite issue with CSV
Legal/ToS: Importing personal-data graphs may trigger data-protection and platform-ToS obligations; ensure lawful sources/processing.
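One way to work around the scale limitation above is to prune a graph before it reaches Gephi. A common technique is the k-core: repeatedly remove nodes with fewer than k connections until every remaining node has at least k neighbours within the remaining graph. The plain-Python sketch below assumes an undirected, unweighted edge list; the sample edges and k values are hypothetical. Gephi's own filter panel offers comparable topology filters (such as degree range), so pre-filtering mainly helps when the full graph is too large to load comfortably. Metrics such as betweenness change on a pruned graph, so compute them on the full network if they matter to the story.

```python
from collections import deque

# Hypothetical edges: a triangle (A, B, C) with a dangling chain C-D-E.
EDGES = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]


def build_graph(edges):
    """Undirected adjacency sets; self-loops are ignored."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def k_core(adj, k):
    """Return the k-core as a new adjacency dict (the input is not modified)."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    queue = deque(v for v, nbrs in adj.items() if len(nbrs) < k)
    removed = set()
    while queue:
        v = queue.popleft()
        if v in removed:  # a node can be queued more than once
            continue
        removed.add(v)
        for w in adj[v]:
            adj[w].discard(v)
            if w not in removed and len(adj[w]) < k:
                queue.append(w)
    return {v: nbrs for v, nbrs in adj.items() if v not in removed}


if __name__ == "__main__":
    full = build_graph(EDGES)
    for k in (1, 2, 3):
        core = k_core(full, k)
        print(f"{k}-core keeps {len(core)} of {len(full)} nodes: {sorted(core)}")
```

With k = 2 the chain peels away (E first, then D) and only the triangle remains; with k = 3 nothing survives, which shows how quickly a high threshold can empty a sparse graph.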
Ethical Considerations
Use network visualization lawfully and proportionately, minimizing collection and retention of personal data and avoiding harm (e.g., doxxing, exposing sensitive relationships). For methodology and evidentiary handling, see the Berkeley Protocol on Digital Open Source Investigations (OHCHR/UC Berkeley) for standards on identification, collection, verification, and preservation of digital open‑source information. OHCHR Berkeley Protocol hub
Data integrity is crucial for Gephi users, because the accuracy and reliability of network visualizations depend directly on the quality of the input data. For investigative journalism, any insights or patterns revealed through Gephi's analysis are only as trustworthy as the data behind them. Poor data quality, such as incomplete records, unverified sources, or outdated information, can produce misleading visualizations that misrepresent relationships or distort the importance of specific nodes; for example, if the same person appears under two spellings, their connections are split across two nodes and both look less central than the person really is. Gephi users should therefore verify data sources, validate accuracy, and cross-check information before visualizing it. High data integrity strengthens the credibility of the analysis, supports responsible storytelling, and helps prevent the spread of misinformation by keeping network insights grounded in well-vetted data.
Guides
Quickstart (official): basic import → layout → style → export workflow. gephi.org/quickstart
Importing CSV: node/edge table expectations and wizard. docs.gephi.org/…/Import/CSV_Format
Importing GraphML: format support and caveats. docs.gephi.org/…/Import/GraphML_Format
0.10.0 release notes (official blog): Apple Silicon support, better search, multi‑export; patch‑only auto‑updates. Gephi blog, 2023‑01‑09
ForceAtlas2 paper: canonical description of ForceAtlas2, the force-directed layout developed by the Gephi team. PLOS ONE (2014)
Official Wiki
https://github.com/gephi/gephi/wiki
Complete Beginners
Levallois, C. (2017, January 20). Simple Gephi Project from A to Z. https://seinecle.github.io/gephi-tutorials/generated-html/simple-project-from-a-to-z-en.html
General / Advanced / Multi-Language
Levallois, C. (2024, November 27). Gephi Tutorials. https://seinecle.github.io/gephi-tutorials/
Grandjean, M. (2024). Gephi. Retrieved November 30, 2024, from https://www.martingrandjean.ch/gephi/ (tutorials, including 30 Gephi examples)
Videos
Martin Grandjean. (2022, September 21). GEPHI - Introduction to Network Analysis and Visualization (Tutorial) [Video recording]. https://www.youtube.com/watch?v=GXtbL8avpik
Journalism-Specific
Global Investigative Journalism Network (Director). (2023, September 30). GIJC23—Using Social Network Analysis for Investigations [Video recording]. https://www.youtube.com/watch?v=-D8E8JY86b4
Books
Cherven, K. (2015). Mastering Gephi Network Visualization. Packt Pub Ltd.
Gephi Cookbook. (n.d.). Packt. Retrieved November 10, 2024, from https://www.packtpub.com/en-us/product/gephi-cookbook-9781783987405?type=print
Barabási, A.-L. (2016). Network Science. Cambridge University Press. http://networksciencebook.com/ (free online edition; highly recommended)
Open Datasets
Datasets. GitHub. Retrieved November 30, 2024, from https://github.com/gephi/gephi/wiki/Datasets
ASNR (Animal Social Network Repository): Animal Network Data. Retrieved November 30, 2024, from https://bansallab.github.io/asnr/data.html (ASNR aims to assemble a comprehensive index of real-world animal interaction datasets across all taxa, limited to high-value, peer-reviewed data.)
Comparison with similar software
NodeXL: NodeXL is an add-in for Microsoft Excel that provides network analysis and visualization within a spreadsheet interface. It is Windows-only (because it hooks into Excel) and comes in a free “NodeXL Basic” version and a paid Pro version. Users import edge lists into Excel, and NodeXL generates graphs from those tables. This makes data easy to edit (you can use Excel formulas, for example, to compute node attributes). Brant Houston explained that NodeXL’s integration with Excel makes it very simple for beginners who are comfortable with spreadsheets. It suits quick analyses of small to medium-sized networks but may struggle with large graphs, and its advanced visualization customization and real-time manipulation are more limited than Gephi’s. NodeXL has a gentler learning curve and even offers built-in social media data importers (in the Pro version), but lacks Gephi’s visual polish and plugin extensibility. (Some workflows use NodeXL to gather or preprocess data and then Gephi to fine-tune the visualization.)
Palladio: Palladio is a web-based network visualization tool developed at Stanford’s Humanities + Design lab. It runs entirely in the browser, with no installation required, and is geared towards historians and humanists exploring complex historical datasets. Palladio is described as a “simple but powerful exploratory data visualization tool” focused on ease of use: you upload spreadsheet data (nodes and links) and interactively create network views, maps, and timelines. It is well suited to quickly visualizing a dataset and finding patterns without coding. However, because it runs in the browser and is meant for lightweight use, it can become slow or unstable with very large datasets. It has not seen active development in several years but is still used in digital humanities classrooms to introduce network analysis before students move to more comprehensive tools. Compared to Gephi, Palladio is less feature-rich: it does not compute advanced network metrics or offer extensive styling options.
PyVis: PyVis is a Python library for interactive network visualization. It generates network graphs in Python and outputs them as an HTML page, using the JavaScript library vis.js under the hood; in effect, it brings the interactivity of vis.js to Python users, who can script a network visualization and view it in a web browser. PyVis is not a GUI tool and requires writing Python code. It works well with Jupyter notebooks: you create a Network object, add nodes and edges, and then display the interactive network in the notebook or export it to an HTML file. The resulting web-based visualization lets viewers pan, zoom, and click on nodes for details. PyVis offers flexibility for developers, who can automate tasks and integrate it into Python data analysis pipelines, but it is less user-friendly for non-coders. Because rendering happens in the browser, very large networks may hit memory or speed limits, as with any web-based visualization, whereas Gephi's OpenGL rendering may cope better with larger graphs. PyVis also does not compute SNA metrics itself: you would use a Python library such as NetworkX for the analysis and PyVis purely for visualization. This makes the two tools complementary: Gephi for point-and-click exploration and PyVis for scripted, shareable interactive diagrams built with a few lines of code.
Neo4j (with Datashare Plugin): Neo4j is fundamentally different from the tools above: it is a graph database rather than a dedicated visualization tool, optimized for storing and querying graph data (nodes and relationships) and for managing very large, complex networks. Users can run complex queries in its query language, Cypher, to find patterns, shortest paths, sub-networks, and so on. In practice, one might use Neo4j to crunch the data (find communities, run graph algorithms, and handle millions of records), then use a visualization front-end (such as Gephi, Neo4j’s own Bloom and Browser interfaces, or Linkurious) to visualize the results. The Neo4j Browser GUI can display query results as a node-link diagram, but these views are not as customizable as Gephi’s visualizations. A key difference is that Gephi works on a snapshot of data you load into it (good for visual exploration), whereas Neo4j is a continuously running database that can be updated and queried in real time (good for dynamic or very large datasets that must be sifted systematically). Neo4j and Gephi are therefore not an either-or choice and often complement each other: Gephi for interactive visual analysis, Neo4j for data storage and algorithmic analysis. Note also that Neo4j is not open source in all its editions (the Community edition is open source, while enterprise features are commercial), whereas Gephi is fully open source. For an investigator, choosing Neo4j would depend on needing to handle huge networks or integrate the graph with other systems; choosing Gephi would be about interactive exploration and presentation-quality visuals.
Tool provider
Developer/org: Gephi project / Gephi Consortium (open‑source community; CTO: Mathieu Bastian). Web presence: gephi.org/about, GitHub: gephi/gephi
License: CDDL‑1.0 OR GPL‑3.0‑only (dual licensing). License texts: cddl‑1.0.txt, gpl‑3.0.txt, and repo license note
Advertising Trackers
Martin Sona