Lumen
A research project collecting and publishing legal takedown notices for online content transparency
URL
Description
Lumen is a Harvard‑affiliated research database of legal complaints and content‑removal requests (e.g., DMCA, defamation, court orders) submitted to online services. As of mid‑2025 it hosts ≈43 million notices that reference almost 10 billion URLs. Investigators can search notices by keywords, entities (sender/recipient/principal), topics, and jurisdictions; exact‑phrase searching is available by quoting terms. A researcher login adds a per‑notice “watch” feature that emails updates when new documents (e.g., follow‑up court orders) are added.
Key use cases and features:
Notices Repository: Lumen archives notices covering DMCA claims, defamation, privacy, trademark, and court orders. Each notice identifies the sender and recipient (who requested the removal and which hosting or search service received the request), summarizes the stated grounds for the request, and lists the URLs at issue.
Search and Filtering: full‑text query plus facets (notice type, sender, recipient, date, language, etc.). Since June 2024, users can run exact‑match searches by quoting a phrase and, when logged in as researchers, add individual notices to a personal “watch” list to be alerted to updates.
API for Advanced Research: Researchers and investigative journalists can obtain API credentials to automate queries for large-scale data analysis. The API supports:
Searching by keywords, date ranges, parties involved, etc.
Retrieving entire notices as JSON for customized analytics.
Programmatic data collection over time to identify trends in takedown requests.
Lumen’s overarching goal is to bring transparency to the ecosystem of online content removal requests. It aims to be an independent, research-driven clearinghouse of takedown notices, allowing journalists, researchers, and the public to see who is requesting that specific web content or links be taken down and why. Lumen does not validate or endorse these requests; rather, it archives them to facilitate academic study, watchdog journalism, and informed civic debate.
When/Why a Researcher or Journalist Might Use Lumen
Investigating Patterns of Censorship or Overreach:
Example: A politician repeatedly using DMCA notices to remove critical blog posts.
Outcome: You can uncover whether the same entity has filed multiple notices across different platforms to silence certain views.
Checking the Legitimacy of a Takedown Claim:
Example: You receive a tip that content on YouTube was flagged for copyright infringement, but you suspect it’s fair use.
Outcome: A search of Lumen might reveal a DMCA notice that either lacks a credible claim or is suspiciously similar to notices flagged as fraudulent.
Examining Geopolitical or Government Interventions:
Example: A ministry in Country X demands the removal of “defamatory” content from Google or a social network.
Outcome: Lumen’s archive can reveal the scope of government requests, including how often and for what reasons they are made.
Researching Corporate Takedown Practices:
Example: You want to see if a major film studio issues an unusually high number of DMCA requests for minor social-media posts.
Outcome: By aggregating data in Lumen, patterns may emerge, helping you question potentially excessive takedowns.
What Lumen Shows

Full or partially redacted copies of notices:
Sender (often a rights-holder, law firm, or government agency)
Recipient (Google, Vimeo, Medium, or others)
Target URLs and reason (e.g. copyright, defamation, trademark)

Contextual metadata:
Date sent/received
Claimed legal grounds (DMCA, local law, court order)
Whether the recipient took action (partial or none)
Research Tools:
Searchable indexes and filtering (dates, keywords, notice type)
API access for data mining and in-depth analysis
Example of a Basic API Query
Below is a short sample of how a journalist or researcher might retrieve data via Lumen’s API (assuming they have an API token and some familiarity with command-line tools like curl). This example queries for notices that contain the term “fraud” in their text or metadata, from a date range, and returns the first page of JSON results.
# 1. Replace YOUR_API_TOKEN with the token you received from Lumen
# 2. Adjust the date range (Unix time in milliseconds) and term as needed.
curl -H "User-Agent: MyResearchBot/1.0" \
-H "Accept: application/json" \
-H "X-Authentication-Token: YOUR_API_TOKEN" \
"https://lumendatabase.org/notices/search.json?term=fraud&date_received_facet=1672531200000..1704067200000&page=1"
Explanation:
term=fraud: Searches for notices containing the word “fraud” in their text.
date_received_facet=1672531200000..1704067200000: Limits results to notices received between two Unix timestamps in milliseconds (here 01 Jan 2023 to 01 Jan 2024, UTC).
page=1: Retrieves the first page of results.
-H "X-Authentication-Token: YOUR_API_TOKEN": Authenticates you as a recognized researcher.
-H "Accept: application/json": Requests responses in JSON, which is easily parsed by scripts or data-analysis tools.
You will receive a JSON object listing the first page of matching takedown notices, each with fields like id, title, date_received, sender_name, and more. By iterating through page values, or narrowing your date range, you can gather larger sets of notices over time.
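The paging described above can be sketched as a small Bash loop. This is a minimal sketch, not an official client: build_search_url and fetch_pages are hypothetical helper names, LUMEN_API_TOKEN is an assumed environment variable, and the 2-second pause is an arbitrary courtesy delay rather than a documented rate limit. Only the term, date_received_facet, and page parameters from the example above are used, and the term is inserted as-is, so it must already be URL-safe.

```shell
#!/usr/bin/env bash

# Hypothetical helper: builds a search URL from the same three parameters
# used in the curl example (term, date_received_facet, page).
build_search_url() {
  local term="$1" start_ms="$2" end_ms="$3" page="$4"
  printf 'https://lumendatabase.org/notices/search.json?term=%s&date_received_facet=%s..%s&page=%s\n' \
    "$term" "$start_ms" "$end_ms" "$page"
}

# Hypothetical helper: fetches pages 1..max_pages and saves one JSON file per page.
# Stops at the first failed request (curl --fail returns non-zero on HTTP errors).
fetch_pages() {
  local term="$1" start_ms="$2" end_ms="$3" max_pages="$4" page
  for (( page = 1; page <= max_pages; page++ )); do
    curl -sS --fail \
      -H "User-Agent: MyResearchBot/1.0" \
      -H "Accept: application/json" \
      -H "X-Authentication-Token: ${LUMEN_API_TOKEN:?set LUMEN_API_TOKEN first}" \
      -o "notices_page_${page}.json" \
      "$(build_search_url "$term" "$start_ms" "$end_ms" "$page")" || return 1
    sleep 2  # assumed courtesy delay between requests
  done
}
```

For instance, running fetch_pages fraud 1672531200000 1704067200000 3 would save the first three result pages as notices_page_1.json through notices_page_3.json for later analysis.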
Cost
API access is free for research purposes; tokens are issued on request to the Lumen team and use is governed by the API Terms of Use.
Level of difficulty
Beginner-Friendly (Web Search) to Moderate (API Research)
Basic searching requires only a web browser and minimal familiarity with search filters.
API usage or large-scale analysis requires intermediate data-handling or scripting skills (JSON, command line, etc.).
Requirements
Website Usage: No registration is needed to browse notices with truncated URLs; an email-based request is needed to view unredacted URLs or attachments.
API Usage: Researchers must apply for an API key by emailing Lumen’s team with a brief description of intended usage.
Limitations
Coverage depends on participating platforms. Notable gaps and changes: YouTube does not currently share copies of notices; Twitter/X paused data‑sharing on 2023‑04‑15; Automattic/WordPress is “not currently sending”; GitHub has paused sharing; Stack Exchange stopped in 2017.
Redacted Fields: Personally identifying information and entire text explanations may be redacted. For unregistered visitors, full URLs are truncated.
No Bulk Export via Website: For large-scale or automated retrieval, you must use the API.
Date & Result Limits: Large, unfiltered queries can hit the 10 000‑result cap, so slice by date when pulling data. The database now grows by more than 200 k notices per week.
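Date-slicing can be sketched in Bash as follows. This is a minimal sketch under stated assumptions: to_epoch_ms and monthly_windows are hypothetical helper names, GNU date is assumed (BSD/macOS date uses different flags), and one calendar month per window is an arbitrary choice that may still exceed the cap for broad terms. Each printed window has the start..end millisecond form used by date_received_facet in the API example.

```shell
#!/usr/bin/env bash

# Hypothetical helper: converts a UTC calendar date (YYYY-MM-DD) to
# Unix time in milliseconds. Requires GNU date.
to_epoch_ms() {
  echo "$(( $(date -u -d "$1" +%s) * 1000 ))"
}

# Hypothetical helper: prints one "start_ms..end_ms" window per calendar
# month from the first date up to (not including) the second date.
# Both arguments should be the first day of a month, e.g. 2023-01-01.
monthly_windows() {
  local cur="$1" end="$2" next
  # ISO dates sort lexicographically, so a string comparison is enough here.
  while [[ "$cur" < "$end" ]]; do
    next="$(date -u -d "$cur +1 month" +%F)"
    echo "$(to_epoch_ms "$cur")..$(to_epoch_ms "$next")"
    cur="$next"
  done
}
```

Each window can then be passed as the date_received_facet value of a separate query. Because adjacent windows share a boundary timestamp, it is safest to deduplicate the combined results by notice id.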
No Guarantee of Accuracy: Lumen does not confirm or endorse the validity of a notice; some notices may be fraudulent or contain misinformation.
Google omits sender names from defamation notices for privacy reasons, so those fields will read “Redacted”.
Although the software is GPL-2.0, individual notice texts remain under the terms set by the submitter; bulk redistribution of raw data may require permission.
Ethical Considerations
Potential Privacy Risks: Notices sometimes include sensitive info (names, allegations, etc.). Even though Lumen redacts personal data, some details may still appear in the body or attachments. Handle carefully.
Possibility of Misuse: Some takedown requests are abusive or ‘fake DMCA’ attempts, aiming to silence speech or censor legitimate content.
Caution with Publication: If you cite Lumen notices, consider verifying with additional sources. A notice alone is not proof of wrongdoing or infringement.
Guides and articles
Official API Docs (GitBook, Jan 2025) – berkman-klein-center.gitbook.io/lumen-database
Steve Vondran: How to track Trump Twitter Takedowns on the Lumen database*, YouTube.
*Certain elements of the user interface may have been altered since publishing.
Dan Morrill: DMCA notice BPI Part 2 Empty Claims, YouTube.
Tool provider
The Berkman Klein Center for Internet & Society at Harvard University: a research-focused center studying the intersection of law, technology, and society.
Current major contributors (2025): Google Search, YouTube, Meta (Facebook & Instagram), GitHub, Reddit, Wikipedia, Medium, Vimeo, DuckDuckGo, WordPress, University of California systems. Twitter/X is not currently contributing, and some listed services have paused or limited sharing (see Limitations).
Advertising Trackers
Martin Sona