Saturday, January 31, 2026

Clinical Trial Automation: The Naming Conventions Challenge

Is cross-system clinical trial automation failing not because workflows cannot be executed, but because protocol terminology cannot be interpreted consistently across systems and clinical studies?


Discussions about automating clinical trials often focus on advanced technologies such as workflow engines, interoperability standards, and AI/ML. In practice, cross-system automation most often breaks much earlier, at a far more basic level: inconsistent naming conventions.

Clinical protocols are written in natural language, where the same term can legitimately carry different meanings depending on study design, therapeutic area, or regulatory intent. Software systems, by contrast, assume that identifiers are stable, explicit, and unambiguous. This mismatch creates friction long before questions of execution logic or governance arise.

Several academic and industry initiatives have proposed encoding protocol elements (eligibility criteria, visits, milestones, consent states) into structured system logic. While technically feasible, these efforts consistently encounter a semantic problem: the protocol terms being encoded do not have a single, invariant meaning. Instead, their interpretation is context-dependent and often becomes clear only when the protocol is implemented across multiple systems such as CTMS, EDC, TMF, and finance. Automation struggles not because systems cannot execute logic, but because they cannot reliably infer what a term is supposed to mean.

A well-known example is FPI (First Patient In). On the surface, FPI appears to be a simple milestone, frequently used in project plans, dashboards, and sponsor communications. In reality, its meaning varies widely across protocols. In some studies, FPI refers to the date of informed consent; in others, it marks the first screening procedure, the first randomization, or the first dose administration. Each interpretation is valid within its own protocol context. However, when FPI is mapped into systems, problems emerge. CTMS may treat FPI as a project milestone, finance may use it to trigger budget recognition, EDC may infer it from visit data, and TMF may never record it explicitly at all. The same label points to different underlying events, none of which is universally “correct.”

| Interpretation of FPI | Typical system usage | Practical implication |
|---|---|---|
| First informed consent signed | CTMS, project timelines | FPI occurs before any clinical data exists; operational start is assumed early |
| First screening procedure performed | EDC, visit scheduling | FPI inferred indirectly from visit data rather than explicitly recorded |
| First randomization | IRT / RTSM | FPI tied to eligibility confirmation, not patient contact |
| First dose administered | EDC, safety reporting, finance | FPI becomes a safety-relevant and sometimes financial milestone |
| Protocol-specific composite definition | TMF, regulatory narrative | Meaning clarified only through documentation and justification |

From a systems perspective, this ambiguity is toxic. Automation assumes that a term like FPI corresponds to a specific event or timestamp. When it does not, logic either becomes hard-coded (and wrong for some protocols), or manual overrides and reconciliations proliferate. The issue is not a lack of interoperability, but the absence of shared semantic agreement about what the term represents in a given study. One word in the protocol can silently mean different things across systems, vendors, and stakeholders, forcing humans to continuously translate intent.
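
To make this concrete, here is a minimal, hypothetical sketch (the event names, dates, and system mappings are invented for illustration, not drawn from any real protocol or product) of three systems each deriving "FPI" from the event it happens to track:

```python
from datetime import date

# Hypothetical event log for one study; field names and dates are illustrative only.
subject_events = [
    {"subject": "001", "event": "informed_consent", "date": date(2026, 1, 5)},
    {"subject": "001", "event": "screening_visit",  "date": date(2026, 1, 9)},
    {"subject": "001", "event": "randomization",    "date": date(2026, 1, 20)},
    {"subject": "001", "event": "first_dose",       "date": date(2026, 1, 22)},
]

def first_event_date(events, event_type):
    """Earliest date of a given event type across all subjects."""
    dates = [e["date"] for e in events if e["event"] == event_type]
    return min(dates) if dates else None

# Each system hard-codes its own reading of "FPI".
fpi_ctms    = first_event_date(subject_events, "informed_consent")  # 2026-01-05
fpi_edc     = first_event_date(subject_events, "screening_visit")   # 2026-01-09
fpi_finance = first_event_date(subject_events, "first_dose")        # 2026-01-22

print(fpi_ctms, fpi_edc, fpi_finance)  # three different "FPI" dates for the same study
```

Each answer is internally consistent, yet a dashboard aggregating them would show three different study-start dates for the same trial.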


This pattern repeats across many commonly used terms: Screening, Baseline, End of Treatment, Study Completion. Each appears simple, yet each acquires meaning only through protocol context. As long as protocols rely on natural language without enforced naming conventions and explicit semantic definitions, clinical systems will require protocol-specific configuration and manual interpretation. Automation, in this sense, does not fail because it is too ambitious, but because it assumes linguistic precision where none formally exists.

A more realistic path forward is not to eliminate interpretation, but to recognize naming conventions as a first-order design problem in clinical research systems. Without stable, protocol-specific definitions of key terms, no amount of technical sophistication can deliver reliable automation. Before workflows can be executed, meanings must be agreed upon. In clinical research, that work remains largely manual and unavoidably so.

A practical solution is to treat FPI not as a single milestone, but as a small family of explicitly qualified events, such as FPI(s) for first screening or FPI(e) for eligibility confirmed, each with a defined meaning that can be implemented consistently across systems. For example:

| Identifier | Meaning                 |
|------------|-------------------------|
| FPI(s)     | First patient screened  |
| FPI(c)     | First informed consent  |
| FPI(e)     | Eligibility confirmed   |
| FPI(r)     | First randomization     |
| FPI(d)     | First dose administered |
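
Building on the table above, a minimal sketch of how such qualified identifiers could be represented in software is shown below. The class names, event labels, and the protocol_xyz definitions are hypothetical; the point is only that each qualified milestone gets exactly one protocol-level definition, which every downstream system resolves the same way.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum

class FPIKind(Enum):
    """Qualified FPI identifiers; labels follow the table above."""
    FPI_S = "FPI(s)"  # first patient screened
    FPI_C = "FPI(c)"  # first informed consent
    FPI_E = "FPI(e)"  # eligibility confirmed
    FPI_R = "FPI(r)"  # first randomization
    FPI_D = "FPI(d)"  # first dose administered

@dataclass
class MilestoneDefinition:
    kind: FPIKind
    source_event: str   # event type that defines this milestone in this protocol
    source_system: str  # system of record, e.g. "EDC" or "IRT"

# Per-protocol definitions, agreed once and then reused by every downstream system.
protocol_xyz_definitions = {
    FPIKind.FPI_C: MilestoneDefinition(FPIKind.FPI_C, "informed_consent", "EDC"),
    FPIKind.FPI_R: MilestoneDefinition(FPIKind.FPI_R, "randomization", "IRT"),
    FPIKind.FPI_D: MilestoneDefinition(FPIKind.FPI_D, "first_dose", "EDC"),
}

def resolve_milestone(kind, definitions, events):
    """Return the earliest date of the event that defines this qualified milestone."""
    definition = definitions[kind]
    dates = [e["date"] for e in events if e["event"] == definition.source_event]
    return min(dates) if dates else None
```

Under a scheme like this, a CTMS dashboard and a finance trigger would both request FPI(d) rather than a bare FPI, so their answers can diverge only if the protocol-level definition itself changes.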

The absence of explicit, protocol-qualified naming conventions such as FPI(s) or FPI(e) is not due to technical limitations, but to the lack of a designated owner for protocol semantics across clinical systems.


Friday, January 30, 2026

Blockchain to Clinical Trial Automation – What Are the Obstacles?

Why have promising concepts not translated into practical implementation? Lessons learned.

The idea of using blockchain to automate clinical trials emerged from a compelling analogy: if financial contracts can be expressed as executable smart contracts, why not clinical protocols? 

Wednesday, January 28, 2026

Project Management and Modular Outsourcing

Platforms, GenAI, and modular services. How can these developments impact project management?

Work organization is evolving across many knowledge-intensive industries. Digital platforms, GenAI-assisted production, and global access to qualified specialists are changing how services are offered, priced, and evaluated. This development is visible even in domains traditionally characterized by strong regulation and institutional structures, like clinical research. Rather than asking whether regulated projects are “moving” to open platforms, a more precise question is:

Monday, January 26, 2026

Swissmedic Submissions and Project Timelines: Why Approval Speed Determines Your Study Start

Clinical trial timelines in Switzerland are tightly linked to the efficiency of submissions to Swissmedic.

Delays at this stage do not remain confined to regulatory milestones. They cascade into site activation, contracts, budgets, and overall project duration.

From a project-management perspective, Swissmedic approval is frequently on the critical path. Even minor, avoidable issues (missing annexes, outdated guidance, inconsistent documents) can postpone study start by weeks or months.

Sunday, January 25, 2026

Longevity as a Project

What changes if we consider life as a project?


Projects rarely fail because of one catastrophic event. They fail because small deviations accumulate, risks go unnoticed, and performance is not tracked against expectations. Aging may follow a similar pattern.

Instead of asking “How do we defeat aging?”, we can ask a different question:

What if we consider life as a long-running project, and manage it using basic project control logic?

This reframing does not promise immortality. It proposes something more modest and potentially more useful: governance of healthspan over time.

Project management analogy: Planned, Actual, and Longevity Value

Friday, January 23, 2026

Innovative Mobile Platforms for Clinical Research and Evidence Generation

Advances in mobile technology have opened new pathways for clinical research that go beyond traditional site-centric models. Mobile apps now play a growing role in study visibility, participant engagement, and data collection, both within regulated clinical trials and in healthy-population research.

Rather than replacing clinical trial systems, these platforms operate at a different layer:
they connect people to research, standardise how participation occurs, and enable more continuous, real-world data capture. Over time, this may support more structured datasets, easier study access, and more reliable evidence for analysis and decision-making.

Below are two groups of mobile platforms illustrating this shift.

Mobile Platforms Supporting Clinical Trial Participation

(Patient-facing, trial-specific apps)

Science 37 — Studies App

Focus: Decentralized and hybrid clinical trials

Science 37’s Studies App allows participants to discover, consent to, and take part in clinical trials remotely. The platform reduces dependence on physical sites and lowers participation barriers, particularly for patients who would otherwise not have access to research centers.

Thursday, January 22, 2026

"Fit for Purpose" in Clinical Research

One trial, one purpose or many stakeholder perspectives?

The expression “fit for purpose” appears 14 times in ICH GCP E6(R3). This repetition is not accidental. It signals a deliberate regulatory shift away from rigid, one-size-fits-all compliance toward a more contextual, risk-based understanding of quality, proportionality, and oversight in clinical research.

At the same time, the phrase itself is deceptively simple. According to the Cambridge Dictionary, fit for purpose means:

“Suitable and good enough to do what it is intended to do.”

Fit for Purpose in ICH GCP E6(R3)

Within ICH GCP E6(R3), fit for purpose is used to describe a wide range of elements, including trial processes, quality management systems, oversight mechanisms, data handling approaches, and supporting technologies. Across all these contexts, the underlying principle is consistent: systems and processes should be appropriate for their intended use, proportionate to risk, and focused on what truly matters for participant safety and reliable decision-making.

Wednesday, January 21, 2026

"Quality by Design" in Clinical Trials

What is Quality by Design according to ICH GCP E6(R3)? 

E6(R3) Good Clinical Practice (GCP) places Quality by Design as a key and recurring principle across the guideline. The concept is introduced in the context of clinical study design, risk identification, and planning, and is reinforced at multiple points throughout the document.

Tuesday, January 20, 2026

FDA Warning Letters to Investigators: Tutorial Examples

FDA warning letters are most often associated with manufacturing or product-related issues, but they can also be issued directly to clinical investigators when significant regulatory concerns are identified during inspections. 

These letters are typically issued months after the underlying events, following:

  • an on-site inspection,

  • documented observations,

  • and review of written responses provided by the investigator or site.

As a result, a warning letter may appear long after the clinical activity in question has already concluded.

Here are two examples from December 2025:

  • Example 1  - https://www.fda.gov/inspections-compliance-enforcement-and-criminal-investigations/warning-letters/purushothaman-damodara-kumaran-md-721325-12222025
  • Example 2 - https://www.fda.gov/inspections-compliance-enforcement-and-criminal-investigations/warning-letters/devalingam-mahalingam-md-phd-721145-12112025

What such letters generally indicate

Monday, January 19, 2026

The Ten "Commandments" of AI in Drug Development

Guiding Principles of Good AI Practice in Drug Development, January 2026 (https://www.fda.gov/media/189581/download)


In January 2026, the U.S. Food and Drug Administration, together with international regulatory partners, published ten principles for Good AI Practice in Drug Development:

  1. Human-centric by design

  2. Risk-based approach

  3. Adherence to standards

  4. Clear context of use

  5. Multidisciplinary expertise

  6. Data governance and documentation

  7. Model design and development practices

  8. Risk-based performance assessment

  9. Life cycle management

  10. Clear, essential information

These principles are presented as a foundation for further work rather than a finalized implementation framework. They describe what regulators consider important when AI is used to generate evidence across the drug product life cycle, without defining how these expectations should be met in specific technical or organizational settings.

At this stage, the document is intentionally high-level. Practical interpretation and operationalization will likely evolve through continued dialogue between regulators, industry, standards bodies, and technology developers.

https://www.raps.org/news-and-articles/news-articles/2026/1/ema-fda-issue-joint-ai-guiding-principles-for-drug

Familiar Concepts in a New Context

Many of the principles may sound familiar to professionals working with established clinical trial systems such as EDC, CTMS, or eTMF platforms. Concepts like risk-based approaches, lifecycle management, data governance, documentation, and adherence to standards are already part of everyday regulatory practice.

What appears different is not the concepts themselves, but the context in which they are now being emphasized. When AI is introduced, familiar expectations are applied to technologies that may behave differently from traditional, rule-based systems. This naturally raises questions about interpretation rather than compliance.

An Open Question Worth Considering

One possible way to read the guidance is to view it as an invitation to reflect:

  • Which of these principles are already well understood and operationalized in existing clinical systems?

  • Where might AI introduce additional considerations that are less explicit in traditional software development?

  • How might established practices evolve as systems move from deterministic behavior toward more adaptive or probabilistic approaches?

These are not questions with immediate or universal answers. They depend heavily on context, use case, system design, and regulatory interaction.

Early Guidance, Not Final Instruction

Importantly, the FDA document does not claim to resolve these questions. Instead, it sets a shared reference point for future discussion and alignment. The absence of technical detail should not be read as a gap, but as recognition that good practice in this area is still emerging and will require time, experimentation, and collaboration to mature.

For now, the principles serve as a common language, useful for orientation, internal discussion, and education.

Closing Note

As AI continues to enter regulated environments, documents like this are likely to be revisited, refined, and expanded. Understanding them as living guidance, rather than fixed rules, may be the most appropriate way to approach them at this stage.

For readers involved in clinical systems, software development, or regulatory oversight, the principles offer a structured way to think about AI, without yet demanding definitive answers.

Disclaimer: This post reflects an educational interpretation of publicly available regulatory guidance and does not constitute regulatory or legal advice.



What a 50% Increase in FDA CDER Warning Letters Tells Us About Quality, Oversight, and System Pressure


According to remarks reported by the Regulatory Affairs Professionals Society, the FDA Center for Drug Evaluation and Research (CDER) issued 50% more warning letters in fiscal year 2025 than in the previous year. This is a notable increase and suggests a change in enforcement activity that warrants closer examination. Whether this reflects a shift in regulatory posture, changes in industry behavior, or increased scrutiny of emerging areas remains an open question.

Rather than treating this figure purely as a headline about enforcement intensity, it is useful to consider what such an increase may indicate about where regulatory attention is currently focused and how oversight adapts as technologies, business models, and supply chains evolve.

Enforcement volume as a signal, not just an outcome

Warning letters are often viewed as the final step in regulatory enforcement. From a systems perspective, they can also be understood as lagging indicators. By the time a warning letter is issued, inspections have occurred, observations have been documented, and responses have been reviewed. Escalation typically reflects a judgment that identified issues were not adequately addressed.

A sharp increase in warning letters therefore raises a broader question:
are regulators encountering more instances of noncompliance, or are they applying closer scrutiny to areas where compliance expectations are still being interpreted and tested?

Innovative Software Solutions in Clinical Research

Clinical research software is often associated with large, enterprise platforms. Alongside these established systems, however, a growing group of specialized and innovation-focused solutions addresses specific pain points such as protocol planning, budgeting, recruitment, operational oversight, and documentation. These tools are frequently adopted as complements to core systems rather than replacements, particularly in regulated environments.

This overview highlights selected software providers that are commonly referenced in discussions about digital transformation in clinical research, with a focus on planning, feasibility, budgeting, and operational coordination.

Protocol Design, Planning, Budgeting and Feasibility

  • Espero Health (Sweden) develops software focused on protocol-driven budgeting and feasibility assessment in clinical research. The platform emphasizes deriving cost and effort estimates directly from structured protocol activities and Schedule of Events assumptions, supporting transparency between clinical planning and financial oversight. Website: https://espero-health.com/

  • Trials.ai was developed as a decision-support platform for data-driven clinical trial planning. Public information indicates a focus on analyzing historical trial data, literature, and protocol elements to support study design decisions. The company was later acquired and integrated into a larger life-sciences analytics organization, suggesting its capabilities are now embedded in enterprise-level offerings. Website: https://www.trials.ai/

  • Risklick is a Switzerland-based company offering software to support structured clinical protocol development, particularly in regulated environments such as medical devices and clinical trials. Its solutions emphasize consistency, reuse of prior knowledge, and alignment with regulatory expectations during protocol authoring. Website: https://www.risklick.ch/ | LinkedIn: https://www.linkedin.com/company/risklick

  • Condor Software provides solutions for clinical trial financial management, including budgeting, forecasting, and site payment processes. The platform focuses on improving transparency and alignment between clinical operations and financial oversight, an area often associated with manual reconciliation and fragmented workflows. Website: https://www.condorsoftware.com/ | LinkedIn: https://www.linkedin.com/company/condor-software-inc

  • Clinical Maestro® by Strategikon is a clinical trial planning and intelligence platform designed to support feasibility assessment, operational forecasting, and data-driven decision-making during study design and portfolio planning. The software focuses on improving transparency and predictability prior to and alongside trial execution rather than replacing core operational systems. Website: https://strategikon.com/ | LinkedIn: https://www.linkedin.com/company/clinical-maestro/

  • ProofPilot provides a digital protocol and workflow automation platform intended to reduce manual effort during trial setup and execution. Its emphasis is on standardization, traceability, and structured workflows rather than replacing existing operational systems. Website: https://www.proofpilot.com/ | LinkedIn: https://www.linkedin.com/company/proofpilot/

Recruitment, Engagement, and Matching

Documentation and Structured Knowledge

Quality and Regulatory Considerations

All software used in clinical research must operate within established quality and regulatory frameworks, particularly for studies conducted under FDA or EMA oversight. Regardless of innovation level, such tools are expected to support data integrity, audit trails, access control, and validation appropriate to their intended use. As a result, many innovative solutions are positioned as decision-support or planning layers, complementing validated core systems rather than replacing them.

Toward Future Software Evaluation

As digital transformation in clinical research continues, independent and experience-based software evaluation becomes increasingly relevant. Beyond feature descriptions, meaningful assessment requires understanding how tools integrate into operational workflows, quality systems, and regulatory constraints.

This blog aims to document and observe these developments over time. Future posts may explore individual solutions in more depth, focusing on use cases, limitations, and integration considerations, rather than promotional claims.

Sunday, January 18, 2026

How to Access and Read FDA Warning Letters (A Practical Tutorial Guide)

FDA warning letters are publicly available and can be read in full. They are one of the most transparent regulatory resources for understanding how quality and compliance issues are identified and described by regulators.

Official FDA Warning Letter Repository

All FDA warning letters are published on the official FDA website:

👉 FDA Warning Letters Database
https://www.fda.gov/inspections-compliance-enforcement-and-criminal-investigations/compliance-actions-and-activities/warning-letters

This repository is maintained by the U.S. Food and Drug Administration and is updated regularly.

How the repository is structured

On the FDA warning letters page, you can:

  • Browse warning letters by year

  • Filter by FDA center (e.g. drugs, biologics, devices)

  • Search by company name

  • Search by subject or keyword

Each entry links to a PDF or HTML letter issued directly by the FDA.

What to look for when reading a warning letter

For educational purposes, it is useful to read warning letters systematically rather than casually. Key sections to focus on include:

  1. Inspection background
    Describes when and why the FDA inspection or review took place.

  2. Observed violations
    Lists specific regulatory deficiencies, often referencing CFR sections.

  3. Regulatory interpretation
    Explains why the FDA considers the findings significant.

  4. Expected corrective actions
    Indicates what the FDA expects the organization to address.

  5. Potential consequences
    Outlines possible enforcement actions if issues are not resolved.

Reading these sections helps build familiarity with how regulators reason, not just what rules exist.

Why warning letters are useful learning material

Unlike guidance documents, warning letters:

  • reflect actual failures, not hypothetical scenarios,

  • show how regulations are applied in practice,

  • reveal recurring patterns across organizations and time,

  • illustrate the link between operational decisions and regulatory outcomes.

For students of clinical research, quality, or project management, they offer insight into system-level weaknesses that are difficult to see in controlled examples.

Using warning letters responsibly

Warning letters should not be read as:

  • judgments of intent,

  • proof of misconduct beyond what is stated,

  • or definitive conclusions about patient harm.

They should be read as regulatory signals: indicators that systems, processes, or controls did not perform as expected.


A suggested exercise (an illustration for learning)

A simple educational approach is to:

  1. Select one warning letter from the FDA database.

  2. Identify the main category of violation (e.g. GMP, clinical research, labeling).

  3. Ask what project, process, or system failure likely contributed.

  4. Consider what preventive controls could have reduced the risk.

This turns regulatory documents into learning artifacts, rather than compliance anecdotes.

Wednesday, January 7, 2026

Reducing Clinical Trial Complexity

What recent publications and reports are pointing to. Evidence and open questions.

Clinical trials are essential for evaluating the safety and efficacy of new therapies, but the operational and financial burden of modern trials has grown significantly over recent decades. Trials are becoming more complex, expensive, and difficult to execute: they often require elaborate protocol designs, extensive regulatory documentation, and multi-site coordination, all of which contribute to longer timelines and higher costs. Recent analyst commentary on the clinical trials industry highlights complexity as a core challenge limiting feasibility. https://www.clinicaltrialsarena.com/features/clinical-trials-challenges-expect-2025

At the same time, funding disruptions have had measurable impacts on clinical research feasibility. A study published in JAMA Internal Medicine reported that NIH grant terminations disrupted approximately 3.5% of active federally funded clinical trials, affecting over 74,000 enrolled participants and resulting in significant lost funding. These kinds of funding cutbacks do not reflect scientific failure but rather resource constraints that force studies to halt or terminate early. https://www.ajmc.com/view/nih-grant-terminations-disrupt-1-in-30-clinical-trials-impacting-over-74-000-participants

Operational burden and complexity not only challenge sponsors financially but also increase the risk of failure for individual trials and can compromise patient safety. A broad literature review of why clinical trials fail identifies high operational and financial burden as one of the factors associated with trial discontinuation, alongside recruitment challenges and design issues. This evidence reinforces the idea that cost and complexity are practical constraints in trial implementation. https://pmc.ncbi.nlm.nih.gov/articles/PMC6092479/

In response to these challenges, researchers and industry groups have begun to develop tools and frameworks to measure and mitigate complexity. A 2025 methodological study introduced a protocol complexity tool that quantifies different aspects of a trial’s design (operational execution, regulatory burden, patient/site burden, etc.) and correlates complexity scores with key trial indicators such as site activation and recruitment timelines. The aim is to provide evidence-based simplification without compromising scientific or ethical standards. https://www.researchgate.net/publication/395032442_Development_of_a_protocol_complexity_tool_a_framework_designed_to_stimulate_discussion_and_simplify_study_design

Beyond individual tools, advocacy from researchers and consortia highlights broader systemic barriers, such as regulatory fragmentation and administrative burden, which can delay trial start-up and reduce feasibility, especially in multinational research contexts. For example, groups in the EU have called for regulatory alignment and reduced administrative complexity to support clinical research across member states, recognizing that procedural obstacles repeatedly delay studies and increase costs. https://eatris.eu/news/clinical-trial-community-seek-urgent-implementation-of-life-science-strategy-as-european-research-becomes-increasingly-endangered/ 

These strands of evidence do not claim that simplification alone will guarantee more lifesaving drugs, nor that reducing cost automatically improves scientific reliability, but they do support a conditional hypothesis: if a large share of current trial resources is consumed by procedural complexity and cost burden rather than core scientific evaluation, then reducing unnecessary complexity could make some studies more operationally feasible within existing budgets, possibly enabling a broader set of research questions to be pursued. https://zenodo.org/records/15651378 

Open question: Does reducing administrative and operational complexity without weakening scientific standards actually free up resources to support additional trials, and could that in turn accelerate meaningful clinical advances? This remains a continuing area of professional and methodological inquiry rather than a settled fact. Recent publications and tool development efforts suggest it is a practical, evidence-informed question worth exploring.


More on this topic in my earlier posts: 

  1. Trial–Project Dualism: An Operational View https://www.project-owner.com/2025/12/trialproject-dualism-operational-view.html
  2. Structuring Project Complexity: Reviewing Patents related to Project Management https://www.project-owner.com/2025/05/structuring-project-complexity.html
  3. From Digitalization to Digital Transformation in Clinical Research https://www.project-owner.com/2025/07/from-digitalization-to-digital.html


Friday, January 2, 2026

Blogging in the GenAI Age: Why Writing May Still Matter in 2026

Blogging in 2026 looks questionable. Most people no longer read blogs and long posts. Information is searched, skimmed, or delegated to generative AI. Even thoughtful posts may attract little attention, while AI can generate fluent text instantly. Against this background, blogging no longer functions reliably as a communication channel. The relevant question is therefore not how to grow a blog, but whether blogging still serves a meaningful purpose.

One answer is epistemic. Blogging has increasingly become a way of documenting reasoning rather than broadcasting information. Generative AI produces language at scale, but it does not assess novelty, truth, or justification. It optimizes for plausibility, not for being right for the right reasons. When humans write carefully, make assumptions explicit, and acknowledge uncertainty, they leave traces of reasoning that are qualitatively different from synthetic text. Blogging, in this sense, preserves human judgment in public form.

This matters not only for readers, but also for the stability of AI systems. A 2024 Nature paper showed that when generative models are trained recursively on content produced by earlier models, performance degrades, diversity collapses, and errors reinforce over time. This phenomenon is known as model collapse. The study is not about blogs specifically, but it highlights a general mechanism: if new data increasingly lacks grounding in human experience, experimentation, and reasoning, systems become self-referential and brittle. The disappearance of human-authored reasoning from public spaces removes precisely the kind of epistemic input that synthetic systems cannot generate on their own.

At the same time, public writing and blogs cannot be treated as reliable input. A 2024 ACM study examining Common Crawl, the largest public web dataset used in AI training, shows that blogs and other websites are included not because they are verified or correct, but because they are publicly available. Publication is treated as implicit permission. As a result, careful reasoning is mixed with speculation, error, and noise. Human-authored text may be necessary to prevent epistemic collapse, but availability alone does not guarantee quality.

Taken together, these findings suggest that the relationship between blogging and AI is unresolved rather than obsolete. Human writing may still be needed to introduce new experiences, reasoning paths, and interpretations into the public record, while the mechanisms for incorporating such material into AI systems remain an open challenge. How this exchange develops will be one of the more interesting questions to watch in 2026.

From the author’s perspective, epistemic motivations do not exclude pragmatic ones. Blogging can also support visibility or credibility, and in some cases writers become trusted one-person commentary channels through consistent, responsible publication. These outcomes are exceptions, but they show that epistemic and practical intentions can coexist.

Blogging in the GenAI age is therefore neither obsolete nor guaranteed to matter. Its value depends on whether human reasoning continues to be expressed publicly, even when attention is scarce and automation is easy.

References: 

  • Shumailov, I., Shumaylov, Z., Zhao, Y. et al. AI models collapse when trained on recursively generated data. Nature 631, 755–759 (2024). https://doi.org/10.1038/s41586-024-07566-y 
  • Stefan Baack. 2024. A Critical Analysis of the Largest Source for Generative AI Training Data: Common Crawl. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT '24). Association for Computing Machinery, New York, NY, USA, 2199–2208. https://doi.org/10.1145/3630106.3659033