
Ask a Data Ethicist: What Are the Legal and Ethical Issues in Summarizing Text with an AI Tool?

One of the popular use cases for AI tools is the summarization of texts. Yet, there are both legal and ethical questions to consider when it comes to summarizing copyrighted work. In academia, where many materials are licensed under specific terms and conditions, it can be particularly difficult to understand …

What are the legal and ethical issues in summarizing text with an AI tool? 

Terms and Conditions

One of the first questions many people have is whether or not the data they put into an AI tool will be used to train future iterations of AI. There are settings to opt out of this use, as well as protections that may be in place if you are using the tool in an enterprise context. It’s best to double-check this at the specific tool level just to confirm the appropriate protections are in place. Many organizations will not permit the use of an AI tool outside their “walled garden” environment. There is always a risk that users may not have the appropriate setting activated and the data, once shared, cannot be deleted. 

Another question is whether you have the appropriate rights to use the content in this manner. Most people do not read the terms and conditions of AI tools. But if you did, you'd find a clause that says that you, the user, are responsible for using the tool lawfully, which means in accordance with copyright laws. That basically puts the onus and risk onto you to determine whether you are onside with the law.

If the document you wish to summarize is your own work, you own the copyright and you can confidently proceed. For example, I used NotebookLM to turn one of my articles into a podcast, which I now use as a demonstration of a bad podcast. 

However, for many people, the real value is not summarizing their own work, but the work of others. If the copyrighted work is a workplace document, say a report produced by others in your workplace for work purposes, and you are summarizing it for work purposes, you are likely also fine. The copyright would be retained by your employer and they would (likely) sanction that use. You might want to confirm this by reviewing your organization’s acceptable use policies.

License to Process with AI

The situation becomes more complicated when you consider summarizing other copyrighted materials like books or articles. It’s also, realistically, the type of content most people want to be able to summarize. In this case, there is a copyright holder – the author or publisher. Assuming you have legal access to the work (e.g., you licensed or purchased it), what are the rules?

Let’s look at a specific case: an academic setting. The university library licenses various works which staff and students can access for study, research, and other academic purposes. Generally speaking, use of these works for private study might fall under the umbrella of “fair use” or “fair dealing.” There is a multi-part test that lawyers apply to determine this on a case-by-case basis, but in general, academia largely benefits from the “fair use” or “fair dealing” provisions.

However, more publishers, keenly aware of AI-related issues, are adding AI exclusion clauses to their licenses, which may explicitly prohibit processing their content with an AI tool. These explicit AI exclusion clauses can supersede the general “fair use” provisions. This means understanding the specifics of the license for the particular work itself. This is not necessarily a job for individual faculty or students. Typically, a librarian with specialized knowledge of copyright and access to the license agreement could advise on whether a specific work may be summarized using an AI tool.

But, let’s get real – is anyone actually doing this?

Who Is Responsible?

Imagine if every faculty member wanted to know if their specific reading list, which may contain a couple dozen papers or book chapters, met the mark for this use case of summarization by AI. I suspect most faculty have not vetted their materials in this manner and may not even be aware that this is a possible issue because they are used to being covered by “fair use” or “fair dealing.”

Many universities have guidance that encourages experimenting with AI tools to advance learning, with vague nods to “respect intellectual property rights,” but offer no specific details about how to go about ensuring that intellectual property rights are actually respected. My sense is that there’s a “don’t ask too many questions” mentality at play here: if we don’t explicitly know what you’re doing, and we’ve told you that you are responsible, we’ve done our part. Yet, this shirks responsibility at the institutional level and places it on the backs of individual faculty and students who are ill-equipped to understand this complex and shifting landscape.

There are also questions surrounding which AI tool will be used. There may be more or fewer protections depending on the tool itself, whether it is free or attached to an enterprise account, and the settings selected. These are all elements that factor into responsible use.

In addition, the question of how the outputs will be used also needs to be considered. Will it remain private, for self-study, or will it be shared with others? Those details also matter in determining fair use or fair dealing. Even Creative Commons licenses are not a slam dunk. For example, ND means “no derivatives,” but what is an AI summary if not a modification of the work? The implications may not have any practical consequence if the summary is never shared, but at the very least, it’s something to be aware of when using work licensed this way.

The broad take: the blanket coverage of “fair use” or “fair dealing” that protected a lot of educational use cases is being challenged by AI tools that process data.

NotebookLM: A Safe Choice?

One tool that has been popular with academics and claims to address the issue of not using your data for AI training is Google’s NotebookLM. This tool will summarize docs, but also make quizzes, podcasts, infographics, and other materials that might be useful in an education context. Here’s what they say about how they use your data:

“NotebookLM uses the files you add, the outputs you generate, and your chat history to build your knowledge base and assist with your research and tasks. The content in NotebookLM will not be used to directly train our foundational AI models, unless you choose to provide feedback.

Data that NotebookLM shares with other Google services (e.g., Gemini App) is used per the Google Privacy Policy and any service specific notices (e.g., the Gemini Apps Privacy Notice), including for product improvement.” – Google

That is the policy as it stands right now. However, policies can and do change. It wouldn’t be the first time Google changed a policy that had major implications for academic users. For example, Google storage used to be free and unlimited for universities. That changed in 2022, leaving many institutions scrambling to find alternatives. What’s to stop a change in terms of service for a tool like NotebookLM? Nothing.

Data uploaded to NotebookLM is processed via Google’s Cloud infrastructure, which means it may leave your country of residence and may no longer be protected by local laws. There may be additional protections or data residency restrictions for an enterprise-level account versus a personal account.

There is also the possibility that a NotebookLM user might share their notebook and inadvertently violate copyright laws. Personal accounts allow for public sharing, a feature akin to publishing, while enterprise accounts allow for sharing with others within the institution. In either case, the copyrighted content and its derivatives are being distributed to others, moving beyond the level of personal use. It doesn’t help matters that sites that teach educators about using this tool encourage sharing. Here’s what one popular site says:

“Notebooks are shareable! Click the ‘Share’ button to share a notebook with an individual – and give them viewer rights or editor rights. Imagine this as an AI-powered collaborative lesson planning space.” – Ditchthattextbook

Most educators, and those who write guidance for educators, are not well versed in the nuance of these types of legal details. They operate under the long-standing assumption of blanket protections granted to educators.

New Tools, New Rules

Just as the tools are changing, the rules are also changing. Here are some questions to ask:

  1. What tool is being used and what are the terms of that tool?
    • Personal or enterprise account?
    • Default to not use data for training models?
    • Where is data processed (and are you OK with the arrangement)?
    • Who else will have access to the data and for what purposes? 
  2. Can the specific piece of content be uploaded into an AI tool while respecting copyright?
    • Does the license for the work explicitly prohibit AI processing? If yes, don’t use it. 
    • Is this a paywalled work? (more risk than open access). Find out the licensing terms to see if this is a permissible use.
    • Does the work say “All rights reserved”? Consider this language a red flag.  
  3. Is this for your own private, non-commercial, personal use?
    • You will not share it with anyone else.
    • You will not create derivative works (podcasts, quizzes, summaries, etc.) that you share with others.
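The checklist above can be sketched as a rough triage function. This is a minimal illustration only: the field names, thresholds, and risk labels are my own invention for demonstration, not legal categories, and a real determination would still need a copyright specialist.

```python
from dataclasses import dataclass


@dataclass
class Work:
    """Attributes of a document you want to summarize (hypothetical fields)."""
    license_prohibits_ai: bool   # explicit AI exclusion clause in the license
    all_rights_reserved: bool    # "All rights reserved" with no stated permission
    will_share_output: bool      # summary, quiz, podcast, etc. shared with others


def ai_summarization_risk(work: Work) -> str:
    """Walk the checklist in order, most restrictive condition first.

    Not legal advice -- just a way of making the decision order explicit.
    """
    if work.license_prohibits_ai:
        return "stop: the license explicitly prohibits AI processing"
    if work.all_rights_reserved:
        return "caution: confirm the licensing terms before uploading"
    if work.will_share_output:
        return "caution: sharing derivatives moves beyond personal use"
    return "lower risk: private, non-commercial personal use"
```

Note that the function checks the license clause first: as discussed above, an explicit AI exclusion clause can supersede a general “fair use” or “fair dealing” argument, so it short-circuits everything else.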

Asking yourself those questions before you use the tool can help you stay onside. If all else fails, you could also just read the article and make your own notes. I know, that is so 2021 of me to say. 

Additional Resources

The University of Toronto has guidance around GenAI and copyright that starts to address these issues. I’ve reviewed several Canadian university guidelines and this one stands out.

This blog post is a great read and provided helpful background on this issue.

Send Me Your Questions!

I would love to hear about your data dilemmas or AI ethics questions and quandaries. You can send me a note at [email protected] or connect with me on LinkedIn. I will keep all inquiries confidential and remove any potentially sensitive information – so please feel free to keep things high level and anonymous as well. 

This column is not legal advice. The information provided is strictly for educational purposes. AI and data regulation is an evolving area and anyone with specific questions should seek advice from a legal professional.
