StackHacks - Do you want AI to use YOUR Substack writing for training itself?
Here is how to choose...
Why I want OpenAI to use my writing to train itself!
I want AI (Artificial Intelligence) to scrape my Substack posts and use my articles to answer chat questions for all.
Why do I write?
to help you build your lists,
to help you write better,
to help you get more subscribers,
to help with my mental acuity,
to help you make more money,
to share the information I have learned.
Much as with SEO, when ChatGPT cites its references it can boost the original author's authority. That gets more eyes on your work, which in turn can improve your search engine rankings and gain you readers.
To let OpenAI, the parent company of ChatGPT, use your content, keep this button unchecked and greyed out: Dashboard > Settings > Publication Details > Block AI training.
To block, just click the button.
The default is that it WILL use your content for training. It is up to you to block that as I have outlined above.
Added March 6, 2024:
From an article on the Search Engine Land website:
3 reasons not to block ChatGPT's GPTBot
So why should you allow GPTBot to crawl your site? Let's look on the bright side with these three primary benefits of embracing OpenAI's bot technology.
1. 100 million people use ChatGPT each week
By not allowing GPTBot to crawl your site, there's a 100 million-person audience you're missing out on maximizing brand visibility.
Sharing access to your website content can help ensure your brand is both factually and positively represented to ChatGPT users.
This means there's a higher chance that your brand will actually be recommended by ChatGPT, leading to more traffic and potential customers.
Some brands report getting 5% of their overall leads, or $100,000 in monthly subscription revenue from ChatGPT. I know our agency has already gotten some leads from ChatGPT, too.
Another way to consider this is as a positive digital PR (DPR) play. You should leverage DPR strategies like brand mention campaigns in today's landscape.
Permitting GPTBot to crawl your site only adds to these efforts by allowing ChatGPT to access your brand information directly from the source and distribute it to 100 million users positively.
2. Generative engine optimization (GEO)
Whether you have fears about AI, we can all agree that it's changing the marketing landscape. Like all new technologies and trends in our industry, those slow to embrace AI as a conduit for new business and brand exposure will miss the proverbial boat.
GEO is picking up steam as a sub-practice of SEO. You'll miss a significant opportunity if you're not targeting some of your marketing efforts to be in this marketplace. Competitors may pick up after you let it slip through the cracks.
We know it's easy for brands to fall behind in today's fractioned and ever-growing marketing landscape. If your competitors spend years working on GEO, maximizing LLM visibility and developing skills and expertise in this area, that's years ahead of you they'll be.
Now, GEO reporting capabilities haven't caught up to the value yet, which means it will be tough to measure an ROI, but that doesn't mean it's something to ignore and fall behind on.
Brands and marketers must start embracing LLMs like ChatGPT as an emerging acquisition channel that shouldn't be ignored.
3. OpenAI's pledge to minimize harm
A healthy distrust of AI technologies is important to its legal and ethical growth. But we also need to be open-minded and realize we can't be effective as marketers if we resist and choose not to grow and innovate in the direction of things.
OpenAI clearly states "minimize harm" as one of the guiding principles of their platform. They also have policies to respect copyright and intellectual property and have stated that GPTBot filters out sources violating their policies.
By allowing GPTBot to crawl your site's content, you're contributing to the clean and accurate training data OpenAI uses to enhance and improve its information accuracy.
As AI technology marches on, it can be easy to get caught up in skepticism, fear, and noise. Those struggling to embrace and maximize it will get left behind.
Why you might not want ChatGPT to scrape and use your information to benefit their software and help it learn:
Privacy: Users may be concerned about the privacy of their personal information, ideas, or opinions. They may not want their writing to be analyzed, or used in ways that compromise their privacy, such as being named in a citation.
Intellectual Property: Individuals who generate content, especially creative or proprietary work, may be protective of their intellectual property. They might not want their ideas or writing to be used without explicit permission or compensation. It's kind of like a paywall.
Sensitive Information: Users might have written content that contains sensitive or confidential information. Allowing the model to learn from such content could pose a risk of unintentional disclosure or misuse.
Misrepresentation: Users may worry that the model could generate responses that misread the intent behind their views, beliefs, or writing style. This could lead to misunderstandings or unintended consequences.
Ethical Concerns: Some writers may have ethical concerns about AI models in general, particularly regarding the potential misuse of information or the impact of AI on society. They might choose not to contribute their writing to avoid supporting or participating in these concerns.
Bias and Fairness: If the input data contains biases, the model could unintentionally perpetuate or amplify those biases in its responses.
Usage
Web pages crawled with the GPTBot user agent may potentially be used to improve future models and are filtered to remove sources that require paywall access, are known to primarily aggregate personally identifiable information (PII), or have text that violates our policies. Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety. Below, we also share how to disallow GPTBot from accessing your site. (https://platform.openai.com/docs/gptbot)
Use the link above to see how to disallow the bot from scraping any blog or other writing you may have on the internet.
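For a site where you control your own robots.txt file, the OpenAI page linked above describes blocking the crawler with a `User-agent: GPTBot` rule followed by `Disallow: /`. Here is a minimal Python sketch for checking that such a rule does what you expect. The helper name `is_allowed` and the example.com URL are my own placeholders for illustration, not anything from OpenAI or Substack.

```python
from urllib.robotparser import RobotFileParser

# robots.txt rules that block only OpenAI's GPTBot.
# Crawlers that are not named here remain allowed.
BLOCK_GPTBOT = """\
User-agent: GPTBot
Disallow: /
"""

# Placeholder URL, used only for illustration.
EXAMPLE_URL = "https://example.com/p/my-post"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print("GPTBot allowed:", is_allowed(BLOCK_GPTBOT, "GPTBot", EXAMPLE_URL))
    print("Googlebot allowed:", is_allowed(BLOCK_GPTBOT, "Googlebot", EXAMPLE_URL))
```

Keep in mind that robots.txt is a request that well-behaved crawlers honor, not a technical lock. On Substack itself, the Block AI training setting described earlier is the control to use.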
Bing and OpenAI
As of the writing of this article (mid-January 2024), Substack does not have an option for me to submit my newsletter (or my posts) to the Bing search engine, as it does with Google's search engine. So?
Bing is the search engine that powers **ChatGPT**. Microsoft has integrated OpenAI's ChatGPT technology into Bing, which allows users to carry on a conversation with the search engine.
This integration was announced in February 2023 at a Microsoft ChatGPT event. The integration of ChatGPT technology into Bing is part of Microsoft's efforts to challenge Google's search dominance. It also supports Microsoft's $13 billion investment in ChatGPT's parent company, OpenAI, currently run by Sam Altman.
I'll post more about how to optimize your Substack newsletter for Bing when Substack allows me to submit my newsletter to Bing using the "Bing Webmaster Tools Submit URL" feature.
UPDATE:
Install your Google Search Console
Now that you have a Google Search Console account, here's how to use it to add your site to Bing in Microsoft Bing Webmaster Tools:
Click Import under the "import your sites from Google Search" option.
Click Continue.
Select the Google account that your Google Search Console is set up with.
Allow Bing.com access to your Google account.
Select the website you want Bing to track and click Import.
Click Done.
Do you want ChatGPT to scrape your content?
More: How artists can poison their pics with deadly Nightshade to deter AI scrapers
wow - I just learned more about this subject in a few minutes than I think I knew at all before.
Very interesting, Paul. I hadn't thought about the discoverability aspect.