English Amiga Board


Old 17 April 2024, 01:58   #41
hceline
Registered User
 
Join Date: Nov 2009
Location: Top of the world
Posts: 161
Searching Google for "site:eab.abime.net" shows how many pages are indexed at the moment.
And judging by the result, the issue cannot be robots.txt.

At the moment the index contains some of the forum pages, like
https://eab.abime.net/forumdisplay.php?f=37,
and some attachments links, but no threads.

If robots.txt was the issue, these would have been gone with the rest.

The strange thing with the result is that it claims to have "About 94 results" but will show only 17 links.

One explanation might be that the recurring long periods of "The server is too busy at the moment. Please try again later." errors have triggered a severe reduction in EAB's internal quality ranking in Google's systems.

Last edited by hceline; 17 April 2024 at 02:04. Reason: Spelling
Old 17 April 2024, 05:28   #42
nocash
Registered User
 
Join Date: Feb 2016
Location: Homeless
Posts: 63
Actually, robots.txt won't suppress results that were found via external links from third-party webpages to eab. That's described here: https://developers.google.com/search...g/robots/intro (including a big warning saying to use noindex instead of robots.txt if you want to suppress external links).

So the problem could very well be caused by robots.txt. But I don't know why Google is treating eab as the Hall of Light. Maybe there is another webpage out there that describes eab as the Hall of Light? Or maybe Google is downloading only parts of the eab HTML pages, up to and including the Hall of Light search box?

(Btw. that search box is confusing even to humans, why on earth does the forum's search box redirect to Hall of Light search? Yes, I know that there is another search button hidden away between Off Topic and Quick Links, which is even more confusing, why would anyone do that??)
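
For anyone wanting to check the noindex side of this, here is a minimal sketch of how a page could be tested for a robots meta tag. It uses only Python's standard `html.parser`; the class name `RobotsMetaFinder`, the helper `has_noindex` and the sample HTML are made up for illustration, and it does not look at the `X-Robots-Tag` HTTP header, which can carry the same directive.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values may be None (e.g. <meta name>), so fall back to "".
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [
                part.strip().lower() for part in content.split(",") if part.strip()
            ]


def has_noindex(html):
    """Return True if the page carries a robots meta tag with 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


if __name__ == "__main__":
    # Hypothetical page, not taken from eab.
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(page))
```

This only inspects the HTML that is passed in; fetching the page and checking response headers would be a separate step.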
Old 17 April 2024, 07:41   #43
hceline
Registered User
 
Join Date: Nov 2009
Location: Top of the world
Posts: 161
EDIT: Never mind, it seems Google has removed support for the "links:" keyword, so the "links:eab.abime.net" number says nothing. But I still contend that there must be lots of external links pointing directly to EAB threads all over the net, and the fact that those are not showing up would indicate that the problem is elsewhere.

Well if you use the Google search term "links:eab.abime.net" you get at this moment: About 246,000 results

While "site:eab.abime.net" gives now: About 73 results
(after scrolling down and clicking "repeat the search with the omitted results included.")

So if robots.txt does not suppress external links, those two numbers should prove that something else is at play.

Last edited by hceline; 17 April 2024 at 07:49.
Old 17 April 2024, 08:36   #44
hceline
Registered User
 
Join Date: Nov 2009
Location: Top of the world
Posts: 161
Quote:
Originally Posted by RCK View Post
- Maybe because the threads doesn't use canonical URLs --> I can try to change the URL scheme with the new server.
The URLs were not a problem up until now? And changing them would break all external links, which would probably only compound the problem.
Quote:
Originally Posted by RCK View Post
- Maybe because the site is not responsive --> We need to move to Xenforo and lost real Amiga support.
Regarding this I want to point out that the "The server is too busy at the moment. Please try again later." issues started around October 1st 2023
and the Google results apparently had started disappearing by December 12th 2023:
https://eab.abime.net/showpost.php?p...2&postcount=96
Old 17 April 2024, 08:40   #45
malko
Ex nihilo nihil
 
Join Date: Oct 2017
Location: CH
Posts: 4,884
I don't find Google search results that reliable (they push paid results/adverts to the front way too much), but for what it is worth, or for the curious, here is an updated list of their search operators:

https://ahrefs.com/blog/google-advan...rch-operators/

Google page on the subject: https://support.google.com/websearch.../2466433?hl=en
Old 17 April 2024, 15:51   #46
nocash
Registered User
 
Join Date: Feb 2016
Location: Homeless
Posts: 63
Quote:
Originally Posted by RCK View Post
the addition of GPT disallow into the robots.txt was done on "2023-10-03"
Code:
User-agent: GPTBot
User-agent: CCBot
User-agent: ChatGPT-User
Disallow: /
User-agent: *
Allow: /
That looks very unusual; normally there should be a blank line between the two "records", like this:
Code:
User-agent: GPTBot
User-agent: CCBot
User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
There are several documents stressing that one should properly use blank lines:

1) http://fileformats.archiveteam.org/w...usion_Standard
"Lines with nothing but a comment are ignored, so don't count as blank lines for the purpose of ending a section of the file."

2) http://www.robotstxt.org/robotstxt.html
"you may not have blank lines in a record, as they are used to delimit multiple records."

3) https://searchengineland.com/a-deepe...obotstxt-17573
"Each User-Agent/Disallow group should be separated by a blank line; however no blank lines should exist within a group (between the User-agent line and the last Disallow)."

And pretty much all robots.txt examples and real-life robots.txt files are using blank lines.

That said, these files are both lacking blank lines (but the latter one doesn't seem to have a harmful impact on the search results):
https://eab.abime.net/robots.txt
https://amiga.abime.net/robots.txt
PS. If present, the Sitemap line should also be separated by a blank line.

Last edited by nocash; 17 April 2024 at 16:00.
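
As a rough check of the blank-line question, here is a minimal sketch that feeds both variants of the file to Python's standard `urllib.robotparser`. It only shows how that one parser groups the records and says nothing about how Googlebot or any other crawler reads the file. The test path `/showthread.php` and the helper name `verdicts` are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Variant without a blank line between the two records.
WITHOUT_BLANK = """\
User-agent: GPTBot
User-agent: CCBot
User-agent: ChatGPT-User
Disallow: /
User-agent: *
Allow: /
"""

# Variant with the records separated by a blank line.
WITH_BLANK = """\
User-agent: GPTBot
User-agent: CCBot
User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""


def verdicts(robots_txt, agents=("GPTBot", "Googlebot"), path="/showthread.php"):
    """Return {agent: allowed?} according to Python's own robots.txt parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in agents}


if __name__ == "__main__":
    print("without blank line:", verdicts(WITHOUT_BLANK))
    print("with blank line:   ", verdicts(WITH_BLANK))
```

If both calls print the same verdicts, this particular parser does not depend on the blank line; other parsers may well be stricter.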
Old 17 April 2024, 22:18   #47
jbenam
Italian Amiga Zealot
 
Join Date: Jan 2009
Location: Italy
Age: 36
Posts: 1,911
Quote:
Originally Posted by RCK View Post
- Maybe because the site is not responsive --> We need to move to Xenforo and lost real Amiga support.
- Maybe because the world is now mobile first --> We need to move to Xenforo and lost real Amiga support.
The few times I tried loading EAB on my A4K/060/RTG it took ages to load anything because of the various JS scripts.

Honestly I would gladly give up “Amiga” (Amiga with a PiStorm only, maybe?) support in exchange for a modern forum like XenForo. I am on mobile 99% of the time I am on EAB anyway - but maybe that’s just me.

I guess you could always do some crazy stuff like statically cache every forum page using something like this: https://docs.litespeedtech.com/lscache/lscxf/

Static pages should load MUCH quicker on a real Amiga. It could also help with the Google indexing issue…
Old 22 April 2024, 17:13   #48
RCK
Administrator
 
Join Date: Feb 2001
Location: Paris / France
Age: 45
Posts: 3,091
Quote:
Originally Posted by nocash View Post
That looks very unusual; normally there should be a blank line between the two "records", like this:

And pretty much all robots.txt examples and real-life robots.txt files are using blank lines.

That said, these files are both lacking blank lines (but the latter one doesn't seem to have harmful impact on the search results):
https://eab.abime.net/robots.txt
https://amiga.abime.net/robots.txt
PS. If present, the Sitemap line should be also separated by a blank line.
Thanks for the suggestions, I have updated the two robots.txt files to:
- allow everyone once again
- add a blank line between Allow: / and Sitemap
Quote:
User-agent: *
Allow: /

Sitemap: https://eab.abime.net/sitemap_index.xml.gz
We will see if this lets Google reindex EAB again.

Last edited by RCK; 24 April 2024 at 13:16.
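
A quick way to sanity-check the updated file is to run it through a parser and see what comes out. The sketch below uses Python's standard `urllib.robotparser` (the `site_maps()` method needs Python 3.8 or later); the helper name `summarize` and the choice of agents are assumptions, and the result only reflects this one parser.

```python
from urllib.robotparser import RobotFileParser

# The updated robots.txt as quoted above.
ROBOTS_TXT = """\
User-agent: *
Allow: /

Sitemap: https://eab.abime.net/sitemap_index.xml.gz
"""


def summarize(robots_txt, agents=("Googlebot", "GPTBot")):
    """Return ({agent: allowed at '/'?}, list of sitemap URLs or None)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = {agent: parser.can_fetch(agent, "/") for agent in agents}
    return allowed, parser.site_maps()


if __name__ == "__main__":
    allowed, sitemaps = summarize(ROBOTS_TXT)
    print(allowed)
    print(sitemaps)
```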
Old 23 April 2024, 22:22   #49
nocash
Registered User
 
Join Date: Feb 2016
Location: Homeless
Posts: 63
Google is working again for eab since about 20 hours ago.

I've no idea if it was caused by robots.txt. In the "server is too busy" thread you mentioned blocking a whole range of IPs and (at least temporarily) all Mac users, and you have now switched to UBBB (ultimate bad bot blocker). That might also have caused/fixed the problem (especially if that blocking applied only to eab.abime and not to amiga.abime, since the latter still worked okay with Google).


Btw. another (dubious) idea would be to add Crawl-delay in robots.txt to prevent bots from overloading the server, something like this
Code:
User-agent: *
Crawl-delay: 2
Allow: /
That Crawl-delay feature is a well-meant idea, but I am not convinced that it works in practice. As far as I know only Bing and Yahoo support it, and most other bots will probably simply ignore it.
And it's not really clear what the delay is supposed to mean at all: most webpages will confidently tell you that it's a delay in seconds between page accesses. But I've also seen a webpage that claims that the delay is counted in milliseconds. And the "official" specs https://blogs.bing.com/webmaster/Aug...awler,-MSNBot/ merely say that the delay must be a positive whole number and that values higher than 10 will severely affect the ability of the bot to effectively crawl your site (whatever that means).
In short, I wouldn't recommend using Crawl-delay (unless you run into more problems with the server getting too busy in future).

Last edited by nocash; 23 April 2024 at 22:30.
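
To show what a bot that does honour Crawl-delay might do with it, here is a minimal sketch based on Python's standard `urllib.robotparser`, which exposes the value through `crawl_delay()`. Treating the number as seconds is an assumption (as said above, the unit is not clearly specified), and the helper name `polite_delay` and the agent name `ExampleBot` are made up.

```python
from urllib.robotparser import RobotFileParser

# The Crawl-delay example from above.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Allow: /
"""


def polite_delay(robots_txt, agent, default=0.0):
    """Return the pause between requests for `agent`.

    Assumption: the Crawl-delay value is interpreted as seconds.
    Falls back to `default` when no Crawl-delay applies to the agent.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(agent)
    return default if delay is None else float(delay)


if __name__ == "__main__":
    # A crawler would sleep this long between two page fetches.
    print(polite_delay(ROBOTS_TXT, "ExampleBot"))
```

Whether a given crawler does anything like this is entirely up to that crawler, which is the weak point of the directive.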
Old 24 April 2024, 05:27   #50
TCD
HOL/FTP busy bee
 
Join Date: Sep 2006
Location: Germany
Age: 46
Posts: 31,602
Quote:
Originally Posted by nocash View Post
Google is working again for eab since about 20 hours ago.
Huzzah! Would be interesting to know if it really was that missing blank line.
Old 24 April 2024, 13:14   #51
RCK
Administrator
 
Join Date: Feb 2001
Location: Paris / France
Age: 45
Posts: 3,091
Quote:
Originally Posted by nocash View Post
I've no idea if it was caused by robots.txt. In the "server is too busy" thread you mentioned blocking a whole range of IPs and (at least temporarily) all Mac users, and you have now switched to UBBB (ultimate bad bot blocker). That might also have caused/fixed the problem (especially if that blocking applied only to eab.abime and not to amiga.abime, since the latter still worked okay with Google).
My previous (manual) and present (UBBB) blocking was applied to the whole abime.net server, so for me it should be robots.txt, or some IP ranges that are correctly whitelisted in UBBB.

Quote:
Originally Posted by nocash View Post
Google is working again for eab since about 20 hours ago.
How can you be sure of that?
A Google query for "site:eab.abime.net" only shows 837 results, which is far from the 200k URLs waiting in the Google Search Console.
Old 24 April 2024, 14:18   #52
pixie
Registered User
 
Join Date: May 2020
Location: Figueira da Foz
Posts: 361
Whereas before no results would appear, if you search Google for:
Quote:
sub pixel scrolling site:eab.abime.net
you will get 3 hits with actual threads. It's not much, but it is something. For reference, in bing you get 4990 results.
Old 24 April 2024, 19:15   #53
nocash
Registered User
 
Join Date: Feb 2016
Location: Homeless
Posts: 63
The different abime subdomains had identical robots.txt files, so that doesn't explain why some subdomains worked with Google and others didn't. I think the missing blank line didn't affect Google (but it could have caused problems for other bots).

837 search results - with descriptions - is a huge improvement. Previously it was about 20 search results - without descriptions - meaning that google didn't even bother to look what is in the pages.

I guess everything should get sorted out automatically now, although it may take some months until all pages are re-indexed (as the sitemap files have crawl intervals set to "yearly" for many of the older pages) (though it's also possible that all pages will be crawled starting from "now", rather than from "12 months after last visit").

Last edited by nocash; 24 April 2024 at 19:24.
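
To see how the crawl intervals are distributed in a sitemap file, something like the following sketch could be used. It counts the `<changefreq>` values with Python's standard `xml.etree.ElementTree`. The sample XML and its example.org URLs are invented for illustration and are not taken from the real eab sitemap, and the helper name `changefreq_counts` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample; a real sitemap would be downloaded (and gunzipped) first.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.org/showthread.php?t=1</loc>
    <changefreq>yearly</changefreq>
  </url>
  <url>
    <loc>https://example.org/showthread.php?t=2</loc>
    <changefreq>yearly</changefreq>
  </url>
  <url>
    <loc>https://example.org/showthread.php?t=3</loc>
    <changefreq>weekly</changefreq>
  </url>
</urlset>
"""


def changefreq_counts(sitemap_xml):
    """Count how often each <changefreq> value occurs in a urlset sitemap."""
    root = ET.fromstring(sitemap_xml.strip())
    values = (
        element.text.strip()
        for element in root.iterfind("sm:url/sm:changefreq", SITEMAP_NS)
        if element.text
    )
    return Counter(values)


if __name__ == "__main__":
    print(changefreq_counts(SAMPLE_SITEMAP))
```

Note that changefreq is only a hint to crawlers, so the counts say what the site suggests, not what Google will actually do.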
Old 25 April 2024, 22:15   #54
RCK
Administrator
 
RCK's Avatar
 
Join Date: Feb 2001
Location: Paris / France
Age: 45
Posts: 3,091
site:eab.abime.net shows 2400 results today, I've got hope
 

