17 April 2024, 01:58 | #41 |
Registered User
Join Date: Nov 2009
Location: Top of the world
Posts: 181
|
Using the search term "site:eab.abime.net" on Google shows how many pages are indexed at the moment.
And according to the result, the issue cannot be robots.txt. At the moment it has some of the forums, like https://eab.abime.net/forumdisplay.php?f=37, and some attachment links, but no threads. If robots.txt were the issue, these would have gone with the rest. The strange thing about the result is that it claims to have "About 94 results" but shows only 17 links. One explanation might be that the recurring long periods of "The server is too busy at the moment. Please try again later." have triggered a severe reduction in EAB's internal quality ranking at Google.
Last edited by hceline; 17 April 2024 at 02:04. Reason: Spelling |
17 April 2024, 05:28 | #42 |
Registered User
Join Date: Feb 2016
Location: Homeless
Posts: 67
|
Actually robots.txt won't suppress results that were found in external links from third-party webpages to eab; that's described here: https://developers.google.com/search...g/robots/intro (including a big warning saying to use noindex instead of robots.txt if you want to suppress external links).
So far, the problem could very well be caused by robots.txt. But I don't know why Google is treating eab as a Hall of Light. Maybe there is another webpage out there that describes eab as the Hall of Light? Or maybe Google is downloading parts of the eab HTML pages, up to and including the Hall of Light search box? (Btw. that search box is confusing even to humans: why on earth does the forum's search box redirect to the Hall of Light search? Yes, I know that there is another search button hidden away between Off Topic and Quick Links, which is even more confusing. Why would anyone do that??) |
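To illustrate the noindex point above, here is a minimal Python sketch that checks whether a page carries a robots meta tag with a noindex directive. The class name `RobotsMetaParser`, the function `has_noindex` and the sample markup are made up for illustration and are not taken from EAB; it only looks at the meta tag, not at any HTTP response headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values may be None, so fall back to an empty string.
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page markup, not taken from EAB.
sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(sample))
```

A crawler has to be allowed to fetch the page to see this tag at all, which is why the noindex route and a robots.txt block do not combine.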
17 April 2024, 07:41 | #43 |
Registered User
Join Date: Nov 2009
Location: Top of the world
Posts: 181
|
EDIT: Never mind, it seems Google has removed support for the "links:" keyword. So the "links:eab.abime.net" number says nothing. But I will contend that there must be lots of external links directly to EAB threads all over the net. And those not showing up would indicate that the problem is elsewhere.
Well, if you use the Google search term "links:eab.abime.net" you get at this moment:
About 246,000 results
While "site:eab.abime.net" now gives:
About 73 results (after scrolling down and clicking "repeat the search with the omitted results included.")
So if robots.txt does not suppress external links, those two numbers should prove that something else is at play.
Last edited by hceline; 17 April 2024 at 07:49. |
17 April 2024, 08:36 | #44 | ||
Registered User
Join Date: Nov 2009
Location: Top of the world
Posts: 181
|
Quote:
Quote:
and the Google results apparently had started disappearing by December 12th 2023: https://eab.abime.net/showpost.php?p...2&postcount=96 |
||
17 April 2024, 08:40 | #45 |
Ex nihilo nihil
Join Date: Oct 2017
Location: CH
Posts: 5,123
|
I don't find Google search results that reliable (they push paid results/adverts to the front way too much), but for what it is worth, or for the curious ones, here is an updated list of their search options:
https://ahrefs.com/blog/google-advan...rch-operators/
Google page on the subject: https://support.google.com/websearch.../2466433?hl=en |
17 April 2024, 15:51 | #46 | |
Registered User
Join Date: Feb 2016
Location: Homeless
Posts: 67
|
Quote:
Code:
User-agent: GPTBot
User-agent: CCBot
User-agent: ChatGPT-User
Disallow: /
User-agent: *
Allow: /

1) http://fileformats.archiveteam.org/w...usion_Standard
"Lines with nothing but a comment are ignored, so don't count as blank lines for the purpose of ending a section of the file."
2) http://www.robotstxt.org/robotstxt.html
"you may not have blank lines in a record, as they are used to delimit multiple records."
3) https://searchengineland.com/a-deepe...obotstxt-17573
"Each User-Agent/Disallow group should be separated by a blank line; however no blank lines should exist within a group (between the User-agent line and the last Disallow)."

And pretty much all robots.txt examples and real-life robots.txt files use blank lines. That said, these files are both lacking blank lines (but the latter one doesn't seem to have a harmful impact on the search results):
https://eab.abime.net/robots.txt
https://amiga.abime.net/robots.txt
PS. If present, the Sitemap line should also be separated by a blank line.
Last edited by nocash; 17 April 2024 at 16:00. |
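As a rough way to test the blank-line question, the rules can be fed to Python's standard `urllib.robotparser`. This only shows how that one parser behaves and says nothing certain about Googlebot or other crawlers. The `BLANK_INSIDE_GROUP` variant and the helper `parse_robots` are hypothetical and were added for comparison; "Googlebot" is used here merely as an example user-agent string.

```python
from urllib.robotparser import RobotFileParser


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The rules from the post, with no blank lines between the groups.
NO_BLANK_LINES = """\
User-agent: GPTBot
User-agent: CCBot
User-agent: ChatGPT-User
Disallow: /
User-agent: *
Allow: /
"""

# Hypothetical broken variant: a blank line inside the first group.
BLANK_INSIDE_GROUP = """\
User-agent: GPTBot

Disallow: /

User-agent: *
Allow: /
"""

URL = "https://eab.abime.net/forumdisplay.php?f=37"
compact = parse_robots(NO_BLANK_LINES)
broken = parse_robots(BLANK_INSIDE_GROUP)

print("no blank lines, GPTBot allowed:", compact.can_fetch("GPTBot", URL))
print("no blank lines, Googlebot allowed:", compact.can_fetch("Googlebot", URL))
print("blank inside group, GPTBot allowed:", broken.can_fetch("GPTBot", URL))
```

With this parser, the missing separator between groups is tolerated, while a blank line inside a group detaches the Disallow line from its User-agent line. Other parsers may well behave differently.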
|
17 April 2024, 22:18 | #47 | |
Italian Amiga Zealot
Join Date: Jan 2009
Location: Italy
Age: 36
Posts: 1,926
|
Quote:
Honestly, I would gladly give up “Amiga” (Amiga with a PiStorm only, maybe?) support in exchange for a modern forum like XenForo. I am on mobile 99% of the time I am on EAB anyway, but maybe that’s just me. I guess you could always do some crazy stuff like statically caching every forum page using something like this: https://docs.litespeedtech.com/lscache/lscxf/ Static pages should load MUCH quicker on a real Amiga. It could also help with the Google indexing issue… |
|
22 April 2024, 17:13 | #48 | ||
Administrator
Join Date: Feb 2001
Location: Paris / France
Age: 46
Posts: 3,099
|
Quote:
- allow everyone once again
- add a blank line between Allow: / and sitemap
Quote:
Last edited by RCK; 24 April 2024 at 13:16. |
||
23 April 2024, 22:22 | #49 |
Registered User
Join Date: Feb 2016
Location: Homeless
Posts: 67
|
Google is working again for eab since about 20 hours ago.
I've no idea if it was caused by robots.txt. In the "server is too busy" thread you mentioned blocking a whole range of IPs and (at least temporarily) all Mac users, and have now switched to UBBB (ultimate bad bot blocker). That might also have caused or fixed the problem (especially if that blocking applied only to eab.abime, and not to amiga.abime, since the latter still worked okay with Google).
Btw. another (dubious) idea would be to add Crawl-delay in robots.txt to prevent bots from overloading the server, something like this:
Code:
User-agent: *
Crawl-delay: 2
Allow: /
And it's not really clear what the delay is supposed to do at all: most webpages will confidently tell you that it's a delay in seconds between page accesses. But I've also seen a webpage that claims that the delay is counted in milliseconds. And the "official" specs https://blogs.bing.com/webmaster/Aug...awler,-MSNBot/ merely say that the delay must be a positive whole number and that values higher than 10 will severely affect the ability of the bot to effectively crawl your site (whatever that means). In short, I wouldn't recommend using Crawl-delay (unless you run into more problems with the server getting too busy in future).
Last edited by nocash; 23 April 2024 at 22:30. |
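For what it's worth, here is a small Python sketch of how one particular parser, the standard library's `urllib.robotparser`, reads the Crawl-delay line from the snippet above. It only returns the number; what unit a given crawler applies to it, or whether it honours the directive at all, is up to that crawler. "ExampleBot" is a made-up user-agent, and a reasonably recent Python 3 is assumed.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the integer from the matching group,
# or None if no Crawl-delay applies to this user-agent.
delay = parser.crawl_delay("ExampleBot")
print("Crawl-delay seen by the parser:", delay)
print("Fetching / allowed:", parser.can_fetch("ExampleBot", "/"))
```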
24 April 2024, 05:27 | #50 |
HOL/FTP busy bee
Join Date: Sep 2006
Location: Germany
Age: 46
Posts: 32,409
|
|
24 April 2024, 13:14 | #51 | |
Administrator
Join Date: Feb 2001
Location: Paris / France
Age: 46
Posts: 3,099
|
Quote:
How can you be sure of that? A Google query for "site:eab.abime.net" only shows 837 results, which is far from the 200k URLs waiting in the Google Search Console. |
|
24 April 2024, 14:18 | #52 | |
Registered User
Join Date: May 2020
Location: Figueira da Foz
Posts: 451
|
Whereas before no results would appear, if you search in Google for:
Quote:
|
|
24 April 2024, 19:15 | #53 |
Registered User
Join Date: Feb 2016
Location: Homeless
Posts: 67
|
The different abime subdomains did have identical robots.txt files, so that doesn't explain why some subdomains did work with google, and others didn't. I think the missing linebreak didn't affect google (but it could have caused problems for other bots).
837 search results - with descriptions - is a huge improvement. Previously it was about 20 search results - without descriptions - meaning that google didn't even bother to look what is in the pages. I guess everything should get sorted out automatically now, although it may take some months until all pages are re-indexed (as the sitemap files have crawl intervals set to "yearly" for many of the older pages) (though it's also possible that all pages will be crawled starting from "now", rather than from "12 months after last visit"). Last edited by nocash; 24 April 2024 at 19:24. |
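To get a feel for how many sitemap entries are marked "yearly", the changefreq values in a sitemap file can be tallied with a few lines of Python. This is only a sketch: the sample XML and its example.org URLs are invented, the function name `changefreq_counts` is hypothetical, and only the namespace URI comes from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Invented sample data, not the real EAB sitemap.
SAMPLE = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/thread-1</loc><changefreq>yearly</changefreq></url>
  <url><loc>https://example.org/thread-2</loc><changefreq>yearly</changefreq></url>
  <url><loc>https://example.org/thread-3</loc><changefreq>daily</changefreq></url>
</urlset>
"""


def changefreq_counts(xml_text: str) -> Counter:
    """Count <url> entries per <changefreq> value in a sitemap."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for url in root.findall("sm:url", NS):
        # Entries without a changefreq element are counted as "(none)".
        freq = url.findtext("sm:changefreq", default="(none)", namespaces=NS)
        counts[freq.strip()] += 1
    return counts


for freq, count in changefreq_counts(SAMPLE).most_common():
    print(freq, count)
```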
25 April 2024, 22:15 | #54 |
Administrator
Join Date: Feb 2001
Location: Paris / France
Age: 46
Posts: 3,099
|
site:eab.abime.net shows 2400 results today, so I have hope
|
10 May 2024, 01:08 | #55 |
Administrator
Join Date: Feb 2001
Location: Paris / France
Age: 46
Posts: 3,099
|
Google continues to index; we now have 85000 pages for site:eab.abime.net queries
|
10 May 2024, 06:39 | #56 |
HOL/FTP busy bee
Join Date: Sep 2006
Location: Germany
Age: 46
Posts: 32,409
|
Good news. I can also see EAB results returning for standard Amiga-related Google searches.
|
10 May 2024, 23:02 | #57 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,753
|
Top stuff.
|
24 May 2024, 17:08 | #58 |
Administrator
Join Date: Feb 2001
Location: Paris / France
Age: 46
Posts: 3,099
|
15 days later, our number of results for site:eab.abime.net has gone down to 26800
(ps: I changed nothing) |