Creating new wiki pages seems broken today...
On Wed, 2010-07-28 at 13:05 -0700, Dan Kegel wrote:
Creating new wiki pages seems broken today...
Yes, due to all the spam, we've hit the ext3 limit on subdirectories per directory (32k). More here: http://www.rooftopsolutions.nl/blog/135
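In MoinMoin 1.x each page is stored as its own subdirectory of data/pages, so the number of page directories shows how close the wiki is to this ceiling. Below is a minimal Python sketch. The data/pages path and the exact limit constant are assumptions, so adjust them for the installation.

```python
import os

# Approximate ext3 per-directory ceiling (an assumption to adjust as needed).
# ext3 caps a directory's hard-link count at 32000, and every subdirectory's
# ".." entry adds one link, so the practical limit is just under 32000.
EXT3_SUBDIR_LIMIT = 32000


def count_subdirs(path):
    """Count the immediate (non-recursive) subdirectories of `path`."""
    with os.scandir(path) as entries:
        return sum(1 for e in entries if e.is_dir(follow_symlinks=False))


def headroom(path, limit=EXT3_SUBDIR_LIMIT):
    """Return how many more subdirectories fit under `path` before `limit`."""
    return limit - count_subdirs(path)
```

For example, `headroom("data/pages")` run from the wiki's instance directory reports how many more pages can be created. A value close to zero means it is time to prune.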
I'm looking into how we can clean this up.
On 28 July 2010 21:49, Dimi Paun dimi@lattica.com wrote:
On Wed, 2010-07-28 at 13:05 -0700, Dan Kegel wrote:
Creating new wiki pages seems broken today...
Yes, due to all the spam, we've hit the ext3 limit of subdirectories (32k). More here: http://www.rooftopsolutions.nl/blog/135 I'm looking into how we can clean this up.
Ubuntu hit this one:
https://bugs.edge.launchpad.net/ubuntu/+source/moin/+bug/217191 http://moinmo.in/MoinMoinBugs/AllPagesSavedToSingleDirectory
The other solution is permanent deletion of the spam pages from the actual file system. I've done such pruning before, and it needs (obviously) to be done with *remarkable* care. It's also very fiddly. I eventually cobbled together scripts to do the deletion for me. (That was at an old workplace; I don't have them to hand.) The MoinMoin page above lists maintenance scripts that can do it for you.
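A cautious pruning helper can work in two passes. The first pass only reports what it would do. The second pass moves the page directories into a quarantine area instead of deleting them, and that area can be emptied once nobody misses anything. This is a hypothetical sketch, not one of the MoinMoin maintenance scripts mentioned above. `prune_pages` and the quarantine directory are illustrative names.

```python
import os
import shutil


def prune_pages(pages_dir, page_dirs, quarantine_dir, dry_run=True):
    """Plan (and optionally perform) moving page directories into quarantine.

    `page_dirs` are directory names exactly as they appear on disk, which can
    differ from page titles because special characters are encoded.
    Returns the list of (source, destination) moves. With dry_run=True,
    the default, nothing on disk is touched.
    """
    plan = []
    for name in page_dirs:
        # Refuse names that could escape pages_dir, such as ".." or "a/b".
        if name in ("", ".", "..") or os.sep in name or (os.altsep and os.altsep in name):
            raise ValueError(f"suspicious page directory name: {name!r}")
        src = os.path.join(pages_dir, name)
        dst = os.path.join(quarantine_dir, name)
        if not os.path.isdir(src):
            continue  # already gone or never existed: nothing to do
        if os.path.exists(dst):
            raise FileExistsError(dst)  # never merge into an earlier quarantine
        plan.append((src, dst))
    if not dry_run:
        os.makedirs(quarantine_dir, exist_ok=True)
        for src, dst in plan:
            shutil.move(src, dst)
    return plan
```

Run it once with the default `dry_run=True` and review the returned (source, destination) pairs. Then repeat the call with `dry_run=False`. Because pages are moved rather than deleted, the operation stays reversible until the quarantine is emptied.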
They also suggest moving the wiki directories to a filesystem that can allow stupid numbers of directories, like XFS. (Even ext4 only scales to 64,000 directories.)
MoinMoin 2.0 will apparently use a database instead of flat files. ETA: some time or other in the far future. "we can't tell exactly when the new storage stuff will be production ready, but I expect end 2008 .. mid 2009." Ahem.
Oh, and moinmo.in regards this as not being a "bug", but the result of bad file system design. (And not, e.g., a wiki that doesn't scale.)
- d.
On Wed, Jul 28, 2010 at 23:35, David Gerard dgerard@gmail.com wrote:
They also suggest moving the wiki directories to a filesystem that can allow stupid amounts of directories, like XFS. (Even ext4 only scales to 64,000 directories.)
https://ext4.wiki.kernel.org/index.php/Ext4_Howto#Sub_directory_scalability seems to indicate there is no such limit. Maybe there was one a couple of years ago. Additionally, migrating from ext3 to ext4 should cause the fewest headaches (maybe a kernel recompile, YMMV).
On Wed, 2010-07-28 at 22:35 +0100, David Gerard wrote:
Ubuntu hit this one:
https://bugs.edge.launchpad.net/ubuntu/+source/moin/+bug/217191 http://moinmo.in/MoinMoinBugs/AllPagesSavedToSingleDirectory
Thanks David for the links.
I've run the cleanup scripts, and we are now down to ~5K pages, down from 32K. So there is still plenty of room to grow for the time being.
If we hit the limit again, please let me know and I'll clean it up right away, now that I know what I need to do :)
P.S. There is still something wrong with the Wiki: saving pages takes a really long time for no apparent reason (no load on the box, etc.). I think we're hitting an inefficiency in Moin, as the httpd process shoots up to 95% CPU usage for a good few seconds. I've trimmed the edit-log and event-log files, which were very big, but that doesn't seem to help. Any other ideas?
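Trimming a log such as edit-log or event-log comes down to keeping its most recent lines. Here is a minimal sketch that rests on three assumptions. First, the wiki is stopped while it runs, so nothing appends to the file at the same time. Second, it runs as the user that owns the log. Third, it is acceptable to lose older entries, which may then also drop out of views built from the edit-log, such as RecentChanges. `trim_log` is a hypothetical helper.

```python
import os
import shutil
import tempfile
from collections import deque


def trim_log(path, keep_lines):
    """Keep only the last `keep_lines` lines of the file at `path`; return how many were kept."""
    # Stream the file; the deque discards older lines, so memory use stays
    # bounded by keep_lines however big the log has grown.
    with open(path, "rb") as f:
        tail = deque(f, maxlen=keep_lines)
    # Write the tail to a temp file in the same directory, then atomically
    # swap it in, so a crash never leaves a half-written log behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "wb") as out:
            out.writelines(tail)
        shutil.copymode(path, tmp)  # keep the original permission bits (not ownership)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
    return len(tail)
```

The replacement file is owned by whoever runs the script. Running it as root would leave a log that httpd can no longer write to.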
Dimi Paun wrote:
P.S. There is still something wrong with the Wiki: saving pages takes a really long time for no apparent reason (no load on the box, etc.). I think we're hitting an inefficiency in Moin, as the httpd process shoots up to 95% CPU usage for a good few seconds. I've trimmed the edit-log and event-log files, which were very big, but that doesn't seem to help. Any other ideas?
Yes, the LocalBadContent page got pretty long; I'm fairly sure it's the spam checking that takes so long.
bye michael
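This sketch shows why a long LocalBadContent list would slow down every save. It assumes that each line of the page is a regular expression tested against the submitted text, as MoinMoin's antispam help describes it. Under that assumption, a save performs one search of the text per line. The code is a standalone illustration of that loop, not MoinMoin's actual code. Case-insensitive matching and the '#' comment convention are further assumptions.

```python
import re


def compile_blacklist(lines):
    """Turn LocalBadContent-style lines into compiled regexes.

    Blank lines and lines starting with '#' are skipped; patterns that fail
    to compile are dropped rather than aborting the whole check.
    """
    patterns = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            patterns.append(re.compile(line, re.IGNORECASE))
        except re.error:
            pass
    return patterns


def first_bad_match(text, patterns):
    """Return the first pattern found in `text`, or None.

    Clean text is scanned once per pattern, so the work per save grows
    with the length of the list.
    """
    for pat in patterns:
        if pat.search(text):
            return pat
    return None
```

With thousands of patterns, every legitimate save scans the page text thousands of times. That would be consistent with the CPU spike described above. Pruning stale or duplicate patterns reduces the cost proportionally.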
On Thu, 2010-07-29 at 18:17 +0200, Michael Stefaniuc wrote:
Yes, the LocalBadContent page got pretty long; I'm fairly sure it's the spam checking that takes so long.
I tried to empty it, and it does seem to help. However, it's not the only cause of the problem: it's still not fast even with an empty LocalBadContent.
On Thu, 29 Jul 2010, Dimi Paun wrote:
On Thu, 2010-07-29 at 18:17 +0200, Michael Stefaniuc wrote:
Yes, the LocalBadContent page got pretty long; I'm fairly sure it's the spam checking that takes so long.
I tried to empty it, and it does seem to help. However, it's not the only cause of the problem: it's still not fast even with an empty LocalBadContent.
I have a theory: did the script move the remaining files to another directory? If not, there may be a fragmentation problem at the directory level: the directory structure was grown to accommodate 32k entries, and now there are only 5k entries, but they are spread over the space of the old 32k entries, leading to inefficient lookups. If so, something like this should fix it:
mkdir newdir
mv olddir/* newdir   # hope there's no dot file
rmdir olddir
mv newdir olddir
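The same rebuild can be written so that it also moves dot files, which `os.listdir` returns but the shell glob `*` skips. The version below also keeps the directory's permission bits. It is a sketch that assumes the wiki is stopped while it runs and that the temporary sibling name `.compact-tmp` is free. The function name is hypothetical.

```python
import os


def compact_directory(path):
    """Recreate `path` as a fresh directory containing the same entries."""
    path = os.path.abspath(path)
    new = path + ".compact-tmp"  # hypothetical scratch name; must not exist yet
    os.mkdir(new)
    os.chmod(new, os.stat(path).st_mode & 0o7777)  # keep the old permission bits
    # os.listdir includes dot files, unlike the shell glob "olddir/*".
    for name in os.listdir(path):
        # Same filesystem, so each move is a cheap rename of a directory entry.
        os.rename(os.path.join(path, name), os.path.join(new, name))
    os.rmdir(path)  # raises if anything was left behind
    os.rename(new, path)
```

The rebuild helps because ext3 does not shrink a directory's blocks when entries are removed. A directory that once held 32k entries therefore stays large, while a freshly created one packs the remaining entries tightly.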
On Fri, 2010-07-30 at 08:58 +0200, Francois Gouget wrote:
I have a theory: did the script move the remaining files to another directory?
Yes, it did.
On Wed, Jul 28, 2010 at 1:49 PM, Dimi Paun dimi@lattica.com wrote:
Yes, due to all the spam, we've hit the ext3 limit of subdirectories (32k). More here: http://www.rooftopsolutions.nl/blog/135
I'm looking into how we can clean this up.
Should we also add another hurdle (possibly even manual approval) to make it harder for spammers to get accounts?
On Wed, 2010-07-28 at 15:06 -0700, Dan Kegel wrote:
I'm looking into how we can clean this up.
Should we also add another hurdle (possibly even manual approval) to make it harder for spammers to get accounts?
Now that this issue is fixed, we can look again at the spam problem. It was suggested that we use 'TextChas' for non-logged-in users: http://moinmo.in/HelpOnSpam
But it seems it's not too easy to come up with decent questions. Should we try it?
On Thu, Jul 29, 2010 at 9:01 AM, Dimi Paun dimi@lattica.com wrote:
Should we also add another hurdle (possibly even manual approval) to make it harder for spammers to get accounts?
Now that this issue is fixed, we can look again at the spam problem. It was suggested that we use 'TextChas' for non-logged-in users: http://moinmo.in/HelpOnSpam
But it seems it's not too easy to come up with decent questions. Should we try it?
I like the idea. It is hard, but here are some possible questions:
"What is the first name of the Finn who created the Linux operating system?" "What is the abbreviation for the GNU C Compiler?" "What is the name of the simple text editor that comes with Windows?" "Complete the phrase: ____ screen of death" "What is the name for a billion bytes?"
I have no idea if those will faze spammers.
- Dan
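MoinMoin configures its TextChas in wikiconfig.py; see the HelpOnSpam page linked above. Independently of that implementation, the mechanism is simple enough to sketch. Keep a table of questions with answer patterns, show a random question, and accept the edit only if the answer matches in full. The table below is hypothetical and reuses the questions suggested above. The billion-bytes question is left out because its answer depends on the reader's language.

```python
import random
import re

# Hypothetical question -> answer-pattern table. Answers are regexes so that
# harmless variations ("gcc", "GCC", "notepad.exe") are accepted.
TEXTCHAS = {
    "What is the first name of the Finn who created the Linux operating system?": r"linus",
    "What is the abbreviation for the GNU C Compiler?": r"gcc",
    "What is the name of the simple text editor that comes with Windows?": r"notepad(\.exe)?",
    "Complete the phrase: ____ screen of death": r"blue",
}


def pick_question(rng=random):
    """Choose a question to show an anonymous editor."""
    return rng.choice(sorted(TEXTCHAS))


def check_answer(question, answer):
    """True if `answer` fully matches the expected pattern, ignoring case."""
    pattern = TEXTCHAS.get(question)
    if pattern is None:  # unknown or tampered question text
        return False
    return re.fullmatch(pattern, answer.strip(), re.IGNORECASE) is not None
```

The check uses `re.fullmatch` to match the whole answer instead of searching within it. This stops a bot from passing every question with one long string that contains all the answers.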
On Thu, Jul 29, 2010 at 8:41 PM, Dan Kegel dank@kegel.com wrote:
I like the idea. It is hard, but here are some possible questions:
"What is the first name of the Finn who created the Linux operating system?" "What is the abbreviation for the GNU C Compiler?" "What is the name of the simple text editor that comes with Windows?" "Complete the phrase: ____ screen of death" "What is the name for a billion bytes?"
I have no idea if those will faze spammers.
Those questions will certainly keep away non-geek spammers. We could make a quiz-like anti-spam system with not-so-trivial questions :)
Octavian
"Octavian" == Octavian Voicu octavian.voicu@gmail.com writes:
>> "What is the name for a billion bytes?"
Terabyte, at least in Germany: "Billion" -> 10^12, while 10^9 -> "Milliarde".
So these questions can be tricky...
On Thu, Jul 29, 2010 at 18:01, Dimi Paun dimi@lattica.com wrote:
Now that this issue is fixed, we can look again at the spam problem. It was suggested that we use a 'TextChas' for non logged in users: http://moinmo.in/HelpOnSpam
But it seems it's not too easy to come up with decent questions. Should we try it?
How difficult would it be to use ReCaptcha?
http://www.google.com/recaptcha
Thanks, Alex
On Tue, 2010-08-03 at 14:30 +0200, Alexandru Băluț wrote:
How difficult would it be to use ReCaptcha?
Hm, don't know. We could hack our version to support recaptcha, but I'm not familiar with the code base, and I don't have the time right now. But I can take patches if someone is willing to do it.
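For whoever picks up a reCAPTCHA patch, the server-side half is a single POST to Google's verification endpoint. The request carries the site's secret key and the token the browser submitted, and the reply is JSON with a `success` flag. The sketch below targets today's `siteverify` API; the 2010-era service used a different URL and parameter names. It uses only the Python standard library. Hooking it into MoinMoin's edit form is not shown, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Google's current server-side verification endpoint (not the 2010-era one).
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret, token, remote_ip=None):
    """Build the POST request that checks a user's reCAPTCHA response token."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional per the API
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(VERIFY_URL, data=data, method="POST")


def parse_verify_response(body):
    """Interpret the JSON reply; anything malformed counts as a failure."""
    try:
        reply = json.loads(body)
    except ValueError:
        return False
    return isinstance(reply, dict) and reply.get("success") is True


def verify(secret, token, remote_ip=None, timeout=5):
    """Ask Google whether `token` is valid (performs a network call)."""
    req = build_verify_request(secret, token, remote_ip)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_verify_response(resp.read())
```

A malformed reply counts as a failure, so the check fails closed. Network errors from `verify` propagate as exceptions, and the caller should also treat those as a rejected edit.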