The Clockwork Mansion

Outskirts => The Villa => Topic started by: Amber Williams on September 12, 2006, 12:43:58 AM

Title: DMFA forum redirected.
Post by: Amber Williams on September 12, 2006, 12:43:58 AM
Just a heads up, the DMFA pages now point to this forum instead of the Nice.  Which means Phase 2 is underway.  Anyone still wanting to get stuff off the old forum best do it now or very soon.  Also, this will likely be the last week or so where DMFA people can get their old postcounts back.

In about 2 weeks I will likely request the old forum on the Nice be removed.
Title: Re: DMFA forum redirected.
Post by: Aridas on September 12, 2006, 01:15:36 AM
I plan to be crazy-obsessive and get the forum archived, hopefully before then. That way, anyone who misses something they want will have a mirror site... technically.
Title: Re: DMFA forum redirected.
Post by: Tapewolf on September 12, 2006, 07:16:27 AM
Quote from: Aridas Soulfire on September 12, 2006, 01:15:36 AM
I plan to be crazy-obsessive and get the forum archived, hopefully before then. That way, anyone who misses something they want will have a mirror site... technically.

I would appreciate that.  I still find myself looking things up on The Nice fairly regularly.
Title: Re: DMFA forum redirected.
Post by: Zedd on September 12, 2006, 02:05:55 PM
Well now Ms Amber....I do hope it won't be too much of a hurry doing that
Title: Re: DMFA forum redirected.
Post by: Amber Williams on September 12, 2006, 03:13:14 PM
Considering how long it took for me to change the links on the main page to the new forum...I wouldn't expect the old forum to vanish overnight.
Title: Re: DMFA forum redirected.
Post by: Azlan on September 12, 2006, 03:31:48 PM
*poof* Old forum now gone...

~forum faerie
Title: Re: DMFA forum redirected.
Post by: Zedd on September 12, 2006, 04:28:25 PM
Oh noes! 'A'
Title: Re: DMFA forum redirected.
Post by: llearch n'n'daCorna on September 12, 2006, 05:58:32 PM
Quote from: Amber Panyko on September 12, 2006, 03:13:14 PM
Considering how long it took for me to change the links on the main page to the new forum...I wouldn't expect the old forum to vanish overnight.

*grin* but we like you like this, Amber. If you got all efficient and stuff, we wouldn't recognise you anymore :-)
Title: Re: DMFA forum redirected.
Post by: Aridas on September 14, 2006, 12:40:09 AM
Ok, that didn't work too well. My program hit its limit and won't do any more. Is anyone experienced with, or at least potentially capable of, saving offline site mirrors? I can give you the page to start at to get the entire DMFA forum as opposed to the past 45 days, but you'll have to figure out the rest yourself and.. well... find something without a page limit. I can't find any that's easy enough for me to use.

Either that, or we beg Amber to let the forum stay up for all eternity.
Title: Re: DMFA forum redirected.
Post by: llearch n'n'daCorna on September 14, 2006, 01:58:20 AM
wget -q -nc -w 1 --random-wait --limit-rate=50k -nH -E -m -p -np $URL

?
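For anyone not fluent in wget, here is a rough, annotated reading of that one-liner. It is a sketch rather than the exact command that was run: mirror_args is a made-up helper name, it only prints the command instead of executing it, and -nc is left out on the assumption that wget will not combine it with the timestamping that -m switches on.

```shell
#!/usr/bin/env bash
# Sketch only: the flags from the one-liner above, one per line.
# mirror_args is a hypothetical helper; it prints the wget command
# instead of running it, so nothing is downloaded by accident.
mirror_args() {
    local url=$1
    local args=(
        wget
        -q                  # quiet: no progress output
        -w 1                # wait 1 second between requests
        --random-wait       # vary that wait so requests are less regular
        --limit-rate=50k    # cap the download speed
        -nH                 # no directory named after the host
        -E                  # save HTML pages with an .html extension
        -m                  # mirror: recursion plus timestamping
        -p                  # also fetch images/CSS needed to show a page
        -np                 # never climb to the parent directory
        "$url"
    )
    printf '%s\n' "${args[*]}"
}
```

Calling mirror_args with a start URL prints the full command line, which can be checked by eye before running it for real.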
Title: Re: DMFA forum redirected.
Post by: Aridas on September 14, 2006, 03:52:07 AM
someone else can go get it, i've put a bit too much effort into it as it is. >.>
besides, aren't we going to need some sort of limity thing to keep it from leaking into things other than the dmfa section and the images displayed from other sites? I had something doing that with enough ease of use that even I could figure it out, but I need $2000 to get a version with a high enough limit to get the whole thing. >_>

But anyway, for those of you wondering though, you'll be able to archive it starting from this url:

http://nice.purrsia.com/cgi-bin/ultimatebb.cgi?ubb=forum&f=79&DaysPrune=1000&submit=Go
Title: Re: DMFA forum redirected.
Post by: llearch n'n'daCorna on September 14, 2006, 05:04:51 AM
in process, then.

Currently on 761 pages, 66Mb downloaded....
Title: Re: DMFA forum redirected.
Post by: Tapewolf on September 14, 2006, 06:39:57 AM
Quote from: llearch n'n'daCorna on September 14, 2006, 05:04:51 AM
in process, then.

Currently on 761 pages, 66Mb downloaded....

It never occurred to me that you could use WGET on a forum.  I'll have to give that a go..
Title: Re: DMFA forum redirected.
Post by: Gabi on September 14, 2006, 10:45:32 AM
Wow, llearch, which server are you hosting that on?
Title: Re: DMFA forum redirected.
Post by: llearch n'n'daCorna on September 14, 2006, 11:04:22 AM
Uh. None, yet. It'll be on my personal server, once I get the machines swapped over, maybe, if i feel that way inclined.

Don't hold your breath. :-)

it's up to 8500 files, 213Mb.


I think it needs some help, though, as it's getting confused...

Nuts. wget won't do it, as it won't filter on the query string. Which means i'll have to come up with something more complex... :-/
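One possible shape for the "something more complex" is to collect candidate URLs first and filter them on the query string before fetching anything. The sketch below is only an illustration: keep_forum_urls is a made-up name, and it assumes the forum is selected by an f=79 parameter, as in the URLs posted in this thread.

```shell
#!/usr/bin/env bash
# keep_forum_urls is a hypothetical helper: it reads URLs on stdin and
# prints only those whose query string selects the wanted forum
# (f=79 unless another number is given as the first argument).
keep_forum_urls() {
    local forum=${1:-79}
    # Match "f=<forum>" as a whole parameter: preceded by ?, & or ;
    # and followed by & or ; or the end of the URL.
    grep -E "[?&;]f=${forum}([&;]|\$)"
}
```

Fed a list of links scraped from a page, this keeps the f=79 ones and drops links into other forums such as f=790 or f=7.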
Title: Re: DMFA forum redirected.
Post by: Aridas on September 14, 2006, 12:43:53 PM
Teleport Pro was doing an excellent job of saving my pages, til it decided to give up.
Title: Re: DMFA forum redirected.
Post by: Sid on September 15, 2006, 08:06:03 PM
Quote from: llearch n'n'daCorna on September 14, 2006, 11:04:22 AM
Nuts. wget won't do it, as it won't filter on the query string. Which means i'll have to come up with something more complex... :-/

I once (March 31) used cURL on the Print View pages in batches of 50 (to only produce short bandwidth spikes as opposed to a freaking huge OMGWTF block). It worked well from what I can see, and I'll do it again on the weekend. Not stopping anybody else from doing the same or similar things, just doing it because I want to have a local archive anyway.

If anybody is interested, I could make a ZIP file of those pages afterwards and host it. *shrugs*
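The batches-of-50 idea can be sketched roughly like this. It is not Sid's actual script: batch_ranges and fetch_topic are made-up names, and the pause between batches is an arbitrary choice.

```shell
#!/usr/bin/env bash
# batch_ranges is a hypothetical helper: it prints "first last" pairs
# that split the topic numbers first..last into batches of a given size
# (50 unless a third argument says otherwise).
batch_ranges() {
    local first=$1 last=$2 size=${3:-50}
    local start end
    for ((start = first; start <= last; start += size)); do
        end=$((start + size - 1))
        if ((end > last)); then end=$last; fi
        printf '%d %d\n' "$start" "$end"
    done
}

# Usage sketch: fetch_topic is a placeholder for whatever downloads one page.
#   batch_ranges 1 120 50 | while read -r first last; do
#       for t in $(seq "$first" "$last"); do fetch_topic "$t"; done
#       sleep 600   # rest between batches; the length is an arbitrary choice
#   done
```

Each batch is a short burst of requests followed by a quiet period, which is the "short bandwidth spikes" behaviour described above.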
Title: Re: DMFA forum redirected.
Post by: llearch n'n'daCorna on September 15, 2006, 08:49:35 PM
There's an idea.

*codecode* ok, it's whipping away. I'll have to manually get the css and stuff, but that's not too hard... heck, I could wget a single page and get all that.

I'll let you all know where it is when I'm done.

Edit:
Oh, and just in case anyone is interested:

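# Work inside the local archive directory.
pushd /home/llearch/nice
# Topic IDs are six digits: seq -w pads 44..3660 to four, the "00" prefix supplies the rest.
for i in $(seq -w 44 3660); do
    curl --create-dirs -f -# -o "cgi-bin/ultimatebb.cgi?ubb=print_topic;f=79;t=00$i.html" --url "http://nice.purrsia.com/cgi-bin/ultimatebb.cgi?ubb=print_topic;f=79;t=00$i"
    # Pause between requests to keep the load on the server low.
    sleep 5
done
popd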
Title: Re: DMFA forum redirected.
Post by: skwerly on September 20, 2006, 09:33:43 PM
Welp, I managed to whip Mr Aridas's Teleport Pro into shape, and got all 3655 printer-friendly topics archived.  They weigh in at about 120MB, with another 4000 images (200MB) to go with 'em.

I also grabbed the 102-page topic list, and thought it'd be neat to combine 'em all into one page.  Uh, that didn't work out so well.  I ended up with a 5MB takes-forever-to-load html file.  Looks purdy though. :)

Teleport chokes on the regular version of the forum (waaay too many url references), so I can't help there.  I'm sure Mr llearch or someone has a way to pull it off.  If not, well, at least we'll have something..

Say, while I'm here, does anyone have the images from this topic (http://nice.purrsia.com/cgi-bin/ultimatebb.cgi?ubb=get_topic;f=79;t=001900), where Ms Amber was showing us the process of drawing Mischa?  It's pre-Haxored, so all the links are broken now.  That's the trouble with webcrawlers these days.  They just don't want to archive stuff that's not there anymore. *sigh*
Title: Re: DMFA forum redirected.
Post by: Damaris on September 21, 2006, 12:06:30 AM
The Professor painstakingly recreated that thread in the Art Forum, I do believe.  It has the pictures as well.
Title: Re: DMFA forum redirected.
Post by: Sid on September 21, 2006, 05:07:02 PM
I also got the 120MB package and a small index (no fancy stuff, just a LONG list with the thread names and links - ordered by creation date and with "200x" headings whenever a new year came by). Image URLs are non-local (so it tries to get the images from the remote locations). The folder compresses neatly into 25MB (even though that's just using my default archiver's default options - I might be able to squeeze it more with other means, who knows).

Oh, and the tutorial is at http://clockworkmansion.com/forum/index.php?topic=719.0
Title: Re: DMFA forum redirected.
Post by: Amber Williams on September 21, 2006, 07:57:09 PM
Since you guys seem to have a good handle on things, I'm going to announce that when I return from the States, I'll be making the formal request for the Nice DMFA forum to be removed. (which might take a few weeks to get done for all I know)

I'll be leaving next Mon-Tues and expect to return a week from then if not sooner.
Title: Re: DMFA forum redirected.
Post by: skwerly on September 21, 2006, 11:57:53 PM
I'm making one last attempt at grabbing the regular version with Teleport, using hideous restrictions so it'll stop wandering into too many URLs I don't want and crashing.

Why am I doing this?  Because the printer-friendly version doesn't have avatars or signatures.  And really, what's the point of a forum without avatars and signatures?  No one believes it when you say "I only visit them for the articles!" anyway.

Meanwhile, the Purrsia admins are screaming "STOP WASTING MY BANDWIDTH ALREADY!!"

*cringe*

Damaris, Sid, and Prof: Thank you! *happysnoopydance*

Ms Amber: Enjoy your trip, good luck n all that.
Title: Re: DMFA forum redirected.
Post by: Tapewolf on September 22, 2006, 06:17:49 AM
Sid, I'd be interested in having a copy of your archive if that's not a problem.

**EDIT**
INTERESTED, not INTERESTING dammit!  Don't post in a hurry...
Title: Re: DMFA forum redirected.
Post by: Kasarn on September 22, 2006, 06:25:16 AM
Quote from: skwerly on September 21, 2006, 11:57:53 PM
Meanwhile, the Purrsia admins are screaming "STOP WASTING MY BANDWIDTH ALREADY!!"

What happened when somebody ganked Monoceros Media...
http://tugrik.livejournal.com/496847.html

Of course, there was also the time Tug was slashdotted... that caused The Nice to die for a day...
Title: Re: DMFA forum redirected.
Post by: Sid on September 22, 2006, 06:43:30 AM
Quote from: Tapewolf on September 22, 2006, 06:17:49 AM
Sid, I'd be interesting in having a copy of your archive if that's not a problem.

Going to leave the house in a few (seminar... so full of hate...), but I'll upload it once I return (early evening, I guess). :)

Edit:
http://www.youkai.de/DMFA_Forum_(Sep_17).zip
- Roughly 25 MB (uncompressed size: roughly 120 MB)
- Contains (as far as I was able to verify without going through all threads by hand) the entire Forum in PrintView Mode
- File contains no images, all image URLs point to the original URLs (Advantage: You can download only this file without getting tons of broken images. Disadvantage: Net connection needed for full experience, threads like PAY need LONG to load due to suddenly having to load dozens of images.)
- Contains "index.htm" that shows all thread titles (sorted by thread creation date), links, and when a new year started
- Not sure if this extracts to a new folder automatically. Better be on the safe side and put the ZIP in an empty folder before doing careless "Extract all here" actions. :P
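An index of the kind described in the list above (titles sorted by creation date, with a heading whenever a new year starts) could be generated along these lines. This is a guess at the approach, not how the index.htm in the ZIP was built: build_index is a made-up name, and the tab-separated input format is an assumption.

```shell
#!/usr/bin/env bash
# build_index is a hypothetical helper. It reads tab-separated lines of
#   YYYY-MM-DD <TAB> file name <TAB> thread title
# on stdin and prints a plain HTML list, sorted by date, with a heading
# each time the year changes. Titles are not HTML-escaped in this sketch.
build_index() {
    sort -t "$(printf '\t')" -k1,1 | awk -F '\t' '
        BEGIN { print "<html><body>" }
        {
            year = substr($1, 1, 4)
            if (year != last) { print "<h2>" year "</h2>"; last = year }
            print "<a href=\"" $2 "\">" $3 "</a><br>"
        }
        END { print "</body></html>" }
    '
}
```

Redirecting the output to index.htm next to the saved pages gives one long list that works offline.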
Title: Re: DMFA forum redirected.
Post by: skwerly on September 23, 2006, 08:25:35 PM
Kudos, Sid!

Okay, while Sid here's been distributing a printer-friendly archive, I've been trying to beat Teleport Pro into submission over the "normal" version.  What do we call that, anyway? Non-printer-friendly? Graphic-intensive? Whatever.  Anyway, I think I finally got it.

Problem is, TP simply can't handle all the URLs from such a large project, and even trying to break it into chunks has it choking on the multi-page threads. (APF, anyone?)  Sooo, the only thing I could come up with is to break it into the smallest chunks possible: one project per page.  It's almost like telling my browser to Save As.. Complete Web Page, but more automated and much less boring.

Anyway, the batches (8000 of 'em!) are running now, one at a time, and should be ready by.. um.. Tuesday.  I'm doing only one thread with a delay between files so the Purrsia admins don't scream so loud.  Then I just mash it into a "flat" archive and delete the billions of duplicate images.  I better free up some disk space. :)

I suppose I should ask if there's anyone (besides me) who'd be interested in the full graphic version once it's done.
Title: Re: DMFA forum redirected.
Post by: Aridas on September 23, 2006, 08:28:09 PM
It would be meeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee yes indeed
Title: Re: DMFA forum redirected.
Post by: bill on September 23, 2006, 08:29:50 PM
Are those extraneous "e"s necessary?
Title: Re: DMFA forum redirected.
Post by: Aridas on September 23, 2006, 08:30:55 PM
Indeeeeeeeeeeeeeeeed.
Title: Re: DMFA forum redirected.
Post by: llearch n'n'daCorna on September 24, 2006, 04:23:32 AM
Skwerty, I'm happy to host the full version, if you want.

I'm sure there's a bunch of others who'd also be willing...
Title: Re: DMFA forum redirected.
Post by: skwerly on September 24, 2006, 08:39:21 AM

Aridas: m'kay. Eeee.

ttearch :) ,  do you have a size limit?  The images are going to take up at least a couple gig.

Speaking of images... I don't know why I didn't consider this before, but there's a huge mess of copyrighted works here, especially in the fan art threads.  Since they were posted in the forum, does that make them public domain and freely distributable?  I doubt I'll be able to find three years worth of artists for their blessings.
Title: Re: DMFA forum redirected.
Post by: llearch n'n'daCorna on September 24, 2006, 09:43:38 AM
As far as size limits go, approx 30Gb, I think.

I suspect, as far as the forum goes, posting in the forum makes the content owned by the hoster of the forum, and arguing that you really want it taken down is probably not going to fly.

Technically, one should get permission for re-hosting it, but since there's no -explicit- "don't rehost this" disclaimers on any of the forum posts, the -implicit- rules are basically those of the wide web, ie, anything goes.

IANAL, but I'd expect that, as long as we weren't making money off it, and were willing to take down any posts that anyone wanted removed (which, for my part, I am) we'd probably get away with it. You may wish to add a front page that documents that...
Title: Re: DMFA forum redirected.
Post by: Damaris on September 24, 2006, 09:50:40 AM
Posting it for public consumption does not make it public domain.  The original owners still own the copyright, and would be well within their rights to demand that anything be taken down (especially since they won't be able to edit it out of their posts anymore.)

However, since you guys are hosting a mirror of stuff that was originally posted somewhere else, and are not altering the forum in any significant way, it should be fine.  Like you said, you should be able to get away with it, especially if credit is given on everything (which it would, since it's a mirror)
Title: Re: DMFA forum redirected.
Post by: llearch n'n'daCorna on September 24, 2006, 10:05:00 AM
Oh, -yes-

If any of the original posters was to appear and ask for stuff to be removed, removed it would be. No arguments, no questions - other than confirming the veracity of the person claiming to own the content, and that's merely a formality, since I prefer to err on the side of safety - in this, at least, if nothing else...

However, my point was, the copyright may, depending on the EULA that was clicked through to register with Nice, belong to Nice, rather than to the original poster.

The author has rights, but so does the Nice forums, and I don't recall anything twisty about the EULA, but then, I figure anything I post on the web is fair game to -everyone- who can see it, and there's technically nothing I can do to stop anyone taking copies, except asking politely. *shrug* So attempting to stop people from taking copies is just going to stop people from being able to see the thing I've put up, and since I put stuff up so people can see it....


*cough* if that's not enough to explain, nothing is. :-)
Title: Re: DMFA forum redirected.
Post by: skwerly on September 24, 2006, 04:46:41 PM
Eep!  I don't want to get into a discussion on the details of copyright law or anything.. I was mainly concerned for the artists.  We'd be redistributing art that others drew, and I don't want to upset those others.

As for The Nice, their registration page says they don't review the posts, and are not responsible for the content.  "You remain solely responsible for the content of your messages."  So, copyright law aside, I'm taking that to mean I should worry about the artists, not The Nice or Purrsia.

I'll go with the "easier to ask forgiveness than permission" route, I suppose.  If a poster wants his/her post or image removed, it gets removed, no questions asked.  They'd have to actually notice the mirror first, but I'll assume there's enough artists/posters reading this that word will get around quickly enough.

If anyone else has concerns about this, please speak up!  (If I'm the only one, then I'll just write it off as paranoia and move on to my next crisis.  I have a list.)
Title: Re: DMFA forum redirected.
Post by: Aridas on September 24, 2006, 04:50:09 PM
Well, since we're not putting it up anywhere other than where it was already put up, that point should be done with. The fact that we're only preserving what's about to disappear should mean that there's nothing going on.
Title: Re: DMFA forum redirected.
Post by: llearch n'n'daCorna on September 24, 2006, 05:49:13 PM
So.. what's next on the list, skwerly? :-)
Title: Re: DMFA forum redirected.
Post by: skwerly on September 24, 2006, 07:14:46 PM
llearch sez:
Quote
So.. what's next on the list, skwerly? :-)

Getting the silly thing to actually work!  That's enough of a crisis for now.  1900 pages archived, about 6000 to go.  3 more days of downloading, then another day or two to mash it flat and fix links.  The semicolons in the filenames give Firefox fits.

Wheeee! *thud*
Title: Re: DMFA forum redirected.
Post by: skwerly on September 28, 2006, 08:24:06 AM
(Hmm.. Do double posts count as A Bad Thing if they're 4 days apart?)

Update:

Archive complete!  Burning a backup now in case I screw up the repairs.  Gimme a few days to make it mirrorable.
Title: Re: DMFA forum redirected.
Post by: Tapewolf on September 28, 2006, 08:27:37 AM
Quote from: skwerly on September 28, 2006, 08:24:06 AM
Archive complete!  Burning a backup now in case I screw up the repairs.  Gimme a few days to make it mirrorable.

Cool.  How big is it in total?

**EDIT**
And did you include other resources, such as music?  There are some files that I'm only keeping up because The Nice links to them.. when it goes away, I'll take them down since most of them were WIP files that have since been superseded.
Title: Re: DMFA forum redirected.
Post by: llearch n'n'daCorna on September 28, 2006, 08:28:30 AM
If we're going to mirror it, supplying a compressed version would be significantly better in terms of bandwidth etc.

Not least - opening and closing more than 4000 files generates some overhead...
Title: Re: DMFA forum redirected.
Post by: skwerly on September 28, 2006, 09:30:07 AM
@Tapewolf: It's currently weighing in at 5.3GB, but I suspect at least half of that is from duplicate images.  Since I had to archive each page separately, that means 8000 copies of navigation buttons, for example.  I'll have that cleared up soon enough.

Also, I archived only the text and embedded objects, and I chose not to have links point to the actual URL.  They all point to "where the local file would be if I grabbed it", and I didn't grab anything from links.  Because of all the navigation links, Teleport Pro kept choking on anything past the actual page I wanted.  It's why I had to create 8000 projects in the first place.

Now, if anyone wants to browse their favorite topics, download some linked images/mp3s/etc, and send them to me, I can probably add them to the archive with no problem.  All I'd need is the original filename.  It won't matter which topic or page the link came from.  Remember that I already have the embedded stuff.  If you can see it on the page, I have it.  If it's just a link, I don't have it.

@llearch: I know nothing about hosting, so I'll hand over the raw data and let you take care of compression and whatnot.  Is there anything HTMLish I should take into account while I change filenames and links around?  I'm removing the semicolons, as I said earlier.  I'm also removing cookie references and page headers/footers.  There's really no need for login/register/reply/etc links in an archive.
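The semicolon cleanup could look something like the sketch below. It is only one way to do it: flat_name is a made-up name, and replacing "?" and ";" with "_" is an arbitrary choice of mapping.

```shell
#!/usr/bin/env bash
# flat_name is a hypothetical helper: it maps a saved page name that
# contains "?" or ";" to a plainer file name by replacing both with "_".
flat_name() {
    local name=$1
    name=${name//[;?]/_}
    printf '%s\n' "$name"
}

# Usage sketch: rename every saved page in the current directory.
#   for f in *.html; do mv -- "$f" "$(flat_name "$f")"; done
```

Whatever mapping is used for the file names has to be applied to the link targets inside the pages as well, or the renamed files will not be found.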

One thing that may or may not be a concern is all those links generating 404 errors, since they'll be looking at your server for the files.  I hadn't planned on doing anything about 'em, though.  Should I?
Title: Re: DMFA forum redirected.
Post by: llearch n'n'daCorna on September 28, 2006, 10:07:08 AM
uh.

I meant for transporting it from where you are to where my server is. For that step, zipping the whole lot up into a single file, or even a bunch of files of reasonable size, and letting us download those large chunks, is far more efficient, networkwise, than downloading 8000 individual files.

As for the links... hmm. Uh. I guess we can ignore them, for the moment. Not a huge issue - I figure I can set up another domain and a vhost for it.
Title: Re: DMFA forum redirected.
Post by: Aridas on September 28, 2006, 02:22:23 PM
I take it you didn't consider setting exceptions for Teleport?
Title: Re: DMFA forum redirected.
Post by: skwerly on September 28, 2006, 07:26:06 PM
@llearch: D'oh! Well, that pretty much proves I know nothing of hosting, or I woulda understood that the first time.  I was going to send you a DVD, but I can upload a zip file, sure.

@Aridas: I tried just about every combination of settings I could think of, including using exceptions.  Some combos worked better than others, but none worked well enough to grab only what I wanted without hitting the 65k url limit.  That was about two days of experimentation.  Once I finally hit on the one-project-per-page idea, I was tired of testing, and I went with a somewhat minimal approach.

@Tapewolf: New estimate. After finding over 200,000 duplicate files (still deleting, ACDSee's having a blast), the archive will be down to about 1.5GB.
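For anyone without ACDSee, duplicate hunting can also be sketched on the command line by comparing checksums. This is an alternative technique, not what was used above: list_duplicates is a made-up name, and it assumes GNU find, sort, xargs and sha256sum are available.

```shell
#!/usr/bin/env bash
# list_duplicates is a hypothetical helper: it prints every file under a
# directory whose content matches an earlier file (by SHA-256), one per
# line. It only lists candidates; deleting them is left to the reader.
# File names containing newlines or backslashes are not handled.
list_duplicates() {
    local dir=$1
    find "$dir" -type f -print0 |
        sort -z |
        xargs -0 -r sha256sum |
        awk 'seen[$1]++ { print substr($0, 67) }'
}
```

The first file with a given checksum is kept off the list, so removing everything that is printed leaves exactly one copy of each image.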