
Old f4g posts for UO guilds on europa



Hello All,

 

A few years ago the f4g site crashed and all data was lost. There was a plethora of stories and reports written there. I tried to recover the data from the Google cache and recovered about 1.5 GB of it. However, I never got around to sorting it or doing anything meaningful with it.

 

The archive is still sitting on my hard drive. It's around 1.5 GB, and while cleaning up I would like to get rid of it. If you know someone who might be interested in recovering it, I would gladly burn a DVD and send it to them by regular mail in any format they prefer (e.g. XML). I will make a similar offer to several people I know from the community. If I don't hear from anyone within 3 weeks from now, I shall delete it.

 

The recovery algorithm was precise, doing a heuristic search of the various threads and thread pages over Google based on member names. I am confident that 95% of the content was recovered. So far it sits inside a MySQL database; I can convert it to any format for your convenience.

 

Thanks,

 

Murad


I answered you on Skype, Murad. I just need Adam's advice on what will be the best format to get it in.

 

I know the player of Seriya de Lacey was desperate to get it way back when I asked you if you could retrieve any of it. She kept asking me if you had passed it on, so she will be very happy.

 

I am sure Adam will reply here, and once he has I will send you an address for the disc to go to.

 

Many thanks


That would be the ultimate ending, for it all to be put on, as there are SO many RP stories and the history of Erpa and Core.

 

What would be the best format to have it burned to a disc?

 

*drags Murad off Skype to speak to Adam*

SQL would be fine, I'd certainly try me best to import it, but if it's a manual process, I'd need help lol

 

Well of course and I would help as it would be just awesome to have it all back up again :)


Adam,

I would say that importing it into a forum is possible and that was my ultimate goal when I set upon this endeavour. However, it is quite tedious work even for a programmer familiar with the vBulletin DB schema. I have personally no time to put into this due to my other engagements :(. I have estimated that this job would need around 40-50 man hours. If somebody else would like to pick up the beacon I'd be glad to consult on the current data format and the way I have devised to import it to a vBulletin schema.

 

Murad


Well, SQL is possible; however, the data aren't normalized or even parsed. There is one row in the table per vBulletin 'page view', and inside that row one column contains all the HTML generated for that page. One needs to write a script to match each page with its thread and parse out the individual posts while matching them to member names (if you decide to maintain referential integrity with the names). The data are in a raw state and not importable into any schema as is, although I have made sure that all the necessary info for later parsing and matching is present.
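
For anyone picking this up, here is a minimal sketch of that parsing step in Python. The markup it looks for (an `a` tag with class `bigusername` for the author, and a `div` whose id starts with `post_message_` for the body) is only an assumption about what the cached pages contain, and `PostExtractor` and `parse_page` are made-up names. The real dump would need its own patterns, probably one set per skin.

```python
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collect author/text pairs from the HTML of one cached page.

    Assumed (hypothetical) markup, to be adjusted to the real dump:
      <a class="bigusername">NAME</a> appears before each post body
      <div id="post_message_ID">BODY</div> holds the post body
    """

    def __init__(self):
        super().__init__()
        self.posts = []         # finished posts, in page order
        self._author = None     # most recent author name seen
        self._in_author = False
        self._post_id = None    # id of the post being read, if any
        self._depth = 0         # nested <div> depth inside the post body
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "bigusername":
            self._in_author = True
        elif tag == "div":
            div_id = attrs.get("id") or ""
            if self._post_id is not None:
                self._depth += 1            # a div nested inside the body
            elif div_id.startswith("post_message_"):
                self._post_id = div_id[len("post_message_"):]
                self._depth = 1
                self._text = []

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_author = False
        elif tag == "div" and self._post_id is not None:
            self._depth -= 1
            if self._depth == 0:            # the body div itself closed
                self.posts.append({
                    "post_id": self._post_id,
                    "author": self._author,
                    "text": "".join(self._text).strip(),
                })
                self._post_id = None

    def handle_data(self, data):
        if self._in_author:
            self._author = data.strip()
        elif self._post_id is not None:
            self._text.append(data)


def parse_page(html):
    """Return the list of posts found in one page of raw HTML."""
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    return parser.posts
```

Each row of the raw table would be fed through `parse_page`, and the resulting posts would then be attached to a thread using whatever thread identifier is stored alongside the page.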


Of course there may be a simpler way to do this: not parsing the posts at all, merely stripping each page's content of its headers and footers and inserting the whole page into one vBulletin post in some archive thread. But that's up to the admin...
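
A rough sketch of that simpler route, in Python. The two marker strings are placeholders: whatever text reliably opens and closes the content area in the cached pages would have to be found by looking at the dump. `strip_page_chrome` is a made-up name.

```python
def strip_page_chrome(html, start_marker, end_marker):
    """Return the part of the page between the two markers.

    If either marker is missing (or they appear in the wrong order),
    the page is returned unchanged so that nothing is silently lost.
    """
    start = html.find(start_marker)
    end = html.rfind(end_marker)
    if start == -1 or end == -1 or end < start + len(start_marker):
        return html
    return html[start + len(start_marker):end].strip()
```

The returned fragment could then be stored as the text of a single post, one post per cached page.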


Found someone who MAY be able to help with regards to restoring the data.

The last I heard from the guy was

 

I have been plugging away on this when I have a few moments here and there. I can reassemble the forum and subforum structure and build user IDs - all with the same values as before. The script is doing a decent job on some sample data; maybe tonight I'll have some time to finish it off and run it against the database I pulled down from you.

 

If by some miracle the data is import-able into a forum environment, then I'll send a copy to F4G's admin (Motrudrn?), as it's at F4G where the history was originally written, so to speak.


If by some miracle the data is import-able into a forum environment, then I'll send a copy to F4G's admin (Motrudrn?), as it's at F4G where the history was originally written, so to speak.

 

Moturdn will be made up to get it all back, I am sure, and it's only right and proper that it should go back on F4G as well.

 

Will you still make a UO Europa forum here also, as suggested earlier?

[ just in case it goes *poof* again ]

Which suggestion was that? I can't see any reference to a Europa forum? heh

 

 

Heh wow, that's awesome.

Wonder if it'd be possible to import it into a forum.

 

Well OK, a 'forum' then. Just assumed it would have an indication that it was Europa shard :P


Ohhh, heh

The problem I foresee, however, is that there's probably going to be duplicate content. I'm not exactly sure how the data was scraped and/or if a time/date period was set when scraping.

So ideally it'll need to be imported into a blank/new forum separate from UOF or F4G, to see what the data looks like, then go from there.

Ohhh, heh

The problem I foresee, however, is that there's probably going to be duplicate content. I'm not exactly sure how the data was scraped and/or if a time/date period was set when scraping.

So ideally it'll need to be imported into a blank/new forum separate from UOF or F4G, to see what the data looks like, then go from there.

 

Ah right I understand :)

Ohhh, heh

The problem I foresee, however, is that there's probably going to be duplicate content. I'm not exactly sure how the data was scraped and/or if a time/date period was set when scraping.

 

I started to scrape the Google cache about 36 hours after the f4g forums were declared lost... At that time the f4g forums weren't being refreshed in the Google cache, since the site was flagged as 'down'. By the time I had finished scraping the data, only the first posts were starting to appear on f4g and being pulled into the Google cache. The duplicate content should be minimal (less than 0.2%).

 

Murad


I doubt it'll be as straightforward as that, Frost, heh

 

Update I had today was

 

Actually I have. The "rescue" code is more than 600 lines long now. I haven't been able to carve off any one big chunk of time to work on this but have been plugging away a little bit each day since. What I discovered is that the Google cache data is not all that consistent: there are plenty of dupes, some data has very little metadata in it to rebuild from, etc. There still are 28000 pages, and of course many more posts within those pages, which I can recover complete with the original forum, thread and linkage info. They are formatted as vbCode so the posts can be shoved back into a new forum; I wrote a little HTML-to-vbCode parser to do that.
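
The actual parser isn't posted here, so purely as an illustration, an HTML-to-vbCode converter could look something like the Python sketch below. It assumes vbCode uses the usual bracket tags (`[b]`, `[i]`, `[u]`, `[url=...]`) and only handles a handful of HTML tags; every other tag is dropped and its text kept. `HtmlToVbCode` and `html_to_vbcode` are made-up names.

```python
from html.parser import HTMLParser

# HTML tags that map straight onto a bracket tag of the same shape.
SIMPLE_TAGS = {"b": "b", "strong": "b", "i": "i", "em": "i", "u": "u"}


class HtmlToVbCode(HTMLParser):
    """Translate a small subset of HTML into bracket-style forum markup."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in SIMPLE_TAGS:
            self.out.append(f"[{SIMPLE_TAGS[tag]}]")
        elif tag == "br":
            self.out.append("\n")
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            self.out.append(f"[url={href}]")

    def handle_endtag(self, tag):
        if tag in SIMPLE_TAGS:
            self.out.append(f"[/{SIMPLE_TAGS[tag]}]")
        elif tag == "a":
            self.out.append("[/url]")

    def handle_data(self, data):
        # Character references such as &amp; arrive here already decoded.
        self.out.append(data)


def html_to_vbcode(html):
    parser = HtmlToVbCode()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Quotes, images, smilies and font tags would each need their own rule on top of this.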

 

Left to do: sort out an issue with the members being parsed. For some reason I'm not finding very many valid members (meaning a name and a member ID), and that's sort of critical to rebuilding the complete post history. If nothing else I can insert an "apparently written by" text line for those posts, but I hope to do better.

 

Whatever the outcome, I'd like to wrap this up today (meaning very late tonight my time - Pacific TZ). The code will produce SQL insert statements for posts, forums, threads and members. Some assembly will be required. I'd recommend that an empty forum be created, that these "rescue" records be added as new tables within the empty forum database, and that some selective SQL updates then be done to copy data from the rescue tables to the actual forum tables. That way I don't have to make the data look exactly like vB or whatever forum you choose.
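
To picture the staging-table idea, here is a small self-contained Python sketch. It uses an in-memory SQLite database as a stand-in for the real MySQL forum database, and the table and column names (`post`, `rescue_post`, `pagetext` and so on) are invented for the example, not taken from the rescue script or from any forum schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the real forum table (columns are hypothetical).
    CREATE TABLE post (
        postid INTEGER PRIMARY KEY,
        threadid INTEGER,
        username TEXT,
        pagetext TEXT
    );
    -- Staging table that the generated INSERT statements would fill.
    CREATE TABLE rescue_post (
        postid INTEGER,
        threadid INTEGER,
        username TEXT,
        pagetext TEXT
    );
    INSERT INTO rescue_post VALUES (1, 10, 'Alice', 'First post');
    INSERT INTO rescue_post VALUES (2, 10, NULL, 'Author unknown');
    INSERT INTO rescue_post VALUES (3, 11, 'Bob', '');
""")

# Selective copy: skip blank posts and give authorless posts a placeholder name.
conn.execute("""
    INSERT INTO post (postid, threadid, username, pagetext)
    SELECT postid, threadid, COALESCE(username, 'Unknown'), pagetext
    FROM rescue_post
    WHERE pagetext <> ''
""")
conn.commit()

rows = conn.execute("SELECT postid, username FROM post ORDER BY postid").fetchall()
print(rows)
```

Keeping the rescued rows in their own tables means the copy can be rerun with different filters without touching the raw rescue data.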

 

So hang tight just a little longer.

There still are 28000 pages and of course many more posts within those pages which I can recover complete with the original forum, thread and linkage info

 

I hope he means that he still has 28k pages to do. If he means that only 28k pages are parsable, then the data is in much worse shape than I hoped. :o


There's a lot of duplicate entries, that much I do know

I remember from looking through the raw MySQL tables that there are copies of almost every thread/post, as it was grabbed from the vB archive of the old F4G boards.

The archive basically being a copy of every thread/post.

It stands to reason then that half the data is copies of the normal stuff.
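
If the copies really are the same posts captured twice, one way to collapse them is to fingerprint each post on its author plus whitespace-normalised text and keep the first occurrence. A minimal Python sketch, with made-up names (`fingerprint`, `drop_duplicates`) and assuming each post is a dict with `author` and `text` keys:

```python
import hashlib
import re


def fingerprint(author, text):
    """Hash of the author and the text, ignoring case and extra whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    key = f"{author.lower()}|{normalized}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def drop_duplicates(posts):
    """Keep the first post for each fingerprint, preserving order."""
    seen = set()
    unique = []
    for post in posts:
        key = fingerprint(post["author"], post["text"])
        if key not in seen:
            seen.add(key)
            unique.append(post)
    return unique
```

One caveat: a member who posted the exact same text in two different threads would also be collapsed, so adding a thread identifier to the fingerprint is probably safer when one is available.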


Hmm it doesn't add up... if we assume that there were about 10 posts per page then the data would cover about 210k posts, however we must assume there was at least 1 page per thread, therefore for it to add up only 5k threads must have had more than one page. Also the table 'user' which contains member names that I parsed from the posts contains 3749 lines.... hmmm I guess we'll just have to wait and see.

Hmm it doesn't add up... if we assume that there were about 10 posts per page then the data would cover about 210k posts, however we must assume there was at least 1 page per thread, therefore for it to add up only 5k threads must have had more than one page. Also the table 'user' which contains member names that I parsed from the posts contains 3749 lines.... hmmm I guess we'll just have to wait and see.

 

I'm not sure either

But there was a LOT of duplicate information, especially with threads and posts

 

The raw data I'm left with right now is...

 

Posts - 173,142

Threads - 16,169

 

No doubt it's not all of the data which was lost, but if we can get back even some of it, it's a bonus.
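
As a quick sanity check on those counts, the averages can be worked out as below. The 28000 page figure is the one quoted earlier in the thread; whether it describes the same data set as the post and thread counts is an assumption.

```python
posts = 173_142
threads = 16_169
pages = 28_000  # assumption: refers to the same data set as the counts above

posts_per_thread = posts / threads
posts_per_page = posts / pages

print(f"{posts_per_thread:.2f} posts per thread")
print(f"{posts_per_page:.2f} posts per page")
```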

 

The guy who was kind enough to write the Python script to do this has also uploaded the scripts to the UOF server, so if need be, others can work on fine-tuning it.


Aye, check your PMs, MMoudry

 

Ok, so this is what I have so far, but I must warn you, it's not perfect, heh

 

VBTest - Powered by vBulletin

 

There are a few problems. The ones I know of include:

 

1) Some posts are blank

2) Posts don't have a date/time of creation

3) Threads don't have a username showing in the forums

4) You can't search for threads created by a certain user.

 

With the majorly bad stuff out of the way...

 

1) You can search users' posts

2) You can search the forum itself (I believe it works)

3) Posts seem to be relatively intact and most have usernames

 

I can provide the Python script files used to extract the data from the raw HTML that Murad scraped from Google.

I can't provide the database itself, as it's a vBulletin product (I believe it's illegal to share it? Not sure).

But what I can do is convert it into another (free) forum software like MyBB, and then people can have copies of it freely.

F4G admins of course, can have all the stuffs

 

If people want to help recover more data etc, they're welcome to help out


Importing it all into the freebie MyBB seems to have gotten a better result, ironically

 

F4G Net Restored Data

 

I've managed to remove all the forums with no data

I've restored the proper post/thread count as well as the usernames who created the threads etc.

 

If anyone would like a copy of this intact database, I can provide it

You'll need an install of MyBB to make it work like I have.


Hail,

First of all I would like to salute the person who wrote, in such a short time, a script that is able to extract so much of the data in a precise manner.

 

I've checked the script and here are my preliminary findings. There is definitely room for improvement. It would seem (take this statement with caution) that the script didn't support all the skin HTML patterns in the raw HTML and sometimes didn't parse them correctly; it may even have parsed only some of the available page patterns.

 

One example might be this thread from the old KH forum that I've found in the original raw data but not on the BB:

Forums4Games > Ultima Online Entrance > UO Hosted Guilds > Europa Roleplaying Entrance > Knights Hospitaller > Knights Hospitaller > Umbra attacks the Knighthood!

 

It does not appear anywhere in the salvaged BB (although copy/pasted posts from it are in the reports to the grandmistress). It would appear that either this page was skipped for some reason or the script didn't look at the pages with the [Archive] tag. Anyway, to do it properly and salvage ALL of the data present in the raw dump will take more time and fine-tuning, as Adam said. The dates are recoverable.

 

To get all the data, the script needs to be extended to parse all the data, match the archive and regular threads, and merge the different pages (in case we have page 1 of a thread as regular and page 2 as archive).
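
The merge step could be sketched in Python roughly as follows. It assumes each parsed page is a dict with `thread_id`, `page_no`, `source` ("regular" or "archive") and `posts`, and that both views split a thread into pages the same way. That last assumption may well not hold for the real data, in which case the matching would have to be done post by post instead. `merge_thread_pages` is a made-up name.

```python
def merge_thread_pages(pages):
    """Build one post list per thread, preferring regular pages over archive ones."""
    best = {}
    for page in pages:
        key = (page["thread_id"], page["page_no"])
        current = best.get(key)
        # Take the page if the slot is empty, or upgrade an archive copy
        # to the regular copy when both exist.
        if current is None or (
            current["source"] == "archive" and page["source"] == "regular"
        ):
            best[key] = page

    threads = {}
    for thread_id, page_no in sorted(best):
        posts = best[(thread_id, page_no)]["posts"]
        threads.setdefault(thread_id, []).extend(posts)
    return threads
```

With this, a thread that has page 1 only as a regular page and page 2 only as an archive page still comes out as one continuous list of posts.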

 

So to conclude, I believe we can get more precise and more complete data from this.

 

I'll look more into this and try to see whether I can dedicate some time into this in the coming weeks (my wife sure won't be happy).

 

I'll keep you guys posted.

 

Murad



All right, after careful evaluation I may have some time to spare over this summer to work on the data, but I ask the following from the community. Everybody who would like to see the data in its full available state, please post a small message on this thread; based on the number of responses I will make the final decision whether to leave this in its current state or take the time to improve it. The partial data has been available for nearly 24 hours and not many people have accessed it so far. It would be suboptimal to spend my time on something nobody will use.

 

Thanks,

 

Murad


I think the only fields we need back, are the date/times of threads/posts

And of course, any threads/posts which haven't been imported

Not sure why it seemed to skip some, heh

But yea, it's only really worth it if people actually WANT it done.


I'd just like to respond to Murad's question of whether it is worth his time to perfect the data further.

 

I'd first like to thank Murad for saving the data when he had the opportunity a few years back.

 

Secondly, a big thank you to Adam for providing a format to show the data and doing what needed to be done for it to be available to read.

 

I am part of a group on Facebook of old Europa players [mostly CORE] and I am going to check shortly that it is posted there about the F4G forums, so people can come see what is here. I am sure there will be interested parties there.

 

I am also sure Moturdn who now hosts F4g will be interested.

 

Gendin is now guild leader of KT and I am sure he will have an interest. He has been away on holiday till yesterday, I think, so may not have seen the forum, so I'll make sure he knows about it.

 

Also Irvyn of Trinsic will most likely have an interest.

 

Most are available to contact via UOForums, so I will let as many as I can find know to come and post.

 

For me personally it is great to get back any history of Tabbitha's time in UO. I have read quite a few of the pages but haven't had a great deal of time to spend there at the moment, as I am moving house Friday and, as you can imagine, have a few things more urgent than reading forums :P.

 

I think it's great what has been achieved so far, but would not want anyone to be using their free time in a way that caused a problem.

 

I think anyone who wants something more than has been provided should post here ASAP.

 

Other than that I, for one, am grateful for what has been done already and can at least now save what I wanted to from what has been made available.

 

Again a BIG THANK YOU to you both!!!!


I'd definitely be interested in a version of the data in its full state, but I also understand what it's like to spend a lot of time on something that nobody will use (I do expect quite a few people to want the data, however). As mentioned earlier, I'm available to provide help if you need me :)

 

I'd also like to express a great big thank you to all involved parties.

*looks excited*

Sooooo? any news on this one?

 

I'm gonna cry if I can read the old KMK forums :))

 

Get the tissues ready then, cos even "And why dont we have a Babble-Tread?" is there :P

 

Posts from Pekka, Barbara, Jin Xau, Corpus Deus, to name a few :)
