Magister Lajciak

Saving news websites?

I read a lot of news online and try to archive all the articles I read for future offline access. I do this by saving each webpage to my computer manually. Needless to say, this can be somewhat annoying and time-consuming, when I have to do it over and over and over many times per day every day. Is there some easier and less time-consuming way to do such archiving?


Hmm... if you use Firefox, there's an extension/add-on called Scrapbook. I have not tried it, so I can't tell you if it's any good or if it does all of what you might want.

 

The Firefox page of user-reviews is here.

 

edit: typos

Edited by LadyCrimson

“Things are as they are. Looking out into the universe at night, we make no comparisons between right and wrong stars, nor between well and badly arranged constellations.” – Alan Watts


Not a reply to help you, but to actually discourage you. :(

 

Speaking from my experience, I think that's a huge waste of your time. I used to pile up the articles and documents I read "for future reference", and guess what? I have never ever ever ever read them again. Not once. If it is worthwhile material, it gets propagated or replicated throughout the web, so there is little chance of it becoming unavailable. If it does become unavailable 5 years down the road, you won't care (or remember, for that matter).

 

Today, if I need some textual information, I find myself turning to Google 99% of the time, and to my local files only 1% of the time. Heck, even my bookmarks outlive their usefulness so quickly that I am not even sure why I shouldn't just clear them all right now.


This statement is false.


I bookmark lots of links for "future reference" and the only time I ever use them is on a forum to prove a point. It's really not worth it, though as something to show your kids or comment on culture years from now I'm sure it has validity.

 

P.S.: How do you live without Firefox? Or, specifically, how do you live with IE?

There are probably 20 bookmarks that I've used all the time, for a long time. The rest languish in bookmark purgatory for periods that are eons beyond the actual existence of most of the pages in question.


Well, I think you are partially correct. I return to the information I collect very rarely indeed (though I do occasionally return to some of it). As such, it is to a large extent a waste of time, yes. Still, it leaves me with a large archive of materials that are no longer available online (most news articles, for example, have a limited life-span on the net), as well as the ability to use all the materials offline (though it is true that I make use of that very rarely).

 

I guess my recognition that such archiving is a considerable time-sink is one reason why I would want to at least partially automate the process.

Well, I don't really use bookmarks and 'favorites'. As you say, the page might well disappear in the future, so why bookmark it? Instead, I tend to download it.

What's so bad about IE? I am not some sort of "advanced user", so for me it's an ease of use issue - the 'ease' of IE being that it is already on my computer and I am already used to it. Barring auto-archiving, it does most things I want it to do, so why switch?

Well, I don't really use bookmarks and 'favorites'. As you say, the page might well disappear in the future, so why bookmark it? Instead, I tend to download it.

:o

 

About the only things I want to save are pictures (photos, comics, news pics), so I do a lot of r-clicking/save image. Tons of folders for those and most of the time I never browse through them later. So I do know the feeling, kinda.

 

I run across a lot of articles/columns, usually the ones that tickle my funny bone, that I think about saving but ... I know I'll never read them again and forget about 'em in a few hours. But if it's short enough I'll just take a screenie or two of it, instead of saving the whole page/site scripts and all. Heh.


The number one reason to switch would be that IE is not secure. Another would be that it doesn't render a lot of pages correctly.


Once upon a time, Adobe Acrobat could save entire websites. You would just set a number of 'click levels' for it to follow, and the pages reached within that many clicks would be included in the folder it was saved to. If you included off-site clicks you would get a lot of unneeded stuff, but it was a nice feature.

I believe IE had a similar function but without the handy 'clicks'. Haven't used either in ages.
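
For anyone who wants to recreate the 'click levels' idea by hand, here is a rough sketch of how it might work. This is not how Acrobat did it; it is just a breadth-first walk over links with a depth limit. The names `crawl`, `fetch` and `follow_offsite` are made up for illustration, and `fetch` is assumed to be any function you supply that returns the HTML of a URL.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, max_depth, fetch, follow_offsite=False):
    """Return {url: html} for pages within max_depth clicks of start_url.

    fetch is a caller-supplied callable (hypothetical hook) that maps a
    URL to its HTML text, so the walk itself needs no network code.
    """
    site = urlparse(start_url).netloc
    pages = {}
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        pages[url] = html
        if depth == max_depth:
            continue  # click limit reached: keep the page, ignore its links
        collector = _LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop "#fragment" parts.
            target = urldefrag(urljoin(url, href)).url
            parts = urlparse(target)
            if target in seen or parts.scheme not in ("http", "https"):
                continue
            if not follow_offsite and parts.netloc != site:
                continue  # skip off-site clicks unless asked for
            seen.add(target)
            queue.append((target, depth + 1))
    return pages
```

With `max_depth=1` you get the start page plus everything one click away on the same site; leaving `follow_offsite` off avoids the "lot of unneeded stuff" problem.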

Edited by Gorgon

Na na  na na  na na  ...

greg358 from Darksouls 3 PVP is a CHEATER.

That is all.

 

Really? Major online newspapers have searchable archives, as far as I can see. Other sites tend not to delete their articles and posts; it's not like they are taking up any space. A lot of blogs do not have extensive archives, because they have not existed all that long. May I ask what exactly you archive news for? Except for gratifying your OCD, that is. ;) Are you going to datamine them? It's rather hard to find anything in a few gigabytes' worth of text if you don't know precisely what you are looking for.

 

If I were to do something like this, I'd run a script to periodically scrape text from HTML pages (or RSS feeds) and shove it into a database. But that's just me, I guess.
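
To give an idea of what such a script might look like, here is a minimal sketch, assuming an RSS 2.0 feed and a local SQLite file. It uses only the Python standard library. The table layout and the function names (`open_archive`, `archive_feed`, `html_to_text`) are my own assumptions, not anything standard.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Strip tags from an HTML fragment and return its plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def open_archive(path=":memory:"):
    """Open (or create) the archive database; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "url TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    return conn


def archive_feed(conn, rss_xml):
    """Store every <item> of an RSS 2.0 document; return how many were new."""
    new = 0
    for item in ET.fromstring(rss_xml).iter("item"):
        url = item.findtext("link", default="").strip()
        if not url:
            continue
        title = item.findtext("title", default="").strip()
        body = html_to_text(item.findtext("description", default=""))
        # The URL is the primary key, so re-running skips known articles.
        cur = conn.execute(
            "INSERT OR IGNORE INTO articles VALUES (?, ?, ?)",
            (url, title, body),
        )
        new += cur.rowcount
    conn.commit()
    return new
```

To run it for real you would download the feed first (for example with `urllib.request.urlopen(feed_url).read()`, where `feed_url` is whatever feed you follow) and let cron or the Windows Task Scheduler call the script every few hours. Note that many feeds carry only a summary in `<description>`, so the full article would still have to be fetched from the item's link.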

 

 

The single biggest reason to switch to Firefox is its hundreds of extensions, including, but not limited to, bookmark syncing, ad removal, and site tweaks with Greasemonkey scripts.

 

 

Actually, security is no longer a good argument in favour of Firefox. It has become so popular (it has almost reached IE's market share) that it is now a popular target. In the past month, I can't remember a week when my Ubuntu did not download a Firefox or xulrunner security patch.

 

Anecdotal evidence suggests Chrome is better in this respect. And if you want to be really secure, run Lynx. :)

Edited by Diamond


Of course, it's losing 'security through obscurity' as it becomes more widely known, but it's still safer than IE purely by design. And it still renders pages more correctly.

 

Eh, I'll switch to Chrome when it actually becomes available on Ubuntu and Mac. I'll seriously miss Firefox's "recently visited pages", "reopen closed tabs", and "bookmark all tabs" features, though.

About the only things I want to save are pictures (photos, comics, news pics), so I do a lot of r-clicking/save image. Tons of folders for those and most of the time I never browse through them later. So I do know the feeling, kinda.

 

Well, I also do that with comics... it's not just news I archive - that was more of an example than anything else.

Now that would be nice. I would not include off-site clicks, of course, but the Adobe Acrobat feature you describe would be really nice!

OK, fair enough - I am not some sort of IE advocate - I have simply been using it, because it's the default browser and I didn't really see any reason to switch. I am perfectly open to using Firefox and Chrome if they offer tangible benefits over Internet Explorer.

 

On that note, I am about to do a system wipe/formatting and complete reinstall on my new computer (about 6 months old now, so time for a system reset). I have been hearing a lot of good things about the Ubuntu version of Linux on these boards and elsewhere, so I am sort of entertaining the thought of installing that OS instead of Windows XP or Windows Vista (I generally favor XP over Vista, but it may be time for an upgrade [I bought both for cheap - some sort of university deal]). Still, I have many worries about it:

 

1) Compatibility Issues - Work: I would definitely want to install Microsoft Office 2007, since I have many documents that I worked on in that program as well as in its older counterpart, Microsoft Office 2003 (and indeed prior Microsoft software such as Winword...). I want to be able to work on them in a familiar program (Office 2007), and everything must work flawlessly in terms of compatibility of older and newer document files, Excel files and so on.

 

2) Compatibility Issues - Gaming: I want to play games on the computer, including upcoming ones such as Dragon Age.

 

3) Ease of Use: This includes having a graphical 'click-based' interface that is easy to transition to from Windows and without me being left wondering what kind of 'click' I need to perform for function X and so on.

 

4) Availability: Where can I obtain it from? I understand that it is free, but is it just downloadable online, or should it be ordered?

 

5) Other: Any other issues I should know about or other advice you care to give?

 

As you can see, I am slightly tempted, but my doubts are still substantial about whether it is worth the hassle or if I should just stick with Windows with which I am already familiar.

Well, many Slovak news sites have only limited archives, and Slovak press agencies tend not to have public archives at all. Other sites do have archives, but searching them is often a paid service or requires that you buy the specific articles in question. An example of that would be Scientific American, which permits reading some articles for free and moves others to pay-to-view sections after a while.

 

I don't datamine the articles. When I come back to them, it is generally because I want to read something about a topic. I generally have few problems finding the archived articles I want, since I archive them in a very organized manner with many folders and sub-folders. If I want to know about the various proposals for changes in the Slovak pension system, as well as the actual changes instituted or how it functioned before reforms X and Y, for example, I can go to the Archive and look up the Economics folder, the Pensions sub-folder and the Slovakia sub-sub-folder, and therein I have articles, government documents, analyses by various institutes/think-tanks and other information on the topic. Before you ask, yes, I am the type of person who wants to look up things like this, and in this regard the archive has proven very useful, since many of the articles about and analyses of the pension system can no longer be found online when I check again.
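
If the filing step were ever automated, the folder scheme described above could be reproduced with something like the following sketch. The function names, the date-prefixed file name and the sample date are assumptions made for illustration; only the Economics / Pensions / Slovakia folder layout comes from the description above.

```python
import re
from pathlib import Path


def slugify(title):
    """Turn an article title into a safe, readable file name stem."""
    slug = re.sub(r"[^\w\-]+", "-", title.strip().lower()).strip("-")
    return slug or "untitled"


def file_article(root, topics, title, html, saved_on):
    """Save html under root/topic/sub-topic/... and return the file path.

    topics is a list such as ["Economics", "Pensions", "Slovakia"];
    saved_on is a date string like "2009-01-15" (hypothetical value)
    placed first so files sort chronologically within a folder.
    """
    folder = Path(root).joinpath(*topics)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"{saved_on}_{slugify(title)}.html"
    target.write_text(html, encoding="utf-8")
    return target
```

For example, `file_article("Archive", ["Economics", "Pensions", "Slovakia"], "Pension Reform: What Changes?", page_html, "2009-01-15")` would write the page to `Archive/Economics/Pensions/Slovakia/2009-01-15_pension-reform-what-changes.html`, creating any missing folders along the way.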

 

On top of that, I also feel that I have built up a huge uninterrupted archive (must be getting close to a decade now) - sort of like saving old newspapers from a bygone era. To me, this has an intrinsic value in and of itself.

 

That said, yes, to a large extent what I am doing with the archiving is some sort of OCD. It would therefore be nice to automate it fully or, failing that, at least to some extent.

