
Hello, we are currently researching the use of Starter for archiving a not-for-profit community website. When we use a crawler to produce a test crawl of the site, resulting in a 1GB file, the WARC file is shown correctly in the viewer.

When we do a full crawl, just over 2GB, it doesn’t render.

Is there an upper limit to the allowable size of WARC files that you know of?

Thank you

It’s not so much that there is a fixed maximum size. ReplayWeb.page expects a WACZ, so a WARC has to be transcoded first, and if that takes too long the render request times out.

If you can get your crawler to generate a WACZ and ingest that into Preservica instead, you might get a better outcome.
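
Before ingesting, it may be worth confirming that the crawler's output really is a WACZ rather than a renamed WARC. A WACZ is a ZIP package, so a rough check is possible with the Python standard library alone. The sketch below is only a sanity check, not a validator: it assumes the package has a top-level `datapackage.json` and at least one file under `archive/`, and the function name `looks_like_wacz` is made up for illustration.

```python
import zipfile


def looks_like_wacz(path):
    """Rough check: is `path` a ZIP with the basic WACZ layout?

    Assumption: a WACZ has a top-level datapackage.json and
    stores its WARC data under archive/. This does not verify
    the package contents or any hashes.
    """
    if not zipfile.is_zipfile(path):
        # A plain WARC (or gzipped WARC) is not a ZIP archive.
        return False
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
    has_manifest = "datapackage.json" in names
    has_archive = any(
        n.startswith("archive/") and n != "archive/" for n in names
    )
    return has_manifest and has_archive
```

For example, `looks_like_wacz("crawl.wacz")` (a hypothetical filename) returning False would suggest the crawler is still writing a plain WARC.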


That is very interesting, thank you Richard. We will try again with a WACZ. We had no luck with that before, but it’s worth trying again.


Thanks

