Combining multiple WARCs from Archive-It

Question

We recently started with Preservica 7.2.1, Professional Plus.

We want to ingest website crawls created in Archive-It. Each Archive-It crawl produced multiple WARC files, which we are able to harvest using the WASAPI tool.

My question is: has anyone recently managed to ingest multiple WARC files from a single Archive-It website crawl and integrate them so that Replay renders them as a single viewable resource?

I would love to get notes on your workflow!

Thanks in advance-

Elizabeth Altman

lcoufal · Answer

Hi Elizabeth,

Check out this (somewhat) related thread.

As you can see, the individual WARC files for a crawl are stored in Preservica as multiple content objects under an asset.

I am not sure it would work, but I am wondering if you could use PAX ingest, as described in the Standard Workflows document here, to produce the required hierarchy.

If it worked, you would probably also have to tweak the metadata to point to the starting URL for the replay. I believe this has been discussed in another post, so you should be able to find how to do it here.
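
Before trying the PAX route, it may help to confirm which harvested WARC files belong to which crawl, so that each crawl ends up under one asset. Here is a minimal sketch, assuming you have already saved the WASAPI file listing as a list of records and that each record carries `filename` and `crawl` keys. Those field names, the function name, and the sample values are assumptions for illustration, so check them against your own WASAPI response.

```python
from collections import defaultdict


def group_warcs_by_crawl(file_records):
    """Group file records by crawl ID; filenames are sorted within each group."""
    groups = defaultdict(list)
    for record in file_records:
        # "crawl" and "filename" are assumed key names, not confirmed
        # against any particular WASAPI response.
        groups[record.get("crawl")].append(record["filename"])
    return {crawl: sorted(names) for crawl, names in groups.items()}


# Hypothetical sample records, not real Archive-It data.
records = [
    {"filename": "example-00001.warc.gz", "crawl": 1001},
    {"filename": "example-00000.warc.gz", "crawl": 1001},
    {"filename": "other-00000.warc.gz", "crawl": 1002},
]

for crawl_id, filenames in group_warcs_by_crawl(records).items():
    print(crawl_id, filenames)
```

Each group would then correspond to the set of WARC files you want to appear as content objects under a single asset.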
