
Missing database-level <lobFolder> causes incorrect paths for CLOBs #749

@seso-kdrs

If no database-level <lobFolder> is defined, the LOB path built from the column-level <lobFolder> alone is incorrect.

This makes DBPTK incompatible with SIARD 2.1 files generated by SC Full Convert 25.08.1688, which put the full path in the column-level <lobFolder>.

The SIARD 2.1 specification says the database-level tag is optional.
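Since the database-level element is optional, a reader has to cope with it being absent. The sketch below is a hypothetical helper (resolve is not part of DBPTK) that treats a missing database-level <lobFolder> as contributing nothing and joins the remaining parts with exactly one "/". It assumes, as the SC Full Convert files imply, that without a database-level <lobFolder> the column-level value is relative to the archive root. It is a simplification: it skips "." segments but does not resolve ".." segments or decode any URI escaping.

```java
import java.util.ArrayList;
import java.util.List;

final class LobPathResolver {

    // Hypothetical helper, not DBPTK's API: builds the ZIP entry name of a LOB from an
    // optional database-level lobFolder, the column-level lobFolder and the file name.
    // Assumption: without a database-level lobFolder, the column-level value is
    // relative to the archive root (as in the SC Full Convert files).
    static String resolve(String databaseLobFolder, String columnLobFolder, String fileName) {
        List<String> segments = new ArrayList<>();
        for (String part : new String[] {databaseLobFolder, columnLobFolder, fileName}) {
            if (part == null) {
                continue; // an absent lobFolder contributes no segments
            }
            for (String segment : part.split("/")) {
                // Empty segments come from leading, trailing or doubled slashes; "." adds nothing
                if (!segment.isEmpty() && !segment.equals(".")) {
                    segments.add(segment);
                }
            }
        }
        return String.join("/", segments);
    }

    public static void main(String[] args) {
        // SC Full Convert layout: full path at column level only
        System.out.println(resolve(null, "content/schema0/table74/lob3/", "rec0.txt"));
        // Workaround layout: database-level "content" plus a relative column-level folder
        System.out.println(resolve("content", "schema0/table74/lob3", "rec0.txt"));
    }
}
```

With this approach both layouts resolve to content/schema0/table74/lob3/rec0.txt, and a trailing slash on either folder no longer matters.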

I have attached two SIARD files that reproduce this issue. The first one:

broken_clob.siard.zip

This file only has a column-level <lobFolder>:

<lobFolder>content/schema0/table74/lob3/</lobFolder>

Browsing it gives the following error:

File "./content/schema0/table74/lob3//rec0.txt" is missing in container

Full error message

2025-12-17T15:06:00.699Z ERROR 1 --- [nio-8080-exec-3] c.d.m.s.i.c.SIARD20ContentImportStrategy : Failed to open lob at rec0.txt

com.databasepreservation.model.exception.ModuleException: File "./content/schema0/table74/lob3//rec0.txt" is missing in container
        at com.databasepreservation.modules.siard.in.read.ZipReadStrategy.createInputStream(ZipReadStrategy.java:46)
        at com.databasepreservation.modules.siard.in.read.ZipAndFolderReadStrategy.createInputStream(ZipAndFolderReadStrategy.java:44)
        at com.databasepreservation.modules.siard.in.content.SIARD20ContentImportStrategy.createInputStream(SIARD20ContentImportStrategy.java:490)
        at com.databasepreservation.modules.siard.in.content.SIARD20ContentImportStrategy.startElement(SIARD20ContentImportStrategy.java:335)
        at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
        at org.apache.xerces.parsers.AbstractXMLDocumentParser.emptyElement(Unknown Source)
        at org.apache.xerces.impl.xs.XMLSchemaValidator.emptyElement(Unknown Source)
        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at com.databasepreservation.modules.siard.in.content.SIARD20ContentImportStrategy.importContent(SIARD20ContentImportStrategy.java:189)
        at com.databasepreservation.modules.siard.in.input.SIARDImportDefault.migrateDatabaseTo(SIARDImportDefault.java:64)
        at com.databasepreservation.DatabaseMigration.migrate(DatabaseMigration.java:123)
        at com.databasepreservation.common.server.controller.SIARDController.convertSIARDtoSolr(SIARDController.java:807)
        at com.databasepreservation.common.server.controller.SIARDController.loadFromLocal(SIARDController.java:755)
        at com.databasepreservation.common.api.v1.CollectionResource.createCollection(CollectionResource.java:215)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:255)
        at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:188)
        at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:118)
        at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:926)
        at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:831)
        at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
        at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1089)
        at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:979)
        at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1014)
        at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:914)
        at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:547)
        at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:885)
        at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:614)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
        at com.databasepreservation.common.filter.CasApiAuthFilter.doFilter(CasApiAuthFilter.java:82)
        at com.databasepreservation.common.filter.OnOffFilter.doFilter(OnOffFilter.java:92)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
        at org.apereo.cas.client.util.HttpServletRequestWrapperFilter.doFilter(HttpServletRequestWrapperFilter.java:72)
        at com.databasepreservation.common.filter.OnOffFilter.doFilter(OnOffFilter.java:92)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
        at org.apereo.cas.client.validation.AbstractTicketValidationFilter.doFilter(AbstractTicketValidationFilter.java:152)
        at com.databasepreservation.common.filter.OnOffFilter.doFilter(OnOffFilter.java:92)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
        at org.apereo.cas.client.session.SingleSignOutFilter.doFilter(SingleSignOutFilter.java:102)
        at com.databasepreservation.common.filter.OnOffFilter.doFilter(OnOffFilter.java:92)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
        at com.databasepreservation.common.filter.OnOffFilter.doFilter(OnOffFilter.java:94)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
        at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
        at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:577)
        at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:494)
        at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:431)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
        at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
        at org.springframework.web.filter.ServerHttpObservationFilter.doFilterInternal(ServerHttpObservationFilter.java:109)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
        at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:167)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:90)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:482)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:115)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:93)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
        at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:731)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:344)
        at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:391)
        at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63)
        at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:896)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1736)
        at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52)
        at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
        at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:63)
        at java.base/java.lang.Thread.run(Thread.java:1583)

There seem to be two problems here:

  1. An extra ./ is added at the start of the path, so no entry with that name exists in the ZIP.
  2. Because the column-level value ends with a slash, the final path contains two slashes before the file name. (The ZIP standard does not specify that such paths should be normalized the way UNIX does.)
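Both problems can be reproduced outside DBPTK with a small self-contained Java program. The helper naiveLobPath is hypothetical: it is a guess at how the path is assembled, inferred from the error message and from the fact that the workaround below works (a missing database-level <lobFolder> apparently becomes "." and the parts are joined with "/"). It is not DBPTK's actual code. The lookup uses java.util.zip.ZipFile, whose getEntry matches entry names exactly; DBPTK's ZipReadStrategy may use a different ZIP library.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

final class LobLookupDemo {

    // Hypothetical reconstruction inferred from the error message, not DBPTK's code:
    // a missing database-level lobFolder falls back to "." and parts are joined with "/".
    static String naiveLobPath(String databaseLobFolder, String columnLobFolder, String fileName) {
        String base = (databaseLobFolder == null) ? "." : databaseLobFolder;
        return base + "/" + columnLobFolder + "/" + fileName;
    }

    // True only if the archive has an entry with exactly this name (no normalization).
    static boolean hasEntry(Path zip, String entryName) throws IOException {
        try (ZipFile zipFile = new ZipFile(zip.toFile())) {
            return zipFile.getEntry(entryName) != null;
        }
    }

    // Writes a throwaway archive containing a single LOB entry.
    static Path createArchive(String entryName) throws IOException {
        Path zip = Files.createTempFile("broken_clob", ".zip");
        try (OutputStream out = Files.newOutputStream(zip);
             ZipOutputStream zos = new ZipOutputStream(out)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write("clob text".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return zip;
    }

    public static void main(String[] args) throws IOException {
        Path zip = createArchive("content/schema0/table74/lob3/rec0.txt");
        String broken = naiveLobPath(null, "content/schema0/table74/lob3/", "rec0.txt");
        String workaround = naiveLobPath("content", "schema0/table74/lob3", "rec0.txt");
        String workaroundWithSlash = naiveLobPath("content", "schema0/table74/lob3/", "rec0.txt");
        System.out.println(broken + " -> " + hasEntry(zip, broken));
        System.out.println(workaround + " -> " + hasEntry(zip, workaround));
        System.out.println(workaroundWithSlash + " -> " + hasEntry(zip, workaroundWithSlash));
        Files.delete(zip);
    }
}
```

Under this assumption only the second path matches the entry, which agrees with the observation below that the workaround also needs the trailing slash removed.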

I managed to make this file readable by DBPTK by adding the following database-level <lobFolder>:
<lobFolder>content</lobFolder>
and changing the column-level <lobFolder> to:
<lobFolder>schema0/table74/lob3</lobFolder>
(Note that I also removed the trailing slash; without that change the CLOB is still not found.)

You can download the fixed SIARD file here:

fixed_clob.siard.zip

Maybe java.nio.file.Path.normalize() could be used here? At least on Linux it corrects both of these issues (not tested on Windows).
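One caveat with Path.normalize() is that Path.toString() uses the platform separator, so on Windows the result would contain backslashes and still not match ZIP entry names. A minimal sketch of a portable variant, assuming the entry name is a relative path: normalize with java.nio, then rejoin the name elements with "/". The class and method names are hypothetical. Note that Paths.get uses the default filesystem, so on Windows some characters that are legal in ZIP entry names (for example ':') may cause an InvalidPathException; this has not been tested on Windows.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.StringJoiner;

final class ZipEntryNames {

    // Hypothetical helper: removes "." elements and redundant separators via
    // Path.normalize(), then joins the name elements with "/" instead of relying
    // on Path.toString(), whose separator is platform-dependent.
    static String normalize(String entryName) {
        Path normalized = Paths.get(entryName).normalize();
        StringJoiner joiner = new StringJoiner("/");
        for (Path element : normalized) { // iterates name elements only (any root is dropped)
            joiner.add(element.toString());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("./content/schema0/table74/lob3//rec0.txt"));
    }
}
```

On Linux this turns ./content/schema0/table74/lob3//rec0.txt into content/schema0/table74/lob3/rec0.txt, which is the entry name actually stored in the archive.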

Metadata

Assignees: No one assigned
Labels: bug
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
