Agent Instructions

Repository Shape

This repository contains BioFSharp.ML, an F#/.NET library with CNTK-backed machine learning helpers, plus a DPPOP command-line wrapper.

  • BioFSharp.ML.sln is the root solution.
  • src/BioFSharp.ML contains the library and embedded DPPOP CNTK model resources.
  • src/DPPOP.CLI contains the dppop executable/tool project.
  • tests/BioFSharp.ML.Tests and tests/DPPOP.Tests contain xUnit tests.
  • build contains the FAKE build project.
  • plans/rescue_modernize.md records the CNTK rescue and DPPOP modernization plan.
  • scripts/Export-ImlpLegacyRuntime.ps1 extracts the legacy CNTK/OpenMPI runtime from the local Docker image.

Build And Test

The repo pins .NET SDK 10.0.100 in global.json with latestMinor roll-forward.

Use the FAKE entry points from the repository root:

.\build.cmd
.\build.cmd RunTests

On Unix-like shells:

./build.sh
./build.sh RunTests

The default build target builds the solution. RunTests cleans, builds, and runs both test projects with coverage collection enabled.

Legacy CNTK Runtime

CNTK is a preserved legacy dependency. Treat it as runtime infrastructure to rescue and stabilize, not as a dependency to modernize casually.

The rescued runtime is now archived in the Zenodo record:

https://doi.org/10.5281/zenodo.20026836

The local extraction script is retained as provenance/recovery tooling:

.\scripts\Export-ImlpLegacyRuntime.ps1

The script reads csbdocker/imlp:1.0.0 by default and writes:

  • legacy-runtime.tar.gz
  • runtime-manifest.json
  • SHA256SUMS
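Before relying on an extracted or downloaded copy of these files, it may be worth checking them against the checksum list. The sketch below assumes that SHA256SUMS uses the GNU coreutils `sha256sum` format and sits next to the files it describes; neither is confirmed here, and `verify_legacy_runtime` is a hypothetical helper name, not something the repository provides.

```shell
# Hypothetical helper: verify the exported runtime files against SHA256SUMS.
# Assumption: SHA256SUMS is in GNU coreutils format ("<hash>  <file>")
# and lives in the same directory as the files it lists.
verify_legacy_runtime() {
  dir="${1:-.}"
  # Run in a subshell so the caller's working directory is left unchanged.
  ( cd "$dir" && sha256sum -c SHA256SUMS )
}
```

Calling `verify_legacy_runtime path/to/export` returns a non-zero status if any listed file is missing or does not match its recorded hash.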

The rescued archive contains:

  • /usr/local/cntk/cntk/lib
  • /usr/local/cntk/cntk/dependencies/lib
  • /usr/local/mpi/lib

The container runtime path assumptions are:

PATH=/usr/local/cntk/cntk/lib:/usr/local/mpi/bin:$PATH
LD_LIBRARY_PATH=/usr/local/cntk/cntk/dependencies/lib:/usr/local/cntk/cntk/lib:/usr/local/mpi/lib:$LD_LIBRARY_PATH
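When the archive is unpacked somewhere other than the container root, the same path layout can be reproduced under a prefix. This is a minimal sketch, not part of the repository: `legacy_runtime_env` and its root argument are hypothetical, and it assumes the archive keeps the directory layout listed above. Unlike the lines above, it only appends the previous LD_LIBRARY_PATH when one is set, to avoid leaving a trailing empty entry.

```shell
# Hypothetical helper: prepend the legacy runtime directories to the search paths.
# Assumption: the archive was extracted under "$1" with its original layout;
# an empty or missing argument reproduces the container paths exactly.
legacy_runtime_env() {
  root="${1:-}"
  PATH="$root/usr/local/cntk/cntk/lib:$root/usr/local/mpi/bin:$PATH"
  # Append the previous value only if it is non-empty (no trailing colon).
  LD_LIBRARY_PATH="$root/usr/local/cntk/cntk/dependencies/lib:$root/usr/local/cntk/cntk/lib:$root/usr/local/mpi/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export PATH LD_LIBRARY_PATH
}
```

For example, `legacy_runtime_env /opt/legacy` would point both variables at a copy extracted under /opt/legacy (an illustrative location).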

artifacts/ and zenodo-record/ are gitignored, so do not assume large binary runtime payloads are present in a fresh checkout.

Container Workflow

Once the base image csbdocker/cntk-dotnet:1.0.1-cntk2.7-dotnet10 has been published or pulled, build the DPPOP container from the repository root:

docker build -t csbdocker/dppop .

Run it with input files mounted under /data:

docker run --rm --mount "type=bind,source=C:/my-data,target=/data" csbdocker/dppop --proteome /data/proteome.fasta --proteins-of-interest /data/targets.fasta --model nonplant --output /data/results.tsv
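The command above binds a Windows host path. On Unix-like shells the same invocation could be wrapped roughly as follows. This is a sketch under assumptions: `run_dppop` is a hypothetical helper, the host directory is whatever absolute path holds proteome.fasta and targets.fasta, and the `DOCKER` variable exists only so the command can be swapped out (for example with `echo`) for a dry run.

```shell
# Hypothetical wrapper around the documented docker run invocation.
# Assumption: "$1" is an absolute host directory containing the input files.
run_dppop() {
  data_dir="$1"
  # DOCKER can be overridden (e.g. DOCKER=echo) to preview the command.
  "${DOCKER:-docker}" run --rm \
    --mount "type=bind,source=$data_dir,target=/data" \
    csbdocker/dppop \
    --proteome /data/proteome.fasta \
    --proteins-of-interest /data/targets.fasta \
    --model nonplant \
    --output /data/results.tsv
}
```

A typical call would be `run_dppop "$PWD/data"`, where `data` is an illustrative directory name; results.tsv would then be written back into that directory through the bind mount.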

Development Notes

  • Keep the library target conservative unless a task explicitly requires a target change; src/BioFSharp.ML currently targets netstandard2.0.
  • src/DPPOP.CLI targets net10.0 and references the library project.
  • Keep DPPOP CLI behavior script-compatible where practical, but prefer the compiled CLI for deployment and tests.
  • Be careful around project and solution registration when adding, renaming, or moving F# files. F# compile order is explicit in .fsproj files.
  • Do not replace the CNTK runtime rescue path with upstream downloads. The modernization plan treats the local image as the recovery anchor.