Skip to content

andrewtrotman/CommonIndexFileFormat

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CommonIndexFileFormat

Code to read Jimmy Lin's Common Index File Format files without using protobuf

The format is describes by the protocol buffer definition:

syntax = "proto3";
package io.anserini.cidxf;
message Posting {
  int32 docid = 1;
  int32 tf = 2;
}
message PostingsList {
  string term = 1;
  int64 df = 2;
  int64 cf = 3;
  repeated Posting posting = 4;
}

Each postings list is written in the protobuf Delimited format. This means that you don't need to read the entire file into memory to process it - but this program does.

About

Code to read Jimmy Lin's Common Index File Format files without using protobuf

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors