Class LineWordReader

java.lang.Object
it.unimi.dsi.io.LineWordReader
All Implemented Interfaces:
WordReader, Serializable

public class LineWordReader extends Object implements WordReader, Serializable
A trivial WordReader that considers each line of a document a single word.

This class is intended for indexing material such as lists of document identifiers: if the identifiers contain nonalphabetical characters, the default FastBufferedReader might split them into several words instead of keeping each identifier whole.

Note that the non-word returned by next(MutableString, MutableString) is always empty.
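The line-per-word contract described above can be illustrated with a small JDK-only sketch. It is not the library implementation: LineWordSketch and its words helper are hypothetical names, StringBuilder stands in for MutableString, and leaving both buffers untouched at end of input is an assumption taken from the interface description further down. With the real class, the same loop would pass MutableString instances to next(MutableString, MutableString).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical JDK-only stand-in for illustration; not the library class. */
public class LineWordSketch {
    private BufferedReader in;

    /** Mirrors setReader(Reader): restart from a new source and return this. */
    public LineWordSketch setReader(Reader reader) {
        in = new BufferedReader(reader);
        return this;
    }

    /** Mirrors next(word, nonWord): one whole line becomes one word. */
    public boolean next(StringBuilder word, StringBuilder nonWord) throws IOException {
        String line = in.readLine();
        if (line == null) return false; // end of input: both buffers left unchanged (assumption)
        word.setLength(0);
        word.append(line);              // the line, without its terminator, is the word
        nonWord.setLength(0);           // the non-word is always empty
        return true;
    }

    /** Hypothetical helper: collects every word found in the given text. */
    public static List<String> words(String text) throws IOException {
        LineWordSketch reader = new LineWordSketch().setReader(new StringReader(text));
        StringBuilder word = new StringBuilder();
        StringBuilder nonWord = new StringBuilder();
        List<String> result = new ArrayList<>();
        while (reader.next(word, nonWord)) result.add(word.toString());
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Identifiers containing punctuation survive intact, one per line.
        for (String w : words("doc-001/a\nhttp://example.com/x?y=1\nid_42\n")) {
            System.out.println(w);
        }
    }
}
```

Because the whole line is taken as the word, punctuation inside an identifier never causes a split, which is the property that makes this kind of reader suitable for identifier lists.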

  • Constructor Details

    • LineWordReader

      public LineWordReader()
  • Method Details

    • next

      public boolean next(MutableString word, MutableString nonWord) throws IOException
      Description copied from interface: WordReader
      Extracts the next word and non-word.

      If this method returns true, a new non-empty word, and possibly a new non-word, have been extracted. The first call to this method after creation or after a call to WordReader.setReader(Reader) is an exception: it may return an empty word. In other words, both word and nonWord are maximal.

      Specified by:
      next in interface WordReader
      Parameters:
      word - the next word returned by the underlying reader.
      nonWord - the non-word following the next word returned by the underlying reader.
      Returns:
      true if a new word was processed, false otherwise (in which case both word and nonWord are unchanged).
      Throws:
      IOException
    • setReader

      public LineWordReader setReader(Reader reader)
      Description copied from interface: WordReader
      Resets the internal state of this word reader, which will start reading again from the given reader.
      Specified by:
      setReader in interface WordReader
      Parameters:
      reader - the new reader providing characters.
      Returns:
      this word reader.
    • copy

      public LineWordReader copy()
      Description copied from interface: WordReader
      Returns a copy of this word reader.

      This method must return a word reader with a behaviour that matches exactly that of this word reader.

      Specified by:
      copy in interface WordReader
      Returns:
      a copy of this word reader.