May 17, 2017

How to make a string index human readable

The problem

Assume you are writing a parser of some kind and need to report errors in multiline text (I am not being very creative here, since I faced exactly this problem :P). The parser reports each error as an absolute character index: say, the problem starts at the 104th character of the text. How do you turn that into something more meaningful, such as a line number (row) and a character number (column), e.g. 4th line, 10th character?

Before the solution came the garbage

My first approach was to split the text at the line separators, build a list by progressively adding up the line lengths (taking the length of the line separator itself into account), and then throw functions from the bisect module at it to see what sticks! Unfortunately, I am not very good at handling off-by-one subtleties, and the whole thing collapsed at the edge cases.
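
In hindsight the bisect idea can be made to work, and it pays off when many indices have to be looked up in the same text. The following is a minimal sketch, not the original attempt: it assumes "\n" as the separator, and the names line_starts and locate are hypothetical.

```python
from bisect import bisect_right

def line_starts(text, sep="\n"):
    """Return the index at which each line of text begins."""
    starts = [0]
    pos = text.find(sep)
    while pos != -1:
        starts.append(pos + len(sep))
        pos = text.find(sep, pos + len(sep))
    return starts

def locate(starts, index):
    """Map a zero-based character index to a one-based (line, column)."""
    # The last line start that is <= index belongs to the line we want.
    line = bisect_right(starts, index) - 1
    return (line + 1, index - starts[line] + 1)

starts = line_starts("ab\ncd\n\nef")
print(locate(starts, 4))  # (2, 2): the "d" is the 2nd character of line 2
```

The list of line starts is built once, so each later lookup is a binary search instead of a fresh scan of the text.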

Then it hit me!

A correct solution can be achieved with really simple means. To get the line count, all you have to do is slice the string up to the specified index and count the number of line separators in that substring. To get the column count, partition the substring from the right at the right-most line separator; the length of the right partition (also known as the tail) is the column count. If the substring contains no line separator at all, the tail is the whole substring, which is exactly what you want for the first line.
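
To make the two steps concrete, here is a small walk-through on a made-up three-line text, assuming "\n" as the line separator. The sample text and the variable names are for illustration only.

```python
text = "first line\nsecond line\nthird line"
index = text.rindex("line")        # 29, standing in for the parser's error index

head = text[:index]                # everything before the error
line_breaks = head.count("\n")     # 2 separators, so two full lines precede it
tail = head.rpartition("\n")[2]    # "third ", the part after the last separator

# Both counts are zero-based; add one for editor-style numbering.
print((line_breaks + 1, len(tail) + 1))  # (3, 7)
```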

The solution

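def make_human_readable_index(text, index, linesep="\n"):
    # "\n" is the default rather than os.linesep because Python translates
    # line endings to "\n" when a file is read in text mode, so os.linesep
    # ("\r\n" on Windows) would never match such text.
    head = text[:index]
    lineindex = head.count(linesep)
    charindex = len(head.rpartition(linesep)[2])

    # One is added to make zero-based indices more in line with
    # common text editor line and character numbering.
    return (lineindex + 1, charindex + 1)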

Tags: Python, Programming