Narrative form of callnumber parsing, preparatory for sorting.
Algorithm copyright by Roy Zimmer, Western Michigan University

Designed for LC type callnumbers, and various local callnumbers.
This results in nearly perfect sorting at our installation,
handling all the numbers we run across.

Sections surrounded by "---", "===", and "+++" refer to subroutines,
inlined here for better understanding.


set up variables and the callnumber part array
remove any leading separator characters, as they are meaningless
determine the length of the callnumber

while (the character_index is before the end of the callnumber)   >> parse the callnumber
  grab the current_character from the callnumber in upper case format
  if (we're on the second callnumber part)
    if (the first callnumber_part is letters only) and
       (this, the second callnumber_part, is digits only) and
       (the current_character is a space) and
       (the next_character in the callnumber is not a period)
      -----------------------------------
      increment the callnumber_part_index
      set callnumber_part_character_index to start
      -----------------------------------
      callnumber_part[callnumber_part_index] set to ".0"
    endif
  endif
  remember if the current_character is a period
  if (the current_character is not a separator)
    if ((the last character was a letter) and (we have a letter now)) or
       ((the last character was a digit) and (we have a digit now)) or
       (we currently have neither letter nor digit)
      =============================================
      increment the callnumber_part_character_index
      if (the length of callnumber_part[callnumber_part_index] is 0)
        callnumber_part_character_index is set to 0
      if (the current_character is not a period)
        store the current_character in the current position of the current callnumber part
      else
        if (callnumber_part_character_index is 0) and (we had a period recently)
          store the current_character in the current position of the current callnumber part
        endif
      endif
      =============================================
    endif
    if (the current_character is neither letter nor digit)
      remember if we just had a letter
      remember if we just had a digit
      =============================================
      increment the callnumber_part_character_index
      if (the length of callnumber_part[callnumber_part_index] is 0)
        callnumber_part_character_index is set to 0
      if (the current_character is not a period)
        store the current_character in the current position of the current callnumber part
      else
        if (callnumber_part_character_index is 0) and (we had a period recently)
          store the current_character in the current position of the current callnumber part
        endif
      endif
      =============================================
    endif
    if ((the last character was a letter) and (we currently have a digit)) or
       ((the last character was a digit) and (we currently have a letter))
      -----------------------------------
      increment the callnumber_part_index
      set callnumber_part_character_index to start
      -----------------------------------
      if (the last character was a letter) and (we currently have a digit)
        =============================================
        increment the callnumber_part_character_index
        if (the length of callnumber_part[callnumber_part_index] is 0)
          callnumber_part_character_index is set to 0
        if (the current_character is not a period)
          store the current_character in the current position of the current callnumber part
        else
          if (callnumber_part_character_index is 0) and (we had a period recently)
            store the current_character in the current position of the current callnumber part
          endif
        endif
        =============================================
      endif
      =============================================
      increment the callnumber_part_character_index
      if (the length of callnumber_part[callnumber_part_index] is 0)
        callnumber_part_character_index is set to 0
      if (the current_character is not a period)
        store the current_character in the current position of the current callnumber part
      else
        if (callnumber_part_character_index is 0) and (we had a period recently)
          store the current_character in the current position of the current callnumber part
        endif
      endif
      =============================================
      remember if we just had a letter
      remember if we just had a number
    endif
  else   >> we do have a separator
    if (the character_index is before the callnumber end)
      remember the next_character from the callnumber
      if (the current_character is a period) or (the next_character is a semicolon)
        increment the character_index
        remember the next_character from the callnumber
      endif
      if (the next_character is a separator)
        if (the next_character is a period) and
           (the current_character is one of the following: semicolon, comma, space)
          remember that we had a period
          -----------------------------------
          increment the callnumber_part_index
          set callnumber_part_character_index to start
          -----------------------------------
          =============================================
          increment the callnumber_part_character_index
          if (the length of callnumber_part[callnumber_part_index] is 0)
            callnumber_part_character_index is set to 0
          if (the current_character is not a period)
            store the current_character in the current position of the current callnumber part
          else
            if (callnumber_part_character_index is 0) and (we had a period recently)
              store the current_character in the current position of the current callnumber part
            endif
          endif
          =============================================
          remember that we had neither letter nor digit
          increment the character_index
        endif
        if (the current_character is a period) and (the next_character is not a space)
          =============================================
          increment the callnumber_part_character_index
          if (the length of callnumber_part[callnumber_part_index] is 0)
            callnumber_part_character_index is set to 0
          if (the current_character is not a period)
            store the current_character in the current position of the current callnumber part
          else
            if (callnumber_part_character_index is 0) and (we had a period recently)
              store the current_character in the current position of the current callnumber part
            endif
          endif
          =============================================
          -----------------------------------
          increment the callnumber_part_index
          set callnumber_part_character_index to start
          -----------------------------------
          increment the character_index
        endif
      else   >> currently have a separator and the next_character is not a separator
        if (the current_character is a period)
          if (the last character was not a letter)
            -----------------------------------
            increment the callnumber_part_index
            set callnumber_part_character_index to start
            -----------------------------------
          endif
          =============================================
          increment the callnumber_part_character_index
          if (the length of callnumber_part[callnumber_part_index] is 0)
            callnumber_part_character_index is set to 0
          if (the current_character is not a period)
            store the current_character in the current position of the current callnumber part
          else
            if (callnumber_part_character_index is 0) and (we had a period recently)
              store the current_character in the current position of the current callnumber part
            endif
          endif
          =============================================
          if (the last character was a letter)
            -----------------------------------
            increment the callnumber_part_index
            set callnumber_part_character_index to start
            -----------------------------------
          endif
        else
          -----------------------------------
          increment the callnumber_part_index
          set callnumber_part_character_index to start
          -----------------------------------
        endif
        remember that we had neither letter nor digit
      endif
    endif
  endif
  increment the character_index
end while

>> pad callnumber parts to the correct length for sorting, if necessary,
>> and perform some further editing
loop through the callnumber parts
  grab a callnumber part and get its length
  if (the current callnumber part is all digits)
    ++++++++++++++++++++++++++++++++++++++++++++++++
    if (we're at least on the fourth callnumber part)
      remember if the previous callnumber part is all letters
      if (we have a non-empty previous callnumber part)
        remember if the previous callnumber part is all letters, excluding the first character
      endif
      if (the last character was a letter) or
         (the previous callnumber part is all letters, excluding the first character)
        if (the second callnumber part before the current one is all digits)
          prepend a period to the current callnumber part
          remember that we did this
        endif
      endif
    endif
    if (we just prepended a period to the current callnumber part)
      pad this callnumber part with leading zeros
    endif
    ++++++++++++++++++++++++++++++++++++++++++++++++
  else
    if (the first character of the current callnumber part is a letter or a digit) and
       (the current callnumber part is not empty)
      if (this callnumber part is all digits except for the last character, which is a separator)
        ++++++++++++++++++++++++++++++++++++++++++++++++
        if (we're at least on the fourth callnumber part)
          remember if the previous callnumber part is all letters
          if (we have a non-empty previous callnumber part)
            remember if the previous callnumber part is all letters, excluding the first character
          endif
          if (the last character was a letter) or
             (the previous callnumber part is all letters, excluding the first character)
            if (the second callnumber part before the current one is all digits)
              prepend a period to the current callnumber part
              remember that we did this
            endif
          endif
        endif
        if (we just prepended a period to the current callnumber part)
          pad this callnumber part with leading zeros
        endif
        ++++++++++++++++++++++++++++++++++++++++++++++++
      endif
    endif
  endif
  store the current callnumber part in the output array
endloop

have array of callnumber parts, ready for sorting
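

Illustrative sketch (not part of the original algorithm text): the two
helper subroutines marked "---" and "===" above could look roughly like
this in Python. The names ParseState, next_part, store_char and START are
hypothetical, chosen only for this example. The narrative does not say
what "start" means for callnumber_part_character_index; the sketch assumes
it is -1, so that the following increment yields position 0. The "+++"
padding step is left out because the narrative does not give a pad width.

```python
from dataclasses import dataclass, field

# Assumption: "start" is one slot before the first character position.
START = -1


@dataclass
class ParseState:
    """Hypothetical container for the variables the narrative refers to."""
    parts: list = field(default_factory=lambda: [""])  # callnumber_part array
    part_index: int = 0        # callnumber_part_index
    char_index: int = START    # callnumber_part_character_index
    had_period: bool = False   # "we had a period recently"


def next_part(state):
    """The '---' subroutine: move on to a new, empty callnumber part."""
    state.part_index += 1
    state.char_index = START
    # Grow the array so the new index is always valid.
    while len(state.parts) <= state.part_index:
        state.parts.append("")


def store_char(state, ch):
    """The '===' subroutine: store ch in the current part.

    A period is stored only as the first character of a part,
    and only if a period was seen recently.
    """
    state.char_index += 1
    if len(state.parts[state.part_index]) == 0:
        state.char_index = 0
    if ch != "." or (state.char_index == 0 and state.had_period):
        part = state.parts[state.part_index]
        i = state.char_index
        # Overwrite position i, or append when i is past the end.
        state.parts[state.part_index] = part[:i] + ch + part[i + 1:]


# Usage: build the parts "QA" and "76" by hand.
state = ParseState()
for ch in "QA":
    store_char(state, ch)
next_part(state)
for ch in "76":
    store_char(state, ch)
```

Keeping the state in one object mirrors the narrative's shared variables;
a real implementation would also track the letter/digit flags that the
main loop consults before calling these helpers.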