 
While I was working on the enhanced document symbol menu for Neovim, I came across an interesting fact. The comparison of strings takes the same time as the comparison of integers. After an hour of looking up information on the internet, I found very few pieces of information about it. However, the information didn't have enough details and was contradictory. As a result, I decided to find it on my own. I had a hunch that the string comparison might be more sophisticated, because of this piece of code in LuaJIT:
typedef struct GCstr { GCHeader; uint8_t reserved; /* Used by lexer for fast lookup of reserved words. */ uint8_t hashalg; /* Hash algorithm. */ StrID sid; /* Interned string ID. */ StrHash hash; /* Hash of string. */ MSize len; /* Size of string. */ } GCstr;
The field hash gave me an impression that it might be used in the string comparison.  I imagined that strings are compared by their hashes and then by their content.  I checked the references to the hash field in the code, but it wasn't used for string comparisons.
I found a function that seemed to implement string comparison. Here it is:
/* Ordered compare of strings. Assumes string data is 4-byte aligned. */ int32_t LJ_FASTCALL lj_str_cmp(GCstr *a, GCstr *b)
Even the comment whispered that, but GDB didn't stop at the function. I was puzzled, but found that I can get the byte code of my program with the command:
luajit -bl source.lua
I hoped that I could find the massive switch over all of the types of the Lua virtual machine operations and see the code which is invoked for the string comparison operation. I created this sample to keep the byte code easier to comprehend.
local clock = os.clock local function sleep(n) -- seconds local t0 = clock() while clock() - t0 <= n do end end local function main() print("game begins\n") sleep(1) print(arg[1] == "arst") print("game over\n") end main()
The pause and the print commands were needed to see in GDB whether the execution stopped somewhere inside the initialization of the script.  The byte code of the program is 40 lines long, so I post here only the relevant place:
0007 GGET 0 0 ; "print" 0008 GGET 2 2 ; "arg" 0009 TGETB 2 2 1 0010 ISEQS 2 3 ; "arst" 0011 JMP 2 => 0014 0012 KPRI 2 1 0013 JMP 3 => 0015 0014 => KPRI 2 2 0015 => CALL 0 1 2 0016 GGET 0 0 ; "print" 0017 KSTR 2 4 ; "game over\n"
So, ISEQS it is.  I found a few places in the code the operation occurs, but GDB didn't stop at any of them. I saw a few matches in the files with the extension dasc.  I ignored them in the beginning, but later I ran out of sensible guesses and just checked.  I found that the names of the files have architecture, so I picked my x64, and inside of them I found this:
case BC_ISEQS: case BC_ISNES: vk = op == BC_ISEQS; | ins_AND // RA = src, RD = str const, JMP with RD = target | mov RB, [BASE+RA*8] | add PC, 4 | checkstr RB, >3 | cmp RB, [KBASE+RD*8]
Is it some sort of assembly? It is, and its name is DynASM. As of now, the page has this text:
Sorry, right now there is no proper documentation included other than some Examples and of course the source code. The source is well documented, though (IMHO).
However, the same page refers to the unofficial documentation and the tutorial from the same website. To be honest, I mostly worked on website backends. Reading assembly is not something I do regularly. I understand some basics, which learned from:
Nevertheless, out of the 7 lines seemed DynASM, I can see the following:
int_AND doesn't make sense so far.  I see that it's macro |.macro ins_AND; not RD; .endmacro, and I understand the macro's content, but its purpose is not yet clear.|.define RB,		rbp		// Must be rbp (C callee-save).// RA = src, RD = str const, JMP with RD = target.  I understand that the RA register has the string (or address to it) from my arg[1].  The register RD has my constant "arst".arg[1] is set to RB.  However, I see that the value is multiplied by 8, and it's taken with some shift BASE.checkstr.  It unwraps to the code which checks the type of the value arg[1].>3.  This puzzled me until I found in the tutorial this chapter.  It says that >3 jumps to the next code under the mark 3:.
Given that, the entire line can be read as "make sure that the type in RB is string or jump to "3:"."arst" with manipulations similar to the item 5, and compares it to the register RB, where we have our string from arg[1].After that piece of code I see these five lines:
if (vk) {
  |  jne >2
} else {
  |  je >1
}
However, I have to be sure that I'm looking at the right place.  I confirmed that this was the code which is invoked when strings are compared.  Namely, I swapped jne >2 and je >1, which I expected to negate the comparison operator, and it did.
So far, I found the code which performs the comparison, and the code allegedly has constant time complexity.  Let's confirm.  I ran luajit and set the breakpoint at lj_BC_ISEQS to see the block of the generated assembly.
Please note, that I compiled luajit with the following command:
make DEFAULT_CC="musl-gcc -static" BUILDMODE="static" TARGET_XCFLAGS=-DLUAJIT_NO_UNWIND Q=
The block of the assembly code is big enough, so I cover only the part we looked at, splitting it into logical parts:
These two lines set string from arg[1] and the constant "arst" to registers.
Dump of assembler code for function lj_BC_ISEQS: => 0x000000000043508f <+0>: not rax 0x0000000000435092 <+3>: mov rbp,QWORD PTR [rdx+rcx*8]
These two lines correspond to the ins_AND macro in the DynASM code.  It's a good sign.  The address in the register rbp is 0xfffdfffff7fe3038. When I try to access it, I get Cannot access memory at address 0xfffdfffff7fe3038.  Why?  Hasn't Linux allocated the address for the process?  Let's have a look at the memory mappings of the process:
(gdb) info proc map
process 336700
Mapped address spaces:
          Start Addr           End Addr       Size     Offset  Perms  objfile
            0x400000           0x401000     0x1000        0x0  r--p   /home/laladrik/Nest/personal/luajit/current/src/luajit
            0x401000           0x4cb000    0xca000     0x1000  r-xp   /home/laladrik/Nest/personal/luajit/current/src/luajit
            0x4cb000           0x4eb000    0x20000    0xcb000  r--p   /home/laladrik/Nest/personal/luajit/current/src/luajit
            0x4eb000           0x4ed000     0x2000    0xea000  rw-p   /home/laladrik/Nest/personal/luajit/current/src/luajit
            0x4ed000           0x4ee000     0x1000        0x0  rw-p
            0x4ee000           0x4ef000     0x1000        0x0  ---p   [heap]
            0x4ef000           0x4f0000     0x1000        0x0  rw-p   [heap]
           0x58f0000          0x5900000    0x10000        0x0  r-xp
      0x7ffff7fd9000     0x7ffff7ff9000    0x20000        0x0  rw-p
      0x7ffff7ff9000     0x7ffff7ffd000     0x4000        0x0  r--p   [vvar]
      0x7ffff7ffd000     0x7ffff7fff000     0x2000        0x0  r-xp   [vdso]
      0x7ffffffde000     0x7ffffffff000    0x21000        0x0  rw-p   [stack]
  0xffffffffff600000 0xffffffffff601000     0x1000        0x0  --xp   [vsyscall]
None of the address ranges includes the value.  It's not an address.  Let's have a look at the DynASM code again. It has the lines: |  mov RB, [BASE+RA*8] and |  checkstr RB, >3.  We see that the register RB (which is rpb in the regular assembly) is used in both of them.  The first line sets it, the second reads it for the macro checkstr.  The code of the macro is
|.macro checkstr, reg, target; checktp reg, LJ_TSTR, target; .endmacro
The macro has 2 arguments: reg and target.  The argument reg is used in the macro checktp, the second argument, which is passed to the macro, is LJ_TSTR.  The code of the macro is
|.macro checktp, reg, tp, target | mov ITYPE, reg | cleartp reg | sar ITYPE, 47 | cmp ITYPEd, tp | jne target
The ITYPE register is r11.  However, the name says the value in the register is a type. Given that, checktp starts making sense - it's "check type".  Therefore, the register rbp contains an encoded type, and this code decodes it and compares to LJ_TSTR.  Grepping the code of LuaJIT, I found this in lj_obj.h
#define LJ_TSTR (~4u)
The value ~4u is 0xfffffffb, which is the value we can see in the assembly code
0x000000000043509d <+14>: shl rbp,0x11 ; cleaning the original value 0x00000000004350a1 <+18>: shr rbp,0x11 ; cleaning the original value 0x00000000004350a5 <+22>: sar r11,0x2f ; decoding the type 0x00000000004350a9 <+26>: cmp r11d,0xfffffffb ; checking the type 0x00000000004350ad <+30>: jne 0x4350d4 <lj_BC_ISEQS+69> ; leaving the code if the check fails.
In conclusion, in the first place the type is checked, which makes sense for a scripting language.
As of now, the value in the register rbp has changed.  Effectively, the first 11 bits of the value are zeroes now.  Given that, the value 0xfffdfffff7fe3038 became 0x7ffff7fe3038.  After we have this two lines:
0x00000000004350af <+32>: cmp rbp,QWORD PTR [r15+rax*8] 0x00000000004350b3 <+36>: je 0x4350b5 <lj_BC_ISEQS+50> 0x00000000004350b5 <+38>: movzx eax,WORD PTR [rbx-0x2]
The value at the address pointed by the expression [r15+rax*8] is 0xf7fe3038,  well at least GDB says it.  The last 4 bytes of the value are equal to the last 4 bytes of the value at the register rpb.  However, the cmp instruction sets Zero Flag.  That means that the values are equal.  Let's do a little hack.  Before the instruction let's put the following code:
| push rdx | mov rdx, [KBASE+RD*8] | pop rdx
With this, I see that the value is 0x7ffff7fe3038 from the register rdx.  After poking around, I realized that I misused GDB. I should've used x/g $r15+($rax*8) to see 8 byte long value at the address. I didn't set any format after x, as a consequence I saw just 4 byte long value, which is the default behavior.
As a result, a string comparison in Lua is comparison of two integers. The first for the type, the second for the value. This is the reason why string comparison takes as long as integer comparison.
Despite the fact, that I found the way the strings are compared, I've got some questions which I couldn't let go without answers.
I can't help but wonder about the value 0x7ffff7fe3038.  It's big enough to look like an address.  However, the value at the address is zero.  I had a hunch that the string is located somewhere near. Let's check the address which stores the value.  The address is 0x7ffff7fe3038,  and this range includes it:
0x7ffff7fd9000 0x7ffff7ff9000 0x20000 0x0 rw-p
I guess my string constant "arst" is somewhere there as well.  Let's check it:
(gdb) find 0x7ffff7fd9000, (0x7ffff7ff9000 - 1), "arst" 0x7ffff7fe3050 1 pattern found.
It is there!  The address 0x7ffff7fe3050 is 24 bytes away from 0x7ffff7fe3038.  This is exactly the amount of memory which the structure GCstr takes if LuaJIT is compiled without the flag -DLUAJIT_DISABLE_GC64.  The flag affects the size of the fields which are embedded by the macro GCHeader, which is inside the structure.  The macro embeds three fields, one of which takes either 4 or 8 bytes depending on the flag.  I compiled LuaJIT with the same parameters and with the flag. After that, the address I got 0x40008680 and the pattern is found at 0x40008694, which is 20 bytes as expected.
After I set a watch point on the address in GDB, I found the code which allocates memory for the string.
static GCstr *lj_str_alloc(lua_State *L, const char *str, MSize len, StrHash hash, int hashalg) { GCstr *s = lj_mem_newt(L, lj_str_size(len), GCstr); // ... }
The macro lj_str_size(len) unwraps into the following code:
#define lj_str_size(len) (sizeof(GCstr) + (((len)+4) & ~(MSize)3))
This shows two things:
GCstr as I guessed.In conclusion, the value which is used to compare the strings is their addresses.
Strings are compared by their addresses - clear. Why is the address the same? Is there any registry of strings? Well, there is. The global state owns the registry. The registry is a hash-table of chains (linked lists). LuaJIT computes Add-Rotate-Xor hash for the string in constant time, looks up the registry by the hash for the same string. If something matches the string, the creation is canceled, and the match from the registry is returned. The following code shows the implementation (careful - original formatting):
/* Intern a string and return string object. */ GCstr *lj_str_new(lua_State *L, const char *str, size_t lenx) { // ... GCobj *o = gcref(g->str.tab[hash & g->str.mask]); // ... while (o != NULL) { GCstr *sx = gco2str(o); if (sx->hash == hash && sx->len == len) { if (memcmp(str, strdata(sx), len) == 0) { // ... return sx; /* Return existing string. */ } coll++; } coll++; o = gcnext(o); } // ... }
It's worth to mention that LuaJIT deals with the case when a string has more than 32 collisions. It recalculates hashes for the items of the chain.
1 to a convenience variable if certain conditions were met and continued execution. That means the breakpoint doesn't interrupt the process.  At the place, which I needed to inspect, I set a breakpoint with a condition which checks that convenience variable is equal to 1.