Weird path in Git directory structure
I have seen the following path in the .git directory:
.git/objects/3b/12abef878787483abeceddaa5544489abff789a
when in fact the SHA is 3b12abef878787483abeceddaa5544489abff789a,
which is the SHA of the contents of the file, and hence should be stored without the /. Why does Git store the blob under this weird path, and what are the advantages of doing so?
The reason is to prevent there being too many files in one directory. SHA values that start with 3b are stored in the sub-directory 3b, so the workload on any single directory is roughly 1/256th of what it would be if all the blobs were in a single directory. Ultimately, this speeds up performance; there is less searching to do to find a particular blob.
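As a rough illustration of that mapping, here is a minimal Python sketch. It assumes a SHA-1 repository and loose (unpacked) objects; the function names are made up for this example and are not part of Git. Note that Git hashes a short "blob <size>" header together with the file contents, not the contents alone.

```python
import hashlib
from pathlib import PurePosixPath


def git_blob_sha1(content: bytes) -> str:
    """Hash a blob the way Git does in SHA-1 repositories: header + content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(sha: str, git_dir: str = ".git") -> PurePosixPath:
    """Use the first 2 hex characters as the directory, the rest as the file name."""
    return PurePosixPath(git_dir) / "objects" / sha[:2] / sha[2:]


if __name__ == "__main__":
    sha = git_blob_sha1(b"")  # object ID of an empty file
    print(sha)
    print(loose_object_path(sha))
```

Since the first two hex characters can take 256 values, there are at most 256 such sub-directories.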
You can see similar effects in the terminfo directory, where entries are sub-divided into directories based on the first letter of the terminal entry. The CPAN system has authors/id/a/aa/aardvark in its naming hierarchy.
Please elaborate a little bit for me.
Suppose Git wants to find blob 3b12abef878787483abeceddaa5544489abff789a, and the directory partitioning scheme is not in use. There might be, for the sake of argument, 512 blobs, and to find the file, the kernel might have to read all 512 directory entries in .git/objects before it reaches the right entry.
Now suppose the directory partitioning scheme is in use and, by a miracle of statistical mischance, there are 256 subdirectories each containing 2 files. The kernel at worst has to read 256 directory entries with 2-byte names in each entry (compared with 512 directory entries with 40-byte names) in the .git/objects directory, and then has to read at worst 2 entries with 38-byte names in the .git/objects/3b directory.
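The arithmetic in that comparison can be sketched in a few lines of Python. This is only a toy model: it assumes the kernel does a plain linear scan of each directory and that the objects are spread perfectly evenly, and the function name is invented for the example.

```python
def worst_case_entries_read(total_objects: int, subdirs: int = 0) -> int:
    """Worst-case number of directory entries scanned to find one object.

    subdirs == 0 models a flat layout; otherwise the objects are assumed to
    be spread perfectly evenly over `subdirs` subdirectories.
    """
    if subdirs == 0:
        return total_objects
    per_subdir = -(-total_objects // subdirs)  # ceiling division
    # Scan the top-level directory, then the one matching subdirectory.
    return subdirs + per_subdir


if __name__ == "__main__":
    print(worst_case_entries_read(512))       # flat layout: 512
    print(worst_case_entries_read(512, 256))  # partitioned: 256 + 2 = 258
```

With only 512 objects the saving is modest, but the top-level cost is capped at 256 entries, so the gap widens as the number of objects grows.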
There are complicating factors, such as imperfectly balanced hashing, memory caching, and disk accesses, but the general idea is that distributing the files over multiple subdirectories means the OS kernel has less work to do to find a file. If the number of files in a directory is going to extend to multiple hundreds, it is worth considering breaking it down.
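To get a feel for the "imperfectly balanced hashing" caveat, the following Python sketch hashes 512 made-up object names and counts how many land in each 2-character bucket. The names are arbitrary stand-ins, not real Git objects, and the exact counts depend on the inputs chosen.

```python
import hashlib
from collections import Counter


def bucket_sizes(num_objects: int) -> Counter:
    """Count how many synthetic object IDs fall under each 2-hex-digit prefix."""
    counts = Counter()
    for i in range(num_objects):
        # Hypothetical stand-in content; any distinct byte strings would do.
        sha = hashlib.sha1(f"object-{i}".encode()).hexdigest()
        counts[sha[:2]] += 1
    return counts


if __name__ == "__main__":
    sizes = bucket_sizes(512)
    print("subdirectories used:", len(sizes))
    print("largest subdirectory:", max(sizes.values()))
```

In practice some buckets hold more than the average of 2 and some stay empty, but every bucket is still far smaller than one flat directory would be.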