Merge pull request #2 from chucklz1515/known-bugs

Known Bugs
chucklz committed 2025-01-31 16:32:45 -05:00 (committed by GitHub)
2 changed files with 44 additions and 10 deletions


@@ -148,7 +148,7 @@ Yay! I see file markers! That means I should be able to extract them, right?
...there are no files here, are there?
-## I DO!
+### I DO!
I knew I had fallen into a recurring pattern of trying to extract/decompress every plausible format when that may not even be what's happening. I also couldn't guarantee that I would have filenames or paths. I really needed all 3:
* data
* filename
* path
@@ -243,6 +243,34 @@ After mapping it out, I then proceeded to write a python script that would:
* Scrape the file's full path information
* Copy the data file to the new cluster with the path information
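
For context, here is a minimal sketch of what that loop could look like, assuming the OSD object layout visible in the paths later in this post (a `data` file next to an `attr/_parent` blob inside each object directory). `decode_parent_path` is a hypothetical stand-in for the part of the script that parses the binary `_parent` xattr:
```python
import os
import shutil

osdExportRoot = '/mnt/test'                 # extracted/mounted OSD PG dirs (assumption)
recoveryRoot = '/mnt/ceph-fs-storage/data'  # mount of the new cluster (from the logs below)

def decode_parent_path(blob):
    # HYPOTHETICAL stub: the real script decodes the binary `_parent`
    # xattr into the file's original path, relative to the filesystem root
    raise NotImplementedError

for objDir, dirNames, fileNames in os.walk(osdExportRoot):
    # each object directory holds a `data` file and an `attr/` subdirectory
    if 'data' not in fileNames:
        continue
    parentAttr = os.path.join(objDir, 'attr', '_parent')
    if not os.path.exists(parentAttr):
        continue
    with open(parentAttr, 'rb') as f:
        relativePath = decode_parent_path(f.read())
    newFile = os.path.join(recoveryRoot, relativePath)
    os.makedirs(os.path.dirname(newFile), exist_ok=True)
    if not os.path.exists(newFile):
        shutil.copyfile(os.path.join(objDir, 'data'), newFile)
```
This only shows the scrape-and-copy skeleton; the actual script in this repo does more bookkeeping, as the diff further down shows.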
# Limitations/Known Issues
## Zero Byte Data Files
Once I ran the script to start scraping the data, I found several cases where it would crash with an error similar to:
```
old file: /mnt/test/5.1b_head/all/#5:d8b4db0d:::1000009b975.00000000:head#/data
new file: /mnt/ceph-fs-storage/data/some_dir/some_file
Traceback (most recent call last):
  File "/root/scrape.py", line 158, in <module>
    shutil.copyfile(pgDataFile, newFile)
  File "/usr/lib/python3.11/shutil.py", line 258, in copyfile
    with open(dst, 'wb') as fdst:
         ^^^^^^^^^^^^^^^
NotADirectoryError: [Errno 20] Not a directory: '/mnt/ceph-fs-storage/data/some_dir/some_file'
```
This occurs when:
* Previously, the script found a `_parent` and `data` file for `/mnt/ceph-fs-storage/data/some_dir`.
* The `data` file had a size of 0 bytes.
* The script created a **file** at the path `/mnt/ceph-fs-storage/data/some_dir`.
* Now, the script is trying to create a file at `/mnt/ceph-fs-storage/data/some_dir/some_file` only to find that `some_dir` is a file and not a directory.
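
A quick way to detect this case up front is to walk up the destination path and check whether an ancestor already exists as a regular file. A hedged sketch (this guard is not in the original script):
```python
import os

def blockingAncestor(path):
    # return the first ancestor of `path` that exists as a regular file
    # instead of a directory (e.g. `some_dir` created earlier as a 0-byte
    # file), or None if nothing blocks the path
    parent = os.path.dirname(path)
    while parent and parent != os.path.sep:
        if os.path.isfile(parent):
            return parent
        parent = os.path.dirname(parent)
    return None

# usage before the copy:
# blocker = blockingAncestor(newFile)
# if blocker:
#     print('skipping, blocked by file: ' + blocker)
```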
### Mitigations
For now, I just skip processing any `data` file that has a size of 0 bytes. I thought of 2 reasons why there would be `data` files with a size of 0 bytes:
* It is not a file, but a (possibly empty?) folder
* The file is empty
I'll handle them later as I'm focused on getting back online as fast as possible.
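
If it ever matters, one hedged idea for resolving the ambiguity after the fact (a sketch only; nothing like this is implemented): a skipped zero-byte object whose decoded path has recovered files beneath it must have been a folder, and otherwise it can be recreated as an empty file:
```python
import os

def classifyZeroByte(path, recoveredPaths):
    # `path` is the decoded destination of a skipped 0-byte object;
    # `recoveredPaths` is the set of files the main pass already copied
    prefix = path.rstrip(os.sep) + os.sep
    if any(p.startswith(prefix) for p in recoveredPaths):
        return 'folder'
    return 'empty file'

# for p in skippedZeroBytePaths:  # hypothetical list kept by the main pass
#     if classifyZeroByte(p, recoveredPaths) == 'empty file':
#         open(p, 'a').close()    # recreate the empty file
```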
# Conclusion
I was able to successfully recover my files. Granted, they have none of their metadata (permissions, timestamps, etc.), but I haven't lost anything.


@@ -21,8 +21,6 @@ relativeNextFileDefinitionAddress = 0x09 - offsetRelativeAddressCorrection
# ####################################################################################################
# walk through the filesystem
# these are test paths
-testFileAttrName = '/mnt/test/5.f_head/all/#5:f1988779:::10002413871.00000000:head#/attr/_parent'
-testFileDataName = '/mnt/test/5.f_head/all/#5:f1988779:::10002413871.00000000:head#/data'
@@ -134,10 +132,18 @@ for fullPaths, dirNames, fileNames in os.walk(fuseRoot):
# ####################################################################################################
# FILE RECOVERY
-    # make that dir
-    if not os.path.exists(newDir):
-        os.makedirs(newDir)
-    # copy the data file to the fullpath file
-    if not os.path.exists(newFile):
-        shutil.copyfile(pgDataFile, newFile)
+    # BUG: a data file can be zero bytes. However, in the OSD, a zero-byte data file can also be an empty folder.
+    # In this case, I'll skip them. By inverting the condition, I can process them later if I find that I'm missing files.
+    # skip data files that are 0 bytes
+    if not os.stat(pgDataFile).st_size == 0:
+        # make that dir
+        if not os.path.exists(newDir):
+            print('new dir: ' + newDir)
+            os.makedirs(newDir)
+        # copy the data file to the fullpath file
+        if not os.path.exists(newFile):
+            print('old file: ' + pgDataFile)
+            print('new file: ' + newFile)
+            shutil.copyfile(pgDataFile, newFile)