Oct 13th, 2019, 3:33 pm
I only know the upload method. It would be great though if we could get books from that site, as they do have a good selection.
Dec 8th, 2019, 11:48 pm
Is there a WORKING way to rip from Scribd? In short yes...and no.
Here's how *I* have been able to do it--so far...
And it appears to be working fiiiiine...?????
AFAIK, anyway...? Tested on an Android phone, testing on android tablet later, as always, your device/mileage may vary lol.
This WILL require the installation of ONE .apk, a file viewer which I will return to in a minute.
First, what you're going to want to do--if you have reactivated your subscription to Scribd, that is--
(if not, hit up some poor sod like me that's paying, lol!) is go to the page of the book you want to read/keep. Say, here:
https://www.scribd.com/read/202741013/Wanderling-s-Choice#
THREE THINGS:
ONE: you will PROBABLY, in fact MOST LIKELY (not 100% certain but I already have it so the point is moot on my end) need to have the Scribd app itself in case you want to hit the "real" Download button
TWO: MAKE SURE the page link says "read" NOT "book" UNLESS YOU HAVE NOT ALREADY DOWNLOADED IT OFFLINE YET
THREE: This method will work ONLY with the program/s specified, exactly as linked.
IF you are not on Android, I cannot help you lol!
Now, once you've gone to the "read" page, it SHOULD pull you up the file you are attempting to--here's the surprise!--read.
Go into your browser's menu (in my Chrome for Android it's the three dots in the upper-right-hand corner, again, ymmv)
HERE'S THE KEY: PRESS THE "DOWNLOAD" BUTTON (on Android it's the underlined arrow)
With luck, the page SHOULD say "downloading".
NOW, once that file has been downloaded--with any luck and all good fortune it should work that way!--
Go into your "Downloads" folder on your device and you should see a file labeled (thisismybooktitle).MHTML
We want this. This is good. DO NOT ATTEMPT TO RELOCATE THE FILE YET, OR POSSIBLY EVER (lol).
No, but the first time I moved the .mhtml into a dedicated folder and tried to open it the viewer told me it didn't exist--if it does that, redownload it and move it a second time. Technology is dumb.
Install THIS VIEWER:
https://www.mobileaction.co/app/android ... htmlviewer
I have tried AT LEAST three others (Adobe Acrobat, Moon+ Reader Pro and another mht viewer), NONE of which did the thing.
Now, open viewer, open file IN viewer (might take a couple of mashes to get it to open; use the "triple-stack" icon in the top-left corner, but rest assured the two I tried so far have worked fine!)
Read offline, profit!
If it should be necessary, download and run the Scribd app first to get the data the initial .mhtml is based on.
OR, if anybody knows a way to WRANGLE that .mhtml data into some kind of "preservable" ebook file (i.e. .epub, .mobi, .pdf), SEND HELP, PLZ!
Dec 14th, 2019, 9:17 am
Dec 14th, 2019, 1:45 pm
Space001, I have tried a thousand times to install that >*REDACTED!*< greasemonkey script but can't find the way to do it... >.< If there is any way you could help us download by using it, that would be greatly appreciated!
I have *attempted* to use that mht converter, HOWEVER, it ONLY wants files with the MHT extension, NOT MHTML. I will copy one and rename it to try one out for kicks but I am not certain I hold out a lot of hope...
Dec 15th, 2019, 5:57 am
Nope, the mht program says file is corrupted.
Any help?
Dec 15th, 2019, 10:08 pm
Anyone find a way to rip books from Scribd? So far I have been unsuccessful in all my attempts.

I tried your method AquilaLorelei, but it only downloads the first page as an .mhtml file, and then when you click it to open it or change pages it takes you back to the Scribd app. The full book is contained in the .json files but I have no idea what to do with those.
Dec 16th, 2019, 4:41 am
@thevoiceofreason: okay, let me see if I can simplify this any, because the .mhtml file that downloaded from Scribd pulled up fine for me in the secondary .mhtml reader program, found HERE:
https://apps.alldbx.cn/apps/5da02aa14a4629cdd23dbd2b
Cut and paste this link into your browser:
https://www.scribd.com/read/239948642/G ... d-Betrayal
As an example, say.
Hit the "Download" button, it SHOULD come out as "Greek Myths[...].mhtml"
Open .mhtml reader. Open .mhtml file IN .mhtml reader by clicking grey prompt bar indicating "Select a File to Open." WARNING, it MAY show up as a "frame" initially. It also may give you a message that says "We've moved you to where you were reading on your.. " If so, close that. Also, if given a "frame," click on the "triple-stack" icon in the upper-left-hand corner. It SHOULD load the file properly, though TBH It MAY take a couple of "button-mashes," perhaps involving the "Back" button, but TRUST ME. It will only APPEAR to kick you back to Scribd, BUT if you minimize out of the program it SHOULD indicate you're in the MHT reader. Let me know if that works.
Dec 18th, 2019, 9:33 am
@Aquilalorelei well I researched that userscript and other methods I posted..but none of them are supposed to work on Scribd books. I didn't read things properly before posting.

The only solution I have found that is supposed to work on books is a python script that was on github but later taken down.
https://libraries.io/github/ritiek/scribd-downloader this is an archive of the project, but I don't have a Scribd premium account to test it. (It says it works on books, in the description given )
Also, if you search on Google 'json to text converter' a number of links pop up. Can you try using them to rip text from the book?
I found the mhtml converter from a Google search; there were other sites too. I posted one that worked on an mhtml from another site. (.mht is short for mhtml)
Maybe scribd does something to make them not work.
P.S. If you don't mind, can you upload a Scribd json or mhtml somewhere? I don't have a premium Scribd account.
I'll try and see if I can get something from it.
May 29th, 2020, 9:09 pm
https://www.reddit.com/r/Piracy/comments/gbabde/progress_on_scribd_book_ripping/

New way to deal with .JSON files from Scribd. Copying the steps here:
(credit to u/hurltossthrowaway from reddit)


Prerequisites:
    Paid Scribd account
    Microsoft Excel
    Microsoft Word

Phase I : Extract

1. Start by logging into your Scribd account and downloading a book to your mobile device. As the named feature implies, you cannot do this from a macOS or Windows based laptop.

On iOS, the files are stored in /var/mobile/Containers/Data/Application/Scribd/Library/Application Support/documents. This folder will contain one or more folders with a numerical notation which corresponds to the document ID referenced in the Scribd URL (i.e. https://www.scribd.com/book/[documentID]).

2. Send the folder contents to a proper computer (zip/email/unzip).

3. Start a new document in Microsoft Excel and select Data > Get Data > From File > From JSON.

4. Navigate to the folder you transferred, then to the chapters folder (there appears to be one chapter folder for every item in the Scribd book's table of contents). Select the chapter folder's contents.json file and click Open.


About the contents.json structure. This structure has a lot of columns and rows, but most are not relevant to extracting text. Conversely, very few columns are relevant. As we open these structures, you want to be on the lookout for one of these three types: words, src, and cells. Each of these contain substructures with content you want. In each case, you kind of just keep drilling down in the JSON tree until there are no structures left and you arrive to the actual content.

The 'words' structure contains a 'text' substructure. If there are no further structures, this likely contains text. Otherwise you will find another 'words' substructure with text.

The 'src' structure has no substructure and contains references to images stored in the file structure which you will want to rebuild the document in all its fidelity.

The 'cells' structure is the most complex, but it's still really just a bunch of nested structures which you must click through to get to the actual text content. The structure looks a little something like this: - cells - nodes - words - text - words



5. With the contents.json file opened, you should be looking at a two-row structure, the first row named 'blocks' and the second named 'title'.

6. Click on the word 'List' in the 'blocks' row, which takes you to a new screen. Click the Convert to Table button in the top-left of your screen. A prompt appears with two questions - accept the defaults and click OK.

7. A table should appear with one column (named 'Column1') and a few record rows.
    Click the small Expand icon to the right of the 'Column1' name. A dropdown list will appear. Be sure to click 'Load More' to ensure all columns appear.
    Deselect 'All Columns' and select any checkbox named either src, cells, or text.
    Click OK.

8. Depending on the choices you made, you will get a table with one or more columns.
Some columns already appear to have actual text from the book in them. Other columns may appear to have text references to images in them. Other columns may appear blank except for the same Extract icon you saw earlier. These columns will need to be expanded further until you get additional text.

9. Once you've expanded as many src, text, or cells columns as you can in this JSON file, click the Close & Load button in the top left. This will load the data into Excel in a tabular format.

Phase II: Transform

10. At this stage, you should have anywhere from one to four columns of data in Excel, with about 1-2 words per cell.

This is the point you want to be moving the content to Microsoft Word to clean up the content with a bunch of simple search and replace operations.

These are the types of things you want to be looking for:

remove all tab characters (^t) by replacing them with nothing
find all double paragraph entries (^p^p) and replace with a unique placeholder like "trumpnuts"
find all single paragraph entries (^p) and replace with a single space (" ")
find all unique placeholders you created (like "trumpnuts") and replace with a single paragraph entry (^p).
After you perform these quick search and replace operations, do a quick scan for any other cleanup activities you can fix. You can automate all the search and replace activities by creating a macro and saving the Word document as a template.
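
For anyone who would rather not click through Find and Replace by hand, the same four replacements can be sketched in a few lines of Python. This is only a rough sketch working on a plain-text string: it assumes tabs show up as "\t" and paragraph marks as "\n", and the function name `reflow` and the `PLACEHOLDER` value are made up for illustration, not part of the original steps.

```python
# Hypothetical placeholder; any string that cannot occur in the text works.
PLACEHOLDER = "\x00PARA\x00"


def reflow(text: str) -> str:
    """Apply the four search-and-replace steps in order."""
    # 1. Remove all tab characters.
    text = text.replace("\t", "")
    # 2. Protect double paragraph breaks with a unique placeholder.
    text = text.replace("\n\n", PLACEHOLDER)
    # 3. Turn every remaining single paragraph break into a space.
    text = text.replace("\n", " ")
    # 4. Restore the protected breaks as single paragraph breaks.
    return text.replace(PLACEHOLDER, "\n")


if __name__ == "__main__":
    sample = "\tIt was\na dark\nnight.\n\nThe end\ncame."
    print(reflow(sample))
```

The order matters: if single breaks were replaced first, the double breaks would be lost too, which is the whole reason for the placeholder. Runs of three or more breaks are not handled specially here and would leave a stray space behind.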


11. Save the chapter file as a .docx file. It's not a bad idea to name the document after the chapter from which it was derived.

Phase III: Load

Repeat Phases I and II for every chapter in the book.

12. When you're done, open the first chapter you converted in Microsoft Word. At the end of the document, choose Insert > Object > Text From File... and select all the other chapter documents you converted.

At this point you should have an entire book's worth of content. You may want to style the document using block styles and headings. You can generate a table of contents as well.

When you're done, import the docx file into Calibre, where you can pull in metadata, document covers, and convert to epub, PDF, azw, or whatever you want.
May 30th, 2020, 6:43 pm
Is there a way to do that with OpenOffice software instead of Microsoft? It GALLS me that something so theoretically simple should be beyond my grasp because I am using OpenOffice Calc and Writer instead of Microsoft Excel and Word. Cheers for the help!
May 30th, 2020, 9:33 pm
Start a new document in Microsoft Excel and select Data > Get Data > From File > From JSON.


Can you do the above in the Excel equivalent of OpenOffice? (Calc I think)

Or you can ask at the reddit thread I linked, they might help.
Oct 18th, 2020, 3:09 pm
Is there any way someone could download the books for me if I send the proper links?
Oct 22nd, 2020, 9:04 pm
The scrimtec thing works, but the books will not have proper retail layout and have some minor bugs. Mostly good enough tho.