Would such a feature be possible?
One where all the elements in the source file are parsed individually and then, through a GUI, group-assigned to a predetermined set of selectors/classes/markup? I.e., whatever selectors you choose from a CSS file you've designed.
There would be a preview of how each element displayed originally, so that you can recognize what part of the book it is: chapter heading, epigraph, first letter of the chapter, etc. Maybe you could jump through each occurrence in a set of identically styled elements until you were satisfied that you knew what it was. Then you would assign the whole set to the structure and classes you prefer from your pre-developed list. This wouldn't stop you from later assigning additional occurrences to selectors you've already used (e.g., you've found some more chapter headings with a slightly different style, or another picture description or epigraph).
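As a rough illustration of the grouping-and-assigning step, here is a minimal Python sketch. It assumes the converted book is well-formed XHTML and that elements sharing the same `(tag, class)` pair share the same styling. The helpers `group_by_style` and `apply_mapping`, the sample markup, and the target classes `chapter-heading` and `body` are all hypothetical and not part of calibre; a real GUI would show a rendered preview where this just prints a sample.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def local_name(tag):
    # Strip the "{namespace}" prefix ElementTree adds to namespaced tags.
    return tag.rsplit("}", 1)[-1]

def group_by_style(root):
    """Group classed elements by (tag, class) so each distinct style is reviewed once."""
    groups = defaultdict(list)
    for el in root.iter():
        cls = el.get("class")
        if cls:  # unclassed elements are skipped in this sketch
            groups[(local_name(el.tag), cls)].append(el)
    return groups

def apply_mapping(groups, mapping):
    """Rewrite every element in a chosen group to the user's tag and class."""
    for key, (new_tag, new_class) in mapping.items():
        for el in groups.get(key, []):
            # Keep the original namespace prefix, if any.
            ns = el.tag[: el.tag.index("}") + 1] if el.tag.startswith("{") else ""
            el.tag = ns + new_tag
            el.set("class", new_class)

doc = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<p class="calibre3">Chapter One</p>
<p class="calibre5">It was a dark and stormy night.</p>
<p class="calibre5">The rain fell.</p>
<p class="calibre3">Chapter Two</p>
</body></html>"""

root = ET.fromstring(doc)
groups = group_by_style(root)
for (tag, cls), els in groups.items():
    # Stand-in for the preview: group key, size, and the first sample's text.
    print(tag, cls, len(els), repr(els[0].text))

# The human judgement: decide what each group actually is.
mapping = {("p", "calibre3"): ("h2", "chapter-heading"),
           ("p", "calibre5"): ("p", "body")}
apply_mapping(groups, mapping)
```

Adding a newly found group later is just another entry in `mapping`, which matches the idea of assigning more occurrences to selectors already in use.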
When you finish, you would have the structure of the book determined, as something like a pure HTML file that could easily be opened and edited, or converted pretty darn cleanly.
I don't know why, but this idea has occurred to me several times since the ebook editor went into calibre. I know I'm not knowledgeable enough to judge how complex this might be, but it seems to me that the conversion process already does most of this work when it assigns individual "calibre#" classes to things. Lots of these classes are redundant, especially if the original source was lackadaisical about distinguishing similar things. I'm thinking particularly of conversions from DOC files: so many paragraphs that are all variations on the same thing, generally the body paragraphs that make up most of the work. Or headers that are stylized paragraphs instead of properly labeled heading styles, etc.
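Part of the redundancy could probably be collapsed automatically before a human looks at anything: classes whose CSS declarations are identical can simply be merged. Below is a minimal sketch, assuming simple one-rule-per-class CSS such as `.calibre5 { ... }` with no comments or compound selectors. `find_duplicate_classes` and `rename_classes` are hypothetical helpers, and the regexes are a shortcut; a real tool would use a proper CSS and HTML parser.

```python
import re
from collections import defaultdict

# Matches simple rules like ".calibre5 { margin: 0; text-indent: 1.2em }".
RULE_RE = re.compile(r"\.([\w-]+)\s*\{([^}]*)\}")

def normalize(body):
    """Turn a declaration block into an order-independent set of (property, value)."""
    decls = set()
    for decl in body.split(";"):
        if ":" in decl:
            prop, value = decl.split(":", 1)
            decls.add((prop.strip().lower(), " ".join(value.split())))
    return frozenset(decls)

def find_duplicate_classes(css):
    """Map each redundant class to the first class with identical declarations."""
    by_decls = defaultdict(list)
    for name, body in RULE_RE.findall(css):
        by_decls[normalize(body)].append(name)
    renames = {}
    for names in by_decls.values():
        keep = names[0]
        for dup in names[1:]:
            if dup != keep:  # the same class defined twice is not a duplicate
                renames[dup] = keep
    return renames

def rename_classes(html, renames):
    """Rewrite class attributes, dropping any class repeated after renaming."""
    def fix(match):
        classes = [renames.get(c, c) for c in match.group(2).split()]
        return match.group(1) + " ".join(dict.fromkeys(classes)) + match.group(3)
    return re.sub(r'(class=")([^"]*)(")', fix, html)

css = """
.calibre3 { font-size: 1.5em; font-weight: bold }
.calibre5 { text-indent: 1.2em; margin: 0 }
.calibre7 { margin: 0; text-indent: 1.2em; }
.calibre9 { font-weight: bold; font-size: 1.5em; }
"""
renames = find_duplicate_classes(css)
html = '<p class="calibre7">Body text</p><p class="calibre9">Heading</p>'
print(renames)
print(rename_classes(html, renames))
```

This only merges exact duplicates; deciding that two *slightly different* styles are both "body text" is still the human part of the job.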
Seems to me this would be a way to use a wizard to bring human judgement into the conversion, collapse the number of classifications, and assign structure efficiently. With the ebook editor, all the display stuff is there. With the conversion ability, all the parsing of groups of similar elements is being done already. It's just assigning those groups to structure that is the hard part to program, so why not ask our pattern-recognition-genius brains instead? It could still occasionally be a lot of classes to look through, but at least the computer is doing the hard part of grouping things at the largest possible levels.
I hope I've explained clearly; it's a simple idea in my head but hard to put into words tonight. I feel a bit like I understand the concepts but lack the proper terminology.
Also, please point me to it if something like this already exists! :o It seems so obvious to me that this would result in clean conversions that I can't help feeling someone else must have thought of it already.
PS: this might work best as a conversion to FB2, since it's pure structure? Then develop a CSS file for FB2 that converts easily to ePub, etc. Not sure. Maybe that would keep you from adding things that didn't occur to the FB2 developers?