|
Post by vovchik on May 19, 2019 9:58:50 GMT 1
Dear Alex, Working nicely on RPI3. I am using Fira Mono in my terminal (Fira is a free Mozilla font) and am seeing most if not all languages in proper UTF8. Here is the link: fira. With kind regards, vovchik
|
|
|
Post by alexfish on May 19, 2019 15:11:35 GMT 1
Hi vovchik,
Thanks for testing.
As regards the parser, I did find what can break lib tidy,
especially on this forum, so I am now looking further, especially at code blocks.
Code blocks in posts can have something like
TRIML(to,TRIML(to,""))
Obviously, for lib tidy and myparser,
TRIML(to,"") will snap, i.e. the output is cut off here:
stringstream ss (fmt)<br> html=" "<br> WHILE (getline(ss,to,';')) DO<br> TRIML(to," << Snaps here
So from this end I need to correct the file. If I remove that bit, say as a quick test, and use
TRIML(to," ")
then all the output shows.
I.e. test as is with
myhtml2text 'http://basic-converter.proboards.com/thread/1073/html2text?page=3'
then find the line at about 616 of the index.html
and change it as posted. BR Alex
|
|
|
Post by alexfish on May 19, 2019 19:06:59 GMT 1
Hi All,
I have fixed this forum bug and have added some markers for tables to see where I am at: !, |, @ and ~. The @ and the ~ denote a count of the table. If the table header has a count of 4 "|" and, say, the data line has a count of 5 or more of @ or ~, then hopefully I can wrap the table data lines past 4 at column 4 only. Note here that if it meets a poorly composed site then nothing will show; I am not looking into the why.
New and final exec (Raspberry Pi ARM) only; now sorting the lib and table formatting.
BR Alex
Attachments: myhtml2text.bz2 (25.55 KB)
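The wrapping idea in the post above can be sketched roughly as follows. This is only a minimal illustration, assuming the data cells have already been collected into a flat list and that the header column count (the number of "|" markers) is known; the name `wrap_cells` is hypothetical and not part of myhtml2text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split a flat run of table cells into rows of `header_cols` cells each.
// `header_cols` stands for the number of '|' markers counted in the header
// (an assumption made for this sketch).
std::vector<std::vector<std::string>> wrap_cells(
    const std::vector<std::string>& cells, std::size_t header_cols)
{
    assert(header_cols > 0);
    std::vector<std::vector<std::string>> rows;
    for (std::size_t i = 0; i < cells.size(); ++i) {
        if (i % header_cols == 0) rows.emplace_back();  // start a new row
        rows.back().push_back(cells[i]);
    }
    return rows;
}
```

With a header of 4 columns, a data line of 6 cells would come back as one row of 4 and one row of 2.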
|
|
|
Post by alexfish on May 19, 2019 22:19:55 GMT 1
Hi All, with ref to the above {some sites = 0}: I tested one link on the BaCon web site, "QB64 - QuickBasic 4.5 compatible C++ emitter for Windows XP, Vista, 7, 8, 8.1 and 10, Linux and Mac OSX. QB64". I now know why: I did a search and came up with www.portal.qb64.org/ and it took a while to fully load in RPi Chromium. BR Alex
|
|
|
Post by alexfish on May 25, 2019 21:45:30 GMT 1
Hi All
I am still on with myparser @ tables.
Most parser libs can convert a node into text, yet getting table info is a bit DIY, so this is where I am sitting:
getting the table bits into a format that now needs parsing.
Here are two bits from this forum.
Part of the grid:
<tbody> <tr id="board-5" class="o-board board item first"> <td class="icon"><img src="//storage.proboards.com/forum/images/icons/board-no-new-post.png"title="No New Posts" alt="No New Posts"> </td> <td class="main clickable"><span class="link"><a class="js-board__title board-link board-5" href="/board/5/news">News</a></span><br> <p class="description">News and announcements</p> <p class="moderators">Moderator: <a href="/user/1" title="@admin"data-id="1" class="o-user-link js-user-link user-link user-1 group-1">Pjot</a></p> </td> <td class="threads">93</td> <td class="posts">832</td> <td class="latest last"><a class="js-thread__title-link thread-link thread-1084 board-5" href="/threads/recent/1084">BaCon 3.9 released</a><br> <span class="date"><abbr class="o-timestamp time" data-timestamp="1556734540000" title="May 1, 2019 19:15:40 GMT 1">May 1, 201919:15:40 GMT 1</abbr></span></td> </tr>
And a table with the colspan; it has 2 columns declared at the head of the table contents. This was the one bit I was stuck on.
++++++++++++++++++++++++++++++++++++++++++++
<tr class="last"> <td colspan="2"> <table> <tr> <td class="icon"><img src="//storage.proboards.com/forum/images/info/online_24.png" title="24 Hours" alt="24 Hours"> </td> <td class="info last"> <table> <tbody> <tr> <th>Users Online in the Last 24 Hours</th> </tr>
<tr> <td>1 Staff, <a class="members-link guest-prompt js-more-active-members" href="/members?dir=desc&sort=last_online&view=today">6Members</a>, 117 Guests.</td> </tr>
<tr> <td><a data-id="7" class="o-user-link js-user-link user-link user-7 group-0" href="/user/7"title="@vovchik">vovchik</a>, <a class="o-user-link js-user-link user-link user-217 group-0" data-id="217"title="@juppel" href="/user/217">juppel</a>, <a href="/user/1"title="@admin" data-id="1" class="o-user-link js-user-link user-link user-1 group-1">Pjot</a>,<a href="/user/42" title="@bigbass" data-id="42" class="o-user-link js-user-link user-link user-42 group-0">bigbass</a>,<a class="o-user-link js-user-link user-link user-196 group-0"data-id="196" title="@ptitjoz" href="/user/196">ptitjoz</a>,<a class="o-user-link js-user-link user-link user-157 group-0"data-id="157" title="@axelmoe" href="/user/157">axelmoe</a>,<a href="/user/57" title="@alexfish" data-id="57" class="o-user-link js-user-link user-link user-57 group-0">alexfish</a></td> </tr>
</tbody>
Now finalising the rest of the table & list decoder bits, i.e. ordered lists and unordered lists. BR Alex
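One way to deal with a colspan cell like the one in the second snippet is to expand each row to the declared number of columns before formatting. The sketch below is an assumption about how that could be done, not the code in the lib; `Cell` and `expand_row` are made-up names, and the cell text and colspan values are taken to be already extracted from the parsed nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One table cell as pulled out of a <td>/<th>: its text and its colspan.
struct Cell {
    std::string text;
    int colspan;
};

// Expand a row so that it always has exactly `columns` entries.
// A cell with colspan N keeps its text in the first slot and leaves
// the remaining N-1 slots empty.
std::vector<std::string> expand_row(const std::vector<Cell>& row,
                                    std::size_t columns)
{
    assert(columns > 0);
    std::vector<std::string> out;
    for (const Cell& c : row) {
        const int span = c.colspan < 1 ? 1 : c.colspan;
        out.push_back(c.text);
        for (int i = 1; i < span; ++i) out.push_back("");
    }
    out.resize(columns);  // pad short rows, cut overlong ones
    return out;
}
```

A row holding a single `colspan="2"` cell then lines up with ordinary two-cell rows in the same table.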
|
|
|
Post by alexfish on May 28, 2019 2:26:20 GMT 1
Hi All
I can remember saying there is something making the forum posts bomb out
with the lib.
Well, another ongoing task solved.
And what was it?
It looks as simple as this, especially in BaCon concat, and some of what I posted with & in it:
str = "Huh" & "Huh" & "Sh*t"
The simple cure was to replace the & with the correct &amp; before using lib-tidy.
My own decoder looks at these bits and leaves the text as is if not correct!
This happens in lib-tidy when the mode set is
tidySetCharEncoding( tdoc, "utf8" )
BR Alex
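The cure described in the post above, replacing a bare & with &amp; before the text reaches lib-tidy, could look something like this. It is a minimal sketch under assumptions: the function names are hypothetical, and "already correct" is taken to mean anything shaped like `&name;`, `&#123;` or `&#x1F;`, without checking the name against the real entity list.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True if the '&' at s[pos] starts something shaped like an entity:
// &name;  &#123;  &#x1F;   (the name itself is not validated here).
static bool looks_like_entity(const std::string& s, std::size_t pos)
{
    std::size_t i = pos + 1;
    const bool numeric = i < s.size() && s[i] == '#';
    bool hex = false;
    if (numeric) {
        ++i;
        hex = i < s.size() && (s[i] == 'x' || s[i] == 'X');
        if (hex) ++i;
    }
    const std::size_t start = i;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        const bool ok = hex       ? std::isxdigit(c) != 0
                        : numeric ? std::isdigit(c) != 0
                                  : std::isalnum(c) != 0;
        if (!ok) break;
        ++i;
    }
    return i > start && i < s.size() && s[i] == ';';
}

// Replace every bare '&' with "&amp;" and leave existing entities alone.
std::string escape_bare_ampersands(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '&' && !looks_like_entity(in, i)) out += "&amp;";
        else out += in[i];
    }
    return out;
}
```

The result would then be handed to tidyParseString() in place of the raw page text. Query strings such as `?dir=desc&sort=last_online` get the same treatment, since `&sort=` is not followed by a semicolon.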
|
|
|
Post by alexfish on May 28, 2019 3:14:12 GMT 1
Hi Alex
An update on the formatting looks,
plus the links: I got rid of the number style [12]here, to look more inline,
like url_12_here.
Some bits of basic-converter.org:
BaCon - BASIC to C converter Easy to learn arrow1.png BaCon syntax is based on old-school BASIC. app-bg.png arrow2.png Fast and powerfull You can use the power and speed of C. About url_11_BaCon is a free BASIC to C translator for Unix-based systems, which runs on most Unix/Linux/BSD platforms, including MacOSX. It intends to be a programming aid in creating tools which can be compiled on different platforms (including 64bit environments), while trying to revive the days of the good old url_12_BASIC. BaCon can be described as a translator, a converter, a source-to-source compiler, a transcompiler or a transpiler. It also can be described as a very elaborate preprocessor to C. BaCon is implemented in generic shell script and in itself. Therefore, to start using Bacon, the target system must have either url_13_Korn Shell, or url_14_ZShell, or url_15_Bourne Again Shell (BASH) available. Furthermore, BaCon also works with a newer Kornshell implementation like the url_16_MirBSD Korn Shell. The shell script implementation can convert and compile the BaCon version of BaCon. This will deliver the binary version of BaCon which has an extremely high conversion performance. On newer systems, the average conversion rate usually lies above 10.000 lines per second. Code converted by BaCon can be compiled by url_17_GCC, the url_18_Compaq C Compiler, url_19_TCC, the url_20_clang/LLVM compiler (and possibly by other C compilers), but also by C++ compilers like url_21_g++ or url_22_clang++. News May 1, 2019: BaCon 3.9 released - see url_23_CHANGES. Documentation updated. March 15, 2019: Today BaCon celebrates 10 years of existence! January 6, 2019: It came to my attention that the original PDKSH website does not exist anymore. I made a copy and a patch url_24_here. January 1, 2019: BaCon 3.8.1 released - see CHANGES. Documentation updated. October 1, 2018: BaCon 3.8 released - see CHANGES. Documentation updated. June 1, 2018: BaCon 3.7.3 released - see CHANGES. 
Documentation updated. March 1, 2018: BaCon 3.7.2 released - see CHANGES. Documentation updated. Older news can be found url_25_here. Downloads
Extensions Wrappers Wrappers are BaCon functions built around external libraries. INCLUDE: Highlevel Universal GUI url_145_here Documentation: url_146_online A tutorial for the High level Universal Gui (HUG) can be found url_147_here - created by forum member Bigbass. Import definitions when using HUG as a shared object: url_148_HUG imports Landscape generator url_149_here and a url_150_screenshot Textual copy between computers url_151_here A remake of url_152_Color Lines and a url_153_screenshot A remake of url_154_Qix and a url_155_screenshot A url_156_BaCon Forum checker and a url_157_screenshot by Tomaaz System url_158_graphical disk fill and a url_159_screenshot by Vovchik The url_160_15 puzzle game and a url_161_screenshot Example moving url_162_stars demonstration ( url_163_screenshot )
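The url_N_text markers in the output above suggest a small table that maps each number back to its target. The following is only a guess at such a structure, for illustration; `LinkTable` is a hypothetical name and the 1-based numbering is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collects link targets and hands back the inline marker for each one.
class LinkTable {
    std::vector<std::string> urls_;
public:
    // Records the target and returns the marker, e.g. "url_1_here".
    std::string add(const std::string& url, const std::string& text)
    {
        urls_.push_back(url);
        return "url_" + std::to_string(urls_.size()) + "_" + text;
    }

    // Looks up the target for a 1-based marker number; empty if out of range.
    std::string target(std::size_t n) const
    {
        return (n >= 1 && n <= urls_.size()) ? urls_[n - 1] : std::string();
    }
};
```

An interactive front end could then resolve a marker number typed by the user with `target(n)`.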
|
|
|
Post by alexfish on May 28, 2019 20:33:53 GMT 1
Hi All,
still on with the formatting. As explained, grid/table composition is a bit DIY coding-wise. Now, in respect of this forum, spanning and placement is difficult: if, say, the left column is based on an image/icon, the question is where to put the first text column. Although not very HTML-logical in one sense, but grammatical in another, I tried postfixing the 2nd column with the icon title or alt text. This is now how the bits look. From here I can make these bits look more like a grid/table as the lib stands now.
Here is the full forum page, with a bit to look at HTML-wise to show what I have done:
<tr id="board-5" class="o-board board item first"> <td class="icon"><img src="//storage.proboards.com/forum/images/icons/board-no-new-post.png" title="No New Posts" alt="No New Posts" /></td> <td class="main clickable"> <span class="link"><a href="/board/5/news" class="js-board__title board-link board-5">News</a></span><span class="viewing"> - 1 Viewing</span><br /> <p class="description">News and announcements</p> <p class="moderators"> Moderator: <a class="o-user-link js-user-link user-link user-1 group-1" href="/user/1" title="@admin" data-id="1">Pjot</a> </p> </td> <td class="threads">93</td> <td class="posts">832</td> <td class="latest last"> <a class="js-thread__title-link thread-link thread-1084 board-5" href="/threads/recent/1084">BaCon 3.9 released</a><br /> by <a class="o-user-link js-user-link user-link user-1 group-1" href="/user/1" title="@admin" data-id="1">Pjot</a><br /> <span class="date"><abbr title="May 1, 2019 19:15:40 GMT 1" data-timestamp="1556734540000" class="o-timestamp time">May 1, 2019 19:15:40 GMT 1</abbr></span> </td> </tr>
And the full page view. A bit of the iframe is showing, so I now need to sort that bit. There is also a step problem where two blocks look merged; the important bit here is where the "No New Posts" is. Code-wise, this does not happen in a normal grid/table.
basic-converter.proboards.com/ The BAsic CONverter Forum Skip Navigation Home Help Search Goto the BaCon website Welcome Guest. Please Login or Register. The BAsic CONverter Forum Home General News Documentation Code Projects Troubleshooting area Bugs, features General Board Threads Posts Last Post News - 1 Viewing No New Posts News and announcements Moderator: Pjot 93 832 BaCon 3.9 released by Pjot May 1, 2019 19:15:40 GMT 1 Documentation - 1 Viewing No New Posts Tutorials & demonstrations Moderator: Pjot 131 1,817 Bacon manual translated into German by ptitjoz May 25, 2019 17:30:52 GMT 1 Code Projects No New Posts Programs, challenges, competitions Moderator: Pjot 204 4,258 html2text (c++) by alexfish May 28, 2019 3:14:12 GMT 1 Troubleshooting area No New Posts Problems, issues, tips & tricks Moderator: Pjot 392 2,880 OpenBSD 6.5 hug gtk patch by Pjot May 16, 2019 13:16:32 GMT 1 Bugs, features No New Posts Report a bug, request a feature Moderator: Pjot 238 2,272 sort command does not work by juppel Apr 28, 2019 8:37:02 GMT 1 Legend New Posts No New Posts New Posts No New Posts Forum Information & Statistics Board Statistics Threads and Posts Total Threads: 1,058 Total Posts: 12,059 Last Updated: html2text (c++) by alexfish ( May 28, 2019 3:14:12 GMT 1 ) Recent Threads - Recent Posts - RSS Feed Members Members Total Members: 201 Newest Member: shell Most Users Online: 144 Aug 22, 2013 23:04:29 GMT 1 View today's birthdays Members Online Users Online 0 Staff, 1 Member, 3 Guests. vovchik 24 Hours Users Online in the Last 24 Hours 1 Staff, 4 Members, 111 Guests. bigbass, Pjot, ptitjoz, alexfish iframe id="ad2" data-id="pb-bottom-left-ad" style= Click here to remove banner ads from this forum. This Forum Is Hosted For FREE By ProBoards Get Your Own Free Forum ! Terms of Service | Privacy | Cookies | FTC Disclosure | Report Abuse | Report Ad | Consent BR Alex
|
|
|
Post by alexfish on May 28, 2019 22:10:50 GMT 1
Hi All
Lib-Tidy, code-wise.
Here are some bits.
A tidy doc class:
class TidyDocCPP {
    TidyDoc tdoc;
public:
    TidyDocCPP() { tdoc = tidyCreate(); }
    ~TidyDocCPP() { tidyRelease(tdoc); }
    operator TidyDoc() const { return tdoc; }
    // other accessor functions & clean up bits TODO
};
Some of the code.
I found this the best formula for interrogating the bits:
// needs <tidy.h> and <tidybuffio.h> (<buffio.h> on older libtidy)
// TODO: supply the HTML text, e.g. const char* input = Get_Input(foo);
TidyBuffer output = {0};
TidyBuffer errbuf = {0};
int rc = -1;
Bool ok;
TidyDocCPP tdoc;
tidySetCharEncoding( tdoc, "utf8" );
ok = tidyOptSetBool( tdoc, TidyXhtmlOut, yes );
if ( ok ) ok = tidyOptSetBool( tdoc, TidyIndentCdata, yes );
if ( ok ) ok = tidyOptSetBool( tdoc, TidyFixUri, yes );
if ( ok ) ok = tidyOptSetBool( tdoc, TidyMakeBare, yes );
if ( ok ) ok = tidyOptSetBool( tdoc, TidyDropEmptyParas, yes );
if ( ok ) ok = tidyOptSetBool( tdoc, TidyFixComments, yes );
if ( ok ) ok = tidyOptSetBool( tdoc, TidyHideComments, yes );
if ( ok ) ok = tidyOptSetBool( tdoc, TidyBreakBeforeBR, yes );
if ( ok ) ok = tidyOptSetBool( tdoc, TidyEncloseBlockText, yes );
if ( ok ) ok = tidyOptSetBool( tdoc, TidyEncloseBodyText, yes );
if ( ok ) ok = tidyOptSetBool( tdoc, TidyVertSpace, yes );
if ( ok ) rc = tidySetErrorBuffer( tdoc, &errbuf );    // capture diagnostics
if ( rc >= 0 ) rc = tidyParseString( tdoc, input );    // parse the input
if ( rc >= 0 ) rc = tidyCleanAndRepair( tdoc );        // tidy it up
if ( rc >= 0 ) rc = tidyRunDiagnostics( tdoc );        // fill errbuf
if ( rc > 1 ) rc = ( tidyOptSetBool(tdoc, TidyForceOutput, yes) ? rc : -1 );  // on error, force output
if ( rc >= 0 ) rc = tidySaveBuffer( tdoc, &output );   // write the result into output
Added: if you need to get the buffer as a C++ string, grab the bit before freeing,
i.e.
string foo_bar = output.bp ? (const char*)output.bp : "";  // bp is null if nothing was saved
The only bits to free are the buffers:
tidyBufFree( &output );
tidyBufFree( &errbuf );
// the TidyDocCPP tdoc is a class object and must not be tidyRelease()'d or C++ deleted externally.
// See the TidyDocCPP class.
BR Alex
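One thing the class as posted leaves open is copying: a copied wrapper would release the same document twice. A common way to rule that out is to delete the copy operations and allow only moves. The sketch below shows the pattern with a stand-in handle so it runs without libtidy; `FakeDoc`, `fake_create`, `fake_release` and `DocOwner` are hypothetical and merely play the roles of TidyDoc, tidyCreate(), tidyRelease() and TidyDocCPP.

```cpp
#include <cassert>
#include <utility>

// Stand-in for a C handle API (hypothetical, for illustration only).
static int g_live_handles = 0;
struct FakeDoc { int id; };
static FakeDoc* fake_create() { ++g_live_handles; return new FakeDoc{1}; }
static void fake_release(FakeDoc* d) { if (d) { --g_live_handles; delete d; } }

// Owns exactly one handle and releases it exactly once.
class DocOwner {
    FakeDoc* doc_;
public:
    DocOwner() : doc_(fake_create()) { assert(doc_ != nullptr); }
    ~DocOwner() { fake_release(doc_); }

    // No copies: two owners of one handle would mean a double release.
    DocOwner(const DocOwner&) = delete;
    DocOwner& operator=(const DocOwner&) = delete;

    // Moves hand the handle over and leave the source empty.
    DocOwner(DocOwner&& other) noexcept : doc_(other.doc_) { other.doc_ = nullptr; }
    DocOwner& operator=(DocOwner&& other) noexcept
    {
        if (this != &other) {
            fake_release(doc_);
            doc_ = other.doc_;
            other.doc_ = nullptr;
        }
        return *this;
    }

    // Same idea as "operator TidyDoc()" in the post: pass the wrapper to the C API.
    operator FakeDoc*() const { return doc_; }
};
```

Applied to TidyDocCPP, the same two `= delete` lines would turn an accidental copy into a compile error.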
|
|
|
Post by vovchik on May 28, 2019 22:55:42 GMT 1
Dear Alex,
The results are looking very clean, I must say. Nicely done...
With kind regards, vovchik
|
|
|
Post by alexfish on May 29, 2019 0:06:13 GMT 1
Thank you for the kind words.
And as a summary of where the lib sits at present regarding the formatting, here is part of the last bits of the BaCon web site:
Do we really need to write BaCon keywords in capitals? This is the default. It is a consequence of the fundamental decision to pass expressions as-they-are to the C compiler, which otherwise can cause name conflicts with existing C keywords and C functions from external libraries. For example, the Libc function 'exit' would conflict with 'EXIT', 'read' with 'READ', 'free' with 'FREE', etc. However, BaCon can accept lowercase also by using the '-z' command line option. Feel free to do so at your own risk. Your mileage may vary. The generated C code is hard to read! That maybe so. But contrary to other BASIC to C converters, BaCon generates C code which does a lot of things for you. For example, when using string variables, BaCon adds code which makes sure that sufficient memory is allocated. And if the string gets bigger, BaCon also includes code to enlarge that memory. Another example is that BaCon can break out loops to any preferred level. To achieve such functionality, loops like 'while' and 'repeat' contain extra code. And there is much more going on behind your back, to make the BASIC program work as it should. Therefore, the generated C code may look confusing and complicated. How to compile GTK programs in OpenBSD? Compile as follows:./bacon -l pthread gtkprogram.bac How about a Win32 Version? That will never be. This project started just because there was no decent BASIC to C converter for Unix. However, BaCon works in a Cygwin environment. For Windows, a native Basic to C converter can be found here. Where can I find more free BASIC interpreters and compilers? Check out the website of The Free Country, they have a lot of programming tools for all kinds of languages! 
Is there any relation with this BACON Basic converter or the Bacon programming language? No. Are you the author of the GTK-server project? Yes, I am. How to provide feedback on this project? There is a Message Board where all issues and problems can be logged. Is there some other way I can help? PayPal - The safer, easier way to pay online! pixel.gif You might consider to provide a donation to keep BaCon free: Which platforms are supported? unix-logo.png linux.png tru64.png opensolaris.png freebsd.png openbsd.png macosx.png Last update: November 17, 2018 - © Peter van Eerten. BaCon uses Fossil for its software versioning and revision control. Fossil_SCM_small.png PayPal - The safer, easier way to pay online! pixel.gif to help the ongoing development of BaCon. tracks the site usage. counter for wordpress counter for wordpress Copyright © 2018 by Peter van Eerten, template by BLACKTIE.CO, implementation by Tomaaz Thanks again + BR Alex
|
|
|
Post by alexfish on Jun 1, 2019 21:30:16 GMT 1
Hi All
Status:
As you can see, this lib is Html2Text, and one that can be semi-interactive if using a search engine.
I can almost say, after testing some terminals:
xterm = best
I have written the primary formatting codes required to get the likes of tables to work.
In essence I need to calculate the max width of the page and/or set the font size.
On the 'and/or': the likes of lxterminal on the Raspberry Pi has geometry but no command-line switch for the font size,
hence, thinking from an end user's point of view:
Recommends XTERM
Whilst I am happy with the libs as they stand, there will have to be a boot script plus the lib to get things to work, as in a -dump or -interactive mode.
Now on with this.
BR Alex
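For the "max width of the page" part of the status above, one option on Linux is to ask the terminal for its size. This is a minimal sketch, assuming a POSIX system; `terminal_columns` is a made-up name, and the fallback of 80 columns is an arbitrary choice for when output is piped, as it would be in a -dump mode.

```cpp
#include <sys/ioctl.h>
#include <termios.h>
#include <unistd.h>

// Returns the terminal width in character columns, or `fallback`
// when stdout is not a terminal or the size is not reported.
int terminal_columns(int fallback = 80)
{
    struct winsize ws;
    if (isatty(STDOUT_FILENO) &&
        ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws) == 0 &&
        ws.ws_col > 0) {
        return ws.ws_col;
    }
    return fallback;
}
```

The table formatter could then divide this width among the columns counted in the table header.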
|
|
|
Post by alexfish on Jun 4, 2019 6:33:37 GMT 1
Hi All
the class as it stands now:
class TidyDocCPP {
    TidyDoc tdoc;
public:
    int doc_type;
    vector <string> tags;
    int tag_count;
    vector <int> tag_type;
    vector <int> tag_position;
    vector <string> text;
    vector <string> url;
    vector <string> menus;
    vector <int> has_menus;
    vector <int> has_url;
    vector <string> title;
    vector <string> alt;
    vector <string> doc;
    // terminal console bit + dump
    map <int ,string> address;
    map <int ,string> css;
    vector<string> column;
    vector<string> row;
    vector<string> color;
    vector<void *> callback;
    vector<int> has_callback;
    int scroll_ypos;
    int scroll_xpos;
    int cur_x;
    int cur_y;
    int row_pos;
    int dump;
    string html;
    string dump_string;
    vector <unsigned char * > image;
    vector <const char*> image_string;
    int dump_log;
    string term;
    string display;
    TidyDocCPP() { tdoc = tidyCreate(); }
    ~TidyDocCPP() { tidyRelease(tdoc); }
    operator TidyDoc() const { return tdoc; }
    // other accessor function & clean up bits TODO
};
And part of the bit "string display": if you have GM convert or Convert, then you should be able to display images direct from the site. As a module proof of concept, with an image from the BaCon web site, you can test the string
gm display http://www.basic-converter.org/qix.jpg
or
display http://www.basic-converter.org/qix.jpg
BR Alex
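Building the display string from the post above might be done along these lines. This is a sketch only: `shell_quote` and `display_command` are hypothetical helpers, and the quoting assumes the command will be run through a POSIX shell (for example via std::system), so that an odd URL cannot break the command line.

```cpp
#include <string>

// Quote a string for a POSIX shell: wrap it in single quotes and
// turn every embedded ' into '\'' .
std::string shell_quote(const std::string& s)
{
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Build the viewer command. use_gm picks GraphicsMagick ("gm display"),
// otherwise the plain "display" command is used, as in the post.
std::string display_command(const std::string& url, bool use_gm)
{
    return std::string(use_gm ? "gm display " : "display ") + shell_quote(url);
}
```

The returned string would typically be stored in the class member `display` and run when the user selects an image.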
|
|
|
Post by alexfish on Jun 6, 2019 19:56:01 GMT 1
Hi All, in with the terminal drivers I have found a way to display images. I think this may be of use to a lot of terminal buffs. RE: thinking of Peter's xterm canvas. Proof of concept: image in xterm using sixels (the image is a YouTube arrow), placing the image at x,y. BR Alex
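For readers who have not met sixels: the image is sent as an escape sequence, and placing it at x,y comes down to moving the cursor first. The sketch below is not the code from the post; it only builds a tiny hand-made picture (a solid red bar, one sixel band of 6 pixels high) to show the shape of the sequence. `sixel_bar_at` is a made-up name, and the terminal is assumed to have sixel support enabled (for xterm, typically a build with sixel graphics started as `xterm -ti vt340`).

```cpp
#include <string>

// Build the bytes that move the cursor to (col,row), both 1-based,
// and draw a red bar `width` pixels wide and 6 pixels tall.
std::string sixel_bar_at(int col, int row, int width)
{
    std::string s;
    // CUP: move the cursor to row;col
    s += "\033[" + std::to_string(row) + ";" + std::to_string(col) + "H";
    s += "\033Pq";          // DCS q: start of sixel data
    s += "#0;2;100;0;0";    // define colour 0 as RGB 100%, 0%, 0%
    s += "#0";              // select colour 0
    // '~' sets all six pixels of a column; "!N" repeats the next character N times
    s += "!" + std::to_string(width) + "~";
    s += "\033\\";          // ST: end of sixel data
    return s;
}
```

Writing the returned string to the terminal, e.g. `std::cout << sixel_bar_at(5, 3, 40) << std::flush;`, should draw the bar starting at column 5, row 3 on a sixel-capable terminal. A real image would carry one such band per 6 pixel rows, separated by `-`.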
|
|
|
Post by alexfish on Jun 6, 2019 21:50:46 GMT 1
xterm with text over image: not good but respectable.
|
|