|
Post by alexfish on Jun 15, 2019 20:45:43 GMT 1
Hi All
Now way behind.
I found a problem with the html tidy output. The separation, repair and conversion of HTML / UTF-8 are good,
but the formatted output looks like it is split at each "space", and that space is not appended in the output. Bah.
Hence I have spent the week trying to resolve the issue, and failed.
On the good side, I have managed to get the correct text out of tidy using tidy's own bits,
so now back on track.
This sample is from the new GetText():
text :Last update: May 2, 2019 - ©
text :Peter van Eerten
text : .
text : BaCon uses
text :Fossil
text : for its software versioning and revision control.
text :
text : tracks the site usage.
text :Copyright © 2019 by Peter van Eerten, template by
text :BLACKTIE.CO
text : , implementation by Tom
As you can see, I need to concat some of those bits; the initial output buffer does not have these trailing spaces.
Hence I need to pass the cleaned-up tdoc directly to a kind of toc dump text; the above is an example.
In short, if the original has, say:
text : BaCon uses
text :Fossil
text : for its software versioning and revision control.
I could get
BaConuses Fossil for its software versioning and revision control.
instead of
BaCon uses Fossil for its software versioning and revision control.
Below is some of the buffer before the dump.
As you can see, in those bits there are no spaces at the end of the lines:
<li style="list-style: none">That maybe so. But contrary to other BASIC to C converters, BaCon generates C code which does a lot of things for you. For example, when using string variables, BaCon adds code which makes sure that sufficient memory is allocated. And if the string gets bigger, BaCon also includes code to enlarge that memory. Another example is that BaCon can break out loops to any preferred level. To achieve such functionality, loops like 'while' and 'repeat' contain extra code. And there is much more going on behind your back, to make the BASIC program work as it should. Therefore, the generated C code may look confusing and complicated.<br> <br></li>Dt :If you think this is impossible or strange or
BR Alex
|
|
|
Post by alexfish on Jun 15, 2019 21:35:12 GMT 1
Now for the text positioning, i.e. column / table.
This example is from before the final decode of the HTML UTF-8 bits, which is not done in tidy.
If you follow the positions, you can see I have something that could look like the output of a normal browser; I only need to get the rest of the info from the hrefs and images.
BR Alex
this forum example
text :8:Home | The BAsic CONverter Forum
text :1:<
text :1:
text :1:
text :23:The BAsic CONverter Forum
text :99:Skip Navigation
text :1:
text :1:Home
text :1:Help
text :1:Search
text :4:Goto the BaCon website
text :1:Welcome Guest. Please
text :55:Login
text :1: or
text :56:Register
text :1: .
text :24:The BAsic CONverter Forum
text :24:Home
text :25:General
text :25:News
text :25:Documentation
text :25:Code Projects
text :25:Troubleshooting area
text :25:Bugs, features
text :28:General
text :18:Board
text :21:Threads
text :19:Posts
text :25:Last Post
text :68:News
text :24:News and announcements
text :1:Moderator:
text :104:Pjot
text :21:93
text :19:832
text :94:BaCon 3.9 released
text :1:by
text :104:Pjot
text :98:May 1, 2019 19:15:40 GMT 1
text :77:Documentation
text :24:Tutorials & demonstrations
text :1:Moderator:
text :104:Pjot
text :21:131
text :19:1,817
text :94:Bacon manual translated into German
text :1:by
text :112:ptitjoz
text :99:May 25, 2019 17:30:52 GMT 1
text :77:Code Projects
text :1:
text :23:- 2 Viewing
text :24:Programs, challenges, competitions
text :1:Moderator:
text :104:Pjot
text :21:204
text :19:4,272
text :94:html2text (c++)
text :1:by
text :110:alexfish
text :99:Jun 15, 2019 20:45:43 GMT 1
text :84:Troubleshooting area
text :24:Problems, issues, tips & tricks
text :1:Moderator:
text :104:Pjot
text :21:394
text :19:2,890
text :1:xubuntu 18.04 for the raspberry pi3 desktop only
text :1:by
text :109:bigbass
text :99:Jun 12, 2019 17:56:52 GMT 1
text :77:Bugs, features
text :24:Report a bug, request a feature
text :1:Moderator:
text :104:Pjot
text :21:238
text :19:2,272
text :94:sort command does not work - SOLVED
text :1:by
text :111:juppel
text :98:Apr 28, 2019 8:37:02 GMT 1
text :5:Legend
text :1:
text :7:New Posts
text :1:
text :7:No New Posts
text :5:Forum Information & Statistics
text :5:Threads and Posts
text :5:Total Threads: 1,060 Total Posts: 12,083
text :1:Last Updated:
text :94:html2text (c++)
text :1: by
text :110:alexfish
text :6: (
text :99:Jun 15, 2019 20:45:43 GMT 1
text :8: )
text :27:Recent Threads
text :1: -
text :25:Recent Posts
text :1: -
text :23:RSS Feed
text :5:Members
text :5:Total Members: 201
text :1:Newest Member:
text :110:shell
text :1:Most Users Online: 144
text :21:(
text :99:Aug 22, 2013 23:04:29 GMT 1
text :8: )
text :69:View today's birthdays
text :5:Users Online
text :5:0 Staff, 2 Members, 3 Guests.
text :110:alexfish
text :1: ,
text :106:vovchik
text :5:Users Online in the Last 24 Hours
text :1:0 Staff,
text :122:3 Members
text :5: , 78 Guests.
text :109:bigbass
text :1:
text :98:Click here to remove banner ads from this forum.
text :1:This Forum Is Hosted For FREE By
text :53:ProBoards
text :1:Get Your Own
text :71:Free Forum
text :5: !
text :55:Terms of Service
text :1: |
text :61:Privacy
text :1: |
text :69:Cookies
text :1: |
text :63:FTC Disclosure
text :1: |
text :64:Report Abuse
text :1: |
text :52:Report Ad
text :1: |
text :56:Consent
|
|
|
Post by vovchik on Jun 15, 2019 22:02:03 GMT 1
Dear Alex,
Thanks - it looks very good. Could I bother you to post the sources in one convenient place? I am now using Joe's xubuntu-for-pi distribution and get a segfault with some of the old PI binaries, for some reason (not all of them, but the html/tidy bits). I have tidy installed, too, but the xubuntu version is much newer - so.5.2.0 (not 0.99.so.0), although I could probably make a symlink and it might work.
With kind regards, vovchik
|
|
|
Post by alexfish on Jun 16, 2019 15:54:31 GMT 1
Hi Vovchik
will try later.
I am still in the process of trying to get the rest of the bits out of lib tidy. As of now I have 'how to get the attributes'; not many examples exist, but I am getting there by trial and error, especially with the attributes.
How to get an attribute, in the case of href:
TidyAttr attr = tidyAttrGetHREF( tnod );
if (attr) {
    ctmbstr bits = tidyAttrValue( attr );   // the attribute value, may be null
    if (bits) cout << "Attr " << bits << ":";
}
Hence I will sort the rest later.
Example of what is going on now:
TAG : LI TAG : TABLE TAG : TR TAG : TD TAG : none TEXT:1:You might consider to provide a donation to keep BaCon free:
TAG : TD TAG : FORM TAG : INPUT TAG : none TEXT:1:
TAG : INPUT TAG : none TEXT:1:
TAG : INPUT TAG : none TEXT:1:
TAG : IMG TAG : BR TAG : LI TAG : none TEXT:5:Which platforms are supported?
TAG : LI TAG : BR Attr http://www.unix.org:TAG : A TAG : IMG TAG : none TEXT:1:
Attr http://www.linux.org:TAG : A TAG : IMG TAG : none TEXT:1:
Attr http://www.tru64.org:TAG : A TAG : IMG TAG : none TEXT:1:
Attr http://www.opensolaris.com:TAG : A TAG : IMG TAG : none TEXT:1:
Attr http://www.freebsd.org:TAG : A TAG : IMG TAG : none TEXT:1:
Attr http://www.openbsd.org:TAG : A TAG : IMG TAG : none TEXT:1:
Attr http://www.apple.com/macosx/:TAG : A TAG : IMG TAG : BR TAG : BR TAG : BR TAG : BR TAG : BR TAG : HR TAG : none TAG : none TAG : none TAG : DIV TAG : DIV TAG : DIV TAG : none TEXT:1:Last update: May 2, 2019 - ©
Attr mailto:peter@remove-no-spam.basic-converter.org:TAG : A TAG : none TEXT:59:Peter van Eerten
TAG : none TEXT:5: .
TAG : BR TAG : BR Attr http://www.fossil-scm.org/:TAG : A TAG : IMG TAG : none TEXT:1: BaCon uses
Attr http://www.fossil-scm.org/:TAG : A TAG : none TEXT:38:Fossil
TAG : none TEXT:6: for its software versioning and revision control.
TAG : BR TAG : BR TAG : none TAG : none TEXT:1:
TAG : none TAG : TABLE TAG : TR TAG : TD TAG : none TAG : none TAG : TD TAG : none TEXT:1: tracks the site usage.
TAG : BR TAG : BR TAG : none TAG : none TAG : none TAG : DIV TAG : DIV TAG : DIV TAG : P TAG : none TEXT:1:Copyright © 2019 by Peter van Eerten, template by
Attr http://www.blacktie.co:TAG : A TAG : none TEXT:34:BLACKTIE.CO
TAG : none TEXT:5: , implementation by Tomaaz
TAG : none TAG : none TAG : none TAG : none
BR Alex
|
|
|
Post by alexfish on Jun 16, 2019 19:02:07 GMT 1
Hi Vovchik
RE: lib tidy in /usr/lib
/usr/lib/libtidy-0.99.so.0 symlink
/usr/lib/libtidy-0.99.so.0.0.0 real
/usr/lib/libtidy.so symlink
I have put a copy in the archive; check the permissions.
BR Alex
|
|
|
Post by alexfish on Jun 16, 2019 21:07:02 GMT 1
Hi vovchik
I now have an all-in-one lib, as it stands now.
Compile with g++:
g++ -ltidy -lcurl tidy5.cpp -o tidy5
It requires zenity for the file dialog.
This is just a straight dump of the bits from the lib here; further work is required, i.e. I need to get the dump bits into the html class.
In use:
start with the file dialog
tidy5
select a search engine
tidy5 google 'bacon linux'
tidy5 youtube 'pink floyd'
or enter a url / file or http
BR Alex
Attachments: tidy5.cpp.bz2 (6.5 KB)
|
|
|
Post by alexfish on Jun 18, 2019 21:27:44 GMT 1
Hi Vovchik
The last exec was a variant of 'print formatting for testing', based on the last myparser lib sources, i.e. the sources do not exist as such anymore. The only question here is whether the original sources work, plus the last demo sources.
BR Alex
|
|
|
Post by vovchik on Jun 19, 2019 7:25:53 GMT 1
Dear Alex, Thanks.... I will do some fiddling. With kind regards, vovchik
|
|
|
Post by alexfish on Jun 19, 2019 18:05:07 GMT 1
Hi vovchik
hope you find a solution since I do not know much about xubuntu other than it has two modes
64 & 32 , so little endian & big endian. blagh blagh . come into play .
and some libs may not be in correct lib :: Raspbian rpi has opt/ , usr/include + /usr/include/arm-linux-gnueabihf
anyway
I now have a means of concatenating the bits shown by lib tidy.
I did not know this, but lib tidy can return the line number where the bits come from:
uint lin = tidyNodeLine( child );
Now, from the broken output, some bits:
1602: If you think this is impossible or strange or
1602: lame
1604: , refer to the common
1604: Kornshell website
1606: where Kornshell is described as a command and
1606: programming language
BR Alex
|
|
|
Post by alexfish on Jun 19, 2019 20:48:16 GMT 1
Hi Vovchik & All
I have managed to get the all-tidy version working as regards concatenating the bits.
It is now at a stage ready for final formatting, i.e. the dumped text needs to go into the html class.
It works well here, yet I am not sure how it will go for some members on RPi xubuntu.
BR Alex
Compile with:
g++ -ltidy -lcurl tidy5.1.cpp -o tidy5_1
Attachments: tidy5.1.cpp.bz2 (6.83 KB)
|
|
|
Post by alexfish on Jun 22, 2019 17:17:52 GMT 1
Hi all
I will be posting a new html2text based on the latest html tidy soon.
For the latest lib, follow Joe's method HERE.
I am tidying up a few loose ends as regards some of the methods, the main one being getting images into the terminal.
There is a choice, e.g. xterm, but mlterm provides the best graphics. Images also require img2sixel.
Why mlterm? This is the base lib in action with images.
BR Alex
Attachments:
|
|
|
Post by alexfish on Jun 22, 2019 17:44:17 GMT 1
Hi All
MLTERM: typical setup for color and to retain the scrollback function:
mlterm --deffont FiraMono-Bold -s false -b black -f white -l unlimited -g 120x40
mlterm will not cut through the image data like xterm does.
Example screen shot.
BR Alex
Attachments:
|
|
|
Post by alexfish on Jun 22, 2019 20:46:00 GMT 1
Could not resist trying BaCon Web Site Attachments:
|
|
|
Post by alexfish on Jun 23, 2019 1:59:29 GMT 1
Hi All
I had a problem with sixel: sixel will not retrieve from http without a www., and a notable case is this forum's images.
I have now added a workaround for the missing www.
Now cleaning up the code base; will post the lib in a few hours.
Pic of this forum.
Attachments:
|
|
|
Post by alexfish on Jun 23, 2019 14:36:55 GMT 1
Hi All
I have managed to get an all-in-one demo together for testing. From here I will separate the bits into a BaCon-compatible lib.
As regards img2sixel: you need one that is compiled with curl. I downloaded the latest from github; a straight compile will be aware of lib-curl, so as long as lib-curl is installed as a dependency all should be OK.
You also need the latest tidy lib, as mentioned above.
Set IMAGES in main according to the mode wanted:
no images: IMAGES=0;
xterm mode: IMAGES=1;
mlterm mode: IMAGES=2;
I have removed the zenity file dialog, since the results are also saved in a file, cache.six. The cache contains the base URL href.
To view the cache:
cat cache.six
General usage:
tidy53 google 'what to search for'
tidy53 youtube 'what to search for'
tidy53 http://www.basic-converter.org
Compile with:
g++ -ltidy -lcurl tidy5_3.cpp -o tidy53
or whatever you want to call it.
Terminal commands to set the correct mode:
xterm -ti vt340 -aw -T html2text -fa FiraMono-Bold:size=11 -geometry 132x40
(you will possibly have to set the xterm scrollback function as well)
mlterm --deffont FiraMono-Bold -b black -f white -l unlimited -g 120x40
BR Alex
Attachments: tidy5_3.cpp.bz2 (5.91 KB)
|
|