|
Post by alexfish on Aug 23, 2019 17:41:10 GMT 1
Hi All,

After finding the base64 bits in the above I am now about a couple of days behind. However, I have completed the main "get widths" etc. for the likes of table and iframe/table, i.e. an attempt to use one function to pass the bits and get the results as to where they will go.

Results of adjusting the data bits to fit the width of the terminal, in this case an iframe table:

Testing
------------------------------
** Get col widths
------------------------------
colums coun = 5
Term Info
----------------------------------------------
Win width = 0 colums = 80 cell width = 0
Win height = 0 rows = 24 cell height = 0
col 0 : 10
col 1 : 23
col 2 : 10
col 3 : 10
col 4 : 27
IFRAME WIDTH = 80

Testing
------------------------------
** Get col widths
------------------------------
colums coun = 5
Term Info
----------------------------------------------
Win width = 0 colums = 101 cell width = 0
Win height = 0 rows = 39 cell height = 0
col 0 : 10
col 1 : 44
col 2 : 10
col 3 : 10
col 4 : 27
IFRAME WIDTH = 101

Now applying the bits to the lib.

BR Alex
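In both runs above only column 1 changes (23 at 80 columns, 44 at 101 columns) while the other four keep their widths. A minimal sketch of that kind of fitting follows. The function name `fit_columns` and the rule "the flexible column takes whatever is left" are assumptions made for illustration, not the code used in the lib.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: keep every column at its given width except one
// flexible column, which receives the remaining terminal width.
std::vector<int> fit_columns(std::vector<int> widths,
                             std::size_t flex_col,
                             int term_width)
{
    assert(flex_col < widths.size());

    // Sum the fixed columns only (the flexible one is zeroed first).
    widths[flex_col] = 0;
    const int fixed = std::accumulate(widths.begin(), widths.end(), 0);

    // Assumption: never shrink the flexible column below one cell.
    widths[flex_col] = std::max(1, term_width - fixed);
    return widths;
}
```

With fixed widths 10, 10, 10 and 27, a call such as `fit_columns({10, 0, 10, 10, 27}, 1, 80)` gives column 1 a width of 23, and the same call with 101 gives 44, which matches the two test runs.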
|
|
|
Post by alexfish on Aug 24, 2019 21:09:57 GMT 1
Hi All
Got most of the table bits in, and in progress of applying the complex vector array,
but before I can do this I need to get the header and all the table bits (sizes etc.),
so here I am at the point of the iframe/table and printing some bits.
The table is within the scope of the terminal width :: Hooray
Other bits need taming, 'of course'.
The important bits for the table are below: HAVE COLUMN =1 indicates the column that can be spanned,
so one can compare like for like in a normal browser:
BR Alex
Putting some of the bits together:
----------------------------------------
0 :
0 : <DIV>
0 : <DIV><H2> [1]The BAsic CONverter Forum [2]Skip Navigation [3]
0 : <DIV>
0 :
0 : [4]Home
0 : [5]Help
0 : [6]Search
0 : [7]Goto the BaCon website <P>Welcome Guest. Please [8]Login or [9]Register.
0 : <DIV>
0 : <DIV>
0 :
0 :
0 : <DIV> [10]<SPAN>The BAsic CONverter Forum
0 :
0 : <DIV> [11]<SPAN>Home
0 : <DIV>
0 :
0 : [12]<SPAN>General
0 : <DIV>
0 :
0 : [13]<SPAN>News
0 : <DIV>
0 : [14]<SPAN>Documentation
0 : <DIV>
0 : [15]<SPAN>Code Projects
0 : <DIV>
0 : [16]<SPAN>Troubleshooting area
0 : <DIV>
0 : [17]<SPAN>Bugs, features
0 : <DIV>
0 : <DIV><IFRAME>
0 : <DIV>
0 : <DIV>
0 : <DIV>
0 : <DIV><H2>General
HAVE COLUMN =1
5 Pass Get Cols Len
Max : 137
Header Board Threads Posts Last Post
0: 10 1: 42 2: 10 3: 10 4: 22
++++++++++++++++++++++++++++++++++++
0 : <DIV>
0 : <DIV><H2>Legend
2 Pass Get Cols Len
Max : 37
0: 17 1: 20
++++++++++++++++++++++++++++++++++++
0 : <DIV>
0 : <DIV><H2>Forum Information & Statistics
1 Pass Get Cols Len
Max : 0
0: 0
++++++++++++++++++++++++++++++++++++
0 : <DIV> [60]Click here to remove banner ads from this forum.
0 : <DIV>This Forum Is Hosted For FREE By [61]ProBoards <BR>Get Your Own [62]Free Forum !
0 : <DIV> [63]Terms of Service | [64]Privacy | [65]Cookies | [66]FTC Disclosure | [67]Report Abuse | [68]Report Ad | [69]Consent
0 :
|
|
|
Post by alexfish on Sept 8, 2019 21:01:37 GMT 1
Hi All,

Been debugging the table format (layout). Oh yes, there are a few bugs, or were (now sorting the last bits).

Essentially, if one can get all the info from the different sections and pages of this forum to display correctly, then I think one has got a cracker of a table/iframe layout.

The output below is from the Code Projects page, in particular where most of the bugs happen. All the info is there, and more than what Pi Chromium or any flavour of WebKit will show.

So here are some of the bits, showing the errors. I.e. if

term=122 0: 12 1: 32 2: 44

= 3 columns, yet the data section shows 6 = segfault under the table scheme. Hence, hopefully, the last bug. It also shows the parts of <TD> that need to be linked under the scheme:

[42]New [43] [44] [45] [46]html2text (c++) [46]html2text (c++)

= this is some of what most browsers miss?? It is also another reason for a segfault. That bit is fixed, since it now shows some of the bits:

term=150 0: 12 1: 32 2: 44 [30]
[31]New [32] [33] [34] [35]Parsing JSON [35]Parsing JSON
Pages: » [36]1 » [37]2 <TD> [38]Pjot <TD> [39]21 <TD>509 <TD>by [40]Pjot Sept 4, 2019 21:00:37 GMT 1

term=150 0: 12 1: 35 2: 47 [41]
[42]New [43] [44] [45] [46]html2text (c++) [46]html2text (c++)
Pages: » [47]1 » [48]2 » [49]3 » [50]4 » [51]5 » [52]6 » [53]7 » [54]8 » [55]9 <TD> [56]alexfish <TD> [57]121 <TD>1,859 <TD>by [58]alexfish Aug 24, 2019 21:09:57 GMT 1

Have doubts about it? Download links2 and use -g to show this page:

links2 -g http://basic-converter.proboards.com/board/2/code-projects

Have fun + BR Alex

Added a picture.
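The segfault described above comes from a row carrying more cells (6) than the table has column widths (3). One defensive approach is to force every data row to the known column count before it is laid out. The sketch below is only an illustration of that idea: `normalize_row` is a hypothetical name, and folding the overflow cells into the last column is an assumption, not the fix that went into the lib.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return a row with exactly `ncols` cells.
// Short rows are padded with empty cells; extra cells are appended
// to the last column so no index ever exceeds the column count.
std::vector<std::string> normalize_row(const std::vector<std::string>& cells,
                                       std::size_t ncols)
{
    std::vector<std::string> row(ncols);
    if (ncols == 0) return row;

    for (std::size_t i = 0; i < cells.size(); ++i) {
        // Cells past the last column are redirected to the last column.
        const std::size_t col = (i < ncols) ? i : ncols - 1;
        if (!row[col].empty() && !cells[i].empty()) row[col] += ' ';
        row[col] += cells[i];
    }
    return row;
}
```

After this step any loop that indexes the width array with a cell index stays in bounds, whatever the page markup delivers.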
|
|
|
Post by alexfish on Sept 14, 2019 20:17:26 GMT 1
Hi all,

With reference to solving some of the above: found that the only way when using lib-tidy was to walk backwards through the node (parent) tree, since one cannot get a reference to the end node as in </foo>. So I have added a method of walking backwards. I say "method of", but as yet I need to get the bits into place.

The function and some code + results with properties.

Macro function Instr (Basic term for string.find) in c++:

#define Instr(arg1, arg2) ((arg1).find(arg2) + 1)

TidyNode Get_Previous_Parent(TidyNode tnod)
{
    TidyNode parent = tidyGetParent(tnod);
    return parent;
}

Using the child node to walk back:

// The first call fetches the child's parent; the loop then steps up again
// before printing, so output starts at the grandparent.
TidyNode nod = Get_Previous_Parent(child);
while ((nod = Get_Previous_Parent(nod))) {
    ctmbstr noddy2 = tidyNodeGetName(nod);
    if (noddy2) {
        string ck = (const char*) noddy2;
        // Skip the html and body wrappers, print every other ancestor.
        if (!(Instr(ck, "html") || Instr(ck, "body"))) {
            cout << "<" << noddy2 << ">";
        }
    }
}

Then try it against one of the difficult pages of this forum:

//storage.proboards.com/forum/images/info/stats.png title= Board Statistics Board Statistics class= info
colspan= 2
<tr><table><td><tr><table><td><tr><table><div><div><div><div> <text>Threads and Posts
<tr><table><td><tr><table><td><tr><table><div><div><div><div> <text>Total Threads:
<tr><table><td><tr><table><td><tr><table><div><div><div><div> <text>207
<tr><table><td><tr><table><td><tr><table><div><div><div><div> <text>Total Posts:
<tr><table><td><tr><table><td><tr><table><div><div><div><div> <text>4,388
colspan= 2
<tr><table><td><tr><table><td><tr><table><div><div><div><div> <text>This board has
href= # id= moderators-link moderators-link
<td><tr><table><td><tr><table><td><tr><table><div><div><div><div> <text>1 moderator
class= icon
alt= Members src= //storage.proboards.com/forum/images/info/members.png //storage.proboards.com/forum/images/info/members.pngtitle= Members Members class= info last
colspan= 1
<tr><table><td><tr><table><td><tr><table><div><div><div><div> <text>On This Board
<tr><table><td><tr><table><td><tr><table><div><div><div><div> <text>You
<td><tr><table><td><tr><table><td><tr><table><div><div><div><div>
As can be seen, there are embedded tables and embedded divs, plus colspan = foo. What a pile of spaghetti, but now I have the bits to sort it :: How BR Alex
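Once the ancestor chain of a text node is available, questions like "how many tables is this text nested in" can be answered from it. The sketch below shows the idea with a stand-in `Node` struct so that it compiles without libtidy. `Node`, `ancestor_path` and `table_depth` are hypothetical names for illustration only; with libtidy the parent link would come from `tidyGetParent` and the name from `tidyNodeGetName`, as in the post above.

```cpp
#include <string>
#include <vector>

// Stand-in for a parsed HTML node. This is NOT libtidy's TidyNode;
// it only carries what the walk needs: a tag name and a parent link.
struct Node {
    std::string name;
    const Node* parent;
};

// Collect ancestor tag names from the immediate parent outwards,
// skipping the html and body wrappers.
std::vector<std::string> ancestor_path(const Node* node)
{
    std::vector<std::string> path;
    for (const Node* p = node ? node->parent : nullptr; p; p = p->parent) {
        if (p->name == "html" || p->name == "body") continue;
        path.push_back(p->name);
    }
    return path;
}

// Count how many <table> elements enclose the node.
int table_depth(const Node* node)
{
    int depth = 0;
    for (const std::string& name : ancestor_path(node)) {
        if (name == "table") ++depth;
    }
    return depth;
}
```

For the paths printed above, which contain three `<table>` entries each, such a count would be 3, which is one way to tell an outer layout table from the innermost data table.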
|
|
|
Post by alexfish on Sept 15, 2019 20:27:24 GMT 1
Hi All
Along with the above I have been working on a compositor based on cairo, using mlterm.
I need to find a way of writing the text + adding images to one surface + resizing all to the terminal constraints (sizes),
then penetrate the buffer with links[number];
Screenshot of the compositor progress; note the [19], which is highlighted,
and here is the bit from the clipboard > 19
BR Alex
|
|
|
Post by alexfish on Oct 6, 2019 23:57:54 GMT 1
Hi All,

Think I have managed to get the tables resolved, although there may be undiscovered bugs.

This first version = Raspberry Pi compatible, since testing with youtube + omxplayer.

The next post will have a version. These are the options.

usage::
enter file for a local file.html : in most cases after a 'get url' the file is saved as 'index.html'
enter get to download a url :: i.e. http://www.foo | http://basic-converter.proboards.com
enter a search engine, i.e. google or youtube, then enter the search string joined by '+', i.e. 'nine+million+bicycles'
to avoid playing youtube videos press q to pass : hence it should work on non Raspberry Pi devices
pressing q returns to previous until exit
to select a url etc. at the prompt, double click on a numbered link, i.e. [98]; then single click and press return

To compile:

g++ -std=c++11 -s -ltidy -lcurl tidy5_9_0.cpp -o tidy-parser

The terminal geometry should be set to 180x90 and font size 10, i.e.

mlterm --deffont mono -w 10 -b black -f white -l unlimited -g 180x90

There is a warning if the term size is too small or too big width-wise.

BR Alex

The archive, Attachments: tidy5_9_0.cpp.bz2 (12.83 KB)
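The search option expects the words of the query joined by '+'. A small sketch of how such a string could be turned into a URL is shown below. The function names and the two URL templates are assumptions for illustration and are not taken from tidy5_9_0.cpp; no percent-encoding of special characters is attempted.

```cpp
#include <sstream>
#include <string>

// Join whitespace-separated words with '+',
// e.g. "nine million bicycles" becomes "nine+million+bicycles".
std::string join_query(const std::string& query)
{
    std::istringstream in(query);
    std::string word;
    std::string joined;
    while (in >> word) {
        if (!joined.empty()) joined += '+';
        joined += word;
    }
    return joined;
}

// Hypothetical helper: build a search URL for the two engines named
// in the post. The URL templates are assumptions, not the lib's own.
// Returns an empty string for an unknown engine.
std::string search_url(const std::string& engine, const std::string& query)
{
    const std::string q = join_query(query);
    if (engine == "google")  return "https://www.google.com/search?q=" + q;
    if (engine == "youtube") return "https://www.youtube.com/results?search_query=" + q;
    return "";
}
```

A string that is already joined, such as 'nine+million+bicycles', passes through `join_query` unchanged because it contains no whitespace.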
|
|
|
Post by alexfish on Oct 20, 2019 17:39:05 GMT 1
Hi All,

Have now reached end goal - 1. As mentioned, some of these forum pages are a PITA to parse: table/td type layouts with embedded tables + embedded tables with embedded DIVs. The -1 = the footer needs a bit of cleanup/formatting yet.

Now have a text output similar to most modern GUI browsers. As a comparison there are two archives: one using the latest lib (the step -1), and the index.html. One can test the index.html with lynx or links or any html2text apps and see the diff.

Now in the process of cleaning up the code base. Final release of the lib = next week, version 3.9.2.

BR Alex

Attachments: output.txt.bz2 (2.37 KB)
index.html.bz2 (12.04 KB)
|
|
|
Post by alexfish on Oct 28, 2019 20:20:29 GMT 1
Hi All
The new Lib based on lib tidy5 has been released
Version = tidy_parser_5_9_3
See POST #1 for code & archive of the lib
BR Alex
|
|
|
Post by alexfish on Oct 28, 2019 20:39:22 GMT 1
& Peter
I am getting this in the output file, and it is causing problems at the start of the parser when using BaCon NETWORK:

HTTP/1.1 200 OK
Server: nginx
Date: Mon, 28 Oct 2019 19:05:30 GMT
Content-Type: text/html
Content-Length: 44768
Connection: keep-alive
Last-Modified: Sun, 08 Sep 2019 20:21:15 GMT
ETag: "aee0-5921069caf979"
Accept-Ranges: bytes

Can it be fixed?
BR Alex
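When a raw HTTP response is saved to file, the header block sits in front of the HTML and is separated from it by an empty line. One way to keep it away from the parser is to cut everything up to and including that empty line. The sketch below shows this; `strip_http_headers` is a hypothetical name, and this is not necessarily how the updated code in the first post handles it.

```cpp
#include <cstddef>
#include <string>

// Return the body of a raw HTTP response. If the text does not start
// with a status line ("HTTP/...") or has no blank line after the
// headers, the input is returned unchanged.
std::string strip_http_headers(const std::string& raw)
{
    if (raw.compare(0, 5, "HTTP/") != 0) return raw;

    // Headers end at the first empty line: CRLF CRLF on the wire,
    // with a bare LF LF fallback in case line endings were converted.
    std::size_t pos = raw.find("\r\n\r\n");
    std::size_t skip = 4;
    if (pos == std::string::npos) {
        pos = raw.find("\n\n");
        skip = 2;
    }
    if (pos == std::string::npos) return raw;

    return raw.substr(pos + skip);
}
```

The result can then be handed to the HTML parser in place of the raw file contents.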
|
|
|
Post by alexfish on Oct 28, 2019 21:53:33 GMT 1
OK
Sorted. Updated code is in the first post.
BR Alex
|
|
|
Post by Pjot on Oct 29, 2019 19:04:13 GMT 1
Glad to see you worked it out Alex!
If you run into an issue again, can you please also provide some details (like which URL you are using, which compiler, etc)? I am happy to check it out.
Cordialement, Peter
|
|