HTTPIE
Post by alexfish on Sept 6, 2020 19:46:57 GMT 1
Hi All,
While working on HTTP GET requests, I found that curl and wget were failing on the likes of ubuntuforums.org and, more recently, on the Google and YouTube APIs. In the past I have been using Python httpie, and during that time I looked for a C solution. I stumbled on some bits on GitHub that use libsoup. The result is a hack of those bits, although I did not have to hack them much. I tried to convert it using BaCon, so I have put the C file in an include (*.c), and it can be used in this form:

'Prog HTTPIE
PRAGMA LDFLAGS `pkg-config --libs libsoup-2.4`
PRAGMA OPTIONS `pkg-config --cflags libsoup-2.4`

USEH
#include "soupinc.c"
END USEH

PROTO HTTPIE
LOCAL ck TYPE int

ck = HTTPIE (argc,argv)
IF (ck) THEN
    PRINT "Fail"
ELSE
    PRINT "Success"
END IF

From the terminal, use the -o option to save to a file:

./HTTPIE http://www.basic-converter.org -o bacon.html

View the results with:

cat bacon.html

The status is also printed in the terminal, like:

/: 200 OK

BR Alex
Attachments: soupinc.c.bz2 (2.37 KB)
Post by vovchik on Sept 6, 2020 20:02:47 GMT 1
Dear Alex,
The output is formatted beautifully and is very clean. Thanks. It can do what wget/curl do in many instances. I wonder how hard it will be to Baconize the C bit...
With kind regards, vovchik
Post by alexfish on Sept 6, 2020 22:21:02 GMT 1
Hi Vovchik,
Thanks for testing.
I agree about the output style: it is mostly standard HTML tags, and I hope it is presented in a way that helps you hone your 'html2text'.
To all: the arguments can also be handled internally, like so:
'Prog HTTPIE
PRAGMA LDFLAGS `pkg-config --libs libsoup-2.4`
PRAGMA OPTIONS `pkg-config --cflags libsoup-2.4`

USEH
#include "soupinc.c"
END USEH

PROTO HTTPIE
LOCAL ck TYPE int

/* Application Options:
   -c, --ca-file=FILE    Use FILE as the TLS CA file
   --cert=FILE           Use FILE as the TLS client certificate file
   --key=FILE            Use FILE as the TLS client key file
   -d, --debug           Show HTTP headers
   -h, --head            Do HEAD rather than GET
   -n, --ntlm            Use NTLM authentication
   -o, --output=FILE     Write the received data to FILE instead of stdout
   -p, --proxy=URL       Use URL as an HTTP proxy
   -q, --quiet           Don't show HTTP status code
   -s, --sync            Use SoupSessionSync rather than SoupSessionAsync
*/

LOCAL isarg$ [5]= {"HTTPIE","http://www.basic-converter.org","-q","-o","bacon.html"} TYPE STRING

ck = HTTPIE (5,isarg$)
IF (ck) THEN
    PRINT "Fail"
ELSE
    content$ = LOAD$("bacon.html")
    PRINT "Content of 'bacon.html': \n", content$
END IF
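
The internal call above hands HTTPIE an argv-style array in place of the real command line. As a rough illustration of how such an array might be consumed on the C side, here is a minimal sketch. The names `parse_args` and `struct get_opts` are hypothetical and are not taken from soupinc.c, and only the `-q` and `-o FILE` options from the list are handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical option holder; not the structure used in soupinc.c. */
struct get_opts {
    const char *url;         /* first non-option argument */
    const char *output_file; /* value following -o, or NULL */
    int quiet;               /* 1 if -q was given */
};

/* Walk an argv-style array (argv[0] is the program name).
 * Returns 0 on success, 1 if no URL was found or -o lacks a value. */
int parse_args(int argc, const char **argv, struct get_opts *out)
{
    int i;

    out->url = NULL;
    out->output_file = NULL;
    out->quiet = 0;

    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-q") == 0) {
            out->quiet = 1;
        } else if (strcmp(argv[i], "-o") == 0) {
            if (i + 1 >= argc)
                return 1;              /* -o needs a file name */
            out->output_file = argv[++i];
        } else if (out->url == NULL) {
            out->url = argv[i];        /* first bare argument is the URL */
        }
    }
    return out->url == NULL ? 1 : 0;
}
```

Called with the same five strings as isarg$, this fills in the URL, sets the quiet flag and records bacon.html as the output file. The return convention (0 for success, non-zero for failure) mirrors the way ck is tested above.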
BR Alex
Post by alexfish on Sept 6, 2020 22:33:47 GMT 1
Quote from vovchik: "I wonder how hard it will be to Baconize the c bit..."

Well, I was in the process of testing a straight-through method, the same as curl, taken from the libsoup docs, but it gave a 'FAIL' on the likes of ubuntuforums.org (try wget or curl on that one, or try a download from openstreet). I am still looking, hence the above post: handling the bits internally is at present the best option. Or does it need to be 'Baconized'? Note that the -o option writes to a file, which matters if the file is, say, an image.
BR Alex
Post by alexfish on Sept 6, 2020 22:55:50 GMT 1
Hi All & @Vovchik,
I forgot to mention: libsoup will ask for gzip-compressed content and, if the server offers it, will download the compressed version when the session uses
SOUP_TYPE_CONTENT_DECODER
saving your data allowance.
You can test that one with the likes of YouTube: have a look at the downloaded file and look for libsoup.
BR Alex
Post by alexfish on Sept 12, 2020 11:42:37 GMT 1
Hi All,
Here is a straight-through BaCon'ized version of GET using libsoup. The function HTTPIE prints the response and saves it to a file; adapt it to what you require.
PRAGMA LDFLAGS `pkg-config --libs libsoup-2.4`
PRAGMA OPTIONS `pkg-config --cflags libsoup-2.4`
PRAGMA INCLUDE <libsoup/soup.h>

DECLARE session TYPE static SoupSession *
DECLARE loop TYPE static GMainLoop *
DECLARE debug, head, quiet TYPE static gboolean
DECLARE output_file_path = NULL TYPE static const gchar *

DECLARE opts TYPE GOptionContext *
DECLARE url TYPE const char *
DECLARE *proxy_uri, *parsed TYPE SoupURI
DECLARE error = NULL TYPE GError *
DECLARE logger = NULL TYPE SoupLogger *
DECLARE help TYPE char *

PROTO gprint,soup_session_send_message,soup_message_get_uri,soup_message_get_https_status
PROTO g_free,soup_uri_free,g_object_unref

SUB HTTPIE(STRING url$ , STRING saveto$ )
    LOCAL parsed$ TYPE STRING
    LOCAL name TYPE STRING
    LOCAL msg TYPE SoupMessage *
    LOCAL header TYPE STRING
    LOCAL GTlsCertificateFlags flags
    LOCAL responce$ TYPE STRING

    msg = soup_message_new ( "GET", url$)
    soup_session_send_message (session, msg)
    name = soup_message_get_uri (msg)->path

    IF (msg->status_code == SOUP_STATUS_SSL_FAILED) THEN
        LOCAL flags TYPE GTlsCertificateFlags
        IF (soup_message_get_https_status (msg, NULL, &flags)) THEN
            PRINT name, msg->status_code, msg->reason_phrase, flags FORMAT "%s: %d %s (0x%x)\n"
        ELSE
            PRINT name, msg->status_code, msg->reason_phrase FORMAT "%s: %d %s (no handshake status)\n"
        END IF
    END IF

    IF ( SOUP_STATUS_IS_TRANSPORT_ERROR (msg->status_code)) THEN
        PRINT name, msg->status_code, msg->reason_phrase FORMAT "%s: %d %s\n"
    END IF

    IF (SOUP_STATUS_IS_REDIRECTION (msg->status_code)) THEN
        header = soup_message_headers_get_one (msg->response_headers,"Location")
        IF header THEN
            LOCAL uri TYPE SoupURI *
            LOCAL uri_string TYPE STRING
            uri = soup_uri_new_with_base (soup_message_get_uri (msg), header)
            uri_string = soup_uri_to_string (uri, FALSE)
            HTTPIE (uri_string,saveto$)
            g_free (uri_string)
            soup_uri_free (uri)
        ELSE
            PRINT "redirect fail"
        END IF
    END IF

    IF (SOUP_STATUS_IS_SUCCESSFUL (msg->status_code)) THEN
        responce$ = msg->response_body->data
        PRINT responce$
        SAVE responce$ TO saveto$
    END IF
END SUB

session = g_object_new (SOUP_TYPE_SESSION, \
    SOUP_SESSION_ADD_FEATURE_BY_TYPE, SOUP_TYPE_CONTENT_DECODER, \
    SOUP_SESSION_ADD_FEATURE_BY_TYPE, SOUP_TYPE_COOKIE_JAR, \
    SOUP_SESSION_USER_AGENT, "get ", \
    SOUP_SESSION_ACCEPT_LANGUAGE_AUTO, TRUE, \
    NULL)

'HTTPIE ("http://www.basic-converter.org","index.html")
HTTPIE ("https://www.youtube.com/results?search_query=pink+floyd","index.html")
g_object_unref (session)
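
The SUB above branches on the class of the status code: transport errors, redirections and successes. The sketch below restates those ranges in plain C so the branching can be followed without libsoup installed. The ranges are my reading of the libsoup 2.4 status macros and should be treated as an assumption. `classify_status` and `should_follow` are hypothetical helpers, and the hop limit is an addition for illustration, since the recursive redirect call in the post has no limit of its own.

```c
/* Hypothetical stand-ins for the libsoup status-class macros. */
enum status_class {
    STATUS_TRANSPORT_ERROR,  /* 1..99: libsoup-internal codes, e.g. SSL failed */
    STATUS_INFORMATIONAL,    /* 100..199 */
    STATUS_SUCCESS,          /* 200..299 */
    STATUS_REDIRECT,         /* 300..399 */
    STATUS_OTHER             /* everything else, e.g. 404 or 500 */
};

/* Map a status code onto the class the SUB would branch on. */
enum status_class classify_status(unsigned int code)
{
    if (code > 0 && code < 100)
        return STATUS_TRANSPORT_ERROR;
    if (code >= 100 && code < 200)
        return STATUS_INFORMATIONAL;
    if (code >= 200 && code < 300)
        return STATUS_SUCCESS;
    if (code >= 300 && code < 400)
        return STATUS_REDIRECT;
    return STATUS_OTHER;
}

/* Returns 1 if another redirect hop should be followed, 0 otherwise.
 * max_hops bounds the recursion so a redirect loop cannot run forever. */
int should_follow(unsigned int code, int hops_so_far, int max_hops)
{
    return classify_status(code) == STATUS_REDIRECT && hops_so_far < max_hops;
}
```

In the BaCon version, a hop counter passed as an extra argument to HTTPIE would play the role of hops_so_far.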
BR Alex
Post by alexfish on Sept 12, 2020 11:50:48 GMT 1
Typo in code
POST above now Updated
BR Alex
Post by vovchik on Sept 12, 2020 23:34:45 GMT 1
Dear Alex, Thanks. Working for me. With kind regards, vovchik
Post by alexfish on Sept 13, 2020 20:30:38 GMT 1
Hi Vovchik,
Thanks for testing.
Along with this I was looking for an HTML parser. Although gumbo is classed as a C/C++ parser, it seemed more centred on the C++ side.
Then I stumbled on lexbor on GitHub.
The serialised tree output is ideal for parsing in BaCon.
I did have some problems getting a result; the nearest BaCon'ized code now looks like:
PRAGMA LDFLAGS -llexbor

PRAGMA INCLUDE <lexbor/core/fs.h>
PRAGMA INCLUDE <lexbor/html/html.h>

PROTO lxb_html_parser_destroy,lxb_html_document_destroy,printf

' store code in
DECLARE MYDATA$ TYPE STRING

FUNCTION SERIALSE(const lxb_char_t *data, size_t len, void *ctx) TYPE lxb_inline lxb_status_t
    ' transfer data to MYDATA$
    PRINT (int) len, (const char *) data FORMAT "%.*s" TO MYDATA$
    ' print results
    PRINT MYDATA$
    RETURN LXB_STATUS_OK
END FUNCTION

LOCAL *html_one TYPE lxb_char_t
LOCAL html_one_len TYPE size_t
LOCAL status TYPE lxb_status_t
LOCAL doc_one TYPE lxb_html_document_t*

html_one = lexbor_fs_file_easy_read("/home/pi/youtube.html", &html_one_len)

USEC
lxb_html_parser_t *parser;
parser = lxb_html_parser_create();
END USEC

status = lxb_html_parser_init(parser)
IF (status != LXB_STATUS_OK) THEN
    PRINT "Failed to create HTML parser"
END IF

doc_one = lxb_html_parse(parser, html_one, html_one_len)
IF (doc_one == NULL) THEN
    PRINT "Failed to create Document object"
END IF
lxb_html_parser_destroy(parser)

status = lxb_html_serialize_pretty_tree_cb(lxb_dom_interface_node(doc_one), \
    LXB_HTML_SERIALIZE_OPT_UNDEF, \
    0, SERIALSE, NULL)

lxb_html_document_destroy(doc_one)
BR Alex
Post by vovchik on Sept 13, 2020 22:11:46 GMT 1
Dear Alex,
Thanks. It all works nicely. The only downside is that lexbor is a pretty fat animal (18 MB source). Also, the cmake/make have to be adjusted to get the lib into /usr/lib. Too bad there is not a vegan version of that animal; otherwise it is very useful for parsing.
With kind regards, vovchik
Post by alexfish on Sept 13, 2020 22:13:05 GMT 1
Hi All,
For those wanting to install lexbor:
You must have git, cmake and make.
Open a terminal and run:

cd Downloads
git clone https://github.com/lexbor/lexbor.git
cd lexbor
cmake . -DLEXBOR_BUILD_TESTS=ON -DLEXBOR_BUILD_EXAMPLES=ON -DLEXBOR_BUILD_SEPARATELY=ON
make
make test
sudo make install
sudo ldconfig
BR Alex
Post by alexfish on Sept 13, 2020 22:35:02 GMT 1
Quote from vovchik: "The only down side is that lexbor is pretty fat animal (18 MB source). And, the cmake/make have to be adjusted to get the lib into /usr/lib."

Novice-wise, the only thing to do is leave it as is, then run ldconfig after the make install. Yes, the bits are in /usr/local, but it should work.
Fat-wise I agree. If I can find one that has been on a diet, then who knows. Till then, it is what it is: a bit less fat, and less farting around with node versions.
BR Alex
Post by alexfish on Sept 14, 2020 22:00:55 GMT 1
Hi All,
The first lexbor code was a bit buggy.
This version works:

' needed for parsing final results
OPTION QUOTED FALSE
PRAGMA LDFLAGS -llexbor

PRAGMA INCLUDE <lexbor/core/fs.h>
PRAGMA INCLUDE <lexbor/html/html.h>

PROTO lxb_html_parser_destroy,lxb_html_document_destroy
DECLARE MYDATA_RESULTS$ TYPE STRING

FUNCTION SERIALSE(const lxb_char_t *data, size_t len, void *ctx) TYPE lxb_inline lxb_status_t
    LOCAL dimension,ck TYPE int
    LOCAL MYDATA$ TYPE STRING
    DECLARE c$,d$,e$ TYPE STRING
    DECLARE qt$ = CHR$(34)
    DECLARE lt$ , rt$ TYPE STRING
    ' transfer data to MYDATA$
    PRINT (int) len, (const char *) data FORMAT "%.*s" TO MYDATA$
    ' append to the results
    MYDATA_RESULTS$ = MYDATA_RESULTS$ & MYDATA$
    RETURN LXB_STATUS_OK
END FUNCTION

LOCAL html_one TYPE lxb_char_t *
LOCAL html_one_len TYPE size_t
LOCAL status TYPE lxb_status_t
LOCAL doc_one TYPE lxb_html_document_t*
' change path and/or file to suit
html_one = lexbor_fs_file_easy_read("bacon.html", &html_one_len)

USEC
lxb_html_parser_t *parser;
parser = lxb_html_parser_create();
END USEC

status = lxb_html_parser_init(parser)
IF (status != LXB_STATUS_OK) THEN
    PRINT "Failed to create HTML parser"
END IF

doc_one = lxb_html_parse(parser, html_one, html_one_len)
IF (doc_one == NULL) THEN
    PRINT "Failed to create Document object"
END IF
lxb_html_parser_destroy(parser)

status = lxb_html_serialize_pretty_tree_cb(lxb_dom_interface_node(doc_one), \
    LXB_HTML_SERIALIZE_OPT_UNDEF, \
    0, SERIALSE, NULL)
lxb_html_document_destroy(doc_one)

PRINT MYDATA_RESULTS$
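
The serializer calls SERIALSE once per chunk, with a pointer and a length, not with a NUL-terminated string, which is why the "%.*s" format is used before appending. The same accumulate-per-chunk idea can be sketched in plain C. This is an illustration only: `struct text_buf` and `append_chunk` are hypothetical names, plain C types stand in for the lexbor ones, and the buffer travels through the `ctx` argument (passed as NULL in the code above) where the BaCon version uses a global string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer that plays the role of MYDATA_RESULTS$. */
struct text_buf {
    char *data;   /* NUL-terminated contents, or NULL while empty */
    size_t len;   /* bytes stored, excluding the terminator */
};

/* Callback in the shape a chunked serializer expects: a pointer, a
 * length (the chunk is not assumed to be NUL-terminated) and a user
 * context. Returns 0 on success, 1 on allocation failure. */
int append_chunk(const unsigned char *chunk, size_t len, void *ctx)
{
    struct text_buf *buf = ctx;
    char *grown = realloc(buf->data, buf->len + len + 1);

    if (grown == NULL)
        return 1;                        /* old buffer is left untouched */
    memcpy(grown + buf->len, chunk, len); /* copy exactly len bytes */
    buf->len += len;
    grown[buf->len] = '\0';              /* keep the result printable */
    buf->data = grown;
    return 0;
}
```

Start with a zeroed `struct text_buf`, pass its address as the context, and free `data` once the accumulated text has been used.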
BR Alex
Post by alexfish on Sept 20, 2020 23:31:10 GMT 1
Hi All,
Updated lexbor parser: a simple HTML parser using liblexbor and encoders.
Archives:
1. lexbor code: the lexer plus internal HTML entities by ASCII encoder
2. include ents.c: 'entities by name encoder'
It does not include new encoders for JavaScript; these will appear in a new thread.
Usage: "app 'filename.html'"
BR Alex
Attachments: lexbor_demo.bac.bz2 (1.44 KB), ents.c.bz2 (1.39 KB)
Post by alexfish on Sept 27, 2020 20:03:32 GMT 1
Hi Vovchik & All,
Well, after finding what I was doing wrong in BaCon, I now find that I can replicate what lexbor does. This is the first stage of the parser, so anyone who wants to can carry forward their own parsed results. Look at where the quotes are. Also, if there is a load of JavaScript, the results one wants should be visible line by line, i.e. text=, label= and url=.
BR Alex

The code:

OPTION QUOTED FALSE

USEH
#include "ents.c"
END USEH

DECLARE MY_ENTS$[256][2] TYPE STRING
DECLARE my_ent_start$ = "" TYPE STRING
DECLARE my_ent_end$ =";" TYPE STRING

LOCAL dimention TYPE int
LOCAL ck$ TYPE STRING
LOCAL ap$ = argv[0]

LOCAL HTML$

FOR t = 0 TO 255
    MY_ENTS$[t][0]= my_ent_start$ & CHR$(t) & my_ent_end$
    MY_ENTS$[t][1]= CHR$(t)
NEXT

IF argc = 2 THEN
    file$ = argv[1]
    PRINT file$ , "\n\n"
ELSE
    PRINT "usage"
    PRINT ap$ , " <file.html>"
    END
END IF
'" "

IF ( FILEEXISTS( file$)) THEN

    HTML$= LOAD$(file$)
    ' put a line break and an opening quote after every tag
    IF INSTR(HTML$,"<") THEN
        HTML$ = REPLACE$(HTML$,">", ">" & NL$ & CHR$(34) )
    END IF

    ' put a closing quote and a line break before every tag
    IF INSTR(HTML$,">") THEN
        HTML$ = REPLACE$(HTML$,"<", CHR$(34) & NL$ & "<")
    END IF

    IF INSTR(HTML$,"{") THEN
        HTML$ = REPLACE$(HTML$,"{","{" & NL$)
    END IF

    ' trim every line
    SPLIT HTML$ BY NL$ TO my_array$ SIZE dimention
    HTML$ = ""
    FOR t = 0 TO dimention -1
        ck$ = my_array$[t]
        ck$= CHOP$(ck$)
        HTML$ = HTML$ & ck$ & NL$
    NEXT

    SPLIT HTML$ BY NL$ TO my_array2$ SIZE dimention
    HTML$=""
    FOR t = 0 TO dimention -1
        ck$ = my_array2$[t]
        ck$= CHOP$(ck$)
        HTML$ = HTML$ & ck$ & NL$
    NEXT

    ' drop empty quote pairs
    HTML$ = REPLACE$(HTML$,CHR$(34) & NL$ & CHR$(34) & NL$,"")
    HTML$ = REPLACE$(HTML$,CHR$(34) & CHR$(34),"")

    ' decode named entities from ents.c
    FOR t = 0 TO 252
        p$ = NAMED_ENTITIES[t][0]
        pr$ = NAMED_ENTITIES[t][1]
        IF INSTR(HTML$,p$) THEN
            HTML$= REPLACE$(HTML$,p$,pr$)
        END IF
    NEXT

    FOR t = 0 TO 255
        p$ = MY_ENTS$[t][0]
        pr$ = MY_ENTS$[t][1]
        IF INSTR(HTML$,p$) THEN
            HTML$= REPLACE$(HTML$,p$,pr$)
        END IF
    NEXT
    PRINT HTML$
    ' can implement rest of parser here
ELSE
    PRINT "file not exist "
END IF

You need ents.c in the same folder as the code.
Attachments: ents.c.bz2 (1.39 KB)
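
The first stage above boils down to putting every tag on its own line and wrapping the text between tags in quotes. For comparison, here is a single-pass sketch of that idea in plain C. It is not a translation of the BaCon code: `split_tags` and `emit_text` are hypothetical names, entity decoding and the "{" handling are left out, and the caller is assumed to provide an output buffer of at least 4 * strlen(html) + 1 bytes.

```c
#include <ctype.h>
#include <string.h>

/* Write src[start..end) to out as a quoted line, trimmed of
 * surrounding whitespace. Writes nothing if the run is empty.
 * Returns the new write position in out. */
static size_t emit_text(const char *src, size_t start, size_t end,
                        char *out, size_t pos)
{
    while (start < end && isspace((unsigned char)src[start]))
        start++;
    while (end > start && isspace((unsigned char)src[end - 1]))
        end--;
    if (start == end)
        return pos;
    out[pos++] = '"';
    memcpy(out + pos, src + start, end - start);
    pos += end - start;
    out[pos++] = '"';
    out[pos++] = '\n';
    return pos;
}

/* Put each tag on its own line and each text run on its own quoted
 * line. out must hold at least 4 * strlen(html) + 1 bytes. */
void split_tags(const char *html, char *out)
{
    size_t i = 0, pos = 0, n = strlen(html);

    while (i < n) {
        if (html[i] == '<') {
            /* copy the tag up to its closing '>' (or to end of input) */
            while (i < n && html[i] != '>')
                out[pos++] = html[i++];
            if (i < n)
                out[pos++] = html[i++];   /* the '>' itself */
            out[pos++] = '\n';
        } else {
            size_t start = i;
            while (i < n && html[i] != '<')
                i++;
            pos = emit_text(html, start, i, out, pos);
        }
    }
    out[pos] = '\0';
}
```

Working in one pass avoids the empty quote pairs that the REPLACE$ approach has to clean up afterwards, at the cost of managing the output buffer by hand.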