HTTPIE

alexfish
God

A world without windows

Posts: 3,060

Post by alexfish on Sept 6, 2020 19:46:57 GMT 1

Hi All

While working on HTTP GET:

In short, curl and wget were failing on the likes of
ubuntuforums.org and, more recently, the Google and YouTube APIs.

In the past I have been using the Python tool httpie,

and during that time I looked for a C solution.

I stumbled on some code on GitHub using libsoup.

The resulting code is a hack of those bits; I did not have to hack it that much.

I tried to convert it to BaCon;

for now I have put the file in an include (*.c),

and it can be used in this form:

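' Prog HTTPIE: thin BaCon wrapper around the libsoup client in soupinc.c
PRAGMA LDFLAGS `pkg-config --libs libsoup-2.4`
PRAGMA OPTIONS `pkg-config --cflags libsoup-2.4`

' Pull in the C client code
USEH
#include "soupinc.c"
END USEH

' HTTPIE is defined in soupinc.c; PROTO lets BaCon accept the call
PROTO HTTPIE
LOCAL ck TYPE int

' Pass the program's own command line straight through
ck = HTTPIE (argc,argv)
IF (ck) THEN
	PRINT "Fail"
ELSE
	PRINT "Success"
END IF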

From the terminal, use the -o option to save the output to a file:

./HTTPIE http://www.basic-converter.org -o bacon.html

then view the results:
cat bacon.html

The status is also printed in the terminal, like:
/: 200 OK
BR
Alex

Attachments:

soupinc.c.bz2 (2.37 KB)

Two tins cans are better than an Iphone

vovchik
God

Posts: 2,792

Post by vovchik on Sept 6, 2020 20:02:47 GMT 1

Dear Alex,

The output is formatted beautifully and is very clean. Thanks. It can do what wget/curl do in many instances. I wonder how hard it will be to Baconize the C bit...

With kind regards,
vovchik

Last Edit: Sept 6, 2020 21:16:49 GMT 1 by vovchik

alexfish

Post by alexfish on Sept 6, 2020 22:21:02 GMT 1

Hi Vovchik

Thanks for testing.

I agree about the output style; it is mostly standard HTML tags,
and I hope it is presented in a way that helps hone your 'html2text'.

& All

It can also be handled internally, like so:

'Prog HTTPIE


PRAGMA LDFLAGS `pkg-config   --libs libsoup-2.4`
PRAGMA OPTIONS `pkg-config --cflags libsoup-2.4`

USEH
	#include "soupinc.c"
END USEH

PROTO HTTPIE
LOCAL ck TYPE int

/*
Application Options:
  -c, --ca-file=FILE     Use FILE as the TLS CA file
  --cert=FILE            Use FILE as the TLS client certificate file
  --key=FILE             Use FILE as the TLS client key file
  -d, --debug            Show HTTP headers
  -h, --head             Do HEAD rather than GET
  -n, --ntlm             Use NTLM authentication
  -o, --output=FILE      Write the received data to FILE instead of stdout
  -p, --proxy=URL        Use URL as an HTTP proxy
  -q, --quiet            Don't show HTTP status code
  -s, --sync             Use SoupSessionSync rather than SoupSessionAsync
*/

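' Build an argv-style array: program name first, then the URL and the options (quiet, write to bacon.html)
LOCAL isarg$ [5]= {"HTTPIE","http://www.basic-converter.org","-q","-o","bacon.html"} TYPE STRING

' 5 = number of elements in isarg$, including the program name
ck = HTTPIE (5,isarg$)
IF (ck) THEN
	PRINT "Fail"
ELSE
	' Read back what HTTPIE wrote with -o
	content$ = LOAD$("bacon.html")
	PRINT "Content of 'bacon.html': \n", content$
END IF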

BR
Alex

Last Edit: Sept 6, 2020 23:47:45 GMT 1 by alexfish: Added Options for app

alexfish

Post by alexfish on Sept 6, 2020 22:33:47 GMT 1

Sept 6, 2020 20:02:47 GMT 1 vovchik said:

Dear Alex,

The output is formatted beautifully and is very clean. Thanks. It can do what wget/curl do in many instances. I wonder how hard it will be to Baconize the C bit...

With kind regards,
vovchik

Well, I was in the process of testing a straight-through method, like curl, but taken from the libsoup docs,

and it FAILED on the likes of ubuntuforums.org (try wget or curl on that one), or when downloading from openstreet,
hence I am still looking.

Hence the post above: handling the bits internally is the best option at present, unless it needs to be 'Baconized'.

Note that the -o option writes to a file; for example, what if the file is an image?

BR
Alex

alexfish

Post by alexfish on Sept 6, 2020 22:55:50 GMT 1

Hi All & @Vovchik

I forgot to mention: by adding the
SOUP_TYPE_CONTENT_DECODER
feature to the session, libsoup asks for gzip-compressed content and, if the server offers it, downloads the compressed version and decompresses it for you,

saving your data allowance.

You can test that one with the likes of YouTube: have a look at the downloaded file and look for libsoup.

BR
Alex

alexfish

Post by alexfish on Sept 12, 2020 11:42:37 GMT 1

Hi All

Here is a straight-through BaCon'ized version of GET using libsoup.
The SUB HTTPIE prints the response and saves it to a file; adapt it to what you require.


PRAGMA LDFLAGS `pkg-config   --libs libsoup-2.4`
PRAGMA OPTIONS `pkg-config --cflags libsoup-2.4`

PRAGMA INCLUDE  <libsoup/soup.h>


DECLARE session TYPE static SoupSession *
DECLARE loop TYPE static GMainLoop *
DECLARE debug, head, quiet TYPE static gboolean
DECLARE output_file_path = NULL TYPE static const gchar *


DECLARE opts TYPE	GOptionContext *
DECLARE url TYPE	const char *
DECLARE	 *proxy_uri, *parsed TYPE SoupURI
DECLARE	error  = NULL TYPE GError *
DECLARE	logger = NULL TYPE SoupLogger *
DECLARE	help TYPE char *

PROTO gprint,soup_session_send_message,soup_message_get_uri,soup_message_get_https_status
PROTO g_free,soup_uri_free,g_object_unref


SUB HTTPIE(STRING url$ , STRING saveto$ )
	LOCAL parsed$ TYPE STRING
	LOCAL name TYPE STRING
	LOCAL msg TYPE	SoupMessage *
	LOCAL header TYPE STRING
	LOCAL responce$ TYPE STRING
	msg = soup_message_new ( "GET", url$)
	soup_session_send_message (session, msg)
	name = soup_message_get_uri (msg)->path

	IF (msg->status_code == SOUP_STATUS_SSL_FAILED) THEN

		LOCAL flags TYPE GTlsCertificateFlags

		IF (soup_message_get_https_status (msg, NULL, &flags)) THEN
			PRINT name, msg->status_code, msg->reason_phrase, flags FORMAT "%s: %d %s (0x%x)\n"
		ELSE
			PRINT name, msg->status_code, msg->reason_phrase FORMAT "%s: %d %s (no handshake status)\n"
		END IF

	END IF

	IF ( SOUP_STATUS_IS_TRANSPORT_ERROR (msg->status_code)) THEN
		PRINT  name, msg->status_code, msg->reason_phrase FORMAT "%s: %d %s\n"
	END IF

	IF (SOUP_STATUS_IS_REDIRECTION (msg->status_code)) THEN
		header = soup_message_headers_get_one (msg->response_headers,"Location")

		IF header THEN
			LOCAL uri TYPE SoupURI *
			LOCAL uri_string TYPE STRING
			uri = soup_uri_new_with_base (soup_message_get_uri (msg), header)
			uri_string = soup_uri_to_string (uri, FALSE)
			HTTPIE (uri_string,saveto$)
			g_free (uri_string)
			soup_uri_free (uri)
		ELSE
			PRINT "redirect fail"
		END IF

	END IF

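	IF (SOUP_STATUS_IS_SUCCESSFUL (msg->status_code)) THEN
		' Copying into a STRING stops at the first NUL byte, so this suits text such as HTML
		responce$ = msg->response_body->data
		PRINT responce$
		SAVE responce$ TO saveto$
	END IF

	' soup_session_send_message() does not take ownership of msg, so release it here
	g_object_unref (msg)

END SUB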

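' One session for all requests: decode compressed bodies, keep cookies, set Accept-Language from the locale.
' A User-Agent ending in a space makes libsoup append its own version token.
session = g_object_new (SOUP_TYPE_SESSION, \
	SOUP_SESSION_ADD_FEATURE_BY_TYPE, SOUP_TYPE_CONTENT_DECODER, \
	SOUP_SESSION_ADD_FEATURE_BY_TYPE, SOUP_TYPE_COOKIE_JAR, \
	SOUP_SESSION_USER_AGENT, "get ", \
	SOUP_SESSION_ACCEPT_LANGUAGE_AUTO, TRUE, \
	NULL)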

'HTTPIE  ("http://www.basic-converter.org","index.html")
HTTPIE  ("https://www.youtube.com/results?search_query=pink+floyd","index.html")
	g_object_unref (session)

BR
Alex

Last Edit: Sept 12, 2020 13:32:42 GMT 1 by alexfish: code clean + add unref to free session

alexfish

Post by alexfish on Sept 12, 2020 11:50:48 GMT 1

Typo in the code posted above; now updated.

BR
Alex

vovchik

Post by vovchik on Sept 12, 2020 23:34:45 GMT 1

Dear Alex,

Thanks. Working for me.

With kind regards,
vovchik

alexfish

Post by alexfish on Sept 13, 2020 20:30:38 GMT 1

Hi Vovchik

Thanks for testing.

Along with this I was looking for an HTML parser.
Although gumbo is classed as a C/C++ parser, it seemed more C++-centric.

Then I stumbled on lexbor on GitHub.

Its serialized tree output is ideal for parsing in BaCon.

I did have some problems getting a result, and the nearest BaCon'ized code now looks like this:


PRAGMA LDFLAGS -llexbor

PRAGMA INCLUDE <lexbor/core/fs.h>
PRAGMA INCLUDE <lexbor/html/html.h>

PROTO lxb_html_parser_destroy,lxb_html_document_destroy,printf

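' Store the serialized output here
DECLARE MYDATA$ TYPE STRING

' Serializer callback: lexbor calls it for each chunk of output
FUNCTION SERIALSE(const lxb_char_t *data, size_t len, void *ctx) TYPE lxb_inline lxb_status_t
	' Transfer the chunk (exactly len bytes, not NUL-terminated) to MYDATA$; this overwrites the previous chunk
	PRINT (int) len, (const char *) data FORMAT "%.*s" TO MYDATA$
	' Print the chunk
	PRINT MYDATA$

	RETURN LXB_STATUS_OK
END FUNCTION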

LOCAL *html_one TYPE lxb_char_t
LOCAL html_one_len TYPE size_t
LOCAL status TYPE lxb_status_t
LOCAL doc_one TYPE lxb_html_document_t*

html_one = lexbor_fs_file_easy_read("/home/pi/youtube.html", &html_one_len)

USEC
	lxb_html_parser_t *parser;
	parser = lxb_html_parser_create();
END USEC

status = lxb_html_parser_init(parser)
IF (status != LXB_STATUS_OK) THEN
	PRINT "Failed to create HTML parser"
END IF

doc_one = lxb_html_parse(parser, html_one, html_one_len)
IF (doc_one == NULL) THEN
	PRINT "Failed to create Document object"
END IF
lxb_html_parser_destroy(parser)

status = lxb_html_serialize_pretty_tree_cb(lxb_dom_interface_node(doc_one), \
	LXB_HTML_SERIALIZE_OPT_UNDEF, \
	0, SERIALSE, NULL)

lxb_html_document_destroy(doc_one)

BR
Alex

vovchik

Post by vovchik on Sept 13, 2020 22:11:46 GMT 1

Dear Alex,

Thanks. It all works nicely. The only downside is that lexbor is a pretty fat animal (18 MB source).

And, the cmake/make have to be adjusted to get the lib into /usr/lib. Too bad there is not a vegan version of that animal, otherwise it is very useful for parsing.

With kind regards,
vovchik

alexfish

Post by alexfish on Sept 13, 2020 22:13:05 GMT 1

Hi All

For those wanting to install lexbor:

You must have git, cmake and make.

Open a terminal and run:


cd Downloads
git clone https://github.com/lexbor/lexbor.git
cd lexbor
cmake . -DLEXBOR_BUILD_TESTS=ON -DLEXBOR_BUILD_EXAMPLES=ON -DLEXBOR_BUILD_SEPARATELY=ON
make
make test
sudo make install
sudo ldconfig

BR
Alex

alexfish

Post by alexfish on Sept 13, 2020 22:35:02 GMT 1

Sept 13, 2020 22:11:46 GMT 1 vovchik said:

Dear Alex,

Thanks. It all works nicely. The only downside is that lexbor is a pretty fat animal (18 MB source).

And, the cmake/make have to be adjusted to get the lib into /usr/lib. Too bad there is not a vegan version of that animal, otherwise it is very useful for parsing.

With kind regards,
vovchik

Novice-wise, the only thing to do is leave it as is, then run ldconfig after the make install. Yes, the bits end up in /usr/local,
but it should work.

Fat-wise I agree; if I can find one that has been on a diet, then who knows.
Till then, it is what it is: a bit less fat, and less farting around with node versions.

BR
Alex

alexfish

Post by alexfish on Sept 14, 2020 22:00:55 GMT 1

Hi All

The first lexbor example was a bit buggy;

this one works:


' needed for parsing final results
 OPTION QUOTED FALSE
PRAGMA LDFLAGS -llexbor

PRAGMA INCLUDE <lexbor/core/fs.h>
PRAGMA INCLUDE <lexbor/html/html.h>

PROTO lxb_html_parser_destroy,lxb_html_document_destroy
DECLARE MYDATA_RESULTS$ TYPE STRING

FUNCTION SERIALSE(const lxb_char_t *data, size_t len, void *ctx) TYPE lxb_inline lxb_status_t
LOCAL dimension,ck TYPE int 
LOCAL MYDATA$ TYPE STRING
DECLARE c$,d$,e$ TYPE STRING
DECLARE qt$ = CHR$(34)
DECLARE lt$ , rt$ TYPE STRING
	' Transfer this chunk (exactly len bytes, not NUL-terminated) to MYDATA$
	PRINT (int) len, (const char *) data FORMAT "%.*s" TO MYDATA$
	' Append the chunk to the collected results
	MYDATA_RESULTS$ = MYDATA_RESULTS$ & MYDATA$
	RETURN LXB_STATUS_OK
END FUNCTION

LOCAL html_one TYPE lxb_char_t *
LOCAL html_one_len TYPE size_t
LOCAL status TYPE lxb_status_t
LOCAL doc_one TYPE lxb_html_document_t*
' Change the path and/or file to suit
html_one = lexbor_fs_file_easy_read("bacon.html", &html_one_len)
IF (html_one == NULL) THEN
	PRINT "Failed to read bacon.html"
	END 1
END IF

USEC
	lxb_html_parser_t *parser;
	parser = lxb_html_parser_create();
END USEC

status = lxb_html_parser_init(parser)
IF (status != LXB_STATUS_OK) THEN
	PRINT "Failed to create HTML parser"
	END 1
END IF

doc_one = lxb_html_parse(parser, html_one, html_one_len)
IF (doc_one == NULL) THEN
	PRINT "Failed to create Document object"
	END 1
END IF
lxb_html_parser_destroy(parser)

status = lxb_html_serialize_pretty_tree_cb(lxb_dom_interface_node(doc_one), \
	LXB_HTML_SERIALIZE_OPT_UNDEF, \
	0, SERIALSE, NULL)
lxb_html_document_destroy(doc_one)

 PRINT MYDATA_RESULTS$

BR
Alex

alexfish

Post by alexfish on Sept 20, 2020 23:31:10 GMT 1

Hi All

Updated lexbor parser:

a simple HTML parser using liblexbor and entity decoders.

Archives:

1. lexbor code: the lexer plus a decoder for numeric HTML entities (by ASCII code)
2. include ents.c: decoder for entities by name

It does not include the new decoders for JavaScript; these will appear in a new thread.

Usage: app 'filename.html'

BR
Alex

Attachments:

lexbor_demo.bac.bz2 (1.44 KB)

ents.c.bz2 (1.39 KB)

alexfish

Post by alexfish on Sept 27, 2020 20:03:32 GMT 1

Hi Vovchik & All

Well, after finding what I was doing wrong in BaCon, I now find that I can replicate what lexbor does.

This is the first stage of the parser, so you can carry it forward to your own parsed results.

Look at where the quotes are: even if the page has loads of JavaScript, the results you want should be visible
line by line, i.e. text=, label= and url=.

BR
Alex

The Code


OPTION QUOTED FALSE

USEH
	#include "ents.c"
END USEH

DECLARE MY_ENTS$[256][2] TYPE STRING
DECLARE my_ent_start$ = "&#" TYPE STRING
DECLARE my_ent_end$ =";" TYPE STRING

LOCAL dimention  TYPE int
LOCAL ck$ TYPE STRING
LOCAL ap$ =  argv[0]

LOCAL HTML$

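' Numeric entities: "&#65;" -> "A"; the code is written as decimal digits, so use STR$ rather than CHR$
' (codes above 127 give a single byte here, not UTF-8)
FOR t = 0 TO 255
	MY_ENTS$[t][0]= my_ent_start$ & STR$(t) & my_ent_end$
	MY_ENTS$[t][1]=  CHR$(t)
NEXT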

IF argc = 2 THEN
	file$ = argv[1]
	PRINT file$ , "\n\n"
ELSE
	PRINT "usage"
	PRINT ap$ , " <file.html>"
	END
END IF
'" 	"

IF ( FILEEXISTS( file$)) THEN

	HTML$= LOAD$(file$)
	' Put every tag on its own line; the text between tags ends up wrapped in quotes
	IF INSTR(HTML$,">") THEN
		HTML$ = REPLACE$(HTML$,">", ">" & NL$ & CHR$(34) )
	END IF

	IF INSTR(HTML$,"<") THEN
		HTML$ = REPLACE$(HTML$,"<", CHR$(34) & NL$ & "<")
	END IF

	IF INSTR(HTML$,"{") THEN
		HTML$ = REPLACE$(HTML$,"{","{" & NL$)
	END IF

	SPLIT HTML$ BY NL$ TO my_array$ SIZE dimention
	HTML$ = ""
	FOR t = 0 TO dimention -1
		ck$ = my_array$[t]
		ck$= CHOP$(ck$)
		HTML$ = HTML$ & ck$ & NL$
	NEXT

	SPLIT HTML$ BY NL$ TO my_array2$ SIZE dimention
	HTML$=""
	FOR t = 0 TO dimention -1
		ck$ = my_array2$[t]
		ck$= CHOP$(ck$)
		HTML$ = HTML$ & ck$ & NL$
	NEXT
	' Drop empty text between tags (quote pairs with only whitespace or nothing inside)
	HTML$ = REPLACE$(HTML$,CHR$(34) & NL$ & CHR$(34) & NL$,"")
	HTML$ = REPLACE$(HTML$,CHR$(34) & CHR$(34),"")
	FOR t = 0 TO 252
		p$ = NAMED_ENTITIES[t][0]
		pr$ = NAMED_ENTITIES[t][1]
		IF INSTR(HTML$,p$) THEN
			HTML$= REPLACE$(HTML$,p$,pr$)
		END IF
	NEXT

	FOR t = 0 TO 255
		p$ = MY_ENTS$[t][0]
		pr$ = MY_ENTS$[t][1]
		IF INSTR(HTML$,p$) THEN
			HTML$= REPLACE$(HTML$,p$,pr$)
		END IF
	NEXT
	PRINT HTML$
	' can imp rest of parser here
ELSE

	PRINT "file not exist "
END IF

You need ents.c in the same folder as the code.

Attachments:

ents.c.bz2 (1.39 KB)

Last Edit: Sept 27, 2020 20:21:21 GMT 1 by alexfish: code updated
