9 Comments
Jul 17, 2023 · Liked by Rod Trent

Very cool, Rod.

Where I'm struggling is:

Importing mixed media: scraped websites, JSON files, and complex CSV files (like KQL that includes double quotes and other common separators).

Once I have my data indexed, I want to work with multiple indexes (e.g., security news, the MITRE ATT&CK library, etc.), and I haven't thought much yet about how to work with multiple indexes/dataframes.

Keep it up!

(I'm doing all my stuff in Python/LangChain/OpenAI functions etc., but I'd love to have an Azure way to do it all.)

author

You might try putting each media type in its own container with its own indexer.


Hi Rod,

I'd be curious if you know of any good 'universal ingest scripts'.

For example, there's a GitHub repo for 'PrivateGPT' that uses this script for ingesting PDF, TXT, and CSV files:

https://github.com/imartinez/privateGPT/blob/main/ingest.py

I feel that LangChain has the lead on making GPT practical, given everything they're putting into their code.
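
For anyone following along, the "universal ingest" idea can be sketched with a small loader registry keyed by file extension. This is only a rough illustration using the Python standard library, not what PrivateGPT or LangChain actually do. The names `LOADERS`, `load_txt`, `load_csv`, `load_json`, and `ingest` are hypothetical, and PDF is left out because it needs a third-party parser.

```python
import csv
import json
from pathlib import Path


def load_txt(path):
    # One document per text file.
    return [path.read_text(encoding="utf-8")]


def load_csv(path):
    # One document per CSV row, serialized as JSON so column names survive.
    with path.open(newline="", encoding="utf-8") as f:
        return [json.dumps(row) for row in csv.DictReader(f)]


def load_json(path):
    # One document per top-level list item, or a single document otherwise.
    data = json.loads(path.read_text(encoding="utf-8"))
    items = data if isinstance(data, list) else [data]
    return [json.dumps(item) for item in items]


# Hypothetical registry: add an entry here to support another format.
LOADERS = {".txt": load_txt, ".csv": load_csv, ".json": load_json}


def ingest(folder):
    """Walk a folder and return a list of {"source", "text"} records."""
    docs = []
    for path in sorted(Path(folder).rglob("*")):
        loader = LOADERS.get(path.suffix.lower())
        if path.is_file() and loader:
            for text in loader(path):
                docs.append({"source": path.name, "text": text})
    return docs
```

Files with an unregistered extension are skipped silently here; a real script would probably want to log them instead.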

author
Jul 27, 2023 · edited Jul 27, 2023

Nothing at the moment, but I could slap together a PowerShell script to do it.

What I'm doing now is assigning a drive mapping to my blob container and just updating/uploading files that way.

Thinking about it further, it could be possible to use an Event Hub to do it.


Yeah, I've had some inconsistencies getting various data sources ingested into my dataframe, so I'm looking for a clean way to do it for the most common file formats. I'll keep looking, thanks.

E.g., ingesting enterprise-attack.json (not that I need to, just an example).

Or ingesting KQL queries in CSV format: finding a good delimiter can be tricky without writing a specific Python parser.
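
On the delimiter problem, one possible approach is to lean on CSV quoting instead of hunting for a "safe" delimiter. The sketch below assumes you control how the file is written (or that the export already uses standard doubled-quote escaping). The sample queries and the helpers `write_queries` and `read_queries` are made up for illustration.

```python
import csv
import io

# Sample KQL with double quotes, commas, pipes, and embedded newlines.
rows = [
    {
        "name": "Failed sign-ins",
        "query": 'SigninLogs | where ResultType != "0" | project UserPrincipalName, IPAddress',
    },
    {
        "name": "Multi-line",
        "query": "SecurityEvent\n| where EventID == 4625\n| summarize count() by Account, Computer",
    },
]


def write_queries(rows):
    # QUOTE_ALL wraps every field in quotes; inner quotes are doubled ("").
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=["name", "query"], quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def read_queries(text):
    # newline="" lets the csv module handle line endings inside quoted fields.
    return list(csv.DictReader(io.StringIO(text, newline="")))


csv_text = write_queries(rows)
round_tripped = read_queries(csv_text)
```

If the CSV comes from a tool that does not escape embedded quotes, this will not rescue it, and a custom parser may still be needed.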

Jul 27, 2023 · Liked by Rod Trent

The links to "QuickStart" pages don't appear to be working. Is there something wrong?

author

Thanks for the heads up, Mark. All fixed.


No copy and paste when in the side bar. 😕

author

Yeah...working on that.
