very cool Rod,
Where I'm struggling is:
Importing mixed media - scraped web sites, JSON files, complex CSV files (like KQL that includes double quotes and other common separators)
Once I have my data indexed I want to work with multiple indexes - e.g. security news, the MITRE ATT&CK library, etc. - so I haven't thought much about how to work with multiple indexes/dataframes.
keep it up!
(I'm doing all my stuff in Python/LangChain/OpenAI functions etc., but I'd love to have an Azure way to do it all.)
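For what it's worth, one way to think about the multiple-indexes problem is a named registry you can query per source or across all sources. This is a plain-Python toy sketch (not LangChain or Azure Cognitive Search specific); the index names and the keyword index are made up for illustration:

```python
# Illustrative sketch: several named "indexes" in one registry,
# with a query routed across any subset of them.
from collections import defaultdict

class IndexRegistry:
    """Holds multiple keyword indexes, one per data source."""

    def __init__(self):
        # index name -> {keyword -> set of document ids}
        self.indexes = defaultdict(lambda: defaultdict(set))

    def add(self, index_name, doc_id, text):
        # Naive keyword indexing: lowercase, split on whitespace.
        for word in text.lower().split():
            self.indexes[index_name][word].add(doc_id)

    def search(self, keyword, index_names=None):
        """Search one index, several, or (by default) all of them."""
        names = index_names or list(self.indexes)
        return {name: sorted(self.indexes[name][keyword.lower()])
                for name in names}

# Hypothetical sources mirroring the ones mentioned above.
registry = IndexRegistry()
registry.add("security_news", "news-1", "New phishing campaign targets banks")
registry.add("mitre_attack", "T1566", "Phishing: adversaries send phishing messages")

print(registry.search("phishing"))
```

In a real LangChain setup the values would be vector stores rather than keyword maps, but the routing pattern is the same: keep a dict of stores keyed by source and fan the query out.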
You might try putting each media type in its own container with its own indexer.
Hi Rod,
I'd be curious if you know of any good 'universal ingest scripts'.
For example there's a GitHub repo for 'PrivateGPT' that uses this script for ingesting PDF, TXT, and CSV files:
https://github.com/imartinez/privateGPT/blob/main/ingest.py
I feel that LangChain has the lead on making GPT practical with all they're putting into their code.
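For anyone reading along, the core of such a "universal ingest" script is just dispatch on file extension. Here's a stripped-down standard-library sketch in that spirit; it is not privateGPT's actual code, the function name is made up, and PDF handling is omitted because it needs a third-party library (e.g. pypdf):

```python
# Minimal "universal ingest" sketch: route a file to a loader by extension.
import csv
import json
from pathlib import Path

def ingest_file(path):
    """Return a list of text chunks from a .txt, .csv, or .json file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        # Whole file as one chunk; real scripts would split further.
        return [path.read_text(encoding="utf-8")]
    if suffix == ".csv":
        # One chunk per row; csv handles quoted fields with embedded commas.
        with path.open(newline="", encoding="utf-8") as f:
            return [", ".join(row) for row in csv.reader(f)]
    if suffix == ".json":
        # Re-serialize so nested structures become readable text.
        with path.open(encoding="utf-8") as f:
            return [json.dumps(json.load(f), indent=2)]
    raise ValueError(f"Unsupported file type: {suffix}")
```

A real version would add per-type chunking and a PDF branch, but the extension-dispatch skeleton is the whole trick.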
Nothing at the moment, but I could slap together a PowerShell script to do it.
What I'm doing now is assigning a drive mapping to my blob container and just updating/uploading files that way.
Thinking about it further, it could be possible to use an Event Hub to do it.
yeah, I've had some inconsistencies getting various data sources ingested into my dataframe, so I'm looking for a clean way to do it for the most common file formats. I'll keep looking, thanks.
e.g. ingesting enterprise-attack.json (not that I need to, just an example)
ingesting KQL queries in CSV format - finding a good delimiter can be tricky w/o writing a specific Python parser.
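One note on the delimiter problem: Python's built-in csv module already escapes embedded double quotes (by doubling them) and commas when quoting is enabled, so KQL usually round-trips without a custom parser or an exotic delimiter. A small sketch with a made-up query:

```python
# Round-tripping a KQL query through CSV with the stdlib csv module.
# QUOTE_ALL wraps every field, so pipes, commas, and embedded double
# quotes (escaped as "") survive intact.
import csv
import io

kql = 'SecurityAlert | where AlertName has "Suspicious" | project TimeGenerated, AlertName'

# Write one row: (query name, query text).
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
writer.writerow(["suspicious-alerts", kql])

# Read it back: no special delimiter needed.
buf.seek(0)
name, query = next(csv.reader(buf))
assert query == kql
```

The same applies when reading CSVs produced elsewhere, as long as the producer used standard CSV quoting rather than bare unquoted fields.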
The links to "QuickStart" pages don't appear to be working. Is there something wrong?
Thanks for the heads up, Mark. All fixed.
No copy and paste when in the side bar. 😕
Yeah...working on that.