Regex search problem
5 posts
Page 1 of 1
Hello, I'm having problems searching my string using regex. It works fine when the source is a static string, but as soon as I download it via a WebClient it doesn't work. Any suggestions?
What I'm trying to do is get the text between <a href="/dc/doujin/-/list/=/article=keyword/id=Test1/" class="genreTag__txt"> and its closing tag </a>.
Code: Select all
Dim tReturn As New ArrayList
Dim strRegex As String = "<a href=""\/dc\/doujin\/-\/list\/=\/article=keyword\/id=.*\/"" class=""genreTag__txt"">(\s\n.*?)<\/a>"
Dim myRegex As New Regex(strRegex, RegexOptions.IgnoreCase Or RegexOptions.Multiline)
For Each myMatch As Match In myRegex.Matches(source)
    If myMatch.Success Then
        TextBox1.Text = TextBox1.Text & vbNewLine & myMatch.Groups(1).Value.Trim
    End If
Next
The regex search works fine on a static string, but it does not work when the string is downloaded with the WebClient.
Here's part of the page it downloads. I cannot post the entire HTML file or the URL because the site is an adult site ...
Code: Select all
<ul class="genreTagList">
<li class="genreTagList__item">
<div class="m-genreTag">
<div class="genreTag__item">
<a href="/dc/doujin/-/list/=/article=keyword/id=Test1/" class="genreTag__txt">
Test1 </a>
</div>
</div>
</li>
<li class="genreTagList__item">
<div class="m-genreTag">
<div class="genreTag__item">
<a href="/dc/doujin/-/list/=/article=keyword/id=Test2/" class="genreTag__txt">
Test2 </a>
</div>
</div>
</li>
</ul>
And this is my WebClient:
Code: Select all
Dim WClient As New Net.WebClient
WClient.Encoding = System.Text.Encoding.UTF8
Dim source As String = WClient.DownloadString(url)
Last edited by AnoPem on Tue Jan 05, 2016 5:11 pm, edited 1 time in total.
It would help if you posted what your source looks like, or a link to what your WebClient is downloading, and also the value you're trying to get.
For example, if the string you want to parse is "the man ate 3 burgers" and the value you want is '3', you would post both of those.
I would also like to point out that regex isn't a good way to parse HTML. I would look at using HtmlAgilityPack or even just the built-in WebBrowser class.
Thanks for the reply. I have updated my main post with some of the information you asked for.
Using Regex
The reason it doesn't work is that you put '\s' where there is no extra whitespace. Changing it to '\s?' lets the pattern still match when there is no whitespace. You can also shorten your regex to just '<a href=""\/dc\/doujin\/-\/list\/=\/article=keyword\/id=.*\/"" class=""genreTag__txt"">\n(.*?)<\/a>'. You can shorten the pattern even further (although this depends on the HTML you get from the website) to 'genreTag__txt.+\n(.+)<\/a>'.
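To illustrate the '\s' point, here is a minimal sketch. It rests on an assumption that is not confirmed anywhere in this thread: that the static test string used CR LF line endings (so '\s' could match the carriage return) while the downloaded page uses a bare LF. The two sample strings and the names LineEndingDemo, withLf, withCrLf, original and relaxed are made up for the demonstration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LineEndingDemo
    Sub Main()
        ' Opening tag copied from the sample HTML in the question.
        Dim openTag As String = "<a href=""/dc/doujin/-/list/=/article=keyword/id=Test1/"" class=""genreTag__txt"">"

        ' Hypothetical inputs: same markup, different line ending after the tag.
        Dim withLf As String = openTag & vbLf & "Test1 </a>"
        Dim withCrLf As String = openTag & vbCrLf & "Test1 </a>"

        ' Shared start of the pattern, taken from the question.
        Dim prefix As String = "<a href=""\/dc\/doujin\/-\/list\/=\/article=keyword\/id=.*\/"" class=""genreTag__txt"">"
        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Multiline

        ' original: requires one whitespace character before the line feed.
        ' relaxed:  makes that whitespace character optional ("\s?").
        Dim original As New Regex(prefix & "(\s\n.*?)<\/a>", options)
        Dim relaxed As New Regex(prefix & "(\s?\n.*?)<\/a>", options)

        Console.WriteLine("original, LF only: {0}", original.IsMatch(withLf))   ' False
        Console.WriteLine("original, CR LF:   {0}", original.IsMatch(withCrLf)) ' True
        Console.WriteLine("relaxed,  LF only: {0}", relaxed.IsMatch(withLf))    ' True
        Console.WriteLine("relaxed,  CR LF:   {0}", relaxed.IsMatch(withCrLf))  ' True

        ' Group 1 still contains the line break, so Trim() is needed.
        Console.WriteLine("captured: '{0}'", relaxed.Match(withLf).Groups(1).Value.Trim())
    End Sub
End Module
```

If that assumption holds, it would explain why the same pattern behaves differently on the two sources. Inspecting the downloaded string for vbCr characters is a quick way to check.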
My final code with regex:
Code: Select all
Dim r = Regex.Matches(_code, "genreTag__txt.+\n(.+)<\/a>", RegexOptions.IgnoreCase)
MsgBox(r(0).Groups(1).Value)
Using HtmlAgilityPack
Code: Select all
Dim doc = New HtmlDocument
doc.LoadHtml(_code)
Dim results = doc.DocumentNode.SelectNodes("//a[@class=""genreTag__txt""]")
Dim r1 = results(0).InnerText
MsgBox(r1)
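Since the sample page contains more than one tag, the HtmlAgilityPack version can also be written to collect every match instead of only the first. This is just a sketch, assuming the HtmlAgilityPack NuGet package is referenced; the module name TagExtractor and the function ExtractTags are made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module TagExtractor
    ' Returns the trimmed text of every <a class="genreTag__txt"> element.
    Function ExtractTags(html As String) As List(Of String)
        Dim tags As New List(Of String)

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing (not an empty collection) when no node matches.
        Dim nodes = doc.DocumentNode.SelectNodes("//a[@class=""genreTag__txt""]")
        If nodes Is Nothing Then Return tags

        For Each node As HtmlNode In nodes
            ' InnerText includes the line break after the opening tag, so trim it.
            tags.Add(node.InnerText.Trim())
        Next

        Return tags
    End Function
End Module
```

Called with the HTML sample from the first post, this should give a list containing Test1 and Test2. Because the parser handles whitespace and line endings itself, it does not depend on how the page was downloaded.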
That seems to work, thank you very much
Copyright Information
Copyright © Codenstuff.com 2020 - 2023