@dale-wahl
Member

This collects a lot of data from Bilibili, but there are two issues.

  • The HTML needs parsing. There does not appear to be embedded JSON, though one script contains JS variables holding the info. That means either parsing the HTML per page type or parsing the JS function.
  • The JSON objects contain lots of items that are never used on the page.

Notes on JSON objects:
Bilibili seems to request multiple JSON objects and only afterwards decide what to place and where. The main array is used in order, but the page stops after an arbitrary number of items (this seems to depend on the layout and on other injected cards). Some other items are also injected into the blocks; so far I have identified live videos and "hot" videos. These arrays have 5-10 items each and only one is used, not necessarily the first item 😢

So right now this records lots of extra items, and I am unsure how to determine which of them are actually used. The order is also difficult to ascertain, but that seems a secondary concern.
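
A minimal sketch of one way to keep that ambiguity visible in the collected data, rather than a description of what this PR does: tag every captured item with the array it came from and its index in that array, so unused items can be filtered later once the display rules are understood. The names `CapturedItem` and `flatten_with_provenance`, and the source labels `main`, `live` and `hot`, are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class CapturedItem:
    source: str            # hypothetical label for the array the item came from
    position: int          # index of the item within that array
    data: dict[str, Any]   # the raw JSON object, left untouched


def flatten_with_provenance(payloads: dict[str, list[dict[str, Any]]]) -> list[CapturedItem]:
    """Flatten several captured arrays while remembering where each item came from."""
    captured = []
    for source, items in payloads.items():
        for position, data in enumerate(items):
            captured.append(CapturedItem(source, position, data))
    return captured


# Made-up payloads: a main feed plus two injected arrays.
payloads = {
    "main": [{"id": 1}, {"id": 2}],
    "live": [{"id": 9}],
    "hot": [{"id": 5}, {"id": 6}],
}
items = flatten_with_provenance(payloads)
```

This does not answer which items were displayed; it only preserves enough context (source array and position) to apply a rule such as "main array, first N items" afterwards.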

@dale-wahl
Member Author

Added a parser to handle the pinia JS function embedded in the HTML on some pages. It should work regardless of the HTML formatting. However, I discovered that not all of the items here are used either (only the first 7-8).
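
For illustration only, here is a rough sketch of how such an embedded state function could be unpacked without executing any JavaScript. It is not the parser added in this PR. It assumes the state is serialized as `window.__pinia=(function(a,b){return {...}}(arg1,arg2))` inside a script tag, that the arguments are valid JSON literals, and that strings are double-quoted; the variable name `window.__pinia` and the function `parse_pinia_state` are assumptions.

```python
import json
import re

# Assumed shape: window.__pinia=(function(a,b){return {...}}(arg1,arg2))</script>
# Limitation: a string value containing "}}(" or "))</script>" would confuse this pattern.
IIFE = re.compile(
    r"window\.__pinia\s*=\s*\(function\s*\((?P<params>[^)]*)\)\s*\{\s*return\s*"
    r"(?P<body>\{.*?\})\s*\}\s*\((?P<args>.*?)\)\s*\)\s*;?\s*</script>",
    re.S,
)
# Matches either a double-quoted string or a bare identifier (optionally followed by ":").
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|(?<![\w$.])([A-Za-z_$][\w$]*)(\s*:)?')


def parse_pinia_state(html):
    """Return the embedded state as a dict, or None if the pattern is not found."""
    match = IIFE.search(html)
    if match is None:
        return None
    params = [p.strip() for p in match["params"].split(",") if p.strip()]
    # Assumption: every argument is a JSON literal (number, string, true, false, null).
    args = json.loads("[" + match["args"] + "]")
    mapping = dict(zip(params, args))

    def swap(token):
        name = token.group(1)
        if name is None:           # a quoted string: keep as-is
            return token.group(0)
        if token.group(2):         # an unquoted object key: quote it
            return json.dumps(name) + ":"
        if name in mapping:        # a function parameter: substitute its argument
            return json.dumps(mapping[name])
        return name                # true / false / null

    return json.loads(TOKEN.sub(swap, match["body"]))


html = (
    '<script>window.__pinia=(function(a,b){return '
    '{index:{items:[{id:a,title:"Hi: a",live:b}]}}}(42,false))</script>'
)
state = parse_pinia_state(html)
```

Because the sketch only looks at the script text, it is independent of the surrounding HTML layout. It returns every item in the state, so the observation that only the first 7-8 are displayed would still have to be handled separately.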
